*Introduction to WinBUGS for Ecologists*in the past two days. One thing that concerns me particularly is how well these Bayesian packages handle large data set (i.e., N>10,000), which is the size of the data sets that I work with. So I modified the data generation program provided by the authors and increased the sample sizes:

n1 <- 6000 # Number of females

n2 <- 4000 # Number of males

mu1 <- 105 # Population mean of females

mu2 <- 77.5 # Population mean of males

sigma <- 2.75 # Average population SD of both

n <- n1+n2 # Total sample size

y1 <- rnorm(n1, mu1, sigma) # Data for females

y2 <- rnorm(n2, mu2, sigma) # Date for males

y <- c(y1, y2) # Aggregate both data sets

x <- rep(c(0,1), c(n1, n2)) # Indicator for male

The BUGS program looks like this:

model {

# Priors

mu1 ~ dnorm(0,0.001)

mu2 ~ dnorm(0,0.001)

tau1 <- 1 / ( sigma1 * sigma1)

sigma1 ~ dunif(0, 1000) # Note: Large var. = Small precision

tau2 <- 1 / ( sigma2 * sigma2)

sigma2 ~ dunif(0, 1000)

# Likelihood

for (i in 1:n1) {

y1[i] ~ dnorm(mu1, tau1)

}

for (i in 1:n2) {

y2[i] ~ dnorm(mu2, tau2)

}

# Derived quantities

delta <- mu2 - mu1

}

While the Stan looks like this:

data {

int n1;

int n2;

real y1[n1];

real y2[n2];

}

parameters {

real alpha1;

real alpha2;

real sigma1;

real sigma2;

}

model {

y1 ~ normal(alpha1, sigma1);

y2 ~ normal(alpha2, sigma2);

}

generated quantities {

real dif;

dif <- sigma1 - sigma2;

}

For 20,000 iterations, OpenBUGS took 7852.813 seconds, JAGS took 2252.273 seconds, Stan took 60-150 seconds (the Stan team is still trying to iron out some bugs, which may explain the wide range of run time). While the run time of both OpenBUGS and JAGS are sensitive to the sample size, the run time of Stan seems much less sensitive to sample size.

However, if the Stan code is written as this:

model {

for (n in 1:n1)

y1[n] ~ normal(alpha1, sigma1);

for (n in 1:n2)

y2[n] ~ normal(alpha2, sigma2);

}

Then Stan has no apparent performance advantage over JAGS.

However, if the Stan code is written as this:

model {

for (n in 1:n1)

y1[n] ~ normal(alpha1, sigma1);

for (n in 1:n2)

y2[n] ~ normal(alpha2, sigma2);

}

Then Stan has no apparent performance advantage over JAGS.

## 2 comments:

Hi,

I am just starting to learn rstan, and was hoping to replicate your example here, but it fails to run. I was wondering whether you might be able to post the sequence of commands that you ran to get this running so I can try to figure out what I'm doing that's wrong.

Thanks,

Matt

Hi,

I am just starting to learn rstan, and was hoping to replicate your example here, but it fails to run. I was wondering whether you might be able to post the sequence of commands that you ran to get this running so I can try to figure out what I'm doing that's wrong.

Thanks,

Matt

Post a Comment