Thursday, October 18, 2012

Comparison between OpenBUGS, JAGS, and Stan

I have been playing with the examples of Introduction to WinBUGS for Ecologists in the past two days.  One thing that concerns me particularly is how well these Bayesian packages handle large data set (i.e., N>10,000), which is the size of the data sets that I work with. So I modified the data generation program provided by the authors and increased the sample sizes:


n1 <- 6000 # Number of females
n2 <- 4000 # Number of males
mu1 <- 105 # Population mean of females
mu2 <- 77.5 # Population mean of males
sigma <- 2.75 # Average population SD of both

n <- n1+n2 # Total sample size
y1 <- rnorm(n1, mu1, sigma) # Data for females
y2 <- rnorm(n2, mu2, sigma) # Date for males
y <- c(y1, y2) # Aggregate both data sets
x <- rep(c(0,1), c(n1, n2)) # Indicator for male


The BUGS program looks like this:


model {
# Priors
 mu1 ~ dnorm(0,0.001)
 mu2 ~ dnorm(0,0.001)
 tau1 <- 1 / ( sigma1 * sigma1)
 sigma1 ~ dunif(0, 1000) # Note: Large var. = Small precision
 tau2 <- 1 / ( sigma2 * sigma2)
 sigma2 ~ dunif(0, 1000)

# Likelihood
 for (i in 1:n1) {
    y1[i] ~ dnorm(mu1, tau1) 
 }

 for (i in 1:n2) {
    y2[i] ~ dnorm(mu2, tau2) 
 }

# Derived quantities
 delta <- mu2 - mu1
}

While the Stan looks like this:

data {
     int n1;
     int n2;
     real y1[n1];
     real y2[n2];
}
parameters {
           real alpha1;
           real alpha2;
           real sigma1;
           real sigma2;
}
model {
    y1 ~ normal(alpha1, sigma1);          
    y2 ~ normal(alpha2, sigma2);
}
generated quantities {
          real dif;
          dif <- sigma1 - sigma2;
}

For 20,000 iterations, OpenBUGS took 7852.813 seconds, JAGS took 2252.273 seconds, Stan took 60-150 seconds (the Stan team is still trying to iron out some bugs, which may explain the wide range of run time). While the run time of both OpenBUGS and JAGS are sensitive to the sample size, the run time of Stan seems much less sensitive to sample size.

However, if the Stan code is written as this:


model {
      for (n in 1:n1)
          y1[n] ~ normal(alpha1, sigma1);
      for (n in 1:n2)
          y2[n] ~ normal(alpha2, sigma2);
}


Then Stan has no apparent performance advantage over JAGS. 

2 comments:

Unknown said...

Hi,
I am just starting to learn rstan, and was hoping to replicate your example here, but it fails to run. I was wondering whether you might be able to post the sequence of commands that you ran to get this running so I can try to figure out what I'm doing that's wrong.

Thanks,
Matt

Unknown said...

Hi,
I am just starting to learn rstan, and was hoping to replicate your example here, but it fails to run. I was wondering whether you might be able to post the sequence of commands that you ran to get this running so I can try to figure out what I'm doing that's wrong.

Thanks,
Matt

Counter