Thursday, February 11, 2010

Handling hierarchical data structures in R

R has a comprehensive set of tools for handling hierarchical data structures. The most widely used packages are probably "nlme" and "lme4", contributed by Douglas Bates and colleagues. While "nlme" is older and probably more mature than "lme4", the latter seems to be increasingly taking over from the former because it comes with a full set of tools for handling non-Gaussian distributions.
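As a minimal sketch of what this looks like in practice, the following fits a random-intercept model with "nlme". The Orthodont data set (which ships with "nlme") and the model itself are my own choice of illustration, and the commented line shows what I would expect the roughly equivalent "lme4" call to be.

```r
library(nlme)

# Orthodont: repeated measurements of a dental distance on 27 children
data(Orthodont, package = "nlme")

# Random intercept for each child (Subject), fixed linear effect of age
fit_lme <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)

fixef(fit_lme)    # fixed-effect estimates
VarCorr(fit_lme)  # between-child and residual variance components

# Roughly equivalent call in lme4 (assumes lme4 is installed):
# library(lme4)
# fit_lmer <- lmer(distance ~ age + (1 | Subject), data = Orthodont)
```

The formula syntax is the main visible difference: "nlme" takes the grouping structure in a separate random argument, while "lme4" writes it inside the model formula.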

Based on "nlme" and "lme4" respectively, the packages "mgcv" and "gamm4" can estimate generalized additive models on multilevel data structures. This is especially useful if one has continuous covariates, such as age or calendar year, whose effects are not well understood in the literature and for which grouping into categories may distort the results.
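A rough sketch of such a model, using gamm() from "mgcv" on simulated data, might look like the following. The data, variable names, and effect sizes are all made up for illustration; the commented line shows what I would expect the corresponding "gamm4" call to be.

```r
library(mgcv)  # gamm() fits the random effects through nlme

set.seed(1)

# Simulated two-level data (purely illustrative): 30 groups, 20 observations each
n_groups <- 30
n_per    <- 20
g <- factor(rep(seq_len(n_groups), each = n_per))
x <- runif(n_groups * n_per)
u <- rnorm(n_groups, sd = 0.5)  # group-level random intercepts
y <- sin(2 * pi * x) + u[as.integer(g)] + rnorm(n_groups * n_per, sd = 0.3)
d <- data.frame(y = y, x = x, g = g)

# Smooth effect of x plus a random intercept for each group
fit_gamm <- gamm(y ~ s(x), random = list(g = ~ 1), data = d)

summary(fit_gamm$gam)  # the smooth term
summary(fit_gamm$lme)  # the variance components

# Roughly equivalent call in gamm4 (assumes gamm4 is installed):
# library(gamm4)
# fit_gamm4 <- gamm4(y ~ s(x), random = ~ (1 | g), data = d)
```

The point of s(x) is that the covariate enters as a smooth function estimated from the data, so there is no need to cut it into arbitrary groups first.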

If you don't feel comfortable with the assumption that the unobserved heterogeneity components are normally distributed, the package "npmlreg" can be used to conduct a sensitivity analysis using the nonparametric maximum likelihood method.

The new "MCMCglmm" package extends R's modeling capacity to multilevel and multivariate models such as those that can be estimated with aML and Sabre. Admittedly, MCMCglmm estimates models by MCMC rather than maximum likelihood, which may be slower, especially when the data set is huge (as is often the case in demographic studies). As its author indicates, however, the package already has a significant performance advantage over traditional Bayesian packages such as WinBUGS and JAGS, and it is almost as general as they are for multilevel regression-like models.

The Sabre team also produced an R wrapper for their package for multilevel multiprocess modeling. Unfortunately, their R package is based on an outdated version of Sabre (v.5 instead of v.6), and the project seems to have stalled for lack of funding.

It is not difficult to conduct Bayesian multilevel analysis in R using WinBUGS and JAGS via packages such as "R2WinBUGS" and "R2jags". These packages can estimate a very large number of models, which means that almost anything one can think of can be modeled in R.

Another package, ADMB, also has several interfaces to R. I hope these interfaces keep improving, because ADMB is so powerful and versatile: if one can write down the likelihood function, it can maximize it for you.

3 comments:

Michael Bishop said...

Comparison and contrast with

http://cran.r-project.org/web/packages/arm/ ?

Shige said...

I thought "arm" does not do model estimation; it is designed to post-process model results, no?

Anonymous said...

Nice post; it helped me a lot with my college assignment. Thank you for the information.
