Sunday, April 14, 2013

Stan as a unified statistical estimation and interpretation engine

Applied researchers who are used to Stata or R need a reason to learn and use Stan. Beyond the usual Bayesian vs. frequentist debate, there is also a practical reason.

Stan provides a unified interface for statistical estimation and interpretation. For the past several years I have been using R with the Zelig package for estimation and interpretation, and it is great. The problem is that Zelig is built on a large number of existing R packages written by many different researchers, and the quality of these packages varies greatly, so working with Zelig means working with all of those packages and researchers as well. In addition, even though in theory you can modify the source of these packages to suit your needs, applied researchers rarely have the energy or skills to tweak the underlying FORTRAN or C code.

Stan provides a modeling language, which makes it easy for users to tweak their models (of course, the underlying C++ code is also available). It is simulation-based and uses the posterior distribution for inference, which means there is no need for an additional simulation step (which is what Zelig adds to frequentist models). After some testing, I have concluded that Stan is fast and stable enough for my daily data analysis work.
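
To make the "no additional simulation step" point concrete, here is a minimal sketch in Python rather than Stan. It is not how Stan samples: Stan uses MCMC, whereas this sketch assumes a simple Bernoulli model with a Beta prior so that the posterior can be drawn from directly. The data (30 successes in 100 trials), the flat prior, and the function names are all made up for illustration. The point is only that once you have posterior draws, any quantity of interest (here, the odds) is a transformation of those same draws.

```python
import random
import statistics


def posterior_draws(successes, failures, n_draws=4000,
                    prior_a=1.0, prior_b=1.0, seed=0):
    """Draw from the Beta posterior of a Bernoulli success probability.

    Assumes a Beta(prior_a, prior_b) prior, so the posterior is
    Beta(prior_a + successes, prior_b + failures).
    """
    rng = random.Random(seed)
    a = prior_a + successes
    b = prior_b + failures
    return [rng.betavariate(a, b) for _ in range(n_draws)]


def summarize(draws):
    """Return the mean and an approximate central 95% interval of the draws."""
    ordered = sorted(draws)
    n = len(ordered)
    lower = ordered[int(0.025 * n)]
    upper = ordered[int(0.975 * n) - 1]
    return statistics.mean(ordered), lower, upper


# Hypothetical data: 30 successes out of 100 trials.
p_draws = posterior_draws(successes=30, failures=70)

# The quantity of interest is computed draw by draw; no extra simulation.
odds_draws = [p / (1.0 - p) for p in p_draws]

p_mean, p_lo, p_hi = summarize(p_draws)
odds_mean, odds_lo, odds_hi = summarize(odds_draws)

print(f"p:    mean={p_mean:.3f}, 95% interval=({p_lo:.3f}, {p_hi:.3f})")
print(f"odds: mean={odds_mean:.3f}, 95% interval=({odds_lo:.3f}, {odds_hi:.3f})")
```

With a frequentist fit, getting an interval for the odds would require a separate step (the delta method, or simulating from the estimated sampling distribution as Zelig does). Here the interval falls out of the draws already in hand.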

Best of all, the package comes from a research group with a very good reputation, and its discussion list is unbelievably helpful.

These are good enough reasons for me to switch to Stan.

Tuesday, April 09, 2013

Using R on Amazon EC2

Detailed instructions can be found here.

Monday, April 08, 2013

Friday, April 05, 2013

Wednesday, April 03, 2013

R 3.0 supports long vectors

It is very exciting news that the new R 3.0 supports long vectors (vectors with more than 2^31 - 1 elements, on 64-bit builds)! This is a major step toward a viable big data platform.