Saturday, February 27, 2010
New version of Eclipse
A new version of Eclipse, v.3.5.2, is out. It works better on my 64-bit Linux machine in general and works very well with StatET.
An interesting paper
Ben Bolker has an interesting paper (an outline of a paper, really) comparing different approaches to estimating GLMMs in the R environment, which is very helpful for what I am doing right now.
The paper pointed out the following options to fit GLMM using R:
- glmer
- glmmML
- glmm (from the repeated package)
- glmmADMB
- MCMCglmm
- glmmBUGS
- glmmPQL
- BUGS (through R2WinBUGS)
- glmmAK
And I would like to add one more, npmlreg.
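To make these options concrete, here is a minimal sketch fitting the same random-intercept logistic model with two of the packages listed above. The data frame d, outcome y, covariate x, and grouping factor g are hypothetical placeholders:
library(lme4)
library(MASS)

# Laplace approximation via lme4 (hypothetical data frame d)
m_glmer <- glmer(y ~ x + (1 | g), family = binomial, data = d)

# Penalized quasi-likelihood via MASS::glmmPQL
m_pql <- glmmPQL(y ~ x, random = ~ 1 | g, family = binomial, data = d)

summary(m_glmer)
summary(m_pql)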
I was not aware of the glmmAK package before. At first glance it looks very promising, in that it seems to allow non-Gaussian random effects in a Bayesian framework, similar to what npmlreg does with maximum likelihood.
============== edited on March 3 ====================
DPpackage is another package that can estimate GLMM in a Bayesian framework.
============== edited on March 5 ====================
ASReml/ASReml-R is another choice. It is not free software, but it does seem to have some unique strengths. Maybe I should download a demo copy and try it myself.
============== edited on March 29 ===================
hglm is another possibility.
Beamer themes
I did not know there were so many different themes that can be used for Beamer presentation slides.
Friday, February 26, 2010
When using open source makes you an enemy of the state
According to this post, a US copyright lobby has long been trying to make the case that using open source software is the same as using pirated software.
Thursday, February 25, 2010
Wednesday, February 24, 2010
Tuesday, February 23, 2010
Monday, February 22, 2010
Theory without code
As a demographer, I would not pay much attention to statistical methodological papers that do not describe how to implement their methodology, either with existing software or with the authors' own code. The reason is simple: I don't have the time and patience to gain an understanding of the methodology deep enough to implement it myself; after all, it is the substantive research questions in demography that interest me most!
Open source statistical platforms such as R, WinBUGS, and JAGS have made it easy for statisticians to turn their ideas into readily usable software, so the question is: why not?
More options for cure models
It looks like cure models can also be estimated using another R package, GAMLSS.
de Castro, M., V. G. Cancho, and J. Rodrigues. 2009. “A hands-on approach for fitting long-term survival models under the GAMLSS framework.” Computer Methods and Programs in Biomedicine.
Saturday, February 20, 2010
A few software packages for mixed models
These are less well known but definitely deserve attention:
All of these programs are free to use, but unfortunately none of them seem to be open source.
Information provided by this web site is very helpful.
GLMM for ecologists and evolutionary biologists
This web site is devoted to the use of GLMMs in the fields of ecology and evolutionary biology.
Friday, February 19, 2010
Thursday, February 18, 2010
Tuesday, February 16, 2010
Generalized linear mixed effect model problem
I am trying to compare cohort differences in infant mortality using a generalized linear mixed model. I first estimated the model in Stata:
xi:xtlogit inftmort i.cohort, i(code)
which converged nicely:
Fitting comparison model:
Iteration 0: log likelihood = -1754.4476
Iteration 1: log likelihood = -1749.3366
Iteration 2: log likelihood = -1749.2491
Iteration 3: log likelihood = -1749.2491
Fitting full model:
tau = 0.0 log likelihood = -1749.2491
tau = 0.1 log likelihood = -1743.8418
tau = 0.2 log likelihood = -1739.0769
tau = 0.3 log likelihood = -1736.4914
tau = 0.4 log likelihood = -1739.5415
Iteration 0: log likelihood = -1736.4914
Iteration 1: log likelihood = -1722.6629
Iteration 2: log likelihood = -1694.9114
Iteration 3: log likelihood = -1694.6509
Iteration 4: log likelihood = -1694.649
Iteration 5: log likelihood = -1694.649
Random-effects logistic regression Number of obs = 21694
Group variable: code Number of groups = 10789
Random effects u_i ~ Gaussian Obs per group: min = 1
avg = 2.0
max = 9
Wald chi2(2) = 8.05
Log likelihood = -1694.649 Prob > chi2 = 0.0178
------------------------------------------------------------------------------
inftmort | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Icohort_2 | -.5246846 .1850328 -2.84 0.005 -.8873422 -.1620269
_Icohort_3 | -.1424331 .140369 -1.01 0.310 -.4175513 .132685
_cons | -5.214642 .1839703 -28.35 0.000 -5.575217 -4.854067
-------------+----------------------------------------------------------------
/lnsig2u | .9232684 .1416214 .6456956 1.200841
-------------+----------------------------------------------------------------
sigma_u | 1.586665 .1123528 1.381055 1.822885
rho | .4335015 .0347791 .3669899 .5024984
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) = 109.20 Prob >= chibar2 = 0.000
Then I tried the same model using lme4:
m2 <- glmer(inftmort ~ (1|code) + as.factor(cohort), family=binomial, data=d)
I got:
Warning message:
In mer_finalize(ans) : false convergence (8)
And the results are quite different from what I got from Stata. I tried to estimate the same model using "glmmML" and also ran into trouble. This time the warning message is:
Warning message:
In glmmML.fit(X, Y, weights, cluster.weights, start.coef, start.sigma, :
Hessian non-positive definite. No variance!
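The glmmML call was along these lines (just a sketch; the exact options may have differed slightly):
library(glmmML)

# Random-intercept logistic model, clustering on the mother id (code)
m3 <- glmmML(inftmort ~ as.factor(cohort), family = binomial,
             data = d, cluster = code)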
I then tried MCMCglmm:
prior=list(R=list(V=1, nu=0, fix=1), G=list(G1=list(V=1, nu=0)))
m2 <- MCMCglmm(inftmort ~ 1 + as.factor(cohort), random=~code, family = "categorical", data=d, prior=prior)
It seems to be working and produced estimates that are comparable to what Stata produced (not identical, of course):
Iterations = 3001:12991
Thinning interval = 10
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
(Intercept) -5.7145 0.1805 0.005708 0.02295
as.factor(cohort)2 -0.5633 0.1788 0.005653 0.02194
as.factor(cohort)3 -0.1888 0.1471 0.004653 0.01912
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
(Intercept) -6.0464 -5.8324 -5.7251 -5.61120 -5.2817
as.factor(cohort)2 -0.8977 -0.6985 -0.5551 -0.43084 -0.2371
as.factor(cohort)3 -0.4614 -0.2947 -0.1867 -0.09644 0.1232
This is puzzling.
Sunday, February 14, 2010
Spreadsheet to LaTeX solutions
Excel2latex and spreadsheet2latex seem to be the only two solutions at the moment.
Saturday, February 13, 2010
Cure model using R
The package "nltm" seems to be able to estimate proportional hazard and proportional odds cure models. I will do some experiments and see how it goes.
Friday, February 12, 2010
Reshaping data in R
This post explains how to reshape data in R and how to create nicely formatted LaTeX tables using the xtable package.
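As a quick illustration (with a small made-up data set), base R's reshape() together with xtable can turn wide data into long form and then into a LaTeX table:
library(xtable)

# A small made-up wide data set: one row per id, one column per year
wide <- data.frame(id = 1:3, y2000 = c(5, 3, 8), y2005 = c(6, 4, 9))

# Wide -> long with base R's reshape()
long <- reshape(wide, direction = "long",
                varying = c("y2000", "y2005"), v.names = "y",
                timevar = "year", times = c(2000, 2005), idvar = "id")

# Print the long data as a LaTeX table
print(xtable(long), include.rownames = FALSE)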
Thursday, February 11, 2010
Handling hierarchical data structure in R
R has a comprehensive set of tools for handling hierarchical data structures. The most widely used packages are probably "nlme" and "lme4", contributed by Douglas Bates and colleagues. While "nlme" is older and probably more mature than "lme4", the latter seems to be increasingly taking over because it comes with a full set of tools for non-Gaussian responses.
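For a side-by-side feel of the two syntaxes, here is a minimal sketch assuming a hypothetical data frame d with a response y, a binary response ybin, a covariate x, and a grouping factor g:
library(nlme)
library(lme4)

# The same random-intercept linear model in both packages
m_nlme <- lme(y ~ x, random = ~ 1 | g, data = d)
m_lme4 <- lmer(y ~ x + (1 | g), data = d)

# lme4 also handles non-Gaussian responses, e.g. a random-intercept logit model
m_bin <- glmer(ybin ~ x + (1 | g), family = binomial, data = d)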
Based on "nlme" and "lme4", the package "mgcv" and "gamm4" can estimate generalized additive model on multilevel data structure. This is especially useful if one have continuous covariates such as age or calendar year, whose effects are not very well understood in the literature and grouping may distort the results.
If you don't feel comfortable with the assumption that the unobserved heterogeneity components are normally distributed, the package "npmlreg" can be used to conduct a sensitivity analysis via the nonparametric maximum likelihood method.
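A sketch of the npmlreg version of the same hypothetical two-level logistic model (the number of mass points k is arbitrary here and should be varied in a real sensitivity analysis):
library(npmlreg)

# Nonparametric (discrete) random intercept with k mass points
# instead of a Gaussian random effect
m_np <- allvc(ybin ~ x, random = ~ 1 | g, family = binomial, data = d, k = 4)
summary(m_np)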
The new "MCMCglmm" package extends R's modeling capacity to multilevel and multivariate models such as those that can be estimated using aML and Sabre. Sure, MCMCglmm relies on MCMC instead of maximum likelihood as the method of estimation, which may be slower especially when the data is huge (as is often the case in demographic studies), but as the author indicated, this package has already gained significant performance advantage over traditional Bayesian packages such as WinBUGS and JAGS, and is almost as general as the latter, with respect to multilevel regression-like models.
The Sabre team also produced an R wrapper for their package for multilevel multiprocess modeling. Unfortunately, the R package is based on an outdated version of Sabre (v.5 instead of v.6), and the project seems to have been halted for lack of funding.
It is not difficult to conduct Bayesian multilevel analysis in R using WinBUGS and JAGS via packages such as "R2WinBUGS" and "R2jags". These engines can estimate a very large class of models, which means that almost anything one can think of can be modeled from within R.
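For example, a random-intercept logistic model like the hypothetical one above could be written in BUGS/JAGS code and fit through R2jags along the following lines (a sketch only; the variable names, priors, and run length are illustrative):
library(R2jags)

# A random-intercept logistic model in BUGS/JAGS syntax
model_string <- "
model {
  for (i in 1:N) {
    y[i] ~ dbern(p[i])
    logit(p[i]) <- b0 + b1 * x[i] + u[g[i]]
  }
  for (j in 1:J) { u[j] ~ dnorm(0, tau) }
  b0 ~ dnorm(0, 0.001)
  b1 ~ dnorm(0, 0.001)
  tau ~ dgamma(0.001, 0.001)
}"
model_file <- tempfile()
writeLines(model_string, model_file)

jags_data <- list(y = d$ybin, x = d$x, g = as.integer(d$g),
                  N = nrow(d), J = nlevels(d$g))
fit <- jags(data = jags_data, inits = NULL,
            parameters.to.save = c("b0", "b1", "tau"),
            model.file = model_file, n.chains = 2, n.iter = 5000)
print(fit)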
Another program, ADMB, also has several interfaces to R. I hope they keep getting better, because ADMB is such a powerful and versatile package: if one can write down the likelihood function, it can maximize it for you.
Based on "nlme" and "lme4", the package "mgcv" and "gamm4" can estimate generalized additive model on multilevel data structure. This is especially useful if one have continuous covariates such as age or calendar year, whose effects are not very well understood in the literature and grouping may distort the results.
If you don't feel comfortable with the assumption that the unobserved heterogeneity components are normally distributed, then the package "npmlreg" can be used to conduct sensitivity analysis through nonparametric maximum likelihood method.
The new "MCMCglmm" package extends R's modeling capacity to multilevel and multivariate models such as those that can be estimated using aML and Sabre. Sure, MCMCglmm relies on MCMC instead of maximum likelihood as the method of estimation, which may be slower especially when the data is huge (as is often the case in demographic studies), but as the author indicated, this package has already gained significant performance advantage over traditional Bayesian packages such as WinBUGS and JAGS, and is almost as general as the latter, with respect to multilevel regression-like models.
The Sabre team also produced a R wrapper of their package for multilevel multiprocess modeling. Unfortunately, their R package is based on an outdated version (v.5 instead of v.6) of Sabre and the project seems to be halted due to lack of fund.
It is not difficult to conduct Bayesian multilevel analysis in R using WinBUGS and JAGS via packages such as "R2WinBUGS" and "R2jags". These packages can estimate a very large number of models. This means that almost anything one can think of can be model with R.
Another package, ADMB, also has several interfaces to R. I hope they are getting better and better because ADMB is such a powerful and versatile package that, if one can write down the likelihood function, it can maximize it for you.
App Store Model Provides Security, Stability: Evidence, Please?
Here is an interesting post on the claim that the iPhone has to be locked down to keep it safe and stable.
Wednesday, February 10, 2010
Loglinear models using R
For those sociologists who want to estimate complicated loglinear models (e.g. Goodman's RC model) using R, the package "VGAM" seems to be a good choice.
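For instance, Goodman's RC(1) association model can be fit with VGAM's grc() function on a matrix of counts; a rough sketch with a made-up table (the counts and dimensions are hypothetical):
library(VGAM)

# A made-up 3 x 3 contingency table of counts
tab <- matrix(c(40, 12,  8,
                20, 25, 10,
                 5, 18, 30), nrow = 3, byrow = TRUE)

# Goodman's row-column association model with rank 1
fit_rc <- grc(tab, Rank = 1)
summary(fit_rc)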
Tuesday, February 09, 2010
Mixed Effects Models and Extensions in Ecology with R
This is a truly wonderful book. It deals with advanced statistical models in a way that people without an advanced mathematical background can follow easily. I think I may have found the perfect textbook for my graduate course in advanced demographic analysis. All I need to do is add some demographic examples to make the material easier for demography students to understand.
Sunday, February 07, 2010
Skype on iPad via 3G
Saturday, February 06, 2010
Friday, February 05, 2010
Graphics using Stata vs. R
Here is an informative blog post comparing a simple histogram produced in Stata with various ways of producing the same plot in R.