## Saturday, February 27, 2010

### New version of Eclipse

A new version of Eclipse, v.3.5.2 is out. This one works better on my 64-bit Linux machine in general and works very well with StatET.

### An interesting paper

Ben Bolker has an interesting paper (outline of a paper) comparing different approaches to estimate GLMM in R environment, which is very helpful to what I am doing right now.

The paper pointed out the following options to fit GLMM using R:

The paper pointed out the following options to fit GLMM using R:

- glmer
- glmmML
- glmm (from the repeated package)
- glmmADMB
- MCMCglmm
- glmmBUGS
- glmmPQL
- BUGS (through R2WinBUGS)
- glmmAK

And I would like to add one more, npmlreg.

I am not aware of the glmmAK package before. From the first glance, it seems to be very promising in the sense that it seems to allow non-Gaussian random effect in a Bayesian framework, something similar to what npmlreg does with ML method.

============== edited on March 3 ====================

DPpackage is another package that can estimate GLMM in a Bayesian framework.

============== edited on March 5 ====================

ASReml/ASReml-R is another choice. It is not free software, but it does seem to have some unique strengths. Maybe I should download a demo copy and try it myself.

============== edited on March 29 ===================

hglm is another possibility.

I am not aware of the glmmAK package before. From the first glance, it seems to be very promising in the sense that it seems to allow non-Gaussian random effect in a Bayesian framework, something similar to what npmlreg does with ML method.

============== edited on March 3 ====================

DPpackage is another package that can estimate GLMM in a Bayesian framework.

============== edited on March 5 ====================

ASReml/ASReml-R is another choice. It is not free software, but it does seem to have some unique strengths. Maybe I should download a demo copy and try it myself.

============== edited on March 29 ===================

hglm is another possibility.

### Beamer themes

I did not know there are so many different themes that can be used for a Beamer presentation slide.

## Friday, February 26, 2010

### When using open source makes you an enemy of the state

According to this post, a US copyright lobby has long trying to make the case that using open source software is the same thing as using pirate software.

## Thursday, February 25, 2010

## Tuesday, February 23, 2010

## Monday, February 22, 2010

### Theory without code

As a demographer, I would not pay much attention to statistical methodological papers that do not describe how to implement their methodology either using existing software or using their own code. The reason is simple: I don't have the time and patience to gain a understanding of the methodology deep enough to be able to implement it myself; after all, it is the substantive research questions in demography that interest me most!

Open source statistical platform such as R, WinBUGS, and JAGS have made it so easy for statisticians to code their ideas into readily usable software, so the question is: why not?

Open source statistical platform such as R, WinBUGS, and JAGS have made it so easy for statisticians to code their ideas into readily usable software, so the question is: why not?

### More option for cure model

Looks like the cure model can also be estimated using another R package GAMLSS.

de Castro, M., V. G. Cancho, and J. Rodrigues. 2009. “A hands-on approach for fitting long-term survival models under the GAMLSS framework.” Computer Methods and Programs in Biomedicine.

## Saturday, February 20, 2010

### A few software packages for mixed models

Less well-known but definitely deserve attention:

All these softwares are free to use, but unfortunately none of them seem to be open sourced.

Information provided by this web site is very helpful.

Information provided by this web site is very helpful.

### GLMM for ecologists and evolutionary biologists

This web site is devoted to the use of GLMM in the filed of evolutionary biologists.

## Friday, February 19, 2010

## Thursday, February 18, 2010

## Tuesday, February 16, 2010

### Generalized linear mixed effect model problem

I am trying to compare cohort difference in infant mortality using generalized linear mixed model. I first estimated the model in Stata:

xi:xtlogit inftmort i.cohort, i(code)

which converged nicely:

Fitting comparison model:

Iteration 0: log likelihood = -1754.4476

Iteration 1: log likelihood = -1749.3366

Iteration 2: log likelihood = -1749.2491

Iteration 3: log likelihood = -1749.2491

Fitting full model:

tau = 0.0 log likelihood = -1749.2491

tau = 0.1 log likelihood = -1743.8418

tau = 0.2 log likelihood = -1739.0769

tau = 0.3 log likelihood = -1736.4914

tau = 0.4 log likelihood = -1739.5415

Iteration 0: log likelihood = -1736.4914

Iteration 1: log likelihood = -1722.6629

Iteration 2: log likelihood = -1694.9114

Iteration 3: log likelihood = -1694.6509

Iteration 4: log likelihood = -1694.649

Iteration 5: log likelihood = -1694.649

Random-effects logistic regression Number of obs = 21694

Group variable: code Number of groups = 10789

Random effects u_i ~ Gaussian Obs per group: min = 1

avg = 2.0

max = 9

Wald chi2(2) = 8.05

Log likelihood = -1694.649 Prob > chi2 = 0.0178

------------------------------------------------------------------------------

inftmort | Coef. Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

_Icohort_2 | -.5246846 .1850328 -2.84 0.005 -.8873422 -.1620269

_Icohort_3 | -.1424331 .140369 -1.01 0.310 -.4175513 .132685

_cons | -5.214642 .1839703 -28.35 0.000 -5.575217 -4.854067

-------------+----------------------------------------------------------------

/lnsig2u | .9232684 .1416214 .6456956 1.200841

-------------+----------------------------------------------------------------

sigma_u | 1.586665 .1123528 1.381055 1.822885

rho | .4335015 .0347791 .3669899 .5024984

------------------------------------------------------------------------------

Likelihood-ratio test of rho=0: chibar2(01) = 109.20 Prob >= chibar2 = 0.000

Then I tried the same model using lme4:

m2 <- glmer(inftmort ~ (1|code) + as.factor(cohort), family=binomial, data=d)

I got:

Warning message:

In mer_finalize(ans) : false convergence (8)

And the results are quite different from what I got from Stata. I tried to estimate the model using "glmmML" and also got into trouble. This time the error message is:

Warning message:

In glmmML.fit(X, Y, weights, cluster.weights, start.coef, start.sigma, :

Hessian non-positive definite. No variance!

I then tried MCMCglmm:

prior=list(R=list(V=1, nu=0, fix=1), G=list(G1=list(V=1, nu=0)))

m2 <- MCMCglmm(inftmort ~ 1 + as.factor(cohort), random=~code, family = "categorical", data=d, prior=prior)

It seems to be working and produced estimates that are comparable to what Stata produced (not identical, of course):

Iterations = 3001:12991

Thinning interval = 10

Number of chains = 1

Sample size per chain = 1000

1. Empirical mean and standard deviation for each variable,

plus standard error of the mean:

Mean SD Naive SE Time-series SE

(Intercept) -5.7145 0.1805 0.005708 0.02295

as.factor(cohort)2 -0.5633 0.1788 0.005653 0.02194

as.factor(cohort)3 -0.1888 0.1471 0.004653 0.01912

2. Quantiles for each variable:

2.5% 25% 50% 75% 97.5%

(Intercept) -6.0464 -5.8324 -5.7251 -5.61120 -5.2817

as.factor(cohort)2 -0.8977 -0.6985 -0.5551 -0.43084 -0.2371

as.factor(cohort)3 -0.4614 -0.2947 -0.1867 -0.09644 0.1232

This is puzzling.

xi:xtlogit inftmort i.cohort, i(code)

which converged nicely:

Fitting comparison model:

Iteration 0: log likelihood = -1754.4476

Iteration 1: log likelihood = -1749.3366

Iteration 2: log likelihood = -1749.2491

Iteration 3: log likelihood = -1749.2491

Fitting full model:

tau = 0.0 log likelihood = -1749.2491

tau = 0.1 log likelihood = -1743.8418

tau = 0.2 log likelihood = -1739.0769

tau = 0.3 log likelihood = -1736.4914

tau = 0.4 log likelihood = -1739.5415

Iteration 0: log likelihood = -1736.4914

Iteration 1: log likelihood = -1722.6629

Iteration 2: log likelihood = -1694.9114

Iteration 3: log likelihood = -1694.6509

Iteration 4: log likelihood = -1694.649

Iteration 5: log likelihood = -1694.649

Random-effects logistic regression Number of obs = 21694

Group variable: code Number of groups = 10789

Random effects u_i ~ Gaussian Obs per group: min = 1

avg = 2.0

max = 9

Wald chi2(2) = 8.05

Log likelihood = -1694.649 Prob > chi2 = 0.0178

------------------------------------------------------------------------------

inftmort | Coef. Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

_Icohort_2 | -.5246846 .1850328 -2.84 0.005 -.8873422 -.1620269

_Icohort_3 | -.1424331 .140369 -1.01 0.310 -.4175513 .132685

_cons | -5.214642 .1839703 -28.35 0.000 -5.575217 -4.854067

-------------+----------------------------------------------------------------

/lnsig2u | .9232684 .1416214 .6456956 1.200841

-------------+----------------------------------------------------------------

sigma_u | 1.586665 .1123528 1.381055 1.822885

rho | .4335015 .0347791 .3669899 .5024984

------------------------------------------------------------------------------

Likelihood-ratio test of rho=0: chibar2(01) = 109.20 Prob >= chibar2 = 0.000

Then I tried the same model using lme4:

m2 <- glmer(inftmort ~ (1|code) + as.factor(cohort), family=binomial, data=d)

I got:

Warning message:

In mer_finalize(ans) : false convergence (8)

And the results are quite different from what I got from Stata. I tried to estimate the model using "glmmML" and also got into trouble. This time the error message is:

Warning message:

In glmmML.fit(X, Y, weights, cluster.weights, start.coef, start.sigma, :

Hessian non-positive definite. No variance!

I then tried MCMCglmm:

prior=list(R=list(V=1, nu=0, fix=1), G=list(G1=list(V=1, nu=0)))

m2 <- MCMCglmm(inftmort ~ 1 + as.factor(cohort), random=~code, family = "categorical", data=d, prior=prior)

It seems to be working and produced estimates that are comparable to what Stata produced (not identical, of course):

Iterations = 3001:12991

Thinning interval = 10

Number of chains = 1

Sample size per chain = 1000

1. Empirical mean and standard deviation for each variable,

plus standard error of the mean:

Mean SD Naive SE Time-series SE

(Intercept) -5.7145 0.1805 0.005708 0.02295

as.factor(cohort)2 -0.5633 0.1788 0.005653 0.02194

as.factor(cohort)3 -0.1888 0.1471 0.004653 0.01912

2. Quantiles for each variable:

2.5% 25% 50% 75% 97.5%

(Intercept) -6.0464 -5.8324 -5.7251 -5.61120 -5.2817

as.factor(cohort)2 -0.8977 -0.6985 -0.5551 -0.43084 -0.2371

as.factor(cohort)3 -0.4614 -0.2947 -0.1867 -0.09644 0.1232

This is puzzling.

Labels:
Bayesian,
GLMM,
multilevel model,
R,
Stata

## Sunday, February 14, 2010

### Spreadsheet to LaTeX solutions

Excel2latex and spreadsheet2latex seem to be the only two solutions at the time.

## Saturday, February 13, 2010

### Cure model using R

The package "nltm" seems to be able to estimate proportional hazard and proportional odds cure models. I will do some experiments and see how it goes.

## Friday, February 12, 2010

### Reshaping data in R

This post explains how to reshape data in R and how to created nicely formated LaTeX using the xtable package.

## Thursday, February 11, 2010

### Handling hierarchical data structure in R

R has a comprehensive set of tools for the handling of hierarchical data structure. The most widely used package is probably "nlme" and "lme4", contributed by Douglas Bates and colleagues. While "nlme" is older and probably more mature than "lme4", it seems that the latter is increasingly taking over the former because it comes with full set of tools to handle non-Gaussian distributions.

Based on "nlme" and "lme4", the package "mgcv" and "gamm4" can estimate generalized additive model on multilevel data structure. This is especially useful if one have continuous covariates such as age or calendar year, whose effects are not very well understood in the literature and grouping may distort the results.

If you don't feel comfortable with the assumption that the unobserved heterogeneity components are normally distributed, then the package "npmlreg" can be used to conduct sensitivity analysis through nonparametric maximum likelihood method.

The new "MCMCglmm" package extends R's modeling capacity to multilevel and multivariate models such as those that can be estimated using aML and Sabre. Sure, MCMCglmm relies on MCMC instead of maximum likelihood as the method of estimation, which may be slower especially when the data is huge (as is often the case in demographic studies), but as the author indicated, this package has already gained significant performance advantage over traditional Bayesian packages such as WinBUGS and JAGS, and is almost as general as the latter, with respect to multilevel regression-like models.

The Sabre team also produced a R wrapper of their package for multilevel multiprocess modeling. Unfortunately, their R package is based on an outdated version (v.5 instead of v.6) of Sabre and the project seems to be halted due to lack of fund.

It is not difficult to conduct Bayesian multilevel analysis in R using WinBUGS and JAGS via packages such as "R2WinBUGS" and "R2jags". These packages can estimate a very large number of models. This means that almost anything one can think of can be model with R.

Another package, ADMB, also has several interfaces to R. I hope they are getting better and better because ADMB is such a powerful and versatile package that, if one can write down the likelihood function, it can maximize it for you.

Based on "nlme" and "lme4", the package "mgcv" and "gamm4" can estimate generalized additive model on multilevel data structure. This is especially useful if one have continuous covariates such as age or calendar year, whose effects are not very well understood in the literature and grouping may distort the results.

If you don't feel comfortable with the assumption that the unobserved heterogeneity components are normally distributed, then the package "npmlreg" can be used to conduct sensitivity analysis through nonparametric maximum likelihood method.

The new "MCMCglmm" package extends R's modeling capacity to multilevel and multivariate models such as those that can be estimated using aML and Sabre. Sure, MCMCglmm relies on MCMC instead of maximum likelihood as the method of estimation, which may be slower especially when the data is huge (as is often the case in demographic studies), but as the author indicated, this package has already gained significant performance advantage over traditional Bayesian packages such as WinBUGS and JAGS, and is almost as general as the latter, with respect to multilevel regression-like models.

The Sabre team also produced a R wrapper of their package for multilevel multiprocess modeling. Unfortunately, their R package is based on an outdated version (v.5 instead of v.6) of Sabre and the project seems to be halted due to lack of fund.

It is not difficult to conduct Bayesian multilevel analysis in R using WinBUGS and JAGS via packages such as "R2WinBUGS" and "R2jags". These packages can estimate a very large number of models. This means that almost anything one can think of can be model with R.

Another package, ADMB, also has several interfaces to R. I hope they are getting better and better because ADMB is such a powerful and versatile package that, if one can write down the likelihood function, it can maximize it for you.

### App Store Model Provides Security, Stability: Evidence, Please?

Here is an interesting post on the claim that Iphone has to be locked down to make it safe and stable.

## Wednesday, February 10, 2010

### Loglinear models using R

For those sociologists who want to estimate complicated loglinear models (e.g. Goodman's RC model) using R, the package "VGAM" seems to be a good choice.

## Tuesday, February 09, 2010

### Mixed Effects Models and Extensions in Ecology with R

This is a truly wonderful book. It deals with advanced statistical models in the way that people without advanced mathematical background can follow easily. I think I may have found a perfect textbook for my graduate course in advanced demographic analysis. All I need to do is to introduce some demographic examples to make it easier to understand to demography students.

## Sunday, February 07, 2010

### Skype on Ipad via 3G

## Saturday, February 06, 2010

## Friday, February 05, 2010

### Graphics using Stata vs. R

Here is an informative blog post comparing simple histogram produced using Stata and various methods using R.

Subscribe to:
Posts (Atom)