Saturday, February 27, 2010
New version of Eclipse
A new version of Eclipse, v.3.5.2, is out. It works better on my 64-bit Linux machine in general and works very well with StatET.
An interesting paper
Ben Bolker has an interesting paper (an outline of a paper, really) comparing different approaches to estimating GLMMs in the R environment, which is very helpful for what I am doing right now.
The paper pointed out the following options to fit GLMM using R:
- glmer
- glmmML
- glmm (from the repeated package)
- glmmADMB
- MCMCglmm
- glmmBUGS
- glmmPQL
- BUGS (through R2WinBUGS)
- glmmAK
And I would like to add one more, npmlreg.
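To make these options concrete, here is a minimal sketch fitting the same random-intercept logistic model with two of the packages listed above. The data frame d, outcome y, covariate x, and grouping factor g are hypothetical placeholders:
library(lme4)
library(MASS)

# Laplace approximation via lme4 (hypothetical data frame d)
m_glmer <- glmer(y ~ x + (1 | g), family = binomial, data = d)

# Penalized quasi-likelihood via MASS::glmmPQL
m_pql <- glmmPQL(y ~ x, random = ~ 1 | g, family = binomial, data = d)

summary(m_glmer)
summary(m_pql)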
I was not aware of the glmmAK package before. At first glance it looks very promising, in that it seems to allow non-Gaussian random effects in a Bayesian framework, similar to what npmlreg does with maximum likelihood.
============== edited on March 3 ====================
DPpackage is another package that can estimate GLMM in a Bayesian framework.
============== edited on March 5 ====================
ASReml/ASReml-R is another choice. It is not free software, but it does seem to have some unique strengths. Maybe I should download a demo copy and try it myself.
============== edited on March 29 ===================
hglm is another possibility.
Beamer themes
I did not know there were so many different themes that can be used for Beamer presentation slides.
Friday, February 26, 2010
When using open source makes you an enemy of the state
According to this post, a US copyright lobby has long been trying to make the case that using open source software is the same as using pirated software.
Thursday, February 25, 2010
Wednesday, February 24, 2010
Tuesday, February 23, 2010
Monday, February 22, 2010
Theory without code
As a demographer, I would not pay much attention to statistical methodological papers that do not describe how to implement their methodology, either with existing software or with the authors' own code. The reason is simple: I don't have the time and patience to gain an understanding of the methodology deep enough to implement it myself; after all, it is the substantive research questions in demography that interest me most!
Open source statistical platforms such as R, WinBUGS, and JAGS have made it easy for statisticians to turn their ideas into readily usable software, so the question is: why not?
More options for cure models
It looks like cure models can also be estimated using another R package, GAMLSS.
de Castro, M., V. G. Cancho, and J. Rodrigues. 2009. “A hands-on approach for fitting long-term survival models under the GAMLSS framework.” Computer Methods and Programs in Biomedicine.
Saturday, February 20, 2010
A few software packages for mixed models
These are less well known but definitely deserve attention:
All of these programs are free to use, but unfortunately none of them seem to be open source.
Information provided by this web site is very helpful.
GLMM for ecologists and evolutionary biologists
This web site is devoted to the use of GLMMs in the fields of ecology and evolutionary biology.
Friday, February 19, 2010
Thursday, February 18, 2010
Tuesday, February 16, 2010
Generalized linear mixed effect model problem
I am trying to compare cohort differences in infant mortality using a generalized linear mixed model. I first estimated the model in Stata:
xi:xtlogit inftmort i.cohort, i(code)
which converged nicely:
Fitting comparison model:
Iteration 0: log likelihood = -1754.4476
Iteration 1: log likelihood = -1749.3366
Iteration 2: log likelihood = -1749.2491
Iteration 3: log likelihood = -1749.2491
Fitting full model:
tau = 0.0 log likelihood = -1749.2491
tau = 0.1 log likelihood = -1743.8418
tau = 0.2 log likelihood = -1739.0769
tau = 0.3 log likelihood = -1736.4914
tau = 0.4 log likelihood = -1739.5415
Iteration 0: log likelihood = -1736.4914
Iteration 1: log likelihood = -1722.6629
Iteration 2: log likelihood = -1694.9114
Iteration 3: log likelihood = -1694.6509
Iteration 4: log likelihood = -1694.649
Iteration 5: log likelihood = -1694.649
Random-effects logistic regression Number of obs = 21694
Group variable: code Number of groups = 10789
Random effects u_i ~ Gaussian Obs per group: min = 1
avg = 2.0
max = 9
Wald chi2(2) = 8.05
Log likelihood = -1694.649 Prob > chi2 = 0.0178
------------------------------------------------------------------------------
inftmort | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Icohort_2 | -.5246846 .1850328 -2.84 0.005 -.8873422 -.1620269
_Icohort_3 | -.1424331 .140369 -1.01 0.310 -.4175513 .132685
_cons | -5.214642 .1839703 -28.35 0.000 -5.575217 -4.854067
-------------+----------------------------------------------------------------
/lnsig2u | .9232684 .1416214 .6456956 1.200841
-------------+----------------------------------------------------------------
sigma_u | 1.586665 .1123528 1.381055 1.822885
rho | .4335015 .0347791 .3669899 .5024984
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) = 109.20 Prob >= chibar2 = 0.000
Then I tried the same model using lme4:
m2 <- glmer(inftmort ~ (1|code) + as.factor(cohort), family=binomial, data=d)
I got:
Warning message:
In mer_finalize(ans) : false convergence (8)
And the results are quite different from what I got from Stata. I tried to estimate the same model using "glmmML" and also ran into trouble. This time the warning message is:
Warning message:
In glmmML.fit(X, Y, weights, cluster.weights, start.coef, start.sigma, :
Hessian non-positive definite. No variance!
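The glmmML call was along these lines (just a sketch; the exact options may have differed slightly):
library(glmmML)

# Random-intercept logistic model, clustering on the mother id (code)
m3 <- glmmML(inftmort ~ as.factor(cohort), family = binomial,
             data = d, cluster = code)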
I then tried MCMCglmm:
prior=list(R=list(V=1, nu=0, fix=1), G=list(G1=list(V=1, nu=0)))
m2 <- MCMCglmm(inftmort ~ 1 + as.factor(cohort), random=~code, family = "categorical", data=d, prior=prior)
It seems to be working and produced estimates that are comparable to what Stata produced (not identical, of course):
Iterations = 3001:12991
Thinning interval = 10
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
(Intercept) -5.7145 0.1805 0.005708 0.02295
as.factor(cohort)2 -0.5633 0.1788 0.005653 0.02194
as.factor(cohort)3 -0.1888 0.1471 0.004653 0.01912
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
(Intercept) -6.0464 -5.8324 -5.7251 -5.61120 -5.2817
as.factor(cohort)2 -0.8977 -0.6985 -0.5551 -0.43084 -0.2371
as.factor(cohort)3 -0.4614 -0.2947 -0.1867 -0.09644 0.1232
This is puzzling.
Sunday, February 14, 2010
Spreadsheet to LaTeX solutions
Excel2latex and spreadsheet2latex seem to be the only two solutions at the moment.
Saturday, February 13, 2010
Cure model using R
The package "nltm" seems to be able to estimate proportional hazard and proportional odds cure models. I will do some experiments and see how it goes.
Friday, February 12, 2010
Reshaping data in R
This post explains how to reshape data in R and how to create nicely formatted LaTeX tables using the xtable package.
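As a quick illustration (with a small made-up data set), base R's reshape() together with xtable can turn wide data into long form and then into a LaTeX table:
library(xtable)

# A small made-up wide data set: one row per id, one column per year
wide <- data.frame(id = 1:3, y2000 = c(5, 3, 8), y2005 = c(6, 4, 9))

# Wide -> long with base R's reshape()
long <- reshape(wide, direction = "long",
                varying = c("y2000", "y2005"), v.names = "y",
                timevar = "year", times = c(2000, 2005), idvar = "id")

# Print the long data as a LaTeX table
print(xtable(long), include.rownames = FALSE)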
Thursday, February 11, 2010
Handling hierarchical data structure in R
R has a comprehensive set of tools for handling hierarchical data structures. The most widely used packages are probably "nlme" and "lme4", contributed by Douglas Bates and colleagues. While "nlme" is older and probably more mature than "lme4", the latter seems to be increasingly taking over because it comes with a full set of tools for non-Gaussian responses.
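For a side-by-side feel of the two syntaxes, here is a minimal sketch assuming a hypothetical data frame d with a response y, a binary response ybin, a covariate x, and a grouping factor g:
library(nlme)
library(lme4)

# The same random-intercept linear model in both packages
m_nlme <- lme(y ~ x, random = ~ 1 | g, data = d)
m_lme4 <- lmer(y ~ x + (1 | g), data = d)

# lme4 also handles non-Gaussian responses, e.g. a random-intercept logit model
m_bin <- glmer(ybin ~ x + (1 | g), family = binomial, data = d)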
Based on "nlme" and "lme4", the package "mgcv" and "gamm4" can estimate generalized additive model on multilevel data structure. This is especially useful if one have continuous covariates such as age or calendar year, whose effects are not very well understood in the literature and grouping may distort the results.
If you don't feel comfortable with the assumption that the unobserved heterogeneity components are normally distributed, the package "npmlreg" can be used to conduct a sensitivity analysis via the nonparametric maximum likelihood method.
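A sketch of the npmlreg version of the same hypothetical two-level logistic model (the number of mass points k is arbitrary here and should be varied in a real sensitivity analysis):
library(npmlreg)

# Nonparametric (discrete) random intercept with k mass points
# instead of a Gaussian random effect
m_np <- allvc(ybin ~ x, random = ~ 1 | g, family = binomial, data = d, k = 4)
summary(m_np)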
The new "MCMCglmm" package extends R's modeling capacity to multilevel and multivariate models such as those that can be estimated using aML and Sabre. Sure, MCMCglmm relies on MCMC instead of maximum likelihood as the method of estimation, which may be slower especially when the data is huge (as is often the case in demographic studies), but as the author indicated, this package has already gained significant performance advantage over traditional Bayesian packages such as WinBUGS and JAGS, and is almost as general as the latter, with respect to multilevel regression-like models.
The Sabre team also produced an R wrapper for their package for multilevel multiprocess modeling. Unfortunately, the R package is based on an outdated version of Sabre (v.5 instead of v.6), and the project seems to have been halted for lack of funding.
It is not difficult to conduct Bayesian multilevel analysis in R using WinBUGS and JAGS via packages such as "R2WinBUGS" and "R2jags". These engines can estimate a very large class of models, which means that almost anything one can think of can be modeled from within R.
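For example, a random-intercept logistic model like the hypothetical one above could be written in BUGS/JAGS code and fit through R2jags along the following lines (a sketch only; the variable names, priors, and run length are illustrative):
library(R2jags)

# A random-intercept logistic model in BUGS/JAGS syntax
model_string <- "
model {
  for (i in 1:N) {
    y[i] ~ dbern(p[i])
    logit(p[i]) <- b0 + b1 * x[i] + u[g[i]]
  }
  for (j in 1:J) { u[j] ~ dnorm(0, tau) }
  b0 ~ dnorm(0, 0.001)
  b1 ~ dnorm(0, 0.001)
  tau ~ dgamma(0.001, 0.001)
}"
model_file <- tempfile()
writeLines(model_string, model_file)

jags_data <- list(y = d$ybin, x = d$x, g = as.integer(d$g),
                  N = nrow(d), J = nlevels(d$g))
fit <- jags(data = jags_data, inits = NULL,
            parameters.to.save = c("b0", "b1", "tau"),
            model.file = model_file, n.chains = 2, n.iter = 5000)
print(fit)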
Another program, ADMB, also has several interfaces to R. I hope they keep getting better, because ADMB is such a powerful and versatile package: if one can write down the likelihood function, it can maximize it for you.
Based on "nlme" and "lme4", the package "mgcv" and "gamm4" can estimate generalized additive model on multilevel data structure. This is especially useful if one have continuous covariates such as age or calendar year, whose effects are not very well understood in the literature and grouping may distort the results.
If you don't feel comfortable with the assumption that the unobserved heterogeneity components are normally distributed, then the package "npmlreg" can be used to conduct sensitivity analysis through nonparametric maximum likelihood method.
The new "MCMCglmm" package extends R's modeling capacity to multilevel and multivariate models such as those that can be estimated using aML and Sabre. Sure, MCMCglmm relies on MCMC instead of maximum likelihood as the method of estimation, which may be slower especially when the data is huge (as is often the case in demographic studies), but as the author indicated, this package has already gained significant performance advantage over traditional Bayesian packages such as WinBUGS and JAGS, and is almost as general as the latter, with respect to multilevel regression-like models.
The Sabre team also produced a R wrapper of their package for multilevel multiprocess modeling. Unfortunately, their R package is based on an outdated version (v.5 instead of v.6) of Sabre and the project seems to be halted due to lack of fund.
It is not difficult to conduct Bayesian multilevel analysis in R using WinBUGS and JAGS via packages such as "R2WinBUGS" and "R2jags". These packages can estimate a very large number of models. This means that almost anything one can think of can be model with R.
Another package, ADMB, also has several interfaces to R. I hope they are getting better and better because ADMB is such a powerful and versatile package that, if one can write down the likelihood function, it can maximize it for you.
App Store Model Provides Security, Stability: Evidence, Please?
Here is an interesting post on the claim that the iPhone has to be locked down to keep it safe and stable.
Wednesday, February 10, 2010
Loglinear models using R
For those sociologists who want to estimate complicated loglinear models (e.g. Goodman's RC model) using R, the package "VGAM" seems to be a good choice.
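For instance, Goodman's RC(1) association model can be fit with VGAM's grc() function on a matrix of counts; a rough sketch with a made-up table (the counts and dimensions are hypothetical):
library(VGAM)

# A made-up 3 x 3 contingency table of counts
tab <- matrix(c(40, 12,  8,
                20, 25, 10,
                 5, 18, 30), nrow = 3, byrow = TRUE)

# Goodman's row-column association model with rank 1
fit_rc <- grc(tab, Rank = 1)
summary(fit_rc)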
Tuesday, February 09, 2010
Mixed Effects Models and Extensions in Ecology with R
This is a truly wonderful book. It deals with advanced statistical models in a way that people without an advanced mathematical background can follow easily. I think I may have found the perfect textbook for my graduate course in advanced demographic analysis. All I need to do is add some demographic examples to make the material easier for demography students to understand.
Sunday, February 07, 2010
Skype on iPad via 3G
Saturday, February 06, 2010
Friday, February 05, 2010
Graphics using Stata vs. R
Here is an informative blog post comparing a simple histogram produced in Stata with various ways of producing the same plot in R.