## Saturday, December 25, 2010

### Maximum Likelihood Estimation and Inference: With examples in R, SAS and ADMB

As the first textbook on ADMB, this book is highly expected.

## Friday, December 17, 2010

### Where to find good data sets

This post provides some good information on where to find good data sets for statistical analysis.

## Thursday, December 09, 2010

### Programming languages on the rise

This post list seven programming languages that are rising, from Python to Ruby to R to... Cobol.

## Monday, December 06, 2010

## Saturday, December 04, 2010

### Comparison of results

I am doing a simple comparison of different estimation procedures in dealing with a simple binomial model. Here is where I got started:

---------------------------------------------

library(INLA)

library(npmlreg)

library(MCMCglmm)

library(DPpackage)

data(Seeds)

# Using INLA

formula = r ~ x1*x2 + f(plate, model="iid")

mod.inla = inla(formula, data=Seeds, family="binomial", Ntrials=n)

summary(mod.seeds)

# Using npmlreg

mod.ml <- alldist(cbind(r, n-r) ~ x1*x2 , random=~1, data=Seeds, family=binomial, random.distribution="gq")

summary(mod.ml)

# Using MCMCglmm

prior <- list(R=list(V=1, nu=0.002))

mod.mcmc <- MCMCglmm(cbind(r, n-r) ~ x1*x2, family="multinomial2", data=Seeds, prior=prior)

summary(mod.mcmc$Sol)

# Using DPpackage

---------------------------------------------

library(INLA)

library(npmlreg)

library(MCMCglmm)

library(DPpackage)

data(Seeds)

# Using INLA

formula = r ~ x1*x2 + f(plate, model="iid")

mod.inla = inla(formula, data=Seeds, family="binomial", Ntrials=n)

summary(mod.seeds)

# Using npmlreg

mod.ml <- alldist(cbind(r, n-r) ~ x1*x2 , random=~1, data=Seeds, family=binomial, random.distribution="gq")

summary(mod.ml)

# Using MCMCglmm

prior <- list(R=list(V=1, nu=0.002))

mod.mcmc <- MCMCglmm(cbind(r, n-r) ~ x1*x2, family="multinomial2", data=Seeds, prior=prior)

summary(mod.mcmc$Sol)

# Using DPpackage

-----------------------------------------------

I will keep updating by adding new things (estimation procedures, predictive simulations, etc.)

## Wednesday, December 01, 2010

## Tuesday, November 30, 2010

### Wrapping text around figures in LaTeX

I found this page very helpful in showing how to create figures with text wrapping around. And this one for making tables.

I am using the LaTeX package by Bruce Donald for my next NIH proposal.

I am using the LaTeX package by Bruce Donald for my next NIH proposal.

## Sunday, November 28, 2010

### Top 5 Most Anticipated Tablets of 2010-2011

This is the list. What's interesting is that the Adam can be used as a tablet and a e-book reader and the reader mode is much more power efficient.

## Friday, November 26, 2010

### Computational tools for Bayesian analysis

The increasing number of R-oriented Bayesian computational tools such as MCMCpack, MCMCglmm, DPpackage, R-INLA, spBayes, have made BUGS less and less crucial for day to day Bayesian computation. Honestly, I cannot figure out a single analysis that BUGS can do but at least one of the above mentioned packages cannot.

## Saturday, November 20, 2010

## Wednesday, November 17, 2010

### Emacs, LaTeX, and forward search

## Saturday, November 13, 2010

## Wednesday, November 10, 2010

### PPA for Emacs

## Wednesday, November 03, 2010

## Sunday, October 31, 2010

## Friday, October 29, 2010

### Ggplot2, some tricks

Here is how to remove the grey in the background as well as the grid, a modification of the default black/white theme.

### Lotus Symphony 3

A new version of Lotus Symphony is out. Interesting that IBM does not provide a 64-bit build for Linux.

## Thursday, October 28, 2010

### Lattice vs. ggplot2

Both lattice and ggplot2 seem really interesting and worthy of learning. But I only have time to learn one of them, and the choice is not an easy one.

Here is an awesome reference; this blog is generally very interesting; and here is something about the "brew" package.

Here is an awesome reference; this blog is generally very interesting; and here is something about the "brew" package.

Labels:
blog,
graph,
R,
reproducible research

## Friday, October 22, 2010

### Useful blog

### Managing a statistical analysis project

This post is very helpful for people who are serious in statistical analysis.

### For a wider use of R

Two things that are crucial for a wider use of R among applied researchers. The first one is data manipulation/reshaping tool. I think the package "reshape" and "reshape2" have done good job and have largely removed the barrier. The second one is a universal output processing engine, something like the "eststo, esttab, estsadd" combo in Stata. The package "xtable", "memisc", "aprstable", and "estout" have made some progress, but they all are limited to some estimation procedures and none of them provides a universal output process engine for "all" estimation procedures. One of the main reason I am still paying for Stata is the convenience of directly producing publication quality LaTeX tables from

I wish more attention can be devoted to this latter issue.

*any*estimation procedures without having to manually change anything.I wish more attention can be devoted to this latter issue.

## Thursday, October 21, 2010

### Samsung Galaxy Tab

According to this source, the Samsung Galaxy Tab will cost about $600, which I think is a little too much.

## Wednesday, October 20, 2010

### The "tikzDevice" package

The tikzDevice package is quite amazing. Here are two graphs I just made, with (lower) and without (upper) using the tikzDevice package. The difference in quality is huge.

Since the LaTeX source file for the figure is quite large in size and may take significantly longer to compile, it makes sense to export the file into PDF file and include in the main LaTeX file in the usual way. That way, one gets the pleasing effect produced by TikZ and does not suffer from longer compilation time.

Also, the package "pgfSweave" is also very helpful if one wants to combine R sessions with LaTeX session.

Here is a tikz editor that works great on my 32-bit Ubuntu box but cannot be compiled on my 64-bit box. It can produce PDF, EPS, and PNG files directly from LaTeX code. TpX is another editor for tikz. It is currently Windows-only but a alpha version based on Lazarus is under development.

Some additional information is provided by this post.

Since the LaTeX source file for the figure is quite large in size and may take significantly longer to compile, it makes sense to export the file into PDF file and include in the main LaTeX file in the usual way. That way, one gets the pleasing effect produced by TikZ and does not suffer from longer compilation time.

Also, the package "pgfSweave" is also very helpful if one wants to combine R sessions with LaTeX session.

Here is a tikz editor that works great on my 32-bit Ubuntu box but cannot be compiled on my 64-bit box. It can produce PDF, EPS, and PNG files directly from LaTeX code. TpX is another editor for tikz. It is currently Windows-only but a alpha version based on Lazarus is under development.

Some additional information is provided by this post.

## Monday, October 18, 2010

### Ubuntu 11.04

Looks like Ubuntu 11.04 will bring some significant changes: new GCC, new Gnome, and new file system.

## Friday, October 15, 2010

## Tuesday, October 12, 2010

### Labelbook

The Stata command "labelbook" can be used to create output that shows both the raw values and value labels side by side, a feature very helpful dealing with second-hand data sources.

## Monday, October 04, 2010

### Estimation of quantile treatment effects with Stata

Here is a very useful article published on Stata Journal.

## Saturday, October 02, 2010

## Friday, October 01, 2010

### Using emacs with rcs

I find this blog post regarding how to use emacs and rcs together very helpful.

## Thursday, September 30, 2010

## Tuesday, September 28, 2010

### Document Foundation

LibreOffice, a new OpenOffice branch. People have learned their lesson from what happened to OpenSolaris, I guess.

By the way, by carefully reading the "natbib" package manual, I think I figure out how to change the default in-text citation from "...(Caldwell, 1979)" to "..(Caldwell 1979)".

By the way, by carefully reading the "natbib" package manual, I think I figure out how to change the default in-text citation from "...(Caldwell, 1979)" to "..(Caldwell 1979)".

## Sunday, September 26, 2010

### BibTeX style file

Here is a helpful post about how to tailor .bst file, with links to may more resources.

## Saturday, September 25, 2010

### R2admb

The new version of this package (0.6) is quite serviceable already. It also comes with a 17-page document and some interesting examples.

## Wednesday, September 22, 2010

### Handbook on Impact Evaluation: Quantitative Methods and Practices

Here is a the book, did not receive very positive review from the Stata community though.

## Tuesday, September 21, 2010

### C++ matrix library

This is a help post regarding matrix computation in C++ using the Armadillo library. This library seems to have a more comprehensive collection of numerical routines.

## Sunday, September 19, 2010

## Saturday, September 18, 2010

### Writing Scientific Software: a guide to good style

Here are some interesting and useful code templates.

## Thursday, September 16, 2010

### Improving R performance

The combination or Rcpp and Inline packages seems to be really useful to improve the performance of R, as suggested in this post. Inline also works with FORTRAN, which I like better for numerical computations.

## Monday, September 13, 2010

### LD_LIBRARY_PATH and alternative

I compiled and installed the most recent version of Armadillo library, and tested a sample program here. Both compilation and build went well, but when I tried to run the resulted program, I got an error message saying that the library files were not found. I followed the suggestions found here and created a new file " /etc/ld.so.conf.d/local.conf" which has only one line of contents:

/usr/local/lib64

Which seems to do the trick and,from my understanding, is better than have a line in my .bashrc for "LD_LIBRARY_PATH".

==================== EDITED SEPT. 24, 2010 ==============================

Now I realized that every time a new library is installed, I need to run "sudo sudo ldconfig".

==================== EDITED SEPT. 24, 2010 ==============================

Now I realized that every time a new library is installed, I need to run "sudo sudo ldconfig".

## Friday, September 10, 2010

### Autostart R console

An recent patch to the StatET fixed the autostart function. Now the R console starts automatically when I launch Eclipse, which is exactly what I want.

Eclipse 3.6 no longer has the "run last launched tools" shortcut key, but this autostart thing is probably even better.

Eclipse 3.6 no longer has the "run last launched tools" shortcut key, but this autostart thing is probably even better.

### Oracle vs. Google: FSF's Response

FSF's response to the law suite between Oracle and Google deserves some attention.

## Thursday, September 09, 2010

### Revolution R Enterprise, some thoughts

I requested an academic copy of the Revolution R Enterprise today. Since they only have Windows or Redhat version, I installed it on an old spare machine dual-booting Linux and Windows. I like what they did with the IDE (the debugger is nice) and some of their tailored made packages dealing with large data and parallel processing. It should be a good choice to introduce R to students. I guess I am not surprised to see the Windows version is made using MS Visual Studio, but having to install a load of VS-related stuff just to install this relatively simple software (with respect to the GUI, of course) make me feel a little, ...unhappy.

I think this is a serious alternative (for the first time) to the "official" R on Windows platform. And Revolution is generous to let faculty and students to use their software for free. If I decide to teach my students R, this software is absolutely my first choice.

By the way, I was not able to get the package "MCMCglmm" running because the required package "tensorA" is not in their official repository, and the version I pulled off CRAN is no good.

============== UPDATES (SEPT. 11, 2010) ========================

The problem with MCMCglmm seems to be fixed for the W32 system: the package "tensorA" was added to the official repository so I can run my favorite package for mixed effect modeling now. I still cannot use it on Win64 system.

I think this is a serious alternative (for the first time) to the "official" R on Windows platform. And Revolution is generous to let faculty and students to use their software for free. If I decide to teach my students R, this software is absolutely my first choice.

By the way, I was not able to get the package "MCMCglmm" running because the required package "tensorA" is not in their official repository, and the version I pulled off CRAN is no good.

============== UPDATES (SEPT. 11, 2010) ========================

The problem with MCMCglmm seems to be fixed for the W32 system: the package "tensorA" was added to the official repository so I can run my favorite package for mixed effect modeling now. I still cannot use it on Win64 system.

## Wednesday, September 08, 2010

### Spell checking with Eclipse

I like Eclipse as a LaTeX and Sweave platform. This web page explains how to do spell checking with Eclipse, which got me confused for quite a while.

## Saturday, September 04, 2010

### Xubuntu 10.04

I finally decided to upgrade my 4-year old Sony SZ laptop from Ubuntu 9.04 to Xubuntu 10.04. The main reason was that it is getting more and more difficult to keep up with some of the software I use on daily basis on a distro that is almost two years old.

The new system is snappy, and this xfce-based distro works really on this old laptop.

The new system is snappy, and this xfce-based distro works really on this old laptop.

## Thursday, September 02, 2010

## Wednesday, September 01, 2010

### The equinox theme is getting better and better

I especially like the new equinox evolution them and the equinox evolution squared window border. I am very pleased about my computer UI now.

## Tuesday, August 31, 2010

## Monday, August 30, 2010

## Sunday, August 29, 2010

## Tuesday, August 24, 2010

### R-INLA

From a quick look at the web site, R-INLA seems to be able to carry out bayesian analysis without the actual MCMC sampling. This makes it possible to quickly try various different model specifications without hours of waiting time. This package looks very interesting and definitely deserves more attention.

Since the Linux version is binary, I have to manually set the two main files so they are allowed to run. I hope the team can soon make a version ready for CRAN.

Since the Linux version is binary, I have to manually set the two main files so they are allowed to run. I hope the team can soon make a version ready for CRAN.

## Monday, August 23, 2010

### Runjags

To my knowledge, there are three interface packages between R and JAGS: rjags, R2jags, and runjags. I just realized that with runjags, you don't have to have a separate JAGS model file but to embed it in your R source file. This certainly has its appeal and might be considered as a unique advantage of runjags.

## Sunday, August 22, 2010

### Introduction to Data Technologies

Here is another free data analysis book that can be used to teach students to work with different types of data.

## Saturday, August 21, 2010

## Monday, August 16, 2010

### Revolution R Enterprise

I realized that the Revolution R Enterprise has only a version for Redhat Linux but not Ubuntu. Too bad.

## Sunday, August 08, 2010

### R package for hidden Markov model

The depmixS4 package estimates hidden Markov models, among many others. I found it very useful. Here is a short into. paper.

## Wednesday, August 04, 2010

### Integrating R with Geany

According to landroni, R can also be easily integrated with Geany:

Download 0.19 (non-optional), then you need to set send_selection_unsafe=true in geany.conf and assign "Send selection to terminal" to a ctrl+r keybinding or similar.

As long as your Geany installation has support for the embedded virtual terminal emulator (VTE), you're ready to go (as far as I know, this requirement excludes Windows). Start R in the terminal, write some R code and send the line or selection with ctrl+r.

As long as your Geany installation has support for the embedded virtual terminal emulator (VTE), you're ready to go (as far as I know, this requirement excludes Windows). Start R in the terminal, write some R code and send the line or selection with ctrl+r.

## Monday, August 02, 2010

## Sunday, August 01, 2010

### NetLogo-R-Extension

Here is a bridge between NetLogo and R, which looks very interesting. But I was not able to get it work on my Ubuntu box (running the latest R).

### Programming in NetLogo

Here is a nice introduction to NetLogo from a programming language perspective.

## Thursday, July 29, 2010

### Another tablet, looking good

Based on this post, Motorola is going to release an Android-based tablet in the coming fall.

## Wednesday, July 28, 2010

### Pdf support in ebook reader

Based on this review, the Sony reader (daily edition) has better pdf support than both nook and kindle, and should be the #1 choice for handling pdf books (now costs $249). Here is a more detailed explanation of how sony reader deals with pdf files.

This web site deserves some attention, and this is an excellent review.

This web site deserves some attention, and this is an excellent review.

## Monday, July 26, 2010

### Ebook reader price cut

All major ebook readers have recently experienced major price cut (for example, Kindle DX, Nook). It seems that the Apple Ipad has something to do with it.

### Looking forward to HP's webOS-based slate computer

From what I have read, webOS is a really cool and mature Linux-based OS for handhold device. Since HP is planning to release a webOS-based slate computer ("palmpad"?) later this year, I think it will be worth waiting for.

## Tuesday, July 20, 2010

## Monday, July 19, 2010

### Flexmix taks long time...

I am running a mixture logit model on a sample of 10,000 cases. The model takes a long time (3-4 hours) to run, which makes it nearly impossible to try different specifications and such. It probably makes sense to use a subsample for exploratory analysis then fit the final model on the full sample.

This post describes how to draw a random sample in R.

This post describes how to draw a random sample in R.

## Sunday, July 18, 2010

### Npmlreg and Flexmix

In my attempt to estimate a mixture regression model for the long-term impact of prenatal famine exposure, I compared results obtained from NPMLREG and Flexmix using the "rainfall" data set came with the package "forward" and was able to achieve identical results from:

m.np <- allvc(cbind(Cases, Total-Cases) ~ Rain, random= ~1|ID, data=rainfall, k=2, family=binomial(logit))

and

m.mix <- stepFlexmix(cbind(Cases, Total-Cases) ~ 1|ID, model=FLXMRglmfix(family="binomial", fixed=~Rain), k=2, nrep=5, data=rainfall)

To test the random effect model, I created the new variable ID by:

rainfall$ID <- seq(from=1, to=34, by=1)

With Flexmix, it is possible to go a step further and let the treatment variable to vary between latent groups by:

m.mix <- stepFlexmix(cbind(Cases, Total-Cases) ~ Rain|ID, model=FLXMRglmfix(family="binomial"), k=2, nrep=5, data=rainfall)

## Saturday, July 17, 2010

### Latent variable analysis package for R

This is another R package for latent variable analysis. I hope it can soon include multilevel and mixture modeling capacity. Another promising package is OpenMx, which is also under active development and very promising. At this moment, OpenMx seems to have more features than lavaan (e.g. categorical dependent variable, mixture model, etc.), but lavaan promises to include them in the next year or so.

## Tuesday, July 13, 2010

### Online LaTeX table generater

It probably uses some Perl or Python script to add all the formatting marks, very convenient.

### Flexmix for binary dependent variable

It looks that FlexMix does not handle binary dependent variables directly, you have to transform it into "cbind(event, total-event)" format.

## Monday, July 05, 2010

### Convergence diagnostics with MCMCglmm

MCMCglmm does not produce more than one chain. So the way to check convergence is to run the model twice, each with different starting values, then conduct the diagnostic with:

gelman.diag(m1$Sol, m2$Sol)

### Handling missing data in R

Some estimation procedures, including MCMCglmm, does not handle missing values directly. Before estimating these models, missing values need to be excluded. From this post, the command to do this is:

newdata <- na.omit(mydata)

## Monday, June 28, 2010

## Thursday, June 24, 2010

### A few words describing population projection

Projection in demography is calculating survivors down cohort lines of those living at a given point in time, calculating births in each successive period, and adding a suitable allowance for migration. Of the various ways of looking at population dynamics, the one most convenient for the present purpose is the matrix approach (Chapter 7). Such calculations were used, before Leslie (1945) expressed them in matrix form, by Cannan (1895), Bowley (1924), and especially Whelpton (1936).

----------------------------------------------Page 272, Applied Mathematical Demography

Here is some intro. materials, implemented in Stata. Even though I have been using Stata for a long time, I have never spent any timing learning Mata, too bad.

----------------------------------------------Page 272, Applied Mathematical Demography

Here is some intro. materials, implemented in Stata. Even though I have been using Stata for a long time, I have never spent any timing learning Mata, too bad.

## Wednesday, June 23, 2010

### Using World Bank data in R

This R package makes it easy to use data and information from World Bank.

### A simple JAGS demo

Here is a simple JAGS demo. The "progress.bar="gui"" option for the "coda.samples()" function is very convenient, but was not documented in the manual.

### OpenBUGS and rbugs package

I just realized that OpenBUGS 3.10 has a native Linux version and it can be called directly from within R via the package "rbugs". OpenBUGS is probably closer to WinBUGS than JAGS, and it is always nice to alternatives!

It will be fun to do some benchmark test between OpenBUGS and JAGS.

It will be fun to do some benchmark test between OpenBUGS and JAGS.

## Tuesday, June 22, 2010

### Making glmmBUGS work with JAGS

To streamline the process of making glmmBUGS to work with JAGS, I made changes to the "writeBugsModel.R" and replaced all "inprod2" with "inprod" and "dflat()" with "dnorm(0.0,1.0E-6)". I will spend some time looking at other sources files and see if anything else should be changed.

### JAGS' glm module

I did not realize that new GLM module for JAGS 2.0 can be so effective in reducing the running time. Using the example I posed it yesterday (an example from the glmmBUGS package), without loading the glm module, it took 20.223 minutes to draw 10,000 samples from the posterior distribution; with the glm module loaded, it took 11.869 to get the same results:

Without glm module:

user system elapsed

20.170 0.020 20.223

With glm module:

user system elapsed

11.040 0.000 11.869

Without glm module:

user system elapsed

20.170 0.020 20.223

With glm module:

user system elapsed

11.040 0.000 11.869

### Dflat() prior

Here is an example of replacing the "dflat()" prior with "dnorm()" prior, yet without much explanation.

## Monday, June 21, 2010

### A question

I posted a question to the R-Mixed-Model mailing list but have not had any responses yet:

---------------------------------------------------------------------------------------

Here is the answer from Jens Ã…strÃ¶

---------------------------------------------------------------------------------------

Hi!

Not an expert by any means, but I got it to run by doing this:

1) change inprod2 to inprod in the model file (JAGS does not have

inprod2 function)

2) change ~dflat() to other uninformative prior, e.g.

~dnorm(0.0,1.0E-6) (Jags does not have dflat distribution)

3) specify/compile the model with e.g.

bac.jags<-jags.model("model.bug",data=bacrag$ragged,n.chains=4)

4) update it, update(bac.jags,1000)

5) Collect coda samples,

bacResult<-coda.samples(temp,names(getInits()),n.iter=10000,thin=10)

This seemed to converge well, (gelman.diag(bacResult))

Good luck

----------------------------------------------------------------------------------------

It works.

---------------------------------------------------------------------------------------

Dear All, I wonder if anybody has tried to make glmmBUGS work with JAGS. My attempt was not successful. Here is the simple example I copied from the glmmBUGS Vignettes: ------------------------ library(MASS) data(bacteria) bacterianew <- bacteria bacterianew$yInt = as.integer(bacterianew$y == "y") levels(bacterianew$trt) <- c("placebo", "drug", "drugplus") library(glmmBUGS) bacrag <- glmmBUGS(formula = yInt ~ trt + week, data=bacterianew, effects = "ID", modelFile="model.bug", family="bernoulli") names(bacrag$ragged) source("getInits.R") startingValues = bacrag$startingValues ------------------------ With WinBUGS, it runs well: ------------------------ library(R2WinBUGS) bacResult1 = bugs(bacrag$ragged, getInits, model.file="model.bug", n.chain=3, n.iter=2000, n.burnin=100, parameters=names(getInits()), n.thin=10) ------------------------ But with JAGS, I got error message "Error in FUN(50L[[1L]], ...) : invalid first argument" ------------------------ library(R2jags) jags.parms=names(getInits()) bacResult = jags(data=bacrag$ragged, n.chain=3, n.iter=2000, model.file=model.bug) ------------------------ I hope somebody can help me figuring out how to make this work. Many thanks. Best, Shige---------------------------------------------------------------------------------------

Here is the answer from Jens Ã…strÃ¶

---------------------------------------------------------------------------------------

Hi!

Not an expert by any means, but I got it to run by doing this:

1) change inprod2 to inprod in the model file (JAGS does not have

inprod2 function)

2) change ~dflat() to other uninformative prior, e.g.

~dnorm(0.0,1.0E-6) (Jags does not have dflat distribution)

3) specify/compile the model with e.g.

bac.jags<-jags.model("model.bug",data=bacrag$ragged,n.chains=4)

4) update it, update(bac.jags,1000)

5) Collect coda samples,

bacResult<-coda.samples(temp,names(getInits()),n.iter=10000,thin=10)

This seemed to converge well, (gelman.diag(bacResult))

Good luck

----------------------------------------------------------------------------------------

It works.

## Saturday, June 19, 2010

## Thursday, June 17, 2010

## Sunday, June 13, 2010

### The BIAS project

The BIAS project has some useful resources and information. And here is a course on spatial analysis using R.

## Thursday, June 10, 2010

## Wednesday, June 09, 2010

### When Linux stopped responding ...

My laptop (running Ubuntu 9.04) stopped responding when I was running update and doing many other things. This post saved my day.

## Sunday, June 06, 2010

### Using Multilevel Models to Analyze Couple and Family Treatment Data: Basic and Advanced Issues

Rarely published papers make their source code open and well organized. Here is an exception.

## Friday, June 04, 2010

## Friday, May 21, 2010

### Bayesian disease mapping

Here is a slide showing how to make R to extract results from WinBUGS and show them in maps.

## Thursday, May 20, 2010

## Sunday, May 16, 2010

### Bayesian change point analysis

Here is an example using MCMCpack. The same model can be done with WinBUGS (and thus JAGS), as demonstrated in "Bayesian Computation with R".

## Thursday, May 13, 2010

## Sunday, May 09, 2010

### Sweave with Emacs and ESS, problem solved!

With help from Seb, the nagging problem I had when trying to sweaving with Emacs and ESS (http://old.nabble.com/ESS-and-Sweave-td28339734.html) has been solved.

First of all, apply the patch for ess-swv.el, provided by Seb. Second, set the default pdf viewer for ESS by "C-h v ess-pdf-viewer-pref ", then change the value from "nill" to "evince". Now I can directly sweave my Rnw document by pressing "M-n s", and see the pdf output by pressing "M-s P".

It has been a quite pleasant experience interacting with free software developer/maintainer.

By the way, Emacs 23.2 just got out, and it looks great on my Ubuntu box (the font scale problem that comes with the proprietary Nvidia driver is gone).

First of all, apply the patch for ess-swv.el, provided by Seb. Second, set the default pdf viewer for ESS by "C-h v ess-pdf-viewer-pref ", then change the value from "nill" to "evince". Now I can directly sweave my Rnw document by pressing "M-n s", and see the pdf output by pressing "M-s P".

It has been a quite pleasant experience interacting with free software developer/maintainer.

By the way, Emacs 23.2 just got out, and it looks great on my Ubuntu box (the font scale problem that comes with the proprietary Nvidia driver is gone).

## Friday, May 07, 2010

### OpenOffice quick-starter caused Ubuntu unable to shutdown?

According to this post, the shutdown problem on Ubuntu 10.04 was caused by the Openoffice quick-starter program. Once the program is closed, the problem goes away. So strange.

## Thursday, May 06, 2010

### Problem with the proprietary Nvidia driver

The reason why my ubuntu 10.04 box cannot shutdown is the proprietary nvidia driver. I learned this the hard way: I backed up all my data and wiped out my existing installation and installed a fresh new copy, and experimented step by step. I was able to replicate the problem when I enabled the proprietary driver, and the problem when away when I disabled it.

Although I cannot enable the desktop effects, at least I have a stable and efficient computational environment now.

-------------------------------- May 8 ----------------------------------------------

This is not true. The problem with shutdown is caused by OpenOffice quick starter program, not the proprietary Nvidia driver. After enabling the driver, 1) boot-up time increased from 20s to 25s, 2) Emacs fonts increased several fold.

Although I cannot enable the desktop effects, at least I have a stable and efficient computational environment now.

-------------------------------- May 8 ----------------------------------------------

This is not true. The problem with shutdown is caused by OpenOffice quick starter program, not the proprietary Nvidia driver. After enabling the driver, 1) boot-up time increased from 20s to 25s, 2) Emacs fonts increased several fold.

## Wednesday, May 05, 2010

## Monday, May 03, 2010

### Compile aML with GCC 4.4.3 (64-bit)

I was able to successfully compile aML with GCC 4.4.3, which is part of Ubuntu 10.04. I only need to make changes on two files to avoid warning messages: "encrypt.f" (replacing "data str95/'Y,h=(ufSBF&H6W!LZ|boq1yJsX^Ax+K3U5r[mDlNOg7jn;-V:z]k/e *)v#2GQC$t''{c\\">%@~w?8dMREiT0a*9}<`\\._4PIp'/" with "data str95/'Y,h=(ufSBF&H6W!LZ|boq1yJsX^Ax+K3U5r[mDlNOg7jn;-V:z]k/e *)v#2GQC$t''{c\">%@~w?8dMREiT0a*9}<`\._4PIp'/") and "getdatainfo.f" (replacing "data backslash/'\\'/" with "data backslash/'\'/")

## Sunday, May 02, 2010

## Saturday, May 01, 2010

## Friday, April 30, 2010

### Ubuntu 10.04 cursor issue

I am having the same problems as these guys. The workaround seems to be to disable the desktop effect first, change the cursor, then enable desktop effect back. Hope to have a more convenient way way soon.

One good thing with the new release is that the annoying sound comes when the login screen shows up can be disabled. I tried many ways but still could not disable it with 9.10.

One good thing with the new release is that the annoying sound comes when the login screen shows up can be disabled. I tried many ways but still could not disable it with 9.10.

### Upgrade to Ubuntu 10.04

Just upgraded my Ubuntu box from 9.10 to 10.04. The process went all right, the overall system performance seems satisfactory, but I did not like the change to move the window controls from upper right side to the upper left side.

And I am still using the Clearlook theme.

And I am still using the Clearlook theme.

### JAGS for Bayesian analysis

One of the many reasons I like Jackman's book "Bayesian Analysis for the Social Sciences" is that it relies on JAGS as the primary computational platform (as opposed to WinBUGS). From what I know, it is the first Bayesian book that does this.

## Thursday, April 29, 2010

## Wednesday, April 28, 2010

### Sweave vs. pgfSweave

I think pgfSweave should replace the default of Sweave system because it brings significant improvements over the original Sweave. I certain have made it my default Sweave tool on my system.

------------------------------- EDITED ON MAY 1, 2010 --------------------------------------

Since StatET made Sweaving so simple and easy, I don't have incentive to work directly on LaTeX documents any more. It is actually much simpler and more intuitive to work directly on a Sweave documents and see results in the final PDF output.

------------------------------- EDITED ON MAY 1, 2010 --------------------------------------

Since StatET made Sweaving so simple and easy, I don't have incentive to work directly on LaTeX documents any more. It is actually much simpler and more intuitive to work directly on a Sweave documents and see results in the final PDF output.

## Tuesday, April 27, 2010

### StatET does not work with OpenJDK

After several failed attempts to get StatET to work on my older Ubuntu box, I realized that it does not work with OpenJDK. After replacing the default OpenJDK with the official Sun JDK, everything works fine.

### Word wrap in eclipse

## Friday, April 23, 2010

### Trouble with ESS and Sweave

Last time I tried to sweave a document from with Emacs+ESS, I was using an earlier version of ESS (the current version is 5.8), and things seemed to be fine. Today when I tried to sweave a simple document and produced PDF output, I got error message of ".pdf": exited abnormally with code 2". I posted a message to the ESS mailing list, but so far no solutions have been reported.

Here is a very simple .Rnw file I used to reproduce the error:

Here is a very simple .Rnw file I used to reproduce the error:

--------------------------------------- \documentclass[12pt]{article} %\usepackage{tikz} %\usepackage{fullpage} \usepackage{Sweave} \begin{document} This document outlines the analysis to be carried out to look at the relationship between prenatal famine exposure and fetal loss risk. \end{document} -----------------------------------------

`Before this can be fixed, I will rely on Eclipse for all my sweaving. `

---------------- April 26 ------------------------

`Here is my own solution: `

```
```

from line 177 of ess-svy.el:- (shell-command (concat - (if (and ess-microsoft-p (w32-shell-dos-semantics)) - "start \"" pdfviewer "\" \"" namestem ".pdf\"" - "\"" pdfviewer "\" \"" namestem ".pdf\" &")))) + (shell-command (concat+ "\"" pdfviewer "\" \"" namestem ".pdf\"")))

### Rumour: Apple To Acquire ARM?

I believe that this, if not a rumor, is against most people's interest, manufactures and consumers alike.

## Tuesday, April 20, 2010

## Monday, April 12, 2010

### Rampant cheating hurts China's research ambitions

This is interesting. Smells almost like the Great Leap Forward.

## Friday, April 09, 2010

### GLMM using DPpackage

I was able to fit a semi-parametric Bayesian GLMM model using DPpackage. It took me many hours to sample from the posterior distribution (DPM prior):

MCMC scan 1000 of 5000 (CPU time: 18950.080 s)

MCMC scan 2000 of 5000 (CPU time: 22510.100 s)

MCMC scan 3000 of 5000 (CPU time: 28293.830 s)

MCMC scan 4000 of 5000 (CPU time: 35111.930 s)

MCMC scan 5000 of 5000 (CPU time: 46726.330 s)

Which translates to 5.26, 6.25, 9.75, 12.98 hours. This makes it less suitable for routine (especially exploratory) data analysis.

I compared the results from DPpackage and that from MCMCglmm, and they are not that different, and the latter took only a small fraction of the time required by the former!

The lack of difference in results puzzled me. I compared from results from random effect logistic regression assuming Gaussian random effect and results from NPML, assuming a nonparametric distribution of the random effect, the differences are quite significant.

---------------------------- UPDATED ON APRIL 11 ----------------------------------------------------------------

Using DP prior instead of DPM prior, it took about 4.7 hours to run the model. The results are slightly different and the parameter I am interested in increased from .41 to .42. Now I am trying PT prior and see how it goes.

DPpackage is a exciting new tool for applied researchers, and A LOT OF new and cool things can be done with it. With convenient new Bayesian tools like MCMCpack, MCMCglmm, and DPpackage, I will not be surprised to see more Bayesian publications coming out in social sciences.

I compared the results from DPpackage and that from MCMCglmm, and they are not that different, and the latter took only a small fraction of the time required by the former!

The lack of difference in results puzzled me. I compared from results from random effect logistic regression assuming Gaussian random effect and results from NPML, assuming a nonparametric distribution of the random effect, the differences are quite significant.

---------------------------- UPDATED ON APRIL 11 ----------------------------------------------------------------

Using DP prior instead of DPM prior, it took about 4.7 hours to run the model. The results are slightly different and the parameter I am interested in increased from .41 to .42. Now I am trying PT prior and see how it goes.

DPpackage is a exciting new tool for applied researchers, and A LOT OF new and cool things can be done with it. With convenient new Bayesian tools like MCMCpack, MCMCglmm, and DPpackage, I will not be surprised to see more Bayesian publications coming out in social sciences.

## Tuesday, April 06, 2010

### Five open source alternatives to the iPad

Here you can find some alternatives to Apple's ipad, running on Linux.

## Monday, April 05, 2010

## Sunday, April 04, 2010

### A Practical Guide to Geostatistical Mapping

Good book and good introduction to geostatistical modeling.

## Friday, April 02, 2010

### BAT: Bayesian analysis toolkit

BAT is a Bayesian analysis software. Looks like it needs quite some programming to get anything useful done.

----------------------- added on April 4 --------------------------

I seem to have troubles compiling it on my 64-bit Ubuntu box.

----------------------- added on April 4 --------------------------

I seem to have troubles compiling it on my 64-bit Ubuntu box.

## Thursday, April 01, 2010

### Apple got slapped

According to this post, Apple does not seem to be the first one to have the idea of "multitouch".

## Wednesday, March 31, 2010

### Econometrics Toolbox for Matlab

The Econometrics Toolbox for Matlab has just been updated. This is a quite comprehensive applied econometrics toolkit and I found it to be very useful.

## Sunday, March 28, 2010

### Difference in differences

The DID estimator is a useful tool to obtain a more fine-grained estimate of famine exposure on a variety of health and demographic outcomes, as long as the famine severity measure is good. However, in my case (fecundity), the provincial-level excess famine mortality level does not see to work well. I suspect that the provincial-level mortality measure is too crude and may not be able to adequately reflect individual's nutritional status.

Labels:
causal inference,
difference-in-difference

## Monday, March 22, 2010

### Making Simple R Package

This post explains how to create a simple R package that contains R and FORTRAN source code.

## Friday, March 19, 2010

### The "apsrtable" is getting better

Now we have both "memisc" and "apsrtable" for producing publication-ready tables from model results.

## Thursday, March 18, 2010

### BUGS language: strength and weakness

Here is an informative introduction to the strength and weakness of the BUGS language in conducting Bayesian data analysis. Since my analysis often utilize large data (tens of thousands, even millions), BUGS may not be a viable option in many cases. I hope the soon-to-come new version of JAGS can overcome some of the weaknesses mentioned in Gelman's post.

## Wednesday, March 17, 2010

### Stata 11

Got my Stata 11 today, was a little disappointed because the new program editor feature is not available on Linux platform.

I like the new multiple imputation feature though.

I like the new multiple imputation feature though.

### Measuring the length of time to run a function

This post describes how to time the run time of a R function.

## Tuesday, March 16, 2010

## Sunday, March 14, 2010

### Here comes Linux's iPad clones

Linux-based IPad clone will soon emerge and will dominate the market, based on this post.

## Saturday, March 13, 2010

## Thursday, March 11, 2010

## Wednesday, March 10, 2010

## Saturday, March 06, 2010

### OpenBUGS has a new home

Looks like OpenBUGS now has a new home! And now the command line tools can actually run on my Linux box.

## Friday, March 05, 2010

### GLMM revisted

A short while ago, I reported some discrepancies between the results produced by "lme4" and other R packages as well as Stata. Today I upgraded to the most recent version of "lme4a" and re-ran my model. The error of false convergence disappeared and the new results are very much in line with other packages.

I am glad that the problem eventually got sorted out.

For this version, the adaptive GH quadrature has not been implemented yet; so the only option to estimate a GLMM is laplace transformation.

I am glad that the problem eventually got sorted out.

For this version, the adaptive GH quadrature has not been implemented yet; so the only option to estimate a GLMM is laplace transformation.

## Tuesday, March 02, 2010

### Several screenshots

After some tweaking, I finally got Eclipse+StatET working the way I want. The first figure shows a running R session; the second figure shows the maximized running R session. The following shortcut keys are very helpful:

- Ctrl + A: select what you to run;
- Ctrl + R + R: run the selected program then go to the R console;
- Ctrl + M: maximize the R console;
- Ctril + T + 0: run Sweave of pgfSweave.

I like the way Eclipse organize the project and windows.

### Creating Beamer Handouts with Notes

This post provides some very helpful information. Unfortunately, the package is not designed to incorporate author's notes into the printed outline; instead, it is designed to print blank outline pages so the audience can jot down their own notes there.

## Saturday, February 27, 2010

### New version of Eclipse

A new version of Eclipse, v.3.5.2 is out. This one works better on my 64-bit Linux machine in general and works very well with StatET.

### An interesting paper

Ben Bolker has an interesting paper (outline of a paper) comparing different approaches to estimate GLMM in R environment, which is very helpful to what I am doing right now.

The paper pointed out the following options to fit GLMM using R:

The paper pointed out the following options to fit GLMM using R:

- glmer
- glmmML
- glmm (from the repeated package)
- glmmADMB
- MCMCglmm
- glmmBUGS
- glmmPQL
- BUGS (through R2WinBUGS)
- glmmAK

And I would like to add one more, npmlreg.

I am not aware of the glmmAK package before. From the first glance, it seems to be very promising in the sense that it seems to allow non-Gaussian random effect in a Bayesian framework, something similar to what npmlreg does with ML method.

============== edited on March 3 ====================

DPpackage is another package that can estimate GLMM in a Bayesian framework.

============== edited on March 5 ====================

ASReml/ASReml-R is another choice. It is not free software, but it does seem to have some unique strengths. Maybe I should download a demo copy and try it myself.

============== edited on March 29 ===================

hglm is another possibility.

I am not aware of the glmmAK package before. From the first glance, it seems to be very promising in the sense that it seems to allow non-Gaussian random effect in a Bayesian framework, something similar to what npmlreg does with ML method.

============== edited on March 3 ====================

DPpackage is another package that can estimate GLMM in a Bayesian framework.

============== edited on March 5 ====================

ASReml/ASReml-R is another choice. It is not free software, but it does seem to have some unique strengths. Maybe I should download a demo copy and try it myself.

============== edited on March 29 ===================

hglm is another possibility.

### Beamer themes

I did not know there are so many different themes that can be used for a Beamer presentation slide.

## Friday, February 26, 2010

### When using open source makes you an enemy of the state

According to this post, a US copyright lobby has long trying to make the case that using open source software is the same thing as using pirate software.

## Thursday, February 25, 2010

## Tuesday, February 23, 2010

## Monday, February 22, 2010

### Theory without code

As a demographer, I would not pay much attention to statistical methodological papers that do not describe how to implement their methodology either using existing software or using their own code. The reason is simple: I don't have the time and patience to gain a understanding of the methodology deep enough to be able to implement it myself; after all, it is the substantive research questions in demography that interest me most!

Open source statistical platform such as R, WinBUGS, and JAGS have made it so easy for statisticians to code their ideas into readily usable software, so the question is: why not?

Open source statistical platform such as R, WinBUGS, and JAGS have made it so easy for statisticians to code their ideas into readily usable software, so the question is: why not?

### More option for cure model

Looks like the cure model can also be estimated using another R package GAMLSS.

de Castro, M., V. G. Cancho, and J. Rodrigues. 2009. “A hands-on approach for fitting long-term survival models under the GAMLSS framework.” Computer Methods and Programs in Biomedicine.

## Saturday, February 20, 2010

### A few software packages for mixed models

Less well-known but definitely deserve attention:

All these softwares are free to use, but unfortunately none of them seem to be open sourced.

Information provided by this web site is very helpful.

Information provided by this web site is very helpful.

### GLMM for ecologists and evolutionary biologists

This web site is devoted to the use of GLMM in the filed of evolutionary biologists.

## Friday, February 19, 2010

## Thursday, February 18, 2010

## Tuesday, February 16, 2010

### Generalized linear mixed effect model problem

I am trying to compare cohort difference in infant mortality using generalized linear mixed model. I first estimated the model in Stata:

xi:xtlogit inftmort i.cohort, i(code)

which converged nicely:

Fitting comparison model:

Iteration 0: log likelihood = -1754.4476

Iteration 1: log likelihood = -1749.3366

Iteration 2: log likelihood = -1749.2491

Iteration 3: log likelihood = -1749.2491

Fitting full model:

tau = 0.0 log likelihood = -1749.2491

tau = 0.1 log likelihood = -1743.8418

tau = 0.2 log likelihood = -1739.0769

tau = 0.3 log likelihood = -1736.4914

tau = 0.4 log likelihood = -1739.5415

Iteration 0: log likelihood = -1736.4914

Iteration 1: log likelihood = -1722.6629

Iteration 2: log likelihood = -1694.9114

Iteration 3: log likelihood = -1694.6509

Iteration 4: log likelihood = -1694.649

Iteration 5: log likelihood = -1694.649

Random-effects logistic regression Number of obs = 21694

Group variable: code Number of groups = 10789

Random effects u_i ~ Gaussian Obs per group: min = 1

avg = 2.0

max = 9

Wald chi2(2) = 8.05

Log likelihood = -1694.649 Prob > chi2 = 0.0178

------------------------------------------------------------------------------

inftmort | Coef. Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

_Icohort_2 | -.5246846 .1850328 -2.84 0.005 -.8873422 -.1620269

_Icohort_3 | -.1424331 .140369 -1.01 0.310 -.4175513 .132685

_cons | -5.214642 .1839703 -28.35 0.000 -5.575217 -4.854067

-------------+----------------------------------------------------------------

/lnsig2u | .9232684 .1416214 .6456956 1.200841

-------------+----------------------------------------------------------------

sigma_u | 1.586665 .1123528 1.381055 1.822885

rho | .4335015 .0347791 .3669899 .5024984

------------------------------------------------------------------------------

Likelihood-ratio test of rho=0: chibar2(01) = 109.20 Prob >= chibar2 = 0.000

Then I tried the same model using lme4:

m2 <- glmer(inftmort ~ (1|code) + as.factor(cohort), family=binomial, data=d)

I got:

Warning message:

In mer_finalize(ans) : false convergence (8)

And the results are quite different from what I got from Stata. I tried to estimate the model using "glmmML" and also got into trouble. This time the error message is:

Warning message:

In glmmML.fit(X, Y, weights, cluster.weights, start.coef, start.sigma, :

Hessian non-positive definite. No variance!

I then tried MCMCglmm:

prior=list(R=list(V=1, nu=0, fix=1), G=list(G1=list(V=1, nu=0)))

m2 <- MCMCglmm(inftmort ~ 1 + as.factor(cohort), random=~code, family = "categorical", data=d, prior=prior)

It seems to be working and produced estimates that are comparable to what Stata produced (not identical, of course):

Iterations = 3001:12991

Thinning interval = 10

Number of chains = 1

Sample size per chain = 1000

1. Empirical mean and standard deviation for each variable,

plus standard error of the mean:

Mean SD Naive SE Time-series SE

(Intercept) -5.7145 0.1805 0.005708 0.02295

as.factor(cohort)2 -0.5633 0.1788 0.005653 0.02194

as.factor(cohort)3 -0.1888 0.1471 0.004653 0.01912

2. Quantiles for each variable:

2.5% 25% 50% 75% 97.5%

(Intercept) -6.0464 -5.8324 -5.7251 -5.61120 -5.2817

as.factor(cohort)2 -0.8977 -0.6985 -0.5551 -0.43084 -0.2371

as.factor(cohort)3 -0.4614 -0.2947 -0.1867 -0.09644 0.1232

This is puzzling.

xi:xtlogit inftmort i.cohort, i(code)

which converged nicely:

Fitting comparison model:

Iteration 0: log likelihood = -1754.4476

Iteration 1: log likelihood = -1749.3366

Iteration 2: log likelihood = -1749.2491

Iteration 3: log likelihood = -1749.2491

Fitting full model:

tau = 0.0 log likelihood = -1749.2491

tau = 0.1 log likelihood = -1743.8418

tau = 0.2 log likelihood = -1739.0769

tau = 0.3 log likelihood = -1736.4914

tau = 0.4 log likelihood = -1739.5415

Iteration 0: log likelihood = -1736.4914

Iteration 1: log likelihood = -1722.6629

Iteration 2: log likelihood = -1694.9114

Iteration 3: log likelihood = -1694.6509

Iteration 4: log likelihood = -1694.649

Iteration 5: log likelihood = -1694.649

Random-effects logistic regression Number of obs = 21694

Group variable: code Number of groups = 10789

Random effects u_i ~ Gaussian Obs per group: min = 1

avg = 2.0

max = 9

Wald chi2(2) = 8.05

Log likelihood = -1694.649 Prob > chi2 = 0.0178

------------------------------------------------------------------------------

inftmort | Coef. Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

_Icohort_2 | -.5246846 .1850328 -2.84 0.005 -.8873422 -.1620269

_Icohort_3 | -.1424331 .140369 -1.01 0.310 -.4175513 .132685

_cons | -5.214642 .1839703 -28.35 0.000 -5.575217 -4.854067

-------------+----------------------------------------------------------------

/lnsig2u | .9232684 .1416214 .6456956 1.200841

-------------+----------------------------------------------------------------

sigma_u | 1.586665 .1123528 1.381055 1.822885

rho | .4335015 .0347791 .3669899 .5024984

------------------------------------------------------------------------------

Likelihood-ratio test of rho=0: chibar2(01) = 109.20 Prob >= chibar2 = 0.000

Then I tried the same model using lme4:

m2 <- glmer(inftmort ~ (1|code) + as.factor(cohort), family=binomial, data=d)

I got:

Warning message:

In mer_finalize(ans) : false convergence (8)

And the results are quite different from what I got from Stata. I tried to estimate the model using "glmmML" and also got into trouble. This time the error message is:

Warning message:

In glmmML.fit(X, Y, weights, cluster.weights, start.coef, start.sigma, :

Hessian non-positive definite. No variance!

I then tried MCMCglmm:

prior=list(R=list(V=1, nu=0, fix=1), G=list(G1=list(V=1, nu=0)))

m2 <- MCMCglmm(inftmort ~ 1 + as.factor(cohort), random=~code, family = "categorical", data=d, prior=prior)

It seems to be working and produced estimates that are comparable to what Stata produced (not identical, of course):

Iterations = 3001:12991

Thinning interval = 10

Number of chains = 1

Sample size per chain = 1000

1. Empirical mean and standard deviation for each variable,

plus standard error of the mean:

Mean SD Naive SE Time-series SE

(Intercept) -5.7145 0.1805 0.005708 0.02295

as.factor(cohort)2 -0.5633 0.1788 0.005653 0.02194

as.factor(cohort)3 -0.1888 0.1471 0.004653 0.01912

2. Quantiles for each variable:

2.5% 25% 50% 75% 97.5%

(Intercept) -6.0464 -5.8324 -5.7251 -5.61120 -5.2817

as.factor(cohort)2 -0.8977 -0.6985 -0.5551 -0.43084 -0.2371

as.factor(cohort)3 -0.4614 -0.2947 -0.1867 -0.09644 0.1232

This is puzzling.

Labels:
Bayesian,
GLMM,
multilevel model,
R,
Stata

## Sunday, February 14, 2010

### Spreadsheet to LaTeX solutions

Excel2latex and spreadsheet2latex seem to be the only two solutions at the time.

## Saturday, February 13, 2010

### Cure model using R

The package "nltm" seems to be able to estimate proportional hazard and proportional odds cure models. I will do some experiments and see how it goes.

## Friday, February 12, 2010

### Reshaping data in R

This post explains how to reshape data in R and how to created nicely formated LaTeX using the xtable package.

## Thursday, February 11, 2010

### Handling hierarchical data structure in R

R has a comprehensive set of tools for the handling of hierarchical data structure. The most widely used package is probably "nlme" and "lme4", contributed by Douglas Bates and colleagues. While "nlme" is older and probably more mature than "lme4", it seems that the latter is increasingly taking over the former because it comes with full set of tools to handle non-Gaussian distributions.

Based on "nlme" and "lme4", the package "mgcv" and "gamm4" can estimate generalized additive model on multilevel data structure. This is especially useful if one have continuous covariates such as age or calendar year, whose effects are not very well understood in the literature and grouping may distort the results.

If you don't feel comfortable with the assumption that the unobserved heterogeneity components are normally distributed, then the package "npmlreg" can be used to conduct sensitivity analysis through nonparametric maximum likelihood method.

The new "MCMCglmm" package extends R's modeling capacity to multilevel and multivariate models such as those that can be estimated using aML and Sabre. Sure, MCMCglmm relies on MCMC instead of maximum likelihood as the method of estimation, which may be slower especially when the data is huge (as is often the case in demographic studies), but as the author indicated, this package has already gained significant performance advantage over traditional Bayesian packages such as WinBUGS and JAGS, and is almost as general as the latter, with respect to multilevel regression-like models.

The Sabre team also produced a R wrapper of their package for multilevel multiprocess modeling. Unfortunately, their R package is based on an outdated version (v.5 instead of v.6) of Sabre and the project seems to be halted due to lack of fund.

It is not difficult to conduct Bayesian multilevel analysis in R using WinBUGS and JAGS via packages such as "R2WinBUGS" and "R2jags". These packages can estimate a very large number of models. This means that almost anything one can think of can be model with R.

Another package, ADMB, also has several interfaces to R. I hope they are getting better and better because ADMB is such a powerful and versatile package that, if one can write down the likelihood function, it can maximize it for you.

Based on "nlme" and "lme4", the package "mgcv" and "gamm4" can estimate generalized additive model on multilevel data structure. This is especially useful if one have continuous covariates such as age or calendar year, whose effects are not very well understood in the literature and grouping may distort the results.

If you don't feel comfortable with the assumption that the unobserved heterogeneity components are normally distributed, then the package "npmlreg" can be used to conduct sensitivity analysis through nonparametric maximum likelihood method.

The new "MCMCglmm" package extends R's modeling capacity to multilevel and multivariate models such as those that can be estimated using aML and Sabre. Sure, MCMCglmm relies on MCMC instead of maximum likelihood as the method of estimation, which may be slower especially when the data is huge (as is often the case in demographic studies), but as the author indicated, this package has already gained significant performance advantage over traditional Bayesian packages such as WinBUGS and JAGS, and is almost as general as the latter, with respect to multilevel regression-like models.

The Sabre team also produced a R wrapper of their package for multilevel multiprocess modeling. Unfortunately, their R package is based on an outdated version (v.5 instead of v.6) of Sabre and the project seems to be halted due to lack of fund.

It is not difficult to conduct Bayesian multilevel analysis in R using WinBUGS and JAGS via packages such as "R2WinBUGS" and "R2jags". These packages can estimate a very large number of models. This means that almost anything one can think of can be model with R.

Another package, ADMB, also has several interfaces to R. I hope they are getting better and better because ADMB is such a powerful and versatile package that, if one can write down the likelihood function, it can maximize it for you.

### App Store Model Provides Security, Stability: Evidence, Please?

Here is an interesting post on the claim that Iphone has to be locked down to make it safe and stable.

## Wednesday, February 10, 2010

### Loglinear models using R

For those sociologists who want to estimate complicated loglinear models (e.g. Goodman's RC model) using R, the package "VGAM" seems to be a good choice.

## Tuesday, February 09, 2010

### Mixed Effects Models and Extensions in Ecology with R

This is a truly wonderful book. It deals with advanced statistical models in the way that people without advanced mathematical background can follow easily. I think I may have found a perfect textbook for my graduate course in advanced demographic analysis. All I need to do is to introduce some demographic examples to make it easier to understand to demography students.

## Sunday, February 07, 2010

### Skype on Ipad via 3G

## Saturday, February 06, 2010

## Friday, February 05, 2010

### Graphics using Stata vs. R

Here is an informative blog post comparing simple histogram produced using Stata and various methods using R.

## Saturday, January 30, 2010

### An example of MCMCglmm

Here is a very useful example of how to estimate multinomial logit with random effect using MCMCglmm package.

## Friday, January 29, 2010

## Thursday, January 28, 2010

## Wednesday, January 27, 2010

### IPad

Apple's new toy, ipad, looks interesting. But 10 hour's battery life per charge seems to be less impressive than I originally thought. Some laptop computers can do that. Also, I was surprised to find out its operating system is IPhone OS instead of OSX.

Being an Apple product also means that, in order to do anything useful, you have to buy and buy and buy...

## Monday, January 25, 2010

### Submitting to a sociology journal

This blog has some useful information for those who are thinking about submitting to a sociology journal.

## Monday, January 18, 2010

## Wednesday, January 13, 2010

### Word count for PDF (LaTeX) file

## Friday, January 08, 2010

## Thursday, January 07, 2010

## Wednesday, January 06, 2010

### Apophenia

The January 2010 source release of the Apophenia library cannot be compiled on my Ubuntu machines (64-bit 9.10 and 32-bit 9.04); but the developmental version (downloaded via git) compiles flawlessly. Not sure what is going on.

### R AnalyticFlow

R AnalyticFlow is an interesting piece of software. From its web site:

"R AnalyticFlow is a software which enables

state-of-the-art data analysis by drawing analysis flowcharts.

You can effectively share processes of data analysis

in collaborative works"

state-of-the-art data analysis by drawing analysis flowcharts.

You can effectively share processes of data analysis

in collaborative works"

## Monday, January 04, 2010

### A customized Ubuntu-based Chinese distro

Linux Deepin looks promising. But it does not seem to provide a 64-bit version.

## Sunday, January 03, 2010

## Friday, January 01, 2010

### Create LaTeX table for descriptive statistics using Estout

I want to create a table of descriptive statistics. To be more specific, The table should contains 6 rows and 4 columns, something like this:

Descriptive Statistics

-----------------------------------------------------------

Cohort A Cohort B Cohort C

-----------------------------------------------------------

Urban Residence 0.263 0.313 0.264

Years of Schooling 5.585 7.280 7.592

Ethnic Majority 0.928 0.918 0.915

Want Children 0.990 0.992 0.991

Ever Pregnant 0.991 0.987 0.992

-----------------------------------------------------------

Observations 3553 2450 4694

-----------------------------------------------------------

The official document for Estout provides some clues but does not make it crystal clear. Here are the steps I followed:

- Generate an arbitrary variable;
- Run regression, separately for each cohort;
- Collect the results;
- Make the table

Here are the code:

eststo clear

gen y=uniform()

quietly regress y urban eduy han want_birth preg if cohort==1, noconstant

estadd mean

eststo d1

quietly regress y urban eduy han want_birth preg if cohort==2, noconstant

estadd mean

eststo d2

quietly regress y urban eduy han want_birth preg if cohort==3, noconstant

estadd mean

eststo d3

esttab d1 d2 d3, cells(mean(fmt(%8.3f))) label nonotes nonumber nodepvars mtitles("Cohort A" "Cohort B" "Cohort C") width(1\hsize) title(Descriptive Statistics)

Subscribe to:
Posts (Atom)