Saturday, December 25, 2010

Friday, December 17, 2010

Where to find good data sets

This post provides some good information on where to find good data sets for statistical analysis.

Thursday, December 09, 2010

Programming languages on the rise

This post list seven programming languages that are rising, from Python to Ruby to R to... Cobol.

Monday, December 06, 2010

Very interesting blog

That is a very interesting blog on R + LaTeX + Sweave!

Saturday, December 04, 2010

Comparison of results

I am doing a simple comparison of different estimation procedures in dealing with a simple binomial model. Here is where I got started:



# Using INLA
formula = r ~ x1*x2 + f(plate, model="iid")
mod.inla = inla(formula, data=Seeds, family="binomial", Ntrials=n)

# Using npmlreg <- alldist(cbind(r, n-r) ~ x1*x2 , random=~1, data=Seeds, family=binomial, random.distribution="gq")

# Using MCMCglmm
prior <- list(R=list(V=1, nu=0.002))
mod.mcmc <- MCMCglmm(cbind(r, n-r) ~ x1*x2, family="multinomial2", data=Seeds, prior=prior)

# Using DPpackage
I will keep updating by adding new things (estimation procedures, predictive simulations, etc.)

Wednesday, December 01, 2010

Tuesday, November 30, 2010

Wrapping text around figures in LaTeX

I found this page very helpful in showing how to create figures with text wrapping around. And this one for making tables.

I am using the LaTeX package by Bruce Donald for my next NIH proposal.

Sunday, November 28, 2010

Top 5 Most Anticipated Tablets of 2010-2011

This is the list. What's interesting is that the Adam can be used as a tablet and a e-book reader and the reader mode is much more power efficient.

Friday, November 26, 2010

Computational tools for Bayesian analysis

The increasing number of R-oriented Bayesian computational tools such as MCMCpack, MCMCglmm, DPpackage, R-INLA, spBayes, have made BUGS less and less crucial for day to day Bayesian computation. Honestly, I cannot figure out a single analysis that BUGS can do but at least one of the above mentioned packages cannot.

Saturday, November 20, 2010

Wednesday, November 17, 2010

Emacs, LaTeX, and forward search

This post explains how to set the .emacs file to get the stuff work.

Here is another very helpful post on how to set up Emacs with AUCTeX.

Wednesday, November 10, 2010

Openair project

The openair project creates open source tool for analyzing air pollution data.

PPA for Emacs

Here is a PPA site for the most recent version of Emacs.

Debian package for the most recent version of AucTeX, and ESS. It is generally easier to use these binary packages than building from sources.

Wednesday, November 03, 2010


The CppBUGS project is really interesting. I hope it can be tightly integrated with R.

Sunday, October 31, 2010

Cruz Tablet

This looks interesting. The most impressive part is the battery life: up to 10 hours.

Friday, October 29, 2010

Ggplot2, some tricks

Here is how to remove the grey in the background as well as the grid, a modification of the default black/white theme.

Lotus Symphony 3

A new version of Lotus Symphony is out. Interesting that IBM does not provide a 64-bit build for Linux.

Thursday, October 28, 2010

Lattice vs. ggplot2

Both lattice and ggplot2 seem really interesting and worthy of learning. But I only have time to learn one of them, and the choice is not an easy one.

Here is an awesome reference; this blog is generally very interesting; and here is something about the "brew" package.

Friday, October 22, 2010

Useful blog

Here is another useful blog. This post regarding how to use EPS directly in LaTeX file is especially helpful.

Managing a statistical analysis project

This post is very helpful for people who are serious in statistical analysis.

For a wider use of R

Two things that are crucial for a wider use of R among applied researchers. The first one is data manipulation/reshaping tool. I think the package "reshape" and "reshape2" have done good job and have largely removed the barrier. The second one is a universal output processing engine, something like the "eststo, esttab, estsadd" combo in Stata. The package "xtable", "memisc", "aprstable", and "estout" have made some progress, but they all are limited to some estimation procedures and none of them provides a universal output process engine for "all" estimation procedures. One of the main reason I am still paying for Stata is the convenience of directly producing publication quality LaTeX tables from any estimation procedures without having to manually change anything.

I wish more attention can be devoted to this latter issue.

Diagram in LaTeX

This web site provides helpful information on how to make diagrams using LaTeX.

Thursday, October 21, 2010

Samsung Galaxy Tab

According to this source, the Samsung Galaxy Tab will cost about $600, which I think is a little too much.

Wednesday, October 20, 2010

The "tikzDevice" package

The tikzDevice package is quite amazing. Here are two graphs I just made, with (lower) and without (upper) using the tikzDevice package. The difference in quality is huge.

Since the LaTeX source file for the figure is quite large in size and may take significantly longer to compile, it makes sense to export the file into PDF file and include in the main LaTeX file in the usual way. That way, one gets the pleasing effect produced by TikZ and does not suffer from longer compilation time.

Also, the package "pgfSweave" is also very helpful if one wants to combine R sessions with LaTeX session.

Here is a tikz editor that works great on my 32-bit Ubuntu box but cannot be compiled on my 64-bit box. It can produce PDF, EPS, and PNG files directly from LaTeX code. TpX is another editor for tikz. It is currently Windows-only but a alpha version based on Lazarus is under development.

Some additional information is provided by this post.

The lazy economist

Here is a cool blog.

Monday, October 18, 2010

Ubuntu 11.04

Looks like Ubuntu 11.04 will bring some significant changes: new GCC, new Gnome, and new file system.

Friday, October 15, 2010

Harddisk partition guide

Here is a detailed guide to hard drive partitioning for Ubuntu.

Tuesday, October 12, 2010


The Stata command "labelbook" can be used to create output that shows both the raw values and value labels side by side, a feature very helpful dealing with second-hand data sources.

Monday, October 04, 2010

Estimation of quantile treatment effects with Stata

Here is a very useful article published on Stata Journal.

Linux vs. OSX

Here is another interesting comparison, and some interesting comments there.

Saturday, October 02, 2010

Friday, October 01, 2010

OpenMx 1.0

Just realized that this SEM package for R has released version 1.0. Great...

Using emacs with rcs

I find this blog post regarding how to use emacs and rcs together very helpful.

Thursday, September 30, 2010

Nice wallpaper

Here are some nice wallpapers for Linux.

Tuesday, September 28, 2010


According to this, RIM will release a tablet running QNX, cool!

QNX may be even better than WebOS (certainly more mature).

Document Foundation

LibreOffice, a new OpenOffice branch. People have learned their lesson from what happened to OpenSolaris, I guess.

By the way, by carefully reading the "natbib" package manual, I think I figure out how to change the default in-text citation from "...(Caldwell, 1979)" to "..(Caldwell 1979)".

Sunday, September 26, 2010

BibTeX style file

Here is a helpful post about how to tailor .bst file, with links to may more resources.

Free as in Freedom

Here is a good online book. And here are more of them.

Saturday, September 25, 2010


Here are some tips to speedup matrix computation in R. The author did not provide any benchmark though.


The new version of this package (0.6) is quite serviceable already. It also comes with a 17-page document and some interesting examples.

Wednesday, September 22, 2010

Handbook on Impact Evaluation: Quantitative Methods and Practices

Here is a the book, did not receive very positive review from the Stata community though.

Tuesday, September 21, 2010

C++ matrix library

This is a help post regarding matrix computation in C++ using the Armadillo library. This library seems to have a more comprehensive collection of numerical routines.

Sunday, September 19, 2010

Saturday, September 18, 2010

Thursday, September 16, 2010

Improving R performance

The combination or Rcpp and Inline packages seems to be really useful to improve the performance of R, as suggested in this post. Inline also works with FORTRAN, which I like better for numerical computations.

Monday, September 13, 2010

LD_LIBRARY_PATH and alternative

I compiled and installed the most recent version of Armadillo library, and tested a sample program here. Both compilation and build went well, but when I tried to run the resulted program, I got an error message saying that the library files were not found. I followed the suggestions found here and created a new file " /etc/" which has only one line of contents:
Which seems to do the trick and,from my understanding, is better than have a line in my .bashrc for "LD_LIBRARY_PATH".

==================== EDITED SEPT. 24, 2010 ==============================

Now I realized that every time a new library is installed, I need to run "sudo sudo ldconfig". 

Friday, September 10, 2010

Autostart R console

An recent patch to the StatET fixed the autostart function. Now the R console starts automatically when I launch Eclipse, which is exactly what I want.

Eclipse 3.6 no longer has the "run last launched tools" shortcut key, but this autostart thing is probably even better.

Oracle vs. Google: FSF's Response

FSF's response to the law suite between Oracle and Google deserves some attention. 

Thursday, September 09, 2010

Revolution R Enterprise, some thoughts

I requested an academic copy of the Revolution R Enterprise today. Since they only have Windows or Redhat version, I installed it on an old spare machine dual-booting Linux and Windows. I like what they did with the IDE (the debugger is nice) and some of their tailored made packages dealing with large data and parallel processing. It should be a good choice to introduce R to students. I guess I am not surprised to see the Windows version is made using MS Visual Studio, but having to install a load of VS-related stuff just to install this relatively simple software (with respect to the GUI, of course) make me feel a little, ...unhappy.

I think this is a serious alternative (for the first time) to the "official" R on Windows platform. And Revolution is generous to let faculty and students to use their software for free. If I decide to teach my students R, this software is absolutely my first choice.

By the way, I was not able to get the package "MCMCglmm" running because the required package "tensorA" is not in their official repository, and the version I pulled off CRAN is no good.

============== UPDATES (SEPT. 11, 2010) ========================

The problem with MCMCglmm seems to be fixed for the W32 system: the package "tensorA" was added to the official repository so I can run my favorite package for mixed effect modeling now. I still cannot use it on Win64 system.

Wednesday, September 08, 2010

Spell checking with Eclipse

I like Eclipse as a LaTeX and Sweave platform. This web page explains how to do spell checking with Eclipse, which got me confused for quite a while.

Saturday, September 04, 2010

Xubuntu 10.04

I finally decided to upgrade my 4-year old Sony SZ laptop from Ubuntu 9.04 to Xubuntu 10.04. The main reason was that it is getting more and more difficult to keep up with some of the software I use on daily basis on a distro that is almost two years old.

The new system is snappy, and this xfce-based distro works really on this old laptop.

Thursday, September 02, 2010

Wednesday, September 01, 2010

The equinox theme is getting better and better

I especially like the new equinox evolution them and the equinox evolution squared window border. I am very pleased about my computer UI now.

Tuesday, August 31, 2010


BayesX now has a source code version that can be directly build under Linux. This is really cool.

Monday, August 30, 2010

Some Fortran code

Here are some Fortran code for Stata-related operations.

Sunday, August 29, 2010

JAGS tutorial

This is a nice tutorial for JAGS. Here is another one.

Tuesday, August 24, 2010


From a quick look at the web site, R-INLA seems to be able to carry out bayesian analysis without the actual MCMC sampling. This makes it possible to quickly try various different model specifications without hours of waiting time. This package looks very interesting and definitely deserves more attention.

Since the Linux version is binary,  I have to manually set the two main files so they are allowed to run. I hope the team can soon make a version ready for CRAN.

Monday, August 23, 2010


To my knowledge, there are three interface packages between R and JAGS: rjags, R2jags, and runjags. I just realized that with runjags, you don't have to have a separate JAGS model file but to embed it in your R source file. This certainly has its appeal and might be considered as a unique advantage of runjags.

Sunday, August 22, 2010

Introduction to Data Technologies

Here is another free data analysis book that can be used to teach students to work with different types of data.

Saturday, August 21, 2010


Is Incanter a reincarnation of Lisp-Stat? I thought the battle between Lisp and S/R was over long time ago...

JAGS tutorial

Here is a JAGS tutorial, very helpful.

Monday, August 16, 2010

Revolution R Enterprise

I realized that the Revolution R Enterprise has only a version for Redhat Linux but not Ubuntu. Too bad.

Tuesday, August 10, 2010


Grocer is a econometric toolbox for Scilab. Looks promising.

Sunday, August 08, 2010

R package for hidden Markov model

The depmixS4 package estimates hidden Markov models, among many others. I found it very useful. Here is a short into. paper.

I like this new theme

Just replaced this new theme with the good old ClearLook theme.

Wednesday, August 04, 2010

Integrating R with Geany

According to landroni, R can also be easily integrated with Geany:

Download 0.19 (non-optional), then you need to set send_selection_unsafe=true in geany.conf and assign "Send selection to terminal" to a ctrl+r keybinding or similar.

As long as your Geany installation has support for the embedded virtual terminal emulator (VTE), you're ready to go (as far as I know, this requirement excludes Windows). Start R in the terminal, write some R code and send the line or selection with ctrl+r.

Monday, August 02, 2010

Integrating R and Gedit

Here is how to integrate R and Gedit using Rgedit plugin.

Sunday, August 01, 2010


Here is a bridge between NetLogo and R, which looks very interesting. But I was not able to get it work on my Ubuntu box (running the latest R).

Programming in NetLogo

Here is a nice introduction to NetLogo from a programming language perspective.

Thursday, July 29, 2010


Red-R seems to have some interesting ideas behind it. Of course it needs some time to mature.

Another tablet, looking good

Based on this post, Motorola is going to release an Android-based tablet in the coming fall.

Wednesday, July 28, 2010

Pdf support in ebook reader

Based on this review, the Sony reader (daily edition) has better pdf support than both nook and kindle, and should be the #1 choice for handling pdf books (now costs $249). Here is a more detailed explanation of how sony reader deals with pdf files.

This web site deserves some attention, and this is an excellent review.

Monday, July 26, 2010

Recent emmigration from China...

This is an interesting article (in Chinese).

Ebook reader price cut

All major ebook readers have recently experienced major price cut (for example, Kindle DX, Nook). It seems that the Apple Ipad has something to do with it.

Looking forward to HP's webOS-based slate computer

From what I have read, webOS is a really cool and mature Linux-based OS for handhold device. Since HP is planning to release a webOS-based slate computer ("palmpad"?) later this year, I think it will be worth waiting for.

Tuesday, July 20, 2010

Living in China

This article (in Chinese) is really interesting...

Monday, July 19, 2010

Flexmix taks long time...

I am running a mixture logit model on a sample of 10,000 cases. The model takes a long time (3-4 hours) to run, which makes it nearly impossible to try different specifications and such. It probably makes sense to use a subsample for exploratory analysis then fit the final model on the full sample.

This post describes how to draw a random sample in R.

Sunday, July 18, 2010

Npmlreg and Flexmix

In my attempt to estimate a mixture regression model for the long-term impact of prenatal famine exposure, I compared results obtained from NPMLREG and Flexmix using the "rainfall" data set came with the package "forward" and was able to achieve identical results from: <- allvc(cbind(Cases, Total-Cases) ~ Rain, random= ~1|ID, data=rainfall, k=2, family=binomial(logit))
m.mix <- stepFlexmix(cbind(Cases, Total-Cases) ~ 1|ID, model=FLXMRglmfix(family="binomial", fixed=~Rain), k=2, nrep=5, data=rainfall)
To test the random effect model, I created the new variable ID by:
rainfall$ID <- seq(from=1, to=34, by=1)
With Flexmix, it is possible to go a step further and let the treatment variable to vary between latent groups by: 
m.mix <- stepFlexmix(cbind(Cases, Total-Cases) ~ Rain|ID, model=FLXMRglmfix(family="binomial"), k=2, nrep=5, data=rainfall)

Saturday, July 17, 2010

Latent variable analysis package for R

This is another R package for latent variable analysis. I hope it can soon include multilevel and mixture modeling capacity. Another promising package is OpenMx, which is also under active development and very promising. At this moment, OpenMx seems to have more features than lavaan (e.g. categorical dependent variable, mixture model, etc.), but lavaan promises to include them in the next year or so.

Tuesday, July 13, 2010

Online LaTeX table generater

It probably uses some Perl or Python script to add all the formatting marks, very convenient.

Flexmix for binary dependent variable

It looks that FlexMix does not handle binary dependent variables directly, you have to transform it into "cbind(event, total-event)" format.

Monday, July 05, 2010

Convergence diagnostics with MCMCglmm

MCMCglmm does not produce more than one chain. So the way to check convergence is to run the model twice, each with different starting values, then conduct the diagnostic with:
gelman.diag(m1$Sol, m2$Sol)

Handling missing data in R

Some estimation procedures, including MCMCglmm, does not handle missing values directly. Before estimating these models, missing values need to be excluded. From this post, the command to do this is:
newdata <- na.omit(mydata) 

Monday, June 28, 2010

Ranking of sociology program

Here is a recent ranking of sociology programs.

Thursday, June 24, 2010

A few words describing population projection

Projection in demography is calculating survivors down cohort lines of those living at a given point in time, calculating births in each successive period, and adding a suitable allowance for migration. Of the various ways of looking at population dynamics, the one most convenient for the present purpose is the matrix approach (Chapter 7). Such calculations were used, before Leslie (1945) expressed them in matrix form, by Cannan (1895), Bowley (1924), and especially Whelpton (1936).

----------------------------------------------Page 272, Applied Mathematical Demography

Here is some intro. materials, implemented in Stata. Even though I have been using Stata for a long time, I have never spent any timing learning Mata, too bad.

Wednesday, June 23, 2010

Using World Bank data in R

This R package makes it easy to use data and information from World Bank.

A simple JAGS demo

Here is a simple JAGS demo. The ""gui"" option for the "coda.samples()" function is very convenient, but was not documented in the manual. 

OpenBUGS and rbugs package

I just realized that OpenBUGS 3.10 has a native Linux version and it can be called directly from within R via the package "rbugs". OpenBUGS is probably closer to WinBUGS than JAGS, and it is always nice to alternatives!

It will be fun to do some benchmark test between OpenBUGS and JAGS. 

Tuesday, June 22, 2010

MS small basic

This looks fun.

Making glmmBUGS work with JAGS

To streamline the process of making glmmBUGS to work with JAGS, I made changes to the "writeBugsModel.R" and replaced all "inprod2" with "inprod" and "dflat()" with "dnorm(0.0,1.0E-6)". I will spend some time looking at other sources files and see if anything else should be changed.

JAGS' glm module

I did not realize that new GLM module for JAGS 2.0 can be so effective in reducing the running time. Using the example I posed it yesterday (an example from the glmmBUGS package), without loading the glm module, it took 20.223 minutes to draw 10,000 samples from the posterior distribution; with the glm module loaded, it took 11.869 to get the same results:

Without glm module:
   user  system elapsed
 20.170   0.020  20.223 

With glm module:
   user  system elapsed
 11.040   0.000  11.869 

Dflat() prior

Here is an example of replacing the "dflat()" prior with "dnorm()" prior, yet without much explanation.

Monday, June 21, 2010

A question

I posted a question to the R-Mixed-Model mailing list but have not had any responses yet:
Dear All,

I wonder if anybody has tried to make glmmBUGS work with JAGS. My
attempt was not successful.

Here is the simple example I copied from the glmmBUGS Vignettes:


bacterianew <- bacteria
bacterianew$yInt = as.integer(bacterianew$y == "y")

levels(bacterianew$trt) <- c("placebo", "drug", "drugplus")


bacrag <- glmmBUGS(formula = yInt ~ trt + week, data=bacterianew,
effects = "ID",
                   modelFile="model.bug", family="bernoulli")



startingValues = bacrag$startingValues

With WinBUGS, it runs well:


bacResult1 = bugs(bacrag$ragged, getInits, model.file="model.bug", n.chain=3,
  n.iter=2000, n.burnin=100, parameters=names(getInits()), n.thin=10)

But with JAGS, I got error message "Error in FUN(50L[[1L]], ...) :
invalid first argument"



bacResult = jags(data=bacrag$ragged, n.chain=3, n.iter=2000,

I hope somebody can help me figuring out how to make this work. Many thanks.


Here is the answer from Jens Åströ

Not an expert by any means, but I got it to run by doing this:

1) change inprod2 to inprod in the model file (JAGS does not have
inprod2 function)

2) change ~dflat() to other uninformative prior, e.g.
~dnorm(0.0,1.0E-6) (Jags does not have dflat distribution)

3) specify/compile the model with e.g.

4) update it, update(bac.jags,1000)

5) Collect coda samples,

This seemed to converge well, (gelman.diag(bacResult))

Good luck

It works.

Saturday, June 19, 2010

R-INLA package

Another R package for mixed effect modeling. Looks promising.

Alex, yet another ebook reader

Alex is yet another ebook reader. It Looks promising. Here is a review.

Thursday, June 17, 2010


Mendely is another bibliographic tool that works with Windows, OSX, and Linux. It has some nice features and is particularly suitable for online bibliographic collection and collaboration.

Sunday, June 13, 2010

The BIAS project

The BIAS project has some useful resources and information. And here is a course on spatial analysis using R.

Wednesday, June 09, 2010

Bayesian propensity score

Why and how, here are more details.

When Linux stopped responding ...

My laptop (running Ubuntu 9.04) stopped responding when I was running update and doing many other things. This post saved my day.

Sunday, June 06, 2010

Friday, June 04, 2010

Friday, May 21, 2010

Bayesian disease mapping

Here is a slide showing how to make R to extract results from WinBUGS and show them in maps.

Thursday, May 20, 2010

Sunday, May 16, 2010

Bayesian change point analysis

Here is an example using MCMCpack. The same model can be done with WinBUGS (and thus JAGS), as demonstrated in "Bayesian Computation with R".

Sunday, May 09, 2010

Emacs and screen

This post explains how to get Emacs works with the "screen" command.

Sweave with Emacs and ESS, problem solved!

With help from Seb, the nagging problem I had when trying to sweaving with Emacs and ESS ( has been solved.

First of all, apply the patch for ess-swv.el, provided by Seb. Second, set the default pdf viewer for ESS by "C-h v ess-pdf-viewer-pref ", then change the value from "nill" to "evince". Now I can directly sweave my Rnw document by pressing "M-n s", and see the pdf output by pressing "M-s P".

It has been a quite pleasant experience interacting with free software developer/maintainer.

By the way, Emacs 23.2 just got out, and it looks great on my Ubuntu box (the font scale problem that comes with the proprietary Nvidia driver is gone).

Friday, May 07, 2010

OpenOffice quick-starter caused Ubuntu unable to shutdown?

According to this post, the shutdown problem on Ubuntu 10.04 was caused by the Openoffice quick-starter program. Once the program is closed, the problem goes away. So strange.

Thursday, May 06, 2010

Problem with the proprietary Nvidia driver

The reason why my ubuntu 10.04 box cannot shutdown is the proprietary nvidia driver. I learned this the hard way: I backed up all my data and wiped out my existing installation and installed a fresh new copy, and experimented step by step. I was able to replicate the problem when I enabled the proprietary driver, and the problem when away when I disabled it.

Although I cannot enable the desktop effects, at least I have a stable and efficient computational environment now.

-------------------------------- May 8 ----------------------------------------------

This is not true. The problem with shutdown is caused by OpenOffice quick starter program, not the proprietary Nvidia driver. After enabling the driver, 1) boot-up time increased from 20s to 25s, 2) Emacs fonts increased several fold.

Wednesday, May 05, 2010

Monday, May 03, 2010

Compile aML with GCC 4.4.3 (64-bit)

I was able to successfully compile aML with GCC 4.4.3, which is part of Ubuntu 10.04. I only need to make changes on two files to avoid warning messages: "encrypt.f" (replacing "data str95/'Y,h=(ufSBF&H6W!LZ|boq1yJsX^Ax+K3U5r[mDlNOg7jn;-V:z]k/e *)v#2GQC$t''{c\\">%@~w?8dMREiT0a*9}<`\\._4PIp'/" with "data str95/'Y,h=(ufSBF&H6W!LZ|boq1yJsX^Ax+K3U5r[mDlNOg7jn;-V:z]k/e *)v#2GQC$t''{c\">%@~w?8dMREiT0a*9}<`\._4PIp'/") and "getdatainfo.f" (replacing "data backslash/'\\'/" with "data backslash/'\'/")

Sunday, May 02, 2010

Making Ubuntu boot in 19 seconds

Some serious Ubuntu optimization tips here.

Saturday, May 01, 2010

Friday, April 30, 2010

Ubuntu 10.04 cursor issue

I am having the same problems as these guys. The workaround seems to be to disable the desktop effect first, change the cursor, then enable desktop effect back. Hope to have a more convenient way way soon.

One good thing with the new release is that the annoying sound comes when the login screen shows up can be disabled. I tried many ways but still could not disable it with 9.10.

Upgrade to Ubuntu 10.04

Just upgraded my Ubuntu box from 9.10 to 10.04. The process went all right, the overall system performance seems satisfactory, but I did not like the change to move the window controls from upper right side to the upper left side.

And I am still using the Clearlook theme.

JAGS for Bayesian analysis

One of the many reasons I like Jackman's book "Bayesian Analysis for the Social Sciences" is that it relies on JAGS as the primary computational platform (as opposed to WinBUGS). From what I know, it is the first Bayesian book that does this.

Thursday, April 29, 2010

Wednesday, April 28, 2010

Sweave vs. pgfSweave

I think pgfSweave should replace the default of Sweave system because it brings significant improvements over the original Sweave. I certain have made it my default Sweave tool on my system.

------------------------------- EDITED ON MAY 1, 2010 --------------------------------------

Since StatET made Sweaving so simple and easy, I don't have incentive to work directly on LaTeX documents any more. It is actually much simpler and more intuitive to work directly on a Sweave documents and see results in the final PDF output.

Tuesday, April 27, 2010

StatET does not work with OpenJDK

After several failed attempts to get StatET to work on my older Ubuntu box, I realized that it does not work with OpenJDK. After replacing the default OpenJDK with the official Sun JDK, everything works fine.

Spell checking in Eclipse

This post explains how to improve spell checking in Eclipse.

The new JAGS is out

Version 2.0 of the cross-platform BUGS language implementation is out.

Word wrap in eclipse

For those who struggle to get reasonable word wrapping function in Eclipse, this is a good news.

Here is the original source.

Friday, April 23, 2010

Trouble with ESS and Sweave

Last time I tried to sweave a document from with Emacs+ESS, I was using an earlier version of ESS (the current version is 5.8), and things seemed to be fine. Today when I tried to sweave a simple document and produced PDF output, I got error message of ".pdf": exited abnormally with code 2". I posted a message to the ESS mailing list, but so far no solutions have been reported. 

Here is a very simple .Rnw file I used to reproduce the error: 




This document outlines the analysis to be carried out to look at the
relationship between prenatal famine exposure and fetal loss risk.

Before this can be fixed, I will rely on Eclipse for all my sweaving. 
---------------- April 26 ------------------------
Here is my own solution: 

from line 177 of ess-svy.el:

-      (shell-command (concat -       (if (and ess-microsoft-p (w32-shell-dos-semantics)) -           "start \"" pdfviewer "\" \"" namestem ".pdf\"" -                 "\"" pdfviewer "\" \"" namestem ".pdf\" &")))) +      (shell-command (concat
+                 "\"" pdfviewer "\" \"" namestem ".pdf\"")))

Rumour: Apple To Acquire ARM?

I believe that this, if not a rumor, is against most people's interest, manufactures and consumers alike.

Tuesday, April 20, 2010

Monday, April 12, 2010

Rampant cheating hurts China's research ambitions

This is interesting. Smells almost like the Great Leap Forward.

Friday, April 09, 2010

GLMM using DPpackage

I was able to fit a semi-parametric Bayesian GLMM model using DPpackage. It took me many hours to sample from the posterior distribution (DPM prior):

MCMC scan 1000 of 5000 (CPU time: 18950.080 s)
MCMC scan 2000 of 5000 (CPU time: 22510.100 s)
MCMC scan 3000 of 5000 (CPU time: 28293.830 s)
MCMC scan 4000 of 5000 (CPU time: 35111.930 s)
MCMC scan 5000 of 5000 (CPU time: 46726.330 s)

Which translates to 5.26, 6.25, 9.75, 12.98 hours. This makes it less suitable for routine (especially exploratory) data analysis.

I compared the results from DPpackage and that from MCMCglmm, and they are not that different, and the latter took only a small fraction of the time required by the former!

The lack of difference in results puzzled me. I compared from results from random effect logistic regression assuming Gaussian random effect and results from NPML, assuming a nonparametric distribution of the random effect, the differences are quite significant.

---------------------------- UPDATED ON APRIL 11 ----------------------------------------------------------------

Using DP prior instead of DPM prior, it took about 4.7 hours to run the model. The results are slightly different and the parameter I am interested in increased from .41 to .42. Now I am trying PT prior and see how it goes.

DPpackage is a exciting new tool for applied researchers, and A LOT OF new and cool things can be done with it. With convenient new Bayesian tools like MCMCpack, MCMCglmm, and DPpackage, I will not be surprised to see more Bayesian publications coming out in social sciences. 

Tuesday, April 06, 2010

Emacs color theme

Here is an emacs color theme gallery.

Five open source alternatives to the iPad

Here you can find some alternatives to Apple's ipad, running on Linux.

12 More of the Best Free Linux Books

Very good and free books on Linux can be found here.


The new version of the fabulous open source GIS software, GeoDa, called OpenGeoDa, has a Linux version as well as win32 and apple version. I have not tried it, but this is really a great news!

Sunday, April 04, 2010

Friday, April 02, 2010

BAT: Bayesian analysis toolkit

BAT is a Bayesian analysis software. Looks like it needs quite some programming to get anything useful done.

----------------------- added on April 4 --------------------------

I seem to have troubles compiling it on my 64-bit Ubuntu box.

Thursday, April 01, 2010

Apple got slapped

According to this post, Apple does not seem to be the first one to have the idea of "multitouch".

Wednesday, March 31, 2010

Econometrics Toolbox for Matlab

The Econometrics Toolbox for Matlab has just been updated. This is a quite comprehensive applied econometrics toolkit and I found it to be very useful.

Sunday, March 28, 2010

Difference in differences

The DID estimator is a useful tool to obtain a more fine-grained estimate of famine exposure on a variety of health and demographic outcomes, as long as the famine severity measure is good. However, in my case (fecundity), the provincial-level excess famine mortality level does not see to work well. I suspect that the provincial-level mortality measure is too crude and may not be able to adequately reflect individual's nutritional status.

Monday, March 22, 2010

Making Simple R Package

This post explains how to create a simple R package that contains R and FORTRAN source code.

Friday, March 19, 2010

The "apsrtable" is getting better

Now we have both "memisc" and "apsrtable" for producing publication-ready tables from model results.

Thursday, March 18, 2010

BUGS language: strength and weakness

Here is an informative introduction to the strength and weakness of the BUGS language in conducting Bayesian data analysis. Since my analysis often utilize large data (tens of thousands, even millions), BUGS may not be a viable option in many cases. I hope the soon-to-come new version of JAGS can overcome some of the weaknesses mentioned in Gelman's post.

Wednesday, March 17, 2010

Stata 11

Got my Stata 11 today, was a little disappointed because the new program editor feature is not available on Linux platform.

I like the new multiple imputation feature though.

Measuring the length of time to run a function

This post describes how to time the run time of a R function.

Sunday, March 14, 2010

Here comes Linux's iPad clones

Linux-based IPad clone will soon emerge and will dominate the market, based on this post.

Saturday, March 13, 2010


Here is an interesting news site (Chinese language).

Dell Mini Tablet

This Android-based tablet from Dell looks interesting.

Thursday, March 11, 2010

Wednesday, March 10, 2010

Saturday, March 06, 2010

OpenBUGS has a new home

Looks like OpenBUGS now has a new home! And now the command line tools can actually run on my Linux box.

Microsoft' digital journal

If it is true, then it is much more appealing to Apple's Ipad.

Friday, March 05, 2010

Statistics books with code

I like all statistics books with computer codes, like this one.

GLMM revisted

A short while ago, I reported some discrepancies between the results produced by "lme4" and other R packages as well as Stata. Today I upgraded to the most recent version of "lme4a" and re-ran my model. The error of false convergence disappeared and the new results are very much in line with other packages.

I am glad that the problem eventually got sorted out.

For this version, the adaptive GH quadrature has not been implemented yet; so the only option to estimate a GLMM is laplace transformation.

Tuesday, March 02, 2010


Glad to find out that AUCTeX has a new version out . Now the default PDF viewer has been changed from xpdf to evince.

Several screenshots

After some tweaking, I finally got Eclipse+StatET working the way I want. The first figure shows a running R session; the second figure shows the maximized running R session. The following shortcut keys are very helpful:

  1. Ctrl + A: select what you to run;
  2. Ctrl + R + R: run the selected program then go to the R console;
  3. Ctrl + M: maximize the R console;
  4. Ctril + T + 0: run Sweave of pgfSweave.
I like the way Eclipse organize the project and windows. 

Creating Beamer Handouts with Notes

This post provides some very helpful information. Unfortunately, the package is not designed to incorporate author's notes into the printed outline; instead, it is designed to print blank outline pages so the audience can jot down their own notes there.

Saturday, February 27, 2010

New version of Eclipse

A new version of Eclipse, v.3.5.2 is out. This one works better on my 64-bit Linux machine in general and works very well with StatET.

An interesting paper

Ben Bolker has an interesting paper (outline of a paper) comparing different approaches to estimate GLMM in R environment, which is very helpful to what I am doing right now.

The paper pointed out the following options to fit GLMM using R:
  1. glmer
  2. glmmML
  3. glmm (from the repeated package)
  4. glmmADMB
  5. MCMCglmm
  6. glmmBUGS
  7. glmmPQL
  8. BUGS (through R2WinBUGS)
  9. glmmAK
And I would like to add one more, npmlreg.

I am not aware of the glmmAK package before. From the first glance, it seems to be very promising in the sense that it seems to allow non-Gaussian random effect in a Bayesian framework, something similar to what npmlreg does with ML method.

============== edited on March 3 ====================

DPpackage is another package that can estimate GLMM in a Bayesian framework.

============== edited on March 5 ====================

ASReml/ASReml-R is another choice. It is not free software, but it does seem to have some unique strengths. Maybe I should download a demo copy and try it myself.

============== edited on March 29 ===================

hglm is another possibility. 

Beamer themes

I did not know there are so many different themes that can be used for a Beamer presentation slide.

Friday, February 26, 2010

National Family Health Survey, India

This data looks very interesting.

When using open source makes you an enemy of the state

According to this post, a US copyright lobby has long trying to make the case that using open source software is the same thing as using pirate software.

Wednesday, February 24, 2010

Monday, February 22, 2010

Theory without code

As a demographer, I would not pay much attention to statistical methodological papers that do not describe how to implement their methodology either using existing software or using their own code. The reason is simple: I don't have the time and patience to gain a understanding of the methodology deep enough to be able to implement it myself; after all, it is the substantive research questions in demography that interest me most!

Open source statistical platform such as R, WinBUGS, and JAGS have made it so easy for statisticians to code their ideas into readily usable software, so the question is: why not?

More option for cure model

Looks like the cure model can also be estimated using another R package GAMLSS. 

de Castro, M., V. G. Cancho, and J. Rodrigues. 2009. “A hands-on approach for fitting long-term survival models under the GAMLSS framework.” Computer Methods and Programs in Biomedicine.  

Saturday, February 20, 2010

A few software packages for mixed models

Less well-known but definitely deserve attention:
  1. DMU
  3. VCE
All these softwares are free to use, but unfortunately none of them seem to be open sourced.

Information provided by this web site is very helpful. 

GLMM for ecologists and evolutionary biologists

This web site is devoted to the use of GLMM in the filed of evolutionary biologists.

Friday, February 19, 2010

Thursday, February 18, 2010

Theoretical Population Biology

This journal strongly encourages the use of LaTeX, very interesting.


Gummi is a nother IDE for LaTeX, looks interesting.

Tuesday, February 16, 2010

Generalized linear mixed effect model problem

I am trying to compare cohort difference in infant mortality using generalized linear mixed model. I first estimated the model in Stata:

xi:xtlogit inftmort i.cohort, i(code)

which converged nicely:

Fitting comparison model:

Iteration 0:   log likelihood = -1754.4476
Iteration 1:   log likelihood = -1749.3366
Iteration 2:   log likelihood = -1749.2491
Iteration 3:   log likelihood = -1749.2491

Fitting full model:

tau =  0.0     log likelihood = -1749.2491
tau =  0.1     log likelihood = -1743.8418
tau =  0.2     log likelihood = -1739.0769
tau =  0.3     log likelihood = -1736.4914
tau =  0.4     log likelihood = -1739.5415

Iteration 0:   log likelihood = -1736.4914  
Iteration 1:   log likelihood = -1722.6629  
Iteration 2:   log likelihood = -1694.9114  
Iteration 3:   log likelihood = -1694.6509  
Iteration 4:   log likelihood =  -1694.649  
Iteration 5:   log likelihood =  -1694.649  

Random-effects logistic regression              Number of obs      =     21694
Group variable: code                            Number of groups   =     10789

Random effects u_i ~ Gaussian                   Obs per group: min =         1
                                                               avg =       2.0
                                                               max =         9

                                                Wald chi2(2)       =      8.05
Log likelihood  =  -1694.649                    Prob > chi2        =    0.0178

    inftmort |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
  _Icohort_2 |  -.5246846   .1850328    -2.84   0.005    -.8873422   -.1620269
  _Icohort_3 |  -.1424331    .140369    -1.01   0.310    -.4175513     .132685
       _cons |  -5.214642   .1839703   -28.35   0.000    -5.575217   -4.854067
    /lnsig2u |   .9232684   .1416214                      .6456956    1.200841
     sigma_u |   1.586665   .1123528                      1.381055    1.822885
         rho |   .4335015   .0347791                      .3669899    .5024984
Likelihood-ratio test of rho=0: chibar2(01) =   109.20 Prob >= chibar2 = 0.000

Then I tried the same model using lme4:

m2 <- glmer(inftmort ~ (1|code) + as.factor(cohort), family=binomial, data=d)

I got:

Warning message:
In mer_finalize(ans) : false convergence (8)

And the results are quite different from what I got from Stata. I tried to estimate the model using "glmmML" and also got into trouble. This time the error message is:

Warning message:
In, Y, weights, cluster.weights, start.coef, start.sigma,  :
  Hessian non-positive definite. No variance!

I then tried MCMCglmm:

prior=list(R=list(V=1, nu=0, fix=1), G=list(G1=list(V=1, nu=0)))
m2 <- MCMCglmm(inftmort ~ 1 + as.factor(cohort), random=~code, family = "categorical", data=d, prior=prior)

It seems to be working and produced estimates that are comparable to what Stata produced (not identical, of course):

Iterations = 3001:12991
Thinning interval = 10 
Number of chains = 1 
Sample size per chain = 1000 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

                      Mean     SD Naive SE Time-series SE
(Intercept)        -5.7145 0.1805 0.005708        0.02295
as.factor(cohort)2 -0.5633 0.1788 0.005653        0.02194
as.factor(cohort)3 -0.1888 0.1471 0.004653        0.01912

2. Quantiles for each variable:

                      2.5%     25%     50%      75%   97.5%
(Intercept)        -6.0464 -5.8324 -5.7251 -5.61120 -5.2817
as.factor(cohort)2 -0.8977 -0.6985 -0.5551 -0.43084 -0.2371
as.factor(cohort)3 -0.4614 -0.2947 -0.1867 -0.09644  0.1232 

This is puzzling.

Sunday, February 14, 2010

Spreadsheet to LaTeX solutions

Excel2latex and spreadsheet2latex seem to be the only two solutions at the time.

Saturday, February 13, 2010

Cure model using R

The package "nltm" seems to be able to estimate proportional hazard and proportional odds cure models. I will do some experiments and see how it goes.

Friday, February 12, 2010

Sweave and LaTeX

This page contains some useful tips and links.

Reshaping data in R

This post explains how to reshape data in R and how to created nicely formated LaTeX using the xtable package.

Thursday, February 11, 2010

Handling hierarchical data structure in R

R has a comprehensive set of tools for the handling of hierarchical data structure. The most widely used package is probably "nlme" and "lme4", contributed by Douglas Bates and colleagues. While "nlme" is older and probably more mature than "lme4", it seems that the latter is increasingly taking over the former because it comes with full set of tools to handle non-Gaussian distributions.

Based on "nlme" and "lme4", the package "mgcv" and "gamm4" can estimate generalized additive model on multilevel data structure. This is especially useful if one have continuous covariates such as age or calendar year, whose effects are not very well understood in the literature and grouping may distort the results.

If you don't feel comfortable with the assumption that the unobserved heterogeneity components are normally distributed, then the package "npmlreg" can be used to conduct sensitivity analysis through nonparametric maximum likelihood method.

The new "MCMCglmm" package extends R's modeling capacity to multilevel and multivariate models such as those that can be estimated using aML and Sabre. Sure, MCMCglmm relies on MCMC instead of maximum likelihood as the method of estimation, which may be slower especially when the data is huge (as is often the case in demographic studies), but as the author indicated, this package has already gained significant performance advantage over traditional Bayesian packages such as WinBUGS and JAGS, and is almost as general as the latter, with respect to multilevel regression-like models.

The Sabre team also produced a R wrapper of their package for multilevel multiprocess modeling. Unfortunately, their R package is based on an outdated version (v.5 instead of v.6) of Sabre and the project seems to be halted due to lack of fund.

It is not difficult to conduct Bayesian multilevel analysis in R using WinBUGS and JAGS via packages such as "R2WinBUGS" and "R2jags". These packages can estimate a very large number of models. This means that almost anything one can think of can be model with R.

Another package, ADMB, also has several interfaces to R. I hope they are getting better and better because ADMB is such a powerful and versatile package that, if one can write down the likelihood function, it can maximize it for you.

App Store Model Provides Security, Stability: Evidence, Please?

Here is an interesting post on the claim that Iphone has to be locked down to make it safe and stable.

Wednesday, February 10, 2010

Loglinear models using R

For those sociologists who want to estimate complicated loglinear models (e.g. Goodman's RC model) using R, the package "VGAM" seems to be a good choice.

GTalX - Google Voice Chat has arrived in Ubuntu 9.10 (Karmic)

This is a good news.

Tuesday, February 09, 2010

Mixed Effects Models and Extensions in Ecology with R

This is a truly wonderful book. It deals with advanced statistical models in the way that people without advanced mathematical background can follow easily. I think I may have found a perfect textbook for my graduate course in advanced demographic analysis. All I need to do is to introduce some demographic examples to make it easier to understand to demography students.

Sunday, February 07, 2010

Statistics with R

An online book for R. Very helpful.

Skype on Ipad via 3G

Well, this could be a good use of ipad, according to this post, and this post, consider its low $30 data plan.

Saturday, February 06, 2010

Friday, February 05, 2010

Graphics using Stata vs. R

Here is an informative blog post comparing simple histogram produced using Stata and various methods using R.

Saturday, January 30, 2010


MORGAN is a software for genetic analysis using MCMC.

An example of MCMCglmm

Here is a very useful example of how to estimate multinomial logit with random effect using MCMCglmm package.

Friday, January 29, 2010

Kindle DX as a textbook reader

Here is an interesting argument for Kindle DX.

Useful blog

on using R, especially the MCMCglmm Package.

Thursday, January 28, 2010


IRex seems to be a good alternative to Nook and Kindle. And it is running Linux! Also, its 8.1 inch screen will certainly make reading much more comfortable than the 6 inch Nook or Kindle.

IPad = IBad?

Interesting comments.

Wednesday, January 27, 2010


Apple's new toy, ipad, looks interesting. But 10 hour's battery life per charge seems to be less impressive than I originally thought. Some laptop computers can do that. Also, I was surprised to find out its operating system is IPhone OS instead of OSX.

Being an Apple product also means that, in order to do anything useful, you have to buy and buy and buy...

Monday, January 25, 2010

Submitting to a sociology journal

This blog has some useful information for those who are thinking about submitting to a sociology journal.

Monday, January 18, 2010

An online summary of different mortality estimates between 1959 and 1961

This online article provides a good summary.

Here is another one.

Here are some discussions.

Wednesday, January 13, 2010

VMware shortcut keys

Here is a list of useful shortcut keys for VMware.

Google may leave China

According to a CNN report, Google may pull their business out of China.

Word count for PDF (LaTeX) file

According to this site, This is as simple as one command line:
pdftotext yourfilename.pdf - | wc -w

Even more options can be found here.

Thursday, January 07, 2010


PyMC is a Python implementation of MCMC sampling algorithm, looks very promising. Not it is in its second major release, with a full-blown manual. have the potential for a serious Bayesian package.

Wednesday, January 06, 2010

Event History Analysis with Stata

Book web site here.


The January 2010 source release of the Apophenia library cannot be compiled on my Ubuntu machines (64-bit 9.10 and 32-bit 9.04); but the developmental version (downloaded via git) compiles flawlessly. Not sure what is going on.

R AnalyticFlow

R AnalyticFlow is an interesting piece of software. From its web site:

"R AnalyticFlow is a software which enables
state-of-the-art data analysis by drawing analysis flowcharts.
You can effectively share processes of data analysis
in collaborative works"

Monday, January 04, 2010

A customized Ubuntu-based Chinese distro

Linux Deepin looks promising. But it does not seem to provide a 64-bit version.

Sunday, January 03, 2010

Some Stata tips

Here are some useful Stata tips.


LyZ, Zotero for LyX, looks promising.

Friday, January 01, 2010

ADMB 9.1

A new version of ADMB has just come out, on the first day of the new year!

Create LaTeX table for descriptive statistics using Estout

I want to create a table of descriptive statistics. To be more specific, The table should contains 6 rows and 4 columns, something like this:
Descriptive Statistics
Cohort A Cohort B Cohort C
Urban Residence 0.263 0.313 0.264
Years of Schooling 5.585 7.280 7.592
Ethnic Majority 0.928 0.918 0.915
Want Children 0.990 0.992 0.991
Ever Pregnant 0.991 0.987 0.992
Observations 3553 2450 4694
The official document for Estout provides some clues but does not make it crystal clear. Here are the steps I followed:
  1. Generate an arbitrary variable;
  2. Run regression, separately for each cohort;
  3. Collect the results;
  4. Make the table
Here are the code:

eststo clear
gen y=uniform()
quietly regress y urban eduy han want_birth preg if cohort==1, noconstant
estadd mean
eststo d1

quietly regress y urban eduy han want_birth preg if cohort==2, noconstant
estadd mean
eststo d2

quietly regress y urban eduy han want_birth preg if cohort==3, noconstant
estadd mean
eststo d3

esttab d1 d2 d3, cells(mean(fmt(%8.3f))) label nonotes nonumber nodepvars mtitles("Cohort A" "Cohort B" "Cohort C") width(1\hsize) title(Descriptive Statistics)