Friday, December 21, 2012


Cinnarch is a new Linux distro. It combines the minimalist design of Arch with the Cinnamon desktop environment.

Friday, December 14, 2012

Emacs support for knitr

Here is a simple elisp script.

Wednesday, December 12, 2012

Six cores with HT vs. eight physical cores

This blog is very helpful in clarifying things.

Friday, December 07, 2012

GCC for Windows

TDM-GCC is a good Windows port of the GCC tool set. It has been integrated into the Windows version of Code::Blocks.

Tuesday, December 04, 2012

Knitr problem

I ran into a weird problem with my R workflow recently. For example, the very simple code chunk below:


qplot(displ, hwy, data = mpg, colour = factor(cyl))


would not build correctly. Sometimes it gave an error message, sometimes it created an empty page, etc. Installing the latest version of knitr from the github repo seemed to solve the problem.

Friday, November 30, 2012

Using Rcpp with Rstudio

It is quite amazing to see how easy and straightforward it has become to integrate C++ and R, with help from Rcpp (especially the new "attributes" feature) and Rstudio. More information can be found here.

Thursday, November 29, 2012


TikzEdt is an editor and previewer for the TikZ LaTeX graphics package.

Generic computers and open source software

The combination of generic hardware and open source software makes sense for the reasons listed here.

Rcpp tutorial

Here is a good introduction to Rcpp.

Wednesday, November 28, 2012

Use github

I am getting started with github. I found this post helpful.

Knitr, Rstudio, and TikZ

I did not realize that this is such a powerful combination until today. It makes working directly on a Sweave file that has both text and code in the same document quite attractive (as opposed to starting with an R program and a LaTeX file, then combining them at the end).

Do not use a GTK 3 theme with the Mate desktop

I am not sure why, but using a GTK 3 theme with the Mate desktop significantly slows it down.

Tuesday, November 20, 2012

Extra themes for ggplot2

Ggthemes is a new package that provides additional themes for ggplot2. It can even be used to draw a Stata-like graph. I particularly like the Tufte-style, as outlined in The Visual Display of Quantitative Information. 

Monday, November 19, 2012

Mint Debian

I finally got tired of the old Ubuntu 10.04, which had been sitting on my main computer for over two years. Instead of a more recent version of Ubuntu or Mint, I replaced it with Mint Debian. I then followed the instructions in this post to install the proprietary NVIDIA driver to get the full power of my graphics card. I really like the simple and elegant design of the Mate desktop and hope I never need to reinstall the system on this machine again (because Mint Debian is a rolling distribution).

Wednesday, November 14, 2012

Zelig 4 is trouble

Zelig 4 is causing so many problems. I had to uninstall it and go back to the old 3.55.

Sunday, November 11, 2012


Excel2latex is a handy tool to get Word tables into LaTeX.

Sunday, November 04, 2012

R Markdown

The R Markdown that is integrated into Rstudio looks quite promising.

Tuesday, October 23, 2012

The BUGS Book

THE BUGS book. No need to say more.


I played with Lubuntu for a while and really liked its minimalist and yet elegant design. I think it should be my main desktop OS.

Thursday, October 18, 2012

Comparison between OpenBUGS, JAGS, and Stan

I have been playing with the examples from Introduction to WinBUGS for Ecologists for the past two days. One thing that particularly concerns me is how well these Bayesian packages handle large data sets (i.e., N>10,000), which is the size of the data sets that I work with. So I modified the data generation program provided by the authors and increased the sample sizes:

n1 <- 6000 # Number of females
n2 <- 4000 # Number of males
mu1 <- 105 # Population mean of females
mu2 <- 77.5 # Population mean of males
sigma <- 2.75 # Average population SD of both

n <- n1+n2 # Total sample size
y1 <- rnorm(n1, mu1, sigma) # Data for females
y2 <- rnorm(n2, mu2, sigma) # Data for males
y <- c(y1, y2) # Aggregate both data sets
x <- rep(c(0,1), c(n1, n2)) # Indicator for male

The BUGS program looks like this:

model {
# Priors
 mu1 ~ dnorm(0,0.001)
 mu2 ~ dnorm(0,0.001)
 tau1 <- 1 / ( sigma1 * sigma1)
 sigma1 ~ dunif(0, 1000) # Note: Large var. = Small precision
 tau2 <- 1 / ( sigma2 * sigma2)
 sigma2 ~ dunif(0, 1000)

# Likelihood
 for (i in 1:n1) {
    y1[i] ~ dnorm(mu1, tau1)
 }

 for (i in 1:n2) {
    y2[i] ~ dnorm(mu2, tau2)
 }

# Derived quantities
 delta <- mu2 - mu1
}

while the Stan program looks like this:

data {
     int n1;
     int n2;
     real y1[n1];
     real y2[n2];
}
parameters {
           real alpha1;
           real alpha2;
           real sigma1;
           real sigma2;
}
model {
    y1 ~ normal(alpha1, sigma1);
    y2 ~ normal(alpha2, sigma2);
}
generated quantities {
          real dif;
          dif <- sigma1 - sigma2;
}

For 20,000 iterations, OpenBUGS took 7852.813 seconds, JAGS took 2252.273 seconds, and Stan took 60-150 seconds (the Stan team is still trying to iron out some bugs, which may explain the wide range of run times). While the run times of both OpenBUGS and JAGS are sensitive to the sample size, the run time of Stan seems much less so.

However, if the Stan code is written like this:

model {
      for (n in 1:n1)
          y1[n] ~ normal(alpha1, sigma1);
      for (n in 1:n2)
          y2[n] ~ normal(alpha2, sigma2);
}

Then Stan has no apparent performance advantage over JAGS. 
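
The two ways of writing the likelihood can be pictured with a rough base R analogy. This is only a sketch of the general idea (one call over a whole vector versus one call per observation), not a description of how Stan works internally; the function names loglik_vec and loglik_loop are made up for illustration, and the data are simulated with the same mean and SD as the females above.

```r
# Base R analogy only; this is NOT Stan's implementation.
set.seed(1)
y <- rnorm(10000, mean = 105, sd = 2.75)

# Vectorized: a single call evaluates the log density of every observation.
loglik_vec <- function(y, mu, sigma) {
  sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
}

# Looped: one call per observation, mirroring the element-wise model block.
loglik_loop <- function(y, mu, sigma) {
  total <- 0
  for (i in seq_along(y)) {
    total <- total + dnorm(y[i], mean = mu, sd = sigma, log = TRUE)
  }
  total
}

# Both forms return the same log-likelihood, up to floating-point error.
ll_vec <- loglik_vec(y, 105, 2.75)
ll_loop <- loglik_loop(y, 105, 2.75)
isTRUE(all.equal(ll_vec, ll_loop))
```

The two versions describe the same model, so any difference between them is purely one of run time.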

Processing Stan samples with ggmcmc

Very helpful post.

Tuesday, October 16, 2012

Monday, October 15, 2012

Stan code for my favorite book

I am learning Stan. So I decided to translate the BUGS code for my favorite applied Bayesian book into  Stan code. I just did the translation for a couple of chapters and I feel I know Stan much better already.

Sunday, October 14, 2012

Insightful analysis of the recent quarrel between China and Japan

I've seen lots of economic, political, and social analysis about it, but this cultural analysis is very interesting and insightful.

Saturday, October 13, 2012

From BUGS with BRugs to JAGS with rjags

John Kruschke provided helpful suggestions about how to convert BUGS programs into JAGS. It would be interesting to see some suggestions about how to convert BUGS or JAGS programs into Stan.

Friday, October 12, 2012

Stan and INLA

Since both Stan and INLA provide their own implementation of the classical BUGS examples, it is very illustrative and educational to compare the two sets of code.

For example, the INLA code for the "seeds" example (one of my favorites) is like this:


formula = r ~ x1*x2+f(plate,model="iid")
mod.seeds = inla(formula,data=Seeds,family="binomial",Ntrials=n)

## improved estimation of the hyperparameters
h.seeds = inla.hyperpar(mod.seeds)

while the Stan code is like this:

data {
    int I;
    int n[I];
    int N[I];
    real x1[I];
    real x2[I];
}
parameters {
    real alpha0;
    real alpha1;
    real alpha12;
    real alpha2;
    real tau;
    real b[I];
}
transformed parameters {
    real sigma;
    sigma  <- 1.0 / sqrt(tau);
}
model {
   alpha0 ~ normal(0.0,1.0E3);
   alpha1 ~ normal(0.0,1.0E3);
   alpha2 ~ normal(0.0,1.0E3);
   alpha12 ~ normal(0.0,1.0E3);
   tau ~ gamma(1.0E-3,1.0E-3);

   b ~ normal(0.0, sigma);

   for (i in 1:I) {
      n[i] ~ binomial(N[i], inv_logit(alpha0
                                      + alpha1 * x1[i]
                                      + alpha2 * x2[i]
                                      + alpha12 * x1[i] * x2[i]
                                      + b[i]) );
   }
}

INLA has a more R-like syntax and is much more terse, but Stan is much more flexible and can handle very complicated models that INLA may have trouble with.
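
One detail worth keeping in mind when reading the Stan code next to BUGS code: BUGS parameterizes dnorm by precision, while the Stan program above passes a standard deviation to normal() and derives it as sigma <- 1.0 / sqrt(tau). A small sketch of that conversion in R follows; the helper names prec_to_sd and sd_to_prec are hypothetical and belong to neither package.

```r
# Hypothetical helpers for converting between precision (BUGS-style)
# and standard deviation (as used by normal() in the Stan code above).
prec_to_sd <- function(tau) 1 / sqrt(tau)
sd_to_prec <- function(sigma) 1 / sigma^2

# A precision of 1.0E-6 corresponds to a standard deviation of about 1000,
# which matches the normal(0.0, 1.0E3) priors in the Stan program.
sd_from_prec <- prec_to_sd(1e-6)
sd_from_prec

# The conversion is its own inverse.
back <- sd_to_prec(prec_to_sd(0.001))
back
```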

Thursday, October 11, 2012

Some thoughts on Stan

Stan is very promising. The glmmBUGS package could easily be extended to produce Stan code in place of or in addition to BUGS/JAGS code, which would make it even easier for novice users to get started.

Tuesday, October 09, 2012

Prediction, missing data, etc. in Stan


N <- 1001
N_miss <- ceiling(N / 10)
N_obs <- N - N_miss

mu <- 3
sigma <- 2

y_obs <- rnorm(N_obs, mu, sigma)

library(rstan) # provides stan() and extract()

# The Stan program is passed to stan() as a character string
missing_data_code <- '
data {
  int N_obs;
  int N_miss;
  real y_obs[N_obs];
}
parameters {
  real mu;
  real sigma;
  real y_miss[N_miss];
}
model {
  // add prior on mu and sigma here if you want
  y_obs ~ normal(mu,sigma);
  y_miss ~ normal(mu,sigma);
}
generated quantities {
  real y_diff;
  y_diff <- y_miss[101] - y_miss[1];
}
'

results <- stan(model_code = missing_data_code,
                data = list(N_obs = N_obs, N_miss = N_miss, y_obs = y_obs))

y_diff <- apply(extract(results, c("y_miss[1]", "y_miss[101]")), 1:2, diff)

Sunday, September 30, 2012

Friday, September 28, 2012

Predictions using INLA

Here are some detailed explanations.

Thursday, September 13, 2012

R2MLwiN package

The new R2MLwiN package bridges R and the MLwiN software for multilevel analysis. From the examples provided, it looks very promising. It would be great if a command line version of MLwiN could be made available under Linux so the package can be useful to everybody. Also, it would be great if a similar package for Mplus could be developed in the future.

There is also a Stata package that connects MLwiN with Stata. A quick glance suggests that the two packages have similar functions.

Friday, September 07, 2012


LazStats, a statistical package written in Lazarus/FreePascal.

Friday, August 31, 2012


Stan is a new software package for Bayesian analysis. It also comes with an R interface. According to this benchmark, it may be a viable choice for real world data analysis. It takes a similar approach to ADMB by first converting a syntax file into C++ source files and then generating native machine code.

Monday, August 13, 2012

Chinese support for Kindle DX

This system works well. Now I can read Chinese texts on my Kindle DX.

Sunday, August 12, 2012


Onyx, a free GUI for OpenMX with built-in LaTeX support, looks pretty cool. 

Monday, August 06, 2012

12 programming languages for the future

I have never heard of some of them before.

Bayesian inference: Pros and Cons

Here is a succinct summary.

CDE becomes free software

According to this source, CDE (common desktop environment) is now free software. Not sure at this moment whether that will make any real difference to our daily computing though.

Thursday, June 28, 2012

Chinese GIS resources

Here is a list of useful Chinese GIS resources:

  1. CHGIS (download and CD)
  2. China Dimensions
  3. CloudMade (download)
  4. GRASS-Wiki

Monday, June 18, 2012

A new skype for Linux

Skype 4.0 for Linux is out.

Friday, June 15, 2012

Scythe library

After 5 years, the Scythe library has a new version out.

Monday, June 11, 2012

Installed R packages

I did not realize I have such a long list of R packages installed (ls /usr/local/lib/R/library >> list.txt):

abind e1071 labeling optimx sabreR
ade4 Ecdat LaF optmatch sandwich
ade4TkGUI effects LaplacesDemon orthopolynom scales
AICcmodavg ellipse lars parallel segmented
akima emdbook lattice parser sem
Amelia Epi lava pcaPP sfsmisc
anchors etable lavaan permute shapefiles
aod etm leaps pgfSweave simecol
ape evaluate lessR pixmap SiZer
apsrtable expm lme4 pkgmaker sna
arm ff lmtest plm snow
base fields locfit plyr snowfall
bayesDem filehash lpSolve polynom sos
bayesLife flexmix magic popbio sp
bayesm FME mapdata popdemo spacetime
bayesPop foreach mapproj primer spam
bayesTFR forecast maps prodlim SparseM
BayesX foreign maptools proto spatial
BayesXsrc formatR markdown pscl SpatialEpi
bbmle Formula MASS psych spatstat
bdsmatrix forward Matching qgraph spBayes
bibtex fracdiff MatchIt quadprog spdep
biganalytics gam Matrix quantreg spgwr
biglm gamlss matrixcalc R2BayesX sphet
bigmemory MatrixModels R2HTML splancs
bigtabulate gamlss.dist maxLik R2jags splines
BiocGenerics gclus MBA R2WinBUGS splm
BiocInstaller gdata MBESS RandomFields stashR
bit gee mclust randomForest statmod
bitops geepack MCMCglmm RANN stats
boot geoR MCMCpack raster stats4
brew ggplot2 McSpatial RColorBrewer stringr
cacheSweave gof mediation Rcpp SuppDists
cairoDevice gpclib memisc RcppArmadillo survey
car gplots memoise rcppbugs survival
caTools graph MEMSS RcppDE svUnit
cem graphics methods RcppEigen systemfit
chron grDevices mets RCurl tables
class grid mgcv registry tabplot
classInt gstat minpack.lm reldist tcltk
cluster gsubfn minqa reshape tensorA
coda gtools miscTools reshape2 testthat
codetools gWidgets mitools rgam tikzDevice
colorspace gWidgetsRGtk2 mixAK rgdal timereg
combinat hexbin mixtools rgenoud tools
compiler highlight mlbench rgeos triangle
corpcor Hmisc mlmRev rgl truncnorm
cumSeg ibdreg mlogit RGtk2 tseries
datasets igraph mnormt rj TTR
data.table igraph0 modeltools rjags ucminf
DBI inline MPV rJava utils
DCluster int64 mstate vcd
deldir iplots multcomp Rmixmod vegan
DEoptim IPSUR MuMIn rms VGAM
depmixS4 ipw munsell robustbase visreg
deSolve iterators mvtnorm rockchalk waveslim
devtools JavaGD ncdf RODBC WhatIf
dichromat JGR nlme rootSolve XLConnect
digest KernSmooth nnet rpart XLConnectJars
diptest knitcitations npmlreg rrcov XML
doBy knitr numDeriv RSiteSearch xtable
Rsolnp xts
RSQLite Zelig
RUnit zic
Rz zoo
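
A similar list can also be produced from inside R rather than from the shell. The sketch below uses base R's installed.packages(), which looks at every library path known to the session (so it may list more than the single directory above); the output file location is a placeholder chosen for illustration.

```r
# Names of all installed packages, across every library path in .libPaths().
pkgs <- sort(unique(rownames(installed.packages())))

length(pkgs)  # how many packages are installed
head(pkgs)    # first few names

# Write one package name per line; tempdir() is only a placeholder location.
out_file <- file.path(tempdir(), "list.txt")
writeLines(pkgs, out_file)
```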

Saturday, June 09, 2012

Natural selection is still with us

This looks very interesting. I cannot wait to get hold of the paper.


Looks very nice, but I have not figured out what it can do to help with my current computing tasks.

Saturday, June 02, 2012

Drawing a map

This post explains how to draw a China map using ggplot2 and online data sources. The blog is generally interesting.

This one is based on OpenStreetMap. Very helpful.

Thursday, May 31, 2012

Mint Debian

After about eight months, my Mint Debian has been broken beyond repair. So I decided to wipe it out and do a fresh installation. This time I am not pointing directly to the Debian repository but to the Mint repository.

I like Mint Debian better than the Mint main edition. I have come to the conclusion that as long as I am not directly pointing to the Debian repository (or at least not taking every single update every day), the installation should be stable enough to serve as my main workstation OS.

Tuesday, May 29, 2012

Free statistics text provides a free statistical textbook, along with data sets and supplementary materials. is another one.

Monday, May 28, 2012

Simple rcpp examples

Here are some nice examples for the Rcpp package.

Saturday, May 26, 2012

Compiling Emacs, again

This post shows how to get Emacs compiled on the Debian/Ubuntu platform.

Sunday, May 20, 2012


With the new "R2BayesX" package, it now looks like the BayesX software has been fully integrated into the R environment.

R2BayesX provides a powerful general-purpose package with simple and easy-to-learn syntax (compared to BUGS or ADMB). Now that BayesX is under the GPL, it will probably be embraced wholeheartedly by the free software community.

Saturday, May 19, 2012

Nine reasons for C++ 11

Can be found in this post.

Java vs. COBOL and R vs. SAS

Here is an interesting comparison.

Friday, May 18, 2012

Multinomial logit model with random effects

I just found out that the "mlogit" package can estimate multinomial logit models with random effects, just like aML and GLLAMM.

Sex selection in historical China

Here is a short article (in Chinese).

Thursday, May 17, 2012

R, Rstudio, and Markdown

Here is a helpful tutorial.

Monday, May 14, 2012

New Rstudio is pretty cool

The new version of Rstudio has an option to use "knitr" instead of "sweave", which is really cool!

Saturday, April 28, 2012

Disk partition guide for Mint Debian

This post explains how to partition a hard disk for Mint Debian.

Tuesday, April 24, 2012

New version of g++, a good reason to upgrade

My Ubuntu 10.04 runs fine on my workstation except for one thing: its g++ 4.4.3 does not support the new C++ language features used in software like CppBugs and its R binding, rcppbugs. The idea behind these two packages is very interesting, which gives me a good reason to upgrade to a new version of the OS (Debian, Mint, or Ubuntu).

Thursday, April 19, 2012

Is Markdown/Pandoc any good?

Here is a post about LaTeX vs. Markdown. Looks like it is not easy to translate between the two systems automatically.

Tuesday, April 17, 2012

Beta regression examples

Here are some examples of beta regression models.

Monday, April 16, 2012

Cinnamon really works

Mint Linux's Cinnamon works rather well on my old laptop. It takes a bit longer to boot up to the desktop than Xfce, but once you get there, the user experience is rather good. In addition, it is very easy to use themes contributed by others (just put them into the ~/.themes folder).

Saturday, April 14, 2012

Lcmm package example

Some latent class examples using the lcmm package.

Wednesday, April 11, 2012

JAGS resources

A very good list of resources for using JAGS for Bayesian computing.

Tuesday, April 10, 2012

New output processing package

The "rockchalk" package looks promising. I hope it can be as general as the "esttab" package for Stata.

A practical Bayesian book

Introduction to WinBUGS for Ecologists is a good practical Bayesian book. I find the sample code very useful. The simple example of forming predictions (p. 116) is straightforward and can be easily tailored to be used in different research situations.

Doing Bayesian Data Analysis: A Tutorial with R and BUGS is another useful practical Bayesian book. My copy just arrived today. Can't wait to play with some of the examples there.

Monday, April 09, 2012

Friday, March 30, 2012

Sex ratio and malnutrition

One of my recent papers received some media attention, including from Nature and Science.

Here is a list of additional news coverage.

Sunday, February 26, 2012

R resources

A nice collection of R resources.

Monday, February 20, 2012


Knowing the weakness of your tool is a shortcut toward getting really good at it.

Sunday, February 19, 2012

Equations in LaTeX

Here are some useful tips on how to do equations right in LaTeX.

Saturday, February 18, 2012

Rstudio font

The development version of Rstudio finally has an option to use customized fonts. This is great.

Do you see the leash on your neck?

According to this post, both Apple and MS are tightening the leash on general desktop computer users by restricting their freedom of installing and running the software they like, with their shiny new OSs.

Fortunately we still have Linux and BSD.

Friday, February 17, 2012

Time series analysis using R

This is a tutorial on time series analysis using R.

Monday, February 13, 2012

Saturday, February 11, 2012

The "tables" package

The "tables" package makes Sweave a real option for exploratory data analysis and report generation.

Wednesday, January 25, 2012


Cinnamon is a promising project aiming to deliver a more usable Linux desktop environment.

Monday, January 23, 2012

Clang vs. gcc

This article about clang vs. gcc is helpful.

Sunday, January 22, 2012

Some Rcpp benchmarks

I ran the Fibonacci number example from the Rcpp package on a number of computers and operating systems. Here are the results:

A. On my main computer (Core 2 Extreme 3.06GHz, 8 GB memory) running Ubuntu 10.04 (g++ 4.4.3):
        test replications elapsed relative user.self sys.self
3 fibRcpp(N)            1   0.148   1.0000      0.14     0.01
1    fibR(N)            1  87.078 588.3649     87.03     0.04
2   fibRC(N)            1  91.209 616.2770     91.14     0.07

B. Same computer running Windows Vista (g++ 4.5.0):

        test replications elapsed relative user.self sys.self
3 fibRcpp(N)            1    0.21   1.0000      0.21     0.00
1    fibR(N)            1   92.08 438.4762     90.47     0.05
2   fibRC(N)            1   94.39 449.4762     93.13     0.03

C. On my second laptop (Core 2 Duo 2.53GHz, 4 GB memory) running Windows 7 (g++ 4.5.0):

        test replications elapsed relative user.self sys.self
3 fibRcpp(N)            1    0.17   1.0000      0.17     0.00
1    fibR(N)            1   73.62 433.0588     73.47     0.00
2   fibRC(N)            1   74.27 436.8824     74.20     0.03

D. On the same computer running Revolution R Enterprise 5:
      test replications elapsed relative user.self sys.self
2 fibRC(N)            1   72.31 1.000000     72.09        0
1  fibR(N)            1   72.99 1.009404     72.79        0 

E. On my third laptop (Core 2 Duo 2.50GHz, 2 GB memory) running Mint Debian (g++ 4.6.2):
        test replications elapsed relative user.self sys.self
3 fibRcpp(N)            1   0.148   1.0000     0.148     0.00
1    fibR(N)            1  65.535 442.8041    65.328    0.200
2   fibRC(N)            1  65.664 443.6757    65.492    0.172

Why did the faster computer perform worse on both the R and Rcpp versions?
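
For anyone who wants to repeat the pure R part of this comparison, here is a minimal sketch. It assumes that fibR is the usual naive recursive definition and that fibRC is its byte-compiled counterpart; both are assumptions, since the definitions are not reproduced in this post. The value of N used for the tables is not stated either, so the N below is a small placeholder that runs quickly.

```r
library(compiler)  # ships with base R; provides cmpfun()

# Naive recursive Fibonacci (assumed to match fibR in the tables above).
fibR <- function(n) {
  if (n < 2) return(n)
  fibR(n - 1) + fibR(n - 2)
}

# Byte-compiled version (assumed to correspond to fibRC in the tables above).
fibRC <- cmpfun(fibR)

N <- 20  # placeholder value, not the N used for the benchmark
time_r  <- system.time(res_r  <- fibR(N))
time_rc <- system.time(res_rc <- fibRC(N))

res_r   # 6765
time_r
time_rc
```

Timings from this sketch will depend on the R version as well as the hardware, so they are not directly comparable with the tables above.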

Rcpp on Windows

I got Rcpp working on my Windows machine by installing the Rtools bundle. It is not clear to me how to get GSL installed so that RcppGSL will also work.

Friday, January 20, 2012

Richard Stallman was right all along

One more person agreed with Richard Stallman.

Wednesday, January 18, 2012

Package "rgdal" broke

The new version of the "rgdal" package cannot be compiled on my system (both Ubuntu and Debian).

UPDATE: they fixed it by releasing a new version (0.7.8).


The NetLogo developers really want to get things right: today they released the 7th release candidate for the new version (v. 5)!

Tuesday, January 17, 2012

R is becoming increasingly popular

According to this, R is the 19th most popular language in the first month of 2012!

Sunday, January 15, 2012

Thursday, January 12, 2012

Visual debugger and the debug mode of the autorun R console

The StatET team kept their promise and delivered the autorun R console with debug mode on. This, combined with the visual debugger, makes StatET a very appealing cross-platform environment for working with R.

Sunday, January 08, 2012

Useful python libraries for social scientists

Here is a list of useful python libraries for social scientists.