Shige's Research Blog: 2012

Friday, December 21, 2012

Cinnarch

Cinnarch is a new Linux distro. It combines the minimalist design of Arch with the Cinnamon desktop environment.

Friday, December 14, 2012

Wednesday, December 12, 2012

Six cores with HT vs. eight physical cores

This blog is very helpful in clarifying things.

Friday, December 07, 2012

GCC for windows

TDM-GCC is a good Windows port of the GCC toolset. It has been integrated into the Windows version of Code::Blocks.

Tuesday, December 04, 2012

I recently ran into a weird problem with my R workflow. For example, the very simple document below:
--------------------------------------------------
\documentclass[preview]{standalone}

\begin{document}

\begin{figure}
<<dev='tikz'>>=
library(ggplot2)
qplot(displ, hwy, data = mpg, colour = factor(cyl))
@
\end{figure}

\end{document}
--------------------------------------------------
would not build correctly. Sometimes it gave an error message, sometimes it produced an empty page, and so on. Installing the latest version of knitr from the GitHub repo seemed to solve the problem.

Friday, November 30, 2012

Using Rcpp with Rstudio

It is quite amazing to see how easy and straightforward it has become to integrate C++ and R, with help from Rcpp (especially its new "attributes" feature) and Rstudio. More information can be found here.

Thursday, November 29, 2012

TikzEdt

TikzEdt is an editor and previewer for the TikZ LaTeX graphics package.

Generic computers and open source software

The combination of generic hardware and open source software makes sense for the reasons listed here.

Rcpp tutorial

Here is a good introduction to Rcpp.

Wednesday, November 28, 2012

Use github

I am getting started with github. I found this post helpful.

Knitr, Rstudio, and TikZ

I did not realize that this is such a powerful combination until today. It makes working directly on a single Sweave file that holds both text and code quite attractive (as opposed to starting with an R program and a LaTeX file and combining them at the end).

Do not use GTK 3 theme with Mate desktop

I am not sure why, but using a GTK 3 theme with the Mate desktop slows it down significantly.

Tuesday, November 20, 2012

Extra themes for ggplot2

Ggthemes is a new package that provides additional themes for ggplot2. It can even be used to draw a Stata-like graph. I particularly like the Tufte-style, as outlined in The Visual Display of Quantitative Information.

Monday, November 19, 2012

Mint Debian

I finally got tired of the old Ubuntu 10.04, which had been sitting on my main computer for over two years. Instead of a more recent version of Ubuntu or Mint, I replaced it with Mint Debian. I then followed the instructions in this post to install the proprietary NVIDIA driver and get the full power of my graphics card. I really like the simple and elegant design of the Mate desktop, and I hope I never need to reinstall the system on this machine again (because Mint Debian is a rolling distribution).

Wednesday, November 14, 2012

Zelig 4 is trouble

Zelig 4 is causing so many problems. I had to uninstall it and go back to the old 3.55.

Sunday, November 11, 2012

Excel2latex

Excel2latex is a handy tool to get Word tables into LaTeX.

Tuesday, November 06, 2012

Gnome is like the protagonist of a romantic comedy ...

Insightful analysis!

Sunday, November 04, 2012

R Markdown

The R Markdown that is integrated into Rstudio looks quite promising.

Tuesday, October 23, 2012

The BUGS Book

THE BUGS book. No need to say more.

Lubuntu

I played with Lubuntu for a while and really liked its minimalist and yet elegant design. I think it should be my main desktop OS.

Thursday, October 18, 2012

Comparison between OpenBUGS, JAGS, and Stan

I have been playing with the examples from Introduction to WinBUGS for Ecologists for the past two days. One thing that particularly concerns me is how well these Bayesian packages handle large data sets (i.e., N > 10,000), which is the size of the data sets I work with. So I modified the data generation program provided by the authors and increased the sample sizes:

n1 <- 6000 # Number of females
n2 <- 4000 # Number of males
mu1 <- 105 # Population mean of females
mu2 <- 77.5 # Population mean of males
sigma <- 2.75 # Average population SD of both

n <- n1+n2 # Total sample size
y1 <- rnorm(n1, mu1, sigma) # Data for females
y2 <- rnorm(n2, mu2, sigma) # Data for males
y <- c(y1, y2) # Aggregate both data sets
x <- rep(c(0,1), c(n1, n2)) # Indicator for male
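
Before handing simulated data like this to a sampler, it can help to check it against a quick non-Bayesian fit. The sketch below is only an illustration, not part of the original benchmark: it regenerates the data with an arbitrary seed (the original code sets none) and fits `lm(y ~ x)`, whose slope estimates the male-minus-female difference in means.

```r
# Quick sanity check on the simulated data (illustrative sketch).
set.seed(2012)                      # arbitrary seed, not in the original code

n1 <- 6000; n2 <- 4000              # number of females, males
mu1 <- 105; mu2 <- 77.5             # population means
sigma <- 2.75                       # common population SD

y <- c(rnorm(n1, mu1, sigma), rnorm(n2, mu2, sigma))
x <- rep(c(0, 1), c(n1, n2))        # indicator for male

fit <- lm(y ~ x)                    # intercept ~ mu1, slope ~ mu2 - mu1
est_delta <- unname(coef(fit)["x"]) # estimated difference in means
est_sigma <- summary(fit)$sigma     # residual SD, should be close to sigma

print(round(c(delta = est_delta, sigma = est_sigma), 2))
```

If the posterior means from any of the samplers end up far from these estimates, the model code or the data list is the first place to look.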

The BUGS program looks like this:

model {
# Priors
mu1 ~ dnorm(0,0.001)
mu2 ~ dnorm(0,0.001)
tau1 <- 1 / ( sigma1 * sigma1)
sigma1 ~ dunif(0, 1000) # Note: Large var. = Small precision
tau2 <- 1 / ( sigma2 * sigma2)
sigma2 ~ dunif(0, 1000)

# Likelihood
for (i in 1:n1) {
y1[i] ~ dnorm(mu1, tau1)
}

for (i in 1:n2) {
y2[i] ~ dnorm(mu2, tau2)
}

# Derived quantities
delta <- mu2 - mu1
}

while the Stan program looks like this:

data {
  int n1;
  int n2;
  real y1[n1];
  real y2[n2];
}
parameters {
  real alpha1;
  real alpha2;
  real sigma1;
  real sigma2;
}
model {
  y1 ~ normal(alpha1, sigma1);
  y2 ~ normal(alpha2, sigma2);
}
generated quantities {
  real dif;
  dif <- sigma1 - sigma2;
}

For 20,000 iterations, OpenBUGS took 7852.813 seconds, JAGS took 2252.273 seconds, and Stan took 60-150 seconds (the Stan team is still trying to iron out some bugs, which may explain the wide range of run times). While the run times of both OpenBUGS and JAGS are sensitive to the sample size, the run time of Stan seems much less so.

However, if the Stan model block is written with explicit loops, like this:

model {
for (n in 1:n1)
y1[n] ~ normal(alpha1, sigma1);
for (n in 1:n2)
y2[n] ~ normal(alpha2, sigma2);
}

then Stan has no apparent performance advantage over JAGS.
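
The vectorized and looped Stan formulations define the same model; they differ only in how the log density is accumulated. A rough analogy in plain R is summing a normal log-likelihood in one vectorized call versus one observation at a time. This is only an illustration, not a description of Stan's internals, and `y`, `mu`, and `s` are made-up values:

```r
# Same log-likelihood, computed two ways (illustrative analogy only).
set.seed(1)                               # arbitrary seed
y  <- rnorm(10000, mean = 105, sd = 2.75) # made-up data
mu <- 105
s  <- 2.75

# One vectorized call over all observations
ll_vec <- function(y, mu, s) sum(dnorm(y, mu, s, log = TRUE))

# One observation at a time
ll_loop <- function(y, mu, s) {
  total <- 0
  for (i in seq_along(y)) {
    total <- total + dnorm(y[i], mu, s, log = TRUE)
  }
  total
}

same_value <- isTRUE(all.equal(ll_vec(y, mu, s), ll_loop(y, mu, s)))
print(same_value)

# Timings depend on the machine, so they are printed rather than asserted
print(system.time(for (k in 1:20) ll_vec(y, mu, s)))
print(system.time(for (k in 1:20) ll_loop(y, mu, s)))
```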

Processing Stan samples with ggmcmc

Very helpful post.

Tuesday, October 16, 2012

Applied Bayesian Statistics

This book looks interesting.

Monday, October 15, 2012

Stan code for my favorite book

I am learning Stan. So I decided to translate the BUGS code for my favorite applied Bayesian book into Stan code. I just did the translation for a couple of chapters and I feel I know Stan much better already.

Sunday, October 14, 2012

Insightful analysis of the recent quarrel between China and Japan

I've seen lots of economic, political, and social analysis about it, but this cultural analysis is very interesting and insightful.

Saturday, October 13, 2012

From BUGS with BRugs to JAGS with rjags

John Kruschke provided helpful suggestions about how to convert BUGS programs into JAGS. It would be interesting to see some suggestions about how to convert BUGS or JAGS programs into Stan.

Friday, October 12, 2012

Stan and INLA

Since both Stan and INLA provide their own implementations of the classical BUGS examples, it is very illustrative and educational to compare the two sets of code.

For example, the INLA code for the "seeds" example (one of my favorites) is like this:

library(INLA)

data(Seeds)
formula = r ~ x1*x2+f(plate,model="iid")
mod.seeds = inla(formula,data=Seeds,family="binomial",Ntrials=n)

## improved estimation of the hyperparameters
h.seeds = inla.hyperpar(mod.seeds)

while the Stan code is like this:

data {
int I;
int n[I];
int N[I];
real x1[I];
real x2[I];
}
parameters {
real alpha0;
real alpha1;
real alpha12;
real alpha2;
real tau;
real b[I];
}
transformed parameters {
real sigma;
sigma <- 1.0 / sqrt(tau);
}
model {
alpha0 ~ normal(0.0,1.0E3);
alpha1 ~ normal(0.0,1.0E3);
alpha2 ~ normal(0.0,1.0E3);
alpha12 ~ normal(0.0,1.0E3);
tau ~ gamma(1.0E-3,1.0E-3);

b ~ normal(0.0, sigma);

for (i in 1:I) {
n[i] ~ binomial(N[i], inv_logit(alpha0
+ alpha1 * x1[i]
+ alpha2 * x2[i]
+ alpha12 * x1[i] * x2[i]
+ b[i]) );
}
}

INLA has a more R-like syntax and is much more terse, but Stan is much more flexible and can handle very complicated models that INLA may have trouble with.

Thursday, October 11, 2012

Some thoughts on Stan

Stan is very promising. The glmmBUGS package could easily be extended to produce Stan code in place of, or in addition to, BUGS/JAGS code, which would make it even easier for novice users to get started.

Wednesday, October 10, 2012

A few Stan tutorials

Here is a list of Stan tutorials:

Tuesday, October 09, 2012

Prediction, missing data, etc. in Stan

library(rstan)

N <- 1001
N_miss <- ceiling(N / 10)   # 101 missing values
N_obs <- N - N_miss
mu <- 3
sigma <- 2
y_obs <- rnorm(N_obs, mu, sigma)

# The Stan program is passed to stan() as a character string
missing_data_code <- '
data {
  int<lower=0> N_obs;
  int<lower=0> N_miss;
  real y_obs[N_obs];
}
parameters {
  real mu;
  real<lower=0> sigma;   // a standard deviation must be positive
  real y_miss[N_miss];   // missing values are treated as parameters
}
model {
  // add prior on mu and sigma here if you want
  y_obs ~ normal(mu, sigma);
  y_miss ~ normal(mu, sigma);
}
generated quantities {
  real y_diff;
  y_diff <- y_miss[101] - y_miss[1];
}
'

results <- stan(model_code = missing_data_code,
                data = list(N_obs = N_obs, N_miss = N_miss, y_obs = y_obs))

# permuted = FALSE keeps the draws as an iterations x chains x parameters array
y_diff <- apply(extract(results, c("y_miss[1]", "y_miss[101]"), permuted = FALSE),
                1:2, diff)

Sunday, September 30, 2012

Full Bayesian computing

A useful short paper.

Friday, September 28, 2012

Predictions using INLA

Here are some detailed explanations.

Thursday, September 13, 2012

R2MLwiN package

The new R2MLwiN package bridges R and the MLwiN software for multilevel analysis. From the examples provided, it looks very promising. It would be great if a command line version of MLwiN could be made available under Linux so that the package would be useful to everybody. It would also be great if a similar package for Mplus could be developed in the future.

There is also a Stata package that connects MLwiN with Stata. A quick glance suggests that the two packages have similar functions.

Friday, September 07, 2012

LazStats

LazStats, a statistical package written in Lazarus/FreePascal.

Friday, August 31, 2012

Stan

Stan is a new software package for Bayesian analysis. It also comes with an R interface. According to this benchmark, it may be a viable choice for real world data analysis. It takes an approach similar to ADMB's, first converting a syntax file into C++ source files and then generating native machine code.

Monday, August 13, 2012

Chinese support for Kindle DX

This system works well. Now I can read Chinese texts on my Kindle DX.

Sunday, August 12, 2012

Onyx

Onyx, a free GUI for OpenMx with built-in LaTeX support, looks pretty cool.

Monday, August 06, 2012

12 programming languages for the future

I had never heard of some of them before.

Bayesian inference: Pros and Cons

Here is a succinct summary.

CDE becomes free software

According to this source, CDE (common desktop environment) is now free software. Not sure at this moment whether that will make any real difference to our daily computing though.

Thursday, June 28, 2012

Chinese GIS resources

Here is a list of useful Chinese GIS resources:

CHGIS (download and CD)
China Dimensions
CloudMade (download)
GRASS-Wiki

Monday, June 18, 2012

A new skype for Linux

Skype 4.0 for Linux is out.

Friday, June 15, 2012

Scythe library

After 5 years, the Scythe library has a new version out.

Monday, June 11, 2012

Installed R packages

I did not realize I had such a long list of R packages installed (ls /usr/local/lib/R/library >> list.txt):

abind	e1071	labeling	optimx	sabreR
ade4	Ecdat	LaF	optmatch	sandwich
ade4TkGUI	effects	LaplacesDemon	orthopolynom	scales
AICcmodavg	ellipse	lars	parallel	segmented
akima	emdbook	lattice	parser	sem
Amelia	Epi	lava	pcaPP	sfsmisc
anchors	etable	lavaan	permute	shapefiles
aod	etm	leaps	pgfSweave	simecol
ape	evaluate	lessR	pixmap	SiZer
apsrtable	expm	lme4	pkgmaker	sna
arm	ff	lmtest	plm	snow
base	fields	locfit	plyr	snowfall
bayesDem	filehash	lpSolve	polynom	sos
bayesLife	flexmix	magic	popbio	sp
bayesm	FME	mapdata	popdemo	spacetime
bayesPop	foreach	mapproj	primer	spam
bayesTFR	forecast	maps	prodlim	SparseM
BayesX	foreign	maptools	proto	spatial
BayesXsrc	formatR	markdown	pscl	SpatialEpi
bbmle	Formula	MASS	psych	spatstat
bdsmatrix	forward	Matching	qgraph	spBayes
bibtex	fracdiff	MatchIt	quadprog	spdep
biganalytics	gam	Matrix	quantreg	spgwr
biglm	gamlss	matrixcalc	R2BayesX	sphet
bigmemory	gamlss.data	MatrixModels	R2HTML	splancs
bigtabulate	gamlss.dist	maxLik	R2jags	splines
BiocGenerics	gclus	MBA	R2WinBUGS	splm
BiocInstaller	gdata	MBESS	RandomFields	stashR
bit	gee	mclust	randomForest	statmod
bitops	geepack	MCMCglmm	RANN	stats
boot	geoR	MCMCpack	raster	stats4
brew	ggplot2	McSpatial	RColorBrewer	stringr
cacheSweave	gof	mediation	Rcpp	SuppDists
cairoDevice	gpclib	memisc	RcppArmadillo	survey
car	gplots	memoise	rcppbugs	survival
caTools	graph	MEMSS	RcppDE	svUnit
cem	graphics	methods	RcppEigen	systemfit
chron	grDevices	mets	RCurl	tables
class	grid	mgcv	registry	tabplot
classInt	gstat	minpack.lm	reldist	tcltk
cluster	gsubfn	minqa	reshape	tensorA
coda	gtools	miscTools	reshape2	testthat
codetools	gWidgets	mitools	rgam	tikzDevice
colorspace	gWidgetsRGtk2	mixAK	rgdal	timereg
combinat	hexbin	mixtools	rgenoud	tools
compiler	highlight	mlbench	rgeos	triangle
corpcor	Hmisc	mlmRev	rgl	truncnorm
cumSeg	ibdreg	mlogit	RGtk2	tseries
datasets	igraph	mnormt	rj	TTR
data.table	igraph0	modeltools	rjags	ucminf
DBI	inline	MPV	rJava	utils
DCluster	int64	mstate	rj.gd	vcd
deldir	iplots	multcomp	Rmixmod	vegan
DEoptim	IPSUR	MuMIn	rms	VGAM
depmixS4	ipw	munsell	robustbase	visreg
deSolve	iterators	mvtnorm	rockchalk	waveslim
devtools	JavaGD	ncdf	RODBC	WhatIf
dichromat	JGR	nlme	rootSolve	XLConnect
digest	KernSmooth	nnet	rpart	XLConnectJars
diptest	knitcitations	npmlreg	rrcov	XML
doBy	knitr	numDeriv	RSiteSearch	xtable
doRNG			Rsolnp	xts
DPpackage			RSQLite	Zelig
			RUnit	zic
			Rz	zoo
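
The same kind of list can also be produced from inside R instead of with `ls`. A minimal sketch, assuming the packages of interest live in the libraries returned by `.libPaths()` (the output file here is a temporary file, not the `list.txt` used above):

```r
# List installed packages from within R (illustrative sketch).
pkgs <- sort(unique(rownames(installed.packages())))

n_pkgs <- length(pkgs)       # how many packages are installed
print(n_pkgs)
print(head(pkgs))

# Write one package name per line; swap in a real path to keep the file
out_file <- tempfile(fileext = ".txt")
writeLines(pkgs, out_file)
```

Unlike `ls` on a single directory, `installed.packages()` looks in every library on the search path, so the count can differ from the listing above.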

Saturday, June 09, 2012

Natural selection is still with us

This looks very interesting. I cannot wait to get hold of the paper.

mintBox

Looks very nice, but I have not figured out what it can do to help with my current computing tasks.

Saturday, June 02, 2012

Drawing a map

This post explains how to draw a China map using ggplot2 and online data sources. The blog is generally interesting.

This one is based on OpenStreetMap. Very helpful.

Thursday, May 31, 2012

Mint Debian

After about eight months, my Mint Debian installation was broken beyond repair. So I decided to wipe it out and do a fresh installation. This time I am pointing not directly to the Debian repository but to the Mint repository.

I like Mint Debian better than the Mint main edition. I have come to the conclusion that as long as I am not pointing directly to the Debian repository (or at least not taking every single update every day), the installation should be stable enough to serve as my main workstation OS.

Tuesday, May 29, 2012

Free statistics text

OpenIntro.org provides a free statistical textbook, along with data sets and supplementary materials.

Ipsur.org is another one.

Monday, May 28, 2012

Simple Rcpp examples

Here are some nice examples for the Rcpp package.

Saturday, May 26, 2012

Compiling Emacs, again

This post shows how to compile Emacs on the Debian/Ubuntu platform.

Sunday, May 20, 2012

BayesX

With the new "R2BayesX" package, now it looks like the BayesX software has been fully integrated into the R environment.

R2BayesX provides a powerful general-purpose package with a simple, easy-to-learn syntax (compared to BUGS or ADMB). Now that BayesX is under the GPL, it will probably be embraced wholeheartedly by the free software community.

Saturday, May 19, 2012

Nine reasons for C++ 11

Can be found in this post.

Java vs. COBOL and R vs. SAS

Here is an interesting comparison.

Friday, May 18, 2012

Multinomial logit model with random effects

I just found out that the "mlogit" package can estimate multinomial logit models with random effects, just like aML and GLLAMM.

Sex selection in historical China

Here is a short article (in Chinese).

Thursday, May 17, 2012

R, Rstudio, and Markdown

Here is a helpful tutorial.

Monday, May 14, 2012

New Rstudio is pretty cool

The new version of Rstudio has an option to use "knitr" instead of "sweave", which is really cool!

Saturday, April 28, 2012

Disk partition guide for Mint Debian

This post explains how to partition hard disk for Mint Debian.

Tuesday, April 24, 2012

New version of g++, a good reason to upgrade

My Ubuntu 10.04 runs fine on my workstation except for one thing: its g++ 4.4.3 does not support the new C++ language features used in software like CppBugs and its R binding, rcppbugs. The idea behind these two packages is very interesting, which gives me a good reason to upgrade to a newer OS (Debian, Mint, or Ubuntu).

Thursday, April 19, 2012

Is Markdown/Pandoc any good?

Here is a post about LaTeX vs. Markdown. Looks like it is not easy to translate between the two systems automatically.

Tuesday, April 17, 2012

Beta regression examples

Here are some examples of beta regression models.

Monday, April 16, 2012

Cinnamon really works

Mint Linux's Cinnamon works rather well on my old laptop. It takes a bit longer to boot up to the desktop than Xfce, but once you get there, the user experience is rather good. In addition, it is very easy to use themes contributed by others (just put them into the ~/.themes folder).

Saturday, April 14, 2012

Lcmm package example

Some latent class examples using the lcmm package.

Wednesday, April 11, 2012

JAGS resources

A very good list of resources on using JAGS for Bayesian computing.

Tuesday, April 10, 2012

New output processing package

The "rockchalk" package looks promising. I hope it can be as general as the "esttab" package for Stata.

A practical Bayesian book

Introduction to WinBUGS for Ecologists is a good practical Bayesian book. I find the sample code very useful. The simple example of forming predictions (p. 116) is straightforward and can be easily tailored to be used in different research situations.

Doing Bayesian Data Analysis: A Tutorial with R and BUGS is another useful practical Bayesian book. My copy just arrived today. Can't wait to play with some of the examples there.

Monday, April 09, 2012

From bibtex to biblatex

This post is helpful for getting started.

Integrating Eclipse and Rcpp

Interesting ideas.

Friday, March 30, 2012

Sex ratio and malnutrition

One of my recent papers received some media attention, including from Nature and Science.

Here is a list of additional news coverage.

Thursday, March 01, 2012

Generate predicted confidence intervals from logistic regression

using Stata.

Sunday, February 26, 2012

R resources

A nice collection of R resources.

Monday, February 20, 2012

R_inferno

Knowing the weakness of your tool is a shortcut toward getting really good at it.

Sunday, February 19, 2012

Equations in LaTeX

Here are some useful tips on how to do equations right in LaTeX.

Saturday, February 18, 2012

Rstudio font

The development version of Rstudio finally has an option to use customized fonts. This is great.

Do you see the leash on your neck?

According to this post, both Apple and MS are tightening the leash on general desktop computer users by restricting their freedom of installing and running the software they like, with their shiny new OSs.

Fortunately we still have Linux and BSD.

Friday, February 17, 2012

Time series analysis using R

This is a tutorial on time series analysis using R.

Monday, February 13, 2012

JAGS tutorial

Here is a nice tutorial for JAGS (pdf format).

Replicating the Revolutionary Big Data Analysis using ordinary R

This post shows how.

Saturday, February 11, 2012

The "tables" package

The "tables" package makes Sweave a real option for exploratory data analysis and report generation.

Wednesday, January 25, 2012

Cinnamon

Cinnamon is a promising project aiming to deliver a more usable Linux desktop environment.

Monday, January 23, 2012

Clang vs. gcc

This article about clang vs. gcc is helpful.

Sunday, January 22, 2012

Some Rcpp benchmarks

I ran the Fibonacci number example from the Rcpp package on a number of computers and operating systems. Here are the results:

A. On my main computer (Core 2 Extreme 3.06GHz, 8 GB memory) running Ubuntu 10.04 (g++ 4.4.3):

test replications elapsed relative user.self sys.self
3 fibRcpp(N) 1 0.148 1.0000 0.14 0.01
1 fibR(N) 1 87.078 588.3649 87.03 0.04
2 fibRC(N) 1 91.209 616.2770 91.14 0.07

B. Same computer running Windows Vista (g++ 4.5.0):

test replications elapsed relative user.self sys.self
3 fibRcpp(N) 1 0.21 1.0000 0.21 0.00
1 fibR(N) 1 92.08 438.4762 90.47 0.05
2 fibRC(N) 1 94.39 449.4762 93.13 0.03

C. On my second laptop (Core 2 Duo 2.53GHz, 4 GB memory) running Windows 7 (g++ 4.5.0):

test replications elapsed relative user.self sys.self
3 fibRcpp(N) 1 0.17 1.0000 0.17 0.00
1 fibR(N) 1 73.62 433.0588 73.47 0.00
2 fibRC(N) 1 74.27 436.8824 74.20 0.03

D. On the same computer running Revolution R Enterprise 5:

test replications elapsed relative user.self sys.self
2 fibRC(N) 1 72.31 1.000000 72.09 0
1 fibR(N) 1 72.99 1.009404 72.79 0

E. On my third laptop (Core 2 Duo 2.50GHz, 2 GB memory) running Mint Debian (g++ 4.6.2):

test replications elapsed relative user.self sys.self
3 fibRcpp(N) 1 0.148 1.0000 0.148 0.00
1 fibR(N) 1 65.535 442.8041 65.328 0.200
2 fibRC(N) 1 65.664 443.6757 65.492 0.172

Why did the faster computer perform worse, on both the R and Rcpp versions?
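
For readers who have not seen the example, the pure-R contender in a benchmark like this is a naive, doubly recursive function. The sketch below is written from the textbook definition rather than copied from the Rcpp package. `N` is deliberately small because the value used for the timings above is not given, and `fibRC` is assumed to be the byte-compiled version of `fibR`:

```r
# Naive recursive Fibonacci in plain R (illustrative sketch).
fibR <- function(n) {
  if (n < 2) return(n)        # fib(0) = 0, fib(1) = 1
  fibR(n - 1) + fibR(n - 2)   # exponential number of calls
}

# Assumption: fibRC is fibR run through the byte-code compiler
fibRC <- compiler::cmpfun(fibR)

N <- 20                       # small N so this finishes quickly
print(system.time(res_r  <- fibR(N)))
print(system.time(res_rc <- fibRC(N)))
```

The number of calls grows exponentially with `N`, so per-call interpreter overhead dominates the run time; that overhead is what the C++ version avoids.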

Rcpp on windows

I got Rcpp working on my Windows machine by installing the Rtools bundle. It is not clear to me how to get GSL installed so that RcppGSL will also work.

Friday, January 20, 2012

Richard Stallman was right all along

One more person agreed with Richard Stallman.

Wednesday, January 18, 2012

Package "rgdal" broke

The new version of the "rgdal" package cannot be compiled on my systems (both Ubuntu and Debian).

UPDATE: they fixed it by releasing a new version (0.7.8).

NetLogo

The NetLogo developers really want to get things right: today they released the 7th release candidate for the new version (v. 5)!

Tuesday, January 17, 2012

R is becoming increasingly popular

According to this, R is the 19th most popular language in the first month of 2012!

Sunday, January 15, 2012

Cultural revolution and the economic future of China

This article (in Chinese) is worth reading.

Thursday, January 12, 2012

Visual debugger and the debug mode of the autorun R console

The StatET team kept their promise and delivered the autorun R console with debug mode on. This, combined with the visual debugger, makes StatET a very appealing cross-platform environment for working with R.

Sunday, January 08, 2012

Useful python libraries for social scientists

Here is a list of useful python libraries for social scientists.