There is probably no better way to demonstrate computer code than an IPython notebook. Here are a number of examples that I find useful:

## Monday, December 23, 2013

## Thursday, December 19, 2013

## Thursday, December 05, 2013

### Big data and Bayesian inference

This post provides invaluable information about the use of Bayesian methods in big data analysis. I did not realize before that the LaplacesDemon package provides such a powerful and versatile environment for Bayesian inference.

Right now the package works at a fairly low level. The user needs to specify the likelihood function and take care of many details. A wrapper package that provides a simplified interface and reasonable default options would be an excellent addition and would probably help attract a much wider user base.

## Wednesday, November 27, 2013

## Monday, November 25, 2013

### Use open source R in a similar way to SAS

Here is how. Very good examples!

By the way, learning materials for the ff package are surprisingly sparse.

## Saturday, November 23, 2013

### Three ways to run Bayesian models in R

Nice post here.

I tested the code on my machine. LaplacesDemon (LD) performs surprisingly well. For N = 20,000, JAGS takes 746 seconds, Stan takes 50 seconds (including compilation), and LD takes 42 seconds.

When N = 200,000, JAGS takes 8,544 seconds, Stan takes 280 seconds, and LD takes 382 seconds to complete. In other words, JAGS becomes impractical with "larger" data, while both Stan and LD remain viable. With LD's big data capability, it is even possible that, when the data size exceeds the memory size, LD remains viable but Stan does not.
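To make the scaling concrete, here is a small sketch that computes how much the run time grows for a tenfold increase in N. It uses only the Stan and LD timings quoted above; the data frame and column names are my own, chosen for illustration.

```r
# Run times in seconds as quoted above
# (the Stan figure for N = 20,000 includes compilation).
timings <- data.frame(
  sampler = c("Stan", "LD"),
  n_20k   = c(50, 42),
  n_200k  = c(280, 382)
)

# Growth in run time when N increases tenfold.
timings$growth <- timings$n_200k / timings$n_20k
print(timings)
```

A growth factor below 10 means the run time grew less than proportionally to the data size, although for Stan part of that is the fixed compilation cost included in the smaller run.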

It is, however, quite tricky to get LD to work properly. I spent a day tweaking a very simple linear regression model (by trying different samplers, different transformations of the data, etc.); even with a very large number of iterations, the demon could not be appeased.

## Thursday, November 21, 2013

## Sunday, November 10, 2013

## Saturday, November 09, 2013

## Saturday, October 26, 2013

### Parallel problem and a fix

By default, R only uses one of the 12 cores in my workstation. According to this post, the easiest solution is to put "system(sprintf("taskset -p 0xffffffff %d", Sys.getpid()))" in my .Rprofile.

Even better, one can simply run "mcaffinity(1:12)" every time the "parallel" package is loaded.
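A minimal sketch of the second approach, written so that it does not hard-code the 12 cores of my workstation. It assumes a platform where the "parallel" package supports CPU affinity (Linux in my case); elsewhere "mcaffinity()" returns NULL and the sketch does nothing.

```r
library(parallel)

# Number of logical cores visible to R (may be NA on some systems).
n_cores <- detectCores()

# mcaffinity() returns NULL where affinity is not supported,
# so only reset the mask when a current mask is reported.
if (!is.na(n_cores) && !is.null(mcaffinity())) {
  mcaffinity(seq_len(n_cores))
}

# Show the affinity mask now in effect (NULL if unsupported).
print(mcaffinity())
```

Placing these lines in .Rprofile would apply the setting to every session, at the cost of loading "parallel" each time R starts.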

## Friday, October 11, 2013

### The new version (1.0) of the "mixtools" package looks promising

It's a welcome addition to the mixture modeling toolkit for R users.

### Using Zelig and ggplot2 together

I am not the only person doing that, apparently. Here is an example.

## Sunday, October 06, 2013

### R packages for statistical exam generation

Looks like both the "exams" and "professR" packages can do that.

## Saturday, October 05, 2013

### Gmisc package

The "Gmisc" package provides a function called "htmlTable" that can produce good-looking tables in Markdown files.

With the author's help, I was able to make reshaped data (using the "reshape2" package) work with the "htmlTable" function to create arbitrary descriptive statistics tables in Markdown.

## Friday, October 04, 2013

### Who's Afraid of Peer Review?

Science, open access, ethics, and peer review: some intriguing results. From this map, India really stands out.

Here is a list of predatory publishers, compiled by Jeffrey Beall.

## Wednesday, October 02, 2013

### Trouble publishing on rpubs and solution

### Knitcitations package

The "knitcitations" package provides some very cool features that make it much easier to do literature search and review when working with Markdown and LaTeX.

## Friday, September 27, 2013

### Rstudio, Markdown, and presentation

I recently discovered that using Markdown and Rstudio together can greatly simplify the kind of presentation I do for teaching and professional conferences, that is, one that puts computer code and its output side by side. Here is a simple demonstration.

## Thursday, September 26, 2013

### R to LaTeX options

## Wednesday, September 25, 2013

### Mint Debian update pack #7

A new update pack for Mint Debian is out! Among many other things, it comes with Linux kernel 3.10 and gcc 4.7.3. My upgrade went well, though I had to manually remove and re-install a number of packages, such as Free Pascal and Lazarus.

## Sunday, September 08, 2013

## Tuesday, September 03, 2013

## Monday, September 02, 2013

## Friday, August 16, 2013

## Wednesday, July 10, 2013

## Monday, July 08, 2013

## Thursday, July 04, 2013

### gNewSense

I just saw a new beta of the "strictly libre" distro, gNewSense, but before long I realized that much of the software bundled in this distro is several generations older than the most recent releases. For example, the newest gcc is 4.8.1, and gNewSense has 4.4.5, almost as old as the one that comes with OS X.

## Tuesday, June 25, 2013

### Faster Gibbs sampling MCMC from within R

I find this comparison of MCMC performance between different languages helpful.

After generating a .class file from the Java source, I needed to add the following two lines of code to make it work:

.jaddClassPath("/home/shige/mydata/test/Gibbs/out/production/Gibbs")

.jaddClassPath("/home/shige/bin/java/parallelcolt-0.9.4.jar")

## Thursday, June 20, 2013

### Build R with Intel MKL

Following the instructions here and here, I built and installed R with the MKL library. Running this benchmark test shows a significant improvement in performance over the default build (8.782 vs. 33.446 seconds).

Running the same test on my Windows machine (with an older Core 2 chip) took 51 seconds. Following the suggestions and replacing the default "Rblas.dll" with the one provided here reduced the run time to 26 seconds.
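For a quick check that does not need the full benchmark script, something like the following rough sketch can be run before and after swapping the BLAS library. The matrix size is a hypothetical value picked only to keep the run short, and the "BLAS" entry of "extSoftVersion()" is only reported by reasonably recent versions of R.

```r
# Report which BLAS library this R build uses
# (may be NA or empty on older R versions or some platforms).
print(extSoftVersion()["BLAS"])

# Time a dense cross-product, an operation dominated by BLAS calls.
set.seed(1)
n <- 500  # hypothetical size, chosen only to keep the run short
x <- matrix(rnorm(n * n), nrow = n)
elapsed <- system.time(xtx <- crossprod(x))["elapsed"]
print(elapsed)
```

This is far less thorough than the benchmark linked above, but a large difference in elapsed time between two builds on the same machine is a reasonable sign that the optimized BLAS is actually being used.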

## Monday, June 10, 2013

### Using Rcpp in agent-based models

Here is an example. The Rcpp code had to be slightly modified to run on my Linux Mint machine.

## Sunday, June 09, 2013

### Ubuntu ...

What is the biggest bug in Ubuntu? Unity! Ubuntu has gone too far down the wrong path ... Fortunately we still have Mint and Debian.

## Friday, June 07, 2013

### Some interesting R packages for causal analysis

A number of new R packages are very helpful for causal analysis, including the IUPS package and the CBPS package.

## Monday, June 03, 2013

### Symbolic algebra solution for R

Ryacas, an interface between R and Yacas, looks like a good solution for symbolic algebra in R.

## Monday, May 20, 2013

### Another option for R/LaTeX/Sweave editor

WinEdt is another good option for R/LaTeX/Sweave editing. I especially like the feature of running "knitr + pdflatex" without opening an R terminal. Unfortunately, it is Windows-only software.

## Saturday, May 11, 2013

## Thursday, May 09, 2013

### Glmer2stan

Glmer2stan is an interesting piece of software that translates glmer (lme4) syntax into Stan. The software is still at an early stage but looks very promising.

## Wednesday, May 08, 2013

### Sumatra pdf viewer

The Sumatra pdf viewer works very well on the Windows platform. If you have to run LaTeX on Windows and need functions like auto-update of the compiled pdf file, Sumatra is the right choice.

## Friday, May 03, 2013

### Rstudio vs. StatET

Rstudio wins on easy installation, smaller size, and an intuitive user interface. StatET wins on its graphical debugger and outline view. Right now, StatET is a better choice than Rstudio as a Sweave/Knitr/LaTeX authoring platform.

## Sunday, April 14, 2013

### Stan as a unified statistical estimation and interpretation engine

Applied researchers who are used to Stata or R need a reason to learn and use Stan. Besides the usual Bayesian vs. frequentist discussions, there is also a practical one.

Stan provides a unified interface for statistical estimation and interpretation. I have been using R with the Zelig package for estimation and interpretation for the past few years, which is great. The problem is that, because Zelig is built upon a large number of existing R packages written by many different researchers and the quality of these packages varies greatly, working with Zelig means that you are working with all these other packages and researchers as well. In addition, even though in theory you can modify the source of these packages to suit your needs, applied researchers rarely have the energy or skills to tweak the FORTRAN or C code.

Stan provides a modeling language, which makes it easy for users to tweak their models (of course the underlying C++ code is also available). It is simulation-based and uses the posterior distribution for inference, which means that there is no need for an additional simulation step (which is what Zelig adds to frequentist models). After some testing, I have come to the conclusion that Stan is fast and stable enough for my daily data analysis work.

Best of all, this package comes from a research group with a very good reputation, and their discussion list is unbelievably helpful.

These are good enough reasons for me to switch to Stan.

## Tuesday, April 09, 2013

## Monday, April 08, 2013

## Friday, April 05, 2013

## Wednesday, April 03, 2013

### R 3.0 supports long vectors

It is very exciting news that the new R 3.0 supports long vectors! This is a major step toward a viable big data platform.

## Sunday, March 31, 2013

### System76 Linux computer

I recently bought a Linux desktop computer ("Leopard Extreme") from System76. It has a six-core Intel Core i7 CPU, 64 GB of memory, and an Intel SSD. I had to replace the Ubuntu system, which came pre-installed, with Mint to make it fully functional.

I really like this machine, and it certainly has made my data analysis work easier and more pleasurable.

## Saturday, March 16, 2013

### Save compilation time with RStan

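It seems RStan recompiles the program every time the "model_code = " option of the "stan" command is invoked. The best way to avoid this is to compile once:

    library(rstan)
    # stan.model holds the Stan program as a character string
    fit1 <- stan(model_code = stan.model, chains = 1, data = data)

Then reuse the compiled model through the "fit" option:

    fit2 <- stan(fit = fit1, data = data, iter = 10000)

Or run the chains in parallel:

    library(parallel)
    # Run four chains in separate processes, reusing the model compiled for fit1
    sflist1 <-
      mclapply(1:4, mc.cores = 8,
               function(i) stan(fit = fit1, data = data, chains = 1, chain_id = i, refresh = -1))
    # Merge the single-chain fits into one stanfit object
    fit3 <- sflist2stanfit(sflist1)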

## Tuesday, March 12, 2013

## Thursday, February 28, 2013

## Sunday, February 10, 2013

### Bayesian articles

This is a set of helpful articles on the historical development of Bayesian statistics.

## Tuesday, February 05, 2013

### Stan/Rstan new version

The new version (1.1.1) of Stan/Rstan is out. A lot of new models have been incorporated, judging from the user manual. This is exciting!

## Monday, January 07, 2013
