Monday, December 23, 2013

Useful IPython notebooks

There is probably no better way to demonstrate computer code than an IPython notebook. Here are a number of examples that I find useful:

Thursday, December 05, 2013

Big data and Bayesian inference

This post provides invaluable information about the use of Bayesian methods in big data analysis. I did not realize before that the LaplacesDemon package provides such a powerful and versatile environment for Bayesian inference.

Right now the package works at a fairly low level. The user needs to specify the likelihood function and take care of various kinds of details. A wrapper package that provides a simplified interface and reasonable default options would be an excellent addition and would probably help attract a much wider user base.
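To give a sense of what "low level" means here, below is a minimal sketch of a hand-written model function for a simple linear regression, in plain R. The data are simulated, the priors are an arbitrary choice, and the names MyData and Model are hypothetical. The shape of the returned list (LP, Dev, Monitor, yhat, parm) follows my understanding of the convention LaplacesDemon expects, so check it against the package documentation before relying on it.

```r
# Simulated data for a simple linear regression (illustrative values only)
set.seed(1)
N <- 200
x <- rnorm(N)
y <- 1 + 2 * x + rnorm(N, sd = 0.5)
X <- cbind(1, x)

# Data list; mon.names and parm.names are labels used for bookkeeping
MyData <- list(X = X, y = y, N = N,
               mon.names = "LP",
               parm.names = c("beta0", "beta1", "log.sigma"))

# Model function: takes a parameter vector and the data list, and returns
# the unnormalized log-posterior together with a few related quantities.
Model <- function(parm, Data) {
  beta  <- parm[1:2]
  sigma <- exp(parm[3])                 # log scale keeps sigma positive
  mu    <- as.vector(Data$X %*% beta)
  # Log-likelihood of the observed outcomes
  LL <- sum(dnorm(Data$y, mean = mu, sd = sigma, log = TRUE))
  # Weakly informative priors (an arbitrary choice for this sketch)
  log_prior <- sum(dnorm(beta, 0, 10, log = TRUE)) +
    dnorm(parm[3], 0, 2, log = TRUE)
  LP <- LL + log_prior
  list(LP = LP, Dev = -2 * LL, Monitor = LP,
       yhat = rnorm(length(mu), mu, sigma), parm = parm)
}

# Evaluate the log-posterior at a starting value
out <- Model(c(0, 0, 0), MyData)
```

Everything a wrapper package would hide (the parameter transformation, the priors, the deviance) has to be written out by hand in a function like this.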

Monday, November 25, 2013

Using open-source R in a way similar to SAS

Here is how. Very good examples!

By the way, learning materials for the ff package are surprisingly sparse.

Saturday, November 23, 2013

Three ways to run Bayesian models in R

Nice post here.

I tested the code on my machine. LaplacesDemon (LD) performs surprisingly well. For N = 20,000, JAGS takes 746 seconds, Stan takes 50 seconds (including compilation), and LD takes 42 seconds.

When N = 200,000, JAGS takes 8,544 seconds, Stan takes 280 seconds, and LD takes 382 seconds to complete. In other words, JAGS becomes impossible with "larger" data, while both Stan and LD remain viable. With LD's big data capability, it is even possible that, when the data size exceeds memory size, LD remains viable but Stan may not.

It is, however, quite tricky to get LD to work properly. I spent a day tweaking a very simple linear regression model (by trying different samplers, different transformations of the data, etc.); even with a very large number of iterations, the demon could not be appeased.

Thursday, November 21, 2013

Sunday, November 10, 2013

Saturday, October 26, 2013

Parallel problem and a fix

By default, R only uses one of the 12 cores in my workstation. According to this post, the easiest solution is to put "system(sprintf("taskset -p 0xffffffff %d", Sys.getpid()))" in my .Rprofile.

Even better, one can simply run "mcaffinity(1:12)" every time the "parallel" package is loaded.
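A slightly more defensive version of the same idea is sketched below. It assumes a Unix-like system and asks the machine how many cores it has instead of hard-coding 12. The helper name use_all_cores is hypothetical, not part of the "parallel" package.

```r
library(parallel)

# Try to let the current R process run on every core it can see.
# Returns the resulting CPU affinity as an integer vector, or NULL where
# CPU affinity is unavailable (for example, on Windows).
use_all_cores <- function() {
  if (!exists("mcaffinity", mode = "function")) return(NULL)
  n <- detectCores()
  if (is.na(n) || is.null(mcaffinity())) return(NULL)
  # Setting the affinity can fail in restricted environments; fall back to NULL
  tryCatch(mcaffinity(seq_len(n)), error = function(e) NULL)
}

cores <- use_all_cores()
```

Calling use_all_cores() right after library(parallel), for instance from .Rprofile, has the same effect as the one-liner on systems that support it.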

Friday, October 11, 2013

The new version (1.0) of the "mixtools" package looks promising

It's a welcome addition to the mixture modeling toolkit for R users.

Using Zelig and ggplot2 together

I am not the only person doing that, apparently. Here is an example.

Sunday, October 06, 2013

R packages for statistical exam generation

Looks like both the "exams" and "ProfessR" packages can do that.

Saturday, October 05, 2013

Gmisc package

The "Gmisc" package provides a function called "htmlTable" that can produce good-looking tables in Markdown files.

With the author's help, I was able to make the reshaped data (using the "reshape2" package) work with the "htmlTable" function to create arbitrary descriptive statistics tables with Markdown.

Friday, October 04, 2013

Who's Afraid of Peer Review?

Science, open access, ethics, and peer review: some intriguing results. From this map, India really stands out.

Here is a list of predatory publishers, compiled by Jeffrey Beall. 

Wednesday, October 02, 2013

Trouble publishing on rpubs and solution

On my Windows 7 machine, Rstudio reported an error message when I tried to publish my files on Rpubs. After some digging, I found the solutions here and here.

Knitcitations package

The "knitcitations" package provides some very cool features that make it much easier to do literature search and review when working with Markdown and LaTeX.

Friday, September 27, 2013

Rstudio, Markdown, and presentation

I recently discovered that using Markdown and Rstudio together can greatly simplify the kind of presentation I do for teaching and professional conferences, that is, putting computer code and its output side by side. Here is a simple demonstration.

Thursday, September 26, 2013

R to LaTeX options

This post provides a thorough comparison among a number of options. If one wants full Zelig support, stargazer seems to be the only option (it handles more Zelig model types than texreg does).

Wednesday, September 25, 2013

Are We Witnessing the Decline of Ubuntu?

A thoughtful post here.

Mint Debian update pack #7

A new update pack for Mint Debian is out! Among many other things, it comes with Linux kernel 3.10 and gcc 4.7.3. My upgrade process went well (though I had to manually remove and reinstall a number of packages, such as Free Pascal and Lazarus).

Sunday, September 08, 2013

RNetLogo

The RNetLogo package is a small piece of wonder. One can conduct agent-based simulation, try different model parameters, and then collect and conduct statistical analysis of the simulation results.

Tuesday, September 03, 2013

R and Hadoop

Some useful information here.

Monday, September 02, 2013

Friday, August 16, 2013

Repast

A new version of the Repast agent-based simulation toolkit was just released!

Thursday, August 01, 2013

Architect

Architect is another development environment for R.

Wednesday, July 10, 2013

Another faster BLAS post

Another post on using faster BLAS with R.

Monday, July 08, 2013

Wiki for R programming

Very useful wiki. I look forward to reading the final book as well.

Thursday, July 04, 2013

gNewSense

I just saw a new beta of the "strictly libre" distro, gNewSense, but before long I realized that much of the software bundled in this distro is several generations older than the most recent releases. For example, the newest gcc is 4.8.1, and gNewSense has 4.4.5, almost as old as the one that comes with OSX.

Tuesday, June 25, 2013

Faster Gibbs sampling MCMC from within R

I find this comparison of MCMC performance between different languages helpful.

After generating a .class file from the Java source, I needed to add the following two lines of code to make it work:

.jaddClassPath("/home/shige/mydata/test/Gibbs/out/production/Gibbs") 
.jaddClassPath("/home/shige/bin/java/parallelcolt-0.9.4.jar") 

Thursday, June 20, 2013

Build R with Intel MKL

Following the instructions here and here, I built and installed R with the MKL library. Running this benchmark test shows a significant improvement in performance over the default build (8.782 vs. 33.446 seconds).

Running the same test on my Windows machine (with an older Core2 chip) took 51 seconds. Following the suggestions and replacing the default "Rblas.dll" with the one provided here reduced the run time to 26 seconds.
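For a quick sanity check of whether a faster BLAS is actually being used, a few dense linear algebra operations can be timed directly. The sketch below is not the benchmark linked above; the matrix size and the choice of operations are arbitrary, and the timings will differ from the numbers reported in this post.

```r
# Time a few BLAS/LAPACK-heavy operations on a random matrix.
set.seed(42)
n <- 500
A <- matrix(rnorm(n * n), n, n)

# Cross-product A'A (matrix multiplication)
t_cross <- system.time(B <- crossprod(A))[["elapsed"]]
# Inverse of a well-conditioned symmetric matrix
t_solve <- system.time(B_inv <- solve(B + diag(n)))[["elapsed"]]
# Singular value decomposition
t_svd <- system.time(s <- svd(A))[["elapsed"]]

timings <- c(crossprod = t_cross, solve = t_solve, svd = t_svd)
print(timings)
```

Running the same script before and after switching the BLAS library gives a rough idea of the speedup; increase n if the operations finish too quickly to measure.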

A java-based R

Renjin is an implementation of R that is written entirely in Java. 

Monday, June 10, 2013

Using Rcpp in agent-based models

Here is an example. The Rcpp code had to be slightly modified to run on my Mint Linux machine.

Sunday, June 09, 2013

Ubuntu ...

What is the biggest bug in Ubuntu? Unity! Ubuntu has gone too far down the wrong path ... Fortunately we still have Mint and Debian.

Docear

Docear is a free academic literature suite. Looks promising.

Friday, June 07, 2013

Some interesting R packages for causal analysis

A number of new R packages are very helpful for conducting causal analysis, including the IUPS package and the CBPS package.

Monday, June 03, 2013

Symbolic algebra solution for R

Ryacas, an interface between R and Yacas, looks like a good solution for symbolic algebra in R.

Identity crisis of Japan

Interesting article.

Monday, May 20, 2013

Another option for R/LaTeX/Sweave editor

WinEdt is another good option for R/LaTeX/Sweave editing. I especially like the feature of running "knitr + pdflatex" without opening an R terminal. Unfortunately, it is Windows-only software.

Thursday, May 09, 2013

Glmer2stan

Glmer2stan is an interesting piece of software that translates glmer (lme4) syntax into Stan. The software is still at an early stage but looks very promising.

Wednesday, May 08, 2013

Sumatra pdf viewer

The Sumatra pdf viewer works very well on the Windows platform. If you have to run LaTeX on Windows and need a feature like auto-update of the compiled pdf file, Sumatra is the right choice.

Friday, May 03, 2013

Rstudio vs. StatET

Rstudio wins on easy installation, smaller size, and an intuitive user interface. StatET wins with its graphical debugger and outline view. Right now, StatET is a better choice than Rstudio as a Sweave/Knitr/LaTeX authoring platform.

Sunday, April 14, 2013

Stan as a unified statistical estimation and interpretation engine

Applied researchers who are used to Stata or R need a reason to learn and use Stan. Besides the usual Bayesian vs. frequentist discussions, there is also a practical one.

Stan provides a unified interface for statistical estimation and interpretation. I have been using R with the Zelig package for estimation and interpretation during the past years, which is great. The problem is that, because Zelig is built upon a large number of existing R packages written by many different researchers and the quality of these packages varies greatly, working with Zelig means that you are working with all these other packages and researchers as well. In addition, even though in theory you can modify the source of these packages to suit your needs, applied researchers rarely have the energy or skills to tweak the FORTRAN or C code.

Stan provides a modeling language, which makes it easy for users to tweak their models (of course the underlying C++ code is also available). It is simulation-based and uses the posterior distribution for inference, which means that there is no need for an additional simulation step (which is what Zelig adds to frequentist models). After some testing, I have come to the conclusion that Stan is fast and stable enough for my daily data analysis work.

Best of all, this package comes from a research group with a very good reputation, and their discussion list is unbelievably helpful.

These are good enough reasons for me to switch to Stan.

Tuesday, April 09, 2013

Using R on Amazon EC2

Detailed instructions can be found here.

Monday, April 08, 2013

Friday, April 05, 2013

Wednesday, April 03, 2013

R 3.0 supports long vectors

It is very exciting news that the new R 3.0 supports long vectors! This is a major step toward a viable big data platform.
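As a rough illustration of what this means in practice: before R 3.0 a vector could hold at most 2^31 - 1 elements, and a "long vector" is one that exceeds that limit. The sketch below only computes the sizes involved. The actual allocation needs about 2 GB of free memory on a 64-bit build, so it sits behind a flag (run_demo, a name I made up for this example) that is off by default.

```r
# The old per-vector limit: the largest value an R integer can hold
max_short <- .Machine$integer.max     # 2^31 - 1

# One element past the old limit; stored as a double, not an integer
n <- 2^31

# A raw vector uses one byte per element, so this is the cheapest long vector
gib_needed <- n / 1024^3              # memory required, in GiB

run_demo <- FALSE                     # set to TRUE only with enough free RAM
if (run_demo) {
  x <- raw(n)                         # fails on R < 3.0 or on 32-bit builds
  print(length(x))                    # length is reported as a double
}
```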

Sunday, March 31, 2013

System76 Linux computer

I recently bought a Linux desktop computer ("Leopard Extreme") from System76. It has a six-core Intel Core i7 CPU, 64 GB memory, and an Intel SSD. I had to replace the Ubuntu system, which came pre-installed, with Mint to make it fully functional.

I really like this machine, and it certainly has made my data analysis work easier and more pleasurable. 

Saturday, March 16, 2013

Save compilation time with RStan


It seems RStan recompiles the program every time the "model_code = " option of the "stan" command is used. The best way to avoid this is to compile once:

fit1 <- stan(model_code = stan.model, chains = 1, data = data)

Then reuse the compiled model through the "fit" option:

fit2 <- stan(fit = fit1, data = data, iter = 10000)

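Or run the chains in parallel:

library(parallel)
# Run four chains, one per core, reusing the model compiled for fit1
sflist1 <- 
  mclapply(1:4, mc.cores = 4, 
           function(i) stan(fit = fit1, data = data, chains = 1, 
                            chain_id = i, refresh = -1))
# Merge the single-chain fits into one stanfit object
fit3 <- sflist2stanfit(sflist1)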

Tuesday, March 12, 2013

Thursday, February 28, 2013

Sunday, February 10, 2013

Bayesian articles

This is a set of helpful articles on the historical development of Bayesian statistics.

Tuesday, February 05, 2013

Stan/Rstan new version

The new version (1.1.1) of Stan/Rstan is out. A lot of new models have been incorporated, judging from the user manual. This is exciting!

Thursday, January 03, 2013

Snowlinux

Snowlinux is another Debian/Ubuntu-based distro that is light and fast. Unfortunately, it is not a rolling distro.
