Wednesday, December 31, 2014


wxMEdit is the reincarnation of the once-popular text/hex editor MadEdit. I liked the editor but not its icon (the one with teeth). The new editor looks pleasant and feels snappy. I think it handles very large files.

Saturday, December 20, 2014

Thursday, December 11, 2014

R vs. RRO (OpenBLAS vs. MKL)

Revolution R Open (RRO) just released a new version based on R 3.1.2 (with Intel MKL). I installed it on Mint 17.1. Now I want to compare its performance against the vanilla R 3.1.2 I installed on Manjaro a few days earlier, which was compiled against OpenBLAS. I used the benchmark test provided here.

RRO + MKL: 7.347 sec
R + OpenBLAS: 7.557 sec

Now it's a tie.
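Benchmarks like this one mostly stress dense linear algebra, which is where the BLAS library does the heavy lifting. As a rough illustration only (this is not the benchmark script used above, and the function names multiply and time_multiply are made up for this sketch), here is a naive matrix multiplication with a timer around it. An optimized BLAS such as OpenBLAS or MKL replaces exactly this kind of loop with a tuned kernel.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Multiply two n-by-n matrices stored in row-major order.
// Naive triple loop; an optimized BLAS would replace this kernel.
std::vector<double> multiply(const std::vector<double>& a,
                             const std::vector<double>& b,
                             std::size_t n) {
  std::vector<double> c(n * n, 0.0);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t k = 0; k < n; ++k) {
      const double aik = a[i * n + k];
      for (std::size_t j = 0; j < n; ++j) {
        c[i * n + j] += aik * b[k * n + j];
      }
    }
  }
  return c;
}

// Wall-clock seconds spent on one n-by-n multiplication.
double time_multiply(std::size_t n) {
  if (n == 0) {
    return 0.0;
  }
  std::vector<double> a(n * n, 1.0), b(n * n, 2.0);
  const auto start = std::chrono::steady_clock::now();
  const std::vector<double> c = multiply(a, b, n);
  const auto stop = std::chrono::steady_clock::now();
  // Touch the result so the compiler cannot discard the computation.
  volatile double sink = c[0];
  (void)sink;
  return std::chrono::duration<double>(stop - start).count();
}
```

Calling time_multiply(500) from a small main() and printing the result gives a feel for how slow the untuned version is; the absolute number depends entirely on the machine and compiler flags.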

Wednesday, December 10, 2014

Lighttable as a Julia IDE

Unconventional but very exciting.

The Chinese Century

The Chinese Century, by Joseph Stiglitz.

Why R is Hard to Learn

A long list of reasons here. Power and flexibility do come with a price, I guess. From a user's perspective, a big part of learning R is identifying the set of key packages that suit your needs best.

Tuesday, December 09, 2014

Useful Julia resources

This is a list of Julia resources for statistics.

Saturday, December 06, 2014

Thursday, December 04, 2014

Manjaro 0.8.11

The newly released Manjaro Linux is surprisingly stable and polished. After playing with it for a day (virtual machine installation), I think I am going to keep it. As it matures, it could become a viable contender as a serious workstation OS.

Not to mention the fact that it is a rolling release, so one always gets the latest and greatest of everything.

Monday, November 10, 2014

System76 has a facelift

System76, an excellent Linux computer hardware vendor, got a cool new web site. I like it!

Sunday, November 09, 2014

R is now the #12 most popular programming language

According to the most recent TIOBE index, R is the #12 most popular programming language.

Saturday, November 08, 2014

spread doesn't work with dplyr::grouped_df

This bug cost me several hours. I hope it gets fixed soon. Meanwhile, adding an extra step that converts the grouped_df to a plain data.frame before calling spread works fine.

Friday, October 31, 2014

pander 0.5.0: the next generation of markdown tables in R

I am moving from Sweave to Rmarkdown, and this looks very promising.

Wednesday, October 22, 2014


Cool utility for those who are stuck with SPSS.

Teaching statistics using R: Resources

I am compiling a list of resources on teaching statistics using R, many of which target undergraduates:

Sunday, October 19, 2014

Teaching statistics using R

This post explains it well. I also like the "mosaic" package.

Saturday, October 18, 2014

Zelig 5, 4, and 3

Zelig is one of the R packages that I use routinely. The most recent version is Zelig 5, which uses the new "reference class" OOP model and looks promising. But it does not yet support all the models the earlier versions supported, and its output cannot be readily processed by the table-making packages, such as xtable, stargazer, and texreg. Zelig 4 has been around for some time but seems to be more error-prone than Zelig 3, which is still my Zelig version of choice for data analysis.

Friday, October 17, 2014

Thursday, October 16, 2014

Shiny, ggvis, and rmarkdown

I have been using rmarkdown in my research and really enjoy it. I also tried to get my head around shiny and ggvis. I could understand what they do but, compared to rmarkdown, dplyr, and ggplot2, they seemed less relevant. Yesterday I went to the R-Day of the Strata+Hadoop meeting, and the presentations by Winston Chang and Garrett Grolemund really changed my opinion. And the timing is perfect: now I have a strong incentive to incorporate both technologies into my Advanced Analytics course in the Spring of 2015.

According to Hadley Wickham, the author of dplyr and a number of other excellent packages, the next version of dplyr will have a function for recoding.

Monday, October 13, 2014

Getting Real About China

An article by Wesley Clark. The article is all right but some of the comments are really insightful! I am impressed by the NYT readers!

Super salesman

This cartoon of the Chinese prime minister is hilarious.

Friday, October 10, 2014


This package from Google looks interesting.

Sunday, October 05, 2014

Sunday, September 28, 2014

Ubuntu blogs

Here is a list of blogs for Ubuntu users.

Wednesday, September 17, 2014

R package to convert statistical analysis objects to tidy data frames

This post introduces the broom package, which tidies up R output in the same way the dplyr and tidyr packages do for R data frames.

Monday, September 15, 2014


The "ffbase2" package aims to combine the "dplyr" and "ff" packages. Really cool.

Tuesday, September 09, 2014

Ceemple vs. Rcpp

Ceemple is a cool way to do C++. Rcpp is another cool way to do C++. Each of them has its own strengths and weaknesses. I am amazed to see how little change is required to get the same source to compile and run under these environments. For example, Ceemple comes with an example that uses the Eigen matrix library:

#include <Eigen/Dense>
#include <iostream>
using namespace Eigen;
using namespace std;

int main()
{
  ArrayXXf  m(2,2);
  // assign some values coefficient by coefficient
  m(0,0) = 1.0; m(0,1) = 2.0;
  m(1,0) = 3.0; m(1,1) = m(0,1) + m(1,0);
  // print values to standard output
  cout << m << endl << endl;
  // using the comma-initializer is also allowed
  m << 1.0,2.0,
       3.0,4.0;
  // print values to standard output
  cout << m << endl;
}

With Rcpp (using the Rstudio IDE), this becomes:
// [[Rcpp::depends(RcppEigen)]]
#include <RcppEigen.h>
using namespace std;
using namespace Rcpp;
using namespace Eigen;

// [[Rcpp::export]]
int test_eigen()
{
  ArrayXXf  m(2,2);
  // assign some values coefficient by coefficient
  m(0,0) = 1.0; m(0,1) = 2.0;
  m(1,0) = 3.0; m(1,1) = m(0,1) + m(1,0);
  // print values to standard output
  cout << m << endl << endl;
  // using the comma-initializer is also allowed
  m << 1.0,2.0,
       3.0,4.0;
  // print values to standard output
  cout << m << endl;
  return 0;
}

/*** R
test_eigen()
*/

Virtually no changes required! 
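For readers without Eigen installed, the first half of the example can be mirrored with the standard library alone. This is only a sketch of what the snippet computes, not a substitute for Eigen; the names Array2x2, make_example, and to_text are invented here for illustration.

```cpp
#include <array>
#include <sstream>
#include <string>

// A fixed-size 2x2 array of floats, standing in for ArrayXXf m(2,2).
using Array2x2 = std::array<std::array<float, 2>, 2>;

// Assign values coefficient by coefficient, as in the Eigen example.
Array2x2 make_example() {
  Array2x2 m{};
  m[0][0] = 1.0f; m[0][1] = 2.0f;
  m[1][0] = 3.0f; m[1][1] = m[0][1] + m[1][0];
  return m;
}

// Format the array with one row per line.
std::string to_text(const Array2x2& m) {
  std::ostringstream out;
  out << m[0][0] << ' ' << m[0][1] << '\n'
      << m[1][0] << ' ' << m[1][1];
  return out.str();
}
```

Here to_text(make_example()) yields the rows "1 2" and "3 5", since the last coefficient is the sum of the two off-diagonal entries.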

The rgl package needs a new libpng!

The new version of the "rgl" package (0.94.1131) requires a newer libpng. On my Ubuntu 14.04 system, I had to get the source tarball, install it, and make a soft link in "/usr/lib".

Wednesday, August 27, 2014

Data science toolkit

Here is a list of useful resources for data science.

Friday, August 22, 2014

Two R packages for pipelining

PipeR and magrittr are two packages for pipelining in R. This post explains some design differences between them.

Regular expressions in R

This post explains how to use different regular expression commands to achieve various tasks.

Here is a good tutorial, and a YouTube video here.

Sunday, August 17, 2014

Best VPN service providers

I find this introduction useful.

Friday, August 15, 2014

Atom editor

The Atom editor is really cool. The Windows build keeps pace with development, but the Linux build seriously lags behind. The development version is 0.124, the Windows build is 0.123, whereas the Linux build is 0.161. I have to build it from source myself; there is no other choice.

Sunday, August 03, 2014


The RNetLogo package completes the NetLogo simulation environment and makes it an attractive and complete experimentation tool for social scientists and the like.

Wednesday, July 16, 2014

R, Python, and Julia

Good posting and lots of excellent comments here!

Sunday, July 13, 2014

The intelligence paradox

The Intelligence Paradox: Why the Intelligent Choice Isn't Always the Smart One is an interesting book. On one hand, it proposes a number of very important research questions; on the other hand, the statistical analysis tends to be overly simplistic (e.g., relying on regression analysis of observational data) and the substantive conclusions are sometimes not as solid as they could have been.

Friday, July 11, 2014

The Atom editor is getting really amazing!

It can run Python, Go, and Julia code (with the script package). I wish its LaTeX support could be improved to match the functionality provided by the AUCTeX package for Emacs.

Monday, July 07, 2014

Not So Swift

A reaction to Apple's new programming language.

Tuesday, June 24, 2014

Getting Hadoop and RHadoop to work

I managed to get Hadoop and RHadoop installed on my workstation.

  1. Install Hadoop following these suggestions;
  2. Install Thrift following these suggestions;
  3. Install "rmr", "rhdfs", and "plyrmr" here.
This post is also quite helpful. 

Monday, June 23, 2014


Looks like RHadoop is still the tool of choice for R-based big data analysis.

I found this post helpful to get Hadoop installed on my workstation.

Thursday, May 22, 2014

Light Table

Light Table is a modern editor/IDE that is almost as flexible as Emacs. It has a lot of potential.

R beats Python! R beats Julia! Anyone else wanna challenge R?

Some good arguments and, more importantly, evidence here.

Tuesday, May 20, 2014

R vs Python: Why R is still the king of statistical computing

Here are some good arguments. I particularly like the author's argument regarding the vital role of Rcpp in the future development of the R ecosystem.

Friday, May 16, 2014


This is a young project, but it looks really promising!

Wednesday, May 14, 2014

The Atom text editor

This new text editor looks promising. Here are instructions for installing it on Ubuntu.

Monday, May 12, 2014

Sunday, May 11, 2014

Saturday, May 10, 2014

Friday, May 09, 2014

Fanless laptop

Looks like the next big thing in the laptop industry is the fanless computer. HP got a head start on this, and it looks like Apple will soon follow. There is a pretty good chance that my next laptop (preferably from Lenovo or Sony) will be a fanless one.

Tuesday, May 06, 2014

Upgrading to Ubuntu 14.04

After trying it on VMs and old machines, I finally replaced Mint 14 with Ubuntu 14.04 on my workstation. It works well so far: Unity seems to have become less annoying, there is no need to trim my SSD manually, and, best of all, my USB wireless adapter is working much more smoothly than before, thanks to the new kernel. I agree with some of the recent reviews that Ubuntu 14.04 is a very mature, stable, and slightly boring workstation OS. But being boring is exactly what a workstation OS needs.

Monday, April 28, 2014

How easy is it to create an R package?

I did not know until I found this video today. Rstudio and its sibling packages have really changed the way people use R!

10 Funny Jokes In Pictures: Windows Vs Mac Vs Linux

Here they are.

Saturday, April 26, 2014

Friday, April 25, 2014

My desktop

The new version (14.04) is really good!

Monday, April 07, 2014

Play with C++ code in Rstudio

I am not a C++ programmer but sometimes I need to play with some C++ code. On my Linux workstation, I have the complete GNU tool chain installed. I don't really want to do that on my tiny Windows ultrabook because ... well, it's tiny and I want to keep it that way.

I do have R (and the magical Rcpp package), Rtools, and Rstudio installed on my ultrabook. I just realized how trivial it is to tweak C++ code fragments using these tools together.

For example, in order to run this very simple C++ code fragment:

#include <iostream>
using std::cout;

int main() {
  // print five hash marks on one line
  for (int hashNum = 1; hashNum <= 5; hashNum++) {
    cout << "#";
  }
  cout << "\n";
  return 0;
}

One just needs to add a few lines so that the code looks like this:

#include <Rcpp.h>

using std::cout;
using namespace Rcpp;

// [[Rcpp::export]]
int main() {
  // print five hash marks on one line
  for (int hashNum = 1; hashNum <= 5; hashNum++) {
    cout << "#";
  }
  cout << "\n";
  return 0;
}

/*** R
main()
*/

Hit Ctrl + Enter, problem solved!
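When tweaking fragments like this, it can help to have the function return its output instead of printing it, so the result is easy to inspect. The sketch below is a plain C++ variant of the loop above; the name hash_row is made up for this example. Presumably the same function could be exposed to R by including Rcpp.h and putting the `// [[Rcpp::export]]` attribute above it, as in the listing above, but that part is an assumption and is not shown here.

```cpp
#include <string>

// Build a row of `count` hash marks followed by a newline.
std::string hash_row(int count) {
  std::string row;
  for (int hashNum = 1; hashNum <= count; hashNum++) {
    row += "#";
  }
  row += "\n";
  return row;
}
```

With count set to 5, the returned string is five hash marks and a newline, the same text the original fragment writes to standard output.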

Tuesday, April 01, 2014

Data analysis on AWS cloud

Here is a good introduction on how to get started with data analysis on AWS cloud.

Sunday, March 30, 2014

More on pandoc and markdown

This post explains how to use citation and reference in a pandoc markdown document.

Saturday, March 29, 2014

Texts, a pandoc editor

I really like this Pandoc editor I bought. It is small, lightweight, and elegant. More importantly, combined with the Pandoc engine, it can handle long and complicated documents. I also like this review. This short article also looks very interesting (and this guy is a sociologist!).

CuteMarkEd is another promising markdown editor. It is a free and open source project.

Sunday, March 23, 2014

Tuesday, March 18, 2014


Manjaro is a Linux distro based on Arch. Based on this review, it has become a real competitor of Ubuntu and Mint. I should give it a try sometime.


By integrating R, markdown, and many other exciting new technologies, Rstudio has become a complete and fascinating data analysis environment.

Monday, March 17, 2014


Ceemple takes a different approach to scientific computing: it is based on C++ and integrates a large number of scientific libraries into an intuitive IDE. It is free for academic use but so far is Windows-only.

Sunday, March 16, 2014

MCMC in LaplacesDemon

LaplacesDemon is a really cool package, but I had a hard time grasping its full capacity. This article provides a nice summary.

Friday, March 14, 2014


TeXMath is a LibreOffice plugin that supports LaTeX equations. Looks good.

Saturday, March 08, 2014

LaTeX templates

Here is a collection of useful LaTeX templates.

Wednesday, February 26, 2014

R performance, again

I got a new Windows laptop recently, so I decided to do another R benchmark test. I used the R script provided here.

R without OpenBLAS: 36 sec;
R with OpenBLAS: 14 sec;
Revolution R: 9 sec.

Thursday, February 13, 2014

EIN -- Emacs IPython Notebook

EIN integrates ipython notebook into Emacs. Looks promising.

Tuesday, January 28, 2014

Useful ipython notebooks

Here is a large collection of useful ipython notebooks.

Friday, January 17, 2014


The RcppOctave package may provide the right amount of Matlab to try the examples in this excellent book.

Thursday, January 16, 2014

Making tables using R

Both the "tables" and "etable" packages are useful for making cross-tabulations.