There is probably no better way to demonstrate computer code than the IPython Notebook. Here are a number of examples that I find useful:
Monday, December 23, 2013
Thursday, December 19, 2013
Thursday, December 05, 2013
Big data and Bayesian inference
This post provides invaluable information about the use of Bayesian methods in big data analysis. I had not realized that the LaplacesDemon package provides such a powerful and versatile environment for Bayesian inference.
Right now the package works at a fairly low level: the user needs to specify the likelihood function and take care of various other details. A wrapper package that provides a simplified interface and reasonable default options would be an excellent addition and would probably help attract a much wider user base.
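To show what "specify the likelihood function" involves, here is a minimal sketch of a model function for a simple linear regression, assuming the LaplacesDemon convention of a function `Model(parm, Data)` that returns a list with `LP`, `Dev`, `Monitor`, `yhat`, and `parm`. The data, priors, and the names `MyData` and `log.sigma` are hypothetical choices for illustration.

```r
# Hypothetical data for a linear regression y = X %*% beta + noise
set.seed(1)
N <- 50
X <- cbind(1, rnorm(N))
y <- as.vector(X %*% c(1, 2) + rnorm(N))
MyData <- list(N = N, X = X, y = y,
               mon.names = "LP",
               parm.names = c("beta0", "beta1", "log.sigma"))

# Model function in the list-returning style assumed above
Model <- function(parm, Data) {
  beta  <- parm[1:2]
  sigma <- exp(parm[3])                        # sampled on log scale, so sigma > 0
  # Vague priors, chosen only for illustration
  beta.prior  <- sum(dnorm(beta, 0, 1000, log = TRUE))
  sigma.prior <- dnorm(parm[3], 0, 10, log = TRUE)
  # Log-likelihood of the data
  mu <- as.vector(Data$X %*% beta)
  LL <- sum(dnorm(Data$y, mu, sigma, log = TRUE))
  LP <- LL + beta.prior + sigma.prior          # unnormalized log-posterior
  list(LP = LP, Dev = -2 * LL, Monitor = LP,
       yhat = rnorm(length(mu), mu, sigma), parm = parm)
}
```

Such a function would then be handed to the sampler, for example `LaplacesDemon(Model, Data = MyData, Initial.Values = c(0, 0, 0))`; every other detail (transformations, sampler choice, tuning) is also the user's job, which is exactly what a wrapper package could hide.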
Wednesday, November 27, 2013
Monday, November 25, 2013
Using open-source R in a way similar to SAS
Here is how. Very good examples!
By the way, learning materials for the ff package are surprisingly sparse.
Saturday, November 23, 2013
Three ways to run Bayesian models in R
Nice post here.
I tested the code on my machine. LaplacesDemon (LD) performs surprisingly well. For N = 20,000, JAGS takes 746 seconds, Stan takes 50 seconds (including compilation), and LD takes 42 seconds.
When N = 200,000, JAGS takes 85,44 seconds, Stan takes 280 seconds, and LD takes 382 seconds to complete. In other words, JAGS becomes impractical with "larger" data, while both Stan and LD remain viable. Given LD's big data capability, it is even possible that when the data size exceeds memory size, LD remains viable but Stan does not.
It is, however, quite tricky to get LD to work properly. I spent a day tweaking a very simple linear regression model (trying different samplers, different transformations of the data, etc.); even with a very large number of iterations, the demon could not be appeased.
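A quick way to read the timings above is to compute how much each run time grows for the tenfold increase in N. This sketch uses only the Stan and LD figures reported in the post; JAGS is omitted.

```r
# Run times (seconds) reported above for N = 20,000 and N = 200,000
times <- data.frame(
  engine = c("Stan", "LaplacesDemon"),
  n20k   = c(50, 42),
  n200k  = c(280, 382)
)
# Growth factor in run time for a tenfold increase in N
times$growth <- times$n200k / times$n20k
# Seconds per 1,000 observations at N = 200,000
times$sec_per_1k <- times$n200k / 200
times
```

Stan's time grows by a factor of 5.6 and LD's by about 9.1, so both scale better than linearly in this test, but LD's steeper growth is why it overtakes Stan in run time at the larger N.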
Thursday, November 21, 2013
Sunday, November 10, 2013
Saturday, November 09, 2013
Saturday, October 26, 2013
Parallel problem and a fix
By default, R only uses one of the 12 cores in my workstation. According to this post, the easiest solution is to put the following line in my .Rprofile:
system(sprintf("taskset -p 0xffffffff %d", Sys.getpid()))
Even better, one can simply call mcaffinity(1:12) every time the "parallel" package is loaded.
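One way to make the mcaffinity() call automatic is a package hook in .Rprofile. This is a sketch, assuming Linux (where `parallel::mcaffinity()` can set CPU affinity; elsewhere it returns NULL) and using `detectCores()` in place of a hard-coded 12.

```r
# In ~/.Rprofile: whenever 'parallel' is attached (e.g. by library(parallel)),
# allow this R process to run on all detected cores.
setHook(packageEvent("parallel", "attach"),
        function(pkgname, pkgpath) {
          n_cores <- parallel::detectCores()
          if (!is.na(n_cores)) {
            parallel::mcaffinity(seq_len(n_cores))
          }
        })
```

A hook keeps the affinity change tied to the package being used, rather than running it in every R session.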
Friday, October 11, 2013
The new version (1.0) of the "mixtools" package looks promising
It's a welcome addition to the mixture modeling toolkit for R users.
Using Zelig and ggplot2 together
I am not the only person doing that, apparently. Here is an example.
Sunday, October 06, 2013
R packages for statistical exam generation
Looks like both the "exams" and "professR" packages can do that.
Saturday, October 05, 2013
Gmisc package
The "Gmisc" package provides a function called "htmlTable" that can produce good-looking tables in Markdown files.
With the author's help, I was able to make reshaped data (using the "reshape2" package) work with the "htmlTable" function to create arbitrary descriptive statistics tables with Markdown.
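The reshaping step produces a statistics-by-group matrix. A minimal sketch of such a table, using base R in place of reshape2 so it has no dependencies, and the built-in mtcars data as a stand-in; the htmlTable call itself is left out, since its options depend on the Gmisc version.

```r
# Descriptive statistics of mpg by number of cylinders:
# rows are statistics, columns are groups
desc_table <- sapply(split(mtcars$mpg, mtcars$cyl), function(v) {
  c(n = length(v), mean = mean(v), sd = sd(v))
})
desc_table <- round(desc_table, 2)
desc_table
```

A matrix with row and column names like this is the kind of object that can then be passed to htmlTable for rendering.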
Friday, October 04, 2013
Who's Afraid of Peer Review?
Science, open access, ethics, and peer review: some intriguing results. From this map, India really stands out.
Here is a list of predatory publishers, compiled by Jeffrey Beall.
Wednesday, October 02, 2013
Trouble publishing on rpubs and solution
Knitcitations package
The "knitcitations" package provides some very cool features that make it much easier to do literature search and review when working with Markdown and LaTeX.
Friday, September 27, 2013
Rstudio, Markdown, and presentation
I recently discovered that using Markdown and RStudio together can greatly simplify the kind of presentation I do for teaching and professional conferences, that is, putting computer code and its output side by side. Here is a simple demonstration.
Thursday, September 26, 2013
R to LaTeX options
Wednesday, September 25, 2013
Mint Debian update pack #7
A new update pack for Mint Debian is out! Among many other things, it comes with Linux kernel 3.10 and gcc 4.7.3. My upgrade went well, though I had to manually remove and reinstall a number of packages, such as Free Pascal and Lazarus.
Sunday, September 08, 2013
Tuesday, September 03, 2013
Monday, September 02, 2013
Friday, August 16, 2013
Wednesday, July 10, 2013
Monday, July 08, 2013
Thursday, July 04, 2013
gNewSense
I just saw a new beta of the "strictly libre" distro gNewSense, but before long I realized that much of the software bundled in this distro is several generations older than the most recent versions. For example, the newest gcc is 4.8.1, while gNewSense has 4.4.5, almost as old as the one that comes with OS X.
Tuesday, June 25, 2013
Faster Gibbs sampling MCMC from within R
I find this comparison of MCMC performance between different languages helpful.
After generating a .class file from the Java source, I needed to add the following lines of code to make it work:
library(rJava)
.jinit()  # start the JVM before extending the class path
.jaddClassPath("/home/shige/mydata/test/Gibbs/out/production/Gibbs")
.jaddClassPath("/home/shige/bin/java/parallelcolt-0.9.4.jar")
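For reference, this is what a pure-R Gibbs sampler looks like. It is a sketch with a bivariate normal target chosen for illustration (not necessarily the model used in the linked comparison); the explicit loop over iterations is the part that tends to be slow in R and that moving the sampler to Java speeds up.

```r
# Gibbs sampler for a standard bivariate normal with correlation rho.
# Both full conditionals are univariate normal:
#   x | y ~ N(rho * y, 1 - rho^2),   y | x ~ N(rho * x, 1 - rho^2)
gibbs_bvn <- function(n_iter, rho, x0 = 0, y0 = 0) {
  draws <- matrix(NA_real_, nrow = n_iter, ncol = 2,
                  dimnames = list(NULL, c("x", "y")))
  x <- x0
  y <- y0
  cond_sd <- sqrt(1 - rho^2)
  for (i in seq_len(n_iter)) {
    x <- rnorm(1, mean = rho * y, sd = cond_sd)  # draw x given current y
    y <- rnorm(1, mean = rho * x, sd = cond_sd)  # draw y given new x
    draws[i, ] <- c(x, y)
  }
  draws
}

set.seed(2013)
out <- gibbs_bvn(20000, rho = 0.8)
cor(out[, "x"], out[, "y"])  # should be close to 0.8
```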
Thursday, June 20, 2013
Build R with Intel MKL
Following the instructions here and here, I built and installed R with the MKL library. Running this benchmark test showed a significant improvement in performance over the default build (8.782 vs. 33.446 seconds).
Running the same test on my Windows machine (with an older Core 2 chip) took 51 seconds. Following the suggestions and replacing the default "Rblas.dll" with the one provided here reduced the run-time to 26 seconds.
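A quick way to check whether a BLAS swap took effect is to time a dense matrix product, which is handed off to the BLAS library. This is a sketch (the function name `blas_elapsed` and the matrix size are arbitrary choices, not the benchmark linked above), followed by the speed-ups implied by the numbers reported here.

```r
# Wall-clock seconds for one dense cross-product t(m) %*% m,
# a step dominated by whichever BLAS R is linked against
blas_elapsed <- function(n = 1000, seed = 1) {
  set.seed(seed)
  m <- matrix(rnorm(n * n), nrow = n)
  system.time(crossprod(m))[["elapsed"]]
}

# Speed-ups implied by the timings reported above
speedup_mkl   <- 33.446 / 8.782  # default build vs. MKL build
speedup_rblas <- 51 / 26         # default vs. replacement Rblas.dll
round(c(speedup_mkl, speedup_rblas), 2)  # 3.81 1.96
```

Running `blas_elapsed()` before and after the swap on the same machine gives a like-for-like comparison.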
Monday, June 10, 2013
Using Rcpp in agent-based models
Here is an example. The Rcpp code had to be slightly modified to run on my Mint Linux machine.
Sunday, June 09, 2013
Ubuntu ...
What is the biggest bug in Ubuntu? Unity! Ubuntu has gone too far down the wrong path ... Fortunately we still have Mint and Debian.
Friday, June 07, 2013
Some interesting R packages for causal analysis
A number of new R packages are very helpful for conducting causal analysis, including the IUPS and CBPS packages.
Monday, June 03, 2013
Symbolic algebra solution for R
Ryacas, an interface between R and Yacas, looks like a good solution for symbolic algebra in R.
Monday, May 20, 2013
Another option for R/LaTeX/Sweave editor
WinEdt is another good option for R/LaTeX/Sweave editing. I especially like the feature of running "knitr + pdflatex" without opening an R terminal. Unfortunately it is Windows-only software.
Saturday, May 11, 2013
Thursday, May 09, 2013
Glmer2stan
Glmer2stan is an interesting package that translates glmer (lme4) syntax into Stan. The software is still at an early stage but looks very promising.
Wednesday, May 08, 2013
Sumatra pdf viewer
The Sumatra PDF viewer works very well on the Windows platform. If you have to run LaTeX on Windows and need features like auto-updating of the compiled PDF file, Sumatra is the right choice.
Friday, May 03, 2013
Rstudio vs. StatET
RStudio wins on ease of installation, smaller size, and an intuitive user interface. StatET wins on its graphical debugger and outline view. Right now, StatET is a better Sweave/Knitr/LaTeX authoring platform than RStudio.
Sunday, April 14, 2013
Stan as a unified statistical estimation and interpretation engine
Applied researchers who are used to Stata or R need a reason to learn and use Stan. Beyond the usual Bayesian vs. frequentist debate, there is also a practical reason.
Stan provides a unified interface for statistical estimation and interpretation. I have been using R with the Zelig package for estimation and interpretation during the past years, which is great. The problem is that, because Zelig is built upon a large number of existing R packages written by many different researchers, and the quality of these packages varies greatly, working with Zelig means that you are working with all these other packages and researchers as well. In addition, although in theory you can modify the source of these packages to suit your needs, applied researchers rarely have the energy or skills to tweak the FORTRAN or C code.
Stan provides a modeling language, which makes it easy for users to tweak their models (of course the underlying C++ code is also available). It is simulation-based and uses the posterior distribution for inference, which means that there is no need for an additional simulation step (which is what Zelig adds for frequentist models). After some testing, I have come to the conclusion that Stan is fast and stable enough for my daily data analysis work.
Best of all, this package comes from a research group with a very good reputation, and their discussion list is unbelievably helpful.
These are good enough reasons for me to switch to Stan.
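The point about not needing a separate simulation step can be sketched like this: once you have posterior draws, any quantity of interest is computed draw by draw, and its uncertainty comes along for free. The draws below are simulated stand-ins (hypothetical values); in practice they would come from a fitted Stan model, e.g. via rstan's `extract()`.

```r
# Stand-in posterior draws for an intercept and a slope
set.seed(42)
draws <- data.frame(alpha = rnorm(4000, 1, 0.1),
                    beta  = rnorm(4000, 0.5, 0.05))

# Quantity of interest: expected outcome at x = 2, one value per draw
x_new  <- 2
mu_new <- draws$alpha + draws$beta * x_new

# Point estimate and 95% interval straight from the draws
summary_stats <- c(mean = mean(mu_new), quantile(mu_new, c(0.025, 0.975)))
summary_stats
```

With a frequentist fit, Zelig has to simulate parameter values from the estimated sampling distribution before it can produce the same kind of summary.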
Tuesday, April 09, 2013
Monday, April 08, 2013
Friday, April 05, 2013
Wednesday, April 03, 2013
R 3.0 supports long vector
It is very exciting news that the new R 3.0 supports long vectors! This is a major step toward a viable big data platform.
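For a sense of scale: before R 3.0.0 a vector could hold at most `.Machine$integer.max` elements, and "long vectors" are those longer than that. A short worked calculation of where that limit sits and how much memory the first long double vector needs:

```r
# Largest vector length allowed before R 3.0.0
old_limit <- .Machine$integer.max     # 2^31 - 1 = 2147483647
# Smallest length that requires a long vector
first_long_length <- old_limit + 1
# Memory for a double vector of that length (8 bytes per element), in GiB
gib_needed <- first_long_length * 8 / 2^30
gib_needed                            # 16
```

So long vectors only matter once a single object would exceed roughly 16 GiB of doubles, which is within reach of a workstation with 64 GB of memory.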
Sunday, March 31, 2013
System76 Linux computer
I recently bought a Linux desktop computer ("Leopard Extreme") from System76. It has a six-core Intel Core i7 CPU, 64 GB of memory, and an Intel SSD. I had to replace the pre-installed Ubuntu system with Mint to make it fully functional.
I really like this machine, and it certainly has made my data analysis work easier and more pleasurable.
Saturday, March 16, 2013
Save compilation time with RStan
It seems RStan compiles the program every time the "model_code =" argument of the "stan" function is used. The best way to avoid this is to compile it once (stan.model is the Stan program as a character string, data is a named list):
library(rstan)
fit1 <- stan(model_code = stan.model, chains = 1, data = data)
Then reuse the compiled model through the "fit" argument:
fit2 <- stan(fit = fit1, data = data, iter = 10000)
Or run the chains in parallel:
library(parallel)
sflist1 <-
  mclapply(1:4, mc.cores = 4,
           function(i) stan(fit = fit1, data = data, chains = 1,
                            chain_id = i, refresh = -1))
fit3 <- sflist2stanfit(sflist1)  # merge the single-chain fits into one
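The calls above assume that `stan.model` and `data` already exist. A minimal sketch of what they might look like for a simple linear regression (the model and the simulated data are hypothetical, chosen for illustration):

```r
# Stan program as a character string, to be passed as model_code
stan.model <- "
data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(alpha + beta * x, sigma);
}
"

# Named list whose elements match the data block above
set.seed(1)
N <- 100
x <- rnorm(N)
data <- list(N = N, x = x, y = 1 + 0.5 * x + rnorm(N, sd = 0.3))
```

Only the first stan() call pays the compilation cost; the later calls reuse the compiled model stored in fit1.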
Tuesday, March 12, 2013
Thursday, February 28, 2013
Sunday, February 10, 2013
Bayesian articles
This is a set of helpful articles on the historical development of Bayesian statistics.
Tuesday, February 05, 2013
Stan/Rstan new version
The new version (1.1.1) of Stan/Rstan is out. A lot of new models have been incorporated, judging from the user manual. This is exciting!
Monday, January 07, 2013