Shige's Research Blog

Wednesday, May 25, 2016

Thursday, March 31, 2016

Friday, February 26, 2016

Multiple imputation using R

R has a long list of packages for multiple imputation. The main problem is integration: statistical procedures in other packages may or may not work with the imputation procedures. I have been using Amelia together with Zelig; because they were written by the same group, they work well together. However, I have had trouble getting multiple imputation to work with the plm package. After searching the internet, I found this solution:

  1. Impute the missing data using Amelia or mice.
  2. Estimate the model on each imputed dataset.
  3. Use the mitools package to extract and combine the results.
Here is a simple example:
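library(mice)     # multiple imputation
library(mitools)  # imputationList(), MIextract(), MIcombine()
library(plm)      # panel-data models
# d is the data frame with missing values; mice() creates 5 imputations by default
imp <- mice(d)
# Collect the five completed datasets (positional arguments work across mice versions)
mydata <- imputationList(lapply(1:5, function(i) mice::complete(imp, i)))
# Fit the same model to each completed dataset
fit <- lapply(mydata$imputations, function(x) {
  plm(cog3pl ~ oc + grade9 + boy + han + ruralbirth, data = x,
      index = c("schids"), model = "pooling")
})
# Extract coefficients and covariance matrices, then combine them
betas <- MIextract(fit, fun = coef)
vars <- MIextract(fit, fun = vcov)
summary(MIcombine(betas, vars))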
I bet this will work for most, if not all, estimation procedures in R, as long as the fitted object has coef() and vcov() methods.
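
To see what the combining step does, here is a minimal sketch of Rubin's rules for a single coefficient in base R. The function name pool_rubin and the numbers are made up for illustration. As far as I understand it, MIcombine applies the same idea to the whole coefficient vector and also works out degrees of freedom, so treat this as an illustration of the idea rather than a replacement.

```r
# Pool one coefficient across m imputed-data fits using Rubin's rules.
# est: vector of m point estimates; se: vector of m standard errors.
# pool_rubin is a hypothetical helper, not part of mitools.
pool_rubin <- function(est, se) {
  m <- length(est)
  qbar <- mean(est)                 # pooled point estimate
  ubar <- mean(se^2)                # average within-imputation variance
  b <- var(est)                     # between-imputation variance
  total <- ubar + (1 + 1 / m) * b   # total variance
  c(estimate = qbar, se = sqrt(total))
}

# Made-up estimates from five imputed datasets (illustration only)
est <- c(0.52, 0.48, 0.55, 0.50, 0.45)
se  <- c(0.10, 0.11, 0.10, 0.12, 0.11)
pooled <- pool_rubin(est, se)
print(pooled)
```

The pooled standard error is larger than the average of the individual ones, because the between-imputation term carries the extra uncertainty from the missing data.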

Sunday, February 21, 2016

Another text analysis package

The quanteda package seems to be a serious contender for analyzing textual data in R.

Wednesday, February 10, 2016

RStudio 0.99.878 becomes official

RStudio 0.99.878 is now the official release. There is a server version in the AUR that can be used. However, the pandoc that comes with that version seems to have problems on Arch/Manjaro. The discussion here is very helpful. One easy solution is to replace the built-in pandoc with the one that comes with the system:

sudo mv /usr/lib/rstudio-server/bin/pandoc/pandoc /usr/lib/rstudio-server/bin/pandoc/pandoc_old
sudo mv /usr/lib/rstudio-server/bin/pandoc/pandoc-citeproc /usr/lib/rstudio-server/bin/pandoc/pandoc-citeproc_old

sudo ln -s /usr/bin/pandoc /usr/lib/rstudio-server/bin/pandoc/pandoc
sudo ln -s /usr/bin/pandoc-citeproc /usr/lib/rstudio-server/bin/pandoc/pandoc-citeproc

Friday, February 05, 2016

Alternate R Markdown Templates

These alternative R Markdown templates look great.

Tuesday, February 02, 2016

ggplot2 extensions

Here is a list of ggplot2 extension packages.

Tuesday, January 19, 2016

Notebook interface for everything

Zeppelin provides a unified interface for nearly all the major data processing engines, including Spark. I quickly set it up on a virtual machine and gave it a test run. It works great.

This one has SparkR support built-in.

I was never really into the IPython/Jupyter notebook, mainly because there is nothing it can do that a good IDE such as PyCharm or Rodeo cannot. Zeppelin is different: its ability to tightly integrate the different Spark front-ends, including Scala, Python, and R, is uniquely powerful. I would call this revolutionary.