Shige's Research Blog

Monday, August 15, 2016


The new sparklyr package from RStudio provides a convenient interface between R/RStudio and Spark. It runs well on Linux, and it also works on Windows with Spark 1.6.2 and lower. For some reason, it does not work with Spark 2.0 on Windows. I assume this will be fixed in subsequent releases.

Saturday, July 16, 2016

LaplacesDemon is back

Looks like the LaplacesDemon package is back. Now we have a pure R-based Bayesian computation platform.

Friday, July 01, 2016

Microsoft Analytics in 2016

Here is a thorough introduction to the data science solutions offered by Microsoft.

Saturday, June 18, 2016

Wednesday, May 25, 2016

Thursday, March 31, 2016

Friday, February 26, 2016

Multiple imputation using R

R has a long list of packages for multiple imputation. The main problem is integration: statistical procedures in other packages may or may not work with the imputation procedures. I have been using Amelia together with Zelig. Because they were written by the same group, they work well together. However, I have had trouble getting multiple imputation to work with the plm package. After searching the internet, I found the following solution:

  1. Impute the missing data using Amelia or mice.
  2. Estimate the model on each imputed data set.
  3. Use the mitools package to extract and combine the results.
Here is a simple example:
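library(mice)     # multiple imputation
library(mitools)  # imputationList(), MIextract(), MIcombine()
library(plm)      # panel data models

imp <- mice(d)  # d is the data frame with missing values; mice() creates m = 5 imputed data sets by default
# Collect the five completed data sets (calling complete() positionally works across mice versions)
mydata <- imputationList(lapply(1:5, function(i) mice::complete(imp, i)))
# Fit the same model on each completed data set
fit <- lapply(mydata$imputations, function(x) {
  plm(cog3pl ~ oc + grade9 + boy + han + ruralbirth, data = x,
      index = c("schids"), model = "pooling")
})
betas <- MIextract(fit, fun = coef)  # one coefficient vector per imputation
vars <- MIextract(fit, fun = vcov)   # one covariance matrix per imputation
summary(MIcombine(betas, vars))      # pool the results across imputations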
I expect this to work for most, if not all, estimation procedures in R, as long as the fitted model object has coef() and vcov() methods.
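
To make the combining step less of a black box, here is a minimal base R sketch of Rubin's rules, which is my understanding of what the pooling in step 3 computes. The function name pool_rubin and the toy numbers are made up for illustration; they are not part of mitools, and this is not a replacement for MIcombine (it skips the degrees-of-freedom calculation, for instance).

```r
# Pool m estimates of the same coefficient vector using Rubin's rules.
# betas: list of m numeric coefficient vectors (same length and order)
# vars:  list of m covariance matrices, one per imputation
# pool_rubin is a hypothetical helper written for this post.
pool_rubin <- function(betas, vars) {
  m <- length(betas)
  est <- do.call(rbind, betas)              # m x k matrix of estimates
  qbar <- colMeans(est)                     # pooled point estimate
  within <- Reduce(`+`, vars) / m           # average within-imputation covariance
  between <- stats::cov(est)                # between-imputation covariance
  total <- within + (1 + 1 / m) * between   # total covariance
  list(coefficients = qbar, variance = total, se = sqrt(diag(total)))
}

# Toy input: three made-up fits of a two-coefficient model (illustrative numbers only)
betas <- list(c(a = 1.0, b = 2.0), c(a = 1.2, b = 1.8), c(a = 0.8, b = 2.2))
vars <- list(diag(c(0.04, 0.09)), diag(c(0.04, 0.09)), diag(c(0.04, 0.09)))

pooled <- pool_rubin(betas, vars)
pooled$coefficients  # pooled estimates
pooled$se            # pooled standard errors
```

The lists returned by MIextract(fit, fun = coef) and MIextract(fit, fun = vcov) have the same shape as betas and vars here, so the sketch can be used to check the pooled estimates and standard errors by hand.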

Sunday, February 21, 2016

Another text analysis package

Quanteda seems to be a serious contender for analyzing textual data using R.