Sunday, December 25, 2016
Tuesday, December 13, 2016
Sunday, December 11, 2016
Monday, December 05, 2016
Thursday, December 01, 2016
Wednesday, November 30, 2016
Bigtop
Apache Bigtop is by far the easiest Hadoop distribution to install. I was able to get it to work on a CentOS VM. Unfortunately, the bundled Spark is 1.5.1, which does not work well with the sparklyr package. I have not yet figured out how to make the YARN manager work with a third-party copy of Spark, so I need to wait for the Bigtop distribution to upgrade.
Saturday, November 26, 2016
The importance of googling ...
I was trying to submit a manuscript today. The main manuscript was written in R Markdown, whereas the supplementary materials were in Word. The online submission system kept complaining about my PDF file. After some googling and tweaking, it turned out the problem was caused by the indicator function "\mathbbm{1}". Following the suggestion given here, I replaced it with "\mathds{1}". Problem solved!
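A minimal sketch of the swap, assuming the indicator previously came from the bbm package (`\usepackage{bbm}`); `\mathds` is provided by the dsfont package instead. My guess, not confirmed by the submission system, is that bbm's METAFONT-only fonts end up as bitmap Type 3 fonts in the PDF, while dsfont also ships Type 1 fonts. In R Markdown, the `\usepackage{dsfont}` line can go under `header-includes:` in the YAML front matter.

```latex
\documentclass{article}
% dsfont provides \mathds; this replaces \usepackage{bbm} and \mathbbm
\usepackage{dsfont}
\begin{document}
The indicator $\mathds{1}(x_i > c)$ equals 1 when $x_i > c$ and 0 otherwise.
\end{document}
```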
Sunday, November 20, 2016
Building Scalable Data Pipelines with Microsoft R Server and Azure Data Factory
Useful information on big data computation using the Microsoft platform.
Saturday, November 12, 2016
RStudio IDE Easy Tricks You Might’ve Missed
I missed a few of the tricks mentioned here. Very neat indeed!
Saturday, November 05, 2016
Saturday, October 22, 2016
Monday, October 17, 2016
Sunday, October 09, 2016
Real-World Machine Learning
I find this book very helpful. The introductory chapter is freely available.
Monday, August 15, 2016
Saturday, July 16, 2016
LaplacesDemon is back
It looks like the LaplacesDemon package is back. Now we have a pure R-based Bayesian computation platform.
Friday, July 01, 2016
Microsoft Analytics in 2016
Here is a thorough introduction to the data science solutions offered by Microsoft.
Saturday, June 18, 2016
Wednesday, May 25, 2016
Wednesday, April 06, 2016
Thursday, March 31, 2016
Friday, February 26, 2016
Multiple imputation using R
R has a long list of packages for multiple imputation. The main problem is integration: statistical procedures in other packages may or may not work with the imputation procedures. I have been using Amelia together with Zelig. Because they were written by the same group, they work well together. However, I have had trouble getting multiple imputation to work with the plm package. Searching the internet turned up a solution:
- Impute the missing data using Amelia or mice.
- Estimate the model on each imputed dataset.
- Use the mitools package to extract and combine the results.
Here is a simple example:
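library(mice)     # multiple imputation by chained equations
library(mitools)  # imputationList(), MIextract(), MIcombine()
library(plm)      # linear models for panel data
# ...
imp <- mice(d)    # m = 5 imputations by default
# Collect the completed datasets; complete(imp, i) returns the i-th one
mydata <- imputationList(lapply(seq_len(imp$m), function(i) complete(imp, i)))
# Fit the same pooled plm model to every imputed dataset
fit <- lapply(mydata$imputations, function(x) {
  plm(cog3pl ~ oc + grade9 + boy + han + ruralbirth, data = x,
      index = c("schids"), model = "pooling")
})
# Extract coefficients and covariance matrices, then pool them
betas <- MIextract(fit, fun = coef)
vars <- MIextract(fit, fun = vcov)
summary(MIcombine(betas, vars))
I bet this will work for most, if not all, estimation procedures in R.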
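For reference, MIcombine() pools the per-imputation results with Rubin's rules. Write m for the number of imputations (5 in the mice example), \hat{Q}_i for the coefficient vector and U_i for the covariance matrix from the i-th fit (the i-th elements of betas and vars). The pooled estimate \bar{Q} and its total variance T are then:

```latex
\begin{aligned}
\bar{Q} &= \frac{1}{m}\sum_{i=1}^{m} \hat{Q}_i, \qquad
\bar{U} = \frac{1}{m}\sum_{i=1}^{m} U_i, \\
B &= \frac{1}{m-1}\sum_{i=1}^{m} \left(\hat{Q}_i - \bar{Q}\right)\left(\hat{Q}_i - \bar{Q}\right)^{\top}, \\
T &= \bar{U} + \left(1 + \frac{1}{m}\right) B .
\end{aligned}
```

The pooled standard errors are the square roots of the diagonal of T. The between-imputation term (1 + 1/m)B is what widens them to reflect uncertainty about the missing values. This is also why the recipe works for any model that has coef() and vcov() methods.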
Sunday, February 21, 2016
Another text analysis package
Quanteda seems to be a serious contender for analyzing textual data using R.
Wednesday, February 10, 2016
RStudio 0.99.878 becomes official
RStudio 0.99.878 is now official. There is a server version in the AUR that can be used. However, the pandoc bundled with that version seems to have problems on Arch/Manjaro. The discussion here is very helpful. One easy solution is to replace the bundled pandoc with the one that comes with the system:
sudo mv /usr/lib/rstudio-server/bin/pandoc/pandoc /usr/lib/rstudio-server/bin/pandoc/pandoc_old
sudo mv /usr/lib/rstudio-server/bin/pandoc/pandoc-citeproc /usr/lib/rstudio-server/bin/pandoc/pandoc-citeproc_old
sudo ln -s /usr/bin/pandoc /usr/lib/rstudio-server/bin/pandoc/pandoc
sudo ln -s /usr/bin/pandoc-citeproc /usr/lib/rstudio-server/bin/pandoc/pandoc-citeproc
Friday, February 05, 2016
Tuesday, February 02, 2016
Tuesday, January 19, 2016
Notebook interface for everything
Zeppelin provides a unified interface for nearly all the major data processing engines, including Spark. I quickly set it up on a virtual machine and gave it a test run. It works great.
This one has SparkR support built-in.
I was never really into the IPython/Jupyter notebook, mainly because there is nothing it can do that a good IDE such as PyCharm or Rodeo cannot. Zeppelin is different: its ability to tightly integrate different Spark front-ends, including Scala, Python, and R, is uniquely powerful. I would call this revolutionary.
Thursday, January 14, 2016
R Users Will Now Inevitably Become Bayesians
Good post here. I would also add that the rethinking package is a third option that helps R users become Bayesians.