Saturday, December 25, 2010
Maximum Likelihood Estimation and Inference: With examples in R, SAS and ADMB
As the first textbook on ADMB, this book is highly expected.
Friday, December 17, 2010
Where to find good data sets
This post provides some good information on where to find good data sets for statistical analysis.
Thursday, December 09, 2010
Programming languages on the rise
This post list seven programming languages that are rising, from Python to Ruby to R to... Cobol.
Monday, December 06, 2010
Saturday, December 04, 2010
Comparison of results
I am doing a simple comparison of different estimation procedures in dealing with a simple binomial model. Here is where I got started:
---------------------------------------------
library(INLA)
library(npmlreg)
library(MCMCglmm)
library(DPpackage)
data(Seeds)
# Using INLA
formula = r ~ x1*x2 + f(plate, model="iid")
mod.inla = inla(formula, data=Seeds, family="binomial", Ntrials=n)
summary(mod.seeds)
# Using npmlreg
mod.ml <- alldist(cbind(r, n-r) ~ x1*x2 , random=~1, data=Seeds, family=binomial, random.distribution="gq")
summary(mod.ml)
# Using MCMCglmm
prior <- list(R=list(V=1, nu=0.002))
mod.mcmc <- MCMCglmm(cbind(r, n-r) ~ x1*x2, family="multinomial2", data=Seeds, prior=prior)
summary(mod.mcmc$Sol)
# Using DPpackage
---------------------------------------------
library(INLA)
library(npmlreg)
library(MCMCglmm)
library(DPpackage)
data(Seeds)
# Using INLA
formula = r ~ x1*x2 + f(plate, model="iid")
mod.inla = inla(formula, data=Seeds, family="binomial", Ntrials=n)
summary(mod.seeds)
# Using npmlreg
mod.ml <- alldist(cbind(r, n-r) ~ x1*x2 , random=~1, data=Seeds, family=binomial, random.distribution="gq")
summary(mod.ml)
# Using MCMCglmm
prior <- list(R=list(V=1, nu=0.002))
mod.mcmc <- MCMCglmm(cbind(r, n-r) ~ x1*x2, family="multinomial2", data=Seeds, prior=prior)
summary(mod.mcmc$Sol)
# Using DPpackage
-----------------------------------------------
I will keep updating by adding new things (estimation procedures, predictive simulations, etc.)
Wednesday, December 01, 2010
Tuesday, November 30, 2010
Wrapping text around figures in LaTeX
I found this page very helpful in showing how to create figures with text wrapping around. And this one for making tables.
I am using the LaTeX package by Bruce Donald for my next NIH proposal.
I am using the LaTeX package by Bruce Donald for my next NIH proposal.
Sunday, November 28, 2010
Top 5 Most Anticipated Tablets of 2010-2011
This is the list. What's interesting is that the Adam can be used as a tablet and a e-book reader and the reader mode is much more power efficient.
Friday, November 26, 2010
Computational tools for Bayesian analysis
The increasing number of R-oriented Bayesian computational tools such as MCMCpack, MCMCglmm, DPpackage, R-INLA, spBayes, have made BUGS less and less crucial for day to day Bayesian computation. Honestly, I cannot figure out a single analysis that BUGS can do but at least one of the above mentioned packages cannot.
Saturday, November 20, 2010
Wednesday, November 17, 2010
Emacs, LaTeX, and forward search
Saturday, November 13, 2010
Wednesday, November 10, 2010
PPA for Emacs
Wednesday, November 03, 2010
Sunday, October 31, 2010
Friday, October 29, 2010
Ggplot2, some tricks
Here is how to remove the grey in the background as well as the grid, a modification of the default black/white theme.
Lotus Symphony 3
A new version of Lotus Symphony is out. Interesting that IBM does not provide a 64-bit build for Linux.
Thursday, October 28, 2010
Friday, October 22, 2010
Useful blog
Managing a statistical analysis project
This post is very helpful for people who are serious in statistical analysis.
For a wider use of R
Two things that are crucial for a wider use of R among applied researchers. The first one is data manipulation/reshaping tool. I think the package "reshape" and "reshape2" have done good job and have largely removed the barrier. The second one is a universal output processing engine, something like the "eststo, esttab, estsadd" combo in Stata. The package "xtable", "memisc", "aprstable", and "estout" have made some progress, but they all are limited to some estimation procedures and none of them provides a universal output process engine for "all" estimation procedures. One of the main reason I am still paying for Stata is the convenience of directly producing publication quality LaTeX tables from any estimation procedures without having to manually change anything.
I wish more attention can be devoted to this latter issue.
I wish more attention can be devoted to this latter issue.
Thursday, October 21, 2010
Samsung Galaxy Tab
According to this source, the Samsung Galaxy Tab will cost about $600, which I think is a little too much.
Wednesday, October 20, 2010
The "tikzDevice" package
The tikzDevice package is quite amazing. Here are two graphs I just made, with (lower) and without (upper) using the tikzDevice package. The difference in quality is huge.
Since the LaTeX source file for the figure is quite large in size and may take significantly longer to compile, it makes sense to export the file into PDF file and include in the main LaTeX file in the usual way. That way, one gets the pleasing effect produced by TikZ and does not suffer from longer compilation time.
Also, the package "pgfSweave" is also very helpful if one wants to combine R sessions with LaTeX session.
Here is a tikz editor that works great on my 32-bit Ubuntu box but cannot be compiled on my 64-bit box. It can produce PDF, EPS, and PNG files directly from LaTeX code. TpX is another editor for tikz. It is currently Windows-only but a alpha version based on Lazarus is under development.
Some additional information is provided by this post.
Since the LaTeX source file for the figure is quite large in size and may take significantly longer to compile, it makes sense to export the file into PDF file and include in the main LaTeX file in the usual way. That way, one gets the pleasing effect produced by TikZ and does not suffer from longer compilation time.
Also, the package "pgfSweave" is also very helpful if one wants to combine R sessions with LaTeX session.
Here is a tikz editor that works great on my 32-bit Ubuntu box but cannot be compiled on my 64-bit box. It can produce PDF, EPS, and PNG files directly from LaTeX code. TpX is another editor for tikz. It is currently Windows-only but a alpha version based on Lazarus is under development.
Some additional information is provided by this post.
Monday, October 18, 2010
Ubuntu 11.04
Looks like Ubuntu 11.04 will bring some significant changes: new GCC, new Gnome, and new file system.
Friday, October 15, 2010
Tuesday, October 12, 2010
Labelbook
The Stata command "labelbook" can be used to create output that shows both the raw values and value labels side by side, a feature very helpful dealing with second-hand data sources.
Monday, October 04, 2010
Estimation of quantile treatment effects with Stata
Here is a very useful article published on Stata Journal.
Saturday, October 02, 2010
Friday, October 01, 2010
Using emacs with rcs
I find this blog post regarding how to use emacs and rcs together very helpful.
Thursday, September 30, 2010
Tuesday, September 28, 2010
Document Foundation
LibreOffice, a new OpenOffice branch. People have learned their lesson from what happened to OpenSolaris, I guess.
By the way, by carefully reading the "natbib" package manual, I think I figure out how to change the default in-text citation from "...(Caldwell, 1979)" to "..(Caldwell 1979)".
By the way, by carefully reading the "natbib" package manual, I think I figure out how to change the default in-text citation from "...(Caldwell, 1979)" to "..(Caldwell 1979)".
Sunday, September 26, 2010
BibTeX style file
Here is a helpful post about how to tailor .bst file, with links to may more resources.
Saturday, September 25, 2010
R2admb
The new version of this package (0.6) is quite serviceable already. It also comes with a 17-page document and some interesting examples.
Wednesday, September 22, 2010
Handbook on Impact Evaluation: Quantitative Methods and Practices
Here is a the book, did not receive very positive review from the Stata community though.
Tuesday, September 21, 2010
C++ matrix library
This is a help post regarding matrix computation in C++ using the Armadillo library. This library seems to have a more comprehensive collection of numerical routines.
Sunday, September 19, 2010
Saturday, September 18, 2010
Writing Scientific Software: a guide to good style
Here are some interesting and useful code templates.
Thursday, September 16, 2010
Improving R performance
The combination or Rcpp and Inline packages seems to be really useful to improve the performance of R, as suggested in this post. Inline also works with FORTRAN, which I like better for numerical computations.
Monday, September 13, 2010
LD_LIBRARY_PATH and alternative
I compiled and installed the most recent version of Armadillo library, and tested a sample program here. Both compilation and build went well, but when I tried to run the resulted program, I got an error message saying that the library files were not found. I followed the suggestions found here and created a new file " /etc/ld.so.conf.d/local.conf" which has only one line of contents:
/usr/local/lib64
Which seems to do the trick and,from my understanding, is better than have a line in my .bashrc for "LD_LIBRARY_PATH".
==================== EDITED SEPT. 24, 2010 ==============================
Now I realized that every time a new library is installed, I need to run "sudo sudo ldconfig".
==================== EDITED SEPT. 24, 2010 ==============================
Now I realized that every time a new library is installed, I need to run "sudo sudo ldconfig".
Friday, September 10, 2010
Autostart R console
An recent patch to the StatET fixed the autostart function. Now the R console starts automatically when I launch Eclipse, which is exactly what I want.
Eclipse 3.6 no longer has the "run last launched tools" shortcut key, but this autostart thing is probably even better.
Eclipse 3.6 no longer has the "run last launched tools" shortcut key, but this autostart thing is probably even better.
Oracle vs. Google: FSF's Response
FSF's response to the law suite between Oracle and Google deserves some attention.
Thursday, September 09, 2010
Revolution R Enterprise, some thoughts
I requested an academic copy of the Revolution R Enterprise today. Since they only have Windows or Redhat version, I installed it on an old spare machine dual-booting Linux and Windows. I like what they did with the IDE (the debugger is nice) and some of their tailored made packages dealing with large data and parallel processing. It should be a good choice to introduce R to students. I guess I am not surprised to see the Windows version is made using MS Visual Studio, but having to install a load of VS-related stuff just to install this relatively simple software (with respect to the GUI, of course) make me feel a little, ...unhappy.
I think this is a serious alternative (for the first time) to the "official" R on Windows platform. And Revolution is generous to let faculty and students to use their software for free. If I decide to teach my students R, this software is absolutely my first choice.
By the way, I was not able to get the package "MCMCglmm" running because the required package "tensorA" is not in their official repository, and the version I pulled off CRAN is no good.
============== UPDATES (SEPT. 11, 2010) ========================
The problem with MCMCglmm seems to be fixed for the W32 system: the package "tensorA" was added to the official repository so I can run my favorite package for mixed effect modeling now. I still cannot use it on Win64 system.
I think this is a serious alternative (for the first time) to the "official" R on Windows platform. And Revolution is generous to let faculty and students to use their software for free. If I decide to teach my students R, this software is absolutely my first choice.
By the way, I was not able to get the package "MCMCglmm" running because the required package "tensorA" is not in their official repository, and the version I pulled off CRAN is no good.
============== UPDATES (SEPT. 11, 2010) ========================
The problem with MCMCglmm seems to be fixed for the W32 system: the package "tensorA" was added to the official repository so I can run my favorite package for mixed effect modeling now. I still cannot use it on Win64 system.
Wednesday, September 08, 2010
Spell checking with Eclipse
I like Eclipse as a LaTeX and Sweave platform. This web page explains how to do spell checking with Eclipse, which got me confused for quite a while.
Saturday, September 04, 2010
Xubuntu 10.04
I finally decided to upgrade my 4-year old Sony SZ laptop from Ubuntu 9.04 to Xubuntu 10.04. The main reason was that it is getting more and more difficult to keep up with some of the software I use on daily basis on a distro that is almost two years old.
The new system is snappy, and this xfce-based distro works really on this old laptop.
The new system is snappy, and this xfce-based distro works really on this old laptop.
Thursday, September 02, 2010
Wednesday, September 01, 2010
The equinox theme is getting better and better
I especially like the new equinox evolution them and the equinox evolution squared window border. I am very pleased about my computer UI now.
Tuesday, August 31, 2010
Monday, August 30, 2010
Sunday, August 29, 2010
Tuesday, August 24, 2010
R-INLA
From a quick look at the web site, R-INLA seems to be able to carry out bayesian analysis without the actual MCMC sampling. This makes it possible to quickly try various different model specifications without hours of waiting time. This package looks very interesting and definitely deserves more attention.
Since the Linux version is binary, I have to manually set the two main files so they are allowed to run. I hope the team can soon make a version ready for CRAN.
Since the Linux version is binary, I have to manually set the two main files so they are allowed to run. I hope the team can soon make a version ready for CRAN.
Monday, August 23, 2010
Runjags
To my knowledge, there are three interface packages between R and JAGS: rjags, R2jags, and runjags. I just realized that with runjags, you don't have to have a separate JAGS model file but to embed it in your R source file. This certainly has its appeal and might be considered as a unique advantage of runjags.
Sunday, August 22, 2010
Introduction to Data Technologies
Here is another free data analysis book that can be used to teach students to work with different types of data.
Saturday, August 21, 2010
Monday, August 16, 2010
Revolution R Enterprise
I realized that the Revolution R Enterprise has only a version for Redhat Linux but not Ubuntu. Too bad.
Sunday, August 08, 2010
R package for hidden Markov model
The depmixS4 package estimates hidden Markov models, among many others. I found it very useful. Here is a short into. paper.
Wednesday, August 04, 2010
Integrating R with Geany
According to landroni, R can also be easily integrated with Geany:
Download 0.19 (non-optional), then you need to set send_selection_unsafe=true in geany.conf and assign "Send selection to terminal" to a ctrl+r keybinding or similar.
As long as your Geany installation has support for the embedded virtual terminal emulator (VTE), you're ready to go (as far as I know, this requirement excludes Windows). Start R in the terminal, write some R code and send the line or selection with ctrl+r.
As long as your Geany installation has support for the embedded virtual terminal emulator (VTE), you're ready to go (as far as I know, this requirement excludes Windows). Start R in the terminal, write some R code and send the line or selection with ctrl+r.
Monday, August 02, 2010
Sunday, August 01, 2010
NetLogo-R-Extension
Here is a bridge between NetLogo and R, which looks very interesting. But I was not able to get it work on my Ubuntu box (running the latest R).
Programming in NetLogo
Here is a nice introduction to NetLogo from a programming language perspective.
Thursday, July 29, 2010
Another tablet, looking good
Based on this post, Motorola is going to release an Android-based tablet in the coming fall.
Wednesday, July 28, 2010
Pdf support in ebook reader
Based on this review, the Sony reader (daily edition) has better pdf support than both nook and kindle, and should be the #1 choice for handling pdf books (now costs $249). Here is a more detailed explanation of how sony reader deals with pdf files.
This web site deserves some attention, and this is an excellent review.
This web site deserves some attention, and this is an excellent review.
Monday, July 26, 2010
Ebook reader price cut
All major ebook readers have recently experienced major price cut (for example, Kindle DX, Nook). It seems that the Apple Ipad has something to do with it.
Looking forward to HP's webOS-based slate computer
From what I have read, webOS is a really cool and mature Linux-based OS for handhold device. Since HP is planning to release a webOS-based slate computer ("palmpad"?) later this year, I think it will be worth waiting for.
Tuesday, July 20, 2010
Monday, July 19, 2010
Flexmix taks long time...
I am running a mixture logit model on a sample of 10,000 cases. The model takes a long time (3-4 hours) to run, which makes it nearly impossible to try different specifications and such. It probably makes sense to use a subsample for exploratory analysis then fit the final model on the full sample.
This post describes how to draw a random sample in R.
This post describes how to draw a random sample in R.
Sunday, July 18, 2010
Npmlreg and Flexmix
In my attempt to estimate a mixture regression model for the long-term impact of prenatal famine exposure, I compared results obtained from NPMLREG and Flexmix using the "rainfall" data set came with the package "forward" and was able to achieve identical results from:
m.np <- allvc(cbind(Cases, Total-Cases) ~ Rain, random= ~1|ID, data=rainfall, k=2, family=binomial(logit))
and
m.mix <- stepFlexmix(cbind(Cases, Total-Cases) ~ 1|ID, model=FLXMRglmfix(family="binomial", fixed=~Rain), k=2, nrep=5, data=rainfall)
To test the random effect model, I created the new variable ID by:
rainfall$ID <- seq(from=1, to=34, by=1)
With Flexmix, it is possible to go a step further and let the treatment variable to vary between latent groups by:
m.mix <- stepFlexmix(cbind(Cases, Total-Cases) ~ Rain|ID, model=FLXMRglmfix(family="binomial"), k=2, nrep=5, data=rainfall)
Saturday, July 17, 2010
Latent variable analysis package for R
This is another R package for latent variable analysis. I hope it can soon include multilevel and mixture modeling capacity. Another promising package is OpenMx, which is also under active development and very promising. At this moment, OpenMx seems to have more features than lavaan (e.g. categorical dependent variable, mixture model, etc.), but lavaan promises to include them in the next year or so.
Tuesday, July 13, 2010
Online LaTeX table generater
It probably uses some Perl or Python script to add all the formatting marks, very convenient.
Flexmix for binary dependent variable
It looks that FlexMix does not handle binary dependent variables directly, you have to transform it into "cbind(event, total-event)" format.
Monday, July 05, 2010
Convergence diagnostics with MCMCglmm
MCMCglmm does not produce more than one chain. So the way to check convergence is to run the model twice, each with different starting values, then conduct the diagnostic with:
gelman.diag(m1$Sol, m2$Sol)
Handling missing data in R
Some estimation procedures, including MCMCglmm, does not handle missing values directly. Before estimating these models, missing values need to be excluded. From this post, the command to do this is:
newdata <- na.omit(mydata)
Monday, June 28, 2010
Thursday, June 24, 2010
A few words describing population projection
Projection in demography is calculating survivors down cohort lines of those living at a given point in time, calculating births in each successive period, and adding a suitable allowance for migration. Of the various ways of looking at population dynamics, the one most convenient for the present purpose is the matrix approach (Chapter 7). Such calculations were used, before Leslie (1945) expressed them in matrix form, by Cannan (1895), Bowley (1924), and especially Whelpton (1936).
----------------------------------------------Page 272, Applied Mathematical Demography
Here is some intro. materials, implemented in Stata. Even though I have been using Stata for a long time, I have never spent any timing learning Mata, too bad.
----------------------------------------------Page 272, Applied Mathematical Demography
Here is some intro. materials, implemented in Stata. Even though I have been using Stata for a long time, I have never spent any timing learning Mata, too bad.
Wednesday, June 23, 2010
Using World Bank data in R
This R package makes it easy to use data and information from World Bank.
A simple JAGS demo
Here is a simple JAGS demo. The "progress.bar="gui"" option for the "coda.samples()" function is very convenient, but was not documented in the manual.
OpenBUGS and rbugs package
I just realized that OpenBUGS 3.10 has a native Linux version and it can be called directly from within R via the package "rbugs". OpenBUGS is probably closer to WinBUGS than JAGS, and it is always nice to alternatives!
It will be fun to do some benchmark test between OpenBUGS and JAGS.
It will be fun to do some benchmark test between OpenBUGS and JAGS.
Tuesday, June 22, 2010
Making glmmBUGS work with JAGS
To streamline the process of making glmmBUGS to work with JAGS, I made changes to the "writeBugsModel.R" and replaced all "inprod2" with "inprod" and "dflat()" with "dnorm(0.0,1.0E-6)". I will spend some time looking at other sources files and see if anything else should be changed.
JAGS' glm module
I did not realize that new GLM module for JAGS 2.0 can be so effective in reducing the running time. Using the example I posed it yesterday (an example from the glmmBUGS package), without loading the glm module, it took 20.223 minutes to draw 10,000 samples from the posterior distribution; with the glm module loaded, it took 11.869 to get the same results:
Without glm module:
user system elapsed
20.170 0.020 20.223
With glm module:
user system elapsed
11.040 0.000 11.869
Without glm module:
user system elapsed
20.170 0.020 20.223
With glm module:
user system elapsed
11.040 0.000 11.869
Dflat() prior
Here is an example of replacing the "dflat()" prior with "dnorm()" prior, yet without much explanation.
Monday, June 21, 2010
A question
I posted a question to the R-Mixed-Model mailing list but have not had any responses yet:
---------------------------------------------------------------------------------------
Here is the answer from Jens Åströ
---------------------------------------------------------------------------------------
Hi!
Not an expert by any means, but I got it to run by doing this:
1) change inprod2 to inprod in the model file (JAGS does not have
inprod2 function)
2) change ~dflat() to other uninformative prior, e.g.
~dnorm(0.0,1.0E-6) (Jags does not have dflat distribution)
3) specify/compile the model with e.g.
bac.jags<-jags.model("model.bug",data=bacrag$ragged,n.chains=4)
4) update it, update(bac.jags,1000)
5) Collect coda samples,
bacResult<-coda.samples(temp,names(getInits()),n.iter=10000,thin=10)
This seemed to converge well, (gelman.diag(bacResult))
Good luck
----------------------------------------------------------------------------------------
It works.
---------------------------------------------------------------------------------------
Dear All, I wonder if anybody has tried to make glmmBUGS work with JAGS. My attempt was not successful. Here is the simple example I copied from the glmmBUGS Vignettes: ------------------------ library(MASS) data(bacteria) bacterianew <- bacteria bacterianew$yInt = as.integer(bacterianew$y == "y") levels(bacterianew$trt) <- c("placebo", "drug", "drugplus") library(glmmBUGS) bacrag <- glmmBUGS(formula = yInt ~ trt + week, data=bacterianew, effects = "ID", modelFile="model.bug", family="bernoulli") names(bacrag$ragged) source("getInits.R") startingValues = bacrag$startingValues ------------------------ With WinBUGS, it runs well: ------------------------ library(R2WinBUGS) bacResult1 = bugs(bacrag$ragged, getInits, model.file="model.bug", n.chain=3, n.iter=2000, n.burnin=100, parameters=names(getInits()), n.thin=10) ------------------------ But with JAGS, I got error message "Error in FUN(50L[[1L]], ...) : invalid first argument" ------------------------ library(R2jags) jags.parms=names(getInits()) bacResult = jags(data=bacrag$ragged, n.chain=3, n.iter=2000, model.file=model.bug) ------------------------ I hope somebody can help me figuring out how to make this work. Many thanks. Best, Shige---------------------------------------------------------------------------------------
Here is the answer from Jens Åströ
---------------------------------------------------------------------------------------
Hi!
Not an expert by any means, but I got it to run by doing this:
1) change inprod2 to inprod in the model file (JAGS does not have
inprod2 function)
2) change ~dflat() to other uninformative prior, e.g.
~dnorm(0.0,1.0E-6) (Jags does not have dflat distribution)
3) specify/compile the model with e.g.
bac.jags<-jags.model("model.bug",data=bacrag$ragged,n.chains=4)
4) update it, update(bac.jags,1000)
5) Collect coda samples,
bacResult<-coda.samples(temp,names(getInits()),n.iter=10000,thin=10)
This seemed to converge well, (gelman.diag(bacResult))
Good luck
----------------------------------------------------------------------------------------
It works.
Saturday, June 19, 2010
Thursday, June 17, 2010
Sunday, June 13, 2010
The BIAS project
The BIAS project has some useful resources and information. And here is a course on spatial analysis using R.
Thursday, June 10, 2010
Wednesday, June 09, 2010
When Linux stopped responding ...
My laptop (running Ubuntu 9.04) stopped responding when I was running update and doing many other things. This post saved my day.
Sunday, June 06, 2010
Using Multilevel Models to Analyze Couple and Family Treatment Data: Basic and Advanced Issues
Rarely published papers make their source code open and well organized. Here is an exception.
Friday, June 04, 2010
Friday, May 21, 2010
Bayesian disease mapping
Here is a slide showing how to make R to extract results from WinBUGS and show them in maps.
Thursday, May 20, 2010
Sunday, May 16, 2010
Bayesian change point analysis
Here is an example using MCMCpack. The same model can be done with WinBUGS (and thus JAGS), as demonstrated in "Bayesian Computation with R".
Thursday, May 13, 2010
Sunday, May 09, 2010
Sweave with Emacs and ESS, problem solved!
With help from Seb, the nagging problem I had when trying to sweaving with Emacs and ESS (http://old.nabble.com/ESS-and-Sweave-td28339734.html) has been solved.
First of all, apply the patch for ess-swv.el, provided by Seb. Second, set the default pdf viewer for ESS by "C-h v ess-pdf-viewer-pref ", then change the value from "nill" to "evince". Now I can directly sweave my Rnw document by pressing "M-n s", and see the pdf output by pressing "M-s P".
It has been a quite pleasant experience interacting with free software developer/maintainer.
By the way, Emacs 23.2 just got out, and it looks great on my Ubuntu box (the font scale problem that comes with the proprietary Nvidia driver is gone).
First of all, apply the patch for ess-swv.el, provided by Seb. Second, set the default pdf viewer for ESS by "C-h v ess-pdf-viewer-pref ", then change the value from "nill" to "evince". Now I can directly sweave my Rnw document by pressing "M-n s", and see the pdf output by pressing "M-s P".
It has been a quite pleasant experience interacting with free software developer/maintainer.
By the way, Emacs 23.2 just got out, and it looks great on my Ubuntu box (the font scale problem that comes with the proprietary Nvidia driver is gone).
Friday, May 07, 2010
OpenOffice quick-starter caused Ubuntu unable to shutdown?
According to this post, the shutdown problem on Ubuntu 10.04 was caused by the Openoffice quick-starter program. Once the program is closed, the problem goes away. So strange.
Thursday, May 06, 2010
Problem with the proprietary Nvidia driver
The reason why my ubuntu 10.04 box cannot shutdown is the proprietary nvidia driver. I learned this the hard way: I backed up all my data and wiped out my existing installation and installed a fresh new copy, and experimented step by step. I was able to replicate the problem when I enabled the proprietary driver, and the problem when away when I disabled it.
Although I cannot enable the desktop effects, at least I have a stable and efficient computational environment now.
-------------------------------- May 8 ----------------------------------------------
This is not true. The problem with shutdown is caused by OpenOffice quick starter program, not the proprietary Nvidia driver. After enabling the driver, 1) boot-up time increased from 20s to 25s, 2) Emacs fonts increased several fold.
Although I cannot enable the desktop effects, at least I have a stable and efficient computational environment now.
-------------------------------- May 8 ----------------------------------------------
This is not true. The problem with shutdown is caused by OpenOffice quick starter program, not the proprietary Nvidia driver. After enabling the driver, 1) boot-up time increased from 20s to 25s, 2) Emacs fonts increased several fold.
Wednesday, May 05, 2010
Monday, May 03, 2010
Compile aML with GCC 4.4.3 (64-bit)
I was able to successfully compile aML with GCC 4.4.3, which is part of Ubuntu 10.04. I only need to make changes on two files to avoid warning messages: "encrypt.f" (replacing "data str95/'Y,h=(ufSBF&H6W!LZ|boq1yJsX^Ax+K3U5r[mDlNOg7jn;-V:z]k/e *)v#2GQC$t''{c\\">%@~w?8dMREiT0a*9}<`\\._4PIp'/" with "data str95/'Y,h=(ufSBF&H6W!LZ|boq1yJsX^Ax+K3U5r[mDlNOg7jn;-V:z]k/e *)v#2GQC$t''{c\">%@~w?8dMREiT0a*9}<`\._4PIp'/") and "getdatainfo.f" (replacing "data backslash/'\\'/" with "data backslash/'\'/")
Sunday, May 02, 2010
Saturday, May 01, 2010
Friday, April 30, 2010
Ubuntu 10.04 cursor issue
I am having the same problems as these guys. The workaround seems to be to disable the desktop effect first, change the cursor, then enable desktop effect back. Hope to have a more convenient way way soon.
One good thing with the new release is that the annoying sound comes when the login screen shows up can be disabled. I tried many ways but still could not disable it with 9.10.
One good thing with the new release is that the annoying sound comes when the login screen shows up can be disabled. I tried many ways but still could not disable it with 9.10.
Upgrade to Ubuntu 10.04
Just upgraded my Ubuntu box from 9.10 to 10.04. The process went all right, the overall system performance seems satisfactory, but I did not like the change to move the window controls from upper right side to the upper left side.
And I am still using the Clearlook theme.
And I am still using the Clearlook theme.
JAGS for Bayesian analysis
One of the many reasons I like Jackman's book "Bayesian Analysis for the Social Sciences" is that it relies on JAGS as the primary computational platform (as opposed to WinBUGS). From what I know, it is the first Bayesian book that does this.
Thursday, April 29, 2010
Wednesday, April 28, 2010
Sweave vs. pgfSweave
I think pgfSweave should replace the default of Sweave system because it brings significant improvements over the original Sweave. I certain have made it my default Sweave tool on my system.
------------------------------- EDITED ON MAY 1, 2010 --------------------------------------
Since StatET made Sweaving so simple and easy, I don't have incentive to work directly on LaTeX documents any more. It is actually much simpler and more intuitive to work directly on a Sweave documents and see results in the final PDF output.
------------------------------- EDITED ON MAY 1, 2010 --------------------------------------
Since StatET made Sweaving so simple and easy, I don't have incentive to work directly on LaTeX documents any more. It is actually much simpler and more intuitive to work directly on a Sweave documents and see results in the final PDF output.
Tuesday, April 27, 2010
StatET does not work with OpenJDK
After several failed attempts to get StatET to work on my older Ubuntu box, I realized that it does not work with OpenJDK. After replacing the default OpenJDK with the official Sun JDK, everything works fine.
Word wrap in eclipse
Friday, April 23, 2010
Trouble with ESS and Sweave
Last time I tried to sweave a document from with Emacs+ESS, I was using an earlier version of ESS (the current version is 5.8), and things seemed to be fine. Today when I tried to sweave a simple document and produced PDF output, I got error message of ".pdf": exited abnormally with code 2". I posted a message to the ESS mailing list, but so far no solutions have been reported.
Here is a very simple .Rnw file I used to reproduce the error:
Here is a very simple .Rnw file I used to reproduce the error:
--------------------------------------- \documentclass[12pt]{article} %\usepackage{tikz} %\usepackage{fullpage} \usepackage{Sweave} \begin{document} This document outlines the analysis to be carried out to look at the relationship between prenatal famine exposure and fetal loss risk. \end{document} -----------------------------------------
Before this can be fixed, I will rely on Eclipse for all my sweaving.
---------------- April 26 ------------------------
Here is my own solution:
from line 177 of ess-svy.el:- (shell-command (concat - (if (and ess-microsoft-p (w32-shell-dos-semantics)) - "start \"" pdfviewer "\" \"" namestem ".pdf\"" - "\"" pdfviewer "\" \"" namestem ".pdf\" &")))) + (shell-command (concat+ "\"" pdfviewer "\" \"" namestem ".pdf\"")))
Rumour: Apple To Acquire ARM?
I believe that this, if not a rumor, is against most people's interest, manufactures and consumers alike.
Tuesday, April 20, 2010
Monday, April 12, 2010
Rampant cheating hurts China's research ambitions
This is interesting. Smells almost like the Great Leap Forward.
Friday, April 09, 2010
GLMM using DPpackage
I was able to fit a semi-parametric Bayesian GLMM model using DPpackage. It took me many hours to sample from the posterior distribution (DPM prior):
MCMC scan 1000 of 5000 (CPU time: 18950.080 s)
MCMC scan 2000 of 5000 (CPU time: 22510.100 s)
MCMC scan 3000 of 5000 (CPU time: 28293.830 s)
MCMC scan 4000 of 5000 (CPU time: 35111.930 s)
MCMC scan 5000 of 5000 (CPU time: 46726.330 s)
Which translates to 5.26, 6.25, 9.75, 12.98 hours. This makes it less suitable for routine (especially exploratory) data analysis.
I compared the results from DPpackage and that from MCMCglmm, and they are not that different, and the latter took only a small fraction of the time required by the former!
The lack of difference in results puzzled me. I compared from results from random effect logistic regression assuming Gaussian random effect and results from NPML, assuming a nonparametric distribution of the random effect, the differences are quite significant.
---------------------------- UPDATED ON APRIL 11 ----------------------------------------------------------------
Using DP prior instead of DPM prior, it took about 4.7 hours to run the model. The results are slightly different and the parameter I am interested in increased from .41 to .42. Now I am trying PT prior and see how it goes.
DPpackage is a exciting new tool for applied researchers, and A LOT OF new and cool things can be done with it. With convenient new Bayesian tools like MCMCpack, MCMCglmm, and DPpackage, I will not be surprised to see more Bayesian publications coming out in social sciences.
I compared the results from DPpackage and that from MCMCglmm, and they are not that different, and the latter took only a small fraction of the time required by the former!
The lack of difference in results puzzled me. I compared from results from random effect logistic regression assuming Gaussian random effect and results from NPML, assuming a nonparametric distribution of the random effect, the differences are quite significant.
---------------------------- UPDATED ON APRIL 11 ----------------------------------------------------------------
Using DP prior instead of DPM prior, it took about 4.7 hours to run the model. The results are slightly different and the parameter I am interested in increased from .41 to .42. Now I am trying PT prior and see how it goes.
DPpackage is a exciting new tool for applied researchers, and A LOT OF new and cool things can be done with it. With convenient new Bayesian tools like MCMCpack, MCMCglmm, and DPpackage, I will not be surprised to see more Bayesian publications coming out in social sciences.
Tuesday, April 06, 2010
Five open source alternatives to the iPad
Here you can find some alternatives to Apple's ipad, running on Linux.
Monday, April 05, 2010
Sunday, April 04, 2010
A Practical Guide to Geostatistical Mapping
Good book and good introduction to geostatistical modeling.
Friday, April 02, 2010
BAT: Bayesian analysis toolkit
BAT is a Bayesian analysis software. Looks like it needs quite some programming to get anything useful done.
----------------------- added on April 4 --------------------------
I seem to have troubles compiling it on my 64-bit Ubuntu box.
----------------------- added on April 4 --------------------------
I seem to have troubles compiling it on my 64-bit Ubuntu box.
Thursday, April 01, 2010
Apple got slapped
According to this post, Apple does not seem to be the first one to have the idea of "multitouch".
Wednesday, March 31, 2010
Econometrics Toolbox for Matlab
The Econometrics Toolbox for Matlab has just been updated. This is a quite comprehensive applied econometrics toolkit and I found it to be very useful.
Sunday, March 28, 2010
Difference in differences
The DID estimator is a useful tool to obtain a more fine-grained estimate of famine exposure on a variety of health and demographic outcomes, as long as the famine severity measure is good. However, in my case (fecundity), the provincial-level excess famine mortality level does not see to work well. I suspect that the provincial-level mortality measure is too crude and may not be able to adequately reflect individual's nutritional status.
Monday, March 22, 2010
Making Simple R Package
This post explains how to create a simple R package that contains R and FORTRAN source code.
Friday, March 19, 2010
The "apsrtable" is getting better
Now we have both "memisc" and "apsrtable" for producing publication-ready tables from model results.
Thursday, March 18, 2010
BUGS language: strength and weakness
Here is an informative introduction to the strength and weakness of the BUGS language in conducting Bayesian data analysis. Since my analysis often utilize large data (tens of thousands, even millions), BUGS may not be a viable option in many cases. I hope the soon-to-come new version of JAGS can overcome some of the weaknesses mentioned in Gelman's post.
Wednesday, March 17, 2010
Stata 11
Got my Stata 11 today, was a little disappointed because the new program editor feature is not available on Linux platform.
I like the new multiple imputation feature though.
I like the new multiple imputation feature though.
Measuring the length of time to run a function
This post describes how to time the run time of a R function.
Tuesday, March 16, 2010
Sunday, March 14, 2010
Here comes Linux's iPad clones
Linux-based IPad clone will soon emerge and will dominate the market, based on this post.
Saturday, March 13, 2010
Thursday, March 11, 2010
Wednesday, March 10, 2010
Saturday, March 06, 2010
OpenBUGS has a new home
Looks like OpenBUGS now has a new home! And now the command line tools can actually run on my Linux box.
Friday, March 05, 2010
GLMM revisted
A short while ago, I reported some discrepancies between the results produced by "lme4" and other R packages as well as Stata. Today I upgraded to the most recent version of "lme4a" and re-ran my model. The error of false convergence disappeared and the new results are very much in line with other packages.
I am glad that the problem eventually got sorted out.
For this version, the adaptive GH quadrature has not been implemented yet; so the only option to estimate a GLMM is laplace transformation.
I am glad that the problem eventually got sorted out.
For this version, the adaptive GH quadrature has not been implemented yet; so the only option to estimate a GLMM is laplace transformation.
Tuesday, March 02, 2010
Several screenshots
After some tweaking, I finally got Eclipse+StatET working the way I want. The first figure shows a running R session; the second figure shows the maximized running R session. The following shortcut keys are very helpful:
- Ctrl + A: select what you to run;
- Ctrl + R + R: run the selected program then go to the R console;
- Ctrl + M: maximize the R console;
- Ctril + T + 0: run Sweave of pgfSweave.
I like the way Eclipse organize the project and windows.
Creating Beamer Handouts with Notes
This post provides some very helpful information. Unfortunately, the package is not designed to incorporate author's notes into the printed outline; instead, it is designed to print blank outline pages so the audience can jot down their own notes there.
Saturday, February 27, 2010
New version of Eclipse
A new version of Eclipse, v.3.5.2 is out. This one works better on my 64-bit Linux machine in general and works very well with StatET.
An interesting paper
Ben Bolker has an interesting paper (outline of a paper) comparing different approaches to estimate GLMM in R environment, which is very helpful to what I am doing right now.
The paper pointed out the following options to fit GLMM using R:
The paper pointed out the following options to fit GLMM using R:
- glmer
- glmmML
- glmm (from the repeated package)
- glmmADMB
- MCMCglmm
- glmmBUGS
- glmmPQL
- BUGS (through R2WinBUGS)
- glmmAK
And I would like to add one more, npmlreg.
I am not aware of the glmmAK package before. From the first glance, it seems to be very promising in the sense that it seems to allow non-Gaussian random effect in a Bayesian framework, something similar to what npmlreg does with ML method.
============== edited on March 3 ====================
DPpackage is another package that can estimate GLMM in a Bayesian framework.
============== edited on March 5 ====================
ASReml/ASReml-R is another choice. It is not free software, but it does seem to have some unique strengths. Maybe I should download a demo copy and try it myself.
============== edited on March 29 ===================
hglm is another possibility.
I am not aware of the glmmAK package before. From the first glance, it seems to be very promising in the sense that it seems to allow non-Gaussian random effect in a Bayesian framework, something similar to what npmlreg does with ML method.
============== edited on March 3 ====================
DPpackage is another package that can estimate GLMM in a Bayesian framework.
============== edited on March 5 ====================
ASReml/ASReml-R is another choice. It is not free software, but it does seem to have some unique strengths. Maybe I should download a demo copy and try it myself.
============== edited on March 29 ===================
hglm is another possibility.
Beamer themes
I did not know there are so many different themes that can be used for a Beamer presentation slide.
Friday, February 26, 2010
When using open source makes you an enemy of the state
According to this post, a US copyright lobby has long trying to make the case that using open source software is the same thing as using pirate software.
Thursday, February 25, 2010
Wednesday, February 24, 2010
Tuesday, February 23, 2010
Monday, February 22, 2010
Theory without code
As a demographer, I would not pay much attention to statistical methodological papers that do not describe how to implement their methodology either using existing software or using their own code. The reason is simple: I don't have the time and patience to gain a understanding of the methodology deep enough to be able to implement it myself; after all, it is the substantive research questions in demography that interest me most!
Open source statistical platform such as R, WinBUGS, and JAGS have made it so easy for statisticians to code their ideas into readily usable software, so the question is: why not?
Open source statistical platform such as R, WinBUGS, and JAGS have made it so easy for statisticians to code their ideas into readily usable software, so the question is: why not?
More option for cure model
Looks like the cure model can also be estimated using another R package GAMLSS.
de Castro, M., V. G. Cancho, and J. Rodrigues. 2009. “A hands-on approach for fitting long-term survival models under the GAMLSS framework.” Computer Methods and Programs in Biomedicine.
Saturday, February 20, 2010
A few software packages for mixed models
Less well-known but definitely deserve attention:
All these softwares are free to use, but unfortunately none of them seem to be open sourced.
Information provided by this web site is very helpful.
Information provided by this web site is very helpful.
GLMM for ecologists and evolutionary biologists
This web site is devoted to the use of GLMM in the filed of evolutionary biologists.
Friday, February 19, 2010
Thursday, February 18, 2010
Tuesday, February 16, 2010
Generalized linear mixed effect model problem
I am trying to compare cohort difference in infant mortality using generalized linear mixed model. I first estimated the model in Stata:
xi:xtlogit inftmort i.cohort, i(code)
which converged nicely:
Fitting comparison model:
Iteration 0: log likelihood = -1754.4476
Iteration 1: log likelihood = -1749.3366
Iteration 2: log likelihood = -1749.2491
Iteration 3: log likelihood = -1749.2491
Fitting full model:
tau = 0.0 log likelihood = -1749.2491
tau = 0.1 log likelihood = -1743.8418
tau = 0.2 log likelihood = -1739.0769
tau = 0.3 log likelihood = -1736.4914
tau = 0.4 log likelihood = -1739.5415
Iteration 0: log likelihood = -1736.4914
Iteration 1: log likelihood = -1722.6629
Iteration 2: log likelihood = -1694.9114
Iteration 3: log likelihood = -1694.6509
Iteration 4: log likelihood = -1694.649
Iteration 5: log likelihood = -1694.649
Random-effects logistic regression Number of obs = 21694
Group variable: code Number of groups = 10789
Random effects u_i ~ Gaussian Obs per group: min = 1
avg = 2.0
max = 9
Wald chi2(2) = 8.05
Log likelihood = -1694.649 Prob > chi2 = 0.0178
------------------------------------------------------------------------------
inftmort | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Icohort_2 | -.5246846 .1850328 -2.84 0.005 -.8873422 -.1620269
_Icohort_3 | -.1424331 .140369 -1.01 0.310 -.4175513 .132685
_cons | -5.214642 .1839703 -28.35 0.000 -5.575217 -4.854067
-------------+----------------------------------------------------------------
/lnsig2u | .9232684 .1416214 .6456956 1.200841
-------------+----------------------------------------------------------------
sigma_u | 1.586665 .1123528 1.381055 1.822885
rho | .4335015 .0347791 .3669899 .5024984
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) = 109.20 Prob >= chibar2 = 0.000
Then I tried the same model using lme4:
m2 <- glmer(inftmort ~ (1|code) + as.factor(cohort), family=binomial, data=d)
I got:
Warning message:
In mer_finalize(ans) : false convergence (8)
And the results are quite different from what I got from Stata. I tried to estimate the model using "glmmML" and also got into trouble. This time the error message is:
Warning message:
In glmmML.fit(X, Y, weights, cluster.weights, start.coef, start.sigma, :
Hessian non-positive definite. No variance!
I then tried MCMCglmm:
prior=list(R=list(V=1, nu=0, fix=1), G=list(G1=list(V=1, nu=0)))
m2 <- MCMCglmm(inftmort ~ 1 + as.factor(cohort), random=~code, family = "categorical", data=d, prior=prior)
It seems to be working and produced estimates that are comparable to what Stata produced (not identical, of course):
Iterations = 3001:12991
Thinning interval = 10
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
(Intercept) -5.7145 0.1805 0.005708 0.02295
as.factor(cohort)2 -0.5633 0.1788 0.005653 0.02194
as.factor(cohort)3 -0.1888 0.1471 0.004653 0.01912
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
(Intercept) -6.0464 -5.8324 -5.7251 -5.61120 -5.2817
as.factor(cohort)2 -0.8977 -0.6985 -0.5551 -0.43084 -0.2371
as.factor(cohort)3 -0.4614 -0.2947 -0.1867 -0.09644 0.1232
This is puzzling.
xi:xtlogit inftmort i.cohort, i(code)
which converged nicely:
Fitting comparison model:
Iteration 0: log likelihood = -1754.4476
Iteration 1: log likelihood = -1749.3366
Iteration 2: log likelihood = -1749.2491
Iteration 3: log likelihood = -1749.2491
Fitting full model:
tau = 0.0 log likelihood = -1749.2491
tau = 0.1 log likelihood = -1743.8418
tau = 0.2 log likelihood = -1739.0769
tau = 0.3 log likelihood = -1736.4914
tau = 0.4 log likelihood = -1739.5415
Iteration 0: log likelihood = -1736.4914
Iteration 1: log likelihood = -1722.6629
Iteration 2: log likelihood = -1694.9114
Iteration 3: log likelihood = -1694.6509
Iteration 4: log likelihood = -1694.649
Iteration 5: log likelihood = -1694.649
Random-effects logistic regression Number of obs = 21694
Group variable: code Number of groups = 10789
Random effects u_i ~ Gaussian Obs per group: min = 1
avg = 2.0
max = 9
Wald chi2(2) = 8.05
Log likelihood = -1694.649 Prob > chi2 = 0.0178
------------------------------------------------------------------------------
inftmort | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Icohort_2 | -.5246846 .1850328 -2.84 0.005 -.8873422 -.1620269
_Icohort_3 | -.1424331 .140369 -1.01 0.310 -.4175513 .132685
_cons | -5.214642 .1839703 -28.35 0.000 -5.575217 -4.854067
-------------+----------------------------------------------------------------
/lnsig2u | .9232684 .1416214 .6456956 1.200841
-------------+----------------------------------------------------------------
sigma_u | 1.586665 .1123528 1.381055 1.822885
rho | .4335015 .0347791 .3669899 .5024984
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) = 109.20 Prob >= chibar2 = 0.000
Then I tried the same model using lme4:
m2 <- glmer(inftmort ~ (1|code) + as.factor(cohort), family=binomial, data=d)
I got:
Warning message:
In mer_finalize(ans) : false convergence (8)
And the results are quite different from what I got from Stata. I tried to estimate the model using "glmmML" and also got into trouble. This time the error message is:
Warning message:
In glmmML.fit(X, Y, weights, cluster.weights, start.coef, start.sigma, :
Hessian non-positive definite. No variance!
I then tried MCMCglmm:
prior=list(R=list(V=1, nu=0, fix=1), G=list(G1=list(V=1, nu=0)))
m2 <- MCMCglmm(inftmort ~ 1 + as.factor(cohort), random=~code, family = "categorical", data=d, prior=prior)
It seems to be working and produced estimates that are comparable to what Stata produced (not identical, of course):
Iterations = 3001:12991
Thinning interval = 10
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
(Intercept) -5.7145 0.1805 0.005708 0.02295
as.factor(cohort)2 -0.5633 0.1788 0.005653 0.02194
as.factor(cohort)3 -0.1888 0.1471 0.004653 0.01912
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
(Intercept) -6.0464 -5.8324 -5.7251 -5.61120 -5.2817
as.factor(cohort)2 -0.8977 -0.6985 -0.5551 -0.43084 -0.2371
as.factor(cohort)3 -0.4614 -0.2947 -0.1867 -0.09644 0.1232
This is puzzling.
Sunday, February 14, 2010
Spreadsheet to LaTeX solutions
Excel2latex and spreadsheet2latex seem to be the only two solutions at the time.
Saturday, February 13, 2010
Cure model using R
The package "nltm" seems to be able to estimate proportional hazard and proportional odds cure models. I will do some experiments and see how it goes.
Friday, February 12, 2010
Reshaping data in R
This post explains how to reshape data in R and how to created nicely formated LaTeX using the xtable package.
Thursday, February 11, 2010
Handling hierarchical data structure in R
R has a comprehensive set of tools for the handling of hierarchical data structure. The most widely used package is probably "nlme" and "lme4", contributed by Douglas Bates and colleagues. While "nlme" is older and probably more mature than "lme4", it seems that the latter is increasingly taking over the former because it comes with full set of tools to handle non-Gaussian distributions.
Based on "nlme" and "lme4", the package "mgcv" and "gamm4" can estimate generalized additive model on multilevel data structure. This is especially useful if one have continuous covariates such as age or calendar year, whose effects are not very well understood in the literature and grouping may distort the results.
If you don't feel comfortable with the assumption that the unobserved heterogeneity components are normally distributed, then the package "npmlreg" can be used to conduct sensitivity analysis through nonparametric maximum likelihood method.
The new "MCMCglmm" package extends R's modeling capacity to multilevel and multivariate models such as those that can be estimated using aML and Sabre. Sure, MCMCglmm relies on MCMC instead of maximum likelihood as the method of estimation, which may be slower especially when the data is huge (as is often the case in demographic studies), but as the author indicated, this package has already gained significant performance advantage over traditional Bayesian packages such as WinBUGS and JAGS, and is almost as general as the latter, with respect to multilevel regression-like models.
The Sabre team also produced a R wrapper of their package for multilevel multiprocess modeling. Unfortunately, their R package is based on an outdated version (v.5 instead of v.6) of Sabre and the project seems to be halted due to lack of fund.
It is not difficult to conduct Bayesian multilevel analysis in R using WinBUGS and JAGS via packages such as "R2WinBUGS" and "R2jags". These packages can estimate a very large number of models. This means that almost anything one can think of can be model with R.
Another package, ADMB, also has several interfaces to R. I hope they are getting better and better because ADMB is such a powerful and versatile package that, if one can write down the likelihood function, it can maximize it for you.
Based on "nlme" and "lme4", the package "mgcv" and "gamm4" can estimate generalized additive model on multilevel data structure. This is especially useful if one have continuous covariates such as age or calendar year, whose effects are not very well understood in the literature and grouping may distort the results.
If you don't feel comfortable with the assumption that the unobserved heterogeneity components are normally distributed, then the package "npmlreg" can be used to conduct sensitivity analysis through nonparametric maximum likelihood method.
The new "MCMCglmm" package extends R's modeling capacity to multilevel and multivariate models such as those that can be estimated using aML and Sabre. Sure, MCMCglmm relies on MCMC instead of maximum likelihood as the method of estimation, which may be slower especially when the data is huge (as is often the case in demographic studies), but as the author indicated, this package has already gained significant performance advantage over traditional Bayesian packages such as WinBUGS and JAGS, and is almost as general as the latter, with respect to multilevel regression-like models.
The Sabre team also produced a R wrapper of their package for multilevel multiprocess modeling. Unfortunately, their R package is based on an outdated version (v.5 instead of v.6) of Sabre and the project seems to be halted due to lack of fund.
It is not difficult to conduct Bayesian multilevel analysis in R using WinBUGS and JAGS via packages such as "R2WinBUGS" and "R2jags". These packages can estimate a very large number of models. This means that almost anything one can think of can be model with R.
Another package, ADMB, also has several interfaces to R. I hope they are getting better and better because ADMB is such a powerful and versatile package that, if one can write down the likelihood function, it can maximize it for you.
App Store Model Provides Security, Stability: Evidence, Please?
Here is an interesting post on the claim that Iphone has to be locked down to make it safe and stable.
Wednesday, February 10, 2010
Loglinear models using R
For those sociologists who want to estimate complicated loglinear models (e.g. Goodman's RC model) using R, the package "VGAM" seems to be a good choice.
Tuesday, February 09, 2010
Mixed Effects Models and Extensions in Ecology with R
This is a truly wonderful book. It deals with advanced statistical models in the way that people without advanced mathematical background can follow easily. I think I may have found a perfect textbook for my graduate course in advanced demographic analysis. All I need to do is to introduce some demographic examples to make it easier to understand to demography students.
Sunday, February 07, 2010
Skype on Ipad via 3G
Saturday, February 06, 2010
Friday, February 05, 2010
Graphics using Stata vs. R
Here is an informative blog post comparing simple histogram produced using Stata and various methods using R.
Saturday, January 30, 2010
An example of MCMCglmm
Here is a very useful example of how to estimate multinomial logit with random effect using MCMCglmm package.
Friday, January 29, 2010
Thursday, January 28, 2010
Wednesday, January 27, 2010
IPad
Apple's new toy, ipad, looks interesting. But 10 hour's battery life per charge seems to be less impressive than I originally thought. Some laptop computers can do that. Also, I was surprised to find out its operating system is IPhone OS instead of OSX.
Being an Apple product also means that, in order to do anything useful, you have to buy and buy and buy...
Monday, January 25, 2010
Submitting to a sociology journal
This blog has some useful information for those who are thinking about submitting to a sociology journal.
Monday, January 18, 2010
Wednesday, January 13, 2010
Word count for PDF (LaTeX) file
Friday, January 08, 2010
Thursday, January 07, 2010
Wednesday, January 06, 2010
Apophenia
The January 2010 source release of the Apophenia library cannot be compiled on my Ubuntu machines (64-bit 9.10 and 32-bit 9.04); but the developmental version (downloaded via git) compiles flawlessly. Not sure what is going on.
R AnalyticFlow
R AnalyticFlow is an interesting piece of software. From its web site:
"R AnalyticFlow is a software which enables
state-of-the-art data analysis by drawing analysis flowcharts.
You can effectively share processes of data analysis
in collaborative works"
state-of-the-art data analysis by drawing analysis flowcharts.
You can effectively share processes of data analysis
in collaborative works"
Monday, January 04, 2010
A customized Ubuntu-based Chinese distro
Linux Deepin looks promising. But it does not seem to provide a 64-bit version.
Sunday, January 03, 2010
Friday, January 01, 2010
Create LaTeX table for descriptive statistics using Estout
I want to create a table of descriptive statistics. To be more specific, The table should contains 6 rows and 4 columns, something like this:
Descriptive Statistics
-----------------------------------------------------------
Cohort A Cohort B Cohort C
-----------------------------------------------------------
Urban Residence 0.263 0.313 0.264
Years of Schooling 5.585 7.280 7.592
Ethnic Majority 0.928 0.918 0.915
Want Children 0.990 0.992 0.991
Ever Pregnant 0.991 0.987 0.992
-----------------------------------------------------------
Observations 3553 2450 4694
-----------------------------------------------------------
The official document for Estout provides some clues but does not make it crystal clear. Here are the steps I followed:
- Generate an arbitrary variable;
- Run regression, separately for each cohort;
- Collect the results;
- Make the table
Here are the code:
eststo clear
gen y=uniform()
quietly regress y urban eduy han want_birth preg if cohort==1, noconstant
estadd mean
eststo d1
quietly regress y urban eduy han want_birth preg if cohort==2, noconstant
estadd mean
eststo d2
quietly regress y urban eduy han want_birth preg if cohort==3, noconstant
estadd mean
eststo d3
esttab d1 d2 d3, cells(mean(fmt(%8.3f))) label nonotes nonumber nodepvars mtitles("Cohort A" "Cohort B" "Cohort C") width(1\hsize) title(Descriptive Statistics)
Subscribe to:
Posts (Atom)