The idea underlying Sweave is very appealing: you can compose a complete research paper with equations, tables, and figures without manual intervention (copy, paste, adjust margins, etc.).
The package "odfWeave" extends this idea from LaTeX to OpenOffice. I just gave it a test-drive, and it rocks!
Monday, December 31, 2007
Sunday, December 30, 2007
demogR
As a demographer, it is really exciting to find a toolkit designed for demographic analysis: http://cran.stat.ucla.edu/
Here is a short paper explaining its usage: http://www.jstatsoft.org/v22/i10/paper
Saturday, December 29, 2007
How to "reshape" a data set in R
Here is a helpful post explaining how to "reshape" a data set, a convenient feature in Stata, using R: http://www.statmethods.net/management/reshape.html
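To make the idea of "reshaping" concrete, here is a minimal sketch in plain Python (not the R or Stata code from the linked post). It turns a wide data set, with one column per year, into a long one with one row per subject and year. The function name `wide_to_long`, the column stub `inc`, and the sample values are all made up for illustration.

```python
def wide_to_long(rows, id_key, stub):
    """Turn columns such as inc1990, inc1991 into one row per (id, year)."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            suffix = column[len(stub):]
            # Keep only columns made of the stub followed by digits.
            if column.startswith(stub) and suffix.isdigit():
                long_rows.append({id_key: row[id_key], "year": int(suffix), stub: value})
    return long_rows


# Hypothetical wide data: one row per subject, one income column per year.
wide = [
    {"id": 1, "inc1990": 100, "inc1991": 110},
    {"id": 2, "inc1990": 200, "inc1991": 190},
]

long_rows = wide_to_long(wide, "id", "inc")
for record in long_rows:
    print(record)
```

Going back from long to wide is the same operation in reverse: group the rows by id and spread the year values into columns.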
Zelig
Gary King's crew is doing some really nice work with Zelig. The first goal is to standardize statistical analysis syntax in R, which is very important. It also has the ability to analyze multiply imputed data, to simulate posterior distribution, etc.
Keep up the good work, it looks really promising.
Thursday, December 27, 2007
The "per thousand" symbol
In OpenOffice, the "per thousand" symbol can be entered either as an equation or as a special character. The special character version can usually be found under "General Punctuation" in most fonts.
Saturday, December 22, 2007
Sunday, December 09, 2007
Here is how I choose between Stata, Mplus and aML
in dealing with a specific research question:
- Use Stata (or R) for routine statistical analysis; they are getting better and better;
- Use Mplus for multilevel Cox regression; the results can be compared with those from Stata or R;
- Use aML to handle multiple-clock situations (i.e., APC models).
Have some fun with GreenFoot
An agent-based simulation environment and a good teaching tool for Java programming: http://www.greenfoot.org/index.html
Thursday, November 29, 2007
OooLaTeX
Wednesday, November 21, 2007
Just Another Gibbs Sampler
This project (http://www-fis.iarc.fr/~martyn/software/jags/) looks very promising.
Saturday, November 10, 2007
Saturday, November 03, 2007
Geany the tiny IDE
Geany is a tiny IDE for C/C++. It is convenient to use in cases when one needs to develop short single-file programs like the ones presented in the book "Simulating Ecological and Evolutionary Systems in C".
The address is here:
http://geany.uvena.de/
It also supports FreeBasic and FreePascal.
Thursday, November 01, 2007
Zotero again
Tuesday, October 30, 2007
Chinese fonts
High quality Chinese fonts: http://wqy.sourceforge.net/cgi-bin/index.cgi?%e9%a6%96%e9%a1%b5
Free and open source.
Saturday, October 27, 2007
Linux & virus
An interesting post about viruses on Linux: http://linuxmafia.com/~rick/faq/index.php?page=virus
Wednesday, October 24, 2007
Wow, it's finally here...
A reference manager integrated seamlessly with OpenOffice, on all platforms! Its name is Zotero: http://www.zotero.org/. There is no need to stay with MS-Office just for the convenience of Endnote, and there is no need to stay with Windows just for the combination of Office and Endnote.
It has some very nice features that neither Endnote nor NoteExpress has, such as grabbing more than one reference at a time from Google Scholar. I realize this is something serious ...
Monday, October 22, 2007
Upgrading the RGL package on Ubuntu
Remember first to type "sudo apt-get build-dep r-cran-rgl" to get all the necessary files, then do the upgrading.
Sunday, October 21, 2007
Ubuntu 7.10 and Stata 10
Ubuntu 7.10 is no doubt the best Linux distro so far. I had a problem installing Stata 10 on my Ubuntu box, though. When I typed "xstata-se", I got "./xstata: error while loading shared libraries: libtiff.so.3: cannot open shared object file: No such file or directory". A Google search pointed me to this URL:
http://www.stata.com/support/faqs/unix/libtiff.html
and this:
http://www.stata.com/support/faqs/unix/dynamiclibs.html
Fortunately, the solution is simple: just download the file "tiff-3.7.4.tar.gz", compile and install it, and Stata works just fine.
Saturday, October 06, 2007
Period effect in Stata
If the time variable is age, how do you get Stata to estimate period effects? For a long time, I thought the only way was to use a discrete-time approach: create a person-year data format, then estimate either a LOGIT or a CLOGLOG model. The problem with this approach is that, with large data sets, it becomes impossible to expand the data 10 to 20 times or even more (depending on the duration and the time scale). The only feasible way to get period effects seemed to be aML's multiple clock capability.
After carefully reading the manual, I realized there is a way to do this in Stata. In my schizophrenia case, I do the following:
- stset dura_cal, f(event) origin(time birth) id(id)
- stsplit p, at(0, 16, 26) after(time=1949)
- xi: stcox i.p
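The key step above is the episode splitting: each subject's spell, measured on the age clock, is cut wherever it crosses a calendar boundary (here 1949, 1949 + 16, and 1949 + 26), so the period becomes an ordinary covariate. The following is a rough Python sketch of that idea only, not of what Stata does internally. The function name, the episode layout, and the coding of the period variable (None before the first cut) are my own assumptions.

```python
def split_by_period(birth, exit_time, event, base=1949.0, cuts=(0.0, 16.0, 26.0)):
    """Split one spell (birth to exit, in calendar years) at calendar cut points.

    Returns (age_start, age_stop, period, event) tuples. `period` is the
    latest cut (in years after `base`) at or before the episode start, or
    None if the episode starts before the first cut. `cuts` must be sorted.
    Only the last episode carries the event flag.
    """
    boundaries = [base + c for c in cuts if birth < base + c < exit_time]
    edges = [birth] + boundaries + [exit_time]
    episodes = []
    for start, stop in zip(edges[:-1], edges[1:]):
        period = None
        for c in cuts:
            if start >= base + c:
                period = c
        is_last = stop == exit_time
        episodes.append((start - birth, stop - birth, period, event if is_last else 0))
    return episodes


# Hypothetical subject: born in 1940, onset (event = 1) in 1970.
episodes = split_by_period(birth=1940.0, exit_time=1970.0, event=1)
for episode in episodes:
    print(episode)
```

Unlike a full person-year expansion, a subject contributes at most one extra record per boundary crossed, which is why this stays manageable with large data sets.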
Wednesday, October 03, 2007
Split-population model (cure model, long-term survivor model)
When a portion of respondents will never experience the event (the "immortal"), ordinary survival modeling techniques are not adequate. Special models designed to handle this kind of situation are called split-population models, cure models, or long-term survivor models.
aML does not handle split-population models; Mplus handles them by imposing constraints on a two-class mixture model; Stata has the following facilities:
- lncure: log-normal model with split-population;
- spsurv: discrete-time split-population model;
- cureregr: split-population model with Weibull, lognormal, logistic, gamma, and exponential distributions;
- strsmix and strsnmix: split-population model with Weibull, lognormal, gamma, and some mixture distributions.
Among the above, the first three are not well documented, while the fourth is described in the most recent issue of Stata Journal (7-3).
For discrete-time models, there are only two alternatives: Mplus or spsurv.
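For readers new to these models, the following Python sketch shows what "split-population" means in terms of the likelihood. It assumes the simplest possible case: a constant cure fraction and an exponential hazard among the susceptible. It is not the parameterization used by any of the Stata commands listed above, and the function name and sample data are made up.

```python
import math


def split_population_loglik(times, events, cure_fraction, rate):
    """Log-likelihood of a mixture cure model with exponential latency.

    A subject with an event must be susceptible: (1 - pi) * f(t).
    A censored subject is either cured or a susceptible survivor:
    pi + (1 - pi) * S(t).
    """
    total = 0.0
    for t, d in zip(times, events):
        survival = math.exp(-rate * t)
        density = rate * survival
        if d:
            total += math.log((1.0 - cure_fraction) * density)
        else:
            total += math.log(cure_fraction + (1.0 - cure_fraction) * survival)
    return total


# Hypothetical data: three events and two censored observations.
times = [2.0, 5.0, 1.5, 30.0, 40.0]
events = [1, 1, 1, 0, 0]
print(split_population_loglik(times, events, cure_fraction=0.3, rate=0.4))
```

With the cure fraction set to zero, the function reduces to the ordinary exponential survival log-likelihood, which is a quick sanity check on the formula.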
Monday, October 01, 2007
Research on schizophrenia
This article is very interesting:
- St Clair, D., M. Xu, P. Wang, Y. Yu, Y. Fang, F. Zhang, X. Zheng, N. Gu, G. Feng, and P. Sham. 2005. "Rates of Adult Schizophrenia Following Prenatal Exposure to the Chinese Famine of 1959-1961." Journal of the American Medical Association 294:557-562.
- http://sciencenow.sciencemag.org/cgi/content/full/2005/802/3
- http://cmbi.bjmu.edu.cn/news/0508/13.htm
But my results do not support their findings. Worth some further exploration.
Sunday, September 30, 2007
Enter and die at the same time
Stata does not allow a subject to enter the risk set and die at the same time. Here is an explanation:
http://www.stata.com/support/faqs/stat/zerohelp.html
Saturday, September 08, 2007
Human interface device
Sometimes not all tray icons show up after a fresh boot. This can be caused by an unexpected stop of the Human Interface Device service. Restarting that service and rebooting the computer should solve the problem.
Friday, September 07, 2007
Some R related materials
I have been leaning toward R gradually. Here are some useful resources for learning and using R:
http://faculty.chicagogsb.edu/peter.rossi/research/bayes%20book/bayesm/Making%20R%20Packages%20Under%20Windows.pdf
http://www.maths.bris.ac.uk/~maman/computerstuff/Rhelp/Rpackages.html
http://www.bioconductor.org/workshops/2006/biocintro_oct/lectures/R_intro.pdf
A good GUI for R is JGR: http://rosuda.org/JGR/
Friday, August 31, 2007
Weather
It is getting cooler here in Beijing. Hopefully Hangzhou will cool off in the next half month so that we can fully enjoy our bike trip between Hangzhou and Suzhou.
Wednesday, August 29, 2007
Stata 10 graphics editor, bad idea!
The new graphics editor provided in Stata 10 is, to me, a bad idea. Instead of focusing on getting the figure right in a do file, many people, especially beginners, will take shortcuts and rely heavily on the new editor. This, in the long run, is a very bad idea. Once people start to do that, they lose the ability to replicate exactly what they did. If you spend 10 minutes editing a figure, then realize the data are not quite right and you need to run the figure command again, you need another 10 minutes to edit it... you get the idea.
It is a bit disappointing to find that unicode support is still not there. I have been waiting for this for too long...
Wednesday, August 22, 2007
aML reported random effects
The random effects that aML reports are standard deviations and correlations, not variances and covariances.
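When comparing aML output with software that reports variances and covariances, the conversion is simple arithmetic: the variance is the squared standard deviation, and the covariance is the correlation times the two standard deviations. A small Python sketch, with a made-up function name and made-up numbers:

```python
def to_variance_covariance(sd_a, sd_b, corr):
    """Convert two standard deviations and their correlation
    into (variance_a, variance_b, covariance)."""
    return sd_a ** 2, sd_b ** 2, corr * sd_a * sd_b


# Hypothetical reported values: SDs of 2.0 and 3.0, correlation of 0.5.
var_a, var_b, cov_ab = to_variance_covariance(2.0, 3.0, 0.5)
print(var_a, var_b, cov_ab)
```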
Tuesday, August 21, 2007
Age, period, and cohort effect
A model with age, period, and cohort can only be identified if one of the following holds:
1) Two or more of the age, period, or cohort coefficients are constrained to be equal;
2) A proxy variable approach is used, which assumes that the cohort effects (e.g. cohort size) or period effects are proportional to certain measured variables;
3) At least one of the age, period, or cohort variables is transformed so that its relationship to the others is nonlinear.
A piecewise linear hazard rate model can usually be identified because of (3).
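The reason for these conditions is that cohort = period - age, so the three linear terms are perfectly collinear. The Python sketch below is a toy illustration with made-up coefficients and function names. It shows that two different coefficient sets give identical linear predictors, and that the same trick stops working once age enters nonlinearly (here as age squared), which is the logic behind condition (3).

```python
def linear_predictor(age, period, coefs):
    """Linear APC predictor; cohort is fully determined by period - age."""
    b_age, b_period, b_cohort = coefs
    cohort = period - age
    return b_age * age + b_period * period + b_cohort * cohort


def nonlinear_predictor(age, period, coefs):
    """Same model, but age enters as age squared."""
    b_age, b_period, b_cohort = coefs
    cohort = period - age
    return b_age * age ** 2 + b_period * period + b_cohort * cohort


def shifted(coefs, k):
    """Add k to the age and cohort slopes and subtract k from the period slope."""
    b_age, b_period, b_cohort = coefs
    return (b_age + k, b_period - k, b_cohort + k)


observations = [(20, 1970), (35, 1985), (50, 2000), (35, 1970)]  # (age, period)
original = (0.03, -0.01, 0.02)
alternative = shifted(original, 0.5)

linear_gaps = [abs(linear_predictor(a, p, original) - linear_predictor(a, p, alternative))
               for a, p in observations]
nonlinear_gaps = [abs(nonlinear_predictor(a, p, original) - nonlinear_predictor(a, p, alternative))
                  for a, p in observations]
print(max(linear_gaps))     # essentially zero: the data cannot tell the two sets apart
print(min(nonlinear_gaps))  # clearly positive: the shift now changes the predictions
```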
Friday, August 10, 2007
Ideal calendar solution
The combination of Google Calendar, Mozilla Sunbird (or Thunderbird + Lightning), and the Google Calendar provider (https://addons.mozilla.org/en-US/sunbird/addon/4631) provides an ideal calendar solution. More details can be found here: http://www.linux.com/feature/118068.
Thursday, August 09, 2007
My working paper
My most recent working paper "Does Son Preference Influence Children's Growth in Height? A Comparative Study of Chinese and Filipino Children" is available online:
http://www.ccpr.ucla.edu/ccprwpseries/ccpr_018_07.pdf
Another one (impact of famine on mortality) will be available soon.
Fall begins
Autumn is here. The sky has suddenly become high, distant, and clear. The temperature outside is still just as high, but with this bright, clear sky my mood is no longer so stifled.
Beijing, autumn is here...
By the way, Linux is a superior platform for number crunching compared to Windows. One needs to install lots of extra stuff to have a more or less comparable working environment under Windows. This is one of the reasons I am so happy after getting aML working on my Ubuntu box.
Wednesday, August 08, 2007
Compiling aML under Ubuntu 7.04, again!
I decided to give it another try. I wrote to Stan, reporting the problem I had when trying to compile aML on my Ubuntu 7.04 box. He pointed out that it is likely to be a bug in the compiler. Then I asked myself: why not try a different compiler?
A Google search shows that alternative FORTRAN 77 compilers under Linux include PGI, Intel, and Absoft, among others. PGI offers a 30-day trial, so I decided to give it a try. pgf77 generates correct binaries for "aml", "bigaml", and "hugeaml", but it creates problems for "mktab", a utility that creates tabular results out of aML output files. I needed to recompile this utility using the old g77 compiler.
In short, an easy solution to the problem is to use pgf77 to generate the main binaries and g77 to generate the auxiliary binaries, then put them together in one place (in the system path).
I have checked the results using both the provided samples and my own data. So far, so good.
Tuesday, August 07, 2007
Access blogpost in China
I just realized that this site can be accessed in China without using a VPN or proxy. This just happened today.
Monday, August 06, 2007
More aML stuff
This software review is also a good introduction to aML: http://www.cmm.bristol.ac.uk/learning-training/multilevel-m-software/reviewaml.pdf
Dan Powers (http://www.la.utexas.edu/research/faculty/dpowers/) also has some aML-related materials, scattered around in different places.
Saturday, August 04, 2007
"Censoring Due to Death"
Interesting blog post I found: http://www.iq.harvard.edu/blog/sss/archives/2005/09/censoring_due_t_1.shtml
New data wave from CLHNS
New wave of the Cebu Longitudinal Health and Nutrition Survey is out. I need to update my data and run the Mplus program for catch-up growth and see if anything changes.
Tuesday, July 31, 2007
Mplus vs. aML
I am working on a joint model of miscarriage and child mortality. The idea is that a higher rate of miscarriage may result in more highly selected newborns, who are less vulnerable to premature death. This is a Heckman selection type of model, except that the main equation is a hazard model instead of a continuous model. I have been trying to make Mplus do the job, as hinted by Bengt (http://www.statmodel.com/discussion/messages/11/740.html?1155563326). Then I got a direct response from Linda that this type of model cannot be estimated using Mplus (http://www.statmodel.com/discussion/messages/14/2459.html?1185842830). I guess this leaves me no alternative but to go to aML.
aML is a fine piece of software. I used it in 2004 for a chapter of my dissertation (at that time it cost about 1,000 bucks). Now it is free and open-source, and everyone can look at the code (in FORTRAN) to see how certain things are done. I am a little surprised that it has not attracted more attention.
What I would really like to see is a wrapper program in Stata that can 1) export data to aML, 2) automatically generate sensible starting values and feed them to aML, and 3) gather coefficients estimated in aML for post-estimation manipulation.
If nobody has already started working on this, I might end up doing it myself.
Friday, July 27, 2007
LyX 1.5.0 is out, with Unicode support
LyX 1.5.0 seems to be a real alternative to MS Office and OpenOffice for serious writing. It now has Unicode support built in. I have tried it on both Windows and Linux and it works just fine. What I have not figured out is how to get a file with double-byte characters (Chinese, for example) compiled correctly with LaTeX into a DVI file.
Tuesday, July 24, 2007
New book
This book looks interesting: http://www.springer.com/west/home/computer/mathematics?SGWID=4-151-22-97836125-detailsPage=ppmmediatoc
Saturday, June 16, 2007
Thursday, June 07, 2007
Freebasic
An easy-to-use programming language is an invaluable tool for quantitative social scientists. I have learned several programming languages, including C/C++, Java, Python, and Pascal. My favorite language right now is FreeBasic (http://www.freebasic.net/). It is highly compatible with the once widely used QBasic and is 100% free. There are several editors that can be used with FreeBasic on Windows, including the one I am using, FbEdit (http://www.radasm.com/fbedit/). The picture on the left shows how to debug a FreeBasic program using Insight, a front end to the GNU debugger, GDB.
Sunday, June 03, 2007
Editing and comparing huge text files
I need to work on several huge text files, each around 30 MB in size. The work involves cleaning them and comparing them with each other. I have worked with several pretty good text editors before, some free and others commercial. For this particular work, I tried UltraEdit, EmEditor, VEDIT, MadEdit, and Multi-Edit. For text comparison, I tried Beyond Compare and UltraCompare. Overall, I would say that the winner is Multi-Edit, for several reasons. First of all, even though all the editors I tested can handle large files, the process is not painless. Most editors show a significant slowdown after loading the files, even on my Core 2 Duo machine with 2 GB of memory. Multi-Edit does not have this problem: the 30 MB file loads instantly, and I can scroll anywhere in the file without delay. Second, most editors have very limited built-in file comparison functions, which are not suitable for the work I have at hand. I tried UltraCompare, but it did not work the way it is supposed to, and I gave up after several attempts (not a very patient man). Beyond Compare delivers good results, and then I realized that Multi-Edit has a copy of Beyond Compare built in!
The price of Multi-Edit is a bit steep. While most other editors cost around $50 or less, it costs three times as much ($149). That is probably why it is not used as widely as it could have been...
Friday, June 01, 2007
Best Linux Distro
I began using Linux in 2000, when I was a graduate student at UCLA. Since then I have tried various distros, including SuSE (openSUSE), RedHat (Fedora), Mandrake, Turbolinux, Debian, and Slackware. I even bought a copy of a now discontinued distro named "Libranet". I settled on openSUSE for several years until I discovered Ubuntu. I have been running Ubuntu 7.04 on my desktop for more than a month and it has been a very pleasant experience so far.
The next distro I want to try is PCLinuxOS. It looks very nice and (from various reviewers) it runs very fast. I am going to install it once I have a new machine. Maybe at that time, I can have a good answer about which is, between Ubuntu and PCLinuxOS, the best Linux distro.
Sunday, May 06, 2007
Drawing maps using Stata
No need for expensive specialized GIS software anymore; Stata is enough.
http://www.stata.com/support/faqs/graphics/tmap.html
Saturday, April 21, 2007
A good cross-platform text editor
I use both Linux and Windows and have been searching for a good cross-platform text editor for a long time. Now this small piece of software called "MadEdit" seems to be very promising. It is cross-platform, Unicode-enabled, free, and open source. Check it out at:
http://madedit.sourceforge.net/
Sunday, April 15, 2007
A good new book on Monte Carlo methods
Title: Simulation and Monte Carlo: With applications in finance and MCMC
Author: J. S. Dagpunar
Publisher: Wiley
This new book seems really cool and I will try to remember to buy it next time I visit the US.
http://as.wiley.com/WileyCDA/WileyTitle/productCd-0470854952.html
Wednesday, April 11, 2007
Maple, a good tool to teach maximum likelihood method
The new Maple has a very convenient built-in tool for demonstrating the process of maximum likelihood estimation in an intuitive way. Unlike software such as Stata, SAS, or Matlab, where the computation is done numerically under the surface, Maple solves the problem analytically, with the whole process showing up on the screen. This way, students can clearly see what is happening: how to get the log-likelihood function, how to solve it, what the Hessian matrix looks like, etc.
Very good teaching tool.
Wednesday, March 21, 2007
Multiple imputation with growth modeling
Several days ago, a new user-contributed Stata module, "MIM", appeared in the Stata software repository. This module automates the process of combining estimates from multiply imputed data sets and calculating confidence intervals for a wide array of Stata estimation commands, including "XTMIXED". This means that the complete process of imputation, estimation, and post-estimation can be done without leaving Stata.
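For reference, the standard way to combine estimates across imputed data sets is Rubin's rules: average the point estimates, then add the average within-imputation variance to an inflated between-imputation variance. The Python sketch below implements just that textbook calculation. It is not a description of MIM's internals, it omits the degrees-of-freedom adjustment needed for confidence intervals, and the function name and numbers are made up.

```python
import math


def combine_rubin(estimates, variances):
    """Combine one coefficient across m imputed data sets (Rubin's rules).

    estimates: point estimates, one per imputed data set
    variances: squared standard errors, one per imputed data set
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    q_bar = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)


# Hypothetical results from three imputed data sets.
pooled, pooled_se = combine_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, pooled_se)
```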
Saturday, March 10, 2007
Multiple imputation with longitudinal data
I have been working with Sarah on revising our comparative paper. One problem we need to solve in this revision is the presence of a large number of missing values. I use "ICE", a user-contributed module in Stata, to do the imputation. Unfortunately, the built-in estimation procedures do not include "XTMIXED" or "GLLAMM". I have to import the imputed data sets into Mplus and do the estimation there.
Thursday, February 22, 2007
A list of survey data that measure well-being
Here is a list of (US) survey data that measure well-being: http://www.nber.org/~kling/surveys/
Sunday, February 11, 2007
Latent Growth Model and Repeated Measurement
I had a brief conversation with Linda Adair last Wednesday about the issue of catch-up growth. She mentioned that she would have done the analysis differently had she known about latent growth modeling in the 1990s. The idea of standardizing the measurements and calculating difference scores does not make much sense.
I am thinking about doing a Monte Carlo simulation to compare two sets of relationships: those revealed by a latent growth model and those revealed by standardized difference scores.
Tuesday, February 06, 2007
Inequality dropped
Alas, since the mechanism linking inequality and obesity has not been fully developed in the literature, and it would be too ambitious for us to tackle this issue in this small project, we have to drop it for now.
Will get back to it soon.
Wednesday, January 31, 2007
Economic inequality as an important moderator
Barry Popkin gave a talk at the CCPR seminar today on the global nutritional transition. His research is inspiring in many ways. I had a brief chat with him after the seminar about my idea of economic inequality as an important moderator of the link between obesity and SES and economic development. He thinks it is a good idea and certainly worth exploring further. I am glad.