- http://www.bullogger.com/blogs/xys/
- http://www.bullogger.com/blogs/dbanotes/
- http://www.bullogger.com/blogs/songshuhui/

## Friday, February 27, 2009

### Bullbloger

Here is an interesting blog (in Chinese): http://www.bullogger.com/

## Monday, February 23, 2009

### OpenWatcome 1.8

The open source compiler project OpenWatcom just let out a new release, v. 1.8. I don't have much experience with this compiler but remember in the old days the Watcom compiler used to beat its competitors including big names like Microsoft, Borland, and Symantic regarding to performance.

It probably will not work with ADMB, but it would be really great if it does.

It probably will not work with ADMB, but it would be really great if it does.

### Debian Lenny looks cool

http://www.howtoforge.com/the-perfect-desktop-debian-lenny

I am going to find a spare machine and make a test install.

I am going to find a spare machine and make a test install.

### ADMB, again

Dave, the main person behind the open source ADMB project, answered my earlier question about how to estimate mixture growth model using ADMB. His reply is very helpful and illuminating, and the source code can be used as template for other similar models:

--------------------------------------------------------------------------------------------------------

Shige,

I think the "ADMB team" consists mostly of me. So I'll try and answer

your question about finite mixtures and growth models. Until today I had

very little idea what growth models were, but in writing the simulator I

hope I am getting the idea.

Actually finite mixtures are handled directly so it is not necessary to

use the random effects package. This is actually an advantage since it

greatly speeds up the calculations.

To illustrate the ideas I built a simulator to create a data set

something like you describe in

http://sgsong.blogspot.com/2009/01/finite-mixture-modeling-with-admb.html

This could be done in R but I used ADMB since it illustrates some of the

techniques which are useful for building the analyzer. since you didn't

identify the growth curves you used I use the von Bertalanfy curve

L_t = Linf * ( 1 - exp(-k*age))

where L_0=0 is assumed. Of course it is a simple matter to switch to a

different curve There are two curves one with Linf=100 and the other

with Linf=120. For both cases k=0.11 (time measured in years).

I generated 2000 samples. First the size measured in 2 month intervals

and then at ages 8,11,15,19

The individuals were generated with a probability of 0.4 of belonging to

group1 (linf=100) and 0.6 of belonging to group 2. After the first

period the individual were changed at random to groups 1 or 2 with a

transition matrix

.8 prob stay in group 1

.2 prob to move from group 2 to group 1

.7 prob to say in group 2

.3 prob to move from group 2 to group 1

The the size data was generator using the appropriate growth curve.

the resulting sizes were then perturbed using multiplicative lognormal

random variables (std dev of .08) which were highly positively

correlated (0.9)

The resulting output was analyzed with a finite mixture model where

the 12 initial measuremeasurements are assumed to come from a mixture of

two multivariate log-normal distributions with a correlation matrix of

the form

1 rho rho^2 .... rho^11

rho 1

rho^2 .......

etc.

The model was fit and the Hessian evaluated in under a minute.

these are the parameter estimates for a typical run.

# Number of parameters = 10 Objective function value = -84436.9

Maximum gradient component = 0.0208847

# p1coff:

0.379774 0.620226

# p2coff:

0.482844 0.517158

# Linf:

99.2992 119.646

# k:

0.109928 0.110294

# rho:

0.882750993879

# lsigma:

-2.54165198426

For each individual the conditional probabilities of belonging in either

of the two groups was calculated after observing the first sizes and

again after observing the second group of sizes. this was used to

estimate the transition probabilities. (Of course this could be done in

the model which would be more efficient.)

~

~ 0.793901 0.206099

0.292103 0.707897

~

which is almost perfect.

the C++ code for the simulator is

~

~

#include

dvector growth_function(const dvector & age,const dvector& theta)

{

dvector tmp=theta(1)*(1.0-exp(-theta(

2)*age)); // use you own

return tmp; // growth function here

}

int main(int argc,char * argv[])

{

int n=2000;

int nm=16;

int iseed=289;

random_number_generator rng(iseed);

dvector p1(1,2); // priori probability that a child is in group 1 or 2

p1(1)=.4; p1(2)=.6;

dvector p2(1,2);

dmatrix M(1,2,1,2); // transition probabilities

M(1,1)=.8; // 1 remains 1

M(2,1)=.3; // 2 moves to 1

M(1,2)=.2; // 1 moves to 2

M(2,2)=.7; // 2 remains 2

dvector choose(1,n);

dvector choose1(1,n);

double sigma=0.08;

double rho=0.9;

dmatrix size(1,n,1,nm);

imatrix group(1,n,1,2);

dvector age(1,nm);

dvector eta(1,nm);

dvector eps(1,nm);

age.fill_seqadd(.2,.2);

age(13)=8;

age(14)=11;

age(15)=15;

age(16)=19;

dmatrix theta(1,2,1,2); //parameters for growth curves

theta(1,1)=100;

theta(2,1)=120;

theta(1,2)=.11;

theta(2,2)=.11;

choose.fill_randu(rng); // use to pick group that each child belongs to

choose1.fill_randu(rng); // use to pick group that each child belongs to

int i;

for (i=1;i<=n;i++) // loop over children { if (choose(i)age(1,12),theta(group(i,1)));

size(i)(13,16)=growth_function(age(13,16),theta(group(i,2)));

}

double rho2=rho*rho;

for (i=1;i<=n;i++) // loop over children and add noise { eta.fill_randn(rng); eps(1)=eta(1); int j; for (j=2;j<=4;j++) // loop over children and add noise { eps(j)=rho*eps(j-1)+sqrt(1.0-rho2)*eta(j);

}

eps(5)=eta(1);

for (j=6;j<=16;j++) // loop over children and add noise { eps(j)=rho*eps(j-1)+sqrt(1.0-rho2)*eta(j);

}

size(i)=elem_prod(size(i),exp(sigma*eps)); // lognormal error

}

ofstream ofs("growth.dat");

ofs << "# nobs" << i="1;i<=" lmin="min(column(sizes,nm));" lmax="max(column(sizes,nm));" sigma="exp(lsigma);" vinv="1.0/(sigma*sigma);" i="1;i<=" j="1;j<=" i="1;i<=" j="1;j<=" ic1="inv(cor1);" ic2="inv(cor2);" is1="ic1*vinv;" is2="ic2*vinv;" lds1="ln_det(is1);" lds2="ln_det(is2);" psum1="sum(p1coff);" p1="p1coff/psum1;" psum2="sum(p2coff);" p2="p2coff/psum2;" lmax="max(column(sizes,nm));">exp(-k(1)*ages)));

pred_size(2)=log(Linf(2)*(1.0-exp(-k(2)*ages)));

FUNCTION get_likelihood

int i;

dvar_matrix res(1,2,1,nm);

for (i=1;i<=n;i++) { res(1)=log(sizes(i))-pred_size(1);

res(2)=log(sizes(i))-pred_size(2);

dvar_vector res1a=res(1)(1,12);

dvar_vector res1b=(res(1)(13,16)).shift(1);

dvar_vector res2a=res(2)(1,12);

dvar_vector res2b=(res(2)(13,16)).shift(1);

dvariable l1a=exp(-0.5*res1a*(is1*res1a));

dvariable l1b=exp(-0.5*res1b*(is2*res1b));

dvariable l2a=exp(-0.5*res2a*(is1*res2a));

dvariable l2b=exp(-0.5*res2b*(is2*res2b));

f-=log(1.e-10+(p1(1)*l1a+p1(2)*l2a));

f-=log(1.e-10+(p2(1)*l1b+p2(2)*l2b));

f-=0.5*(lds1+lds2);

q0(i,1)=value(p1(1))*value(l1a);

q0(i,2)=value(p1(2))*value(l2a);

q0(i)/=sum(q0(i));

q1(i,1)=value(p2(1))*value(l1b);

q1(i,2)=value(p2(2))*value(l2b);

q1(i)/=sum(q1(i));

}

REPORT_SECTION

report <<>

--------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------------------

Shige,

I think the "ADMB team" consists mostly of me. So I'll try and answer

your question about finite mixtures and growth models. Until today I had

very little idea what growth models were, but in writing the simulator I

hope I am getting the idea.

Actually finite mixtures are handled directly so it is not necessary to

use the random effects package. This is actually an advantage since it

greatly speeds up the calculations.

To illustrate the ideas I built a simulator to create a data set

something like you describe in

http://sgsong.blogspot.com/2009/01/finite-mixture-modeling-with-admb.html

This could be done in R but I used ADMB since it illustrates some of the

techniques which are useful for building the analyzer. since you didn't

identify the growth curves you used I use the von Bertalanfy curve

L_t = Linf * ( 1 - exp(-k*age))

where L_0=0 is assumed. Of course it is a simple matter to switch to a

different curve There are two curves one with Linf=100 and the other

with Linf=120. For both cases k=0.11 (time measured in years).

I generated 2000 samples. First the size measured in 2 month intervals

and then at ages 8,11,15,19

The individuals were generated with a probability of 0.4 of belonging to

group1 (linf=100) and 0.6 of belonging to group 2. After the first

period the individual were changed at random to groups 1 or 2 with a

transition matrix

.8 prob stay in group 1

.2 prob to move from group 2 to group 1

.7 prob to say in group 2

.3 prob to move from group 2 to group 1

The the size data was generator using the appropriate growth curve.

the resulting sizes were then perturbed using multiplicative lognormal

random variables (std dev of .08) which were highly positively

correlated (0.9)

The resulting output was analyzed with a finite mixture model where

the 12 initial measuremeasurements are assumed to come from a mixture of

two multivariate log-normal distributions with a correlation matrix of

the form

1 rho rho^2 .... rho^11

rho 1

rho^2 .......

etc.

The model was fit and the Hessian evaluated in under a minute.

these are the parameter estimates for a typical run.

# Number of parameters = 10 Objective function value = -84436.9

Maximum gradient component = 0.0208847

# p1coff:

0.379774 0.620226

# p2coff:

0.482844 0.517158

# Linf:

99.2992 119.646

# k:

0.109928 0.110294

# rho:

0.882750993879

# lsigma:

-2.54165198426

For each individual the conditional probabilities of belonging in either

of the two groups was calculated after observing the first sizes and

again after observing the second group of sizes. this was used to

estimate the transition probabilities. (Of course this could be done in

the model which would be more efficient.)

~

~ 0.793901 0.206099

0.292103 0.707897

~

which is almost perfect.

the C++ code for the simulator is

~

~

#include

dvector growth_function(const dvector & age,const dvector& theta)

{

dvector tmp=theta(1)*(1.0-exp(-theta(

2)*age)); // use you own

return tmp; // growth function here

}

int main(int argc,char * argv[])

{

int n=2000;

int nm=16;

int iseed=289;

random_number_generator rng(iseed);

dvector p1(1,2); // priori probability that a child is in group 1 or 2

p1(1)=.4; p1(2)=.6;

dvector p2(1,2);

dmatrix M(1,2,1,2); // transition probabilities

M(1,1)=.8; // 1 remains 1

M(2,1)=.3; // 2 moves to 1

M(1,2)=.2; // 1 moves to 2

M(2,2)=.7; // 2 remains 2

dvector choose(1,n);

dvector choose1(1,n);

double sigma=0.08;

double rho=0.9;

dmatrix size(1,n,1,nm);

imatrix group(1,n,1,2);

dvector age(1,nm);

dvector eta(1,nm);

dvector eps(1,nm);

age.fill_seqadd(.2,.2);

age(13)=8;

age(14)=11;

age(15)=15;

age(16)=19;

dmatrix theta(1,2,1,2); //parameters for growth curves

theta(1,1)=100;

theta(2,1)=120;

theta(1,2)=.11;

theta(2,2)=.11;

choose.fill_randu(rng); // use to pick group that each child belongs to

choose1.fill_randu(rng); // use to pick group that each child belongs to

int i;

for (i=1;i<=n;i++) // loop over children { if (choose(i)age(1,12),theta(group(i,1)));

size(i)(13,16)=growth_function(age(13,16),theta(group(i,2)));

}

double rho2=rho*rho;

for (i=1;i<=n;i++) // loop over children and add noise { eta.fill_randn(rng); eps(1)=eta(1); int j; for (j=2;j<=4;j++) // loop over children and add noise { eps(j)=rho*eps(j-1)+sqrt(1.0-rho2)*eta(j);

}

eps(5)=eta(1);

for (j=6;j<=16;j++) // loop over children and add noise { eps(j)=rho*eps(j-1)+sqrt(1.0-rho2)*eta(j);

}

size(i)=elem_prod(size(i),exp(sigma*eps)); // lognormal error

}

ofstream ofs("growth.dat");

ofs << "# nobs" << i="1;i<=" lmin="min(column(sizes,nm));" lmax="max(column(sizes,nm));" sigma="exp(lsigma);" vinv="1.0/(sigma*sigma);" i="1;i<=" j="1;j<=" i="1;i<=" j="1;j<=" ic1="inv(cor1);" ic2="inv(cor2);" is1="ic1*vinv;" is2="ic2*vinv;" lds1="ln_det(is1);" lds2="ln_det(is2);" psum1="sum(p1coff);" p1="p1coff/psum1;" psum2="sum(p2coff);" p2="p2coff/psum2;" lmax="max(column(sizes,nm));">exp(-k(1)*ages)));

pred_size(2)=log(Linf(2)*(1.0-exp(-k(2)*ages)));

FUNCTION get_likelihood

int i;

dvar_matrix res(1,2,1,nm);

for (i=1;i<=n;i++) { res(1)=log(sizes(i))-pred_size(1);

res(2)=log(sizes(i))-pred_size(2);

dvar_vector res1a=res(1)(1,12);

dvar_vector res1b=(res(1)(13,16)).shift(1);

dvar_vector res2a=res(2)(1,12);

dvar_vector res2b=(res(2)(13,16)).shift(1);

dvariable l1a=exp(-0.5*res1a*(is1*res1a));

dvariable l1b=exp(-0.5*res1b*(is2*res1b));

dvariable l2a=exp(-0.5*res2a*(is1*res2a));

dvariable l2b=exp(-0.5*res2b*(is2*res2b));

f-=log(1.e-10+(p1(1)*l1a+p1(2)*l2a));

f-=log(1.e-10+(p2(1)*l1b+p2(2)*l2b));

f-=0.5*(lds1+lds2);

q0(i,1)=value(p1(1))*value(l1a);

q0(i,2)=value(p1(2))*value(l2a);

q0(i)/=sum(q0(i));

q1(i,1)=value(p2(1))*value(l1b);

q1(i,2)=value(p2(2))*value(l2b);

q1(i)/=sum(q1(i));

}

REPORT_SECTION

report <<>

--------------------------------------------------------------------------------------------------------

## Sunday, February 22, 2009

### Interesting article on netbook, Linux, and Windows

http://www.netbookdigest.com/2009/02/20/the-linux-netbook-nightmare435-million-in-vaporized-2008-profits-continued/

### VPN command

I need to type these two commands in order to get my VPN working:

- sudo /etc/init.d/vpnclient_init start
- vpnclient connect sscvpn

## Saturday, February 21, 2009

### VPN on Linux

I figured out how to get my Ubuntu box talk to UCLA VPN. Now I don't need to switch to Windows to use that service anymore.

## Thursday, February 19, 2009

### Get back to research

I spent the past several days writing proposal, now it is done and it's time to get back to my research.

## Sunday, February 15, 2009

### Papers

My paper on schizophrenia (Social Science & Medicine) and long-term mortality consequences of famine (Journal of Biosocial Science) are going to see online version soon.

### Private blog for research progress

I open a private blog to record the progress of my research: http://sgsong-research-plan.blogspot.com/

## Friday, February 13, 2009

### Dell Latitude ON

This seems to an amazing idea (and product): http://blogs.computerworld.com/the_first_dual_windows_linux_pcs_arrive

I wonder how this work. For example, is it possible to run the usual Linux stuff such as gcc, emacs, bash, etc. on it? Is possible to install more software?

I wonder how this work. For example, is it possible to run the usual Linux stuff such as gcc, emacs, bash, etc. on it? Is possible to install more software?

## Thursday, February 12, 2009

### Emacs again

The most recent CVS update broke Emacs. I follow this instruction (http://www.nabble.com/recent-checkin-broke-the-Emacs-build-td21969877.html) to get it up and running again. Man, life without Emacs is hard.

## Tuesday, February 10, 2009

### What if Apple loses the case?

http://www.osnews.com/story/20946/Small_Win_Psystar_Judge_Hints_at_Consequences_if_Psystar_Wins

I guess by definition, being "fans" of something or someone means not being able to calculate or reason rationally...

I guess by definition, being "fans" of something or someone means not being able to calculate or reason rationally...

## Monday, February 09, 2009

### Linux computer

This site is useful: http://linuxpreloaded.com/

Linux on laptop: http://emperorlinux.com/

I did some comparisons and realized that the Emperiorlinux is charging around $500-$900 premium per machine for the same model and the same configuration form Dell and Lenovo. I am not sure their service worth that much. Yeah they got more models than System 76 and LAClinux, but the price is just way too high.

Linux on laptop: http://emperorlinux.com/

I did some comparisons and realized that the Emperiorlinux is charging around $500-$900 premium per machine for the same model and the same configuration form Dell and Lenovo. I am not sure their service worth that much. Yeah they got more models than System 76 and LAClinux, but the price is just way too high.

### Chinees lanauge article on sociology methodology

http://www.sociology.cass.net.cn/shxw/zxwz/t20090207_20455.htm

### Multilevel mixture model

The multilevel mixture model that Bengt and his group have been working on can be of significant importance to demographers, especially those who are obsessed with the idea of "frailty", like myself. I need to do more reading on this.

## Sunday, February 08, 2009

### Linux workstation options

I wonder what are options to buy a workstation-level machine powered by Linux (I like Ubuntu). System 7 seems to be selling some pretty powerful desktop computers, but I really do not know how well they perform in real world computation.

## Friday, February 06, 2009

### CVS Emacs

I noticed today that the version number of the CVS version of Emacs has changed from 23.0.60 to 23.0.90, getting closer to a new major release.

## Wednesday, February 04, 2009

### Starting value for aML

Feeding aML good starting values is crucial for fast convergence. I followed the suggestions offered by the aML manual for a while. Now I no longer need to struggle with the manual: it becomes a fun game that I am really good at.

The secret is to build up complicated models piece by piece, use estimates from the simpler models as the starting values of the more complicated ones; and use stepwise procedure to get good estimates for the intercepts, then fixed effects, then random effects.

The secret is to build up complicated models piece by piece, use estimates from the simpler models as the starting values of the more complicated ones; and use stepwise procedure to get good estimates for the intercepts, then fixed effects, then random effects.

## Tuesday, February 03, 2009

### Interesting article on Linux vs. Windows

in Chinese: http://www.linuxeden.com/html/news/20090203/63843.html

## Monday, February 02, 2009

### Use "label data" to keep track of Stata programs

It is wise to put a label data command at the end of a data-generation or data-modification Stata program file with something like:

label data "Created by ~/project/inter_generation/program/live_birth.do"

To make sure the new data set can be easily matched to the program that created it.

label data "Created by ~/project/inter_generation/program/live_birth.do"

To make sure the new data set can be easily matched to the program that created it.

Subscribe to:
Posts (Atom)