Tuesday, October 04, 2011

GEE using Stata vs. R

I am running a GEE logistic regression model for my fetal loss paper. As usual, I compare the results from Stata and R to make sure they are consistent. To my surprise, the models assuming an independence working correlation structure give similar results, but the models assuming an exchangeable correlation structure give drastically different results.

It turns out that only one woman in my sample reported a total of eleven pregnancies (all others reported ten or fewer), and the presence of this single observation had a huge influence on the algorithm used in R but not on the one used in Stata. After excluding this single observation, the two sets of results look identical.
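The check described above, refitting with one woman removed and comparing the estimates, can be generalized into a leave-one-cluster-out diagnostic. The Python sketch below is only an illustration under stated assumptions: the data are made up, the names `leave_one_cluster_out` and `exchangeable_corr` are hypothetical, and the estimator is a simple moment-style within-woman correlation for an intercept-only model standing in for a full GEE fit. It is not the algorithm used by either Stata or R.

```python
from typing import Callable, Dict, List

# One entry per woman: a list of binary pregnancy outcomes (1 = loss).
Clusters = Dict[str, List[int]]


def exchangeable_corr(clusters: Clusters) -> float:
    """Moment-style estimate of a common within-cluster correlation.

    Stand-in for a real GEE fit (assumption): intercept-only mean,
    Pearson residuals, average cross-product over all within-cluster pairs.
    Assumes both outcomes occur and at least one cluster has two records.
    """
    outcomes = [y for ys in clusters.values() for y in ys]
    p = sum(outcomes) / len(outcomes)      # pooled proportion
    variance = p * (1 - p)                 # binomial variance at p
    cross_products = 0.0
    n_pairs = 0
    for ys in clusters.values():
        resid = [y - p for y in ys]
        total = sum(resid)
        squares = sum(r * r for r in resid)
        # Sum of r_j * r_k over all pairs j < k within this cluster.
        cross_products += (total * total - squares) / 2
        # A cluster of size n contributes n * (n - 1) / 2 pairs.
        n_pairs += len(ys) * (len(ys) - 1) // 2
    return cross_products / (n_pairs * variance)


def leave_one_cluster_out(
    clusters: Clusters, estimate: Callable[[Clusters], float]
) -> Dict[str, float]:
    """Return, per cluster, (estimate without that cluster) - (full estimate)."""
    full = estimate(clusters)
    deltas = {}
    for cid in clusters:
        reduced = {k: v for k, v in clusters.items() if k != cid}
        deltas[cid] = estimate(reduced) - full
    return deltas


# Hypothetical toy data: five women with 2-3 pregnancies, one with eleven.
pregnancies: Clusters = {
    "w1": [0, 1],
    "w2": [1, 0],
    "w3": [0, 0],
    "w4": [0, 1, 0],
    "w5": [1, 0, 0],
    "w_big": [1] * 11,
}

deltas = leave_one_cluster_out(pregnancies, exchangeable_corr)
for cid, delta in sorted(deltas.items(), key=lambda kv: -abs(kv[1])):
    print(f"{cid}: change in estimate when removed = {delta:+.4f}")
```

With these invented numbers, removing the eleven-pregnancy woman moves the estimate far more than removing anyone else, which is the pattern a deletion diagnostic is meant to expose. For a real analysis, `exchangeable_corr` would be replaced by a refit of the actual GEE model in the package being checked.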


wim said...


I am not a statistician, but statistics has been a favorite subject of mine recently. So, based on your article, do you mean to say that R is more sensitive than Stata? Is that good or bad? Have you already published your paper, so I can get more explanation? Thanks.

Shige said...

That's what the results seem to suggest. It will be worthwhile to dig deeper to figure out how these different packages handle such "abnormal" cases.

My paper is not about GEE; instead, it is a demographic study of involuntary fetal loss that makes use of GEE and statistical simulation.

SJ Synnot said...

How did you assess influence in R in the GEE model?

SJ Synnot said...

Hi Shige,

How did you assess influence in R for the GEE model? I get errors when I try influence.measures(model). I would be curious to find out how you did it.
