gologit2/oglm Troubleshooting
Richard Williams, University of Notre Dame
NOTE!!! gologit2 & oglm now support factor variables and the svy: prefix. They require Stata 11.2 or higher. The old versions have been renamed gologit29 & oglm9. Use them if you are condemned to using earlier versions of Stata.
Here are some of the main issues that I get asked about with gologit2 (some of these points also apply to oglm). Feel free to email me if you have other problems, suggestions or recommendations, or to let me know what worked best for you. Click here if you want the main gologit2 support page. Click here for the main oglm page.
Universal Recommendation. Make sure you have the most current version of the program (and also the most up-to-date version of the Stata software you are using). If you are lucky, the problem you are encountering may already have been fixed. From within Stata, type
ssc install gologit2, replace
ssc install oglm, replace
update all
If you have Stata 9 or higher you can also use the adoupdate command. Also, the most up-to-date version of the documentation is gologit2.pdf.
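For example, assuming gologit2 and oglm were originally installed from SSC, a command along these lines should check for and install any available updates:
adoupdate gologit2 oglm, update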
General tip for weird and inexplicable errors: Try running gologit2 with the nolabel option. This will cause the equations to be labeled eq1, eq2, etc. The printout may not be as aesthetically appealing but this may reduce the likelihood of having problems with gologit2 itself or with other commands that have trouble with your labels (e.g. value labels that start with a number sometimes cause problems). Changing your value labels may also solve the problem. I have found that value labels that work fine when svy and gsvy are NOT used can create weird errors when they are. suest can have problems with value labels too.
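For example, something like this (using the same generic variable names as the rest of this page):
gologit2 y x1 x2 x3, autofit nolabel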
For security or other reasons, my computer can't access the Internet. How can I install your programs? This must be a real nuisance for you! You may want to talk to your computing people to see if they can't find a way to make your life easier. Some possible solutions are:
* If you have another computer that has Stata and can access the Internet, install the programs on it. Then, copy c:\ado (or whatever the appropriate directory is on your machine) from one computer to the other. Be sure you understand what you are doing, because you don't want to accidentally overwrite files that are needed on the non-Internet machine.
* Following are zipped versions of my programs and their support files. Unzip the files and store them in c:\ado\personal or some other location where Stata can find them. Again, if you don't understand how to do this, find somebody on your support staff who can help you out.
gologit2_ver3.2.5.zip (Supports Factor variables & svy prefix; requires Stata 11.2 or higher)
gologit29 version 2.1.7 (Old version of gologit2; Requires Stata 8.2 or higher)
oglm version 2.3.0 (Supports Factor variables & svy prefix; requires Stata 11.2 or higher)
oglm9_ver1.2.0.zip (old version of oglm; requires Stata 9.2 or higher)
mfx2 version 1.2.0 (Requires Stata 8.2)
* Sometimes people have read/write access to some drives but not others. If so, you may be able to modify these suggestions for Notre Dame users. Basically, the "trick" is to get Stata to look for programs in a folder that you have control over.
The output is hard to read and understand. What are some good ways to interpret and present the results? For interpreting the results -- see my Stata Journal article, especially section 3.1. Also see my 2016 Journal of Mathematical Sociology article. (Email me if you don't have free access.) Also look at this presentation and handout. A few additional points, and some comparisons with oglm, are made in this presentation and handout.
For presenting the results -- the default gologit2 output is indeed a little hard to read, because you keep seeing the same numbers over and over for those variables that meet the parallel lines constraint. A more parsimonious layout is achieved with the gamma option; see sections 3.2 and 3.8 of my Stata Journal article. I also like the way Thomas Craemer formatted this table (version1, version2); he only presents multiple coefficients for a variable when necessary. The citation for his complete paper (which also includes a nice discussion of the gologit model) is Craemer, Thomas. 2009. Psychological 'self-other overlap' and support for slavery reparations. Social Science Research 38: 668--680. I modified his approach for Table 2 of my 2016 article in the Journal of Mathematical Sociology. You may also find that the margins command provides an effective way of presenting and interpreting results; see my discussions here and here.
I don't have Stata. Is there any other way I can estimate gologit models? In R, I believe that gologit models can be estimated with the VGAM package. I've never used it myself, but I understand that Don Hedeker's mixor program can do many of the same things that gologit2 can. Somebody who is familiar with both programs said that "Hedeker's software does gologit2, but with random effects. I would assume that if you don't specify a random effect you get the same results. His program doesn't do a lot of the cool things that yours does, but if you have specific non-proportionality hypotheses in mind, it will test them and produce the non-proportional results." Hedeker's web page also includes programs or code for DOS, SPSS and SAS. There is also a commercial version of the program called SuperMix.
Can I do a random effects or multilevel model with gologit2? No. Instead, check out Stefan Boes's regoprob program, or else regoprob2 written by Pfarr, Schmid and Schneider. Both use code adapted from gologit2 and reoprob. Also, I've never tried it myself, but gllamm (also downloadable from SSC) has been used by some people to estimate gologit2-type models - see these Statalist posts from Mirko Moro and Richard Williams. Or, Don Hedeker's mixor program or SuperMix may do what you want.
Note: There is (or was) a problem with the regoprob program on ssc, which caused both regoprob and regoprob2 to give errors. A patched version of regoprob can be found here. Unzip the files and place them in C:\ado\personal or somewhere else where Stata will find them.
Note: If you have a multilevel model and aren't too worried about violations of the parallel lines assumption, then consider using the xtologit, meologit, xtoprobit, or meoprobit commands. You might also consider using xtmlogit, available in newer versions of Stata. Or, you can use gsem to estimate a multilevel mlogit model.
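For example, here is a minimal random-intercept sketch using meologit; group_id is a hypothetical cluster/level-2 identifier, so substitute your own:
meologit y x1 x2 x3 || group_id: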
How do I change the base category in gologit2? You can't (although you could reverse the coding of your dependent variable if you liked that better). Suppose your dependent variable has four categories. Although the gologit2 output looks a lot like mlogit output, it doesn't make any sense to think of there being a single "base" category. Rather, the gologit results are like a series of logistic regressions. In the first panel, it is like a logistic regression where category 1 = 0 and categories 2, 3, 4 = 1. In the 2nd panel, it is 1 & 2 = 0 and 3, 4 = 1. 3rd panel, 1, 2, 3 = 0, 4 = 1. There is no 4th panel because if there were it would be like 1, 2, 3, 4 = 0, nothing equals 1. If the assumptions of the ordered logit model are met, the coefficients should all be the same in each panel (except for the intercepts). For more, see section 3.1 of my Stata Journal article.
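For example, with no parallel lines constraints imposed, each gologit2 panel should closely resemble (though not exactly match, because gologit2 estimates all the equations jointly) the corresponding collapsed binary logit. Here is a rough sketch, assuming y is coded 1 through 4; the y_ge* variables are hypothetical names:
gologit2 y x1 x2 x3, npl
* collapsed versions of y: panel 1 is 1 vs 2-4, panel 2 is 1-2 vs 3-4, panel 3 is 1-3 vs 4
gen byte y_ge2 = (y >= 2) if !missing(y)
gen byte y_ge3 = (y >= 3) if !missing(y)
gen byte y_ge4 = (y >= 4) if !missing(y)
logit y_ge2 x1 x2 x3
logit y_ge3 x1 x2 x3
logit y_ge4 x1 x2 x3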
gologit29 does not work with Long & Freese's spost9 routines. First off, unless you are condemned to using ancient versions of Stata, you are much better off using gologit2 and Long & Freese's spost13 commands. But if you are so condemned: this is covered in the help file, but many people miss the advice. Add the v1 option to gologit29, e.g.
gologit29 y x1 x2 x3, v1
Some (but not all) of Long and Freese's spost9 routines currently work with the original gologit but not gologit29. The v1 option saves the internally-stored results in the same format that was used by gologit. However, you can still use gologit29's other unique options, such as autofit or pl. Note that post-estimation commands written specifically for gologit29 (including the pr option of predict) may not work correctly if you use the v1 option. In that case just rerun the model without it. Also, the v1 option only works with the default logit link, since that is all the original gologit supported. spost13 provides much better support for both gologit29 and gologit2; use gologit2 if at all possible.
How do I estimate marginal effects with gologit29 & oglm9? If you have a 21st century version of Stata, use gologit2 and oglm and the margins command instead. But if you don't... Stata's mfx command will work. However, it is generally better to use my mfx2 program, which can be downloaded and installed from SSC (ssc install mfx2). mfx2 makes it easier to compute marginal effects after multiple-outcome commands like oglm9, gologit29, ologit, oprobit, mlogit, mprobit and slogit. In addition, the results are formatted in a way that makes them compatible with post-estimation table formatting commands like outreg2 and estout.
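For example, a sketch of the newer approach (the outcome() number refers to one of the values of y, here assumed to be coded 1 through 4):
gologit2 y x1 x2 x3, autofit
margins, dydx(*) predict(outcome(2))
For the older commands, mfx2 is run after estimation (much like mfx), e.g.
gologit29 y x1 x2 x3, autofit
mfx2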
The predict command comes up with negative predicted probabilities (or else predicted probabilities greater than 1). Believe it or not, negative predicted probabilities are possible. McCullagh & Nelder discuss this in Generalized Linear Models, 2nd edition, 1989, p. 155:
The usefulness of non-parallel regression models is limited to some extent by the fact that the lines must eventually intersect. Negative fitted values are then unavoidable for some values of x, though perhaps not in the observed range. If such intersections occur in a sufficiently remote region of the x-space, this flaw in the model need not be serious.
So yes, it can happen, and a couple of people have written me about this. But, they've also mentioned things like extremely high standard errors or other problems, so I suspect that in most cases a solution lies somewhere in the next couple of points.
I do recommend computing the predicted probabilities under your models; if they seem implausible, then you may wish to modify your model or use a different statistical technique altogether. (One person wrote me that 2 cases out of 27,068 had negative predicted probabilities; I probably wouldn't worry too much in a case like that, but I would get worried if a non-trivial number of cases had negative predicted values.) Sometimes combining categories of the response variable (especially if the Ns for some categories are small) and/or simplifying the model helps. The imposition of parallel lines constraints (either via autofit or the pl or npl options) may also help because it reduces the likelihood of non-parallel lines intersecting.
Click here for an example of the problem and a solution.
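A minimal sketch of such a check, again assuming a 4-category y (gologit2's predict fills in one predicted probability variable per outcome):
gologit2 y x1 x2 x3, autofit
predict p1 p2 p3 p4
summarize p1 p2 p3 p4
count if p1 < 0 | p2 < 0 | p3 < 0 | p4 < 0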
The standard errors are extremely high. You may have high multicollinearity in your variables. User-written routines like collin can check for this. But, routines like ologit and gologit2 can also have problems when an X variable has little or no variability within a category of Y, e.g. when Y = 2, X always equals 0. In ologit, you might get a warning message like this:
Note: 40 observations completely determined. Standard errors questionable.
In gologit2, alas, such a warning is still on the "wish list" of things I'd like to add. But, the high standard errors will still be a clue. Possible diagnostic devices:
Run a similar model in ologit or mlogit. That will help to identify whether the problem is unique to gologit2 or represents a broader problem in your model. And those commands may give you more meaningful error or warning messages than gologit2 does.
Try something like bysort y: sum x1 x2 x3. Look for x's with little or no variability within a category of y. Or try bysort y: corr x1 x2 x3 and see if there is extreme multicollinearity within a category of y. (A sketch putting these checks together appears below.)
If lack of x variability or extreme multicollinearity within a category of y is the problem - you'll have to decide what to do. You may want (or be forced) to drop the problematic variable. Maybe y has too many categories with small Ns, and some will need to be combined. When logit encounters such a problem it not only drops the variable, it drops the cases that were completely determined.
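A rough sketch putting these checks together:
* does a simpler or related model also misbehave?
ologit y x1 x2 x3
mlogit y x1 x2 x3
* little or no x variability within a category of y?
bysort y: summarize x1 x2 x3
* extreme multicollinearity within a category of y?
bysort y: correlate x1 x2 x3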
If none of this seems to address the problem - then consider the next FAQ:
gologit2 is very slow, does not converge, and/or produces implausible estimates. A couple of people have written to me with problems like this. Often they have a large number of variables and/or cases. Since I don't have their data, it is hard for me to tell whether there actually is a problem with the program, whether they need to be more patient, or whether their model is problematic. Here are several things you can try.
Make sure that you are using the right dependent variable and that it is categorical! One user was having problems until she realized that she was analyzing z-scores rather than the variable she had intended. Just running descriptive statistics on your variables may help to identify basic mistakes you are making.
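For example, a quick sanity check along these lines can catch that kind of mistake:
tab y
summarize y x1 x2 x3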
Probably the simplest thing to do is to use the difficult option. A user reported that this got one complicated model to converge and made another run much faster. To learn more about difficult and other related maximization options, from within Stata type
help maximize
These options will sometimes help programs to converge, but not necessarily (they can even make things worse). For example, type
gologit2 y x1 x2 x3, pl(x1) difficult
Sometimes rescaling variables will help. One user reported that gologit2 had problems when year was coded 1970 through 1999, but worked fine when he recoded it to 1 through 30. Another found that income in dollars would not converge but income in thousands of dollars would. The larger the ratio between the largest standard deviation and the smallest standard deviation, the more problems you will have with numerical methods. For example, if you have income measured in dollars, it may have a very large standard deviation relative to other variables. Recoding income to thousands of dollars may solve the problem. Scott Long says that, in his experience, problems are much more likely when the ratio between the largest and smallest standard deviations exceeds 10. (You may want to rescale for presentation purposes anyway, e.g. the effect of 1 dollar of income may be extremely small and have to be reported to several decimal places; coding income in thousands of dollars may make your tables look better.)
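A minimal sketch of this kind of rescaling (the variable names year and income are just illustrative):
summarize y x1 x2 year income
* 1970-1999 becomes 1-30; dollars become thousands of dollars
gen year30 = year - 1969
gen inc1000 = income / 1000
gologit2 y x1 x2 year30 inc1000, autofit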
Simplify your model!! Drop unnecessary variables completely, or add variables gradually. You may be able to identify problematic variables this way and/or identify the limits of how large a model gologit2 can handle. If none of the "easy" options work, I STRONGLY suspect that this is the best way to go. One user was trying a 22-variable model, with the cluster option, and it took forever to run. He dropped a single variable and the program took 3 seconds to reach a solution! As noted above, variables that have little or no variability within a category of Y may be especially problematic, e.g. X1 is a constant or almost a constant when Y = 1.
Estimate similar models in ologit and mlogit. gologit2 is kind of a cross between those two programs, and by all rights they should be much faster than gologit2 is. If they are slow or have problems, it is probably not too surprising that gologit2 has problems too. You may just need to be patient or make other changes in your analysis.
Analyze a random subsample of your data. If the program works with a 10% sample it may eventually work with a 100% sample (or you might just have to say that a random subsample had to be used because of the size of the model.). If you don't know how to sample then type help sample from within Stata.
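For example, a sketch that uses preserve and restore so the full dataset is not lost:
preserve
set seed 12345
* sample 10 keeps a 10% random sample and drops the rest (until restore)
sample 10
gologit2 y x1 x2 x3, autofit
restore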
Use the log option, e.g.
gologit2 y x1 x2 x3, pl(x1) log
This option prints out the iteration log, and may help you to see whether gologit2 is just spinning its wheels or slowly but surely working towards a solution. NOTE: The log option makes the autofit output look messy but it does work now.
Another maximization option that may be worth trying is technique. This option will cause Stata to try different algorithms; if one gets "stuck" another might get "unstuck." See help maximize for more details. For example,
gologit2 y x1 x2 x3, pl(x1) technique(nr bhhh dfp bfgs)
Let the program run overnight or at least for several hours. gologit2 is not the fastest program in the world, and it may just need time to finish its job. I would use the log option if you do this so you know the program has not locked up.
Before running gologit2, type
set more off
set trace on
gologit2 ...
You will get incredible amounts of output on your screen, but you may be able to identify where the program is having problems.
Consider using a different technique, e.g. mlogit or slogit. It may just be that your data and model are not well suited for a gologit analysis.