Negative Predicted Probabilities in gologit2

Here is a simple example of how gologit2 can sometimes produce negative predicted probabilities.

`. sysuse auto`
`(1978 Automobile Data)`
` `
`. gologit2 rep78 foreign length mpg, log`
` `
`Iteration 0:   log likelihood = -93.692061  `
`Iteration 1:   log likelihood = -75.312137  `
`Iteration 2:   log likelihood = -73.625868  `
`Iteration 3:   log likelihood = -73.511329  `
`Iteration 4:   log likelihood = -73.487266  `
`Iteration 5:   log likelihood = -73.481532  `
`Iteration 6:   log likelihood = -73.480231  `
`Iteration 7:   log likelihood = -73.479948  `
`Iteration 8:   log likelihood = -73.479901  `
`Iteration 9:   log likelihood = -73.479895  (not concave)`
` `
`Generalized Ordered Logit Estimates               Number of obs   =         69`
`                                                  LR chi2(12)     =      40.42`
`                                                  Prob > chi2     =     0.0001`
`Log likelihood = -73.479895                       Pseudo R2       =     0.2157`
`------------------------------------------------------------------------------`
`       rep78 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]`
`-------------+----------------------------------------------------------------`
`1            |`
`     foreign |  (dropped)`
`      length |   .1070418    .154429     0.69   0.488    -.1956334    .4097171`
`         mpg |   .1952588   .4860697     0.40   0.688    -.7574202    1.147938`
`       _cons |  -21.49491   38.85059    -0.55   0.580    -97.64066    54.65084`
`-------------+----------------------------------------------------------------`
`2            |`
`     foreign |   15.29959   1044.196     0.01   0.988    -2031.287    2061.886`
`      length |  -.0005172   .0410406    -0.01   0.990    -.0809554     .079921`
`         mpg |  -.0250512   .1696942    -0.15   0.883    -.3576457    .3075432`
`       _cons |   1.908498   11.06988     0.17   0.863    -19.78807    23.60507`
`-------------+----------------------------------------------------------------`
`3            |`
`     foreign |   3.839048   1.020579     3.76   0.000      1.83875    5.839346`
`      length |   .0618394   .0322828     1.92   0.055    -.0014336    .1251125`
`         mpg |   .2485872   .1474317     1.69   0.092    -.0403735    .5375479`
`       _cons |  -18.36718   9.021422    -2.04   0.042    -36.04884   -.6855182`
`-------------+----------------------------------------------------------------`
`4            |`
`     foreign |   3.029612   1.064825     2.85   0.004     .9425925    5.116631`
`      length |   .0464896   .0329677     1.41   0.158    -.0181259    .1111052`
`         mpg |   .2451507   .1194876     2.05   0.040     .0109592    .4793421`
`       _cons |  -17.31256   8.470451    -2.04   0.041    -33.91434   -.7107851`
`------------------------------------------------------------------------------`
` `
`WARNING! 25 in-sample cases have an outcome with a predicted probability that is`
`less than 0. See the gologit2 help section on Warning Messages for more information.`
` `
`. predict p1 p2 p3 p4 p5`
`(option p assumed; predicted probabilities)`
` `
`. sum p1-p5`
` `
`    Variable |       Obs        Mean    Std. Dev.       Min        Max`
`-------------+--------------------------------------------------------`
`          p1 |        74    .1251925       .1639   .0021189   .7717534`
`          p2 |        74    .0244799    .2264678  -.7717533   .2173719`
`          p3 |        74    .4389264    .2131038   .0052249   .7701156`
`          p4 |        74    .2623029    .1576892   .0249331   .5947708`
`          p5 |        74    .1490983    .2158554    .006046   .9513912`
` `
`. tab rep78`
` `
`     Repair |`
`Record 1978 |      Freq.     Percent        Cum.`
`------------+-----------------------------------`
`          1 |          2        2.90        2.90`
`          2 |          8       11.59       14.49`
`          3 |         30       43.48       57.97`
`          4 |         18       26.09       84.06`
`          5 |         11       15.94      100.00`
`------------+-----------------------------------`
`      Total |         69      100.00`
` `
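
To see how a probability can come out negative at all, it may help to recall how the generalized ordered logit model builds its predictions. What follows is a sketch of the usual formulation; the symbols $\alpha_j$, $\beta_j$, and $M$ (the number of outcome categories) are notation introduced here for illustration and do not appear in the output above.

$$
P(Y_i > j) = \frac{\exp(\alpha_j + X_i\beta_j)}{1 + \exp(\alpha_j + X_i\beta_j)}, \qquad j = 1, \dots, M-1
$$

$$
P(Y_i = j) = P(Y_i > j-1) - P(Y_i > j), \qquad \text{with } P(Y_i > 0) = 1 \text{ and } P(Y_i > M) = 0
$$

Because each equation has its own coefficients, nothing forces $P(Y_i > j) \le P(Y_i > j-1)$ for every case. When the second equation predicts a higher cumulative probability than the first, the difference $P(Y_i = 2)$ is negative. The summary above would be consistent with this: the minimum of p2 (-.7717533) is almost exactly the negative of the maximum of p1 (.7717534), which is what one would expect if a single case had $P(Y_i > 1) \approx .228$ from the first equation but $P(Y_i > 2) \approx 1$ from the second.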

As the output above shows, some of the predicted probabilities are negative: the warning reports 25 such in-sample cases, and the summary statistics show that the negative values all occur in p2.  However, this analysis has many other problems as well.  Only 2 cases fall into category 1 and only 8 cases fall into category 2.  The variable foreign was dropped from the first equation altogether, and in the second equation it has a huge coefficient and an even bigger standard error.  We are spreading the data far too thin here.  Combining categories may help.

`. recode rep78 (1/3=3), gen(rep78b)`
`(10 differences between rep78 and rep78b)`
` `
`. gologit2 rep78b foreign length mpg, log`
` `
`Iteration 0:   log likelihood = -66.194631  `
`Iteration 1:   log likelihood = -48.776863  `
`Iteration 2:   log likelihood = -47.631556  `
`Iteration 3:   log likelihood = -47.599469  `
`Iteration 4:   log likelihood = -47.599382  `
`Iteration 5:   log likelihood = -47.599382 `
` `
`Generalized Ordered Logit Estimates               Number of obs   =         69`
`                                                  LR chi2(6)      =      37.19`
`                                                  Prob > chi2     =     0.0000`
`Log likelihood = -47.599382                       Pseudo R2       =     0.2809`
`------------------------------------------------------------------------------`
`      rep78b |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]`
`-------------+----------------------------------------------------------------`
`3            |`
`     foreign |   3.844035   1.027301     3.74   0.000     1.830561    5.857509`
`      length |   .0617889    .032248     1.92   0.055     -.001416    .1249937`
`         mpg |   .2468554   .1485422     1.66   0.097     -.044282    .5379928`
`       _cons |  -18.32185   9.008774    -2.03   0.042    -35.97872   -.6649788`
`-------------+----------------------------------------------------------------`
`4            |`
`     foreign |   3.032383   1.067163     2.84   0.004     .9407823    5.123985`
`      length |   .0466275   .0330607     1.41   0.158    -.0181702    .1114252`
`         mpg |   .2451139   .1194367     2.05   0.040     .0110223    .4792054`
`       _cons |  -17.33818   8.484411    -2.04   0.041    -33.96732   -.7090361`
`------------------------------------------------------------------------------`

As you can see, combining categories solved the problem: the warning about negative predicted probabilities no longer appears.
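
If you want to check this directly rather than rely on the absence of the warning, something like the following could be run right after the second gologit2 command. It is only a sketch: the variable names pb3, pb4, and pb5 are made up for this illustration (one per category of rep78b), and it assumes predict accepts one new variable per outcome category, as it did for the five-category model earlier.

```stata
* Sketch: run immediately after -gologit2 rep78b foreign length mpg-.
* pb3, pb4 and pb5 are hypothetical names, one per category of rep78b.
predict pb3 pb4 pb5

* The Min column should show no negative values if the problem is gone.
summarize pb3 pb4 pb5

* Count cases where any predicted probability is below zero.
count if pb3 < 0 | pb4 < 0 | pb5 < 0
```

A count of zero would confirm that no case, in or out of the estimation sample, has a negative predicted probability.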

In other examples I have seen, there are more cases, but there are often a huge number of variables, some of which are themselves problematic (e.g., a 0-1 dichotomy with 985 0's and 15 1's).  It may just be that gologit models aren't appropriate for your data, but following the rest of the advice in the troubleshooting FAQ may help you come up with a workable model.  (Indeed, your model may be problematic with any method; negative predicted probabilities in gologit2 may simply make the problems more obvious.)
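
One way to spot this kind of sparseness before fitting anything is to tabulate the outcome against each dichotomous predictor. The commands below are a minimal sketch using the auto data from the example above; substitute your own outcome and predictors.

```stata
* Sketch: look for empty or near-empty cells before fitting the model.
sysuse auto, clear

* Cross-tabulate the outcome against the binary predictor.
tabulate rep78 foreign

* Check how lopsided the dichotomy is on its own.
tabulate foreign
```

Outcome categories in which one value of the predictor has no cases (or almost none) are likely candidates for dropped coefficients or enormous standard errors, and they are natural candidates for combining.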