Negative Predicted Probabilities in gologit2

Here is a simple example of how gologit2 can sometimes produce negative predicted probabilities.

. sysuse auto

(1978 Automobile Data)

. gologit2 rep78 foreign length mpg, log

Iteration 0:   log likelihood = -93.692061

Iteration 1:   log likelihood = -75.312137

Iteration 2:   log likelihood = -73.625868

Iteration 3:   log likelihood = -73.511329

Iteration 4:   log likelihood = -73.487266

Iteration 5:   log likelihood = -73.481532

Iteration 6:   log likelihood = -73.480231

Iteration 7:   log likelihood = -73.479948

Iteration 8:   log likelihood = -73.479901

Iteration 9:   log likelihood = -73.479895  (not concave)

Generalized Ordered Logit Estimates               Number of obs   =         69

                                                  LR chi2(12)     =      40.42

                                                  Prob > chi2     =     0.0001

Log likelihood = -73.479895                       Pseudo R2       =     0.2157

------------------------------------------------------------------------------

       rep78 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

1            |

     foreign |  (dropped)

      length |   .1070418    .154429     0.69   0.488    -.1956334    .4097171

         mpg |   .1952588   .4860697     0.40   0.688    -.7574202    1.147938

       _cons |  -21.49491   38.85059    -0.55   0.580    -97.64066    54.65084

-------------+----------------------------------------------------------------

2            |

     foreign |   15.29959   1044.196     0.01   0.988    -2031.287    2061.886

      length |  -.0005172   .0410406    -0.01   0.990    -.0809554     .079921

         mpg |  -.0250512   .1696942    -0.15   0.883    -.3576457    .3075432

       _cons |   1.908498   11.06988     0.17   0.863    -19.78807    23.60507

-------------+----------------------------------------------------------------

3            |

     foreign |   3.839048   1.020579     3.76   0.000      1.83875    5.839346

      length |   .0618394   .0322828     1.92   0.055    -.0014336    .1251125

         mpg |   .2485872   .1474317     1.69   0.092    -.0403735    .5375479

       _cons |  -18.36718   9.021422    -2.04   0.042    -36.04884   -.6855182

-------------+----------------------------------------------------------------

4            |

     foreign |   3.029612   1.064825     2.85   0.004     .9425925    5.116631

      length |   .0464896   .0329677     1.41   0.158    -.0181259    .1111052

         mpg |   .2451507   .1194876     2.05   0.040     .0109592    .4793421

       _cons |  -17.31256   8.470451    -2.04   0.041    -33.91434   -.7107851

------------------------------------------------------------------------------

WARNING! 25 in-sample cases have an outcome with a predicted probability that is

less than 0. See the gologit2 help section on Warning Messages for more information.

. predict p1 p2 p3 p4 p5

(option p assumed; predicted probabilities)

. sum p1-p5

    Variable |       Obs        Mean    Std. Dev.       Min        Max

-------------+--------------------------------------------------------

          p1 |        74    .1251925       .1639   .0021189   .7717534

          p2 |        74    .0244799    .2264678  -.7717533   .2173719

          p3 |        74    .4389264    .2131038   .0052249   .7701156

          p4 |        74    .2623029    .1576892   .0249331   .5947708

          p5 |        74    .1490983    .2158554    .006046   .9513912
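
It may help to spell out why a negative value is possible at all. The following is a sketch of the usual gologit parameterization, in which each equation j models the probability of being above category j and the category probabilities are obtained by differencing. The symbols (alpha_j, beta_j, M for the number of categories) are notation introduced here for illustration and do not appear in the output above.

```latex
\begin{align*}
P(Y_i > j) &= \frac{\exp(\alpha_j + X_i\beta_j)}{1 + \exp(\alpha_j + X_i\beta_j)},
  \qquad j = 1, \dots, M-1 \\
P(Y_i = 1) &= 1 - P(Y_i > 1) \\
P(Y_i = j) &= P(Y_i > j-1) - P(Y_i > j), \qquad j = 2, \dots, M-1 \\
P(Y_i = M) &= P(Y_i > M-1)
\end{align*}
```

Because the betas differ across equations, nothing forces P(Y > j) to be smaller than P(Y > j-1), and when it is not, the difference is negative. The summary above is consistent with this: the maximum of p1 is .7717534, which implies P(Y > 1) = .2282466 for that case. If the minimum of p2 belongs to the same case (the summary alone does not confirm this), then P(Y > 2) is about .9999999, which fits the very large foreign coefficient in equation 2, and p2 = .2282466 - .9999999 = -.7717533.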

. tab rep78

     Repair |

Record 1978 |      Freq.     Percent        Cum.

------------+-----------------------------------

          1 |          2        2.90        2.90

          2 |          8       11.59       14.49

          3 |         30       43.48       57.97

          4 |         18       26.09       84.06

          5 |         11       15.94      100.00

------------+-----------------------------------

      Total |         69      100.00

As the summary statistics above show, some of the predicted probabilities (those for p2) are negative. However, this analysis has many other problems. Only 2 cases fall into category 1 and only 8 cases fall into category 2. The variable foreign was dropped from the first equation, and in the second it has a huge coefficient and an even bigger standard error. We are spreading the data far too thin here. Combining categories may help.
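
One way to see how thinly the data are spread is to cross-tabulate the outcome against the dichotomous predictor and to look directly at the cases with negative predictions. The commands below are a minimal sketch using standard Stata commands. They are not part of the original session, and they assume that the first gologit2 model is still the most recent estimation command and that p1-p5 were created by the predict command shown earlier.

```stata
* Cross-tabulate the outcome by foreign to look for empty or near-empty cells
tabulate rep78 foreign

* Count estimation-sample cases with any negative predicted probability
* (assumes gologit2 is still the most recent estimation command)
count if e(sample) & (p1 < 0 | p2 < 0 | p3 < 0 | p4 < 0 | p5 < 0)

* List the cases with a negative p2 to see what they have in common
list make rep78 foreign length mpg p1 p2 if p2 < 0
```

An empty cell in the cross-tabulation would be one plausible explanation for a dropped variable in one equation and an enormous coefficient and standard error in the next.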

. recode rep78 (1/3=3), gen(rep78b)

(10 differences between rep78 and rep78b)

. gologit2 rep78b foreign length mpg, log

Iteration 0:   log likelihood = -66.194631

Iteration 1:   log likelihood = -48.776863

Iteration 2:   log likelihood = -47.631556

Iteration 3:   log likelihood = -47.599469

Iteration 4:   log likelihood = -47.599382

Iteration 5:   log likelihood = -47.599382

Generalized Ordered Logit Estimates               Number of obs   =         69

                                                  LR chi2(6)      =      37.19

                                                  Prob > chi2     =     0.0000

Log likelihood = -47.599382                       Pseudo R2       =     0.2809

------------------------------------------------------------------------------

      rep78b |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]

-------------+----------------------------------------------------------------

3            |

     foreign |   3.844035   1.027301     3.74   0.000     1.830561    5.857509

      length |   .0617889    .032248     1.92   0.055     -.001416    .1249937

         mpg |   .2468554   .1485422     1.66   0.097     -.044282    .5379928

       _cons |  -18.32185   9.008774    -2.03   0.042    -35.97872   -.6649788

-------------+----------------------------------------------------------------

4            |

     foreign |   3.032383   1.067163     2.84   0.004     .9407823    5.123985

      length |   .0466275   .0330607     1.41   0.158    -.0181702    .1114252

         mpg |   .2451139   .1194367     2.05   0.040     .0110223    .4792054

       _cons |  -17.33818   8.484411    -2.04   0.041    -33.96732   -.7090361

------------------------------------------------------------------------------

As you can see, combining categories solved the problem: gologit2 no longer warns about negative predicted probabilities.

In other examples I have seen, there are more cases, but there are often a huge number of variables, some of which are themselves problematic (e.g., a 0-1 dichotomy with 985 0's and 15 1's). It may just be that gologit models aren't appropriate for your data, but following the rest of the advice in the trouble-shooting FAQ may help you come up with a workable model. (Indeed, your model may be problematic with any method; negative predicted probabilities in gologit2 may simply make the problems more obvious.)