1. Of course, we also need to demonstrate that our hypotheses do fit the evidence. This is not always
easy, but it is easier than ruling out all the other possibilities, so I emphasize disconfirmation here. One
of the alternative hypotheses is that any association we discover is due to chance. For this reason, some
scholars encourage us to avoid procedures that increase the probability of false positives, such as testing
a hypothesis with the same sample that suggested it or engaging in exploratory specification searches or
"mere curve-fitting." Some even find methodological virtue in procedures that are more likely to
generate hypotheses that are wrong, i.e., logical deduction of the implications of simplistic assumptions.
I consider this stance an over-reaction to the danger of chance associations. The counter-intuitiveness of
a hypothesis should increase our skepticism and our insistence on thorough testing, not our confidence in
thinly documented associations. There are better ways to guard against false positives: enlarging the
sample, replication using different indicators, and testing other observable implications of the hypothesis.
2. The seminal article by Rustow (1970) anticipated many of the complex relationships discussed
3. I consider rational choice to be primarily a method for generating theory. Rational choice
theorists who test their models, as many increasingly do, have no choice but to use existing empirical
research methods and are subject to the same methodological standards as quantitative or small-N
researchers. I do not take the naïve falsificationist stand that any single disconfirmation invalidates the
entire approach: such invalidation comes only after the accumulation of many disconfirmations of an
approach's predictions. However, the accumulation of these consistencies or inconsistencies requires
much prior testing of specific hypotheses. In this process, every hypothesis must be evaluated with
respect to at least one competing hypothesis, perhaps drawn from the same approach, perhaps from
others. For this reason I advocate a disaggregated, eclectic approach, in which specific hypotheses
generated by rational choice are tested against competing hypotheses generated by historical
institutionalism, class analysis, political culture, dependency theory, and any other approaches that may
be relevant to the question at hand.
4. It is sometimes possible to combine multidimensional components into a single indicator.
Doing so, however, requires a theory that tells one how to combine them properly. In geometry, for
example, "volume" is a single indicator of a multidimensional quality, but it cannot be calculated unless
one knows the appropriate formula for the shape of the object in question.
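The geometry analogy can be made concrete. The same measured dimensions (lengths) combine into a volume differently depending on the shape, so the aggregation rule, not just the measurements, must be supplied by theory:

```latex
V_{\text{box}} = l \cdot w \cdot h,
\qquad
V_{\text{sphere}} = \tfrac{4}{3}\pi r^{3},
\qquad
V_{\text{cone}} = \tfrac{1}{3}\pi r^{2} h
```

Without knowing which formula applies, identical component measurements yield different (or no) values for the single indicator.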
5. It has recently been noted, however, that the Polity III data are properly considered categorical.
Also, it seems likely that only one component of this indicator, most likely "decisional constraints on the
executive," validly measures democracy; its other components may simply add noise (Gleditsch and
6. Bollen's conclusions are based on confirmatory factor analysis, which estimates how much of
the variance in several related indices is due to random error, method factors (bias introduced by the
method used to gather data and construct the indicator or by the people doing the measurement), or valid
measurement of the concept. The soundness of the estimate depends crucially on the assumptions of the
model relating valid measurement, method factors, and random error (Long 1983). I consider almost all
of Bollen's assumptions to have been quite reasonable. However, in order to permit identification of his
model, he made the unexplained assumption that there was no method factor contamination of Arthur
Banks' indicator of the freedom of group opposition (Bollen 1993, 1218).
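In generic confirmatory-factor-analysis notation (a sketch, not Bollen's exact specification), each observed indicator is decomposed into the three sources of variance described above; assuming the latent concept, method factor, and error are mutually uncorrelated, the variance partitions additively:

```latex
x_i = \lambda_i \xi + \kappa_i M + \delta_i,
\qquad
\operatorname{Var}(x_i)
= \lambda_i^{2}\operatorname{Var}(\xi)
+ \kappa_i^{2}\operatorname{Var}(M)
+ \operatorname{Var}(\delta_i)
```

Here \(\xi\) is the latent concept (democracy), \(M\) a method factor, and \(\delta_i\) random error. In these terms, the unexplained identifying assumption noted above amounts to fixing \(\kappa_i = 0\) for Banks' indicator.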
7. Technically, the Polyarchy Scale is not continuous, but a set of 11 ordered categories. However,
the principle would be the same for truly continuous indicators even though it would be harder to
identify the threshold that is the closest equivalent to the categorical definition.
8. There is, however, one indicator of regime change: the Political Regime Change Dataset
(Gasiorowski 1996). Another limited exception is the regime classification used by Hannan and Carroll
(1981), which employed the categories "multiparty," "one party," "military," and "no party."
9. Actually, it is not hard to introduce a fixed effect from a higher level of analysis. But there is
no way to test hypotheses about lower-level actors interacting more frequently without creating an
entirely new dataset.
10. This element is implied by Linz's explicit statements that authoritarian regimes are by
definition nondemocratic. The language comes from his own definition of a democratic political system
(Linz 1975, 182-3).