1. Of course, we also need to demonstrate that our hypotheses do fit the evidence. This is not always easy, but it is easier than ruling out all the other possibilities, so I emphasize disconfirmation here. One of the alternative hypotheses is that any association we discover is due to chance. For this reason, some scholars encourage us to avoid procedures that increase the probability of false positives, such as testing a hypothesis with the same sample that suggested it or engaging in exploratory specification searches or "mere curve-fitting." Some even find methodological virtue in procedures that are more likely to generate hypotheses that are wrong, i.e., logical deduction of the implications of simplistic assumptions. I consider this stance an over-reaction to the danger of chance associations. The counter-intuitiveness of a hypothesis should increase our skepticism and our insistence on thorough testing, not our confidence in thinly documented associations. There are better ways to guard against false positives: enlarging the sample, replication using different indicators, and testing other observable implications of the hypothesis.
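The inflation of false positives by exploratory specification searches can be illustrated with a toy simulation (a hypothetical sketch, not drawn from any dataset discussed here): selecting the strongest of many random candidate predictors and then testing it on the same sample rejects the null far more often than the nominal 5 percent.

```python
import random

random.seed(0)

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return sxy / (sx * sy)

def false_positive_rate(n_candidates, n_obs=30, trials=1000, r_crit=0.361):
    """Fraction of trials in which the strongest of n_candidates purely
    random predictors 'significantly' correlates with a random outcome.
    |r| > 0.361 is roughly the two-tailed p < .05 cutoff for n = 30."""
    hits = 0
    for _ in range(trials):
        y = [random.gauss(0, 1) for _ in range(n_obs)]
        best = max(
            abs(pearson([random.gauss(0, 1) for _ in range(n_obs)], y))
            for _ in range(n_candidates)
        )
        if best > r_crit:
            hits += 1
    return hits / trials

single_test = false_positive_rate(1)    # one pre-specified hypothesis
search_test = false_positive_rate(20)   # best result of a 20-variable search
print(single_test, search_test)
```

The single pre-specified test rejects at roughly the nominal rate, while the specification search rejects well over half the time, which is why remedies such as larger samples, replication with different indicators, and testing additional observable implications matter.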

2. The seminal article by Rustow (1970) anticipated many of the complex relationships discussed below.

3. I consider rational choice to be primarily a method for generating theory. Rational choice theorists who test their models, as many increasingly do, have no choice but to use existing empirical research methods and are subject to the same methodological standards as quantitative or small-N researchers. I do not take the naïve falsificationist stand that any single disconfirmation invalidates the entire approach: such invalidation comes only after the accumulation of many disconfirmations of an approach's predictions. However, the accumulation of these consistencies or inconsistencies requires much prior testing of specific hypotheses. In this process, every hypothesis must be evaluated with respect to at least one competing hypothesis, perhaps drawn from the same approach, perhaps from others. For this reason I advocate a disaggregated, eclectic approach, in which specific hypotheses generated by rational choice are tested against competing hypotheses generated by historical institutionalism, class analysis, political culture, dependency theory, and any other approaches that may be relevant to the question at hand.

4. It is sometimes possible to combine multidimensional components into a single indicator. Doing so, however, requires a theory that tells one how to combine them properly. In geometry, for example, "volume" is a single indicator of a multidimensional quality, but it cannot be calculated unless one knows the appropriate formula for the shape of the object in question.
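The geometric analogy can be made concrete in a short sketch (the function and shape names are illustrative, not from the text): identical component measurements yield a volume only once a shape-specific formula, that is, a theory of how the components combine, is supplied.

```python
import math

def volume(shape, *dims):
    """Combine linear measurements into one indicator, 'volume'; the
    aggregation rule depends entirely on the theorized shape."""
    if shape == "box":       # V = length * width * height
        length, width, height = dims
        return length * width * height
    if shape == "sphere":    # V = (4/3) * pi * r^3
        (radius,) = dims
        return (4 / 3) * math.pi * radius ** 3
    raise ValueError("no theory (formula) for this shape")

print(volume("box", 2, 2, 2))   # the same raw measurement, 2, yields
print(volume("sphere", 2))      # very different volumes by shape
```

Without the `shape` argument, the measurements alone are insufficient, just as multidimensional components of a concept cannot be combined into a single indicator without a theory of their relationship.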

5. It has recently been noted, however, that the Polity III data are properly considered categorical. It also seems likely that only one component of this indicator, "decisional constraints on the executive," validly measures democracy; the other components may simply add noise (Gleditsch and Ward 1997).

6. Bollen's conclusions are based on confirmatory factor analysis, which estimates how much of the variance in several related indices is due to random error, method factors (bias introduced by the method used to gather data and construct the indicator or by the people doing the measurement), or valid measurement of the concept. The soundness of the estimate depends crucially on the assumptions of the model relating valid measurement, method factors, and random error (Long 1983). I consider almost all of Bollen's assumptions to have been quite reasonable. However, in order to permit identification of his model, he made the unexplained assumption that there was no method factor contamination of Arthur Banks' indicator of the freedom of group opposition (Bollen 1993, 1218).
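The measurement model that confirmatory factor analysis estimates can be illustrated with a toy simulation (the loadings below are invented for illustration, not Bollen's estimates): each observed score is valid trait variance plus a shared method factor plus random error, so two indicators gathered by the same method correlate more strongly than their trait loadings alone warrant.

```python
import random

random.seed(1)
N = 50_000

# Standardized components: trait is the concept being measured, method is
# the bias shared by indicators built from the same data-gathering process.
trait  = [random.gauss(0, 1) for _ in range(N)]
method = [random.gauss(0, 1) for _ in range(N)]
lam, mu, sigma = 0.8, 0.4, 0.447   # variance shares: 0.64 + 0.16 + 0.20 = 1

def indicator():
    """Observed score = valid measurement + method factor + random error."""
    return [lam * t + mu * m + sigma * random.gauss(0, 1)
            for t, m in zip(trait, method)]

x1, x2 = indicator(), indicator()

def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b)) / n
    va = sum((p - ma) ** 2 for p in a) / n
    vb = sum((q - mb) ** 2 for q in b) / n
    return cov / (va * vb) ** 0.5

# Shared-method indicators correlate near lam^2 + mu^2 = 0.80, overstating
# the 0.64 they share through the trait alone.
print(round(corr(x1, x2), 2))
```

This is why an assumption like Bollen's, that one indicator carries no method factor, is consequential: the model cannot separate valid covariance from method covariance without some such identifying restriction.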

7. Technically, the Polyarchy Scale is not continuous, but a set of 11 ordered categories. However, the principle would be the same for truly continuous indicators even though it would be harder to identify the threshold that is the closest equivalent to the categorical definition.

8. There is, however, one indicator of regime change: the Political Regime Change Dataset (Gasiorowski 1996). Another limited exception is the regime classification used by Hannan and Carroll (1981), which employed the categories "multiparty," "one party," "military," and "no party."

9. Actually, it is not hard to introduce a fixed effect from a higher level of analysis. But there is no way to test hypotheses about lower-level actors interacting more frequently without creating an entirely new dataset.

10. This element is implied by Linz's explicit statements that authoritarian regimes are by definition nondemocratic. The language comes from his own definition of a democratic political system (Linz 1975, 182-3).