Week 1 Summaries
Gary King, Robert O. Keohane, and Sidney Verba, Designing Social Inquiry:
Scientific Inference in Qualitative Research. (chaps. 1-3, 6)
The authors’ principal purpose in Designing Social
Inquiry is to show that the logic of inference typically associated with
quantitative methods also applies to qualitative methods. Understood in this
vein, the book proposes a methodological synthesis rather than a tyranny
of quantitative over the qualitative methods. The book adopts the language
of mathematics to give greater precision and clarity to notions implicit
in qualitative approaches that otherwise might seem fuzzy or muddled.
Chapter 1 introduces the purpose of the book and discusses
the basic components of a research design: the research question, theories,
data, and tests of the theories using data.
- Research questions should deal with matters that are “important”
in the real world. The answers to the questions should contribute to some
recognized body of scholarly literature. (In other words, they ought to address
an empirical problem as well as a theoretical approach).
- Theories must be fashioned with care to ensure that they are not
tautological or unfalsifiable, have observable implications, and are as concrete
as possible—lending themselves to clear operationalizations. If a theory
claims that x causes y, it is “falsifiable” if there remains the possibility
that x results in not y, not x results in y, or not x results in not y.
- In collecting data, one must be completely familiar with the
process that generated the data, collect data related to as many of the observable
implications as possible, endeavor to maximize validity (by how one operationalizes
the variables), ensure reliability (by using consistent processes that are
repeatable), and ensure that the data collection and analysis processes can
be replicated by others.
- In using data, a researcher must be aware of the potential sources
of bias and should “wring” as much as possible out of the data by disaggregating
or looking for temporal “breaks” that delineate separate cases. Moreover,
a persuasive test of a theory must explore the bounds of the theory’s applicability
by investigating the outcomes for cases that do not conform to the proposed
set of causal conditions.
Chapter 2 contrasts contextual “interpretation” with “inference,”
but argues that the same standards of inference apply when testing hypotheses
no matter how the hypotheses were divined. The chapter then develops a generalized
formal model of a research design that is useful for qualitative research.
The model draws analogies using the notions of “expected value,” “mean,”
“variance,” “bias,” and “data efficiency” as applied in statistical inference.
Chapter 3 takes up the topics of causality and causal
inference. In devising a test of a theory, one should choose empirical cases
that satisfy the criteria of unit homogeneity or conditional independence—or
both, if possible. Unit homogeneity implies that for all cases in which the
explanatory variables take on certain values, the expected value of the dependent
variable is the same. Conditional independence asserts that observations
are chosen such that the values taken by the explanatory variables are independent
of the values of the dependent variable. Conditional independence eliminates
the problems of endogeneity (in which the explanatory variables are caused,
at least in part, by the dependent variable), selection bias (choosing cases
that explicitly or implicitly favor certain values of the dependent variable),
and omitted variable bias (the influence of constant effects and spurious
variables).
Randomly selecting cases satisfies conditional independence.
However, random selection introduces three difficulties in small-n research
designs. It is difficult to apply in situations where the universe of cases
is unclear. Also, if there is a small number of observations, random selection
risks missing important cases. Furthermore, random selection can introduce
biases in a small-n research design that can be avoided by carefully selecting
cases in which the explanatory variables appear to be uncorrelated with the
dependent variable.
Chapter 6 addresses strategies for using case studies
(“observations”) to best advantage for testing a theory. A single case study
can serve as a “critical” test of a theory if it corresponds to a “least
likely” test of a hypothesis (or a “most likely” test in the case of a plausibility
probe). However, this approach has limited usefulness if there might be more
than one causal effect, if we are concerned about inherent measurement error
(which is almost always true), and if there is a possibility that the causal
effect embodies contingency or a probabilistic quality. The number of data
points (“observations” or “cases”) necessary to test a theory is a function
of the variance in the causal variable, variability in the outcomes, the
uncertainty associated with the causal inference (the desired “confidence
interval”), and the degree of collinearity between the causal variable and
the control variables. If more observations are needed, it may be possible
to “squeeze” more out of the data on hand—by looking at subunits within the
case or looking at the case across time—recognizing that a single case “parsed”
into more cases may not yield observations that are independent.
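As a rough illustration of how these factors trade off, the sketch below rearranges the textbook OLS result Var(b) = sigma^2 / (n * S_x^2 * (1 - R^2)) to solve for n. This is the standard regression formula, not necessarily the exact expression used in the book; the function name and parameter names are assumptions made for illustration.

```python
import math


def required_observations(sigma2, var_x, r2_controls, target_var_b):
    """Smallest n for which Var(b) <= target_var_b.

    sigma2       -- variance of the outcome not explained by the model
    var_x        -- variance of the causal variable
    r2_controls  -- R^2 from regressing the causal variable on the controls
                    (0 means no collinearity)
    target_var_b -- desired variance of the estimated causal effect
    """
    if not 0 <= r2_controls < 1:
        raise ValueError("r2_controls must be in [0, 1)")
    if min(sigma2, var_x, target_var_b) <= 0:
        raise ValueError("variances must be positive")
    return math.ceil(sigma2 / (var_x * (1 - r2_controls) * target_var_b))


# No collinearity versus heavy collinearity with the control variables.
print(required_observations(4.0, 1.0, 0.0, 0.0625))   # 64
print(required_observations(4.0, 1.0, 0.75, 0.0625))  # 256
```

Under these assumptions, more noise in the outcome, less variance in the causal variable, more collinearity, or a tighter desired interval each raise the number of observations needed.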
Sprinz, Detlef F. and Yael N. Wolinsky (eds., under review). Cases,
Numbers, Models: International Relations Research Methods. Chapter
2.
There are three methods of case analysis: process tracing, congruence
testing, and counterfactual analysis. 1) Process tracing applies
theories under investigation to each causal step (intervening variables)
between the hypothesized cause and the observed effect. It allows
testing a hypothesis derived from a case against different evidence in the
same case. 2) Congruence testing involves comparing the predicted
and the observed values of the dependent variable. It’s inferior to
process tracing because of the n=1 problem. 3) When we need to test a hypothesis
“if and only if x then y,” counterfactual analysis will test a logically
equivalent hypothesis “if not x then not y.”
Two important single-case research designs are identified.
1) Eckstein proposed studying most-likely, least-likely, and crucial cases
(the latter are perfectly most- or least-likely) for testing a theory.
“A most likely case is one that is almost certain to fit a theory if the
theory is true for any cases at all” (37). The theory is undermined
if it doesn’t hold true in this case. The definition of a least-likely
case is analogous to that above. If a theory holds in this case, it
is strongly supported. Incorporating the existence of competing
theories into Eckstein’s design, strong support for a theory is found when
a case supports that theory’s predictions, and not the others’. 2)
Studying deviant or “outlier” cases is useful in identifying new hypotheses
and new or omitted variables.
Comparative methods
1) Mill’s method of agreement / least similar case comparison involves
selecting cases (observations) in which all but one of the independent variables
have different values and the dependent variable has the same value.
This yields the conclusion that the common independent variable is causally
related to the dependent variable. Mill’s method of difference / most
similar case design involves selecting cases in which all but one of the independent
variables have the same values and the dependent variable has different values.
This yields the conclusion that the differing independent variable is causally
related to the dependent variable. These designs cannot be used in
the presence of equifinality – a condition where “the same outcome can arise
through different pathways or combinations of variables.” In general,
the requirements that must be satisfied in order to use these methods are
unrealistic, and, therefore, these methods are rarely used.
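The selection logic of the two Mill designs can be sketched as a simple filter over cases. This is only an illustration: the variable names (x1, x2, x3, y) and the example cases are hypothetical, and the code does nothing to guard against equifinality or omitted variables, which is exactly why the methods are rarely usable in practice.

```python
def method_of_agreement(cases, outcome):
    """Independent variables that take one common value across all cases.

    Requires that every case shares the same value of the outcome.
    """
    if len({case[outcome] for case in cases}) != 1:
        raise ValueError("method of agreement needs a common outcome")
    variables = [name for name in cases[0] if name != outcome]
    return [v for v in variables if len({case[v] for case in cases}) == 1]


def method_of_difference(cases, outcome):
    """Independent variables whose values differ across the cases.

    Requires that the outcome differs across the cases.
    """
    if len({case[outcome] for case in cases}) < 2:
        raise ValueError("method of difference needs differing outcomes")
    variables = [name for name in cases[0] if name != outcome]
    return [v for v in variables if len({case[v] for case in cases}) > 1]


# Hypothetical cases: least similar (agreement) and most similar (difference).
least_similar = [
    {"x1": 1, "x2": 0, "x3": 1, "y": 1},
    {"x1": 1, "x2": 1, "x3": 0, "y": 1},
]
most_similar = [
    {"x1": 1, "x2": 1, "x3": 1, "y": 1},
    {"x1": 0, "x2": 1, "x3": 1, "y": 0},
]
print(method_of_agreement(least_similar, "y"))   # ['x1']
print(method_of_difference(most_similar, "y"))   # ['x1']
```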
2) Structured focused comparison, developed by A. George, requires i)
defining the research objective (formulating hypotheses, etc.), ii) specifying
control, key causal, and dependent variables, iii) selecting cases, iv) establishing
how to measure variance in the dependent and independent variables, v) specifying
the method for selecting observations (single values of variables).
George argued that case studies are useful in developing “typological theories”
(41), which make less restrictive assumptions than those of Mill and incorporate
equifinality.
Comparative advantages and some tradeoffs
The most important comparative advantage of case study methods is in
identifying new hypotheses. The other advantages include studying
causal mechanisms via process tracing, developing historical explanations,
identifying new and omitted variables, attaining high levels of construct
validity, and accommodating complex causal relations, such as equifinality,
interaction effects, and path dependency. The latter advantage carries
a tradeoff, since it implies the loss of parsimony in selecting the number
of variables and the loss of generality of findings. Statistical methods
face the opposite tradeoff.
Construct validity is the “ability to [operationalize and] measure in
a case the indicators that best represent the theoretical concept we intend
to measure” (42). Again, there is a tradeoff between achieving high
levels of construct validity, where case studies are superior to statistical
analysis, and external validity, or the ability to generalize findings to
a wide number of cases, where statistical methods are superior.
Drawbacks
One of the problems with case studies is the danger of selection
bias: a case selection process that results in “inferences that suffer
from systematic error” (47-48). Selection bias usually results from
selecting on the dependent variable (selecting cases from a non-randomly
limited sample of values of the dependent variable). However, some argue
that selecting on the dependent variable is useful when trying to test or
limit the choice of the independent variables and when identifying the causal
paths leading to a selected value of a dependent variable.
Another source of selection bias is “confirmation bias: selecting only
those cases whose independent and dependent variables vary as the favored
hypothesis suggests and ignoring cases that appear to contradict the theory”
(48, underlined in the original). While selecting on the dependent
variable typically understates the strength of the causal relationship,
confirmation bias can either understate or overstate it.
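The claim that selecting on the dependent variable understates the causal relationship can be checked with a small simulation. The sketch below is an illustration under assumed conditions (a linear effect, normally distributed noise, and a cutoff that keeps only cases with high values of y); none of these specifics come from the chapter.

```python
import random


def ols_slope(xs, ys):
    """Bivariate least-squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def simulate_selection(n=5000, true_slope=1.0, cutoff=0.0, seed=0):
    """Return (slope on all cases, slope on cases selected for y > cutoff)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [true_slope * x + rng.gauss(0, 1) for x in xs]
    kept = [(x, y) for x, y in zip(xs, ys) if y > cutoff]
    selected_slope = ols_slope([x for x, _ in kept], [y for _, y in kept])
    return ols_slope(xs, ys), selected_slope


full, selected = simulate_selection()
print(f"all cases: {full:.2f}, selected on y: {selected:.2f}")
```

In this setup the slope estimated from the truncated sample comes out noticeably smaller than the slope from the full sample, which is the attenuation described above.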
The indeterminacy problem arises when a case could be successfully explained
by several competing hypotheses. This is different from the “degrees
of freedom” problem, where the number of independent variables exceeds
the number of observations. Because the author defines a case
as “an instance of a class of events of interest to the investigator” (28),
he argues that cases usually include a potentially large number of observations
on dependent and independent variables, so that the degrees of freedom problem
is not endemic to case studies.
Another danger of using case studies – potential lack of independence
of cases – need not be a problem if this lack of independence is recognized
(e.g. through process tracing) and adjusted for. Another limitation
of case studies is the difficulty of measuring the magnitude and uncertainty of
causal inference.
Chp. 6: B.F. Braumoeller and A.E. Sartori, "Empirical-Quantitative
Approaches to the Study of International Relations."
Statistical method “permits the researcher to draw inferences about reality
based on the data at hand and the laws of probability” (139). It is
especially useful for evaluating and testing theories.
Advantages
The ability to aggregate information from large numbers of cases is the major
advantage of the statistical method and can be useful for theory development.
Statistical analysis allows one not only to uncover a puzzle but also, unlike a
case study, to check whether it represents a systematic pattern. Thus, the
method permits generalizations. Statistics requires both high standards
of inference (explicit assumptions) and standards of evidence (explicit
criteria for measurement).
The statistical method allows drawing causal inferences and estimating the uncertainty
of those inferences – the probability that the association is due to chance.
Finally, the method is extremely useful for testing rival hypotheses against
each other.
Pitfalls
Errors of specification are failures of statistical tests to “relate
meaningfully to the causal mechanisms implied by the theories that they
purport to evaluate” (143). Three such errors are identified.
1) Focus on correlations with little attention to theory.
This error is illustrated by the development of the democratic peace theory.
There, theory development lagged behind studies built on statistical associations.
Development of new theories uncovered the possibility that preceding studies
based their analysis on the wrong causal variables.
2) Analysis based on imprecise or shallow theories. An imprecise
theory allows for “a wide range of relationships between independent and
dependent variables” (144). Such theories may be unfalsifiable.
According to Lake and Powell, Waltzian neorealism is one such theory.
It predicts that when, in a multipolar system, an alliance is challenged,
a member of the alliance will either free-ride or join with others in meeting
the challenge (144). Since these responses are exhaustive and mutually
exclusive, falsification is impossible.
A shallow theory has few testable implications. For instance, a one-shot
Prisoner’s Dilemma (PD) game has been used to hypothesize a relationship
between nuclear weapons and the likelihood of war. However, if confronted
with rival theories that predict the same relationship, the PD fails to provide
additional testable implications, which would differ from those of rival
theories. Therefore, imprecise and shallow theories require theoretical
development before statistical models can be applied.
3) Inattention to functional form - imposing a statistical model
on a theory, instead of using a model to test the theory. A statistical
model should reflect the underlying theory and the causal processes that
generated the data. A combination of formal theory at the development
stage and statistical methods for testing is, therefore, recommended.
Errors of inference refer to fallacious reasoning as to “the extent
that tests of a given theory reveal information about reality” (150).
1) One way to make this error is to focus on statistical significance
to the detriment of substantive significance. In large-n studies,
the smallest degree of association that provides weak support for a theory
will prove to be statistically significant. Another problem is that
rejecting, or failing to reject, the null hypothesis on the basis of arbitrary
significance levels is wrong, but widely practiced. Instead, the certainty
of one’s results should be represented by a probability measure of observing
those results due to chance. Finally, data mining, or running a model
until significant results appear, significantly compromises reliability.
If one runs a model enough times, the probability of some results appearing
significant, when the relationship is actually spurious, can be quite high.
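The data mining point can be made concrete with a short calculation. The sketch below assumes that each run is an independent test of a relationship that is truly null, evaluated at a fixed significance level; real specification searches are usually correlated, so this is an idealization rather than a description of any particular study.

```python
def prob_at_least_one_false_positive(num_tests, alpha=0.05):
    """Chance that at least one of num_tests independent null tests
    appears significant at level alpha."""
    if num_tests < 0 or not 0 <= alpha <= 1:
        raise ValueError("need num_tests >= 0 and 0 <= alpha <= 1")
    return 1 - (1 - alpha) ** num_tests


for runs in (1, 5, 20, 50):
    print(runs, round(prob_at_least_one_false_positive(runs), 2))
```

With alpha = 0.05, twenty such runs already give roughly a 64 percent chance of at least one spuriously "significant" result.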
A “sin of omission” occurs when researchers “accept or reject a
theory based upon an assessment of how likely certain variables are to have
non-zero effects” (i.e. looking at the coefficients and standard errors)
(152). Instead, according to Lakatos, a theory should be evaluated
on its performance against rival theories. Alternatively, on the Bayesian view,
a theory should be evaluated based on results over time.
A “sin of commission” occurs when too many independent variables
are included in the analysis (“’garbage can’ models” (153)) and presents a
serious threat to inference. “Moreover, if the variables that the competing
theory suggests are correlated in the sample with the variables of primary
interest, then including these ‘control’ variables can lead to incorrect
conclusions about the primary theory being tested” (153). (The summarizer
will be grateful to anyone who can reconcile that last quote with Gary King’s
assertion that multicollinearity is not a problem, unless correlation = 1).
Chp. 10: Duncan Snidal, "Formal Models of International Politics"
A model is “a simplified picture of a part of the real world,” which takes
into account the most important considerations for the theory under investigation
(242). Formal models vary a great deal: verbal, physical, mathematical,
computer models, etc. Though different models have the same basic
logical structure, each has its own advantages and disadvantages. For
example, computer models are difficult to set up and explain, but are great
for manipulating assumptions of the model and for handling complex problems.
Mathematical models do not incorporate much detail, but have advantages of
generality and preciseness of representation.
Model construction is a powerful way to develop a theory. When constructing
a model, one should start with the simplest of specifications and then add
complexity as needed.
“The greatest advantage of models emerges when their deductive power moves
us beyond descriptions to inferences from assumptions” (249). Formal
models are very good at achieving internal validity. They help avoid
logical mistakes, but are often criticized for producing intuitive results.
Therefore, a model is especially valuable when its conclusions are “surprising”
(252). The deductions of models can be surprising when they predict
unobservable outcomes (e.g. if Saddam Hussein gets nukes, he’ll use them),
or when an observed outcome depends on an unobserved cause (e.g. nuclear peace
depends on the credibility of mutually assured destruction), or when only
one of many potential outcomes is observed (existence of multiple equilibria).
Though models achieve high levels of internal validity, external validity
is a problem if the model is not properly tested. The empirical content
of models is based on “stylized facts,” or empirical generalizations (254).
To test a model, one must assess its applicability to an empirical problem
it attempts to address. Ascertaining “face validity” (whether the facts
clearly contradict the theory or not) is a start, which should generally be
followed by statistical testing. This way of testing is inconclusive
if a model is indeterminate – comes up with too many predictions. The
author notes, however, that such a model is not necessarily useless – it might
be “illuminating indeterminacy that is a fundamental feature of the world”
(255). Case studies, which focus on complex causal relationships
and interaction effects, can be useful when testing such models. Besides
testing predictions, one should also test the assumptions to see if the results
are robust to reasonable variations in specifications.
Progression of Formal Models
Richardson (1960) produced the first formal model in international relations.
Please refer to pages 258-261 for a concise and informative summary of the
model. Here it is in brief. This is a rational choice decision
theory model of two states conditioned by three motivations: grievances between
states, fear of the other state, and fatigue resulting from costs of acquiring
weapons. A state’s behavior (i.e. the rate of weapons acquisition) can
then be represented mathematically and graphically as a function of its grievances,
its armament levels (of which fatigue is a function), and the other state’s
armaments level (of which fear is a function). The two equations (one
for each state) can then be solved to arrive at an equilibrium level of military
spending for each state. Comparative statics can then be derived.
For example, the optimal level of spending by a state is increasing in its
grievances, decreasing in the cost of maintaining current armaments, and
increasing in either state’s fear (this is not bad grammar, this is rat.
choice grammar :-) ). The model also implies that states will behave differently
when not at equilibrium as parameters vary. Specifically, when the
fatigue factors are relatively larger than the fear factors, the states will
always converge at equilibrium – the equilibrium is stable (see p.278).
If the inequality is reversed, the equilibrium becomes unstable, and the
model comes up with bizarre predictions. If both states’ levels of spending
are slightly above equilibrium, spending will spiral up to infinity; and
if both sides spend slightly less, spending on both sides will decrease to
0 (see p. 279). One can algebraically solve for conditions under which
the equilibrium is stable, but that does not explain away the predictions
for unstable equilibrium conditions, and these predictions fly in the face
of empirical observations.
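The equilibrium and stability results above can be sketched numerically. The notation is an assumption, since the summary gives no symbols: the usual textbook form is dx/dt = k*y - a*x + g and dy/dt = l*x - b*y + h, with a and b as fatigue, k and l as fear, and g and h as grievances. The floor at zero spending is also an assumption added here so that the "decrease to 0" case is visible, and the unstable example uses negative grievance terms purely so that its equilibrium is positive.

```python
def equilibrium(a, b, k, l, g, h):
    """Armament levels (x*, y*) at which both rates of change are zero."""
    det = a * b - k * l
    if det == 0:
        raise ValueError("no unique equilibrium when a*b == k*l")
    return (k * h + b * g) / det, (l * g + a * h) / det


def is_stable(a, b, k, l):
    """Stable when fatigue outweighs fear (all parameters positive)."""
    return a * b > k * l


def simulate_arms_race(x, y, a, b, k, l, g, h, dt=0.01, steps=5000, floor=0.0):
    """Euler integration of the two equations; spending cannot go below floor."""
    for _ in range(steps):
        dx = k * y - a * x + g
        dy = l * x - b * y + h
        x = max(floor, x + dt * dx)
        y = max(floor, y + dt * dy)
    return x, y


# Fatigue > fear: spending converges to the equilibrium (1, 1).
print(equilibrium(2, 2, 1, 1, 1, 1), simulate_arms_race(1.5, 1.5, 2, 2, 1, 1, 1, 1))

# Fear > fatigue: the equilibrium (1, 1) is unstable.
print(simulate_arms_race(1.1, 1.1, 1, 1, 2, 2, -1, -1, steps=1000))  # runs away upward
print(simulate_arms_race(0.9, 0.9, 1, 1, 2, 2, -1, -1, steps=1000))  # falls to the floor
```

Starting slightly above the unstable equilibrium, spending grows without bound; starting slightly below, it falls to zero, which are the two "bizarre" predictions noted above.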
Game theory addressed this problem by modeling interactions strategically (i.e.
each player’s actions are conditioned by the other’s). Using
the example above, spending 0 will no longer be an equilibrium, since one
state has an incentive to increase its spending a little to take advantage
of the other.
Please refer to pp. 263 and 280 for a description of the one-shot two-player
Prisoner’s Dilemma. It has been widely used in studies of cooperation,
despite its obvious shortcomings. First, it predicts mutual defection
as the only equilibrium, while we do observe cooperation empirically.
Second, it treats states as unitary actors and ignores effects of domestic
politics on foreign policy. Third, the model ignores the existing international
institutional environment. (Though I would assign the latter shortcomings
to the underlying theory (i.e. realism), and not to the model, unless the
model is not built on the assumptions of realism).
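For readers without the book at hand, the mutual-defection result can be reproduced with a brute-force search for pure-strategy Nash equilibria. The payoff numbers below are the conventional illustrative values (temptation 5, reward 3, punishment 1, sucker 0), not necessarily those printed on pp. 263 and 280.

```python
from itertools import product


def pure_nash_equilibria(payoffs):
    """Cells where neither player gains by deviating unilaterally.

    payoffs maps (row action, column action) to (row payoff, column payoff).
    """
    rows = sorted({r for r, _ in payoffs})
    cols = sorted({c for _, c in payoffs})
    equilibria = []
    for r, c in product(rows, cols):
        row_best = all(payoffs[(r, c)][0] >= payoffs[(alt, c)][0] for alt in rows)
        col_best = all(payoffs[(r, c)][1] >= payoffs[(r, alt)][1] for alt in cols)
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria


# C = cooperate, D = defect; illustrative payoffs only.
PRISONERS_DILEMMA = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
print(pure_nash_equilibria(PRISONERS_DILEMMA))  # [('D', 'D')]
```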
The first shortcoming is addressed by incorporating repeated games into
the model. The Folk Theorem shows that cooperation can be sustained as an
equilibrium in an infinitely repeated PD game if players value future payoffs
enough. Repeated games also provide an answer to why cooperation on
security issues is harder to achieve than on economic issues. This is
because the discount factor on future payoffs is lower for security issues
(i.e. being taken advantage of now and getting a “sucker’s payoff” is especially
not worth potential future cooperation when it comes to security). A
more unpleasant implication of the Folk Theorem is that there is an infinite
number of possible cooperative equilibria, and the model cannot predict which
one will occur. Non-rational choice (psychological, cultural) theories
rely on concepts like “focal points” to predict the outcome (266). The
Folk Theorem also highlights a substantive problem with the PD in that it
focuses too much on cooperation and not enough on coordination between states
on the possible choice of equilibria.
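The role of the discount factor can be illustrated with one particular strategy, the grim trigger (cooperate until the other side defects, then defect forever). This is only one of the many equilibria the Folk Theorem allows, and the strategy choice and payoff values are assumptions made for illustration. Cooperation is sustainable when R/(1 - d) >= T + d*P/(1 - d), which rearranges to d >= (T - R)/(T - P).

```python
def min_discount_factor(temptation, reward, punishment):
    """Smallest discount factor at which grim trigger sustains cooperation."""
    if not temptation > reward > punishment:
        raise ValueError("need temptation > reward > punishment")
    return (temptation - reward) / (temptation - punishment)


def cooperation_sustainable(discount, temptation=5, reward=3, punishment=1):
    """True if a player prefers continued cooperation to a one-time defection."""
    return discount >= min_discount_factor(temptation, reward, punishment)


print(min_discount_factor(5, 3, 1))      # 0.5
print(cooperation_sustainable(0.6))      # True
print(cooperation_sustainable(0.4))      # False
```

On this reading, a lower discount factor on security issues makes it more likely that the threshold is missed, so cooperation fails there even when it holds on economic issues.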
Extensive form games offer more detail than the normal form games discussed above.
Presenting a model in extensive form allows the use of backwards induction
or other techniques to find subgame perfect equilibria, which do not depend
on incredible threats or commitments by states. For example, on page
283, the normal form game has two Nash equilibria marked by asterisks.
By looking at the extensive form game on the same page, we see that the “Cooperate
with threat – Cooperate” equilibrium relies on a commitment by C to cooperate
after R cooperates. However, this commitment is incredible, since, when
C gets to choose, she will want to get a higher payoff and will not cooperate.
Since R knows this (by common knowledge assumption), he will not cooperate
in the first place. Thus, the only predicted (subgame-perfect) equilibrium
of this game is mutual non-cooperation.
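The backwards-induction argument can be traced in code on a small stand-in game. The tree and payoffs below are hypothetical and are not the game printed on page 283: R moves first, C moves only if R cooperates, and payoffs are listed as (R, C).

```python
def backward_induction(node):
    """Return (payoffs, actions along the predicted path) for a game tree.

    A terminal node is {"payoffs": (...)}; a decision node is
    {"player": index, "moves": {action: child}}. Ties go to the first action.
    """
    if "payoffs" in node:
        return node["payoffs"], []
    player = node["player"]
    best = None
    for action, child in node["moves"].items():
        payoffs, path = backward_induction(child)
        if best is None or payoffs[player] > best[0][player]:
            best = (payoffs, [action] + path)
    return best


# Hypothetical sequential game; player 0 is R, player 1 is C.
GAME = {
    "player": 0,
    "moves": {
        "cooperate": {
            "player": 1,
            "moves": {
                "cooperate": {"payoffs": (3, 3)},
                "defect": {"payoffs": (0, 5)},
            },
        },
        "defect": {"payoffs": (1, 1)},
    },
}
print(backward_induction(GAME))  # ((1, 1), ['defect'])
```

Because C would defect if given the move, R's cooperation would earn 0 rather than 1, so R defects at the outset. The function reports only the path actually played; a full subgame-perfect strategy would also record C's off-path choice.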
Some models experience difficulties describing reality because of simplifying
assumptions. Nevertheless, formal theory allows changing and relaxing
such assumptions. For example, to explain why war occurs, an assumption
of complete information must be relaxed and uncertainty introduced.
Similarly, an assumption that states are unitary actors could be relaxed
to introduce domestic actors. Finally, complexity theory allows for
change of preferences.
…