Reading Responses (Less Than 1 Page)

Two reading responses; each will be only one paragraph.

Global Environment and International Inequality

Henry Shue



My aim is to establish that three commonsense principles of fairness, none of

them dependent upon controversial philosophical theories of justice, give rise to

the same conclusion about the allocation of the costs of protecting the environment.


Poor states and rich states have long dealt with each other primarily upon

unequal terms. The imposition of unequal terms has been relatively easy for the

rich states because they have rarely needed to ask for the voluntary cooperation

of the less powerful poor states. Now the rich countries have realized that their

own industrial activity has been destroying the ozone in the earth’s atmosphere

and has been making far and away the greatest contribution to global warming.

They would like the poor states to avoid adopting the same form of industriali-

zation by which they themselves became rich. It is increasingly clear that if poor

states pursue their own economic development with the same disregard for the

natural environment and the economic welfare of other states that rich states

displayed in the past during their development, everyone will continue to suffer

the effects of environmental destruction. Consequently, it is at least conceivable

that rich states might now be willing to consider dealing cooperatively on

equitable terms with poor states in a manner that gives due weight to both the

economic development of poor states and the preservation of the natural environment.


If we are to have any hope of pursuing equitable cooperation, we must try to

arrive at a consensus about what equity means. And we need to define equity,

not as a vague abstraction, but concretely and specifically in the context of both

development of the economy in poor states and preservation of the environ-

ment everywhere.

Fundamental fairness and acceptable inequality

What diplomats and lawyers call equity incorporates important aspects of what

ordinary people everywhere call fairness. The concept of fairness is neither

Eastern nor Western, Northern nor Southern, but universal.1 People everywhere

understand what it means to ask whether an arrangement is fair or biased

towards some parties over other parties. If you own the land but I supply the

labour, or you own the seed but I own the ox, or you are old but I am young,

or you are female but I am male, or you have an education and I do not, or you

worked long and hard but I was lazy – in situation after situation it makes

perfectly good sense to ask whether a particular division of something among

two or more parties is fair to all the parties, in light of this or that difference

between them. All people understand the question, even where they have been

taught not to ask it. What would be fair? Or, as the lawyers and diplomats

would put it, which arrangement would be equitable?

Naturally, it is also possible to ask other kinds of questions about the same

arrangements. One can always ask economic questions, for instance, in addition

to ethical questions concerning equity: would it increase total output if, say,
women were paid less and men were paid more? Would it be more efficient?

Sometimes the most efficient arrangement happens also to be fair to all parties,

but often it is unfair. Then a choice has to be made between efficiency and fair-

ness. Before it is possible to discuss such choices, however, we need to know

the meaning of equity: what are the standards of equity and how do they matter?

Complete egalitarianism – the belief that all good things ought to be shared

equally among all people – can be a powerfully attractive view, and it is much

more difficult to argue against than many of its opponents seem to think. I shall,

nevertheless, assume here that complete egalitarianism is unacceptable. If it

were the appropriate view to adopt, our inquiry into equity could end now.

The answer to the question, ‘what is an equitable arrangement?’ would always

be the same: an equal distribution. Only equality would ever provide equity.

While I do assume that it may be equitable for some good things to be
distributed unequally, I also assume that other things must be kept equal – most

importantly, dignity and respect. It is part of the current international consensus

that every person is entitled to equal dignity and equal respect. In traditional

societies in both hemispheres, even the equality of dignity and respect was

denied in theory as well as practice. Now, although principles of equality are

still widely violated in practice, inequality of dignity and of respect have

relatively few public advocates even among those who practice them. If it is

equitable for some other human goods to be distributed unequally, but it is not

equitable for dignity or respect to be unequal, the central questions become:

‘which inequalities in which other human goods are compatible with equal

human dignity and equal human respect?’ and ‘which inequalities in other

goods ought to be eliminated, reduced or prevented from being increased?’

When one is beginning from an existing inequality, like the current inequality

in wealth between North and South, three critical kinds of justification are:

justifications of unequal burdens intended to reduce or eliminate the existing

inequality by removing an unfair advantage of those at the top; justifications of

unequal burdens intended to prevent the existing inequality from becoming

worse through any infliction of an unfair additional disadvantage upon those at

the bottom; and justifications of a guaranteed minimum intended to prevent the

existing inequality from becoming worse through any infliction of an unfair

additional disadvantage upon those at the bottom. The second justification for

unequal burdens and the justification for a guaranteed minimum are the same:

two different mechanisms are being used to achieve fundamentally the same

purpose. I shall look at these two forms of justification for unequal burdens and

then at the justification for a guaranteed minimum.

Unequal burdens

Greater contribution to the problem

All over the world parents teach their children to clean up their own mess. This

simple rule makes good sense from the point of view of incentive: if one learns

that one will not be allowed to get away with simply walking away from what-

ever messes one creates, one is given a strong negative incentive against making

messes in the first place. Whoever makes the mess presumably does so in the

process of pursuing some benefit – for a child, the benefit may simply be the

pleasure of playing with the objects that constitute the mess. If one learns that

whoever reaps the benefit of making the mess must also be the one who pays
the cost of cleaning up the mess, one learns at the very least not to make messes

with costs that are greater than their benefits.

Economists have glorified this simple rule as the ‘internalization of extern-

alities’. If the basis for the price of a product does not incorporate the costs of

cleaning up the mess made in the process of producing the product, the costs are

being externalized, that is, dumped upon other parties. Incorporating into the

basis of the price of the product the costs that had been coercively socialized is

called internalizing an externality.

At least as important as the consideration of incentives, however, is the

consideration of fairness or equity. If whoever makes a mess receives the bene-

fits and does not pay the costs, not only does he have no incentive to avoid

making as many messes as he likes, but he is also unfair to whoever does pay the

costs. He is inflicting costs upon other people, contrary to their interests and,

presumably, without their consent. By making himself better off in ways that

make others worse off, he is creating an expanding inequality.
Once such an inequality has been created unilaterally by someone’s imposing

costs upon other people, we are justified in reversing the inequality by imposing

extra burdens upon the producer of the inequality. There are two separate

points here. First, we are justified in assigning additional burdens to the party

who has been inflicting costs upon us. Second, the minimum extent of the


compensatory burden we are justified in assigning is enough to correct the

inequality previously unilaterally imposed. The purpose of the extra burden is

to restore an equality that was disrupted unilaterally and arbitrarily (or to reduce

an inequality that was enlarged unilaterally and arbitrarily). In order to accom-

plish that purpose, the extra burden assigned must be at least equal to the unfair

advantage previously taken. This yields us our first principle of equity:

When a party has in the past taken an unfair advantage of others by imposing costs upon

them without their consent, those who have been unilaterally put at a disadvantage are

entitled to demand that in the future the offending party shoulder burdens that are

unequal at least to the extent of the unfair advantage previously taken, in order to

restore equality.2

In the area of development and the environment, the clearest cases that fall

under this first principle of equity are the partial destruction of the ozone layer

and the initiation of global warming by the process of industrialization that has

enriched the North but not the South. Unilateral initiatives by the so-called

developed countries (DCs) have made them rich, while leaving the less

developed countries (LDCs) poor. In the process the industrial activities and

accompanying lifestyles of the DCs have inflicted major global damage upon

the earth’s atmosphere. Both kinds of damage are harmful to those who did not

benefit from Northern industrialization as well as to those who did. Those

societies whose activities have damaged the atmosphere ought, according to the

first principle of equity, to bear sufficiently unequal burdens henceforth to

correct the inequality that they have imposed. In this case, everyone is bearing

costs – because the damage was universal – but the benefits have been overwhelmingly

skewed towards those who have become rich in the process.

This principle of equity should be distinguished from the considerably

weaker – because entirely forward-looking – ‘polluter pays principle’ (PPP),

which requires only that all future costs of pollution (in production or

consumption) be henceforth internalized into prices. Even the OECD formally

adopted the PPP in 1974, to govern relations among rich states.3

Spokespeople for the rich countries make at least three kinds of counter-

arguments to this first principle of equity. These are:

1. The LDCs have also benefited, it is said, from the enrichment of the

DCs. Usually it is conceded that the industrial countries have benefited more

than the non-industrialized. Yet it is maintained that, for example, medicines

and technologies made possible by the lifestyles of the rich countries have also

reached the poor countries, bringing benefits that the poor countries could not

have produced as soon for themselves.

Quite a bit of breath and ink has been spent in arguments over how much

LDCs have benefited from the technologies and other advances made by the

DCs, compared to the benefits enjoyed by the DCs themselves. Yet this dispute

does not need to be settled in order to decide questions of equity. Whatever

benefits LDCs have received, they have mostly been charged for. No doubt

some improvements have been widespread. Yet, except for a relative trickle of

aid, all transfers have been charged to the recipients, who have in fact been left

with an enormous burden of debt, much of it incurred precisely in the effort to

purchase the good things produced by industrialization.

Overall, poor countries have been charged for any benefits that they have

received by someone in the rich countries, evening that account. Much greater

additional benefits have gone to the rich countries themselves, including a

major contribution to the very process of their becoming so much richer than

the poor countries. Meanwhile, the environmental damage caused by the

process has been incurred by everyone. The rich countries have profited to the

extent of the excess of the benefits gained by them over the costs incurred by

everyone through environmental damage done by them, and ought in future to

bear extra burdens in dealing with the damage they have done.

2. Whatever environmental damage has been done, it is said, was uninten-

tional. Now we know all sorts of things about CFCs and the ozone layer, and

about carbon dioxide and the greenhouse effect, that no one dreamed of when

CFCs were created or when industrialization fed with fossil fuels began. People

cannot be held responsible, it is maintained, for harmful effects that they could

not have foreseen. The philosopher Immanuel Kant is often quoted in the West

for having said, ‘Ought presupposes can’ – it can be true that one ought to have

done something only if one actually could have done it. Therefore, it is

allegedly not fair to hold people responsible for effects they could not have

avoided because the effects could not have been predicted.

This objection rests upon a confusion between punishment and responsibility.

It is not fair to punish someone for producing effects that could not
have been avoided, but it is common to hold people responsible for effects that

were unforeseen and unavoidable.

We noted earlier that, in order to be justifiable, an inequality in something

between two or more parties must be compatible with an equality of dignity

and respect between the parties. If there were an inequality between two groups

of people such that members of the first group could create problems and then

expect members of the second group to deal with the problems, that inequality

would be incompatible with equal respect and equal dignity. For the members

of the second group would in fact be functioning as servants for the first group.

If I said to you, ‘I broke it, but I want you to clean it up’, then I would be your

master and you would be my servant. If I thought that you should do my

bidding, I could hardly respect you as my equal.

It is true, then, that the owners of many coal-burning factories could not

possibly have known the bad effects of the carbon dioxide they were releasing


into the atmosphere, and therefore could not possibly have intended to contribute

to harming it. It would, therefore, be unfair to punish them – by, for

example, demanding that they pay double or triple damages. It is not in the least

unfair, however, simply to hold them responsible for the damage that they have

in fact done. This naturally leads to the third objection.

3. Even if it is fair to hold a person responsible for damage done uninten-

tionally, it will be said, it is not fair to hold the person responsible for damage he

did not do himself. It would not be fair, for example, to hold a grandson

responsible for damage done by his grandfather. Yet it is claimed this is exactly

what is being done when the current generation is held responsible for carbon

dioxide emissions produced in the nineteenth century. Perhaps Europeans

living today are responsible for atmosphere-damaging gases emitted today, but

it is not fair to hold people responsible for deeds done long before they were born.


This objection appeals to a reasonable principle, namely that one person

ought not to be held responsible for what is done by another person who is

completely unrelated. ‘Completely unrelated’ is, however, a critical portion of

the principle. To assume that the facts about the industrial North’s contribution

to global warming straightforwardly fall under this principle is to assume that

they are considerably simpler than they actually are.

First, and undeniably, the industrial states’ contributions to global warming

have continued unabated long since it became impossible to plead ignorance. It

would have been conceivable that as soon as evidence began to accumulate that

industrial activity was having a dangerous environmental effect, the industrial

states would have adopted a conservative or even cautious policy of cutting

back greenhouse-gas emissions or at least slowing their rate of increase. For the

most part this has not happened.

Second, today’s generation in the industrial states is far from completely

unrelated to the earlier generations going back all the way to the beginning of

the Industrial Revolution. What is the difference between being born in 1975

in Belgium and being born in 1975 in Bangladesh? Clearly one of the most
fundamental differences is that the Belgian infant is born into an industrial

society and the Bangladeshi infant is not. Even the medical setting for the birth

itself, not to mention the level of prenatal care available to the expectant
mother, is almost certainly vastly more favourable for the Belgian than the

Bangladeshi. Childhood nutrition, educational opportunities and life-long

standards of living are likely to differ enormously because of the difference

between an industrialized and a non-industrialized economy. In such respects

current generations are, and future generations probably will be, continuing

beneficiaries of earlier industrial activity.

Nothing is wrong with the principle invoked in the third objection. It is

indeed not fair to hold someone responsible for what has been done by

someone else. Yet that principle is largely irrelevant to the case at hand, because
one generation of a rich industrial society is not unrelated to other generations


past and future. All are participants in enduring economic structures. Benefits
and costs, and rights and responsibilities, carry across generations.

We turn now to a second, quite different kind of justification of the same

mechanism of assigning unequal burdens. This first justification has rested in

part upon the unfairness of the existing inequality. The second justification
neither assumes nor argues that the initial inequality is unfair.


Greater ability to pay

The second principle of equity is widely accepted as a requirement of simple

fairness. It states:

Among a number of parties, all of whom are bound to contribute to some common

endeavour, the parties who have the most resources normally should contribute the

most to the endeavour.

This principle of paying in accordance with ability to pay, if stated strictly,
would specify what is often called a progressive rate of payment: insofar as a

party’s assets are greater, the rate at which the party should contribute to the

enterprise in question also becomes greater. The progressivity can be strictly

proportional – those with double the base amount of assets contribute at

twice the rate at which those with the base amount contribute, those with

triple the base amount of assets contribute at three times the rate at which

those with the base amount contribute, and so on. More typically, the
progressivity is not strictly proportional – the more a party has, the higher the
rate at which it is expected to contribute, but the rate does not increase in

strict proportion to increases in assets.

The general principle itself is sufficiently fundamental that it is not

necessary, and perhaps not possible, to justify it by deriving it from con-
siderations that are more fundamental still. Nevertheless, it is possible to

explain its appeal to some extent more fully. The basic appeal of payment in

accordance with ability to pay as a principle of fairness is easiest to see by

contrast with a flat rate of contribution, that is, the same rate of contribution

by every party irrespective of different parties’ differing assets. At first
thought, the same rate for everyone seems obviously the fairest imaginable

arrangement. What could possibly be fairer, one is initially inclined to think,

than absolutely equal treatment for everyone? Surely, it seems, if everyone

pays an equal rate, everyone is treated the same and therefore fairly? This,
however, is an exceedingly abstract approach, which pays no attention at all
to the actual concrete circumstances of the contributing parties. In addition, it

focuses exclusively upon the contribution process and ignores the position in

which, as a result of the process, the parties end up. Contribution according
to ability to pay is much more sensitive both to concrete circumstance and to

final outcome.


Suppose that Party A has 90 units of something, Party B has 30 units, and
Party C has 9 units. In order to accomplish their missions, it is proposed that
everyone should contribute at a flat rate of one-third. This may seem fair in that

everyone is treated equally: the same rate is applied to everyone, regardless of

circumstances. When it is considered that A’s contribution will be 30 and B’s

will be 10, while C’s will be only 3, the flat rate may appear more than fair to C

who contributes only one-tenth as much as A does. However, suppose that

these units represent $100 per year in income and that where C lives it is

possible to survive on $750 per year but on no less. If C must contribute 3

units – $300 – he will fall below the minimum for survival. While the flat rate

of one-third would require A to contribute far more ($3,000) than C, and B to
contribute considerably more ($1,000) than C, both A (with $6,000 left) and B
(with $2,000 left) would remain safely above subsistence level. A and B can

afford to contribute at the rate of one-third because they are left with more than

enough while C is unable to contribute at that rate and survive.
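
The arithmetic of this example can be checked with a short sketch. The Python below is illustrative only: the variable and function names are assumptions introduced here, and Party A's holding is taken to be 90 units, the figure implied by A's contribution of 30 units at a rate of one-third.

```python
# Illustrative sketch of the flat-rate example. All names are hypothetical;
# only the figures (units, $100 per unit, $750 subsistence, one-third rate)
# come from the passage above.
from fractions import Fraction

UNIT_VALUE = 100            # dollars of annual income per unit
SUBSISTENCE = 750           # minimum annual income needed where C lives
FLAT_RATE = Fraction(1, 3)  # the same rate applied to every party

holdings = {"A": 90, "B": 30, "C": 9}  # units held by each party


def flat_rate_outcome(units, rate=FLAT_RATE):
    """Return (contribution, remainder) in dollars for one party."""
    income = units * UNIT_VALUE
    contribution = income * rate
    return contribution, income - contribution


outcomes = {party: flat_rate_outcome(units) for party, units in holdings.items()}

for party, (paid, left) in outcomes.items():
    status = "above" if left >= SUBSISTENCE else "below"
    print(f"{party}: pays ${paid}, keeps ${left} ({status} subsistence)")
```

Under these figures A keeps $6,000 and B keeps $2,000, while C keeps $600, which is below the $750 needed to survive. The same rate therefore produces very different final positions.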

While flat rates appear misleadingly fair in the abstract, they do so largely

because they look at only the first part of the story and ignore how things turn

out in the end. The great strength of progressive rates, by contrast, is that they

tend to accommodate final outcomes and take account of whether the

contributors can in fact afford their respective contributions.

A single objection is usually raised against progressive rates of contribution:

disincentive effects. If those who have more are going to lose what they have at

a greater rate than those who have less, the incentive to come to have more in

the first place will, it is said, be much less than it would have been with a flat rate

of contribution. Why should I take more risks, display more imagination, or

expend more effort in order to gain more resources if the result will only be

that, whenever something must be paid for, I will have to contribute not merely

a larger absolute amount (which would happen even with a flat rate) but a larger

percentage? I might as well not be productive if much of anything extra I

produce will be taken away from me, leaving me little better off than those who

produced far less.

Three points need to be noticed regarding this objection. First, of course,

being fair and providing incentives are two different matters, and there is

certainly no guarantee in the abstract that whatever arrangement would provide

the greatest incentives would also be fair.

Second, concerns about incentives often arise when it is assumed that maximum

production and limitless growth are the best goal. It is increasingly clear that

many current forms of production and growth are unsustainable and that the last

thing we should do is to give people self-interested reasons to consume as many

resources as they can, even where the resources are consumed productively. These

issues cannot be settled in the abstract either, but it is certainly an open question –

and one that should be asked very seriously – whether in a particular situation it
is desirable to stimulate people by means of incentives to maximum production.

Sometimes it is desirable, and sometimes it is not. This is an issue about ends.


Third, there is a question about means. Assuming that it had been demon-

strated that the best goal to have in a specific set of circumstances involved

stimulating more production of something, one would then have to ask: how

much incentive is needed to stimulate that much production? Those who are

preoccupied with incentives often speculate groundlessly that unlimited

incentives are virtually always required. Certainly it is true that it is generally

necessary to provide some additional incentive in order to stimulate additional

production. Some people are altruistic and are therefore sometimes willing to

contribute more to the welfare of others even if they do not thereby improve

their own welfare. It would be completely unrealistic, however, to try to

operate an economy on the assumption that people generally would produce

more irrespective of whether doing so was in their own interest – they need

instead to be provided with some incentive. However, some incentive does not

mean unlimited incentive.

It is certainly not necessary to offer unlimited incentives in order to stimulate

(limited) additional production by some people (and not others). Whether people

respond or not depends upon individual personalities and individual circum-

stances. It is a factual matter, not something to be decreed in the abstract, how

much incentive is enough: for these people in these circumstances to produce

this much more, how much incentive is enough? What is clearly mistaken is the

frequent assumption that nothing less than the maximum incentive is ever enough.


In conclusion, insofar as the objection based on disincentive effects is

intended to be a decisive refutation of the second principle of equity, the

objection fails. It is not always a mistake to offer less than the maximum possible

incentive, even when the goal of thereby increasing production has itself been

justified. There is no evidence that anything less than the maximum is even

generally a mistake. Psychological effects must be determined case by case.

On the other hand, the objection based on disincentive effects may be
intended – much more modestly – simply as a warning that one of the possible

costs of restraining inequalities by means of progressive rates of contribution, in

the effort of being fair, may (or may not) be a reduction in incentive effects. As

a caution rather than a (failed) refutation, the objection points to one sensible

consideration that needs to be taken into account when specifying which

variation upon the general second principle of equity is the best version to adopt

in a specific case. One would have to consider how much greater the incentive

effect would be if the rate of contribution were less progressive, in light of how

unfair the results of a less progressive rate would be.

This conclusion that disincentive effects deserve to be considered, although

they are not always decisive, partly explains why the second principle of equity

is stated, not as an absolute, but as a general principle. It says: ‘… the parties who

have the most resources normally should contribute the most…’ – not always,

but normally. One reason why the rate of contribution might not be progress-
ive, or might not be as progressive as possible, is the potential disincentive


effects of more progressive rates. It would need to be shown case by case that an

important goal was served by having some incentive and that the goal in

question would not be served by the weaker incentive compatible with a more

progressive rate of contribution.

We have so far examined two quite different kinds of justifications of unequal

burdens: to reduce or eliminate an existing inequality by removing an unfair

advantage of those at the top and to prevent the existing inequality from

becoming worse through any infliction of an unfair additional disadvantage

upon those at the bottom. The first justification rests in part upon explaining why

the initial inequality is unfair and ought to be removed or reduced. The second

justification applies irrespective of whether the initial inequality is fair. Now we

turn to a different mechanism that – much more directly – serves the second

purpose of avoiding making those who are already the worst-off yet worse off.

Guaranteed minimum

We noted earlier that issues of equity or fairness can arise only if there is some-

thing that must be divided among different parties. The existence of the

following circumstances can be taken as grounds for thinking that certain parties

have a legitimate claim to some of the available resources: (a) the aggregate total

of resources is sufficient for all parties to have more than enough; (b) some

parties do in fact have more than enough, some of them much more than

enough; and (c) other parties have less than enough. American philosopher

Thomas Nagel has called such circumstances radical inequality.4 Such an

inequality is radical in part because the total of available resources is so great that

there is no need to reduce the best-off people to anywhere near the minimum

level in order to bring the worst-off people up to the minimum: the existing

degree of inequality is utterly unnecessary and easily reduced, in light of the

total resources already at hand. In other words, one could preserve considerable

inequality – in order, for instance, to provide incentives, if incentives were

needed for some important purpose – while arranging for those with less than

enough to have at least enough.

Enough for what? The answer could of course be given in considerable detail,

and some of the details would be controversial (and some, although not all,

would vary across societies). The basic idea, however, is of enough for a decent

chance for a reasonably healthy and active life of more or less normal length,

barring tragic accidents and interventions. ‘Enough’ means the essentials for at

least a bit more than mere physical survival – for at least a distinctively human, if

modest, life. For example, having enough means owning not merely clothing

adequate for substantial protection against the elements but clothing adequate in

appearance to avoid embarrassment, by local standards, when being seen in

public, as Adam Smith noted.

In a situation of radical inequality – a situation with the three features outlined

above – fairness demands that those people with less than enough for a

decent human life be provided with enough. This yields the third principle of

equity, which states:

When some people have less than enough for a decent human life, other people have far

more than enough, and the total resources available are so great that everyone could

have at least enough without preventing some people from still retaining considerably

more than others have, it is unfair not to guarantee everyone at least an adequate minimum.5


Clearly, provisions to guarantee an adequate minimum can be of many differ-

ent kinds, and, concerning many of the choices, equity has little or nothing to

say. The arrangements to provide the minimum can be local, regional, national,

international or, more likely, some complex mixture of all, with secondary

arrangements at one level providing a backstop for primary arrangements at

another level.6 Similarly, particular arrangements might assign initial responsi-

bility for maintaining the minimum to families or other intimate groups, to

larger voluntary associations like religious groups or to a state bureau. Consider-

ation of equity might have no implications for many of the choices about

arrangements, and some of the choices might vary among societies, provided

the minimum was in fact guaranteed.

Children, it is worth emphasizing, are the main beneficiaries of this principle

of equity. When a family drops below the minimum required to maintain all its

members, the children are the most vulnerable. Even if the adults choose to

allocate their own share of an insufficient supply to the children, it is still quite

likely that the children will have less resistance to disease and less resilience in

general. And of course not all adults will sacrifice their own share to their

children. Or, in quite a few cultures, adults will sacrifice on behalf of male

children but not on behalf of female children. All in all, when essentials are

scarce, the proportion of children dying is far greater than their proportion in

the population, which in poorer countries is already high: in quite a few poor

countries, more than half the population is under the age of 15.

5 This third principle of equity is closely related to what I called the argument from vital interests in Henry
Shue, ‘The unavoidability of justice’, in Andrew Hurrell and Benedict Kingsbury, eds, The international
politics of the environment (Oxford: Oxford University Press, 1992), pp. 373-97. It is the satisfaction of vital
interests that constitutes the minimum everyone needs to have guaranteed. In the formulation here the
connection with limits on inequality is made explicit.

6 On the importance of backstop arrangements, or the allocation of default duties, see ‘Afterword’ in
Henry Shue, Basic rights: subsistence, affluence, and US foreign policy, 2nd edn (Princeton, NJ: Princeton
University Press, 1996).


One of the most common objections to this third principle of equity flows

precisely from this point about the survival of children. It is what might be

called the over-population objection. I consider this objection to be ethically

outrageous and factually groundless, as explained elsewhere.7

The other most common objection is that while it may be only fair for each

society to have a guaranteed minimum for its own members, it is not fair to

expect members of one society to help to maintain a guarantee of a minimum

for members of another society.8 This objection sometimes rests on the

assumption that state borders (national political boundaries) have so much

moral significance that citizens of one state cannot be morally required, even by

considerations of elemental fairness, to concern themselves with the welfare of

citizens of a different political jurisdiction. A variation on this theme is the

contention that across state political boundaries moral mandates can only be

negative requirements not to harm and cannot be positive requirements to help.

I am unconvinced that, in general, state political borders and national citizen-

ship are markers of such extraordinary and over-riding moral significance.

Whatever may be the case in general, this second objection is especially

unpersuasive if raised on behalf of citizens of the industrialized wealthy states in

the context of international cooperation to deal with environmental problems

primarily caused by their own states and of greatest concern in the medium

term to those states.

To help to maintain a guarantee of a minimum could mean either of two

things: a weaker requirement (a) not to interfere with others’ ability to maintain

a minimum for themselves; or a stronger requirement (b) to provide assistance

to others in maintaining a minimum for themselves. If everyone has a general

obligation, even towards strangers in other states and societies, not to inflict

harm on other persons, the weaker requirement would follow, provided only

that interfering with people’s ability to maintain a minimum for themselves

counted as a serious harm, as it certainly would seem to. Accordingly, persons

with no other bonds to each other would still be obliged not to hinder the

others’ efforts to provide a minimum for themselves.

One could not, for example, demand as one of the terms of an agreement
that someone make sacrifices that would leave the person without necessities.

This means that any agreement to cooperate made between people having

more than enough and people not having enough cannot justifiably require

those who start out without enough to make any sacrifices. Those who lack

essentials will still have to agree to act cooperatively, if there is in fact to be

cooperation, but they should not bear the costs of even their own cooperation.

Because a demand that those lacking essentials should make a sacrifice would

harm them, making such a demand is unfair.

That (a), the weaker requirement, holds, seems perfectly clear. When, if

ever, would (b), the stronger requirement to provide assistance to others in

maintaining a minimum for themselves, hold? Consider the case at hand.

Wealthy states, which are wealthy in large part because they are operating

industrial processes, ask the poor states, which are poor in large part because

they have not industrialized, to cooperate in controlling the bad effects of these

same industrial processes, like the destruction of atmospheric ozone and the

creation of global warming. Assume that the citizens of the wealthy states have

no general obligation, which holds prior to and independently of any

agreement to work together on environmental problems, to contribute to the

provision of a guaranteed minimum for the citizens of the poor states. The

citizens of the poor states certainly have no general obligation, which holds

prior to and independently of any agreement, to assist the wealthy states in

dealing with the environmental problems that the wealthy states’ own industrial

processes are producing. It may ultimately be in the interest of the poor states to

see ozone depletion and global warming stopped, but in the medium term the

citizens of the poor states have far more urgent and serious problems, like

lack of food, lack of clean drinking water and lack of jobs to provide minimal

support for themselves and their families. If the wealthy states say to the

poor states, in effect, ‘our most urgent request of you is that you act in ways that

will avoid worsening the ozone depletion and global warming that we have

started’, the poor states could reasonably respond, ‘our most urgent request of

you is assistance in guaranteeing the fulfilment of the essential needs of our citizens’.


In other words, if the wealthy have no general obligation to help the poor,

the poor certainly have no general obligation to help the wealthy. If this

assumed absence of general obligations means that matters are to be determined

by national interest rather than international obligation, then surely the poor

states are as fully at liberty to specify their own top priority as the wealthy states

are. The poor states are under no general prior obligation to be helpful to the

wealthy states in dealing with whatever happens to be the top priority of the

wealthy states. This is all the more so as long as the wealthy states remain
content to watch hundreds of thousands of children die each year in the poor

states for lack of material necessities, which the total resources in the world

could remedy many times over. If the wealthy states are content to allow radical
inequalities to persist and worsen, it is difficult to see why the poor states should

divert their attention from their own worst problems in order to help out with

problems that for them are far less immediate and deadly. It is as if I am starving

to death, and you want me to agree to stop searching for food and instead to

help repair a leak in the roof of your house without your promising me any

food. Why should I turn my attention away from my own more severe problem
to your less severe one, when I have no guarantee that if I help you with your

problem you will help me with mine? If any arrangement would ever be unfair,
that one would.


Radical human inequalities cannot be tolerated and ought to be eliminated,

irrespective of whether their elimination involves the movement of resources

across national political boundaries: resources move across national boundaries

all the time for all sorts of reasons. I have not argued here for this judgement

about radical inequality, however.9 The conclusion for which I have provided a

rationale is even more compelling: when radical inequalities exist, it is unfair for

people in states with far more than enough to expect people in states with less

than enough to turn their attention away from their own problems in order to

cooperate with the much better-off in solving their problems (and all the more

unfair, in light of the first principle of equity, when the problems that

concern the much better-off were created by the much better-off themselves in

the very process of becoming as well off as they are). The least that those below

the minimum can reasonably demand in reciprocity for their attention to the

problems that concern the best-off is that their own most vital problems be

attended to: that they be guaranteed means of fulfilling their minimum needs.

Any lesser guarantee is too little to be fair, which is to say that any international

agreement that attempts to leave radical inequality across national states

untouched while asking effort from the worst-off to assist the best-off is grossly unfair.



I have emphasized that the reasons for the second and third principles of equity

are fundamentally the same, namely, avoiding making those who are already the

worst-off yet worse off. The second principle serves this end by requiring that

when contributions must be made, they should be made more heavily by the

better-off, irrespective of whether the existing inequality is justifiable. The third
principle serves this end by requiring that no contributions be made by those

below the minimum unless they are guaranteed ways to bring themselves up at

least to the minimum, which assumes that radical inequalities are unjustified.

Together, the second and third principles require that if any contributions to a

common effort are to be expected of people whose minimum needs have not

been guaranteed so far, guarantees must be provided; and the guarantees must

be provided most heavily by the best-off.

The reason for the first principle was different from the reason for the second

principle, in that the reason for the first rests on the assumption that an existing

inequality is already unjustified. The reason for the third principle rests on the

same assumption. The first and third principles apply, however, to inequalities

that are, respectively, unjustified for different kinds of reasons. Inequalities to

which the first principle applies are unjustified because of how they arose,
namely some people have been benefiting unfairly by dumping the costs of

their own advances upon other people. Inequalities to which the third principle

applies are unjustified independently of how they arose and simply because they

are radical, that is, so extreme in circumstances in which it would be very easy

to make them less extreme.

9 And for the argument to the contrary see Miller, ‘Cosmopolitan respect and patriotic concern’.

What stands out is that in spite of the different content of these three

principles of equity, and in spite of the different kinds of grounds upon which

they rest, they all converge upon the same practical conclusion: whatever needs

to be done by wealthy industrialized states or by poor non-industrialized states

about global environmental problems like ozone destruction and global

warming, the costs should initially be borne by the wealthy industrialized states.



Measurement Validity: A Shared Standard for Qualitative and Quantitative Research
Author(s): Robert Adcock and David Collier
Source: The American Political Science Review, Vol. 95, No. 3 (Sep., 2001), pp. 529-546
Published by: American Political Science Association

American Political Science Review Vol. 95, No. 3 September 2001

Measurement Validity: A Shared Standard for Qualitative and Quantitative Research
ROBERT ADCOCK and DAVID COLLIER University of California, Berkeley

Scholars routinely make claims that presuppose the validity of the observations and measurements that
operationalize their concepts. Yet, despite recent advances in political science methods, surprisingly
little attention has been devoted to measurement validity. We address this gap by exploring four themes.
First, we seek to establish a shared framework that allows quantitative and qualitative scholars to assess
more effectively, and communicate about, issues of valid measurement. Second, we underscore the need to
draw a clear distinction between measurement issues and disputes about concepts. Third, we discuss the
contextual specificity of measurement claims, exploring a variety of measurement strategies that seek to
combine generality and validity by devoting greater attention to context. Fourth, we address the proliferation
of terms for alternative measurement validation procedures and offer an account of the three main types of
validation most relevant to political scientists.

Researchers routinely make complex choices
about linking concepts to observations, that is,
about connecting ideas with facts. These choices
raise the basic question of measurement validity: Do
the observations meaningfully capture the ideas con-
tained in the concepts? We will explore the meaning of
this question as well as procedures for answering it. In
the process we seek to formulate a methodological
standard that can be applied in both qualitative and
quantitative research.

Measurement validity is specifically concerned with
whether operationalization and the scoring of cases
adequately reflect the concept the researcher seeks to
measure. This is one aspect of the broader set of
analytic tasks that King, Keohane, and Verba (1994,
chap. 2) call “descriptive inference,” which also encom-
passes, for example, inferences from samples to popu-
lations. Measurement validity is distinct from the va-
lidity of “causal inference” (chap. 3), which Cook and
Campbell (1979) further differentiate into internal and
external validity.1 Although measurement validity is
interconnected with causal inference, it stands as an
important methodological topic in its own right.

New attention to measurement validity is overdue in
political science. While there has been an ongoing
concern with applying various tools of measurement
validation (Berry et al. 1998; Bollen 1993; Elkins 2000;
Hill, Hanna, and Shafqat 1997; Schrodt and Gerner
1994), no major statement on this topic has appeared
since Zeller and Carmines (1980) and Bollen (1989).
Although King, Keohane, and Verba (1994, 25, 152-5)
cover many topics with remarkable thoroughness, they
devote only brief attention to measurement validity.
New thinking about measurement, such as the idea of
measurement as theory testing (Jacoby 1991, 1999), has
not been framed in terms of validity.

Robert Adcock is a Ph.D. candidate, Department of Political Science, and David Collier
is Professor of Political Science, University of California, Berkeley, CA 94720-1950.

Four important problems in political science re-
search can be addressed through renewed attention to
measurement validity. The first is the challenge of
establishing shared standards for quantitative and qual-
itative scholars, a topic that has been widely discussed
(King, Keohane, and Verba 1994; see also Brady and
Collier 2001; George and Bennett n.d.). We believe the
skepticism with which qualitative and quantitative researchers
sometimes view each other’s measurement
tools does not arise from irreconcilable methodological
differences. Indeed, substantial progress can be made
in formulating shared standards for assessing measure-
ment validity. The literature on this topic has focused
almost entirely on quantitative research, however,
rather than on integrating the two traditions. We
propose a framework that yields standards for mea-
surement validation and we illustrate how these apply
to both approaches. Many of our quantitative and
qualitative examples are drawn from recent compara-
tive work on democracy, a literature in which both
groups of researchers have addressed similar issues.
This literature provides an opportunity to identify
parallel concerns about validity as well as differences in
specific practices.

A second problem concerns the relation between
measurement validity and disputes about the meaning
of concepts. The clarification and refinement of con-
cepts is a fundamental task in political science, and
carefully developed concepts are, in turn, a major
prerequisite for meaningful discussions of measure-
ment validity. Yet, we argue that disputes about con-
cepts involve different issues from disputes about mea-
surement validity. Our framework seeks to make this
distinction clear, and we illustrate both types of disputes.

A third problem concerns the contextual specificity
of measurement validity, an issue that arises when a
measure that is valid in one context is invalid in
another. We explore several responses to this problem
that seek a middle ground between a universalizing
tendency, which is inattentive to contextual differences,
and a particularizing approach, which is skeptical about
the feasibility of constructing measures that transcend
specific contexts. The responses we explore seek to
incorporate sensitivity to context as a strategy for
establishing equivalence across diverse settings.

A fourth problem concerns the frequently confusing
language used to discuss alternative procedures for
measurement validation. These procedures have often
been framed in terms of different “types of validity,”
among which content, criterion, convergent, and con-
struct validity are the best known. Numerous other
labels for alternative types have also been coined, and
we have found 37 different adjectives that have been
attached to the noun “validity” by scholars wrestling
with issues of conceptualization and measurement.2
The situation sometimes becomes further confused,
given contrasting views on the interrelations among
different types of validation. For example, in recent
validation studies in political science, one valuable
analysis (Hill, Hanna, and Shafqat 1997) treats “con-
vergent” validation as providing evidence for “con-
struct” validation, whereas another (Berry et al. 1998)
treats these as distinct types. In the psychometrics
tradition (i.e., in the literature on psychological and
educational testing) such problems have spurred a
theoretically productive reconceptualization. This liter-
ature has emphasized that the various procedures for
assessing measurement validity must be seen, not as
establishing multiple independent types of validity, but
rather as providing different types of evidence for valid-
ity. In light of this reconceptualization, we differentiate
between “validity” and “validation.” We use validity to
refer only to the overall idea of measurement validity,
and we discuss alternative procedures for assessing
validity as different “types of validation.” In the final
part of this article we offer an overview of three main
types of validation, seeking to emphasize how proce-
dures associated with each can be applied by both
quantitative and qualitative researchers.

In the first section of this article we introduce a
framework for discussing conceptualization, measurement,
and validity. We then situate questions of validity
in relation to broader concerns about the meaning of
concepts. Next, we address contextual specificity and
equivalence, followed by a review of the evolving
discussion of types of validation. Finally, we focus on
three specific types of validation that merit central
attention in political science: content, convergent/discriminant,
and nomological/construct validation.

2 We have found the following adjectives attached to validity in
discussions of conceptualization and measurement: a priori, apparent,
assumption, common-sense, conceptual, concurrent, congruent,
consensual, consequential, construct, content, convergent, criterion-related,
curricular, definitional, differential, discriminant, empirical,
face, factorial, incremental, instrumental, intrinsic, linguistic, logical,
nomological, postdictive, practical, pragmatic, predictive, rational,
response, sampling, status, substantive, theoretical, and trait. A
parallel proliferation of adjectives, in relation to the concept of
democracy, is discussed in Collier and Levitsky 1997.


Measurement validity should be understood in relation
to issues that arise in moving between concepts and observations.

Levels and Tasks

We depict the relationship between concepts and ob-
servations in terms of four levels, as shown in Figure 1.
At the broadest level is the background concept, which
encompasses the constellation of potentially diverse
meanings associated with a given concept. Next is the
systematized concept, the specific formulation of a
concept adopted by a particular researcher or group of
researchers. It is usually formulated in terms of an
explicit definition. At the third level are indicators,
which are also routinely called measures. This level
includes any systematic scoring procedure, ranging
from simple measures to complex aggregated indexes.
It encompasses not only quantitative indicators but
also the classification procedures employed in qualita-
tive research. At the fourth level are scores for cases,
which include both numerical scores and the results of
qualitative classification.

Downward and upward movement in Figure 1 can be
understood as a series of research tasks. On the
left-hand side, conceptualization is the movement from
the background concept to the systematized concept.
Operationalization moves from the systematized con-
cept to indicators, and the scoring of cases applies
indicators to produce scores. Moving up on the right-
hand side, indicators may be refined in light of scores,
and systematized concepts may be fine-tuned in light of
knowledge about scores and indicators. Insights de-
rived from these levels may lead to revisiting the
background concept, which may include assessing al-
ternative formulations of the theory in which a partic-
ular systematized concept is embedded. Finally, to
define a key overarching term, “measurement” involves
the interaction among levels 2 to 4.
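
As a rough illustration only, and not part of the framework itself, the downward tasks of operationalization and scoring can be sketched in Python. Everything named here is a hypothetical placeholder invented for the sketch: the concept "electoral contestation", the `competitiveness` observation, the 0.5 cut-off, and the two cases. The point is simply that one systematized concept (level 2) can be operationalized by more than one indicator (level 3), and that scores (level 4) may be numerical or qualitative classifications.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

Score = Union[float, str]  # level 4: numerical score or qualitative class


@dataclass(frozen=True)
class SystematizedConcept:
    """Level 2: the explicit formulation adopted by a researcher."""
    name: str
    definition: str


@dataclass(frozen=True)
class Indicator:
    """Level 3: a systematic scoring procedure tied to one concept."""
    concept: SystematizedConcept
    score_case: Callable[[Dict[str, float]], Score]


def score_cases(indicator: Indicator,
                cases: Dict[str, Dict[str, float]]) -> Dict[str, Score]:
    """Task 'scoring cases': apply the indicator to every case."""
    return {name: indicator.score_case(obs) for name, obs in cases.items()}


# Hypothetical concept and observations, invented for this sketch.
concept = SystematizedConcept(
    name="electoral contestation",
    definition="degree to which offices are filled through contested elections",
)
cases = {"A": {"competitiveness": 0.9}, "B": {"competitiveness": 0.2}}

# Two operationalizations of the same systematized concept:
# a graded (numerical) indicator and a dichotomous (qualitative) one.
graded = Indicator(concept, lambda obs: obs["competitiveness"])
dichotomous = Indicator(
    concept,
    lambda obs: "contested" if obs["competitiveness"] >= 0.5 else "uncontested",
)

graded_scores = score_cases(graded, cases)
class_scores = score_cases(dichotomous, cases)
```

Both indicators keep a reference to the systematized concept they operationalize, which mirrors the idea that scores are interpreted in relation to that concept rather than in isolation.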

Defining Measurement Validity

Valid measurement is achieved when scores (including
the results of qualitative classification) meaningfully
capture the ideas contained in the corresponding con-
cept. This definition parallels that of Bollen (1989,
184), who treats validity as “concerned with whether a
variable measures what it is supposed to measure.”
King, Keohane, and Verba (1994, 25) give essentially
the same definition.

If the idea of measurement validity is to do serious
methodological work, however, its focus must be fur-
ther specified, as emphasized by Bollen (1989, 197).
Our specification involves both ends of the connection
between concepts and scores shown in Figure 1. At the
concept end, our basic point (explored in detail below)
is that measurement validation should focus on the
relation between observations and the systematized
concept; any potential disputes about the background
concept should be set aside as an important but
separate issue. With regard to scores, an obvious but
crucial point must be stressed: Scores are never examined
in isolation; rather, they are interpreted and given
meaning in relation to the systematized concept.

In sum, measurement is valid when the scores (level
4 in Figure 1), derived from a given indicator (level 3),
can meaningfully be interpreted in terms of the systematized
concept (level 2) that the indicator seeks to
operationalize. It would be cumbersome to refer repeatedly
to all these elements, but the appropriate
focus of measurement validation is on the conjunction
of these components.

FIGURE 1. Conceptualization and Measurement: Levels and Tasks

Level 1. Background Concept
The broad constellation of meanings and understandings associated with a given concept.

Task: Conceptualization
Formulating a systematized concept through reasoning about the background concept, in light of the goals of research.

Task: Revisiting Background Concept
Exploring broader issues concerning the background concept in light of insights about scores, indicators, and the systematized concept.

Level 2. Systematized Concept
A specific formulation of a concept used by a given scholar or group of scholars; commonly involves an explicit definition.

Task: Operationalization
Developing, on the basis of a systematized concept, one or more indicators for scoring/classifying cases.

Task: Modifying Systematized Concept
Fine-tuning the systematized concept, or possibly extensively revising it, in light of insights about scores and indicators.

Level 3. Indicators
Also referred to as “measures” and “operationalizations.” In qualitative research, these are the operational definitions employed in classifying cases.

Task: Scoring Cases
Applying these indicators to produce scores for the cases being analyzed.

Task: Refining Indicators
Modifying indicators, or potentially creating new indicators, in light of observed scores.

Level 4. Scores for Cases
The scores for cases generated by a particular indicator. These include both numerical scores and the results of qualitative classification.

Measurement Error, Reliability, and Validity

Validity is often discussed in connection with measure-
ment error and reliability. Measurement error may be
systematic (in which case it is called bias) or random.
Random error, which occurs when repeated applica-
tions of a given measurement procedure yield incon-
sistent results, is conventionally labeled a problem of
reliability. Methodologists offer two accounts of the
relation between reliability and validity. (1) Validity is
sometimes understood as exclusively involving bias,
that is, error that takes a consistent direction or form.
From this perspective, validity involves systematic error,
whereas reliability involves random error (Carmines
and Zeller 1979, 14-5; see also Babbie 2001,
144-5). Therefore, unreliable scores may still be correct
“on average” and in this sense valid. (2) Alternatively,
some scholars hesitate to view scores as valid if
they contain large amounts of random error. They
believe validity requires the absence of both types of
error. Therefore, they view reliability as a necessary but
not sufficient condition of measurement validity (Kirk
and Miller 1986, 20; Shively 1998, 45).
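
The contrast between the two kinds of error can be made concrete with a small simulation. This is a minimal sketch under stated assumptions, not a procedure drawn from the literature cited above: the true value, the size of the bias, the noise levels, and the helper `measure` are all hypothetical choices made for illustration. One procedure has random error only, so it is unreliable but centered on the true value. The other has almost no random error but a constant bias, so it is reliable but systematically off.

```python
import random
import statistics


def measure(true_value, bias, noise_sd, n, seed=0):
    """Simulate n applications of a measurement procedure.

    Each score is the true value plus a constant systematic error (bias)
    plus a random error drawn from a normal distribution.
    """
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]


TRUE_VALUE = 10.0  # hypothetical quantity being measured

# Unreliable but unbiased: large random error, no systematic error.
noisy = measure(TRUE_VALUE, bias=0.0, noise_sd=2.0, n=10_000)

# Reliable but biased: small random error, constant systematic error.
biased = measure(TRUE_VALUE, bias=1.5, noise_sd=0.1, n=10_000)

noisy_mean_error = statistics.mean(noisy) - TRUE_VALUE
biased_mean_error = statistics.mean(biased) - TRUE_VALUE
noisy_spread = statistics.stdev(noisy)
biased_spread = statistics.stdev(biased)
```

Under the first account, the noisy procedure counts as valid because its mean error is close to zero, even though individual scores scatter widely. Under the second account, its large spread would already count against its validity.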

Our goal is not to adjudicate between these accounts
but to state them clearly and to specify our own focus,
namely, the systematic error that arises when the links
among systematized concepts, indicators, and scores
are poorly developed. This involves validity in the first
sense stated above. Of course, the random error that
routinely arises in scoring cases is also important, but it
is not our primary concern.

A final point should be emphasized. Because error is
a pervasive threat to measurement, it is essential to
view the interpretations of scores in relation to system-
atized concepts as falsifiable claims (Messick 1989,
13-4). Scholars should treat these claims just as they
would any causal hypothesis, that is, as tentative statements
that require supporting evidence. Validity assessment
is the search for this evidence.


A growing body of work considers the systematic
analysis of concepts an important component of polit-
ical science methodology.3 How should we understand
the relation between issues of measurement validity
and broader choices about concepts, which are a
central focus of this literature?

Conceptual Choices: Forming the Systematized Concept

We view systematized concepts as the point of departure
for assessing measurement validity. How do scholars
form such concepts? Because background concepts
routinely include a variety of meanings, the formation
of systematized concepts often involves choosing
among them. The number of feasible options varies
greatly. At one extreme are concepts such as triangle,
which are routinely understood in terms of a single
conceptual systematization; at the other extreme are
“contested concepts” (Gallie 1956), such as democracy.
A careful examination of diverse meanings helps clarify
the options, but ultimately choices must be made.

These choices are deeply intertwined with issues of
theory, as emphasized in Kaplan’s (1964, 53) paradox
of conceptualization: “Proper concepts are needed to
formulate a good theory, but we need a good theory to
arrive at the proper concepts…. The paradox is
resolved by a process of approximation: the better our

3 Examples of earlier work in this tradition are Sartori 1970, 1984
and Sartori, Riggs, and Teune 1975. More recent studies include
Collier and Levitsky 1997; Collier and Mahon 1993; Gerring 1997,
1999, 2001; Gould 1999; Kurtz 2000; Levitsky 1998; Schaffer 1998.
Important work in political theory includes Bevir 1999; Freeden
1996; Gallie 1956; Pitkin 1967, 1987.

concepts, the better the theory we can formulate with
them, and in turn, the better the concepts available for
the next, improved theory.” Various examples of this
intertwining are explored in recent analyses of impor-
tant concepts, such as Laitin’s (2000) treatment of
language community and Kurtz’s (2000) discussion of
peasant. Fearon and Laitin’s (2000) analysis of ethnic
conflict, in which they begin with their hypothesis and
ask what operationalization is needed to capture the
conceptions of ethnic group and ethnic conflict en-
tailed in this hypothesis, further illustrates the interac-
tion of theory and concepts.

In dealing with the choices that arise in establishing
the systematized concept, researchers must avoid three
common traps. First, they should not misconstrue the
flexibility inherent in these choices as suggesting that
everything is up for grabs. This is rarely, if ever, the
case. In any field of inquiry, scholars commonly asso-
ciate a matrix of potential meanings with the back-
ground concept. This matrix limits the range of plau-
sible options, and the researcher who strays outside it
runs the risk of being dismissed or misunderstood. We
do not mean to imply that the background concept is
entirely fixed. It evolves over time, as new understand-
ings are developed and old ones are revised or fall from
use. At a given time, however, the background concept
usually provides a relatively stable matrix. It is essential
to recognize that a real choice is being made, but it is
no less essential to recognize that this is a limited choice.

Second, scholars should avoid claiming too much in
defending their choice of a given systematized concept.
It is not productive to treat other options as self-
evidently ruled out by the background concept. For
example, in the controversy over whether democracy
versus nondemocracy should be treated as a dichotomy
or in terms of gradations, there is too much reliance on
claims that the background concept of democracy
inherently rules out one approach or the other (Collier
and Adcock 1999, 546-50). It is more productive to
recognize that scholars routinely emphasize different
aspects of a background concept in developing system-
atized concepts, each of which is potentially plausible.
Rather than make sweeping claims about what the
background concept “really” means, scholars should
present specific arguments, linked to the goals and
context of their research, that justify their particular
choices.

A third problem occurs when scholars stop short of
providing a fleshed-out account of their systematized
concepts. This requires not just a one-sentence defini-
tion, but a broader specification of the meaning and
entailments of the systematized concept. Within the
psychometrics literature, Shepard (1993, 417) summa-
rizes what is required: “both an internal model of
interrelated dimensions or subdomains” of the system-
atized concept, and “an external model depicting its
relationship to other [concepts].” An example is Bol-
len’s (1990, 9-12; see also Bollen 1980) treatment of
political democracy, which distinguishes the two di-
mensions of “political rights” and “political liberties,”
clarifies these by contrasting them with the dimensions



September 2001

American Political Science Review Vol. 95, No. 3

developed by Dahl, and explores the relation between
them. Bollen further specifies political democracy
through contrasts with the concepts of stability and
social or economic democracy. In the language of
Sartori (1984, 51-4), this involves clarifying the seman-
tic field.

One consequence of this effort to provide a fleshed-
out account may be the recognition that the concept
needs to be disaggregated. What begins as a consider-
ation of the internal dimensions or components of a
single concept may become a discussion of multiple
concepts. In democratic theory an important example
is the discussion of majority rule and minority rights,
which are variously treated as components of a single
overall concept of democracy, as dimensions to be
analyzed separately, or as the basis for forming distinct
subtypes of democracy (Dahl 1956; Lijphart 1984;
Schmitter and Karl 1992). This kind of refinement may
result from new conceptual and theoretical arguments
or from empirical findings of the sort that are the focus
of the convergent/discriminant validation procedures
discussed below.

Measurement Validity and the Systematized
Versus Background Concept

We stated earlier that the systematized concept, rather
than the background concept, should be the focus in
measurement validation. Consider an example. A re-
searcher may ask: “Is it appropriate that Mexico, prior
to the year 2000 (when the previously dominant party
handed over power after losing the presidential elec-
tion), be assigned a score of 5 out of 10 on an indicator
of democracy? Does this score really capture how
‘democratic’ Mexico was compared to other coun-
tries?” Such a question remains underspecified until we
know whether “democratic” refers to a particular sys-
tematized concept of democracy, or whether this re-
searcher is concerned more broadly with the back-
ground concept of democracy. Scholars who question
Mexico’s score should distinguish two issues: (1) a
concern about measurement, namely whether the indicator
employed produces scores that can be interpreted as
adequately capturing the systematized concept used in
a given study, and (2) a conceptual concern, namely whether
the systematized concept employed in creating the
indicator is appropriate vis-a-vis the background con-
cept of democracy.

We believe validation should focus on the first issue,
whereas the second is outside the realm of measure-
ment validity. This distinction seems especially appro-
priate in view of the large number of contested con-
cepts in political science. The more complex and
contested the background concept, the more important
it is to distinguish issues of measurement from funda-
mental conceptual disputes. To pose the question of
validity we need a specific conceptual referent against
which to assess the adequacy of a given measure. A
systematized concept provides that referent. By con-
trast, if analysts seek to establish measurement validity
in relation to a background concept with multiple
competing meanings, they may find a different answer
to the validity question for each meaning.

By restricting the focus of measurement validation to
the systematized concept, we do not suggest that
political scientists should ignore basic conceptual is-
sues. Rather, arguments about the background concept
and those about validity can be addressed adequately
only when each is engaged on its own terms, rather
than conflated into one overly broad issue. Consider
Schumpeter’s (1947, chap. 21) procedural definition of
democracy. This definition explicitly rules out elements
of the background concept, such as the concern with
substantive policy outcomes, that had been central to
what he calls the classical theory of democracy. Al-
though Schumpeter’s conceptualization has been very
influential in political science, some scholars (Harding
and Petras 1988; Mouffe 1992) have called for a revised
conception that encompasses other concerns, such as
social and economic outcomes. This important debate
exemplifies the kind of conceptual dispute that should
be placed outside the realm of measurement validity.

Recognizing that a given conceptual choice does not
involve an issue of measurement validity should not
preclude considered arguments about this choice. An
example is the argument that minimal definitions can
facilitate causal assessment (Alvarez et al. 1996, 4; Karl
1990, 1-2; Linz 1975, 181-2; Sartori 1975, 34). For
instance, in the debate about a procedural definition of
democracy, a pragmatic argument can be made that if
analysts wish to study the causal relationship between
democracy and socioeconomic equality, then the latter
must be excluded from the systematization of the
former. The point is that such arguments can effectively
justify certain conceptual choices, but they involve
issues that are different from the concerns of measure-
ment validation.

Fine-Tuning the Systematized Concept with
Friendly Amendments

We define measurement validity as concerned with the
relation among scores, indicators, and the systematized
concept, but we do not rule out the introduction of new
conceptual ideas during the validation process. Key
here is the back-and-forth, iterative nature of research
emphasized in Figure 1. Preliminary empirical work
may help in the initial formulation of concepts. Later,
even after conceptualization appears complete, the
application of a proposed indicator may produce un-
expected observations that lead scholars to modify
their systematized concepts. These “friendly amend-
ments” occur when a scholar, out of a concern with
validity, engages in further conceptual work to suggest
refinements or make explicit earlier implicit assump-
tions. These amendments are friendly because they do
not fundamentally challenge a systematized concept
but instead push analysts to capture more adequately
the ideas contained in it.

A friendly amendment is illustrated by the emer-
gence of the “expanded procedural minimum” defini-
tion of democracy (Collier and Levitsky 1997, 442-4).
Scholars noted that, despite free or relatively free


Measurement Validity: A Shared Standard for Qualitative and Quantitative Research

elections, some civilian governments in Central and
South America to varying degrees lacked effective
power to govern. A basic concern was the persistence
of “reserved domains” of military power over which
elected governments had little authority (Valenzuela
1992, 70). Because procedural definitions of democracy
did not explicitly address this issue, measures based
upon them could result in a high democracy score for
these countries, but it appeared invalid to view them as
democratic. Some scholars therefore amended their
systematized concept of democracy to add the differ-
entiating attribute that the elected government must to
a reasonable degree have the power to rule (Karl 1990,
2; Loveman 1994, 108-13; Valenzuela 1992, 70). De-
bate persists over the scoring of specific cases (Rabkin
1992, 165), but this innovation is widely accepted
among scholars in the procedural tradition (Hunting-
ton 1991, 10; Mainwaring, Brinks, and Perez-Linan
2001; Markoff 1996, 102-4). As a result of this friendly
amendment, analysts did a better job of capturing, for
these new cases, the underlying idea of procedural
minimum democracy.


Contextual specificity is a fundamental concern that
arises when differences in context potentially threaten
the validity of measurement. This is a central topic in
psychometrics, the field that has produced the most
innovative work on validity theory. This literature
emphasizes that the same score on an indicator may
have different meanings in different contexts (Moss
1992, 236-8; see also Messick 1989, 15). Hence, the
validation of an interpretation of scores generated in
one context does not imply that the same interpreta-
tion is valid for scores generated in another context. In
political science, this concern with context can arise
when scholars are making comparisons across different
world regions or distinct historical periods. It can also
arise in comparisons within a national (or other) unit,
given that different subunits, regions, or subgroups may
constitute very different political, social, or cultural
contexts.

The potential difficulty that context poses for valid
measurement, and the related task of establishing
measurement equivalence across diverse units, deserve
more attention in political science. In a period when
the quest for generality is a powerful impulse in the
social sciences, scholars such as Elster (1999, chap. 1)
have strongly challenged the plausibility of seeking
general, law-like explanations of political phenomena.
A parallel constraint on the generality of findings may
be imposed by the contextual specificity of measure-
ment validity. We are not arguing that the quest for
generality be abandoned. Rather, we believe greater
sensitivity to context may help scholars develop mea-
sures that can be validly applied across diverse con-
texts. This goal requires concerted attention to the
issue of equivalence.

Contextual Specificity in Political Research

Contextual specificity affects many areas of political
science. It has long been a problem in cross-national
survey research (Sicinski 1970; Verba 1971; Verba,
Nie, and Kim 1978, 32-40; Verba et al. 1987, Appen-
dix). An example concerning features of national con-
text is Cain and Ferejohn’s (1981) discussion of how
the differing structure of party systems in the United
States and Great Britain should be taken into account
when comparing party identification. Context is also a
concern for survey researchers working within a single
nation, who wrestle with the dilemma of “inter-person-
ally incomparable responses” (Brady 1985). For exam-
ple, scholars debate whether a given survey item has
the same meaning for different population sub-
groups, which could be defined, for example, by re-
gion, gender, class, or race. One specific concern is
whether population subgroups differ systematically in
their “response style” (also called “response sets”).
Some groups may be more disposed to give extreme
answers, and others may tend toward moderate an-
swers (Greenleaf 1992). Bachman and O’Malley (1984)
show that response style varies consistently with race.
They argue that apparently important differences
across racial groups may in part reflect only a different
manner of answering questions. Contextual specificity
also can be a problem in survey comparisons over time,
as Baumgartner and Walker (1990) point out in dis-
cussing group membership in the United States.

The issue of contextual specificity of course also
arises in macro-level research in international and
comparative studies (Bollen, Entwisle, and Anderson
1993, 345). Examples from the field of comparative
politics are discussed below. In international relations,
attention to context, and particularly a concern with
“historicizing the concept of structure,” is central to
“constructivism” (Ruggie 1998, 875). Constructivists
argue that modern international relations rest upon
“constitutive rules” that differ fundamentally from
those of both medieval Christendom and the classical
Greek world (p. 873). Although they recognize that
sovereignty is an organizing principle applicable across
diverse settings, the constructivists emphasize that the
“meaning and behavioral implications of this principle
vary from one historical context to another” (Reus-
Smit 1997, 567). On the other side of this debate,
neorealists such as Fischer (1993, 493) offer a general
warning: If pushed to an extreme, the “claim to context
dependency” threatens to “make impossible the collec-
tive pursuit of empirical knowledge.” He also offers
specific historical support for the basic neorealist posi-
tion that the behavior of actors in international politics
follows consistent patterns. Fischer (1992, 463, 465)
concludes that “the structural logic of action under
anarchy has the character of an objective law,” which is
grounded in “an unchanging essence of human na-
ture.”

The recurring tension in social research between
particularizing and universalizing tendencies reflects in
part contrasting degrees of concern with contextual
specificity. The approaches to establishing equivalence



discussed below point to the option of a middle ground.
These approaches recognize that contextual differences
are important, but they seek to combine this insight
with the quest for general knowledge.

The lessons for political science are clear. Any
empirical assessment of measurement validity is neces-
sarily based on a particular set of cases, and validity
claims should be made, at least initially, with reference
to this specific set. To the extent that the set is
heterogeneous in ways that may affect measurement
validity, it is essential to (1) assess the implications for
establishing equivalence across these diverse contexts
and, if necessary, (2) adopt context-sensitive measures.
Extension to additional cases requires similar proce-
dures.

Establishing Equivalence: Context-Specific
Domains of Observation

One important means of establishing equivalence
across diverse contexts is careful reasoning, in the
initial stages of operationalization, about the specific
domains to which a systematized concept applies. Well
before thinking about particular scoring procedures,
scholars may need to make context-sensitive choices
regarding the parts of the broader polity, economy, or
society to which they will apply their concept. Equiva-
lent observations may require, in different contexts, a
focus on what at a concrete level might be seen as
distinct types of phenomena.

Some time ago, Verba (1967) called attention to the
importance of context-specific domains of observation.
In comparative research on political opposition in
stable democracies, a standard focus is on political
parties and legislative politics, but Verba (pp. 122-3)
notes that this may overlook an analytically equivalent
form of opposition that crystallizes, in some countries,
in the domain of interest group politics. Skocpol (1992,
6) makes a parallel argument in questioning the claim
that the United States was a “welfare laggard” because
social provision was not launched on a large scale until
the New Deal. This claim is based on the absence of
standard welfare programs of the kind that emerged
earlier in Europe but fails to recognize the distinctive
forms of social provision in the United States, such as
veterans’ benefits and support for mothers and chil-
dren. Skocpol argues that the welfare laggard charac-
terization resulted from looking in the wrong place,
that is, in the wrong domain of policy.

Locke and Thelen (1995, 1998) have extended this
approach in their discussion of “contextualized com-
parison.” They argue that scholars who study national
responses to external pressure for economic decentral-
ization and “flexibilization” routinely focus on the
points at which conflict emerges over this economic
transformation. Yet, these “sticking points” may be
located in different parts of the economic and political
system. With regard to labor politics in different coun-
tries, such conflicts may arise over wage equity, hours
of employment, workforce reduction, or shop-floor
reorganization. These different domains of labor rela-
tions must be examined in order to gather analytically
equivalent observations that adequately tap the con-
cept of sticking point. Scholars who look only at wage
conflicts run the risk of omitting, for some national
contexts, domains of conflict that are highly relevant to
the concept they seek to measure.

By allowing the empirical domain to which a system-
atized concept is applied to vary across the units being
compared, analysts may take a productive step toward
establishing equivalence among diverse contexts. This
practice must be carefully justified, but under some
circumstances it can make an important contribution to
valid measurement.

Establishing Equivalence: Context-Specific
Indicators and Adjusted Common Indicators

Two other ways of establishing equivalence involve
careful work at the level of indicators. We will discuss
context-specific indicators,4 and what we call adjusted
common indicators. In this second approach, the same
indicator is applied to all cases but is weighted to
compensate for contextual differences.

An example of context-specific indicators is found in
Nie, Powell, and Prewitt’s (1969, 377) five-country
study of political participation. For all the countries,
they analyze four relatively standard attributes of par-
ticipation. Regarding a fifth attribute, membership in
a political party, they observe that in four of the
countries party membership has a roughly equivalent
meaning, but in the United States it has a different
form and meaning. The authors conclude that involve-
ment in U.S. electoral campaigns reflects an equivalent
form of political participation. Nie, Powell, and Prewitt
thus focus on a context-specific domain of observation
(the procedure just discussed above) by shifting their
attention, for the U.S. context, from party membership
to campaign participation. They then take the further
step of incorporating within their overall index of
political participation context-specific indicators that
for each case generate a score for what they see as the
appropriate domain. Specifically, the overall index for
the United States includes a measure of campaign
participation rather than party membership.
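
The logic of a context-specific indicator can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the item names, the 0/1 coding, the unweighted average, and the respondent records are all hypothetical and are not taken from Nie, Powell, and Prewitt's study.

```python
from statistics import mean

# Hypothetical names for the four attributes shared by all countries.
COMMON_ITEMS = [
    "votes",
    "discusses_politics",
    "contacts_officials",
    "attends_meetings",
]

# Assumed mapping from country to the item treated as the equivalent
# fifth attribute. Only the U.S. substitution is described in the text.
FIFTH_ITEM = {
    "United States": "campaign_participation",
    "default": "party_membership",
}


def participation_index(country, responses):
    """Average the four common items plus the country's context-specific item."""
    fifth = FIFTH_ITEM.get(country, FIFTH_ITEM["default"])
    items = COMMON_ITEMS + [fifth]
    return mean(responses[item] for item in items)


# Invented respondents, coded 1 (yes) or 0 (no) on each item.
us_respondent = {
    "votes": 1, "discusses_politics": 1, "contacts_officials": 0,
    "attends_meetings": 0, "campaign_participation": 1, "party_membership": 0,
}
other_respondent = {
    "votes": 1, "discusses_politics": 1, "contacts_officials": 0,
    "attends_meetings": 0, "campaign_participation": 0, "party_membership": 1,
}

print(participation_index("United States", us_respondent))
print(participation_index("Other country", other_respondent))
```

In this sketch the two respondents receive the same index score even though the fifth component is measured with a different item in each context, which is the sense in which the scores are meant to be equivalent.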

A different example of context-specific indicators is
found in comparative-historical research, in the effort
to establish a meaningful threshold for the onset of
democracy in the nineteenth and early twentieth cen-
tury, as opposed to the late twentieth century. This
effort in turn lays a foundation for the comparative
analysis of transitions to democracy. One problem in
establishing equivalence across these two eras lies in
the fact that the plausible agenda of “full” democrati-
zation has changed dramatically over time. “Full” by
the standards of an earlier period is incomplete by later
standards. For example, by the late twentieth century,
universal suffrage and the protection of civil rights for
the entire national population had come to be consid-
ered essential features of democracy, but in the nine-

4 This approach was originally used by Przeworski and Teune (1970,
chap. 6), who employed the label “system-specific indicator.”



teenth century they were not (Huntington 1991, 7, 16).
Yet, if the more recent standard is applied to the
earlier period, cases are eliminated that have long been
considered classic examples of nascent democratiza-
tion in Europe. One solution is to compare regimes
with respect to a systematized concept of full democ-
ratization that is operationalized according to the
norms of the respective periods (Collier 1999, chap. 1;
Russett 1993, 15; see also Johnson 1999, 118). Thus, a
different scoring procedure, a context-specific indica-
tor, is employed in each period in order to produce
scores that are comparable with respect to this system-
atized concept.5

Adjusted common indicators are another way to
establish equivalence. An example is found in Moene
and Wallerstein’s (2000) quantitative study of social
policy in advanced industrial societies, which focuses
specifically on public social expenditures for individuals
outside the labor force. One component of their mea-
sure is public spending on health care. The authors
argue that in the United States such health care
expenditures largely target those who are not members
of the labor force. By contrast, in other countries
health expenditures are allocated without respect to
employment status. Because U.S. policy is distinctive,
the authors multiply health care expenditures in the
other countries by a coefficient that lowers their scores
on this variable. Their scores are thereby made roughly
equivalent (as part of a measure of public expendi-
tures on individuals outside the labor force) to the
U.S. score. A parallel effort to establish equivalence in
the analysis of economic indicators is provided by
Zeitsch, Lawrence, and Salernian (1994, 169), who use
an adjustment technique in estimating total factor
productivity to take account of the different operating
environments, and hence the different context, of the
industries they compare. Expressing indicators in per-
capita terms is also an example of adjusting indicators
in light of context. Overall, this practice is used to
address both very specific problems of equivalence, as
with the Moene and Wallerstein example, as well as
more familiar concerns, such as standardizing by pop-
ulation.
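
The adjustment described for the Moene and Wallerstein example amounts to simple arithmetic, which the following Python sketch illustrates. The spending figures, the country labels, and the coefficient of 0.5 are invented assumptions for illustration, not values from their study.

```python
# Hypothetical public health care spending, as a share of GDP (percent).
health_spending = {
    "United States": 4.0,
    "Country A": 6.0,
    "Country B": 5.0,
}

# Assumed coefficient: the share of health spending that reaches people
# outside the labor force where spending ignores employment status.
OUTSIDE_LABOR_FORCE_SHARE = 0.5


def adjusted_health_spending(country, spending):
    """Keep the U.S. score as is; scale the other countries' scores down."""
    if country == "United States":
        return spending
    return spending * OUTSIDE_LABOR_FORCE_SHARE


adjusted = {
    country: adjusted_health_spending(country, spending)
    for country, spending in health_spending.items()
}

for country, value in adjusted.items():
    print(country, value)
```

The same indicator is applied to every case; only the weighting differs by context. Any such coefficient would itself need to be justified, since the comparability of the adjusted scores depends on it.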

Context-specific indicators and adjusted common
indicators are not always a step forward, and some
scholars have self-consciously avoided them. The use of
such indicators should match the analytic goal of the
researcher. For example, many who study democrati-
zation in the late twentieth century deliberately adopt
a minimum definition of democracy in order to con-
centrate on a limited set of formal procedures. They do
this out of a conviction that these formal procedures
are important, even though they may have different
meanings in particular settings. Even a scholar such as
O’Donnell (1993, 1355), who has devoted great atten-
tion to contextualizing the meaning of democracy,
insists on the importance of also retaining a minimal
definition of “political democracy” that focuses on

5 A well-known example of applying different standards for democ-
racy in making comparisons across international regions is Lipset
(1959, 73-4).

basic formal procedures. Thus, for certain purposes, it
can be analytically productive to adopt a standard
definition that ignores nuances of context and apply the
same indicator to all cases.

In conclusion, we note that although Przeworski and
Teune’s (1970) and Verba’s arguments about equiva-
lence are well known, issues of contextual specificity
and equivalence have not received adequate attention
in political science. We have identified three tools
(context-specific domains of observation, context-spe-
cific indicators, and adjusted common indicators) for
addressing these issues, and we encourage their wider
use. We also advocate greater attention to justifying
their use. Claims about the appropriateness of contex-
tual adjustments should not simply be asserted; their
validity needs to be carefully defended. Later, we
explore three types of validation that may be fruitfully
applied in assessing proposals for context-sensitive
measurement. In particular, content validation, which
focuses on whether operationalization captures the
ideas contained in the systematized concept, is central
to determining whether and how measurement needs
to be adjusted in particular contexts.


Discussions of measurement validity are confounded
by the proliferation of different types of validation, and
by an even greater number of labels for them. In this
section we review the emergence of a unified concep-
tion of measurement validity in the field of psychomet-
rics, propose revisions in the terminology for talking
about validity, and examine the important treatments
of validation in political analysis offered by Carmines
and Zeller, and by Bollen.

Evolving Understandings of Validity

In the psychometric tradition, current thinking about
measurement validity developed in two phases. In the
first phase, scholars wrote about “types of validity” in a
way that often led researchers to treat each type as if it
independently established a distinct form of validity. In
discussing this literature we follow its terminology by
referring to types of “validity.” As noted above, in the
rest of this article we refer instead to types of “valida-
tion.”

The first pivotal development in the emergence of a
unified approach occurred in the 1950s and 1960s,
when a threefold typology of content, criterion, and
construct validity was officially established in reaction
to the confusion generated by the earlier proliferation
of types.6 Other labels continued to appear in other
disciplines, but this typology became an orthodoxy in

6 The second of these is often called criterion-related validity.
Regarding these official standards, see American Psychological As-
sociation 1954, 1966; Angoff 1988, 25; Messick 1989, 16-7; Shultz,
Riggs, and Kottke 1998, 267-9. The 1954 standards initially pre-
sented four types of validity, which became the threefold typology in
1966 when “predictive” and “concurrent” validity were combined as
“criterion-related” validity.



psychology. A recurring metaphor in that field charac-
terized the three types as “something of a holy trinity
representing three different roads to psychometric sal-
vation” (Guion 1980, 386). These types may be briefly
defined as follows.

* Content validity assesses the degree to which an
indicator represents the universe of content entailed
in the systematized concept being measured.
* Criterion validity assesses whether the scores pro-
duced by an indicator are empirically associated with
scores for other variables, called criterion variables,
which are considered direct measures of the phe-
nomenon of concern.

* Construct validity has had a range of meanings. One
central focus has been on assessing whether a given
indicator is empirically associated with other indica-
tors in a way that conforms to theoretical expecta-
tions about their interrelationship.
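
One common way to examine the empirical association that criterion validity refers to is a correlation coefficient. The Python sketch below computes a Pearson correlation between an indicator and a criterion variable. The scores are invented, and the choice of Pearson's r is an assumption made for illustration; the definitions above do not prescribe a particular statistic.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical scores for five cases on the indicator being assessed
# and on a criterion variable.
indicator_scores = [2, 4, 5, 7, 9]
criterion_scores = [1, 3, 6, 6, 10]

print(round(pearson_r(indicator_scores, criterion_scores), 3))
```

A strong positive association would count as one piece of evidence about the indicator, and only to the extent that the criterion variable can itself be accepted as a direct measure of the phenomenon.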

These labels remain very influential and are still the
centerpiece in some discussions of measurement valid-
ity, as in the latest edition of Babbie’s (2001, 143-4)
widely used methods textbook for undergraduates.

The second phase grew out of increasing dissatisfac-
tion with the “trinity” and led to a “unitarian” ap-
proach (Shultz, Riggs, and Kottke 1998, 269-71). A
basic problem identified by Guion (1980, 386) and
others was that the threefold typology was too often
taken to mean that any one type was sufficient to
establish validity (Angoff 1988, 25). Scholars increas-
ingly argued that the different types should be sub-
sumed under a single concept. Hence, to continue with
the prior metaphor, the earlier trinity came to be seen
“in a monotheistic mode as the three aspects of a
unitary psychometric divinity” (p. 25).

Much of the second phase involved a reconceptual-
ization of construct validity and its relation to content
and criterion validity. A central argument was that the
latter two may each be necessary to establish validity,
but neither is sufficient. They should be understood as
part of a larger process of validation that integrates
“multiple sources of evidence” and requires the com-
bination of “logical argument and empirical evidence”
(Shepard 1993, 406). Alongside this development, a
reconceptualization of construct validity led to “a more
comprehensive and theory-based view that subsumed
other more limited perspectives” (Shultz, Riggs, and
Kottke 1998, 270). This broader understanding of
construct validity as the overarching goal of a single,
integrated process of measurement validation is widely
endorsed by psychometricians. Moss (1995, 6) states
“there is a close to universal consensus among validity
theorists” that “content- and criterion-related evidence
of validity are simply two of many types of evidence
that support construct validity.”

Thus, in the psychometric literature (e.g., Messick
1980, 1015), the term “construct validity” has become
essentially a synonym for what we call measurement
validity. We have adopted measurement validity as the
name for the overall topic of this article, in part
because in political science the label construct validity
commonly refers to specific procedures rather than to
the general idea of valid measurement. These specific
procedures generally do not encompass content valida-
tion and have in common the practice of assessing
measurement validity by taking as a point of reference
established conceptual and/or theoretical relationships.

We find it helpful to group these procedures into two
types according to the kind of theoretical or conceptual
relationship that serves as the point of reference.
Specifically, these types are based on the heuristic
distinction between description and explanation.7 First,
some procedures rely on “descriptive” expectations
concerning whether given attributes are understood as
facets of the same phenomenon. This is the focus of
what we label “convergent/discriminant validation.”
Second, other procedures rely on relatively well-estab-
lished “explanatory” causal relations as a baseline
against which measurement validity is assessed. In
labeling this second group of procedures we draw on
Campbell’s (1960, 547) helpful term, “nomological”
validation, which evokes the idea of assessment in
relation to well-established causal hypotheses or law-
like relationships. This second type is often called
construct validity in political research (Berry et al.
1998; Elkins 2000).8 Out of deference to this usage, in
the headings and summary statements below we will
refer to nomological/construct validation.

Types of Validation in Political Analysis

A baseline for the revised discussion of validation
presented below is provided in work by Carmines and
Zeller, and by Bollen. Carmines and Zeller (1979, 26;
Zeller and Carmines 1980, 78-80) argue that content
validation and criterion validation are of limited utility
in fields such as political science. While recognizing
that content validation is important in psychology and
education, they argue that evaluating it “has proved to
be exceedingly difficult with respect to measures of the
more abstract phenomena that tend to characterize the
social sciences” (Carmines and Zeller 1979, 22). For
criterion validation, these authors emphasize that in
many social sciences, few “criterion” variables are
available that can serve as “real” measures of the
phenomena under investigation, against which scholars
can evaluate alternative measures (pp. 19-20). Hence,
for many purposes it is simply not a relevant procedure.
Although Carmines and Zeller call for the use of
multiple sources of evidence, their emphasis on the
limitations of the first two types of validation leads
them to give a predominant role to nomological/
construct validation.

In relation to Carmines and Zeller, Bollen (1989,
185-6, 190-4) adds convergent/discriminant validation

7 Description and explanation are of course intertwined, but we find
this distinction invaluable for exploring contrasts among validation
procedures. While these procedures do not always fit in sharply
bounded categories, many do indeed focus on either descriptive or
explanatory relations and hence are productively differentiated by
our typology.
8 See also the main examples of construct validation presented in the
major statements by Carmines and Zeller 1979, 23, and Bollen 1989,



to their three types and emphasizes content validation,
which he sees as both viable and fundamental. He also
raises general concerns about correlation-based approaches
to convergent and nomological/construct validation,
and he offers an alternative approach based on
structural equation modeling with latent variables (pp.
192-206). Bollen shares the concern of Carmines and
Zeller that, for most social research, “true” measures
do not exist against which criterion validation can be
carried out, so he likewise sees this as a less relevant
type (p. 188).

These valuable contributions can be extended in
several respects. First, with reference to Carmines and
Zeller’s critique of content validation, we recognize
that this procedure is harder to use if concepts are
abstract and complex. Moreover, it often does not lend
itself to the kind of systematic, quantitative analysis
routinely applied in some other kinds of validation.
Yet, like Bollen (1989, 185-6, 194), we are convinced it
is possible to lay a secure foundation for content
validation that will make it a viable, and indeed essen-
tial, procedure. Our discussion of this task below
derives from our distinction between the background
and the systematized concept.

Second, we share the conviction of Carmines and
Zeller that nomological/construct validation is impor-
tant, yet given our emphasis on content and conver-
gent/discriminant validation, we do not privilege it to
the degree they do. Our discussion will seek to clarify
some aspects of how this procedure actually works and
will address the skeptical reaction of many scholars to it.

Third, we have a twofold response to the critique of
criterion validation as irrelevant to most forms of social
research. On the one hand, in some domains criterion
validation is important, and this must be recognized.
For example, the literature on response validity in
survey research seeks to evaluate individual responses
to questions, such as whether a person voted in a
particular election, by comparing them to official voting
records (Anderson and Silver 1986; Clausen 1968;
Katosh and Traugott 1980). Similarly, in panel studies
it is possible to evaluate the adequacy of “recall” (i.e.,
whether respondents remember their own earlier opin-
ions, dispositions, and behavior) through comparison
with responses in earlier studies (Niemi, Katz, and
Newman 1980). On the other hand, this is not one of
the most generally applicable types of validation, and
we favor treating it as one subtype within the broader
category of convergent validation. As discussed below,
convergent validation compares a given indicator with
one or more other indicators of the concept, in which
the analyst may or may not have a higher level of
confidence. Even if these other indicators are as fallible
as the indicator being evaluated, the comparison provides
greater leverage than does looking only at one of
them in isolation. To the extent that a well-established,
direct measure of the phenomenon under study is
available, convergent validation is essentially the same
as criterion validation.

Finally, in contrast both to Carmines and Zeller and
to Bollen, we will discuss the application of the different
types of validation in qualitative as well as quantitative
research, using examples drawn from both traditions.
Furthermore, we will employ crucial distinctions
introduced above, including the differentiation of levels
presented in Figure 1, as well as the contrast between
specific procedures for validation, as opposed to the
overall idea of measurement validity.


We now discuss various procedures, both qualitative
and quantitative, for assessing measurement validity.
We organize our presentation in terms of a threefold
typology: content, convergent/discriminant, and nomo-
logical/construct validation. The goal is to explicate
each of these types by posing a basic question that, in
all three cases, can be addressed by both qualitative
and quantitative scholars. Two caveats should be intro-
duced. First, while we discuss correlation-based ap-
proaches to validity assessment, this article is not
intended to provide a detailed or exhaustive account of
relevant statistical tests. Second, we recognize that no
rigid boundaries exist among alternative procedures,
given that one occasionally shades off into another.
Our typology is a heuristic device that shows how
validation procedures can be grouped in terms of basic
questions, and thereby helps bring into focus parallels
and contrasts in the approaches to validation adopted
by qualitative and quantitative researchers.

Content Validation

Basic Question. In the framework of Figure 1, does a
given indicator (level 3) adequately capture the full
content of the systematized concept (level 2)? This
“adequacy of content” is assessed through two further
questions. First, are key elements omitted from the
indicator? Second, are inappropriate elements included
in the indicator?9 An examination of the scores
(level 4) of specific cases may help answer these
questions about the fit between levels 2 and 3.

Discussion. In contrast to the other types considered,
content validation is distinctive in its focus on concep-
tual issues, specifically, on what we have just called
adequacy of content. Indeed, it developed historically
as a corrective to forms of validation that focused solely
on the statistical analysis of scores, and in so doing
overlooked important threats to measurement validity
(Sireci 1998, 83-7).

Because content validation involves conceptual rea-
soning, it is imperative to maintain the distinction we
made between issues of validation and questions con-
cerning the background concept. If content validation
is to be useful, then there must be some ground of
conceptual agreement about the phenomena being
investigated (Bollen 1989, 186; Cronbach and Meehl

9 Some readers may think of these questions as raising issues of “face
validity.” We have found so many different definitions of face validity
that we prefer not to use this label.



1955, 282). Without it, a well-focused validation ques-
tion may rapidly become entangled in a broader dis-
pute over the concept. Such agreement can be pro-
vided if the systematized concept is taken as given, so
attention can be focused on whether a particular
indicator adequately captures its content.

Examples of Content Validation. Within the psycho-
metric tradition (Angoff 1988, 27-8; Shultz, Riggs, and
Kottke 1998, 267-8), content validation is understood
as focusing on the relationship between the indicator
(level 3) and the systematized concept (level 2), with-
out reference to the scores of specific cases (level 4).
We will first present examples from political science
that adopt this focus. We will then turn to a somewhat
different, “case-oriented” procedure (Ragin 1987,
chap. 3), identified with qualitative research, in which
the examination of scores for specific cases plays a
central role in content validation.

Two examples from political research illustrate, re-
spectively, the problems of omission of key elements
from the indicator and inclusion of inappropriate ele-
ments. Paxton’s (2000) article on democracy focuses on
the first problem. Her analysis is particularly salient for
scholars in the qualitative tradition, given its focus on
choices about the dichotomous classification of cases.
Paxton contrasts the systematized concepts of democracy
offered by several prominent scholars (Bollen,
Gurr, Huntington, Lipset, Muller, and Rueschemeyer,
Stephens, and Stephens) with the actual content of the
indicators they propose. She takes their systematized
concepts as given, which establishes common concep-
tual ground. She observes that these scholars include
universal suffrage in what is in effect their systematized
concept of democracy, but the indicators they employ
in operationalizing the concept consider only male
suffrage. Paxton thus focuses on the problem that an
important component of the systematized concept is
omitted from the indicator.

The debate on Vanhanen’s (1979, 1990) quantitative
indicator of democracy illustrates the alternative prob-
lem that the indicator incorporates elements that cor-
respond to a concept other than the systematized
concept of concern. Vanhanen seeks to capture the
idea of political competition that is part of his system-
atized concept of democracy by including, as a compo-
nent of his scale, the percentage of votes won by parties
other than the largest party. Bollen (1990, 13, 15) and
Coppedge (1997, 6) both question this measure of
democracy, arguing that it incorporates elements
drawn from a distinct concept, the structure of the
party system.

Case-Oriented Content Validation. Researchers engaged
in the qualitative classification of cases routinely
carry out a somewhat different procedure for content
validation, based on the relation between conceptual
meaning and choices about scoring particular cases. In
the vocabulary of Sartori (1970, 1040-6), this concerns
the relation between the “intension” (meaning) and
“extension” (set of positive cases) of the concept. For
Sartori, an essential aspect of concept formation is the
procedure of adjusting this relation between cases and
concept. In the framework of Figure 1, this procedure
involves revising the indicator (i.e., the scoring proce-
dure) in order to sort cases in a way that better fits
conceptual expectations, and potentially fine-tuning
the systematized concept to better fit the cases. Ragin
(1994, 98) terms this process of mutual adjustment
“double fitting.” This procedure avoids conceptual
stretching (Collier and Mahon 1993; Sartori 1970), that
is, a mismatch between a systematized concept and the
scoring of cases, which is clearly an issue of validity.

An example of case-oriented content validation is
found in O’Donnell’s (1996) discussion of democratic
consolidation. Some scholars suggest that one indicator
of consolidation is the capacity of a democratic regime
to withstand severe crises. O’Donnell argues that by
this standard, some Latin American democracies
would be considered more consolidated than those in
southern Europe. He finds this an implausible classification
because the standard leads to a “reductio ad
absurdum” (p. 43). This example shows how attention
to specific cases can spur recognition of dilemmas in
the adequacy of content and can be a productive tool in
content validation.

In sum, for case-oriented content validation, upward
movement in Figure 1 is especially important. It can
lead to both refining the indicator in light of scores and
fine-tuning the systematized concept. In addition, al-
though the systematized concept being measured is
usually relatively stable, this form of validation may
lead to friendly amendments that modify the system-
atized concept by drawing ideas from the background
concept. To put this another way, in this form of
validation both an “inductive” component and concep-
tual innovation are especially important.

Limitations of Content Validation. Content validation
makes an important contribution to the assessment of
measurement validity, but alone it is incomplete, for
two reasons. First, although a necessary condition, the
findings of content validation are not a sufficient con-
dition for establishing validity (Shepard 1993, 414-5;
Sireci 1998, 112). The key point is that an indicator
with valid content may still produce scores with low
overall measurement validity, because further threats
to validity can be introduced in the coding of cases. A
second reason concerns the trade-off between parsi-
mony and completeness that arises because indicators
routinely fail to capture the full content of a system-
atized concept. Capturing this content may require a
complex indicator that is hard to use and adds greatly
to the time and cost of completing the research. It is a
matter of judgment for scholars to decide when efforts
to further improve the adequacy of content may be-
come counterproductive.

It is useful to complement the conceptual criticism of
indicators by examining whether particular modifica-
tions in an indicator make a difference in the scoring of
cases. To the extent that such modifications have little
influence on scores, their contribution to improving
validity is more modest. An example in which their
contribution is shown to be substantial is provided by
Paxton (2000). She develops an alternative indicator of
democracy that takes female suffrage into account,
compares the scores it produces with those produced
by the indicators she originally criticized, and shows
that her revised indicator yields substantially different
findings. Her content validation argument stands on
conceptual grounds alone, but her information about
scoring demonstrates the substantive importance of
her concerns. This comparison of indicators in a sense
introduces us to convergent/discriminant validation, to
which we now turn.
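
The kind of comparison described above can be sketched in a few lines. The cases, the two scoring rules, and all names below are hypothetical and serve only as illustration; they are not Paxton's data or her coding rules. The sketch scores each case with an indicator that considers only male suffrage and with a revised indicator that requires universal suffrage, then lists the cases whose classification changes.

```python
# Hypothetical cases: (label, male_suffrage, female_suffrage).
# Illustrative values only, not drawn from any real data set.
cases = [
    ("A-1900", True, False),
    ("A-1930", True, True),
    ("B-1900", True, False),
    ("B-1950", True, True),
    ("C-1900", False, False),
]


def male_only_indicator(male, female):
    """Scores a case as democratic if men can vote (the criticized rule)."""
    return male


def universal_indicator(male, female):
    """Scores a case as democratic only if both men and women can vote."""
    return male and female


# Cases whose score differs between the two indicators.
reclassified = [
    label
    for label, male, female in cases
    if male_only_indicator(male, female) != universal_indicator(male, female)
]

print(reclassified)
```

If the list of reclassified cases is short, the conceptual criticism still stands, but its practical contribution to validity is more modest; if it is long, the modification matters substantively.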

Convergent/Discriminant Validation

Basic Question. Are the scores (level 4) produced by
alternative indicators (level 3) of a given systematized
concept (level 2) empirically associated and thus con-
vergent? Furthermore, do these indicators have a
weaker association with indicators of a second, different
systematized concept, thus discriminating this second
group of indicators from the first? Stronger associations
constitute evidence that supports interpreting
indicators as measuring the same systematized concept,
thus providing convergent validation, whereas
weaker associations support the claim that they measure
different concepts, thus providing discriminant
validation. The special case of convergent validation in
which one indicator is taken as a standard of reference,
and is used to evaluate one or more alternative indicators,
is called criterion validation, as discussed above.

Discussion. Carefully defined systematized concepts,
and the availability of two or more alternative indica-
tors of these concepts, are the starting point for
convergent/discriminant validation. They lay the
groundwork for arguments that particular indicators
measure the same or different concepts, which in turn
create expectations about how the indicators may be
empirically associated. To the extent that empirical
findings match these “descriptive” expectations, they
provide support for validity.
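
A minimal sketch of this comparison of associations follows. It assumes hypothetical scores for eight cases on two indicators of the same concept and on one indicator of a different concept; the variable names and values are invented for illustration, and the Pearson correlation is used only as one familiar measure of association.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical scores for eight cases (illustrative only).
democracy_a = [1, 2, 3, 4, 5, 6, 7, 8]  # first indicator of democracy
democracy_b = [2, 1, 4, 3, 6, 5, 8, 7]  # alternative indicator of democracy
turnout = [5, 3, 6, 2, 7, 1, 4, 8]      # indicator of a different concept

convergent = pearson(democracy_a, democracy_b)  # expected to be strong
discriminant = pearson(democracy_a, turnout)    # expected to be weaker

print(round(convergent, 3), round(discriminant, 3))
```

The numbers carry evidential weight only in light of prior conceptual expectations: the analyst must have reasons for expecting the first pair to converge and the second to diverge.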

Empirical associations are crucial to convergent/
discriminant validation, but they are often simply the
point of departure for an iterative process. What
initially appears to be negative evidence can spur
refinements that ultimately enhance validity. That is,
the failure to find expected convergence may encour-
age a return to the conceptual and logical analysis of
indicators, which may lead to their modification. Alter-
natively, researchers may conclude that divergence
suggests the indicators measure different systematized
concepts and may reevaluate the conceptualization
that led them to expect convergence. This process
illustrates the intertwining of convergent and discrimi-
nant validation.

Examples of Convergent/Discriminant Validation.
Scholars who develop measures of democracy fre-
quently use convergent validation. Thus, analysts who
create a new indicator commonly report its correlation
with previously established indicators (Bollen 1980,
380-2; Coppedge and Reincke 1990, 61; Mainwaring et
al. 2001, 52; Przeworski et al. 1996, 52). This is a
valuable procedure, but it should not be employed
atheoretically. Scholars should have specific conceptual
reasons for expecting convergence if it is to constitute
evidence for validity. Let us suppose a proposed indi-
cator is meant to capture a facet of democracy over-
looked by existing measures; then too high a correla-
tion is in fact negative evidence regarding validity, for
it suggests that nothing new is being captured.

An example of discriminant validation is provided by
Bollen’s (1980, 373-4) analysis of voter turnout. As in
the studies just noted, different measures of democracy
are compared, but in this instance the goal is to find
empirical support for divergence. Bollen claims, based
on content validation, that voter turnout is an indicator
of a concept distinct from the systematized concept of
political democracy. The low correlation of voter turnout
with other proposed indicators of democracy provides
discriminant evidence for this claim. Bollen concludes
that turnout is best understood as an indicator
of political participation, which should be conceptualized
as distinct from political democracy.

Although qualitative researchers routinely lack the
data necessary for the kind of statistical analysis per-
formed by Bollen, convergent/discriminant validation
is by no means irrelevant for them. They often assess
whether the scores for alternative indicators converge
or diverge. Paxton, in the example discussed above, in
effect uses discriminant validation when she compares
alternative qualitative indicators of democracy in order
to show that recommendations derived from her assessment
of content validation make a difference empirically.
This comparison, based on the assessment of
scores, “discriminates” among alternative indicators.
Convergent/discriminant validation is also employed
when qualitative researchers use a multimethod ap-
proach involving “triangulation” among multiple indi-
cators based on different kinds of data sources (Brewer
and Hunter 1989; Campbell and Fiske 1959; Webb et
al. 1966). Orum, Feagin, and Sjoberg (1991, 19) specifically
argue that one of the great strengths of the
case study tradition is its use of triangulation for
enhancing validity. In general, the basic ideas of convergent/discriminant
validation are at work in qualitative
research whenever scholars compare alternative
indicators.

Concerns about Convergent/Discriminant Validation. A
first concern here is that scholars might think that in
convergent/discriminant validation empirical findings
always dictate conceptual choices. This frames the
issue too narrowly. For example, Bollen (1993, 1208-9,
1217) analyzes four indicators that he takes as compo-
nents of the concept of political liberties and four
indicators that he understands as aspects of democratic
rule. An examination of Bollen’s covariance matrix
reveals that these do not emerge as two separate
empirical dimensions. Convergent/discriminant valida-
tion, mechanically applied, might lead to a decision to
eliminate this conceptual distinction. Bollen does not
take that approach. He combines the two clusters of
indicators into an overall empirical measure, but he
also maintains the conceptual distinction. Given the
conceptual congruence between the two sets of indicators
and the concepts of political liberties and democratic
rule, the standard of content validation is clearly
met, and Bollen continues to use these overarching

Another concern arises over the interpretation of
low correlations among indicators. Analysts who lack a
“true” measure against which to assess validity must
base convergent validation on a set of indicators, none
of which may be a very good measure of the system-
atized concept. The result may be low correlations
among indicators, even though they have shared vari-
ance that measures the concept. One possible solution
is to focus on this shared variance, even though the
overall correlations are low. Standard statistical techniques
may be used to tap this shared variance.
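
One way to picture such a technique is sketched below. The choice of technique is an assumption made for illustration, since the text does not name one: the sketch extracts the first principal component of a hypothetical correlation matrix by power iteration. Every pairwise correlation in the invented matrix is only 0.2, yet the four indicators still share a common dimension.

```python
from math import sqrt

# Hypothetical correlation matrix for four indicators (illustrative values):
# every pairwise correlation is low.
R = [
    [1.0, 0.2, 0.2, 0.2],
    [0.2, 1.0, 0.2, 0.2],
    [0.2, 0.2, 1.0, 0.2],
    [0.2, 0.2, 0.2, 1.0],
]


def matvec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def first_component(matrix, iterations=200):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    v = [1.0 / (i + 1) for i in range(len(matrix))]  # arbitrary start vector
    for _ in range(iterations):
        w = matvec(matrix, v)
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, matvec(matrix, v)))
    return eigenvalue, v


eigenvalue, loadings = first_component(R)
# Share of total standardized variance lying on the common dimension.
shared_share = eigenvalue / len(R)

print(round(eigenvalue, 3), round(shared_share, 3))
```

In this constructed case the first component accounts for a larger share of the variance than any single pairwise correlation would suggest. A latent variable model, of the kind discussed later in this section, addresses the same question more rigorously.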

The opposite problem also is a concern: the limitations
of inferring validity from a high correlation
among indicators. Such a correlation may reflect fac-
tors other than valid measurement. For example, two
indicators may be strongly correlated because they
both measure some other concept; or they may mea-
sure different concepts, one of which causes the other.
A plausible response is to think through, and seek to
rule out, alternative reasons for the high correlation.10

Although framing these concerns in the language of
high and low correlations appears to orient the discus-
sion toward quantitative researchers, qualitative re-
searchers face parallel issues. Specifically, these issues
arise when qualitative researchers analyze the sorting
of cases produced by alternative classification proce-
dures that represent different ways of operationalizing
either a given concept (i.e., convergent validation) or
two or more concepts that are presumed to be distinct
(i.e., discriminant validation). Given that these scholars
are probably working with a small N, they may be able
to draw on their knowledge of cases to assess alterna-
tive explanations for convergences and divergences
among the sorting of cases yielded by different classi-
fication procedures. In this way, they can make valu-
able inferences about validity. Quantitative research-
ers, by contrast, have other tools for making these
inferences, to which we now turn.

Convergent Validation and Structural Equation Models
with Latent Variables. In quantitative research, an
important means of responding to the limitations of
simple correlational procedures for convergent/dis-
criminant validation is offered by structural equation
models with latent variables (also called LISREL-type
models). Some treatments of such models, to the
extent that they discuss measurement error, focus their
attention on random error, that is, on reliability (Hay-
duk 1987, e.g., 118-24; 1996).11 However, Bollen has
made systematic error, which is the concern of the

10 On the appropriate size of the correlation, see Bollen and Lennox
1991, 305-7.
11 To take a political science application, Green and Palmquist’s
(1990) study also reflects this focus on random error. By contrast,
Green (1991) goes farther by considering both random and system-
atic error. Like the work by Bollen discussed below, these articles are
an impressive demonstration of how LISREL-type models can
incorporate a concern with measurement error into conventional
statistical analysis, and how this can in turn lead to a major

present article, a central focus in his major method-
ological statement on this approach (1989, 190-206).
He demonstrates, for example, its distinctive contribu-
tion for a scholar concerned with convergent/discrimi-
nant validation who is dealing with a data set with high
correlations among alternative indicators. In this case,
structural equations with latent variables can be used
to estimate the degree to which these high correlations
derive from shared systematic bias, rather than reflect
the valid measurement of an underlying concept.12

This approach is illustrated by Bollen (1993) and
Bollen and Paxton’s (1998, 2000) evaluation of eight
indicators of democracy taken from data sets devel-
oped by Banks, Gastil, and Sussman.13 For each indi-
cator, Bollen and Paxton estimate the percent of total
variance that validly measures democracy, as opposed
to reflecting systematic and random error. The sources
of systematic error are then explored. Bollen and
Paxton conclude, for example, that Gastil’s indicators
have “conservative” bias, giving higher scores to coun-
tries that are Catholic, that have traditional monar-
chies, and that are not Marxist-Leninist (Bollen 1993,
1221; Bollen and Paxton 2000, 73). This line of re-
search is an outstanding example of the sophisticated
use of convergent/discriminant validation to identify
potential problems of political bias.

In discussing Bollen’s treatment and application of
structural equation models we would like to note both
similarities, and a key contrast, in relation to the
practice of qualitative researchers. Bollen certainly
shares the concern with careful attention to concepts,
and with knowledge of cases, that we have emphasized
above, and that is characteristic of case-oriented con-
tent validation as practiced by qualitative researchers.
He insists that complex quantitative techniques cannot
replace careful conceptual and theoretical reasoning;
rather they presuppose it. Furthermore, “structural
equation models are not very helpful if you have little
idea about the subject matter” (Bollen 1989, vi; see also
194). Qualitative researchers, carrying out a case-by-
case assessment of the scores on different indicators,
could of course reach some of the same conclusions
about validity and political bias reached by Bollen. A
structural equation approach, however, does offer a

reevaluation of substantive findings, in this case concerning party
identification (Green 1991, 67-71).
12 Two points about structural equation models with latent variables
should be underscored. First, as noted below, these models can also
be used in nomological/construct validation, and hence should not be
associated exclusively with convergent/discriminant validation, which
is the application discussed here. Second, we have emphasized that
convergent/discriminant validation focuses on “descriptive” relations
among concepts and their components. Within this framework, it
merits emphasis that the indicators that measure a given latent
variable (i.e., concept) in these models are conventionally inter-
preted as “effects” of this latent variable (Bollen 1989, 65; Bollen and
Lennox 1991, 305-6). These effects, however, do not involve causal
interactions among distinct phenomena. Such interactions, which in
structural equation models involve causal relations among different
latent variables, are the centerpiece of the conventional understand-
ing of “explanation.” By contrast, the links between one latent
variable and its indicators are productively understood as involving a
“descriptive” relationship.
13 See, for example, Banks 1979; Gastil 1988; Sussman 1982.



fundamentally different procedure that allows scholars
to assess carefully the magnitude and sources of mea-
surement error for large numbers of cases and to
summarize this assessment systematically and con-

Nomological/Construct Validation
Basic Question. In a domain of research in which a
given causal hypothesis is reasonably well established,
we ask: Is this hypothesis again confirmed when the
cases are scored (level 4) with the proposed indicator
(level 3) for a systematized concept (level 2) that is one
of the variables in the hypothesis? Confirmation is
treated as evidence for validity.

Discussion. We should first reiterate that because the
term “construct validity” has become synonymous in
the psychometric literature with the broader notion of
measurement validity, to reduce confusion we use
Campbell’s term “nomological” validation for proce-
dures that address this basic question. Yet, given
common usage in political science, in headings and
summary statements we call this nomological/construct
validation. We also propose an acronym that vividly
captures the underlying idea: AHEM validation; that
is, “Assume the Hypothesis, Evaluate the Measure.”

Nomological validation assesses the performance of
indicators in relation to causal hypotheses in order to
gain leverage in evaluating measurement validity.
Whereas convergent validation focuses on multiple
indicators of the same systematized concept, and dis-
criminant validation focuses on indicators of different
concepts that stand in a “descriptive” relation to one
another, nomological validation focuses on indicators
of different concepts that are understood to stand in an
explanatory, “causal” relation with one another. Al-
though these contrasts are not sharply presented in
most definitions of nomological validation, they are
essential in identifying this type as distinct from con-
vergent/discriminant validation. In practice the con-
trast between description and explanation depends on
the researcher’s theoretical framework, but the distinc-
tion is fundamental to the contemporary practice of
political science.

The underlying idea of nomological validation is that
scores which can validly be claimed to measure a
systematized concept should fit well-established expec-
tations derived from causal hypotheses that involve this
concept. The first step is to take as given a reasonably
well-established causal hypothesis, one variable in
which corresponds to the systematized concept of
concern. The scholar then examines the association of
the proposed indicator with indicators of the other
concepts in the causal hypothesis. If the assessment
produces an association that the causal hypothesis
leads us to expect, then this is positive evidence for
validity.
Nomological validation provides additional leverage
in assessing measurement validity. If other types of
validation raise concerns about the validity of a given
indicator and the scores it produces, then analysts
probably do not need to employ nomological validation.
When other approaches yield positive evidence,
however, then nomological validation is valuable in
teasing out potentially important differences that may
not be detected by other types of validation. Specifi-
cally, alternative indicators of a systematized concept
may be strongly correlated and yet perform very dif-
ferently when employed in causal assessment. Bollen
(1980, 383-4) shows this, for example, in his assess-
ment of whether regime stability should be a compo-
nent of measures of democracy.

Examples of Nomological/Construct Validation. Lijp-
hart’s (1996) analysis of democracy and conflict man-
agement in India provides a qualitative example of
nomological validation, which he uses to justify his
classification of India as a consociational democracy.
Lijphart first draws on his systematized concept of
consociationalism to identify descriptive criteria for
classifying any given case as consociational. He then
uses nomological validation to further justify his scor-
ing of India (pp. 262-4). Lijphart identifies a series of
causal factors that he argues are routinely understood
to produce consociational regimes, and he observes
that these factors are present in India. Hence, classifying
India as consociational is consistent with an established
causal relationship, which reinforces the plausibility
of his descriptive conclusion that India is a case of
consociationalism.

Another qualitative example of nomological valida-
tion is found in a classic study in the tradition of
comparative-historical analysis, Perry Anderson’s Lin-
eages of the Absolutist State.14 Anderson (1974, 413-5)
is concerned with whether it is appropriate to classify
as “feudalism” the political and economic system that
emerged in Japan beginning roughly in the fourteenth
century, which would place Japan in the same analytic
category as European feudalism. His argument is partly
descriptive, in that he asserts that “the fundamental
resemblance between the two historical configurations
as a whole [is] unmistakable” (p. 414). He validates his
classification by observing that Japan’s subsequent
development, like that of postfeudal Europe, followed
an economic trajectory that his theory explains as the
historical legacy of a feudal state. “The basic parallel-
ism of the two great experiences of feudalism, at the
opposite ends of Eurasia, was ultimately to receive its
most arresting confirmation of all, in the posterior
destiny of each zone” (p. 414). Thus, he uses evidence
concerning an expected explanatory relationship to
increase confidence in his descriptive characterization
of Japan as feudal. Anderson, like Lijphart, thus fol-
lows the two-step procedure of making a descriptive
claim about one or two cases, and then offering evi-
dence for the validity of this claim by observing that it
is consistent with an explanatory claim in which he has
confidence.

14 Sebastian Mazzuca suggested this example.

American Political Science Review Vol. 95, No. 3

A quantitative example of nomological validation is
found in Elkins’s evaluation of the proposal that democracy
versus nondemocracy should be treated as a
dichotomy, rather than in terms of gradations. One
potential defense of a dichotomous measure is based
on convergent validation. Thus, Alvarez and colleagues
(1996, 21) show that their dichotomous measure is
strongly associated with graded measures of democ-
racy. Elkins (2000, 294-6) goes on to apply nomolog-
ical validation, exploring whether, notwithstanding this
association, the choice of a dichotomous measure
makes a difference for causal assessment. He compares
tests of the democratic peace hypothesis using both
dichotomous and graded measures. According to the
hypothesis, democracies are in general as conflict
prone as nondemocracies but do not fight one another.
The key finding from the standpoint of nomological
validation is that this claim is strongly supported using
a graded measure, whereas there is no statistically
significant support using the dichotomous measure.
These findings give nomological evidence for the
greater validity of the graded measure, because they
better fit the overall expectations of the accepted
causal hypothesis. Elkins’s approach is certainly more
complex than the two-step procedure followed by
Lijphart and Anderson, but the basic idea is the same.
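
The logic of comparing two indicators against an accepted hypothesis can be sketched with simulated data. This is only an illustration under stated assumptions, not a reconstruction of Elkins's analysis: the latent score, the noise levels, the cut point at zero, and the names `pearson` and `compare_indicators` are all hypothetical. The sketch assumes a hypothesis that a latent concept is positively related to an outcome, builds a graded indicator and a dichotomized version of it, and asks which one fits the expected relationship more closely.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def compare_indicators(n=5000, seed=0):
    """Return (r_graded, r_dichotomous) for simulated, hypothetical data."""
    rng = random.Random(seed)
    # Hypothetical latent concept (for example, degree of democracy).
    latent = [rng.gauss(0, 1) for _ in range(n)]
    # Graded indicator: the latent score plus modest measurement error.
    graded = [z + rng.gauss(0, 0.3) for z in latent]
    # Dichotomous indicator: the graded score cut at an assumed threshold of 0.
    dichotomous = [1.0 if g > 0 else 0.0 for g in graded]
    # Outcome that the assumed hypothesis links to the latent concept.
    outcome = [0.5 * z + rng.gauss(0, 1) for z in latent]
    return pearson(graded, outcome), pearson(dichotomous, outcome)


if __name__ == "__main__":
    r_graded, r_dichotomous = compare_indicators()
    print(f"graded: {r_graded:.2f}, dichotomous: {r_dichotomous:.2f}")
```

A closer fit for one indicator counts as nomological evidence only to the extent that the hypothesis itself is well established and the outcome is validly measured, which is the subject of the concerns discussed next.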

Skepticism about Nomological Validation. Many schol-
ars are skeptical about nomological validation. One
concern is the potential problem of circularity. If one
assumes the hypothesis in order to validate the indica-
tor, then the indicator cannot be used to evaluate the
same hypothesis. Hence, it is important to specify that
any subsequent hypothesis-testing should involve hy-
potheses different from those used in nomological
validation.

A second concern is that, in addition to taking the
hypothesis as given, nomological validation also pre-
supposes the valid measurement of the other system-
atized concept involved in the hypothesis. Bollen
(1989, 188-90) notes that problems in the measure-
ment of the second indicator can undermine this
approach to assessing validity, especially when scholars
rely on simple correlational procedures. Obviously,
researchers need evidence about the validity of the
second indicator. Structural equation models with la-
tent variables offer a quantitative approach to address-
ing such difficulties because, in addition to evaluating
the hypothesis, these models can be specified so as to
provide an estimate of the validity of the second
indicator. In small-N, qualitative analysis, the re-
searcher has the resource of detailed case knowledge
to help evaluate this second indicator. Thus, both
qualitative and quantitative researchers have a means
for making inferences about whether this important
presupposition of nomological validation is indeed
met.
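
Why simple correlational procedures are vulnerable here can be illustrated with the classical attenuation formula from test theory. This is an added illustration rather than part of the original argument: it assumes purely random, mutually uncorrelated measurement error, it speaks to reliability rather than to validity in the fuller sense, and the symbols are introduced only for this sketch.

```latex
% r_{xy}: observed correlation between the two indicators
% \rho_{XY}: correlation between the two underlying concepts
% \rho_{xx'}, \rho_{yy'}: reliabilities of the first and second indicator
\[
  r_{xy} = \rho_{XY}\,\sqrt{\rho_{xx'}\,\rho_{yy'}}
\]
% Hypothetical values: \rho_{XY} = 0.6, \rho_{xx'} = 0.9, \rho_{yy'} = 0.5
\[
  r_{xy} = 0.6 \times \sqrt{0.9 \times 0.5} \approx 0.40
\]
```

Under these hypothetical values, a fairly reliable first indicator still shows a markedly weaker observed relationship than the hypothesis implies, and the shortfall comes mainly from the second indicator. A weak correlation therefore cannot by itself be read as evidence against the indicator being validated.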

A third problem is that, in many domains in which
political scientists work, there may not be a sufficiently
well-established hypothesis to make this a viable ap-
proach to validation. In such domains, it may be
plausible to assume the measure and evaluate the
hypothesis, but not the other way around. Nomological
validation therefore simply may not be viable. Yet, it is
helpful to recognize that nomological validation need
not be restricted to a dichotomous understanding in
which the hypothesis either is or is not reconfirmed,
using the proposed indicator. Rather, nomological
validation may focus, as it does in Elkins (2000; see also
Hill, Hanna, and Shafqat 1997), on comparing two
different indicators of the same systematized concept,
and on asking which better fits causal expectations. A
tentative hypothesis may not provide an adequate
standard for rejecting claims of measurement validity
outright, but it may serve as a point of reference for
comparing the performance of two indicators and
thereby gaining evidence relevant to choosing between
them.

Another response to the concern that causal hypoth-
eses may be too tentative a ground for measurement
validation is to recognize that neither measurement
claims nor causal claims are inherently more epistemo-
logically secure. Both types of claims should be seen as
falsifiable hypotheses. To take a causal hypothesis as
given for the sake of measurement validation is not to
say that the hypothesis is set in stone. It may be subject
to critical assessment at a later point. Campbell (1977/
1988, 477) expresses this point metaphorically: “We are
like sailors who must repair a rotting ship at sea. We
trust the great bulk of the timbers while we replace a
particularly weak plank. Each of the timbers we now
trust we may in turn replace. The proportion of the
planks we are replacing to those we treat as sound must
always be small.”


In conclusion, we return to the four underlying issues
that frame our discussion. First, we have offered a new
account of different types of validation. We have
viewed these types in the framework of a unified
conception of measurement validity. None of the spe-
cific types of validation alone establishes validity;
rather, each provides one kind of evidence to be
integrated into an overall process of assessment. Con-
tent validation makes the indispensable contribution of
assessing what we call the adequacy of content of
indicators. Convergent/discriminant validation, taking
as a baseline descriptive understandings of the relationship
among concepts, and of their relation to
indicators, focuses on shared and nonshared variance
among indicators that the scholar is evaluating. This
approach uses empirical evidence to supplement and
temper content validation. Nomological/construct
validation, taking as a baseline an established causal
hypothesis, adds a further tool that can tease out
additional facets of measurement validity not
addressed by convergent/discriminant validation.

We are convinced that it is useful to carefully
differentiate these types. It helps to overcome the
confusion deriving from the proliferation of distinct
types of validation, and also of terms for these types.
Furthermore, in relation to methods such as structural
equation models with latent variables, which provide
sophisticated tools for simultaneously evaluating both
measurement validity and explanatory hypotheses,
the delineation of types serves as a useful reminder that
validation is a multifaceted process. Even with these
models, this process must also incorporate the careful
use of content validation, as Bollen emphasizes.

Second, we have encouraged scholars to distinguish
between issues of measurement validity and broader
conceptual disputes. Building on the contrast between
the background concept and the systematized concept
(Figure 1), we have explored how validity issues and
conceptual issues can be separated. We believe that
this separation is essential if scholars are to give a
consistent focus to the idea of measurement validity,
and particularly to the practice of content validation.

Third, we have examined alternative procedures for
adapting operationalization to specific contexts: con-
text-specific domains of observation, context-specific
indicators, and adjusted common indicators. These
procedures make it easier to take a middle position
between universalizing and particularizing tendencies.
Yet, we also emphasize that the decision to pursue
context-specific approaches should be carefully consid-
ered and justified.

Fourth, we have presented an understanding of
measurement validation that can plausibly be applied
in both quantitative and qualitative research. Although
most discussions of validation focus on quantitative
research, we have formulated each type in terms of
basic questions intended to clarify the relevance to
both quantitative and qualitative analysis. We have also
given examples of how these questions can be ad-
dressed by scholars from within both traditions. These
examples also illustrate, however, that while they may
be addressing the same questions, quantitative and
qualitative scholars often employ different tools in
finding answers.

Within this framework, qualitative and quantitative
researchers can learn from these differences. Qualita-
tive researchers could benefit from self-consciously
applying the validation procedures that to some degree
they may already be employing implicitly and, in par-
ticular, from developing and comparing alternative
indicators of a given systematized concept. They should
also recognize that nomological validation can be
important in qualitative research, as illustrated by the
Lijphart and Anderson examples above. Quantitative
researchers, in turn, could benefit from more fre-
quently supplementing other tools for validation by
employing a case-oriented approach, using the close
examination of specific cases to identify threats to
measurement validity.


