POLICY RESEARCH WORKING PAPER 2956
Survey Compliance and the Distribution
of Income
Johan A. Mistiaen
Martin Ravallion
The World Bank
Development Research Group
Poverty Team
January 2003
Abstract
While it is improbable that households with different incomes are equally likely to participate in sample
surveys, the lack of data for nonrespondents has hindered efforts to correct for the bias in measures of
poverty and inequality. Mistiaen and Ravallion demonstrate how the latent income effect on survey
compliance can be estimated using readily available data on response rates across geographic areas.
An application using the Current Population Survey for the United States indicates that compliance
falls as income rises. Correcting for selective compliance appreciably increases mean income and
inequality, but has only a small impact on poverty incidence up to commonly used poverty lines in the
United States.
This paper, a product of the Poverty Team, Development Research Group, is part of a larger effort in the group to develop
better methods of measuring poverty and inequality from survey data. Copies of the paper are available free from the World
Bank, 1818 H Street NW, Washington, DC 20433. Please contact Patricia Sader, room MC3-556, telephone 202-473-3902,
fax 202-522-1151, email address psader@worldbank.org. Policy Research Working Papers are also posted on the
Web at http://econ.worldbank.org. The authors may be contacted at jmistiaen@worldbank.org or mravallion@worldbank.org.
January 2003. (31 pages)
The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about
development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The
papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this
paper are entirely those of the authors. They do not necessarily represent the view of the World Bank, its Executive Directors, or the
countries they represent.
Produced by the Research Advisory Staff
Survey Compliance and the Distribution of Income
Johan A. Mistiaen and Martin Ravallion1
World Bank, Washington DC
Keywords: Survey non-response, income distribution, poverty and inequality measurement.
JEL: C42, D31, D63, I3
1 For comments on an earlier draft we are grateful to Frank Cowell, Angus Deaton and Dominique
van de Walle. These are the views of the authors, and should not be attributed to the World Bank or any
affiliated organization. Email addresses: jmistiaen@worldbank.org and mravallion@worldbank.org.
1. Introduction
It is known that errors in the incomes reported in surveys have important implications for
measures of poverty and inequality based on those surveys (Van Praag et al., 1983; Chakravarty
and Eichhorn, 1994; Ravallion, 1994; Cowell and Victoria-Feser, 1996; Chesher and Schluter,
2002). For example, classical measurement error in the reported incomes of sampled households
leads to over-estimation of standard inequality measures (Chakravarty and Eichhorn, 1994).
Chesher and Schluter (2002) derive formulae for correcting a number of poverty and inequality
measures for multiplicative measurement error in the underlying individual welfare levels,
assuming that the sample is representative of the relevant population.
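The direction of the classical measurement-error bias can be illustrated with a short simulation. The sketch below is not taken from this paper or from Chesher and Schluter (2002); it assumes log-normal "true" incomes and an independent log-normal multiplicative error with mean one (all parameter values are arbitrary), and compares Gini indices computed with the mean-difference formula. The helper `gini` is a hypothetical name introduced here.

```python
import numpy as np

def gini(y, w=None):
    """Gini index via the mean-difference formula, with optional sampling weights w."""
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    w = w / w.sum()                                  # normalise weights to population shares
    mu = np.sum(w * y)                               # (weighted) mean income
    abs_diff = np.abs(y[:, None] - y[None, :])       # |y_i - y_j| for all pairs
    return np.sum(np.outer(w, w) * abs_diff) / (2.0 * mu)

rng = np.random.default_rng(42)
n = 1000
true_income = rng.lognormal(mean=10.0, sigma=0.7, size=n)   # hypothetical true incomes

# Classical multiplicative error with mean one: ln(error) ~ N(-s^2/2, s^2).
s = 0.4
error = rng.lognormal(mean=-s**2 / 2, sigma=s, size=n)
reported_income = true_income * error

print(f"Gini, true incomes:     {gini(true_income):.3f}")
print(f"Gini, reported incomes: {gini(reported_income):.3f}")
```

Because the error is independent of true income, the variance of log reported income is the sum of the two log-variances, which is why the reported Gini should exceed the true one in a sample of this size.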
A measurement issue that has received less attention is the fact that it is invariably the
case that some sampled households simply do not participate in surveys, either because they
explicitly refuse to do so or nobody is at home. In the literature, this is often called "unit non-
response" and is distinct from "item non-response," which occurs when some of the sampled
households who agree to participate refuse to answer questions on their incomes. Various
imputation/matching methods address item non-response by exploiting the questions that are
answered (Lillard et al., 1986; Little and Rubin, 1987). However, that is not an option for unit
non-response. Some surveys make efforts to avoid unit non-response, using "call-backs" to non-
responding households and fees paid to those who agree to be interviewed.2 Nonetheless, the
problem is practically unavoidable and non-response rates of 10% or higher are common; indeed,
we know of national surveys for which 30% of those sampled did not comply.3
2 On reducing bias using call-backs see Deming (1953), Van Praag et al. (1983), Alho (1990), and
Nijman and Verbeek (1992). On the economics of incentive payments see Philipson (1997).
3 Scott and Steele (2002) report non-response rates for eight countries, ranging from virtually zero
to 26%. Holt and Elliot (1991) quote a range of 15-30% for surveys in the UK. Philipson (1997) reports a
mean non-response rate of 21% for surveys by the National Opinion Research Center in the U.S.
How does unit non-response affect survey-based measures of poverty and inequality? To
the extent that compliance is random, there will be no bias. However, just as income constrains
almost all behavior, it undoubtedly matters to choices about compliance with sample
assignments. For instance, high-income households might be less likely to participate because of
a high opportunity cost of their time or concerns about intrusion in their affairs. The poor too
may be underrepresented; some are homeless and hard to reach in standard household survey
designs, and some may be physically or socially isolated and thus less easily interviewed. The
presence of income-dependent compliance can bias survey-based estimates of the distribution of
income. However, the direction of bias cannot be assessed on a priori grounds; for example, if
compliance tends to be lower for both the very poor and the very rich then there will be
potentially offsetting effects on measures of the incidence of poverty. Unit non-response may
well have an effect on measured inequality that offsets that of measurement errors in reported incomes.
The possibility of selective compliance is commonly ignored in practice. There are two
exceptions. The first is found in the strand of the literature on measuring poverty and inequality
in which the survey mean is replaced by average incomes from national accounts.4 This
approach rests on two key assumptions, namely that the national accounts give a valid estimate
of mean household income and that the discrepancy between the two data sources is distribution
neutral; implying one only needs to make an equi-proportionate correction at all levels. Hitherto,
little or no evidence has been advanced for or against these assumptions.5
4 This is not common practice in empirical work, but there has been a flurry of recent examples,
including Bhalla (2002), Bourguignon and Morrisson (2002) and Sala-i-Martin (2002). While these
authors acknowledge that they are making these assumptions for computational convenience, some also
defend the method on the grounds that it allows a correction for under-reporting and non-compliance in
surveys (Bhalla, 2002; Sala-i-Martin, 2002).
5 For further discussion (in the context of poverty measurement for India, though the point is more
general) see Ravallion (2000). On the discrepancies between estimates of mean consumption from
surveys versus national accounts across countries see Ravallion (2002).
A second, more promising approach is based on utilizing geographic or other observable
differences in survey response rates. Atkinson and Micklewright (1983) use regional differences
in survey response rates to correct for differential non-response in the U.K. Family Expenditure
Survey. The Current Population Survey for the U.S. uses a similar method (Census Bureau,
2000, Chapter 10). These methods assume that the non-compliance problem is ignorable within
areas. However, this assumption is essentially ad hoc, with no behavioral basis, and there is no a
priori reason why it would be valid; why would compliance be non-random between areas but
random within them?
The contribution of the present paper is to show that the ignorability assumption can be
relaxed using exactly the same data used in past ad hoc corrections following the second
approach. We show that it is possible to identify the latent individual probability of survey
compliance as a function of income using the empirical relationship between aggregate
compliance rates across areas and mean incomes by percentile groups. Our method recognizes
that the empirical percentile group shares are biased given that there is selective compliance. We
deal with this problem numerically, by iterating the parameter estimation after revising the
empirical shares consistently with the empirical income effect identified at the previous iteration.
On convergence, the identified individual compliance probability given income is used to correct
for bias in the estimated income distribution. Our approach deals simultaneously with response
bias within and between areas.
We are thus able to present the first estimates (to our knowledge) of the bias in measured
distributions due to unit non-response. While we only present estimates for one country here, the
minimal data requirements of our method should allow a wide range of applications in practice.
We first establish why unit non-response is unlikely to be ignorable using a simple
economic model of compliance choice (section 2). We then examine the model's implications
for measures of income poverty and inequality (section 3). This motivates our effort to test for
an income effect on compliance. We outline our empirical method in section 4 and then present
results for the U.S. (section 5). We offer some conclusions in section 6.
2. Income-dependent survey compliance
Survey participation is a matter of individual choice; nobody is obliged to comply with
the statistician's randomized assignment. There is some perceived utility gain from
compliance (the satisfaction of doing one's civic duty, for example), but there is a cost as well.
Let $y \in [y_P, y_R]$ be household income per person ($y_P$ is the income of the poorest person
and $y_R$ is for the richest) and $c(y)$ the cost to the respondent of survey participation (net of any
compensation received for participation). We assume that $c'(y) \ge 0$. This can be rationalized by
assuming that the opportunity cost of the time required to comply rises with income, while the
time itself is roughly independent of income. More precisely, let $\tau$ denote the time required for
the survey interview and normalize total available time to unity. Full income is $y = w + \pi$, where
$w$ is the wage rate and $\pi$ is non-wage income. The cost of survey participation is then
$c(y) = \tau w = \tau(y - \pi)$ with $0 < c'(y) = \tau < 1$. Nonlinearity of $c(y)$ can arise when $\tau$ varies with $y$.
Let utility be $u[y - c(y)d, d]$ where $d = 1$ if one chooses to comply and $d = 0$ if not. The
function $u$ is strictly increasing in both arguments. The utility gain from compliance is:
$$g(y) = u[y - c(y), 1] - u(y, 0) \qquad (1)$$
with slope:
$$g'(y) = u_y[y - c(y), 1]\,[1 - c'(y)] - u_y(y, 0) \qquad (2)$$
where subscripts denote partial derivatives. We assume that the probability of compliance is a
strictly increasing common function of the utility gain. This simple model can generate a wide
range of outcomes for the relationship between compliance and income. We consider some
special cases.
From (2), it is evident that compliance falls monotonically with income if and only if:
$$c'(y) > 1 - \frac{u_y(y, 0)}{u_y[y - c(y), 1]} \quad \text{for all } y$$
A simple case in which this holds is when the cost of participation increases monotonically with
income ($c'(y) > 0$) and the marginal utility of income is independent of survey participation, i.e.,
$u_y(y, 0) = u_y[y - c(y), 1]$. Then $g'(y) = -u_y(\cdot)\,c'(y) < 0$ for all $y$.
However, the opposite result can also be obtained, whereby compliance rises with
income. For example, suppose instead that the cost of participation is independent of income
($c'(y) = 0$), implying that $g'(y) = u_y[y - c(y), 1] - u_y(y, 0)$. If there is diminishing marginal
utility of income and utility is separable between income and compliance ($u_y(y, 1) = u_y(y, 0)$),
then $u_y[y - c(y), 1] = u_y[y - c(y), 0] > u_y(y, 0)$ whenever $c(y) > 0$, so $g'(y) > 0$; the poor will be
less likely to participate.
Without separability, the outcome depends on whether compliance raises or lowers the
marginal utility of income, which is not obvious on a priori grounds. If compliance leads to a
higher marginal utility of income then again g'(y) > 0. If it lowers the marginal utility of
income then the income effect could go either way. Suppose that the effect of the lower net income
$y - c(y)$ on the marginal utility of income dominates at low incomes, so that $u_y[y - c(y), 1] > u_y(y, 0)$,
while the adverse effect of compliance on the marginal utility of income dominates at high $y$, i.e.,
$u_y[y - c(y), 1] < u_y(y, 0)$. Then one can again find an inverted-U pattern in which middle-income
groups are more likely to participate than either tail of the distribution.
Other special cases can deliver this inverted-U relationship. For instance, assume that: (i)
the cost of compliance is non-negative, strictly increasing and convex in income, $c'(y) > 0$ for $y > y_P$,
$c''(y) > 0$, with $c'(y_P) = 0$; (ii) utility is separable between income and compliance; and (iii) for
the richest person, the cost of participation is negligibly small relative to income, i.e.,
$\lim_{y \to y_R} \{u_y[y - c(y)] - u_y(y)\} = 0$. Then separability implies that we can re-write (2) as:
$$g'(y) = -u_y[y - c(y)]\,c'(y) + u_y[y - c(y)] - u_y(y) \qquad (3)$$
The first term on the right-hand side is negative while the second (the difference $u_y[y - c(y)] - u_y(y)$)
is positive, given declining marginal utility. At low incomes the second term will dominate (since
$c'(y)$ will be small) and hence $g'(y) > 0$ at low $y$. At high incomes, by contrast, the first term will
dominate and hence $g'(y) < 0$. In other words, the gains will tend to be highest for middle-income groups.
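A numerical check of equation (3) under hypothetical functional forms may help. The sketch below assumes log utility, so that $u_y(x) = 1/x$ and separability holds, and a quadratic cost $c(y) = 0.1 + 0.01(y - y_P)^2$ on the support $[1, 10]$. These choices satisfy conditions (i) and (ii) but are illustrative only, and they do not satisfy (iii) exactly. For these forms the sign of $g'(y)$ is that of $c(y) - y\,c'(y) = 0.11 - 0.01y^2$, so the gain from compliance should peak near $y = \sqrt{11} \approx 3.32$.

```python
import numpy as np

# Hypothetical primitives, chosen only to illustrate eq. (3).
Y_P, Y_R = 1.0, 10.0          # incomes of the poorest and richest persons
C0, K = 0.1, 0.01             # c(y) = C0 + K (y - Y_P)^2, so c'(Y_P) = 0 and c'' > 0

def cost(y):
    return C0 + K * (y - Y_P) ** 2

def cost_prime(y):
    return 2.0 * K * (y - Y_P)

def g_prime(y):
    """Eq. (3) with log utility, u_y(x) = 1 / x."""
    net = y - cost(y)
    first = -cost_prime(y) / net       # <= 0: rising cost of compliance
    second = 1.0 / net - 1.0 / y       # > 0: declining marginal utility of income
    return first + second

y = np.linspace(Y_P, Y_R, 901)
slope = g_prime(y)
turn = y[np.argmax(slope < 0)]        # first grid point at which g'(y) turns negative
print(f"g'(y_P) = {slope[0]:.4f}, g'(y_R) = {slope[-1]:.4f}, g peaks near y = {turn:.2f}")
```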
Notice that in this model, the introduction of a fixed fee paid to those who agree to
participate will increase the probability of participation, but it can make the income gradient of
compliance even more negative. This will happen if the cost of compliance rises less than one-
to-one with income, and there is declining marginal utility of income.
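The fee result can be checked in the same spirit. The sketch assumes separable log utility and the linear cost $c(y) = \tau(y - \pi) - f$, where $f$ is a fixed fee (the cost is measured net of it) and the values $\tau = 0.2$ and $\pi = 1$ are hypothetical. Then $g'(y) = (1 - \tau)/[y - c(y)] - 1/y = -(\tau\pi + f)/\{y[(1 - \tau)y + \tau\pi + f]\}$, so raising $f$ raises $g(y)$ but makes $g'(y)$ more negative when $\tau < 1$. The constant non-income utility from compliance is dropped because it does not affect the comparison.

```python
import numpy as np

TAU, PI = 0.2, 1.0   # hypothetical time cost of the interview and non-wage income

def net_income(y, fee):
    """y - c(y) with c(y) = TAU * (y - PI) - fee, i.e. the cost net of a fixed fee."""
    return y - (TAU * (y - PI) - fee)

def gain(y, fee):
    """g(y) under separable log utility, dropping the constant non-income utility of complying."""
    return np.log(net_income(y, fee)) - np.log(y)

def gain_slope(y, fee):
    """g'(y) = (1 - TAU) / (y - c(y)) - 1 / y."""
    return (1.0 - TAU) / net_income(y, fee) - 1.0 / y

y = np.linspace(2.0, 20.0, 181)
for fee in (0.0, 0.5):
    print(f"fee = {fee}: g from {gain(y, fee).min():.4f} to {gain(y, fee).max():.4f}, "
          f"g' from {gain_slope(y, fee).min():.5f} to {gain_slope(y, fee).max():.5f}")
```

Since the probability of compliance is increasing in $g$, the higher gain means more participation, while the steeper negative $g'(y)$ is the sense in which the income gradient worsens.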
3. Implications for poverty and inequality measures
In exploring the theoretical implications for the distribution of income, we confine
attention to the special cases discussed above in which the compliance-income relationship is
either monotonic decreasing or an inverted-U shape.
Let $F(y)$ denote the true (unobserved) cumulative distribution function of income $y$ with
continuous density function $f(y)$. The sample-based estimate is $\hat F(y)$ with corresponding density
$\hat f(y)$, and we assume that $F(0) = 0$. The true distribution can be derived from the empirical
distribution by appropriate re-weighting. The true density function is $f(y) = w(y)\hat f(y)$, where the
"correction factors" $w(y)$ are the inverse probabilities of compliance, so $w(y) = \phi[g(y)]$ for a
strictly decreasing differentiable function $\phi$. The corrected distribution function is:
$$F(y) = \int_{y_P}^{y} w(x)\hat f(x)\,dx \qquad (4)$$
The expected value of the correction factor is unity, i.e., $\int_{y_P}^{y_R} w(x)\hat f(x)\,dx = 1$.
Consider first the case in which compliance falls monotonically with income, i.e.,
$w'(y) > 0$. On integrating (4) by parts one obtains the following formula for the difference
between the true distribution of income and the empirical distribution:
$$F(y) - \hat F(y) = [w(y) - 1]\hat F(y) - \int_{y_P}^{y} w'(x)\hat F(x)\,dx \qquad (5)$$
It is evident that $F(y) < \hat F(y)$ for all $y < w^{-1}(1)$. By continuity there must exist an income $\bar y$,
defined as the minimum value of $y > y_P$ for which $F(y) = \hat F(y)$. Following a result proved in
Atkinson (1987), the empirical distribution will then overestimate the extent of income poverty
for all poverty lines up to $\bar y$ and all additive poverty measures satisfying standard properties.
Notice, however, that first-order dominance over all $y$ is not guaranteed by the assumptions made
so far; values of $y$ for which $F(y) > \hat F(y)$ are possible if compliance rates fall to a sufficiently
low level at high incomes. This is an empirical question.
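Equations (4) and (5) have a direct sample analogue: re-weight each respondent by a correction factor and cumulate. The sketch below assumes a hypothetical log-normal sample and a hypothetical logistic compliance probability that falls with income, normalizes the inverse probabilities to have mean one, and checks that the corrected distribution function lies below the empirical one at every income below the point where $w(y) = 1$.

```python
import numpy as np

rng = np.random.default_rng(1)
y_hat = np.sort(rng.lognormal(mean=0.0, sigma=0.8, size=5000))   # hypothetical respondent incomes

# Hypothetical compliance probability that falls with income (logistic in ln y).
p = 1.0 / (1.0 + np.exp(-(2.0 - 0.8 * np.log(y_hat))))
w = (1.0 / p) / np.mean(1.0 / p)          # correction factors, normalised so E[w] = 1 under F-hat

N = y_hat.size
F_hat = np.arange(1, N + 1) / N           # empirical CDF at each sorted observation
F = np.cumsum(w) / N                      # corrected CDF: sample analogue of eq. (4)

y_one = y_hat[np.argmax(w >= 1.0)]        # sample analogue of w^{-1}(1)
below = y_hat < y_one
print(f"w^-1(1) is near y = {y_one:.3f}; share of sample below it: {below.mean():.2f}")
print(f"largest F - F_hat below that point: {(F - F_hat)[below].max():.4f}")
```

Below $w^{-1}(1)$ every cumulated weight is less than one, so each partial sum of $w - 1$ is negative; this is the discrete counterpart of both terms in (5) being negative.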
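Consider instead the inverted-U relationship of compliance with income. There are two
points at which no correction to the density function is needed, namely $y_L$ and $y_U$ with
$y_L < y_U$, so that $w(y_L) = w(y_U) = 1$. We assume that $w'(y) < 0$ for all $y < y_L$, $w(y) < 1$ for
$y_L < y < y_U$ and $w'(y) > 0$ for all $y > y_U$, though this can be
relaxed somewhat without altering the main results. From (5):
$$F(y_L) - \hat F(y_L) = -\int_{y_P}^{y_L} w'(x)\hat F(x)\,dx > 0 \qquad (6.1)$$
$$F(y_U) - \hat F(y_U) = -\int_{y_P}^{y_U} w'(x)\hat F(x)\,dx < 0 \qquad (6.2)$$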
Intuitively, both the incidence of low incomes ($F(y_L)$) and of high incomes ($1 - F(y_U)$) are
underestimated, given the structure of the income effect on compliance. On noting that:
$$\frac{d[F(y) - \hat F(y)]}{dy} = [w(y) - 1]\hat f(y) \qquad (7)$$
it is evident that the impact of this pattern of income effects on compliance is as represented in
Figure 1. By continuity, there must exist a point $y^* \in (y_L, y_U)$ such that $F(y^*) = \hat F(y^*)$.
Again, for a broad class of poverty measures in the literature and all poverty lines up to $y^*$, the
empirical distribution will underestimate the extent of income poverty. Of course, the same
holds over the entire support of the distribution if nobody has an income greater than $y^*$
($f(y) = 0$ for all $y > y^*$). On the other hand, suppose that nobody has an income less than $y^*$
($f(y) = 0$ for all $y < y^*$). Then the empirical distribution will unambiguously overestimate the
extent of poverty (i.e., $F(y) < \hat F(y)$ for all $y$).
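The inverted-U case can be checked the same way on a grid. In the sketch below both the log-normal empirical density and the U-shaped correction factor $w(y) = 1 + b[(\ln y - m)^2 - v]$ are hypothetical; $m$ and $v$ are the mean and variance of $\ln y$ under $\hat F$, which makes the expected correction factor exactly one and places $y_L$ and $y_U$ at $\exp(m \mp \sqrt v)$. The cumulative gap $F - \hat F$ accumulates the increments in (7), and the code locates the crossing point $y^*$.

```python
import numpy as np

# Hypothetical empirical density: log-normal (median 1, sigma 0.6) discretised on a grid.
sigma = 0.6
y = np.linspace(0.05, 10.0, 4000)
pdf = np.exp(-np.log(y) ** 2 / (2 * sigma ** 2)) / (y * sigma * np.sqrt(2 * np.pi))
p = pdf / pdf.sum()                      # probability masses of F-hat on the grid

# Hypothetical U-shaped correction factor with expected value exactly one under F-hat.
log_y = np.log(y)
m = np.sum(p * log_y)
v = np.sum(p * (log_y - m) ** 2)
b = 1.0
w = 1.0 + b * ((log_y - m) ** 2 - v)     # w > 0 requires b * v < 1
y_L, y_U = np.exp(m - np.sqrt(v)), np.exp(m + np.sqrt(v))   # w(y_L) = w(y_U) = 1

F_hat = np.cumsum(p)                     # empirical distribution function
F = np.cumsum(w * p)                     # corrected distribution function, eq. (4)
gap = F - F_hat                          # accumulates the increments (w - 1) f-hat of eq. (7)

i_U = np.argmax(y > y_U)                 # first grid point above y_U
y_star = y[np.argmax((y > y_L) & (gap <= 0))]   # first crossing above y_L
print(f"y_L = {y_L:.3f}, y* = {y_star:.3f}, y_U = {y_U:.3f}")
print(f"gap just below y_L: {gap[y < y_L][-1]:+.4f}, gap just above y_U: {gap[i_U]:+.4f}")
```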
Though we omit the detailed analysis, similar arguments can be used to show that the
impact on measured inequality of an income effect on compliance is also ambiguous, and will
depend (inter alia) on the specific measure of inequality used. It is easy to see why if we
consider the case in which compliance falls monotonically with income, implying that the mean
is underestimated. Consider the poorest and richest persons, with incomes $y_P$ and $y_R$. The
survey yields the correct values for these incomes but underestimates the proportion of people
who have income $y_R$ and overestimates the proportion with $y_P$. Figure 2 shows how the
income effect on compliance affects the Lorenz curve. The bold lines are the segments of the
empirical Lorenz curve for the poor and the rich, and the bold dashed lines are the underlying
true Lorenz curve. The true slope of the lower segment corresponding to the poorest person is
$y_P/\mu$, while the slope of the uppermost segment is $y_R/\mu$, where $\mu$ is mean income. The
slopes of both segments of the Lorenz curve will be overestimated by the survey data given that
the empirical mean is underestimated ($\mu > \hat\mu$) since the higher income groups are
underrepresented. Since the true curve is flatter than the empirical curve at both ends, it lies below
the empirical curve near the origin and above it near (1, 1). By continuity, the true Lorenz curve must
therefore intersect the empirical Lorenz curve, implying that the effect on inequality is ambiguous, and
will depend crucially on the measure of inequality used. If instead compliance rises with income then
one can re-interpret Figure 2 accordingly (bold line is the true Lorenz curve) and see that again there
must be an intersection.
4. Method for estimating the income effect on compliance
While we do not observe the individual probabilities of compliance, we do observe both
the aggregate response rates by geographic area and the incomes of complying units. The
problem is to infer how individual compliance varies with income from these data. The observed
aggregate response rates by area are unconditional means across the (unknown) conditional
response rates by level of income. However, the aggregate response rates are not simple un-
weighted means, since if compliance rates vary with income then the population shares by
income level in the survey data actually collected will be wrong.
The fact that we only observe aggregate response rates across geographic areas implies
that we must impose some aggregation structure on the problem of estimating the latent
individual income effect on compliance. We make two key assumptions. Firstly we assume that
the data can be aggregated in the form of a set of homogeneous income groups with a common
number of groups across all geographic areas. The population is divided into n income groups
and m geographical areas, called "states" hereafter. For the computational convenience of
having a common data structure across all states, we impose the restriction that the number of
income groups is identical across states. Since the sample size is unlikely to be constant across
regions this also entails that a degree of aggregation is unavoidable. In estimating the parameters
of the income effect on compliance, we further ignore income differences within a given
(income-state-specific) group of sampled households. Thus the mean incomes of the n by m
groups become fixed data points in our method for estimating the income effect on compliance
and hence correcting the sample weights for selective compliance.
The second assumption involves aggregation of the latent heterogeneity. Here we assume
that the heterogeneity in compliance at given income can be captured by a common additive
area-specific error term. Given that our method relies on the observation of state-specific
compliance aggregates only (rather than by income group, which is of course intrinsically
unobservable), it is impossible to further decompose the aggregate (state-specific) error term.
Let $P_{ij}$ denote the (unobserved) probability of compliance for a person in income group
$i = 1, \ldots, n$ living in state $j = 1, \ldots, m$. The probability of compliance varies with the mean income $y_{ij}$
of group $i$ in state $j$ according to:
$$P_{ij} = P(y_{ij}; \beta) + \varepsilon_{ij} \qquad (8)$$
where $P$ is a smooth function with one or more parameters, $\beta$, and $\varepsilon_{ij}$ is a zero-mean error term.
We assume the following parametric form:
$$P(y_{ij}; \beta) = L\big(\beta_0 + \beta_1 \ln y_{ij} + \beta_2 (\ln y_{ij})^2\big) \qquad (9)$$
where $L(x) = e^x/(1 + e^x)$ is the logistic function. This specification is both sufficiently flexible
to test the scenarios developed in section 3 and ensures that the predicted response rate $P(y_{ij}; \beta)$
is bounded within the unit interval.
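Equation (9) and the location of its turning point can be written out directly. The function names below are hypothetical. With $\beta_2 < 0$ the probability is an inverted U in $\ln y$ with its peak at $\ln y = -\beta_1/(2\beta_2)$; with $\beta_2 = 0$ and $\beta_1 < 0$ it falls monotonically with income.

```python
import numpy as np

def compliance_prob(y, b0, b1, b2=0.0):
    """Eq. (9): P(y; beta) = L(b0 + b1 ln y + b2 (ln y)^2), L the logistic function."""
    ln_y = np.log(y)
    z = b0 + b1 * ln_y + b2 * ln_y ** 2
    return 1.0 / (1.0 + np.exp(-z))

def turning_point(b1, b2):
    """Income at which P peaks (b2 < 0) or bottoms out (b2 > 0): ln y = -b1 / (2 b2)."""
    return np.exp(-b1 / (2.0 * b2))

# Hypothetical inverted-U case: the peak is at ln y = 10.
incomes = np.exp(np.array([8.0, 10.0, 12.0]))
print(compliance_prob(incomes, b0=-6.0, b1=2.0, b2=-0.1), turning_point(2.0, -0.1))

# Hypothetical monotone case (b2 = 0, b1 < 0): compliance falls with income.
print(compliance_prob(np.array([1e3, 1e4, 1e5]), b0=3.0, b1=-0.3))
```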
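While $P_{ij}$ is unknown, we observe the proportion of the population in each state $j$ that are
compliant:
$$\bar P_j = \sum_{i=1}^{n} w_{ij} P(y_{ij}; \beta) + \varepsilon_j \qquad (10)$$
where $w_{ij}$ is the proportion of the population of state $j$ who belong to income group $i$, and
$$\varepsilon_j = \sum_{i=1}^{n} w_{ij} \varepsilon_{ij} \qquad (11)$$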
If there were no selective compliance then for equal-sized groups (quintiles, say) we would have
$w_{ij} = 1/n$. With suitable parameterization of the function $P(y_{ij}; \beta)$ we can then estimate (10)
using standard econometric methods. However, selective compliance complicates matters. To
correct for this we should be re-weighting the data according to the differences in response rates
across income groups, so that the correct weights take the form:
$$w_{ij} = \frac{P_{ij}^{-1}}{\sum_{k=1}^{n} P_{kj}^{-1}} \quad \text{for all } (i, j) \qquad (12)$$
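Setting the error terms aside, substituting (12) into (10) shows what the observed state response rate measures when the respondent groups are equal-sized: the harmonic mean of the group compliance probabilities, which never exceeds their simple average.

```latex
\bar P_j = \sum_{i=1}^{n} w_{ij} P_{ij}
         = \sum_{i=1}^{n} \frac{P_{ij}^{-1}}{\sum_{k=1}^{n} P_{kj}^{-1}}\, P_{ij}
         = \frac{n}{\sum_{k=1}^{n} P_{kj}^{-1}}
```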
We proceed iteratively. First we estimate (10) based on the assumption that compliance
is distribution neutral, i.e., $w_{ij}^0 = 1/n$ for all $(i, j)$, where the superscript "0" refers to the starting
value. This yields a vector of parameter estimates, $\hat\beta^0$, and state-specific error terms. However,
the error terms by income group are not identified. Under our assumption that the error term is
common to all income groups in a given state, we obtain an initial vector of estimated
compliance probabilities:
$$\hat P_{ij}^0 = P(y_{ij}; \hat\beta^0) + \hat\varepsilon_j^0 \qquad (13)$$
These in turn can be used to re-weight the data for the next iteration using:
$$w_{ij}^1 = \frac{(\hat P_{ij}^0)^{-1}}{\sum_{k=1}^{n} (\hat P_{kj}^0)^{-1}} \qquad (14)$$
We then re-estimate (10) using (14) for these new weights, giving the regression:
$$\bar P_j = \sum_{i=1}^{n} w_{ij}^1 P(y_{ij}; \beta) + \varepsilon_j \qquad (15)$$
This gives revised estimates of the parameters and residuals. We iterate this procedure until the
estimated coefficients (and hence the estimated proportions of the population in each income
group and area) converge.
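The iteration can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the authors' code: it sets $\beta_2 = 0$ as in the application to the U.S., fits (10) and (15) with a plain Gauss-Newton loop (the estimation method named in footnote 9), centres $\ln y$ internally for numerical conditioning, and clips the probabilities in (13) to $(0, 1]$; the last two are our own choices. The function names are hypothetical, and the synthetic data are generated from known parameters with no error term, so the loop should recover them.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_gauss_newton(x, w, pbar, beta, tol=1e-12, max_iter=100):
    """Fit pbar_j = sum_i w_ij L(b0 + b1 x_ij) + e_j by Gauss-Newton (eq. (10), beta_2 = 0)."""
    for _ in range(max_iter):
        p = logistic(beta[0] + beta[1] * x)                  # m x n fitted probabilities
        dp = p * (1.0 - p)                                   # derivative of the logistic
        resid = pbar - (w * p).sum(axis=1)                   # state residuals
        jac = np.column_stack([(w * dp).sum(axis=1), (w * dp * x).sum(axis=1)])
        step = np.linalg.lstsq(jac, resid, rcond=None)[0]
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def estimate_income_effect(y, pbar, tol=1e-10, max_outer=200):
    """Iterate eqs. (10) and (13)-(15) until the parameter estimates stop changing."""
    m, n = y.shape
    x_mean = np.log(y).mean()
    xc = np.log(y) - x_mean                  # centred ln y (our choice, for conditioning)
    w = np.full((m, n), 1.0 / n)             # starting weights w0_ij = 1/n
    beta = np.array([np.log(pbar.mean() / (1.0 - pbar.mean())), 0.0])
    for iteration in range(1, max_outer + 1):
        new_beta = fit_gauss_newton(xc, w, pbar, beta)
        p_fit = logistic(new_beta[0] + new_beta[1] * xc)
        eps = pbar - (w * p_fit).sum(axis=1)                 # state-specific error terms
        p_hat = np.clip(p_fit + eps[:, None], 1e-6, 1.0)     # eq. (13); clipping is our safeguard
        inv = 1.0 / p_hat
        w = inv / inv.sum(axis=1, keepdims=True)             # eq. (14)
        done = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if done:
            break
    b0 = beta[0] - beta[1] * x_mean          # intercept on un-centred ln y, as in eq. (9)
    return np.array([b0, beta[1]]), w, iteration

# Synthetic check: 51 "states" x 5 groups generated from known parameters, no error term.
rng = np.random.default_rng(0)
m, n = 51, 5
state_level = rng.uniform(-0.5, 0.5, size=m)
offsets = np.array([-1.0, -0.4, 0.0, 0.4, 1.0])
y = np.exp(10.0 + state_level[:, None] + offsets[None, :])          # hypothetical group means
true_beta = np.array([5.0, -0.3])
p_true = logistic(true_beta[0] + true_beta[1] * np.log(y))
w_true = (1.0 / p_true) / (1.0 / p_true).sum(axis=1, keepdims=True)  # eq. (12)
pbar = (w_true * p_true).sum(axis=1)                                 # eq. (10)

beta_hat, w_hat, iterations = estimate_income_effect(y, pbar)
print(f"estimated beta = {beta_hat}, after {iterations} iterations")
```

Because the weights in (14) are rebuilt from the estimated probabilities rather than from the previous weights, the distribution-neutral starting value $w_{ij}^0 = 1/n$ is corrected within a few iterations in this example.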
Finally, we use the vector of parameter estimates from the last iteration and each
complying household's per capita income to infer the latent compliance probability for that
household. The inverse of this probability gives the household-specific correction factor that
allows us to estimate the corrected income distribution function defined in (4). Notice that this
last step does not require the first aggregation assumption described above, which is only used in
estimating the parameters and state-specific error terms.
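The last step can be sketched as follows, assuming hypothetical parameter estimates, a hypothetical sample of per capita incomes and household sizes, and person weights equal to household size times the inverse compliance probability $1/P(y; \hat\beta)$ with $\beta_2 = 0$, as the text describes. Normalizing these weights gives the sample analogue of the corrected distribution in (4), from which summary statistics such as the mean and the Gini index follow. All function names are hypothetical.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def person_weights(income, hh_size, b0, b1):
    """Household size times 1 / P(y; beta) from eq. (9) with beta_2 = 0."""
    return hh_size / logistic(b0 + b1 * np.log(income))

def weighted_mean(y, w):
    return np.sum(w * y) / np.sum(w)

def weighted_gini(y, w):
    """Gini index via the mean-difference formula with normalised weights."""
    w = w / w.sum()
    mu = np.sum(w * y)
    return np.sum(np.outer(w, w) * np.abs(y[:, None] - y[None, :])) / (2.0 * mu)

rng = np.random.default_rng(7)
income = rng.lognormal(mean=10.0, sigma=0.8, size=800)     # hypothetical per capita incomes
hh_size = rng.integers(1, 7, size=800).astype(float)        # hypothetical household sizes
b0_hat, b1_hat = 5.0, -0.3                                  # hypothetical parameter estimates

w_emp = hh_size                                             # empirical: household-size weights only
w_cor = person_weights(income, hh_size, b0_hat, b1_hat)     # corrected: times 1 / P(y; beta-hat)

print(f"mean: {weighted_mean(income, w_emp):,.0f} -> {weighted_mean(income, w_cor):,.0f}")
print(f"Gini: {weighted_gini(income, w_emp):.4f} -> {weighted_gini(income, w_cor):.4f}")
```

When $\hat\beta_1 < 0$ the correction factor rises with income, so the corrected mean necessarily exceeds the empirical one; the effect on the Gini index is not signed in general, consistent with the Lorenz-curve argument in section 3.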
5. Application to the U.S. income distribution
Data on survey response rates across geographical areas are often available from survey
producers. A case in point is the March 2001 supplement of the US Current Population Survey
(CPS).6 In addition to detailed data on incomes, the CPS contains geographically referenced
information on compliance (Census Bureau, 2000, Chapter 7). We define non-compliance as
what the Census Bureau refers to as "type A non-interviews," which refer to households assigned
for interview but for which no usable data were collected because household members explicitly
refused to be interviewed or were absent during the interviewing period.7 The March 2001 CPS
has a sample size of 17,788 households (net of other non-interview types) of which 1,461 were
classified as type A non-interviews. In addition, we also treat the 134 households that were
interviewed but refused to answer the income questions as non-compliant. Together this implies
an overall non-response rate of about 9%.
The CPS has its own procedures in trying to adjust for non-response (described in Census
Bureau, 2000, Chapter 10).8 In dealing with unit non-response, the CPS assumes that the
problem is ignorable once primary sampling units with non-responding households are grouped
together within other matched geographic areas (typically within the same state). The Census
Bureau acknowledges that this may or may not be valid. The data set only gives one weight
(called "final weight") for each household, and that weight reflects various adjustments,
including for non-response and sample design. We cannot disentangle the CPS adjustment for
non-response from other factors. For this reason, we chose to ignore the CPS weights. So, for
the purpose of our exercise, neither our "empirical" nor "corrected" distributions of income have
used the CPS weights, though both distributions are household-size weighted.
6 The CPS data and survey methodology details are available from the US Census Bureau and can be
accessed on-line at: http://www.census.gov/hhes/www/income.html.
7 Other types of "non-interviews" refer to cases where the residence was found to be demolished,
under construction, etc. These are less likely to bias the income distribution because the household is
likely to be no longer on the premises for a variety of reasons that are not correlated with income.
8 For a critical assessment of the imputation methods used by the Census Bureau in correcting
estimates for income non-response see Lillard, Smith and Welch (1986).
The sample was designed to be representative of the US at the state level, giving
j=l,. . .,51 geographical areas. We set a minimum sample size of 30 for any state-income group
combination. Since the smallest sample size for any state is 150, this means that we set n=5.
Thus, we divide the sample for each state into quintiles, based on the state-level per capita
income distribution quintiles. We also test the sensitivity of our results to this assumption.
Non-response rates vary from 3.2% in Alabama to 19.6% in the District of Columbia
(Table 1). There is no significant correlation between sample size and compliance rates. State-
level average income on the other hand is correlated with compliance, and this correlation is
strongest for the top income quintile and weakest for the bottom quintile (see Figure 3). The
mean incomes by quintile are also given in Table 1.
The specification in equation (9) did not yield an estimate for $\beta_2$ that was statistically
significantly different from zero, so we set $\beta_2 = 0$. The specification linear in $\ln y$ produced
significant parameter estimates at each iteration, indicating that higher income lowers the
propensity to comply; Table 2 gives the parameter estimates.9 The estimated coefficients
(Figure 4) and reweighted shares of the population in each income group in each state (Figure 5)
converged up to 3 decimal places after 9 iterations.
9 For each iteration, we used the standard Gauss-Newton non-linear estimation method and all
parameter estimates converged.
Our results indicate that ignoring selective compliance according to income appreciably
understates the proportion of the population in the richest income per capita quintile and
overstates the population shares in the bottom four quintiles. The highest income quintile is
estimated to comprise 24% of the population after correcting for its lower probability of survey
compliance. By contrast, the poorest quintile in the unadjusted data actually comprises 18% of
the population.
Table 3 gives the original and corrected mean incomes by 20 equal fractiles (the third
column, labeled n=6, will be discussed below.) After our correction for selective non-
compliance, the overall mean rises by 23%, from $21,576 per capita to $26,454. However, the
correction is clearly not distribution-neutral; the proportionate adjustment rises from about 5% at
the bottom to over 54% at the top.
Figure 6 gives the Lorenz curves, with enlargements of the extreme lower and upper ends
shown in Figure 7. (Focus on the n=5 case; we will explain the n=6 case shortly.) The Lorenz
curves intersect as predicted in section 3; thus the qualitative effect on measured inequality
cannot be predicted on a priori grounds. However, it is plain from Figure 6 that the predominant
effect of our correction is a downward shift of the Lorenz curve, implying higher inequality by
most measures. The Gini index increases appreciably from 45.05% to 50.76% on correcting for
our estimates of the income effect on compliance.
The effect on the levels distribution of income per capita can be seen from Figure 8.
Naturally, this also reflects the impact on the mean. It can be seen that the impact on poverty
incidence is small for poverty lines commonly used in the U.S., giving poverty rates around 12%
(Census Bureau, 2001); Figure 9 gives a blow-up for the lower 30%. However, there is still first-order
dominance, implying that poverty measures are unambiguously overestimated under the
standard assumption in practice of ignorable non-response.
A striking feature of our findings is that so much of the impact is at the upper end of the
distribution, notably the top quintile or so (Table 3). So our results may be sensitive to
aggregation at this end of the distribution. To test this, we split the highest-income quintile into
two and re-run the estimation method. The method converged at a lower estimate (in absolute
value) for $\beta_1$ of -1.553, with a standard error of 0.243. Table 3 gives the conditional means for
this case; the pattern is similar, but the upward adjustment is lower. The upward adjustment
needed to be consistent with selective compliance rises from only 3% at the bottom to 30% at the
top. Instead of a revised mean of $26,454 we obtained $24,291. Figures 6 and 8 also give the
Lorenz curves and distribution functions for this case (labeled n=6). Instead of an upward
revision of the Gini index to 50.75% (from 45.05%) we now obtain 48.29%. There is negligible
impact on the cumulative distribution function at the lower end.
While quantitative magnitudes are somewhat sensitive to this change to the estimation
method, the qualitative results are not. The problem of selective compliance is clearly not
ignorable in estimating standard summary statistics from income surveys. And even if one is
willing to assume that the national accounts provide a better basis for setting the mean, the bias is
clearly far from distribution-neutral.
6. Conclusions
We have argued that there is likely to be an income effect on survey compliance, though
the direction of bias in poverty or inequality measures could go either way in theory. So it is an
empirical question. Past empirical work has either ignored the problem of selective compliance
in surveys or made essentially ad hoc corrections. We have shown how the latent income effect
on compliance can be estimated consistently with the available data on average response rates
and the measured distribution of income across geographic areas. Thus we are able to re-weight
the raw data to correct for the problem.
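The re-weighting step can be illustrated with inverse-probability weights: each respondent's weight is divided by that respondent's probability of complying, so groups that comply less often count for more. This is only a sketch under stated assumptions: the logistic form of `compliance_prob`, the parameter values `beta0` and `beta1`, and the five incomes are all hypothetical, and estimating the compliance parameters from area-level response rates, as the paper does, is not attempted here.

```python
import math


def compliance_prob(income, beta0, beta1):
    """ASSUMED logistic compliance probability in log income (illustrative only)."""
    index = beta0 + beta1 * math.log(income)
    return 1.0 / (1.0 + math.exp(-index))


def reweight(incomes, weights, beta0, beta1):
    """Divide each weight by its compliance probability, then rescale so the total weight is unchanged."""
    inflated = [w / compliance_prob(y, beta0, beta1) for y, w in zip(incomes, weights)]
    scale = sum(weights) / sum(inflated)
    return [w * scale for w in inflated]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical sample and hypothetical parameters (beta1 < 0: compliance falls with income).
incomes = [2_000, 8_000, 15_000, 30_000, 90_000]
raw_weights = [1.0] * len(incomes)
new_weights = reweight(incomes, raw_weights, beta0=20.0, beta1=-1.8)

print(round(weighted_mean(incomes, raw_weights)), round(weighted_mean(incomes, new_weights)))
```

With `beta1` negative, richer respondents receive larger weights and the re-weighted mean rises above the raw mean. Rescaling keeps the total weight fixed, so only the composition of the weighted sample shifts.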
On implementing our method using US data, we find that the problem is not ignorable.
We can also reject the assumptions made in past ad hoc correction methods. We find a highly
significant negative income effect on survey compliance. While we do not find strict Lorenz
dominance, inequality tends to be appreciably higher after correcting for selective compliance.
Thus we find that unit non-response has the opposite impact on inequality to the problem of
classical measurement error in reported incomes that has been studied in past work in the
literature. A sizeable upward revision to the overall mean is also called for to correct for
selective compliance. In terms of the impact on the incidence of poverty, the downward bias in
the mean tends to offset the downward bias in measured inequality. The tendency for low income
groups to be over-represented (because of their higher compliance probabilities) still means that
the poverty rate tends to be over-estimated, though the impact on poverty incidence is small up
to poverty lines normally used in the U.S. We find some sensitivity of the quantitative results to
changing the number of income groups one identifies in the estimation method, though our
qualitative conclusions are robust.
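The over-representation effect on the poverty rate can be made concrete with a weighted headcount index, the share of the weighted population below a poverty line. In this sketch the incomes, the compliance rates and the poverty line `z` are hypothetical; each respondent's weight is taken as the inverse of an assumed compliance rate.

```python
def headcount(incomes, weights, poverty_line):
    """Weighted share of the population with income below the poverty line."""
    poor = sum(w for y, w in zip(incomes, weights) if y < poverty_line)
    return poor / sum(weights)


# Hypothetical respondents, one per income level, with assumed compliance rates
# that fall as income rises (illustrative values only).
incomes = [5_000, 12_000, 20_000, 35_000, 80_000]
compliance = [0.97, 0.95, 0.93, 0.90, 0.80]

raw_weights = [1.0] * len(incomes)
corrected_weights = [1.0 / c for c in compliance]  # inverse-compliance re-weighting

z = 15_000  # hypothetical poverty line
h_raw = headcount(incomes, raw_weights, z)
h_corrected = headcount(incomes, corrected_weights, z)
print(f"raw: {h_raw:.3f}, corrected: {h_corrected:.3f}")  # prints: raw: 0.400, corrected: 0.377
```

Because the poor in this example comply more often, down-weighting them lowers the measured headcount. The magnitude here reflects only the assumed numbers, not the paper's estimates.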
There can be no presumption that even our qualitative results will hold elsewhere.
Possibly in poorer settings one will find greater under-representation of the poor than in the US.
Or one might find a flatter (steeper) income gradient of compliance in countries with lower
(higher) inequality than the US. These are conjectures. However, the data and computational
demands of the method we have proposed are not great, so other applications are possible.
References
Alho, J.M. (1990) "Adjusting for Nonresponse Bias using Logistic Regression," Biometrika,
77(3): 617-24.
Atkinson, A.B. (1987) "On the Measurement of Poverty," Econometrica 55: 749-764.
Atkinson, A.B. and J. Micklewright (1983) "On the Reliability of Income Data in the Family
Expenditure Survey 1970-1977," Journal of the Royal Statistical Society Series A,
146(1): 33-61.
Bhalla, Surjit (2002) Imagine There's No Country: Poverty, Inequality and Growth in the Era of
Globalization, Washington, DC: Institute for International Economics.
Bourguignon, Francois and Christian Morrisson (2002) "Inequality Among World Citizens:
1820-1992," American Economic Review 92(4): 727-744.
Census Bureau (2000) "Current Population Survey Design and Methodology," Technical Paper
63. Washington, D.C.: U.S. Department of Commerce.
Census Bureau (2001) "Poverty in the United States: 2001," Current Population Report P60-219,
Washington, D.C.: U.S. Department of Commerce.
Chakravarty, S.R., and W. Eichhorn (1994) "Measurement of Income Inequality: Observed
versus True Data," in W. Eichhorn (ed.) Models and Measurement of Welfare and
Inequality, Heidelberg: Springer-Verlag.
Chesher, A., and C. Schluter (2002) "Welfare Measurement and Measurement Error," Review of
Economic Studies, 69: 357-378.
Cowell, F.A., and M. Victoria-Feser (1996) "Robustness of Inequality Measures," Econometrica,
64: 77-101.
Deming, W.E. (1953) "On a Probability Mechanism to Attain an Economic Balance between the
Resultant Error of Response and the Bias of Nonresponse," Journal of the American
Statistical Association, 48:743-72.
Holt, D., and D. Elliot (1991) "Methods of Weighting for Unit Non-Response," The Statistician,
40: 333-342.
Lillard, L., Smith, J.P. and F. Welch (1986) "What Do We Really Know about Wages? The
Importance of Nonreporting and Census Imputation," Journal of Political Economy,
94(3):489-506.
Little, R.J.A. and D.B. Rubin (1987) Statistical Analysis with Missing Data. New York: Wiley.
Nijman, T. and M. Verbeek (1992) "Nonresponse in Panel Data: The Impact on Estimates of a
Life Cycle Consumption Function," Journal of Applied Econometrics, 7:243-57.
Philipson, Tomas (1997) "Data Markets and the Production of Surveys," Review of Economic
Studies 64: 47-72.
Ravallion, M. (1994) "Poverty Rankings Using Noisy Data on Living Standards," Economics
Letters, 45: 481-485.
Ravallion, M. (2000) "Should Poverty Measures be Anchored to the National Accounts?"
Economic and Political Weekly 34 (August 26): 3245-3252.
Ravallion, M. (2002) "Measuring Aggregate Welfare in Developing Countries: How Well do
National Accounts and Surveys Agree?," Review of Economics and Statistics, in press.
Sala-i-Martin, Xavier (2002) "The World Distribution of Income (Estimated from Individual
Country Distributions)," mimeo, Columbia University.
Scott, Kinnon and Diane Steele (2002), "Measuring Welfare in Developing Countries: Living
Standards Measurement Study Surveys," in UN Statistical Division, Surveys in
Developing and Transition Countries, forthcoming.
Van Praag, B., A. Hagenaars and W. Van Eck (1983) "The Influence of Classification and
Observation Errors on the Measurement of Income Inequality," Econometrica, 51:
1093-1108.
Table 1. Sample characteristics
Columns: state; mean compliance rate; sample size (households, individuals); mean log per capita income per quintile (i=1, i=2, i=3, i=4, i=5)
Alabama 0.968 250 620 8.32 9.12 9.53 10.01 10.85
Idaho 0.960 250 612 8.55 9.24 9.69 10.08 10.83
West Virginia 0.955 245 558 8.48 9.13 9.55 9.95 10.54
Utah 0.955 198 613 8.48 9.29 9.73 10.13 10.90
North Dakota 0.950 219 495 8.72 9.33 9.76 10.06 10.69
Mississippi 0.950 199 466 8.51 9.16 9.56 10.01 10.65
Louisiana 0.949 198 466 8.45 9.10 9.56 9.96 10.66
Nebraska 0.949 254 586 8.72 9.47 9.84 10.29 10.83
Montana 0.942 225 498 8.65 9.30 9.71 10.05 10.86
South Dakota 0.940 235 523 8.68 9.32 9.76 10.09 10.71
Wyoming 0.938 242 568 8.66 9.29 9.69 10.16 10.95
Iowa 0.936 219 514 8.87 9.36 9.71 10.12 10.84
Delaware 0.935 168 441 8.69 9.50 9.87 10.29 11.11
Florida 0.932 942 2,161 8.65 9.36 9.79 10.19 10.92
Minnesota 0.930 244 557 8.95 9.59 9.92 10.31 11.28
Tennessee 0.929 225 511 8.45 9.17 9.60 10.08 10.98
Virginia 0.928 263 633 8.82 9.52 9.90 10.33 11.18
Indiana 0.928 235 536 8.69 9.40 9.77 10.20 10.82
Wisconsin 0.925 268 636 8.85 9.52 9.91 10.24 11.02
Arkansas 0.925 253 576 8.38 9.08 9.47 9.91 10.85
South Carolina 0.924 171 363 8.74 9.34 9.73 10.15 10.80
Oklahoma 0.923 285 667 8.23 9.13 9.61 10.13 10.86
Vermont 0.922 192 415 8.62 9.42 9.90 10.19 11.13
Oregon 0.921 203 478 8.61 9.43 9.83 10.23 10.92
Massachusetts 0.921 403 944 8.75 9.48 9.92 10.35 11.22
Maine 0.920 188 408 8.76 9.41 9.79 10.20 10.88
Nevada 0.917 240 624 8.69 9.34 9.77 10.21 11.00
Kansas 0.915 235 514 8.78 9.40 9.85 10.28 10.84
Ohio 0.914 629 1,485 8.78 9.46 9.85 10.24 10.97
Washington 0.913 230 546 8.56 9.40 9.82 10.26 11.07
North Carolina 0.913 436 1,007 8.61 9.26 9.75 10.19 10.85
Missouri 0.912 239 539 8.90 9.56 9.95 10.32 11.03
Texas 0.911 961 2,439 8.29 9.17 9.63 10.13 11.10
Michigan 0.910 577 1,401 8.72 9.45 9.85 10.26 11.04
New Mexico 0.909 309 760 8.24 9.13 9.59 9.99 10.77
Georgia 0.909 253 579 8.56 9.30 9.75 10.24 11.09
Kentucky 0.909 219 503 8.67 9.22 9.69 10.23 11.07
Colorado 0.906 255 627 8.98 9.68 9.98 10.45 11.13
Arizona 0.902 287 688 8.56 9.26 9.71 10.17 11.11
Connecticut 0.901 182 412 8.73 9.61 9.99 10.36 11.05
Illinois 0.901 744 1,841 8.70 9.50 9.91 10.32 11.01
Pennsylvania 0.896 724 1,650 8.75 9.44 9.88 10.32 11.16
Alaska 0.896 193 492 8.60 9.43 9.94 10.31 11.01
California 0.888 1,583 4,177 8.41 9.26 9.75 10.28 11.19
New Jersey 0.885 582 1,340 8.76 9.54 9.96 10.35 11.09
Rhode Island 0.880 150 304 8.82 9.42 9.85 10.36 11.32
New York 0.874 1,183 2,702 8.51 9.30 9.77 10.22 11.07
Hawaii 0.866 179 426 8.72 9.54 9.98 10.46 11.10
New Hampshire 0.853 191 407 9.03 9.67 10.06 10.39 10.93
Maryland 0.842 209 432 8.85 9.57 9.96 10.41 11.19
Dist. of Columbia 0.804 224 384 8.46 9.30 10.00 10.62 11.42
Table 2. Parameter estimates and corrected population shares
Columns: iteration (t); parameter estimates β0 and β1 (standard errors in parentheses); mean proportion (%) of the population by quintile, from i=1 (richest) to i=5 (poorest)
Iteration (t) β0 β1 i=1 i=2 i=3 i=4 i=5
0 24.682 -2.168 20.00 20.00 20.00 20.00 20.00
(3.595) (0.337)
1 18.997 -1.613 25.87 19.43 18.53 18.18 17.99
2 21.210 -1.828 23.64 19.91 19.16 18.78 18.52
(2.806) (0.263)
3 20.442 -1.753 24.36 19.76 18.95 18.58 18.35
(2.656) (0.249)
4 20.715 -1.780 24.10 19.81 19.02 18.65 18.41
(2.709) (0.254)
5 20.619 -1.770 24.19 19.79 19.00 18.63 18.39
(2.690) (0.252)
6 20.653 -1.774 24.16 19.80 19.01 18.64 18.40
(2.698) (0.253)
7 20.641 -1.773 24.17 19.80 19.00 18.63 18.40
(2.694) (0.253)
8 20.645 -1.773 24.16 19.80 19.01 18.64 18.40
(2.695) (0.253)
9 20.644 -1.773 24.17 19.80 19.00 18.63 18.40
(2.695) (0.253)
10 20.644 -1.773 24.17 19.80 19.00 18.63 18.40
(2.695) (0.253)
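The convergence visible in Table 2 (and in Figures 4 and 5) can be checked mechanically from the published estimates. The sketch below stops at the first iteration where β1 changes by less than 0.0005 in absolute value; that tolerance is an illustrative choice, since the stopping rule the authors used is not stated in this section. It also confirms that each row of population shares sums to 100% up to rounding.

```python
# beta1 estimates and population shares (%) by iteration, transcribed from Table 2.
beta1 = [-2.168, -1.613, -1.828, -1.753, -1.780, -1.770,
         -1.774, -1.773, -1.773, -1.773, -1.773]
shares = [
    [20.00, 20.00, 20.00, 20.00, 20.00],
    [25.87, 19.43, 18.53, 18.18, 17.99],
    [23.64, 19.91, 19.16, 18.78, 18.52],
    [24.36, 19.76, 18.95, 18.58, 18.35],
    [24.10, 19.81, 19.02, 18.65, 18.41],
    [24.19, 19.79, 19.00, 18.63, 18.39],
    [24.16, 19.80, 19.01, 18.64, 18.40],
    [24.17, 19.80, 19.00, 18.63, 18.40],
    [24.16, 19.80, 19.01, 18.64, 18.40],
    [24.17, 19.80, 19.00, 18.63, 18.40],
    [24.17, 19.80, 19.00, 18.63, 18.40],
]


def first_converged(values, tol):
    """Return the first iteration t >= 1 with |values[t] - values[t-1]| < tol, or None."""
    for t in range(1, len(values)):
        if abs(values[t] - values[t - 1]) < tol:
            return t
    return None


t_star = first_converged(beta1, tol=5e-4)  # illustrative tolerance
# Each row should sum to 100% up to rounding of the published two-decimal figures.
max_rounding_gap = max(abs(sum(row) - 100.0) for row in shares)
print(t_star, round(max_rounding_gap, 2))
```

The oscillation in β1 (overshooting, then damping) is visible in the successive differences, which shrink from about 0.55 to zero by iteration 8.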
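Table 3: Mean income with/without correction for income-dependent compliance
Columns: fractile (% of population, ranked by income per person); mean income ($/person/year) for the empirical distribution, the corrected distribution (n=5) and the corrected distribution (n=6), where n is the number of income groups used in estimation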
0 - 5 1,968 2,068 2,034
5 - 10 3,999 4,199 4,129
10- 15 5,543 5,845 5,745
15-20 6,863 7,198 7,087
20 - 25 8,110 8,570 8,406
25 - 30 9,389 9,941 9,746
30 - 35 10,637 11,308 11,073
35 - 40 11,995 12,829 12,540
40 - 45 13,438 14,391 14,062
45 - 50 14,877 15,876 15,513
50 - 55 16,340 17,604 17,139
55 - 60 18,046 19,579 19,015
60 - 65 19,967 21,783 21,066
65 - 70 22,172 24,433 23,578
70 - 75 24,801 27,627 26,470
75 - 80 28,071 31,811 30,252
80 - 85 32,433 37,476 35,379
85 - 90 38,636 46,740 43,119
90 - 95 49,971 64,246 57,499
95 - 100 94,234 145,466 121,895
Overall mean 21,576 26,454 24,287
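Table 3's fractile means are enough to trace approximate Lorenz curves of the kind shown in Figures 6 and 7. Because each fractile holds 5% of the population, cumulative income shares follow directly from cumulative sums of the means. The Gini index computed from these grouped data ignores inequality within each 5% group, so it should fall somewhat below the micro-data values quoted in the text (45.05%, 50.75% and 48.29%). This is an approximation for illustration, not the paper's own computation.

```python
# Fractile (5%) means of income per person, $/year, transcribed from Table 3,
# ordered from poorest to richest fractile.
empirical = [1968, 3999, 5543, 6863, 8110, 9389, 10637, 11995, 13438, 14877,
             16340, 18046, 19967, 22172, 24801, 28071, 32433, 38636, 49971, 94234]
corrected_n5 = [2068, 4199, 5845, 7198, 8570, 9941, 11308, 12829, 14391, 15876,
                17604, 19579, 21783, 24433, 27627, 31811, 37476, 46740, 64246, 145466]
corrected_n6 = [2034, 4129, 5745, 7087, 8406, 9746, 11073, 12540, 14062, 15513,
                17139, 19015, 21066, 23578, 26470, 30252, 35379, 43119, 57499, 121895]


def lorenz_ordinates(group_means):
    """Cumulative income share at the upper end of each equal-sized population group."""
    total = sum(group_means)
    ordinates, running = [], 0.0
    for m in group_means:
        running += m
        ordinates.append(running / total)
    return ordinates


def grouped_gini(group_means):
    """Between-group Gini index for equal-sized groups sorted in ascending order.

    Inequality within each group is ignored, so the result understates the Gini
    index computed from the underlying micro data.
    """
    n = len(group_means)
    ordinates = lorenz_ordinates(group_means)
    # Trapezoid rule with L(0) = 0 and L(1) = 1: area = (sum of ordinates - 1/2) / n.
    area = (sum(ordinates) - 0.5) / n
    return 1.0 - 2.0 * area


for label, means in [("empirical", empirical),
                     ("corrected (n=5)", corrected_n5),
                     ("corrected (n=6)", corrected_n6)]:
    print(f"{label}: Gini = {100 * grouped_gini(means):.1f}%")
```

The grouped estimates come out at about 44.7%, 50.3% and 47.9%, each a little below the corresponding figure in the text and in the same order: the n=5 correction raises measured inequality most.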
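Figure 1: Pattern of bias for an inverted-U relationship between compliance and income
[Figure not reproduced: bias in the distribution function F(y) plotted against income y, with points marked at y_L and y_U.]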
Figure 2: Lorenz curve bias under a monotonic income effect on survey compliance
[Figure not reproduced: Lorenz curves, with the horizontal axis marking population shares F(y_P) and F(y_R).]
Figure 3: Non-compliance odds and state-wide per capita income averages
[Figure not reproduced: scatter of non-compliance odds against log average per capita income (8.00 to 11.75) by state, with fitted regression lines; recoverable panel label: Richest 20%.]
Figure 4: Convergence pattern of the slope coefficient (β1)
[Figure not reproduced: estimated slope coefficient (vertical axis, -2.20 to -1.60) plotted against iteration (t), t = 0, ..., 10.]
Figure 5: Convergence pattern of the estimated population shares
[Figure not reproduced: estimated population share of each quintile, from the richest 20% to the poorest 20%, plotted against iteration (t).]
Figure 6: Empirical and compliance re-weighted Lorenz curves
[Figure not reproduced: empirical Lorenz curve and corrected Lorenz curves (n=5 and n=6); horizontal axis: cumulative % of the population.]
Figure 7: Lower and upper tails of the Lorenz curves
[Figure not reproduced: empirical and corrected Lorenz curves at the lower and upper tails.]
Figure 8: Empirical and compliance-corrected cumulative distributions of income
[Figure not reproduced: empirical and corrected (n=5 and n=6) cumulative distribution functions; horizontal axis: income per capita in US$ (0 to 150,000).]
Figure 9: Lower segment of the cumulative distributions of income in Figure 8
[Figure not reproduced: horizontal axis: income per capita in US$ (0 to 10,000).]