Policy Research Working Paper 5400
A Control Function Approach to Estimating
Dynamic Probit Models with Endogenous
Regressors, with an Application to the Study
of Poverty Persistence in China
John Giles
Irina Murtazashvili
The World Bank
Development Research Group
Human Development and Public Services Team
August 2010
Abstract
This paper proposes a parametric approach to estimating a dynamic binary response panel data model
that allows for endogenous contemporaneous regressors. This approach is of particular value for
settings in which one wants to estimate the effects of an endogenous treatment on a binary outcome.
The model is next used to examine the impact of rural-urban migration on the likelihood that
households in rural China fall below the poverty line. In this application, it is shown that
migration is important for reducing the likelihood that poor households remain in poverty and that
non-poor households fall into poverty. Furthermore, it is demonstrated that failure to control for
unobserved heterogeneity would lead the researcher to underestimate the impact of migrant labor
markets on reducing the probability of falling into poverty.
This paper--a product of the Human Development and Public Services Team, Development Research Group--is part of
a larger effort in the department to study the effects of rural to urban migration on household outcomes and investment
decisions in migrant sending communities. Policy Research Working Papers are also posted on the Web at
http://econ.worldbank.org. The author may be contacted at jgiles@worldbank.org.
The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development
issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the
names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those
of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and
its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.
Produced by the Research Support Team
A Control Function Approach to Estimating Dynamic Probit
Models with Endogenous Regressors, with an Application to
the Study of Poverty Persistence in China*
John Giles + and Irina Murtazashvili
JEL Codes: C13, C33, O15, P25
Key Words: Dynamic Binary Response Models; Control Function Approach; Poverty Persistence;
Migration; Rural China
*
The paper has benefitted from helpful comments and conversations with Alan de Brauw, Ana Maria Herrera,
Martin Ravallion, Peter Schmidt, David Tschirley, Adam Wagstaff, and Jeffrey Wooldridge, and from seminar
participants at Ohio State University and conference participants at the June 2007 UNU-WIDER Conference on
Fragile States held in Helsinki and the September 2009 Midwest Econometrics Group Annual Meeting. We gratefully
acknowledge financial support for data collection from the National Science Foundation (SES-0214702), the
Michigan State University Intramural Research Grants Program, the Ford Foundation (Beijing) and the Weatherhead
Center for International Affairs (Academy Scholars Program) at Harvard University. The research discussion and
conclusions presented in this paper reflect the views of the authors and should not be attributed to the World
Bank or to any affiliated organization or member country.
+
Development Research Group, The World Bank. Email: jgiles@worldbank.org.
Corresponding Author. Department of Economics, University of Pittsburgh, Pittsburgh, PA 15260.
Tel: (412) 648-1762, fax: (412) 648-1793, and email: irinam@pitt.edu.
1 Introduction
Dynamic binary response models have considerable appeal for a diverse range of policy analyses
in which identifying or controlling for state dependence is important and one is interested
in a binary outcome.1 When the outcome is also affected by an endogenous treatment, then
an additional complication arises in efforts to identify the effects of the treatment on the
outcome and on state dependence. In this paper, we propose a parametric approach to
estimating dynamic binary response panel data models with endogenous contemporaneous
regressors. Our method combines a recent approach to solving the unobserved heterogeneity
and the initial conditions problems in non-linear dynamic models (Wooldridge, 2005) with
a control function approach to controlling for endogeneity of contemporaneous explanatory
variables in non-linear models (e.g., Smith and Blundell, 1986; Rivers and Vuong, 1988;
Papke and Wooldridge, 2008).
Among other possible applications, the relevance and potential strength of our approach
can be demonstrated in analyses of how migration in developing countries affects the poverty
status of residents living in migrant source communities. In this setting, we are faced with
two important sources of endogeneity. First, the migration decision of community residents
may be driven by negative shocks that also raise the probability that households are poor.
Second, we expect there to be correlation between migration decisions and the unobserved
characteristics of individuals and communities, which may also affect poverty status. Our
approach allows us to consistently estimate parameters of a dynamic binary response panel
data model with unobserved heterogeneity when some of the continuous contemporaneous
explanatory variables are endogenous. To account for the endogeneity in migration from
home communities, we employ a control function approach in which residuals from the
reduced form for the endogenous regressor are introduced as covariates in the structural model.
Recently, Papke and Wooldridge (2008) employed this approach to deal with an endogenous
regressor in a static fractional response panel data model to study the effects of school
inputs on student performance. In contrast with Papke and Wooldridge, this paper develops
a control function approach for a dynamic model. To deal with the dynamic nature of the
model, we consider two possibilities. We first use a "pure" random effects approach which
assumes that unobserved heterogeneity is independent of the observed exogenous covariates
and initial conditions. Next, we relax this strong assumption by employing the dynamic
correlated random effects model introduced by Wooldridge (2005). This approach is not only
more relevant for analyses of poverty persistence, but also more flexible and computationally
straightforward than alternative approaches currently in use.
1
The range of research areas for which dynamic binary response models have proven important includes
labor force participation (Heckman and Willis, 1977; Hyslop, 1999), the probability of receiving welfare (Bane
and Ellwood, 1986), the experience of social exclusion (Devicienti and Poggi, 2007), and the identification of
adverse selection in insurance markets (Chiappori and Salanie, 2000).
We then implement our empirical approach using panel household and village data from
rural China. Following the market-oriented reforms introduced in the early 1980s, there was
a pronounced decline in the proportion of China's population living below the poverty line
(Ravallion and Chen, 2007). While much of the literature examining growth in China's rural
areas has focused on incentive effects related to reform and on the role of local non-farm
employment, there has been relatively little research demonstrating the relationship between
increasing migration and the probability that households within villages have consumption
levels below the poverty line. Our empirical analysis demonstrates an economically significant
causal relationship between migration and poverty reduction in rural China. In performing
this exercise, we highlight the usefulness of our econometric approach to settings in which
the researcher must work from binary indicators of poverty status, which is often the only
information available from administrative data sources.
The paper proceeds as follows. In Section 2 below, we first review approaches to
estimation of dynamic binary response panel data models, and then propose a general approach
to estimating these models when there is an endogenous regressor. In Section 3, we introduce
the rural China setting, motivate a specific implementation of the model developed in
Section 2, and finally describe a strategy for identifying the effect of migration on poverty
within China's villages. In Section 4, we discuss our estimation results and the performance
of the model, and then in Section 5 we summarize our results and discuss the potential value
of the estimator introduced in the paper.
2 Estimation of a Dynamic Binary Response Panel
Data Model with an Endogenous Regressor
2.1 Dynamic Binary Response Panel Data Models
Dynamic binary response panel data models with unobserved heterogeneity have been used
extensively in theoretical and empirical studies. Both parametric and semi-parametric
methods have been proposed to solve the initial conditions problem and to obtain consistent
estimates of model parameters when all explanatory variables other than the lagged dependent
variable are strictly exogenous.2 Semi-parametric methods allow estimation of parameters
without specifying a distribution of the unobserved heterogeneity, but they are often overly
restrictive with respect to the strictly exogenous covariates. Honoré and Kyriazidou (2000),
for example, propose an approach that does not allow for discrete explanatory variables.
More importantly, because semi-parametric methods do not specify the distribution of the
unobserved heterogeneity, the absolute importance of any of the explanatory variables in a
dynamic binary response panel data model cannot be determined. Models which do not place
any assumption on either the unobserved effects or the initial conditions, or their
relationship to other covariates, are best described as fixed effects models, and the
semi-parametric approach of Honoré and Kyriazidou (2000) falls into this class of models.3
Due to their computational simplicity, parametric methods have received greater
attention than semi-parametric methods. There are four main parametric approaches, all
employing conditional maximum likelihood (CMLE) analysis, that have been employed for
estimation of the dynamic nonlinear panel data models in which all covariates other than
the lagged dependent variable are strictly exogenous. The first approach treats the initial
conditions for each cross-sectional unit, $y_{i0}$, as nonrandom variables. If, in addition,
unobserved effects, $c_i$, are also assumed to be independent of the exogenous explanatory
variables, $z_i = (z_{i1}, z_{i2}, \ldots, z_{iT})$, one obtains the density of
$(y_{i1}, y_{i2}, \ldots, y_{iT})$ given the initial conditions, $y_{i0}$, and $z_i$, by
integrating out the $c_i$. We refer to the relationship between the observed exogenous
covariates and the unobserved heterogeneity in the first method as one of "pure" random
effects because we assume $c_i$ to be independent of $z_i$ and $y_{i0}$. While this method may
provide a way to obtain consistent estimates of the model parameters, nonrandomness of the
initial conditions requires a very strong and often implausible assumption of independence
between the initial conditions and the unobserved effects.
2
With a structural binary outcome model that allows for unobserved effects, one must be concerned that
bias could be introduced through a systematic relationship between an unobserved effect and the initial value
of the dependent variable. This is known as the initial conditions problem.
3
We follow Chay and Hyslop (2000) in classifying models requiring no assumption on unobservable effects
or initial conditions as fixed effect models, and refer to random effect models as those in which one specifies
a distribution of unobserved effects and initial conditions given exogenous explanatory variables.
A second parametric approach involves treating the initial conditions as random and
specifying the density for $y_{i0}$ given $(z_i, c_i)$. With this density, one can then obtain the
joint distribution of all the outcomes, $(y_{i0}, y_{i1}, y_{i2}, \ldots, y_{iT})$, conditional on
unobserved heterogeneity, $c_i$, and strictly exogenous observables, $z_i$. The most important
drawback of this approach, however, lies with the difficulty of specifying the density of
$y_{i0}$ given $(z_i, c_i)$.4
A third method, proposed by Heckman (1981), suggests approximating a density of the
initial conditions, $y_{i0}$, given $(z_i, c_i)$ and specifying a density of the unobserved effects
given the strictly exogenous explanatory variables. The density of
$(y_{i0}, y_{i1}, y_{i2}, \ldots, y_{iT})$ given $z_i$ can then be obtained. While Heckman's approach
avoids the drawbacks of the second method, it is computationally challenging. Since both the
second and the third methods explicitly specify a distribution of the unobserved heterogeneity
conditional on strictly exogenous observables and a distribution of the initial conditions
conditional on the unobserved effects and the exogenous covariates, they can be classified as
random effects models.
Finally, an approach proposed by Wooldridge (2005) recommends obtaining a joint
distribution of $(y_{i1}, y_{i2}, \ldots, y_{iT})$ conditional on $(y_{i0}, z_i)$ rather than a
distribution of $(y_{i0}, y_{i1}, y_{i2}, \ldots, y_{iT})$ conditional on $z_i$ as in Heckman's
approach. For this method to work, one must specify a density of $c_i$ given $(y_{i0}, z_i)$.5
This fourth approach is more flexible and requires fewer computational resources than
Heckman's technique. In this method, similar to Heckman's approach, we call the relationship
between the observed exogenous covariates and the unobserved heterogeneity a "correlated"
random effects relationship because we allow $c_i$ to be a linear function of $z_i$ and $y_{i0}$.
4
More details on this approach and potential drawbacks can be found in Wooldridge (2002), page 494.
5
The specification of this density in Wooldridge's method is motivated by Chamberlain's (1980) approach,
which models the distribution of the unobserved effect conditional on the strictly exogenous variables.
In the next section we develop an approach to consistently estimating parameters of a
dynamic binary response panel data model when the contemporaneous explanatory variables
are not strictly exogenous. To do so, we employ a control function approach, popularized by
Smith and Blundell (1986) and Rivers and Vuong (1988). The main idea of our approach is
to add (control) variables into the structural model to control for endogeneity. We consider
a model with two possible sources of endogeneity: correlation between the unobserved
heterogeneity and a regressor, and correlation between a regressor and the structural error. For
this reason, we model the relationships among the unobserved effect, exogenous covariates,
and the error from the reduced form equation for the endogenous explanatory variable.
2.2 A General Approach to Estimation
Our specification of the binary response model assumes that for a random draw $i$ from the
population, there is an underlying latent variable model:

$$y^*_{1it} = z_{1it}\beta_1 + \beta_2 y_{2it} + \rho\, y_{1i,t-1} + c_{1i} + u_{1it}, \tag{1}$$

$$y_{1it} = 1[y^*_{1it} \ge 0], \qquad t = 1, \ldots, T, \tag{2}$$

where $z_{1it}$ is a $1 \times (K-1)$ vector of strictly exogenous covariates, which may contain a
constant term, $y_{2it}$ is an endogenous covariate, $c_{1i}$ is an unobserved effect, and $u_{1it}$ is an
idiosyncratic serially uncorrelated error such that $\mathrm{Var}(u_{1it}) = 1$. $1[\cdot]$ is an
indicator function. We assume a sample of size $N$ randomly drawn from the population, and that
$T$, the number of time periods, is fixed in the asymptotic analysis. For simplicity, we assume
a balanced panel.
Let $\beta$ denote $(\beta_1', \beta_2, \rho)'$, which is a $(K+1) \times 1$ vector of parameters.
Importantly, this model allows the probability of success at time $t$ to depend not only on
unobserved heterogeneity, $c_{1i}$, but also on the outcome in $t-1$. A key assumption is that the
dynamics in model (1) are correctly specified, in which case dynamic completeness of the model
implies that the error term is serially uncorrelated. Allowing $u_{1it}$ to have arbitrary serial
correlation would suggest including more lags of the dependent variable in (1). For example, in
the simplest case of a linear model, when an error term, $u_{it}$, follows an AR(1) process, a
simple calculation shows that a dependent variable, $y_{it}$, actually depends on not only
$y_{i,t-1}$ but also $y_{i,t-2}$. Similarly, in the context of our model, one should have a good
reason to expect a serially correlated error term $u_{1it}$ and yet to include only one lag of
$y_{1it}$.
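The simple calculation for the linear case can be sketched as follows, for a hypothetical model without covariates. The symbols $\rho$, $\phi$, and $\varepsilon_{it}$ are illustrative choices made for this sketch only. Suppose

$$y_{it} = \rho\, y_{i,t-1} + u_{it}, \qquad u_{it} = \phi\, u_{i,t-1} + \varepsilon_{it},$$

with $\varepsilon_{it}$ serially uncorrelated. Lagging the first equation gives $u_{i,t-1} = y_{i,t-1} - \rho\, y_{i,t-2}$, and substituting this into the AR(1) process yields

$$y_{it} = (\rho + \phi)\, y_{i,t-1} - \rho\phi\, y_{i,t-2} + \varepsilon_{it}.$$

Under these assumptions, a one-lag model with an AR(1) error is equivalent to a two-lag model with a serially uncorrelated error, which is why serial correlation in the error points toward adding lags.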
Further, we make additional assumptions on strict exogeneity of the contemporaneous
explanatory variables. First, conditional on $c_{1i}$, the contemporaneous covariates, $z_{1it}$, are
assumed to be strictly exogenous. Second, we allow some of the explanatory variables, here
represented by the scalar $y_{2it}$, to be endogenous:

$$\begin{aligned}
y_{2it} &= z_{1it}\delta_1 + z_{2it}\delta_2 + c_{2i} + u_{2it} \\
        &= z_{it}\delta + \bar{z}_i\lambda + a_{2i} + u_{2it} \\
        &= z_{it}\delta + \bar{z}_i\lambda + v_{2it},
\end{aligned} \tag{3}$$
where $t = 1, \ldots, T$, $c_{2i}$ is an unobserved effect, and $u_{2it}$ is an idiosyncratic serially
uncorrelated error with $\mathrm{Var}(u_{2it}) = \sigma_2^2$. Let $z_{it} = (z_{1it}, z_{2it})$ be a
$1 \times L$ vector of instrumental variables, with $L \ge K$; i.e., we assume the vector $z_{2it}$
contains at least one element. Line two of equation (3) reflects our use of the
Mundlak-Chamberlain device for the unobserved effect, $c_{2i}$. We replace $c_{2i}$ with its
projection onto the time averages of all the exogenous variables:
$c_{2i} = \bar{z}_i\lambda + a_{2i}$. Then, the new composite error term is
$v_{2it} = a_{2i} + u_{2it}$. Further, $\bar{z}_i = T^{-1}\sum_{t=1}^{T} z_{it}$, and
$\delta = (\delta_1', \delta_2')'$. We follow Rivers and Vuong (1988) and refer to (3) as a
reduced form equation.
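A minimal sketch of the first stage implied by equation (3) follows: pooled OLS of the endogenous regressor on the instruments and their unit-level time averages, returning the residuals that later serve as control variables. The function names, the nested-list data layout, and the toy numbers are assumptions made for illustration and do not come from the paper. Only the Python standard library is used.

```python
# Illustrative first stage for equation (3): pooled OLS of y2 on a constant,
# the instruments z_it, and their time averages (Mundlak-Chamberlain device).
# All names and data below are hypothetical.

def time_average(rows):
    """Column-wise mean over the T period rows of one unit."""
    T = len(rows)
    return [sum(col) / T for col in zip(*rows)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

def pooled_ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    Xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty)

def first_stage_residuals(z, y2):
    """z[i][t] is the instrument row (without a constant) for unit i, period t;
    y2[i][t] is the endogenous regressor. Returns residuals v2_hat[i][t]."""
    X, y = [], []
    for z_i, y_i in zip(z, y2):
        z_bar = time_average(z_i)
        for z_it, y_it in zip(z_i, y_i):
            X.append([1.0] + list(z_it) + z_bar)  # constant, z_it, zbar_i
            y.append(y_it)
    coef = pooled_ols(X, y)
    resid = [y_k - sum(c * x for c, x in zip(coef, row)) for row, y_k in zip(X, y)]
    T = len(z[0])
    return [resid[i * T:(i + 1) * T] for i in range(len(z))]

# Toy balanced panel: 4 units, 3 periods, one instrument (made-up numbers).
z = [[[1.0], [2.0], [4.0]], [[0.0], [3.0], [3.0]],
     [[2.0], [2.0], [5.0]], [[1.0], [0.0], [2.0]]]
y2 = [[0.10, 0.15, 0.30], [0.05, 0.20, 0.22],
      [0.12, 0.18, 0.35], [0.08, 0.02, 0.11]]
v2_hat = first_stage_residuals(z, y2)
```

The residuals play the role of $\hat{v}_{2it}$, an estimate of the composite error $a_{2i} + u_{2it}$. In practice one would use an established regression routine instead of the hand-written solver, which appears here only to keep the sketch self-contained.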
Next, consider the relationship between $u_{1it}$ and $u_{2it}$. We assume that $(u_{1it}, u_{2it})$
has a zero mean, bivariate normal distribution and is independent of
$z_i = (z_{1i}, z_{2i}) = (z_{i1}, z_{i2}, \ldots, z_{iT})$. Note that under joint normality of
$(u_{1it}, u_{2it})$, with $\mathrm{Var}(u_{1it}) = 1$, we write

$$\begin{aligned}
u_{1it} &= \theta u_{2it} + e_{1it} \\
        &= \theta (v_{2it} - a_{2i}) + e_{1it},
\end{aligned} \tag{4}$$

where $\theta = \eta / \sigma_2^2$, $\eta = \mathrm{Cov}(u_{1it}, u_{2it})$,
$\sigma_2^2 = \mathrm{Var}(u_{2it})$, and $e_{1it}$ is a serially uncorrelated random
term, which is independent of $z_i$ and $u_{2it}$. The absence of serial correlation of $e_{1it}$
follows from the fact that $u_{1it}$ and $u_{2it}$ are both assumed not to suffer from serial
correlation. If there were no lagged dependent variables on the right hand side of equation (1),
there would be little need to worry about possible serial correlation in the error term
$u_{2it}$ of equation (3), as long as we assume that $u_{1it}$ is also serially uncorrelated.
However, we are interested in a dynamic model, and the assumption of no serial correlation in
$u_{2it}$ is crucial for equation (4). Since equation (3) is essentially a reduced form equation
for the endogenous variable $y_{2it}$, the assumption of no serial correlation in $u_{2it}$ (and in
$e_{1it}$, as a result) is appropriate in the context of our model.
Equation (4) is essentially an assumption regarding the contemporaneous endogeneity
of $y_{2it}$. It suggests that the contemporaneous $v_{2it}$ is sufficient for explaining the relation
between $u_{1it}$ and $v_{2it}$. In other words, once we somehow account for endogeneity of $y_{2it}$ in
period $t$, we might think that $y_{2it}$ becomes "completely" exogenous, and we can estimate the
parameters of interest using standard methods valid for exogenous explanatory variables.
However, there is the possibility of an additional feedback from the endogenous variable $y_2$
in different time periods to the main dependent variable of interest, $y_1$, at time $t$. This
possibility arises because we let the reduced form equation for the endogenous variable, $y_{2it}$,
contain a time-constant unobserved effect, $a_{2i}$.
From assumption (4), $e_{1it} \sim \mathrm{Normal}(0, \sigma^2_{e1})$, where
$\sigma^2_{e1} = 1 - \tau^2$, since $\mathrm{Var}(u_{1it}) = 1$, and
$\tau = \mathrm{Corr}(u_{1it}, u_{2it})$; we write

$$\begin{aligned}
y_{1it} &= 1[x_{1it}\beta + c_{1i} + \theta(v_{2it} - a_{2i}) + e_{1it} \ge 0] \\
        &= 1[x_{1it}\beta + \theta v_{2it} + (c_{1i} - \theta a_{2i}) + e_{1it} \ge 0] \\
        &= 1[x_{1it}\beta + \theta v_{2it} + c_{0i} + e_{1it} \ge 0],
\end{aligned} \tag{5}$$

where $t = 1, \ldots, T$, $x_{1it} = (z_{1it}, y_{2it}, y_{1i,t-1})$,
$\beta = (\beta_1', \beta_2, \rho)'$, and $c_{0i} = c_{1i} - \theta a_{2i}$ is a composite
unobserved effect. A potential limitation of the assumptions we use to arrive at equation
(5) is that they rule out endogenous regressors that are discrete or have severely limited
support. In the application we present in Section 3 below, $y_{2it}$ will be the share of registered
long-term village residents who are employed as migrants outside the village and the support
for this variable will be comfortably within the [0,1] interval. Thus, the above assumptions
are plausible.
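The variance of $e_{1it}$ stated before equation (5) can be checked in one line. In this sketch, $\theta$ denotes the coefficient on $u_{2it}$ in (4), $\eta$ the covariance $\mathrm{Cov}(u_{1it}, u_{2it})$, and $\sigma_2^2$ the variance of $u_{2it}$; these symbol names are assumptions of the sketch. Because $e_{1it}$ is independent of $u_{2it}$ and $\mathrm{Var}(u_{1it}) = 1$,

$$\mathrm{Var}(e_{1it}) = \mathrm{Var}(u_{1it}) - \theta^2\, \mathrm{Var}(u_{2it}) = 1 - \frac{\eta^2}{\sigma_2^2} = 1 - \left[\mathrm{Corr}(u_{1it}, u_{2it})\right]^2,$$

where the last equality uses $\mathrm{Corr}(u_{1it}, u_{2it}) = \eta / (1 \cdot \sigma_2)$. The variance of $e_{1it}$ therefore equals one only when the regressor is contemporaneously exogenous, which is why the second-stage probit coefficients are identified only up to scale.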
Since the unobserved effect $c_{0i}$ is present in equation (5), we should consider the relation
between the unobserved effect $c_{0i}$ and the explanatory variables in equation (5). Importantly,
the composite unobserved effect $c_{0i}$ is a function of $v_{2it}$, where $t = 1, \ldots, T$, by
construction:

$$c_{0i} = c_{1i} - \theta a_{2i} = c_{1i} - \theta (v_{2it} - u_{2it}), \qquad t = 1, \ldots, T.$$

Thus, in order to obtain consistent estimates of the parameters from equation (5), we must
take into account the relation between $c_{0i}$ and $v_{2it}$ in different time periods.
First, we use a "pure" random effects approach, i.e., we assume that

$$c_{0i} \mid z_i, y_{1i0}, v_{2i} \sim \mathrm{Normal}(\alpha_0 \bar{v}_{2i}, \sigma^2_{a1}), \qquad t = 1, \ldots, T, \tag{6}$$

which can be written as $c_{0i} = \alpha_0 \bar{v}_{2i} + a_{1i}$, $t = 1, \ldots, T$, where
$a_{1i} \mid z_i, y_{1i0}, v_{2i} \sim \mathrm{Normal}(0, \sigma^2_{a1})$ and is independent of
$(z_i, y_{1i0}, v_{2i})$, where $\bar{v}_{2i} = T^{-1}\sum_{t=1}^{T} v_{2it}$, and
$v_{2i} = (v_{2i1}, v_{2i2}, \ldots, v_{2iT})$. While a limiting assumption in many potential
applications, the "pure" random effects assumption (6) may be relevant for certain cases. In
particular, when every individual in the initial time period is in the same state (e.g., we are
interested in the population of people who smoke), assumption (6) might be appropriate. Further,
since we assume that the composite unobserved effect, $c_{0i}$, is independent of the initial
condition, $y_{1i0}$, it is natural to think that the $v_{2it}$'s in different time periods have
equal impacts on $c_{0i}$. Consequently, we employ $\bar{v}_{2i}$ as a sufficient statistic for
describing the relation between $c_{0i}$ and the $v_{2it}$'s in different time periods.
Then, under assumptions (1)-(4) and (6), we rewrite equation (5) as

$$y_{1it} = 1[x_{1it}\beta + \theta v_{2it} + \alpha_0 \bar{v}_{2i} + a_{1i} + e_{1it} \ge 0]. \tag{7}$$
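Clearly, the estimates of $\beta_\sigma = \beta / \sqrt{\sigma^2_{e1} + \sigma^2_{a1}}$,
$\theta_\sigma = \theta / \sqrt{\sigma^2_{e1} + \sigma^2_{a1}}$, and
$\alpha_{\sigma 0} = \alpha_0 / \sqrt{\sigma^2_{e1} + \sigma^2_{a1}}$ can be obtained
using standard random effects probit software by including $\hat{\bar{v}}_{2i}$ in each time
period into the list of explanatory variables along with $x_{1it}$ and $\hat{v}_{2it}$, where
$\hat{\bar{v}}_{2i} = T^{-1}\sum_{t=1}^{T} \hat{v}_{2it}$.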
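As a minimal sketch of how the second-stage regressor list for the "pure" random effects case might be assembled, the snippet below appends each first-stage residual and its unit-level time average to the structural covariates. The function name, the nested-list layout, and the numbers are illustrative assumptions. The random effects probit itself would be fit with existing software.

```python
# Illustrative only: builds second-stage rows (x1_it, v2hat_it, v2bar_i) for
# the "pure" random effects specification. Names and data are hypothetical.

def pure_re_rows(x1, v2_hat):
    """x1[i][t] lists the structural covariates (z1_it, y2_it, y1_{i,t-1});
    v2_hat[i][t] is the first-stage residual. Returns rows[i][t]."""
    rows = []
    for x1_i, v_i in zip(x1, v2_hat):
        v_bar = sum(v_i) / len(v_i)  # time average of unit i's residuals
        rows.append([list(x_it) + [v_it, v_bar] for x_it, v_it in zip(x1_i, v_i)])
    return rows

# Two units, two periods; covariates ordered as (z1_it, y2_it, y1_{i,t-1}).
x1 = [[[1.0, 0.2, 0], [1.5, 0.3, 1]],
      [[0.5, 0.1, 1], [0.7, 0.1, 1]]]
v2_hat = [[0.04, -0.02], [-0.01, 0.03]]
rows = pure_re_rows(x1, v2_hat)
```

Each unit's time-averaged residual is repeated in every period, so it enters the probit as a time-constant regressor, in line with its role as a summary of the relation between the composite unobserved effect and the first-stage errors.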
As we discussed earlier, however, the assumption of independence between the unobserved
effect, the initial conditions and the exogenous covariates is often too restrictive. In
particular, the "pure" random effects assumption is unrealistic in the context of the
application to poverty persistence that we will examine below. For instance, unobserved dimensions
of ability are very likely to be related to poverty status not only in the initial period, but
also in future periods.
Rather than using a "pure" random effects approach, we build on the dynamic "correlated"
random effects model introduced by Wooldridge (2005). Instead of the conditional
distribution of $c_{0i}$ assumed in (6), we now assume that

$$c_{0i} \mid z_i, y_{1i0}, v_{2i} \sim \mathrm{Normal}(v_{2i}\alpha_0 + z_i\alpha_1 + \alpha_2 y_{1i0}, \sigma^2_{a1}), \tag{8}$$

which follows from writing $c_{0i} = v_{2i}\alpha_0 + z_i\alpha_1 + \alpha_2 y_{1i0} + a_{1i}$, where
$a_{1i} \mid z_i, y_{1i0}, v_{2i} \sim \mathrm{Normal}(0, \sigma^2_{a1})$ and independent of
$(z_i, y_{1i0}, v_{2i})$. Since we allow for a nonzero correlation between the composite
unobserved effect, $c_{0i}$, and the initial condition, $y_{1i0}$, the $v_{2it}$'s in different time
periods might have different effects on $c_{0i}$. Thus, we let the $v_{2it}$'s from different time
periods have unequal "weights" for explaining $c_{0i}$. Assumption (8) extends Chamberlain's
assumption for a static probit model to the dynamic setting. To allow for correlation between
$c_{0i}$ and $z_i$ and $y_{1i0}$, we assume a conditional normal distribution with linear
expectation and constant variance. Assumption (8) is a restrictive assumption since it specifies
a distribution for $c_{0i}$ given $z_i$, $y_{1i0}$, $v_{2i}$. However, it is an improvement on the
"pure" random effects approach in that it allows for some dependence between the unobserved
effect and the vector of all explanatory variables across all time periods.
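Then, under assumptions (1)-(4) and (8), we rewrite equation (5) as

$$\begin{aligned}
y_{1it} &= 1[x_{1it}\beta + \theta v_{2it} + c_{0i} + e_{1it} \ge 0] \\
        &= 1[x_{1it}\beta + \theta v_{2it} + v_{2i}\alpha_0 + z_i\alpha_1 + \alpha_2 y_{1i0} + a_{1i} + e_{1it} \ge 0].
\end{aligned} \tag{9}$$

Equation (9) suggests that we can estimate
$\beta_\sigma = \beta / \sqrt{\sigma^2_{e1} + \sigma^2_{a1}}$ and
$\theta_\sigma = \theta / \sqrt{\sigma^2_{e1} + \sigma^2_{a1}}$ along with
$\alpha_{\sigma 0} = \alpha_0 / \sqrt{\sigma^2_{e1} + \sigma^2_{a1}}$,
$\alpha_{\sigma 1} = \alpha_1 / \sqrt{\sigma^2_{e1} + \sigma^2_{a1}}$, and
$\alpha_{\sigma 2} = \alpha_2 / \sqrt{\sigma^2_{e1} + \sigma^2_{a1}}$ using standard random effects
probit software by including $\hat{v}_{2i}$, $z_i$, and $y_{1i0}$ in each time period into the list
of explanatory variables along with $x_{1it}$ and $\hat{v}_{2it}$.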
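To make concrete what random effects probit software maximizes in this second stage, the sketch below assembles one regressor row for the "correlated" random effects case and evaluates a single unit's log-likelihood contribution by five-point Gauss-Hermite quadrature. It is an illustration under stated assumptions and not the authors' code. The function names are hypothetical, the idiosyncratic error variance is normalized to one (as standard software does), and production routines typically use more quadrature points, often with adaptive refinement.

```python
# Illustrative random effects probit likelihood for one unit, integrating the
# normal unobserved effect out by 5-point Gauss-Hermite quadrature.
from math import log, pi, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf

# Nodes and weights for the weight function exp(-x^2); weights sum to sqrt(pi).
GH_NODES = [-2.020182870456086, -0.958572464613819, 0.0,
            0.958572464613819, 2.020182870456086]
GH_WEIGHTS = [0.019953242059046, 0.393619323152241, 0.945308720482942,
              0.393619323152241, 0.019953242059046]

def cre_row(x1_it, v2hat_it, v2hat_i, z_i, y1_i0):
    """Regressors for unit i in period t: structural covariates, the period-t
    residual, all T residuals, all T instrument rows, and the initial outcome."""
    flat_z = [value for z_t in z_i for value in z_t]
    return list(x1_it) + [v2hat_it] + list(v2hat_i) + flat_z + [y1_i0]

def unit_loglik(y_i, W_i, gamma, sigma_a):
    """log of the integral over a ~ N(0,1) of
    prod_t Phi((2 y_it - 1) * (w_it . gamma + sigma_a * a))."""
    total = 0.0
    for node, weight in zip(GH_NODES, GH_WEIGHTS):
        shift = sigma_a * sqrt(2.0) * node  # change of variables a = sqrt(2) x
        prod = 1.0
        for y_it, w_it in zip(y_i, W_i):
            index = sum(g * w for g, w in zip(gamma, w_it)) + shift
            prod *= PHI(index if y_it == 1 else -index)
        total += weight * prod
    return log(total / sqrt(pi))

# Toy unit with T = 2, one structural covariate, one instrument (made-up data).
v_i = [0.04, -0.02]
z_i = [[1.0], [2.0]]
W_i = [cre_row([0.3], v_i[t], v_i, z_i, 1) for t in range(2)]
gamma = [0.5, 1.0, 0.2, -0.1, 0.05, 0.05, 0.4]  # one coefficient per regressor
ll = unit_loglik([1, 0], W_i, gamma, sigma_a=0.8)
```

Summing `unit_loglik` over units and maximizing over `gamma` and `sigma_a` would give the conditional MLE under these assumptions. In applied work with time-constant or collinear columns, some elements of the regressor row would need to be dropped.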
2.3 Allowing for Serial Correlation of Errors in the First Stage
If the first stage error, $u_{2it}$, is serially correlated, we must modify our two-step estimating
procedure. To be specific, assume $u_{2it}$ follows an AR(1) process:
$u_{2it} = \phi\, u_{2i,t-1} + e_{2it}$, where $e_{2it}$ is a white noise error with
$\mathrm{Var}(e_{2it}) = \sigma^2_{e2}$, and

$$\begin{aligned}
\mathrm{Cov}(e_{1it}, e_{1i,t-1}) &= \mathrm{Cov}(u_{1it} - \theta u_{2it},\; u_{1i,t-1} - \theta u_{2i,t-1}) \\
 &= \mathrm{Cov}(u_{1it} - \theta\phi\, u_{2i,t-1} - \theta e_{2it},\; u_{1i,t-1} - \theta u_{2i,t-1}) = \theta^2 \phi\, E(u^2_{2i,t-1}),
\end{aligned}$$

which is nonzero unless either $\theta = 0$ or $\phi = 0$. Clearly, assumption (4) is no longer
appropriate and must be modified.
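Define the variance-covariance matrix of $v_{2i}$ as $E(v_{2i}' v_{2i})$, a $T \times T$ matrix
that we assume to be positive definite. Then,

$$E(v_{2i}' v_{2i}) = \sigma^2_{a2}\, j_T j_T' + \sigma_2^2
\begin{pmatrix}
1 & \phi & \phi^2 & \cdots & \phi^{T-2} & \phi^{T-1} \\
\phi & 1 & \phi & \cdots & \phi^{T-3} & \phi^{T-2} \\
\phi^2 & \phi & 1 & \cdots & \phi^{T-4} & \phi^{T-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
\phi^{T-2} & \phi^{T-3} & \phi^{T-4} & \cdots & 1 & \phi \\
\phi^{T-1} & \phi^{T-2} & \phi^{T-3} & \cdots & \phi & 1
\end{pmatrix}, \tag{10}$$

where $j_T$ is a $T \times 1$ vector of ones, and
$\sigma_2^2 = \sigma^2_{e2} / (1 - \phi^2)$. We can obtain consistent estimates of the
parameters in (10), and use them to transform $v_{2it}$ to $\tilde{v}_{2it}$, which is a first stage
error free of serial correlation. One useful method for estimating $\phi$, $\sigma^2_{a2}$,
$\sigma^2_{e2}$, and $\sigma_2^2$ is the minimum distance estimator, described in detail by
Chamberlain (1984).6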
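For readers who want to see the structure of (10) numerically, the sketch below builds the implied covariance matrix for given parameter values: a constant contribution from the time-invariant effect plus a stationary AR(1) pattern that decays with the distance between periods. The function name, argument names, and parameter values are illustrative assumptions. Estimating the parameters (for example by minimum distance) is not shown.

```python
# Illustrative construction of the T x T covariance matrix in equation (10):
# random effect variance on every entry plus a stationary AR(1) component.

def re_ar1_covariance(T, sigma_a2_sq, sigma_e2_sq, phi):
    """Entry (s, t) equals sigma_a2_sq + sigma2_sq * phi**|s - t|, where
    sigma2_sq = sigma_e2_sq / (1 - phi**2) is the stationary AR(1) variance."""
    if not -1.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between -1 and 1")
    sigma2_sq = sigma_e2_sq / (1.0 - phi ** 2)
    return [[sigma_a2_sq + sigma2_sq * phi ** abs(s - t) for t in range(T)]
            for s in range(T)]

# Made-up parameter values for a four-period panel.
omega = re_ar1_covariance(T=4, sigma_a2_sq=0.5, sigma_e2_sq=0.75, phi=0.5)
```

Matching the distinct elements of this matrix to the sample covariances of the first-stage residuals is the idea behind a minimum distance fit of the four parameters.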
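Once we have first stage errors free of serial correlation, we use the transformation
$\tilde{u}_{2it} = \tilde{v}_{2it} - a_{2i}$ to adjust assumption (4). We can then assume that under
joint normality of $(u_{1it}, \tilde{u}_{2it})$,

$$\begin{aligned}
u_{1it} &= \theta \tilde{u}_{2it} + e_{1it} \\
        &= \theta (\tilde{v}_{2it} - a_{2i}) + e_{1it},
\end{aligned} \tag{11}$$

where $e_{1it}$ is a serially uncorrelated random term, which is independent of $z_i$ and
$\tilde{u}_{2it}$. Inclusion of $\tilde{u}_{2it}$ instead of $u_{2it}$ in equation (11) guarantees that
$e_{1it}$ will not be serially correlated. We are then able to write

$$\begin{aligned}
y_{1it} &= 1[x_{1it}\beta + c_{1i} + \theta(\tilde{v}_{2it} - a_{2i}) + e_{1it} \ge 0] \\
        &= 1[x_{1it}\beta + \theta \tilde{v}_{2it} + (c_{1i} - \theta a_{2i}) + e_{1it} \ge 0] \\
        &= 1[x_{1it}\beta + \theta \tilde{v}_{2it} + c_{0i} + e_{1it} \ge 0],
\end{aligned} \tag{12}$$

where $t = 1, \ldots, T$, and $c_{0i} = c_{1i} - \theta a_{2i}$ is a composite unobserved effect.
6
Cappellari (1999) has developed code that conveniently implements this method in Stata.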
Based on equation (12), it is straightforward to adjust the two-step estimating procedure
discussed in Section 2.2 to account for the presence of the serial correlation in $u_{2it}$. Under
the "correlated" random effects assumption (8), equation (12) can be written as

$$\begin{aligned}
y_{1it} &= 1[x_{1it}\beta + \theta \tilde{v}_{2it} + c_{0i} + e_{1it} \ge 0] \\
        &= 1[x_{1it}\beta + \theta \tilde{v}_{2it} + \tilde{v}_{2i}\alpha_0 + z_i\alpha_1 + \alpha_2 y_{1i0} + a_{1i} + e_{1it} \ge 0].
\end{aligned} \tag{13}$$

Then, we can estimate the parameters $\beta$, $\theta$, $\alpha_1$, and $\alpha_2$ using standard
random effects probit software by including $\hat{\tilde{v}}_{2i}$, $z_i$, and $y_{1i0}$ in each time
period into the list of the explanatory variables along with $x_{1it}$.
2.4 Calculation of Average Partial Effects
To assess the magnitude of state dependence we must calculate the average partial effect
(APE) of the lagged dependent variable on its current value. We follow an approach proposed
by Wooldridge (2002) to calculate the APEs after our two-step estimation procedure. The
APEs can be calculated by taking either differences or derivatives of

$$E[\Phi(x_{1t}\beta_\sigma + \theta_\sigma v_{2it} + v_{2i}\alpha_{\sigma 0} + z_i\alpha_{\sigma 1} + \alpha_{\sigma 2} y_{1i0})], \tag{14}$$

where $t = 1, \ldots, T$, variables with a subscript $i$ are random and all others are fixed.
In order to obtain estimates of the parameter values in (14), we appeal to a standard
uniform weak law of large numbers argument.7 For any given value of $x_{1t}$ ($x_1^0$), a
consistent estimator for expression (14) can be obtained by replacing unknown parameters by
consistent estimators:

$$N^{-1}\sum_{i=1}^{N} \Phi(x_1^0\hat{\beta}_\sigma + \hat{\theta}_\sigma \hat{v}_{2it} + \hat{v}_{2i}\hat{\alpha}_{\sigma 0} + z_i\hat{\alpha}_{\sigma 1} + \hat{\alpha}_{\sigma 2} y_{1i0}), \tag{15}$$
where $t = 1, \ldots, T$, the $\hat{v}_{2it}$ are the first-stage pooled OLS residuals from regressing
$y_{2it}$ on $z_{it}$, $\hat{v}_{2i} = (\hat{v}_{2i1}, \hat{v}_{2i2}, \ldots, \hat{v}_{2iT})$, the
$\sigma$ subscript denotes multiplication by
$(\hat{\sigma}^2)^{-1/2} = (\widehat{\sigma^2_{e1} + \sigma^2_{a1}})^{-1/2}$, and
$\hat{\beta}$, $\hat{\theta}$, $\hat{\alpha}_0$, $\hat{\alpha}_1$, $\hat{\alpha}_2$, and $\hat{\sigma}^2$
are the conditional MLEs. Note that $\hat{\sigma}^2$ is the usual error variance estimator from the
second-stage random effects probit regression of $y_{1it}$ on $x_{1it}$, $\hat{v}_{2it}$, $z_i$, and
$y_{1i0}$. One may then employ either a mean value expansion or a bootstrapping approach to
obtain asymptotic standard errors. We can compute either changes or derivatives of equation
(15) with respect to $x_{1t}$ to obtain the APEs of interest.
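The sample average in (15) and the resulting APE for state dependence can be sketched as follows. The sketch assumes that the scaled coefficient estimates are already available and that the unit-specific part of the index (the terms involving the residuals, the instruments, and the initial condition) has been precomputed for each unit. All names and numbers are hypothetical.

```python
# Illustrative average partial effect of the lagged outcome: the difference in
# the sample analogue (15) between y1_{t-1} = 1 and y1_{t-1} = 0.
from statistics import NormalDist

PHI = NormalDist().cdf

def average_response(x0, coef_x, unit_index):
    """Mean over units of Phi(x0 . coef_x + unit_index[i]), where unit_index[i]
    collects the scaled terms that vary with i in (15)."""
    base = sum(c * x for c, x in zip(coef_x, x0))
    return sum(PHI(base + u) for u in unit_index) / len(unit_index)

def ape_state_dependence(z1, y2, coef_x, unit_index):
    """coef_x is ordered as (coefficients on z1, coefficient on y2,
    coefficient on the lagged outcome); z1 and y2 are held fixed."""
    x_one = list(z1) + [y2, 1.0]
    x_zero = list(z1) + [y2, 0.0]
    return (average_response(x_one, coef_x, unit_index)
            - average_response(x_zero, coef_x, unit_index))

# Made-up scaled estimates: constant, one exogenous covariate, y2, lagged y1.
coef_x = [-0.8, 0.3, -1.2, 0.6]
unit_index = [0.10, -0.25, 0.05, 0.40, -0.15]  # one value per unit
ape = ape_state_dependence(z1=[1.0, 0.5], y2=0.2, coef_x=coef_x,
                           unit_index=unit_index)
```

Because the lagged outcome is binary, the APE is computed as a difference and not as a derivative. A bootstrap over both estimation stages is one way to attach standard errors to `ape`.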
In common with the adjustment to our estimating procedure, one must also correct the
estimated APEs when errors are serially correlated. We obtain the APEs by taking either
differences or derivatives of

$$E[\Phi(x_{1t}\beta_\sigma + \theta_\sigma \tilde{v}_{2it} + \tilde{v}_{2i}\alpha_{\sigma 0} + z_i\alpha_{\sigma 1} + \alpha_{\sigma 2} y_{1i0})], \tag{16}$$

where $t = 1, \ldots, T$. For any given value of $x_{1t}$ ($x_1^0$), a consistent estimator of
expression (16) is obtained by replacing unknown parameters by consistent estimators:

$$N^{-1}\sum_{i=1}^{N} \Phi(x_1^0\hat{\beta}_\sigma + \hat{\theta}_\sigma \hat{\tilde{v}}_{2it} + \hat{\tilde{v}}_{2i}\hat{\alpha}_{\sigma 0} + z_i\hat{\alpha}_{\sigma 1} + \hat{\alpha}_{\sigma 2} y_{1i0}), \tag{17}$$

where $t = 1, \ldots, T$, $\hat{\tilde{v}}_{2it}$ is a first stage residual cleaned of serial
correlation, the $\sigma$ subscript denotes multiplication by
$(\hat{\sigma}^2)^{-1/2} = (\widehat{\sigma^2_{e1} + \sigma^2_{a1}})^{-1/2}$, and
$\hat{\beta}$, $\hat{\theta}$, $\hat{\alpha}_1$, $\hat{\alpha}_2$, and $\hat{\sigma}^2$ are the
conditional MLEs. One may then compute derivatives of equation (17) with respect to $x_{1t}$
to obtain the APEs of interest.
7
See Wooldridge (2002) for details.
3 Migrant Labor Markets and Poverty Persistence in
Rural China
Before applying the dynamic binary response model developed above to an analysis of how
migration affects poverty status in rural China, we first briefly review the history of
rural-urban migration in China, review other evidence on the impacts of migration in migrant
sending communities, and introduce the data source that will be used for our analysis. Next,
we propose a specific implementation of the dynamic binary response model to an analysis of
the impact of migration on the probability that a rural household is poor. We then describe
our approach to identifying the migrant networks, which affect the cost of finding migrant
employment for village residents.
3.1 Rural-Urban Migration in China
Rapid growth in the volume of rural migrants moving to urban areas for work during the
1990s signalled a fundamental change in China's labor market. Estimates using the one
percent sample from the 1990 and 2000 rounds of the Population Census and the 1995
one percent population survey suggest that the inter-county migrant population grew from
just over 20 million in 1990 to 45 million in 1995 and 79 million by 2000 (Liang and Ma,
2004). Surveys conducted by the National Bureau of Statistics (NBS) and the Ministry of
Agriculture include more detailed retrospective information on past short-term migration,
and suggest even higher levels of labor migration than those reported in the census (Cai,
Park and Zhao, 2008).
Before labor mobility restrictions were relaxed, households in remote regions of rural
China faced low returns to local economic activity, reinforcing geographic poverty traps
(Jalan and Ravallion, 2002). A considerable body of descriptive evidence related to the
growth of migration in China raises the possibility that migrant opportunity may be an
important mechanism for poverty reduction. Studies of the impact of migration on migrant
households suggest that migration is associated with higher incomes (Taylor, Rozelle and
de Brauw, 2003; Du, Park, and Wang, 2006), facilitates risk-coping and risk-management
(Giles, 2006; Giles and Yoo, 2007), and is associated with higher levels of local investment
in productive activities (Zhao, 2003).
The use of migrant networks and employment referral in urban areas are important
dimensions of China's rural-urban migration experience. Rozelle et al. (1999) emphasize that
villages with more migrants in 1988 experienced more rapid migration growth by 1995. Zhao
(2003) shows that the number of early migrants from a village is correlated with the probability
that an individual with no prior migration experience will choose to participate in the migrant
labor market. Meng (2000) further suggests that variation in the size of migrant flows to
different destinations can be partially explained by the size of the existing migrant population
in potential destinations.8
3.2 The RCRE Household Survey
The primary data sources used for our analyses are the village and household surveys
conducted by the Research Center for Rural Economy at China's Ministry of Agriculture from
1986 through the 2003 survey year. We use data from 90 villages in eight provinces (Anhui,
Jilin, Jiangsu, Henan, Hunan, Shanxi, Sichuan and Zhejiang) that were surveyed over the
17-year period, with an average of 6305 households surveyed per year. Depending on village
size, between 40 and 120 households were randomly surveyed in each village.
The RCRE household survey enumerates detailed household-level information on incomes
and expenditures, education, labor supply, asset ownership, land holdings, savings, formal
and informal access to credit, and remittances.9 In common with the National Bureau of
Statistics (NBS) Rural Household Survey, respondent households keep daily diaries of income
and expenditure, and a resident administrator living in the county seat visits with households
once a month to collect information from the diaries.
8
Referral through one's social network is a common method of job search in both the developing and
developed world. Carrington, Detragiache, and Vishnawath (1996) explicitly show that in a model of
migration, moving costs can decline with the number of migrants over time, even if wage differentials narrow
between source communities and destinations. Survey-based evidence suggests that roughly 50 percent of
new jobs in the US are found through referrals facilitated by social networks (Montgomery, 1991). In a study
of Mexican migrants in the US, Munshi (2003) shows that having more migrants from one's own village living
in the same city increases the likelihood of employment.
Our measure of consumption includes nondurable goods expenditure plus an imputed flow
of services from household durable goods and housing. In order to convert the stock of
durables into a flow of consumption services, we assume that current and past investments
in housing are "consumed" over a 20-year period and that investments in durable goods are
consumed over a period of 7 years.10 We also annually "inflate" the value of the stock of
durables to reflect the increase in durable goods' prices over the period. Finally, we deflate
all income and expenditure data to 1986 prices using the NBS rural consumer price index
for each province.
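The construction of the consumption measure can be illustrated with a minimal sketch. It assumes straight-line consumption of each investment over its lifetime (20 years for housing, 7 for durables), starting in the purchase year; the text does not state the exact allocation rule, and the sketch omits the annual revaluation of the durable stock. The function names and the sample household are hypothetical.

```python
def service_flow(investments, year, lifetime):
    """Straight-line flow of services in `year` from past investments.

    investments: dict mapping purchase year -> real value invested.
    Each investment is spread evenly over `lifetime` years, starting
    in the purchase year (an assumption of this sketch).
    """
    return sum(
        value / lifetime
        for purchase_year, value in investments.items()
        if 0 <= year - purchase_year < lifetime
    )


def deflate(nominal, cpi, base_cpi):
    """Convert a nominal value to base-year prices with a price index."""
    return nominal * base_cpi / cpi


def consumption(nondurables, housing, durables, year):
    """Nondurable spending plus imputed housing and durable services."""
    return (
        nondurables
        + service_flow(housing, year, lifetime=20)
        + service_flow(durables, year, lifetime=7)
    )


# Hypothetical household: values are illustrative, not survey data.
housing = {1986: 2000.0}
durables = {1986: 700.0, 1990: 350.0}
total_1991 = consumption(500.0, housing, durables, 1991)
```

In this illustration a durable bought in 1986 stops contributing services from 1993 onward, which is the sense in which it has been fully "consumed".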
There has been some debate over the representativeness of both the RCRE and NBS
surveys, and concern over differences between trends in poverty and inequality in the NBS
and RCRE surveys. These issues are reviewed extensively in Appendix B of Benjamin et al
(2005), but it is worth summarizing some of their findings here. First, when comparing cross
sections of the NBS and RCRE surveys with overlapping years from cross sectional surveys
not using a diary method, it is apparent that some high and low income households are
under-represented.11 Poorer illiterate households are likely to be under-represented because
enumerators find it difficult to implement and monitor the diary-based survey, and refusal
rates are likely to be high among affluent households who find the diary reporting method a
costly use of their time. Second, much of the difference between levels and trends from the
NBS and RCRE surveys can be explained by differences in the valuation of home-produced
grain and treatment of taxes and fees.
9
One shortcoming of the survey is the lack of individual-level information. However, we know the numbers
of working-age adults and dependents, as well as the gender composition of household members.
10
Our approach to valuing consumption follows the suggestions of Chen and Ravallion (1996) for the NBS
Rural Household Survey, and is explained in more detail in Appendix A of Benjamin et al. (2005).
11
The cross-sections used were the rural samples of the 1993, 1997 and 2000 China Health and Nutrition
Survey (CHNS) and a survey conducted in 2000 by the Center for Chinese Agricultural Policy (CCAP) with
Scott Rozelle (UC Davis) and Loren Brandt (University of Toronto).
3.3 Migration and Poverty
One of the benefits of the accompanying village survey is a question asked each year of
village leaders about the number of registered village residents working and living outside
the village. In our analysis, we consider all registered residents working outside their home
county to be migrants.12 Both the tremendous increase in migration from 1987 onward and
heterogeneity across villages are evident in Figure 1. In 1987 an average of 3 percent of
working age laborers in RCRE villages were working outside of their home villages, which
rose steadily to 23 percent by 2003. Moreover, we observe considerable variability in the
share of working age laborers working as migrants. Whereas some villages still had a small
share of legal village residents employed as migrants, more than 50 percent of working age
adults from other villages were employed outside of home villages by 2003.
In other research using this data source, de Brauw and Giles (2008) use linear dynamic
panel data methods with continuous regressors to demonstrate a robust relationship between
the reduction of obstacles to rural-urban migration and household consumption growth.
While one might suspect that the non-poor, who have sufficiently high human capital and
other dimensions of ability, may benefit most from reductions in barriers to migration,
general equilibrium effects of out-migration may lead to greater specialization of households in
villages and this may have benefits for the poor. In particular, de Brauw and Giles
demonstrate that, when out-migration increases, households at the lower end of the consumption
distribution tend to expand both labor supply to productive activities and the land per capita
that they cultivate by more than do richer households. This raises the prospect that
migration may be causally related to poverty reduction within rural communities as well.
Changes in the village poverty headcount are negatively associated with the change in the
number of out-migrants, suggesting that poverty declines with increased out-migration
(Figure 2). Nonlinearities in the bivariate relationship are evident in the non-parametric lowess
plot of the relationship. Whether obvious non-linearities are related to the simultaneity of
shocks and increases in out-migration and poverty for some villages or the simple fact that we
have not controlled for other characteristics of villages, establishing a relationship between
migration and increased poverty within villages is likely to require an analytical approach
that eliminates endogeneity bias due to simultaneity and potential sources of unobserved
heterogeneity.
12
From follow up interviews with village leaders, it is apparent that registered residents living outside the
county are unlikely to be commuters and generally live and work outside the village for more than six months
of the year.
In the empirical application of our discrete binary response model below, we examine
whether out-migration from villages is associated with reductions in the probability that
household consumption falls below the poverty line in rural China. Researchers in the
poverty literature have questioned the appropriateness of running poverty regressions of this
type because the analyst discards richer information provided by the complete distribution
of consumption in favor of a binary variable. Not only is information discarded, but one also
introduces distributional assumptions associated with estimating a binary response model.13
While recognizing these concerns, our examination of poverty persistence using a dynamic
binary response model is useful for two reasons: first, it helps to highlight the strengths
of our approach to estimating dynamic binary response models. When analysts only have
access to administrative data on such outcomes as receipt of unemployment benefits or
welfare participation, then analysis of persistence in participation or receipt of support is
important and requires a binary outcome model (e.g., Adren, 2007; Bane and Ellwood,
1986). While our analysis discards some information, we do this to provide evidence on the
appropriateness of our approach to estimating dynamic binary response models. Second, use
of a dynamic binary response model focusses attention on whether or not a household passes
a specific point in the distribution of consumption, or alternatively income (e.g., Biewen,
2009; Hansen and Wahlberg, 2009). By doing this, we address a policy-relevant question of
how a treatment, in this case increased migration, affects the likelihood that poor households
will remain poor and the likelihood that non-poor households will fall into poverty. We are
agnostic as to whether poverty is reduced through direct participation in the migrant labor
market, or through indirect general equilibrium effects that raise the return to labor in
agricultural and other local activities.
13
See Ravallion (1996) for a useful exposition of these issues.
3.4 Estimating the Impact of Migrant Labor Markets on Poverty Persistence
We will estimate the dynamic binary outcome model for the likelihood that a household i
from village j falls below the poverty line at time t:
$$pov_{it} = 1[\alpha_1 pov_{it-1} + \alpha_2 (M^i_{jt} \times pov_{it-1}) + \alpha_3 M^i_{jt} + X_{it}'\beta_1 + \beta_2 lpc_{it} + D_t + u_i + v_j \times t_t + \varepsilon_{it}], \tag{18}$$
where $pov_{it}$ is a binary indicator for whether the household is poor in year $t$. Current poverty
status will be affected by poverty status in the prior period, $pov_{it-1}$, the size of the migrant
network from village $j$ through which household $i$ may be able to obtain a job referral,
$M^i_{jt}$, a vector of household demographic and human capital characteristics, $X_{it}$, household
land per capita, $lpc_{it}$, and year dummies to control for macroeconomic shocks, $D_t$. We
will be concerned about the possibility that an unobserved household effect, $u_i$, may be
systematically related to the size of the household's migrant network, to other covariates,
and to household poverty status, and thus introduce endogeneity concerns. Since village fixed
effects are at a higher level of aggregation than household fixed effects, when controlling for
household fixed effects, we also effectively control for fixed effects associated with the village
in which households are located. Further, we will be concerned that there may be
village-specific trends, $v_j \times t_t$, related to underlying endowments and initial conditions that also
have an impact on household poverty status. The error term, $\varepsilon_{it}$, may be serially correlated,
and we are concerned that shocks in the error term may also be systematically related to
the size of the migrant network, $M^i_{jt}$, and to the possibility of falling into poverty, and thus
contribute an additional source of endogeneity.
From the model specified in (18), we are particularly interested in identifying the
coefficients on $pov_{it-1}$, $M^i_{jt}$ and $M^i_{jt} \times pov_{it-1}$. The coefficients on $pov_{it-1}$ and
$M^i_{jt} \times pov_{it-1}$ allow
us to gauge the importance of persistence in the probability that a household is poor, and
the impact of access to migrant employment opportunities through the migrant network on
poverty persistence. The coefficient on $M^i_{jt}$, $\alpha_3$, allows us to determine the impact of the
migrant network on the probability that a household will fall into poverty.
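To see how these three coefficients separate persistence from entry into poverty, it may help to write the index inside the indicator in (18) separately by lagged poverty status. In this sketch, $\alpha_1$, $\alpha_2$ and $\alpha_3$ label the coefficients on $pov_{it-1}$, the interaction, and $M^i_{jt}$ respectively, and $W_{it}$ is shorthand introduced here (it does not appear in the original) for all remaining terms of the index:

```latex
\begin{aligned}
pov_{it-1} = 0 &: \quad \alpha_3 M^i_{jt} + W_{it}, \\
pov_{it-1} = 1 &: \quad \alpha_1 + (\alpha_2 + \alpha_3) M^i_{jt} + W_{it}.
\end{aligned}
```

Under this reading, $\alpha_3$ governs how the network shifts the index for previously non-poor households, $\alpha_2 + \alpha_3$ does so for previously poor households, and $\alpha_1 + \alpha_2 M^i_{jt}$ is the gap in the index attributable to lagged poverty at a given network size.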
The specification shown in (18) may have additional sources of endogeneity if we
believe that household demographic and human capital variables in $X_{it}$, or land per capita,
$lpc_{it}$, vary with unobserved shocks in period $t$ or $t-1$. We address the possible concern
over endogenous household composition by using household demographic and human capital
variables for the legal long-term registered residents of households. While household size
may vary somewhat with shocks as individuals move in and out of the household for the
purpose of finding temporary work elsewhere, such variations do not show up in registered
household membership. Long-term membership only changes when households split with
such events as marriage or legal change of residence to another location. Land managed by
the household may also vary with shocks. Land markets in rural China do not function well:
land cannot be bought and sold, and only in the last few years have farmers gained the right
to explicitly transfer land. Instead land is allocated by village leaders, and reallocated or
adjusted among households within village small groups if a household is judged to have too
little land to support itself. Nonetheless, there is some possibility that reallocation may be
related to shocks that occur in period $t$ or $t-1$ that may also be systematically related to
poverty status and the migrant network size.14 We thus use the period $t-2$ value of land
per capita and estimate:
$$pov_{it} = 1[\alpha_1 pov_{it-1} + \alpha_2 (M^i_{jt} \times pov_{it-1}) + \alpha_3 M^i_{jt} + X_{it}'\beta_1 + \beta_2 lpc_{it-2} + D_t + u_i + v_j \times t_t + \varepsilon_{it}] \tag{19}$$
One remaining issue is that we do not perfectly observe the network $M^i_{jt}$ that
household $i$ may use for job referrals. Instead, we observe the share of registered long-term
village residents who are employed as migrants outside the village in a particular year, or
$M_{jt}$. The true migrant network may include former legal registered residents who have
now changed their long-term residence status, implying that the actual potential network is
larger. Alternatively, the household may not be familiar with all of the village out-migrants,
and thus the actual network through which a household may seek referrals may be smaller.
Thus, we will estimate:
$$pov_{it} = 1[\alpha_1 pov_{it-1} + \alpha_2 (M_{jt} \times pov_{it-1}) + \alpha_3 M_{jt} + X_{it}'\beta_1 + \beta_2 lpc_{it-2} + D_t + u_i + v_j \times t_t + \varepsilon_{it}] \tag{20}$$
14
Wooldridge (2002) shows that when the assumption of strict exogeneity of the regressors fails in the
context of standard FE estimation, the inconsistency of the estimator is of order $T^{-1}$.
In our identification strategy below, we will instrument the endogenous share of village
out-migrants, $M_{jt}$, with village level instruments, identifying the size of the village migrant
labor force, interacted with period $t-2$ lagged land per capita, $lpc_{it-2}$, in order to allow for
differences in the effective value of the village migrant network for households with different
amounts of land.
3.5 Identifying the Migrant Network
To identify the village migrant network, we make use of two policy changes that, working
together, affect the strength of migrant networks outside home counties but are plausibly
unrelated to consumption growth. First, a new national ID card (shenfen zheng) was
introduced in 1984. While urban residents received IDs in 1984, residents of most rural counties
did not receive them immediately. In 1988, a reform of the residential registration system
made it easier for migrants to gain legal temporary residence in cities, but a national ID
card was necessary to obtain a temporary residence permit (Mallee, 1995). While some
rural counties made national IDs available to rural residents as early as 1984, others
distributed them in 1988, and still others did not issue IDs until several years later. The RCRE
follow-up survey asked local officials when IDs had actually been issued to rural residents
of the county. In our sample, 41 of the 90 counties issued cards in 1988, but cards were
issued as early as 1984 in three counties and as late as 1997 in one county. It is important to
note that IDs were not necessary for migration, and large numbers of migrants live in cities
without legal temporary residence cards. However, migrants with temporary residence cards
have a more secure position in the destination community, hold better jobs, and would thus
plausibly make up part of a longer-term migrant network in migrant destinations. Thus, ID
distribution had two effects after the 1988 residential registration (hukou) reform. First, the
costs of migrating to a city should fall after IDs became available. Second, if the quality
of the migrant network improves with the years since IDs are available, then the costs of
finding migrant employment should continue to fall over time.
As a result, the size of the migrant network should be a function of both whether or not
cards have been issued and the time since cards have been issued in the village. Given that
the size of the potential network has an upper bound, we expect the years-since-IDs-issued to
have a non-linear relationship with the size of the migrant labor force and we expect growth
in the migrant network to decline after initially increasing with distribution of IDs. In Figure
2, we show a lowess plot of the relationship between years since IDs were distributed and
the change in the number of migrants from the village from year $t-1$ to $t$. Note the sharp increase in
migrants from the time that IDs are distributed and then a slowing of the increase over time
(which would imply an even slower growth rate). This pattern suggests non-linearity in the
relationship between ID distribution and new participants in the village migrant labor force.
We thus specify our instrument as a dummy variable indicating that IDs had been issued
interacted with the years since they had been issued, and then experimented with quadratic,
cubic and quartic functions of years-since-IDs-issued. We settle on the quartic function for
our instruments because, as we show below, it fits the pattern of expanding migrant networks
better than the quadratic or the cubic functions.
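The instrument set described here can be sketched in a few lines. The sketch assumes that each polynomial term in years-since-IDs-issued is multiplied by period t-2 land per capita, as in the identification strategy outlined above; whether uninteracted terms also enter the first stage is not stated, so they are left out. Function and argument names are hypothetical.

```python
def years_since_ids(year, id_year):
    """Years elapsed since IDs were issued; zero before issuance."""
    issued = 1 if year >= id_year else 0  # dummy: IDs have been issued
    return issued * (year - id_year)


def build_instruments(year, id_year, lpc_lag2, degree=4):
    """Polynomial terms in years-since-IDs, each interacted with
    period t-2 land per capita (lpc_lag2)."""
    s = years_since_ids(year, id_year)
    return [(s ** p) * lpc_lag2 for p in range(1, degree + 1)]


# Hypothetical village that issued IDs in 1988, observed in 1991,
# for a household with 2.0 units of land per capita in 1989.
z = build_instruments(year=1991, id_year=1988, lpc_lag2=2.0)
```

Setting `degree` to 2 or 3 gives the quadratic and cubic alternatives that the quartic is compared against.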
Since ID distribution was the responsibility of county level offices of the Ministry of Civil
Affairs, which are distinctly separate from agencies involved in setting policies affecting
land, credit, taxation and poverty alleviation (the Ministry of Agriculture and Ministry of
Finance handle most decisions that affect these policies at the local level), it is plausible
that ID distribution is not systematically related to unobservable policy decisions with
more direct relationship to household consumption. Ideally, a policy would exist that was
randomly implemented, affecting the ability to migrate from some counties but not others. As
the differential timing of the distribution of ID cards was not random, we must be concerned
that counties with specific characteristics or that followed specific policies were singled out
to receive ID cards earlier than other counties, or that features of counties receiving IDs
earlier are systematically correlated with other policies affecting consumption growth. These
counties, one might argue, were "allowed" to build up migrant networks faster than others.
In two earlier papers, de Brauw and Giles (2008a and 2008b) address several possible
concerns with use of the years-since-IDs quartic as instruments for the size of the village
migrant labor force. They first show that timing of ID distribution appears to be related to
remoteness of the village, but not systematically related to village policies that may affect
consumption growth, to village administrative capacity, or to the demand for IDs within
the village. They thus argue in favor of including a village fixed effect to control for features
of the local county which may have affected timing of ID distribution, and then identify the
size of the village migrant labor force off of non-linearities in the time that it requires for
migrant networks to build up.
In this paper, we identify the village migrant network by further interacting the quartic
in years-since-IDs with land per capita held by households in period $t-2$. Why might we
expect that interacting with $lpc_{it-2}$ might achieve this? We believe that the land per capita
managed by households will likely pick up a dimension of proximity of different households
within the village. Within villages in rural China, households are separated into smaller units
of roughly 20 households known as village small groups (cun xiaozu), which were referred to
as production teams during the Maoist period. These households are located in clusters and
will have closer relationships with one another than with households of other small groups.
Moreover, property rights to land in rural China typically reside with the small group,
not with the village. Thus, when land reallocations take place they typically take place
within but not across small groups. Small groups make more frequent small adjustments to
household land as the land per capita available starts to become unequal with differential
changes in household structure across households within the small group, but there is much
less flexibility in making adjustments across small groups. As a result, much of the variability
of land per capita within villages occurs across small groups.15 Interacting a village level
instrument for the migrant network with land per capita will allow the importance of $M_{jt}$
to vary across households, and much of the difference across households occurs because of
unobserved differences in the small groups in which they reside and which migrants
refer to as home.
15
We do not know village small group membership in the RCRE survey prior to 2003 when a new survey
instrument was introduced. If we regress land per capita on village dummy variables in 2003, we obtain
an R-Squared of 0.503, while if we run a regression of land per capita on small group dummy variables, we
obtain an R-Squared of 0.616. A Lagrange Multiplier test for whether the small group effects add anything
significant over the village effects, which is effectively a test of whether small group coefficients are constant
within villages, yields an LM statistic of 310.67, which has a p-value of 0.0000.
As period $t-2$ lagged land per capita appears as an exogenous regressor and is also
interacted with the quartic in years since IDs were distributed in the first stage, our estimation
approach must also eliminate bias introduced through likely serial correlation of the error
term in the first stage regression. To this end, it is important to note that our two-step
estimation procedure developed in Section 2 above allows for serial correlation of first-stage
errors.
4 Results
Before estimating equation (20), we establish that our instruments are significantly related
to the migrant share of the village labor force. We estimate the relationship as a quadratic,
cubic, and quartic function of the years since IDs were issued, each interacted with period $t-2$
land per capita. These results are reported in columns (1)-(3) and columns (4)-(6) of Table
2 for odd years from 1989-2001.16 We find a strong relationship between our instruments and
the size of the migrant network for each specification. For the remainder of our estimation
we favor the quartic function interacted with $t-2$ land per capita for two reasons: First,
the effects of ID card distribution on the migration network can be determined more flexibly
when we use the quartic specification. Secondly, the partial $R^2$ increases slightly from the
quadratic to the quartic for both samples we consider. After controlling for the household
characteristics, the instruments have jointly significant effects on the share of migrants with
an F-statistic of 44.62 for the 1989 to 2001 sample.
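For readers who want to see how a joint F statistic and a partial R-squared of this kind relate to the underlying regressions, the following minimal sketch uses the classical formulas based on sums of squared residuals with and without the instruments. It assumes homoskedastic errors; the statistics reported in Table 2 may well be computed with a robust or clustered variance estimator, so this is an illustration rather than a reproduction. The input values are made up.

```python
def joint_f_statistic(ssr_restricted, ssr_unrestricted, q, n, k):
    """F statistic for excluding q instruments from a linear first stage.

    ssr_restricted:   sum of squared residuals without the instruments
    ssr_unrestricted: sum of squared residuals with the instruments
    n: observations; k: parameters in the unrestricted regression
    """
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / (n - k)
    return numerator / denominator


def partial_r_squared(ssr_restricted, ssr_unrestricted):
    """Share of the restricted residual variation explained by the instruments."""
    return (ssr_restricted - ssr_unrestricted) / ssr_restricted


# Made-up sums of squares, purely for illustration.
f_stat = joint_f_statistic(120.0, 100.0, q=4, n=105, k=5)
pr2 = partial_r_squared(120.0, 100.0)
```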
16
Since the RCRE survey was not conducted in 1992 and 1994, we estimate the dynamic model with
two-year spacing from 1989 to 2001.
We next proceed to estimate model (20), but first treat migration as exogenous and
show results for both linear probability and probit implementations in Table 3. In all four
specifications, we observe a positive association between migrant share of the village and
probability that a household is below the poverty line, and this reflects the response of
households to short-term shocks and the simultaneity between short-term migration and
consumption decisions. Given the descriptive evidence shown in Figure 2, it is unsurprising
to find that short-term increases in the poverty headcount will be correlated with year to
year changes in the share of migrants from the village. The positive relationship suggests
that migration is truly endogenous and suggests the need for an estimation strategy that
allows for identification in a dynamic binary response model where there are endogenous
regressors. When we introduce the years-since-IDs instrument, which is shown elsewhere to
be unrelated to short-term fluctuations in the local economy (de Brauw and Giles, 2008a and
2008b), we can identify the longer term relationship between growth of the migrant labor
market and the probability that a household will fall below the poverty line.
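Estimating model (20) requires pairing each household's current poverty status with its lagged status. A minimal sketch of that data step follows, assuming the two-year spacing over odd years from 1989 to 2001 used in the estimation; the function name, the poverty line, and the consumption values are hypothetical.

```python
def poverty_panel(consumption_by_year, poverty_line, years, spacing=2):
    """Build (year, pov, lagged pov) rows for one household.

    A row is kept only when consumption is observed in both the
    current year and the year `spacing` years earlier.
    """
    rows = []
    for year in years:
        lag_year = year - spacing
        if year in consumption_by_year and lag_year in consumption_by_year:
            pov = int(consumption_by_year[year] < poverty_line)
            pov_lag = int(consumption_by_year[lag_year] < poverty_line)
            rows.append((year, pov, pov_lag))
    return rows


# Hypothetical household; consumption and the poverty line are made up.
cons = {1987: 90.0, 1989: 110.0, 1991: 95.0, 1993: 130.0}
rows = poverty_panel(cons, poverty_line=100.0, years=range(1989, 2002, 2))
```

Each resulting row supplies the dependent variable and the lagged indicator that enters both directly and through the interaction with the migrant share.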
In Table 4, we report the control function (CF) estimation results based on the "pure"
random effects and "correlated" random effects approaches. For the purposes of comparison,
we also estimate model (20) using a naive linear probability model (LPM). As one might
expect, the coefficients on lagged poverty status are significant and positive, indicating a
strong persistence in poverty status, both in the pure random effects approach shown in
columns (1) and (3) and in the correlated random effects models shown in columns (2) and
(4). The decline in the value of the coefficient on lagged poverty status between pure and
correlated random effects models, (1) and (2) for linear probability models and (3) and (4) for
the dynamic probit models, suggests that unobserved heterogeneity associated with poverty
status introduces considerable upward bias in estimates of poverty persistence. Estimates of
poverty persistence using either a dynamic linear probability model or the dynamic probit
would lead the researcher to overstate the importance of chronic, persistent poverty. The
significant coefficient on the initial value of poverty status in the correlated random effects
models suggests a substantial correlation between unobserved effects and the initial condition.
Once we instrument for migrant share of the registered village population, and thus
control for simultaneity bias introduced through shocks to the local economy, we find that
the migrant labor market is negatively associated with the probability of falling into poverty.
Moreover, the coefficient on the interaction of village migrant share and lagged poverty
status suggests that the magnitude of the effect of migration on poverty reduction is greater
among households who were poor in the previous period, and thus migration reduces poverty
persistence even more than it reduces the likelihood that the non-poor will fall into poverty.
This result is consistent with de Brauw and Giles (2008a), who find that in a linear panel
data framework, households with lower levels of prior consumption tend to experience more
rapid consumption growth with increased out-migration from rural villages.17
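The way the interaction coefficient makes the effect of migration depend on lagged poverty status can be illustrated with a simplified probit calculation. This sketch treats the index as fully observed and differentiates the standard normal CDF with respect to the migrant share; it ignores the first-stage residual, the rescaling of coefficients, and the averaging over unobserved heterogeneity that the control function approach involves, so it is not the paper's estimator. All names, coefficients, and data are made up.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def migration_partial_effect(a1, a2, a3, m, other_index, pov_lag):
    """Derivative of Phi(index) with respect to the migrant share m.

    a1, a2, a3: coefficients on lagged poverty, the interaction, and m.
    other_index: value of all remaining terms in the index.
    """
    index = a1 * pov_lag + a2 * m * pov_lag + a3 * m + other_index
    return (a3 + a2 * pov_lag) * STD_NORMAL.pdf(index)


def average_partial_effect(a1, a2, a3, sample, pov_lag):
    """Average the partial effect over (m, other_index) pairs,
    holding lagged poverty status fixed at pov_lag."""
    effects = [
        migration_partial_effect(a1, a2, a3, m, other, pov_lag)
        for m, other in sample
    ]
    return sum(effects) / len(effects)


# Made-up coefficients and data, for illustration only.
sample = [(0.05, -1.0), (0.20, -0.5), (0.40, 0.0)]
ape_nonpoor = average_partial_effect(0.8, -1.5, -1.0, sample, pov_lag=0)
ape_poor = average_partial_effect(0.8, -1.5, -1.0, sample, pov_lag=1)
```

With a negative interaction coefficient, as in this made-up example, the averaged effect is more negative for previously poor households than for previously non-poor ones.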
17
We employ the Hausman test for endogeneity to formally assess the need to control for endogeneity of
migration share. The t-statistic for the significance of the first-stage residuals in the pure RE probit model
is 3.08 with p-value of 0.002, which suggests there is enough evidence to reject the null hypothesis that the
share of village out-migrants is exogenous. For the correlated RE probit, the t-statistic for the first-stage
residuals is 2.93 with p-value of 0.003. Thus, for the correlated RE model, we also reject the null hypothesis
that the share of migrants is exogenous.
In order to examine the effect of migration on poverty persistence, we calculate the
average partial effects (APEs) using the coefficient on share of migrants and the interaction
term and show the estimates in Table 5. The APEs calculated using the correlated random
effects dynamic probit approach (models 3 and 4) are generally smaller than those calculated
using the linear probability model (models 1 and 2). The naive LPM approach, which is
often preferred as a means of avoiding dynamic nonlinear models, will lead us to conclude
that migration has a more pronounced impact on poverty reduction than one finds using
the correlated random effects probit model. Again the consequences of ignoring unobserved
heterogeneity in the dynamic binary response model are of considerable interest. Failure
to control for unobserved heterogeneity in the pure random effects model would lead us to
overstate the effects of previous period poverty status on current poverty and understate the
effect of the migrant labor market in contributing to reductions in the probability that a
household would fall below the poverty line. For those households living above the poverty
line, the correlated random effects CF estimate of the APE (model 4) suggests that a one
percent increase in the share of village residents working as migrants would reduce the
probability of falling into poverty by about 3.2 percentage points. For those already below
the poverty line, the correlated random effects CF estimate of the APE shows that a one
percent increase in the village migrant share will reduce the probability of remaining in
poverty by 3.5 percentage points.
5 Conclusions
In this paper, we have developed a dynamic binary response panel data model that allows for
an endogenous regressor. The control function approach which we implement is of particular
value for settings in which one wants to estimate the effects of a treatment which is also
endogenous. Our empirical example demonstrates that alleviating an omitted variables bias
can lead to estimated effects that are larger in absolute value when we allow for the correlation
between unobserved heterogeneity, initial conditions and exogenous variables.
We apply the model to examine the impact of rural-urban migration on the likelihood
that households in rural China fall below the poverty line. Our application demonstrates
that migration is important both for reducing the likelihood that households remain in
poverty and for reducing the likelihood that they fall into poverty if they were not poor in
the previous period. From this specific
application, we show that failing to adequately control for unobserved heterogeneity in
non-linear dynamic panel data models will introduce substantial bias to parameter estimates.
In particular, failure to control for unobserved heterogeneity would lead us to overstate the
persistence of poverty and to understate the role that migration plays in poverty reduction.
Apart from analyzing the effects of migration on a binary outcome, our application
suggests that there may be many other settings in which the correlated random effects
control function approach may improve on existing analytical approaches. In any analysis
aiming to examine how a new program affects persistence of a state, one may be concerned
that unobserved heterogeneity will lead to upward bias in estimates of the effect of the initial
state. Moreover, as program participation, or take-up, may be endogenous, the analyst will
need to worry about this source of bias as well. The empirical strategy developed in Section
2 offers a parametric solution to the more general problem of identifying the impact of an
endogenous treatment in a dynamic binary response model.
6 References
Adren, T. 2007. "The Persistence of Welfare Participation," IZA Discussion Paper 3100,
October 2007.
Bane, M.J. and D.T. Ellwood. 1986. "Slipping into and out of Poverty: The Dynamics
of Spells," The Journal of Human Resources, 21(1): 1-23.
Benjamin, D., L. Brandt and J. Giles. 2005. "The Evolution of Income Inequality in
Rural China," Economic Development and Cultural Change 53(4): 769-824.
Biewen, M. 2009. "Measuring State Dependence in Individual Poverty Status: Are There
Feedback Effects to Employment Decisions and Household Composition?" Journal of Applied
Econometrics 24(7): 1095-1116.
Cai, F., A. Park and Y. Zhao. 2008. "The Chinese Labor Market," chapter prepared
for China's Great Economic Transition, Loren Brandt and Thomas Rawski (eds), Cambridge
University Press.
Cappellari, L. 1999. "Minimum Distance Estimation of Covariance Structures," 5th UK
Meeting of Stata Users.
Carrington, W., E. Detragiache and T. Vishnawath. 1996. "Migration with Endogenous
Moving Costs," American Economic Review 86(4): 909-930.
Chamberlain, G. 1980. "Analysis of Covariance with Qualitative Data," Review of
Economic Studies 47, 225-238.
Chamberlain, G. 1984. "Panel Data," in Handbook of Econometrics, Volume 2, Z. Griliches
and M. D. Intriligator (eds.). Amsterdam: North Holland, 1247-1318.
Chan, Kam Wing and Li Zhang. 1999. "The Hukou System and Rural-Urban Migration
in China: Processes and Changes," China Quarterly 160: 818-55.
Chay, K.Y. and D.R. Hyslop. 2000. "Identification and Estimation of Dynamic Binary
Response Models: Empirical Evidence Using Alternative Approaches," mimeo.
Chen, S. and M. Ravallion. 1996. "Data in transition: Assessing Rural Living Standards
in Southern China," China Economic Review, 7(1): 23-56.
Chiappori, P., and B. Salanie. 2000. "Testing for Asymmetric Information in Insurance
Markets," Journal of Political Economy 108, 56-78.
de Brauw, A. and J. Giles. 2008a. "Migrant Opportunity and the Educational
Attainment of Youth in Rural China," Policy Research Working Paper 4585, The World Bank
(February 2008).
de Brauw, A. and J. Giles. 2008b. "Migrant Labor Markets and the Welfare of Rural
Households in the Developing World: Evidence from China," Policy Research Working Paper
4526, The World Bank (April 2008).
Devicienti, D. and A. Poggi. 2007. "Poverty and social exclusion: two sides of the same
coin or dynamically interrelated processes?," LABORatorio R. Revelli Working Papers Series
62, LABORatorio R. Revelli, Centre for Employment Studies.
Du, Y., A. Park and S. Wang. 2005. "Migration Helping China' Poor?" s Journal of
Comparative Economics, 33(4): 688-709.
Giles, J. 2006. "Life More Risky in the Open? Household Risk-Coping and the Opening
s
of China' Labor Markets," Journal of Development Economics 81(1): 25-60.
Giles, J. and K. Yoo. 2007. "Precautionary Behavior, Migrant Networks and Household
Consumption Decisions: An Empirical Analysis Using Household Panel Data from Rural
28
China," The Review of Economics and Statistics, 89(3): 534-551.
Hahn, J. and G. Kuersteiner. 2002. "Asymptotically Unbiased Inference for a Dynamic
Panel Model with Fixed E¤ects When Both n and T Are Large," Econometrica 70, 1639-1657.
Hansen, J. and R. Wahlberg. 2009. "Poverty Persistence in Sweden," Review of the
Economics of the Household, 7(2), 105-132.
Heckman, J.J. 1981. "The Incidental Parameters Problem and the Problem of Initial
Conditions in Estimating a Discrete Time - Discrete Data Stochastic Process," in: C.F.
Manski and D. McFadden, (Eds.), Structural Analysis of Discrete Data with Econometric
Applications. MIT Press, Cambridge, MA, 179-195.
Heckman, J.J. and R.J. Willis. 1977. "A Beta-logistic Model for the Analysis of Se-
quential Labor Force Participation by Married Women," Journal of Political Economy, 85,
27-58.
Honoré, B.E. and E. Kyriazidou. 2000. "Panel Data Discrete Choice Models with Lagged
Dependent Variables," Econometrica 68, 839-874.
Hyslop, Dean R. 1999. "State Dependence, Serial Correlation and Heterogeneity in
Intertemporal Labor Force Participation of Married Women,"Econometrica 67(6): 1255-94.
Jalan, J. and M. Ravallion. 1998. "Transient Poverty in Post-Reform Rural China,"
Journal of Comparative Economics, 26(2): 338-357.
Jalan, J. and M. Ravallion. 2002. "Geographic Poverty Traps? A Micro Model of
Consumption Growth in Rural China," Journal of Applied Econometrics 17(4): 329-46.
s
Liang, Z. and Z. Ma. 2004. "China' Floating Population: New Evidence from the 2000
Census," Population and Development Review 30(3): 467-488.
s
Mallee, H. 1995. "China' Household Registration System Under Reform," Development
and Change 26(1):1-29.
Meng, X. 2000. "Regional wage gap, information ow, and rural-urban migration" in
Yaohui Zhao and Loraine West (eds) Rural Labor Flows in China, Berkeley: University of
California Press, 251-277.
Montgomery, J.D. 1991. "Social Networks and Labor-Market Outcomes: Toward an
Economic Analysis," American Economic Review 81(5): 1407-18.
Mundlak, Y. 1978. "On the Pooling of Time Series and Cross Section Data," Econometrica
46, 69-85.
Munshi, K. 2003. "Networks in the Modern Economy: Mexican Migrants in the U.S.
Labor Market," Quarterly Journal of Economics 118(2): 549-99.
Papke, L.E. and J.M. Wooldridge. 2008. "Panel Data Methods for Fractional Response
Variables with an Application to Test Pass Rates," Journal of Econometrics 145: 121-33
Ravallion, M. 1996. "Issues in Measuring and Modelling Poverty," The Economic Journal
106: 1328-1343.
s
Ravallion, M. and S. Chen. 2007. "China' (Uneven) Progress Against Poverty,"Journal
of Development Economics, 82(1): 1-42.
Rivers, D. and Q. H. Vuong. 1988. "Limited Information Estimators and Exogeneity
Tests for Simultaneous Probit Models," Journal of Econometrics 39, 347-366.
s
Rozelle, S., L. Guo, M. Shen, A. Hughart and J. Giles. 1999. "Leaving China' Farms:
Survey Results of New Paths and Remaining Hurdles to Rural Migration," The China Quar-
terly 158: 367-393.
29
Smith, R. and R. Blundell. 1986. "An Exogeneity Test for a Simultaneous Equation
Tobit Model with an Application to Labor Supply," Econometrica 54, 679-685.
Taylor, J.E., S. Rozelle, and A. de Brauw. 2003. "Migration and Incomes in Source Com-
munities: A New Economics of Migration Perspective from China," Economic Development
and Cultural Change, 52(1), 75-101.
Wooldridge, J.M. 2000. "A Framework for Estimating Dynamic, Unobserved E¤ects
Panel Data Models with Possible Feedback to Future Explanatory Variables," Economics
Letters 68, 245-250.
Wooldridge, J.M. 2002. Econometric Analysis of Cross Section and Panel Data. MIT
Press, Cambridge, MA.
Wooldridge, J.M. 2005. "Simple Solutions to the Initial Conditions Problem in Dy-
namic, Nonlinear Panel Data Models with Unobserved Heterogeneity," Journal of Applied
Econometrics 20, 39-54.
Zhao, Y. 2003. "The Role of Migrant Networks in Labor Migration: The Case of
China," Contemporary Economic Policy 21(4): 500-511.
30
Figure 1. Share of Village Labor Force Employed as Migrants, by Year
[Plot. Horizontal axis: Year, 1987 to 2003. Vertical axis: Share, 0 to 0.6.]
Source: RCRE Village Surveys 1987 to 2003.
Figure 2. Change in Poverty Headcount Versus Change in Number of Migrants
[Plot. Horizontal axis: Change in Number of Migrants, Village, -200 to 200. Vertical axis ticks: -0.1 to 0.1. Series: Lowess Fit, Linear Fit.]
Source: RCRE Village and Household Surveys, 1987 to 2003.
Figure 3. Change in Out-Migrants in Village Labor Force Versus Years Since IDs Were Distributed
[Plot. Horizontal axis: Years Since ID Cards Issued, -5 to 15. Vertical axis: Change in Number of Out-Migrants in Village Workforce, 0 to 15.]
Source: 2004 RCRE Supplemental Survey on Land and Village Governance.
Table 1. Household and Village Characteristics
Odd Years from 1989 to 2001

                                                    Obs.    Full Sample   Obs.    Balanced Sample
Household Poverty Status                 mean       42453   0.20          26159   0.20
                                         st. dev.           0.40                  0.40
Household Income per Capita              mean       42447   721.4         26159   685.8
                                         st. dev.           649.3                 537.5
Household Consumption per Capita         mean       42453   521.9         26159   499.1
                                         st. dev.           376.1                 332.6
Number of Household Members              mean       42491   4.1           26159   4.2
                                         st. dev.           1.5                   1.4
Number of Prime Age Household Laborers   mean       42491   2.5           26159   2.6
                                         st. dev.           1.1                   1.0
Household Land per Capita                mean       42453   1.4           26159   1.4
                                         st. dev.           1.2                   1.1
Household Average Years of Education     mean       41658   6.2           26156   6.3
                                         st. dev.           2.6                   2.5
Household Share of Females               mean       41659   0.45          26156   0.45
                                         st. dev.           0.21                  0.20
Share of Migrants from the Village       mean       42491   0.06          26159   0.06
                                         st. dev.           0.06                  0.06
Year of ID Distribution in a Village     mean       41814   1988.0        26159   1988.0
                                         st. dev.           2.1                   2.1
Years Since ID was Issued in a Village   mean       41814   6.7           26159   7.0
                                         st. dev.           4.5                   4.5

Notes: Consumption and income per capita are reported in 1986 RMB Yuan.
Table 2. What Factors Determine the Size of the Village Migrant Network?
First-Stage Regressions
Dependent Variable: Village Migrant Share
Odd Years from 1989 to 2001

Model                                                   (1)          (2)          (3)
Household Population                                    -0.0003      -0.0003      -0.0003
                                                        (0.0003)     (0.0003)     (0.0003)
Number of Working Age Laborers in the Household         0.0002       0.0002       0.0003
                                                        (0.0004)     (0.0004)     (0.0004)
Land Per Capita t-2                                     -0.0040***   -0.0036***   -0.0017***
                                                        (0.0004)     (0.0005)     (0.0006)
Average Years of Education                              -0.0006***   -0.0006***   -0.0007***
                                                        (0.0001)     (0.0001)     (0.0001)
Female Share of the Household                           -0.0011      -0.0011      -0.0011
                                                        (0.0015)     (0.0015)     (0.0015)
(Years-Since-IDs Available) * (Land Per Capita t-2)     0.0008***    0.0006***    -0.0018***
                                                        (0.0001)     (0.0002)     (0.0004)
(Years-Since-IDs Available)^2 * (Land Per Capita t-2)   -0.0000***   -0.0000      0.0007***
                                                        (0.0000)     (0.0000)     (0.00009)
(Years-Since-IDs Available)^3 * (Land Per Capita t-2)   0.0008***    -0.0000      -0.0001***
                                                        (0.0000)     (0.0000)
(Years-Since-IDs Available)^4 * (Land Per Capita t-2)   0.0006***    0.0000***
                                                        (0.0002)     (0.0000)
Observations                                            22422        22422        22422
R-squared                                               0.79         0.79         0.79
F-Statistic on IVs with Averages                        62.51        58.11        44.62
F-Statistic on IVs w/o Averages                         46.04        31.60        32.64
Partial R-squared, IVs with Averages                    0.005        0.005        0.007
Partial R-squared, IVs w/o Averages                     0.001        0.001        0.003

Notes: In parentheses we show fully robust standard errors [*** p<0.01, ** p<0.05, * p<0.1]. All
regressions include time averages of the explanatory variables, year dummies, and interactions
between village dummies and a time trend.
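
The instruments in the first stage are powers of years-since-IDs-available, each interacted with the second lag of land per capita. A minimal sketch of how such terms could be constructed for one household-year follows; the function name, argument names, and the `degree` parameter are hypothetical and are not taken from the paper.

```python
def polynomial_instruments(years_since_id, land_pc_lag2, degree=4):
    """Return [k * land, k**2 * land, ..., k**degree * land].

    k is the number of years since ID cards became available in the village,
    and land is the household's land per capita lagged two periods.
    Both argument names are illustrative assumptions.
    """
    return [(years_since_id ** power) * land_pc_lag2
            for power in range(1, degree + 1)]


# Example: a household with 1.5 units of lagged land per capita,
# observed 2 years after IDs were issued in its village.
example_terms = polynomial_instruments(2, 1.5)
print(example_terms)
```

Setting `degree` to a lower value would give a shorter instrument list, which is one way the set of polynomial terms could be varied across specifications.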
Table 3. Estimating Determinants of Poverty Status with Migrant Share Treated as Exogenous
Dependent Variable: Poverty Status

                                                           Linear Probability Model      Probit
                                                           Pure RE        Correlated RE  Pure RE        Correlated RE
Model                                                      (1)            (2)            (3)            (4)
Lag Poverty Status                                         0.390***       0.339***       1.045***       0.794***
                                                           (0.012)        (0.013)        (0.041)        (0.048)
Village Migrant Share Interacted with Lag Poverty Status   -0.974***      -0.767***      -2.356***      -1.654***
                                                           (0.134)        (0.132)        (0.465)        (0.484)
Village Migrant Share                                      0.285***       0.221***       1.476***       1.233**
                                                           (0.076)        (0.074)        (0.519)        (0.543)
Number of Household Members                                0.046***       0.057***       0.257***       0.371***
                                                           (0.002)        (0.004)        (0.014)        (0.021)
Number of Prime Age Household Laborers                     -0.023***      -0.024***      -0.127**       -0.147***
                                                           (0.003)        (0.004)        (0.017)        (0.023)
Second Lag of Land per Capita                              -0.002         0.000          -0.034*        -0.027
                                                           (0.003)        (0.004)        (0.018)        (0.031)
Average Years of Education                                 -0.007***      0.000          -0.042***      -0.007
                                                           (0.001)        (0.001)        (0.006)        (0.009)
Share of Females                                           -0.053***      -0.006         -0.293***      -0.028
                                                           (0.012)        (0.015)        (0.068)        (0.093)
Dependent Variable in 1989                                                0.090***                      0.508***
                                                                          (0.009)                       (0.047)
Observations                                               22422          22422          22422          22422
Number of households                                       3737           3737           3737           3737
R-Squared                                                  0.35           0.36

Notes: In parentheses we show fully robust standard errors [*** p<0.01, ** p<0.05, * p<0.1]. All regressions
include the explanatory variables in each year, year dummies, and interactions between village dummies and
a time trend.
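
The correlated random effects columns rely on household-level summaries of the explanatory variables: the notes here mention the explanatory variables in each year, while the first-stage table mentions their time averages. A minimal sketch of how both kinds of terms could be assembled from long-format panel rows follows; the field names `hh`, `year`, and `land` are hypothetical.

```python
from collections import defaultdict


def cre_terms(rows, var):
    """Build per-household correlated-random-effects terms for one variable.

    rows: list of dicts with keys 'hh', 'year', and var (assumed layout).
    Returns {hh: {'each_year': [values ordered by year], 'time_average': float}}.
    """
    by_household = defaultdict(dict)
    for row in rows:
        by_household[row["hh"]][row["year"]] = row[var]

    terms = {}
    for hh, series in by_household.items():
        values = [series[year] for year in sorted(series)]
        terms[hh] = {
            "each_year": values,                        # full history of the variable
            "time_average": sum(values) / len(values),  # household mean over time
        }
    return terms


toy_rows = [
    {"hh": 1, "year": 1995, "land": 3.0},
    {"hh": 1, "year": 1991, "land": 1.0},
    {"hh": 1, "year": 1993, "land": 2.0},
]
print(cre_terms(toy_rows, "land"))
```

Either set of terms would then be appended to every observation of the household, so that the remaining household effect can more plausibly be treated as uncorrelated with the regressors.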
Table 4. Estimating Determinants of Poverty Status with Endogenous Share of Migrants
Second-Stage Regressions
Dependent Variable: Poverty Status

                                                           Linear Probability Model      Control Function
                                                           Pure RE        Correlated RE  Pure RE        Correlated RE
Model                                                      (1)            (2)            (3)            (4)
Lag Poverty Status                                         0.391***       0.335***       1.046***       0.792***
                                                           (0.013)        (0.012)        (0.054)        (0.052)
Village Migrant Share Interacted with Lag Poverty Status   -0.994***      -0.784***      -2.443***      -1.779***
                                                           (0.128)        (0.125)        (0.512)        (0.526)
Village Migrant Share                                      -2.628***      -3.955***      -12.201**      -18.896**
                                                           (0.833)        (1.039)        (5.660)        (8.191)
Number of Household Members                                0.047***       0.057***       0.261***       0.368***
                                                           (0.003)        (0.004)        (0.020)        (0.028)
Number of Prime Age Household Laborers                     -0.023***      -0.023***      -0.125***      -0.143***
                                                           (0.003)        (0.005)        (0.022)        (0.031)
Second Lag of Land per Capita                              -0.006*        -0.001         -0.050*        -0.036
                                                           (0.003)        (0.005)        (0.026)        (0.043)
Average Years of Education                                 -0.009***      -0.002         -0.048***      -0.018
                                                           (0.001)        (0.002)        (0.008)        (0.013)
Share of Females                                           -0.058***      -0.012         -0.312***      -0.061
                                                           (0.013)        (0.019)        (0.098)        (0.130)
Dependent Variable in 1989                                                0.086***                      0.497***
                                                                          (0.009)                       (0.053)
Observations                                               22422          22422          22422          22422
Number of households                                       3737           3737           3737           3737
R-Squared                                                  0.29           0.32
Replications for Bootstrap Errors                          100            100            100            100

Notes: In parentheses we show bootstrapped standard errors [*** p<0.01, ** p<0.05, * p<0.1]. All regressions
include the explanatory variables in each year, year dummies, and interactions between village dummies and
a time trend. Regressions (1) and (3) include first stage residuals free of serial-correlation and their time
averages. Regressions (2) and (4) include first stage residuals free of serial-correlation and residuals from
the first stage in each year. For regressions (1) through (4) the instrumental variables are a quartic
polynomial of years-since-ID-was-issued with each term interacted with the second lag of land per capita.
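
Because the second stage uses residuals estimated in a first stage, the reported standard errors are bootstrapped. A minimal sketch of a panel bootstrap follows. It assumes, as an illustration only, that whole households are resampled with replacement and that the supplied `estimator` re-runs every estimation step on each resampled panel; the function names and the toy estimator are hypothetical.

```python
import random
import statistics


def cluster_bootstrap_se(panel, estimator, replications=100, seed=0):
    """Bootstrap standard error of `estimator`, resampling whole households.

    panel: dict mapping household id -> list of that household's yearly rows.
    estimator: callable taking a flat list of rows and returning a float.
               In a two-step procedure it would re-run both stages.
    """
    rng = random.Random(seed)
    household_ids = sorted(panel)
    estimates = []
    for _ in range(replications):
        # Draw as many households as in the original sample, with replacement.
        drawn = [rng.choice(household_ids) for _ in household_ids]
        resampled_rows = [row for hh in drawn for row in panel[hh]]
        estimates.append(estimator(resampled_rows))
    return statistics.stdev(estimates)


def poverty_rate(rows):
    """Toy estimator: share of household-years observed in poverty."""
    return sum(row["poor"] for row in rows) / len(rows)


# Toy panel: 30 households observed in 4 periods each.
toy_panel = {hh: [{"poor": int((hh + t) % 3 == 0)} for t in range(4)]
             for hh in range(30)}
se = cluster_bootstrap_se(toy_panel, poverty_rate, replications=100, seed=1)
print(round(se, 4))
```

Resampling households rather than individual household-years keeps each household's time series intact, which matters when the model includes lagged outcomes and household-level heterogeneity.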
Table 5. Average Partial Effects of Determinants of Poverty Status
(Endogenous Share of Migrants)

                                                           LPM                           Control Function
                                                           Pure RE        Correlated RE  Pure RE        Correlated RE
Model                                                      (1)            (2)            (3)            (4)
Lag Poverty Status                                         0.324***       0.282***       0.181***       0.125***
                                                           (0.010)        (0.009)        (0.007)        (0.006)
Share of Migrants when Lag Poverty = 0                     -2.628***      -3.955***      -2.092**       -3.156**
                                                           (0.833)        (1.039)        (1.056)        (1.413)
Share of Migrants when Lag Poverty = 1                     -3.621***      -4.739***      -2.511**       -3.453**
                                                           (0.834)        (1.028)        (1.044)        (1.394)
Share of Migrants (averaged)                               -2.641***      -3.965***      -2.260**       -3.273**
                                                           (0.833)        (1.038)        (1.050)        (1.405)
Number of Household Members                                0.047***       0.057***       0.045***       0.061***
                                                           (0.003)        (0.004)        (0.004)        (0.005)
Number of Prime Age Household Laborers                     -0.023***      -0.023***      -0.022***      -0.024***
                                                           (0.003)        (0.005)        (0.004)        (0.006)
Second Lag of Land per Capita                              -0.006*        -0.001         -0.009*        -0.006
                                                           (0.003)        (0.005)        (0.005)        (0.007)
Average Years of Education                                 -0.009***      -0.002         -0.008***      -0.003
                                                           (0.001)        (0.002)        (0.002)        (0.002)
Share of Females                                           -0.058***      -0.012         -0.053***      -0.010
                                                           (0.013)        (0.019)        (0.018)        (0.022)
Poverty Status in 1989                                                    0.086***                      0.092***
                                                                          (0.009)                       (0.009)
Replications                                               100            100            100            100

Notes: In parentheses we show bootstrapped standard errors [*** p<0.01, ** p<0.05, * p<0.1].
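
The linear probability model entries for the share of migrants can be traced back to the Table 4 coefficients, since with an interaction term the partial effect is the migrant share coefficient plus the interaction coefficient times lagged poverty status. The sketch below reproduces that arithmetic for column (1) and, for contrast, shows a simplified probit-style average partial effect in which a coefficient is scaled by the average normal density of the index. The index values are hypothetical, and the simplified formula ignores the additional scaling and averaging over control function terms that a full calculation would involve.

```python
import math

# Table 4, column (1): linear probability model, pure RE.
SHARE_COEF = -2.628        # Village Migrant Share
INTERACTION_COEF = -0.994  # Village Migrant Share x Lag Poverty Status


def lpm_share_effect(lag_poor):
    """Partial effect of migrant share in the LPM for lag_poor in {0, 1}."""
    return SHARE_COEF + INTERACTION_COEF * lag_poor


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def probit_ape(beta, indices):
    """Simplified average partial effect: beta times the mean density of the index."""
    return beta * sum(normal_pdf(z) for z in indices) / len(indices)


print(round(lpm_share_effect(0), 3), round(lpm_share_effect(1), 3))

# Hypothetical index values, for illustration only (not estimated from the data).
hypothetical_indices = [-1.5, -1.0, -0.5, 0.0]
print(round(probit_ape(-12.201, hypothetical_indices), 3))
```

The LPM value for lagged poverty equal to one matches the table entry up to rounding of the published coefficients. Because the normal density never exceeds about 0.4, a probit-scale coefficient always shrinks substantially when converted to a partial effect, which is why the control function columns of Table 5 are so much smaller in magnitude than the corresponding coefficients in Table 4.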