WPS5601
Policy Research Working Paper 5601
A Practical Comparison of the Bivariate Probit
and Linear IV Estimators
Richard C. Chiburis
Jishnu Das
Michael Lokshin
The World Bank
Development Research Group
Poverty and Inequality Team
and Human Development and Public Services Team
March 2011
Abstract
This paper presents asymptotic theory and Monte-Carlo simulations comparing maximum-likelihood
bivariate probit and linear instrumental variables estimators of treatment effects in models
with a binary endogenous treatment and binary outcome. The three main contributions of the
paper are (a) clarifying the relationship between the Average Treatment Effect obtained in the
bivariate probit model and the Local Average Treatment Effect estimated through linear IV;
(b) comparing the mean-square error and the actual size and power of tests based on these
estimators across a wide range of parameter values relative to the existing literature; and
(c) assessing the performance of misspecification tests for bivariate probit models. The
authors recommend two changes to common practices: bootstrapped confidence intervals for both
estimators, and a score test to check goodness of fit for the bivariate probit model.
This paper is a product of the Poverty and Inequality Team, and Human Development and Public Services Team;
Development Research Group. It is part of a larger effort by the World Bank to provide open access to its research and make
a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the
Web at http://econ.worldbank.org. The authors may be contacted at jdas1@worldbank.org and mlokshin@worldbank.org.
The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development
issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the
names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those
of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and
its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.
Produced by the Research Support Team
Richard C. Chiburis, University of Texas at Austin
Jishnu Das, Development Research Group, World Bank
Michael Lokshin, Development Research Group, World Bank
1 Introduction
This paper examines estimation issues in empirical models with binary regressors and outcome
variables. A motivating example is the effect of private schooling on graduation rates. Here
the "treatment" (attending a private school) and the "outcome" (whether or not the individual
graduated) can each take one of two potential values. Comparing mean graduation rates of
children in public and private schools likely yields a biased estimate of the causal effect of
private schooling on graduation rates if omitted variables, such as ability, are correlated
both with private school attendance and graduation rates.
There are two common approaches to estimating causal effects in such models. One approach
disregards the binary structure of the outcome and treatment variables and presents linear
instrumental variables (IV) estimates of the treatment effect; the second computes
maximum-likelihood estimates of a bivariate probit (BP) model, which assumes that the outcome
and treatment are each determined by latent linear index models with jointly normal error
terms. Sometimes the two approaches can produce markedly different results. A persistent
problem in the private schooling literature, for instance, is the large difference between
linear IV and BP estimates of the treatment effect. In some cases, these differ by an order of
magnitude, with the linear IV estimates exhibiting larger coefficients and standard errors
(Altonji, Elder, and Taber 2005).
Setting aside the discussion on the relevance of reduced-form impacts versus structural
parameters (Angrist 2001, Moffitt 2001), the existing literature sometimes offers conflicting
advice on the best course of action in empirical problems of this sort. For instance, Angrist
(1991, 2001) argues that the hard part of the empirical problem is finding a good instrument
and that the "difficulties with endogenous variables in nonlinear limited dependent variables
models are usually more apparent than real." This argument is supported by a stress on causal
effects as opposed to structural parameters in these models and by Monte-Carlo simulations
that argue for the robustness of the simpler linear estimator to the distribution of the error
terms. On the other hand, Bhattacharya, Goldman, and McCaffrey (2006) present simulations that
suggest that BP is slightly more robust than IV to non-normality of the error terms. We show
that both of these seemingly different viewpoints can be justified depending on the structure
of the problem. The reconciliation is based on (a) distinguishing carefully between the Local
Average Treatment Effect (LATE) estimated under linear IV and the Average Treatment Effect
(ATE) estimated under the BP model and (b) extending the parameter values in Monte-Carlo
simulations to cover a far wider range of model specifications relative to the existing
literature.
We present asymptotic and finite-sample Monte-Carlo simulation results for an extensive range
of parameter values to help decide on a practical course of action when faced with a single
dataset, a reliable instrument and (possibly) widely differing estimates of the treatment
effect depending on the technique used. We focus on both the mean-square error of the
estimators and on the size and power of hypothesis tests based on these estimators. We present
simulations both with the BP model correctly specified and with misspecification due to
non-normal error terms. Finally, we propose two straightforward additions to current practice
that vastly improve the performance of the estimators and our confidence in the normality
assumptions of the BP model.
Our first set of results assumes that the BP model is correctly specified. Under this
assumption, when there are no covariates, BP tends to perform better than IV for smaller
sample sizes (below 5000), with mixed results for larger samples. With a continuous covariate,
the performance of BP dominates IV in all of our simulations, and BP performs especially well
when the treatment probability is close to 0 or 1. For instance, when the treatment
probability is 0.1, for all ranges of the outcome probabilities and even with sample sizes
greater than 10,000 observations, the confidence intervals of the IV estimate remain too large
for any meaningful hypothesis testing; in contrast, BP confidence intervals are much smaller.¹
Therefore, researchers should expect IV and BP coefficients to differ quite substantially when
treatment probabilities are low or when sample sizes are below 5000; linear IV estimates are
particularly uninformative for hypothesis testing when treatment probabilities are low.

¹ This particular finding explains the large differences between the IV and BP confidence
intervals in the motivating example above, since the percentage of children in private schools
in the United States is 10 percent or lower in most samples.
Further, as pointed out by Imbens and Angrist (1994) and others, the IV estimate is consistent
for the local average treatment effect (LATE) and not the overall average treatment effect
(ATE), which can be recovered from the maximum-likelihood BP estimate. That the estimators are
estimating different effects accounts for a finding by Angrist (1991) that in some cases, the
variance of the IV estimator is lower than that of the maximum-likelihood BP ATE estimator
despite the well-known efficiency of maximum likelihood.
As expected, across most parameters of our simulations, the BP estimator is not robust to
misspecification of the BP model. Simulation results where the error terms exhibit excess
skewness or excess kurtosis often lead to highly biased BP estimates, and tests based on BP
estimates greatly overreject a true null hypothesis when the model is misspecified. Tests
based on IV estimates are more robust in terms of size, but they are also less powerful. The
results presented in Bhattacharya, Goldman, and McCaffrey (2006) on the robustness of the BP
estimator to non-normal error terms arise only for specific combinations of the relevant
parameters, as clarified by the extensive Monte-Carlo simulations considered here.
We propose two additional steps to recover better confidence intervals and check for model
misspecification in BP estimators. For both BP and IV estimators, sample sizes have to exceed
10,000 before the coverage rates of the standard Wald-type confidence intervals approach the
nominal coverage rate. In general, IV confidence intervals tend to be too conservative and BP
confidence intervals are not conservative enough. First, we show that using bootstrapped
confidence intervals (relative to analytical versions) improves the coverage rate of both IV
and BP estimators for all parameter values. Second, we recommend running Murphy's (2007) score
test to check the goodness-of-fit of the BP model; our simulations suggest that the score test
is fairly good at picking up misspecifications arising from excess kurtosis or skewness in the
error distributions.
The remainder of the paper is structured as follows. Section 2 reviews standard asymp-
totic results. Section 3 discusses the data generating process for the Monte-Carlo simulations
and presents results. Section 4 concludes.
2 Asymptotic properties of IV and BP estimators
We first derive asymptotic results for the case of a single binary instrument and no
covariates. The section also details the relationship between two commonly used treatment
effects, the Average Treatment Effect (ATE) and the Local Average Treatment Effect (LATE).
Let $T \in \{0, 1\}$ be the endogenous treatment, and let $Y \in \{0, 1\}$ be the outcome of
interest. Let $Y_1$ be an individual's potential outcome had she received the treatment
($T = 1$), and let $Y_0$ be the individual's potential outcome had she not received the
treatment ($T = 0$). Let $Z \in \{0, 1\}$ be an instrument for the treatment. Let $T_1$ be an
individual's chosen treatment had she been given $Z = 1$, and let $T_0$ be an individual's
chosen treatment had she been given $Z = 0$.
We follow Imbens and Angrist (1994) in defining an instrument $Z$ as satisfying the following
conditions:
$$Z \text{ is independent of } (Y_0, Y_1, T_0, T_1) \tag{1}$$
and
$$E[T \mid Z = 1] \neq E[T \mid Z = 0]. \tag{2}$$
We think of individuals as being sampled from the joint distribution of the random variables
$(Z, T_1, T_0, Y_1, Y_0)$. For each individual $i$, we actually observe $(Z, T, Y)$, where
$T = T_Z$ and $Y = Y_T$. Suppose that we have an i.i.d. sample of $n$ individuals. We focus on
three commonly estimated treatment effects, defined as follows:

1. The average treatment effect (ATE) over the entire population is given by
$$\Delta_{ATE} = E[Y_1] - E[Y_0]. \tag{3}$$

2. The average treatment effect on the treated (ATT) is the average treatment effect only over
those individuals who actually received the treatment:
$$\Delta_{ATT} = E[Y_1 \mid T = 1] - E[Y_0 \mid T = 1]. \tag{4}$$

3. The probability limit of the IV estimator is what Imbens and Angrist (1994) called the
local average treatment effect (LATE):
$$\Delta_{LATE} = \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[T \mid Z = 1] - E[T \mid Z = 0]}. \tag{5}$$
Under the condition that $T_1 \geq T_0$ for all individuals, Imbens and Angrist show that
$\Delta_{LATE}$ can be interpreted as the average treatment effect for the subpopulation that
complies with the instrument, i.e. the subpopulation for which $T$ would be equal to $Z$
regardless of whether $Z = 0$ or $Z = 1$.
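As a minimal sketch of the sample analogue of (5): with a single binary instrument and no covariates, the linear IV estimate coincides with the ratio of the difference in mean outcomes to the difference in mean treatment across the two instrument groups. The function name `wald_late` and the toy data in the usage note are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def wald_late(z: Sequence[int], t: Sequence[int], y: Sequence[int]) -> float:
    """Sample analogue of (5) for a binary instrument z, treatment t, outcome y."""

    def mean_where(values: Sequence[int], flag: int) -> float:
        # Mean of `values` over observations whose instrument equals `flag`.
        selected = [v for v, zi in zip(values, z) if zi == flag]
        if not selected:
            raise ValueError(f"no observations with Z = {flag}")
        return sum(selected) / len(selected)

    # Denominator of (5): effect of the instrument on treatment take-up.
    first_stage = mean_where(t, 1) - mean_where(t, 0)
    if first_stage == 0:
        raise ZeroDivisionError("instrument does not shift treatment; condition (2) fails")
    # Numerator of (5): effect of the instrument on the outcome.
    reduced_form = mean_where(y, 1) - mean_where(y, 0)
    return reduced_form / first_stage
```

For example, with `z = [0, 0, 0, 0, 1, 1, 1, 1]`, `t = [0, 0, 0, 1, 0, 1, 1, 1]` and `y = [0, 0, 1, 1, 0, 1, 1, 1]`, the reduced form is 0.25 and the first stage is 0.5, so the estimate is 0.5. Nothing constrains this ratio to lie in [-1, 1] in a finite sample.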
It is informative to compare these effects to the result we would obtain if we ignore that $T$
is endogenous and run an OLS regression of $Y$ on $T$ and a constant.² The probability limit
of such a regression is
$$\Delta_{OLS} = E[Y_1 \mid T = 1] - E[Y_0 \mid T = 0]. \tag{6}$$
If $E[Y_1 \mid T]$ and $E[Y_0 \mid T]$ are invariant to $T$, so that there is no selection
bias, then $\Delta_{OLS} = \Delta_{ATE} = \Delta_{ATT}$. Note that this condition does not
ensure that $\Delta_{LATE} = \Delta_{ATE}$.
² Since $Y$ is binary, one might also consider running a probit of $Y$ on $T$ and a constant.
With no covariates, running a probit and computing the treatment effect produces exactly the
same result as OLS since both models have two parameters to fit two moments
$E[Y \mid T = 1]$ and $E[Y \mid T = 0]$.
2.1 Bivariate probit model
Typically it is necessary to impose additional structure on the model in order to identify
$\Delta_{ATE}$ and $\Delta_{ATT}$.³ One way to do this while still allowing the treatment to
be endogenous is to assume a bivariate probit model, which is a linear index model with
bivariate normal shocks (Heckman 1978):
$$\begin{aligned}
T^* &= \gamma Z + \alpha_T + \varepsilon_1 \\
T &= 1\{T^* > 0\} \\
Y^* &= \delta T + \alpha_Y + \varepsilon_2 \\
Y &= 1\{Y^* > 0\}
\end{aligned} \tag{7}$$
with $(\varepsilon_1, \varepsilon_2)$ jointly distributed as standard bivariate normal with
correlation $\rho$ and independent of $Z$. Note that assumption (1) above follows from this
independence condition, and that $\gamma \neq 0$ implies (2). When $\gamma > 0$,
$\Delta_{LATE}$ (5) has the interpretation given by Imbens and Angrist (1994).
Define $p_T = \Pr[T = 1]$ and $p_Y = \Pr[Y = 1]$. Let $\Phi$ and $\phi$ be the standard normal
distribution and density functions, respectively. Let $B(\cdot, \cdot; \rho)$ be the
distribution function for the standard bivariate normal distribution with correlation $\rho$.
The ATE (3) is
$$\Delta_{ATE} = \Phi(\alpha_Y + \delta) - \Phi(\alpha_Y). \tag{8}$$
³ Heckman and Vytlacil (1999) observe that if there exists a value $z$ of the instrument such
that $E[T \mid Z = z] = 0$, then $\Delta_{ATT}$ is nonparametrically identified. If
additionally there exists $z'$ such that $E[T \mid Z = z'] = 1$, then $\Delta_{ATE}$ is also
identified. This is a type of "identification at infinity" result since it typically requires
extreme values of $Z$ to be observed. However, with a binary $Z$ as we have in our
simulations, these conditions are rarely satisfied.
A first-order Taylor approximation about $\delta = 0$ is
$$\Delta_{ATE} \approx \delta\, \phi(\Phi^{-1}(p_Y)). \tag{9}$$
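To make the relationship between the exact ATE in (8) and its first-order approximation in (9) concrete, the following sketch evaluates both using only Python's standard library. The function and argument names (`ate_exact`, `ate_approx`, `alpha_y`, `delta`) are illustrative assumptions: `delta` stands for the coefficient on the treatment and `alpha_y` for the constant in the outcome equation of model (7).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def ate_exact(alpha_y: float, delta: float) -> float:
    """ATE as in (8): Phi(alpha_y + delta) - Phi(alpha_y)."""
    return STD_NORMAL.cdf(alpha_y + delta) - STD_NORMAL.cdf(alpha_y)


def ate_approx(p_y: float, delta: float) -> float:
    """First-order approximation as in (9): delta * phi(Phi^{-1}(p_y))."""
    return delta * STD_NORMAL.pdf(STD_NORMAL.inv_cdf(p_y))


if __name__ == "__main__":
    # With the treatment coefficient held at 0.4, the approximation is
    # largest at p_y = 0.5 and shrinks as p_y moves toward 0 or 1.
    for p_y in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(p_y, round(ate_approx(p_y, 0.4), 4))
```

The approximation depends only on the treatment coefficient and $p_Y$, which is why the ATE is expected to vary with $p_Y$ but hardly at all with $p_T$ or the error correlation.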
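The ATT (4) is given by
$$\begin{aligned}
\Delta_{ATT} ={}& \Pr[Z = 0]\, \frac{B(\alpha_T, \alpha_Y + \delta; \rho) - B(\alpha_T, \alpha_Y; \rho)}{\Phi(\alpha_T)} \\
&+ \Pr[Z = 1]\, \frac{B(\alpha_T + \gamma, \alpha_Y + \delta; \rho) - B(\alpha_T + \gamma, \alpha_Y; \rho)}{\Phi(\alpha_T + \gamma)}.
\end{aligned} \tag{10}$$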
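The LATE (5) can also be written as a function of the parameters in the bivariate probit
model:
$$\Delta_{LATE} = \frac{\left[B(\alpha_T + \gamma, \alpha_Y + \delta; \rho) + B(-(\alpha_T + \gamma), \alpha_Y; -\rho)\right] - \left[B(\alpha_T, \alpha_Y + \delta; \rho) + B(-\alpha_T, \alpha_Y; -\rho)\right]}{\Phi(\alpha_T + \gamma) - \Phi(\alpha_T)}. \tag{11}$$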
While all of the types of treatment effects are equal when $\rho = 0$, they can differ
significantly for other values of $\rho$; in particular, the ordering of $\Delta_{ATE}$,
$\Delta_{ATT}$, and $\Delta_{LATE}$ varies across parameter values. In Appendix A.2, we derive
a Taylor approximation for the ratio of $\Delta_{LATE}$ to $\Delta_{ATE}$ as
$$\frac{\Delta_{LATE}}{\Delta_{ATE}} \approx 1 + \rho\, \Phi^{-1}(p_Y)\, \Phi^{-1}(p_T). \tag{12}$$
Since the probability limit of the IV estimator $\hat\Delta_{IV}$ is $\Delta_{LATE}$, (12) can
be used to obtain a quick and intuitive approximation of the bias of $\hat\Delta_{IV}$ for
$\Delta_{ATE}$. In general, $\hat\Delta_{IV}$ is most biased relative to $\Delta_{ATE}$ when
$|\rho|$ is large, and $p_T$ and $p_Y$ are far from $\frac{1}{2}$. The sign of the bias
depends on the signs of $\rho$, $\Phi^{-1}(p_Y)$, and $\Phi^{-1}(p_T)$. In Figure 1, we graph
$\Delta_{ATE}$, $\Delta_{ATT}$, $\Delta_{LATE}$, and $\Delta_{OLS}$ for model (7) with
$\gamma = 0.3$, $\delta = 0.4$, and different values of $p_T$, $p_Y$, and $\rho$. We vary
$p_T$ and $p_Y$ by changing the constants $\alpha_T$ and $\alpha_Y$. As suggested by the
approximation (9) and given that $\delta$ is fixed, $\Delta_{ATE}$ changes with $p_Y$ but is
very little affected by $p_T$ or $\rho$. When $\rho = 0$, all of the effects are equal. When
$\rho > 0$, the OLS effect $\Delta_{OLS}$ (6) that ignores the endogeneity of $T$ is biased
upward relative to $\Delta_{ATE}$, $\Delta_{ATT}$, and $\Delta_{LATE}$, which correctly take
into account the endogeneity of the treatment. As predicted by the approximation (12),
$\Delta_{ATE}$ and $\Delta_{LATE}$ differ substantially when $p_T$ and $p_Y$ are far from
$\frac{1}{2}$, and this difference increases with $\rho$. The effect on the treated
$\Delta_{ATT}$ is naturally close to the overall average effect $\Delta_{ATE}$ when the
probability of treatment $p_T$ is high. When $p_T$ is low, $\Delta_{ATT}$ is closer to
$\Delta_{LATE}$ because in that case there is high overlap between the treated and complier
populations. Figure 1 also highlights the limits of our approximation (12): according to (12),
$\Delta_{LATE}$ and $\Delta_{ATE}$ should be the same when either $p_Y$ or $p_T$ is
$\frac{1}{2}$, but at higher values of $\rho$, we see that the two effects diverge.
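The comparison between the LATE and the ATE can be explored numerically. The sketch below evaluates the ATE from (8) and the LATE from (11), under the assumption that the bivariate normal distribution function $B$ is computed by one-dimensional Simpson integration of $\phi(x)\,\Phi\big((b - \rho x)/\sqrt{1 - \rho^2}\big)$; this integration scheme, the truncation at -8, and all function and argument names are illustrative choices rather than the authors' implementation.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def bvn_cdf(a: float, b: float, rho: float, steps: int = 2000) -> float:
    """Pr[U <= a, V <= b] for a standard bivariate normal with |rho| < 1.

    Integrates phi(x) * Phi((b - rho * x) / sqrt(1 - rho^2)) over x <= a
    with Simpson's rule (steps must be even); mass below -8 is ignored.
    """
    lower = -8.0
    if a <= lower:
        return 0.0
    upper = min(a, 8.0)
    scale = sqrt(1.0 - rho * rho)
    width = (upper - lower) / steps

    def integrand(x: float) -> float:
        return STD_NORMAL.pdf(x) * STD_NORMAL.cdf((b - rho * x) / scale)

    total = integrand(lower) + integrand(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(lower + k * width)
    return total * width / 3.0


def ate(alpha_y: float, delta: float) -> float:
    """ATE as in (8)."""
    return STD_NORMAL.cdf(alpha_y + delta) - STD_NORMAL.cdf(alpha_y)


def late(alpha_t: float, alpha_y: float, gamma: float, delta: float, rho: float) -> float:
    """LATE as in (11), written as (E[Y|Z=1] - E[Y|Z=0]) / (E[T|Z=1] - E[T|Z=0])."""

    def mean_outcome(index_t: float) -> float:
        # Pr[T=1, Y=1] + Pr[T=0, Y=1] when the treatment index equals index_t.
        return bvn_cdf(index_t, alpha_y + delta, rho) + bvn_cdf(-index_t, alpha_y, -rho)

    first_stage = STD_NORMAL.cdf(alpha_t + gamma) - STD_NORMAL.cdf(alpha_t)
    return (mean_outcome(alpha_t + gamma) - mean_outcome(alpha_t)) / first_stage
```

With the correlation set to zero, `late(...)` and `ate(...)` agree up to integration error, as the text states. With a positive correlation and constants that make both $p_T$ and $p_Y$ small (for instance `alpha_t = alpha_y = -1.3`), the LATE exceeds the ATE, which is the direction predicted by (12).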
2.2 Asymptotic variance of linear IV estimator
The asymptotic variance $\mathrm{Avar}[\hat\Delta]$ of an estimator $\hat\Delta$ is defined
such that
$$n \,\mathrm{Var}[\hat\Delta] \to \mathrm{Avar}[\hat\Delta].$$
The asymptotic variance of the IV estimator $\hat\Delta_{IV}$ is
$$\mathrm{Avar}[\hat\Delta_{IV}] = \frac{\dfrac{\Pr[Y=1 \mid Z=1]\,(1 - \Pr[Y=1 \mid Z=1])}{\Pr[Z=1]} + \dfrac{\Pr[Y=1 \mid Z=0]\,(1 - \Pr[Y=1 \mid Z=0])}{\Pr[Z=0]}}{\left(\Pr[T=1 \mid Z=1] - \Pr[T=1 \mid Z=0]\right)^2}. \tag{13}$$
To provide intuition for how $\mathrm{Avar}[\hat\Delta_{IV}]$ changes with $p_T$, $p_Y$, and
$\rho$, we derive in Appendix A.3 the following Taylor approximation of
$\mathrm{Avar}[\hat\Delta_{IV}]$ within the context of the bivariate probit model:
$$\mathrm{Avar}[\hat\Delta_{IV}] \approx \frac{p_Y (1 - p_Y)}{\gamma^2 \left[\phi(\Phi^{-1}(p_T))\right]^2 \mathrm{Var}[Z]}. \tag{14}$$
The asymptotic variance of $\hat\Delta_{IV}$ increases as $p_Y$ approaches $\frac{1}{2}$ and
as $p_T$ moves away from $\frac{1}{2}$. Furthermore, the approximation (14) of
$\mathrm{Avar}[\hat\Delta_{IV}]$ does not depend on $\rho$ at all, and the exact
$\mathrm{Avar}[\hat\Delta_{IV}]$ (13) exhibits very little dependence on $\rho$, as
illustrated in Figure 2, which plots $\mathrm{Avar}[\hat\Delta_{IV}]$ for various parameter
values.
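The two variance expressions can be evaluated directly. The sketch below codes (13) in terms of the conditional probabilities it involves and (14) in terms of $p_T$, $p_Y$, the instrument coefficient and $\mathrm{Var}[Z]$; the function and argument names are illustrative assumptions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def avar_iv_exact(p_y_z1: float, p_y_z0: float,
                  p_t_z1: float, p_t_z0: float, p_z1: float) -> float:
    """Asymptotic variance of the IV estimator as in (13).

    p_y_z1 = Pr[Y=1 | Z=1], p_y_z0 = Pr[Y=1 | Z=0],
    p_t_z1 = Pr[T=1 | Z=1], p_t_z0 = Pr[T=1 | Z=0], p_z1 = Pr[Z=1].
    """
    numerator = (p_y_z1 * (1.0 - p_y_z1) / p_z1
                 + p_y_z0 * (1.0 - p_y_z0) / (1.0 - p_z1))
    return numerator / (p_t_z1 - p_t_z0) ** 2


def avar_iv_approx(p_t: float, p_y: float, gamma: float, var_z: float) -> float:
    """Taylor approximation as in (14)."""
    first_stage = gamma * STD_NORMAL.pdf(STD_NORMAL.inv_cdf(p_t))
    return p_y * (1.0 - p_y) / (first_stage ** 2 * var_z)


if __name__ == "__main__":
    # Instrument coefficient 0.3 and Var[Z] = 0.25, as for a fair binary instrument.
    for p_t in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(p_t, round(avar_iv_approx(p_t, 0.5, 0.3, 0.25), 1))
```

With an instrument coefficient of 0.3, $\mathrm{Var}[Z] = 0.25$ and $p_T = p_Y = 0.5$, the approximation equals $2\pi / 0.09 \approx 69.8$, and it grows as $p_T$ moves toward 0 or 1 because the density term in the denominator shrinks.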
2.3 Asymptotic variance of ML bivariate probit estimators
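Let $\theta$ denote the vector of the parameters $\gamma$, $\delta$, $\alpha_T$, $\alpha_Y$,
and $\rho$ in the bivariate probit model (7). Maximum-likelihood estimates of $\theta$ are
obtained by selecting $\hat\theta$ to maximize the log-likelihood function:
$$\hat\theta = \arg\max_{\theta} \sum_{i=1}^{n} \log L_i(\theta)$$
where
$$L_i(\theta) = \begin{cases}
B(\gamma Z_i + \alpha_T,\; \delta T_i + \alpha_Y;\; \rho), & \text{if } T_i = 1 \text{ and } Y_i = 1; \\
B(\gamma Z_i + \alpha_T,\; -(\delta T_i + \alpha_Y);\; -\rho), & \text{if } T_i = 1 \text{ and } Y_i = 0; \\
B(-(\gamma Z_i + \alpha_T),\; \delta T_i + \alpha_Y;\; -\rho), & \text{if } T_i = 0 \text{ and } Y_i = 1; \\
B(-(\gamma Z_i + \alpha_T),\; -(\delta T_i + \alpha_Y);\; \rho), & \text{if } T_i = 0 \text{ and } Y_i = 0.
\end{cases} \tag{15}$$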
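The four cases in (15) can be collapsed into a single expression by coding each binary variable as a sign. The sketch below does this for a generic bivariate normal distribution function supplied by the caller; the parameter ordering in `theta`, the helper `independent_cdf`, and the function names are illustrative assumptions, and the helper is only valid when the correlation is zero.

```python
from math import log
from statistics import NormalDist
from typing import Callable, Iterable, Sequence, Tuple

STD_NORMAL = NormalDist()

# (z, t, y): instrument, treatment and outcome of one observation.
Observation = Tuple[int, int, int]


def bp_log_likelihood(theta: Sequence[float],
                      data: Iterable[Observation],
                      bvn_cdf: Callable[[float, float, float], float]) -> float:
    """Sum of log L_i(theta) as in (15).

    theta = (gamma, delta, alpha_t, alpha_y, rho); bvn_cdf(a, b, r) must return
    Pr[U <= a, V <= b] for a standard bivariate normal with correlation r.
    """
    gamma, delta, alpha_t, alpha_y, rho = theta
    total = 0.0
    for z, t, y in data:
        sign_t = 2 * t - 1  # +1 if treated, -1 otherwise
        sign_y = 2 * y - 1  # +1 if outcome is one, -1 otherwise
        index_t = gamma * z + alpha_t
        index_y = delta * t + alpha_y
        # The correlation flips sign exactly when t and y differ, as in (15).
        cell_probability = bvn_cdf(sign_t * index_t, sign_y * index_y,
                                   sign_t * sign_y * rho)
        total += log(cell_probability)
    return total


def independent_cdf(a: float, b: float, rho: float) -> float:
    """Stand-in for B that is correct only when rho == 0 (independent errors)."""
    return STD_NORMAL.cdf(a) * STD_NORMAL.cdf(b)
```

Passing the distribution function as an argument keeps the likelihood separate from the numerical method used for $B$; in practice one would plug in a routine that handles nonzero correlation and hand the negative of this function to a numerical optimizer.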
Once we have estimates of the parameters, we can estimate most types of treatment effects,
because they are functions of $\theta$, by substituting the estimated parameters into the
expressions (8) for the ATE, (10) for the ATT, and (11) for the LATE. We denote the respective
ML estimators as $\hat\Delta^{BP}_{ATE}$, $\hat\Delta^{BP}_{ATT}$, and
$\hat\Delta^{BP}_{LATE}$. Hence, if the bivariate probit model (7) is correctly specified,
maximum likelihood can be used to consistently estimate the ATE, ATT, or LATE, whereas the
linear IV estimator $\hat\Delta_{IV}$ only consistently estimates the LATE (5).
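The asymptotic variance of the ML estimator $\hat\theta$ of $\theta$ is given by
$\mathrm{Avar}[\hat\theta] = I(\theta)^{-1}$, the inverse information matrix evaluated at the
true $\theta$. There are two common ways to calculate $I(\theta)$:
$$I_1(\theta) = E\left[\left(\frac{\partial}{\partial\theta} \log L_i(\theta)\right) \left(\frac{\partial}{\partial\theta} \log L_i(\theta)\right)'\right] \tag{16}$$
and
$$I_2(\theta) = -E\left[\frac{\partial^2}{\partial\theta\, \partial\theta'} \log L_i(\theta)\right]. \tag{17}$$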
Using the delta method, we compute the asymptotic variance of any continuously differentiable
function $f$ of $\theta$ as
$$f'(\theta)'\, \mathrm{Avar}[\hat\theta]\, f'(\theta). \tag{18}$$
Since the ATE, ATT, and LATE are all functions of $\theta$, we can compute the asymptotic
variance of $\hat\Delta^{BP}_{ATE}$, $\hat\Delta^{BP}_{ATT}$, and $\hat\Delta^{BP}_{LATE}$ in
this fashion. The results for $\hat\Delta^{BP}_{ATE}$ for model (7) with $\gamma = 0.3$,
$\delta = 0.4$, and many different values of $p_T$, $p_Y$, and $\rho$ are shown in Figure 2.
Note that the asymptotic variance of $\hat\Delta^{BP}_{ATE}$ is highly sensitive to $\rho$
when $p_T$ is far from $\frac{1}{2}$.
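The delta-method step in (18) is mechanical once a gradient is available. The following sketch approximates the gradient by central finite differences and forms the quadratic form with a supplied covariance matrix; the use of numerical rather than analytical derivatives, the step size, and the function names are assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def numerical_gradient(f: Callable[[Sequence[float]], float],
                       theta: Sequence[float],
                       step: float = 1e-6) -> List[float]:
    """Central finite-difference approximation to the gradient f'(theta)."""
    gradient = []
    for j in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[j] += step
        down[j] -= step
        gradient.append((f(up) - f(down)) / (2.0 * step))
    return gradient


def delta_method_variance(f: Callable[[Sequence[float]], float],
                          theta: Sequence[float],
                          avar_theta: Sequence[Sequence[float]],
                          step: float = 1e-6) -> float:
    """Quadratic form f'(theta)' Avar[theta_hat] f'(theta) as in (18)."""
    g = numerical_gradient(f, theta, step)
    k = len(g)
    return sum(g[i] * avar_theta[i][j] * g[j] for i in range(k) for j in range(k))
```

For a linear function such as `f = lambda th: th[0] + 2 * th[1]` with covariance matrix `[[1.0, 0.5], [0.5, 2.0]]`, the result is 1 + 2 + 8 = 11. To apply this to the ATE, `f` would map the parameter vector to expression (8), and `avar_theta` would be an estimate of the inverse information matrix.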
2.4 Comparing the asymptotic variances
Because linear IV only consistently estimates the LATE, the asymptotic variances of linear IV
and maximum-likelihood BP are compared most fairly for estimation of the LATE. When the BP
model (7) is correctly specified, maximum-likelihood BP is asymptotically efficient for the
LATE since it is asymptotically efficient for any smooth function of the parameters $\theta$.
Using the formulas (13) and (18), we compared the asymptotic variances of $\hat\Delta_{IV}$
and $\hat\Delta^{BP}_{LATE}$ for the LATE in model (7) with $\gamma = 0.3$, $\delta = 0.4$,
and across many different values of $p_T$, $p_Y$, and $\rho$. The asymptotic variance of
$\hat\Delta^{BP}_{LATE}$ is always lower than that of $\hat\Delta_{IV}$, and on average, the
variance of $\hat\Delta_{IV}$ is 28 percent higher than that of $\hat\Delta^{BP}_{LATE}$;
equivalently, its standard deviation is 13 percent higher. The efficiency gain from using
$\hat\Delta^{BP}_{LATE}$ instead of $\hat\Delta_{IV}$ is far greater when a covariate is
included in the model. In our models with a continuous covariate, on average the standard
deviation of $\hat\Delta_{IV}$ is 150 percent higher than that of $\hat\Delta^{BP}_{LATE}$.
Angrist (1991) found that despite the efficiency of BP, the variance of
$\hat\Delta^{BP}_{ATE}$ sometimes exceeds the variance of $\hat\Delta_{IV}$. Angrist's results
follow from the asymptotics, as shown in Figure 2, where for certain values of $p_T$, $p_Y$,
and $\rho$, the asymptotic variance of $\hat\Delta^{BP}_{ATE}$ is higher than that of
$\hat\Delta_{IV}$. The key observation is that $\hat\Delta^{BP}_{ATE}$ and $\hat\Delta_{IV}$
are estimating two different things, ATE and LATE respectively, and depending on parameters
one can be estimated more precisely than the other. If we are interested in minimizing
mean-square error for estimating the ATE, then $\hat\Delta^{BP}_{ATE}$ will always be the
better choice (when the BP model is correct) except in rare cases in which $\hat\Delta_{IV}$
has lower asymptotic variance than $\hat\Delta^{BP}_{ATE}$ and the LATE happens to be close to
the ATE.
3 Monte-Carlo simulations
To examine the properties of the BP and IV estimators in finite samples and with
misspecifications, we conducted Monte-Carlo simulations across a range of parameter values.
These parameter values represent a wider selection compared to those used in previous work by
Angrist (1991) and Bhattacharya, Goldman, and McCaffrey (2006), and prove useful in
understanding the performance of these estimators in practical applications. The wider range
of parameters considered here qualitatively affects the nature of the recommendation. For
instance, we find that for some combinations of $p_T$ and $p_Y$, deviations from normality in
the BP model result in significant bias, in contrast to the results of Bhattacharya, Goldman,
and McCaffrey (2006) over more limited simulations. Also, Angrist's (1991) finding of
near-efficiency of IV disappears when we add an exogenous covariate to the model. Table 1
compares the parameter ranges used in the different papers.
                        Our simulations    Angrist (1991)    Bhattacharya et al. (2006)
$p_T$                   0.1–0.9            0.2–0.5           0.5
$p_Y$                   0.1–0.9            0.5–0.9           0.0–0.7
$\rho$                  0.0–0.7            0.5               0.1–0.5
$\Delta_{ATE}$          0.05–0.16          0.10              0.00–0.42
$n$                     400–30,000         400–800           5000
Number of covariates    0 or 1             0                 1

Table 1: Ranges of parameter values used in various studies.
3.1 Data-generating processes
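Our data-generating processes (DGPs) are all based on the following latent-index model:
$$\begin{aligned}
T_i^* &= \gamma Z_i + \beta_T X_i + \alpha_T + \varepsilon_{Ti} \\
T_i &= 1\{T_i^* > 0\} \\
Y_i^* &= \delta T_i + \beta_Y X_i + \alpha_Y + \varepsilon_{Yi} \\
Y_i &= 1\{Y_i^* > 0\}
\end{aligned} \tag{19}$$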
where $T_i^*$ and $Y_i^*$ are latent continuous variables; $\gamma$, $\delta$, $\beta_T$,
$\beta_Y$, $\alpha_T$, and $\alpha_Y$ are parameters; $X_i$ is an exogenous covariate; and
$Z_i$ is an instrumental dummy variable that is zero with probability $\frac{1}{2}$ and one
with probability $\frac{1}{2}$.
Our DGPs are designed to mimic typical situations encountered in applied econometric
applications. The values of the coefficients in the system (19) are chosen such that the true
ATE is positive and falls in the range from 0.05 to 0.16 depending on the model specification.
In all of the DGPs, the coefficients in (19) take the following values:
$$\gamma = 0.3, \quad \delta = 0.4, \quad \beta_T = 0.9, \quad \beta_Y = 0.4. \tag{20}$$
We consider two DGPs for $X_i$:
1. In the first DGP, $X_i = 0$ always, and hence we do not estimate $\beta_T$ or $\beta_Y$.
2. In the second DGP, $X_i \sim N(0, 1)$, and $X_i$ is independent of $Z_i$.
The error terms $\varepsilon_{Ti}$ and $\varepsilon_{Yi}$ are always jointly independent of
$(X_i, Z_i)$ and can be generated according to any of six possible processes (four normal
processes, one for each value of $\rho$ in item 1, plus the two non-normal processes in items
2 and 3):
1. $\varepsilon_T$ and $\varepsilon_Y$ are jointly bivariate standard normal with correlation
$\rho$ taking on one of four possible values: 0, 0.3, 0.5, 0.7.
2. Generate $(u_T, u_Y)$ as bivariate normal with correlation 0.32. Then transform
$\varepsilon_T = F^{-1}(\Phi(u_T))$ and $\varepsilon_Y = F^{-1}(\Phi(u_Y))$, where $F$ is the
CDF of a chi-square distribution with 5 degrees of freedom. This results in skewed
distributions for $\varepsilon_T$ and $\varepsilon_Y$, and the bivariate probit model is
misspecified.⁴
3. Generate $(u_T, u_Y)$ as bivariate normal with correlation 0.32. Then transform
$\varepsilon_T = F^{-1}(\Phi(u_T))$ and $\varepsilon_Y = F^{-1}(\Phi(u_Y))$, where $F$ is the
CDF of a $t$ distribution with 4 degrees of freedom. This results in distributions for
$\varepsilon_T$ and $\varepsilon_Y$ with high kurtosis, and the bivariate probit model is
misspecified.
Furthermore, we also consider many values of the constants $\alpha_T$ and $\alpha_Y$. They are
chosen so that $p_T = \Pr[T = 1]$ and $p_Y = \Pr[Y = 1]$ each range separately over
$\{0.1, 0.3, 0.5, 0.7, 0.9\}$. For each of the 300 combinations of possible DGPs for $X_i$,
DGPs for $(\varepsilon_T, \varepsilon_Y)$, and values of $p_T$ and $p_Y$ specified above, we
conduct Monte-Carlo simulations on samples of 400, 800, 1K, 2K, 3K, 5K, 8K, 10K, 15K, 20K, and
30K observations. We run 1000 simulations for each sample size. In each simulation we compute
the IV estimate of the LATE and the maximum-likelihood BP estimates of the ATE. Greene (1998)
observed that the endogeneity of $T_i$ does not affect the form of the BP likelihood function
(15), and hence BP estimates can be obtained directly from the bivariate probit routine
available in many statistical software packages.
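A minimal sketch of one draw from the latent-index DGP (19) with bivariate normal errors (process 1) is given below, using only Python's standard library. The default coefficient values follow (20) as printed; the constants `alpha_t` and `alpha_y` are left as free arguments because the values that deliver a given $p_T$ and $p_Y$ are not listed in the text, and the function name, argument names, and seed handling are assumptions. The non-normal processes 2 and 3 are not covered here.

```python
import random
from math import sqrt
from typing import List, Tuple

# One simulated observation: (z, x, t, y).
Row = Tuple[int, float, int, int]


def simulate_dgp(n: int,
                 gamma: float = 0.3, delta: float = 0.4,
                 beta_t: float = 0.9, beta_y: float = 0.4,
                 alpha_t: float = 0.0, alpha_y: float = 0.0,
                 rho: float = 0.3,
                 with_covariate: bool = True,
                 seed: int = 0) -> List[Row]:
    """Draw n observations from model (19) with bivariate normal errors."""
    rng = random.Random(seed)
    rows: List[Row] = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0               # fair binary instrument
        x = rng.gauss(0.0, 1.0) if with_covariate else 0.0
        e_t = rng.gauss(0.0, 1.0)
        # Build e_y so that corr(e_t, e_y) = rho and Var[e_y] = 1.
        e_y = rho * e_t + sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        t = 1 if gamma * z + beta_t * x + alpha_t + e_t > 0 else 0
        y = 1 if delta * t + beta_y * x + alpha_y + e_y > 0 else 0
        rows.append((z, x, t, y))
    return rows
```

A Monte-Carlo study along the lines described above would call such a function repeatedly for each sample size, apply both estimators to every simulated sample, and summarize the resulting estimates.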
In the simulations with nonzero covariates $X_i$, the ATE for the bivariate probit model is
estimated as
$$\hat\Delta^{BP}_{ATE} = \frac{1}{n} \sum_{i=1}^{n} \left[\Phi(\hat\delta + \hat\beta_Y X_i + \hat\alpha_Y) - \Phi(\hat\beta_Y X_i + \hat\alpha_Y)\right].$$
The true ATE and LATE always lie in the interval $[-1, 1]$. While $\hat\Delta^{BP}_{ATE}$ will
always fall in that interval, $\hat\Delta_{IV}$ is sometimes outside this interval, especially
when the sample size is small.
⁴ The correlation of 0.32 for $(u_T, u_Y)$ was chosen so that the correlation of the
transformed $(\varepsilon_T, \varepsilon_Y)$ is approximately 0.30, allowing for comparison to
the bivariate normal simulations with $\rho = 0.30$.
3.2 Results
Our simulation results are presented in Figures 3 through 8. The first three share a common
layout: in every sub-figure, we plot the true $\Delta_{ATE}$ (the dotted line), the mean of
$\hat\Delta^{BP}_{ATE}$ (the thick solid curve) and the mean of $\hat\Delta_{IV}$ (the thick
dashed curve) against sample sizes between 400 and 30,000. We also show the range between the
5th and the 95th percentiles of $\hat\Delta^{BP}_{ATE}$ and $\hat\Delta_{IV}$. There are 9
sub-figures showing the behavior of the BP and IV estimators for different parameter values of
$p_T$ and $p_Y$, and in every figure we fix $\rho = 0.3$. Figure 3 presents simulations in
estimations with no covariates, Figure 4 with covariates, and Figure 5 examines departures
from the BP model assumptions. In Appendix A.5, we provide tables of the root-mean-square
error of $\hat\Delta^{BP}_{ATE}$ and $\hat\Delta_{IV}$ for estimating $\Delta_{ATE}$ over a
wider range of parameter values, which researchers can use with reference to the structure of
their own particular problem.
There are several noteworthy features. First, when there are no covariates (Figure 3) the
simulations match the asymptotics (Figures 1 and 2) fairly well in sample sizes larger than
about 5,000. In sample sizes smaller than 5,000, $\hat\Delta^{BP}_{ATE}$ has lower variance
than predicted by the asymptotics, because of mechanical bounds on the estimator
($\hat\Delta^{BP}_{ATE} \in [-1, 1]$). Second, $\hat\Delta^{BP}_{ATE}$ can be biased in small
samples, as often happens for maximum-likelihood estimators. Even when sample sizes are large,
$\hat\Delta^{BP}_{ATE}$ can be biased under particular extreme combinations of $p_T$ and
$p_Y$; in our simulations, two particularly dramatic examples are $(p_T = 0.9, p_Y = 0.1)$ and
$(p_T = 0.1, p_Y = 0.9)$.⁵ Third, due to its relatively lower small-sample variance,
$\hat\Delta^{BP}_{ATE}$ generally performs better than $\hat\Delta_{IV}$ in terms of RMSE for
sample sizes smaller than about 5,000. For larger sample sizes, the efficiency of
$\hat\Delta^{BP}_{ATE}$ relative to $\hat\Delta_{IV}$ is somewhat reduced, and for extreme
combinations of parameter values, $\hat\Delta_{IV}$ can be the better estimator in terms of
RMSE.
⁵ Firth (1993) describes several techniques for removing the first-order bias from
maximum-likelihood estimates. We simulated both asymptotic first-order bias removal and
bootstrap bias removal for the BP estimator but found that both techniques perform rather
poorly, especially when $|\rho|$ is close to 1, since lower-order expansions poorly
approximate the finite-sample bias.
Figure 4 shows that once we include covariates $X$ in the BP model (19),
$\hat\Delta^{BP}_{ATE}$ has much lower variance and outperforms $\hat\Delta_{IV}$ across all
of our simulations in terms of RMSE for $\Delta_{ATE}$. Indeed, in most cases, the IV standard
errors are too large for meaningful hypothesis testing, a problem that is particularly severe
when $p_T$ is close to 0 or 1. These simulations highlight that the use of linear IV
estimators with covariates can lead to extremely high standard errors and dramatic differences
in $\hat\Delta^{BP}_{ATE}$ relative to $\hat\Delta_{IV}$.
An overarching theme thus far is that the BP estimators are generally more efficient than
linear IV, especially when the model specification includes additional covariates. However,
the gain in efficiency may be outweighed by the severe bias when the BP model is misspecified.
Figure 5 examines departures from the BP model assumptions in the case with covariates.⁶ In
this case, $\hat\Delta^{BP}_{ATE}$ continues to have low variance but can be severely biased
in some cases, with no clear guidance on the parameter values under which the expected bias
will be worse. The evidence of bias presented here contrasts with the results of Bhattacharya,
Goldman, and McCaffrey (2006), who suggest that BP is slightly more robust to non-normality
than IV. As Figure 5 clarifies, Bhattacharya, Goldman, and McCaffrey's result is a direct
consequence of their choice of parameters. In their simulations with non-normal errors, $p_T$
is fixed at 0.5, $p_Y$ ranges between 0.5 and 0.7, and $\rho$ is 0.5. Our results in Figure 5
suggest that these happen to be values of $p_T$ and $p_Y$ for which $\hat\Delta^{BP}_{ATE}$
performs fairly well even when the BP model assumptions are violated.
⁶ In the case without covariates, the results for simulations with skewness or excess kurtosis
of the error terms are similar to those for the correctly specified model because the BP
estimates are still consistent, despite the misspecification. This is because with no
covariates our misspecified DGP is observationally equivalent to a correctly specified
bivariate probit model. Recall that $(u_T, u_Y)$ are generated as bivariate normal with
correlation $\rho$, and then $\varepsilon_T = f(u_T)$ and $\varepsilon_Y = f(u_Y)$ for some
monotone function $f$. Let $\tilde\alpha_T = f^{-1}(\alpha_T)$,
$\tilde\gamma = f^{-1}(\alpha_T + \gamma) - \tilde\alpha_T$,
$\tilde\alpha_Y = f^{-1}(\alpha_Y)$, and
$\tilde\delta = f^{-1}(\alpha_Y + \delta) - \tilde\alpha_Y$. Then a correctly specified
bivariate probit model with coefficients $\tilde\alpha_T$, $\tilde\gamma$, $\tilde\alpha_Y$,
$\tilde\delta$ produces the same distribution of observables as our DGP, and the values of all
treatment effects are the same in both models. It would have been possible for the BP
estimators to be inconsistent if we had modified the joint distribution of
$(\varepsilon_T, \varepsilon_Y)$ rather than modifying the marginal distributions
individually. With a nonzero covariate $X_i$, the assumption of normality will actually be
restrictive because the transformation $f^{-1}$ will be applied at more than two points and
hence will no longer preserve linearity.
3.3 Coverage of confidence intervals
Our final simulation results examine the validity of confidence intervals generated by the
various methods and the performance of goodness-of-fit tests for the BP model, which can help
detect potential misspecification in the BP model. Figure 6 compares the nominal 95%
confidence intervals based on $\hat\Delta_{IV}$ and $\hat\Delta^{BP}_{ATE}$ in terms of
coverage of $\Delta_{ATE}$. The standard error used to construct the confidence intervals for
$\hat\Delta_{IV}$ is obtained using the sample analogue of the asymptotic variance (13).
Results are shown in Figure 6 for a correctly specified model with $X_i \sim N(0, 1)$ and
$\rho = 0.3$. As shown in the figure, the IV coverage tends to be too high (greater than 95%)
for small samples but slowly deteriorates towards zero as the sample size increases and
$\hat\Delta_{IV}$ converges to $\Delta_{LATE}$ rather than $\Delta_{ATE}$ (the dashed curve in
the figure).
The most common way to compute standard errors for the BP parameters is by estimating the
information matrix using the sample analogue of $I_2(\hat\theta)$ (17). We would then apply
the delta method as in (18) to obtain standard errors for $\hat\Delta^{BP}_{ATE}$. BP
confidence intervals for $\Delta_{ATE}$ computed in this way display significantly lower
coverage than the nominal 95% for sample sizes below 5,000, even when the model is correctly
specified, but coverage improves toward 95% in samples larger than 10,000 observations (the
solid curve in the figure). Further investigation reveals that this undercoverage occurs
because standard errors for the BP parameters are too small, and additional undercoverage is
introduced in the delta-method step.⁷ As alternatives, we tried estimating the information
matrix using the sample analogue of $I_1(\hat\theta)$ (16) and computing standard errors with
the Huber-White sandwich (robust) estimator
$\hat I_2(\hat\theta)^{-1} \hat I_1(\hat\theta) \hat I_2(\hat\theta)^{-1}$. These methods
result in similar undercoverage of $\Delta_{ATE}$.
Fortunately, bootstrapped confidence intervals appear to provide a simple fix for over- and
undercoverage in both the IV and BP models. In the bootstrap, we draw with replacement
n observations from the data and estimate $\hat\Delta^{BP}_{ATE}$ and $\hat\Delta_{IV}$ using the new sample. This is
repeated many times, and the size-$\alpha$ confidence interval for $\Delta_{ATE}$ or $\Delta_{LATE}$ is reported as
the interval between the $\alpha/2$ and $1 - \alpha/2$ quantiles of the simulated draws of $\hat\Delta^{BP}_{ATE}$ and $\hat\Delta_{IV}$.[8]
By bootstrapping the entire procedure of calculating $\hat\Delta^{BP}_{ATE}$, we avoid using the delta method.
Because we ran thousands of simulations, we used 39 bootstrap replications in each of our
simulations to save time (each bootstrap replication took about 15 seconds at n = 30,000),
but we recommend at least 199 bootstrap replications in practice to reduce sampling noise. In
addition, we simulated bootstrap results for only two sample sizes (n = 400 and n = 3,000)
given the processing time involved.

[7] Monfardini and Radice (2008) report a similar result that t-tests based on maximum-likelihood estimation
of the BP model systematically overreject the hypothesis $\rho = 0$. Also, as noted by Freedman and Sekhon
(2010), part of the difficulty may be caused by numerical issues with Stata's implementation of likelihood
maximization, as often the likelihood function is very flat and the algorithm fails to find the global maximum.
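
The percentile bootstrap described in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: it uses the simple Wald form of the linear IV estimator (binary instrument, no covariates), data simulated from a hypothetical data-generating process with made-up coefficient values, and helper names (`wald_iv`, `percentile_ci`) that are ours.

```python
import random
from statistics import mean

def wald_iv(rows):
    """Linear IV (Wald) estimate for rows of (y, t, z); binary z, no covariates."""
    y1 = mean(y for y, t, z in rows if z == 1)
    y0 = mean(y for y, t, z in rows if z == 0)
    t1 = mean(t for y, t, z in rows if z == 1)
    t0 = mean(t for y, t, z in rows if z == 0)
    return (y1 - y0) / (t1 - t0)

def percentile_ci(rows, estimator, reps=199, alpha=0.05, seed=0):
    """Resample n rows with replacement, re-estimate, and return the
    alpha/2 and 1 - alpha/2 quantiles of the bootstrap draws."""
    rng = random.Random(seed)
    n = len(rows)
    draws = sorted(estimator(rng.choices(rows, k=n)) for _ in range(reps))
    # With reps = 199 and alpha = 0.05 these are the 5th and 195th order statistics.
    k_lo = max(round(alpha / 2 * (reps + 1)), 1)
    k_hi = min(round((1 - alpha / 2) * (reps + 1)), reps)
    return draws[k_lo - 1], draws[k_hi - 1]

# Hypothetical simulated data: coefficient values are illustrative only.
rng = random.Random(1)
rho = 0.3
rows = []
for _ in range(2000):
    z = rng.randint(0, 1)
    u, v = rng.gauss(0, 1), rng.gauss(0, 1)
    e_t = u
    e_y = rho * u + (1 - rho**2) ** 0.5 * v  # corr(e_t, e_y) = rho
    t = 1 if 0.8 * z + e_t > 0 else 0
    y = 1 if 0.4 * t + e_y > 0 else 0
    rows.append((y, t, z))

lo, hi = percentile_ci(rows, wald_iv)
print(f"IV estimate {wald_iv(rows):.3f}, 95% percentile CI [{lo:.3f}, {hi:.3f}]")
```

Passing the whole estimator into `percentile_ci` mirrors the point made in the text: the entire estimation procedure is re-run on each resample, so no delta-method step is needed. The same function would work with a BP-based estimator in place of `wald_iv`.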
The coverage rates of the bootstrapped BP confidence intervals for $\Delta_{ATE}$ are close to
the nominal $1 - \alpha$, as shown in Figures 6 and 7. The only exceptions are in small samples
in the extreme cases ($p_T = 0.1$, $p_Y = 0.9$) and ($p_T = 0.9$, $p_Y = 0.1$), which have been shown
in Figure 3 to be particularly problematic for BP. We therefore strongly recommend using
bootstrapped confidence intervals for BP, whether one is estimating treatment effects or just
the BP coefficients.[9] In addition, Figures 6 and 7 show that bootstrapping also reduces
the overcoverage of IV confidence intervals that we saw in small samples, although it does
not prevent undercoverage of IV in large samples because $\hat\Delta_{IV}$ is generally inconsistent for
$\Delta_{ATE}$.
3.4 Goodness-of-fit tests for bivariate probit
Figure 8 presents results of goodness-of-fit tests for the bivariate probit model. We compare
the ability of two different goodness-of-fit tests to detect our non-normal data-generating
processes. Our first test is an adaptation of the Hosmer and Lemeshow (1980) test to the
bivariate probit model. This test divides the observations into subgroups and checks whether
the frequencies of observed $(y_i, t_i)$ match predicted frequencies given $\hat\theta$ and the distribution
of $X_i$ and $Z_i$ in each subgroup. The details of our adaptation of the Hosmer-Lemeshow
test are given in Appendix A.4. The second goodness-of-fit test we use is a Rao score test
developed by Murphy (2007).[10] This test embeds the bivariate normal distribution within
a larger family of distributions by adding more parameters to the model and checks whether
the additional parameters are all zero using the score for the additional parameters at the
BP estimate.[11] We set both tests to reject at a 5% significance level using asymptotic
chi-square critical values.[12] In our simulations with a bivariate normal data-generating process,
both tests reject about 5% of the time, as expected. The score test performs much better
than the Hosmer-Lemeshow test in detecting our non-normal data-generating processes, as
shown in Figure 8. The results of this comparison of the two tests agree with those of
Chiburis (2010) from simulations without an endogenous regressor.

[8] Our simulations indicate that these quantile-based confidence intervals perform better than
bootstrapping standard errors and then using a normal approximation to obtain confidence intervals.
[9] When the BP model is misspecified, the coverage rates of BP confidence intervals are severely affected by
the misspecification when there are covariates $X_i$. The misspecification has a lesser impact on IV coverage
rates, since IV standard errors tend to be larger and IV is generally not consistent even in the BP model.
However, for the same reason, tests based on $\hat\Delta_{IV}$ are generally less powerful than tests based on $\hat\Delta^{BP}_{ATE}$.
4 Conclusion
We have derived asymptotic results and presented simulations comparing bivariate probit
and linear IV estimators of the average treatment effect of a binary treatment on a binary
outcome. Our simulation results provide practical guidance on the choice of specification
in problems with different parameter values and with or without covariates, and can help
explain why results differ widely depending on the specification chosen. Our findings can be
summarized as four main messages for practical applications in empirical models with
binary regressors and binary outcome variables:

• Researchers should expect IV and BP coefficients to differ substantially when treatment
  probabilities are low or when sample sizes are below 5,000. Linear IV estimates are
  particularly uninformative for hypothesis testing when treatment probabilities are low,
  a problem that is accentuated when there are covariates in the model. Tables 2-9 in
  Appendix A.5 provide the ATE, the ratio of LATE to ATE, and the root-mean-square
  error for different values of $p_T$ and $p_Y$ and for different sample sizes. These tables can
  be used as a guide for practical applications. One recommendation is to present both
  linear IV and BP estimates when there are covariates in the model, and for the ranges
  of $p_T$ and $p_Y$ where IV confidence intervals are large.

• The difference between IV and BP estimates could also reflect differences between the
  LATE and ATE estimates recovered by the linear IV and BP procedures respectively.
  Again, the tables in Appendix A.5 as well as our asymptotic ratio approximation provide
  a guide for the variance in these estimates.

• Confidence intervals recovered through bootstrapping are a must in these models when
  sample sizes are below 10,000 and should be preferred to analytical standard errors for
  all applications.

• As is well known, researchers should be aware that for a broad range of parameter
  values, misspecification of the BP model can lead to severe bias in BP estimates.
  This problem, however, does not arise in models with no covariates. In models with
  covariates, Murphy's goodness-of-fit score test (Murphy 2007, Chiburis 2010) can help
  detect misspecifications of the BP model.

[10] See Chiburis (2010) for corrections to several errors in Murphy (2007) and an alternative derivation of
the test.
[11] Since $T_i$ is endogenous, predicted probabilities of $(y_i, t_i)$ used to calculate the test statistic are computed
conditional on $X_i$ and $Z_i$ but not $T_i$.
[12] Murphy (2007) recommends bootstrapping the critical value of his test, but we find that the asymptotic
critical values work well enough even at small sample sizes that the time-consuming bootstrap is not necessary.
A Appendix
A.1 Stata commands
In this appendix we describe how to run our recommended BP and IV procedures for
calculating treatment effects in Stata. The Stata commands biprobittreat, scoregof, and
bphltest are available for download at
https://webspace.utexas.edu/rcc485/www/code.html
Suppose we have a dataset with binary outcome Y, binary treatment T, instrument Z,
and covariates X1, X2.
1. To compute $\hat\Delta_{IV}$ along with bootstrapped confidence intervals, type:
    ivregress 2sls Y X1 X2 (T=Z), vce(bootstrap, reps(199))
    estat bootstrap, percentile
2. To compute $\hat\Delta^{BP}_{ATE}$ and $\hat\Delta^{BP}_{ATT}$ along with bootstrapped confidence intervals, type:
    bootstrap _b ate=r(ate) att=r(att), reps(199): biprobittreat (Y = T X1 X2) (T = Z X1 X2)
    estat bootstrap, percentile
3. To run the Murphy score and Hosmer-Lemeshow goodness-of-fit tests, type:
    biprobit (Y = T X1 X2) (T = Z X1 X2)
    scoregof
    bphltest
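A.2 Derivation of $\Delta_{LATE}/\Delta_{ATE}$ approximation (12)
Using (8) and (11), we compute a first-order Taylor approximation of $\Delta_{LATE}/\Delta_{ATE}$ about
$\gamma, \delta, \rho = 0$. Although the ratio is undefined for $\gamma = 0$ or $\delta = 0$, the limit
$\lim_{\gamma,\delta,\rho \to 0} \Delta_{LATE}/\Delta_{ATE}$ exists and is equal to 1, so we can still compute the Taylor
expansion around this point. The terms involving the derivatives with respect to $\gamma$ and $\delta$ are
zero because $\Delta_{LATE}/\Delta_{ATE} = 1$ when $\rho = 0$, regardless of $\gamma$ and $\delta$. This leaves us with

$$
\frac{\Delta_{LATE}}{\Delta_{ATE}} \approx 1 + \rho \lim_{\gamma,\delta \to 0}
\frac{e^{-\frac{(\alpha_T+\gamma)^2 + (\alpha_Y+\delta)^2}{2}}
    - e^{-\frac{(\alpha_T+\gamma)^2 + \alpha_Y^2}{2}}
    - e^{-\frac{\alpha_T^2 + (\alpha_Y+\delta)^2}{2}}
    + e^{-\frac{\alpha_T^2 + \alpha_Y^2}{2}}}
     {2\pi \left(\Phi(\alpha_T+\gamma) - \Phi(\alpha_T)\right)\left(\Phi(\alpha_Y+\delta) - \Phi(\alpha_Y)\right)}.
$$

The limit is indeterminate, so we apply L'Hôpital's rule twice to obtain

$$
\frac{\Delta_{LATE}}{\Delta_{ATE}} \approx 1 + \rho\,\alpha_T\,\alpha_Y.
$$

In order to write this in terms of $p_T$ and $p_Y$, at $\gamma = \delta = 0$ we approximate $\alpha_T \approx \Phi^{-1}(p_T)$
and $\alpha_Y \approx \Phi^{-1}(p_Y)$, yielding

$$
\frac{\Delta_{LATE}}{\Delta_{ATE}} \approx 1 + \rho\,\Phi^{-1}(p_T)\,\Phi^{-1}(p_Y).
$$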
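
As a quick numerical illustration of the approximation derived in A.2, the following Python sketch evaluates $1 + \rho\,\Phi^{-1}(p_T)\,\Phi^{-1}(p_Y)$ at a few parameter values. The function name `late_ate_ratio_approx` is ours, and the choice $\rho = 0.3$ is only an example.

```python
from statistics import NormalDist

def late_ate_ratio_approx(rho, p_t, p_y):
    """First-order approximation 1 + rho * Phi^{-1}(p_T) * Phi^{-1}(p_Y)."""
    inv_cdf = NormalDist().inv_cdf  # standard normal quantile function
    return 1 + rho * inv_cdf(p_t) * inv_cdf(p_y)

for p_t, p_y in [(0.1, 0.1), (0.1, 0.9), (0.5, 0.5)]:
    print(p_t, p_y, round(late_ate_ratio_approx(0.3, p_t, p_y), 2))
# prints 1.49, 0.51 and 1.0 for the three parameter pairs
```

For comparison, the LATE/ATE column of Table 3 reports 1.47, 0.47, and 1.05 at the same three points, so the first-order approximation tracks the tabulated ratios closely but not exactly.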
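A.3 Derivation of $\mathrm{Avar}[\hat\Delta_{IV}]$ approximation (14)
We can write (13) as

$$
\mathrm{Avar}[\hat\Delta_{IV}] =
\frac{\dfrac{\Pr[Y=1 \mid Z=1]\,(1 - \Pr[Y=1 \mid Z=1])}{\Pr[Z=1]}
    + \dfrac{\Pr[Y=1 \mid Z=0]\,(1 - \Pr[Y=1 \mid Z=0])}{\Pr[Z=0]}}
     {\tilde\gamma^2},
$$

where

$$
\tilde\gamma = \Phi(\alpha_T + \gamma) - \Phi(\alpha_T)
$$

is the probability limit of the coefficient on Z in the first stage of the IV regression. A
Taylor approximation of $\tilde\gamma$ in terms of $p_T$ is

$$
\tilde\gamma \approx \gamma\,\phi(\Phi^{-1}(p_T)),
$$

since $\Phi^{-1}(p_T)$ is between $\alpha_T$ and $\alpha_T + \gamma$. Furthermore, for reasonable values of the treatment
effect we can use

$$
E[Y \mid Z=1]\,(1 - E[Y \mid Z=1]) \approx E[Y \mid Z=0]\,(1 - E[Y \mid Z=0])
$$

to approximate

$$
\mathrm{Avar}[\hat\Delta_{IV}] \approx \frac{p_Y (1 - p_Y)}{\tilde\gamma^2 \Pr[Z=1]\,(1 - \Pr[Z=1])}
= \frac{\mathrm{Var}[Y]}{\tilde\gamma^2\,\mathrm{Var}[Z]}
\approx \frac{p_Y (1 - p_Y)}{\gamma^2 \left[\phi(\Phi^{-1}(p_T))\right]^2 \mathrm{Var}[Z]}.
$$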
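
The final approximation in A.3 is easy to evaluate numerically. The Python sketch below does so under illustrative assumptions: the instrument coefficient (0.3) and Var[Z] (0.25, as for a balanced binary instrument) are example inputs, not values taken from the paper, and the function name `avar_iv_approx` is ours.

```python
from statistics import NormalDist

def avar_iv_approx(p_t, p_y, gamma, var_z):
    """p_Y (1 - p_Y) / (gamma^2 * phi(Phi^{-1}(p_T))^2 * Var[Z])."""
    std_normal = NormalDist()
    density = std_normal.pdf(std_normal.inv_cdf(p_t))  # phi(Phi^{-1}(p_T))
    return p_y * (1 - p_y) / (gamma**2 * density**2 * var_z)

# Illustrative inputs only: gamma = 0.3 and a balanced binary instrument.
for p_t in (0.1, 0.5, 0.9):
    avar = avar_iv_approx(p_t, p_y=0.5, gamma=0.3, var_z=0.25)
    print(f"p_T = {p_t}: Avar ~ {avar:.1f}, variance at n = 1,000 ~ {avar / 1000:.3f}")
```

Because the normal density is smallest in the tails, the approximation grows quickly as $p_T$ moves away from 0.5, which is consistent with the large IV root-mean-square errors reported for $p_T = 0.1$ and $p_T = 0.9$ in the tables of Appendix A.5.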
A.4 Adapted Hosmer-Lemeshow goodness-of-fit test for bivariate probit
The Hosmer and Lemeshow (1980) test statistic was developed to correct a problem with
the simple Pearson test statistic. To compute the Pearson test statistic in the bivariate
probit model with an endogenous regressor, we create cells for each unique value of $(x, z)$ in
the data and sort the observations into those cells. For each cell $c = 1, \dots, C$, let $O_{cyt}$ be
the number of observations in cell c with Y = y and T = t, and let $E_{cyt}$ be the expected
number of observations in cell c with Y = y and T = t according to the BP model. It is
computed as $E_{cyt} = \sum_{i \in \text{cell } c} \hat\pi_{iyt}$, where $\hat\pi_{iyt}$ is the predicted probability of $(Y, T) = (y, t)$
given $(X, Z) = (x_i, z_i)$, evaluated using the BP model at the estimated parameters $\hat\theta$. The
Pearson test statistic is

$$
X^2 = \sum_{c=1}^{C} \sum_{y=0}^{1} \sum_{t=0}^{1} \frac{(O_{cyt} - E_{cyt})^2}{E_{cyt}}.
$$

When X and Z are discrete and the number of unique cells $(x, z)$ is small relative to n, $X^2$
is approximately distributed as chi-square with $3C - \dim(\theta)$ degrees of freedom under the
null hypothesis that the true model is BP (Osius and Rojek 1992). We recommend the use
of the Pearson test statistic in such cases. However, when there are many unique values of
$(x, z)$ in the data, as is the case when X or Z is continuously distributed, Osius and Rojek
(1992) show that this asymptotic approximation for $X^2$ breaks down. They compute a
better asymptotic distribution of $X^2$ for the continuous case.
The method of Hosmer and Lemeshow (1980) and Fagerland, Hosmer, and Bofin (2008),
which was originally developed for logistic models, is another way to modify the Pearson
test for use with continuous X or Z. This test combines the observations into a smaller
number of groups to ensure that the test statistic is well approximated by its asymptotic
distribution.[13] To adapt the Hosmer and Lemeshow (1980) test to the bivariate probit
model, we choose two constants $G_1$ and $G_2$. We first sort the observations into $G_1$ groups of
roughly equal size based on $\Pr[T = 1 \mid \hat\theta, X = x_i, Z = z_i]$. Within each of these groups, we
then sort the observations into $G_2$ subgroups based on $\Pr[Y = 1 \mid \hat\theta, X = x_i, Z = z_i]$. This
results in a total of $G = G_1 G_2$ groups. For each of these groups g, let $O_{gyt}$ be the number of
observations in group g with Y = y and T = t, and let $E_{gyt} = \sum_{i \in \text{group } g} \hat\pi_{iyt}$. The adapted
Hosmer-Lemeshow test statistic is

$$
C = \sum_{g=1}^{G} \sum_{y=0}^{1} \sum_{t=0}^{1} \frac{(O_{gyt} - E_{gyt})^2}{E_{gyt}}.
$$

Under the null hypothesis that BP is the correct model, we expect C to be distributed
approximately chi-square with $3(G - 2)$ degrees of freedom. This distribution was derived
by Fagerland, Hosmer, and Bofin (2008) based on simulations. In our simulations, the
Hosmer-Lemeshow test statistic C is computed with $G_1 = G_2 = 3$.[14]

[13] Pigeon and Heyse (1999) add a small modification to the Hosmer-Lemeshow statistic. Their statistic
has a slightly different asymptotic distribution.
[14] This results in 9 total groups, which is in the range of 8 to 12 groups used by Fagerland, Hosmer, and
Bofin (2008) in their simulations of the analogous test for multinomial logistic regressions.
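
Both statistics in A.4 share the same form: a sum over cells (or groups) and over the four (y, t) outcomes of squared deviations between observed and expected counts, scaled by the expected count. The Python sketch below computes that sum. It is a minimal illustration, not the `bphltest` implementation: the function name `grouped_chi2` and the input layout are our assumptions, and the predicted probabilities are taken as given rather than estimated from a fitted BP model.

```python
from collections import defaultdict

def grouped_chi2(groups, y, t, probs):
    """Sum over groups and the four (y, t) outcomes of (O - E)^2 / E.

    groups[i] is the cell or group label of observation i;
    probs[i][2 * y + t] is its predicted probability of outcome (y, t).
    """
    observed = defaultdict(lambda: [0.0] * 4)
    expected = defaultdict(lambda: [0.0] * 4)
    for g, yi, ti, p in zip(groups, y, t, probs):
        observed[g][2 * yi + ti] += 1      # O_gyt: observed count
        for k in range(4):
            expected[g][k] += p[k]         # E_gyt: sum of predicted probabilities
    return sum(
        (observed[g][k] - expected[g][k]) ** 2 / expected[g][k]
        for g in expected
        for k in range(4)
    )

# Toy example: two groups of two observations, uniform predicted probabilities.
groups = [0, 0, 1, 1]
y = [1, 0, 1, 0]
t = [1, 0, 0, 1]
probs = [[0.25, 0.25, 0.25, 0.25]] * 4
print(grouped_chi2(groups, y, t, probs))  # prints 4.0
```

Labelling each unique (x, z) value as its own group gives the Pearson form of the statistic, while labelling observations by the two-stage sort on predicted treatment and outcome probabilities gives the adapted Hosmer-Lemeshow form.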
A.5 Simulation root-mean-square error tables
Root-mean-square error of BP and IV at each sample size n (shown in parentheses)
pT pY ATE LATE/ATE BP(400) IV(400) BP(1,000) IV(1,000) BP(3,000) IV(3,000) BP(10,000) IV(10,000) BP(30,000) IV(30,000)
0.1 0.1 0.08 1.00 0.38 2.57 0.35 1.53 0.27 0.22 0.16 0.12 0.09 0.06
0.1 0.3 0.15 1.00 0.36 3.46 0.33 1.99 0.26 0.35 0.16 0.19 0.10 0.10
0.1 0.5 0.16 1.00 0.36 4.24 0.32 3.14 0.25 0.39 0.16 0.21 0.10 0.12
0.1 0.7 0.13 1.00 0.38 4.08 0.34 3.14 0.24 0.36 0.14 0.19 0.08 0.11
0.1 0.9 0.06 1.00 0.43 2.58 0.38 2.94 0.27 0.23 0.15 0.12 0.06 0.07
0.3 0.1 0.08 1.00 0.28 1.93 0.22 0.21 0.13 0.11 0.07 0.06 0.04 0.03
0.3 0.3 0.14 1.00 0.31 1.91 0.24 0.31 0.16 0.16 0.09 0.09 0.05 0.05
0.3 0.5 0.16 1.00 0.31 2.40 0.24 0.34 0.15 0.18 0.10 0.10 0.06 0.06
0.3 0.7 0.13 1.00 0.31 1.23 0.22 0.31 0.14 0.17 0.08 0.09 0.05 0.05
0.3 0.9 0.06 1.00 0.29 2.29 0.22 0.21 0.12 0.11 0.06 0.06 0.03 0.03
0.5 0.1 0.07 1.00 0.24 0.54 0.18 0.17 0.10 0.09 0.05 0.05 0.03 0.03
0.5 0.3 0.14 1.00 0.29 0.96 0.22 0.27 0.13 0.14 0.08 0.08 0.04 0.04
0.5 0.5 0.16 1.00 0.30 0.81 0.22 0.30 0.14 0.15 0.09 0.09 0.05 0.05
0.5 0.7 0.14 1.00 0.29 0.64 0.21 0.27 0.14 0.15 0.08 0.08 0.05 0.05
0.5 0.9 0.07 1.00 0.25 0.49 0.18 0.17 0.10 0.09 0.05 0.05 0.03 0.03
0.7 0.1 0.06 1.00 0.27 1.13 0.21 0.20 0.12 0.11 0.05 0.06 0.03 0.03
0.7 0.3 0.13 1.00 0.31 1.70 0.24 0.32 0.14 0.17 0.08 0.09 0.05 0.05
0.7 0.5 0.16 1.00 0.32 1.15 0.24 0.34 0.15 0.18 0.09 0.10 0.06 0.06
0.7 0.7 0.14 1.00 0.30 0.88 0.24 0.31 0.16 0.17 0.09 0.09 0.06 0.05
0.7 0.9 0.08 1.00 0.28 0.47 0.22 0.21 0.13 0.11 0.07 0.06 0.04 0.03
0.9 0.1 0.06 1.00 0.44 2.45 0.37 0.83 0.27 0.23 0.15 0.12 0.06 0.06
0.9 0.3 0.13 1.00 0.39 5.02 0.35 0.91 0.26 0.35 0.15 0.18 0.08 0.10
0.9 0.5 0.16 1.00 0.38 5.09 0.34 1.26 0.25 0.37 0.16 0.21 0.10 0.12
0.9 0.7 0.15 1.00 0.36 4.03 0.33 0.85 0.26 0.35 0.17 0.19 0.10 0.11
0.9 0.9 0.08 1.00 0.37 3.65 0.35 0.68 0.26 0.24 0.16 0.12 0.09 0.06
Table 2: Root-mean-square error of $\hat\Delta^{BP}_{ATE}$ and $\hat\Delta_{IV}$ for true ATE as a function of $p_T$ and $p_Y$,
in bivariate probit model simulations with no covariates and $\rho = 0$. For most values of $p_T$
and $p_Y$, the RMSE of BP is much smaller than the RMSE of IV at sample sizes below
3,000, but the difference shrinks with larger sample sizes.

Root-mean-square error of BP and IV at each sample size n (shown in parentheses)
pT pY ATE LATE/ATE BP(400) IV(400) BP(1,000) IV(1,000) BP(3,000) IV(3,000) BP(10,000) IV(10,000) BP(30,000) IV(30,000)
0.1 0.1 0.08 1.47 0.38 3.03 0.32 1.88 0.23 0.22 0.11 0.12 0.06 0.07
0.1 0.3 0.15 1.12 0.36 4.93 0.33 4.81 0.25 0.34 0.15 0.18 0.09 0.10
0.1 0.5 0.16 0.90 0.37 6.65 0.34 4.22 0.26 0.38 0.17 0.20 0.10 0.11
0.1 0.7 0.12 0.70 0.41 5.30 0.37 2.99 0.28 0.34 0.19 0.19 0.11 0.11
0.1 0.9 0.05 0.47 0.32 3.40 0.41 2.67 0.40 0.23 0.25 0.12 0.16 0.07
0.3 0.1 0.07 1.19 0.27 1.38 0.20 0.21 0.11 0.11 0.06 0.06 0.03 0.04
0.3 0.3 0.14 1.11 0.31 1.58 0.24 0.32 0.15 0.16 0.09 0.09 0.05 0.05
0.3 0.5 0.16 1.02 0.32 2.16 0.24 0.34 0.16 0.18 0.09 0.10 0.05 0.05
0.3 0.7 0.13 0.91 0.31 1.78 0.24 0.31 0.15 0.16 0.09 0.09 0.05 0.05
0.3 0.9 0.06 0.73 0.32 1.29 0.26 0.21 0.16 0.11 0.08 0.06 0.04 0.04
0.5 0.1 0.06 0.96 0.24 0.50 0.18 0.18 0.10 0.09 0.05 0.05 0.03 0.03
0.5 0.3 0.14 1.03 0.29 0.89 0.22 0.28 0.13 0.14 0.08 0.08 0.04 0.04
0.5 0.5 0.16 1.05 0.30 1.08 0.23 0.29 0.15 0.15 0.08 0.08 0.05 0.05
0.5 0.7 0.14 1.03 0.29 0.83 0.21 0.26 0.13 0.14 0.08 0.08 0.04 0.04
0.5 0.9 0.06 0.96 0.25 0.54 0.18 0.17 0.10 0.09 0.05 0.05 0.03 0.03
0.7 0.1 0.06 0.73 0.32 1.19 0.25 0.23 0.15 0.11 0.08 0.06 0.04 0.04
0.7 0.3 0.13 0.91 0.31 1.82 0.25 0.33 0.15 0.16 0.09 0.09 0.05 0.05
0.7 0.5 0.16 1.02 0.32 1.31 0.25 0.37 0.16 0.18 0.09 0.10 0.05 0.05
0.7 0.7 0.14 1.11 0.31 1.25 0.24 0.31 0.15 0.16 0.09 0.09 0.05 0.05
0.7 0.9 0.07 1.19 0.27 0.72 0.20 0.21 0.11 0.11 0.06 0.06 0.03 0.04
0.9 0.1 0.05 0.47 0.34 2.73 0.41 1.28 0.40 0.24 0.27 0.12 0.15 0.07
0.9 0.3 0.12 0.70 0.42 4.12 0.37 1.90 0.29 0.35 0.19 0.18 0.11 0.10
0.9 0.5 0.16 0.90 0.38 4.81 0.34 2.07 0.26 0.38 0.17 0.20 0.10 0.11
0.9 0.7 0.15 1.12 0.38 4.24 0.34 2.85 0.25 0.35 0.16 0.18 0.09 0.10
0.9 0.9 0.08 1.47 0.39 3.12 0.34 2.45 0.23 0.22 0.12 0.12 0.07 0.08
Table 3: Root-mean-square error of $\hat\Delta^{BP}_{ATE}$ and $\hat\Delta_{IV}$ for true ATE as a function of $p_T$ and
$p_Y$, in bivariate probit model simulations with no covariates and $\rho = 0.3$.

Root-mean-square error of BP and IV at each sample size n (shown in parentheses)
pT pY ATE LATE/ATE BP(400) IV(400) BP(1,000) IV(1,000) BP(3,000) IV(3,000) BP(10,000) IV(10,000) BP(30,000) IV(30,000)
0.1 0.1 0.08 1.90 0.39 2.55 0.30 2.33 0.19 0.22 0.09 0.13 0.05 0.10
0.1 0.3 0.15 1.17 0.39 3.58 0.33 5.51 0.24 0.33 0.15 0.18 0.09 0.10
0.1 0.5 0.16 0.75 0.38 4.92 0.34 4.74 0.28 0.37 0.19 0.20 0.12 0.12
0.1 0.7 0.12 0.44 0.38 3.78 0.42 3.10 0.34 0.35 0.23 0.19 0.15 0.12
0.1 0.9 0.05 0.18 0.18 2.82 0.24 3.12 0.35 0.23 0.41 0.13 0.31 0.08
0.3 0.1 0.07 1.31 0.26 1.82 0.19 0.21 0.10 0.11 0.05 0.06 0.03 0.04
0.3 0.3 0.14 1.25 0.31 2.01 0.22 0.29 0.14 0.16 0.08 0.09 0.05 0.06
0.3 0.5 0.16 1.06 0.31 2.12 0.24 0.33 0.16 0.17 0.09 0.09 0.05 0.06
0.3 0.7 0.13 0.82 0.31 2.30 0.24 0.32 0.16 0.16 0.09 0.09 0.05 0.06
0.3 0.9 0.06 0.47 0.30 1.78 0.32 0.21 0.23 0.11 0.11 0.07 0.06 0.04
0.5 0.1 0.06 0.84 0.26 0.50 0.18 0.17 0.10 0.09 0.05 0.05 0.03 0.03
0.5 0.3 0.13 1.09 0.29 0.98 0.21 0.26 0.13 0.14 0.07 0.07 0.04 0.04
0.5 0.5 0.16 1.15 0.29 1.11 0.22 0.27 0.14 0.15 0.08 0.08 0.05 0.05
0.5 0.7 0.13 1.09 0.28 0.81 0.21 0.25 0.12 0.13 0.07 0.08 0.04 0.04
0.5 0.9 0.06 0.84 0.26 0.49 0.18 0.17 0.11 0.09 0.05 0.05 0.03 0.03
0.7 0.1 0.06 0.47 0.31 1.03 0.31 0.21 0.23 0.11 0.12 0.07 0.06 0.04
0.7 0.3 0.13 0.82 0.31 2.61 0.25 0.31 0.16 0.16 0.09 0.09 0.05 0.05
0.7 0.5 0.16 1.06 0.32 1.32 0.25 0.33 0.16 0.17 0.09 0.09 0.05 0.05
0.7 0.7 0.14 1.25 0.31 0.87 0.23 0.29 0.14 0.16 0.08 0.09 0.05 0.06
0.7 0.9 0.07 1.31 0.26 0.50 0.17 0.20 0.09 0.11 0.05 0.06 0.03 0.04
0.9 0.1 0.05 0.18 0.19 2.37 0.25 0.96 0.36 0.23 0.44 0.13 0.31 0.08
0.9 0.3 0.12 0.44 0.39 4.19 0.41 1.20 0.34 0.35 0.24 0.19 0.15 0.12
0.9 0.5 0.16 0.75 0.40 4.66 0.35 1.25 0.28 0.37 0.19 0.20 0.12 0.12
0.9 0.7 0.15 1.17 0.39 3.94 0.34 1.35 0.24 0.34 0.15 0.18 0.09 0.11
0.9 0.9 0.08 1.90 0.37 2.76 0.30 0.58 0.19 0.22 0.09 0.13 0.05 0.09
Table 4: Root-mean-square error of $\hat\Delta^{BP}_{ATE}$ and $\hat\Delta_{IV}$ for true ATE as a function of $p_T$ and
$p_Y$, in bivariate probit model simulations with no covariates and $\rho = 0.5$.

Root-mean-square error of BP and IV at each sample size n (shown in parentheses)
pT pY ATE LATE/ATE BP(400) IV(400) BP(1,000) IV(1,000) BP(3,000) IV(3,000) BP(10,000) IV(10,000) BP(30,000) IV(30,000)
0.1 0.1 0.08 2.60 0.40 3.06 0.29 2.23 0.16 0.24 0.07 0.16 0.04 0.14
0.1 0.3 0.15 1.12 0.41 4.07 0.34 3.47 0.25 0.33 0.15 0.17 0.09 0.10
0.1 0.5 0.16 0.46 0.40 4.25 0.39 3.95 0.33 0.38 0.24 0.21 0.16 0.14
0.1 0.7 0.12 0.15 0.25 3.57 0.29 3.02 0.35 0.37 0.37 0.21 0.29 0.15
0.1 0.9 0.05 0.02 0.09 2.93 0.09 2.18 0.10 0.23 0.15 0.13 0.21 0.09
0.3 0.1 0.06 1.34 0.26 1.10 0.17 0.23 0.08 0.11 0.04 0.06 0.02 0.04
0.3 0.3 0.14 1.54 0.30 1.61 0.21 0.31 0.12 0.17 0.07 0.11 0.04 0.09
0.3 0.5 0.16 1.11 0.31 2.65 0.24 0.33 0.15 0.17 0.09 0.09 0.05 0.06
0.3 0.7 0.12 0.60 0.32 2.93 0.28 0.34 0.18 0.17 0.11 0.10 0.06 0.07
0.3 0.9 0.05 0.16 0.17 1.17 0.19 0.22 0.25 0.12 0.28 0.07 0.21 0.06
0.5 0.1 0.06 0.53 0.23 0.50 0.22 0.19 0.17 0.10 0.08 0.06 0.04 0.04
0.5 0.3 0.13 1.16 0.27 0.76 0.20 0.26 0.12 0.14 0.06 0.08 0.04 0.05
0.5 0.5 0.16 1.39 0.28 0.87 0.20 0.26 0.12 0.15 0.07 0.10 0.04 0.08
0.5 0.7 0.13 1.16 0.26 0.70 0.19 0.24 0.11 0.14 0.06 0.07 0.04 0.05
0.5 0.9 0.06 0.53 0.23 0.48 0.22 0.18 0.17 0.10 0.08 0.06 0.04 0.04
0.7 0.1 0.05 0.16 0.18 0.81 0.20 0.23 0.24 0.12 0.28 0.07 0.20 0.06
0.7 0.3 0.12 0.60 0.32 1.40 0.28 0.32 0.18 0.17 0.11 0.10 0.06 0.07
0.7 0.5 0.16 1.11 0.31 1.53 0.23 0.33 0.15 0.17 0.08 0.09 0.05 0.06
0.7 0.7 0.14 1.54 0.29 1.32 0.20 0.29 0.12 0.17 0.07 0.11 0.04 0.09
0.7 0.9 0.06 1.34 0.26 0.67 0.16 0.20 0.08 0.10 0.04 0.06 0.02 0.04
0.9 0.1 0.05 0.02 0.09 2.43 0.08 0.72 0.09 0.24 0.14 0.13 0.22 0.08
0.9 0.3 0.12 0.15 0.24 4.51 0.29 0.94 0.35 0.36 0.39 0.20 0.30 0.14
0.9 0.5 0.16 0.46 0.40 4.53 0.39 1.37 0.33 0.38 0.23 0.21 0.16 0.14
0.9 0.7 0.15 1.12 0.42 4.73 0.34 1.66 0.25 0.35 0.16 0.17 0.09 0.10
0.9 0.9 0.08 2.60 0.39 3.45 0.27 0.92 0.15 0.23 0.06 0.16 0.03 0.14
Table 5: Root-mean-square error of $\hat\Delta^{BP}_{ATE}$ and $\hat\Delta_{IV}$ for true ATE as a function of $p_T$ and
$p_Y$, in bivariate probit model simulations with no covariates and $\rho = 0.7$.

Root-mean-square error of BP and IV at each sample size n (shown in parentheses)
pT pY ATE LATE/ATE BP(400) IV(400) BP(1,000) IV(1,000) BP(3,000) IV(3,000) BP(10,000) IV(10,000) BP(30,000) IV(30,000)
0.1 0.1 0.08 1.35 0.20 5.87 0.14 5.89 0.08 0.30 0.04 0.15 0.03 0.09
0.1 0.3 0.14 1.09 0.25 8.04 0.17 21.61 0.10 0.47 0.06 0.23 0.03 0.13
0.1 0.5 0.15 0.92 0.26 10.12 0.18 18.59 0.11 0.49 0.06 0.24 0.03 0.14
0.1 0.7 0.12 0.77 0.27 11.15 0.18 34.14 0.10 0.48 0.05 0.23 0.03 0.14
0.1 0.9 0.05 0.56 0.21 3.06 0.18 19.59 0.09 0.31 0.04 0.16 0.02 0.10
0.3 0.1 0.07 1.13 0.18 3.19 0.13 0.39 0.08 0.14 0.05 0.07 0.03 0.04
0.3 0.3 0.13 1.07 0.24 8.31 0.18 1.04 0.11 0.22 0.06 0.11 0.03 0.06
0.3 0.5 0.15 1.01 0.25 25.13 0.17 0.98 0.10 0.23 0.06 0.12 0.03 0.07
0.3 0.7 0.12 0.93 0.23 42.05 0.15 0.47 0.09 0.23 0.05 0.11 0.03 0.07
0.3 0.9 0.06 0.79 0.18 39.50 0.10 0.30 0.05 0.15 0.03 0.08 0.02 0.05
0.5 0.1 0.06 0.97 0.18 4.46 0.12 0.93 0.07 0.12 0.04 0.07 0.02 0.04
0.5 0.3 0.13 1.01 0.26 18.72 0.18 1.64 0.11 0.19 0.06 0.10 0.04 0.05
0.5 0.5 0.15 1.02 0.28 45.66 0.20 1.82 0.12 0.20 0.07 0.10 0.04 0.06
0.5 0.7 0.13 1.01 0.25 12.23 0.19 1.24 0.11 0.19 0.06 0.10 0.04 0.06
0.5 0.9 0.06 0.96 0.17 10.47 0.11 0.85 0.07 0.13 0.04 0.07 0.02 0.04
0.7 0.1 0.06 0.80 0.19 24.85 0.11 0.82 0.06 0.14 0.03 0.08 0.02 0.04
0.7 0.3 0.12 0.93 0.23 36.60 0.15 1.09 0.08 0.22 0.05 0.11 0.03 0.06
0.7 0.5 0.15 1.02 0.26 49.65 0.18 1.09 0.10 0.23 0.05 0.12 0.03 0.07
0.7 0.7 0.13 1.07 0.25 56.82 0.18 0.83 0.11 0.22 0.06 0.11 0.03 0.07
0.7 0.9 0.07 1.13 0.18 41.57 0.13 0.90 0.08 0.15 0.05 0.08 0.03 0.05
0.9 0.1 0.05 0.58 0.23 4.47 0.20 1.85 0.11 0.31 0.04 0.15 0.02 0.09
0.9 0.3 0.12 0.77 0.27 7.58 0.17 2.96 0.10 0.47 0.05 0.24 0.03 0.13
0.9 0.5 0.15 0.94 0.26 10.60 0.18 2.96 0.10 0.51 0.06 0.24 0.03 0.14
0.9 0.7 0.14 1.11 0.25 9.90 0.18 2.62 0.11 0.47 0.06 0.23 0.03 0.14
0.9 0.9 0.08 1.36 0.21 7.01 0.14 2.21 0.08 0.31 0.04 0.15 0.02 0.09
Table 6: Root-mean-square error of $\hat\Delta^{BP}_{ATE}$ and $\hat\Delta_{IV}$ for true ATE as a function of $p_T$ and
$p_Y$, in bivariate probit model simulations with covariate X and $\rho = 0$.

Root-mean-square error of BP and IV at each sample size n (shown in parentheses)
pT pY ATE LATE/ATE BP(400) IV(400) BP(1,000) IV(1,000) BP(3,000) IV(3,000) BP(10,000) IV(10,000) BP(30,000) IV(30,000)
0.1 0.1 0.07 1.75 0.22 3.79 0.15 10.98 0.09 0.29 0.05 0.15 0.03 0.10
0.1 0.3 0.14 1.17 0.27 6.61 0.19 11.02 0.12 0.47 0.06 0.22 0.04 0.13
0.1 0.5 0.15 0.81 0.29 9.62 0.20 5.82 0.12 0.51 0.07 0.24 0.04 0.15
0.1 0.7 0.12 0.52 0.28 8.79 0.22 39.55 0.13 0.48 0.07 0.23 0.04 0.14
0.1 0.9 0.05 0.27 0.13 4.58 0.14 11.49 0.13 0.32 0.09 0.16 0.04 0.10
0.3 0.1 0.06 1.29 0.20 22.22 0.14 0.32 0.09 0.14 0.04 0.07 0.03 0.05
0.3 0.3 0.13 1.21 0.27 7.64 0.19 0.90 0.12 0.21 0.06 0.11 0.04 0.07
0.3 0.5 0.15 1.06 0.27 14.39 0.18 0.74 0.11 0.24 0.06 0.12 0.04 0.07
0.3 0.7 0.12 0.84 0.23 20.86 0.15 0.50 0.09 0.22 0.05 0.11 0.03 0.07
0.3 0.9 0.05 0.53 0.17 37.83 0.12 0.29 0.06 0.15 0.03 0.08 0.02 0.05
0.5 0.1 0.06 0.89 0.19 18.25 0.12 1.31 0.07 0.12 0.03 0.06 0.02 0.04
0.5 0.3 0.12 1.08 0.26 23.20 0.19 0.67 0.12 0.18 0.06 0.10 0.04 0.06
0.5 0.5 0.15 1.11 0.29 8.86 0.21 1.72 0.13 0.20 0.07 0.10 0.04 0.06
0.5 0.7 0.13 1.07 0.26 16.75 0.19 0.87 0.12 0.18 0.06 0.09 0.04 0.06
0.5 0.9 0.06 0.86 0.17 1.47 0.11 0.28 0.06 0.13 0.03 0.07 0.02 0.04
0.7 0.1 0.05 0.56 0.19 14.48 0.13 0.67 0.06 0.14 0.03 0.08 0.02 0.05
0.7 0.3 0.12 0.86 0.24 40.01 0.15 1.47 0.09 0.22 0.05 0.11 0.03 0.07
0.7 0.5 0.15 1.05 0.26 46.67 0.18 0.92 0.11 0.22 0.06 0.12 0.04 0.07
0.7 0.7 0.13 1.20 0.27 49.33 0.19 1.78 0.12 0.22 0.06 0.11 0.04 0.07
0.7 0.9 0.06 1.26 0.19 9.03 0.14 1.55 0.08 0.15 0.05 0.08 0.03 0.05
0.9 0.1 0.05 0.28 0.16 4.39 0.17 1.58 0.15 0.30 0.09 0.15 0.04 0.09
0.9 0.3 0.12 0.52 0.29 10.31 0.22 3.26 0.13 0.47 0.07 0.24 0.04 0.14
0.9 0.5 0.15 0.81 0.28 10.45 0.20 4.10 0.12 0.51 0.07 0.24 0.04 0.15
0.9 0.7 0.14 1.17 0.27 10.56 0.20 2.45 0.12 0.49 0.07 0.22 0.04 0.14
0.9 0.9 0.07 1.76 0.21 8.89 0.15 2.13 0.09 0.34 0.05 0.16 0.03 0.10
Table 7: Root-mean-square error of $\hat\Delta^{BP}_{ATE}$ and $\hat\Delta_{IV}$ for true ATE as a function of $p_T$ and
$p_Y$, in bivariate probit model simulations with covariate X and $\rho = 0.3$.

Root-mean-square error of BP and IV at each sample size n (shown in parentheses)
pT pY ATE LATE/ATE BP(400) IV(400) BP(1,000) IV(1,000) BP(3,000) IV(3,000) BP(10,000) IV(10,000) BP(30,000) IV(30,000)
0.1 0.1 0.07 2.13 0.24 10.15 0.16 13.46 0.09 0.29 0.05 0.16 0.03 0.12
0.1 0.3 0.14 1.19 0.29 9.15 0.20 10.95 0.13 0.47 0.07 0.23 0.04 0.13
0.1 0.5 0.15 0.65 0.31 15.02 0.23 10.69 0.14 0.51 0.08 0.24 0.05 0.15
0.1 0.7 0.12 0.32 0.27 7.99 0.24 22.77 0.18 0.50 0.10 0.24 0.06 0.15
0.1 0.9 0.05 0.15 0.10 4.87 0.09 14.08 0.10 0.32 0.12 0.16 0.11 0.10
0.3 0.1 0.06 1.38 0.22 10.52 0.15 0.31 0.09 0.14 0.04 0.07 0.02 0.05
0.3 0.3 0.13 1.36 0.28 17.03 0.19 0.66 0.12 0.21 0.06 0.12 0.04 0.08
0.3 0.5 0.15 1.10 0.26 35.91 0.18 0.54 0.11 0.23 0.06 0.11 0.04 0.07
0.3 0.7 0.12 0.75 0.23 9.86 0.15 0.49 0.09 0.23 0.05 0.12 0.03 0.07
0.3 0.9 0.05 0.34 0.15 24.94 0.12 0.29 0.09 0.15 0.04 0.09 0.02 0.05
0.5 0.1 0.05 0.75 0.18 17.90 0.12 0.88 0.06 0.12 0.03 0.07 0.02 0.04
0.5 0.3 0.12 1.13 0.26 9.02 0.19 1.10 0.11 0.18 0.06 0.09 0.04 0.06
0.5 0.5 0.15 1.23 0.29 8.71 0.21 1.26 0.13 0.19 0.07 0.10 0.04 0.07
0.5 0.7 0.12 1.12 0.25 7.44 0.18 0.84 0.11 0.18 0.06 0.10 0.03 0.06
0.5 0.9 0.05 0.72 0.18 1.66 0.11 0.51 0.06 0.13 0.03 0.07 0.02 0.04
0.7 0.1 0.05 0.36 0.15 4.83 0.13 0.42 0.08 0.14 0.04 0.08 0.02 0.05
0.7 0.3 0.12 0.76 0.23 19.62 0.15 0.99 0.09 0.22 0.05 0.11 0.03 0.07
0.7 0.5 0.15 1.10 0.26 9.00 0.18 1.87 0.11 0.22 0.06 0.11 0.04 0.07
0.7 0.7 0.13 1.35 0.27 28.15 0.19 2.07 0.12 0.21 0.07 0.11 0.04 0.08
0.7 0.9 0.06 1.34 0.21 6.61 0.15 1.70 0.08 0.15 0.04 0.08 0.02 0.05
0.9 0.1 0.05 0.17 0.10 4.84 0.10 2.63 0.11 0.30 0.11 0.16 0.10 0.10
0.9 0.3 0.11 0.33 0.26 10.61 0.24 3.27 0.18 0.50 0.10 0.24 0.06 0.15
0.9 0.5 0.15 0.66 0.30 12.18 0.23 3.01 0.14 0.54 0.08 0.25 0.05 0.15
0.9 0.7 0.14 1.19 0.28 10.92 0.20 4.55 0.12 0.54 0.07 0.22 0.04 0.13
0.9 0.9 0.07 2.15 0.23 7.39 0.17 3.29 0.09 0.34 0.05 0.17 0.03 0.12
Table 8: Root-mean-square error of $\hat\Delta^{BP}_{ATE}$ and $\hat\Delta_{IV}$ for true ATE as a function of $p_T$ and
$p_Y$, in bivariate probit model simulations with covariate X and $\rho = 0.5$.

Root-mean-square error of BP and IV at each sample size n (shown in parentheses)
pT pY ATE LATE/ATE BP(400) IV(400) BP(1,000) IV(1,000) BP(3,000) IV(3,000) BP(10,000) IV(10,000) BP(30,000) IV(30,000)
0.1 0.1 0.07 2.75 0.25 5.99 0.15 8.13 0.08 0.29 0.04 0.18 0.03 0.15
0.1 0.3 0.14 1.12 0.31 17.42 0.22 18.43 0.13 0.44 0.07 0.23 0.04 0.12
0.1 0.5 0.15 0.42 0.35 12.13 0.29 15.32 0.18 0.53 0.10 0.27 0.06 0.17
0.1 0.7 0.12 0.15 0.22 20.78 0.21 9.53 0.21 0.50 0.18 0.25 0.11 0.17
0.1 0.9 0.05 0.13 0.06 7.74 0.05 13.87 0.05 0.32 0.06 0.17 0.08 0.10
0.3 0.1 0.06 1.34 0.24 2.31 0.16 0.86 0.07 0.14 0.03 0.07 0.02 0.05
0.3 0.3 0.13 1.61 0.27 6.40 0.19 0.63 0.11 0.20 0.06 0.13 0.03 0.10
0.3 0.5 0.15 1.13 0.24 15.60 0.16 0.54 0.10 0.23 0.05 0.12 0.03 0.07
0.3 0.7 0.12 0.56 0.23 38.01 0.16 0.64 0.09 0.23 0.05 0.12 0.03 0.09
0.3 0.9 0.05 0.16 0.11 5.50 0.09 0.39 0.08 0.15 0.07 0.09 0.04 0.06
0.5 0.1 0.05 0.48 0.16 17.87 0.11 0.63 0.06 0.12 0.03 0.07 0.02 0.05
0.5 0.3 0.12 1.17 0.23 5.10 0.16 1.32 0.09 0.17 0.05 0.09 0.03 0.06
0.5 0.5 0.15 1.44 0.28 14.33 0.20 1.18 0.12 0.19 0.07 0.12 0.04 0.09
0.5 0.7 0.12 1.18 0.24 11.22 0.16 0.99 0.09 0.18 0.05 0.09 0.03 0.06
0.5 0.9 0.05 0.48 0.16 5.67 0.10 0.46 0.06 0.13 0.03 0.07 0.02 0.05
0.7 0.1 0.05 0.16 0.10 4.21 0.09 0.46 0.09 0.15 0.07 0.09 0.05 0.06
0.7 0.3 0.12 0.56 0.22 7.76 0.16 0.86 0.09 0.22 0.05 0.12 0.03 0.08
0.7 0.5 0.15 1.12 0.24 4.12 0.17 0.88 0.10 0.22 0.06 0.12 0.03 0.07
0.7 0.7 0.13 1.59 0.27 9.49 0.19 0.78 0.11 0.21 0.06 0.13 0.04 0.10
0.7 0.9 0.06 1.35 0.24 3.27 0.15 0.64 0.07 0.14 0.03 0.08 0.02 0.05
0.9 0.1 0.05 0.14 0.06 15.44 0.06 12.17 0.06 0.29 0.07 0.17 0.07 0.10
0.9 0.3 0.11 0.15 0.21 15.21 0.21 12.75 0.21 0.52 0.17 0.25 0.10 0.16
0.9 0.5 0.14 0.41 0.32 18.70 0.28 8.28 0.19 0.56 0.11 0.27 0.06 0.17
0.9 0.7 0.14 1.09 0.31 26.71 0.21 14.96 0.13 0.49 0.07 0.22 0.04 0.13
0.9 0.9 0.07 2.78 0.25 11.85 0.17 5.20 0.09 0.33 0.05 0.19 0.03 0.15
Table 9: Root-mean-square error of $\hat\Delta^{BP}_{ATE}$ and $\hat\Delta_{IV}$ for true ATE as a function of $p_T$ and
$p_Y$, in bivariate probit model simulations with covariate X and $\rho = 0.7$.
References
Altonji, J., T. Elder, and C. Taber (2005). "Selection on Observed and Unobserved Variables:
Assessing the Effectiveness of Catholic Schools," Journal of Political Economy, 113(1):
151-184.
Angrist, J. (1991). "Instrumental Variables Estimation of Average Treatment Effects in
Econometrics and Epidemiology," NBER Technical Working Paper No. 0115.
Angrist, J. (2001). "Estimation of Limited Dependent Variable Models with Dummy
Endogenous Regressors: Simple Strategies for Empirical Practice," Journal of Business and
Economic Statistics, 19(1): 2-16.
Angrist, J., and J. Pischke (2009). Mostly Harmless Econometrics. Princeton University
Press, Princeton.
Bhattacharya, J., D. Goldman, and D. McCaffrey (2006). "Estimating Probit Models with
Self-selected Treatments," Statistics in Medicine, 25(3): 389-413.
Chiburis, R. C. (2010). "Score Tests of Normality in Bivariate Probit Models: Comment,"
Working paper, University of Texas at Austin.
Fagerland, M. W., D. W. Hosmer, and A. M. Bofin (2008). "Multinomial Goodness-of-fit
Tests for Logistic Regression Models," Statistics in Medicine, 27(21): 4238-4253.
Firth, D. (1993). "Bias Reduction of Maximum Likelihood Estimates," Biometrika, 80(1):
27-38.
Freedman, D. A., and J. S. Sekhon (2010). "Endogeneity in Probit Response Models,"
Political Analysis, 18(2): 138-150.
Greene, W. (1998). "Gender Economics Courses in Liberal Arts Colleges: Further Results,"
Journal of Economic Education, 29(4): 291-300.
Heckman, J. J. (1978). "Dummy Endogenous Variables in a Simultaneous Equation System,"
Econometrica, 46(6): 931-959.
Heckman, J. J., and E. J. Vytlacil (1999). "Local Instrumental Variables and Latent Variable
Models for Identifying and Bounding Treatment Effects," Proceedings of the National
Academy of Sciences, 96(8): 4730-4734.
Hosmer, D. W., and S. Lemeshow (1980). "Goodness of Fit Tests for the Multiple Logistic
Regression Model," Communications in Statistics, 9(10): 1043-1069.
Imbens, G., and J. Angrist (1994). "Identification and Estimation of Local Average
Treatment Effects," Econometrica, 62(2): 467-475.
Moffitt, R. A. (2001). "Estimation of Limited Dependent Variable Models with Dummy
Endogenous Regressors: Simple Strategies for Empirical Practice: Comment," Journal of
Business and Economic Statistics, 19(1): 20-23.
Monfardini, C., and R. Radice (2008). "Testing Exogeneity in the Bivariate Probit Model:
A Monte Carlo Study," Oxford Bulletin of Economics and Statistics, 70(2): 271-282.
Murphy, A. (2007). "Score Tests of Normality in Bivariate Probit Models," Economics
Letters, 95(3): 374-379.
Osius, G., and D. Rojek (1992). "Normal Goodness-of-fit Tests for Multinomial Models with
Large Degrees of Freedom," Journal of the American Statistical Association, 87(420):
1145-1152.
Pigeon, J. G., and J. F. Heyse (1999). "An Improved Goodness of Fit Statistic for Probability
Prediction Models," Biometrical Journal, 41(1): 71-82.
p = 0.1 p = 0.5 p = 0.9
Y Y Y
0.6 0.6 0.6
p = 0.1
0.4 0.4 0.4
T
0.2 0.2 0.2
0 0 0
0 0.3 0.5 0.7 0 0.3 0.5 0.7 0 0.3 0.5 0.7
0.6 0.6 0.6
p = 0.5
0.4 0.4 0.4
T
0.2 0.2 0.2
0 0 0
0 0.3 0.5 0.7 0 0.3 0.5 0.7 0 0.3 0.5 0.7
0.6 0.6 0.6
p = 0.9
0.4 0.4 0.4
T
0.2 0.2 0.2
0 0 0
0 0.3 0.5 0.7 0 0.3 0.5 0.7 0 0.3 0.5 0.7
Figure 1: AT E (solid lines), LAT E (long dashed lines), and AT T (dotted lines), for the
bivariate probit model (7) with = 0:3, = 0:4, and several values of pT , pY , and . The
circles denote OLS , the probability limit of an OLS regression of Y on T .
[Figure 2 plot: 3 x 3 grid of panels; columns p_Y = 0.1, 0.5, 0.9; rows p_T = 0.1, 0.5, 0.9; horizontal axis ticks at 0, 0.3, 0.5, 0.7; vertical axis from 0 to 400.]
Figure 2: Asymptotic variance of the bivariate probit ATE estimator (solid lines with
circles) and the IV estimator (dashed lines with triangles), for various values of , p_T,
and p_Y. For example, an asymptotic variance of 200 means that at a given sample size n,
the variance of the estimator is approximately 200/n.
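
The scaling described in the caption of Figure 2 can be turned into a quick back-of-the-envelope calculation. The following Python sketch is only an illustration of that arithmetic; the function names are hypothetical and do not come from the paper.

```python
import math


def approx_variance(asymptotic_variance: float, n: int) -> float:
    """Approximate finite-sample variance implied by an asymptotic variance V: V / n."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return asymptotic_variance / n


def approx_std_error(asymptotic_variance: float, n: int) -> float:
    """Approximate standard error: sqrt(V / n)."""
    return math.sqrt(approx_variance(asymptotic_variance, n))


def sample_size_for_std_error(asymptotic_variance: float, target_se: float) -> int:
    """Smallest n for which sqrt(V / n) does not exceed target_se."""
    if target_se <= 0:
        raise ValueError("target_se must be positive")
    return math.ceil(asymptotic_variance / target_se ** 2)


if __name__ == "__main__":
    # The caption's example: an asymptotic variance of 200.
    for n in (1_000, 10_000):
        print(n, approx_variance(200, n), approx_std_error(200, n))
```

Because this approximation is asymptotic, it may be inaccurate at small n; the simulation figures that follow report the actual finite-sample spread.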
[Figure 3 plot: 3 x 3 grid of panels; columns p_Y = 0.1, 0.5, 0.9; rows p_T = 0.1, 0.5, 0.9; horizontal axis: sample size, 10^3 to 10^4; vertical axis from -0.5 to 0.5.]
Figure 3: Spread of BP and IV estimates in simulations with no covariates and = 0.3. The
area between the thin solid curves represents the range between the 5th and 95th percentiles
of the BP estimator, and the area between the thin dashed curves represents the same range
for the IV estimator. The thick solid curve is the mean BP estimate, the thick dashed curve
is the mean IV estimate, and the dotted line is the true ATE.
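
The summary statistics plotted in Figure 3 (the 5th and 95th percentiles and the mean of an estimator across simulation draws) can be sketched for the IV estimator as follows. This is a minimal illustration under stated assumptions, not the authors' simulation code. It uses a generic latent-index data-generating process with a binary instrument and no covariates, which is not necessarily the paper's model (7). All parameter names and values (`a_t`, `a_y`, `gamma`, `delta`, `rho`) are hypothetical. The bivariate probit estimator is omitted because it requires numerical maximum likelihood.

```python
import math
import random
import statistics


def simulate_sample(n, rng, a_t=0.0, a_y=0.0, gamma=0.8, delta=0.5, rho=0.3):
    """Draw (z, t, y) triples from an assumed latent-index model with normal errors."""
    data = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        u = rng.gauss(0.0, 1.0)
        # e has unit variance and correlation rho with u
        e = rho * u + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        t = 1 if a_t + gamma * z + u > 0 else 0
        y = 1 if a_y + delta * t + e > 0 else 0
        data.append((z, t, y))
    return data


def wald_iv(data):
    """Linear IV with one binary instrument and no covariates (Wald ratio).

    Returns None when the ratio is undefined."""
    treated_by_z = {0: [], 1: []}
    outcome_by_z = {0: [], 1: []}
    for z, t, y in data:
        treated_by_z[z].append(t)
        outcome_by_z[z].append(y)
    if not treated_by_z[0] or not treated_by_z[1]:
        return None
    first_stage = statistics.fmean(treated_by_z[1]) - statistics.fmean(treated_by_z[0])
    if first_stage == 0:
        return None
    reduced_form = statistics.fmean(outcome_by_z[1]) - statistics.fmean(outcome_by_z[0])
    return reduced_form / first_stage


def true_ate(a_y=0.0, delta=0.5):
    """ATE under the assumed model: Phi(a_y + delta) - Phi(a_y)."""
    phi = statistics.NormalDist().cdf
    return phi(a_y + delta) - phi(a_y)


def spread_of_iv(n, reps, seed=0):
    """Return (5th percentile, mean, 95th percentile) of IV estimates over reps draws."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        est = wald_iv(simulate_sample(n, rng))
        if est is not None:
            estimates.append(est)
    cuts = statistics.quantiles(estimates, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return cuts[0], statistics.fmean(estimates), cuts[-1]


if __name__ == "__main__":
    print("true ATE under the assumed model:", round(true_ate(), 4))
    for n in (1_000, 10_000):
        print(n, spread_of_iv(n, reps=100))
```

Note that the IV estimator targets the LATE rather than the ATE, so its mean need not converge to the dotted ATE line even in large samples.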
[Figure 4 plot: 3 x 3 grid of panels; columns p_Y = 0.1, 0.5, 0.9; rows p_T = 0.1, 0.5, 0.9; horizontal axis: sample size, 10^3 to 10^4; vertical axis from -0.5 to 0.5.]
Figure 4: Spread of BP and IV estimates in simulations with covariate X and = 0.3. The
area between the thin solid curves represents the range between the 5th and 95th percentiles
of the BP estimator, and the area between the thin dashed curves represents the same range
for the IV estimator. The thick solid curve is the mean BP estimate, the thick dashed curve
is the mean IV estimate, and the dotted line is the true ATE.
[Figure 5 plot: 3 x 3 grid of panels; columns p_Y = 0.1, 0.5, 0.9; rows p_T = 0.1, 0.5, 0.9; horizontal axis: sample size, 10^3 to 10^4; vertical axis from -0.5 to 0.5.]
Figure 5: Spread of BP and IV estimates in simulations with covariate X and = 0.3 and
skewed error terms. The area between the thin solid curves represents the range between
the 5th and 95th percentiles of the BP estimator, and the area between the thin dashed
curves represents the same range for the IV estimator. The thick solid curve is the mean
BP estimate, the thick dashed curve is the mean IV estimate, and the dotted line is the true
ATE.
[Figure 6 plot: 3 x 3 grid of panels; columns p_Y = 0.1, 0.5, 0.9; rows p_T = 0.1, 0.5, 0.9; horizontal axis: sample size, 10^3 to 10^4; vertical axis from 0.6 to 1.]
Figure 6: Coverage of the true ATE for nominal 95% confidence intervals in simulations with
normally distributed covariate X_i and = 0.3. The solid and dashed curves correspond to
the size of tests based on the bivariate probit ATE estimator and the IV estimator,
respectively. The poor coverage can be improved by bootstrapping the critical values, and
the size from bootstrapping for the bivariate probit ATE estimator and the IV estimator is
shown by the starred solid and starred dashed curves, respectively.
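
To make the idea of bootstrapped intervals and their coverage concrete, the sketch below shows one simple variant in Python. It is an assumption-laden illustration, not the authors' procedure: the caption refers to bootstrapping the critical values, whereas this sketch uses the simpler percentile bootstrap, a different technique. The function names and defaults (`n_boot`, `level`) are hypothetical. The estimator is passed in as a function, so any estimator that maps a list of observations to a number (or `None` when undefined) can be used.

```python
import random
import statistics


def percentile_bootstrap_ci(data, estimator, n_boot=500, level=0.95, rng=None):
    """Percentile bootstrap interval: resample rows with replacement, re-estimate,
    and take empirical quantiles of the bootstrap estimates."""
    rng = rng or random.Random()
    draws = []
    for _ in range(n_boot):
        resample = rng.choices(data, k=len(data))
        est = estimator(resample)
        if est is not None:  # skip resamples where the estimator is undefined
            draws.append(est)
    if len(draws) < 2:
        raise ValueError("too few valid bootstrap draws")
    draws.sort()
    alpha = 1.0 - level
    lo_idx = int((alpha / 2) * (len(draws) - 1))
    hi_idx = int(round((1 - alpha / 2) * (len(draws) - 1)))
    return draws[lo_idx], draws[hi_idx]


def coverage_rate(intervals, truth):
    """Fraction of intervals (lo, hi) that contain the true value."""
    if not intervals:
        raise ValueError("no intervals supplied")
    hits = sum(1 for lo, hi in intervals if lo <= truth <= hi)
    return hits / len(intervals)


if __name__ == "__main__":
    # Toy check with a sample mean as the estimator (true mean 0.5 by construction).
    rng = random.Random(0)
    intervals = []
    for _ in range(100):
        sample = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(200)]
        intervals.append(
            percentile_bootstrap_ci(sample, statistics.fmean, n_boot=200, rng=rng)
        )
    print("empirical coverage:", coverage_rate(intervals, 0.5))
```

Comparing the empirical coverage with the nominal level, across many simulated samples, is the kind of check that the curves in Figure 6 summarize.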
[Figure 7 plot: 3 x 3 grid of panels; columns p_Y = 0.1, 0.5, 0.9; rows p_T = 0.1, 0.5, 0.9; horizontal axis: sample size, 10^3 to 10^4; vertical axis from 0.6 to 1.]
Figure 7: Coverage of the true ATE for nominal 95% confidence intervals in simulations with
no covariates and = 0.3. The solid and dashed curves correspond to the size of tests based
on the bivariate probit ATE estimator and the IV estimator, respectively. The poor coverage
can be improved by bootstrapping the critical values, and the size from bootstrapping for
the bivariate probit ATE estimator and the IV estimator is shown by the starred solid and
starred dashed curves, respectively.
[Figure 8 plot: 3 x 3 grid of panels; columns p_Y = 0.1, 0.5, 0.9; rows p_T = 0.1, 0.5, 0.9; horizontal axis: sample size, 10^3 to 10^4; vertical axis from 0 to 1.]
Figure 8: Power (rejection probability) of 5%-level Murphy score (solid curves) and adapted
Hosmer-Lemeshow (dashed curves) goodness-of-fit tests for normality in simulations with
covariate X and = 0.3 and skewed error terms.