POLICY RESEARCH WORKING PAPER 1553

Spatial Correlations in Panel Data

John Driscoll
Aart Kraay

A correction for spatial correlation in panel data.

The World Bank
Policy Research Department
Macroeconomics and Growth Division
December 1995

Summary findings

In many empirical applications involving combined time-series and cross-sectional data, the residuals from different cross-sectional units are likely to be correlated with one another. This is often the case in applications in macroeconomics and international economics where the cross-sectional units may be countries, states, or regions observed over time. "Spatial" correlations among such cross-sections may arise for a number of reasons, ranging from observed common shocks such as terms of trade or oil shocks, to unobserved "contagion" or "neighborhood" effects which propagate across countries in complex ways.

Driscoll and Kraay observe that the presence of such spatial correlations in residuals complicates standard inference procedures that combine time-series and cross-sectional data since these techniques typically require the assumption that the cross-sectional units are independent. When this assumption is violated, estimates of standard errors are inconsistent, and hence are not useful for inference. And standard corrections for spatial correlations will be valid only if spatial correlations are of particular restrictive forms.

Driscoll and Kraay propose a correction for spatial correlations that does not require strong assumptions concerning their form, and show that it is superior to a number of commonly used alternatives.

This paper, a product of the Macroeconomics and Growth Division, Policy Research Department, is part of a larger effort in the department to study international macroeconomics. Copies of the paper are available free from the World Bank, 1818 H Street NW, Washington, DC 20433.
Please contact Rebecca Martin, room N11-059, telephone 202-473-9065, fax 202-522-3518, Internet address rmartinl@worldbank.org. December 1995. (28 pages)

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be used and cited accordingly. The findings, interpretations, and conclusions are the authors' own and should not be attributed to the World Bank, its Executive Board of Directors, or any of its member countries.

Produced by the Policy Research Dissemination Center

Spatial Correlations in Panel Data

John Driscoll
Department of Economics
Brown University, Box B
Providence RI 02912
jd@econ.pstc.brown.edu

and

Aart Kraay
The World Bank
1818 H Street NW
Washington DC 20433
akraay@worldbank.org

We would like to thank John Campbell, Greg Mankiw, Matthew Shapiro and especially Gary Chamberlain and Jim Stock for helpful comments and suggestions. Financial support from the Earle A. Chiles Foundation (Driscoll) and the Social Sciences and Humanities Research Council of Canada (Kraay) during work on earlier drafts of this paper is gratefully acknowledged.

1 Introduction

Economists are frequently faced with the problem of drawing inferences from data sets which combine cross-sectional and time-series data. In such situations, it has become standard practice to base inferences on techniques which pool the cross-sectional and time-series dimensions in some way. For such techniques to be valid, it must be the case that the error terms are not correlated across different cross-sectional units, either contemporaneously or at leads and lags. This condition is directly analogous to the usual requirement that the residuals from different observations in a single cross-sectional regression be independent of each other.
If this condition is not met, estimates of standard errors will be inconsistent, and will not be useful for inference.

This paper begins with the observation that in many applications, especially in macroeconomics and international economics, the assumption of independent cross-sectional units is inappropriate. While it may be reasonable to assume that cross-sectional units are independent when they are households or individuals chosen according to a well-designed sampling scheme from a large population, this assumption becomes less tenable when the cross-sectional units are countries or regions. Countries or regions are likely to be subject to observable and unobservable common disturbances which will cause the residuals from one cross-section to be correlated with those of another. We will refer to such cross-sectional correlations as "spatial correlations".

Spatial correlations may arise for a number of reasons. For example, in applications in which real GDP growth rates are the dependent variable, various channels of interdependence such as trade, capital flows or policy coordination mechanisms will induce cross-country correlations in GDP growth rates.[1] Unless the regressions of interest include right-hand side variables which correctly specify these channels of interdependence, the residuals from these regressions will be correlated across countries. Similarly, in studies of capital flows to developing countries, common external shocks such as US interest rates, or else unobserved contagion effects (sometimes dubbed "tequila" effects in the aftermath of the Mexican peso crisis) can cause residuals from capital flows regressions to be correlated across countries.

[1] See Kraay and Ventura (1995) for a discussion of the roles of trade and capital mobility in the synchronization of GDP growth rates across countries. Ades and Chua (1993) and Easterly and Levine (1995) provide empirical evidence that policies tend to be correlated among neighbours, leading to correlations of growth rates over long horizons.

A number of standard corrections for spatial correlations exist, all of which require strong assumptions regarding the form of the spatial correlations. For example, it is common to include time dummy variables in pooled time-series, cross-sectional regressions to capture the effect of common disturbances. This technique is the appropriate correction for spatial correlation only if one assumes that the contemporaneous correlations between any pair of cross-sectional units are equal, and the lagged cross-sectional correlations are zero. Unfortunately, such strong restrictions on the form of the spatial correlations are unlikely to be correct in most applications. For example, different countries may react differently to common disturbances, or contagion effects may spread across countries only after a lag. When the structure of the spatial correlations is misspecified in this way, the properties of the resulting estimator are in general unknown.

Since it is not desirable to impose restrictions on the form of the spatial correlations, it is less clear how to proceed. One alternative is to attempt to parametrically estimate the full unrestricted matrix of spatial correlations for use in a feasible generalized least squares (FGLS) procedure. This procedure, which is a variant of the Seemingly Unrelated Regressions (SUR) technique, will only be effective in a limited set of applications. To see why this is so, suppose that there are $N$ cross-sectional units and $T$ time-series observations. The $N \times N$ matrix of contemporaneous cross-sectional correlations has $N(N+1)/2$ free parameters to be estimated using the $NT$ available observations. Thus, in order to obtain reliable estimates of the matrix of spatial correlations, it must be the case that $T \gg (N+1)/2$.
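The counting argument above can be made concrete with a few lines of Python. This is a minimal sketch: the function name `sur_parameter_count` and the two example panel sizes are illustrative assumptions, not taken from the paper.

```python
def sur_parameter_count(n_units: int, n_periods: int) -> dict:
    """Compare the free parameters of an unrestricted N x N covariance
    matrix with the NT observations available to estimate them."""
    free_params = n_units * (n_units + 1) // 2   # N(N+1)/2
    observations = n_units * n_periods           # NT
    return {
        "free_params": free_params,
        "observations": observations,
        # NT / (N(N+1)/2) = T / ((N+1)/2): the ratio that must be large
        "obs_per_param": observations / free_params,
    }


if __name__ == "__main__":
    # Hypothetical panel sizes chosen only for illustration.
    for n, t in [(5, 25), (100, 25)]:
        print(n, t, sur_parameter_count(n, t))
```

With a handful of units and a few decades of annual data, each covariance parameter is backed by several observations; with a hundred units and the same time span, there are fewer observations than parameters.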
However, in many cross-country applications using annual data, there are many more countries in the sample of interest than there are time-series observations, so this approach will be infeasible.

In this paper we propose an alternative correction for spatial correlation. Building on the non-parametric heteroskedasticity and autocorrelation consistent (HAC) covariance matrix estimation technique of Newey and West (1987) and Andrews (1991), we show how this approach can be extended to a panel setting with cross-sectional dependence, in addition to serial correlation and heteroskedasticity. We present very weak conditions on the form of the cross-sectional and time-series dependence under which a simple variant on the Newey and West estimator yields consistent estimates of standard errors. In particular, we can obtain consistent estimates of standard errors in the presence of arbitrary contemporaneous cross-sectional correlations, as well as lagged cross-sectional correlations which are restricted to become small only as the time interval separating the two observations becomes large. This very general structure is likely to encompass most forms of spatial correlations encountered in practice.

Our results on consistency are based on asymptotic theory which requires the time dimension, $T$, to tend to infinity. Thus, our results will only be relevant for panel data sets in which the time dimension is reasonably large (our Monte Carlo simulations suggest that a value of $T=20$ or $T=25$ is the minimum). However, our results do not place any restrictions on the size of the cross-sectional dimension, $N$, and we can even allow the extreme case in which $N$ tends to infinity at any rate relative to $T$. This implies that our techniques, in contrast to SUR, will be applicable in situations such as cross-country panel data sets where the number of countries is very large.

The rest of this paper proceeds as follows.
In Section 2, we first develop the intuition for our results using a simple ordinary least squares example. We then provide a formal statement of our result, using a mixing random field structure to characterize the permissible extent of cross-sectional and time-series dependence. Since this structure is somewhat unfamiliar, we provide some examples of forms of cross-sectional dependence which satisfy the conditions we impose. In Section 3, we consider the finite-sample properties of our estimator using Monte Carlo evidence, and find that our non-parametric estimator performs significantly better than common alternatives such as time dummies or SUR. Section 4 concludes.

2 Consistent Covariance Matrix Estimation with Spatial Dependence

2.1 Preliminary Discussion

In order to develop the intuition for the results of this paper, consider the following simple bivariate linear panel regression:

$$ y_{it} = x_{it}\beta + \epsilon_{it}, \quad i = 1,\dots,N, \quad t = 1,\dots,T, \qquad E[\epsilon\epsilon'] = \Omega \tag{1} $$

To obtain an estimate of $\beta$, it is common practice to pool the cross-sectional and time-series observations and apply OLS to the full set of $NT$ observations. If the errors are independently and identically distributed (i.e. if $\Omega = \sigma^2 I_{NT}$), this will yield consistent estimates of $\beta$ and its standard errors. However, in the presence of spatial correlations, $\Omega$ is no longer diagonal. In this case, although the OLS estimator of $\beta$ is still consistent, the OLS standard errors will be inconsistent, and hence will not be useful for inference.
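The claim that pooled OLS remains centred on $\beta$ while its textbook standard error misleads can be checked with a small simulation. The sketch below is not the paper's Monte Carlo design: it assumes a single common factor in both the regressor and the error (one simple way to generate spatial correlation), and the function names, sample sizes and seed are hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_panel(n_units, n_periods, beta, rng):
    """One panel with a common factor in both the regressor and the error."""
    x, y = [], []
    for _ in range(n_periods):
        g = rng.gauss(0.0, 1.0)  # common component of the regressor
        f = rng.gauss(0.0, 1.0)  # common shock hitting every unit's error
        x_t = [g + rng.gauss(0.0, 1.0) for _ in range(n_units)]
        e_t = [f + rng.gauss(0.0, 1.0) for _ in range(n_units)]
        x.append(x_t)
        y.append([beta * xi + ei for xi, ei in zip(x_t, e_t)])
    return x, y


def pooled_ols(x, y):
    """Pooled OLS slope (no intercept) and its textbook i.i.d. standard error."""
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    sxx = sum(v * v for v in xs)
    b = sum(a * c for a, c in zip(xs, ys)) / sxx
    rss = sum((c - b * a) ** 2 for a, c in zip(xs, ys))
    sigma2 = rss / (len(xs) - 1)
    return b, (sigma2 / sxx) ** 0.5


def monte_carlo(n_reps=200, n_units=20, n_periods=30, beta=1.0, seed=0):
    """Return (mean estimate, Monte Carlo s.d. of estimates, mean textbook s.e.)."""
    rng = random.Random(seed)
    estimates, naive_ses = [], []
    for _ in range(n_reps):
        x, y = simulate_panel(n_units, n_periods, beta, rng)
        b, se = pooled_ols(x, y)
        estimates.append(b)
        naive_ses.append(se)
    return (statistics.mean(estimates),
            statistics.stdev(estimates),
            statistics.mean(naive_ses))


if __name__ == "__main__":
    mean_b, sd_b, mean_se = monte_carlo()
    print("mean estimate:", mean_b)
    print("Monte Carlo s.d.:", sd_b)
    print("mean textbook s.e.:", mean_se)
```

Under these assumptions the average estimate sits close to the true slope, while the dispersion of the estimates across replications is noticeably larger than the average textbook standard error, which is the sense in which the usual standard errors are not useful for inference.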
We can write the OLS estimator of $\beta$ in the usual way as follows:

$$ \sqrt{T}\,(\hat\beta_{OLS} - \beta) = \frac{\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\frac{1}{N}\sum_{i=1}^{N} x_{it}\epsilon_{it}}{\left\{\frac{1}{NT}\sum_{t=1}^{T}\sum_{i=1}^{N} x_{it}^{2}\right\}} \tag{2} $$

To simplify the above expression, denote the term in brackets in the denominator of (2) as $Q_T$,[2] and define

$$ h_t = \frac{1}{N}\sum_{i=1}^{N} x_{it}\epsilon_{it} \tag{3} $$

[2] For the purposes of this illustrative example, we can assume that the $x_{it}$ are constants and that $Q_T \to Q > 0$ as $N, T \to \infty$.

Substituting into the expression for the OLS estimator, we obtain

$$ \sqrt{T}\,(\hat\beta_{OLS} - \beta) = \frac{1}{Q_T}\,\frac{1}{\sqrt{T}}\sum_{t=1}^{T} h_t \tag{4} $$

This change of variables is useful because it reduces the original panel data estimation problem to a simple time-series estimation problem. In other words, by defining a cross-sectional average $h_t$ at every point in time, we have "collapsed" the cross-sectional dimension of the problem to a single time-series observation by averaging over the $N$ cross-sectional units in each period. Since OLS estimates of $\beta$ will be consistent even in the presence of spatial correlations, our main concern is with obtaining consistent estimates of the variance of the OLS estimator. Using the above notation, we can write this variance in terms of the $h_t$ as

$$ V_T = \frac{1}{Q_T^{2}}\,\frac{1}{T}\sum_{t=1}^{T}\sum_{s=1}^{T} E[h_t h_s] \equiv \frac{1}{Q_T^{2}}\,S_T \tag{5} $$

The main intuition of the paper is as follows. Given appropriate conditions on the $h_t$, we can apply standard time-series non-parametric covariance matrix estimation techniques such as those employed by Newey and West to obtain a consistent estimate of $S_T$, and hence of $V_T$. These conditions (known as "mixing conditions" in the standard time-series literature) place restrictions on the autocovariances of the $h_t$, requiring the dependence between $h_t$ and $h_{t-s}$ to become small as the time interval separating them, $s$, becomes large. Imposing restrictions on the autocovariances of $h_t$ will amount to placing restrictions on the contemporaneous and lagged spatial dependence in the residuals, $E[\epsilon_{it}\epsilon_{jt-s}]$, since the autocovariances of the sequence $h_t$ are a weighted average of these covariances, i.e.
$$ E[h_t h_{t-s}] = \frac{1}{N^{2}}\sum_{i=1}^{N}\sum_{j=1}^{N} x_{it}x_{jt-s}\,E[\epsilon_{it}\epsilon_{jt-s}] \tag{6} $$

In this paper, we show that only very weak restrictions on the form of the spatial correlations are required to ensure that $h_t$ satisfies the regularity conditions necessary for consistent estimation of $S_T$. In particular, we can permit arbitrary contemporaneous correlations, and we require only that lagged cross-sectional dependence declines at a particular rate as the time separation becomes large. As in Newey and West (1987), our asymptotic results rely on a large time dimension. However, we do not need to restrict the size of the cross-sectional dimension, which can tend to infinity at any rate relative to $T$.

We use a mixing random field structure to characterize the permissible extent of spatial and temporal dependence. As mixing random fields are somewhat unfamiliar in the econometrics literature, we briefly present the necessary intuition here, and relegate the details to the appendix. Random fields are simply random variables with multiple indices. For example, returning to Equation (3), we can define the random field $h_{it} = x_{it}\epsilon_{it}$, indexed by $i$ and $t$. In the standard univariate time-series literature, a time series is described as "mixing" if the dependence between two random variables $x_t$ and $x_{t-s}$ becomes small as the time interval separating them, $s$, becomes large. In this paper, we will analogously describe a random field as being "mixing" if the dependence between $h_{it}$ and $h_{jt-s}$ becomes small as the time interval $s$ becomes large, for any pair of cross-sectional observations $i$ and $j$.[3] In this way, the standard time-series definition of mixing corresponds to the special case where $i=j$. Finally, the "size" of the mixing is defined as the rate at which the dependence between two observations must decline as a function of the distance between them.
[3] This definition of mixing departs from the standard definitions in the random field literature in that it treats the cross-sectional and time-series dimensions asymmetrically. Typically, a mixing restriction would require the dependence between $h_{it}$ and $h_{jt-s}$ to become small as the Euclidean distance $d = ((i-j)^2 + s^2)^{1/2}$ between these two random variables becomes large. This is an unattractive property of standard definitions of mixing random fields for two reasons. First, in our panel data applications, it precludes canonical forms of cross-sectional dependence such as equal contemporaneous cross-unit correlations. To see why this is so, notice that the distance between $h_{it}$ and $h_{jt}$ is simply $|i-j|$ according to the above definition. Standard definitions of mixing would then rule out equal cross-sectional correlations between any $h_{it}$ and $h_{jt}$, since this correlation will not decline as $|i-j|$ becomes large. The second problem is that in order to impose the restriction that observations "far apart" in the cross-sectional ordering be approximately uncorrelated, it is necessary to know what the cross-sectional ordering is. This is problematic, since unlike in the time dimension, in most cases there is no natural ordering in the cross-sectional dimension.

This particular definition of a mixing random field has the extremely useful property that the cross-sectional averages of this random field, $h_t$ (as defined in Equation (3)), form a univariate mixing sequence of the same size as the underlying random field. This is true for any value of $N$ (the size of the cross-sectional dimension), including the limiting case where $N \to \infty$. If we impose the restriction that the $h_{it}$ form a mixing random field of the appropriate size, then $h_t$ will be a mixing sequence of the same size, and we can directly apply standard time-series covariance matrix estimation techniques to obtain an estimate of $S_T$ in Equation (5).
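The two steps just described, averaging $x_{it}\epsilon_{it}$ across units to get $h_t$ and then applying a time-series covariance estimator to $h_t$, can be sketched for the scalar regression of Equation (1). The sketch below makes several assumptions that are not spelled out in the text above: it uses OLS residuals in place of the unobserved errors, Bartlett weights $1 - j/(m+1)$ as in Newey and West (1987), and a user-supplied truncation lag; the function name `driscoll_kraay_se` and its argument layout are hypothetical.

```python
def driscoll_kraay_se(x, y, max_lag):
    """Pooled OLS slope (no intercept) with a spatial-correlation-robust s.e.

    x, y: lists of T rows; each row holds the N cross-sectional
    observations for one time period.
    """
    n_periods = len(x)
    n_units = len(x[0])

    # Pooled OLS slope and Q_T = (1/NT) * sum of x_it^2.
    sxx = sum(v * v for row in x for v in row)
    sxy = sum(a * b for rx, ry in zip(x, y) for a, b in zip(rx, ry))
    beta = sxy / sxx
    q = sxx / (n_units * n_periods)

    # h_t: cross-sectional average of x_it times the OLS residual.
    h = [sum(a * (b - beta * a) for a, b in zip(rx, ry)) / n_units
         for rx, ry in zip(x, y)]

    # Bartlett-weighted estimate of S_T from the single time series h_t.
    s = sum(v * v for v in h) / n_periods
    for lag in range(1, max_lag + 1):
        weight = 1.0 - lag / (max_lag + 1.0)
        gamma = sum(h[t] * h[t - lag] for t in range(lag, n_periods)) / n_periods
        s += 2.0 * weight * gamma

    # V_T = S_T / Q_T^2 is the variance of sqrt(T) * (beta_hat - beta),
    # so the variance of beta_hat itself is V_T / T.
    var_beta = s / (q * q) / n_periods
    return beta, var_beta ** 0.5


if __name__ == "__main__":
    # Tiny hypothetical panel: T = 2 periods, N = 2 units.
    x = [[1.0, 1.0], [1.0, 1.0]]
    y = [[3.0, 3.0], [1.0, 1.0]]
    print(driscoll_kraay_se(x, y, max_lag=0))
    print(driscoll_kraay_se(x, y, max_lag=1))
```

Because the cross-sectional dimension enters only through the average $h_t$, the cost of the covariance step does not grow with the number of pairwise correlations among units, and with a single unit per period the function is simply a Newey-West calculation on one time series.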
Thus, our results amount to a simple extension of the Newey and West estimator, which may be viewed in the above context as the case in which $N = 1$.

2.2 Results

In this section, we present our main result, which is simply a generalization and formalization of the discussion of the previous section. The theorem is stated in terms of a broad class of Generalized Method of Moments estimators, of which the OLS case discussed above is an example.

Theorem

Consider the class of GMM models identified by a $p \times 1$ vector of orthogonality conditions $E[\psi(\theta_0, z_{it})] = 0$, where $\theta_0 \in \Theta$ is an $a \times 1$ vector of parameters with $a \le p$, $\Theta$ is a compact subset of $\mathbb{R}^a$, $z_{it}$ is a $k \times 1$ vector of data, and denote $z_t = (z_{1t}', \dots, z_{Nt}')'$ and $h_t = h(\theta, z_t) = N^{-1}\sum_{i=1}^{N}\psi(\theta, z_{it})$. Suppose further that

(1) $z_{it}$ is an $\alpha$-mixing random field of size $2(r+\delta)/(r+\delta-1)$, as defined in the Appendix;
(2) (a) $\psi(\theta, z_{it})$ is continuously differentiable in $\theta$ and measurable in $z_{it}$;
    (b) $E[|\psi(\theta, z_{it})|^{4(r+\delta)}]$ [...]

[...] $|P[F_1 \cap F_2] - P[F_1]P[F_2]|$

A mixing random field is defined as follows:

Definition: A random field is mixing of size $r/(r-1)$, $r > 1$, if for some $\lambda > r/(r-1)$, $\alpha(s) = O(s^{-\lambda})$.

[11] Random field structures have been developed extensively in the statistics literature. See Rosenblatt (1970), Deo (1975), Bolthausen (1982), and Bulinskii (1988). Some economic applications include Wooldridge and White (1988), Quah (1990), and Conley (1994).

[12] It is straightforward to extend these definitions and the results which follow to $\phi$-mixing random fields by defining $\phi$-mixing coefficients in the usual way.

This definition of mixing departs from the more standard $\alpha$-mixing structures on random fields in that it treats the cross-sectional dependence differently from the time-series dependence.
Most definitions of mixing[13] restrict the dependence in both dimensions symmetrically, requiring the dependence between two observations to decline as either the distance in the cross-sectional ordering becomes large, or as the time separation becomes large (see, for example, Quah (1990)). This restriction on the dependence across units is required to deliver $(NT)^{1/2}$ asymptotic normality for double sums over $i$ and $t$ of the $\epsilon_{it}$, just as in the one-dimensional case restrictions on the temporal dependence are required to deliver $T^{1/2}$ asymptotic normality for appropriately normalized sums.[14]

The definition of mixing presented here, however, does not restrict the degree of cross-sectional dependence. Instead, we only require the dependence between $\epsilon_{it}$ and $\epsilon_{jt-s}$ to be small when $s$ is large, for any value of $i$ and $j$. This is a desirable property, since it will not preclude canonical forms of cross-sectional dependence, such as factor structures in which cross-sectional units may be equicorrelated in a given time period or grouped structures in which observations are correlated according to possibly unobservable group characteristics. This greater permissible cross-sectional dependence comes at the cost that it will not be possible to obtain $(NT)^{1/2}$ asymptotics for double sums over $i$ and $t$ of the $\epsilon_{it}$. However, we do not require this as we rely exclusively on $T^{1/2}$ asymptotics for this double sum.

A useful property of this random field structure is that the sequence of cross-sectional averages of the $\epsilon_{it}$ forms a univariate $\alpha$-mixing sequence, as summarized in the following lemma.

[13] See Doukhan (1994) for an extensive survey of mixing in random fields and in other contexts.

[14] For such random fields, $(NT)^{1/2}$ asymptotics typically require $N$ and $T$ to go to infinity at the same rate, suggesting that in finite sample applications, the cross-sectional and time-series dimension must be roughly equal for asymptotic approximations to be plausible.
For example, Quah (1990) has the restriction that T=iN. We do not require this restriction in our asymptotic theory.

Lemma

Suppose that $\epsilon_{it}$ is an $\alpha$-mixing random field of size $r/(r-1)$, $r > 1$. Then $h_t = \frac{1}{N}\sum_{i=1}^{N}\epsilon_{it}$ is an $\alpha$-mixing sequence of the same size as $\epsilon_{it}$, for any $N$.

Proof

The proof is simply a matter of verifying that $h_t$ satisfies the definition of univariate mixing. Define $\mathcal{G}_{-\infty}^{t} = \sigma(h_s \mid s \le t)$ [...] $\sup_{G_1 \in \mathcal{G}_{-\infty}^{t},\, G_2 \in \mathcal{G}_{t+s}^{\infty}} |P[G_1 \cap G_2] - P[G_1]P[G_2]|$. Now we claim that $\mathcal{G}_{-\infty}^{t} \subseteq \mathcal{F}_{-\infty}^{t}$ and $\mathcal{G}_{t+s}^{\infty} \subseteq \mathcal{F}_{t+s}^{\infty}$. Given this claim, we have $\alpha_h(s) \le \alpha(s)$ $\forall s$, and hence $\alpha_h(s)$ converges at least as quickly as $\alpha(s)$. Thus the sequence $h_t$ is mixing of the same size as $\epsilon_{it}$.

To verify the claim, note that $h_t : \Omega \to \mathbb{R}^1$ is a Borel function of $\{\epsilon_{it} \mid i = 1,\dots,N,\dots\}$, and hence is $\sigma(\epsilon_{it} \mid i = 1,\dots,N,\dots)$-measurable, i.e. $h_t^{-1}(\mathcal{B}) \subseteq \sigma(\epsilon_{it} \mid i = 1,\dots,N,\dots)$, where $\mathcal{B}$ is the $\sigma$-algebra generated by the Borel sets. Thus by definition $\sigma(h_t) = \sigma(h_t^{-1}(\mathcal{B})) \subseteq \sigma(\epsilon_{it} \mid i = 1,\dots,N,\dots)$. Finally, note that $\mathcal{G}_{-\infty}^{t} = \sigma\left(\cup_{s=-\infty}^{t}\sigma(h_s)\right)$ and $\mathcal{F}_{-\infty}^{t} = \sigma\left(\cup_{s=-\infty}^{t}\sigma(\epsilon_{is} \mid i = 1,\dots,N,\dots)\right)$, and so the claim is verified.

This lemma is useful, as it permits us to move from restrictions on temporal and spatial dependence in the random field to simple mixing restrictions on the univariate sequence of cross-sectional averages, $h_t$.

Proof of Theorem

To prove consistency and asymptotic normality of the GMM estimator, we will verify the conditions in Hamilton (1994), Proposition 14.1. Consistency of the covariance matrix estimator will follow from the arguments of Newey and West (1987).

To verify consistency of the GMM estimator (Hamilton, Proposition 14.1, Condition (a)), we need only verify conditions for the consistency of extremum estimators (for example, Amemiya (1985), Theorem 4.1.1). Conditions A and B of Amemiya (1985), Theorem 4.1.1 follow immediately from the compactness of $\Theta$ and Assumption 2(a). Condition C of this theorem requires the minimand in the GMM problem to converge uniformly in $\theta \in \Theta$. This condition will be satisfied if the sequence $h_t = h(\theta, z_t)$ obeys a LLN for all $\theta \in \Theta$.
Since $\psi$ is a measurable function of $z_{it}$, it is a mixing random field of the same size as $z_{it}$, by an argument similar to the one used to prove Lemma 1. By Lemma 1, $h_t$ is a univariate $\alpha$-mixing sequence of size $2(r+\delta)/(r+\delta-1) > r/(r-1)$. Thus, to apply the McLeish (1975) LLN (see White (1984), Theorem 3.47) for $\alpha$-mixing sequences of size $r/(r-1)$, we need only verify that $h_t$ has finite $(r+\delta)$th moments. However, to prove consistency of the covariance matrix estimator, we will require the stronger moment condition that $E[|h_t|^{4(r+\delta)}]$ [...]

[...] O (15)

uniformly in a as $T \to \infty$. To verify this, observe first that the mixing property of $h_t$ allows us to bound the autocovariances of $h_t$ in the usual way. That is, for $s > 0$, we have I E[hl4h,j I < a(s)A where a(s)=O(s<('+6) (White (1984), Corollary 6.16). This corollary requires $h_t$ to be an $\alpha$-mixing sequence of size $(2+2\eta)/\eta$, $\eta > 0$, with $E[|h_t|^{2+2\eta}]$ [...]

[...] Ie |s EI| W(s,M) E Ztl > E| 4 E zts > (22) S ( Cm )2 TA( AZC2m(J)3 2E2T

This final term converges to zero by the assumption that $m(T) = o(T^{1/3})$. Consistency of the first term follows immediately from the consistency of the OLS estimator and the final paragraph in Newey and West (1987).

Spatial Correlations with Factor Structure Representations

The following corollary to the theorem verifies the claim made in Section 2.3 that it is possible to obtain consistent covariance matrix estimates in the presence of spatial correlations which have a factor structure representation.

Corollary

Suppose that $y_{it} = x_{it}\beta + \epsilon_{it}$, with $\epsilon_{it} = f_t'\lambda_i + v_{it}$, $f_t = (f_{1t}, \dots, f_{Mt})'$ and $x_{it} = g_t'\kappa_i + u_{it}$, $g_t = (g_{1t}, \dots, g_{Pt})'$, where $\lambda_i$ and $\kappa_i$ are $M \times 1$ and $P \times 1$ vectors of uniformly bounded constant factor loadings and $M$ and $P$ are finite constants. Suppose further that $f_{mt} \perp f_{nt}$ and $f_{mt} \perp v_{it}$ $\forall m, n$, $m \ne n$ and $\forall t$, and that $g_{mt} \perp g_{nt}$ and $g_{mt} \perp u_{it}$ $\forall m, n$, $m \ne n$ and $\forall t$, and that $E[f_{mt}] = E[g_{mt}] = 0$ and $E[f_{mt}^2] = E[g_{mt}^2] = 1$ $\forall m, t$.
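Suppose further that $\forall i, j, t$ and $m$

(1) (a) $(f_t', g_t')'$ is an $\alpha$-mixing sequence of size $2(r+\delta)/(r+\delta-1)$ for $r > 1$ and some $\delta > 0$;
    (b) $E[v_{it}] = E[u_{it}] = 0$ and $v_{it} \perp v_{jt-s}$ and $u_{it} \perp u_{jt-s}$ for $s \ne 0$;
(2) (a) $E[x_{it}\epsilon_{it}] = 0$;
    (b) $E[|f_{mt}x_{it}|^{4(r+\delta)}] < \infty$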