WPS3932
WELFARE MEASUREMENT BIAS IN
HOUSEHOLD AND ON-SITE SURVEYING
OF WATER-BASED RECREATION:
AN APPLICATION TO LAKE SEVAN, ARMENIA
Craig Meisner
and
Hua Wang
Development Research Group
The World Bank
and
Benoît Laplante
Independent Consultant, Montreal, Canada
Keywords: On- and off-site sampling, recreation demand, zero-inflated models, truncated count data
models, endogenous stratification, Armenia.
World Bank Policy Research Working Paper 3932, June 2006
The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the
exchange of ideas about development issues. An objective of the series is to get the findings out quickly,
even if the presentations are less than fully polished. The papers carry the names of the authors and should
be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely
those of the authors. They do not necessarily represent the view of the World Bank, its Executive Directors,
or the countries they represent. Policy Research Working Papers are available online at
http://econ.worldbank.org.
Correspondence should be addressed to: Craig Meisner, MC2-205, World Bank, 1818 H Street, NW,
Washington, DC 20433, cmeisner@worldbank.org.
I. Introduction
Several recent travel cost studies have aimed to compare recreational benefits
derived from household and on-site surveys (e.g. Loomis, 2003; Shaw, 2003). If it can be
shown that welfare estimates derived from cost-effective on-site surveying techniques are
similar to household survey results, this may justify using on-site surveys in lieu of large
and costly population-based surveys. However, a robust comparison of estimates
obtained from each sample requires addressing a number of important statistical issues.
In particular, household survey demand is typically censored due to the possibility of
observing a large number of zeros (non-users of the site). Simply treating all zeros in
the sample as potential users of the site introduces an upward bias in the demand and welfare
measures. On the other hand, on-site sample demand is truncated at one since it surveys
only users at the site. In this case, estimates are prone to higher standard errors and an
upward bias from over-sampling individuals whose characteristics may be correlated with
higher trip frequencies (endogenous stratification - ES).
In the case of household surveys, it is possible to resolve the issue by separating
the recreation "participation" decision from the trip "quantity" decision, thus reducing the
bias introduced by non-users of the site. In the case of on-site surveys, it is possible to
correct for the potential bias by providing adjustments to the distribution function (Shaw,
1988; Englin and Shonkwiler, 1995). To our knowledge, none of the existing travel cost
studies have attempted to correct for both biases when conducting comparative analyses
of estimates obtained from household and on-site surveys.1
In this paper, we test whether household and on-site
demand estimation yield similar welfare measures, after accounting for both biases
discussed above. For this purpose, we use a household survey and an on-site survey conducted at
Lake Sevan, Armenia. This single-site comparison has two advantages. First, as the site
is unique, we avoid problems of having to incorporate substitute sites into the decision to
recreate. Second, since we are not valuing a change in the quality of the lake, we also
avoid any quality change impacts on expected trip demand.
1Loomis (2003) does not discuss the prevalence of zeros in his comparative household sample, and does
not consider their relative influence on expected trip demand or welfare.
The household survey consisted of 3,358 households across Armenia, and the on-
site survey of 389 tourists recreating at Lake Sevan. Travel cost models were constructed
and estimated using travel expenditure and socio-demographic information contained in
each survey. As visitation in the household survey contained a large percentage of
zeros and showed over-dispersion in trip frequency, a zero-inflated negative
binomial (ZINB) model was estimated. For the on-site survey, two truncated negative
binomial models were estimated, with and without an adjustment for endogenous
stratification (ES).
Likelihood ratio tests rejected the null hypothesis of no over-dispersion in favor of the
negative binomial specification in both the household and on-site models. Results from the
household model also reveal that the participation decision is indeed relevant to the
household's recreation decision. However, in the case of the on-site sample, estimated
coefficients for the ES and non-ES models were not significantly different. This may
suggest that characteristics from the on-site sample are representative of the household
sample. Other studies have found similar results where accounting for ES did not yield
any significant differences in trip demand or welfare (Ovaskainen et al., 2001; Englin et
al., 2003). Per-trip consumers surplus was estimated to be $8.82 for the household
sample, $8.73 for the on-site model without ES adjustment, and $8.21 with ES.
The remainder of this paper is structured as follows. The next section provides a
description of travel cost and count data models utilized in this study along with
recommendations of how to remedy several dependent variable issues typically
encountered with household and on-site recreational surveys. In Section III, the two
surveys are described in more detail. In Section IV, the results of estimation are
presented, along with a comparison in expected trip demand and estimated welfare
measures. Section V provides a brief summary and discussion of the findings.
II. Travel Cost Modeling
In travel cost modeling, the decision to recreate is typically modeled as a latent
demand, $y_i^*$, representing the number of trips taken in one year as a function of travel cost
(P), site quality attributes (Z) and individual demographic characteristics (X):
$$\text{Trips}_i = y_i^* = f(P_i, X_i, Z_i) + \varepsilon_i, \qquad i = 1, 2, \ldots, N \qquad (1)$$
Travel cost modeling (TCM) can be implemented through household or on-site surveys.
However, each sampling method involves a number of different statistical issues.
(i) Household survey
An important modeling issue when applying TCM pertains to the treatment of
non-negative integers observed in individual recreational data, as one may encounter a
large proportion of zeros in a general household survey (Shaw, 1988; Grogger and
Carson, 1991; Hellerstein, 1991). Observing a zero means that the individual currently
consumes no services from the site. In the utility maximization framework, this may imply
that the individual is at some choke price where he is consuming zero trips, and that if the
current "market" price were to fall below the choke price, the individual would demand a
positive number of trips. However, one may also observe a zero if, for some reason (such as
age or health), services from the site would never enter an individual's utility function
(Haab and McConnell, 1996).
Thus, there is an important distinction between observing zeros for those who are
participants and for those who are non-participants ("true zeros"). Standard count data
models such as the Poisson or negative binomial assume that all individuals surveyed are
potential users of the good in question, and that the same variables influence all potential
users similarly. In the presence of a large number of zeros, and where the participation
question is relevant, this assumption may not be valid and should be tested for its
significance.
To account for the participation issue, we consider two augmented count data
models which account for the presence of a large number of zeros - the zero-inflated
Poisson (ZIP) and zero-inflated negative binomial (ZINB) (Mullahy, 1986; Lambert,
1992; Greene, 1994; Haab and McConnell, 1996). By distinguishing between
participants and non-participants, the zero observations may contain valuable
information, and a gain in efficiency will be achieved by including all of the observations
(Haab and McConnell, 1996, p. 90).2 Empirically, zero-inflated count models change the
mean structure to allow zeros to be generated by two distinct processes, one for the
participation decision (logit or probit) and one for the mean number of trips (count
model).3 By expanding the standard count model to allow for individual-specific
characteristics which may keep an individual from entering the recreation market, one
can separate factors which influence the participation issue from those that influence the
quantity of trips taken to a recreation site (Haab and McConnell, 1996). In estimation, the
ZIP model allows for over-dispersion in the Poisson data generating process by allowing
a mass of zero observations independent of the true Poisson process.
The distribution function for the ZIP model is:
$$\Pr(y_i \mid x_i) = \begin{cases} P_i + (1-P_i)\,e^{-\lambda_i} & \text{if } y_i = 0, \\ (1-P_i)\,\dfrac{e^{-\lambda_i}\lambda_i^{y_i}}{y_i!} & \text{otherwise,} \end{cases} \qquad (2)$$
where $E(y_i) = (1-P_i)\lambda_i$, $\operatorname{Var}(y_i) = (1-P_i)(1+P_i\lambda_i)\lambda_i$, and $P_i$ is the
probability of zero visitation due to non-participation. Note that in this formulation, zeros can be
generated either by the binary participation process or by the Poisson process (which also
generates all positive counts), since $e^{-\lambda_i}\lambda_i^{0}/0! = e^{-\lambda_i}$. The Poisson mean can be
modeled as $\lambda_i = \exp(x_i\beta)$, and $P_i$ as $g(z_i\gamma)$, where $\gamma$ is a vector of
participation-decision parameters and $z_i$ is a vector of explanatory variables that may or may
not be the same as those for the quantity decision, $x_i$. The function $g(\cdot)$ can be specified
as either a logit or a probit (cumulative standard normal) function, as both give similar results.
2In the past, one crude option was simply to drop the zeros from the sample.
3The zero-inflated models differ from the Heckman continuous two-stage model as they allow for zero
observations in the second stage of the decision process (in the mean model).
In the presence of over-dispersion4 (variance > mean), the participation decision can be
similarly decomposed in a zero-inflated negative binomial model as:
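$$\Pr(y_i \mid x_i) = \begin{cases} P_i + (1-P_i)\,(1+\alpha\lambda_i)^{-1/\alpha} & \text{if } y_i = 0, \\ (1-P_i)\,\dfrac{\Gamma(y_i + 1/\alpha)}{\Gamma(y_i+1)\,\Gamma(1/\alpha)}\left(\dfrac{1}{1+\alpha\lambda_i}\right)^{1/\alpha}\left(\dfrac{\alpha\lambda_i}{1+\alpha\lambda_i}\right)^{y_i} & \text{otherwise,} \end{cases} \qquad (3)$$
where $E(y_i) = (1-P_i)\lambda_i$ and $\operatorname{Var}(y_i) = (1-P_i)[1+\lambda_i(\alpha+P_i)]\lambda_i$. When $\alpha > 0$, its
presence in the conditional variance of y guarantees that the variance is greater than the mean.
As $\alpha \to 0$, the moments of the distribution converge to those of the Poisson counterpart, so
testing $\alpha = 0$ provides a basis for selecting the negative binomial over the Poisson, and
indirectly for detecting the presence of over-dispersion.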
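The moment formulas attached to the zero-inflated models can be checked numerically. The minimal sketch below, assuming the NB2 parameterization used in the zero-inflated negative binomial density (equation (3) above), sums the probability mass function directly and compares the resulting variance with the closed form; setting alpha to zero gives the ZIP case of equation (2). The function names `zinb_pmf` and `numeric_moments` and the parameter values are hypothetical and for illustration only, not estimates from this paper.

```python
import math


def zinb_pmf(y: int, lam: float, p_zero: float, alpha: float) -> float:
    """Zero-inflated count probability Pr(y | x).

    p_zero plays the role of P_i (the always-zero regime), lam is lambda_i and
    alpha is the over-dispersion parameter; alpha = 0 gives the ZIP case.
    """
    if alpha == 0.0:
        # Poisson count part, as in equation (2).
        log_count = -lam + y * math.log(lam) - math.lgamma(y + 1)
    else:
        # Negative binomial (NB2) count part, as in equation (3).
        inv_a = 1.0 / alpha
        log_count = (math.lgamma(y + inv_a) - math.lgamma(y + 1) - math.lgamma(inv_a)
                     - inv_a * math.log1p(alpha * lam)
                     + y * (math.log(alpha * lam) - math.log1p(alpha * lam)))
    zero_mass = p_zero if y == 0 else 0.0
    return zero_mass + (1.0 - p_zero) * math.exp(log_count)


def numeric_moments(lam: float, p_zero: float, alpha: float, y_max: int = 2000):
    """Total probability, mean and variance obtained by summing the pmf directly."""
    probs = [zinb_pmf(y, lam, p_zero, alpha) for y in range(y_max + 1)]
    mean = sum(y * pr for y, pr in enumerate(probs))
    var = sum((y - mean) ** 2 * pr for y, pr in enumerate(probs))
    return sum(probs), mean, var


lam, p_zero = 2.0, 0.4
for alpha in (0.0, 0.8):
    total, mean, var = numeric_moments(lam, p_zero, alpha)
    closed_var = (1 - p_zero) * (1 + lam * (alpha + p_zero)) * lam
    print(f"alpha={alpha}: total={total:.6f} mean={mean:.4f} "
          f"var={var:.4f} closed-form var={closed_var:.4f}")
```

With alpha = 0 the closed form reduces to the ZIP variance $(1-P_i)(1+P_i\lambda_i)\lambda_i$, which illustrates why a test of $\alpha = 0$ discriminates between the two specifications.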
The flexibility of modeling the participation decision in this manner has led to a
number of interesting applications in recreational demand analysis, including beach trips
(Shonkwiler and Shaw, 1996; Haab and McConnell, 1996), rock climbing (Shaw and
Jakus, 1996), lake recreation (Gurmu and Trivedi, 1996), water-based recreation (Curtis,
2003), and angling site choice (Scrogin et al., 2004).
(ii) On-site sampling
Interview surveys conducted on-site obviously avoid the non-participation issue,
but as the dependent variable $y_i$ is strictly positive, the truncated demand relationship
only captures individuals whose error terms are large enough to generate at least one trip.
In addition, because the sample is on-site, there is a higher likelihood of intercepting a
person whose characteristics are correlated with higher trip frequencies, which is known as
"endogenous stratification" in sampling. The implication is that the sample is not
representative of the population at large, and in measuring welfare effects, consumers
surplus estimates will be biased upwards because they mainly capture avid recreationists.
4An undesirable feature of Poisson count models is the assumption that the conditional mean and variance
are equal (Yen and Adamowicz, 1993). This is especially problematic in empirical research because
conditional variances are typically greater than conditional means in socio-economic data (a condition
known as over-dispersion, a form of heteroskedasticity). Over-dispersion still allows the mean parameters
to be estimated consistently (Gourieroux et al., 1984), but it biases their standard errors downward,
resulting in erroneous tests of statistical significance (Cameron and Trivedi, 1986). The mean-variance
equality of Poisson count models led to the development of negative binomial models (Hausman et al.,
1984), which allow for over-dispersion by combining the Poisson distribution with a gamma distribution,
so that unobserved heterogeneity is gamma distributed.
Truncation and endogenous stratification were first explored by Shaw (1988) in the
case of the Poisson distribution and extended by Englin and Shonkwiler (1995) to the
negative binomial distribution. The basic implication is to weight each individual's
population density by the number of trips taken relative to the expected number of trips.
Assuming that the density function of the ith person in the population is $f(y_i^* \mid x_i)$,
Shaw (1988) shows that the density function of the same person in the on-site population is:
$$\Pr(y_i \mid x_i) = \frac{y_i\, f(y_i \mid x_i)}{\sum_{t=1}^{\infty} t\, f(t \mid x_i)} \qquad (4)$$
If the conditional density $f(y_i^* \mid x_i)$ is chosen to be Poisson with location parameter $\lambda_i$,
then the on-site sample's density function is:
$$\Pr(y_i \mid x_i) = \frac{e^{-\lambda_i}\lambda_i^{y_i-1}}{(y_i-1)!} \qquad (5)$$
where $E(y_i \mid x_i) = \lambda_i + 1$ and $\operatorname{Var}(y_i \mid x_i) = \lambda_i$. Defining $w_i = y_i - 1$, (5) becomes a
standard Poisson density in $w_i$, so the model can be estimated with a standard Poisson
routine using $w_i$ as the dependent variable.
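The on-site weighting in equation (4) and the shifted-Poisson result in equation (5) can be verified numerically. The sketch below, assuming a Poisson population density and truncating the infinite sum in the denominator at a large `t_max`, applies the size-biased weighting directly and compares it with a Poisson evaluated at $w_i = y_i - 1$. The helper names and the value lambda = 3 are hypothetical.

```python
import math


def poisson_pmf(y: int, lam: float) -> float:
    """Population density f(y | x): a Poisson with mean lam."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def onsite_pmf(y: int, lam: float, base_pmf=poisson_pmf, t_max: int = 500) -> float:
    """Equation (4): y * f(y) / sum_{t>=1} t * f(t), truncating the sum at t_max."""
    norm = sum(t * base_pmf(t, lam) for t in range(1, t_max + 1))
    return y * base_pmf(y, lam) / norm


def shifted_poisson_pmf(y: int, lam: float) -> float:
    """Equation (5): a standard Poisson pmf evaluated at w = y - 1."""
    return poisson_pmf(y - 1, lam)


lam = 3.0
for y in range(1, 6):
    print(y, round(onsite_pmf(y, lam), 6), round(shifted_poisson_pmf(y, lam), 6))
mean_onsite = sum(y * shifted_poisson_pmf(y, lam) for y in range(1, 200))
print("on-site mean:", round(mean_onsite, 6), "population mean:", lam)
```

The on-site mean exceeds the population mean by exactly one trip, which is the Poisson version of the upward bias that the ES correction removes.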
In the presence of over-dispersion, the equality of the mean and variance is
violated and thus the negative binomial model is preferred, with the following density
function (Englin and Shonkwiler, 1995):
$$\Pr(y_i \mid x_i) = \frac{y_i\,\Gamma(y_i + 1/\alpha)\,\alpha^{y_i}\,\lambda_i^{y_i-1}\,(1+\alpha\lambda_i)^{-(y_i+1/\alpha)}}{\Gamma(y_i+1)\,\Gamma(1/\alpha)} \qquad (6)$$
where $E(y_i \mid x_i) = \lambda_i + 1 + \alpha\lambda_i$ and $\operatorname{Var}(y_i \mid x_i) = \lambda_i(1 + \alpha + \alpha\lambda_i + \alpha^2\lambda_i)$. As the
specification in (6) cannot be transformed into a simpler form, as in the case of the
truncated Poisson, the likelihood function must be programmed directly into a likelihood
maximization routine. The log likelihood function used in this context is:5
$$\ln L = \sum_{i=1}^{N}\Big[\ln y_i + \ln\Gamma(y_i + 1/\alpha) - \ln\Gamma(y_i+1) - \ln\Gamma(1/\alpha) + y_i\ln\alpha + (y_i-1)\ln\lambda_i - (y_i+1/\alpha)\ln(1+\alpha\lambda_i)\Big] \qquad (7)$$
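Because the on-site negative binomial likelihood has to be coded by hand, a direct transcription of the per-respondent term of equation (7) is a useful starting point. The sketch below is a hypothetical helper, not the Hilbe (1999) routine used in the paper; it checks, for made-up values of lambda and alpha, that the density in (6) sums to one over $y \ge 1$ and reproduces the stated mean and variance. In estimation, the negative of `onsite_nb_loglik` would be minimized over the parameters that determine each lambda.

```python
import math


def onsite_nb_logpmf(y: int, lam: float, alpha: float) -> float:
    """Log of the on-site negative binomial density in equation (6); needs y >= 1."""
    inv_a = 1.0 / alpha
    return (math.log(y) + math.lgamma(y + inv_a) - math.lgamma(y + 1)
            - math.lgamma(inv_a) + y * math.log(alpha) + (y - 1) * math.log(lam)
            - (y + inv_a) * math.log1p(alpha * lam))


def onsite_nb_loglik(trips, lams, alpha: float) -> float:
    """Equation (7): sum of per-respondent log densities over the on-site sample."""
    return sum(onsite_nb_logpmf(y, lam, alpha) for y, lam in zip(trips, lams))


# Illustrative check with made-up values (not estimates from this paper).
lam, alpha = 2.0, 0.5
ys = range(1, 400)
probs = [math.exp(onsite_nb_logpmf(y, lam, alpha)) for y in ys]
mean = sum(y * p for y, p in zip(ys, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(ys, probs))
print(round(sum(probs), 9), round(mean, 6), round(var, 6))
```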
Defining $\lambda_i$ as the expected number of person-day-trips6 individual i takes to the site in a
year, the empirical demand relationship can be defined as:
$$\lambda_i = \exp(X_i\beta + \varepsilon_i) = \exp(\beta_p p_i + x_i\delta + \varepsilon_i), \qquad i = 1, \ldots, n \qquad (8)$$
where $\beta$ is a K × 1 vector of parameters, $X_i$ is a 1 × K vector of explanatory variables for
individual i, $p_i$ is the travel cost for individual i to the site, $x_i$ is the 1 × (K − 1) vector of
explanatory variables after $p_i$ is removed from $X_i$, $\beta_p$ is the parameter on travel cost, and
$\delta$ is the remaining vector of parameters corresponding to $x_i$.
(iii) Welfare measures
The benefit (consumer surplus) of access to the site is defined as the area under
the estimated Marshallian demand curve specified in (8) and above the current price
level. By integrating the demand function over travel costs (prices) faced by individuals,
we calculate expected consumers surplus as:
$$E(CS_i) = \int_{p_i}^{\infty} \exp(\beta_p P + x_i\delta)\, dP = -\frac{\lambda_i}{\beta_p} \qquad (9)$$
where $\lambda_i$ is as defined in (8) and $\beta_p$ is the estimated parameter on travel cost. Summed
across all i, the area measures the total per trip willingness-to-pay by all individuals to
recreate at the site. In the case of the ZINB model, expected consumers surplus must be
weighted by the probability of participation, $(1 - P_i)$, where $P_i$ is the probability of zero
visitation and is a function of variables that affect the participation decision. Compensating
and equivalent variation measures can also be calculated from the expenditure function implied
by the Marshallian demand relationship specified above. From a welfare perspective, CV and EV
may be of interest as measures of potential compensation from those who degrade the resource.
Table 1 summarizes the welfare measures used in the analysis.
5 The likelihood function in (7) was entered into a modified zero-truncated negative binomial maximum
likelihood routine provided by Hilbe (1999).
6 Person-day-trips were defined as the number of trips taken by the respondent in one year. All cost
information was then divided by the number of days to form per-day trip costs.
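The closed form in equation (9) follows from integrating an exponential demand curve from the current travel cost up to the choke price at infinity. The minimal sketch below, assuming the semi-log demand in (8) with the error term set to zero, compares the closed form with a trapezoid-rule integral and adds the $(1 - P_i)$ weighting used for the ZINB model. The function names are hypothetical; the travel-cost coefficient is borrowed from the household ZINB column of Table 3 only for illustration, and lambda = 1 is made up.

```python
import math


def expected_cs(lam: float, beta_p: float, p_zero: float = 0.0) -> float:
    """Equation (9), optionally weighted by the participation probability (1 - P).

    lam is the expected trips at current travel cost and beta_p < 0 is the
    travel-cost coefficient; p_zero = 0 corresponds to the non-inflated models.
    """
    return -(1.0 - p_zero) * lam / beta_p


def cs_by_integration(lam: float, beta_p: float, upper: float = 2000.0,
                      n_steps: int = 200_000) -> float:
    """Trapezoid-rule area under lam * exp(beta_p * d), where d is the price increase."""
    h = upper / n_steps
    total = 0.5 * (lam + lam * math.exp(beta_p * upper))
    for k in range(1, n_steps):
        total += lam * math.exp(beta_p * k * h)
    return total * h


# Illustrative values only: the household ZINB travel-cost coefficient from
# Table 3 with a hypothetical lam = 1.0 trip.
print(expected_cs(1.0, -0.0153), cs_by_integration(1.0, -0.0153))
```

The upper limit of 2,000 stands in for infinity; at that price increase the integrand is about $e^{-30}$ of its starting value, so truncation error is negligible.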
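Table 1: Welfare measures
Model: consumers surplus; compensating variation; equivalent variation
Household sample:
Negative binomial: $-\bar\lambda/\beta_p = -e^{\bar X\beta}/\beta_p$; $\frac{1}{\beta_i}\ln\left(1-\frac{\beta_i}{\beta_p}\bar\lambda\right)$; $-\frac{1}{\beta_i}\ln\left(1+\frac{\beta_i}{\beta_p}\bar\lambda\right)$
Zero-inflated negative binomial: $-(1-P)\,e^{\bar X\beta}/\beta_p$; $\frac{1-P}{\beta_i}\ln\left(1-\frac{\beta_i}{\beta_p}\bar\lambda\right)$; $-\frac{1-P}{\beta_i}\ln\left(1+\frac{\beta_i}{\beta_p}\bar\lambda\right)$
On-site sample:
Trunc. negative binomial / Trunc. negative binomial w/endogenous stratification: $-\bar\lambda/\beta_p = -e^{\bar X\beta}/\beta_p$; $\frac{1}{\beta_i}\ln\left(1-\frac{\beta_i}{\beta_p}\bar\lambda\right)$; $-\frac{1}{\beta_i}\ln\left(1+\frac{\beta_i}{\beta_p}\bar\lambda\right)$
Note: $\bar\lambda = \exp(\bar X\beta)$ from equation (8), where $\bar X$ represents the sample means; $\beta_i$ is the coefficient on
income.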
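For semi-log demand, CV and EV have closed forms that collapse to consumers surplus as the income coefficient goes to zero. The sketch below transcribes the formulas as laid out in Table 1, read as the compensating and equivalent variation of having access at the current travel cost rather than facing the choke price; that reading, the function name `welfare_measures` and the input values are assumptions for illustration. With a positive income coefficient it gives CV below CS and EV above it, with all three close when the income coefficient is small.

```python
import math


def welfare_measures(lam_bar: float, beta_p: float, beta_inc: float,
                     p_zero: float = 0.0):
    """CS, CV and EV of site access for semi-log demand, following Table 1.

    lam_bar = exp(X_bar * beta); beta_p < 0 is the travel-cost coefficient,
    beta_inc the income coefficient, and p_zero the ZINB non-participation
    probability (0 for the NB and truncated models).
    """
    share = 1.0 - p_zero
    ratio = beta_inc / beta_p * lam_bar
    cs = -share * lam_bar / beta_p
    cv = share / beta_inc * math.log1p(-ratio)
    ev = -share / beta_inc * math.log1p(ratio)
    return cs, cv, ev


# Hypothetical inputs chosen only to show the ordering of the three measures.
cs, cv, ev = welfare_measures(lam_bar=1.0, beta_p=-0.05, beta_inc=0.001)
print(f"CS={cs:.4f} CV={cv:.4f} EV={ev:.4f}")
```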
III. Application to Lake Sevan, Armenia
Lake Sevan is the largest high altitude reservoir of freshwater in the
Transcaucasus, and is one of the highest lakes in the world. However, over the course of
the last 50 years, the level of the lake has dropped by 18 m, its surface area has decreased by
15%, and the volume of water in Lake Sevan has fallen by more than 40% (from 58.5 to 34.6
km³). These changes have had significant adverse impacts on Lake Sevan's ecology.
As it is located only 70 km from the capital city, Yerevan, Lake Sevan is the
preferred and most accessible recreational site for most Armenians.
The Government of Armenia has been working on a Lake Sevan protection action
plan. The objectives under consideration by the Government of Armenia include
preventing a further lowering of the level of Lake Sevan, and raising the level of the lake
by at least 3 meters as quickly as possible. However, to date, there has not been a
thorough measurement of the current recreational benefits to include in a benefit-cost
analysis. Welfare measurement would be useful to policymakers tasked with weighing
the alternative options of restoring Lake Sevan. Our model and welfare comparison is
also useful in this context: as Lake Sevan is a single site with no substitutes,
comparing the two samples is not confounded by alternative sites that may enter into an
individual's water-based recreation decision. Also, since we are measuring current
recreational benefits, we avoid having to predict the impact that improvements would
have on expected trip demand.
To estimate benefits for the general population and for users of the site, two surveys
were conducted: one comprising 3,358 households across Armenia and the other an
intercept survey of 389 tourists recreating at Lake Sevan.7 Both were conducted
in the year 2000, with the tourist survey during the summer to better capture the high
season of annual recreational use at the lake. The household sample was selected and
stratified by the 1996 Population Census of Armenia, while the on-site survey relied on
tourist interception at the lake.
Annual visitation to Lake Sevan by these two groups is reported in Table 2.
Household survey responses indicate that nearly 75% did not visit the lake in the past
year, with a sample mean of 0.81 day-trips. The tourist survey, obviously truncated at one
as interviews took place at the lake, averaged 3.17 day-trips per year. The average person
from the household survey was 44 years old, earned the equivalent of 1,383 USD per
annum, had 10 years of formal education, and a household size of 4. The average person
from the on-site survey was 36 years old, earned 2,933 USD per annum, had 10 years of
education and a household size of 5 (see Appendix I for details).
Table 2 also shows that the standard deviation of visitation in each sample exceeds its
mean (so the variance far exceeds the mean). We therefore suspect the presence of
over-dispersion and formally test the negative binomial against its Poisson counterpart.
In addition, the large number of zeros in the household survey leads us to formally test
the zero-inflated negative binomial model for that sample.
7 The detailed questionnaires included six major parts: (1) environmental attitudes and perceptions; (2) a
Lake Sevan action plan for restoration; (3) contingent valuation questions; (4) socio-economic
characteristics; (5) recreational use of Lake Sevan; and (6) interview debriefing questions. For the
purposes of this paper, only sections (4) and (5) are used.
Table 2: Frequency of visitation
Person-day-trips   Household frequency   Household percent   Tourist frequency   Tourist percent
0 2516 74.93 0 0.00
1 455 13.55 185 47.56
2 152 4.53 94 24.16
3 84 2.50 41 10.54
4 30 0.89 25 6.43
5 37 1.10 14 3.60
6 12 0.36 5 1.29
7 7 0.21 0 0.00
8 5 0.15 0 0.00
9 0 0.00 0 0.00
10 26 0.77 5 1.29
10 to 15 12 0.36 6 1.54
15 to 20 10 0.30 6 1.54
20 to 30 3 0.09 4 1.03
30 to 40 3 0.09 2 0.51
40 to 50 1 0.03 2 0.51
50 to 100 5 0.15 0 0.00
Total 3358 100.00 389 100.00
Mean 0.81 3.17
Standard deviation 3.95 5.75
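A quick moment check on the summary statistics in Table 2 shows why over-dispersion and zero inflation are suspected before any model is fitted. The sketch below, which is an informal diagnostic and not the formal LR or Vuong tests reported in Section IV, computes the variance-to-mean ratio (equal to one under a Poisson) and compares the observed household share of zeros with the share $e^{-\bar y}$ that a Poisson with the same mean would predict. The helper names are hypothetical.

```python
import math

# Summary statistics reported in Table 2.
samples = {
    "household": {"mean": 0.81, "sd": 3.95, "zero_share": 0.7493},
    "tourist": {"mean": 3.17, "sd": 5.75, "zero_share": 0.0},
}


def variance_to_mean(mean: float, sd: float) -> float:
    """Equals 1 under a Poisson; values well above 1 point to over-dispersion."""
    return sd ** 2 / mean


def poisson_zero_share(mean: float) -> float:
    """Share of zeros a Poisson with this mean would predict: exp(-mean)."""
    return math.exp(-mean)


for name, s in samples.items():
    print(name, "variance/mean:", round(variance_to_mean(s["mean"], s["sd"]), 2))

hh = samples["household"]
print("household zeros, observed:", hh["zero_share"],
      "Poisson-implied:", round(poisson_zero_share(hh["mean"]), 3))
```

The zero comparison is only meaningful for the household sample, since the on-site sample is truncated at one by design.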
IV. Estimation Results
(i) Determinants of visitation
The household sample was initially modeled using the Poisson, negative binomial
(NB), zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB). The on-
site sample was modeled using the truncated Poisson, truncated negative binomial
(TRNB) and the truncated negative binomial with endogenous stratification (TRNBES).
Comparative tests between the models were performed and are reported below. For
brevity, only the estimation results for the household (NB and ZINB) and on-site models
(TRNB and TRNBES) are reported in Table 3 with marginal effects for the ZINB and
TRNBES models listed in Table 4.
From the empirical demand relationship in equation (8), we model the
participation and trip quantity decisions using travel cost and several individual-specific
variables that may co-vary with each decision - income, age, household size, education,
and a Yerevan city dummy.8 Travel costs included: (1) transport costs; (2) on-site costs
(per day); and (3) the value of time traveling to and at Lake Sevan. The value of time
was elicited from the respondent by asking them how much they would have earned had
they not traveled to Lake Sevan. This amount was then divided by the number of days
they were at the lake to arrive at a trip-per-day cost. Note that for the household model,
each equation (logit and mean) contains the same explanatory variables, as they may
contribute to either of the participation or quantity decisions.
Beginning with the household survey results in the second and third columns of
Table 3, we note that the likelihood ratio (LR) test rejects the null hypothesis α = 0,
indicating significant over-dispersion and thus favoring the negative binomial
specification over the Poisson. A further formal specification test between the NB and
ZINB is possible (Vuong, 1989). The test statistic is directional and asymptotically
standard normal: values of V greater than 1.96 support the zero-inflated model, values
below -1.96 support the standard NB, and values in between are inconclusive. With a value of
4.86, the ZINB specification is favored over the NB.
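The Vuong statistic compares two non-nested models through their per-observation log-likelihood contributions. The sketch below computes it as the square root of n times the mean difference divided by its sample standard deviation, and applies the directional decision rule described above. The log-likelihood values are hypothetical placeholders, since the per-observation contributions behind Table 3 are not reported, and the function names are illustrative.

```python
import math
from statistics import mean, stdev


def vuong_statistic(loglik_a, loglik_b) -> float:
    """Vuong (1989) statistic from per-observation log-likelihoods of two models.

    Positive values favor model A (e.g. the ZINB), negative values model B (the NB).
    """
    diffs = [a - b for a, b in zip(loglik_a, loglik_b)]
    return math.sqrt(len(diffs)) * mean(diffs) / stdev(diffs)


def vuong_decision(v: float, critical: float = 1.96) -> str:
    """Directional rule: compare V with the standard normal critical value."""
    if v > critical:
        return "favor model A"
    if v < -critical:
        return "favor model B"
    return "inconclusive"


# Hypothetical per-observation log-likelihoods, for illustration only.
zinb_ll = [-1.0, -0.5, -2.0, -0.8]
nb_ll = [-1.2, -0.9, -2.1, -1.0]
v = vuong_statistic(zinb_ll, nb_ll)
print(round(v, 3), vuong_decision(v))
print(vuong_decision(4.86))  # the household value reported in Table 3
```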
Parameter estimates of the household ZINB model reveal that income, age, education,
and residence in Yerevan significantly determine the household participation decision to
recreate at Lake Sevan (see the logit inflation model). The coefficients are interpreted
relative to observing a zero count; thus the positive coefficient on age implies that older
respondents are more likely to record zero participation, whereas individuals with higher
income or education are less likely to report zero trips to Lake Sevan. Those who reside
in Yerevan city are also more likely to report zero visitation in the past year. Among
those who do choose to participate (see the mean model), increases in income and household
size increase trip demand, while increases in travel costs and education decrease trip demand.
8A dummy variable to capture previous visitation to the lake was also initially considered for each model;
however, over 94% of respondents in the household survey and over 95% in the tourist survey visited Lake
Sevan at least once in the past three years, leaving insufficient statistical variation.
For the on-site survey, an LR test first rejected the truncated Poisson in favor of the
truncated negative binomial (TRNB), indicating that over-dispersion in visitation is
significant and leading us to favor the TRNB specification. Second, the TRNBES model
was estimated to see whether higher trip frequencies have any systematic association with
an individual's characteristics. Estimation results for both TRNB and TRNBES show
that increases in travel costs, age and education decrease visitation, whereas increases in
household size increase trip demand. In the TRNB model, estimated coefficients are
generally larger in magnitude and standard errors are higher, leading to lower significance
across the explanatory variables. By correcting for ES, the magnitude of the estimated
coefficients falls, and standard errors fall to a greater extent, such that significance rises
among the major determinants of visitation. In the next section, we explore the
consequences of these differences for expected trip demand as well as the implications for
welfare estimates.
Table 3: Household and on-site model estimates of visitation to Lake Sevan
Variable HH: NB HH: ZINB On-site: TRNB On-site: TRNBES
Mean model
Travel costs -0.0256*** -0.0153*** -0.0521*** -0.0519***
(-5.41) (-3.46) (-3.37) (-4.79)
Income 0.00035*** 0.00015*** 0.000040 0.000013
(7.54) (3.63) (0.60) (0.32)
Age -0.0233*** 0.0035 -0.0313*** -0.0263***
(-6.36) (0.78) (-3.45) (-4.58)
Household size 0.1219*** 0.0974*** 0.2969*** 0.2711***
(4.02) (2.64) (3.57) (5.26)
Education -0.0094 -0.0686*** -0.0912* -0.0926***
(-0.43) (-2.66) (-1.66) (-2.79)
Constant -0.0392 0.2174 -10.7080 -15.4955
(-0.11) (0.56) (-0.33) (-0.12)
Logit inflation model
Travel costs 0.0109
(0.91)
Income -0.0012***
(-4.77)
Age 0.0903***
(8.47)
Household size 0.0313
(0.43)
Education -0.2768***
(-4.80)
Yerevan city 0.8631***
(2.68)
Constant -1.5611*
(-1.83)
α (over-dispersion) 5.8005 3.7079 13.2317 17.0166
Log-likelihood -3,334.71 -3,249.60 -656.48 -679.79
LR test (α = 0) ~ χ²(d.f.) 6,469.23 (1) 3,271.69 (1) 846.11 (1) 799.49 (1)
Vuong test ~ N(0,1) - 4.86 - -
Number of observations 3,358 3,358 389 389
Non-zero observations 842 842 389 389
Zero observations 2,516 2,516
t-statistics in parentheses; * significant at the 10% level; ** significant at the 5% level; *** significant
at the 1% level.
(ii) Visitation sensitivity
The sensitivity of trip demand for the household ZINB and tourist TRNBES
models to changes in the parameter values is summarized in Table 4. Beginning with the
household survey and under the binary participation equation, estimated coefficients
from the regression are interpreted as increasing or decreasing the odds of non-
participation (or observing a zero). As this may be counter-intuitive, we reverse the
signs on the estimated coefficients and re-interpret the results in terms of the odds of
participation in Table 4.
A unitary increase in age or household size of the respondent leads to a decrease
in the odds of participation of 9.5% and 3.2%, respectively, whereas an increase of one
year of education increases the odds of participation by 75%. Income only marginally
affects participation, with a $1 USD increase leading to an increase in the odds of
participation of 0.12%. This relative insensitivity to income changes is a common finding among
recreational demand studies. If the respondent lives in Yerevan, the odds of
non-participation are an overwhelming 137% higher (equivalently, the odds of participation
fall by more than half). This may be owing to the fact that in the household sample,
over 80% of the sampled households are from Yerevan, the capital city. For the trip count
equation, a one unit increase in travel costs or education decreases the number of trips by
1.5% and 6.6%, respectively. Thus, although travel costs are not a significant determinant
in the decision to recreate, they do impact the number of trips a person decides to take.
Also, a person's education appears to be important in both decisions, but in opposite
directions. Those with higher education are more likely to participate, but among
participants they take fewer trips. Greater
household size also works in opposite directions for the participation and quantity
decisions. A one unit change in household size decreases participation by 3.2% but for
those who do go, it increases the number of trips by 10.2%. Upon closer inspection of
the data, it was found that households with more children were associated with higher trip
frequencies. The impact of income on trip frequency was found to be negligible.
Table 4: Marginal effects on trip demand
HOUSEHOLD: ZINB ON-SITE: TRNBES
Visits Coefficient % trips Coefficient % trips
Count (trip quantity) equation
Travel costs ($USD) -0.0153*** -1.52 -0.0519*** -5.06
Income ($USD) 0.00015*** 0.00 0.000013 0.00
Age (years) 0.0035 0.35 -0.0263*** -2.59
Household size (number) 0.0974*** 10.23 0.2711*** 31.13
Education (years) -0.0686*** -6.63 -0.0926*** -8.85
Participation % Pr(participation)
Binary participation equation
Travel costs ($USD) -0.0109 -1.10
Income ($USD) 0.0012*** 0.12
Age (years) -0.0903*** -9.45
Household size (number) -0.0313 -3.18
Education (years) 0.2768*** 75.82
Yerevan (1=lives in Yerevan) -0.8631*** -137.06
* significant at the 10% level; ** significant at the 5% level; *** significant at the 1% level
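The count-equation entries in Table 4 appear consistent with the usual transformation of a log-link coefficient into a percentage change, $100(e^{\beta} - 1)$, for a one-unit increase in the regressor. The sketch below applies that transformation to several count coefficients from the table and to the Yerevan logit coefficient, where it gives the change in the odds of a zero and, with the sign reversed, the change in the odds of participating. The helper name is hypothetical, and the check covers only these entries.

```python
import math


def pct_change(coef: float) -> float:
    """Percentage change in the expected count (or the odds) per one-unit increase."""
    return 100.0 * (math.exp(coef) - 1.0)


# Count-equation coefficients from Table 4.
count_coefs = {
    "travel cost (ZINB)": -0.0153,
    "household size (ZINB)": 0.0974,
    "household size (TRNBES)": 0.2711,
    "education (TRNBES)": -0.0926,
}
for name, b in count_coefs.items():
    print(f"{name}: {pct_change(b):.2f}%")

# Yerevan logit coefficient: effect on the odds of a zero, and on the odds of participating.
print(f"odds of non-participation: {pct_change(0.8631):+.2f}%")
print(f"odds of participation: {pct_change(-0.8631):+.2f}%")
```

Because odds are bounded below by zero, a large positive coefficient in the inflation equation translates into a large percentage increase in the odds of non-participation but a decrease of less than 100% in the odds of participation.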
For on-site trip demand, unitary increases in travel costs, age and education
decrease the number of trips by 5.1%, 2.6% and 8.9%, respectively, and an increase in
household size significantly increases trip frequency by 31%. With the exception of age,
each impact has the same sign as in the household count equation, but the effects are
much larger. In the case of age, visitation declines significantly with age among on-site
respondents, whereas age is not significant in the household count equation.
(iii) Estimated trip demand and welfare measures
Using the parameter estimates from the four models in Table 3, the expected
number of trips, $E(y_i \mid \bar X)$, and consumers surplus (CS) measures were calculated (Table
5).9 The expected number of trips was estimated for each model using sample means of
the independent variables. Comparing the NB with the ZINB, note that the expected
number of trips falls once we account for the inflation of zeros (participation). Indeed,
since the NB model treats every zero as part of the quantity decision, its estimates are
biased upwards, whereas the ZINB recognizes that the zeros may come from different
stochastic processes (participation or quantity).
9Although the CV and EV measures are not formally reported above, as the estimated coefficient on
income, $\beta_i$, in both the ZINB and TRNBES models is small, CS is tightly bounded by CV and EV; for the
ZINB model CV = $8.7984 and EV = $8.8478, and for the TRNBES model CV = $8.2137 and EV = $8.2123.
For the on-site model, TRNB, the expected number of trips far exceeds the
demand estimated by the household survey. This seems reasonable since we are
comparing casual versus avid users of the site. However, the expected number of trips is
even higher after accounting for ES (TRNBES). At first glance this may seem
counter-intuitive, but recall that expected trip demand is calculated as
$E(y_i \mid x_i) = \lambda_i + 1 + \alpha\lambda_i$, and note that the only substantial difference between the
estimated parameters of TRNB and TRNBES is the value of the over-dispersion parameter, α
(see Table 3). Thus it is the over-dispersion that is driving this result. This finding is similar to that found by Englin
and Shonkwiler (1995), where expected trip demand is 1% higher for their sample-based
"restricted negative binomial model" (analogous to our TRNBES model) and 63% higher
for their population-based trip demand. Martinez-Espineira and Amoako-Tuffour (2005)
also find an 18% higher expected trip demand in their ES model.
Estimated household consumers surplus was $8.82 per trip whereas for the on-site
sample CS was calculated as $8.73 without compensating for ES and $8.21 per trip with
ES. Although all three results are close, it is rather surprising that the closest pair of
estimates comes from the TRNB and ZINB models. One would initially expect the TRNBES to
be the closest if ES were present in the on-site sample. The most plausible explanation is
rooted in the very reason why one argues for ES adjustment; if adjustments for ES yield
only small differences in expected demand or consumer surplus, this suggests that those
surveyed at Lake Sevan possess characteristics similar to those in the household sample.
This implies that either the TRNB or TRNBES model is sufficient for estimation. This
can be more clearly seen if one views the mean function , and the similarity of estimated
characteristics between the TRNB and TRNBES models (especially the similarity
between the estimated coefficient on travel cost, p; which is the denominator in the CS
calculation, - i / p. Ovaskainen et al. (2001) and Englin et al. (2003) also find similar
results where the ES adjustment had little effect on coefficient and benefit estimates.
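The CS expression can be checked numerically. With a log-linear mean λ(p) = λ₀ exp(β_p (p − p₀)), the area under the demand curve above the current travel cost p₀ is −λ₀/β_p, so CS per trip is −1/β_p and depends only on the travel-cost coefficient. The Python sketch below assumes this log-linear form. It backs out the travel-cost coefficients implied by the per-trip CS values in Table 5 (β_p = −1/CS, an inference rather than the reported estimates) and verifies the closed form by trapezoidal integration. It uses the ZINB expected trips at the sample mean (0.5787) as an illustrative λ₀ and computes each on-site estimate's percentage gap from the household ZINB value.

```python
import math


def cs_closed_form(lam0: float, beta_p: float) -> float:
    """Area under lam0 * exp(beta_p * t), t >= 0 (t = travel-cost increase): -lam0 / beta_p."""
    if beta_p >= 0:
        raise ValueError("travel-cost coefficient must be negative")
    return -lam0 / beta_p


def cs_trapezoid(lam0: float, beta_p: float, upper: float = 400.0, n: int = 200_000) -> float:
    """Numerical check: trapezoid rule on [0, upper]; the tail beyond upper is negligible."""
    h = upper / n
    total = 0.5 * (lam0 + lam0 * math.exp(beta_p * upper))
    for i in range(1, n):
        total += lam0 * math.exp(beta_p * i * h)
    return total * h


# Per-trip CS ($USD) reported in Table 5.
CS_PER_TRIP = {"ZINB": 8.82, "TRNB": 8.73, "TRNBES": 8.21}

# Travel-cost coefficients implied by CS per trip = -1 / beta_p (an inference, not reported estimates).
implied_beta = {m: -1.0 / cs for m, cs in CS_PER_TRIP.items()}

# Percentage gap of each on-site estimate from the household ZINB value.
gap_pct = {m: 100.0 * (CS_PER_TRIP[m] - CS_PER_TRIP["ZINB"]) / CS_PER_TRIP["ZINB"]
           for m in ("TRNB", "TRNBES")}

for m, b in implied_beta.items():
    print(f"{m}: implied beta_p = {b:.4f}")
for m, g in gap_pct.items():
    print(f"{m}: {g:+.2f}% relative to ZINB")

# Closed form vs numerical integration, using the ZINB expected trips from Table 5 as lam0.
lam0 = 0.5787
print(cs_closed_form(lam0, implied_beta["ZINB"]), cs_trapezoid(lam0, implied_beta["ZINB"]))
```

Under this assumption, the roughly 1% TRNB gap and 7% TRNBES gap map directly onto differences in the implied β_p, which is the comparison the paragraph above draws.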
Table 5: Expected visitation and benefit estimates
Measure                     Household: NB   Household: ZINB   On-site: TRNB   On-site: TRNBES
$E(y_i \mid \bar{X})$       0.8926          0.5787            5.8822          6.9664
CS ($USD per day-trip)      8.16            8.82              8.73            8.21
Total WTP¹ ($USD)           6,362,295       6,875,160         6,802,126       6,399,840
Note: $\bar{X}$ denotes the explanatory variables evaluated at the sample mean.
¹ Calculated for households as CS × 779,230 households in 2001.
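A quick arithmetic check of the table's footnote, in Python: dividing each Total WTP entry by 779,230 households should recover the per-trip CS up to the two-decimal rounding of the CS row. The sketch uses only the figures in Table 5. The small residual differences suggest the totals were computed from unrounded CS values, which is an inference rather than something the table states.

```python
HOUSEHOLDS_2001 = 779_230  # number of households used in the Table 5 footnote

# (CS per day-trip in $USD, Total WTP in $USD) as reported in Table 5.
TABLE5 = {
    "NB": (8.16, 6_362_295),
    "ZINB": (8.82, 6_875_160),
    "TRNB": (8.73, 6_802_126),
    "TRNBES": (8.21, 6_399_840),
}

# CS implied by the reported totals.
implied_cs = {m: wtp / HOUSEHOLDS_2001 for m, (_, wtp) in TABLE5.items()}

for model, (cs, wtp) in TABLE5.items():
    print(f"{model:7s} reported CS {cs:.2f}  implied CS {implied_cs[model]:.4f}  "
          f"CS x households {cs * HOUSEHOLDS_2001:,.0f}  reported WTP {wtp:,}")
```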
V. Conclusion
In this paper, a population-based household sample and an on-site sample are
modeled in a travel cost framework to compare estimated consumer surplus for the value
of site access. If each model is corrected for several dependent-variable issues, we expect
the models to produce similar welfare estimates. In the household model, we account for
the potential for over-dispersion (variance > mean) by using a negative binomial
distribution function, and for the possibility of observing a large number of zero visits (a
recreation participation decision) by modeling the participation and quantity decisions
separately within a single censored model, the zero-inflated negative binomial (ZINB). For the
on-site survey, there is a possibility of over-sampling those who recreate quite often; thus the
truncated distribution function is augmented for endogenous stratification (i.e., the higher
likelihood of surveying respondents whose characteristics are associated with higher trip
frequencies). To compare the effect of ES, we model the on-site sample as a truncated
negative binomial with and without endogenous stratification (TRNB and TRNBES,
respectively).
Each of these models is then applied to a unique water-based recreational site in
Armenia, Lake Sevan. The site has few, if any, alternatives, which facilitates a comparative
welfare exercise. In addition, as the surveys measured only current revealed-preference
behavior, no quality changes are present to confound the measurement of expected trips
outside the current experience.
Results from the zero-inflated negative binomial model (ZINB) for households
suggest that separating the participation and quantity decisions is significant in modeling
household behavior. In this application, explanatory variables such as age, education and
income were found to be significant factors in the binary decision to recreate at Lake
Sevan. The quantity of trips was determined by travel costs, income, household size and
education. Expected trip demand was found to be 0.58 trips per individual per annum,
and the welfare measure calculated from the underlying demand function reveals a
consumer surplus of $8.82 per trip. From the on-site sample, the TRNB and TRNBES models
yielded expected trip demands of 5.9 and 7.0 trips per person per year, with consumer
surplus values of $8.73 and $8.21 per trip, respectively. Expected trip
demand from the on-site models is higher than that from the household sample because of the
difference between sampling casual and more avid users of the site. However, an even
higher trip demand is found in the TRNBES model because of a higher estimated
over-dispersion parameter, $\alpha$, which enters the calculation of expected trip demand.
All three models appear to yield similar welfare measures; in particular, accounting for
endogenous stratification in the TRNBES model did not yield a
significantly different estimate from the TRNB model. In fact, consumer surplus from
the TRNB model is slightly closer to the household result than that from the TRNBES model. One
possible explanation is that individual characteristics of the on-site sample are not
correlated with higher trip frequencies (contrary to the very reason we account for
ES). This does not imply that ES is unimportant in modeling on-site
behavior; rather, the results found here suggest that the on-site sample was simply
representative of the population sampled in the household survey. This finding contrasts
with other studies in which the ES bias in welfare measurement has been found to be quite
significant (Shaw, 1988; Englin and Shonkwiler, 1995; Loomis, 2003; Martinez-Espineira
and Amoako-Tuffour, 2005).
Although we did not find any significant difference in accounting for ES, this
does not negate the main result that when comparing household and on-site samples,
either can be used to derive a consistent welfare measure of access to the site after
accounting for each dependent-variable problem. As mentioned previously, the choice of
survey method is often constrained, usually by cost or time. It is therefore reassuring that,
even when one is constrained in this way, implementing the proper technique means that
the quality of the welfare measure need not be in question.
References
Cameron, A. C. and P. K. Trivedi. 1986. Econometric models based on count data:
comparisons and application of some estimators and tests. Journal of Applied
Econometrics. 1: 29-53.
Curtis, J. 2003. Demand for water-based leisure activity. Journal of Environmental
Planning and Management. 46(1): 65-77.
Englin, J. and J. S. Shonkwiler. 1995. Estimating social welfare using count data models:
an application to long-run recreational demand under conditions of endogenous
stratification and truncation. The Review of Economics and Statistics. 77(1): 104-
112.
Englin, J., T. Holmes and E. Sills. 2003. Estimating forest recreation demand using count
data models. In E. Sills (Ed.), Forests in a Market Economy, Chapter 19, pp. 341-
359. Dordrecht, The Netherlands: Kluwer Academic Publishers.
Gourieroux, C., A. Monfort and A. Trognon. 1984. Pseudo maximum likelihood methods:
applications to Poisson models. Econometrica. 52: 701-720.
Greene, W. 1994. Accounting for excess zeros and sample selection in Poisson and
negative binomial regression models. Working Paper EC-94-10, Department of
Economics, Stern School of Business, New York University, New York, N.Y.
Grogger, J. and R. Carson. 1991. Models for truncated counts. Journal of Applied
Econometrics. 6: 225-238.
Gurmu, S. and P. K. Trivedi. 1996. Excess zeros in count models for recreational trips.
Journal of Business and Economic Statistics. 14: 469-477.
Haab, T. C. and K. E. McConnell. 1996. Count data models and the problem of zeros in
recreation demand analysis. American Journal of Agricultural Economics. 78: 89-
102.
Hausman, J., B. Hall and Z. Griliches. 1984. Econometric models for count data with an
application to the patents-R&D relationship. Econometrica. 52: 909-938.
Hellerstein, D. M. 1991. Using count data models in travel cost analysis with aggregate
data. American Journal of Agricultural Economics. 73: 860-866.
Hilbe, J. 1999. sg102: Zero-truncated Poisson and negative binomial regression. Stata
Technical Bulletin No. 47.
Lambert, D. 1992. Zero-Inflated Poisson regression, with an application to defects in
manufacturing. Technometrics. 34: 1-14.
Loomis, J. 2003. Travel cost demand model based river recreation benefit estimates with
on-site and household surveys: comparative results and a correction procedure. Water
Resources Research. 39(4): 1105.
Martinez-Espineira, R. and J. Amoako-Tuffour. 2005. Recreation demand analysis under
truncation, overdispersion, and endogenous stratification: an application to Gros
Morne National Park. Working Paper 2005-03. Department of Economics, St.
Francis Xavier University: Canada.
Mullahy, J. 1986. Specification and testing of some modified count data models. Journal
of Econometrics. 33: 341-365.
Ovaskainen, K., J. Mikkola and E. Pouta. 2001. Estimating recreation demand with on-
site data: an application of truncated and endogenously stratified count data models.
Journal of Forest Economics. 7(2): 125-144.
Scrogin, D., K. Boyle, G. Parsons and A. Plantinga. 2004. Effects of regulations on
expected catch, expected harvest, and site choice of recreational anglers. American
Journal of Agricultural Economics. 86(4): 963-974.
Shaw, D. 1988. On-site samples' regression: problems of non-negative integers,
truncation and endogenous stratification. Journal of Econometrics. 37: 211-223.
Shaw, W. D. and P. Jakus. 1996. Travel cost models of the demand for rock climbing.
Agricultural and Resource Economics Review. 25: 133-142.
Shaw, W. D., E. Fadali, and F. Lupi. 2003. Comparing consumer's surplus estimates
calculated from intercept and general survey data. Proceedings of the W-133
(U.S.D.A.) Regional Economics Group, compiled by J. S. Shonkwiler. Las Vegas,
Nevada, February.
Shonkwiler, J. S. and W. D. Shaw. 1996. Hurdle count-data models in recreation demand
analysis. Journal of Agricultural and Resource Economics. 21: 210-219.
Vuong, Q. 1989. Likelihood ratio tests for model selection and non-nested hypotheses.
Econometrica. 57: 307-334.
Yen, S. T. and W. L. Adamowicz. 1993. Statistical properties of welfare measures from
count-data models of recreation demand. Review of Agricultural Economics. 15: 203-
215.
Appendix 1: Descriptive statistics for the Household (HH) and Tourist survey (Tourist) samples
Columns: (a) HH with trips > 0; (b) HH with trips ≥ 0; (c) Tourist with trips ≥ 1.
                            Mean                  Standard deviation    Minimum            Maximum
Variable                    (a)    (b)    (c)     (a)    (b)    (c)     (a)   (b)   (c)    (a)     (b)     (c)
Visits (person-day-trips)   3.24   0.81   3.17    7.36   3.95   5.75    1     0     1      100     100     50
Travel costs ($USD)         9.42   9.00   10.23   10.28  5.15   7.58    0.06  0.06  0.1    147     147     41
Income ($USD)               1,861  1,383  2,933   1,623  1,246  2,052   150   120   480    14,976  14,976  15,120
Age (years)                 39     44     36      12     14     13      18    18    18     76      81      71
Household size              5      4      5       2      2      1       1     1     2      12      13      8
Education (years)           11     10     10      2      2      2       0     0     5      14      14      14
Past visitation (1=yes)     1.0    0.95   0.94    0      0.22   0.24    1     0     0      1       1       1
Yerevan city (1=yes)        0.80   0.82   -       0.40   0.38   -       0     0     -      1       1       -
Lake Sevan (1=yes)          0.12   0.06   1.00    0.33   0.24   0.00    0     0     1      1       1       1
Observations                842    3,358  389