WPS6557
Policy Research Working Paper 6557
Top Incomes and the Measurement
of Inequality in Egypt
Vladimir Hlasny
Paolo Verme
The World Bank
Middle East and North Africa Region
Poverty Reduction and Economic Management Department
August 2013
Abstract
By all accounts, income inequality in Egypt is low and had been declining during the decade that preceded
the 2011 revolution. As the Egyptian revolution was partly motivated by claims of social injustice and
inequalities, this seems at odds with a low level of income inequality. Moreover, while income inequality shows
a decline between 2000 and 2009, the World Values Surveys indicate that the aversion to inequality has
significantly increased during the same period and for all social groups. This paper utilizes a range of recently
developed statistical techniques to assess the true value of income inequality in the presence of a range of possible
measurement issues related to top incomes, including item and unit non-response, outliers and extreme
observations, and atypical top income distributions. The analysis finds that correcting for unit non-response
significantly increases the estimate of inequality by just over 1 percentage point, that the Egyptian distribution
of top incomes follows rather closely the Pareto distribution, and that the inverted Pareto coefficient is
located around median values when compared with 418 household surveys worldwide. Hence, income inequality
in Egypt is confirmed to be low while the distribution of top incomes is not atypical compared with what
Pareto had predicted and compared with other countries in the world. This would suggest that the increased
frustration with income inequality voiced by Egyptians and measured by the World Values Surveys is driven by
factors other than income inequality.
This paper is a product of the Poverty Reduction and Economic Management Department, Middle East and North Africa
Region. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to
development policy discussions around the world. Policy Research Working Papers are also posted on the Web at
http://econ.worldbank.org. The authors may be contacted at pverme@worldbank.org or vhlasny@ewha.ac.kr.
The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development
issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the
names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those
of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and
its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.
Produced by the Research Support Team
Top Incomes and the Measurement of Inequality in Egypt
Vladimir Hlasny 1 and Paolo Verme 2
JEL: D31, D63, N35.
Keywords: Top incomes, inequality measures, survey nonresponse, Pareto distribution, parametric
estimation, Egypt.
Sector Board: Poverty (POV)
1 Ewha Womans University, Seoul
2 World Bank. The authors are grateful to Branko Milanovic, Francisco Ferreira and Johan Mistiaen for very useful
comments, the CAPMAS of Egypt for providing access to the full 2009 Household Income, Expenditure and
Consumption Survey (HIECS) on site in Cairo, to Olivier Dupriez (World Bank) for estimating Pareto coefficients
and Ginis using a World Bank harmonized data set of household income and consumption data and to participants in
a World Bank internal review meeting that took place in June 2013. Preliminary results were presented in Cairo
during a workshop organized with the Egyptian Social Contract Center. The authors are grateful to the panel of
Egyptian economists and statisticians that provided very rich comments during the workshop. All remaining errors
are ours.
Contents
1. Introduction
2. Measurement issues
3. Models
   Unit non-response
   Extreme values
4. Data
5. Results
   Data errors
   Subsampling
   Unit non-response
   Extreme observations
6. How different is Egypt from other countries?
7. Discussion
References
1. Introduction
A recent study of inequality in Egypt (World Bank 2013) has shown that there is an important
discrepancy between income inequality as measured by household expenditure surveys and the perception
of income inequality as reported by people in values surveys. This is no small issue given that part of the
frustration voiced by the people of Egypt, which culminated in the Egyptian revolution of 2011, has been
explained in terms of inequality and social injustice. The Egyptian Center for Economic Studies (ECES),
for example, argued in a note shortly after the revolution that “Social inequality and inadequate human
development coupled with the lack of political reforms have been among the main factors that led to the
outbreak of the revolution” (ECES, Policy Viewpoint, May 2011, p. 7). The World Bank (2013) study
also shows that the aversion to income inequality as measured by the World Values Surveys in 2000 and
2008 has significantly increased for all social groups, which is at odds with the apparently declining
values of income inequality during the same period.
This discrepancy may be explained by a variety of factors, including the various determinants of feelings
of inequality such as expectations about the future, or factors related to the measurement of the facts of
inequality such as whether household surveys are able to capture incomes well. The World Bank (2013)
study provided some initial leads on what could constitute an explanation of such a paradox and one of
these leads concerned the measurement of top incomes. It is rather well known that household surveys are
not particularly accurate at measuring top incomes because richer households tend to either underreport
income or expenditure, or are less likely to participate in household surveys altogether. When this
happens, measures based on incomes such as the Gini index for income inequality are biased and do not
reflect the actual extent of inequality in a country. However, beyond anecdotal evidence, little
research has shown convincingly that household surveys worldwide underreport income inequality,
and there is as yet no evidence of this being the case in Egypt.
Studying the relation between top incomes and inequality using statistical techniques is important not just
for statistical reasons. It has been observed, for example, that GDP growth in national accounts statistics
is at odds with the growth of household incomes inferred from the Egyptian Household Income,
Expenditure and Consumption Survey (HIECS). While GDP has shown consistent cumulative
growth over the period 2000-2009, household incomes have shown a slight decline. This may be
explained by the fact that growth occurred only among top income households and that these households
are not well captured by household surveys. As shown by Atkinson et al. (2011), for example, despite
strong GDP growth, US household income measured by tax records grew by only 1.2% on average
between 1976 and 2007, and by only 0.6% if the top 1% of earners are excluded. The relative
performance of top earners also has an impact on perceptions. In the same example for the US, Atkinson
et al. (2011) noted that while the top 1% of incomes grew at similar rates during the Clinton (1993-2000)
and Bush (2002-2007) administrations (around 10% per year), the bottom 99% had very different
growth rates during the two administrations (2.7% and 1.3% respectively). According to these authors,
this could explain why the public outcry over top incomes was much louder during the Bush years than
during the Clinton years. Therefore, studying the relation between top incomes and inequality is important
to better understand who benefits from GDP growth and how top incomes affect the measurement and
perception of inequality, two issues that may help to understand the Egyptian revolution.
This paper focuses on top incomes in an effort to determine how top incomes affect the measurement of
inequality in Egypt. We will attempt to investigate a number of well-known issues related to the
measurement of inequality and top incomes including: 1) item non-response; 2) unit non-response; 3) the
role of extreme observations and 4) the shape of the top income distribution. In doing so, we will draw on
three separate bodies of literature that we join into a consistent framework. To our knowledge, this is the
first time that these three bodies of literature are considered jointly.
The first body of literature we consider, on survey non-response, is vast and part of a long tradition in
statistics and economics addressing questions related to item or unit non-response biases in household
surveys. Deaton (2005), for example, has shown how unit non-response may well be one of the factors that
can explain the discrepancy between national accounts and household surveys when it comes to the
measurement of household consumption. We will focus here only on a recent strand of this literature that
provides guidance on how to assess and correct measures of inequality in the presence of income biases
determined by household non-response (Korinek et al. 2006 and 2007). This literature essentially
provides two instruments that could help our investigative effort on inequality in Egypt: an instrument to
detect whether an income bias exists due to non-responses, and an instrument to correct for such a bias.
The previous paragraph suggests that top incomes may be systematically under-represented in surveys.
On the other hand, a large body of literature (Cowell and Victoria-Feser, 1996, Cowell and Flachaire,
2007, and Davidson and Flachaire, 2007) has found that extreme values of income can greatly influence
the measurement of inequality. This literature tests how extreme observations (such as top incomes) affect
measures of poverty and inequality, and proposes a method for correcting such measures in the presence
of biases induced by extreme observations. The correction is facilitated by a semi-parametric approach
that combines a parametric method for the uppermost part of the income distribution and a classic non-
parametric method using actual household data for the rest of the distribution. This approach has been
shown to be effective in correcting sample distributions that do not capture top incomes precisely.
Comparing the results from this exercise with those for the correction of non-response bias allows us to
understand better the role of top incomes in household surveys.
The third body of literature is that on top incomes, summarized in a recent paper by Atkinson et al.
(2011). This literature uses the Pareto distribution and the Pareto coefficients to study the distribution of
top incomes across the world using tax records. We borrow from this literature and apply the same tools
to household data instead of tax records. In our context, given our findings regarding the role of top
incomes in the Egyptian data, studying the shape of the top income distribution in Egypt can provide
some clues on whether this distribution departs from Pareto’s law regarding top incomes, and whether the
distribution is very different from those in other countries.
The Pareto distribution used in Atkinson et al. (2011) is one of the distributions suggested by the Cowell
et al. literature mentioned above, while the Pareto and inverted Pareto parameters used by the Atkinson et
al. (2011) literature can be evaluated along with the Gini index using the Korinek et al. methods, which
may correct these parameters for income biases caused by unit non-responses. In essence, these three
bodies of literature can be nicely combined in a consistent framework to provide a robust assessment and
correction of the measures of inequality estimated with household data. We will also be able to
benchmark some of our results by comparing the Pareto parameters in Egypt with those estimated using a
data set of 418 household surveys administered in 107 countries worldwide.
The paper is organized as follows. The next section discusses the key issues and literature that we use in
the study. The following section outlines the main models and methods used in the empirical section.
Section four briefly describes the data under analysis. Section five presents and discusses results. Section
six compares Egyptian data with the rest of the world and section seven summarizes results and discusses
policy implications.
2. Measurement issues
Our objective is to understand how top incomes affect the measurement of income inequality in Egypt. In
this section we discuss the main issues to consider as pinpointed by studies focusing on top incomes and
inequality. In the next section, we will outline some of the models that can help us in addressing these
issues.
Sub-sample random extraction. In the particular case of Egypt, the national statistical agency provides to
researchers 25% or 50% of the sample extracted randomly from the four quarterly independent
subsamples of the full sample. As we know from sampling theory, random extraction is the best option
for extracting a sub-sample in the absence of any information on the underlying population. However,
only one sub-sample is extracted from the full sample and given to researchers and this implies that a
particularly “unlucky” random extraction can potentially provide skewed estimates of the statistics of
interest. This is a question that we will test with a simple Monte Carlo experiment later in the paper.
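The kind of Monte Carlo check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the paper: the function names are hypothetical, the "full sample" is simulated log-normal data standing in for the HIECS, and the extraction is treated as simple random sampling without replacement rather than the agency's actual extraction from quarterly subsamples.

```python
import random


def gini(values):
    """Unweighted Gini index of a list of non-negative incomes."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-based formula: G = 2*sum(rank_i * x_i) / (n * sum(x)) - (n + 1) / n
    ranked = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * ranked / (n * total) - (n + 1.0) / n


def subsample_ginis(incomes, rate=0.25, draws=500, seed=0):
    """Gini index of `draws` random subsamples, each a share `rate` of the data."""
    rng = random.Random(seed)
    k = max(1, int(round(rate * len(incomes))))
    return [gini(rng.sample(incomes, k)) for _ in range(draws)]


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical stand-in for the full sample (assumption: log-normal incomes).
    full_sample = [rng.lognormvariate(0.0, 0.6) for _ in range(4000)]
    ginis = sorted(subsample_ginis(full_sample))
    low = ginis[int(0.025 * len(ginis))]
    high = ginis[int(0.975 * len(ginis))]
    print("Full-sample Gini:", round(gini(full_sample), 4))
    print("Central 95% of subsample Ginis:", round(low, 4), "to", round(high, 4))
```

The spread of the subsample Ginis around the full-sample value gives a sense of how far a single extraction could plausibly deviate by chance alone.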
Data errors. Extreme observations in an income distribution can sometimes be explained in terms of
errors. Before any analysis with the available sample, it is worth checking whether extreme observations
among top incomes are simply errors, such as data input errors, or whether they are plausible data particularly
distant from the central moments of the distribution (outliers or extreme values). Statistical agencies are
usually quite thorough on this issue and clean the data of errors before providing them to researchers. In
our experience with the Central Agency for Public Mobilization and Statistics (CAPMAS), this is also the
case in Egypt. We will, however, report top observations and briefly discuss this issue before carrying out
any further analysis.
Item non-response. Item non-response occurs when households participating in the survey do not reply to
an item of interest (income or expenditure in our case). Item non-responses may be related to households’
particular factors such as wealth or education, and this may bias statistics based on the surveyed incomes
or expenditures. The standard practice to address this problem is to impute the value of the item by
predicting this value based on a number of socio-economic characteristics observed for households with
the missing item. Alternative practices include assigning the mean or median values to the missing items
using information for households responding to the item, or information from external sources. In our
case, we do not have households that do not report any income or expenditure. It is possible that some of
the components of income or expenditure are not reported but the data do not distinguish between sub-
item non-response and sub-item nonexistence. For example, a household may report no income for rent
but the interviewer may not be able to distinguish whether the household does not own rented properties
or whether the household does not wish to reply to the question. This is a problem similar to
underreporting (we have a part of income or expenditure that is supposedly not observed) and that will be
treated as any other non-observed factor (in the error term).
Unit non-response. Unit non-response refers to households that were selected into the sample but did not
participate in the survey. The reasons for non-participation can be many, such as a change of address or
lack of interest on the part of the household. Interviewers generally have lists of addresses that can be used to
replace the missing household but this practice is not always sufficient to complete the survey with the
full expected sample. Most of the available household survey data, particularly in developing countries,
suffer from substantial unit non-response. For some surveys, the reason for non-response is recorded and
sometimes this reason is used to correct the weights when the survey is completed if households have not
been replaced. In the case of Egypt, we did not have any information at our disposal concerning the
reason for non-response while we have about 4% of the sampled households that did not participate in the
2009 survey.
Unit non-response may or may not affect the statistics of interest. We therefore need to understand first
whether unit non-response affects income inequality. If this is the case, we can attempt to correct the bias
so as to obtain more accurate statistics. Korinek et al. (2006, 2007) have developed a method to estimate
whether unit non-response affects the measurement of inequality and also a method to correct for such a
bias if it exists. In this paper, we will follow these methods to address these issues as discussed in the next
section.
Top incomes distribution. Vilfredo Pareto introduced long ago the notion that the top observations in an
income series follow a particular distribution and pattern represented respectively by the Pareto
distribution and by the Pareto coefficient. More recently, Piketty and a number of other authors have used
these tools in conjunction with tax records to study top incomes across countries and across time, a
literature neatly summarized in Atkinson et al. (2011). In this paper, we will follow this literature to study
the shape of the top income distribution in Egypt and use the Pareto measures in three different contexts.
First, we will apply the Korinek et al. (2007) approach to the Pareto coefficients and identify how these
are affected by unit non-response and its correction. Second, we will use the parametric properties of the
Pareto distribution to evaluate how representative the top income observations in our sample are of the
underlying income distribution. And third, we will use the Pareto and inverted Pareto coefficients to
compare the top income distribution in Egypt with those in the rest of the world using a unique database
of 418 household budget surveys administered in 107 countries.
Extreme values and inequality. How sensitive is the Gini to extreme values? Cowell and Victoria-Feser
(1996), and Cowell and Flachaire (2007) have shown that, unlike poverty measures, inequality measures
are very sensitive to extreme observations, to the extent that even a single observation can significantly
affect the measurement of inequality. What constitutes extreme observations is a matter of judgment of
course. Neri et al. (2009), for example, define outliers as observations that are at least 4-5 times the
median. Working with the EU Surveys on Income and Living Conditions (EU-SILC), they find that this
typically comprises 0.1-0.2% of households. Cowell and Flachaire (2007) and Davidson and Flachaire
(2007) define extreme values as those values that can significantly change the value of inequality, and
propose a methodology to test and address the problem. In this paper, we will use this methodology to
evaluate the role of extreme values for the measurement of inequality with our data.
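The claim that a single observation can move the Gini index noticeably can be illustrated with a minimal Python sketch. The data are hypothetical (999 households with incomes 1 to 999, plus one added extreme household), and the code is not the test proposed by Cowell and Flachaire (2007); it only shows the direction and rough size of the effect in a toy sample.

```python
def gini(values):
    """Unweighted Gini index of a list of non-negative incomes."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-based formula: G = 2*sum(rank_i * x_i) / (n * sum(x)) - (n + 1) / n
    ranked = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * ranked / (n * total) - (n + 1.0) / n


def gini_with_extra(values, extreme):
    """Gini index after appending one additional (extreme) observation."""
    return gini(list(values) + [extreme])


if __name__ == "__main__":
    # Hypothetical sample: incomes 1, 2, ..., 999.
    base = [float(i) for i in range(1, 1000)]
    print("Baseline Gini:", round(gini(base), 4))
    for multiple in (10, 100, 1000):
        extreme = multiple * max(base)
        print("With one household at", multiple, "x the maximum:",
              round(gini_with_extra(base, extreme), 4))
```

In this toy sample the one added household is a tiny fraction of all observations, yet the index rises with each larger multiple, which is the sensitivity the text refers to.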
The same literature also shows that the choice of the measure of inequality and the choice of method to
estimate income inequality are very important. Measures of income inequality are many and some of
these measures are more sensitive to extreme values than others. The Gini index, for example, is known to
give more weight to central observations in a distribution and consequently discounts observations in the
tails. Cowell and Victoria-Feser (1996) have found that the Gini index is more robust to contamination of
extreme values than two members of the generalized entropy family, a finding later confirmed by Cowell
and Flachaire (2007). For these reasons and throughout the paper, we will use only the Gini index as a
measure of inequality while we would expect measures of the generalized entropy family to exhibit
sharper sensitivity to extreme income observations.
Cowell and Victoria-Feser (1996) and Cowell and Flachaire (2007) have also shown that even the Gini
index can be consistently underestimated with household surveys that cannot capture top observations
precisely. These authors concur in finding that inequality estimates imputed from a parametric
distribution function are less sensitive to extreme observations than non-parametric estimates from
actual household data, and suggest combining parametric Pareto estimates for the top of the distribution
with non-parametric statistics for the rest of the distribution. This approach complements the Korinek et
al. (2009) method for correcting for unit non-response of high-income households and overlaps with the
Atkinson et al. (2011) method to model top incomes. We will use this method to correct the Gini
coefficient for the potential influence of top observations, so as to compare the results with non-corrected
Ginis or Ginis corrected for other statistical issues. This will allow us to comment on the relative
influence of extreme observations and other statistical issues in our data.
3. Models
Unit non-response
To test for the presence of a systematic non-response bias properly, we can use a formal model to
estimate the relationship between household income and its probability of response. Unfortunately, unlike
in the case of item non-response, we cannot simply infer households’ unreported income from their other
reported characteristics, because we don’t observe any information for the non-responding households.
Assigning the mean or median values to the missing items would be inappropriate, as the missing values
may be systematically very different from the rest of the distribution. However, following a technique
developed by Korinek et al. (2006 and 2007), we can still use information about household-response rates
at a higher level of geographic aggregation to infer the propensity of households with different
characteristics, such as different incomes, to participate in the survey. This approach essentially takes
advantage of the variation in household response rates and the variation in the distribution of observable
variables (income or expenditure per capita) across geographical areas. We estimate the response
probability for households as a function of their characteristics by observing the propensity of households
with similar characteristics across all regions to participate in the survey, and by fitting regional
population imputed from the participating households’ response probabilities to the regions’ actual
population.
We assume that the probability of a household i to respond to the survey, $P_i$, is a logistic function of its
arguments (Korinek et al. 2006, 2007):
\[
P_i(x_i, \theta) = \frac{e^{g(x_i, \theta)}}{1 + e^{g(x_i, \theta)}}, \tag{1}
\]
where $g(x_i, \theta)$ is a stable function of $x_i$, the observable characteristics of responding households i that are
used in estimations, and of $\theta$, the corresponding vector of parameters from a compact parameter space.
Variable-specific subscripts are omitted for conciseness. $g(x_i, \theta)$ is assumed to be twice continuously
differentiable in $\theta$. The parameters $\theta$ can be estimated by fitting the estimated and actual number of
households in each region using the generalized method of moments (GMM) estimator
\[
\hat{\theta} = \arg\min_{\theta} \sum_{j} \left[ (\hat{m}_j - m_j)\, w_j^{-1}\, (\hat{m}_j - m_j) \right]. \tag{2}
\]
Here $m_j$ is the reported number of households in region j, $\hat{m}_j$ is the estimated true number of households
in the region, and $w_j$ is a region-specific analytical weight proportional to $m_j$. The estimated number of
households, $\hat{m}_j$, can be imputed as the inverse of the estimated response probability of responding
households in the region, $\hat{P}_{ij}$, summed over all $N_j$ households. If the sample is extracted from a larger
population, the imputed true number of households should be divided by the sampling rate for the
underlying population in each region, $s_j$, to obtain population estimates. Finally, if the available sample
includes only a fraction of the households responding to the full survey (such as the 25% random
extraction from the full HIECS sample), we should divide by the sub-sampling rate for each region, $ss_j$:
\[
\hat{m}_j = s_j^{-1}\, ss_j^{-1} \sum_{i=1}^{N_j} \hat{P}_{ij}^{-1}. \tag{3}
\]
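Equations 1-3 can be put together in a minimal Python sketch. It rests on several assumptions that are not in the text: g is taken to be linear in log income, g(x, θ) = θ0 + θ1 log(x); the weights are set to w_j = m_j; a coarse grid search stands in for a proper numerical optimizer; and all function names and the synthetic regions are hypothetical. It is meant to show the mechanics of matching imputed and reported household counts, not to reproduce the estimator used in the paper.

```python
import math


def response_prob(x, theta):
    """Equation 1 with the assumed form g(x, theta) = theta0 + theta1 * log(x)."""
    g = theta[0] + theta[1] * math.log(x)
    # 1 / (1 + exp(-g)) is algebraically equal to exp(g) / (1 + exp(g)).
    return 1.0 / (1.0 + math.exp(-g))


def imputed_households(incomes, theta, s=1.0, ss=1.0):
    """Equation 3: sum of inverse response probabilities over respondents,
    divided by the sampling rate s and the sub-sampling rate ss."""
    return sum(1.0 / response_prob(x, theta) for x in incomes) / (s * ss)


def gmm_objective(theta, regions):
    """Equation 2 with w_j = m_j.

    `regions` is a list of (m_j, incomes of responding households in region j).
    """
    total = 0.0
    for m_j, incomes in regions:
        diff = imputed_households(incomes, theta) - m_j
        total += diff * diff / m_j
    return total


def grid_search(regions, grid0, grid1):
    """Crude minimizer: evaluate the objective on a grid of (theta0, theta1)."""
    candidates = ((t0, t1) for t0 in grid0 for t1 in grid1)
    return min(candidates, key=lambda theta: gmm_objective(theta, regions))


if __name__ == "__main__":
    # Synthetic check: generate m_j from a known theta, then try to recover it.
    true_theta = (3.0, -0.5)  # hypothetical: response falls as income rises
    respondents = [[1.0, 2.0, 3.0], [2.0, 5.0, 9.0],
                   [4.0, 20.0, 50.0], [1.5, 8.0, 30.0]]
    regions = [(imputed_households(xs, true_theta), xs) for xs in respondents]
    grid0 = [k / 10 for k in range(0, 51)]   # theta0 in 0.0, 0.1, ..., 5.0
    grid1 = [k / 10 for k in range(-10, 1)]  # theta1 in -1.0, ..., 0.0
    print("Recovered theta:", grid_search(regions, grid0, grid1))
```

Identification comes from regions differing both in their income distributions and in their household counts; with a single region, many parameter pairs would fit the one moment equally well.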
Under the assumptions of random sampling within and across regions, representativeness of the sample
for the underlying population in each region, and stable functional form of $g(x_i, \theta)$ for all households, the
estimator $\hat{\theta}$ is consistent for the true $\theta$. Estimated values of $\hat{\theta}$ that are significantly different from zero
would serve as an indication of a systematic non-response bias. In that case, we can use the imputed
household response probabilities to correct for the bias. In the absence of any information about non-
responding households, we have two options for correcting the bias: imputing the income of non-
responding households, or re-weighting households that responded to the survey according to their
inferred probability of response. Under the first option, estimation of the expected value of income for
non-responding households would entail integrating incomes weighted by the corresponding probabilities
of non-response across all possible incomes. With the imputed incomes of non-responding households,
we would obtain the full income distribution on which to estimate measures of inequality. The problem
with this method is that the results are sensitive to our assumption regarding the domain of incomes, and
representativeness of the estimated income-probability relationship to counterfactual income levels. The
second option entails imputing the true distribution of incomes by correcting the mass of each observation
for its probability of being sampled. In this study we take the latter approach. Inverses of the estimated
response probabilities serve as the appropriate household weights. In the income distribution imputed in
this way, the derived measures of inequality converge to their true values as long as our sample is
representative of the underlying population.
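The re-weighting option can be sketched as follows, assuming that response probabilities have already been estimated for each responding household. The function names, incomes and probabilities below are hypothetical, and the weighted Gini is computed from the area under the weighted Lorenz curve, which is one standard way of handling household weights rather than a formula taken from the paper.

```python
def inverse_probability_weights(response_probs):
    """Household weights: the inverse of each estimated response probability."""
    return [1.0 / p for p in response_probs]


def weighted_gini(incomes, weights):
    """Gini index with household weights, via the weighted Lorenz curve."""
    pairs = sorted(zip(incomes, weights))
    total_w = sum(w for _, w in pairs)
    total_xw = sum(x * w for x, w in pairs)
    if total_w <= 0 or total_xw <= 0:
        return 0.0
    lorenz_prev = 0.0  # cumulative income share before the current household
    cum_xw = 0.0
    area = 0.0         # area under the Lorenz curve (trapezoid rule)
    for x, w in pairs:
        cum_xw += x * w
        lorenz_now = cum_xw / total_xw
        area += (w / total_w) * (lorenz_prev + lorenz_now) / 2.0
        lorenz_prev = lorenz_now
    return 1.0 - 2.0 * area


if __name__ == "__main__":
    # Hypothetical respondents and hypothetical estimated response probabilities.
    incomes = [1.0, 2.0, 2.5, 3.0, 4.0, 6.0, 9.0, 15.0]
    probs = [0.99, 0.99, 0.98, 0.98, 0.97, 0.95, 0.92, 0.85]
    weights = inverse_probability_weights(probs)
    print("Uncorrected Gini:", round(weighted_gini(incomes, [1.0] * len(incomes)), 4))
    print("Re-weighted Gini:", round(weighted_gini(incomes, weights), 4))
```

A household with an estimated response probability of 0.5 receives a weight of 2, so it counts as if it stood for itself and one similar household that did not respond.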
The model presented in equations 1-3 above uses within-j information as well as between-j information. It
uses within-j information because the estimated number of households $\hat{m}_j$ is estimated within-j and it uses
between-j information because the number of households observed within-j and the distribution of
explanatory variables vary across regions j. The choice of geographic disaggregation involves a trade-off
between the number of j data points, and the number and distribution of within-j observations vis-à-vis
the underlying population. On the one hand, observations should be behaviorally similar to non-
responding households within-j, calling for smaller geographic units. On the other hand, Equation 3
requires that the sample encompass the entire range of values of relevant characteristics of the underlying
population, potentially calling for larger geographic units. In this paper we opt to use 2,526 Primary
Sampling Units (PSU) as j regions with an average of 18.6 responding households per region, as
compared to the 51 US states with an average of 1,649 households per state used by Korinek et al. (2006
and 2007). These are clearly two different approaches with different implications.
In our case, the primary sampling units have relatively homogeneous households, with similar behavioral
responses and presumably also similar survey-response probabilities. Because of a high response rate to
the HIECS survey (96.3%), the observed range of household characteristics in each PSU is expected to
comprise the values of the few non-responding households. A higher level of geographic aggregation
would make behavioral responses less likely to be stable within j areas, while offering little additional
assurance that values of characteristics of responding households encompass values of non-responding
units. In our case, households’ response probabilities are essentially inferred by comparing regions with
similar, narrow ranges of explanatory variables. The response probability curve is constructed using 2,526
sets of probability estimates that overlap little on the curve. In the Korinek et al. case, response
probabilities are inferred by comparing fewer regions with greater ranges of explanatory variables. The
response probability curve is constructed using 51 sets of probability estimates that largely overlap. In our
case, the non-response bias correction is limited by the low observed non-response rate and by the
homogeneity of households in each PSU, which prevent the response probabilities from being estimated too
low. In the Korinek et al. case, response probabilities can be very low for some households, because other
households in the same region can be assigned very high probabilities in compensation. This difference in
methodologies is important because model errors are at the level of regions j. We think that our approach
represents a more appropriate bias correction of the Gini coefficients in the HIECS data, that it is less
likely to overshoot the correction, and that it is more consistent with the Pareto corrections illustrated in
the next section. We will test these claims in the results section.
Extreme values
To evaluate the distribution of top incomes and study the presence of extreme values in our data, we
follow the approach pioneered by Pareto (1896) and recently rediscovered by the work of Piketty and
others summarized in Atkinson et al. (2011). The Pareto distribution is a particular type of distribution
which is skewed and heavy-tailed. It has been used to model various types of phenomena and it is thought
to be suitable to model incomes, particularly upper incomes. The Pareto distribution can be described as
follows:
$$F(x) = 1 - \frac{1}{x^{\alpha}}, \qquad 1 \le x \le \infty, \qquad (4)$$
where α is a fixed parameter called the Pareto coefficient and x is the variable of interest, which in our
case will be income or expenditure per capita. It follows that the probability density function can be
described as:
$$f(x) = \frac{\alpha}{x^{\alpha+1}}, \qquad 1 \le x \le \infty. \qquad (5)$$
The probability density function has the properties of being decreasing, tending to zero as x tends to
infinity and with a mode equal to 1. Intuitively, as income becomes larger, the number of observations
declines following a law dictated by the constant parameter α. Clearly, this distribution function
does not suit all incomes under all income distributions, but it should be thought of as one possible
alternative to model the right-hand tail of a general income distribution, which is the focus of this paper.
In the application that follows, and for empirical purposes, we will use a slightly different definition of
the Pareto coefficient (α) as well as the inverted Pareto coefficient (β) as proposed in Atkinson et al.
(2011):
$$\alpha = \frac{1}{1 - \left[\log\left(\frac{s_{10}}{s_{1}}\right) / \log(10)\right]} \qquad (6)$$
$$\beta = \frac{\alpha}{\alpha - 1}, \qquad (7)$$
where s10 and s1 represent the income shares of the top 10% and 1% of the population respectively. With
tax records, it is generally more common to use the top 1% and 0.1% respectively but with household
data, where samples are typically in the thousands of observations, the top 0.1% of households is a
sample too small to be representative of the very top of the distribution as it may comprise extreme
observations, hence the choice of the top 1% of the population.
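As a minimal sketch of Equations 6 and 7, the two coefficients can be computed directly from the top-10% and top-1% income shares. The function name `pareto_coefficients` and the example shares are illustrative assumptions, not values or code from this paper.

```python
import math

def pareto_coefficients(s10, s1):
    """Return (alpha, beta) from the top-10% and top-1% income shares.

    alpha follows Equation 6 and beta follows Equation 7.
    """
    if not 0.0 < s1 < s10 < 1.0:
        raise ValueError("expected 0 < s1 < s10 < 1")
    alpha = 1.0 / (1.0 - math.log(s10 / s1) / math.log(10))
    beta = alpha / (alpha - 1.0)
    return alpha, beta

# Hypothetical shares chosen only for illustration (not HIECS estimates).
alpha, beta = pareto_coefficients(s10=0.30, s1=0.08)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
```

A larger gap between the two shares (a smaller ratio s10/s1) yields a lower alpha and a higher beta, in line with the interpretation that larger betas correspond to larger top income shares.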
The interpretation of the beta coefficient is that larger betas correspond to larger top income shares while
the opposite is true for the alpha coefficient. In what follows, we will report both coefficients but, as a
rule of thumb, the beta coefficient is what provides a snapshot indication of top incomes. Research on top
incomes has shown that the alpha and beta coefficients are effectively stable for any income distribution,
and in any given year and country, as originally predicted by Pareto. The work by Piketty and others,
which used much longer time-spans than previous research, has shown that the beta coefficient can vary
over time and that this variation can be explained by a combination of economic and political factors.
Measures of inequality can be influenced by the presence of even a few observations with unusually high
values. To evaluate the possible presence of extreme observations in our sample, and to evaluate the
sensitivity of our Gini coefficients to these observations, we follow a procedure proposed by Cowell and
Flachaire (2007) and Davidson and Flachaire (2007) to replace highest-income observations with values
estimated under an expected distribution, and to combine the corresponding parametric inequality
measure for these incomes with a non-parametric measure for lower incomes. As the aforementioned
literature has confirmed, top incomes appear to be distributed as under the Pareto distribution with an
estimable coefficient α. Cowell and Flachaire (2007) propose the following formulation of α:
$$\alpha = \frac{1}{k^{-1}\sum_{i=0}^{k-1} \log X_{(n-i)} - \log X_{(n-k+1)}}, \qquad (8)$$
where X(j) is the jth order statistic in the sample of n incomes, and k is the delineation of top incomes such
as the top 10% of observations. We could also estimate α using maximum-likelihood methods to obtain
the estimate with its robust standard error. All these methods allow weighting of observations by their
sampling probability. These estimation methods yield results that are similar to the formulation proposed
by Atkinson et al. (2011) in Equation 6 above. We will therefore use this formulation in the rest of the
paper for consistency. The Gini coefficient under the estimated Pareto distribution for the k top-income
households can be derived from the expression for the corresponding Lorenz curve (expression inside of
the integral below) as
$$\mathit{Gini} = 1 - 2\int_{0}^{1} \left(1 - [1 - F(x)]^{1 - \frac{1}{\alpha}}\right) dF(x) = \frac{1}{2\alpha - 1}. \qquad (9)$$
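The estimator in Equation 8 and the closed-form Gini in Equation 9 can be sketched as follows. This is an unweighted illustration under stated assumptions: the function names `hill_alpha` and `pareto_gini` are hypothetical, the data are simulated from a Pareto distribution rather than drawn from the HIECS, and sampling weights are ignored.

```python
import numpy as np

def hill_alpha(incomes, k):
    """Equation 8: Pareto coefficient from the k largest order statistics."""
    x = np.sort(np.asarray(incomes, dtype=float))
    n = x.size
    if not 1 < k <= n:
        raise ValueError("k must satisfy 1 < k <= n")
    top = x[n - k:]            # X_(n-k+1), ..., X_(n)
    threshold = top[0]         # X_(n-k+1)
    # Mean of log X_(n-i) over i = 0..k-1, minus log X_(n-k+1).
    mean_log_excess = np.mean(np.log(top)) - np.log(threshold)
    return 1.0 / mean_log_excess

def pareto_gini(alpha):
    """Equation 9: Gini coefficient of a Pareto distribution (alpha > 1/2)."""
    return 1.0 / (2.0 * alpha - 1.0)

# Simulated Pareto incomes with a known coefficient (illustrative only).
rng = np.random.default_rng(0)
true_alpha = 2.4
sample = (1.0 - rng.random(200_000)) ** (-1.0 / true_alpha)

alpha_hat = hill_alpha(sample, k=20_000)   # top 10% of observations
print(f"estimated alpha = {alpha_hat:.3f}, Pareto Gini = {pareto_gini(alpha_hat):.3f}")
```

Because the simulated data are exactly Pareto, the estimate should land close to the true coefficient; with survey data the result depends on the choice of k.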
Finally, this parametric Gini coefficient can be combined with the non-parametric Gini coefficient for the
n-k lower-income observations using geometric properties of the Lorenz curves as
$$\mathit{Gini} = (1 + \mathit{Gini}_{k})\,\frac{k}{n}\,s_{k} - (1 - \mathit{Gini}_{n-k})\left(1 - \frac{k}{n}\right)(1 - s_{k}) + \left(1 - \frac{2k}{n}\right). \qquad (10)$$
Here sk refers to the share of aggregate income held by the richest k households. As long as it
is correct to assume that top incomes in the population are Pareto-distributed, this semi-parametric Gini
coefficient can be compared to an uncorrected non-parametric estimate for the observed income
distribution. A difference between the semi-parametric and non-parametric estimates would indicate that
some observed high incomes may have been generated by a statistical process other than Pareto, and that
our inequality measure is sensitive to this. A semi-parametric Gini that is lower than the non-parametric
Gini can be interpreted as evidence that some top incomes in the sample are 'extreme' compared to those
predicted under the Pareto distribution. A higher semi-parametric Gini would indicate that the observed
top incomes are lower than what the Pareto distribution would predict, potentially implying under-
representation of high-income units in the sample.
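The combination in Equation 10 can be illustrated with a short sketch. It assumes an unweighted sample and takes the Gini of the top k observations as an input, so that either the Pareto value from Equation 9 or the observed value can be plugged in. The function names and the toy incomes are hypothetical.

```python
import numpy as np

def gini(values):
    """Non-parametric Gini coefficient of an unweighted sample."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    return 2.0 * np.sum(ranks * x) / (n * np.sum(x)) - (n + 1.0) / n

def semiparametric_gini(incomes, k, gini_top):
    """Equation 10: combine a Gini for the top k observations with the
    non-parametric Gini of the n-k lower-income observations."""
    x = np.sort(np.asarray(incomes, dtype=float))
    n = x.size
    p = k / n                              # population share of the top group
    s_k = x[n - k:].sum() / x.sum()        # income share of the top group
    gini_bottom = gini(x[: n - k])
    return ((1.0 + gini_top) * p * s_k
            - (1.0 - gini_bottom) * (1.0 - p) * (1.0 - s_k)
            + (1.0 - 2.0 * p))

# Toy incomes and a hypothetical Pareto coefficient (illustration only).
incomes = [1, 2, 3, 4, 10]
alpha = 2.4
combined = semiparametric_gini(incomes, k=2, gini_top=1.0 / (2.0 * alpha - 1.0))
print(f"non-parametric Gini = {gini(incomes):.4f}, semi-parametric Gini = {combined:.4f}")
```

One useful check on the formula: if the observed Gini of the top k incomes is supplied instead of the Pareto value, Equation 10 reproduces the non-parametric Gini of the full sample exactly, so any gap between the two estimates comes only from replacing the observed top incomes with the Pareto fit.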
The unit non-response and extreme-observations conjectures thus yield opposite predictions about the
influence of top incomes on inequality measures, to the extent that they may even cancel each other out.
The former conjecture is that the observed top incomes are valid for the measurement of inequality, and
should even be used to stand for unobserved incomes of non-responding households. The latter conjecture
is that the observed top incomes may have been generated by processes different from those in the
underlying population, by error or by different accounting practices, and should be replaced by values
imputed from the data generating process in the population.
To comment on the validity of these opposite predictions and evaluate their relative size, we can compare
a set of four Gini coefficients: semi-parametric Gini accounting for the possibility of extreme
observations but not for the non-response bias (i.e., Equation 10 where Ginik is derived from Equations 6
and 9 in an unweighted income distribution); semi-parametric Gini accounting for them both (Equation 10
where Ginik is derived from Equations 6 and 9 in an income distribution weighted as per Model 4); non-
parametric Gini correcting only for the non-response bias (Gini observed in a Model 4-weighted income
distribution); and the baseline uncorrected non-parametric Gini (observed in the unweighted income
distribution). This comparison can inform us about the relative importance of extreme income
observations versus non-response bias among high-income households and about their combined effect
for the measurement of inequality in Egypt.
4. Data
This study relies on the Household Income, Expenditure and Consumption Survey (HIECS) administered
by the Central Agency for Public Mobilization and Statistics (CAPMAS). The survey was conducted
every five years until 2009 and is now implemented every two years. In this study, we use four rounds of
the HIECS: 1999-2000, 2004-2005, 2008-2009 and 2010-2011 (2000, 2005, 2009 and 2011 for short).
Survey samples comprise four quarterly independent subsamples that are nationally representative and
stratified by governorate, and urban and rural substrata. The original full samples of the 2000, 2005 and
2009 surveys included 48,000 households, but starting from 2011 the survey includes a smaller sample of
16,000 households. All samples are stratified by governorate, and urban and rural substrata, and they are
multi-stage random samples based on the most recent population censuses: the 1996 census for the 2000
and 2005 HIECS, and the 2006 census for the 2009 and 2011 HIECS. For a full description of the data
and for a discussion of comparability issues over time see World Bank (2013).
The CAPMAS traditionally provided researchers with access to only 25% of observations in the HIECS.
In May 2013, however, the agency decided to grant researchers access to 50% of the data for selected
years and posted these data on the internet. Extraction of the 25% or 50% subsamples is carried out
randomly within each of the quarterly independent subsamples. For the purpose of this study, the
CAPMAS has also granted exceptional access to 100% of the 2005 sample and allowed us to investigate
100% of the 2009 sample on site in Cairo. We therefore use 25% of the sample for 2000 (12,000
observations), 100% for the 2005 and 2009 samples (48,000 observations each) and 25% for 2011 (4,000
observations). From a methodological perspective, these samples and subsamples provide a complex
combination of challenges for the measurement of inequality that, in our view, make Egypt a very good
case study.
In this paper we put special emphasis on the 2009 HIECS. This is the only sample for which we have
information on household response rates for all Primary Sampling Units (PSU), which is essential to
implement some of the tests conducted in this paper. The 2009 sample is based on the last national census
(November 21, 2006), and follows it most closely, which implies better accuracy of the sampling frame.
To the extent that new residential developments may have arisen or people changed their residence since
the latest census, this proximity in time minimizes any distortions or biases in sample coverage.
Improvements to the sampling and methodology made by the CAPMAS between 2000 and 2009 as well
as the fact that the 2010-2011 survey was carried out during the revolution make the 2009 survey the most
accurate of all surveys implemented by the agency to date. We will use the other rounds of the HIECS to
carry out several additional tests and compare statistics over time.
The main welfare measure used in this paper is income per capita. It is common practice in developing
countries to use consumption as a proxy of income rather than income itself given that income tends to be
underreported and given that consumption is smoother than income, especially in rural areas. The World
Bank (2013) report on income inequality in Egypt and our own work have shown that the income variable
in the HIECS is actually good. The distribution of income is very similar in shape to that of consumption
while the central moments of the distribution of income are higher than those of consumption. The
difference between income and consumption (savings) is also an increasing linear function of income as
one should expect. Therefore, while we will also use expenditure to compare our results for income, our
preference in the Egyptian case is for the income variable. Income includes six main groups of items:
wages and salaries, income from non-agricultural activities, cash transfers, income from agricultural
activities, income from non-financial assets and income from financial assets.
5. Results
Data errors
Our inspection of the HIECS data, done by evaluating the distributions of income, expenditure and other
socio-economic characteristics of households, did not reveal any likely data errors. Nevertheless, it is
worth inspecting top values of income and expenditure for anomalies. We inspect the top observations of
income and expenditure in each of the four years under analysis using either the top 100 observations or
the top 1% of observations given that our samples are of different sizes (Figure A1 in the annex). None of the
samples show implausibly high observations or implausibly steep distribution functions. However, the
2009 sample is consistently the most extreme from the standpoint of top observations as compared to
other years and for both income and expenditure. This is an important finding as our focus in this paper is
on the 2009 income distribution.
Subsampling
Can sub-samples randomly extracted from the full surveyed sample bias the measurement of inequality?
The Egyptian national statistical agency provides to researchers 25% or 50% of the full surveyed sample
extracted randomly from the full sample. Extracting 25% of observations randomly from the full sample
reduces the number of top and bottom income observations in the sample. This is similar to the problem
of sampling of top observations already discussed. The probability of capturing top-income observations
in a sub-sample follows the same probability laws as in the original sampling and we cannot predict ex-
ante whether inequality will be under or over-estimated in the sub-sample randomly selected.
However, we can conduct a simple Monte Carlo experiment and extract 25% or 50% of observations
randomly from the 2005 sample (which is available in full) 100 times, and then recalculate the Gini for
each subsample. Figure 1 shows the results with the 100 Ginis sorted in ascending order. As
expected, the Gini of the full sample falls right in the middle of the distribution of Ginis calculated from
the sub-samples for both income and expenditure per capita. However, the CAPMAS provides only one
sub-sample to researchers, and this sub-sample could yield any Gini in the range depicted in Figure 1.
As can be seen, for both income and expenditure and for the 25% subsample, there are about
20 extractions that provide a Gini below the 95% lower bound of the value estimated from the full sample
and about ten extractions that provide a Gini above the upper bound. This means that the 25% random
sample can potentially provide biased Ginis about a third of the time, although we cannot predict ex-ante
the direction or size of the bias. With the 50% subsample the problem persists but is reduced by about
half. There is about a 15% chance that a Gini from an extracted subsample will be outside of the 95%
confidence interval of the full-sample Gini. The final estimations in this paper will rely on 100% of the
2009 sample as we were able to run our programming codes on the full sample in the CAPMAS offices in
Cairo. However, researchers who are currently using the 25% and 50% subsamples should be aware of
this potential issue.
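The subsampling experiment can be sketched along the following lines. This is a simplified illustration rather than the code used for the paper: it draws simple random subsamples without replacement (the CAPMAS extraction is carried out within quarterly subsamples), it uses synthetic lognormal incomes as a stand-in for the 2005 HIECS, and the function names are hypothetical.

```python
import numpy as np

def gini(values):
    """Non-parametric Gini coefficient of an unweighted sample."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    return 2.0 * np.sum(ranks * x) / (n * np.sum(x)) - (n + 1.0) / n

def subsample_ginis(values, fraction, repetitions, rng):
    """Gini of repeated random subsamples, sorted in ascending order."""
    x = np.asarray(values, dtype=float)
    m = int(round(fraction * x.size))
    draws = [gini(rng.choice(x, size=m, replace=False)) for _ in range(repetitions)]
    return np.sort(draws)

rng = np.random.default_rng(2005)
# Synthetic stand-in for per-capita income in a 48,000-household sample.
full = rng.lognormal(mean=8.0, sigma=0.65, size=48_000)
full_gini = gini(full)

for fraction in (0.25, 0.50):
    draws = subsample_ginis(full, fraction, repetitions=100, rng=rng)
    print(f"{fraction:.0%} subsample: min = {draws.min():.4f}, "
          f"max = {draws.max():.4f}, full sample = {full_gini:.4f}")
```

Comparing the sorted subsample Ginis with a confidence interval for the full-sample Gini (for example, a bootstrapped one) would then give the share of extractions falling outside that interval.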
Figure 1. Monte Carlo experiment Ginis (100 repetitions, 25% or 50% random sample extractions)
[Figure 1 panels: inc-25%, inc-50%, exp-25%, exp-50%. Vertical axis: Gini; horizontal axis: n (0 to 100).]
Unit non-response
Unit non-response is a problem in the HIECS data, particularly in some regions. Across governorates, the
survey non-response rate in 2009 ranged from 0.0% to 10.5% with a mean of 3.7%. While the nationwide
average non-response rate in the HIECS data is lower than in household surveys in other countries (for
instance, refer to the literature surveyed in Korinek et al. 2006), it still leads to biases in statistics based on
the observed sample. First, out of 48,635 households contacted for the 2009 survey, 46,857 responded to
the survey, while 1,778 reportedly did not respond, a large number. Second, the problem may be more
serious in some governorates than in others, and so interregional demographic comparisons based on the
sample may be flawed. Table 1 illustrates the interregional differences in non-response rates and mean
incomes of reporting households.
Table 1. Non-response rates and mean incomes and expenditures by governorate
| Governorate | PSUs in the 100% Sample | Households | Non-Response Rate (%) | Mean Household Income | Mean Income per Capita | Mean Household Expenditure | Mean Expenditure per Capita |
|---|---|---|---|---|---|---|---|
| Alexandria | 149 | 2,801 | 6.0 | 22,094.95 | 5,393.10 | 20,815.49 | 5,082.83 |
| Assiut | 101 | 1,872 | 2.4 | 14,188.56 | 2,665.06 | 11,800.88 | 2,216.75 |
| Aswan | 52 | 978 | 1.0 | 17,442.17 | 3,635.79 | 13,018.19 | 2,713.95 |
| Behera | 152 | 2,871 | 0.6 | 17,268.48 | 3,680.44 | 14,240.29 | 3,035.94 |
| Beni Suef | 69 | 1,294 | 1.3 | 15,258.93 | 2,887.36 | 13,514.71 | 2,557.90 |
| Cairo | 285 | 5,194 | 8.9 | 26,693.58 | 6,499.94 | 23,781.25 | 5,794.74 |
| Dakahlia | 176 | 3,289 | 1.6 | 18,852.61 | 4,467.94 | 15,898.13 | 3,768.32 |
| Damietta | 52 | 959 | 2.9 | 21,379.38 | 5,460.37 | 18,202.50 | 4,654.69 |
| Fayoum | 78 | 1,466 | 1.1 | 17,120.80 | 3,071.68 | 15,523.90 | 2,784.29 |
| Gharbia | 139 | 2,584 | 2.2 | 20,925.32 | 4,606.58 | 18,255.12 | 4,025.31 |
| Giza | 215 | 3,939 | 6.5 | 19,684.33 | 4,347.80 | 17,270.96 | 3,821.73 |
| Ismailia | 52 | 967 | 2.1 | 25,295.13 | 5,401.84 | 17,843.52 | 3,810.80 |
| Kafr ElSheikh | 85 | 1,547 | 4.2 | 25,035.71 | 4,279.37 | 20,465.43 | 3,497.10 |
| Kalyoubia | 145 | 2,668 | 3.2 | 20,178.65 | 4,137.20 | 17,753.81 | 3,642.90 |
| Luxor | 14 | 263 | 1.1 | 20,629.04 | 4,704.10 | 15,746.41 | 3,591.63 |
| Matrouh | 11 | 209 | 0.0 | 28,858.18 | 5,861.38 | 22,282.55 | 4,525.81 |
| Menia | 128 | 2,371 | 2.5 | 19,469.71 | 3,451.37 | 16,205.61 | 2,876.04 |
| Menoufia | 107 | 1,977 | 2.8 | 19,622.80 | 4,147.15 | 15,742.03 | 3,324.27 |
| New Valley | 8 | 146 | 3.9 | 26,562.99 | 5,322.18 | 22,243.22 | 4,458.13 |
| North Sinai | 14 | 243 | 10.5 | 17,891.85 | 3,768.41 | 13,423.69 | 2,829.52 |
| Port Said | 50 | 925 | 7.4 | 28,091.89 | 6,501.37 | 25,207.07 | 5,844.91 |
| Qena | 88 | 1,628 | 2.6 | 17,655.77 | 3,302.03 | 14,099.05 | 2,637.08 |
| Red Sea | 13 | 239 | 3.2 | 30,745.62 | 7,050.69 | 22,396.95 | 5,151.85 |
| Shrkia | 175 | 3,262 | 1.9 | 16,454.62 | 3,662.45 | 13,896.70 | 3,093.52 |
| South Sinai | 4 | 69 | 9.2 | 52,438.13 | 10,969.95 | 29,246.05 | 6,357.09 |
| Suez | 50 | 951 | 4.9 | 31,069.54 | 7,269.37 | 27,198.66 | 6,370.75 |
| Suhag | 114 | 2,145 | 1.0 | 13,961.63 | 2,809.37 | 11,880.76 | 2,391.82 |
| Mean | 94 | 1,735 | 3.7 | 20,549.65 | 4,653.03 | 17,375.99 | 3,974.44 |
Note: Non-response rate, reported in the survey at the PSU level, is weighted by the number of responding
households in each PSU. Household income and expenditure, reported in the survey at the household level, are also
weighted by the number of responding households in each PSU. Per-capita income and expenditure are further
weighted by household size. These mean incomes and expenditures may not be representative of those for the entire
governorates, as they omit non-responding households.
Figure 2. Mean household non-response rates versus mean incomes per capita at the PSU level
(a) Histogram of mean non-response rates (b) Mean non-response rate by income per capita
Note: The unit of observation in this figure is a PSU. Average household non-response rate and average income per
capita in a PSU are shown.
Unit non-response in a region is associated positively with income of responding households in that
region. At the level of governorates, the Pearson correlation of the non-response rate with per-capita
income of reporting households is 0.53, and 0.54 with per-capita expenditure of reporting households. At
the level of individual primary sampling units (PSU), the correlations are 0.39 and 0.46, respectively.
Figure 2a reports that survey non-response rate ranges from 0.0% to 55% with a heavy right tail. Figure
2b shows the systematic relationship between household non-response rate and mean per-capita income
of responding households at PSUs. Non-response rates greater than 33% occur only among the richest
25% of PSUs in terms of income per capita, and only among the richest 15% of PSUs in terms of
expenditure per capita. Because of these findings, it is likely that mean incomes and expenditures are even
higher in the underlying populations of regions with high non-response rates, and that the associations are
even stronger with the incomes and expenditures of the underlying populations.
Table 2 shows the results of estimation of households' survey response as a function of household income
or expenditure. Our sample is the full (100%) sample of the 2009 HIECS. Response probability is thus
estimated for 46,857 households, by fitting population in 2,526 PSUs. Following Korinek et al.'s (2006,
2007) lead, all models estimate survey-response probability as a nonlinear function of income or
expenditure. Models 1 and 2 make g(x) in Equation 1 a function of household income or expenditure.
Models 3-10 use imputed income or expenditure per capita as explanatory variables, by dividing
household-level variables by household size. Model specifications in Table 2 were selected in
concurrence with Korinek et al.'s models, and with the aim to evaluate a variety of functional forms, from
linear to highly non-linear.
The basic finding is that households' survey response is related negatively to income and expenditures.
The coefficients on income and expenditures are consistently negative, and statistically very significant.
The simplest uni-variate logarithmic functions exhibit better fit than more complex or polynomial
functions. They yield greater significance of all coefficients, lower value of the minimization objective
function, and lower values of the Akaike and Schwarz Information Criteria, implying more efficient
overall model fit.
Household expenditure appears to have a better explanatory power than household income, yielding lower
values of the Akaike and Schwarz Information Criteria. Income and expenditure per capita provide better
fit than household-level income and expenditure, implying that dividing household-level values by
household size yields variables that are more predictive of households' probability of responding than the
household-level equivalents, without introducing additional noise into the model.
The negative relationship between income (expenditure) and response probability is particularly strong at
high incomes (expenditures). The estimated relationship is highly nonlinear, with the response rate
dropping rapidly in the highest range of expenditures. Models using linear, quadratic or power
functions (such as the square root or cube root of expenditures) rather than logarithmic functions achieve
inferior measures of fit. Linear, quadratic and square-root models (Models 7-9) exhibit the poorest fit.
The various models correcting for non-response bias yield similar estimates for the measure of income
inequality. The last two columns in Table 2 report the estimated Gini coefficients for income and
expenditure per capita across models. They range from 0.329 to 0.351, for income, and from 0.305 to
0.320, for expenditure. Considering the differences in specifications used and fit achieved, these ranges
are quite narrow, particularly for expenditure. Across models, 95% confidence intervals of the income
Gini coefficients have lower bounds of 0.324-0.336 and upper bounds of 0.333-0.365. Expenditure Gini
coefficients have lower bounds of 0.302-0.313 and upper bounds of 0.309-0.327. With the exception of
the Gini coefficients from the poorly performing Models 7-9, all Ginis fit within the 95% confidence
intervals of each other. This provides some evidence of consistency of the estimates.
Table 2. Estimation results for various logistic models of response probability
| Specification of g(x) | E(θ1) (s.e.) | E(θ2) (s.e.) | Objective Value: Sum of Squared Weighted Errors | Factor of Proportionality (σ²) | Akaike Information Criterion | Schwarz Information Criterion | Per-Capita Income Gini (s.e.) | Per-Capita Expenditure Gini (s.e.) |
|---|---|---|---|---|---|---|---|---|
| Household level | | | | | | | | |
| 1: θ1 + θ2 log(income) | 14.9909 (.0169) | -1.1853 (.0016) | 85,079.65 | .0776 | 8,887.82 | 8,885.20 | .3506 (.0072) | .3151 (.0024) |
| 2: θ1 + θ2 log(expenditure) | 17.2057 (.0184) | -1.4232 (.0017) | 81,219.50 | .0753 | 8,770.53 | 8,767.92 | .3426 (.0035) | .3200 (.0033) |
| Per capita | | | | | | | | |
| 3: θ1 + θ2 log(income) | 11.6554 (.0122) | -.9939 (.0013) | 83,400.47 | .0757 | 8,837.46 | 8,834.85 | .3488 (.0062) | .3151 (.0023) |
| 4: θ1 + θ2 log(expenditure) | 13.0790 (.0142) | -1.1742 (.0015) | 80,554.84 | .0737 | 8,749.77 | 8,747.16 | .3423 (.0035) | .3181 (.0025) |
| 5: θ1 + θ2 log(exp.)^2 | 7.4535 (.0066) | -.0603 (.0001) | 81,623.97 | .0744 | 8,783.08 | 8,780.46 | .3421 (.0039) | .3176 (.0026) |
| 6: θ1 log(exp.) + θ2 log(exp.)^2 | 1.5485 (.0013) | -.1391 (.0001) | 83,644.60 | .0757 | 8,844.85 | 8,842.23 | .3418 (.0045) | .3168 (.0028) |
| 7: θ1 + θ2 10^-3 expenditure | 3.3528 (.0019) | -.0254 (.0000) | 95,919.03 | .0845 | 9,190.73 | 9,188.11 | .3338 (.0044) | .3084 (.0023) |
| 8: θ1 + θ2 10^-9 expenditure^2 | 3.2832 (.0020) | -.0026 (.0189) | 99,480.83 | .0873 | 9,282.83 | 9,280.21 | .3289 (.0023) | .3054 (.0017) |
| 9: θ1 + θ2 expenditure^(1/2) | 4.0854 (.0023) | -.0137 (.0000) | 88,808.82 | .0792 | 8,996.18 | 8,993.56 | .3388 (.0052) | .3130 (.0029) |
| 10: θ1 + θ2 expenditure^(1/3) | 5.1798 (.0035) | -.1224 (.0001) | 85,366.91 | .0768 | 8,896.33 | 8,893.72 | .3408 (.0049) | .3153 (.0029) |
Note: Sample size is 2,526 PSUs, containing 46,857 household observations. PSU populations are fitted using
response probabilities estimated for all households. Standard errors on Gini coefficients are bootstrapped estimates.
Beside the ten models in Table 2, we have considered other polynomial specifications as well as a model
controlling for the four quarterly rounds in the 2009 HIECS. While some coefficients in these models
were statistically significant, the models' overall fit was worse than in Models 1-4, and the corresponding
Gini coefficients did not depart significantly from those in Table 2. The imputed household response
probabilities and Gini coefficients are thus not too sensitive to the addition of more variables into g(x).
In the rest of the analysis, we will use Model 4 as a benchmark specification, due to its superior fit, and
similarity to the model used by Korinek et al. (2006, 2007). The following figures provide additional
results for this model, as well as other comparison models. Figure 3 shows households' probability of
survey response by income or expenditure per capita estimated in Models 3 and 4. In agreement with
negative estimates of θ2 in the logarithmic specifications, the estimated response-probability falls with
income, most rapidly in the highest range of incomes (expenditures). Figure 3 thus confirms the central
premise of this analysis, that richer households are systematically less likely to participate in surveys, and
that this issue is particularly grave for top-income households. The response probabilities shown here will
be used as the appropriate household weights for the imputation of income distribution and measures of
inequality.
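To make the reweighting step concrete, the sketch below turns the Model 4 point estimates from Table 2 into response probabilities and inverse-probability weights, and then computes a weighted Gini. Several elements are assumptions made for illustration: the logistic form P = 1 / (1 + exp(-g(x))) is inferred from the title of Table 2, the function names and example expenditures are hypothetical, and the household-size adjustment used for per-capita statistics is omitted.

```python
import numpy as np

# Model 4 point estimates from Table 2: g(x) = theta1 + theta2 * log(expenditure per capita).
THETA1, THETA2 = 13.0790, -1.1742

def response_probability(expenditure_per_capita):
    """Assumed logistic response probability, P = 1 / (1 + exp(-g(x)))."""
    x = np.asarray(expenditure_per_capita, dtype=float)
    g = THETA1 + THETA2 * np.log(x)
    return 1.0 / (1.0 + np.exp(-g))

def weighted_gini(values, weights):
    """Gini coefficient from the weighted Lorenz curve (trapezoid rule)."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(x)
    x, w = x[order], w[order]
    pop_share = np.concatenate(([0.0], np.cumsum(w) / np.sum(w)))
    inc_share = np.concatenate(([0.0], np.cumsum(x * w) / np.sum(x * w)))
    area = np.sum(np.diff(pop_share) * (inc_share[1:] + inc_share[:-1]) / 2.0)
    return 1.0 - 2.0 * area

# Hypothetical per-capita expenditures in LE (illustration only).
expenditure = np.array([1_500.0, 4_000.0, 12_000.0, 60_000.0])
prob = response_probability(expenditure)
weights = 1.0 / prob   # inverse response probabilities
print("response probabilities:", np.round(prob, 3))
print("weighted Gini:", round(float(weighted_gini(expenditure, weights)), 4))
```

Under this assumed form, the response probability at the sample mean expenditure per capita reported in Table 1 is close to 0.97, and it falls steeply only at very high expenditures, so the inverse-probability weights are near 1 for most households and rise for the richest ones.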
Figure 3. Estimated household response probability by income or expenditure per capita (Models 3,
4)
(a) Model 3 (b) Model 4
The corrected weights differ significantly from the CAPMAS-provided sampling weights. The CAPMAS
provides sampling weights that correct for unit non-response by simply expanding the weight for the
non-response rate at PSUs. CAPMAS-provided sampling weights are normalized to a mean of 1, have a standard
deviation of 0.173, and are identical for all households within a PSU. Weights from Model 4, obtained as the
inverse response probabilities estimated in that model, have a mean of 1.041 and a standard deviation of 0.057,
and vary across all households even within PSUs. Figure 4 reports the distribution of households'
sampling weights provided by the CAPMAS and those derived from Model 4 (demeaned for ease of
comparison).
Figure 4. Distribution of CAPMAS-provided sampling weights and weights correcting for non-
response bias from Model 4
(a) Weights for household-level variables (b) Weights for per-capita variables
Note: Weights are normalized to have a mean of 1, and of mean household size (4.665), respectively.
Use of the corrected weights affects the imputed income distribution. Figures 5-6 show the implications
of our estimation for the imputed distribution of per-capita incomes and the corresponding Lorenz curves,
for the entire population as well as for the poorest and richest households. (Similar results for expenditure
per capita are available on request.) These figures show that our correction of the survey-nonresponse bias
increases our measurement of income-inequality. The Lorenz curve imputed using our weights first-order
dominates both the uncorrected Lorenz curve as well as the CAPMAS sampling-weights corrected Lorenz
curve on the entire domain. The uncorrected and CAPMAS-corrected Lorenz curves do not exhibit clear
dominance over one another. Under our corrected income distribution, the estimated fraction of
households in the highest income range increases, and the fraction of households in all lower income
ranges, including the lowest-income range (less than LE2,500 in Figure 5 panel b), falls.
Figure 5. Cumulative distribution of income per capita across population, and among the poorest
25% and richest 10% of households (Model 4)
(a) Per-capita income distribution (Model 4) (b) Poorest 25% per-capita incomes (Model 4)
(c) Richest 10% per-capita incomes (Model 4)
Figure 6. Lorenz curves in the population, and for the poorest 25% and richest 10% of households
(Model 4)
(a) Lorenz curve in the population (Model 4) (b) Lorenz curve for the poorest 25% (Model 4)
(c) Lorenz curve for the richest 10% (Model 4)
Correspondingly, use of the corrected weights affects the imputed Gini index of inequality positively. By
reweighting income distribution to account for households' endogenous survey response, we obtain
significantly higher measures of income inequality. The Gini coefficient for per-capita incomes using
simple household-size weights is 0.3289 (s.e. 0.0023). The Gini coefficient using the CAPMAS-provided
sampling weights is 0.3305 (s.e. 0.0024). The Gini coefficient using response-probability weights
estimated in our Model 4 is 0.3423 (s.e. 0.0035). This corrected Gini coefficient is statistically higher than
both of the uncorrected ones at the 1% level of significance (p-values of 0.002).
For per-capita expenditure, the Gini coefficient for the unweighted distribution is 0.3054 (s.e. 0.0017),
while that using the CAPMAS-provided sampling weights is 0.3070 (s.e. 0.0019). The Gini coefficient
using response-probability weights estimated in Model 4 is 0.3181 (s.e. 0.0025). Again, this corrected
Gini coefficient is statistically higher than either of the uncorrected ones at the 1% level of significance
(p-values of 0.001).
Use of the corrected weights also significantly affects the estimated distribution of top incomes. The
Pareto coefficient for unweighted per-capita incomes is 2.428, and the inverted Pareto coefficient is
1.700. For incomes weighted by the CAPMAS-provided weights, these coefficients are 2.392 and 1.718,
respectively. For incomes weighted by the response-probability weights estimated in Model 4, these
coefficients are 2.250 and 1.800. For per-capita expenditure, the Pareto and inverted Pareto coefficients
are 2.685 and 1.593 in the unweighted distribution, 2.606 and 1.623 in the distribution
weighted using the CAPMAS weights, and 2.478 and 1.677 in the distribution weighted as per
Model 4.
The corrected weights estimated across the alternative models in Table 2 give rise to very different
estimates of top-income distribution. The Pareto coefficients for per-capita incomes estimated in Models
1-3 and 5-10 are, respectively: 2.051, 2.268, 2.078, 2.231, 2.217, 2.291, 2.428, 2.219, and 2.210. (These
and additional results for the Gini and Pareto coefficients across all models are provided in the annex.)
This variation can be explained by the differential treatment of top-income households across models.
Different models assign different weights to households with the highest incomes. By estimating
households' survey-response probability as a function of their log-expenditure (or log-income), versus
regular or squared expenditure, we assign very different weights to the highest-income households, while
keeping weights of lower-income households similar. Figure 7 plots the alternative weights across
households with different expenditures (see footnote 3). Clearly, the weights diverge for the richest households.
Correspondingly, the estimated Lorenz curves differ particularly for highest-income households (as
evident in Figure A2 in the annex).
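To illustrate why the functional form matters mostly at the top, the sketch below compares correction weights under a logarithmic and a linear specification of g(X). It assumes a logistic response probability P = 1/(1 + exp(-g)) and a correction weight equal to 1/P; the coefficients and expenditure levels are hypothetical values chosen for illustration, not the estimates in Table 2.

```python
import numpy as np

def response_probability(g):
    """Logistic response probability P = 1 / (1 + exp(-g))."""
    return 1.0 / (1.0 + np.exp(-np.asarray(g, dtype=float)))

def correction_weight(g):
    """Inverse-probability weight 1 / P (assumed form of the correction)."""
    return 1.0 / response_probability(g)

# Hypothetical per-capita expenditures (EGP per year), from modest to very high.
expenditure = np.array([2_000.0, 5_000.0, 20_000.0, 100_000.0])

# Hypothetical coefficients, for illustration only.
g_log = 18.0 - 1.7 * np.log(expenditure)   # g(X) = theta1 + theta2 * log(expenditure)
g_lin = 3.0 - 0.03e-3 * expenditure        # g(X) = theta1 + theta2 * 10^-3 * expenditure

w_log = correction_weight(g_log)
w_lin = correction_weight(g_lin)
gap = np.abs(w_log - w_lin)                # largest for the richest household
```

Under these assumed values the two specifications assign nearly identical weights to the first two households and sharply different weights to the richest one, which is the pattern described above.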
Figure 7. Household weights across selected models
Extreme observations
In this section we test the sensitivity of the Gini coefficients to extreme observations on the right-hand
side of the distribution (top incomes), in the raw data as well as in the income distribution corrected for
unit non-response. If top incomes turn out to be influential, we then correct for their presence using an
estimated Pareto distribution as discussed in the methodological part.
3
Expenditure, rather than income, is shown for clarity of presentation, since most models use functions of
expenditure. Note that the weights from Model 3 are a function of income, hence their plot against expenditure is not
as smooth as for other models.
In our data, the Gini is very sensitive to extreme observations irrespective of sample size. In Figure 8,
we recalculate the Gini for the CAPMAS surveys by removing top-income observations one at a time, up to
100 observations, for each of the four years considered. This was done on the 25% subsample for 2000,
2009 and 2011 and on the full sample for 2005, which means that the sample size differs across years:
12,000 observations for 2000 and 2009, 48,000 observations for 2005, and 4,000 observations for 2011.
In this way, we can check how different sample sizes affect the sensitivity of Gini coefficients to top
observations. The cutoff of 100 observations was chosen in recognition of the finding by Neri et al.
(2009) that up to 0.2% of income observations may represent outliers.
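The trimming exercise described above can be sketched as follows: sort the welfare aggregate, drop the k largest observations for k = 0, ..., 100, and recompute the Gini each time. The data here are simulated from a lognormal distribution with arbitrary parameters, and the function names are assumptions made for illustration; no HIECS values are used.

```python
import numpy as np

def gini(values):
    """Unweighted Gini coefficient from the rank formula."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    return 2.0 * np.sum(ranks * x) / (n * np.sum(x)) - (n + 1.0) / n

def trimmed_ginis(values, max_removed=100):
    """Gini after removing the k largest observations, for k = 0..max_removed."""
    x = np.sort(np.asarray(values, dtype=float))
    return np.array([gini(x[: x.size - k]) for k in range(max_removed + 1)])

# Simulated incomes (assumption): 4,000 draws, the size of the smallest subsample.
rng = np.random.default_rng(0)
incomes = rng.lognormal(mean=8.0, sigma=0.6, size=4_000)

path = trimmed_ginis(incomes, max_removed=100)
drop = path[0] - path[-1]  # total fall in the Gini after removing 100 observations
```

Plotting path against k reproduces the kind of curve shown in the figure; running the same code on samples of different sizes shows how the slope depends on sample size.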
We can clearly see a tendency for the Gini to decline rapidly, and we can also see that the sensitivity to
top observations is different for income and expenditure, and different for the four years considered. The
scale of the sensitivity is related to sample size and the welfare aggregate. For both income and
expenditure, the steepest curves are those for 2011 (the smallest sample) and the least steep are those for
2005 (the largest sample). This is perhaps expected as larger subsamples are likely to capture extreme
observations more completely, and the Gini may be less sensitive to each one of them. It is also evident
that the Gini on expenditure is more sensitive to extreme values than the Gini on incomes. This is less
expected given that income has a higher Gini than expenditure and given that expenditure is less likely to
have extreme observations.
Figure 8. Sensitivity tests of the Gini to the removal of the top 100 observations
[Figure: two panels, Income and Expenditure. x-axis: n, the number of top observations removed (0 to 100);
y-axis: Gini. Series for 2000, 2005, 2009 and 2011 (giniinc2000, giniinc2005, giniinc2009, giniinc2011;
giniexp2000, giniexp2005, giniexp2009, giniexp2011).]
It is clear that the extreme values in the Egyptian distribution of income and expenditure cannot be
ignored. On the one hand, removing some of the top observations may lead to underestimation of
inequality if these observations are accurate and representative of the underlying population. On the other
hand, keeping top observations that arise from data errors, or that do not represent the underlying
population well, may lead to overestimation of inequality. In both cases, our inequality estimates would be
biased, particularly in small sample extractions.
The sensitivity of the Gini to extreme observations persists when we correct for unit non-response. A
sensitivity analysis reported in Figure 9 shows that inequality measures are very sensitive to the top
0.025% of observations. In this analysis, we recalculate the Gini and Pareto coefficients after removing
the 0.025%-0.2% of households with the highest incomes (12-96 households in the 100% sample of the 2009
HIECS). A significant portion of the difference in Gini coefficients across models disappears as we
remove the highest-earning 0.025%-0.05% of households (12-24 households). Exclusion of additional
high-income households does not yield significant changes. The difference in statistics across models
appears to converge to a particular level, which decreases at a much slower rate with exclusion of
additional households. 4 (Figure A3 in the annex reports the same patterns for the Pareto and inverted
Pareto coefficients.)
Figure 9. Gini coefficients for income and expenditure in trimmed distributions (Models 3-7)
(a) Gini coefficient for income per capita (b) Gini coefficient for expenditure per capita
4
Not surprisingly, Model 3 yields a distribution of income that is more sensitive to the removal of highest-earning
households than other models. (Refer to the left panels of Figures 9 and A3.) This is because household weights in
Model 3 are functions of income, whereas weights in other models are functions of expenditure. The converse about
lower sensitivity of the Model 3 Gini coefficient for expenditures does not hold, however. Because the distribution
of expenditures suffers less from extreme observations than income, as Table 1 has suggested, expenditure Ginis
across the alternative models vary less in the overall sample, and Model 3 Gini is no less sensitive to the top 0.025%
of observations than Ginis from other models.
Gini coefficients from Models 3-6 are always substantially higher than the unweighted or the CAPMAS-weighted
Gini coefficients, for both income and expenditure. On the other hand, Gini coefficients from Model 7, the linear
model, converge to the uncorrected Ginis after the initial 12 top households are removed. This suggests that the
imputed income (or expenditure) distribution from the linear model does not differ much from the uncorrected
distribution, except for the influence of the topmost 0.025% observations. With the exception of the linear Model 7,
the differences between all response-bias corrected and CAPMAS-corrected Gini coefficients for incomes are
significant at the 1% level in the overall sample, but become significant even at the 0.1% level when the top 0.025-
0.2% of households are removed. This is because the presence of the highest-earning households in our sample
introduces noise that increases standard errors even more than it moves model Gini coefficients away from the
CAPMAS weight-corrected Gini coefficients. Hence, as high-income households are excluded, the values of
uncorrected and corrected Gini coefficients become closer to each other, but their differences retain their statistical
significance.
Note: '100' indicates the full, untrimmed income distribution. '99.975' indicates the income distribution with the
0.025% of households with the highest incomes trimmed (12 households in the 100% sample of the 2009 HIECS).
Similarly, '99.8' indicates the trimming of the 0.2% of highest-earning households (96).
The discussion above suggests that observations with the highest incomes affect the measurement of
inequality. Excluding these observations from the sample yields lower and more homogeneous estimates
of inequality across models. A question arises whether exclusion is appropriate theoretically, given that it
reduces sample size and may result in the censoring of meaningful observations. Here we address these
questions by comparing actual observations of top incomes with values imputed under their expected
Pareto distribution, and estimating the effects on Gini coefficients. This provides an alternative way to
evaluate robustness of our Gini coefficients to the presence of extreme income observations in the sample.
In view of our results about survey non-response by top-income households, this also allows us to
comment on the relative significance of the two statistical issues.
Table 3 presents semi-parametric estimates of Gini coefficients, obtained by replacing the top 10
percent of income observations (alternatively, 5% or 20%) with values imputed from the corresponding
Pareto distribution as per Cowell and Flachaire (2007), and Davidson and Flachaire (2007). The first three
rows show the benchmark non-parametric estimates: unweighted; corrected for sampling probability
using CAPMAS weights; and corrected for non-response bias as per Model 4. These three rows again
illustrate the importance of correcting for survey non-response, and serve as the benchmark to
which the following semi-parametric estimates are compared.
The next three rows present the main results: semi-parametric estimates when the top 10 percent of
incomes are imputed from a corresponding Pareto distribution. The following six rows report on a
robustness check, where such imputation is performed on the top 5 percent or the top 20 percent of incomes.
The main finding is that the CAPMAS data do not appear to suffer from extreme income observations
relative to what would be predicted if our top-income data followed the Pareto distribution exactly. The
corrected Gini coefficients are essentially unchanged, falling or rising by a very small amount. This
suggests that the exclusion of top incomes in the previous section is not warranted on the grounds that
they are outliers, but simply as a robustness test of the Gini estimates to individual income observations.
The size of the correction for extreme observations is trivial compared to the correction for unit non-
response. The results for expenditure per capita are analogous, and are shown in the annex.
In the income distribution uncorrected for non-response bias, the semi-parametric Gini coefficient
(corrected for the possible presence of extreme observations among the top 10% of incomes) is 0.3278,
compared to the non-parametric value of 0.3289. When we increase the range of top incomes to be
imputed, from 10% to 20% of households, the semi-parametric Gini falls to 0.3273. In the income
distribution sampling-corrected using CAPMAS weights, the semi-parametric Gini coefficient is the same as
the non-parametric estimate, 0.3305. Finally, in the income distribution corrected for non-response bias
using weights from Model 4, the corrected Gini coefficient is again the same as the uncorrected value,
0.3423. When we increase the range of top incomes to be imputed, from 10% to 20% of households, the
semi-parametric Gini rises slightly, to 0.3425.
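The logic of the comparison between non-parametric and semi-parametric estimates can be illustrated with a simplified stand-in. The sketch below is not the estimator of Cowell and Flachaire (2007) or Davidson and Flachaire (2007): it assumes a Hill estimate of the Pareto coefficient on the top k share of observations and replaces those observations with evenly spaced quantiles of the fitted Pareto tail. The data are simulated, and all function names and parameter values are assumptions made for illustration.

```python
import numpy as np

def gini(values):
    """Unweighted Gini coefficient from the rank formula."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    return 2.0 * np.sum(ranks * x) / (n * np.sum(x)) - (n + 1.0) / n

def hill_alpha(tail, threshold):
    """Hill (maximum-likelihood) estimate of the Pareto coefficient above `threshold`."""
    return tail.size / np.sum(np.log(tail / threshold))

def semiparametric_gini(values, top_share=0.10):
    """Gini after replacing the top `top_share` of observations with Pareto quantiles."""
    x = np.sort(np.asarray(values, dtype=float))
    m = int(round(top_share * x.size))
    if not 0 < m < x.size:
        raise ValueError("top_share must leave observations in both body and tail")
    body, tail = x[:-m], x[-m:]
    threshold = body[-1]                     # largest observation below the tail
    alpha = hill_alpha(tail, threshold)
    u = (np.arange(1, m + 1) - 0.5) / m      # mid-point probabilities within the tail
    imputed = threshold * (1.0 - u) ** (-1.0 / alpha)
    return gini(np.concatenate([body, imputed])), alpha

# Simulated data (assumption): an exact Pareto distribution with coefficient 2.4.
rng = np.random.default_rng(1)
true_alpha = 2.4
incomes = 1.0 + rng.pareto(true_alpha, size=20_000)

g_nonparametric = gini(incomes)
g_semiparametric, alpha_hat = semiparametric_gini(incomes, top_share=0.10)
```

When the observed top incomes already follow a Pareto distribution, as in this simulation, the two Gini estimates should be close; a large gap would instead point to extreme observations that depart from the fitted tail.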
Table 3. Non-parametric and semi-parametric estimates of Gini coefficients
Modeling of       Correction for   k     Sampling       Pareto coef. a    Gini_{n-k}       Gini_k           Gini
top incomes       extreme obs.           correction     (s.e.)            (s.e.)           (s.e.)           (s.e.)
Non-parametric    No               10%   No             2.4279 (.0309)    .2191 (.0007)    .2584 (.0069)    .3289 (.0023)
Non-parametric    No               10%   Yes, CAPMAS    2.3919 (.0326)    .2175 (.0007)    .2654 (.0070)    .3305 (.0024)
Non-parametric    No               10%   Yes, Model 4   2.2501 (.0329)    .2214 (.0007)    .2844 (.0112)    .3423 (.0035)
Semi-parametric   Yes              10%   No             2.4279 (.0309)    .2191 (.0007)    .2594            .3278
Semi-parametric   Yes              10%   Yes, CAPMAS    2.3919 (.0326)    .2175 (.0007)    .2643            .3305
Semi-parametric   Yes              10%   Yes, Model 4   2.2501 (.0329)    .2214 (.0007)    .2857            .3423
Semi-parametric   Yes              5%    No             2.4638 (.0937)    .2463 (.0008)    .2546            .3288
Semi-parametric   Yes              5%    Yes, CAPMAS    2.4378 (.0969)    .2452 (.0008)    .2580            .3305
Semi-parametric   Yes              5%    Yes, Model 4   2.2507 (.0961)    .2503 (.0008)    .2856            .3422
Semi-parametric   Yes              20%   No             2.4190 (.0223)    .1864 (.0007)    .2606            .3273
Semi-parametric   Yes              20%   Yes, CAPMAS    2.3811 (.0234)    .1849 (.0007)    .2658            .3306
Semi-parametric   Yes              20%   Yes, Model 4   2.2603 (.0232)    .1876 (.0007)    .2840            .3425
We can now come back to the question of within-j/between-j trade-off discussed in the methodological
section. We argued that using a highly aggregated j would be likely to overshoot the Gini correction and
would lead to results that are less consistent with the Pareto corrections proposed. Indeed, our
non-response correction of 1-2 percentage points is smaller than the correction of 4-5 percentage points
reported by Korinek et al. (2006, 2007).
To test the claims regarding appropriate geographic aggregation, we have re-estimated the models in
Table 2 using governorates by urban and rural substrata (50 areas) rather than PSUs (see Table 4). If we
compare the models with the best fit (Model 4), we find that using governorates by urban and rural areas
raises the corrected Gini (s.e.) for income from 0.3423 (0.0035) to 0.3714 (0.0129) and the corrected Gini
for expenditure from 0.3181 (0.0025) to 0.3419 (0.0075). Across most models, the estimated Ginis rise by
3-5 percentage points for income, and by 1-4 percentage points for expenditure.
In our view, Table 2 provides more accurate estimates for the HIECS data than Table 4. First, Ginis
estimated at the level of governorates by urban/rural areas are consistently higher than the semi-parametric
Ginis estimated using the alternative methodology of Cowell and Flachaire (2007) and Davidson and
Flachaire (2007), while the Ginis estimated with PSUs are very much in line with those estimates.
Second, in Table 4, all Ginis show significantly higher standard errors. Third, the HIECS data have a much
higher household response rate (96.3%) than the US Current Population Survey (91.7%), implying less
bias. And fourth, inequality is much lower in the HIECS data, suggesting that the percentage-point
correction may be lower. The optimal tradeoff of the within-j/between-j number of bins depends on the
nature of the model and on the nature of the data at hand. This paper has proposed a different approach
and applied this approach to a different data set as compared to Korinek et al. (2006 and 2007). Clearly,
the question of the optimal within-j/between-j trade-off will require testing in a separate paper to be fully
explored, but this paper has shown that an alternative path is possible and also preferable in the case of
the HIECS data.
Table 4. Estimation Results for Various Logistic Models of Response Probability (by governorate
and urban/rural areas)
Columns: specification of g(X); E(θ1) (s.e.); E(θ2) (s.e.); objective value (sum of squared weighted errors);
factor of proportionality (σ²); Akaike information criterion (AIC); Schwarz information criterion (SIC);
per-capita income Gini (s.e.); per-capita expenditure Gini (s.e.).
Specification of g(X)             E(θ1) (s.e.)      E(θ2) (s.e.)      Obj. value   σ²       AIC      SIC      Income Gini      Expend. Gini
Household level
1: θ1+θ2 log(income)              20.8870 (.0088)   -1.7686 (.0008)   780,896      .8543    486.81   484.19   .4411 (.0389)    .3398 (.0070)
2: θ1+θ2 log(expenditure)         25.5496 (.0073)   -2.2284 (.0007)   299,122      .3321    438.83   436.21   .3798 (.0151)    .3625 (.0181)
Per capita
3: θ1+θ2 log(income)              15.8384 (.0063)   -1.4714 (.0007)   577,654      .6505    471.74   469.12   .4210 (.0301)    .3375 (.0086)
4: θ1+θ2 log(expenditure)         18.6483 (.0062)   -1.7947 (.0006)   299,994      .3321    438.97   436.36   .3714 (.0129)    .3419 (.0075)
5: θ1+θ2 log(exp.)^2              9.9506 (.0028)    -.0916 (.0000)    344,805      .3828    445.94   443.32   .3784 (.0188)    .3452 (.0101)
6: θ1 log(exp)+θ2 log(exp)^2      2.0269 (.0005)    -.1934 (.0001)    450,540      .5036    459.31   456.70   .3862 (.0266)    .3481 (.0134)
7: θ1+θ2 10^-3 expenditure        3.1297 (.0007)    -.0344 (.0000)    2,189,226    2.2715   538.35   535.74   .3594 (.0256)    .3202 (.0104)
8: θ1+θ2 10^-9 expenditure^2      2.9787 (.0008)    -.1329 (.0005)    2,599,937    2.6735   546.95   544.34   .3375 (.0089)    .3089 (.0037)
9: θ1+θ2 expenditure^(1/2)        4.3705 (.0009)    -.0195 (.0000)    1,107,645    1.2019   504.29   501.67   .3859 (.0373)    .3399 (.0165)
10: θ1+θ2 expenditure^(1/3)       6.1436 (.0014)    -.1785 (.0001)    667,983      .7437    479.00   476.39   .3889 (.0335)    .3459 (.0156)
Note: Sample size is 50 governorate-urban/rural strata containing 46,857 household observations. Standard errors on Gini
coefficients are bootstrapped estimates.
6. How different is Egypt from other countries?
In this section, we compare the Ginis and the inverted Pareto (beta) coefficients estimated for Egypt with
a sample of world country/year Ginis and betas. The purpose is to put our results into the global context
and understand whether our results pertain to an exceptional case-study or, rather, to an ordinary
distribution of incomes. For these comparisons, we will use a sample of 107 countries and 418
country/year observations taken from the World Bank micro data repository. This database joins and
standardizes several databases of household budget surveys for developing countries available at the
World Bank. For each country and year it contains the full distribution of incomes, expenditures or both
depending on the country and year considered.
The following figure plots the Gini and beta coefficients for both income and expenditure per capita
across country surveys sorted in ascending order. The top panels use all 107 countries available in the
database while the bottom panels use a selection of 65 countries that are the closest to Egypt in terms of
GDP per capita (2008 USD). In the eight panels of the figure, we superimpose the Gini and the beta
coefficients estimated for Egypt from the 2009 full sample (dashed line) and the median value for the full
world distribution (solid line).
The Egyptian Ginis are clearly situated in the lower part of the world distribution for both income and
expenditure. This is the case also if we restrict the analysis to countries at similar levels of GDP per
capita. This confirms that the Gini in Egypt is low by world standards. Instead, if we consider the beta
coefficient and all country/year points, Egypt falls very close to the median value for both income and
expenditure. This is also the case with expenditure for the selected sample of similar countries while the
beta coefficient is slightly to the left of the median value if we consider selected countries and income.
This last result should be taken with caution because the income panel for selected countries includes only
13 countries, all of them Latin American.
Figure 10 - Gini and inverted Pareto coefficients for Egypt and the rest of the world
[Figure: four pairs of panels, each pair showing the Gini and the Beta (inverted Pareto) coefficient on the
x-axis against n, the rank of country/year surveys sorted in ascending order, on the y-axis.
Income (All): 249 country/years, 26 countries. Expenditure (All): 169 country/years, 92 countries.
Income (Selected): 108 country/years, 13 countries. Expenditure (Selected): 118 country/years, 60 countries.]
In essence, the right-hand tails of the Egyptian distributions are not much different from other countries
despite a very low income inequality. As we showed throughout the paper, the Gini is very sensitive to
top incomes. The Egyptian beta being close to the world median value suggests that top incomes are well
represented as compared to world countries and yet inequality is still very low. This is rather robust
evidence of the good quality of the Egyptian data, a finding consistent with the rest of the paper and with
the World Bank (2013) report on inequality in Egypt.
7. Discussion
This paper has evaluated income inequality and the distribution of top incomes in Egypt in the presence
of a variety of potential statistical issues. As a byproduct, it has evaluated the quality of data in the
Egyptian Household Income, Expenditure and Consumption Survey (HIECS).
We discussed the problem of item non-response in household surveys but, finding no missing items in the
HIECS data, we confirmed data quality on these grounds. We then tested and corrected for the problem of
unit non-response by top income households. Correction for unit non-response increased the estimate of
inequality by 1.3 percentage points. The estimated Gini coefficient for income per capita rose from 0.329
to 0.342, while the Gini for expenditure per capita rose from 0.305 to 0.318; both increases are
statistically significant at the 1% level.
Given the importance of representation of top incomes in the sample, we next evaluated how influential
individual income observations at the upper tail of the Egyptian distribution are, and whether they present
a measurement issue. We found, however, that the Egyptian distribution of top incomes follows rather
closely the Pareto distribution, so the observed top incomes appear to be representative of the underlying
population and need to be considered when measuring inequality. This analysis reinforces the case for
assigning greater weight to the observed top incomes to correct for the systematic non-response of top
income households in the population.
Finally, we benchmarked the estimated inequality and top income distribution in Egypt vis-à-vis 418
household surveys drawn from 107 countries, and found that the Gini coefficient in Egypt is significantly
below median values for other countries, while the distribution of top incomes is around the median.
Income inequality in Egypt is thus confirmed to be low while the distribution of top incomes is not
atypical as compared to other countries.
There are several policy implications of these results that are relevant for Egypt today. First, the paper has
validated the quality of the Egyptian HIECS with respect to top observations, the income and expenditure
aggregates and the measurement of income inequality. In the world of household surveys, the
Egyptian data stand out as particularly good. There are many more issues that could be explored in
relation to data quality that were not covered, but the tests conducted in this paper show that the HIECS
compares well to world standards. The HIECS data cannot be simply dismissed as "unreliable" because
people have a different perception of income inequality.
Second, these findings motivate the search for factors other than measured income inequality that could
explain popular perceptions about income inequality. As the World Bank (2013) report has shown, there are
many factors that could explain perceptions of inequality that are little related to the measurement of
income inequality itself and that are little researched, including the role of expectations about the future,
changes in the reference groups, the expansion and penetration of social media, or the lack of GDP
trickle-down effects. The priority for Egypt today may not be the reduction of income inequality but the
expansion of the growth base, providing more opportunities to economically marginalized groups such as the
youth and women, and providing more voice to media-excluded groups such as the poor, rural residents and
others. Inequality of opportunities, inequality of rights, inequality of aspirations and inequality of values
are some of the inequality dimensions that are easily confounded with income inequality but that should
be carefully distinguished by the policy maker.
Third, the fact that GDP growth did not trickle down to households during the decade that preceded the
2011 Egyptian revolution is consistent with the fact that income inequality was low and changed
little during the period. Preliminary results of ongoing research on GDP in Egypt show that growth
has been mostly captured and retained by corporations and not paid to households via wages, benefits or
dividends. The overarching goals of the World Bank are poverty reduction and shared prosperity
measured in terms of the income growth of the bottom 40% of the population. Achieving these objectives
largely relies on making growth inclusive of the bottom 40%, something that has not been happening in
Egypt over the past decade. This is another question that requires further attention and priority as
compared to income inequality.
References
Atkinson, A, Piketty, T. and Saez, E. (2011) Top incomes in the long run of history, Journal of Economic
Literature, 49, 3-71.
Cowell, F.A. and Victoria-Feser, M.-P. (1996) Poverty measurement with contaminated data: A robust
approach, European Economic Review, 40, 1761-1771.
Cowell, F.A. and Victoria-Feser, M.-P. (1996) Robustness properties of inequality measures,
Econometrica, 64, 77-101.
Cowell, F.A. and Victoria-Feser, M.-P. (2007) Robust Lorenz curves: a semiparametric approach, Journal
of Economic Inequality, 5, 21-35.
Cowell, F.A. and Flachaire, E. (2007) Income distribution and inequality measurement: The problem of
extreme values, Journal of Econometrics, 141(2), 1044-1072.
Davidson, R. and Flachaire, E. (2007) Asymptotic and bootstrap inference for inequality and poverty
measures, Journal of Econometrics, 141(1), 141-166.
Deaton, A. (2005) Measuring Poverty in a growing world (or measuring growth in a poor world), The
Review of Economics and Statistics, LXXXVII (1), 1-19
Korinek, A., Mistiaen, J.A. and Ravallion, M. (2006) Survey nonresponse and the distribution of income,
Journal of Economic Inequality, 4, 33-55.
Korinek, A., Mistiaen, J.A. and Ravallion, M. (2007) An econometric method of correcting for unit
nonresponse bias in surveys, Journal of Econometrics, 136, 213-235.
Neri, L., Gagliardi, F., Ciampalini, G., Verma, V. and Betti, G. (2009) Outliers at upper end of income
distribution (EU-SILC 2007), DMQ Working Paper n. 86, November 2009.
Pareto, V. (1896) La courbe de la repartition de la richesse, Ecrits sur la courbe de la repartition
de la richesse, (writings by Pareto collected by G. Busino, Librairie Droz, 1965), 1-15.
World Bank (2013) Inside Inequality in Egypt: Historical trends, recent facts, people's perceptions and
the spatial dimension, mimeo.
Figure A1. Top incomes and expenditures (EGP/year/capita)
[Figure: four rows of four panels, one panel per year (2000, 2005, 2009, 2011). Rows 1 and 2: income per capita
(inc2000, inc2005, inc2009, inc2011). Rows 3 and 4: expenditure per capita (exp2000, exp2005, exp2009, exp2011).
x-axis: n, the rank of the observation.]
Note: x-axis=top 100 observations (top panels for income and expenditure) or top 1% of observations; y-axis=income per capita
or expenditure per capita per year. The size of the four samples is different with 12,000 observations for 2000 and 2009, 48,000
observations for 2005 and 4,000 observations for 2011.
Figure A2. Differences between Lorenz curves (Model 4)
(a) Unweighted vs. Weighted (Model 4) (b) CAPMAS-Weighted vs. Weighted (Model 4)
Figure A3. Pareto and inverted Pareto coefficients for income and expenditure per capita in
trimmed distributions (Models 3-7)
(a) Pareto coefficient for income per capita (b) Pareto coefficient for expend. per capita
(c) Inverted Pareto coefficient for income per capita (d) Inverted Pareto coef. for expend. per capita
Note: '100' indicates the full, untrimmed income distribution. '99.975' indicates the income distribution with the
0.025% of households with the highest incomes trimmed (12 households in the 100% sample of the 2009
HIECS). Similarly, '99.8' indicates the trimming of the 0.2% of highest-earning households (96).