Urban Density Data

Data and replication files for 'The economics of urban density'

by Gilles Duranton and Diego Puga

This page distributes and documents computer programs and data to replicate the results obtained by Gilles Duranton and Diego Puga in their article 'The economics of urban density', published in Journal of Economic Perspectives, 34(3), Summer 2020: 3-26.

Urban density boosts productivity and innovation, improves access to goods and services, reduces typical travel distances, encourages energy-efficient construction and transport, and facilitates sharing scarce amenities. However, density is also synonymous with crowding, makes living and moving in cities more costly, and concentrates exposure to pollution and disease. In this article, we explore the appropriate measurement of density and describe how it is both a cause and a consequence of the evolution of cities. We then discuss whether and how policy should target density and why the trade-off between its pros and cons is unhappily resolved by market and political forces.

This replication package calculates two measures of density, "naive density" (population per square kilometre) and "experienced density" (population within ten kilometres of the average resident), for US metropolitan areas and uses these data to produce the two panels in figure 1 in the article. It also calculates three elasticities for US metropolitan areas reported in the text of the article: the elasticity of experienced density with respect to city population, the elasticity of naive density with respect to city population, and the elasticity of average distance to the centre with respect to city population. Finally, it calculates experienced density for Canada as a whole and for the United States as a whole.

Population or employment density is often used as a summary statistic to describe the spatial concentration of economic activity. In this context, density is commonly defined as the number of individuals per unit of geographic area. Such "naive density" is easy to calculate. However, it may not appropriately reflect the density actually faced by the individual or firm at hand. One problem is that economic units are traditionally defined as aggregates of administrative units: for example, US metropolitan areas are defined based on counties. However, if a metro area includes some counties with substantial rural portions, such a calculation will understate the density experienced by most economic actors. In particular, the match between urban and county boundaries is systematically looser for younger and less dense metropolitan areas in the West.

De la Roca and Puga (2017) and Henderson, Kriticos, and Nigmatulina (2020) have proposed measuring "experienced density" by counting population within a given radius around each individual. Such experienced density, in addition to dealing with the uneven tightness of area boundaries, captures better how close the typical individual is to other people when population is unevenly distributed. To give an illustrative example at the level of countries, where boundaries are given, the United States has nearly nine times the population of Canada with a slightly smaller surface area, so its naive density is ten times higher. And yet walking around cities and towns in both countries, one likely perceives similar concentrations of people nearby. Indeed, the average inhabitant in Canada has about 343,000 people living within a ten-kilometre radius, compared with about 306,000 in the United States.

The replication files

The full replication package is available for download from this site as a zip file: density_replication.zip (6.92 GB).

For researchers not intending to replicate the Python/ArcGIS part of the analysis, a much smaller partial replication package is also available for download as a zip file: density_replication_notif.zip (0.71 GB). This still replicates all the results, but relies on intermediate data files from our own run of the Python/ArcGIS scripts. The only difference with respect to the full replication package is that two large population grids (data/src/grid/can_ppp_2010_UNadj.tif and data/src/grid/usa_ppp_2010_UNadj.tif) are not included.

Instructions and overview of the replication files

After downloading the full replication package, uncompress it into a directory on your computer. This directory becomes the root directory of the replication files, and the replication then proceeds as follows:

The Stata script code/_density_run.do first runs code/1_density_data.do to perform the data construction. This is done on the basis of the data described under Source data below and located in the directories data/src/blkg, data/src/county, and data/src/grid.

If the flag global GeocodeAgain = 1 is set in code/_density_run.do, then code/1_density_data.do in turn runs the Python script code/python/python_batch_geocoding.py to re-geocode city centres; otherwise, it relies on the intermediate data file from our run of this Python script. If the flag global DisableArc = 0 is set in code/_density_run.do, then code/1_density_data.do in turn runs the ArcGIS/Python scripts code/arcgis/density_exp.py and code/arcgis/density_exp_isocode.py; otherwise, it relies on the intermediate data files from our run of these ArcGIS/Python scripts. The intermediate data files, described under Intermediate data below, are located in the directory data/intermediate.

After the Stata script code/1_density_data.do creates all data files used for the analysis and places them in the directory data/processed, the Stata script code/_density_run.do automatically runs code/2_density_analysis.do to perform the analysis of the processed data (described under Processed data below) and stores all the results (described under Results below) in the results/ directory.

Experienced density calculations for these and other data

The ArcGIS/Python scripts used to calculate experienced density have been written so that they can easily be used on data for other areas as well. We now discuss some important considerations to keep in mind when doing so.

We define experienced density as population within 10 kilometres of the average resident. To calculate experienced density for US metropolitan areas, we first measure the number of people within a 10-kilometre radius of each cell in a population grid for the entire United States. We then compute, for all grid cells in each metropolitan area, the population-weighted average of this count of people within 10 kilometres. Weighting by population is important, since otherwise we would be calculating population within ten kilometres of the average place instead of within ten kilometres of the average person.
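As a minimal sketch of these two steps (not the authors' ArcGIS code), the following pure-Python example works on a tiny hypothetical grid in which the radius is measured in cells rather than kilometres. The function names and the toy data are assumptions made for illustration only.

```python
def people_nearby(pop, radius_cells):
    """For each cell, sum the population of all cells within radius_cells."""
    rows, cols = len(pop), len(pop[0])
    r = radius_cells
    # Offsets of cells whose centres fall inside a circle of radius r cells.
    offsets = [(dy, dx)
               for dy in range(-r, r + 1)
               for dx in range(-r, r + 1)
               if dy * dy + dx * dx <= r * r]
    nearby = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for dy, dx in offsets:
                y, x = i + dy, j + dx
                if 0 <= y < rows and 0 <= x < cols:
                    nearby[i][j] += pop[y][x]
    return nearby


def average_nearby(pop, radius_cells, in_area, weight_by_population=True):
    """Average the nearby-population count over the cells flagged in in_area.

    Counts use the whole grid, but only cells inside the area enter the average.
    """
    nearby = people_nearby(pop, radius_cells)
    num = den = 0.0
    for i, row in enumerate(pop):
        for j, p in enumerate(row):
            if in_area[i][j]:
                w = p if weight_by_population else 1.0
                num += w * nearby[i][j]
                den += w
    return num / den


# Hypothetical one-row grid: 4 people, an empty cell, then 2 people.
pop = [[4, 0, 2]]
in_area = [[True, True, True]]
average_person = average_nearby(pop, 1, in_area)        # weighted by population
average_place = average_nearby(pop, 1, in_area, False)  # every cell counts equally
```

In this toy grid the average person has fewer people nearby than the average place, because the cell with the most neighbours is the one where nobody lives.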

Measuring the number of people within a 10-kilometre radius of each cell in a population grid requires approximating a circle with a jagged shape made up of cells (48,301 cells of 3 arc-seconds by 3 arc-seconds each in our case). We must then take into account that, on a grid with a geographic spatial projection, the actual surface area of those cells varies across the grid in proportion to the cosine of the latitude. Our code applies a correction factor so that our calculations reflect the number of people in a neighbouring area corresponding to a circle with a 10-kilometre radius.
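To illustrate why a correction is needed, the sketch below computes the approximate surface area of a 3 arc-second cell at a given latitude and compares the area covered by a fixed number of cells with the area of a 10-kilometre circle. It assumes a spherical Earth with a mean radius of 6371.0088 km, and the simple area ratio it returns is only one plausible form of correction; the exact factor applied in code/arcgis/density_exp.py is not described here and may differ.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def cell_area_km2(lat_deg, cell_arcsec=3.0):
    """Approximate area of a square cell on a geographic grid at lat_deg."""
    side_rad = math.radians(cell_arcsec / 3600.0)
    north_south_km = EARTH_RADIUS_KM * side_rad
    # East-west extent shrinks with the cosine of the latitude.
    east_west_km = EARTH_RADIUS_KM * side_rad * math.cos(math.radians(lat_deg))
    return north_south_km * east_west_km


def area_correction(lat_deg, n_cells=48301, radius_km=10.0):
    """Ratio of the target circle's area to the area covered by n_cells cells."""
    target_km2 = math.pi * radius_km ** 2
    covered_km2 = n_cells * cell_area_km2(lat_deg)
    return target_km2 / covered_km2
```

Because cell area falls with the cosine of the latitude, the same block of cells covers half as much ground at 60 degrees as at the equator, so the ratio returned by area_correction doubles.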

The ArcGIS/Python script code/arcgis/density_exp.py calculates experienced density for US metropolitan areas. It takes as inputs the population grid in Geotiff format data/src/grid/usa_ppp_2010_UNadj.tif and the geographical boundaries for metropolitan areas in Shapefile format data/src/grid/msa1999_boundaries.shp (and the associated files with .dbf, .prj and .shx extensions).

By editing the header of code/arcgis/density_exp.py to point to a different population grid and a different set of city boundaries, interested users can easily calculate experienced density for cities in any other country or for US cities with alternative city definitions. Note that the script expects a population grid with a geographic projection and a resolution of 3 arc-seconds by 3 arc-seconds; such grids are readily available for countries throughout the world from https://www.worldpop.org.

The ArcGIS/Python script code/arcgis/density_exp_isocode.py calculates experienced density for the United States as a whole and for Canada as a whole. It is written so that the same calculation can be done for any country in the world in two steps: first, place in data/src/grid/ the https://www.worldpop.org grid for that country (with number of people per pixel and total country population matching the corresponding official United Nations population estimates); second, run the script code/arcgis/density_exp_isocode.py with the country's ISO Alpha-3 code as an argument. For instance, running "C:/Program Files/ArcGIS/Pro/bin/Python/Scripts/propy.bat" density_exp_isocode.py CAN from the command prompt calculates experienced density for Canada. The Stata script code/1_density_data.do defines local isocodelist "CAN USA". To also calculate experienced density for Mexico, one would download mex_ppp_2010_UNadj.tif from https://www.worldpop.org, place it in data/src/grid/mex_ppp_2010_UNadj.tif, edit this line to local isocodelist "CAN USA MEX", and re-run the replication code.
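The file-naming convention in these examples (can_, usa_, and mex_ followed by ppp_2010_UNadj.tif) can be sketched as a small helper that maps an ISO Alpha-3 code to the expected grid location and to the command line quoted above. The helper names and the year parameter are hypothetical and not part of the replication package; how density_exp_isocode.py itself locates the grid is not shown here.

```python
from pathlib import PurePosixPath

GRID_DIR = PurePosixPath("data/src/grid")
# Default ArcGIS Pro launcher path, as quoted in the text above.
PROPY = "C:/Program Files/ArcGIS/Pro/bin/Python/Scripts/propy.bat"


def grid_path(iso_code, year=2010):
    """Expected location of the WorldPop grid for a country (hypothetical helper)."""
    return GRID_DIR / f"{iso_code.lower()}_ppp_{year}_UNadj.tif"


def propy_command(iso_code, propy=PROPY):
    """Argument list for running density_exp_isocode.py for one country."""
    return [propy, "density_exp_isocode.py", iso_code.upper()]


# Countries in the hypothetical extended list "CAN USA MEX".
for code in "CAN USA MEX".split():
    print(grid_path(code), propy_command(code))
```

Building the command as a list rather than a single string avoids quoting problems with the space in "Program Files" if it is later passed to a process launcher.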

Software and hardware notes

All of the results and figures in the Journal of Economic Perspectives article have been produced using the code and data provided, Stata version 16.1, Python version 3.8.2, and ArcGIS Pro version 2.5.

The code has been written to be as portable as possible. Nevertheless, the following considerations should be kept in mind (most, if not all, of these considerations will be irrelevant if one skips the re-geocoding of city centres and the re-calculation of experienced density and relies on the intermediate data files provided for these two steps of the data construction):

Source data

To calculate experienced density (population within ten kilometres of the average resident), we use gridded population data at 3 arc-second resolution (approximately 100m at the equator) from WorldPop (2018). These gridded population data are available to download in Geotiff format from https://www.worldpop.org. The units are number of people per pixel, with total country population matching the corresponding official United Nations population estimates. We use 2010 population grids for Canada (data/src/grid/can_ppp_2010_UNadj.tif) and the United States (data/src/grid/usa_ppp_2010_UNadj.tif).

For the United States, we calculate experienced density not just for the entire country, but for all 275 metropolitan areas in the conterminous United States. This calculation also uses the geographical boundaries for these metropolitan areas in Shapefile format in data/src/grid/msa1999_boundaries.shp (and the associated files with .dbf, .prj and .shx extensions). This Shapefile merges three Shapefiles obtained from the US Bureau of the Census (https://www.census.gov/geographies/mapping-files.html): ma99_99.shp for Metropolitan Statistical Areas, cm99_99.shp for Consolidated Metropolitan Statistical Areas, and ne99_d00.shp for New England County Metropolitan Areas. The Shapefile data/src/grid/msa1999_boundaries.shp also contains the area of each metropolitan area (variable area_ha, expressed in hectares), and we use this to calculate naive density for them.

To calculate naive density for metropolitan areas in the conterminous United States, in addition to their area, we need their population. We use population for US counties from the 2010 Census obtained from US Census Bureau (2011) in data/src/county/co-est00int-tot.csv. This was downloaded from https://www2.census.gov/programs-surveys/popest/datasets/2000-2010/intercensal/county/co-est00int-tot.csv.

To assign 2010 County populations to metropolitan areas, we use Metropolitan Statistical Area (MSA) and Consolidated Metropolitan Statistical Area (CMSA) definitions outside of New England and New England County Metropolitan Area (NECMA) definitions in New England, as set by the Office of Management and Budget on 30 June 1999. These definitions are available in data/src/county/99mfips.txt for MSA/CMSAs and in data/src/county/99nfips.txt for NECMAs. These files were downloaded from https://www.census.gov/population/estimates/metro-city/99mfips.txt and https://www.census.gov/population/estimates/metro-city/99nfips.txt.
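The naive density calculation described above amounts to summing county populations within each metropolitan area and dividing by the area converted from hectares to square kilometres (100 hectares per square kilometre). The sketch below uses made-up county codes, populations, and areas; the dictionaries and function name are assumptions for illustration and do not reflect the layout of the source files.

```python
def naive_density(county_pop, county_to_metro, metro_area_ha):
    """Population per square kilometre for each metropolitan area."""
    metro_pop = {}
    for county, pop in county_pop.items():
        metro = county_to_metro.get(county)
        if metro is None:
            continue  # county not assigned to any metropolitan area
        metro_pop[metro] = metro_pop.get(metro, 0) + pop
    # 1 square kilometre = 100 hectares.
    return {metro: pop / (metro_area_ha[metro] / 100.0)
            for metro, pop in metro_pop.items()}


# Hypothetical inputs, not taken from the replication data.
county_pop = {"00001": 300_000, "00003": 100_000, "00005": 50_000}
county_to_metro = {"00001": "A", "00003": "A"}  # county 00005 is non-metropolitan
metro_area_ha = {"A": 200_000}                  # 2,000 square kilometres

densities = naive_density(county_pop, county_to_metro, metro_area_ha)
```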

To estimate the elasticity of average distance to the city centre with respect to city population, we first determine the location of the centre of each metropolitan area from the location of its core municipality reported by Google Maps. This query is automatically done by the replication code. We then compute, for each metropolitan area, the population-weighted average distance to the centre of its Census block groups, using five-year 2008-2012 data from the 2012 American Community Survey obtained from the IPUMS-NHGIS project (Manson, Schroeder, Van Riper, and Ruggles, 2019). These data were downloaded from https://www.nhgis.org/. They include the geographical boundaries for Census block groups corresponding to the 2012 American Community Survey in Shapefile format in data/src/blkg/US_blck_grp_2012.shp (and the associated files with .dbf, .prj, .shp.xml, and .shx extensions) and also the total population of each block group (files data/src/blkg/nhgis0026_ds191_20125_2012_blck_grp.dat, data/src/blkg/nhgis0026_ds191_20125_2012_blck_grp.do, and data/src/blkg/nhgis0026_ds191_20125_2012_blck_grp_codebook.txt). Only the subset of the 2012 American Community Survey data set strictly required for the replication is redistributed with this replication package, as per the guidelines in https://www.nhgis.org/research/citation.
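A population-weighted average distance to the centre can be sketched as follows. This example assumes each block group is represented by a single point (for instance its centroid) and uses great-circle distance on a spherical Earth; both are assumptions of the sketch, and the replication code may measure distance differently. The coordinates and populations are made up.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0088):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def mean_distance_to_centre(centre, block_groups):
    """Population-weighted mean distance from block groups to the centre.

    centre is (lat, lon); block_groups is a list of (lat, lon, population).
    """
    total_pop = sum(pop for _, _, pop in block_groups)
    weighted = sum(pop * haversine_km(centre[0], centre[1], lat, lon)
                   for lat, lon, pop in block_groups)
    return weighted / total_pop


# Hypothetical city: 300 residents at the centre, 100 one degree to the north.
centre = (40.0, -100.0)
block_groups = [(40.0, -100.0, 300), (41.0, -100.0, 100)]
avg_km = mean_distance_to_centre(centre, block_groups)
```

With three quarters of residents at the centre, the weighted average is one quarter of the distance to the outlying block group, whereas an unweighted average would be one half.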

Intermediate data

Since over time Google Maps may make minor modifications to the coordinates assigned to the centre of the main city in each US metropolitan area, and since querying Google Maps for these coordinates also requires a Google API Key, we provide the coordinates obtained in our run of the code on 2 May 2020. This allows skipping the re-geocoding of city centres (setting the flag global GeocodeAgain = 0 in code/_density_run.do) and still replicating all the results.

For the benefit of macOS and Linux/Unix users as well as Windows users without an ArcGIS Pro license, we also provide the intermediate data files from our own run of the ArcGIS/Python code to calculate experienced density. This allows skipping this part of the data construction (setting the flag global DisableArc = 1 in code/_density_run.do) and still replicating all the results.

The intermediate data consist of the following files and variables:

Processed data

The processed data on which the data analysis is performed are provided with this replication package, but also fully recreated by the replication code from the original sources. The processed data consist of the following files and variables:

All of these processed data files are also provided in comma-delimited format with the same file names, but a .csv instead of .dta extension. These comma-delimited files are also fully recreated by the replication code.

Results

All the results are placed in the results/ directory.

Figure 1 plots density vs. population for US metropolitan areas. Panel (a), for experienced density, is saved in Encapsulated PostScript format as results/density_fsrc_exp_pop.eps. Panel (b), for naive density, is saved in Encapsulated PostScript format as results/density_fsrc_raw_pop.eps. Both panels are also saved in Portable Network Graphics format with the same file names, but a .png instead of .eps extension.
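An elasticity of density with respect to population is conventionally the slope of a regression of log density on log population, with the R² measuring the fit. The sketch below assumes that convention (ordinary least squares with a constant) and uses made-up data constructed to have an elasticity of exactly 0.5; it is not the Stata code in code/2_density_analysis.do.

```python
import math


def loglog_fit(x, y):
    """Slope and R-squared from an OLS regression of log(y) on log(x)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    syy = sum((b - mean_y) ** 2 for b in ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx               # the elasticity
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, r_squared


# Hypothetical cities whose density is proportional to the square root of population.
population = [1e5, 4e5, 1.6e6, 6.4e6]
density = [3.0 * p ** 0.5 for p in population]
elasticity, r2 = loglog_fit(population, density)
```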

Results mentioned in the text are saved to the file results/density_text_results.txt, in which the relevant paragraphs are automatically written incorporating the numbers calculated by code/2_density_analysis.do. This output file reads as follows:

'The economics of urban density', by Gilles Duranton and Diego Puga

Results mentioned in the text

Section 2

The average inhabitant in Canada has about 343,000 people living within
a ten-kilometre radius, compared with about 306,000 in the United States.

Panel (a) of figure 1 plots for US metropolitan areas experienced
density, measured as population within ten kilometres of the average
resident, against total population. The implied elasticity is 0.51. If
we use instead naive density, dividing total population by total land
area within the official boundaries of the metropolitan areas, we find
the same elasticity with respect to total population, 0.51, but the fit
is poorer with an R² of 0.49 instead of 0.76.

Section 4

Earlier, we provided an estimate of the elasticity of density with
respect to population for US metropolitan areas of 0.51. In addition to
lowering their housing consumption, residents also react to higher
housing prices by moving to cheaper, less-accessible locations. When we
estimate the elasticity of average distance to the centre with respect
to city population, we get 0.30.

References

De la Roca, Jorge and Diego Puga. 2017. Learning by working in big cities. Review of Economic Studies 84(1): 106-142.

Duranton, Gilles and Diego Puga. 2020. The economics of urban density. Journal of Economic Perspectives 34(3): 3-26.

Henderson, J. Vernon, Sebastian Kriticos, and Jamila Nigmatulina. 2020. Measuring urban economic density. Journal of Urban Economics (forthcoming).

Manson, Steven, Jonathan Schroeder, David Van Riper, and Steven Ruggles. 2019. Integrated Public Use Microdata Series, National Historical Geographic Information System: Version 14.0. Minneapolis: IPUMS.

US Census Bureau. 2011. Intercensal Estimates of the Resident Population for Counties and States: April 1, 2000 to July 1, 2010. Washington, DC: US Census Bureau.

WorldPop. 2018. Global High Resolution Population Denominators Project. Southampton: WorldPop (https://www.worldpop.org). Funded by The Bill and Melinda Gates Foundation (OPP1134076).