Country Ruggedness and Geographical Data

Data and replication files for 'Ruggedness: The blessing of bad geography in Africa'

by Nathan Nunn and Diego Puga

This site distributes and documents the dataset of terrain ruggedness and other geographical characteristics of countries created by Nathan Nunn and Diego Puga for their article 'Ruggedness: The blessing of bad geography in Africa', published in the Review of Economics and Statistics 94(1), February 2012: 20-36, as well as other variables and computer code required to replicate their results. Users of this dataset are asked to cite the Review of Economics and Statistics article as the source. We would also appreciate it if you let us know the details of any paper in which you use the data by sending an email to Diego Puga (diego.puga@cemfi.es).

There are two main components in this dataset:

The country-level data on terrain ruggedness and other characteristics of countries and the computer code required to replicate the regressions in the article 'Ruggedness: The blessing of bad geography in Africa'. These data and replication files, documented below, are available for download from this site as a zip file: rugged_data.zip (73 Kb). This contains:
- The country-level data in Stata version 10/11 format: rugged_data.dta.
- The country-level data in comma-delimited ascii format: rugged_data.csv.
- A Stata do file that replicates the regression tables contained in the article 'Ruggedness: The blessing of bad geography in Africa': rugged_regr.do.
- A Stata log file produced when running the corresponding do file: rugged_regr.log.
The underlying grid-cell-level data on terrain ruggedness. These are calculated at the level of 30 arc-second cells on a regular geographic grid covering the Earth, and can also be downloaded from this site by following the links provided below.
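For users who do not work in Stata, the comma-delimited file can be read with any general-purpose language. The following is a minimal Python sketch, not part of the replication files. The function name `load_country_rows` is hypothetical, the sample rows are made up, and the treatment of empty or non-numeric cells as missing is an assumption about how the CSV encodes missing values; only the column names come from the variable list below.

```python
import csv
import io


def _to_float(text):
    """Convert a CSV cell to float; treat empty or non-numeric cells as missing."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def load_country_rows(handle, numeric_fields=("rugged", "rgdppc_2000")):
    """Read country rows from an open comma-delimited file object."""
    rows = []
    for record in csv.DictReader(handle):
        row = {"isocode": record["isocode"], "country": record["country"]}
        for name in numeric_fields:
            row[name] = _to_float(record.get(name))
        rows.append(row)
    return rows


# Made-up sample using documented column names (values are NOT real data).
SAMPLE = io.StringIO(
    "isocode,country,rugged,rgdppc_2000,cont_africa\n"
    "AAA,Exampleland,1.5,2000.0,1\n"
    "BBB,Samplestan,0.25,,0\n"
)
rows = load_country_rows(SAMPLE)
```

With the downloaded file, the same function could be called on `open("rugged_data.csv", newline="")` after unzipping.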

Country-Level Data

The country-level data on terrain ruggedness and other characteristics of countries includes the following variables:

isocode: Country 3-letter ISO code. Alpha-3 code definitions from the ISO 3166 Maintenance Agency as of 2000.
isonum: Country numeric ISO code. Numeric-3 code definitions from the ISO 3166 Maintenance Agency as of 2000.
country: Country name. English full name definitions from the ISO 3166 Maintenance Agency as of 2000.
rugged: Ruggedness (Terrain Ruggedness Index, 100 m). This is the Terrain Ruggedness Index originally devised by Riley, DeGloria, and Elliot (1999) to quantify topographic heterogeneity in wildlife habitats providing concealment for prey and lookout posts. Let $e_{r,c}$ denote elevation at the point located in row $r$ and column $c$ of a grid of elevation points. Then the Terrain Ruggedness Index of Riley et al. (1999) at that point is calculated as $\sqrt{\sum_{i = r - 1}^{r + 1} \sum_{j = c - 1}^{c + 1} (e_{i, j} - e_{r, c})^{2}}$. The source of elevation data is GTOPO30 (US Geological Survey, 1996), a global elevation data set developed through a collaborative international effort led by staff at the US Geological Survey's Center for Earth Resources Observation and Science (EROS). Elevations in GTOPO30 are regularly spaced at 30 arc-seconds across the entire surface of the Earth on a map using a geographic projection, so the sea-level surface distance between two adjacent grid points on a meridian is half a nautical mile or, equivalently, 926 metres. After calculating the Terrain Ruggedness Index for each point on the grid, we average across all grid cells in the country not covered by water to obtain the average terrain ruggedness of the country's land area. Since the sea-level surface that corresponds to a 30 by 30 arc-second cell varies in proportion to the cosine of its latitude, when calculating the average terrain ruggedness (or the average of any other variable) for each country, we weight each cell by its latitude-varying sea-level surface. We assign land to countries, for this and other variables, using digital boundary data based on the fifth edition of the Digital Chart of the World (US National Imagery and Mapping Agency, 2000), which we have updated to reflect 2000 country boundaries using information from the International Organization for Standardization ISO 3166 Maintenance Agency and other sources. We exclude areas covered by permanent inland water area features contained in the same edition of the Digital Chart of the World. The units for the terrain ruggedness index correspond to the units used to measure elevation differences. In our calculation, ruggedness is measured in hundreds of metres of elevation difference for grid points 30 arc-seconds (926 metres on the equator or any meridian) apart.
rugged_popw: Alternative ruggedness (pop. weighted TRI, 100 m). In addition to the Terrain Ruggedness Index, we provide four alternative ruggedness measures. To capture the possibility that ruggedness may be more important (and thus should be given more weight) in areas that are more densely populated today, we calculate a population-weighted measure of ruggedness. We start by calculating the Terrain Ruggedness Index of each 30 by 30 arc-second cell but, in averaging this for each country, we weight ruggedness in each cell by the share of the country's population located in that cell. The population data are for 2000 and are from the LandScan data set (Oak Ridge National Laboratory, 2001), which has the same 30 arc-second resolution as GTOPO30. Units are hundreds of metres.
rugged_slope: Alternative ruggedness (average slope, %). As another alternative ruggedness measure, using the same GTOPO30 elevation data, we calculate the average uphill slope of the country's surface area. To do this, for each point on the elevation grid, we calculate the absolute value of the difference in elevation between this point and the point on the Earth's surface 30 arc-seconds North of it, and then divide this by the sea-level distance between the two points to obtain the uphill slope. The same calculation is performed for each of the eight major directions of the compass (North, Northeast, East, Southeast, South, Southwest, West, and Northwest), and the eight slopes obtained are then averaged to calculate the mean uphill slope for the 30 by 30 arc-second cell centred on the point. Finally, we average across all grid cells in the country not covered by water (taking into account the latitude-varying sea-level surface that corresponds to the 30 by 30 arc-second cell centred on each point) to obtain the average uphill slope of the country's land area.
rugged_lsd: Alternative ruggedness (local std. deviation in elevation, 100 m). Another alternative ruggedness measure is the average standard deviation of elevation within the same eight-cell neighbourhood. Units are hundreds of metres.
rugged_pc: Alternative ruggedness (% moderately to highly rugged). This alternative ruggedness measure is motivated by the possibility that what matters is having a large-enough amount of sufficiently-rugged terrain nearby, even if some portions of the country are fairly flat. To capture this logic, we calculate the percentage of a country's land area that is highly rugged. We use a threshold set at 240 metres for the Terrain Ruggedness Index calculated on the 30 arc-seconds grid, below which Riley et al. (1999) classify terrain as being 'level' to 'intermediately rugged'.
land_area: Land area (1000 Ha). The source is the Food and Agriculture Organization (2008), except for Macau and Hong Kong where it is the Encyclopædia Britannica. Units are thousands of hectares.
lat: Latitude. Expressed in decimal degrees, for the geographical centroid of the country.
lon: Longitude. Expressed in decimal degrees, for the geographical centroid of the country.
soil: % Fertile soil. On the basis of the FAO/UNESCO Digital Soil Map of the World and linked soil association composition table and climatic data compiled by the Climatic Research Unit of the University of East Anglia, Fischer, van Velthuizen, Shah, and Nachtergaele (2002) identify whether each cell on a 5-minute grid covering almost the entire land area of the Earth is subject to various constraints for growing rainfed crops. Based on plates 20 (soil moisture storage capacity constraints), 21 (soil depth constraints), 22 (soil fertility constraints), 23 (soil drainage constraints), 24 (soil texture constraints), and 25 (soil chemical constraints) in Fischer et al. (2002) and the country boundaries described above, we calculate the percentage of the land surface area of each country that has fertile soil (defined as soil that is not subject to severe constraints for growing rainfed crops in terms of either soil fertility, depth, chemical and drainage properties, or moisture storage capacity). Cape Verde, French Polynesia, Mauritius and Seychelles are not covered by the Fischer et al. (2002) data, so for these countries we use instead the percentage of their land surface area that is classified by the Food and Agriculture Organization (2008) as arable land or permanent crop land.
desert: % Desert. The percentage of the land surface area of each country covered by sandy desert, dunes, rocky or lava flows, was calculated on the basis of the desert layer of the Collins Bartholomew World Premium digital map data (Collins Bartholomew, 2005) and the country boundaries described above. This was initially computed as a cruder measure of soil (in)fertility for an early draft of the paper and is no longer used in the final version. We have left it in the dataset in case it is of use to other researchers.
tropical: % Tropical climate. Using detailed temperature and precipitation data from the Climatic Research Unit of the University of East Anglia and the Global Precipitation Climatology Centre of the German Weather Service, Kottek, Grieser, Beck, Rudolf, and Rubel (2006) classify each cell on a 30 arc-minute grid covering the entire land area of the Earth into one of 31 climates in the widely-used Köppen-Geiger climate classification. Based on these data and the country boundaries described above, we calculate the percentage of the land surface area of each country that has any of the four Köppen-Geiger tropical climates.
dist_coast: Average distance to nearest ice-free coast (1000 km). To calculate the average distance to the closest ice-free coast in each country, we first compute the distance to the nearest ice-free coast for every point in the country in equi-rectangular projection with standard parallels at 30 degrees, on the basis of sea and sea ice area features contained in the fifth edition of the Digital Chart of the World (US National Imagery and Mapping Agency, 2000) and the country boundaries described above. We then average this distance across all land in each country not covered by inland water features. Units are thousands of kilometres.
near_coast: % Within 100 km of ice-free coast. On the basis of the same data used to calculate the average distance to nearest ice-free coast, we calculate the percentage of the land surface area of each country that is within 100km of the nearest ice-free coast.
gemstones: Gem diamond extraction 1958-2000 (1000 carats). Data on gem-quality diamonds extracted by each country between 1958 and 2000 are obtained from the 1959-2004 editions of the Minerals Yearbook, published first by the US Bureau of Mines (US Bureau of Mines, 1960-1996) and then by the US Geological Survey (US Geological Survey, 1997-2007). We use the most recent data for each country-year in Volume I (Metals and Minerals), completed with data from Volume III (Area Reports: International) of the 1997-2000 editions. For countries that have split or changed boundaries, we assign diamond extraction on the basis of mine location with respect to current boundaries. Units are thousands of carats.
rgdppc_2000: Real GDP per person 2000 -- World Bank. We measure average country-level income by the natural logarithm of real gross domestic product per person in 2000. The data are from the World Bank World Development Indicators (World Bank, 2006). Units are 2006 international dollars, with purchasing power parity conversions performed using the Elteto-Koves-Szulc method.
rgdppc_1950_m: Real GDP per person 1950 -- Maddison. To check the robustness of our results to the use of income data from other time periods and from an alternative source, in the text we refer to results using the natural logarithm of real gross domestic product per person in 1950 and in 2000, and its annual average from 1950-2000, with data from Angus Maddison (Maddison, 2007, updated October 2008). Units are 1990 international dollars, with purchasing power parity conversions performed using the Geary-Khamis method.
rgdppc_1975_m: Real GDP per person 1975 -- Maddison.
rgdppc_2000_m: Real GDP per person 2000 -- Maddison.
rgdppc_1950_2000_m: Real GDP per person 1950-2000 Average -- Maddison.
q_rule_law: Rule of law 1996-2000. To measure the quality of governance in each country, we use the composite variable 'rule of law' from version VII of the World Bank's Worldwide Governance Indicators database (Kaufmann, Kraay, and Mastruzzi, 2008). It consists of “perceptions of the extent to which agents have confidence in and abide by the rules of society, and in particular the quality of contract enforcement, property rights, the police, and the courts, as well as the likelihood of crime and violence” (Kaufmann et al., 2008, p. 7).
cont_africa: Continent indicator: Africa. Continent indicators follow the definitions of the United Nations Statistics Division as of 2000.
cont_asia: Continent indicator: Asia
cont_europe: Continent indicator: Europe
cont_oceania: Continent indicator: Oceania
cont_north_america: Continent indicator: North America
cont_south_america: Continent indicator: South America
legor_gbr: Legal origin indicator: Common law. Legal origin indicators are from La Porta, Lopez-de-Silanes, Shleifer, and Vishny (1999). Some of our regressions include French Polynesia, absent from their data, which we have coded as French civil law.
legor_fra: Legal origin indicator: French civil law.
legor_soc: Legal origin indicator: Socialist law.
legor_deu: Legal origin indicator: German civil law.
legor_sca: Legal origin indicator: Scandinavian law.
colony_esp: Colonial origin indicator: Spanish. European colonial origin indicators are based on Teorell and Hadenius (2007). They distinguish between British, French, Portuguese, Spanish, and other European (Dutch, Belgian and Italian) colonial origin for countries colonized since 1700. For countries under several colonial powers, the last one is counted provided that it lasted for 10 years or longer. Since Teorell and Hadenius (2007) exclude the British settler colonies (the United States, Canada, Australia, Israel and New Zealand), we code these as having a British colonial origin. We complete their data using the same rule to determine the European colonial origin of French Polynesia (French), Hong Kong (British), Macau (Portuguese), New Caledonia (French), Nauru (British), Philippines (Spanish), Puerto Rico (Spanish), and Papua New Guinea (British).
colony_gbr: Colonial origin indicator: British.
colony_fra: Colonial origin indicator: French.
colony_prt: Colonial origin indicator: Portuguese.
colony_oeu: Colonial origin indicator: Other European.
africa_region_n: African region indicator: North. Region indicators for Sub-Saharan Africa (East Africa, Central Africa, West Africa, and South Africa) are from Bratton and van de Walle (1997). We assign African countries North of the Saharan desert, which were not classified by Bratton and van de Walle (1997), to the region of North Africa.
africa_region_s: African region indicator: South.
africa_region_w: African region indicator: West.
africa_region_e: African region indicator: East.
africa_region_c: African region indicator: Central.
slave_exports: Slave exports 1400-1900. Estimates of the number of slaves exported between 1400 and 1900 in Africa's four slave trades are from Nunn (2008). The data are constructed by combining shipping data with data from various historic documents reporting the ethnicities of slaves shipped from Africa. Combining the two sources, Nunn is able to construct an estimate of the number of slaves shipped from each country in Africa between 1400 and 1900 during Africa's four slave trades. See Nunn (2008) for more information on the nature of the data. Units are number of people.
dist_slavemkt_atlantic: Distance to slave markets, Atlantic trade (1000 km). The four variables measuring the distance from each country to the closest final destination slave market in each of Africa's four slave trades are taken from Nunn (2008). For the trans-Atlantic and Indian Ocean slave trades, the measure is the sailing distance from the point on the coast that is closest to the country's centroid to the closest final export destination for that slave trade. For the trans-Saharan and Red Sea slave trades, the measure is the great-circle overland distance from the country's centroid to the closest final export destination for that slave trade. Units are thousands of kilometres.
dist_slavemkt_indian: Distance to slave markets, Indian trade (1000 km).
dist_slavemkt_saharan: Distance to slave markets, Saharan trade (1000 km).
dist_slavemkt_redsea: Distance to slave markets, Red Sea trade (1000 km).
pop_1400: Population 1400. The data are constructed using historic population estimates from McEvedy and Jones (1978). For countries grouped with others in McEvedy and Jones (1978), we allocate population to countries in the group according to the distribution of population in 1950, obtained from United Nations (2007). Units are number of people.
european_descent: % European descent. The variable, calculated from version 1.1 of the migration matrix of Putterman and Weil (2010), estimates the percentage of the year 2000 population in every country that is descended from people who resided in Europe in 1500. This variable was used to perform an additional robustness check in a draft of the paper and is no longer used in the final version. We have left it in the dataset in case it is of use to other researchers.
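The Terrain Ruggedness Index formula given under the rugged variable above can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the code used to build the dataset: the function name `terrain_ruggedness_index` and the 3 by 3 elevation patch are hypothetical, and the sketch handles interior grid points only.

```python
import math


def terrain_ruggedness_index(elev, r, c):
    """TRI at interior point (r, c): square root of the sum of squared
    elevation differences between the point and its neighbours in the
    surrounding 3 by 3 window."""
    centre = elev[r][c]
    total = 0.0
    for i in range(r - 1, r + 2):
        for j in range(c - 1, c + 2):
            # The centre cell itself contributes zero to the sum.
            total += (elev[i][j] - centre) ** 2
    return math.sqrt(total)


# Hypothetical elevation patch in metres: a uniform west-to-east incline.
patch = [
    [100, 110, 120],
    [100, 110, 120],
    [100, 110, 120],
]
tri_metres = terrain_ruggedness_index(patch, 1, 1)  # sqrt(600)
tri_hundreds_of_metres = tri_metres / 100           # units of the rugged variable
```

In this made-up patch, six of the eight neighbours differ from the centre by 10 metres, so the sum of squared differences is 600 and the index is about 24.5 metres, well below the 240-metre threshold used for rugged_pc.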

Grid-cell-level data on terrain ruggedness

Researchers interested in using the terrain ruggedness variables at the level of countries will find these included in the country-level data described above. For those interested in using the terrain ruggedness variables for different geographic units, we also provide the underlying data at the level of individual cells on a 30 arc-seconds grid across the surface of the Earth. Three grid files are available:

Terrain Ruggedness Index, in millimetres (see below for an explanation regarding the units): available as tri.txt, an ascii grid compressed in the zip file tri.zip (616,110 Kb).
Average slope, as an alternative ruggedness measure, in thousandths of a percentage point: available as slope.txt, an ascii grid compressed in the zip file slope.zip (468,985 Kb).
The surface area of each cell, in square metres, which must be used to weight the ruggedness measures when averaging across areas: available as cellarea.txt, an ascii grid compressed in the zip file cellarea.zip (9,298 Kb).

To use these ascii grids in ArcGIS, after unzipping each downloaded file, you will need to convert it into a binary grid. You can do this through point-and-click by using the Arc Toolbox and, within Conversion Tools, selecting To Raster, and then ascii to Raster. As input ascii file, specify the text file you unzipped (e.g., tri.txt) and make sure Integer is selected as Output Data Type (at the moment of writing, ArcGIS is still a 32-bit application and a grid covering the Earth with 30 arc-seconds resolution is too large to be handled when values are stored as floating point values instead of integers). Alternatively, at the ArcInfo command line, one can use the ArcInfo Grid command asciigrid (e.g., tri=asciigrid(tri.txt,INT)).
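For users without ArcGIS, an ascii grid can also be read as plain text. The Python sketch below assumes the files follow the usual Esri ASCII grid layout (header lines such as ncols, nrows, xllcorner, yllcorner, cellsize and NODATA_value, followed by rows of integers running from north to south) and that each grid row sits on its own text line; neither assumption is confirmed here, so check the first few lines of the unzipped file. The function name `read_ascii_grid` and the sample grid are hypothetical.

```python
import io


def read_ascii_grid(handle):
    """Return (header, rows): a dict of header values and a generator that
    yields one list of ints per grid row, so the grid is never fully in memory."""
    header = {}
    first_data_line = None
    for line in handle:
        parts = line.split()
        if not parts:
            continue
        if parts[0][0].isalpha():
            # Header lines start with a keyword, e.g. "ncols 43200".
            header[parts[0].lower()] = float(parts[1])
        else:
            first_data_line = line
            break

    def rows():
        if first_data_line is not None:
            yield [int(v) for v in first_data_line.split()]
        for data_line in handle:
            if data_line.strip():
                yield [int(v) for v in data_line.split()]

    return header, rows()


# Made-up miniature grid in the assumed layout (values are NOT real data).
SAMPLE = io.StringIO(
    "ncols 3\nnrows 2\nxllcorner -180\nyllcorner -90\n"
    "cellsize 0.0083333333\nNODATA_value -9999\n"
    "1 2 3\n4 -9999 6\n"
)
header, row_iter = read_ascii_grid(SAMPLE)
grid = list(row_iter)
```

Given the size of the full grids, rows should be processed one at a time rather than collected into a list as in this toy example.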

When averaging the Terrain Ruggedness Index or average slope over areas, it is important to take into account that the sea-level surface that corresponds to a 30 by 30 arc-second cell varies in proportion to the cosine of its latitude (so it starts at 0.860 square kilometres at the equator and approaches 0 square kilometres as one gets sufficiently close to the poles). One should therefore calculate a weighted average, using as weights the values of the area of each cell, provided by the grid cellarea.txt.
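The weighting described above can be sketched in Python as follows. This is an illustration only: it approximates cell areas on a sphere of radius 6371 km, which is an assumption and need not match the values in cellarea.txt exactly, and the function names and the two-cell example are hypothetical. In practice the weights should be taken from cellarea.txt.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)


def cell_area_km2(lat_centre_deg, cell_deg=30 / 3600):
    """Approximate sea-level area of a cell centred at the given latitude."""
    half = cell_deg / 2
    north = math.radians(lat_centre_deg + half)
    south = math.radians(lat_centre_deg - half)
    width = math.radians(cell_deg)
    return EARTH_RADIUS_KM ** 2 * width * (math.sin(north) - math.sin(south))


def weighted_mean(values, weights):
    """Average of values using the given non-negative weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


area_equator = cell_area_km2(0.0)
area_60 = cell_area_km2(60.0)

# Two hypothetical cells: index 100 at the equator, index 300 at 60 degrees North.
values = [100.0, 300.0]
weights = [area_equator, area_60]
area_weighted = weighted_mean(values, weights)
unweighted = sum(values) / len(values)
```

Because the cell at 60 degrees covers roughly half the surface of the equatorial cell, the area-weighted mean is pulled towards the equatorial value and falls below the simple mean of 200.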

Note that the grids are in different units relative to the variables in the country-level data used in the regressions. In particular, the Terrain Ruggedness Index is in millimetres in the 30 arc-seconds grid as opposed to hundreds of metres in the country-level data. This is again due to storage constraints imposed by ArcGIS being a 32-bit application. After calculating the weighted average for an area, divide values by 100,000 to obtain the Terrain Ruggedness Index in hundreds of metres. Average slope is in thousandths of a percentage point in the 30 arc-seconds grid as opposed to percentage points in the country-level data. After calculating the weighted average for an area, divide values by 1,000 to obtain average slope in percent.
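The two unit conversions can be written as a pair of small helper functions. The Python sketch below uses hypothetical function names and made-up input values; only the divisors come from the text above.

```python
def tri_grid_to_country_units(tri_mm):
    """Convert a grid TRI average from millimetres to hundreds of metres."""
    return tri_mm / 100_000


def slope_grid_to_percent(slope_thousandths):
    """Convert a grid slope average from thousandths of a percentage point to percent."""
    return slope_thousandths / 1_000


# Made-up weighted averages in grid units.
tri_country_units = tri_grid_to_country_units(24_495)  # 24.495 m expressed in 100 m
slope_percent = slope_grid_to_percent(2_500)           # 2.5 percent
```

Applying the conversion after averaging, as the text recommends, keeps the intermediate sums as integers in grid units.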

Finally, note that to calculate the country-level averages of the Terrain Ruggedness Index or average slope included in the country-level data, in addition to weighting cells by their sea-level surface area, we exclude any land in each country covered by permanent inland water features.

References

Bratton, Michael and Nicolas van de Walle. 1997. Political regimes and regime transitions in Africa, 1910-1994. Data Collection 6996, Interuniversity Consortium for Political and Social Research.

Collins Bartholomew. 2005. Collins Bartholomew World Premium. Glasgow, UK: Collins Bartholomew.

Fischer, Günther, Harrij van Velthuizen, Mahendra Shah, and Freddy Nachtergaele. 2002. Global Agroecological Assessment for Agriculture in the 21st Century. Laxenburg, Austria: Food and Agriculture Organization of the United Nations and International Institute for Applied Systems Analysis.

Food and Agriculture Organization. 2008. ResourceSTAT. Rome: Food and Agriculture Organization of the United Nations.

Kaufmann, Daniel, Aart Kraay, and Massimo Mastruzzi. 2008. Governance matters VII: Aggregate and individual governance indicators, 1996-2007. Policy Research Working Paper 4654, World Bank.

Kottek, Markus, Jürgen Grieser, Christoph Beck, Bruno Rudolf, and Franz Rubel. 2006. World map of the Köppen-Geiger climate classification updated. Meteorologische Zeitschrift 15(3): 259-263.

La Porta, Rafael, Florencio Lopez-de-Silanes, Andrei Shleifer, and Robert Vishny. 1999. The quality of government. Journal of Law, Economics and Organization 15(1): 222-279.

Maddison, Angus. 2007. Contours of the World Economy, 1-2030 AD: Essays in Macroeconomic History. Oxford: Oxford University Press.

McEvedy, Colin and Richard Jones. 1978. Atlas of World Population History. Harmondsworth: Penguin Books.

Nunn, Nathan. 2008. The long term effects of Africa's slave trades. Quarterly Journal of Economics 123(1): 139-176.

Nunn, Nathan and Diego Puga. 2012. Ruggedness: The blessing of bad geography in Africa. Review of Economics and Statistics 94(1): 20-36.

Oak Ridge National Laboratory. 2001. LandScan Global Population Database 2000. Oak Ridge, TN: Oak Ridge National Laboratory.

Putterman, Louis and David N. Weil. 2010. Post-1500 population flows and the long-run determinants of economic growth and inequality. Quarterly Journal of Economics 125(4): 1627-1682.

Riley, Shawn J., Stephen D. DeGloria, and Robert Elliot. 1999. A terrain ruggedness index that quantifies topographic heterogeneity. Intermountain Journal of Sciences 5(1-4): 23-27.

Teorell, Jan and Axel Hadenius. 2007. Determinants of democratization: Taking stock of the large-N evidence. In Dirk Berg-Schlosser (ed.) Democratization: The State of the Art. Opladen: Barbara Budrich Publishers, 69-95.

United Nations. 2007. United Nations Common Database. New York, NY: United Nations Statistics Division.

US Bureau of Mines. 1960-1996. Minerals Yearbook. Washington, DC: United States Government Printing Office.

US Geological Survey. 1996. GTOPO30. Sioux Falls, SD: United States Geological Survey Center for Earth Resources Observation and Science (EROS).

US Geological Survey. 1997-2007. Minerals Yearbook. Washington, DC: United States Government Printing Office.

US National Imagery and Mapping Agency. 2000. Vector Map (VMAP) Level 0/Digital Chart of the World. Fifth edition. Fairfax, VA: United States National Imagery and Mapping Agency.

World Bank. 2006. World Development Indicators. Washington, DC: World Bank.