sylvain durand

Spatial Data Analysis with R

If R language has already become a reference in statistical analysis and data processing, it may be thanks to its hability to represent and visualize data. They can be used in order to visualize spatial data in the form of cartographic representations which, combined with its other features, makes it an excellent geographic information system. This article sets out to show, through the provision of relevant example, how R can handle spatial data by creating maps.

Prerequisite

Once R is installed on your computer, few libraries will be used: rgdal allows us to import and project shapefiles, plotrix creates color scales, and classInt assigns colors to map data. Once the libraries installed with install.packages, load them at the beginning of the session:

library('rgdal')      # Reading and projecting shapefiles
library('plotrix')    # Creating color scales
library('classInt')   # Assigning colors to data

Graphics will be plot using R base functions. ggplot2 is an alternative, but it seems less relevant here: longer and less legible code, unability to plot holes inside polygones, fortify and ploting can last much longer.

Blank France map

Reading shapefiles

The rgdal library provides readOGR() in order to read shapefiles. dsn must contain the path where shapefiles are located, and layer the shapefile name, without extension. readOGR reads .shp, .shx, .dbf and .prj files. Departements of France are given by Geofla:

# Reading departements
departements <- readOGR(dsn="shp/geofla",  layer="DEPARTEMENT")

# Reading departements boundaries in order to plot France boundaries
bounderies <- readOGR(dsn="shp/geofla",  layer="LIMITE_DEPARTEMENT")
bounderies <- bounderies[bounderies$NATURE %in% c('Fronti\xe8re internationale','Limite c\xf4ti\xe8re'),]

In order to show neighbouring countries, we will use data provided by Natural Earth. We will select Europe countries only:

# Reading country and selecting Europe
europe <- readOGR(dsn="shp/ne/cultural", layer="ne_10m_admin_0_countries")
europe <- europe[europe$REGION_UN=="Europe",]

Projection and plot

The map will use the French official projection “Lambert 93”, already declared in the Geofla .prj files. spTransform will be used for the European coutries.

Then, we will first plot French boundaries, in order to center the map on France. Borders colors are defined in border, their tickness in lwd and the filling color in col.

# Projection
europe <- spTransform(europe, CRS("+init=epsg:2154"))

# Plot
pdf('france.pdf',width=6,height=4.7)
par(mar=c(0,0,0,0))

plot(bounderies,  col="#FFFFFF")
plot(europe,      col="#E6E6E6", border="#AAAAAA",lwd=1, add=TRUE)
plot(bounderies,  col="#D8D6D4", lwd=6, add=TRUE)
plot(departements,col="#FFFFFF", border="#CCCCCC",lwd=.7, add=TRUE)
plot(bounderies,  col="#666666", lwd=1, add=TRUE)

dev.off()

France blank map

Visualizing a data: population density

Reading data

The very large number of communes (the smallest administrative level in France) gives us excellent spatial data. Geofla provides us their bounderies, population and area. So we will plot population density:

# Reading shapefile
communes <- readOGR(dsn="shp/geofla", layer="COMMUNE")
# Calculate density
communes$DENSITY <- communes$POPULATION/communes$SUPERFICIE*100000

Color scale

In order to create a color scale, we will assign shades of blue to each percentile. classIntervals calculates percentiles, smoothColors create the blue scale, and findColours assigns blues depending on each commune depending on their population density. Then, we create a legend, with only five colors.

# Color scale
col <- findColours(classIntervals(
            communes$DENSITY, 100, style="quantile"),
            smoothColors("#0C3269",98,"white"))
# Legend
leg <- findColours(classIntervals(
            round(communes$DENSITY), 5, style="quantile"),
            smoothColors("#0C3269",3,"white"),
            under="less than", over="more than", between="–",
            cutlabels=FALSE)

Plot

We can now plot the map. In order to use an embedded font in the PDF, we will use cairo_pdf() instead of pdf:

# Exporting in PDF
cairo_pdf('densite.pdf',width=6,height=4.7)
par(mar=c(0,0,0,0),family="Myriad Pro",ps=8)

# Ploting the map
plot(bounderies, col="#FFFFFF")
plot(europe,     col="#E6E6E6", border="#AAAAAA",lwd=1, add=TRUE)
plot(bounderies, col="#D8D6D4", lwd=6, add=TRUE)
plot(communes,   col=col, border=col, lwd=.1, add=TRUE)
plot(bounderies, col="#666666", lwd=1, add=TRUE)

# Ploting the legend
legend(-10000,6387500,fill=attr(leg, "palette"),
        legend=names(attr(leg,"table")),
        title = "Density (p/km²):")

dev.off()

Population density in France

Visualizing an external data: incomes

The main value is to plot data provided by external files. We will plot the median taxable income per consumption unit.

Reading and supplementing data

Unfortunately, data is missing for more than 5 000 communes, due to tax secrecy. We can “cheat” in order to improve the global render by assigning to those communes the canton (a larger administrative level) median income, given in the same file.

# Loading communes data
incomes <- read.csv('csv/revenus.csv')
communes <- merge(communes, incomes, by.x="INSEE_COM", by.y="COMMUNE")

# Loading cantons data
cantons  <- read.csv('csv/cantons.csv')

# Merging data
communes <- merge(communes, cantons, by="CANTON", all.x=TRUE)

# Assigning canton median income to communes with missing data
communes$REVENUS[is.na(communes$REVENUS)] <- communes$REVENUC[is.na(communes$REVENUS)]

Color scale

We generate color scale and legend:

col <- findColours(classIntervals(
            communes$REVENUS, 100, style="quantile"),
            smoothColors("#FFFFD7",98,"#F3674C"))

leg <- findColours(classIntervals(
            round(communes$REVENUS,0), 4, style="quantile"),
            smoothColors("#FFFFD7",2,"#F3674C"),
            under="moins de", over="plus de", between="–",
            cutlabels=FALSE)

Plot

And we plot:

cairo_pdf('incomes.pdf',width=6,height=4.7)
par(mar=c(0,0,0,0),family="Myriad Pro",ps=8)

plot(boundaries, col="#FFFFFF")
plot(europe,     col="#F5F5F5", border="#AAAAAA",lwd=1, add=TRUE)
plot(boundaries, col="#D8D6D4", lwd=6, add=TRUE)
plot(communes,   col=col, border=col, lwd=.1, add=TRUE)
plot(boundaries, col="#666666", lwd=1, add=TRUE)

legend(-10000,6337500,fill=attr(leg, "palette"),
    legend=gsub("\\.", ",", names(attr(leg,"table"))),
    title = "Median income:")

dev.off()

Median taxable income per consumption unit in France in 2010

Visualizing map data: the road network

We can also add map data: cities, urban areas, rivers, forests… We will here plot the French road network, thanks to Route 500, from IGN:

roads <- readOGR(dsn="shp/geofla",  layer="TRONCON_ROUTE")
local     <- roads[roads$VOCATION=="Liaison locale",]
principal <- roads[roads$VOCATION=="Liaison principale",]
regional  <- roads[roads$VOCATION=="Liaison r\xe9gionale",]
highway <- roads[roads$VOCATION=="Type autoroutier",]

cairo_pdf('roads.pdf',width=6,height=4.7)
par(mar=c(0,0,0,0),family="Myriad Pro",ps=8)

plot(boundaries,  col="#FFFFFF")
plot(europe,      col="#F5F5F5", border="#AAAAAA",lwd=1, add=TRUE)
plot(boundaries,  col="#D8D6D4", lwd=6, add=TRUE)
plot(departements,col="#FFFFFF", border="#FFFFFF",lwd=.7, add=TRUE)
plot(local,       col="#CBCBCB", lwd=.1, add=TRUE)
plot(principal,   col="#CCCCCC", lwd=.3, add=TRUE)
plot(regional,    col="#BBBBBB", lwd=.5, add=TRUE)
plot(highway,     col="#AAAAAA", lwd=.7, add=TRUE)
plot(boundaries,  col="#666666", lwd=1, add=TRUE)

dev.off()

French roads network

Blank world map

Reading shapefiles

We will load the following Natural Earth:

# Loading Shapefiles
countries   <- readOGR(dsn="shp/ne/cultural",layer="ne_110m_admin_0_countries")
graticules  <- readOGR(dsn="shp/ne/physical",layer="ne_110m_graticules_10")
bbox        <- readOGR(dsn="shp/ne/physical",layer="ne_110m_wgs84_bounding_box")

Projection and plot

We will use the Winkel Tripel projection:

# Winkel Tripel projection
countries   <- spTransform(countries,CRS("+proj=wintri"))
bbox        <- spTransform(bbox,  CRS("+proj=wintri"))
graticules  <- spTransform(graticules, CRS("+proj=wintri"))

# Ploting world map
pdf('world.pdf',width=10,height=6) # PDF export
par(mar=c(0,0,0,0))                # Zero margins

plot(bbox, col="white", border="grey90",lwd=1)
plot(countries, col="#E6E6E6", border="#AAAAAA",lwd=1, add=TRUE)
plot(graticules, col="#CCCCCC33", lwd=1, add=TRUE)

dev.off()                          # Saving file

Carte du monde projetée en Winkel Tripel

Visualizing data: Human Development Index

Most frequent usage consists of visualizing data with a color scale. Let’s plot the HDI, provided by the United Nations Develment Programme in a CSV file. The procedure is as described above:

# Loading data and merging dataframes
hdi  <- read.csv('csv/hdi.csv')
countries <- merge(countries, hdi, by.x="iso_a3", by.y="Abbreviation")

# Converting HDI in numeric
countries$hdi <- as.numeric(levels(countries$X2012.HDI.Value))[countries$X2012.HDI.Value]

# Generating color scale and assigning colors
col <- findColours(classIntervals(countries$hdi, 100, style="pretty"),
                       smoothColors("#ffffe5",98,"#00441b"))

# Assigning grey to missing data
col[is.na(countries$hdi)] <- "#DDDDDD"

# Generating legend
leg <- findColours(classIntervals(round(countries$hdi,3), 7, style="pretty"),
                       smoothColors("#ffffe5",5,"#00441b"),
                       under="less than", over="more than", between="–", cutlabels=FALSE)

# Ploting
cairo_pdf('hdi.pdf',width=10,height=6)
par(mar=c(0,0,0,0),family="Myriad Pro",ps=8)

plot(bbox, col="white", border="grey90",lwd=1)
plot(countries,  col=col, border=col,lwd=.8, add=TRUE)
plot(graticules,col="#00000009",lwd=1, add=TRUE)

legend(-15000000,-3000000,fill=attr(leg, "palette"),
    legend= names(attr(leg,"table")),
    title = "HDI in 2012:")
dev.off()

Human Development Index (HDI) in 2012

Circles visualization: most populated cities

Reading data

An other kind of visualization is given by circles. Population of most populated cities is provided by Natural Earth:

# Loading shapefile
cities <- readOGR(dsn="shp/ne/cultural", layer="ne_110m_populated_places")
cities <- spTransform(cities, CRS("+proj=wintri"))

Circle size

The data shall be proportionate to the circles areas, not the radius; so the radius is the square root of the population:

# Calculating circles radius
cities$radius <- sqrt(cities$POP2015)
cities$radius <- cities$radius/max(cities$radius)*3

Plot

We plot the map:

pdf('cities.pdf',width=10,height=6)
par(mar=c(0,0,0,0))

plot(bbox, col="white", border="grey90", lwd=1)
plot(countries, col="#E6E6E6", border="#AAAAAA", lwd=1, add=TRUE)
points(cities, col="#8D111766", bg="#8D111766", lwd=1, pch=21, cex=cities$radius)
plot(graticules, col="#CCCCCC33", lwd=1, add=TRUE)

dev.off()

Most populated cities in the world

Visualizing map data: urban areas

Natural Earth provides urban areas shapefiles, derived from satellite data. Let’us map them with night lights colors:

areas  <- readOGR(dsn="shp/ne/cultural",layer="ne_10m_urban_areas")
areas <- spTransform(areas, CRS("+proj=wintri"))

pdf('areas.pdf',width=10,height=6)
par(mar=c(0,0,0,0))
plot(bbox, col="#000000", border="#000000",lwd=1)
plot(countries, col="#000000", border="#000000",lwd=1, add=TRUE)
plot(areas, col="#FFFFFF", border="#FFFFFF66",lwd=1.5, add=TRUE)

dev.off()

World map (Winkel Tripel projection)