Today, we’ll ease into doing some more advanced stuff using the sf
package in R
. The sf
package (short for simple features) is the state-of-the-art package for working with spatial data R
. It’s a whole lot easier to use than older packages (e.g. rgdal
or sp & maptools
), but there are still some interesting/confusing elements. We’ll try to take a tour of a large amount of the package, and identify a few tools or skills we’ll need going forward.
sf
tries to make it so that spatial data is not special in R
. Instead, it aims to make all the standard tools and idioms that you know how to use on dataframes apply to sf dataframes, too.1 This means that the sf
package uses similar strategies to other packages, like dplyr
, for how its dataframes work. At its most basic, this means that reading/writing spatial data works very similarly to how reading/writing non-spatial data works:
library(sf)
bristol = sf::st_read('./data/bristol-imd.shp')
## Reading layer `bristol-imd' from data source `/home/lw17329/OneDrive/teaching/second_year_methods/spatial-practicals/data/bristol-imd.shp' using driver `ESRI Shapefile'
## Simple feature collection with 263 features and 12 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 350239.4 ymin: 166638.9 xmax: 364618.1 ymax: 183052.8
## epsg (SRID): 27700
## proj4string: +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +towgs84=446.448,-125.157,542.06,0.15,0.247,0.842,-20.489 +units=m +no_defs
bristol
## Simple feature collection with 263 features and 12 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 350239.4 ymin: 166638.9 xmax: 364618.1 ymax: 183052.8
## epsg (SRID): 27700
## proj4string: +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +towgs84=446.448,-125.157,542.06,0.15,0.247,0.842,-20.489 +units=m +no_defs
## First 10 features:
## LSOA11CD imd_rank imd_score income employment education health crime
## 1 E01014485 11727 23.53 0.18 0.13 15.75 0.41 0.32
## 2 E01014486 2408 49.04 0.34 0.26 31.45 1.25 0.99
## 3 E01014487 16452 17.36 0.07 0.06 0.28 -0.43 0.82
## 4 E01014488 11047 24.64 0.19 0.13 4.99 0.11 1.07
## 5 E01014489 5318 37.42 0.22 0.18 10.67 0.80 1.31
## 6 E01014491 19934 13.70 0.08 0.09 1.02 -0.48 0.34
## 7 E01014492 6812 33.27 0.22 0.19 45.80 0.76 0.37
## 8 E01014493 7510 31.54 0.19 0.15 50.05 0.81 0.31
## 9 E01014494 20047 13.60 0.09 0.08 15.85 -0.18 0.49
## 10 E01014495 11742 23.51 0.13 0.13 28.23 0.63 0.40
## housing living_env idaci idaopi geometry
## 1 13.84 36.20 0.18 0.30 MULTIPOLYGON (((360256.3 17...
## 2 22.07 43.52 0.39 0.57 MULTIPOLYGON (((359676 1741...
## 3 22.84 62.34 0.04 0.14 MULTIPOLYGON (((359036.6 17...
## 4 16.49 38.16 0.28 0.31 MULTIPOLYGON (((359632.6 17...
## 5 20.97 62.02 0.29 0.36 MULTIPOLYGON (((359120.6 17...
## 6 18.62 41.51 0.05 0.15 MULTIPOLYGON (((359467.6 17...
## 7 20.54 22.51 0.35 0.21 MULTIPOLYGON (((353215.9 17...
## 8 20.09 30.64 0.29 0.23 MULTIPOLYGON (((352337 1770...
## 9 13.95 20.54 0.15 0.12 MULTIPOLYGON (((352702 1767...
## 10 17.86 27.29 0.21 0.15 MULTIPOLYGON (((353137.9 17...
The text above describes the dataframe we just read in. It gives most of the spatial information necessary to understand the geography:
bbox
,geometry type
) in the dataframe,epsg
code as well as its proj4
description)“Namespaces are one honking great idea,” they say. A namespace is a sectioned-off area of a working environment (a space) where everything inside of that area is only addressable by referring to that space (by its name), and then to what you want that is inside of it.
Thus, something like an address is a great example of a namespace: BS8 1SS shows us that the 1SS
is within BS8
, which itself is a subset of all BS
postcodes, and all of which are within the UK (since other countries use a different postal code system).
R
has a bit of a complex relationship with namespaces. By default, when you load a package using library
, it dumps out everything from within that library. This is kind of like starting work by dumping your entire rucksack out onto your desk, then sifting through its contents every time you need something. Thus, I’ll encourage you to use ::
to be specific about which namespace a tool is in. When I write sf::st_read
, this means “use the st_read
function from within the sf
package.” While R
lets you use st_read
by itself, writing sf::st_read
is more clear, since it’s always apparent which package is being called.
Since a sf
dataframe is simply a powered-up dataframe, we can treat it like a dataframe. This includes stuff like selection, subsetting, and aggregation. For an example, the School of Geographical Sciences is located in the Local Super Output Area with the index E01014542
. We can grab just that one LSOA in a similar fashion to how we operate with other dataframes:
is_geog = bristol$LSOA11CD == 'E01014542'
geog = bristol[is_geog, ]
geog
## Simple feature collection with 1 feature and 12 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 357787.7 ymin: 172880.4 xmax: 358931.7 ymax: 173851.2
## epsg (SRID): 27700
## proj4string: +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +towgs84=446.448,-125.157,542.06,0.15,0.247,0.842,-20.489 +units=m +no_defs
## LSOA11CD imd_rank imd_score income employment education health crime
## 55 E01014542 11034 24.66 0.04 0.03 22.01 0.67 0.9
## housing living_env idaci idaopi geometry
## 55 31.76 72.08 0.04 0.36 MULTIPOLYGON (((358357.9 17...
Now, using the format of the data we explained above, we can see that this new geog
dataframe has only one row. We can make a plot of this single row using the plot
function. Let’s make a plot that highlights our geog
LSOA on top of the rest of bristol LSOAs:
# sf::st_geometry extracts *just* the geometry of the LSOA in bristol
plot(sf::st_geometry(bristol))
plot(sf::st_geometry(geog), col='red', add=TRUE)