Today, we’ll ease into doing some more advanced stuff using the sf package in R. The sf package (short for simple features) is the state-of-the-art package for working with spatial data R. It’s a whole lot easier to use than older packages (e.g. rgdal or sp & maptools), but there are still some interesting/confusing elements. We’ll try to take a tour of a large amount of the package, and identify a few tools or skills we’ll need going forward.

Intermediate Simple Features in R

sf tries to make it so that spatial data is not special in R. Instead, it aims to make all the standard tools and idioms that you know how to use on dataframes apply to sf dataframes, too.1 This means that the sf package uses similar strategies to other packages, like dplyr, for how its dataframes work. At its most basic, this means that reading/writing spatial data works very similarly to how reading/writing non-spatial data works:

library(sf)

bristol = sf::st_read('./data/bristol-imd.shp')
## Reading layer `bristol-imd' from data source `/home/lw17329/OneDrive/teaching/second_year_methods/spatial-practicals/data/bristol-imd.shp' using driver `ESRI Shapefile'
## Simple feature collection with 263 features and 12 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 350239.4 ymin: 166638.9 xmax: 364618.1 ymax: 183052.8
## epsg (SRID):    27700
## proj4string:    +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +towgs84=446.448,-125.157,542.06,0.15,0.247,0.842,-20.489 +units=m +no_defs
bristol
## Simple feature collection with 263 features and 12 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 350239.4 ymin: 166638.9 xmax: 364618.1 ymax: 183052.8
## epsg (SRID):    27700
## proj4string:    +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +towgs84=446.448,-125.157,542.06,0.15,0.247,0.842,-20.489 +units=m +no_defs
## First 10 features:
##     LSOA11CD imd_rank imd_score income employment education health crime
## 1  E01014485    11727     23.53   0.18       0.13     15.75   0.41  0.32
## 2  E01014486     2408     49.04   0.34       0.26     31.45   1.25  0.99
## 3  E01014487    16452     17.36   0.07       0.06      0.28  -0.43  0.82
## 4  E01014488    11047     24.64   0.19       0.13      4.99   0.11  1.07
## 5  E01014489     5318     37.42   0.22       0.18     10.67   0.80  1.31
## 6  E01014491    19934     13.70   0.08       0.09      1.02  -0.48  0.34
## 7  E01014492     6812     33.27   0.22       0.19     45.80   0.76  0.37
## 8  E01014493     7510     31.54   0.19       0.15     50.05   0.81  0.31
## 9  E01014494    20047     13.60   0.09       0.08     15.85  -0.18  0.49
## 10 E01014495    11742     23.51   0.13       0.13     28.23   0.63  0.40
##    housing living_env idaci idaopi                       geometry
## 1    13.84      36.20  0.18   0.30 MULTIPOLYGON (((360256.3 17...
## 2    22.07      43.52  0.39   0.57 MULTIPOLYGON (((359676 1741...
## 3    22.84      62.34  0.04   0.14 MULTIPOLYGON (((359036.6 17...
## 4    16.49      38.16  0.28   0.31 MULTIPOLYGON (((359632.6 17...
## 5    20.97      62.02  0.29   0.36 MULTIPOLYGON (((359120.6 17...
## 6    18.62      41.51  0.05   0.15 MULTIPOLYGON (((359467.6 17...
## 7    20.54      22.51  0.35   0.21 MULTIPOLYGON (((353215.9 17...
## 8    20.09      30.64  0.29   0.23 MULTIPOLYGON (((352337 1770...
## 9    13.95      20.54  0.15   0.12 MULTIPOLYGON (((352702 1767...
## 10   17.86      27.29  0.21   0.15 MULTIPOLYGON (((353137.9 17...

The text above describes the dataframe we just read in. It gives most of the spatial information necessary to understand the geography:

  • the number of features (observations) and fields (columns) in the dataframe,
  • the bounding box bbox,
  • the type of geometry (geometry type) in the dataframe,
  • the coordinate system (in terms of its epsg code as well as its proj4 description)
  • and then its rows at the end

Check Yourself: Namespaces

“Namespaces are one honking great idea,” they say. A namespace is a sectioned-off area of a working environment (a space) where everything inside of that area is only addressable by referring to that space (by its name), and then to what you want that is inside of it.

Thus, something like an address is a great example of a namespace: BS8 1SS shows us that the 1SS is within BS8, which itself is a subset of all BS postcodes, and all of which are within the UK (since other countries use a different postal code system).

R has a bit of a complex relationship with namespaces. By default, when you load a package using library, it dumps out everything from within that library. This is kind of like starting work by dumping your entire rucksack out onto your desk, then sifting through its contents every time you need something. Thus, I’ll encourage you to use :: to be specific about which namespace a tool is in. When I write sf::st_read, this means “use the st_read function from within the sf package.” While R lets you use st_read by itself, writing sf::st_read is more clear, since it’s always apparent which package is being called.

Selection & focusing plots

Since a sf dataframe is simply a powered-up dataframe, we can treat it like a dataframe. This includes stuff like selection, subsetting, and aggregation. For an example, the School of Geographical Sciences is located in the Local Super Output Area with the index E01014542. We can grab just that one LSOA in a similar fashion to how we operate with other dataframes:

is_geog = bristol$LSOA11CD == 'E01014542'
geog = bristol[is_geog, ]
geog
## Simple feature collection with 1 feature and 12 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 357787.7 ymin: 172880.4 xmax: 358931.7 ymax: 173851.2
## epsg (SRID):    27700
## proj4string:    +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +towgs84=446.448,-125.157,542.06,0.15,0.247,0.842,-20.489 +units=m +no_defs
##     LSOA11CD imd_rank imd_score income employment education health crime
## 55 E01014542    11034     24.66   0.04       0.03     22.01   0.67   0.9
##    housing living_env idaci idaopi                       geometry
## 55   31.76      72.08  0.04   0.36 MULTIPOLYGON (((358357.9 17...

Now, using the format of the data we explained above, we can see that this new geog dataframe has only one row. We can make a plot of this single row using the plot function. Let’s make a plot that highlights our geog LSOA on top of the rest of bristol LSOAs:

# sf::st_geometry extracts *just* the geometry of the LSOA in bristol
plot(sf::st_geometry(bristol))
plot(sf::st_geometry(geog), col='red', add=TRUE)