The Spatial Relationships in Geographic Data

Everything is related to everything else, but near things are more related than distant things. [This] is thus very parochial, and ignores most of the world. Tobler (1970)

Today, we’ll be using the previous practical’s information about spatial relationships, functions, and modelling to understand the spatial structure of deprivation in Bristol. We’ll only be doing exploratory spatial data analysis this time, with an eye to doing some more advanced modelling in the weeks ahead.

Our main interest for this problem will be to examine:

  1. Is deprivation clustered in Bristol?
  2. Where are the clusters of deprivation in Bristol?
  3. At what scale does deprivation cluster in Bristol (very local, ward-level, or regional)?

To do this, we’ll first read in our data from last time:

library(sf)
library(dplyr)
bristol = sf::read_sf('./data/bristol-imd.shp')

Note that imd_score indicates the deprivation of a lower layer super output area (LSOA); when imd_score is large, that LSOA is more deprived. Now for something completely different: let’s prepare a bit more with an interlude on R functions.

Check Yourself: A Little Bit More on Functions

At a very high level of abstraction, a function in R is a distinct, named block of code that performs some operation or procedure. Last time, we wrote a kernel as a function:

nexp_kernel = function(distance, bandwidth=1500){
                       u = distance / bandwidth
                       return(exp(-u))
}
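As a quick check of what this kernel does, the sketch below evaluates it at a few distances. The distances (and the name example_distances) are made-up values chosen for illustration, not taken from the Bristol data.

```r
# The kernel from last time: weights decay exponentially with distance.
nexp_kernel = function(distance, bandwidth=1500){
    u = distance / bandwidth
    return(exp(-u))
}

# Hypothetical distances, in the same units as the bandwidth (illustration only).
example_distances = c(0, 1500, 3000)

# With the default bandwidth of 1500, the weights are exp(0), exp(-1), exp(-2).
default_weights = nexp_kernel(example_distances)

# A larger bandwidth makes the weights decay more slowly with distance.
wide_weights = nexp_kernel(example_distances, bandwidth=3000)

print(default_weights)
print(wide_weights)
```

Because bandwidth has a default value, you only need to supply it when you want something other than 1500.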

In the abstract, functions are descriptions of what you want to do to a set of inputs. Breaking it down, all functions are created using the same pattern:

name_of_function = function(argument_1, argument_2, argument_with_default = TRUE){
    if(argument_with_default){
        return(argument_1 * argument_2)
    } else {
        return(argument_1 / argument_2)
    }
}
  • The name_of_function is an arbitrary name, and should be something that clearly describes what the function does (like load_data() or simulate_next_year()).
  • Then, the words inside of function(), the argument_1, argument_2, argument_with_default, are called arguments. Basically, an argument is an input provided to a function. Functions can have as many arguments as you like, but it’s best to only take arguments you actually use later in the function. The names you give to arguments can be anything, but they should be descriptive, clear, and helpful (such as distance and bandwidth when we were defining nexp_kernel). An argument written with an = sign, like argument_with_default = TRUE, has a default value that is used when the caller does not supply one.
  • Finally, the body of the function comes between the braces. In our example, this is where the if/else logic is. More generally, you can think of it like: function(){This is the body of the function}. The body is where you describe the operations you want to use on the arguments. Above, our example function will:

    1. check if the argument_with_default is true
    2. if it is, then it will compute argument_1 * argument_2.
    3. if it’s not, then it will compute argument_1 / argument_2.
  • You send values outside of the function into the rest of your code using return(). Once you return(something), the function stops. For instance, in our example code, if we return(argument_1 * argument_2), we never even check the other branch of code after the else{}.
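To see both branches in action, here is a minimal sketch that calls the example function twice. The input values 6 and 3, and the names product and quotient, are arbitrary choices for illustration.

```r
# The example function from above, repeated so this snippet runs on its own.
name_of_function = function(argument_1, argument_2, argument_with_default = TRUE){
    if(argument_with_default){
        return(argument_1 * argument_2)
    } else {
        return(argument_1 / argument_2)
    }
}

# Default used: argument_with_default is TRUE, so the function multiplies.
product = name_of_function(6, 3)
print(product)   # 18

# Default overridden: the function takes the else branch and divides.
quotient = name_of_function(6, 3, argument_with_default = FALSE)
print(quotient)  # 2
```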

Functions are very helpful for reusing code, avoiding typos, and ensuring that if you have to change or fix something, you change it in one place and one place only. If you find yourself repeating a sequence of steps many times in an R script, try writing a function that applies those steps to an argument. For example, code like:

dataset_2010 = read.csv('work/data/2010.csv')
# flag rows with a missing score, then fill them with the mean of the observed scores
missing_score = is.na(dataset_2010$score)
dataset_2010[missing_score, 'score'] = mean(dataset_2010$score, na.rm=TRUE)
clean_model_2010 = lm(score ~ class + race + background, data=dataset_2010)

# copy and paste, hope that you change all the 2010s into 2011s
dataset_2011 = read.csv('work/data/2011.csv')
missing_score = is.na(dataset_2011$score)
dataset_2011[missing_score, 'score'] = mean(dataset_2011$score, na.rm=TRUE)
clean_model_2011 = lm(score ~ class + race + background, data=dataset_2011)

# copy and paste, hope that you change all the 2011s into 2012s
dataset_2012 = read.csv('work/data/2012.csv')
missing_score = is.na(dataset_2012$score)
dataset_2012[missing_score, 'score'] = mean(dataset_2012$score, na.rm=TRUE)
clean_model_2012 = lm(score ~ class + race + background, data=dataset_2012)

might become:

clean_and_model_year = function(year){
    # build the path to this year's file, e.g. 'work/data/2010.csv'
    filename = paste('work/data/', year, '.csv', sep='')
    dataset = read.csv(filename)
    # fill missing scores with the mean of the observed scores
    missing_score = is.na(dataset$score)
    dataset[missing_score, 'score'] = mean(dataset$score, na.rm=TRUE)
    clean_model = lm(score ~ class + race + background,
                     data = dataset)
    return(clean_model)
}
clean_model_2010 = clean_and_model_year(2010)
clean_model_2011 = clean_and_model_year(2011)
clean_model_2012 = clean_and_model_year(2012)
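One detail in a cleaning step like this is worth checking by hand: mean() returns NA if any value is missing, unless you pass na.rm = TRUE. The sketch below shows this on a tiny made-up data frame (toy is hypothetical, standing in for one year's file).

```r
# Hypothetical toy data standing in for one year's file.
toy = data.frame(score = c(10, NA, 30))

# Without na.rm, a single missing value makes the whole mean missing.
print(mean(toy$score))                # NA

# With na.rm = TRUE, the mean is taken over the observed values only.
print(mean(toy$score, na.rm = TRUE))  # 20

# Fill the missing score with the mean of the observed scores.
missing_score = is.na(toy$score)
toy[missing_score, 'score'] = mean(toy$score, na.rm = TRUE)
print(toy$score)                      # 10 20 30
```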

Exercise: Breaking Down a Function

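Now that you know all about functions in R, let’s break down what the following function does, line by line. At a high level, this function returns TRUE/FALSE values, one per row of rest: the output is TRUE in rows where rest is within distance of target, and FALSE in rows where it is not. (Strictly, because sparse=FALSE, sf::st_intersects returns these values as a logical matrix with one column per feature in the buffer, so a single target gives a single column.)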

within_distance = function(target, rest, distance){
    buffer = sf::st_buffer(target, distance)
    hits_buffer = sf::st_intersects(rest, buffer, sparse=FALSE)
    return(hits_buffer)
}
  1. What is the name of the function?
  2. What are the arguments of the function? Judging from the body of the function, what does each argument represent?
  3. Using your past knowledge about sf::st_buffer, what does the buffer object represent?
  4. Using your past knowledge about sf::st_intersects, what does the hits_buffer object represent?

The “Scale” of Deprivation

Now, let’s look into the scale of the process in Bristol. Using the within_distance function from the exercise above, let’s plot a few distance bands around the School of Geographical Sciences (SoGS). A distance band is the collection of observations that fall within a given number of metres of geog, the LSOA that stands in for SoGS in the code below. First, though, let’s take one step at a time and visualize the observations that are within, say, 1.5km of SoGS:

is_geog = bristol$LSOA11CD == 'E01014542'
geog = bristol[is_geog, ]

is_within_1500m = within_distance(geog, bristol, 1500)
plot(sf::st_geometry(bristol))
plot(sf::st_geometry(bristol[is_within_1500m, ]), 
     col='forestgreen', add=TRUE)
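If you want to convince yourself that within_distance behaves as expected before trusting the map, a small sanity check on made-up data can help. The sketch below is only an illustration: the points, the names toy, toy_target and bands, and the choice of the British National Grid (EPSG:27700, units of metres) are all assumptions, not part of the Bristol data.

```r
library(sf)

# Same helper as defined earlier in the practical.
within_distance = function(target, rest, distance){
    buffer = sf::st_buffer(target, distance)
    hits_buffer = sf::st_intersects(rest, buffer, sparse=FALSE)
    return(hits_buffer)
}

# Hypothetical points spaced along a line, in a projected CRS measured in metres.
toy_points = sf::st_sfc(sf::st_point(c(0, 0)),
                        sf::st_point(c(1000, 0)),
                        sf::st_point(c(2000, 0)),
                        sf::st_point(c(5000, 0)),
                        crs = 27700)
toy = sf::st_sf(id = 1:4, geometry = toy_points)
toy_target = toy[1, ]

# One row per feature in `toy`, one column for the single target.
hits = within_distance(toy_target, toy, 1500)
print(dim(hits))
print(toy$id[hits[, 1]])

# Count how many features fall inside each of a few distance bands.
bands = c(500, 1500, 2500, 6000)
counts = sapply(bands, function(d) sum(within_distance(toy_target, toy, d)))
print(data.frame(distance = bands, n_within = counts))
```

The counts can only grow as the distance grows, since each larger buffer contains the smaller ones.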