# The Spatial Relationships in Geographic Data

> Everything is related to everything else, but near things are more related than distant things. [This] is thus very parochial, and ignores most of the world. (Tobler, 1970)

Today, we’ll be using the previous practical’s information about spatial relationships, functions, and modelling to understand the spatial structure of deprivation in Bristol. We’ll only be doing exploratory spatial data analysis this time, with an eye to doing some more advanced modelling in the weeks ahead.

Our main interest for this problem will be to examine:

1. Is deprivation clustered in Bristol?
2. Where are the clusters of deprivation in Bristol?
3. At what scale does deprivation cluster in Bristol: very local, ward-level, or regional?

To do this, we’ll first read in our data from last time:

```r
library(sf)
library(dplyr)
# then read last practical's Bristol LSOA data into `bristol` (file path not shown here)
```

Note that `imd_score` indicates the deprivation of a lower layer super output area (LSOA): the larger the `imd_score`, the more deprived that LSOA is. Now for something completely different: let's prepare a bit more with an interlude on `R` functions.

## Check Yourself: A Little Bit More on Functions

At a high level of abstraction, a function in `R` is a distinct, self-contained piece of code that performs some operation or procedure. Last time, we wrote a kernel as a function:

```r
nexp_kernel = function(distance, bandwidth = 1500){
  u = distance / bandwidth   # express distance in units of the bandwidth
  return(exp(-u))            # negative exponential decay: weight 1 at distance 0
}
```
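To see what this kernel does, you can call it on a few distances. The distances below are arbitrary illustrative values in metres. The weight is 1 at zero distance and shrinks by a factor of e for every bandwidth of distance, so doubling the bandwidth makes the weights fall off more slowly.

```r
# Negative-exponential kernel from the previous practical
nexp_kernel = function(distance, bandwidth = 1500){
  u = distance / bandwidth   # distance measured in bandwidths
  return(exp(-u))            # weight decays from 1 towards 0
}

distances = c(0, 1500, 3000)               # illustrative distances in metres
nexp_kernel(distances)                     # about 1, 0.368, 0.135
nexp_kernel(distances, bandwidth = 3000)   # about 1, 0.607, 0.368
```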

In the abstract, functions are descriptions of what you want to do to a set of inputs. Breaking it down, every function is built from the same parts:

```r
name_of_function = function(argument_1, argument_2, argument_with_default = TRUE){
  if(argument_with_default){
    return(argument_1 * argument_2)
  } else {
    return(argument_1 / argument_2)
  }
}
```
• The `name_of_function` is an arbitrary name, and should be something that clearly describes what the function does (like, `load_data()`, or `simulate_next_year()`).
• Then, the names inside `function()`, here `argument_1`, `argument_2`, and `argument_with_default`, are called arguments. An argument is an input provided to a function. Functions can take any number of arguments, but it's best to take only the arguments the function actually uses. An argument written as `name = value`, like `argument_with_default = TRUE`, has a default value: if you don't supply it when you call the function, `R` uses the default. Argument names can be anything, but they should be descriptive, clear, and helpful (such as `distance` and `bandwidth` when we were defining `nexp_kernel`).
• Next, the body of the function comes between the braces; in our example, this is the `if/else` code. You can think of it like: `function(){This is the body of the function}`. The body is where you describe the operations you want to perform on the arguments. Above, our example function will:

1. check whether `argument_with_default` is `TRUE`.
2. if it is, then it will compute `argument_1 * argument_2`.
3. if it’s not, then it will compute `argument_1 / argument_2`.
• Finally, you send values out of the function and into the rest of your code using `return()`. Once the function reaches a `return(something)`, it stops. For instance, in our example code, if the function reaches `return(argument_1 * argument_2)`, it never runs the other branch in the `else{}`. If a function has no `return()`, it returns the value of the last expression it evaluates.
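Here is a quick way to try this out by calling the example function with and without its default. The input values are arbitrary. `stops_early` is a hypothetical helper, included only to show that nothing after a `return()` runs.

```r
# The example function from above, written out so it can be run
name_of_function = function(argument_1, argument_2, argument_with_default = TRUE){
  if(argument_with_default){
    return(argument_1 * argument_2)
  } else {
    return(argument_1 / argument_2)
  }
}

name_of_function(6, 3)          # default argument_with_default = TRUE: 6 * 3 = 18
name_of_function(6, 3, FALSE)   # 6 / 3 = 2
# named arguments can be given in any order
name_of_function(argument_with_default = FALSE, argument_2 = 3, argument_1 = 6)  # 2

# hypothetical function showing that return() stops execution
stops_early = function(x){
  return(x * 10)
  print('this line is never reached')
}
stops_early(4)   # 40, and nothing is printed
```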

Functions are very helpful for code reuse. They help you avoid typos, and they ensure that if you have to change or fix something, you change it in one place and one place only. If you find yourself doing the same sequence of steps many times in an `R` script, try writing a function that applies those steps to an argument. For example:

```r
# fill missing scores with the mean observed score, then fit a model
dataset_2010 = read.csv('work/data/2010.csv')
missing_score = is.na(dataset_2010$score)
dataset_2010[missing_score, 'score'] = mean(dataset_2010$score, na.rm = TRUE)
clean_model_2010 = lm(score ~ class + race + background, data = dataset_2010)

# copy and paste, hope that you change all the 2010s into 2011s
dataset_2011 = read.csv('work/data/2011.csv')
missing_score = is.na(dataset_2011$score)
dataset_2011[missing_score, 'score'] = mean(dataset_2011$score, na.rm = TRUE)
clean_model_2011 = lm(score ~ class + race + background, data = dataset_2011)

# copy and paste, hope that you change all the 2011s into 2012s
dataset_2012 = read.csv('work/data/2012.csv')
missing_score = is.na(dataset_2012$score)
dataset_2012[missing_score, 'score'] = mean(dataset_2012$score, na.rm = TRUE)
clean_model_2012 = lm(score ~ class + race + background, data = dataset_2012)
```

might become:

```r
clean_and_model_year = function(year){
  # build the file path for this year, e.g. 'work/data/2010.csv', and read it
  filename = paste('work/data/', year, '.csv', sep = '')
  dataset = read.csv(filename)
  # fill missing scores with the mean of the observed scores
  missing_score = is.na(dataset$score)
  dataset[missing_score, 'score'] = mean(dataset$score, na.rm = TRUE)
  clean_model = lm(score ~ class + race + background,
                   data = dataset)
  return(clean_model)
}
clean_model_2010 = clean_and_model_year(2010)
clean_model_2011 = clean_and_model_year(2011)
clean_model_2012 = clean_and_model_year(2012)
```
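Once the steps live in a function, you can also apply it to every year in one go with `lapply()`. The sketch below is self-contained. It assumes a hypothetical `data_dir` argument (not in the original) so the function can be tried on made-up data written to a temporary folder. The columns mirror the model formula above, but their values are random and for illustration only.

```r
# Hypothetical variant of clean_and_model_year() that takes the data folder
# as an argument, so it can be run on simulated files in a temporary folder
clean_and_model_year = function(year, data_dir = 'work/data'){
  filename = file.path(data_dir, paste0(year, '.csv'))
  dataset = read.csv(filename)
  missing_score = is.na(dataset$score)
  dataset[missing_score, 'score'] = mean(dataset$score, na.rm = TRUE)
  lm(score ~ class + race + background, data = dataset)
}

# Simulate three small yearly files (made-up data, for illustration only)
set.seed(1)
data_dir = tempdir()
for(year in 2010:2012){
  n = 50
  fake = data.frame(
    class = sample(c('a', 'b'), n, replace = TRUE),
    race = sample(c('x', 'y'), n, replace = TRUE),
    background = sample(c('p', 'q'), n, replace = TRUE),
    score = rnorm(n, mean = 50, sd = 10)
  )
  fake$score[1:3] = NA   # a few missing scores for the function to fill in
  write.csv(fake, file.path(data_dir, paste0(year, '.csv')), row.names = FALSE)
}

# One call per year, collected into a named list of fitted models
years = 2010:2012
models = lapply(years, clean_and_model_year, data_dir = data_dir)
names(models) = years
summary(models[['2011']])
```

Keeping the models in a list, rather than in separately named objects, means adding a year is a one-number change to `years`.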

## Exercise: Breaking Down a Function

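Now that you know all about functions in `R`, let's break down what the following function does, line by line. At a higher level, this function returns a logical matrix with one row for each feature in `rest` and one column for each feature in `target`. When `target` is a single feature, this is a single column of `TRUE`/`FALSE`: `TRUE` in rows where `rest` is within `distance` of `target`, and `FALSE` in rows where it is not. The `distance` is measured in the units of the data's coordinate reference system.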

```r
within_distance = function(target, rest, distance){
  buffer = sf::st_buffer(target, distance)
  hits_buffer = sf::st_intersects(rest, buffer, sparse = FALSE)
  return(hits_buffer)
}
```
1. What is the name of the function?
2. What are the arguments of the function? Judging from the body of the function, what does each argument represent?
3. Using your past knowledge about `sf::st_buffer`, what does the `buffer` object represent?
4. Using your past knowledge about `sf::st_intersects`, what does the `hits_buffer` object represent?
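If you'd like to check your answers, here is a small sketch you can run on made-up points. The coordinates are hypothetical British National Grid (EPSG:27700) eastings and northings in metres, chosen only so that the distances are easy to reason about. The sketch also shows the shape of the result and compares it with `sf::st_is_within_distance()`, which tests the distance directly instead of building a buffer polygon first.

```r
library(sf)

within_distance = function(target, rest, distance){
  buffer = sf::st_buffer(target, distance)
  hits_buffer = sf::st_intersects(rest, buffer, sparse = FALSE)
  return(hits_buffer)
}

# Hypothetical points in EPSG:27700 (units: metres)
target = sf::st_sfc(sf::st_point(c(358000, 173000)), crs = 27700)
rest = sf::st_sfc(
  sf::st_point(c(358100, 173000)),   # 100 m east of target
  sf::st_point(c(359000, 174000)),   # about 1414 m north-east
  sf::st_point(c(359600, 173000)),   # 1600 m east
  crs = 27700
)

hits = within_distance(target, rest, 1500)
dim(hits)          # 3 rows (features in rest) by 1 column (features in target)
as.vector(hits)    # TRUE TRUE FALSE

# Same question, answered without constructing a buffer
alt = sf::st_is_within_distance(rest, target, dist = 1500, sparse = FALSE)
as.vector(alt)     # TRUE TRUE FALSE
```

For points well inside or outside the distance, the two approaches agree. Because `st_buffer()` approximates the circle with a polygon, features lying almost exactly at `distance` could in principle be classified differently.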

# The “Scale” of Deprivation

Now, let's look into the scale of the process in Bristol. Using our `within_distance` function from above, let's plot a few distance bands around the School of Geographical Sciences (SoGS): collections of observations that are within a given number of metres of `geog`. First, though, let's take one step at a time and visualize the observations that are within, say, 1.5 km of SoGS:

```r
is_geog = bristol$LSOA11CD == 'E01014542'
geog = bristol[is_geog, ]

is_within_1500m = within_distance(geog, bristol, 1500)
plot(sf::st_geometry(bristol))
plot(sf::st_geometry(bristol[is_within_1500m, ]),