Today, we’re going to look at local spatial autocorrelation. Like an outlier diagnostic, local spatial autocorrelation measures how the local structure of a spatial relationship around each site either conforms to or departs from what we expect. Together, local spatial statistics form a general branch of statistics that aims to characterize the relationship between a single observation and the sites surrounding it.
Often, local spatial autocorrelation is contrasted with global spatial autocorrelation, which is the structural relationship between sites (in the abstract) and their surroundings (again, in the abstract). This structure can look strongly different depending on what we consider to “surround” each observation. Thus, local statistics are an attempt to measure the geographical behavior of a given social, physical, or behavioral process around each observation.
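As a preview of where this takes us, the most common local statistic, the local Moran’s I, can be computed in R with the spdep package. This is a minimal sketch, not part of this practical’s code: the object df and column x are placeholder names for an sf polygon layer and a numeric variable of interest.

```r
library(spdep)

# df and x are illustrative placeholders, not objects from this practical.
nb = poly2nb(df)        # contiguity neighbours for each polygon
w = nb2listw(nb)        # row-standardised spatial weights
localmoran(df$x, w)     # one local Moran statistic per observation
```

We will build up to this by first revisiting what “outlier” means in an ordinary, a-spatial scatterplot.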
library(sf)
library(mosaic)
bristol = sf::st_read('./data/bristol-imd.shp')
## Reading layer `bristol-imd' from data source `/home/lw17329/OneDrive/teaching/second_year_methods/spatial-practicals/data/bristol-imd.shp' using driver `ESRI Shapefile'
## Simple feature collection with 263 features and 12 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 350239.4 ymin: 166638.9 xmax: 364618.1 ymax: 183052.8
## epsg (SRID): 27700
## proj4string: +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +towgs84=446.448,-125.157,542.06,0.15,0.247,0.842,-20.489 +units=m +no_defs
First, though, let’s refresh our understanding of plain, a-spatial correlation, and how our knowledge of outliers works in two dimensions for a standard relationship between two variables. Here, let’s look at the relationship between housing deprivation and crime in Bristol:
correlation = cor.test(housing ~ crime, data=bristol)
correlation
##
## Pearson's product-moment correlation
##
## data: housing and crime
## t = 7.8199, df = 261, p-value = 1.309e-13
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3322359 0.5287749
## sample estimates:
## cor
## 0.4356841
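For reference, the formula interface used above comes from the mosaic package; it is equivalent to calling base R’s cor.test with the two columns as vectors (assuming bristol has been loaded as above):

```r
# Base R equivalent of the mosaic formula call above.
# Assumes `bristol` has already been read in with st_read().
cor.test(bristol$housing, bristol$crime)
```

Both calls produce the same Pearson correlation estimate and test.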
While there is some correlation between these two scores (areas with more housing deprivation tend to have higher crime), we can also see that not every observation agrees with this trend. Namely, we can highlight one such observation with a really low crime rate but relatively high housing deprivation:
# Scatterplot of housing deprivation against crime
plot(housing ~ crime, data=bristol, pch=19)
# Fitted regression line
abline(lm(housing ~ crime, data=bristol),
col='orangered', lwd=2)
# Dashed reference lines at the mean of each variable
abline(v=mean(bristol$crime), lwd=1.5, col='slategrey', lty='dashed')
abline(h=mean(bristol$housing), lwd=1.5, col='slategrey', lty='dashed')
# Pick out the candidate outlier and highlight it in red
housing_outlier = bristol[bristol$LSOA11CD == 'E01014714',]
points(housing_outlier$crime, housing_outlier$housing, col='red', pch=20)
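The dashed lines drawn at the two means split the scatterplot into four quadrants, so we can also check numerically which quadrant the highlighted observation falls in. This small check is an illustration added here, assuming bristol and housing_outlier as defined above:

```r
# Per the description above, we expect the candidate outlier to sit
# on the low-crime, high-housing-deprivation side of the mean lines.
housing_outlier$crime < mean(bristol$crime)
housing_outlier$housing > mean(bristol$housing)
```

If both comparisons return TRUE, the observation sits in the upper-left quadrant of the plot.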
- Find the four observations with approximately the same housing deprivation as our candidate outlier. Approximately (using your eyeballs, not R), what’s the mean of those four observations’ crime levels?
- Find the three observations with approximately the same crime levels as our candidate outlier. Approximately (using your eyeballs, not R), what’s the mean of those three observations’ housing deprivation?
- Make two maps, one of the housing values and one of the crime values. Highlight the potential outlier (don’t forget, it is named housing_outlier) in white. Describe the difference in the two maps around the housing_outlier. Where is the housing_outlier?

Humans do a few things when they “see” outliers in a scatterplot. Mainly, our intuition about which observations are (visual) outliers comes from the distance between an observation and the center of the data being analyzed. This is why we are apt to think that the red observation is an outlier, but the blue one may not be.
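One common way to quantify this “distance from the center of the data” intuition is the Mahalanobis distance, which measures how far each point sits from the center of the point cloud while accounting for the spread and correlation of the two variables. This is a sketch added for illustration using base R’s mahalanobis(), assuming bristol is loaded as above:

```r
# Squared Mahalanobis distance of every observation from the
# centre of the (crime, housing) point cloud.
X = cbind(bristol$crime, bristol$housing)
d2 = mahalanobis(X, center = colMeans(X), cov = cov(X))
# The observations our eyes pick out as outliers are those with
# the largest distances; list the five most extreme LSOAs.
bristol$LSOA11CD[order(d2, decreasing = TRUE)[1:5]]
```

Observations far from the centroid in this scaled sense are exactly the ones that look like outliers in the scatterplot.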