Stressing the Local

Today, we’re going to look at local spatial autocorrelation. Like a kind of outlier diagnostic, local spatial autocorrelation measures how the local structure of a spatial relationship around each site either conforms to what we expect or differs from it. More broadly, local spatial statistics are a general branch of statistics that aims to characterize the relationship between a single observation and the sites surrounding it.

Often, local spatial autocorrelation is contrasted with global spatial autocorrelation, which is the structural relationship between sites (in the abstract) and their surroundings (again, in the abstract), and this may have a strongly different structure under different ideas of what surrounds each observation. Thus, local statistics are an attempt at measuring the geographical behavior of a given social, physical, or behavioral process around each observation.

bristol = sf::st_read('./data/bristol-imd.shp')
## Reading layer `bristol-imd' from data source `/home/lw17329/OneDrive/teaching/second_year_methods/spatial-practicals/data/bristol-imd.shp' using driver `ESRI Shapefile'
## Simple feature collection with 263 features and 12 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 350239.4 ymin: 166638.9 xmax: 364618.1 ymax: 183052.8
## epsg (SRID):    27700
## proj4string:    +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +towgs84=446.448,-125.157,542.06,0.15,0.247,0.842,-20.489 +units=m +no_defs

First, Correlation

First, though, let’s refresh our understanding of plain, a-spatial correlation, and of how our knowledge of outliers works in two dimensions for a standard relationship between two variables. Here, let’s look at the relationship between housing deprivation and crime in Bristol:

# cor.test's formula method expects a one-sided formula of the form ~ x + y
correlation = cor.test(~ housing + crime, data=bristol)
##  Pearson's product-moment correlation
## data:  housing and crime
## t = 7.8199, df = 261, p-value = 1.309e-13
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3322359 0.5287749
## sample estimates:
##       cor 
## 0.4356841
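
To see what this number summarizes, it can help to rebuild Pearson's correlation by hand. The sketch below is only an illustration: it uses simulated stand-in variables (`crime_sim` and `housing_sim` are hypothetical names, not the Bristol data), standardizes each one, and averages the products of the standardized scores. The same steps should apply to `bristol$crime` and `bristol$housing`.

```r
# Simulated stand-ins for the two deprivation scores (hypothetical data,
# not the Bristol values).
set.seed(42)
crime_sim = rnorm(200)
housing_sim = 0.5 * crime_sim + rnorm(200)

# Standardize each variable: subtract its mean, divide by its standard deviation.
z_crime = (crime_sim - mean(crime_sim)) / sd(crime_sim)
z_housing = (housing_sim - mean(housing_sim)) / sd(housing_sim)

# Pearson's r is the sum of the products of the z-scores, divided by n - 1.
r_by_hand = sum(z_crime * z_housing) / (length(crime_sim) - 1)

# Compare against R's built-in estimate.
r_builtin = cor(housing_sim, crime_sim)
all.equal(r_by_hand, r_builtin)
```

Each product of z-scores is positive when an observation sits on the same side of both means and negative when it sits on opposite sides, which is why drawing the two mean lines on a scatterplot is a useful way to read a correlation.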

While there is some correlation between these two scores (areas with higher housing deprivation tend to have higher crime), we can also see that not every observation agrees with this trend. Namely, we can highlight one such observation with a really low crime score, but relatively high housing deprivation:

plot(housing ~ crime, data=bristol, pch=19)
abline(lm(housing ~ crime, data=bristol), 
       col='orangered', lwd=2)
abline(v=mean(bristol$crime), lwd=1.5, col='slategrey', lty='dashed')
abline(h=mean(bristol$housing), lwd=1.5, col='slategrey', lty='dashed')
housing_outlier = bristol[bristol$LSOA11CD == 'E01014714',]
points(housing_outlier$crime, housing_outlier$housing, col='red', pch=20)

Exercise: Thinking about outliers

  1. There are about four values with our possible outlier’s level of housing deprivation.
  • Approximately (using your eyeballs, not R), what’s the mean of those four observations’ crime levels?
  • Is this mean substantially different from the value at our potential outlier?
  2. There are approximately three observations with the same crime levels as our candidate outlier.
  • Approximately (using your eyeballs, not R), what’s the mean of those three observations’ housing deprivation?
  • Is this mean substantially different from the value at our potential outlier?
  3. Make a map of Bristol housing and crime values. Highlight the potential outlier (don’t forget, it is named housing_outlier) in white. Describe the difference between the two maps around the housing_outlier. Where is the housing_outlier?

Challenge: Statistics as noise & distance

Humans do a few things when they “see” outliers in a scatterplot. Mainly, our intuition about which observations are (visual) outliers comes from the distance between an observation and the center of the data being analyzed. This is why we are apt to think that the red observation is an outlier, but the blue may not be.
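
One rough way to put that intuition into numbers is to measure how far each point sits from the center of the point cloud. The sketch below is a minimal illustration under stated assumptions: the data are simulated (`crime_sim` and `housing_sim` are hypothetical stand-ins, not the Bristol scores), both variables are standardized so the axes are comparable, and the "center" is taken to be the pair of means. It is one possible formalization, not the only one.

```r
# Simulated stand-ins for the two scores (hypothetical data).
set.seed(1)
crime_sim = rnorm(100)
housing_sim = 0.5 * crime_sim + rnorm(100)

# Standardize both variables so that each axis is measured in standard deviations.
z = scale(cbind(crime = crime_sim, housing = housing_sim))

# Euclidean distance from each point to the center, which is (0, 0) after scaling.
dist_to_center = sqrt(rowSums(z^2))

# Index of the observation farthest from the center.
farthest = which.max(dist_to_center)

# Highlight that observation on the scatterplot.
plot(housing_sim ~ crime_sim, pch=19)
points(crime_sim[farthest], housing_sim[farthest], col='red', pch=20)
```

Plain Euclidean distance ignores the tilt of the point cloud. The `mahalanobis()` function in base R's stats package is an alternative that accounts for the correlation between the two variables, so a point far from the trend can score higher than a point that is equally far from the center but lies along the trend.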