Yet Another Geographer

Challenge in Science

I just finished attending the 2018 GIS Research UK conference at Leicester University. I presented twice: once on some new methods in spatial clustering and once for the CDRC Brexit data analysis competition. I had a really good time participating in the data analysis competition, and it struck a chord with me as I reflected on quite a few conversations I’ve had with my friend & colleague Taylor Oshan and on something Morton O’Kelly said at this year’s annual American Association of Geographers meeting.

GISRUK I: CDRC Brexit Analysis Competition

My entry in the Consumer Data Research Centre’s Brexit Data Competition is called “Tension Points: A Theory & Evidence” (static), which I talked about at the 2018 GISRUK conference. There is an abstract describing some of the work that I submitted to get to the final round, but if you’re computationally inclined, you’ll find everything you need to replicate my modelling & analysis in this Jupyter Notebook (raw). You’ll need scikit-learn, pystan, statsmodels, and geopandas at minimum to run it.

GISRUK II: Spatially-Encouraged Spectral Clustering

This paper culminates a bit of work I started after seeing a talk by Phil Chodrow on what eventually became his quite interesting PNAS paper on segregation and entropy surfaces. I was intrigued by the prospect of using spectral clustering for constrained clustering problems. Specifically, I’d known that affinity matrix clustering could be adapted to constrained contexts ever since reading about hierarchical Ward clustering, but I hadn’t seen a really convincing method that showed me how to work this out for a general affinity-matrix clustering method.

Mpl Is Just Fine

I’ve been using matplotlib at least once a week for nearly five years, and I’m still learning things that exist within the pylab interface… not the most ideal UX. For instance, I just learned about plt.axvline, which I could use to draw vertical lines in my code instead of what I usually use, plt.vlines(coordinate_list, *plt.gca().get_ylim()). It’s not as general as plt.vlines, since it draws one line per call and spans the axes instead of taking explicit data limits (plt.axvspan is the call that plots a rectangle). Still, though, for most of what I do (which is a single vertical line marking specific axes/time breaks in a plot) it’s easier.
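To make the comparison concrete, here is a minimal sketch of the two calls side by side. The plotted data and the line positions are made up for illustration; only plt.axvline and plt.vlines come from the post.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
xs = list(range(10))
ax.plot(xs, [v ** 2 for v in xs])  # made-up data, just to give the axes limits

# axvline: one x position per call; by default the line spans the full
# height of the axes, whatever the y-limits end up being.
single = plt.axvline(4, color="k", linestyle="--")

# vlines: many x positions at once, but the y extent is given explicitly
# in data coordinates (here, the current y-limits).
many = plt.vlines([2, 6, 8], *plt.gca().get_ylim(), color="r")

fig.savefig("vertical_lines.png")
```

Because the axvline extent is given as a fraction of the axes, the line keeps spanning the plot if the y-limits change later. The vlines segments stay at whatever limits were passed in.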

Annoscatter

I found this function super useful in my dissertation and more generally in my work. It takes x,y coordinates and a set of strings, and annotates a scatterplot using those strings as labels. For example, here’s a figure from my dissertation where I use it to annotate a plot of regression leverage by year. I jitter the points a little to keep the text labels legible, but basically it’s just a call like annoscatter(df.
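The excerpt cuts off before showing the function, so what follows is only a sketch of how a helper like this could be written, not the original annoscatter. The name annoscatter_sketch, its parameters, and the example values are all hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def annoscatter_sketch(x, y, labels, ax=None, text_kw=None, **scatter_kw):
    """Scatter x against y and write one label next to each point.

    Hypothetical helper: the name and signature are assumptions,
    not the function from the post.
    """
    if ax is None:
        ax = plt.gca()
    ax.scatter(x, y, **scatter_kw)
    text_kw = text_kw or {}
    for xi, yi, label in zip(x, y, labels):
        # Offset each label by a few points so it does not sit on its marker.
        ax.annotate(str(label), xy=(xi, yi), xytext=(3, 3),
                    textcoords="offset points", **text_kw)
    return ax


# Made-up values, purely to exercise the function.
fig, ax = plt.subplots()
annoscatter_sketch([2008, 2010, 2012], [0.10, 0.35, 0.20], ["a", "b", "c"], ax=ax)
fig.savefig("annoscatter_sketch.png")
```

Any jittering would happen on x and y before the call, so the labels follow the jittered positions.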

Spatial Autocorrelation Functions

I looked into using spatial autocorrelation functions in my dissertation to characterize the “scale” at which processes operate electorally. I did an analysis of presidential vote by county, trying to identify where, exactly, clusters of votes tend to become decorrelated. The typical diameter at which the so-called “spatial autocorrelation function” goes to zero denotes how wide a typical spatial cluster might be, and the partial spatial autocorrelation function gives an anticipated order at which spatial autocorrelation may hold.
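One common way to estimate a function like this is a correlogram: compute Moran’s I separately for pairs of observations that fall in successive distance bands, then look for the distance at which it crosses zero. The sketch below assumes that estimator, which may differ from the one used in the dissertation. The function name, the grid, and the bin edges are all made up for illustration.

```python
import numpy as np


def spatial_correlogram(coords, values, bin_edges):
    """Moran's I per distance band, using binary weights within each band.

    Illustrative sketch only; not necessarily the estimator from the post.
    """
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(values, dtype=float)
    z = z - z.mean()                       # center the variable
    diffs = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    cross = np.outer(z, z)                 # z_i * z_j for every pair
    denom = (z ** 2).sum()
    n = len(z)
    out = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        w = (dist >= lo) & (dist < hi)     # pairs falling in this band
        np.fill_diagonal(w, False)         # never pair a point with itself
        s0 = w.sum()
        out.append(np.nan if s0 == 0 else (n / s0) * cross[w].sum() / denom)
    return np.array(out)


# Toy data: a 10 x 10 grid with a smooth trend, so nearby cells are alike.
gx, gy = np.meshgrid(np.arange(10), np.arange(10))
grid_coords = np.column_stack([gx.ravel(), gy.ravel()])
grid_values = (gx + gy).ravel()
correlogram = spatial_correlogram(grid_coords, grid_values, [0, 1.5, 3, 6, 9, 13])
```

On this toy surface the first band is positive and the last is negative. The band where the sign flips is a rough read on how wide a typical cluster is.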

throwing in a spatially-correlated random effect may mess up the fixed effect you love - revisiting Hodges and Reich (2010) for SAR models

    import pysal as ps
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import geopandas as gpd
    %matplotlib inline

This is just a quick demonstration of what I understand from Hodges & Reich (2010)’s argument about the structure of spatial error terms. Essentially, their claim is that the substantive estimates ($\hat{\beta}$) from an ordinary least squares regression over $N$ observations and $P$ covariates: $$ Y \sim \mathcal{N}(X\hat{\beta}, \sigma^2)$$

Does projection matter for compactness indices?

This is a quick and dirty exploration of how changing the projection of the data affects compactness measures.

    import geopandas as gpd
    import numpy as np
    import matplotlib.pyplot as plt
    from compact import reock as _reock
    import pysal as ps
    import seaborn as sns
    %matplotlib inline

The data I’m using is the 113th congressional districts from my Scientific Data publication, sourced originally from Jeff Lewis.

    df = gpd.read_file('./districts113.shp').query('STATENAME not in ("Alaska","Hawaii")')
    df.
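For a rough sense of why projection could matter at all, here is a self-contained toy sketch. It swaps in the Polsby-Popper index (4πA/P²) for the Reock measure imported above, because it only needs area and perimeter. A plain stretch of the x-axis stands in, very crudely, for a projection that distorts one direction more than the other. The function names and shapes are made up for illustration.

```python
import math


def polygon_area_perimeter(ring):
    """Area (shoelace formula) and perimeter of a simple polygon.

    `ring` is a list of (x, y) vertices, without repeating the first one.
    """
    twice_area = 0.0
    perimeter = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0
        perimeter += math.hypot(x1 - x0, y1 - y0)
    return abs(twice_area) / 2.0, perimeter


def polsby_popper(ring):
    """Polsby-Popper compactness: 1 for a circle, smaller for less compact shapes."""
    area, perimeter = polygon_area_perimeter(ring)
    return 4 * math.pi * area / perimeter ** 2


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
stretched = [(2 * x, y) for x, y in square]    # distort one axis only
scaled = [(3 * x, 3 * y) for x, y in square]   # uniform rescaling

pp_square = polsby_popper(square)
pp_stretched = polsby_popper(stretched)
pp_scaled = polsby_popper(scaled)
```

Uniform rescaling leaves the index unchanged, while stretching a single axis lowers it. So, at least in this toy case, what matters is whether a projection distorts shape, not whether it changes the units.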