Geographic Data Science

Updated: 2021-11-26

Levi John Wolf (levi.john.wolf[at]

Office hours: 2-5 PM Mondays (or, of course, by request).

Quick Info:

Lectures are at 5PM on Mondays, local time, delivered online.

Labs are at 9AM on Tuesdays, local time, delivered in person.


Geographic data science is an important emerging set of practices and skills that have become useful in a wide variety of environmental and social sciences. This module introduces students to core concepts in the arrangement and analysis of data. Beyond linear modelling, it offers students an “instrumental” knowledge of various high-level methods in data science, but also a “deeper” route to understanding the fundamental concepts and theory behind many of the estimators used in day-to-day data science. The purpose of this module is twofold. Its immediate aim is to give students a working introduction to the common concepts and concerns that practicing geographic data scientists face. It will include some practical programming and data-cleaning skills, but is mainly oriented towards statistical analysis. This is not a programming course, although it requires some basic programming at the outset to prepare for analysis. The course is focused on analysis, and successful students will need to be able to carry out an analysis from start to finish.

This course assumes a solid understanding of multivariate regression. If you would like to refresh your understanding of linear regression, please consider the review reading listed in the reading section of this document.

Getting Started

A short diagnostic quiz to check your background knowledge is here. You can take it as many times as you like. Your responses are anonymous, and will not be connected to your grade in any way.

If you intend to use your own computer for the unit, make sure you have installed:

Mark Structure

The course will be structured in four blocks:

Final marks are based on a midterm exam and a final exam. The midterm is worth 40% of your overall mark, and the final 60%. The midterm will cover the first two topics. The final will be cumulative, meaning that you’ll be expected to know how to tidy and visualize data by then. We will provide answers to the “interim” workbooks as the course progresses, and there will be one “consolidation” review before the final. For each assessment, answer keys will be posted after the due date, and the answers will be walked through in class.
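As a worked illustration of the 40/60 weighting, write $M$ for the midterm mark and $F$ for the final mark, assuming both are on the same 0-100 scale. The values $M = 60$ and $F = 70$ below are hypothetical, chosen only to show the arithmetic:

```latex
\begin{aligned}
\text{Overall} &= 0.4\,M + 0.6\,F \\
&= 0.4 \times 60 + 0.6 \times 70 \quad (\text{hypothetical } M = 60,\ F = 70) \\
&= 24 + 42 = 66
\end{aligned}
```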

In addition to the timetabled lectures, there may be pre-recorded videos to help explain or discuss specific components of the reading. All lectures will be delivered live online.

The labs are intended as time for peer teaching and learning, so fostering a sense of community is critical for the module.


Data for assessments will be uploaded to Blackboard and linked from the schedule at the bottom of this syllabus. The data required for the course is uploaded here, as well as on Blackboard.


Readings are listed in the schedule. Please attempt the reading each week before the timetabled lecture. In some weeks, there may also be a short recorded lecture to clarify the reading. Readings for the module will be drawn primarily from three sources.

Often, ISL and SR contain very different developments of the same material. Broadly speaking, this is because ISL is written from a “machine learning” perspective and SR from a “statistical” perspective. After the schedule, I discuss where “alternative” readings can be used to understand or cover the topic from a different perspective. You do not have to read both sources.

For reference, other good books to review and consolidate your programming and computation knowledge include:


Lectures are held synchronously on Zoom at 5PM Mondays local time.

One lab practical is held each week on Tuesday at 9AM local time.

I appreciate that this does not leave much time to consolidate what you learn in lecture. So do the reading before the lecture, and be proactive in booking appointments during my Monday afternoon office hours.

Don’t ask, just book.

For all materials I have written, if you change the .html at the end of the URL to .Rmd, you can download the original R Markdown source for that material. For example, the first comprehension material is available at, and the R Markdown used to build that material is
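The .html-to-.Rmd swap described above is mechanical, so it can be scripted. Below is a minimal sketch in Python (an assumption, since the course itself works in R). It assumes, as the text describes, that the .Rmd source sits at the same path as the .html page. The function name html_to_rmd_url and the example URL are hypothetical and are not real course links.

```python
from urllib.parse import urlsplit, urlunsplit


def html_to_rmd_url(url: str) -> str:
    """Return the .Rmd source URL for a rendered .html page (hypothetical helper).

    Assumes the R Markdown source lives at the same path as the .html page,
    differing only in the file extension.
    """
    parts = urlsplit(url)
    if not parts.path.endswith(".html"):
        raise ValueError(f"URL path does not end in .html: {url}")
    # Swap only the extension on the path; query and fragment are untouched.
    new_path = parts.path[: -len(".html")] + ".Rmd"
    return urlunsplit(parts._replace(path=new_path))


# Hypothetical URL, for illustration only
print(html_to_rmd_url("https://example.org/gds/t1.html"))
```

Editing the parsed path, rather than running a plain string replace over the whole URL, means any query string or fragment on the link is left intact.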

Block  | Week starting | Topic                                    | Reading            | Materials
-------+---------------+------------------------------------------+--------------------+-----------
Tidy   | 27 September  | The normal form for data                 | R4DS 12.1-2, Paper | T1
Tidy   | 4 October     | A vocabulary for data shaping            | R4DS 5, 12.3-4     | T2
Viz    | 18 October    | On the Grammar of Graphics               | FDA 1-4            | V1
Viz    | 25 October    | A taxonomy of plots                      | FDA 5, 9, 12, 14   | V2
Reg I  | 1 November    | Theory of Statistical Learning           | ISL 2.1; SR 1.1-2  | R1.1
Reg I  | 8 November    | Regression as a supervised learning task | ISL 3.1-2          | R1.2
Reg II | 15 November   | Consolidation week                       | ISL 3.3-5          | MA, R1.2A
Reg II | 22 November   | Moving beyond the normal task            | ISL 4.1-3          | R2.2
Reg II | 29 November   | Justifying your conclusions              | ISL 5.1            | R2.3
Topic  | 6 December    | Student Choice!                          | ISL 8.1-2          | Trees
Close  | 13 December   | Review and Consolidation                 |                    | Mock Final

NOTE: abbreviations used in the table are covered in the reading section of this document.

Alternative Readings

SR’s chapter on linear regression covers similar material to ISL, but from a statistical perspective, so the two treatments are very different: whereas ISL provides a more “classical” presentation of regression for applied settings, SR focuses on explaining the conceptual basis for regression, working up from basic distributional theory to regression itself. SR’s chapter 5 is, again, similar to ISL 3.3-3.5 but with much greater philosophical and conceptual depth. Equivalents of ISL 4.1-3 exist in SR 9.2, but the level of sophistication may again be more statistical than you need. SR 6 is likewise an analogue of ISL 12, although the two approach the material from very different perspectives.