Nighttime luminosity


Ptolemy’s geography of the earth, 2nd century C.E.

History of GIS


History of geospatial data analysis / Public health

The blue wraith

In 1854, there was an outbreak of cholera in London’s Soho district.
It killed 616 people.


The blue death

Cholera is an infectious disease of the small intestine.
It causes severe dehydration in its victims. It is quite deadly.


The experts

The Board of Health struggled to find the epidemic’s cause.


The scapegoats

The leading theory was that cholera was caused by foul air and poor hygiene.


The geospatial data science

Dr. John Snow believed that cholera was spread by contaminated water, not foul air. To test his theory, he mapped the cholera cases at each address in Soho.


The pump

He found a cluster of cases around a water pump on Broad St.


The public health campaign

He urged authorities to remove the pump’s handle so people couldn’t draw water from it.


Data scientist has policy impact

Cholera cases drop in Soho, supporting Snow’s theory about contaminated water.


History of geospatial data analysis / Military intelligence

A new weapon

During WWII, Germany launched 1,358 V-2 Rockets at London.


No way to intercept, no way to defend

The V-2’s speed and trajectory made it invulnerable to anti-aircraft guns and fighter aircraft.


Terror rains from the sky

V-2 strikes kill 2,724 people in UK + 6,000 civilians and military across Europe.
(+ ~15,000 concentration camp prisoners died constructing the V-2)


Locations of V-2 strikes in London

Some analysts interpreted bomb damage maps as showing that impact sites were clustered. This suggested the V-2’s guidance system was more sophisticated than intelligence estimates had assumed. The Allies tried to jam the V-2’s guidance system, to no effect.


Wartime geostatistics

R.D. Clarke applied a statistical test to assess whether there was any hard evidence of clustering. He divided the study area into 576 equal squares, recorded the number of observed bomb hits in each square (537 total in study area), and counted the number of squares with \(k=0,1,2,3,\dots\) hits.


Poisson probability mass function

Clarke derived the expected number of squares with \(k\) hits from the probability mass function of the Poisson distribution, \(n\,\frac{e^{-\lambda}\lambda^{k}}{k!}\), with \(\lambda=\frac{537}{576}\) (mean hits per square) and \(n=576\) (number of squares).


No. of bombs per square   Expected   Observed
0                         226.74     229
1                         211.39     211
2                         98.54      93
3                         30.62      35
4                         7.14       7
5+                        1.57       1

 

The distribution of observed V-2 strikes conformed quite closely to the Poisson distribution (\(\chi^{2}=1.17, p=0.88\)). If strikes were clustered, we would have seen many more squares with a high number of bombs.
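The test can be retraced with a few lines of Python. This is a minimal sketch, not Clarke's own procedure: the observed counts are taken as his published tallies for squares with 0, 1, 2, 3, 4 and 5+ hits (229, 211, 93, 35, 7, 1), the degrees of freedom are assumed to be 4 (six categories, minus one, minus one estimated parameter), and all variable names are illustrative.

```python
import math

n_squares = 576             # grid squares in the study area
n_hits = 537                # total observed V-2 hits
lam = n_hits / n_squares    # mean hits per square (lambda)

def poisson_pmf(k, lam):
    """Probability of exactly k hits in one square under a Poisson model."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Expected number of squares with k = 0..4 hits; the last category (5+)
# is whatever remains of the 576 squares.
expected = [n_squares * poisson_pmf(k, lam) for k in range(5)]
expected.append(n_squares - sum(expected))

# Observed number of squares with 0, 1, 2, 3, 4, 5+ hits (assumed, see lead-in).
observed = [229, 211, 93, 35, 7, 1]

# Pearson chi-square statistic.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# For 4 degrees of freedom the chi-square upper-tail probability has the
# closed form exp(-x/2) * (1 + x/2), so no statistics library is needed.
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)

print([round(e, 2) for e in expected])
print(round(chi2, 2), round(p_value, 2))
```

A large p-value here means the observed counts are about as close to the Poisson expectation as random scatter would produce.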

 

Conclusion: V-2 impact sites were random, not clustered.
Rocket strikes were indiscriminate (within London), not targeted.


Contemporary uses of geospatial data analysis

 

Example: Track civilian infrastructure damage in Ukraine


Example: Provide situational awareness during military operations


Example: Provide real-time information for emergency management


Example: Analyze residential segregation in American cities


 

Example: Draw new legislative districts


 

Example: Identify crime hotspots

GIS Basics

Types of spatial data


geospatial \(=\) situated in geographic space

 

space is about more than geography

  • “space” refers to any dimension for which a notion of distance between objects can be defined (e.g. social networks, trade, culture, ideology)
  • “geographic space” refers to Earth’s surface and near-surface

geospatial data \(=\) information on “where” \(+\) “what”

 

where: absolute and relative locations of features
(e.g. coordinates, distance, clustering, dispersion)

  • dimension 1: \(x\), horizontal position, longitude, easting
  • dimension 2: \(y\), vertical position, latitude, northing
  • dimension 3: \(z\), elevation, altitude, depth

 

what: properties and attributes of those features
(e.g. vote share, number of fatalities, temperature)

 

 

spatio-temporal data \(=\) info on “where” \(+\) “when” \(+\) “what”

 

when: absolute and relative timing of observation
(e.g. year, day, electoral cycle, round)

  • dimension 4: \(t\), time
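To make the "where" + "when" + "what" structure concrete, the sketch below stores one observation as a Python dataclass and derives a relative location (great-circle distance) and a relative timing between two observations. The class name `Observation`, its field names, the sample coordinates and attribute values, and the Earth radius of 6371 km are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Observation:
    lon: float        # dimension 1: x, longitude (degrees)
    lat: float        # dimension 2: y, latitude (degrees)
    elev_m: float     # dimension 3: z, elevation (metres)
    year: int         # dimension 4: t, time
    attributes: dict = field(default_factory=dict)   # the "what"

def haversine_km(a, b, radius_km=6371.0):
    """Great-circle distance between two observations on a spherical Earth."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(h))

# Two hypothetical events, one degree of longitude apart on the equator.
a = Observation(lon=0.0, lat=0.0, elev_m=10.0, year=1944, attributes={"fatalities": 3})
b = Observation(lon=1.0, lat=0.0, elev_m=25.0, year=1945, attributes={"fatalities": 0})

distance_km = haversine_km(a, b)   # relative location ("where")
years_apart = b.year - a.year      # relative timing ("when")
print(round(distance_km, 1), years_apart)
```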

Example of multi-dimensional data / Battles in space and time

Battles in space Battles in time


Let’s add another dimension

Key: red lines denote pairs of battles with common participant


WWII battles in multidimensional space


Battles (1939-1941), linked by combatant


Battles (1939-1942), linked by combatant


Battles (1939-1943), linked by combatant


Battles (1939-1944), linked by combatant


Battles (1939-1945), linked by combatant


Battles (1939-2011), linked by combatant


 

 

 

Vector data
discrete objects in space

  • point: pair of coordinates
    (e.g. small objects, events)
  • polyline: open, connected set of points (e.g. roads, rivers)
  • polygon: closed, connected set of points (e.g. countries, administrative units)
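The three vector types can be sketched as plain coordinate lists. The example below assumes planar (projected) coordinates in arbitrary units and uses made-up shapes; the helper names `path_length` and `ring_area` are illustrative, not from any GIS library.

```python
import math

# point: a single pair of coordinates
point = (2.0, 3.0)

# polyline: open, connected sequence of points
polyline = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]

# polygon: closed sequence of points (first and last vertex coincide)
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]

def path_length(coords):
    """Sum of straight-line segment lengths along a sequence of points."""
    return sum(math.dist(p, q) for p, q in zip(coords, coords[1:]))

def ring_area(ring):
    """Area of a closed ring via the shoelace formula."""
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice_area) / 2

line_length = path_length(polyline)     # length of the polyline
perimeter = path_length(polygon)        # boundary length of the polygon
area = ring_area(polygon)               # enclosed area of the polygon
is_closed = polygon[0] == polygon[-1]   # what separates polygon from polyline
print(line_length, perimeter, area, is_closed)
```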


Vector data objects


Points


Polyline


Polygon


 

 

 

Raster data
space as continuous field

  • image: regular, equally-spaced grid
  • pixel: individual grid cell
  • each pixel represents value or presence/absence of some quantity of interest (e.g. temperature, rainfall, elevation, land cover)
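A raster can be sketched as a grid of values plus the information needed to place that grid in space. The example below is a minimal illustration with a made-up 3 x 4 grid; the origin (`x_min`, `y_max`), `cell_size`, and the helper `pixel_at` are assumptions for this sketch.

```python
# Hypothetical 3-row x 4-column raster of temperatures; row 0 is the top (north) edge.
grid = [
    [12.0, 12.5, 13.0, 13.5],
    [11.0, 11.5, 12.0, 12.5],
    [10.0, 10.5, 11.0, 11.5],
]

x_min, y_max = 100.0, 50.0   # coordinates of the upper-left corner of the grid
cell_size = 10.0             # width and height of each pixel, in map units

def pixel_at(x, y):
    """Return the (row, col) of the pixel that contains map coordinate (x, y)."""
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        raise ValueError("coordinate falls outside the raster")
    return row, col

row, col = pixel_at(125.0, 38.0)
value = grid[row][col]       # value of the quantity of interest at that location
print(row, col, value)
```

Every location inside the grid extent maps to exactly one pixel, which is what makes the raster a continuous-field representation rather than a set of discrete objects.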


Raster data


Vector or raster?




Where to find (free/open-source) spatial data?

Coordinates and basemaps:

Geo-referenced data:

A large number of links is also available at

This is not a comprehensive list


Data file formats

Vector data:

  • GeoJSON (based on JSON, JavaScript Object Notation) is the new standard for vector data
  • but points, polylines, polygons are often stored in older Shapefile format
    • each Shapefile includes: shapes/geometries (.shp), positional index (.shx), attribute table (.dbf)
    • sometimes also includes: projection (), spatial index (), metadata (), other elements
  • other common formats include
    • GDB/MDB (File/Personal Geodatabase)
    • KML/KMZ (Keyhole Markup Language, used for Google Earth)
    • OSM (OpenStreetMap’s XML-based file format)
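For a sense of what GeoJSON looks like, the sketch below builds a one-feature collection with Python's standard `json` module, writes it to text, and reads it back. The coordinates and property values are made up for illustration; the only format facts relied on are the `Feature` / `FeatureCollection` structure and the [longitude, latitude] coordinate order.

```python
import json

# One point feature: geometry holds the "where", properties hold the "what".
feature = {
    "type": "Feature",
    "geometry": {
        "type": "Point",
        "coordinates": [-0.1367, 51.5133],   # [longitude, latitude], illustrative values
    },
    "properties": {"name": "example point", "cases": 5},
}

collection = {"type": "FeatureCollection", "features": [feature]}

# Serialize to GeoJSON text (what would be saved in a .geojson file) ...
text = json.dumps(collection, indent=2)

# ... and parse it back into Python objects.
parsed = json.loads(text)
lon, lat = parsed["features"][0]["geometry"]["coordinates"]
print(lon, lat, parsed["features"][0]["properties"]["cases"])
```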

Raster data:

  • common formats include

    • ASC (ASCII delimited text file)
    • GeoTIFF (georeferenced TIFF image file)
    • IMG (ERDAS Imagine file)
    • DEM (Digital Elevation Model)
    • DTED (Digital Terrain Elevation Data)
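The ASC format is simple enough to read by hand. The sketch below parses a tiny, made-up ASCII grid; it assumes the file has exactly six header lines (`ncols`, `nrows`, `xllcorner`, `yllcorner`, `cellsize`, `NODATA_value`), which is common but not guaranteed, and the helper name `parse_asc` is illustrative.

```python
# A tiny hypothetical ASCII grid: 6 header lines, then one line per raster row
# (top row first). -9999 marks cells with no data.
asc_text = """ncols 3
nrows 2
xllcorner 0.0
yllcorner 0.0
cellsize 10.0
NODATA_value -9999
1 2 3
4 -9999 6
"""

def parse_asc(text, n_header=6):
    """Split an ASCII grid into a header dict and a list of rows of cell values."""
    lines = text.strip().splitlines()
    header = {}
    for line in lines[:n_header]:
        key, val = line.split()
        header[key.lower()] = float(val)
    nodata = header["nodata_value"]
    rows = [
        [None if float(v) == nodata else float(v) for v in line.split()]
        for line in lines[n_header:]
    ]
    return header, rows

header, rows = parse_asc(asc_text)
print(header["ncols"], header["nrows"], rows)
```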

Software options


Popular software for the analysis of spatial data

Application   Availability   Learning Curve   Key Functionality
ArcGIS        License        Medium           Geoprocessing, visualization, georeferencing
QGIS          Free           Medium           Geoprocessing, visualization, georeferencing
GRASS         Free           High             Image processing, spatial modeling
Matlab        License        High             Spatial econometrics, basic visualization
Stata         License        Medium           Spatial econometrics, basic visualization
Python        Free           High             Geoprocessing, visualization, geostatistics, spatial econometrics, point processes
R             Free           High             Geoprocessing, visualization, geostatistics, spatial econometrics, point processes

We will be using QGIS and R


 

Software & programming

  1. QGIS (option 1)
    1. free, open-source alternative to ESRI ArcGIS
    2. visualize, manage, edit, analyze spatial data, create maps
    3. intuitive graphical user interface (GUI)
    4. multiplatform (runs on Linux, Mac, Windows, Android)
    5. download it here: qgis.org


 

QGIS


 

 

 

Software & programming

  1. R (option 2)
    1. open-source statistical programming language
    2. can do most of what QGIS does, and much more
    3. can run R from the command line
      … or using source code editor
      (e.g. Sublime Text, XEmacs)
      … or using integrated development environment (e.g. RStudio Cloud)
    4. also multiplatform (runs on Linux, Mac, Windows, Android)
    5. download R here: r-project.org … or RStudio here: posit.co


R

RStudio


 

 

  1. RStudio Cloud (option 2.5)
    1. same as RStudio, but accessible through web browser
    2. advantages:
      • packages/dependencies already installed
      • no software to download
    3. all R lab exercises will be made available through RStudio Cloud
    4. you can access it through link posted on Canvas
    5. set up RStudio Cloud account w/ your georgetown.edu credentials
    6. link to sign-up page: posit.cloud


 

 

 

RStudio Cloud


Geospatial analysis in R

Task                         R Packages
Data management              sf, terra, rgdal, rgeos, rmapshaper
Integration with other GIS   rgdal, RArcInfo, SQLiteMap, spgrass6, rpostgis, RPyGeo, RQGIS, R2WinBUGS
Access spatial data          RgoogleMaps, rnaturalearth, geonames, OpenStreetMap
Point pattern analysis       spatstat, splancs, spatialkernel
Geostatistics                gstat, geoR, geoRglm, spBayes
Disease mapping              DCluster, spgwr, glmmBUGS, diseasemapping
Spatial regression           spdep, spatcounts, McSpatial, splm, spatialprobit, mgcv, spatialreg

Full(-ish) list: cran.r-project.org/web/views/Spatial.html


\(\to\) LAB EXERCISE 1