What are Geographic Information Systems?
Policy applications
Scientific applications
History of geospatial data analysis / Public health
The blue wraith
In 1854, there was an outbreak of cholera in London’s Soho district.
It killed 616 people.
The blue death
Cholera is an infectious disease of the small intestine.
It causes severe dehydration in its victims. It is quite deadly.
The experts
The Board of Health struggled to find the epidemic’s cause.
The scapegoats
The leading theory was that cholera was caused by foul air and poor hygiene.
The geospatial data science
Dr. John Snow believed that cholera was spread by contaminated water, not foul air. To test his theory, he mapped the cholera cases at each address in Soho.
The pump
He found a cluster of cases around a water pump on Broad St.
The public health campaign
He urged authorities to remove the pump’s handle so people couldn’t draw water from it.
Data scientist has policy impact
Cholera cases drop in Soho, supporting Snow’s theory about contaminated water.
History of geospatial data analysis / Military intelligence
A new weapon
During WWII, Germany launched 1,358 V-2 Rockets at London.
No way to intercept, no way to defend
The V-2’s speed and trajectory made it invulnerable to anti-aircraft guns and fighter aircraft.
Terror rains from the sky
V-2 strikes kill 2,724 people in UK + 6,000 civilians and military across Europe.
(+ ~15,000 concentration camp prisoners died constructing the V-2)
Locations of V-2 strikes in London
Some analysts read bomb damage maps as showing that impact sites were clustered. This suggested that the V-2’s guidance system was more sophisticated than intelligence estimates had assumed. The Allies tried to jam the V-2’s guidance system, to no effect.
Wartime geostatistics
R.D. Clarke applied a statistical test to assess whether there was any hard evidence of clustering. He divided the study area into 576 equal squares, recorded the number of observed bomb hits in each square (537 in total), and counted the number of squares with \(k=0,1,2,\dots\) hits.
Poisson probability mass function
Clarke derived the expected number of squares with \(k\) hits from the probability mass function of the Poisson distribution, \(n\,\frac{e^{-\lambda}\lambda^{k}}{k!}\), where \(\lambda=\frac{537}{576}\) is the mean number of hits per square and \(n=576\) is the number of squares.
No. of hits per square (k) | Expected no. of squares | Observed no. of squares |
---|---|---|
0 | 226.74 | 229 |
1 | 211.39 | 211 |
2 | 98.54 | 93 |
3 | 30.62 | 35 |
4 | 7.14 | 7 |
5+ | 1.57 | 1 |
The distribution of observed V-2 strikes conformed quite closely to the Poisson distribution (\(\chi^{2}=1.17, p=0.88\)). If strikes were clustered, we would have seen many more squares with a high number of bombs.
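The calculation can be reproduced in a few lines. This is a minimal sketch, not Clarke's own procedure: it assumes the observed counts refer to squares with 0, 1, 2, 3, 4, and 5 or more hits (229, 211, 93, 35, 7, 1, which sum to 576), and it assumes 4 degrees of freedom (6 categories, minus 1, minus 1 for the estimated \(\lambda\)). Variable and function names are illustrative.

```python
import math

N_SQUARES = 576              # number of grid squares in the study area
N_HITS = 537                 # total number of observed hits
lam = N_HITS / N_SQUARES     # mean number of hits per square

def poisson_pmf(k, lam):
    """Probability of exactly k events under a Poisson(lam) distribution."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Observed number of squares with k = 0, 1, 2, 3, 4, 5+ hits
# (assumed reading of the table; the counts sum to 576).
labels = ["0", "1", "2", "3", "4", "5+"]
observed = [229, 211, 93, 35, 7, 1]

# Expected number of squares: n * P(K = k); the last bin is the tail P(K >= 5).
probs = [poisson_pmf(k, lam) for k in range(5)]
probs.append(1.0 - sum(probs))
expected = [N_SQUARES * p for p in probs]

# Pearson chi-square statistic comparing observed and expected counts.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Upper-tail probability of a chi-square with 4 degrees of freedom.
# For df = 4 the survival function has the closed form exp(-x/2) * (1 + x/2).
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)

for label, e, o in zip(labels, expected, observed):
    print(f"{label:>2} hits: expected {e:7.2f}, observed {o:3d}")
print(f"chi-square = {chi2:.2f}, p = {p_value:.2f}")
```

A large p-value means the observed counts are about as close to the Poisson expectation as random sampling would produce, so the data give no evidence of clustering.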
Conclusion: V-2 impact sites were random, not clustered.
Rocket strikes were indiscriminate (within the city of London), not targeted.
Contemporary uses of geospatial data analysis
Example: Track violence in the Russia-Ukraine War
Example: Provide situational awareness during military operations
Example: Provide real-time information for emergency management
Example: Analyze residential segregation in American cities
Example: Draw new legislative districts
Example: Identify crime hotspots
Example: Find a public restroom
Example: Find your way home
Goals of the class
How will we learn?
Learn new methods
Apply them to research
Research “walk-throughs”
Like this, but for GIS
Grading
Don’t worry
Final Project
geospatial \(=\) situated in geographic space
space is about more than geography
geospatial data \(=\) information on “where” \(+\) “what”
where: absolute and relative locations of features
(e.g. coordinates, distance, clustering, dispersion)
what: properties and attributes of those features
(e.g. vote share, number of fatalities, temperature)
spatio-temporal data \(=\) info on “where” \(+\) “when” \(+\) “what”
when: absolute and relative timing of observation
(e.g. year, day, electoral cycle, round)
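One way to make the "where" + "when" + "what" definition concrete is a small record type. The sketch below is illustrative only: the `Observation` class, the `haversine_km` helper, and the attribute values are hypothetical, and the coordinates are approximate city-centre locations for London and Paris. It shows an absolute location (coordinates), a relative location (distance between two features), and relative timing (days between observations).

```python
import math
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Observation:
    """A single spatio-temporal record (hypothetical structure)."""
    lon: float                                       # where: longitude, decimal degrees
    lat: float                                       # where: latitude, decimal degrees
    when: date                                       # when: date of observation
    attributes: dict = field(default_factory=dict)   # what: properties of the feature

def haversine_km(a, b):
    """Great-circle distance between two observations, in kilometres."""
    earth_radius_km = 6371.0  # mean Earth radius (spherical approximation)
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlambda = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(h))

# Two made-up records; the attribute values are placeholders, not real data.
london = Observation(lon=-0.1276, lat=51.5072, when=date(2020, 1, 1),
                     attributes={"temperature_c": 7.0})
paris = Observation(lon=2.3522, lat=48.8566, when=date(2020, 1, 31),
                    attributes={"temperature_c": 5.0})

distance_km = haversine_km(london, paris)        # relative location
days_apart = (paris.when - london.when).days     # relative timing
print(f"{distance_km:.0f} km apart, {days_apart} days apart")
```

The haversine formula treats the Earth as a sphere, which is usually adequate for illustration but not for precise measurement.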
Example of multi-dimensional data / Battles in space and time
Let’s add another dimension
Key: red lines denote pairs of battles with common participant
WWII battles in multidimensional space
Battles (1939-1941), linked by combatant
Battles (1939-1942), linked by combatant
Battles (1939-1943), linked by combatant
Battles (1939-1944), linked by combatant
Battles (1939-1945), linked by combatant
Battles (1939-2011), linked by combatant
Vector data
discrete objects in space
Vector data objects
Points
Polyline
Polygon
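The three vector object types can be sketched as plain coordinate lists. The example below uses GeoJSON-style geometry dictionaries ("Point", "LineString", "Polygon") with made-up planar coordinates; the helper functions `line_length` and `ring_area` are hypothetical names, and the calculations assume a flat (projected) coordinate system rather than longitude/latitude.

```python
import json
import math

# Point: a single coordinate pair.
point = {"type": "Point", "coordinates": [2.0, 1.0]}

# Polyline: an ordered sequence of vertices.
line = {"type": "LineString",
        "coordinates": [[0.0, 0.0], [3.0, 4.0], [3.0, 10.0]]}

# Polygon: a closed ring (first vertex repeated at the end).
polygon = {"type": "Polygon",
           "coordinates": [[[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [0.0, 3.0], [0.0, 0.0]]]}

def line_length(coords):
    """Sum of straight-line segment lengths between consecutive vertices."""
    return sum(math.dist(p, q) for p, q in zip(coords, coords[1:]))

def ring_area(ring):
    """Planar area of a closed ring using the shoelace formula."""
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice_area) / 2

length = line_length(line["coordinates"])       # 5.0 + 6.0 = 11.0
area = ring_area(polygon["coordinates"][0])     # 4 x 3 rectangle = 12.0

# A feature pairs a geometry ("where") with attributes ("what").
feature = {"type": "Feature", "geometry": polygon,
           "properties": {"name": "example district"}}
print(length, area)
print(json.dumps(feature))
```

Each object is discrete: it has a definite boundary and its own attribute record, which is what distinguishes vector data from a continuous field.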
Raster data
space as continuous field
Raster data
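A raster can be sketched as a grid of cell values plus the georeferencing needed to place the grid in space. The `Raster` class below is a hypothetical, simplified structure (square cells, north-up grid, made-up elevation values), not the layout of any particular file format. It shows how a coordinate maps to a cell, so that every location inside the extent has a value.

```python
from dataclasses import dataclass

@dataclass
class Raster:
    """A north-up grid of square cells (hypothetical, simplified structure)."""
    values: list       # list of rows; row 0 is the northernmost row
    x_min: float       # x coordinate of the grid's left edge
    y_max: float       # y coordinate of the grid's top edge
    cell_size: float   # width and height of each cell, in map units

    def cell_index(self, x, y):
        """Return the (row, col) of the cell containing the point (x, y)."""
        col = int((x - self.x_min) // self.cell_size)
        row = int((self.y_max - y) // self.cell_size)
        n_rows, n_cols = len(self.values), len(self.values[0])
        if not (0 <= row < n_rows and 0 <= col < n_cols):
            raise IndexError("point lies outside the raster extent")
        return row, col

    def value_at(self, x, y):
        """Look up the cell value at the point (x, y)."""
        row, col = self.cell_index(x, y)
        return self.values[row][col]

# Made-up elevation grid: 3 rows x 4 columns, 10-unit cells, top-left corner at (0, 30).
elevation = Raster(
    values=[[12, 14, 15, 13],
            [11, 13, 16, 14],
            [10, 12, 14, 12]],
    x_min=0.0, y_max=30.0, cell_size=10.0,
)

print(elevation.cell_index(25.0, 22.0))   # (0, 2)
print(elevation.value_at(25.0, 22.0))     # 15
```

Unlike vector objects, the grid stores no boundaries: resolution is fixed by the cell size, and finer detail requires smaller cells.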
Vector or raster?
Where to find (free/open-source) spatial data?
Coordinates and basemaps:
geonames.org
geoboundaries.org, gadm.org
www.usgs.gov/centers/eros
Geo-referenced data:
nhgis.org
cambridgema.gov/GIS
sedac.ciesin.columbia.edu
x-sub.org, ucdp.uu.se
payneinstitute.mines.edu/eog/
electiondataarchive.org, cdmaps.polisci.ucla.edu
A large number of links is also available at
This is not a comprehensive list
Data file formats
Vector data:
GeoJSON (JavaScript Object Notation) is the new standard for vector data
Shapefile format
Shapefile includes: shapes/geometries (.shp), positional index (.shx), attribute table (.dbf)
GDB/MDB (File/Personal Geodatabase)
KML/KMZ (Keyhole Markup Language, used for Google Earth)
OSM (OpenStreetMap’s XML-based file format)
Raster data:
common formats include
ASC (ASCII delimited text file)
GeoTIFF (georeferenced TIFF image file)
IMG (ERDAS Imagine file)
DEM (Digital Elevation Model)
DTED (Digital Terrain Elevation Data)
Popular software for the analysis of spatial data
Application | Availability | Learning Curve | Key Functionality |
---|---|---|---|
ArcGIS | License | Medium | Geoprocessing, visualization, georeferencing |
QGIS | Free | Medium | Geoprocessing, visualization, georeferencing |
GRASS | Free | High | Image processing, spatial modeling |
Matlab | License | High | Spatial econometrics, basic visualization |
Stata | License | Medium | Spatial econometrics, basic visualization |
Python | Free | High | Geoprocessing, visualization, geostatistics, spatial econometrics, point processes |
R | Free | High | Geoprocessing, visualization, geostatistics, spatial econometrics, point processes |
We will be using QGIS and R
Software & programming
qgis.org
QGIS
Software & programming
r-project.org
… or RStudio here: posit.co
R
RStudio
harvard.edu credentials
posit.cloud
RStudio Cloud
Geospatial analysis in R
Task | R Packages |
---|---|
Data management | sf, terra, rgdal, rgeos, rmapshaper |
Integration with other GIS | rgdal, RArcInfo, SQLiteMap, spgrass6, rpostgis, RPyGeo, RQGIS, R2WinBUGS |
Access spatial data | RgoogleMaps, rnaturalearth, geonames, OpenStreetMap |
Point pattern analysis | spatstat, splancs, spatialkernel |
Geostatistics | gstat, geoR, geoRglm, spBayes |
Disease mapping | DCluster, spgwr, glmmBUGS, diseasemapping |
Spatial regression | spdep, spatcounts, McSpatial, splm, spatialprobit, mgcv, spatialreg |
Full(-ish) list: cran.r-project.org/web/views/Spatial.html
QGIS and R help at HKS
@belle_lipton
@james_adams, @james_capobianco