Goal: explain geographic variation in Islamic State violence
Operation Inherent Resolve
There is no formal Lab Exercise this week, but if you would like to replicate this in QGIS, follow along (for walk-through in R, see replication code on Canvas)
Overview of lab exercise
Data on political violence are easily found online. If we search for georeferenced violent events data
, a top result is for “UCDP Dataset Download Center”
The Uppsala Conflict Data Program (UCDP) is one of the world’s leading providers of open-source data on violence and armed conflict, based at Uppsala University
Download-able data are located at ucdp.uu.se/downloads
.
Click on “Georeferenced Event Dataset (GED)”
We have used GED before, for Problem Set 02 (Afghanistan and Yemen).
Download GED as a csv
file and extract it from the zip
archive
There are about 317K events here, each of which has point coordinates
For data on administrative boundaries, Rick Wilson’s “Free GIS Data” site has links to multiple data sources. Of these, we will use gadm.org
From GADM’s website, we can get data on sub-national administrative boundaries. Download the level2
GeoJSON
file for Iraq…
… and for Syria
Multiple data sources exist for roads. Among the easiest to get are those from diva-gis.org/gData
These are country-level road layers from “Digital Chart of the World”, download-able as zipped shapefiles
Download the roads data for both Iraq and Syria here
For population data, we can go back to Rick Wilson’s page to find the link for “Gridded Population of the World (GPW)”
This will take us to the Center for International Earth Science Information Network (CIESIN) at Columbia, which hosts a large number of global, free GIS datasets
Navigate to GPW’s “Population Count v4.11” dataset, and download the files for 2010 (you will need to create an account to login)
The data are available for multiple years, in multiple formats and resolutions. We will be using the 15-minute GeoTiff raster for 2010 (latest before ISIS emerged in 2014)
Also from CIESIN, we can download data on cropland (“Croplands, v1 (2000)”)
Download the global raster, in GeoTiff format
While we’re here, let’s also download data on hydroelectric dams (“Global Reservoir and Dam (GRanD), v1”)
For data on sectarian divisions, a relatively reliable global source is the “Ethnic Power Relations” (geoEPR) dataset — we used this in Problem Set 02
Specifically, we will download the GeoEPR-2021.geojson
file, which contains polygons for each ethno-religious group (icr.ethz.ch/data/epr/geoepr
)
Here is the full list of data sources and links:
Category | Type | Format | Data source |
---|---|---|---|
ISIS violence | Table (non-geo) | .csv |
UCDP GED |
Administrative units | Vector (polygons) | .geojson |
GADM |
Roads | Vector (polylines) | .shp |
DIVA-GIS |
Population | Raster | .tif |
CIESIN |
Cropland | Raster | .tif |
CIESIN |
Dams | Vector (points) | .shp |
CIESIN |
Sectarian divisions | Vector (polygons) | .geojson |
ETH-Zurich |
These are all in the WT01.zip
file posted on Canvas.
Always save your progress!
Go to Project
\(\to\) Save As...
Load administrative boundaries (Layer
\(\to\) Add Layer
\(\to\) Add Vector Layer
).
For Iraq: gadm41_IRQ_2.json
file in Data/GADM
folder.
For Syria: gadm41_SYR_2.json
file in Data/GADM
folder.
The two sets of district boundaries should be visible in the project window.
Let’s merge them into a single layer, with districts for both countries.
Open the Merge Vector Layers
tool (Vector
\(\to\) Data Management Tool
\(\to\) Merge Vector Layers
).
Click on the [...]
box next to Input layers
Check the boxes next to gadm41_IRQ_2
and gadm41_SYR_2
. Click OK
.
Click on the [...]
box next to Merged
. Find a location in which to save the output, and name the file admin.geojson
. Click Run
The merged layer should appear in the main project window.
You can remove the gadm41_IRQ_2
and gadm41_SYR_2
layers from the project (Right-click
on layer(s) in Layers menu \(\to\) Remove Layer...
)
To conserve memory, it’s good practice to remove data objects were are not actively using
Load roads data (Layer
\(\to\) Add Layer
\(\to\) Add Vector Layer
).
Try opening two datasets at the same time: IRQ_roads.shp
and SYR_roads.shp
files in Data/Roads
Paths to both files should appear in the Source
box. Click Add
Let’s repeat the Merge Vector Layers
procedure for these two layers (Vector
\(\to\) Data Management Tool
\(\to\) Merge Vector Layers
).
This time, perform the merge on the IRQ_roads
and SYR_roads
layers
Save the output file as roads.geojson
As before, you can remove the two country-specific roads layers (we no longer need them)
Let’s calculate each district’s road density with the Sum Line Lengths
tool (Vector
\(\to\) Analysis
\(\to\) Sum Line Lengths
)
Set Polygons
\(=\) admin
, Lines
\(=\) roads
, Line length field name
\(=\) roads_m
, Lines count field name
\(=\) roads_ct
. Save as admin_2.geojson
Open the attribute table for admin_2
. The roads_m
and roads_ct
variables should be in the table. Let’s convert from meters to kilometers
Open Field Calculator
for admin_2
. Create new field, roads_km
, of type Decimal number (real)
. For the Expression
, write roads_m/1000
. The Output preview
should show a number with decimals. Click OK
The resulting roads_km
field should be visible in the attribute table
If we plot this variable, we can see its main limitations: smaller districts have less road length, so this becomes a proxy for district size. Let’s create a road density measure that takes into account district size
Back in the Field Calculator
, create a new field called area_km2
of type Decimal number (real)
. For Expression
, write $area/1000000
(divide by 1M to convert from m\(^2\) to km\(^2\)). Click OK
Open the Field Calculator
again. Create a new field called road_density
of type Decimal number (real)
. For Expression
, write roads_km/area_km2
Check to make sure road_density
is added to attribute table
If we plot this variable, we see the opposite pattern from before (smaller, urban districts have higher road density)
Save the edits you just made to the admin_2
layer (right click \(\to\) Save Layer Edits
)
Load population raster (Layer
\(\to\) Add Layer
\(\to\) Add Raster Layer
).\ Open the gpw_v4_population_count_rev11_2010_15_min.tif
file in Data/GPW
folder.
The (global) population layer should now be visible. Let’s extract a subset of just the part of this layer that overlaps with Syria and Iraq.
Open the Clip Raster by Extent
tool (Raster
\(\to\) Extension
\(\to\) Clip Raster by Extent
)
Set Input layer
\(=\) gpw_v4_population_count_rev11_2010_15_min
. Click the [...]
button next to Clipping extent
\(\to\) Calculate from Layer
\(\to\) admin_2
Save the file as pop.tif
. Accept defaults for all other parameters, and click Run
A new layer called pop
should appear. You can remove the original, global gpw_v4...
layer
Let’s calculate the population size of each district. Open the Processing Toolbox
and open the Zonal statistics
tool in the Raster Analysis
menu
In the Zonal Statistics
tool, set Input layer
\(=\)admin_2
, Raster layer
\(=\)pop
, and Output column prefix
\(=\)pop_
.
Click the [...]
button next to Statistics to calculate
Check the box next to Sum
. Leave all others un-checked. Click OK
Save the zonal statistics output as admin_3.geojson
. Click Run
This should have generated a new layer, admin_3
, with a new field, pop_sum
(check the layer’s attribute table)
Load cropland raster (Layer
\(\to\) Add Layer
\(\to\) Add Raster Layer
).
Open the cropland.tif
file in Data/Cropland
folder.
The (global) crops layer should now be visible. Let’s extract a subset of it, as we just did for population
Open the Clip Raster by Extent
tool (Raster
\(\to\) Extension
\(\to\) Clip Raster by Extent
). Set Input layer
\(=\) cropland
. Click the [...]
button next to Clipping extent
\(\to\) Calculate from Layer
\(\to\) admin_3
Save the output to a file called crops.tif
Make sure everything looks right. Remove the original, global cropland
layer
Let’s calculate crop cultivation in each district. Open the Zonal statistics
tool in Processing Toolbox
. Set Input layer
\(=\)admin_3
, Raster layer
\(=\)crops
, and Output column prefix
\(=\)crops_
.
Click the [...]
button next to Statistics to calculate
Check the box next to Mean
. Leave all others un-checked. Click OK
, then Run
on the next screen
Save the output to a file called admin_4.geojson
This should have generated a new field, crops_mean
in the new admin_4
layer (check the layer’s attribute table)
Load hydroelectric dam locations (Layer
\(\to\) Add Layer
\(\to\) Add Vector Layer
). Open the GRanD_dams_v1_1.shp
file in Data/Dams
folder.
The dam locations should appear on the project window. Rename it from GRanD_dams_v1_1
to dams
Let’s calculate the number of dams per district. Open the Count Points in Polygon
tool (Vector
\(\to\) Analysis Tools
\(\to\) Count Points in Polygon
)
Select Polygons
\(=\) admin_4
, Points
\(=\) dams
. Name the count field dams
, and save the output file as admin_5.geojson
. Click Run
Add the sectarian divisions data to the project, using Add Vector Layer...
.\ Load the GeoEPR-2021.geojson
file in Data/GeoEPR
folder
The (global) ethnicity layer should appear in the project window. Let’s calculate the proportion of each district populated by Sunni Arabs. Let’s first extract the subset that overlaps with our study area.
Open the Clip
(vector) tool (Vector
\(\to\) Geoprocessing Tools
\(\to\) Clip
)
Set Input layer
\(=\) GeoEPR
, Overlay layer
\(=\) admin_5
. Save the file as ethnicity.geojson
. Click Run
Uh oh! Execution failed! The error message says that GeoEPR-2021
contains “invalid geometries”. How do we fix this?
Many “invalid geometry” problems can be solved with the “zero buffer trick”. Go to Vector
\(\to\) Geoprocessing Tools
\(\to\) Buffer...
Set Input layer
\(=\) GeoEPR-2021
and Distance
\(=\) 0
. Keep the defaults for the other parameters and save the file as geoepr_0buffer.geojson
. Click Run
Now let’s try the clipping operation again. To save time, go to Processing
\(\to\) History...
In the Processing History
window, double-click on Clip
(it should be the second-most recent item)
Set the Input layer
\(=\) geoepr_0buffer
. Click Run
This may take a few minutes, but should finish without errors…
The clipped layer should appear in the project window. You can remove the full GeoEPR
layer to save memory
Look at the group
field in the attribute table for ethnicity
. There are multiple features for several groups (Kurds, Alawites, etc.). Let’s extract Sunni Arab polygons only
Open the Select by Expression
tool (Edit
menu \(\to\) Select
\(\to\) Select Features by Expression
)
Let’s use regular expressions to extract the polygons we need. Set Expression
\(=\) regexp_match(group,'Sunni')
. Click Select Features
The selected polygons should appear yellow on the map. Let’s extract the selection into a new layer. Right-click ethnicity
in the Layer menu, go to Export
\(\to\)Save Selected Features As
Save the layer as sunni.geojson
. Make sure the box is checked next to Save only selected features
. Click OK
The extracted selection should appear. Hide the previous layer ethnicity
The attribute table for sunni
should now contain only features where the group
field contains the word Sunni
. Let’s consolidate these features into one polygon
Open the Dissolve
tool (Vector
\(\to\) Geoprocessing Tools
\(\to\) Dissolve
).
Set Input layer
\(=\)sunni
. Save the output as sunni_2.geojson
The dissolved polygon should appear in the project window. Now let’s calculate the proportion of each district populated by Sunnis
Open the Overlap Analysis
tool (in Processing Toolbox
\(\to\) Vector Analysis
). Set Input layer
\(=\) admin_5
and click the [...]
button next to Overlay layers
Check the box next to sunni_2
. Click OK
Save the output as admin_6.geojson
and click Run
By default, the overlap fields in admin_6
will be named sunni_2_area
and sunni_2_pc
. Let’s change the name of these fields
Open the layer Properties
for admin_6
, go to the Fields
tab and click on the “pencil” button (Toggle editing mode
)
Scroll down to sunni_2_pc
and double-click on its name
Rename the field prop_sunni
Hit Enter key (or Return key) to commit the name change. Click on the “pencil” again to leave editing mode
When prompted, save the changes you just made to admin_6
While we’re at it, let’s create a couple additional fields that could come in handy in the analysis. Create a population density field, named pop_density
, of type Decimal number (real)
, with Expression
set to pop_sum/area_km2
Let’s also create a rescaled population field (1000’s of residents). Name it pop_1000
, with type Decimal number (real)
. Set Expression
to pop_sum/1000
Save the layer edits to admin_6
!
Add the Islamic State violence data to the project, using Add Delimited Text Layer...
. Load the GEDEvent_v23.csv
file in Data/GED
folder. Set X field
\(=\) longitude
and Y field
\(=\) latitude
. Check the box next to Use spatial index
The (global) GED violent events layer should appear. There are several hundred thousand points here. We need to extract events in Syria and Iraq involving the Islamic State
To figure out how to extract this data subset (by actor and location), let’s explore the attribute table. We see fields for side_a
and side_b
, which list actors
There is also a country
field. So, we need to select points where country
is Iraq or Syria and the Islamic State is side_a
(or side_b
)
Highlight the GED layer and go to Edit
\(\to\) Select
\(\to\) Select by Expression...
Let’s combine regular expressions with logical operators. Set Expression
to
regexp_match(country,'Syria|Iraq') AND (
regexp_match(side_a,'IS|Islamic State') OR
regexp_match(side_b,'IS|Islamic State') )
The vertical slash |
is a regular expression for “OR”. So, in English this expression means ‘features where the field country
contains “Iraq or Syria” and either side_a
or side_b
contains “IS or Islamic State”’
This should select a little over 22,000 events. Now let’s calculate the number of ISIS attacks per district
Open the Count Points in Polygon
tool
Select Polygons
\(=\) admin_6
, Points
\(=\) GEDEvent_v23_1
. Make sure the box is checked next to Selected Features Only
for the points. Name the count field is_violence
, and save the output file as admin_7.geojson
. Click Run
This may take a minute or two to run
The attribute table for this new layer should include all the new fields we have generated. Let’s now export the table as CSV for further analysis
Right-click admin_7
in the Layer menu, go to Export
\(\to\)Save Features As
.
Save the layer as an Comma Separated Values
file
Name the file final_data_qgis.csv
. Click OK
Loading R packages
To conduct a regression analysis of these data in R, we will be using the stargazer
packages
library(stargazer)
NOTE: The replication code for all of the preceding steps R is in wt08_demo.R
on RStudio Cloud, and in WT01.zip
(posted on Canvas).
Now we’re finally able to proceed to the analysis stage. For this we will need to open the CSV file we just created in R.
This code chunk imports the final_data_qgis.csv
file into an object called X
, and then lists the variable names:
X = read.csv("Output/final_data_qgis.csv")
names(X)
## [1] "GID_2" "GID_0" "COUNTRY" "GID_1" "NAME_1"
## [6] "NL_NAME_1" "NAME_2" "VARNAME_2" "NL_NAME_2" "TYPE_2"
## [11] "ENGTYPE_2" "CC_2" "HASC_2" "roads_m" "roads_ct"
## [16] "roads_km" "area_km2" "road_density" "pop_sum" "crops_mean"
## [21] "dams" "sunni_2_area" "prop_sunni" "pop_density" "pop_1000"
## [26] "is_violence"
All the variables we created seem to be here.
Let’s now run some regression models!
Quick refresher: our regression analysis will test 5 hypotheses at once \[\begin{align*} \text{violence}_i=&\beta_1 \text{road density}_i + \beta_2 \text{population}_i +\beta_3 \text{cropland}_i \\ &+\beta_4 \text{dams}_i + \beta_5 \text{Sunni presence}_i + \epsilon_i \end{align*}\] where
is_violence
)road_density
, pop_1000
, crops_mean
, dams
, prop_sunni
) Hypothesis | Expectation | Observation |
---|---|---|
1. Power projection | \(\beta_1<0\) | ? |
2. Demographics | \(\beta_2>0\) | ? |
3. Political economy | \(\beta_3<0\) | ? |
4. Key infrastructure | \(\beta_4>0\) | ? |
5. Sectarian divisions | \(\beta_5>0\) | ? |
The first model (mod1
) is an Ordinary Least Squares model that regresses a logarithmically-transformed dependent variable log(is_violence + 1)
on all of the explanatory variables that correspond to our hypotheses. The log-transform is useful here because the is_violence
variable is highly skewed
mod1 = lm(log(is_violence+1) ~ road_density + pop_1000 + crops_mean +
dams + prop_sunni, data=X)
summary(mod1)
##
## Call:
## lm(formula = log(is_violence + 1) ~ road_density + pop_1000 +
## crops_mean + dams + prop_sunni, data = X)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.8209 -1.1849 -0.1103 0.9159 3.6005
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.2767764 0.3438320 3.713 0.000284 ***
## road_density 1.0261092 3.0441561 0.337 0.736513
## pop_1000 0.0012820 0.0003496 3.667 0.000336 ***
## crops_mean -1.7270999 0.4814247 -3.587 0.000446 ***
## dams 1.0476508 0.5331035 1.965 0.051168 .
## prop_sunni 0.0348957 0.0030465 11.454 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.543 on 156 degrees of freedom
## Multiple R-squared: 0.57, Adjusted R-squared: 0.5562
## F-statistic: 41.35 on 5 and 156 DF, p-value: < 2.2e-16
The second (mod2
) is a Generalized Linear Model with a quasi-Poisson link. This parameterization is designed to accommodate dependent variables that are (over-dispersed) event counts
mod2 = glm(is_violence ~ road_density + pop_1000 + crops_mean +
dams + prop_sunni, data=X, family="quasipoisson")
summary(mod2)
##
## Call:
## glm(formula = is_violence ~ road_density + pop_1000 + crops_mean +
## dams + prop_sunni, family = "quasipoisson", data = X)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.1398937 0.4813895 8.600 7.95e-15 ***
## road_density -7.5459241 3.9548121 -1.908 0.0582 .
## pop_1000 0.0010212 0.0002261 4.518 1.23e-05 ***
## crops_mean -2.7261978 0.6513259 -4.186 4.74e-05 ***
## dams 0.7926585 0.3703707 2.140 0.0339 *
## prop_sunni 0.0287344 0.0047586 6.038 1.10e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 345.4133)
##
## Null deviance: 69006 on 161 degrees of freedom
## Residual deviance: 33381 on 156 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 7
The third and fourth models (mod3
, mod4
) add province-level fixed effects (NAME_1
), which allow each province (e.g. Anbar) to have a different baseline level of violence
mod3 = lm(log(is_violence+1) ~ road_density + pop_1000 + crops_mean +
dams + prop_sunni + NAME_1, data=X)
mod4 = glm(is_violence ~ road_density + pop_1000 + crops_mean +
dams + prop_sunni + NAME_1, data=X, family="quasipoisson")
mod3
is OLS, mod4
is quasi-Poisson
We can use the stargazer()
command to export the models’ coefficient estimates into a formatted table, which you could add to a paper or report:
stargazer::stargazer(mod1,mod2,mod3,mod4,type = "text",
keep = c("road_density","pop_1000","crops_mean",
"dams","prop_sunni"),
add.lines = list(c("Province FE","N","N","Y","Y")))
##
## ========================================================================================================
## Dependent variable:
## ------------------------------------------------------------------------------------
## log(is_violence + 1) is_violence log(is_violence + 1) is_violence
## OLS glm: quasipoisson OLS glm: quasipoisson
## link = log link = log
## (1) (2) (3) (4)
## --------------------------------------------------------------------------------------------------------
## road_density 1.026 -7.546* 2.898 -11.922***
## (3.044) (3.955) (3.679) (4.533)
##
## pop_1000 0.001*** 0.001*** 0.001*** 0.001***
## (0.0003) (0.0002) (0.0003) (0.0002)
##
## crops_mean -1.727*** -2.726*** -2.684*** -2.537***
## (0.481) (0.651) (0.528) (0.606)
##
## dams 1.048* 0.793** 0.805* 0.880***
## (0.533) (0.370) (0.459) (0.298)
##
## prop_sunni 0.035*** 0.029*** 0.016*** 0.012***
## (0.003) (0.005) (0.005) (0.004)
##
## --------------------------------------------------------------------------------------------------------
## Province FE N N Y Y
## Observations 162 162 162 162
## R2 0.570 0.790
## Adjusted R2 0.556 0.730
## Residual Std. Error 1.543 (df = 156) 1.204 (df = 125)
## F Statistic 41.351*** (df = 5; 156) 13.079*** (df = 36; 125)
## ========================================================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
What does this tell us about whether the data support our Hypotheses?
Hypothesis | Expectation | Confirm? (OLS) | Confirm? (QP) |
---|---|---|---|
1. Power projection | \(\beta_{\texttt{road\_density}}<0\) | ||
2. Demographics | \(\beta_{\texttt{pop\_1000}}>0\) | ||
3. Political economy | \(\beta_{\texttt{crops\_mean}}<0\) | ||
4. Key infrastructure | \(\beta_{\texttt{dams}}>0\) | ||
5. Sectarian divisions | \(\beta_{\texttt{prop\_sunni}}>0\) |