Goal: explain geographic variation in Islamic State violence
Operation Inherent Resolve
There is no formal Problem Set this week, but if you would like to replicate this in QGIS, follow along (for walk-through in R or Python, see replication code on Canvas)
Overview of lab exercise
Data on political violence are easily found online. If we search for georeferenced violent events data, a top result is for “UCDP Dataset Download Center”
The Uppsala Conflict Data Program (UCDP) is one of the world’s leading providers of open-source data on violence and armed conflict, based at Uppsala University
Download-able data are located at ucdp.uu.se/downloads.
Click on “Georeferenced Event Dataset (GED)”
We have used GED before, for Problem Set 02 (Afghanistan and Yemen).
Download GED as a csv file and extract it from the zip archive
There are about 317K events here, each of which has point coordinates
For data on administrative boundaries, Rick Wilson’s “Free GIS Data” site has links to multiple data sources. Of these, we will use gadm.org
From GADM’s website, we can get data on sub-national administrative boundaries. Download the level2 GeoJSON file for Iraq…
… and for Syria
Multiple data sources exist for roads. Among the easiest to get are those from diva-gis.org/gData
These are country-level road layers from “Digital Chart of the World”, download-able as zipped shapefiles
Download the roads data for both Iraq and Syria here
For population data, we can go back to Rick Wilson’s page. There are many options, but let’s try “WorldPop”
On the landing page, go to “Access… data”.
Let’s grab the global mosaic file.
The earliest population raster they have is from 2015 (1 year after IS campaign began). Suboptimal, but we’ll take it. Click on Data and Resources.
Here we see a preview of the data and, if we scroll down…
… a link to download the data.
Also at the RT Wilson page, we can find data on cropland (e.g., “Earthstat”)
The first dataset we see here is a global raster from 2000. This pre-dates the IS campaign (good) by about 14 years (less good). This likely overstates how much cropland was still active in 2014. Suboptimal, but OK for an exploratory analysis.
For data on hydroelectric dams, we can do a quick web search. This quickly points us to a dataset called “Global Reservoir and Dam (GRanD)”.
Look for a link to download the data.
Seems legit… (scan ZIPs for viruses before extracting)
For data on sectarian divisions, a relatively reliable global source is the “Ethnic Power Relations” (geoEPR) dataset — we used this in Problem Set 02
Specifically, we will download the GeoEPR-2021.geojson file, which contains polygons for each ethno-religious group (icr.ethz.ch/data/epr/geoepr)
Here is the full list of data sources and links:
| Category | Type | Format | Data source |
|---|---|---|---|
| ISIS violence | Table (non-geo) | .csv |
UCDP GED |
| Administrative units | Vector (polygons) | .geojson |
GADM |
| Roads | Vector (polylines) | .shp |
DIVA-GIS |
| Population | Raster | .tif |
WorldPop |
| Cropland | Raster | .tif |
EarthStat |
| Dams | Vector (points) | .shp |
GRanD |
| Sectarian divisions | Vector (polygons) | .geojson |
ETH-Zurich |
These are all in the WT01.zip file posted on Canvas.
Always save your progress!
Go to Project \(\to\) Save As...
Load administrative boundaries (Layer \(\to\) Add Layer \(\to\) Add Vector Layer).
For Iraq: gadm41_IRQ_2.json file in Data/GADM folder.
For Syria: gadm41_SYR_2.json file in Data/GADM folder.
The two sets of district boundaries should be visible in the project window.
Let’s merge them into a single layer, with districts for both countries.
Open the Merge Vector Layers tool (Vector \(\to\) Data Management Tool \(\to\) Merge Vector Layers).
Click on the [...] box next to Input layers
Check the boxes next to gadm41_IRQ_2 and gadm41_SYR_2. Click OK.
Click on the [...] box next to Merged. Find a location in which to save the output, and name the file admin.geojson. Click Run
The merged layer should appear in the main project window.
You can remove the gadm41_IRQ_2 and gadm41_SYR_2 layers from the project (Right-click on layer(s) in Layers menu \(\to\) Remove Layer...)
To conserve memory, it’s good practice to remove data objects were are not actively using
Load roads data (Layer \(\to\) Add Layer \(\to\) Add Vector Layer).
Try opening two datasets at the same time: IRQ_roads.shp and SYR_roads.shp files in Data/Roads
Paths to both files should appear in the Source box. Click Add
Let’s repeat the Merge Vector Layers procedure for these two layers (Vector \(\to\) Data Management Tool \(\to\) Merge Vector Layers).
This time, perform the merge on the IRQ_roads and SYR_roads layers
Save the output file as roads.geojson
As before, you can remove the two country-specific roads layers (we no longer need them)
Let’s calculate each district’s road density with the Sum Line Lengths tool (Vector \(\to\) Analysis \(\to\) Sum Line Lengths)
Set Polygons \(=\) admin, Lines \(=\) roads, Line length field name \(=\) roads_m, Lines count field name \(=\) roads_ct. Save as admin_2.geojson
Open the attribute table for admin_2. The roads_m and roads_ct variables should be in the table. Let’s convert from meters to kilometers
Open Field Calculator for admin_2. Create new field, roads_km, of type Decimal number (real). For the Expression, write roads_m/1000. The Output preview should show a number with decimals. Click OK
The resulting roads_km field should be visible in the attribute table
If we plot this variable, we can see its main limitations: smaller districts have less road length, so this becomes a proxy for district size. Let’s create a road density measure that takes into account district size
Back in the Field Calculator, create a new field called area_km2 of type Decimal number (real). For Expression, write $area/1000000 (divide by 1M to convert from m\(^2\) to km\(^2\)). Click OK
Open the Field Calculator again. Create a new field called road_density of type Decimal number (real). For Expression, write roads_km/area_km2
Check to make sure road_density is added to attribute table
If we plot this variable, we see the opposite pattern from before (smaller, urban districts have higher road density)
Save the edits you just made to the admin_2 layer (right click \(\to\) Save Layer Edits)
Load population raster (Layer \(\to\) Add Layer \(\to\) Add Raster Layer).\ Open the global_pop_2015_CN_1km_R2025A_UA_v1.tif file in Data/WorldPop folder.
The (global) population layer should now be visible. Let’s extract a subset of just the part of this layer that overlaps with Syria and Iraq.
Open the Clip Raster by Extent tool (Raster \(\to\) Extraction \(\to\) Clip Raster by Extent)
Set Input layer \(=\) gpw_v4_population_count_rev11_2010_15_min. Click the [...] button next to Clipping extent \(\to\) Calculate from Layer \(\to\) admin_2
Save the file as pop.tif. Accept defaults for all other parameters, and click Run
A new layer called pop should appear. You can remove the original, global gpw_v4... layer
Let’s calculate the population size of each district. Open the Processing Toolbox and open the Zonal statistics tool in the Raster Analysis menu
In the Zonal Statistics tool, set Input layer\(=\)admin_2, Raster layer\(=\)pop, and Output column prefix\(=\)pop_.
Click the [...] button next to Statistics to calculate
Check the box next to Sum. Leave all others un-checked. Click OK
Save the zonal statistics output as admin_3.geojson. Click Run
This should have generated a new layer, admin_3, with a new field, pop_sum (check the layer’s attribute table)
Load cropland raster (Layer \(\to\) Add Layer \(\to\) Add Raster Layer).
Open the cropland.tif file in Data/Cropland folder.
The (global) crops layer should now be visible. Let’s extract a subset of it, as we just did for population
Open the Clip Raster by Extent tool (Raster \(\to\) Extraction \(\to\) Clip Raster by Extent). Set Input layer \(=\) cropland. Click the [...] button next to Clipping extent \(\to\) Calculate from Layer \(\to\) admin_3
Save the output to a file called crops.tif
Make sure everything looks right. Remove the original, global cropland layer
Let’s calculate crop cultivation in each district. Open the Zonal statistics tool in Processing Toolbox. Set Input layer\(=\)admin_3, Raster layer\(=\)crops, and Output column prefix\(=\)crops_.
Click the [...] button next to Statistics to calculate
Check the box next to Mean. Leave all others un-checked. Click OK, then Run on the next screen
Save the output to a file called admin_4.geojson
This should have generated a new field, crops_mean in the new admin_4 layer (check the layer’s attribute table)
Load hydroelectric dam locations (Layer \(\to\) Add Layer \(\to\) Add Vector Layer). Open the GRanD_dams_v1_3.shp file in Data/Dams folder.
The dam locations should appear on the project window. Rename it from GRanD_dams_v1_3 to dams
Let’s calculate the number of dams per district. Open the Count Points in Polygon tool (Vector \(\to\) Analysis Tools \(\to\) Count Points in Polygon)
Select Polygons \(=\) admin_4, Points \(=\) dams. Name the count field dams, and save the output file as admin_5.geojson. Click Run
Add the sectarian divisions data to the project, using Add Vector Layer....\ Load the GeoEPR-2021.geojson file in Data/GeoEPR folder
The (global) ethnicity layer should appear in the project window. Let’s calculate the proportion of each district populated by Sunni Arabs. Let’s first extract the subset that overlaps with our study area.
Open the Clip (vector) tool (Vector \(\to\) Geoprocessing Tools \(\to\) Clip)
Set Input layer \(=\) GeoEPR, Overlay layer \(=\) admin_5. Save the file as ethnicity.geojson. Click Run
Uh oh! Execution failed! The error message says that GeoEPR-2021 contains “invalid geometries”. How do we fix this?
Many “invalid geometry” problems can be solved with the “zero buffer trick”. Go to Vector \(\to\) Geoprocessing Tools \(\to\) Buffer...
Set Input layer \(=\) GeoEPR-2021 and Distance \(=\) 0. Keep the defaults for the other parameters and save the file as geoepr_0buffer.geojson. Click Run
Now let’s try the clipping operation again. To save time, go to Processing \(\to\) History...
In the Processing History window, double-click on Clip (it should be the second-most recent item)
Set the Input layer \(=\) geoepr_0buffer. Click Run
This may take a few minutes, but should finish without errors…
The clipped layer should appear in the project window. You can remove the full GeoEPR layer to save memory
Look at the group field in the attribute table for ethnicity. There are multiple features for several groups (Kurds, Alawites, etc.). Let’s extract Sunni Arab polygons only
Open the Select by Expression tool (Edit menu \(\to\) Select \(\to\) Select Features by Expression)
Let’s use regular expressions to extract the polygons we need. Set Expression \(=\) regexp_match(group,'Sunni'). Click Select Features
The selected polygons should appear yellow on the map. Let’s extract the selection into a new layer. Right-click ethnicity in the Layer menu, go to Export\(\to\)Save Selected Features As
Save the layer as sunni.geojson. Make sure the box is checked next to Save only selected features. Click OK
The extracted selection should appear. Hide the previous layer ethnicity
The attribute table for sunni should now contain only features where the group field contains the word Sunni. Let’s consolidate these features into one polygon
Open the Dissolve tool (Vector \(\to\) Geoprocessing Tools \(\to\) Dissolve).
Set Input layer\(=\)sunni. Save the output as sunni_2.geojson
The dissolved polygon should appear in the project window. Now let’s calculate the proportion of each district populated by Sunnis
Open the Overlap Analysis tool (in Processing Toolbox \(\to\) Vector Analysis). Set Input layer \(=\) admin_5 and click the [...] button next to Overlay layers
Check the box next to sunni_2. Click OK
Save the output as admin_6.geojson and click Run
By default, the overlap fields in admin_6 will be named sunni_2_area and sunni_2_pc. Let’s change the name of these fields
Open the layer Properties for admin_6, go to the Fields tab and click on the “pencil” button (Toggle editing mode)
Scroll down to sunni_2_pc and double-click on its name
Rename the field prop_sunni
Hit Enter key (or Return key) to commit the name change. Click on the “pencil” again to leave editing mode
When prompted, save the changes you just made to admin_6
While we’re at it, let’s create a couple additional fields that could come in handy in the analysis. Create a population density field, named pop_density, of type Decimal number (real), with Expression set to pop_sum/area_km2
Let’s also create a rescaled population field (1000’s of residents). Name it pop_1000, with type Decimal number (real). Set Expression to pop_sum/1000
Save the layer edits to admin_6!
Add the Islamic State violence data to the project, using Add Delimited Text Layer.... Load the GEDEvent_v25_1.csv file in Data/GED folder. Set X field \(=\) longitude and Y field \(=\) latitude. Check the box next to Use spatial index
The (global) GED violent events layer should appear. There are several hundred thousand points here. We need to extract events in Syria and Iraq involving the Islamic State
To figure out how to extract this data subset (by actor and location), let’s explore the attribute table. We see fields for side_a and side_b, which list actors
There is also a country field. So, we need to select points where country is Iraq or Syria and the Islamic State is side_a (or side_b)
Highlight the GED layer and go to Edit \(\to\) Select \(\to\) Select by Expression...
Let’s combine regular expressions with logical operators. Set Expression to
regexp_match(country,'Syria|Iraq') AND (
regexp_match(side_a,'IS|Islamic State') OR
regexp_match(side_b,'IS|Islamic State') )
The vertical slash | is a regular expression for “OR”. So, in English this expression means ‘features where the field country contains “Iraq or Syria” and either side_a or side_b contains “IS or Islamic State”’
This should select a little over 23,000 events. Now let’s calculate the number of ISIS attacks per district
Open the Count Points in Polygon tool
Select Polygons \(=\) admin_6, Points \(=\) GEDEvent_v25_1. Make sure the box is checked next to Selected Features Only for the points. Name the count field is_violence, and save the output file as admin_7.geojson. Click Run
This may take a minute or two to run
The attribute table for this new layer should include all the new fields we have generated. Let’s now export the table as CSV for further analysis
Right-click admin_7 in the Layer menu, go to Export\(\to\)Save Features As.
Save the layer as an Comma Separated Values file
Name the file final_data_qgis.csv. Click OK
Loading R packages
To conduct a regression analysis of these data in R, we will be using the stargazer package
library(stargazer)
NOTE: The replication code for all of the preceding steps R is in wt01_demo.R on RStudio Cloud, and in Lab09WT01.zip (posted on Canvas).
Now we’re finally able to proceed to the analysis stage. For this we will need to open the CSV file we just created in R.
This code chunk imports the final_data_qgis.csv file into an object called X, and then lists the variable names:
X = read.csv("Output/final_data_qgis.csv")
names(X)
## [1] "GID_2" "GID_0" "COUNTRY" "GID_1" "NAME_1"
## [6] "NL_NAME_1" "NAME_2" "VARNAME_2" "NL_NAME_2" "TYPE_2"
## [11] "ENGTYPE_2" "CC_2" "HASC_2" "layer" "path"
## [16] "roads_m" "roads_ct" "roads_km" "area_km2" "road_density"
## [21] "pop_sum" "crops_mean" "dams" "sunni_2_area" "prop_sunni"
## [26] "pop_density" "pop_1000" "is_violence"
All the variables we created seem to be here.
Let’s now run some regression models!
Quick refresher: our regression analysis will test 5 hypotheses at once \[\begin{align*} \text{violence}_i=&\beta_1 \text{road density}_i + \beta_2 \text{population}_i +\beta_3 \text{cropland}_i \\ &+\beta_4 \text{dams}_i + \beta_5 \text{Sunni presence}_i + \epsilon_i \end{align*}\] where
is_violence)road_density, pop_1000, crops_mean, dams, prop_sunni) | Hypothesis | Expectation | Observation |
|---|---|---|
| 1. Power projection | \(\beta_1<0\) | ? |
| 2. Demographics | \(\beta_2>0\) | ? |
| 3. Political economy | \(\beta_3<0\) | ? |
| 4. Key infrastructure | \(\beta_4>0\) | ? |
| 5. Sectarian divisions | \(\beta_5>0\) | ? |
The first model (mod1) is an Ordinary Least Squares model that regresses a logarithmically-transformed dependent variable log(is_violence + 1) on all of the explanatory variables that correspond to our hypotheses. The log-transform is useful here because the is_violence variable is highly skewed
mod1 = lm(log(is_violence+1) ~ road_density + pop_1000 + crops_mean +
dams + prop_sunni, data=X)
summary(mod1)
##
## Call:
## lm(formula = log(is_violence + 1) ~ road_density + pop_1000 +
## crops_mean + dams + prop_sunni, data = X)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.5068 -1.0639 -0.1906 0.9280 3.8392
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.4918950 0.3455419 4.318 2.79e-05 ***
## road_density -1.4859463 3.1746426 -0.468 0.640390
## pop_1000 0.0009083 0.0002421 3.752 0.000247 ***
## crops_mean -1.4268796 0.4836122 -2.950 0.003662 **
## dams 1.1105982 0.5346231 2.077 0.039408 *
## prop_sunni 0.0363425 0.0029937 12.140 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.547 on 156 degrees of freedom
## Multiple R-squared: 0.569, Adjusted R-squared: 0.5552
## F-statistic: 41.2 on 5 and 156 DF, p-value: < 2.2e-16
The second (mod2) is a Generalized Linear Model with a quasi-Poisson link. This parameterization is designed to accommodate dependent variables that are (over-dispersed) event counts
mod2 = glm(is_violence ~ road_density + pop_1000 + crops_mean +
dams + prop_sunni, data=X, family="quasipoisson")
summary(mod2)
##
## Call:
## glm(formula = is_violence ~ road_density + pop_1000 + crops_mean +
## dams + prop_sunni, family = "quasipoisson", data = X)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.3622385 0.5788847 7.536 3.70e-12 ***
## road_density -9.2950240 5.8355866 -1.593 0.11323
## pop_1000 0.0004305 0.0001670 2.578 0.01087 *
## crops_mean -2.1724732 0.7841517 -2.770 0.00628 **
## dams 0.8329988 0.4094748 2.034 0.04361 *
## prop_sunni 0.0302367 0.0051981 5.817 3.29e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 429.0671)
##
## Null deviance: 71658 on 161 degrees of freedom
## Residual deviance: 38816 on 156 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 7
The third and fourth models (mod3, mod4) add province-level fixed effects (NAME_1), which allow each province (e.g. Anbar) to have a different baseline level of violence
mod3 = lm(log(is_violence+1) ~ road_density + pop_1000 + crops_mean +
dams + prop_sunni + NAME_1, data=X)
mod4 = glm(is_violence ~ road_density + pop_1000 + crops_mean +
dams + prop_sunni + NAME_1, data=X, family="quasipoisson")
mod3 is OLS, mod4 is quasi-Poisson
We can use the stargazer() command to export the models’ coefficient estimates into a formatted table, which you could add to a paper or report:
stargazer::stargazer(mod1,mod2,mod3,mod4,type = "text",
keep = c("road_density","pop_1000","crops_mean",
"dams","prop_sunni"),
add.lines = list(c("Province FE","N","N","Y","Y")))
##
## ========================================================================================================
## Dependent variable:
## ------------------------------------------------------------------------------------
## log(is_violence + 1) is_violence log(is_violence + 1) is_violence
## OLS glm: quasipoisson OLS glm: quasipoisson
## link = log link = log
## (1) (2) (3) (4)
## --------------------------------------------------------------------------------------------------------
## road_density -1.486 -9.295 -1.845 -12.958***
## (3.175) (5.836) (3.525) (4.943)
##
## pop_1000 0.001*** 0.0004** 0.002*** 0.001***
## (0.0002) (0.0002) (0.0003) (0.0002)
##
## crops_mean -1.427*** -2.172*** -2.204*** -1.721***
## (0.484) (0.784) (0.485) (0.644)
##
## dams 1.111** 0.833** 0.647 0.603*
## (0.535) (0.409) (0.430) (0.315)
##
## prop_sunni 0.036*** 0.030*** 0.017*** 0.016***
## (0.003) (0.005) (0.005) (0.005)
##
## --------------------------------------------------------------------------------------------------------
## Province FE N N Y Y
## Observations 162 162 162 162
## R2 0.569 0.819
## Adjusted R2 0.555 0.767
## Residual Std. Error 1.547 (df = 156) 1.121 (df = 125)
## F Statistic 41.196*** (df = 5; 156) 15.695*** (df = 36; 125)
## ========================================================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
What does this tell us about whether the data support our Hypotheses?
| Hypothesis | Expectation | Confirm? (OLS) | Confirm? (QP) |
|---|---|---|---|
| 1. Power projection | \(\beta_{\texttt{road\_density}}<0\) | ||
| 2. Demographics | \(\beta_{\texttt{pop\_1000}}>0\) | ||
| 3. Political economy | \(\beta_{\texttt{crops\_mean}}<0\) | ||
| 4. Key infrastructure | \(\beta_{\texttt{dams}}>0\) | ||
| 5. Sectarian divisions | \(\beta_{\texttt{prop\_sunni}}>0\) |