Plan for today
Migration data
“Where can I find global, annual emigration data by country?”
Migration data (global coverage)
Source/link | Type | Spatial scale | Frequency | Availability |
---|---|---|---|---|
WB Global Bilateral Migration | Migration flows (origin-dest.) | Country | Annual | 1960-2000 |
WB Open Data | Net migration, migrant stock | Country | Annual | 1960-2023 |
IOM Migration Data Portal | Multiple indicators | Country | Annual | 1990-2020 |
Our World in Data | Multiple indicators | Country | Annual | 1960-2021 |
IDMC Data Portal | IDPs from conflict, disasters | Country | Annual | 2018-2022 |
Migration data (sub-national and specialized data)
Source/link | Type | Spatial scale | Frequency | Availability |
---|---|---|---|---|
IOM Displacement Tracking Matrix | IDP flows (origin-dest.) | Country, Adm1, Adm2 | Variable | 2010-2024 |
UNHCR Data Portal | IDPs, refugees | Country, Adm1 | Variable | Variable |
CTDC | Human trafficking | Individual | Annual | 1960-2023 |
DHS Immigration Data | Multiple indicators | Points of entry | Monthly | 2002-2024 |
Classifying points by location
“I have a dataset in .CSV format with 700+ rows. I want to classify each data point into one of two categories, based on their geographical location … (e.g. classifying whether an oil spill occurred in an offshore or onshore area).”
There are two ways to do this in QGIS: Intersection or Join attributes by location. Let’s demonstrate here with data we’ve used before on dams (points) and country borders (polygons defining areas/categories).
The Intersection tool (Vector \(\to\) Geoprocessing tools) will assign the attributes of the polygon that intersects with each point, while dropping points that fall outside the polygons.
Select the point layer as the Input layer and the polygons as the Overlay layer, and adjust the overlay fields to keep/drop as needed.
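If you prefer to script this in R rather than use the QGIS GUI, a rough equivalent of the Intersection step with the sf package is sketched below. The file names are placeholders, not the actual course files.

```r
library(sf)

# placeholder file names; substitute your own dams (points) and borders (polygons)
dams    <- st_read("dams.shp")
borders <- st_read("borders.shp")

# point-in-polygon intersection: keeps only the points that fall inside a
# polygon, and appends that polygon's attributes to each point
dams_int <- st_intersection(dams, borders)

nrow(dams)      # number of points in the original layer
nrow(dams_int)  # fewer: points outside all polygons are dropped
```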
If we compare the attribute tables of the intersection (top) vs. the original (bottom), we see that the intersection contains multiple additional columns from the polygon layer (e.g. ADMIN, ADM0_A3, etc.), while the original ends with LAT_DD.
However, the feature count in the layer menu tells us that the intersection layer contains 6832 features (points), but the original dams layer contained 6862. So we lost 30 dams that fell outside of all national borders. What if we want to keep them?
The other option is to use the Join Attributes by Location tool (Processing Toolbox \(\to\) Vector general). It’s the same idea, but with more options (like whether to \(\square\) “Discard records which could not be joined”)
The feature count for the joined layer is the same as for the original dams layer (6862), as long as \(\square\) “Discard records which could not be joined” is unchecked.
The points that intersect with no polygons are given NULL values for the joined fields. We can select them, and see that most of these are in coastal waters or on islands in/near international waters.
You can plot the joined attributes to be sure that everything worked out as expected
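The keep-everything behavior can be reproduced in R with sf::st_join. The sketch below reuses the placeholder dams and borders objects from the intersection example above, and assumes the polygon layer has an ADMIN column.

```r
# spatial join that keeps all points: points matching no polygon get NA
# in the joined fields (the analogue of leaving "Discard records which
# could not be joined" unchecked in QGIS)
dams_joined <- st_join(dams, borders, join = st_intersects, left = TRUE)

nrow(dams_joined)  # same as nrow(dams), assuming the polygons do not overlap

# inspect the points that fell outside all polygons
dams_joined[is.na(dams_joined$ADMIN), ]
```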
Visualizing comparisons between maps
“What is the best recommended way to show comparison between maps? … what mapping techniques do you recommend for displaying two different datasets on one map? (e.g. Climate and violence, or voting preferences and income per capita).”
If one variable is continuous (e.g. income) and the other is categorical (e.g. yes/no), you can use a gradient for the continuous variable, and shading lines for the categorical one (QGIS: Single Symbol / Hashed; R: plot(..., density=15, angle=30)). You’ll need to duplicate the layer to display \(>\) 1 variable at a time.
Example from Baum and Zhukov (2015)
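As a rough R illustration of the gradient-plus-hatching idea (not the exact code behind the example above): assume an sf polygon layer adm with a continuous income variable and a binary violence indicator, with no missing values. The density/angle hatching is supported by sp’s plot method, so we coerce the layer first.

```r
library(sf)
library(sp)

adm_sp <- as(adm, "Spatial")   # sp's plot method accepts density/angle for hatching

# gradient fill for the continuous variable (darker = higher income)
shades <- gray(1 - (adm$income - min(adm$income)) / diff(range(adm$income)))
plot(adm_sp, col = shades, border = "gray40")

# hashed overlay for the categorical variable, drawn on top of the gradient
plot(adm_sp[adm$violence == 1, ], density = 15, angle = 30, add = TRUE)
```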
If both variables are continuous, it is better to create two maps side-by-side.
Example from Zhukov (2016)
When you do this, make sure the map extent is the same for both maps, and keep everything identical except for the variables you want to compare.
Example from Rozenas et al (2017)
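A minimal base-R sketch of the side-by-side layout, assuming the same placeholder layer adm now holds two continuous variables, climate and violence:

```r
par(mfrow = c(1, 2))
# key.pos = NULL and reset = FALSE stop plot.sf from overriding par(mfrow);
# both panels use the same layer, so the extent is identical by construction
# (with different layers, pass the same xlim/ylim to both calls)
plot(adm["climate"],  main = "Climate",  key.pos = NULL, reset = FALSE)
plot(adm["violence"], main = "Violence", key.pos = NULL, reset = FALSE)
par(mfrow = c(1, 1))
```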
Buffers
“Is there a way to create a safety margin when using raster extraction by mask? (Say, I want an additional 10km outside the boundary of my vector layer to be cropped as well)”
There is no way to do this within Zonal statistics directly, but we can use the Buffer tool to pre-process the polygon layer. Let’s demonstrate with data on luminosity (raster) and country borders (polygons).
If the input layer (koreas) is unprojected, the Buffer tool will ask for the distance in degrees. You can either change the CRS or do some back-of-the-envelope math.
1 degree \(\approx\) 110 km at the equator. If we want a 10 km buffer, that’s roughly \(10/110 \approx 0.091\) degrees.
Enter the converted distance in Distance and run the buffer tool.
The buffered polygon should look similar, but “puffier”
Note that the buffers will overlap in neighboring polygons. So, North Korea will include 10km of South Korea and vice versa.
There are tutorials and YouTube videos online on how to remove the overlap, but it’s too complex to cover here.
You can now implement Zonal statistics with the buffered polygons as the Input layer.
You can plot the mean luminosity, and confirm that South Korea is brighter than the North. But you may also want to merge the results back to the original, non-buffered polygons
You can do this by adding a Vector join in layer properties, here adding the nl_mean variable from the buffered polygons to the original polygons.
This way, you get to keep the original polygon geometries, while using buffered geometries to calculate zonal statistics.
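The whole buffer-then-extract workflow can also be scripted in R. Below is a sketch using sf and terra, with placeholder file names and an assumed UTM projection (EPSG:32652) so the buffer distance can be given in metres instead of degrees.

```r
library(sf)
library(terra)

koreas <- st_read("koreas.shp")        # placeholder polygon file
nl     <- rast("nightlights.tif")      # placeholder luminosity raster

# buffer in metres by projecting first, instead of converting km to degrees
koreas_buf <- koreas |>
  st_transform(32652) |>               # a UTM zone covering the peninsula (assumed)
  st_buffer(10000) |>                  # 10 km buffer
  st_transform(st_crs(koreas))         # back to the original CRS

# zonal statistics over the buffered polygons, attached to the original
# (non-buffered) polygons, which are in the same row order
koreas$nl_mean <- terra::extract(nl, vect(koreas_buf),
                                 fun = mean, na.rm = TRUE)[, 2]
```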
Re-using the same color symbology
“How do I duplicate a color scale across different layers of my project?”
In QGIS, right-click on the layer whose color symbology you want to duplicate, and select Styles \(\to\) Copy style \(\to\) Symbology
Now right-click on the layer whose color symbology you want to replace, and select Styles \(\to\) Paste style \(\to\) Symbology
The color scheme and break points should now be replicated in the second layer. Note that you will still need to re-classify the colors in Properties if the numerical distribution is different in the second layer.
Regression analysis
“I am intrigued by regression analysis and its application. As a student with limited experience in statistics or regression analysis, I wonder if it is feasible for someone like myself to undertake a basic level of analysis.”
Flashback: we used regression analysis in Walk Through 1 (Islamic State): \[\begin{align*} \text{violence}_i=&\beta_1 \text{road density}_i + \beta_2 \text{population}_i +\beta_3 \text{cropland}_i \\ &+\beta_4 \text{dams}_i + \beta_5 \text{Sunni presence}_i + \epsilon_i \end{align*}\] where
Hypothesis | Expectation | Observation |
---|---|---|
1. Power projection | \(\beta_1<0\) | ? |
2. Demographics | \(\beta_2>0\) | ? |
3. Political economy | \(\beta_3<0\) | ? |
4. Key infrastructure | \(\beta_4>0\) | ? |
5. Sectarian divisions | \(\beta_5>0\) | ? |
Several popular types of (basic) regression models
Model | Type of dependent variable | R command |
---|---|---|
1. Linear regression (OLS) | continuous (0.47, -1.97, -0.29) | lm() |
2. Logistic regression (logit) | binary (0, 1) | glm(..., family="binomial") |
3. Quasi-Poisson | counts (0, 1, 2, 3, …) | glm(..., family="quasipoisson") |
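As a starting point, here is a hedged R sketch of how these three models could be fit to the Walk Through 1 specification; the data frame and variable names are placeholders for whatever your own data uses.

```r
# 'dat' is a placeholder data frame with one row per district
# 1. OLS, if violence is measured on a continuous scale
m_ols <- lm(violence ~ road_density + population + cropland + dams + sunni,
            data = dat)
summary(m_ols)   # the signs of the coefficients speak to Hypotheses 1-5

# 2. Logistic regression, if violence is coded 0/1
m_logit <- glm(violence ~ road_density + population + cropland + dams + sunni,
               data = dat, family = "binomial")

# 3. Quasi-Poisson, if violence is an event count (0, 1, 2, ...)
m_qpois <- glm(violence ~ road_density + population + cropland + dams + sunni,
               data = dat, family = "quasipoisson")
```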
Online tutorials (partial list)