What is geocoding?
Example:
Find your location!
Geocoder components
Input address, output data
What can be geocoded?
Descriptive locational data:
Hail to the victors
Sources of error in input data
How to avoid some of these problems?
Wrong number
What is pre-processing?
Why pre-process?
Undeliverable address
Common pre-processing tasks
Lost in translation
Filtering unnecessary words, text
Why strip capitalization, punctuation, etc.?
(\# and \% are special characters in many programming languages)
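A minimal sketch of this kind of filtering, assuming a plain-text address string as input. The function name `normalize` and the sample address are made up for illustration; real geocoders apply their own, more careful rules.

```python
import re


def normalize(text: str) -> str:
    """Lowercase the text, replace punctuation with spaces, collapse whitespace."""
    lowered = text.lower()
    # Anything that is not a letter, digit, underscore, or whitespace becomes a space.
    no_punct = re.sub(r"[^\w\s]", " ", lowered)
    # split() with no arguments drops runs of whitespace; join with single spaces.
    return " ".join(no_punct.split())


# Hypothetical input address, not taken from any real data set.
print(normalize("123 N. Main St., Apt. #4"))
```

Stripping case and punctuation first means later steps only compare plain lowercase tokens.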
Sentences \(\to\) Tokens
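One simple way to go from a sentence to tokens is a regular expression that pulls out runs of word characters. This is only a sketch: the function name `tokenize` and the sample sentence are assumptions, and NLP toolkits use more elaborate tokenizers.

```python
import re


def tokenize(sentence: str) -> list[str]:
    """Split a sentence into tokens (runs of letters, digits, or underscores)."""
    return re.findall(r"\w+", sentence)


# Hypothetical descriptive location, used only to show the token list.
print(tokenize("Meet me at the corner of Main and First"))
```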
Parts of speech tagging
Do we care if a word is a noun or a verb?
It depends on the application:
(nlp.stanford.edu)
Sentence \(\to\) POS tags
Lemmatization
Relating multiple versions of the same word to a common, standard term
Procedure:
Many-to-one example
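A many-to-one mapping can be sketched as a lookup table from variants to one standard term. The table `STANDARD_TERMS` and the function `standardize` below are hypothetical and deliberately tiny. A true lemmatizer handles general vocabulary, whereas this sketch only covers a few address words.

```python
# Hypothetical lookup table: several variants map to one standard term.
STANDARD_TERMS = {
    "st": "street", "str": "street", "street": "street",
    "ave": "avenue", "av": "avenue", "avenue": "avenue",
    "n": "north", "no": "north", "north": "north",
}


def standardize(tokens: list[str]) -> list[str]:
    """Replace each known variant with its standard term; keep unknown tokens."""
    return [STANDARD_TERMS.get(token, token) for token in tokens]


print(standardize(["123", "n", "main", "st"]))
```

Note that a flat table cannot resolve ambiguous variants (for example, "st" may also abbreviate "saint"), which is one way standardization itself introduces error.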
How to find the best output candidate?
Match-making
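One possible way to pick the best candidate is to score every reference string against the input and keep the highest score. The sketch below uses `difflib.SequenceMatcher` from the Python standard library as the similarity measure. That choice, the function name `best_match`, and the sample strings are assumptions for illustration, not the method of any particular geocoder.

```python
from difflib import SequenceMatcher


def best_match(query: str, candidates: list[str]) -> tuple[str, float]:
    """Return the candidate most similar to the query, with its score in [0, 1]."""
    scored = [
        (SequenceMatcher(None, query, candidate).ratio(), candidate)
        for candidate in candidates
    ]
    score, match = max(scored)  # the highest similarity ratio wins
    return match, score


# Hypothetical reference strings; the query contains a typo ("stret").
reference = [
    "123 north main street",
    "123 south main street",
    "321 north maine street",
]
match, score = best_match("123 north main stret", reference)
print(match, round(score, 3))
```

In practice a minimum score threshold is also useful, so that a poor "best" match is reported as no match rather than a wrong one.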
Sources of error in matching
Bad film (probably)
What are reference data?
Geographically-coded information used to match input to output
Like this, but electronic
Gazetteer data
Example gazetteer data
Topologically Integrated Geographic Encoding and Referencing (TIGER/Line)
Example TIGER/Line
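Line-based reference data of this kind pairs street segments with address ranges, and a common way to use it is to interpolate a house number along the segment. The sketch below assumes a straight segment and evenly spaced addresses. The `Segment` class, its fields, and the coordinates are hypothetical and do not follow the actual TIGER/Line schema.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical street segment with an address range and two endpoints."""
    name: str
    from_addr: int
    to_addr: int
    start: tuple[float, float]  # (x, y) at from_addr
    end: tuple[float, float]    # (x, y) at to_addr


def interpolate(segment: Segment, house_number: int) -> tuple[float, float]:
    """Estimate a point by linear interpolation along the segment."""
    if not segment.from_addr <= house_number <= segment.to_addr:
        raise ValueError("house number outside the segment's address range")
    span = segment.to_addr - segment.from_addr
    fraction = 0.0 if span == 0 else (house_number - segment.from_addr) / span
    x = segment.start[0] + fraction * (segment.end[0] - segment.start[0])
    y = segment.start[1] + fraction * (segment.end[1] - segment.start[1])
    return (x, y)


# Made-up planar coordinates, chosen so the midpoint is easy to check by hand.
main_st = Segment("main street", 100, 200, (0.0, 0.0), (100.0, 0.0))
print(interpolate(main_st, 150))
```

Because real addresses are rarely spaced evenly, the interpolated point is an estimate, which is one source of positional error in the output.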
Crowd-sourced data
OSM is free, Google isn’t
Sources of error in reference data
Re-routing
What is the output?
Any geographically-referenced information:
Location found!
Sources of error in output
Wrong centroid
Wrong line