Skip to content

LambertOSU/LexicalGeography

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 

Repository files navigation

LexicalGeography

LexicalGeography is a tool for extracting location data from text. It is heavily inspired by the geotext package, however it contains additional features that geotext does not.

The gazetteer is built with GeoNames data.

The lexigeo class will identify the names of states (Admin 1) and counties (Admin 2) in addition to cities and countries.

LexicalGeography is under development. I would like to optimize the gazetteer creation, as well as implement a method for predicting the "most likely location."

As of right now, the name "Franklin County, Virginia" will identify the state of Virginia, a city named Franklin, and the county name. Since GeoNames provides all of the parent data, it can be used to predict the most likely single location.

Required Files

These files must be downloaded from GeoNames.

  • countryInfo.txt

  • admin1CodesASCII.txt

  • admin2Codes.txt

  • cities1000.txt

Currently they are stored in a directory called ./01_location_data/ which is used in the class.

About

A tool for extracting location data from text.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages