
halolimat/LNEx


License: AGPL v3 GitHub release Build Status


Location Name Extractor

Extracts location names from targeted text streams. [Paper, Poster]


How do you pronounce LNEx?

Le-N-x


The following steps allow you to set up and start using LNEx.

Querying OpenStreetMap Gazetteers

We will use a ready-to-go Elasticsearch index of the whole OpenStreetMap data, provided by komoot as part of their open-source geocoder photon (project repo). If you don't need the full OpenStreetMap index, you might look at alternatives such as the Pelias OpenStreetMap importer provided by Mapzen.

Using photon is a good option if you have enough disk space (~72 GB) and want to use LNEx for many streams over time. If that sounds like what you want to do, follow the steps below:

  • Download the full photon Elasticsearch index, which will allow us to query OSM using a bounding box:

    wget -O - http://download1.graphhopper.com/public/photon-db-latest.tar.bz2 | bzip2 -cd | tar x
  • Now start photon, which starts the Elasticsearch index in the background as a service:

    wget http://photon.komoot.de/data/photon-0.2.7.jar
    java -jar photon-0.2.7.jar
  • You should get the port number from the log printed when running the jar, similar to the following:

    [main] INFO org.elasticsearch.http - [Amelia Voght] bound_address {inet[/127.0.0.1:9201]},
    publish_address {inet[/127.0.0.1:9201]}
    
    • This means that Elasticsearch is running correctly and listening on:
    localhost:9201
    
    • You can test the index by running the following command:
        curl -XGET 'http://localhost:9201/photon/place/_search/?size=5&pretty=1' -d '{
          "query": {
            "filtered": {
              "filter": {
                "geo_bounding_box" : {
                  "coordinate" : {
                    "top_right" : {
                      "lat" : 13.7940725231,
                      "lon" : 80.4034423828
                    },
                    "bottom_left" : {
                      "lat" : 12.2205755634,
                      "lon" : 79.0548706055
                    }
                  }
                }
              }
            }
          }
        }'
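The same test query can also be built from Python, which is handy when you want to try several bounding boxes. The following is a minimal sketch: `build_bbox_query` is a hypothetical helper (not part of LNEx or photon) that only reproduces the JSON body of the curl command above. Sending it is left to whichever HTTP client you prefer.

```python
import json


def build_bbox_query(bottom_left, top_right):
    """Return the filtered geo_bounding_box query used in the curl test.

    bottom_left and top_right are (lat, lon) pairs.
    """
    return {
        "query": {
            "filtered": {
                "filter": {
                    "geo_bounding_box": {
                        "coordinate": {
                            "top_right": {"lat": top_right[0], "lon": top_right[1]},
                            "bottom_left": {"lat": bottom_left[0], "lon": bottom_left[1]},
                        }
                    }
                }
            }
        }
    }


# Same corner coordinates as in the curl example.
query = build_bbox_query(
    bottom_left=(12.2205755634, 79.0548706055),
    top_right=(13.7940725231, 80.4034423828),
)
body = json.dumps(query, indent=2)
print(body)
```

The printed body can be passed to curl with `-d`, or posted to the `/photon/place/_search` endpoint on the port reported in your own photon log.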

Using LNEx

  • Clone this repository to your machine as follows:

    git clone https://github.com/halolimat/LNEx.git
  • Install LNEx as follows:

    cd LNEx
    python setup.py install
  • Now, you can start using LNEx to spot locations in tweets.

    # Import LNEx inside your python script:   
    import LNEx as lnex
    
    # Define the Elasticsearch connection string (host:port, using the port
    # reported in your photon log) and the index name
    lnex.elasticindex(conn_string='localhost:9200', index_name="photon")
    
    # Build the custom OSM gazetteer using your desired bounding box (e.g., for Chennai, India):
    chennai_bb = [12.74, 80.066986084, 13.2823848224, 80.3464508057]
    
    # Initialize LNEx using the defined bounding box. You can also choose to augment the gazetteer.
    lnex.initialize(chennai_bb, augment=True)
    
    # Now, we are ready to extract location names from the tweet
    lnex.extract("New avadi rd is closed #ChennaiFloods.")
  • The output is a list of tuples, each containing the following items:

    • Spotted_Location: a substring of the tweet
    • Location_Offsets: the start and end offsets of the Spotted_Location
    • Geo_Location: the matched location name from the gazetteer
    • Geo_Info_IDs: the ids of the geo information of the matched Geo_Location
    # output of the above code
    [('Chennai', (24, 31), 'chennai', [6568]),
     ('New avadi rd', (0, 12), u'new avadi road', [9568, 5060, 7238, 5063, 1896, 12722, 2820, 9375])]
  • You can also use the pre-written test run module 'pytest.py' to test LNEx. LNEx can be initialized either from the cached files in the '_Data' folder or from the photon index once it is running in the background.

  • Finally, LNEx is lightning fast and capable of tagging streams of text. You can incorporate the following code to start streaming from Twitter (taking the spatial context into consideration), then define the bounding box that matches the spatial context established by your stream and start tagging the tweets.
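Separately from the linked streaming code, the sketch below shows one way to turn the tuples returned for each tweet into flat records while iterating over a stream. It is an illustration under stated assumptions, not LNEx's own API: `tag_stream` and `fake_extract` are hypothetical names, and `fake_extract` simply replays the sample output shown above so that the example runs without a photon index. In real use you would pass `lnex.extract` after calling `lnex.initialize`.

```python
def tag_stream(tweets, extract):
    """Yield one record per spotted location for every tweet in the stream.

    `extract` is any callable that behaves like lnex.extract: it takes the
    tweet text and returns a list of
    (Spotted_Location, Location_Offsets, Geo_Location, Geo_Info_IDs) tuples.
    """
    for text in tweets:
        for spotted, (start, end), geo_location, geo_info_ids in extract(text):
            yield {
                "tweet": text,
                "spotted": spotted,
                "start": start,
                "end": end,
                "geo_location": geo_location,
                "geo_info_ids": list(geo_info_ids),
            }


def fake_extract(text):
    # Hypothetical stand-in for lnex.extract: it ignores its input and
    # replays the sample output for the Chennai example tweet.
    return [
        ("Chennai", (24, 31), "chennai", [6568]),
        ("New avadi rd", (0, 12), "new avadi road",
         [9568, 5060, 7238, 5063, 1896, 12722, 2820, 9375]),
    ]


stream = ["New avadi rd is closed #ChennaiFloods."]
records = list(tag_stream(stream, fake_extract))
for record in records:
    print(record["spotted"], "->", record["geo_location"])
```

Taking the extractor as a parameter keeps the loop independent of how LNEx was initialized (cached files or a live photon index), and makes it easy to test without either.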

Dataset

Tagged Location Names in Targeted Social Media Streams dataset

This dataset contains 4,500 annotated tweets: 1,500 from each of three Twitter streams (i.e., the Chennai 2015, Louisiana 2016, and Houston 2016 floods). They were tagged using the Brat tool, recording the start and end character offsets of each mention together with a location category (inLoc, outLoc, or ambLoc), as described in the LNEx paper.
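As an illustration of what such annotations look like in practice, here is a small sketch that reads text-bound annotations in Brat's standoff format (`ID<TAB>category start end<TAB>text`). It assumes the dataset is distributed in that format, which is not stated here. The sample lines are made up for the Chennai example tweet and are not taken from the dataset, and `parse_brat_line` is a hypothetical helper.

```python
from collections import namedtuple

Mention = namedtuple("Mention", "ann_id category start end text")

# Location categories named in the dataset description.
LOCATION_CATEGORIES = {"inLoc", "outLoc", "ambLoc"}


def parse_brat_line(line):
    """Parse one text-bound Brat line: ID<TAB>category start end<TAB>text.

    Returns None for anything that is not a contiguous text-bound annotation.
    """
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 3 or not parts[0].startswith("T"):
        return None
    fields = parts[1].split(" ")
    if len(fields) != 3:
        return None  # discontinuous spans are skipped in this sketch
    category, start, end = fields
    return Mention(parts[0], category, int(start), int(end), parts[2])


tweet = "New avadi rd is closed #ChennaiFloods."
# Made-up annotation lines for illustration only.
sample_ann = "T1\tinLoc 0 12\tNew avadi rd\nT2\tinLoc 24 31\tChennai\n"

mentions = [m for m in map(parse_brat_line, sample_ann.splitlines()) if m]
for m in mentions:
    # The offsets index directly into the tweet text.
    print(m.category, tweet[m.start:m.end])
```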

You can fill out the following Form to get the full dataset. Alternatively, you can get a subset of the dataset, containing only 150 tweets, from this folder.

Citing

If you make use of LNEx or any of its components, please cite the following publication:

Hussein S. Al-Olimat, Krishnaprasad Thirunarayan, Valerie Shalin, and Amit Sheth. 2018. 
Location Name Extraction from Targeted Text Streams using Gazetteer-based Statistical Language Models. 
In Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018), 
pages 1986–1997. Association for Computational Linguistics.

@InProceedings{C18-1169,
  author = "Al-Olimat, Hussein S.
           and Thirunarayan, Krishnaprasad
           and Shalin, Valerie
           and Sheth, Amit",
  title = "Location Name Extraction from Targeted Text Streams using Gazetteer-based Statistical Language Models",
  booktitle = "Proceedings of the 27th International Conference on Computational Linguistics",
  year = "2018",
  publisher = "Association for Computational Linguistics",
  pages = "1986--1997",
  location = "Santa Fe, New Mexico, USA",
  url = "http://aclweb.org/anthology/C18-1169"
}

We would also be very happy if you provide a link to the GitHub repository:

... location name extractor tool (LNEx)\footnote{
    \url{https://github.com/halolimat/LNEx}
}