Follow these steps to reproduce the experiments in our paper.
You will need the following resources, which can be created using the code in wiki2gaz or downloaded from [TODO: add link]:
../resources/wikidata/wikidata_gazetteer.csv
../resources/wikidata/entity2class.txt
../resources/wikidata/mentions_to_wikidata.json
../resources/wikidata/mentions_to_wikidata_normalized.json
../resources/wikidata/wikidata_to_mentions_normalized.json
../resources/wikipedia/wikidata2wikipedia/index_enwiki-latest.db
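Before moving on, it can be useful to confirm that all six files listed above are in place. The following is a minimal sketch, not part of the repository: the helper name `missing_resources` and the default `../resources` location are assumptions for illustration, and the check only covers the files listed above (not the embeddings).

```python
from pathlib import Path

# Relative paths of the resource files listed in this README.
REQUIRED_RESOURCES = [
    "wikidata/wikidata_gazetteer.csv",
    "wikidata/entity2class.txt",
    "wikidata/mentions_to_wikidata.json",
    "wikidata/mentions_to_wikidata_normalized.json",
    "wikidata/wikidata_to_mentions_normalized.json",
    "wikipedia/wikidata2wikipedia/index_enwiki-latest.db",
]


def missing_resources(resources_dir="../resources"):
    """Return the required files that are absent under resources_dir.

    Hypothetical helper: the function name and default path are
    assumptions, not something provided by the project's code.
    """
    root = Path(resources_dir)
    return [rel for rel in REQUIRED_RESOURCES if not (root / rel).is_file()]
```

Calling `missing_resources("../resources")` returns an empty list when everything is present, and otherwise the relative paths that still need to be created or downloaded.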
You will also need the [word2vec embeddings](TODO: add link) trained on 19th-century data. These embeddings were created by Nilo Pedrazzini. For more information, see https://github.com/Living-with-machines/DiachronicEmb-BigHistData.
To create the datasets that we use in the experiments presented in the paper, run the following command:
python prepare_data.py -p ../resources
NOTE: Use the `-p` flag to indicate the path to your resources directory.
This script downloads the LwM and HIPE datasets and formats them as needed for the experiments.
To run the experiments, run the following script:
python toponym_resolution.py -p ../resources
NOTE: Use the `-p` flag to indicate the path to your resources directory.
This script runs all the scenarios reported in the experiments in the paper.
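If you prefer to run data preparation and the experiments in one go, the two commands above can be chained. This is only a sketch: `build_commands` and `run_pipeline` are hypothetical helpers, and it assumes you launch it from the directory that contains `prepare_data.py` and `toponym_resolution.py`.

```python
import subprocess
import sys


def build_commands(resources_dir):
    """Return the two commands from this README, in the order they must run.

    Hypothetical helper: only the script names and the -p flag come
    from the README; everything else is an assumption.
    """
    resources = str(resources_dir)
    return [
        [sys.executable, "prepare_data.py", "-p", resources],
        [sys.executable, "toponym_resolution.py", "-p", resources],
    ]


def run_pipeline(resources_dir, runner=subprocess.run):
    """Run each step in order, stopping at the first one that fails."""
    for command in build_commands(resources_dir):
        # check=True raises CalledProcessError on a non-zero exit code.
        runner(command, check=True)
```

For example, `run_pipeline("../resources")` runs the data preparation first and starts the experiments only if that step exits successfully.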
To evaluate the different approaches and obtain a table of results such as the one provided in the paper, go to the `../evaluation/` directory. There, you should clone the HIPE scorer. We are using the code version at commit 50dff4e, and have added the line `return eval_stats` at the end of the `get_results()` function. From `../evaluation/`, run the following script to obtain the results in LaTeX format:
python display_results.py
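Since the evaluation depends on the manual change to the HIPE scorer, it may help to verify that the change is in place in your clone. The sketch below uses Python's standard `ast` module to check whether a function named `get_results` ends with `return eval_stats`. The helper name `returns_eval_stats` is an assumption, and the file that defines `get_results()` is not named here, so you need to locate it in your clone and pass its contents yourself.

```python
import ast


def returns_eval_stats(source):
    """Check that get_results() ends with `return eval_stats`.

    Hypothetical helper: `source` is the text of whichever scorer
    file defines get_results() in your clone.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == "get_results":
            last = node.body[-1]  # final statement of the function body
            return (
                isinstance(last, ast.Return)
                and isinstance(last.value, ast.Name)
                and last.value.id == "eval_stats"
            )
    # No function called get_results was found in this source.
    return False
```

The function inspects the first `get_results` definition it finds and returns `False` if the final statement is anything other than `return eval_stats`.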