The code in this repository accompanies our IJGIS paper, "A Deep Learning Architecture for Semantic Address Matching".
Data are available at:
Please cite the following reference if you use the code 😊.
@article{Lin+Kang+Wu+Du+Liu:2019,
author = {Yue Lin and Mengjun Kang and Yuyang Wu and Qingyun Du and Tao Liu},
title = {A deep learning architecture for semantic address matching},
journal = {International Journal of Geographical Information Science},
volume = {0},
number = {0},
pages = {1--18},
year = {2019},
publisher = {Taylor \& Francis},
doi = {10.1080/13658816.2019.1681431}
}
The release version of the code can also be cited as
Below is an overview of each file in this repository.
geo_config.py: Hyperparameter settings for the ESIM
geo_data_prepare.py: Tokenize the corpus and convert each address element into an index
geo_data_processor.py: Process the labeled address dataset and divide it into training, development and test sets
geo_ESIM.py: Implementation of the enhanced sequential inference model (ESIM)
geo_similarity.py: Calculate statistical characteristics of the labeled address dataset
geo_test.py: Output predictive results of the ESIM on the test set
geo_token.py: Tokenize with the Jieba library
geo_train.py: Train the ESIM and evaluate its accuracy on the development set
geo_word2vec.py: Train word vectors of address elements
other_CRF.py: Tokenize using CRF [Comber, S.; Arribas-Bel, D. (2019) “Machine learning innovations in address matching: A practical comparison of word2vec and CRFs”. Transactions in GIS, 23 (2): 334–348.]
other_crf_w2v.py: Train word vectors of address elements (CRF tokenizer)
other_string.py: String similarity-based address matching methods: measure the string relevance
other_w2v_cls.py: Use word2vec embeddings directly for classification: calculate cosine similarity
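
The two baseline ideas named in the last entries of this list (string similarity, and cosine similarity over word vectors) can be illustrated with a minimal sketch. This is not the code in other_string.py or other_w2v_cls.py. The function names, the toy embeddings, and the 0.8 threshold are all hypothetical, and Python's standard difflib ratio stands in for whichever string measure the repository actually uses.

```python
import math
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] from difflib (assumed stand-in measure)."""
    return SequenceMatcher(None, a, b).ratio()


def mean_vector(tokens, embeddings):
    """Average the vectors of the tokens that have an embedding."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        raise ValueError("no token has an embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_match(tokens_a, tokens_b, embeddings, threshold=0.8) -> bool:
    """Label two tokenized addresses as matching when the cosine similarity
    of their averaged vectors reaches the (assumed) threshold."""
    sim = cosine_similarity(
        mean_vector(tokens_a, embeddings),
        mean_vector(tokens_b, embeddings),
    )
    return sim >= threshold


# Toy 2-dimensional embeddings, invented purely for illustration.
TOY_EMBEDDINGS = {
    "road": [1.0, 0.0],
    "street": [0.9, 0.1],
    "park": [0.0, 1.0],
}
```

In practice the embeddings would come from a trained word2vec model and the tokens from a tokenizer such as Jieba or CRF. Averaging token vectors is only one simple way to obtain an address-level vector.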