This repo includes:
- A Gazetteer of tokens and NE tags annotated by 3 domain experts
- A Corpus of 192,000 job titles crawled from LinkedIn, with NE tags prefixed using the BIOES scheme
- Title2Vec, a pre-trained job title embedding fine-tuned from ELMo. A checkpoint is available for download.
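
To illustrate what BIOES-prefixed NE tags look like, here is a minimal sketch that converts per-token entity labels into BIOES tags (B = begin, I = inside, O = outside, E = end, S = single-token entity). The function name `to_bioes`, the example title, and the label names `RES` and `FUN` are assumptions made for illustration; check the corpus files for the actual tag set and file layout.

```python
from typing import List, Optional


def to_bioes(labels: List[Optional[str]]) -> List[str]:
    """Convert per-token entity labels (None = outside) into BIOES tags.

    Consecutive tokens with the same label are treated as one entity,
    so two adjacent entities of the same type would be merged.
    """
    tags = []
    n = len(labels)
    for i, label in enumerate(labels):
        if label is None:
            tags.append("O")
            continue
        # A token starts an entity if the previous label differs,
        # and ends one if the next label differs.
        starts = i == 0 or labels[i - 1] != label
        ends = i == n - 1 or labels[i + 1] != label
        if starts and ends:
            prefix = "S"
        elif starts:
            prefix = "B"
        elif ends:
            prefix = "E"
        else:
            prefix = "I"
        tags.append(f"{prefix}-{label}")
    return tags


# Hypothetical job title and labels, not taken from the corpus.
tokens = ["Senior", "Software", "Engineer"]
labels = ["RES", "FUN", "FUN"]
for token, tag in zip(tokens, to_bioes(labels)):
    print(f"{token}\t{tag}")
```

In this sketch a one-token entity gets an `S-` prefix, while a multi-token entity opens with `B-` and closes with `E-`, which is what distinguishes BIOES from the simpler BIO scheme.
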

Please cite the following papers when using IPOD:

@article{liu2019ipod,
title={IPOD: An Industrial and Professional Occupations Dataset and its Applications to Occupational Data Mining and Analysis},
author={Junhua Liu and Yung Chuen Ng and Kristin L. Wood and Kwan Hui Lim},
year={2019},
journal={arXiv preprint arXiv:1910.10495}
}
@article{liu2020ipod,
title={A Large-scale Industrial and Professional Occupation Dataset},
author={Junhua Liu and Yung Chuen Ng and Kwan Hui Lim},
year={2020},
journal={arXiv preprint arXiv:2005.02780}
}