forked from junhua/IPOD

A Corpus of 192,000 Industrial Occupations

datalab-vn/ipod

 
 


Industrial and Professional Occupations Dataset (IPOD)

License: MIT

This repo includes:

  • A gazetteer of tokens and named-entity (NE) tags annotated by three domain experts
  • A corpus of 192,000 job titles crawled from LinkedIn, with NE tags prefixed according to the BIOES scheme
  • Title2Vec, a pre-trained job title embedding fine-tuned from ELMo. A checkpoint is available for download.
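
For readers unfamiliar with BIOES prefixes, the following is a minimal sketch of how entity spans over a tokenized job title could be turned into BIOES tags (B = begin, I = inside, E = end, S = single-token entity, O = outside). The function name `spans_to_bioes`, the `PREFIX-LABEL` tag format, and the labels `RES` and `FUN` are assumptions made for illustration. They are not taken from the corpus files and may not match its exact layout.

```python
from typing import List, Tuple

# Hypothetical helper: not part of the IPOD repository.
# Each span is (start, end_exclusive, label) over token indices.
Span = Tuple[int, int, str]


def spans_to_bioes(n_tokens: int, spans: List[Span]) -> List[str]:
    """Convert non-overlapping entity spans into BIOES tags."""
    tags = ["O"] * n_tokens  # tokens outside any entity stay "O"
    for start, end, label in spans:
        if end - start == 1:
            # A single-token entity gets the S- prefix.
            tags[start] = f"S-{label}"
        else:
            # Multi-token entity: B- on the first token, E- on the last,
            # I- on everything in between.
            tags[start] = f"B-{label}"
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"
            tags[end - 1] = f"E-{label}"
    return tags


# Illustrative title and labels (assumed, not drawn from the dataset).
tokens = ["senior", "software", "engineer"]
spans = [(0, 1, "RES"), (1, 3, "FUN")]
bioes_tags = spans_to_bioes(len(tokens), spans)
for token, tag in zip(tokens, bioes_tags):
    print(f"{token}\t{tag}")
```

Unlike plain BIO tagging, the explicit E- and S- prefixes mark entity boundaries on both sides, which is the usual reason for choosing BIOES in sequence labelling.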

Citing IPOD

Please cite the following papers when using IPOD:

@article{liu2019ipod,
    title={IPOD: An Industrial and Professional Occupations Dataset and its Applications to Occupational Data Mining and Analysis},
    author={Junhua Liu and Yung Chuen Ng and Kristin L. Wood and Kwan Hui Lim},
    year={2019},
    journal={arXiv preprint arXiv:1910.10495}
}

@article{liu2020ipod,
    title={A Large-scale Industrial and Professional Occupation Dataset},
    author={Junhua Liu and Yung Chuen Ng and Kwan Hui Lim},
    year={2020},
    journal={arXiv preprint arXiv:2005.02780}
}
