Web page crawler that collects information on popular singers / songs / albums / singer-song pairs from:
- KKBOX
- myMusic
- Spotify
- qqmusic
multi_crawler() -> main_crawler() -> language_crawler() -> individual_crawler_module()
multi_crawler(): runs multiple threads to crawl info in different languages
main_crawler(): main wrapper for language_crawler() that also concatenates the crawled data
language_crawler(): wrapper that runs the basic crawler over multiple input dates
individual_crawler_module(): customized fundamental crawler for a specific web page
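A minimal sketch of how this call chain could fit together, using only the standard library. The function names come from the chain above, but every signature, parameter, and return type here is an assumption made for illustration, and the site-specific crawler is a hypothetical stub that returns fake rows instead of fetching a page.

```python
from concurrent.futures import ThreadPoolExecutor


def individual_crawler_module(language, date):
    # Hypothetical stand-in for a site-specific crawler (no network access).
    return [{"language": language, "date": date, "title": f"song-{language}-{date}"}]


def language_crawler(language, dates, crawler=individual_crawler_module):
    # Run the basic crawler once per date; return one list of rows per date.
    return [crawler(language, date) for date in dates]


def main_crawler(language, dates, crawler=individual_crawler_module):
    # Wrap language_crawler() and concatenate the per-date results.
    combined = []
    for rows in language_crawler(language, dates, crawler):
        combined.extend(rows)
    return combined


def multi_crawler(languages, dates, crawler=individual_crawler_module, max_workers=4):
    # Crawl each language in its own thread; map language -> concatenated rows.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            language: pool.submit(main_crawler, language, dates, crawler)
            for language in languages
        }
        return {language: future.result() for language, future in futures.items()}


if __name__ == "__main__":
    result = multi_crawler(["zh", "en"], ["2020-01-01", "2020-01-02"])
    for language, rows in result.items():
        print(language, len(rows))
```

Threads are a reasonable fit in this sketch because crawling is I/O-bound; passing the crawler as an argument is one way to swap in a different site-specific module without touching the wrappers.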
Pick / write a customized fundamental crawler and call:
- multi_crawler(): for faster crawling
- main_crawler(): for data concatenation
Some useful tools for quickly analyzing the Mega data:
- checking_duplicate: check if data is duplicated in two sources
- count_duplicate: count Mega data ASR / E2E for different locales
- tokenization: experimental tokenization using jieba
- counter_cosine_similarity: compute the cosine similarity of two data sets
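As an illustration of two of these tools, the sketch below shows one plausible way to compute cosine similarity over token counts and to find entries shared by two sources. The signatures and the use of collections.Counter are assumptions; the actual helpers may take different inputs (for example DataFrames).

```python
import math
from collections import Counter


def counter_cosine_similarity(counter_a, counter_b):
    # Cosine similarity between two bag-of-words count vectors.
    terms = set(counter_a) | set(counter_b)
    dot = sum(counter_a.get(t, 0) * counter_b.get(t, 0) for t in terms)
    norm_a = math.sqrt(sum(v * v for v in counter_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counter_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty data set shares nothing with any other set.
    return dot / (norm_a * norm_b)


def checking_duplicate(source_a, source_b):
    # Assumed behavior: return the entries that appear in both sources.
    return set(source_a) & set(source_b)


if __name__ == "__main__":
    a = Counter(["singer_a", "singer_b"])
    b = Counter(["singer_a", "singer_c"])
    print(counter_cosine_similarity(a, b))
    print(checking_duplicate(["song_x", "song_y"], ["song_y", "song_z"]))
```

A score near 1 means the two data sets have almost the same distribution of entries, and 0 means they share none.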
Tool for creating an alias list from Wikipedia using a combination of selenium and the wikipedia library