It's a Python script (class task) made up of two parts, which together provide:
- word tokenization
- line tokenization
- deleting stop words
- word stemming
- word lemmatisation
- word labeling
- getting the list of documents containing a given word
- getting the number of occurrences of a given word in each returned document
- getting the weight of a given word in each returned document
- getting the tf-idf of a given word in each returned document
- getting the most relevant document for a given word
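
The retrieval features listed above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the script's actual code: the corpus `DOCS`, the `STOP_WORDS` set, and every function name are hypothetical. It also assumes that the "weight" of a word is its count divided by the number of remaining tokens in the document, and that idf is `log(N / df)`; the real script may define both differently.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list and corpus, for illustration only.
STOP_WORDS = {"the", "a", "is", "on", "and", "of"}

DOCS = {
    "doc1": "The cat sat on the mat",
    "doc2": "The dog chased the cat and the cat ran",
    "doc3": "A bird is on the roof",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def preprocess(text):
    """Tokenize, then drop stop words."""
    return [word for word in tokenize(text) if word not in STOP_WORDS]


def build_index(docs):
    """Map each document name to a Counter of its remaining words."""
    return {name: Counter(preprocess(text)) for name, text in docs.items()}


def documents_containing(word, index):
    """List the documents in which the word occurs at least once."""
    return [name for name, counts in index.items() if counts[word] > 0]


def occurrences(word, index):
    """Number of occurrences of the word in each returned document."""
    return {name: index[name][word] for name in documents_containing(word, index)}


def weight(word, index):
    """Assumed weight: word count divided by the document's token count."""
    return {
        name: index[name][word] / sum(index[name].values())
        for name in documents_containing(word, index)
    }


def tf_idf(word, index):
    """tf-idf per returned document, with idf assumed to be log(N / df)."""
    matching = documents_containing(word, index)
    if not matching:
        return {}
    idf = math.log(len(index) / len(matching))
    return {name: tf * idf for name, tf in weight(word, index).items()}


def most_relevant(word, index):
    """Document with the highest tf-idf for the word, or None if absent."""
    scores = tf_idf(word, index)
    return max(scores, key=scores.get) if scores else None
```

With this sample corpus, `most_relevant("cat", build_index(DOCS))` returns `"doc2"`, because "cat" makes up a larger share of that document's remaining tokens than it does in `doc1`.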