Tokenizes the documents in a folder given as a command-line argument and writes three files: docids.txt (an id and filename for each document), termids.txt (an id and word for each term), and doc_index.txt (the positions at which each term occurs in each document).
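As an illustration only, the pipeline described above could be sketched in Python roughly as follows. This is not the repository's code. The token rule (lowercased runs of letters and digits), the 1-based ids and positions, the tab-separated line layout, and the helper names `build_index` and `write_index` are all assumptions made for the sketch; only the three output file names come from the description.

```python
import re
import sys
from pathlib import Path

# Assumed token rule: lowercase runs of ASCII letters and digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def build_index(folder):
    """Assign ids to documents and terms and collect term positions per document."""
    doc_ids = {}    # filename -> document id (1-based, assumed)
    term_ids = {}   # word -> term id (1-based, assumed)
    postings = []   # (doc_id, term_id, [positions]) in order of first occurrence
    files = sorted(p for p in Path(folder).iterdir() if p.is_file())
    for doc_id, path in enumerate(files, start=1):
        doc_ids[path.name] = doc_id
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        positions = {}  # term_id -> list of token positions in this document
        for pos, word in enumerate(TOKEN_RE.findall(text), start=1):
            term_id = term_ids.setdefault(word, len(term_ids) + 1)
            positions.setdefault(term_id, []).append(pos)
        for term_id, pos_list in positions.items():
            postings.append((doc_id, term_id, pos_list))
    return doc_ids, term_ids, postings


def write_index(doc_ids, term_ids, postings, out_dir="."):
    """Write docids.txt, termids.txt and doc_index.txt as tab-separated lines."""
    out = Path(out_dir)
    with open(out / "docids.txt", "w", encoding="utf-8") as f:
        for name, doc_id in doc_ids.items():
            f.write(f"{doc_id}\t{name}\n")
    with open(out / "termids.txt", "w", encoding="utf-8") as f:
        for word, term_id in term_ids.items():
            f.write(f"{term_id}\t{word}\n")
    with open(out / "doc_index.txt", "w", encoding="utf-8") as f:
        for doc_id, term_id, pos_list in postings:
            joined = "\t".join(str(p) for p in pos_list)
            f.write(f"{doc_id}\t{term_id}\t{joined}\n")


# Only runs when a folder is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    write_index(*build_index(sys.argv[1]))
```

Saved under a hypothetical name such as `tokenizer_sketch.py`, it would be run as `python tokenizer_sketch.py path/to/folder`, writing the three files into the current directory. Under the assumed layout, each line of doc_index.txt holds a document id, a term id, and then every position of that term in that document.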

mstojanovicm00/tokenizer
