stories

Clustering of textual documents using a time window

How to install

Install cargo (see cargo documentation).
Install stories

cargo install --git https://github.com/medialab/stories.git
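
The install step can be guarded so that it fails with a clear message when cargo is missing. The sketch below wraps the command above in a helper; the function name `install_stories` is a hypothetical name chosen for illustration and is not part of stories.

```shell
# Hypothetical helper (not part of stories): run the install command
# from this README only if cargo is available on the PATH.
install_stories() {
    if ! command -v cargo >/dev/null 2>&1; then
        echo "cargo not found; install cargo first (see cargo documentation)" >&2
        return 1
    fi
    cargo install --git https://github.com/medialab/stories.git
}
```

Call `install_stories` once to perform the installation.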

How to run

Extract vocabulary

stories vocab my_file.csv --ngrams 2 > my_vocab.csv

Determine time window

WINDOW=$(stories window my_file.csv --raw)

Apply clustering algorithm

stories nn my_vocab.csv my_file.csv -w "$WINDOW" --ngrams 2 --threshold 0.65 > nn.csv

Evaluate cluster quality

xsv join --left id my_file.csv id nn.csv | xsv select id,created_at,nearest_neighbor,thread_id,distance > nn_dated.csv
stories eval my_labels.csv nn_dated.csv --datecol created_at
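
The four steps above can be chained in a small script. This is a minimal sketch rather than an official wrapper: it uses only the commands and flags shown in this README, and the function name `run_stories_pipeline`, its two positional arguments, and the empty-window check are assumptions added for illustration. The evaluation step relies on xsv, which is a separate tool and is assumed to be installed already.

```shell
# Hypothetical wrapper (not part of stories): runs the README steps in order.
# $1 = input CSV (e.g. my_file.csv), $2 = labels CSV (e.g. my_labels.csv).
# Intermediate files keep the names used in this README.
run_stories_pipeline() {
    input="$1"
    labels="$2"

    # 1. Extract vocabulary
    stories vocab "$input" --ngrams 2 > my_vocab.csv || return 1

    # 2. Determine time window; stop if nothing was printed (assumed failure case)
    window=$(stories window "$input" --raw) || return 1
    if [ -z "$window" ]; then
        echo "stories window printed no value for $input" >&2
        return 1
    fi

    # 3. Apply clustering algorithm
    stories nn my_vocab.csv "$input" -w "$window" --ngrams 2 --threshold 0.65 > nn.csv || return 1

    # 4. Evaluate cluster quality (requires xsv)
    xsv join --left id "$input" id nn.csv \
        | xsv select id,created_at,nearest_neighbor,thread_id,distance > nn_dated.csv || return 1
    stories eval "$labels" nn_dated.csv --datecol created_at
}
```

For example, `run_stories_pipeline my_file.csv my_labels.csv` reproduces the commands above and leaves `my_vocab.csv`, `nn.csv`, and `nn_dated.csv` in the current directory.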
