Install torchtext for data processing
The torchtext datasets module currently contains:
- Sentiment analysis: SST and IMDb
- Question classification: TREC
- Entailment: SNLI
- Language modeling: WikiText-2
- Machine translation: Multi30k, IWSLT, WMT14
Others are planned or a work in progress:
- Question answering: SQuAD
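As a rough illustration of how one of the listed datasets might be loaded, the sketch below uses the older field-based torchtext API. It assumes a torchtext release that still ships `Field` and `IMDB.splits` (under `torchtext.legacy` in 0.9 to 0.11, under `torchtext` in 0.8 and earlier); newer releases removed this API, so the call fails there. The function name `load_imdb_splits` is hypothetical and not part of this project.

```python
def load_imdb_splits(root=".data"):
    """Load the IMDB train/test splits with the field-based torchtext API.

    torchtext is imported lazily so this module can be imported even when
    torchtext is not installed. Assumption: an older torchtext release that
    still provides Field and IMDB.splits.
    """
    try:
        # torchtext 0.9 to 0.11 keep the old API under torchtext.legacy
        from torchtext.legacy import data, datasets
    except ImportError:
        # torchtext 0.8 and earlier expose it at the top level
        from torchtext import data, datasets

    text_field = data.Field(lower=True)          # tokenized review text
    label_field = data.Field(sequential=False)   # a single label per example

    # Downloads the archive into <root>/imdb on first use, then reuses it.
    train, test = datasets.IMDB.splits(text_field, label_field, root=root)
    return train, test, text_field, label_field
```

Calling `load_imdb_splits()` needs network access the first time unless the archive is already in place under `.data/imdb`.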
Data that currently needs to be configured:
Download the GloVe word vectors to the .vector_cache folder in the project's root directory
- Download IMDB dataset to .data/imdb
- Download SST dataset to .data/sst
- Download TREC Question Classification 2 dataset to .data/trec
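The download targets can be prepared up front so that manually fetched archives have a place to go. The following is a minimal sketch, not part of the project: the helper name `ensure_download_dirs` is hypothetical, and the `trec` folder follows the directory layout shown in this section.

```python
from pathlib import Path

# Target folders relative to the project root, taken from this section.
DATA_DIRS = {
    "imdb": Path(".data") / "imdb",
    "sst": Path(".data") / "sst",
    "trec": Path(".data") / "trec",
}
VECTOR_DIR = Path(".vector_cache")


def ensure_download_dirs(project_root="."):
    """Create the dataset and word-vector folders if they do not exist.

    Returns the list of folder paths, whether newly created or already present.
    """
    root = Path(project_root)
    targets = []
    for rel in [*DATA_DIRS.values(), VECTOR_DIR]:
        target = root / rel
        target.mkdir(parents=True, exist_ok=True)
        targets.append(target)
    return targets
```

Running `ensure_download_dirs()` from the project root is safe to repeat, since existing folders are left untouched.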
- TextClassificationBenchmark
  - .data
    - imdb
      - aclImdb_v1.tar.gz
    - sst
      - trainDevTestTrees_PTB.zip
    - trec
      - train_5500.label
      - TREC_10.label
  - .vector_cache
    - glove.42B.300d.zip
    - glove.840B.300d.zip
    - glove.twitter.27B.zip
    - glove.6B.zip
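To confirm that the files ended up where the layout above expects them, a small check along these lines may help. It is only a sketch: the helper name `find_missing_files` is hypothetical, and it treats every file in the listing as expected, which is probably stricter than needed if only one GloVe archive is used.

```python
from pathlib import Path

# Files from the directory listing, relative to the project root.
EXPECTED_FILES = [
    ".data/imdb/aclImdb_v1.tar.gz",
    ".data/sst/trainDevTestTrees_PTB.zip",
    ".data/trec/train_5500.label",
    ".data/trec/TREC_10.label",
    ".vector_cache/glove.42B.300d.zip",
    ".vector_cache/glove.840B.300d.zip",
    ".vector_cache/glove.twitter.27B.zip",
    ".vector_cache/glove.6B.zip",
]


def find_missing_files(project_root="."):
    """Return the expected files that are not present under project_root."""
    root = Path(project_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

For example, `for rel in find_missing_files(): print("missing:", rel)` lists whatever still has to be downloaded.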