Crawenlil/language_model

Language models

Getting Started

Data format

The corpus file should contain one sentence per line.
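
As an illustration of this format, here is a minimal sketch of reading such a corpus. The helper name `iter_sentences` is hypothetical and not part of this repository; skipping blank lines and stripping surrounding whitespace are assumptions, not documented behavior of the scripts.

```python
def iter_sentences(lines):
    """Yield one sentence per non-empty line (hypothetical helper).

    `lines` can be any iterable of strings, e.g. an open text file.
    """
    for line in lines:
        sentence = line.strip()
        if sentence:  # assumption: blank lines carry no sentence
            yield sentence


# In-memory stand-in for the contents of a corpus file.
example_lines = ["The cat sat on the mat.\n", "\n", "Dogs bark.\n"]
sentences = list(iter_sentences(example_lines))
```

With a real file, the same helper could be applied to an open file handle, for example `open("data/wiki100k/corpus.txt", encoding="utf-8")`.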

Preprocess

Before training a model, the datasets must be prepared from the corpus file. Running

python preprocess.py --corpus-path=data/wiki100k/corpus.txt --output-directory=data/wiki100k/

will create word2index, index2word, index2count, trainset and testset in the output directory.
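
The internals of preprocess.py are not described here, so the following is only a sketch of what outputs with these names typically hold. It assumes whitespace tokenization, indices assigned by descending word frequency, and a random 90/10 train/test split; the function names `build_vocab` and `split_corpus` are hypothetical and the actual script may differ on every one of these points.

```python
import random
from collections import Counter


def build_vocab(sentences):
    """Build word<->index mappings and per-index counts (illustrative only)."""
    # Assumption: tokens are separated by whitespace.
    word_counts = Counter(word for s in sentences for word in s.split())
    word2index, index2word, index2count = {}, {}, {}
    # Assumption: more frequent words receive smaller indices.
    for index, (word, count) in enumerate(word_counts.most_common()):
        word2index[word] = index
        index2word[index] = word
        index2count[index] = count
    return word2index, index2word, index2count


def split_corpus(sentences, test_fraction=0.1, seed=0):
    """Shuffle the sentences and split them into (trainset, testset)."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one test sentence, even for a tiny corpus.
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


corpus = ["the cat sat", "the dog sat", "the end"]
word2index, index2word, index2count = build_vocab(corpus)
trainset, testset = split_corpus(corpus)
```

Keeping index2count alongside the two mappings makes it possible to look up a word's corpus frequency by its index, for instance to drop rare words.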

Model configuration

Model configurations are stored in the configs directory.

Training model

python train.py --config=configs/wiki100k.yaml

Testing model

python test.py --config=configs/wiki100k.yaml
