
NLP-ATC

NLP-ATC is an extremely lightweight library for text classification. Like scikit-learn, it can train and apply an NLP text classification model in a single line.

Download the data from the network drive and put the labeled training set (data) in the root directory.

Features

Supported text classification models:

  • fasttext
  • cnntext
  • albert_classifier
  • bert_classifier
  • hierarchical_attention_network (HAN) (in progress)

Get Started

how to use

Download the xatc module, put it in your working path, and then import it.

prepare training data

Training data should be prepared as a tab-separated file, one sample per line:

sentence\tlabel
sentence\tlabel
....
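
For example, a minimal sketch that writes training data in this format (the sentences, labels, and the file name "train.txt" are illustrative):

samples = [
    ("这部电影很好看", "positive"),
    ("服务太差了", "negative"),
]
# Each line is a sentence and its label, separated by a tab.
with open("train.txt", "w", encoding="utf-8") as f:
    for sentence, label in samples:
        f.write(f"{sentence}\t{label}\n")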

train model

You can also find these in the examples directory.

################
# cnntext train
################
from xatc.models.cnntext import train
# data_path points to the training file; model_saved_path is where the
# trained model will be written.
train(data_path=data_path,
      model_path=model_saved_path,
      train_size=0.9,
      model_params={
          "epochs": 1,
          "batch_size": 64,
          "lr": 1e-4,
      })

################
# albert train
################
from xatc.models.albert_classifier import train
train(data_path=data_path,
      model_path=model_saved_path,
      train_size=0.9,
      model_params={
          "epochs": 2,
          "batch_size": 128,
          "hidden_dim": 50,
      })

################
# bert train
################
from xatc.models.bert_classifier import train
train(data_path=data_path,
      model_path=model_saved_path,
      train_size=0.9,
      model_params={
          "epochs": 2,
          "batch_size": 128,
          "hidden_dim": 50,
      })

################
# fasttext train
################
from xatc.models.fasttext import train
train(data_path=data_path,
      model_path=model_saved_path,
      train_size=0.9,
      model_params=None)

load model to predict

All models share the same prediction interface. You can also find these in the examples directory.

# cnn 
from xatc.models.cnntext import Predictor
model = Predictor(model_path=model_saved_path)
pred = model.predict(data=[...])

# albert
from xatc.models.albert_classifier import Predictor
model = Predictor(model_path=model_saved_path)
pred = model.predict(data=[...])

# bert
from xatc.models.bert_classifier import Predictor
model = Predictor(model_path=model_saved_path)
pred = model.predict(data=[...])

# fasttext
from xatc.models.fasttext import Predictor
model = Predictor(model_path=model_saved_path)
pred = model.predict(data=[...])
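
A hedged usage sketch (the input sentences are illustrative, and it is assumed that predict returns one predicted label per input sentence, in order):

from xatc.models.cnntext import Predictor

model = Predictor(model_path=model_saved_path)
# Pass raw sentences as a list; one prediction per sentence is assumed.
pred = model.predict(data=["这部电影很好看", "服务太差了"])
print(pred)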

API

cnntext train parameters

  • finetune: default True, whether the embeddings should be fine-tuned
  • label_weights: default None, per-label weights
  • dropout: default 0.3, dropout rate
  • embedding_dim: default 200, the dimension of the embeddings
  • vocab_path: by default, loads the Tencent pre-trained vocabulary
  • emb_path: default is the Tencent pre-trained embedding; you can use other pre-trained embeddings
  • emb_sep: default is " ", the separator of the pre-trained embedding file
  • kernel_size: default is [3, 4, 5], the kernel sizes the model will use
  • kernel_num: default is 100, the number of channels for each kernel
  • batch_size: default is 64
  • lr: default is 1e-4, the learning rate
  • eps: default is 1e-8, the eps of the Adam optimizer
  • epochs: default is 5, the number of training epochs
  • n_jobs: default is 1, the number of threads for preprocessing and data loading; -1 uses the maximum number of threads
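
These parameters are passed through the model_params argument of train. A sketch overriding a few of the defaults above (the values are illustrative):

from xatc.models.cnntext import train

# Override a few cnntext defaults; unspecified parameters keep the
# defaults listed above (illustrative values).
train(data_path=data_path,
      model_path=model_saved_path,
      train_size=0.9,
      model_params={
          "finetune": False,        # freeze the pre-trained embeddings
          "dropout": 0.5,
          "kernel_size": [2, 3, 4],
          "epochs": 5,
      })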

albert classifier train parameters

  • finetune: default True, whether the embeddings should be fine-tuned
  • label_weights: default None, per-label weights
  • dropout: default 0.3, dropout rate
  • hidden_dim: default is 50, the size of the hidden layer after ALBERT
  • pre_trained_model_path: default is albert_tiny_bright; you can load other PyTorch BERT pre-trained models
  • batch_size: default is 64
  • lr: default is 1e-4, the learning rate
  • eps: default is 1e-8, the eps of the Adam optimizer
  • epochs: default is 5, the number of training epochs
  • n_jobs: default is 1, the number of threads for preprocessing and data loading; -1 uses the maximum number of threads

bert classifier train parameters

  • finetune: default True, whether the embeddings should be fine-tuned
  • label_weights: default None, per-label weights
  • dropout: default 0.3, dropout rate
  • hidden_dim: default is 50, the size of the hidden layer after BERT
  • pre_trained_model_path: default is bert_base_wwm; you can load other PyTorch BERT pre-trained models
  • batch_size: default is 64
  • lr: default is 1e-4, the learning rate
  • eps: default is 1e-8, the eps of the Adam optimizer
  • epochs: default is 5, the number of training epochs
  • n_jobs: default is 1, the number of threads for preprocessing and data loading; -1 uses the maximum number of threads
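
The same pattern applies to both BERT-style classifiers. A sketch that swaps in a different pre-trained checkpoint (the path is illustrative; the same works for albert_classifier):

from xatc.models.bert_classifier import train

# Load a different PyTorch BERT pre-trained model (illustrative path);
# other parameters keep the defaults listed above.
train(data_path=data_path,
      model_path=model_saved_path,
      train_size=0.9,
      model_params={
          "pre_trained_model_path": "./pretrained/bert_base_wwm",
          "hidden_dim": 50,
          "epochs": 2,
      })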

fasttext train parameters (same as Facebook fastText)

  • lr: default 1.0, learning rate
  • dim: default 100, size of word vectors
  • ws: default 5, size of the context window
  • epoch: default 25, number of epochs
  • minCount: default 1, minimal number of word occurrences
  • minCountLabel: default 0, minimal number of label occurrences
  • minn: default 0, min length of char ngram
  • maxn: default 0, max length of char ngram
  • neg: default 5, number of negatives sampled
  • wordNgrams: default 4, max length of word ngram
  • loss: default 'softmax', loss function {ns, hs, softmax, ova}
  • bucket: default 2000000, number of buckets
  • thread: default max_cpu - 1, number of threads
  • lrUpdateRate: default 100, rate of updates for the learning rate
  • t: default 0.0001, sampling threshold
  • label: default 'label', label prefix
  • verbose: default 2, verbosity level
  • pretrainedVectors: default '', pre-trained word vectors (.vec file) for supervised learning
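
Assuming these options are passed through model_params just like the other models (the earlier example uses model_params=None to take all defaults), a sketch with a few explicit values:

from xatc.models.fasttext import train

# Pass fastText-style options explicitly instead of model_params=None
# (illustrative values; omitted options keep the defaults above).
train(data_path=data_path,
      model_path=model_saved_path,
      train_size=0.9,
      model_params={
          "lr": 1.0,
          "dim": 100,
          "epoch": 25,
          "wordNgrams": 4,
          "loss": "softmax",
      })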

Version

  • v1.0.0: text classification models: fasttext, cnntext, albert_classifier
  • v1.0.1: implement a Predictor for each model to replace the predict function; update the README
  • v1.1.1: add model: bert_classifier

Future work

  • v1.2.1: add model: han
