Kashgari

Overview | Performance | Quick start | Documentation | 中文文档 | Contributing

🎉🎉🎉 We are proud to announce that we have rewritten Kashgari entirely with tf.keras. Kashgari now has an easier-to-understand API and runs faster! 🎉🎉🎉

Overview

Kashgari is a simple and powerful NLP transfer learning framework. Build a state-of-the-art model in 5 minutes for named entity recognition (NER), part-of-speech tagging (PoS), and text classification tasks.

Human-friendly. Kashgari's code is straightforward, well documented and tested, which makes it very easy to understand and modify.
Powerful and simple. Kashgari lets you apply state-of-the-art natural language processing (NLP) models to your text for tasks such as named entity recognition (NER), part-of-speech tagging (PoS) and classification.
Built-in transfer learning. Kashgari has built-in support for pre-trained BERT and Word2vec embedding models, which makes it simple to apply transfer learning when training your model.
Fully scalable. Kashgari provides a simple, fast, and scalable environment for experimentation: train your models and try new approaches using different embeddings and model structures.
Production ready. Kashgari can export models in the SavedModel format for TensorFlow Serving, so you can deploy them directly on the cloud.

Our Goal

Academic users: easier experimentation to test hypotheses without coding from scratch.
NLP beginners: learn how to build an NLP project with production-level code quality.
NLP developers: build a production-level classification/labeling model within minutes.

Performance

Task	Language	Dataset	Score	Detail
Named Entity Recognition	Chinese	People's Daily Ner Corpus	94.46 (F1)	Text Labeling Performance Report
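
The score above is an F1 value. This README does not say how it is computed, so the following is only a sketch of one common convention for NER: entity-level F1 over BIO tags, where a predicted entity counts as correct only if its type and span both match exactly. The function names `extract_entities` and `entity_f1` are hypothetical and are not part of Kashgari.

```python
from typing import List, Set, Tuple

Entity = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def extract_entities(tags: List[str]) -> Set[Entity]:
    """Collect (type, start, end) spans from one BIO-tagged sequence.

    An I- tag that does not continue a span of the same type is ignored.
    """
    entities = set()
    start, current = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        inside = tag.startswith("I-") and current == tag[2:]
        if not inside:
            if current is not None:
                entities.add((current, start, i))
            if tag.startswith("B-"):
                start, current = i, tag[2:]
            else:
                start, current = None, None
    return entities


def entity_f1(y_true: List[List[str]], y_pred: List[List[str]]) -> float:
    """Micro-averaged entity-level F1 over a list of tagged sequences."""
    tp = n_true = n_pred = 0
    for true_tags, pred_tags in zip(y_true, y_pred):
        true_ents = extract_entities(true_tags)
        pred_ents = extract_entities(pred_tags)
        tp += len(true_ents & pred_ents)
        n_true += len(true_ents)
        n_pred += len(pred_ents)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example (made-up tags): one of two true entities is recovered,
# so precision is 1.0 and recall is 0.5.
y_true = [["B-PER", "I-PER", "O", "B-LOC"]]
y_pred = [["B-PER", "I-PER", "O", "O"]]
print(entity_f1(y_true, y_pred))
```

Exact-match scoring is stricter than per-token accuracy, which is why an F1 value and the `acc` value printed during training are not directly comparable.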

Tutorials

Here is a set of quick tutorials to get you started with the library:

There are also articles and posts that illustrate how to use Kashgari:

Quick start

Requirements and Installation

🎉🎉🎉 We renamed the tf.keras version as kashgari-tf 🎉🎉🎉

The project is based on TensorFlow 1.14.0 and Python 3.6+, because it is 2019 and type hinting is cool.

pip install kashgari-tf
# CPU
pip install tensorflow==1.14.0
# GPU
pip install tensorflow-gpu==1.14.0
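
After installing, a quick sanity check can confirm that the interpreter meets the Python 3.6+ requirement and that the packages are importable. This is a minimal sketch using only the standard library; `check_environment` is a hypothetical helper, not something Kashgari ships.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 6), packages=("tensorflow", "kashgari")):
    """Report whether the Python version is new enough and each package can be found."""
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for name in packages:
        # find_spec returns None when a top-level package is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'missing'}")
```

The check only looks for the packages on the import path; it does not import them or verify their versions.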

Example Usage

Let's train an NER labeling model with BiLSTM_Model.

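from kashgari.corpus import ChineseDailyNerCorpus
from kashgari.tasks.labeling import BiLSTM_Model

# Load the train, test and validation splits of the corpus
train_x, train_y = ChineseDailyNerCorpus.load_data('train')
test_x, test_y = ChineseDailyNerCorpus.load_data('test')
valid_x, valid_y = ChineseDailyNerCorpus.load_data('valid')

# Build the model with default settings and train it,
# passing the validation split so it is evaluated after each epoch
model = BiLSTM_Model()
model.fit(train_x, train_y, valid_x, valid_y, epochs=50)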

"""
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input (InputLayer)           (None, 97)                0
_________________________________________________________________
layer_embedding (Embedding)  (None, 97, 100)           320600
_________________________________________________________________
layer_blstm (Bidirectional)  (None, 97, 256)           235520
_________________________________________________________________
layer_dropout (Dropout)      (None, 97, 256)           0
_________________________________________________________________
layer_time_distributed (Time (None, 97, 8)             2056
_________________________________________________________________
activation_7 (Activation)    (None, 97, 8)             0
=================================================================
Total params: 558,176
Trainable params: 558,176
Non-trainable params: 0
_________________________________________________________________
Train on 20864 samples, validate on 2318 samples
Epoch 1/50
20864/20864 [==============================] - 9s 417us/sample - loss: 0.2508 - acc: 0.9333 - val_loss: 0.1240 - val_acc: 0.9607

"""
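
The parameter counts in the summary above can be reproduced by hand. The sketch below is an arithmetic check, not Kashgari code. It assumes a vocabulary of 3206 tokens (inferred from 320600 / 100), 128 LSTM units per direction (inferred from the 256-wide bidirectional output), and two bias vectors per LSTM, as in CuDNN-backed LSTM layers. None of these are stated in this README; they are simply the values that match the printed numbers.

```python
def embedding_params(vocab_size, embedding_dim):
    # One embedding_dim-sized vector per vocabulary entry
    return vocab_size * embedding_dim


def lstm_params(input_dim, units, bias_vectors=1):
    # Four gates, each with an input kernel, a recurrent kernel and bias vector(s)
    return 4 * units * (input_dim + units + bias_vectors)


def dense_params(input_dim, output_dim):
    # Weight matrix plus one bias per output
    return input_dim * output_dim + output_dim


vocab_size = 3206     # assumption: 320600 / 100
embedding_dim = 100   # from the layer_embedding output shape
units = 128           # assumption: per direction, 2 * 128 = 256
num_labels = 8        # from the final output shape

embedding = embedding_params(vocab_size, embedding_dim)
# Bidirectional wraps two LSTMs; bias_vectors=2 is the assumed CuDNN-style layout
blstm = 2 * lstm_params(embedding_dim, units, bias_vectors=2)
dense = dense_params(2 * units, num_labels)

print(embedding, blstm, dense, embedding + blstm + dense)
# prints: 320600 235520 2056 558176
```

With a single bias vector per LSTM the bidirectional layer would have 234,496 parameters instead of 235,520, so the printed summary is only consistent with the double-bias layout under these assumptions.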

Run with GPT-2 Embedding

from kashgari.embeddings import GPT2Embedding
from kashgari.corpus import ChineseDailyNerCorpus
from kashgari.tasks.labeling import BiGRU_Model

train_x, train_y = ChineseDailyNerCorpus.load_data('train')
valid_x, valid_y = ChineseDailyNerCorpus.load_data('valid')

# Use a pre-trained GPT-2 model as the embedding layer, with inputs fixed at 30 tokens
gpt2_embedding = GPT2Embedding('<path-to-gpt-model-folder>', sequence_length=30)
model = BiGRU_Model(gpt2_embedding)
model.fit(train_x, train_y, valid_x, valid_y, epochs=50)

Run with Bert Embedding

from kashgari.embeddings import BERTEmbedding
from kashgari.tasks.labeling import BiGRU_Model
from kashgari.corpus import ChineseDailyNerCorpus

# Use a pre-trained BERT model as the embedding layer, with inputs fixed at 30 tokens
bert_embedding = BERTEmbedding('<bert-model-folder>', sequence_length=30)
model = BiGRU_Model(bert_embedding)

train_x, train_y = ChineseDailyNerCorpus.load_data()
model.fit(train_x, train_y)
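
Both embedding examples pass `sequence_length=30`, which fixes the length of every input sequence. As a rough illustration of what a fixed length implies, the sketch below clips longer sentences and pads shorter ones. It is an assumption-laden toy, not Kashgari's implementation: `pad_or_truncate` and the `<PAD>` token are hypothetical, and the special tokens that BERT-style models add (such as `[CLS]` and `[SEP]`) are not modeled.

```python
from typing import List, Sequence


def pad_or_truncate(tokens: Sequence[str],
                    sequence_length: int,
                    pad_token: str = "<PAD>") -> List[str]:
    """Return exactly sequence_length tokens: clip the tail or pad on the right."""
    clipped = list(tokens[:sequence_length])
    return clipped + [pad_token] * (sequence_length - len(clipped))


short_sentence = list("北京欢迎你")        # 5 characters, padded up to 30
long_sentence = ["字"] * 45                # 45 characters, clipped down to 30

print(len(pad_or_truncate(short_sentence, 30)))
print(len(pad_or_truncate(long_sentence, 30)))
```

Tokens beyond the fixed length are dropped in this sketch, so any entities in the clipped tail would never reach the model. Choosing a length that covers most sentences in the corpus avoids losing labels that way.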

Contributing

Thanks for your interest in contributing! There are many ways to get involved; start with the contributor guidelines and then check these open issues for specific tasks.

Reference

This library is inspired by and references the following frameworks and papers.
