GLiClass is an efficient zero-shot sequence classification model inspired by the GLiNER framework. It achieves performance comparable to traditional cross-encoder models while being significantly more computationally efficient: because classification is performed in a single forward pass, results arrive approximately 10 times faster.
Install GLiClass easily using pip:
pip install gliclass
Clone and install directly from GitHub:
git clone https://github.com/Knowledgator/GLiClass
cd GLiClass
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
pip install .
Verify your installation:
import gliclass
print(gliclass.__version__)
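If the installed build does not expose `gliclass.__version__`, the version can also be read from the package metadata. The following is a minimal sketch using only the standard library; the helper name `installed_version` is illustrative and not part of GLiClass.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("gliclass")
    print(found if found else "gliclass is not installed in this environment")
```

Returning `None` instead of raising makes the check easy to use in setup scripts that need to decide whether to run `pip install gliclass`.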
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

# Load the pretrained model and its matching tokenizer.
model = GLiClassModel.from_pretrained("knowledgator/gliclass-small-v1.0")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-small-v1.0")

# 'multi-label' lets several labels apply to one text.
# Use device='cpu' if no CUDA GPU is available.
pipeline = ZeroShotClassificationPipeline(
    model, tokenizer, classification_type='multi-label', device='cuda:0'
)

text = "One day I will see the world!"
labels = ["travel", "dreams", "sport", "science", "politics"]

# The pipeline returns one list of predictions per input text; take the first.
results = pipeline(text, labels, threshold=0.5)[0]
for result in results:
    print(f"{result['label']} => {result['score']:.3f}")
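Each prediction is a dict with `label` and `score` keys, as the loop above relies on. The sketch below post-processes such a list without loading a model. `top_k_labels` is a hypothetical helper, and the scores are made-up illustration values, not model output.

```python
def top_k_labels(results, k=3, min_score=0.0):
    """Return up to k (label, score) pairs, highest score first."""
    kept = [r for r in results if r["score"] >= min_score]
    ranked = sorted(kept, key=lambda r: r["score"], reverse=True)
    return [(r["label"], r["score"]) for r in ranked[:k]]


# Made-up scores shaped like one element of the pipeline output.
sample = [
    {"label": "travel", "score": 0.91},
    {"label": "sport", "score": 0.12},
    {"label": "dreams", "score": 0.78},
]

for label, score in top_k_labels(sample, k=2, min_score=0.5):
    print(f"{label} => {score:.3f}")
```

Filtering after the call, rather than only through `threshold`, lets one pipeline run serve several cut-offs.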
Newer models trained with retrieval-augmented classification (RAC) accept labeled examples that can improve classification accuracy:
example = {
    "text": "A new machine learning platform automates complex data workflows but faces integration issues.",
    "all_labels": ["AI", "automation", "data_analysis", "usability", "integration"],
    "true_labels": ["AI", "integration", "automation"]
}
text = "The new AI-powered tool streamlines data analysis but has limited integration capabilities."
labels = ["AI", "automation", "data_analysis", "usability", "integration"]
results = pipeline(text, labels, threshold=0.1, rac_examples=[example])[0]
for predict in results:
    print(f"{predict['label']} => {predict['score']:.3f}")
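How the examples passed to `rac_examples` are chosen is left to the caller. One simple option, sketched below, is to keep a pool of labeled examples and pick those whose candidate labels overlap most with the current label set. This selection rule and the helper `select_examples` are assumptions for illustration, not something GLiClass prescribes, and the second pool entry is invented.

```python
def select_examples(pool, labels, max_examples=2):
    """Pick examples from pool, ranked by how many candidate labels they share."""
    wanted = set(labels)
    scored = [(len(wanted & set(ex["all_labels"])), ex) for ex in pool]
    # Drop examples that share no label with the current task.
    scored = [pair for pair in scored if pair[0] > 0]
    # sort() is stable, so ties keep their original pool order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ex for _, ex in scored[:max_examples]]


pool = [
    {
        "text": "A new machine learning platform automates complex data workflows but faces integration issues.",
        "all_labels": ["AI", "automation", "data_analysis", "usability", "integration"],
        "true_labels": ["AI", "integration", "automation"],
    },
    {
        "text": "The team won the final after extra time.",
        "all_labels": ["sports", "politics"],
        "true_labels": ["sports"],
    },
]

labels = ["AI", "automation", "data_analysis", "usability", "integration"]
chosen = select_examples(pool, labels)
print(len(chosen), "example(s) selected")
# The selection would then be passed as rac_examples=chosen.
```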
- Sentiment Analysis: Rapidly classify texts as positive, negative, or neutral.
- Document Classification: Efficiently organize and categorize large document collections.
- Search Results Re-ranking: Improve relevance and precision by reranking search outputs.
- News Categorization: Automatically tag and organize news articles into predefined categories.
- Fact Checking: Quickly validate and categorize statements based on factual accuracy.
Prepare your training data as follows:
[
  {"text": "Sample text.", "all_labels": ["sports", "science", "business"], "true_labels": ["sports"]},
  ...
]
Optionally, specify confidence scores explicitly:
[
  {"text": "Sample text.", "all_labels": ["sports", "science"], "true_labels": {"sports": 0.9}},
  ...
]
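Before training, it can help to check that every record matches one of the two layouts above. The sketch below is one way to do that with the standard library. `check_record` is a hypothetical helper, and two of its rules are assumptions rather than documented GLiClass requirements: that explicit scores lie in [0, 1], and that every true label also appears in `all_labels`.

```python
import json


def check_record(record):
    """Raise ValueError if a training record does not match the format above."""
    if not isinstance(record.get("text"), str):
        raise ValueError("'text' must be a string")
    all_labels = record.get("all_labels")
    if not isinstance(all_labels, list) or not all_labels:
        raise ValueError("'all_labels' must be a non-empty list")
    true_labels = record.get("true_labels")
    if isinstance(true_labels, dict):
        # Assumption: explicit confidence scores lie in [0, 1].
        for label, score in true_labels.items():
            if not 0.0 <= float(score) <= 1.0:
                raise ValueError(f"score for {label!r} is outside [0, 1]")
    elif not isinstance(true_labels, list):
        raise ValueError("'true_labels' must be a list or a dict")
    # Iterating a dict yields its keys, so this covers both forms.
    missing = [label for label in true_labels if label not in all_labels]
    if missing:
        raise ValueError(f"true labels missing from 'all_labels': {missing}")


records = [
    {"text": "Sample text.", "all_labels": ["sports", "science", "business"], "true_labels": ["sports"]},
    {"text": "Sample text.", "all_labels": ["sports", "science"], "true_labels": {"sports": 0.9}},
]

for record in records:
    check_record(record)

# Serialize as a JSON array, matching the layout shown above.
payload = json.dumps(records, indent=2)
print(payload)
```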
Refer to the train.py script to set up training from scratch or to fine-tune existing models.