marella/ctransformers: Python bindings for the Transformer models implemented in C/C++ using GGML library. #499
Labels

- github: gh tools like cli, Actions, Issues, Pages
- llm: Large Language Models
- llm-applications: Topics related to practical applications of Large Language Models in various fields
- llm-inference-engines: Software to run inference on large language models
- ml-inference: Running and serving ML models
- shell-script: shell scripting in Bash, ZSH, POSIX etc
- technical-writing: Links to deep technical writing and books
CTransformers
![Build and Test](https://github.com/marella/ctransformers/actions/workflows/build.yml/badge.svg)
Python bindings for the Transformer models implemented in C/C++ using the GGML library. Also see ChatDocs.
Supported Models
Installation
To install via `pip`, simply run:
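```sh
pip install ctransformers
```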
Usage
It provides a unified interface for all models:
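For example, with a local GGML model file (the path and `model_type` value below are placeholders for your own model):

```py
from ctransformers import AutoModelForCausalLM

# Load a GGML model from a local file; model_type selects the architecture.
llm = AutoModelForCausalLM.from_pretrained("/path/to/ggml-model.bin", model_type="gpt2")

print(llm("AI is going to"))
```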
Run in Google Colab
To stream the output:
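A sketch reusing the `llm` object from above:

```py
# stream=True yields the generated text piece by piece instead of all at once.
for text in llm("AI is going to", stream=True):
    print(text, end="", flush=True)
```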
You can load models from Hugging Face Hub directly:
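For example (`marella/gpt-2-ggml` is an illustrative repo name):

```py
llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml")
```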
If a model repo has multiple model files (`.bin` or `.gguf` files), specify a model file using:
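A sketch using the `model_file` parameter (the file name is a placeholder):

```py
llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml", model_file="ggml-model.bin")
```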
🤗 Transformers
Note: This is an experimental feature and may change in the future.
To use with 🤗 Transformers, create the model and tokenizer using:
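A sketch assuming the `hf=True` flag documented for this feature:

```py
from ctransformers import AutoModelForCausalLM, AutoTokenizer

# hf=True wraps the GGML model in a 🤗 Transformers-compatible interface.
model = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml", hf=True)
tokenizer = AutoTokenizer.from_pretrained(model)
```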
Run in Google Colab
You can use the 🤗 Transformers text generation pipeline:
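For example, with the `model` and `tokenizer` created above:

```py
from transformers import pipeline

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("AI is going to", max_new_tokens=256))
```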
You can use 🤗 Transformers generation parameters:
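For example:

```py
print(pipe("AI is going to", max_new_tokens=256, do_sample=True, temperature=0.8, repetition_penalty=1.1))
```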
You can use 🤗 Transformers tokenizers:
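A sketch pairing the GGML model with a standard 🤗 tokenizer for the same architecture (`gpt2` here is illustrative):

```py
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
```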
LangChain
It is integrated into LangChain. See LangChain docs.
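A minimal sketch following the LangChain integration (the `CTransformers` class name is from the LangChain docs; the model repo is illustrative):

```py
from langchain.llms import CTransformers

llm = CTransformers(model="marella/gpt-2-ggml")
print(llm("AI is going to"))
```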
GPU
To run some of the model layers on GPU, set the `gpu_layers` parameter:
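For example (the repo name and layer count are illustrative; layers that don't fit on the GPU stay on the CPU):

```py
llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7B-GGML", gpu_layers=50)
```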
Run in Google Colab
CUDA
Install CUDA libraries using:
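```sh
pip install ctransformers[cuda]
```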
ROCm
To enable ROCm support, install the `ctransformers` package using:
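A sketch assuming the `CT_HIPBLAS` build flag used for ROCm builds:

```sh
CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers
```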
Metal
To enable Metal support, install the `ctransformers` package using:
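Similarly, assuming the `CT_METAL` build flag:

```sh
CT_METAL=1 pip install ctransformers --no-binary ctransformers
```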
GPTQ
Note: This is an experimental feature and only LLaMA models are supported using [ExLlama](https://github.com/TheLastBen/exllama).
Install additional dependencies using:
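```sh
pip install ctransformers[gptq]
```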
Load a GPTQ model using:
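For example (the repo name is illustrative):

```py
llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7B-GPTQ")
```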
Run in Google Colab
If the model name or path doesn't contain the word `gptq`, specify `model_type="gptq"`.

It can also be used with LangChain. Low-level APIs are not fully supported.
Documentation
Find the documentation on Read the Docs.
Config
| Parameter | Type | Default |
| --- | --- | --- |
| `top_k` | `int` | `40` |
| `top_p` | `float` | `0.95` |
| `temperature` | `float` | `0.8` |
| `repetition_penalty` | `float` | `1.1` |
| `last_n_tokens` | `int` | `64` |
| `seed` | `int` | `-1` |
| `max_new_tokens` | `int` | `256` |
| `stop` | `List` | `None` |
| `stream` | `bool` | `False` |
| `reset` | `bool` | `True` |
| `batch_size` | `int` | `8` |
| `threads` | `int` | `-1` |
| `context_length` | `int` | `-1` |
| `gpu_layers` | `int` | `0` |
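A sketch showing where these parameters can be passed, under the assumption that model-level options such as `context_length` and `gpu_layers` go to `from_pretrained()` while generation options are passed per call (values below are the defaults from the table; the repo name is illustrative):

```py
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "marella/gpt-2-ggml",  # illustrative repo
    context_length=-1,     # -1 means use the model's default context length
    gpu_layers=0,          # number of layers to offload to the GPU
)
text = llm("AI is going to", max_new_tokens=256, temperature=0.8, top_k=40, top_p=0.95)
```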
Find the URL for the model card for GPTQ here.
Made with ❤️ by marella
Suggested labels
null