marella/ctransformers: Python bindings for the Transformer models implemented in C/C++ using GGML library. #499

Open
1 task
irthomasthomas opened this issue Feb 2, 2024 · 0 comments
Labels
github gh tools like cli, Actions, Issues, Pages llm Large Language Models llm-applications Topics related to practical applications of Large Language Models in various fields llm-inference-engines Software to run inference on large language models ml-inference Running and serving ML models. shell-script shell scripting in Bash, ZSH, POSIX etc technical-writing Links to deep technical writing and books

CTransformers

PyPI version
Documentation
![Build and Test](https://github.com/marella/ctransformers/actions/workflows/build.yml/badge.svg)
Code style: black

Python bindings for the Transformer models implemented in C/C++ using the GGML library. Also see ChatDocs.

Supported Models

Model Model Type CUDA Metal
GPT-2 gpt2
GPT-J, GPT4All-J gptj
GPT-NeoX, StableLM gpt_neox
Falcon falcon
LLaMA, LLaMA 2 llama
MPT mpt
StarCoder, StarChat gpt_bigcode
Dolly V2 dolly-v2
Replit replit

Installation

To install via pip, simply run:

pip install ctransformers

Usage

It provides a unified interface for all models:

from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("/path/to/ggml-model.bin", model_type="gpt2")

print(llm("AI is going to"))

To stream the output:

for text in llm("AI is going to", stream=True):
    print(text, end="", flush=True)
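When the full text is needed after streaming (for logging or post-processing), the chunks can be accumulated while they are printed. The following is a minimal sketch rather than part of the library: `stream_to_string` and `fake_llm` are hypothetical names, and `fake_llm` only stands in for a loaded model so the snippet runs without downloading weights. It assumes, as in the loop above, that calling the model with `stream=True` yields text chunks.

```python
from typing import Callable, Iterable


def stream_to_string(llm: Callable[..., Iterable[str]], prompt: str, echo: bool = True) -> str:
    """Consume a streaming call chunk by chunk and return the joined text."""
    chunks = []
    for text in llm(prompt, stream=True):
        if echo:
            print(text, end="", flush=True)  # show each chunk as it arrives
        chunks.append(text)
    if echo:
        print()  # finish the line once the stream ends
    return "".join(chunks)


def fake_llm(prompt: str, stream: bool = False):
    """Hypothetical stand-in for a loaded model; yields fixed chunks."""
    pieces = [" change", " the", " world"]
    return iter(pieces) if stream else "".join(pieces)


result = stream_to_string(fake_llm, "AI is going to")
```

With a real model, `fake_llm` would be replaced by the `llm` object returned from `AutoModelForCausalLM.from_pretrained(...)`.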

You can load models from Hugging Face Hub directly:

llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml")

If a model repo has multiple model files (.bin or .gguf files), specify a model file using:

llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml", model_file="ggml-model.bin")
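If the list of files in a repo is already known, the choice of `model_file` can be made explicit before loading. The helper below is a hypothetical sketch, not library behavior: `pick_model_file` and its error messages are made up for illustration, and it only encodes the rule stated above (one `.bin` or `.gguf` file is unambiguous, several require an explicit choice).

```python
from typing import Sequence

# File extensions the README names as model files.
MODEL_SUFFIXES = (".bin", ".gguf")


def pick_model_file(filenames: Sequence[str]) -> str:
    """Return the only model file in the list, or raise if the choice is ambiguous."""
    candidates = [name for name in filenames if name.lower().endswith(MODEL_SUFFIXES)]
    if not candidates:
        raise ValueError("no .bin or .gguf file found")
    if len(candidates) > 1:
        raise ValueError(
            f"multiple model files found, pass model_file explicitly: {sorted(candidates)}"
        )
    return candidates[0]


print(pick_model_file(["README.md", "config.json", "ggml-model.bin"]))
```

The returned name could then be passed as the `model_file` argument shown above.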

🤗 Transformers

Note: This is an experimental feature and may change in the future.

To use with 🤗 Transformers, create the model and tokenizer using:

from ctransformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml", hf=True)
tokenizer = AutoTokenizer.from_pretrained(model)

You can use 🤗 Transformers text generation pipeline:

from transformers import pipeline

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("AI is going to", max_new_tokens=256))

You can use 🤗 Transformers generation parameters:

pipe("AI is going to", max_new_tokens=256, do_sample=True, temperature=0.8, repetition_penalty=1.1)

You can use 🤗 Transformers tokenizers:

from ctransformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml", hf=True)  # Load model from GGML model repo.
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # Load tokenizer from original model repo.

LangChain

It is integrated into LangChain. See LangChain docs.

GPU

To run some of the model layers on GPU, set the gpu_layers parameter:

llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7B-GGML", gpu_layers=50)

CUDA

Install CUDA libraries using:

pip install ctransformers[cuda]

ROCm

To enable ROCm support, install the ctransformers package using:

CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers

Metal

To enable Metal support, install the ctransformers package using:

CT_METAL=1 pip install ctransformers --no-binary ctransformers
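The CUDA, ROCm and Metal install variants differ only in the package extra and in one environment variable. As a sketch for setup scripts, the mapping can be written down in one place. `install_command` is a hypothetical helper, and the backend names `"cpu"`, `"cuda"`, `"rocm"` and `"metal"` are labels chosen here; the commands and variables themselves are the ones listed above.

```python
import sys
from typing import Dict, List, Tuple


def install_command(backend: str) -> Tuple[Dict[str, str], List[str]]:
    """Return (extra environment variables, pip argv) for a given backend."""
    pip = [sys.executable, "-m", "pip", "install"]
    source_build = ["ctransformers", "--no-binary", "ctransformers"]
    if backend == "cpu":
        return {}, pip + ["ctransformers"]
    if backend == "cuda":
        return {}, pip + ["ctransformers[cuda]"]
    if backend == "rocm":
        return {"CT_HIPBLAS": "1"}, pip + source_build
    if backend == "metal":
        return {"CT_METAL": "1"}, pip + source_build
    raise ValueError(f"unknown backend: {backend!r}")


env, argv = install_command("metal")
print(env, argv[3:])
# To actually run it (not executed here):
#   import os, subprocess
#   subprocess.run(argv, env={**os.environ, **env}, check=True)
```

Passing the extra as a single list element also avoids shell globbing of the square brackets in `ctransformers[cuda]`.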

GPTQ

Note: This is an experimental feature and only LLaMA models are supported using [ExLlama](https://github.com/turboderp/exllama).

Install additional dependencies using:

pip install ctransformers[gptq]

Load a GPTQ model using:

llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7B-GPTQ")

If the model name or path doesn't contain the word gptq, specify model_type="gptq".
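The naming rule above can be sketched as a small check to run before loading a model that is known to be GPTQ-quantized. `explicit_gptq_model_type` is a hypothetical helper, not a library function. The case-insensitive match is an assumption, inferred from the fact that `TheBloke/Llama-2-7B-GPTQ` loads without an explicit `model_type`.

```python
from typing import Optional


def explicit_gptq_model_type(model_name_or_path: str) -> Optional[str]:
    """For a GPTQ model, return the model_type to pass explicitly, or None if the name suffices."""
    # Assumption: the library's name check ignores case.
    if "gptq" in model_name_or_path.lower():
        return None
    return "gptq"


print(explicit_gptq_model_type("TheBloke/Llama-2-7B-GPTQ"))
print(explicit_gptq_model_type("/models/llama-2-7b-4bit"))
```

When the helper returns `"gptq"`, that value would be passed as `model_type="gptq"` to `from_pretrained`.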

It can also be used with LangChain. Low-level APIs are not fully supported.

Documentation

Find the documentation on Read the Docs.

Config

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| `top_k` | `int` | The top-k value to use for sampling | `40` |
| `top_p` | `float` | The top-p value to use for sampling | `0.95` |
| `temperature` | `float` | The temperature to use for sampling | `0.8` |
| `repetition_penalty` | `float` | The repetition penalty to use for sampling | `1.1` |
| `last_n_tokens` | `int` | The number of last tokens to use for repetition penalty | `64` |
| `seed` | `int` | The seed value to use for sampling tokens | `-1` |
| `max_new_tokens` | `int` | The maximum number of new tokens to generate | `256` |
| `stop` | `List` | A list of sequences to stop generation when encountered | `None` |
| `stream` | `bool` | Whether to stream the generated text | `False` |
| `reset` | `bool` | Whether to reset the model state before generating text | `True` |
| `batch_size` | `int` | The batch size to use for evaluating tokens in a single prompt | `8` |
| `threads` | `int` | The number of threads to use for evaluating tokens | `-1` |
| `context_length` | `int` | The maximum context length to use | `-1` |
| `gpu_layers` | `int` | The number of layers to run on GPU | `0` |
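To keep track of which settings deviate from the defaults in the table, the values can be mirrored in a small dataclass. This is a sketch under stated assumptions: `ConfigDefaults` and `overrides` are hypothetical names, not part of ctransformers, and the defaults are copied from the table rather than read from the library.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional


@dataclass
class ConfigDefaults:
    """Mirror of the Config table; values copied from the README, not from the library."""
    top_k: int = 40
    top_p: float = 0.95
    temperature: float = 0.8
    repetition_penalty: float = 1.1
    last_n_tokens: int = 64
    seed: int = -1
    max_new_tokens: int = 256
    stop: Optional[List[str]] = None
    stream: bool = False
    reset: bool = True
    batch_size: int = 8
    threads: int = -1
    context_length: int = -1
    gpu_layers: int = 0


def overrides(cfg: ConfigDefaults) -> Dict[str, Any]:
    """Return only the fields whose values differ from the table defaults."""
    base = ConfigDefaults()
    return {
        f.name: getattr(cfg, f.name)
        for f in fields(cfg)
        if getattr(cfg, f.name) != getattr(base, f.name)
    }


cfg = ConfigDefaults(temperature=0.2, max_new_tokens=64)
print(overrides(cfg))  # {'temperature': 0.2, 'max_new_tokens': 64}
```

The resulting dictionary could then be unpacked as keyword arguments, for example `llm("AI is going to", **overrides(cfg))`, assuming the call accepts those parameter names; check the library documentation for which parameters apply at load time and which at generation time.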

Find the URL for the model card for GPTQ here.


Made with ❤️ by marella

Suggested labels

null

@irthomasthomas irthomasthomas added github gh tools like cli, Actions, Issues, Pages llm Large Language Models shell-script shell scripting in Bash, ZSH, POSIX etc technical-writing Links to deep technical writing and books ml-inference Running and serving ML models. llm-inference-engines Software to run inference on large language models llm-applications Topics related to practical applications of Large Language Models in various fields labels Feb 2, 2024