A FastAPI-based application for generating text with a large language model (LLM) using the exllamav2 backend.
This API is still in the alpha phase, so expect bugs and changes down the line. You may occasionally need to reinstall dependencies.
To get started, make sure you have the following installed on your system:
- Python 3.x (preferably 3.11) with pip
- CUDA 12.1 or 11.8

NOTE: For Flash Attention 2 to work on Windows, CUDA 12.1 must be installed!
- Clone this repository to your machine:
  git clone https://github.com/theroyallab/tabbyAPI
- Navigate to the project directory:
  cd tabbyAPI
- Create a virtual environment:
  - Create it: python -m venv venv
  - Activate it on Windows: .\venv\Scripts\activate
  - Activate it on Linux: source venv/bin/activate
- Install torch by following the official PyTorch installation instructions
- Install an exllamav2 wheel:
  - Find the version that corresponds to your CUDA and Python versions. For example, a wheel with cu121 and cp311 corresponds to CUDA 12.1 and Python 3.11.
- Install the other requirements via:
  pip install -r requirements.txt
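The wheel-naming rule from the exllamav2 step above can be sketched in a few lines of Python. This is only an illustration of the cu121/cp311 example. The function name `wheel_tags` and its parameters are hypothetical and not part of tabbyAPI or exllamav2.

```python
def wheel_tags(cuda_version: str, python_version: str) -> tuple:
    """Return the (CUDA tag, CPython tag) pair to look for in a wheel filename.

    Hypothetical helper for illustration: it only reformats version strings,
    e.g. CUDA "12.1" becomes "cu121" and Python "3.11" becomes "cp311".
    """
    # Keep only the major and minor components; patch versions are ignored.
    cuda_major, cuda_minor = cuda_version.split(".")[:2]
    py_major, py_minor = python_version.split(".")[:2]
    return f"cu{cuda_major}{cuda_minor}", f"cp{py_major}{py_minor}"


if __name__ == "__main__":
    import sys

    # Derive the CPython tag from the running interpreter; the CUDA version
    # is passed in by hand here because this sketch does not detect it.
    running_python = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(wheel_tags("12.1", running_python))
```

Running it inside the activated venv shows which tags to look for, provided the CUDA version string is replaced with the one installed on your system.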
Copy config_sample.yml to config.yml. All the fields are commented, so make sure to read the descriptions and comment out or remove fields that you don't need.
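The copy step can also be scripted. The sketch below assumes both files live in the project directory (the text does not say so explicitly), and the helper name `ensure_config` is hypothetical. It deliberately leaves an existing config.yml untouched so that local edits are not overwritten.

```python
import shutil
from pathlib import Path


def ensure_config(project_dir: str = ".") -> Path:
    """Copy config_sample.yml to config.yml unless config.yml already exists.

    Hypothetical helper; the location of the files is an assumption.
    """
    sample = Path(project_dir) / "config_sample.yml"
    target = Path(project_dir) / "config.yml"
    if not target.exists():
        # Plain file copy: the sample's comments are kept so they can be read
        # and the unneeded fields removed by hand afterwards.
        shutil.copyfile(sample, target)
    return target
```

Calling `ensure_config()` from the project directory returns the path of the config file to edit.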
- Make sure you are in the project directory and have activated the venv
- Run the tabbyAPI application:
  python main.py
Once you launch the API, the docs can be accessed at http://<your-IP>:<your-port>/docs
If you use the default YAML config, it's accessible at http://localhost:5000/docs
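As a small illustration of the URL pattern above, the docs address can be built from a host and port. The function name `docs_url` and its parameter names are hypothetical; the defaults mirror the localhost:5000 address quoted for the default YAML config.

```python
def docs_url(host: str = "localhost", port: int = 5000) -> str:
    """Build the docs URL following the http://<your-IP>:<your-port>/docs pattern."""
    return f"http://{host}:{port}/docs"


if __name__ == "__main__":
    # Prints the address for the default config.
    print(docs_url())
```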
TabbyAPI uses an API key and an admin key to authenticate a user's request. On first launch of the API, a file called api_tokens.yml will be generated with fields for the admin and API keys.
If you feel that the keys have been compromised, delete api_tokens.yml and the API will generate new keys for you.
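Deleting the token file can be scripted as below. This sketch assumes api_tokens.yml sits in the project directory, which the text does not state, and the helper name `reset_keys` is hypothetical. It only removes the file; generating the new keys is left to the API itself.

```python
from pathlib import Path


def reset_keys(project_dir: str = ".") -> bool:
    """Delete api_tokens.yml if present and report whether a file was removed.

    Hypothetical helper; the file location is an assumption.
    """
    tokens = Path(project_dir) / "api_tokens.yml"
    if tokens.exists():
        tokens.unlink()
        return True
    return False
```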
API keys and admin keys can be provided via:
- x-api-key and x-admin-key respectively
- Authorization with the Bearer prefix
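The two options above can be sketched as header dictionaries, assuming the names listed are HTTP request headers. The helper names `key_headers` and `bearer_headers` are hypothetical, and the key values used with them are placeholders rather than real keys.

```python
from typing import Dict, Optional


def key_headers(api_key: Optional[str] = None,
                admin_key: Optional[str] = None) -> Dict[str, str]:
    """Build x-api-key / x-admin-key headers for whichever keys are given."""
    headers = {}
    if api_key is not None:
        headers["x-api-key"] = api_key
    if admin_key is not None:
        headers["x-admin-key"] = admin_key
    return headers


def bearer_headers(key: str) -> Dict[str, str]:
    """Build an Authorization header that carries the key with the Bearer prefix."""
    return {"Authorization": f"Bearer {key}"}
```

Either dictionary can then be passed as the request headers of whatever HTTP client you use.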
DO NOT share your admin key unless you want someone else to load/unload a model from your system!
All routes require an API key except for the following, which require an admin key:
- /v1/model/load
- /v1/model/unload
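The rule above amounts to a lookup on the route path. The sketch below is a client-side illustration only, not how tabbyAPI implements the check; the names `ADMIN_ROUTES` and `required_key` are hypothetical.

```python
# The two routes listed above as requiring an admin key.
ADMIN_ROUTES = {"/v1/model/load", "/v1/model/unload"}


def required_key(route: str) -> str:
    """Return "admin" for the admin-only routes and "api" for every other route."""
    return "admin" if route in ADMIN_ROUTES else "api"


if __name__ == "__main__":
    # "/v1/example" is a made-up path used only to show the default case.
    for route in ("/v1/model/load", "/v1/example"):
        print(route, required_key(route))
```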
If you have issues with the project:
- Describe the issues in detail
- If you have a feature request, please indicate it as such.
If you have a Pull Request:
- Describe the pull request in detail: what you are changing and why
Creators/Developers:
- kingbri
- Splice86
- Turboderp