alpaca_lora_4bit_readme

Russian version: README-RU.md

A simple how-to for https://github.com/johnsmith0031/alpaca_lora_4bit

Created on 22.03.2023

This how-to may be updated in the future.

Everything was tested on Windows 10 22H2 under WSL. On native Linux the steps should be similar.

Pre-requisites:

Enable WSL 2. See https://learn.microsoft.com/en-US/windows/wsl/install
Install Ubuntu 22.04.2 LTS (other Ubuntu releases will probably work too)
NVIDIA GPU drivers + CUDA Toolkit 11.7 + CUDA Toolkit 11.7 for WSL-Ubuntu
Miniconda for Linux - https://docs.conda.io/en/latest/miniconda.html

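NVIDIA CUDA Toolkit fix for bitsandbytes

In WSL2, libcuda.so and libcuda.so.1 in /usr/lib/wsl/lib are regular files instead of symlinks, which breaks bitsandbytes. Create a script (or use fix_cuda.sh from this repository) that recreates the symlinks; see https://forums.developer.nvidia.com/t/wsl2-libcuda-so-and-libcuda-so-1-should-be-symlink/236301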

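#!/bin/bash
# Replace the regular-file copies of libcuda with symlinks, then refresh the linker cache
cd /usr/lib/wsl/lib || exit 1
rm -f libcuda.so libcuda.so.1
ln -s libcuda.so.1.1 libcuda.so.1
ln -s libcuda.so.1 libcuda.so
ldconfig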

Save it as fix_cuda.sh in your $HOME directory and make it executable:

chmod u+x $HOME/fix_cuda.sh
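Because the script runs at every login, a more defensive variant can help. The sketch below assumes the same /usr/lib/wsl/lib layout as the script above. It takes the library directory as an optional argument (a hypothetical extension, handy for trying it on a scratch directory), does nothing when both links are already symlinks, skips quietly when libcuda.so.1.1 is absent (for example outside WSL), and only calls ldconfig when running as root.

```shell
#!/bin/bash
# Guarded variant of fix_cuda.sh (a sketch, not the script from the forum thread).
# Optional first argument: library directory (hypothetical; defaults to the WSL driver dir).
fix_cuda_links() {
    local libdir="${1:-/usr/lib/wsl/lib}"

    # Outside WSL (or with a different driver layout) there is nothing to fix
    if [ ! -e "$libdir/libcuda.so.1.1" ]; then
        echo "fix_cuda: $libdir/libcuda.so.1.1 not found, skipping" >&2
        return 0
    fi

    # Both are already symlinks: leave the directory untouched
    if [ -L "$libdir/libcuda.so.1" ] && [ -L "$libdir/libcuda.so" ]; then
        return 0
    fi

    # Link targets are relative, so they resolve inside $libdir
    rm -f "$libdir/libcuda.so" "$libdir/libcuda.so.1"
    ln -s libcuda.so.1.1 "$libdir/libcuda.so.1"
    ln -s libcuda.so.1 "$libdir/libcuda.so"

    # Rebuilding the linker cache requires root
    if [ "$(id -u)" -eq 0 ]; then
        ldconfig
    fi
}

fix_cuda_links "$@"
```

Saved as $HOME/fix_cuda.sh and made executable, it is invoked the same way as the original (sudo $HOME/fix_cuda.sh).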

Make sudo passwordless so the script can run at login without a prompt:

sudo visudo

In the editor, change the line

%sudo   ALL=(ALL:ALL) ALL

to

%sudo   ALL=(ALL:ALL) NOPASSWD:ALL

Save the file (Ctrl+O) and exit (Ctrl+X).

To check that it works, run sudo -k (to drop cached credentials) and then sudo -ll. The second command must run without prompting for a password.
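NOPASSWD:ALL for the whole sudo group lets any process running as you become root silently, which is broader than the login fix needs. A narrower alternative, sketched under the assumption that sudoers reads /etc/sudoers.d (the Ubuntu 22.04 default), is a drop-in rule that exempts only fix_cuda.sh. The helper sudoers_rule and the file name /etc/sudoers.d/fix_cuda are hypothetical choices; the helper only prints the rule.

```shell
# Print a sudoers rule that lets one user run one script as root without a password.
# Arguments: user name, absolute script path (sudoers does not expand $HOME itself).
sudoers_rule() {
    local user="$1" script="$2"
    printf '%s ALL=(root) NOPASSWD: %s\n' "$user" "$script"
}

# Usage (hypothetical file name /etc/sudoers.d/fix_cuda):
#   sudo chown root:root "$HOME/fix_cuda.sh"   # otherwise anything running as you could edit it
#   sudoers_rule "$USER" "$HOME/fix_cuda.sh" | sudo tee /etc/sudoers.d/fix_cuda >/dev/null
#   sudo chmod 0440 /etc/sudoers.d/fix_cuda
#   sudo visudo -cf /etc/sudoers.d/fix_cuda    # syntax check before closing the terminal
```

With only this rule, sudo -ll still asks for a password; check it with sudo -k followed by sudo -n $HOME/fix_cuda.sh, which fails instead of prompting if a password would be required.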

Run the fix automatically at every login:

echo 'sudo $HOME/fix_cuda.sh' >> ~/.bashrc
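Running the echo line twice appends the command twice. A small idempotent helper (append_once is a hypothetical name) checks for an identical line first. The single quotes keep $HOME unexpanded, so .bashrc expands it at login, as in the original command.

```shell
# Append a line to a file unless an identical line is already present.
append_once() {
    local line="$1" file="$2"
    touch "$file"
    # -x: whole-line match, -F: fixed string (no regex), -q: no output
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage:
#   append_once 'sudo $HOME/fix_cuda.sh' ~/.bashrc
```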

After installing the CUDA Toolkit for WSL-Ubuntu, edit two files:

/etc/environment: append :/usr/local/cuda-11.7/bin to the end of the PATH= string
/etc/ld.so.conf.d/cuda-11-7.conf: add the line /usr/local/cuda-11.7/lib64 at the end of the file

Fortunately, unlike the symlink fix, these changes appear to be permanent.
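Both edits can be scripted. This sketch assumes /etc/environment holds a single double-quoted PATH="..." line (the Ubuntu default) and uses the CUDA 11.7 paths above as defaults; the file paths are parameters so the functions can be tried on copies first. The function names and the file name cuda_paths.sh in the usage comment are hypothetical.

```shell
# Append a directory to the quoted PATH="..." line of an environment file, once.
add_to_env_path() {
    local envfile="${1:-/etc/environment}" dir="${2:-/usr/local/cuda-11.7/bin}"
    # Already present: do nothing
    grep -qF -- "$dir" "$envfile" && return 0
    # Assumes the line looks exactly like PATH="..." (Ubuntu default format)
    sed -i "s|^PATH=\"\(.*\)\"\$|PATH=\"\1:$dir\"|" "$envfile"
}

# Add a library directory as its own line in an ld.so.conf.d file, once.
add_ld_conf_dir() {
    local conffile="${1:-/etc/ld.so.conf.d/cuda-11-7.conf}" dir="${2:-/usr/local/cuda-11.7/lib64}"
    touch "$conffile"
    grep -qxF -- "$dir" "$conffile" || printf '%s\n' "$dir" >> "$conffile"
}

# Usage, after saving this block as cuda_paths.sh (runs as root, then refreshes the linker cache):
#   sudo bash -c '. ./cuda_paths.sh && add_to_env_path && add_ld_conf_dir && ldconfig'
```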

Installation:

1. Create a new conda environment

conda update -n base conda
conda create -n <YOUR_ENV_NAME_HERE> python=3.10
# The following two lines are optional to speed up installation process of prerequisites
# More here - https://www.anaconda.com/blog/a-faster-conda-for-a-growing-community
conda install -n base conda-libmamba-solver
conda config --set solver libmamba

Activate the newly created environment:

conda activate <YOUR_ENV_NAME_HERE>

2. Install prerequisites

conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia

Try this first:

conda install -c conda-forge cudatoolkit=11.7

If that does not work for you, try this instead:

conda install -c conda-forge cudatoolkit-dev=11.7

conda install -c conda-forge ninja
conda install -c conda-forge accelerate
conda install -c conda-forge sentencepiece
# For oobabooga/text-generation-webui
conda install -c conda-forge gradio
conda install markdown
# For finetuning
conda install datasets -c conda-forge
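Before moving on, it can help to confirm that the packages import in the activated environment. A minimal sketch, with check_modules as a hypothetical helper, tries each import with python3 and reports failures. The torch one-liner in the usage comment also shows which CUDA version PyTorch was built for and whether it sees the GPU.

```shell
# Try to import each named Python module and report the ones that fail.
# Returns the number of failed imports (0 means everything imported).
check_modules() {
    local mod missing=0
    for mod in "$@"; do
        if ! python3 -c "import $mod" 2>/dev/null; then
            echo "missing: $mod" >&2
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Usage inside the activated conda environment:
#   check_modules torch accelerate sentencepiece gradio markdown datasets
#   python3 -c 'import torch; print(torch.version.cuda, torch.cuda.is_available())'
```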

3. Clone `alpaca_lora_4bit`

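git clone https://github.com/johnsmith0031/alpaca_lora_4bit
cd alpaca_lora_4bit
pip install -r requirements.txt
# Clone the WebUI into a temporary folder, then merge it into the repo's text-generation-webui folder
git clone https://github.com/oobabooga/text-generation-webui.git text-generation-webui-tmp
shopt -s dotglob   # make * also match hidden entries such as .git (but never . or ..)
mv -f text-generation-webui-tmp/* text-generation-webui/
shopt -u dotglob
rmdir text-generation-webui-tmp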

4. Get a model

GPTQv2 models:

llama-7b:

llama-13b:

llama-30b:

https://huggingface.co/Neko-Institute-of-Science/LLaMA-30B-4bit-128g

llama-65b:

https://huggingface.co/Neko-Institute-of-Science/LLaMA-65B-4bit-128g

GPTQv1 models (legacy):

llama-7b - https://huggingface.co/decapoda-research/llama-7b-hf-int4
llama-13b - https://huggingface.co/decapoda-research/llama-13b-hf-int4
llama-30b - https://huggingface.co/decapoda-research/llama-30b-hf-int4
llama-65b - https://huggingface.co/decapoda-research/llama-65b-hf-int4

# Navigate to the text-generation-webui directory:
cd text-generation-webui
# Download only the config and tokenizer files of the original model, not its full-precision weights
python download-model.py --text-only decapoda-research/llama-13b-hf
mv models/llama-13b-hf ../llama-13b-4bit
# Download the quantized weights (-O sets the output file name)
wget https://huggingface.co/decapoda-research/llama-13b-hf-int4/resolve/main/llama-13b-4bit.pt -O ../llama-13b-4bit.pt

5. Get a LoRA

A comprehensive list of LoRAs: https://github.com/tloen/alpaca-lora#resources

# Download the LoRA and move it to where custom_monkey_patch.py expects it
python download-model.py samwit/alpaca13B-lora
mv loras/alpaca13B-lora ../alpaca13b_lora
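At this point the files should sit where the unmodified custom_monkey_patch.py looks for them: ../llama-13b-4bit/ (config), ../llama-13b-4bit.pt (weights) and ../alpaca13b_lora/ (LoRA). A quick check, sketched under the assumption that the config folder contains config.json and the LoRA folder is in the usual PEFT layout with adapter_config.json; check_model_paths is a hypothetical helper.

```shell
# Check that the config, weights and LoRA paths used by custom_monkey_patch.py exist.
# Defaults are the paths from the unmodified patch, relative to text-generation-webui.
check_model_paths() {
    local config_dir="${1:-../llama-13b-4bit}"
    local model_file="${2:-../llama-13b-4bit.pt}"
    local lora_dir="${3:-../alpaca13b_lora}"
    local status=0

    [ -f "$config_dir/config.json" ] || { echo "no config.json in $config_dir" >&2; status=1; }
    # -s: exists and is non-empty (a failed wget -O download can leave an empty file)
    [ -s "$model_file" ] || { echo "missing or empty weights: $model_file" >&2; status=1; }
    [ -f "$lora_dir/adapter_config.json" ] || { echo "no adapter_config.json in $lora_dir" >&2; status=1; }
    return "$status"
}

# Usage, from the text-generation-webui directory:
#   check_model_paths && echo "all paths in place"
```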

6. Use the model for inference

Edit server.py and add this code at the top of the file:

import custom_monkey_patch # apply monkey patch
import gc

Symlink the autograd_4bit helper modules from the parent alpaca_lora_4bit directory so custom_monkey_patch can import them:

ln -s ../autograd_4bit.py ./autograd_4bit.py
ln -s ../matmul_utils_4bit.py matmul_utils_4bit.py
ln -s ../triton_utils.py triton_utils.py
ln -s ../custom_autotune.py custom_autotune.py
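ln -s succeeds even when the target does not exist, so running the commands from the wrong directory leaves dangling links that only surface later as an import error. A minimal check, assuming it is run from text-generation-webui; check_links is a hypothetical helper.

```shell
# Report paths that do not resolve to an existing file; non-zero status if any are broken.
check_links() {
    local f broken=0
    for f in "$@"; do
        # -e follows symlinks, so it fails for dangling links and missing files
        if [ ! -e "$f" ]; then
            echo "broken or missing: $f" >&2
            broken=1
        fi
    done
    return "$broken"
}

# Usage, from the text-generation-webui directory:
#   check_links autograd_4bit.py matmul_utils_4bit.py triton_utils.py custom_autotune.py
```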

Edit custom_monkey_patch.py so it can load GPTQv2 models:

Important:

groupsize must match the value used when the model was quantized. The example below uses 128. If the model was quantized without the --groupsize argument, set it to -1
LoRA modules trained for GPTQv1 models may produce garbage output

-    config_path = '../llama-13b-4bit/'
-    model_path = '../llama-13b-4bit.pt'
-    lora_path = '../alpaca13b_lora/'
+    config_path = '/path/to/model/config'
+    model_path = '/path/to/model.safetensors'
+    lora_path = '/path/to/lora'
+
+    autograd_4bit.switch_backend_to('triton')

     print("Loading {} ...".format(model_path))
     t0 = time.time()

-    model, tokenizer = load_llama_model_4bit_low_ram(config_path, model_path, groupsize=-1, is_v1_model=True)
+    model, tokenizer = load_llama_model_4bit_low_ram(config_path, model_path, groupsize=128, is_v1_model=False)
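When the model card does not state the group size, the name often does: quantized models are commonly published with a suffix such as 128g, as in the LLaMA-30B-4bit-128g links above. The hypothetical helper below turns that naming convention into a groupsize value, falling back to -1. It is only a heuristic; when in doubt, take the value from the model card.

```shell
# Guess the GPTQ groupsize from a model file or folder name such as "LLaMA-30B-4bit-128g".
# Heuristic only: prints the number before a trailing "g" (e.g. 128), otherwise -1.
guess_groupsize() {
    local name re='([0-9]+)g$'
    name="$(basename "$1")"
    # Drop a weights-file extension, if any
    name="${name%.safetensors}"
    name="${name%.pt}"
    if [[ $name =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        printf '%s\n' -1
    fi
}

# Usage:
#   guess_groupsize /path/to/model.safetensors   # value to pass as groupsize=
```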

Start the WebUI:

python server.py

s4rduk4r/alpaca_lora_4bit_readme

Folders and files

Latest commit

History

Repository files navigation

alpaca_lora_4bit_readme

Pre-requisites:

NVidia CUDA Toolkit fix for bitsandbytes

Installation:

1. Create new conda environment

2. Install prerequisites

3. Clone alpaca_lora_4bit

4. Get model

GPTQv2 models:

GPTQv1 models (legacy):

5. Get LoRA

6. Use model for inference

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

3. Clone `alpaca_lora_4bit`

Packages