Linux

These instructions are for Ubuntu x86_64 (other Linux distributions are similar, with a different package-manager command in place of apt-get).

Install:

  • First, one needs a Python 3.10 environment. We recommend using Miniconda.

    Download Miniconda for Linux. After downloading, run:

    bash ./Miniconda3-py310_23.1.0-1-Linux-x86_64.sh
    # follow license agreement and add to bash if required

    Open a new shell; you should see (base) in the prompt. Then create a new environment:

    conda create -n h2ogpt -y
    conda activate h2ogpt
    conda install python=3.10 -c conda-forge -y

    You should see (h2ogpt) in the shell prompt.

    Alternatively, on newer Ubuntu systems you can set up a Python 3.10 environment with:

    sudo apt-get update
    sudo apt-get install -y build-essential gcc python3.10-dev
    virtualenv -p python3 h2ogpt
    source h2ogpt/bin/activate
  • Test your python:

    python --version

    should say 3.10.xx and:

    python -c "import os, sys ; print('hello world')"

    should print hello world. Then clone:

    git clone https://github.com/h2oai/h2ogpt.git
    cd h2ogpt

    On some systems, pip still refers back to the system installation; in that case, use python -m pip or pip3 instead of pip, or try python3 instead of python. A quick check of which interpreter and pip are active is sketched after this list.

  • For GPU: Install the CUDA Toolkit with the ability to compile using nvcc, which is needed for some packages like llama-cpp-python, AutoGPTQ, exllama, flash attention, and TTS's use of deepspeed, by going to CUDA Toolkit, e.g. the CUDA 11.8 Toolkit. To avoid removing the original CUDA toolkit/driver you have, use the runfile (local) installer from NVIDIA's website, choose not to install the driver or overwrite the /usr/local/cuda link, install just the toolkit, and rely upon the CUDA_HOME environment variable to point to the desired CUDA version. Then do:

    export CUDA_HOME=/usr/local/cuda-11.8

    Or, if you do not plan to use packages like deepspeed in Coqui's TTS or to build other packages (i.e. you only use binaries), you can instead use the non-dev version from conda:

    conda install cudatoolkit=11.8 -c conda-forge -y
    export CUDA_HOME=$CONDA_PREFIX 

    Do not install cudatoolkit-dev, as it only goes up to CUDA 11.7, which is no longer supported.

  • Place the CUDA_HOME export into your ~/.bashrc, or set it before starting h2oGPT, so that TTS's use of deepspeed works; a sketch of setting and persisting it is shown after this list.

  • Prepare to install dependencies:

    export PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cu118"

    Choose cu118+ for A100/H100+ (a check that the matching PyTorch wheel was installed is sketched after this list). Or, for CPU only, set:

    export PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cpu"
  • Run bash docs/linux_install.sh for a full, standard document Q/A installation. To allow all packages (GPL too), run:

    GPLOK=1 bash docs/linux_install.sh

One can pick and choose which optional items to install by commenting them out in the shell script, or edit the script if any issues arise. See the script for notes about the installation.
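
As noted in the Python test step above, a quick way to confirm that the interpreter and pip actually belong to the h2ogpt environment (conda or virtualenv) rather than the system installation is, for example:

    # both paths should point inside the h2ogpt environment, not /usr/bin
    which python
    which pip
    # run pip through the same interpreter to avoid any mismatch
    python -m pip --version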
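
For the GPU route, a sketch of setting up and persisting the CUDA environment, assuming the CUDA 11.8 toolkit was installed to /usr/local/cuda-11.8 as described above (the PATH and LD_LIBRARY_PATH additions are a common convention for making nvcc and its libraries visible, not something specific to h2oGPT):

    # point at the desired toolkit and make nvcc discoverable
    export CUDA_HOME=/usr/local/cuda-11.8
    export PATH="$CUDA_HOME/bin:$PATH"
    export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"

    # verify the compiler is visible and reports the expected version
    nvcc --version

    # persist across shells so TTS's use of deepspeed finds CUDA when h2oGPT starts
    echo 'export CUDA_HOME=/usr/local/cuda-11.8' >> ~/.bashrc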
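
Once dependencies are installed (e.g. via docs/linux_install.sh), you can check that the PyTorch wheel pulled from the extra index matches the intended build; a minimal check, assuming torch is already installed:

    # prints e.g. "2.x.x+cu118 11.8 True" for a CUDA 11.8 wheel on a working GPU;
    # for the CPU wheel, the CUDA version is None and availability is False
    python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"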


Run

See the FAQ for many ways to run models. Below are some other examples.

Note that models and other caches are stored under /home/$USER/.cache/ in directories such as chroma, huggingface, selenium, torch, and weaviate.
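
To see how much disk space these caches use, one option is a standard du over the directories named above (some may not exist until first use):

    du -sh ~/.cache/huggingface ~/.cache/torch ~/.cache/chroma ~/.cache/selenium ~/.cache/weaviate 2>/dev/null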

  • Check that Torch can see CUDA:

    import torch
    print(torch.cuda.is_available())

    should print True.

  • Place all documents in user_path or upload them in the UI (see Help with UI); a minimal example of preparing user_path is sketched after this list.

    UI using a GPU with at least 24GB of memory, with streaming:

    python generate.py --base_model=h2oai/h2ogpt-4096-llama2-13b-chat --load_8bit=True  --score_model=None --langchain_mode='UserData' --user_path=user_path

    Same with a smaller model without quantization:

    python generate.py --base_model=h2oai/h2ogpt-4096-llama2-7b-chat --score_model=None --langchain_mode='UserData' --user_path=user_path

    UI using a llama.cpp LLaMa2 model:

    python generate.py --base_model='llama' --prompt_type=llama2 --score_model=None --langchain_mode='UserData' --user_path=user_path --model_path_llama=https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q6_K.gguf --max_seq_len=4096

    which works on CPU or GPU (assuming the llama-cpp-python package was compiled against CUDA or Metal).

    If using OpenAI for the LLM is acceptable, but you want documents to be parsed and embedded locally, then do:

    OPENAI_API_KEY=<key> python generate.py  --inference_server=openai_chat --base_model=gpt-3.5-turbo --score_model=None

    where <key> should be replaced by your OpenAI key, which probably starts with sk-. OpenAI is not recommended for private document question-answering, but it can be a good reference for testing purposes or when privacy is not required.
    Perhaps you want better image-captioning performance and to focus the local GPU on that; then do:

    OPENAI_API_KEY=<key> python generate.py  --inference_server=openai_chat --base_model=gpt-3.5-turbo --score_model=None --captions_model=Salesforce/blip2-flan-t5-xl

    For Azure OpenAI:

     OPENAI_API_KEY=<key> python generate.py --inference_server="openai_azure_chat:<deployment_name>:<base_url>:<api_version>" --base_model=gpt-3.5-turbo --h2ocolors=False --langchain_mode=UserData

    where <deployment_name> is required for Azure; the other entries are optional and can be filled with the string None or left empty between the : separators. Azure OpenAI is a bit safer for private access to Azure-based docs.

    Add --share=True to make the Gradio server visible via a shareable URL; see the example after this list.

    If you see an error about protobuf, try:

    pip install protobuf==3.20.0
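
A minimal sketch of preparing the user_path directory referenced above, assuming your documents are PDFs sitting in an arbitrary ~/my_docs folder (the source folder is only an example):

    # create the directory generate.py points at and copy documents into it
    mkdir -p user_path
    cp ~/my_docs/*.pdf user_path/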
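
For example, to expose the smaller-model command from above via a shareable Gradio URL, append --share=True to the same flags:

    python generate.py --base_model=h2oai/h2ogpt-4096-llama2-7b-chat --score_model=None --langchain_mode='UserData' --user_path=user_path --share=True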

See CPU and GPU for other general aspects of using h2oGPT on CPU or GPU, such as which models to try.

Google Colab