boanuge/Llama-2-Open-Source-LLM-CPU-Inference

================================================================================
Issue Solution: import langchain TypeError: issubclass() arg 1 must be a class
================================================================================
...
File "pydantic\main.py", line 198, in pydantic.main.ModelMetaclass.new
File "pydantic\fields.py", line 506, in pydantic.fields.ModelField.infer
File "pydantic\fields.py", line 436, in pydantic.fields.ModelField.init
File "pydantic\fields.py", line 552, in pydantic.fields.ModelField.prepare
File "pydantic\fields.py", line 663, in pydantic.fields.ModelField._type_analysis
File "pydantic\fields.py", line 808, in pydantic.fields.ModelField._create_sub_type
File "pydantic\fields.py", line 436, in pydantic.fields.ModelField.init
File "pydantic\fields.py", line 552, in pydantic.fields.ModelField.prepare
File "pydantic\fields.py", line 668, in pydantic.fields.ModelField.type_analysis
File "C:\ProgramData\Anaconda3\lib\typing.py", line 852, in subclasscheck
return issubclass(cls, self.origin)
TypeError: issubclass() arg 1 must be a class
PS C:\AI\ai_@_wwhss_alpha_version_orca2_13b>

First, try the following:
(base) $ pip install typing-inspect==0.8.0 typing_extensions==4.5.0

If the above command does not resolve the issue, upgrade pydantic, or pin it to a known-good release:
(base) $ pip install pydantic -U
(base) $ pip install pydantic==1.10.11
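
As a quick sanity check (a minimal sketch, not part of the repository), you can confirm which versions are actually installed in the environment before re-running the scripts:

    # Confirm the versions actually installed in this environment (Python 3.8+).
    from importlib.metadata import version

    print("pydantic:", version("pydantic"))                    # expect 1.10.11 after pinning
    print("typing_extensions:", version("typing_extensions"))  # expect 4.5.0

    import langchain  # should now import without the issubclass() TypeError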

PS C:\AI\Llama-2-Open-Source-LLM-CPU-Inference_@_github.com> conda activate base
PS C:\AI\Llama-2-Open-Source-LLM-CPU-Inference_@_github.com> python .\db_build.py
PS C:\AI\Llama-2-Open-Source-LLM-CPU-Inference_@_github.com> python .\main.py

Answer: Jesus is the Christ, the Son of God.

============================================================
Source Document 1
Source Text: Matthew 16:13 Now when Jesus had come into the parts of Caesarea Philippi, he said, questioning his disciples, Who do men say that the Son of man is? Matthew 16:14 And they said, Some say, John the Baptist; some, Elijah; and others, Jeremiah, or one of the prophets. Matthew 16:15 He says to them, But who do you say that I am? Matthew 16:16 And Simon Peter made answer and said, You are the Christ, the Son of the living God.
Document Name: data\data_5_bible_english_BBE.txt
============================================================
Source Document 2
Source Text: John 20:31 But these are recorded, so that you may have faith that Jesus is the Christ, the Son of God, and so that, having this faith you may have life in his name. John 21:1 After these things Jesus let himself be seen again by the disciples at the sea of Tiberias; and it came about in this way. John 21:2 Simon Peter, Thomas named Didymus, Nathanael of Cana in Galilee, the sons of Zebedee, and two others of his disciples were all together.
Document Name: data\data_5_bible_english_BBE.txt
============================================================
Time to retrieve response: 52.476544000000004
PS C:\AI\Llama-2-Open-Source-LLM-CPU-Inference_@_github.com>


Running Llama 2 and other Open-Source LLMs on CPU Inference Locally for Document Q&A

Clearly explained guide for running quantized open-source LLM applications on CPUs using Llama 2, C Transformers, GGML, and LangChain

Step-by-step guide on TowardsDataScience: https://towardsdatascience.com/running-llama-2-on-cpu-inference-for-document-q-a-3d636037a3d8


Context

  • Third-party commercial large language model (LLM) providers like OpenAI's GPT-4 have democratized LLM use via simple API calls.
  • However, there are instances where teams would require self-managed or private model deployment for reasons like data privacy and residency rules.
  • The proliferation of open-source LLMs has opened up a vast range of options for us, thus reducing our reliance on these third-party providers. 
  • When we host open-source LLMs locally on-premise or in the cloud, the dedicated compute capacity becomes a key issue. While GPU instances may seem the obvious choice, the costs can easily skyrocket beyond budget.
  • In this project, we will discover how to run quantized versions of open-source LLMs on local CPU inference for document question-and-answer (Q&A).


Quickstart

  • Ensure you have downloaded the GGML binary file from https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML and placed it into the models/ folder
  • To start parsing user queries into the application, launch the terminal from the project directory and run the following command: poetry run python main.py "<user query>" (a sketch of the entry point appears after this list)
  • For example, poetry run python main.py "What is the minimum guarantee payable by Adidas?"
  • Note: Omit the prepended poetry run if you are NOT using Poetry
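
For orientation, the command-line entry point follows this general shape (a simplified, self-contained sketch; the stub qa_chain stands in for the real LangChain pipeline built in src/, shown in the Tools section below):

    # Sketch of a main.py-style entry point: read the query from argv and time the response.
    import sys
    import timeit

    def qa_chain(inputs):
        # Stub standing in for the RetrievalQA chain (see the sketch in the Tools section).
        return {"result": "...", "source_documents": []}

    if __name__ == "__main__":
        query = sys.argv[1]  # e.g. "What is the minimum guarantee payable by Adidas?"

        start = timeit.default_timer()
        response = qa_chain({"query": query})
        elapsed = timeit.default_timer() - start

        print(f"Answer: {response['result']}")
        for i, doc in enumerate(response["source_documents"], start=1):
            print("=" * 60)
            print(f"Source Document {i}")
            print(f"Source Text: {doc.page_content}")
            print(f"Document Name: {doc.metadata['source']}")
        print(f"Time to retrieve response: {elapsed}")

This matches the output format of the console transcript above: the answer, the retrieved source passages, and the elapsed time.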


Tools

  • LangChain: Framework for developing applications powered by language models (the sketch after this list shows how these components fit together)
  • C Transformers: Python bindings for Transformer models implemented in C/C++ using the GGML library
  • FAISS: Open-source library for efficient similarity search and clustering of dense vectors
  • Sentence-Transformers (all-MiniLM-L6-v2): Open-source pre-trained transformer model that embeds text into a 384-dimensional dense vector space for tasks like clustering or semantic search
  • Llama-2-7B-Chat: Open-source fine-tuned Llama 2 model designed for chat dialogue. Leverages publicly available instruction datasets and over 1 million human annotations.
  • Poetry: Tool for dependency management and Python packaging
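
These components fit together roughly as follows (a minimal sketch using LangChain 0.0.x-era imports; the model file name, paths, and generation parameters are illustrative and depend on the GGML variant you downloaded):

    from langchain.llms import CTransformers
    from langchain.embeddings import HuggingFaceEmbeddings
    from langchain.vectorstores import FAISS
    from langchain.chains import RetrievalQA

    # Quantized Llama-2-7B-Chat GGML binary, served on CPU by C Transformers
    llm = CTransformers(
        model="models/llama-2-7b-chat.ggmlv3.q8_0.bin",  # file name depends on the quantization variant
        model_type="llama",
        config={"max_new_tokens": 256, "temperature": 0.01},
    )

    # all-MiniLM-L6-v2 maps each text chunk to a 384-dimensional vector
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2",
        model_kwargs={"device": "cpu"},
    )

    # Load the FAISS index built by db_build.py and wrap everything in a retrieval QA chain
    db = FAISS.load_local("vectorstore/db_faiss", embeddings)
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",  # stuff retrieved chunks directly into the prompt
        retriever=db.as_retriever(search_kwargs={"k": 2}),  # top-2 chunks, matching the two sources printed above
        return_source_documents=True,
    )

    response = qa_chain({"query": "What is the minimum guarantee payable by Adidas?"})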

Files and Content

  • /assets: Images relevant to the project
  • /config: Configuration files for LLM application
  • /data: Dataset used for this project (i.e., Manchester United FC 2022 Annual Report - 177-page PDF document)
  • /models: Binary file of GGML quantized LLM model (i.e., Llama-2-7B-Chat)
  • /src: Python code for the key components of the LLM application, namely llm.py, utils.py, and prompts.py
  • /vectorstore: FAISS vector store for documents
  • db_build.py: Python script to ingest the dataset and generate the FAISS vector store (sketched after this list)
  • main.py: Main Python script to launch the application and pass the user query via the command line
  • pyproject.toml: TOML file specifying the versions of the dependencies used (Poetry)
  • requirements.txt: List of Python dependencies (and versions)
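
A simplified sketch of the ingestion step in db_build.py (chunk sizes and paths are illustrative; the stock project ships a PDF in data/, while the console transcript above shows this fork pointing at .txt files instead, in which case the loader glob would change accordingly):

    from langchain.document_loaders import DirectoryLoader, PyPDFLoader
    from langchain.embeddings import HuggingFaceEmbeddings
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.vectorstores import FAISS

    # Load every PDF under data/ (here, the 177-page Manchester United FC 2022 annual report)
    loader = DirectoryLoader("data/", glob="*.pdf", loader_cls=PyPDFLoader)
    documents = loader.load()

    # Split the document into overlapping chunks small enough for retrieval
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = splitter.split_documents(documents)

    # Embed the chunks and persist the FAISS index for main.py to load
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2",
        model_kwargs={"device": "cpu"},
    )
    db = FAISS.from_documents(chunks, embeddings)
    db.save_local("vectorstore/db_faiss")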
