boanuge/Llama-2-Open-Source-LLM-CPU-Inference

================================================================================
Issue Solution: import langchain TypeError: issubclass() arg 1 must be a class
================================================================================
...
File "pydantic\main.py", line 198, in pydantic.main.ModelMetaclass.new
File "pydantic\fields.py", line 506, in pydantic.fields.ModelField.infer
File "pydantic\fields.py", line 436, in pydantic.fields.ModelField.init
File "pydantic\fields.py", line 552, in pydantic.fields.ModelField.prepare
File "pydantic\fields.py", line 663, in pydantic.fields.ModelField._type_analysis
File "pydantic\fields.py", line 808, in pydantic.fields.ModelField._create_sub_type
File "pydantic\fields.py", line 436, in pydantic.fields.ModelField.init
File "pydantic\fields.py", line 552, in pydantic.fields.ModelField.prepare
File "pydantic\fields.py", line 668, in pydantic.fields.ModelField.type_analysis
File "C:\ProgramData\Anaconda3\lib\typing.py", line 852, in subclasscheck
return issubclass(cls, self.origin)
TypeError: issubclass() arg 1 must be a class
PS C:\AI\ai_@_wwhss_alpha_version_orca2_13b>

First, try the following:
(base) $ pip install typing-inspect==0.8.0 typing_extensions==4.5.0

If the above command does not resolve the issue, upgrade pydantic, or pin it to a known-good release:
(base) $ pip install pydantic -U
(base) $ pip install pydantic==1.10.11
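
As a quick sanity check (a minimal sketch, not part of the repository), you can confirm which versions are actually installed in the environment before re-running the scripts:

    # Confirm the versions actually installed in this environment (Python 3.8+).
    from importlib.metadata import version

    print("pydantic:", version("pydantic"))                    # expect 1.10.11 after pinning
    print("typing_extensions:", version("typing_extensions"))  # expect 4.5.0

    import langchain  # should now import without the issubclass() TypeError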

PS C:\AI\Llama-2-Open-Source-LLM-CPU-Inference_@_github.com> conda activate base
PS C:\AI\Llama-2-Open-Source-LLM-CPU-Inference_@_github.com> python .\db_build.py
PS C:\AI\Llama-2-Open-Source-LLM-CPU-Inference_@_github.com> python .\main.py

Answer: Jesus is the Christ, the Son of God.

============================================================
Source Document 1
Source Text: Matthew 16:13 Now when Jesus had come into the parts of Caesarea Philippi, he said, questioning his disciples, Who do men say that the Son of man is? Matthew 16:14 And they said, Some say, John the Baptist; some, Elijah; and others, Jeremiah, or one of the prophets. Matthew 16:15 He says to them, But who do you say that I am? Matthew 16:16 And Simon Peter made answer and said, You are the Christ, the Son of the living God.
Document Name: data\data_5_bible_english_BBE.txt
============================================================
Source Document 2
Source Text: John 20:31 But these are recorded, so that you may have faith that Jesus is the Christ, the Son of God, and so that, having this faith you may have life in his name. John 21:1 After these things Jesus let himself be seen again by the disciples at the sea of Tiberias; and it came about in this way. John 21:2 Simon Peter, Thomas named Didymus, Nathanael of Cana in Galilee, the sons of Zebedee, and two others of his disciples were all together.
Document Name: data\data_5_bible_english_BBE.txt
============================================================
Time to retrieve response: 52.476544000000004
PS C:\AI\Llama-2-Open-Source-LLM-CPU-Inference_@_github.com>


Running Llama 2 and other Open-Source LLMs on CPU Inference Locally for Document Q&A

Clearly explained guide for running quantized open-source LLM applications on CPUs using Llama 2, C Transformers, GGML, and LangChain

Step-by-step guide on TowardsDataScience: https://towardsdatascience.com/running-llama-2-on-cpu-inference-for-document-q-a-3d636037a3d8


Context

  • Third-party commercial large language model (LLM) providers like OpenAI's GPT-4 have democratized LLM use via simple API calls.
  • However, there are instances where teams would require self-managed or private model deployment for reasons like data privacy and residency rules.
  • The proliferation of open-source LLMs has opened up a vast range of options for us, thus reducing our reliance on these third-party providers. 
  • When we host open-source LLMs locally on-premise or in the cloud, the dedicated compute capacity becomes a key issue. While GPU instances may seem the obvious choice, the costs can easily skyrocket beyond budget.
  • In this project, we will discover how to run quantized versions of open-source LLMs on local CPU inference for document question-and-answer (Q&A).


Quickstart

  • Ensure you have downloaded the GGML binary file from https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML and placed it into the models/ folder
  • To start parsing user queries into the application, launch the terminal from the project directory and run the following command: poetry run python main.py "<user query>" (a sketch of the entry point appears after this list)
  • For example, poetry run python main.py "What is the minimum guarantee payable by Adidas?"
  • Note: Omit the prepended poetry run if you are NOT using Poetry
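
For orientation, the command-line entry point follows this general shape (a simplified, self-contained sketch; the stub qa_chain stands in for the real LangChain pipeline built in src/, shown in the Tools section below):

    # Sketch of a main.py-style entry point: read the query from argv and time the response.
    import sys
    import timeit

    def qa_chain(inputs):
        # Stub standing in for the RetrievalQA chain (see the sketch in the Tools section).
        return {"result": "...", "source_documents": []}

    if __name__ == "__main__":
        query = sys.argv[1]  # e.g. "What is the minimum guarantee payable by Adidas?"

        start = timeit.default_timer()
        response = qa_chain({"query": query})
        elapsed = timeit.default_timer() - start

        print(f"Answer: {response['result']}")
        for i, doc in enumerate(response["source_documents"], start=1):
            print("=" * 60)
            print(f"Source Document {i}")
            print(f"Source Text: {doc.page_content}")
            print(f"Document Name: {doc.metadata['source']}")
        print(f"Time to retrieve response: {elapsed}")

This matches the output format of the console transcript above: the answer, the retrieved source passages, and the elapsed time.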


Tools

  • LangChain: Framework for developing applications powered by language models (the sketch after this list shows how these components fit together)
  • C Transformers: Python bindings for Transformer models implemented in C/C++ using the GGML library
  • FAISS: Open-source library for efficient similarity search and clustering of dense vectors
  • Sentence-Transformers (all-MiniLM-L6-v2): Open-source pre-trained transformer model that embeds text into a 384-dimensional dense vector space for tasks like clustering or semantic search
  • Llama-2-7B-Chat: Open-source fine-tuned Llama 2 model designed for chat dialogue. Leverages publicly available instruction datasets and over 1 million human annotations.
  • Poetry: Tool for dependency management and Python packaging
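
These components fit together roughly as follows (a minimal sketch using LangChain 0.0.x-era imports; the model file name, paths, and generation parameters are illustrative and depend on the GGML variant you downloaded):

    from langchain.llms import CTransformers
    from langchain.embeddings import HuggingFaceEmbeddings
    from langchain.vectorstores import FAISS
    from langchain.chains import RetrievalQA

    # Quantized Llama-2-7B-Chat GGML binary, served on CPU by C Transformers
    llm = CTransformers(
        model="models/llama-2-7b-chat.ggmlv3.q8_0.bin",  # file name depends on the quantization variant
        model_type="llama",
        config={"max_new_tokens": 256, "temperature": 0.01},
    )

    # all-MiniLM-L6-v2 maps each text chunk to a 384-dimensional vector
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2",
        model_kwargs={"device": "cpu"},
    )

    # Load the FAISS index built by db_build.py and wrap everything in a retrieval QA chain
    db = FAISS.load_local("vectorstore/db_faiss", embeddings)
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",  # stuff retrieved chunks directly into the prompt
        retriever=db.as_retriever(search_kwargs={"k": 2}),  # top-2 chunks, matching the two sources printed above
        return_source_documents=True,
    )

    response = qa_chain({"query": "What is the minimum guarantee payable by Adidas?"})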

Files and Content

  • /assets: Images relevant to the project
  • /config: Configuration files for LLM application
  • /data: Dataset used for this project (i.e., Manchester United FC 2022 Annual Report - 177-page PDF document)
  • /models: Binary file of GGML quantized LLM model (i.e., Llama-2-7B-Chat)
  • /src: Python code for the key components of the LLM application, namely llm.py, utils.py, and prompts.py
  • /vectorstore: FAISS vector store for documents
  • db_build.py: Python script to ingest the dataset and generate the FAISS vector store (sketched after this list)
  • main.py: Main Python script to launch the application and pass the user query via the command line
  • pyproject.toml: TOML file specifying the versions of the dependencies used (Poetry)
  • requirements.txt: List of Python dependencies (and versions)
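
A simplified sketch of the ingestion step in db_build.py (chunk sizes and paths are illustrative; the stock project ships a PDF in data/, while the console transcript above shows this fork pointing at .txt files instead, in which case the loader glob would change accordingly):

    from langchain.document_loaders import DirectoryLoader, PyPDFLoader
    from langchain.embeddings import HuggingFaceEmbeddings
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.vectorstores import FAISS

    # Load every PDF under data/ (here, the 177-page Manchester United FC 2022 annual report)
    loader = DirectoryLoader("data/", glob="*.pdf", loader_cls=PyPDFLoader)
    documents = loader.load()

    # Split the document into overlapping chunks small enough for retrieval
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = splitter.split_documents(documents)

    # Embed the chunks and persist the FAISS index for main.py to load
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2",
        model_kwargs={"device": "cpu"},
    )
    db = FAISS.from_documents(chunks, embeddings)
    db.save_local("vectorstore/db_faiss")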
