Commit

Upload assets
kennethleungty committed Jul 11, 2023
1 parent cb62ece commit a5b0fc4
Showing 2 changed files with 14 additions and 2 deletions.
16 changes: 14 additions & 2 deletions README.md
@@ -1,13 +1,25 @@
# Running Open-Source LLMs on CPU Inference for Document Q&A

### Clearly explained step-by-step guide on using C Transformers, GGML, and LangChain for running LLM Python applications on CPU instances

**Link to TowardsDataScience article**: *Coming Soon*
___
## Context

- Third-party commercial large language model (LLM) providers such as OpenAI (with GPT-4) have democratized LLM use via simple API calls.
- However, some teams need self-managed or private model deployment for reasons such as data privacy and residency rules.
- The proliferation of open-source LLMs has opened up a wide range of options, reducing our reliance on these third-party providers.
- When we host open-source LLMs on-premise or in the cloud, dedicated compute capacity becomes a key issue. GPU instances may seem the obvious choice, but their costs can quickly exceed budget.
- In this project, we will see how to run quantized versions of open-source LLMs locally on CPU for document question-and-answer (Q&A).
<br><br>
![Alt text](assets/document_qa_flowchart.png)
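
To make the memory argument for quantization concrete, here is a rough back-of-the-envelope sketch. It assumes a nominal 7-billion-parameter model (the size suggested by the "7B" in MPT-7B-Instruct) and counts weights only. The function name and the bit widths are illustrative assumptions. Real quantized formats also store per-block scale factors, so actual files are somewhat larger.

```python
def weight_memory_gb(n_params: int, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in gigabytes (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: nominal parameter count of a "7B" model.
N_PARAMS = 7_000_000_000

# Illustrative precisions; the labels are descriptive, not format names.
for label, bits in [("float32", 32), ("float16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>8}: {weight_memory_gb(N_PARAMS, bits):5.1f} GB")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts the weight footprint by a factor of four, which is what brings a model of this size within reach of ordinary CPU machines.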

___
## Tools

- LangChain:
- C Transformers: Python
- FAISS:
- Sentence-Transformers (all-MiniLM-L6-v2):
- MPT-7B-Instruct (LLM):
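
As a rough illustration of how tools like these usually fit together in document Q&A, the sketch below shows only the retrieval and prompt-building steps, using the Python standard library. It is not this repository's code. Word-count vectors stand in for Sentence-Transformers embeddings, a linear scan stands in for a FAISS index, and the final LLM call is omitted. All function names and the sample chunks are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a sentence-embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (linear scan, no index)."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble the text that would be passed to the LLM (hypothetical template)."""
    context = "\n".join(context_chunks)
    return (
        "Use the context to answer the question.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# Hypothetical document chunks, for illustration only.
chunks = [
    "Quantized models store weights in fewer bits to reduce memory use.",
    "FAISS is a library for similarity search over dense vectors.",
    "The annual report covers revenue for the fiscal year.",
]
question = "Which library performs similarity search over vectors?"
top = retrieve(question, chunks, k=1)
print(build_prompt(question, top))
```

In a real pipeline the prompt would then be sent to the locally hosted model, and the toy `embed` and `retrieve` functions would be replaced by an embedding model and a vector store.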

___
## Files
Binary file modified assets/document_qa_flowchart.png
