Welcome to the LLM & Semantic Search Playground! This repository is a hands-on guide and collection of scripts demonstrating how to interact with various Large Language Models (LLMs) and Embedding Models. It showcases the foundational concepts behind modern AI applications like RAG (Retrieval-Augmented Generation).
This project explores:
- Invoking different LLMs (Open-Source vs. Closed-Source).
- Using models via APIs (like OpenAI, Gemini) vs. running them locally.
- Generating text embeddings to capture semantic meaning.
- Performing semantic search using cosine similarity to find the most relevant documents.
Here are some screenshots showing the key scripts in action.
1. OpenAI Chat Model (`openaichatmodel.py`)
Shows a simple conversation with the GPT model via API.
2. Hugging Face Chat Model (`huggingface_chatmodel_local.py`)
Demonstrates running an open-source model locally on your machine.
3. Generating Embeddings (`embedding_openai_docs.py`)
Displays the numerical vector representations (embeddings) of text documents.
4. Semantic Search (`document_similarity.py`)
Shows the script taking a query and finding the most relevant document using cosine similarity.
This repository is structured around three key concepts of modern AI development:
- 🤖 LLM & Chat Model Invocation:
- Closed-Source (API-based): Scripts to interact with powerful models like OpenAI's GPT and Google's Gemini through their APIs.
- Open-Source (Local & API): Examples of running open-source models from platforms like Hugging Face and DeepSeek, both by downloading them locally and using their APIs.
- 🔍 Text Embedding Generation:
- Demonstrates how to convert text (documents and user queries) into numerical vectors (embeddings) using models like OpenAI's `text-embedding-ada-002` and local Hugging Face models. These embeddings capture the meaning of the text, not just the keywords.
- 💡 Semantic Search:
- A practical mini-project (`document_similarity.py`) that shows how embeddings are used. It converts a user's query into a vector and uses cosine similarity to find the most contextually relevant document from a knowledge base. This is the core mechanism behind vector databases and RAG applications.
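The concepts above can be illustrated with a minimal, self-contained sketch. The real scripts use trained OpenAI or Hugging Face embedding models; the `toy_embed` function below is a hypothetical character-frequency stand-in that only demonstrates the text-to-vector interface and the cosine-similarity retrieval step:

```python
import math

def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in embedder: a normalized character-frequency
    vector. Real embedding models produce dense vectors that capture
    meaning; this toy only demonstrates the text -> vector interface."""
    vocab = "abcdefghijklmnopqrstuvwxyz"
    counts = [text.lower().count(ch) for ch in vocab]
    total = sum(counts) or 1
    return [c / total for c in counts]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

docs = [
    "Virat Kohli is an Indian cricketer known for aggressive batting.",
    "Narendra Modi is the Prime Minister of India.",
]
query = "Narendra Modi is the Prime Minister of India."  # matches docs[1] exactly

query_vec = toy_embed(query)
scores = [cosine_similarity(query_vec, toy_embed(d)) for d in docs]
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])  # the exact-match document wins with score ~1.0
```

In the actual scripts, `toy_embed` would be replaced by a call to a real embedding model, and the document vectors would typically be computed once and stored up front — which is exactly what a vector database automates.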
- Core Framework: LangChain
- LLM/Chat Model Providers: OpenAI, Google (Gemini), Hugging Face, DeepSeek
- Embedding Models: OpenAI, Sentence-Transformers (from Hugging Face)
- Core Libraries: `langchain`, `python-dotenv`, `numpy`, `scikit-learn`
- Local Model Execution: `transformers`, `torch`
- Clone the repository:

  ```bash
  git clone https://github.com/jsonusuman351/langchainmodel.git
  cd langchainmodel
  ```
- Create and activate a virtual environment:

  ```bash
  # It is recommended to use Python 3.10 or higher
  python -m venv venv
  .\venv\Scripts\activate
  ```
- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```
- Set Up Environment Variables: To use API-based models (OpenAI, Gemini, etc.), you need to provide your API keys.
  - Create a file named `.env` in the root directory of the project.
  - Add your keys to this file like so:

    ```
    OPENAI_API_KEY="your-openai-api-key"
    GOOGLE_API_KEY="your-google-api-key"
    HF_TOKEN="your-huggingface-api-key"
    DEEPSEEK_API_KEY="your-deepseek-api-key"
    ```
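For context, `load_dotenv()` from `python-dotenv` (listed in the stack above) is what reads this file into the process environment at runtime. Conceptually it does something like the following simplified sketch; the real library also handles comments, quoting rules, and `export` prefixes, and `load_env_file` here is a hypothetical name used only for illustration:

```python
import os
import tempfile

def load_env_file(path: str) -> None:
    """Simplified sketch of python-dotenv's load_dotenv():
    parse KEY="value" lines and put them into os.environ."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip().strip('"')

# Demo with a throwaway file standing in for the project's .env
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as fh:
    fh.write('OPENAI_API_KEY="your-openai-api-key"\n')
    path = fh.name

load_env_file(path)
print(os.environ["OPENAI_API_KEY"])  # your-openai-api-key
os.unlink(path)
```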
This repository is a collection of standalone scripts. You can run each one to explore a specific concept.
These scripts show how to get responses from different models.
- OpenAI Chat Model (API):

  ```bash
  python chatModels/openaichatmodel.py
  ```
- Google Gemini Chat Model (API):

  ```bash
  python chatModels/gemini_chatmodel.py
  ```
- Hugging Face Chat Model (Local Download):
  Note: This will download the model (~1.5 GB) the first time you run it.

  ```bash
  python chatModels/huggingface_chatmodel_local.py
  ```
- DeepSeek Chat Model (API):

  ```bash
  python chatModels/deepseekchatmodel.py
  ```
These scripts demonstrate how to convert text into vector embeddings.
- Using OpenAI's API to embed documents:

  ```bash
  python EmbeddedModels/embedding_openai_docs.py
  ```
- Using a local Hugging Face model to embed text:
  Note: This will download the embedding model the first time you run it.

  ```bash
  python EmbeddedModels/embedding_hf_local.py
  ```
This script is a complete example of using embeddings for semantic search. It takes a user query, finds the most similar document from a list, and returns it.
- Run the semantic search demo:

  ```bash
  python document_similarity.py
  ```

  Example Interaction:

  ```
  (D:\Projects\langchainmodel\venv) D:\Projects\langchainmodel>python document_similarity.py
  tell me about narendra modi
  Narendra Modi is the Prime Minister of India known for his charismatic leadership and economic reforms.
  similarity score is: 0.6063302711097277
  ```
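The script itself is the authoritative version; as a hedged sketch, the core retrieval step behind a transcript like the one above typically looks like this with scikit-learn (already in this project's stack). The vectors below are hard-coded stand-ins — in `document_similarity.py` they would come from an embedding model — and the variable names are illustrative:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Stand-in vectors; in the real script these come from an embedding model.
doc_vectors = np.array([
    [0.9, 0.1, 0.0],  # e.g. a document about Narendra Modi
    [0.1, 0.8, 0.1],  # e.g. a document about cricket
])
query_vector = np.array([[0.85, 0.2, 0.05]])  # e.g. "tell me about narendra modi"

# cosine_similarity returns a (1, n_docs) matrix of scores in [-1, 1]
scores = cosine_similarity(query_vector, doc_vectors)[0]
best = int(np.argmax(scores))
print(f"best document index: {best}, similarity score is: {scores[best]}")
```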
Click to view the folder structure
```
langchainmodel/
│
├── LLMs/                              # Scripts for basic LLMs
│   └── llm_demo.py
│
├── chatModels/                        # Scripts for various chat models
│   ├── openaichatmodel.py             # (OpenAI API)
│   ├── gemini_chatmodel.py            # (Google Gemini API)
│   ├── huggingface_chatmodel_local.py # (Local open-source model)
│   └── ...
│
├── EmbeddedModels/                    # Scripts for text embedding models
│   ├── embedding_openai_docs.py       # (Using OpenAI API)
│   └── embedding_hf_local.py          # (Using local open-source model)
│
├── document_similarity.py             # Mini-project for semantic search
├── requirements.txt
├── .env                               # (create this for API keys)
└── README.md
```