This project implements a Question Answering (QA) chatbot using FastAPI as the backend framework. It integrates Pinecone for vector-based similarity search and OpenAI for language model embeddings and responses. The bot is capable of:
- Extracting and embedding text from PDFs.
- Storing embeddings in a Pinecone vector database.
- Answering user queries by retrieving relevant context from the vector database.
- Supporting CORS to allow requests from different origins.
## Tech Stack

- FastAPI: For building the REST API.
- Pinecone: Vector database for storing and retrieving embeddings.
- OpenAI API: For text embeddings and chat model (GPT-3.5-turbo).
- PyPDF2: For extracting text from PDF files.
- LangChain: For managing embeddings, text splitting, and retrieval workflows.
- Uvicorn: For running the FastAPI application.
## Features

- PDF Text Extraction: Extracts and splits text into chunks for efficient embedding.
- Vector Storage: Uses Pinecone to store and manage text embeddings.
- Contextual Question Answering: Matches user queries to relevant chunks in the database and provides answers using GPT-3.5-turbo.
- CORS Support: Configured to allow cross-origin requests, enabling integration with frontend applications.
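To illustrate the chunking step in the features above, here is a simplified splitter. This is a sketch only: the project itself uses LangChain's text splitters, which additionally try to break on natural boundaries such as paragraphs and sentences; the chunk size and overlap values here are arbitrary.

```python
def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size overlapping chunks.

    A simplified stand-in for LangChain's text splitters; the overlap keeps
    context that straddles a chunk boundary retrievable from either side.
    """
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
    return chunks
```

Each chunk is then embedded individually, so smaller chunks give more precise retrieval at the cost of more vectors to store.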
## Prerequisites

- Python 3.8+
- API keys for OpenAI and Pinecone.
- Pip for dependency management.
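The tech stack above corresponds to a `requirements.txt` roughly like the following. This is an assumption for illustration only (package names, no pinned versions); use the file shipped with the repository.

```text
fastapi
uvicorn
pinecone-client
openai
langchain
PyPDF2
python-dotenv
```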
## Installation

1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd <repository-folder>
   ```

2. Create a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Set up environment variables by creating a `.env` file in the project root:

   ```
   OPENAI_API_KEY=<paste your key here>
   PINECONE_API_KEY=<paste your key here>
   ```
## Usage

Start the FastAPI server:

```bash
uvicorn main:app --reload
```

The API will be accessible at http://127.0.0.1:8000.
## API Endpoints

### Health Check

Endpoint: `GET /`

- Description: Verifies that the server is running.
- Response:

  ```json
  { "HomePage": "HomePage" }
  ```
### Chat

Endpoint: `POST /chat`

- Description: Handles user queries and returns generated answers.
- Request Body:

  ```json
  { "query": "<Your question here>" }
  ```

- Response:

  ```json
  { "query": "<Your question>", "answer": "<Generated answer>" }
  ```
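Under the hood, `/chat` embeds the query, retrieves the most similar stored chunks, and passes them as context to GPT-3.5-turbo. The retrieval step can be sketched as follows; this is a toy illustration, with hand-written vectors standing in for OpenAI embeddings and an in-memory list standing in for the Pinecone index (all names and example texts here are invented).

```python
from math import sqrt

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k chunk texts most similar to the query vector."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy in-memory index; in the real app these vectors come from OpenAI
# embeddings and live in Pinecone.
index = [
    ("Refunds are issued within 30 days.", [1.0, 0.0, 0.0]),
    ("Shipping takes 5-7 business days.", [0.0, 1.0, 0.0]),
    ("Support is available by email.", [0.0, 0.0, 1.0]),
]
context = top_k([0.9, 0.1, 0.0], index, k=1)
```

The retrieved `context` chunks are then prepended to the user's question in the prompt sent to the chat model. To exercise the real endpoint once the server is running, a request such as `curl -X POST http://127.0.0.1:8000/chat -H "Content-Type: application/json" -d '{"query": "..."}'` should return the JSON shape shown above.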
## CORS Configuration

The backend includes CORS middleware for integration with frontend applications. Ensure the `origins` list in the `CORSMiddleware` settings includes your frontend's URL:

```python
origins = [
    "http://localhost:3000",  # Local frontend
]
```

## Known Issues

- CORS Errors: Ensure the CORS middleware configuration matches the origin your frontend requests come from.
- PDF Parsing Limitations: PyPDF2 may not extract text accurately from certain PDF formats.
This project is licensed under the MIT License. See the LICENSE file for details.
For queries, contact:

- Email: sharadnaik001@gmail.com