A GenAI-powered customer service chatbot for an e-commerce clothing company, built with LangChain, Pinecone, Groq, and the Llama 3.3 70B model. The chatbot provides product recommendations, processes orders, tracks shipments, and remembers past conversations for a seamless user experience. The full pipeline, from data collection to chatbot deployment, is automated and orchestrated with Apache Airflow, enabling scalable, production-grade MLOps workflows.
- The first step in our project was collecting real-world product data from Amazon.
- Implemented automated web scraping using Selenium to extract product information from Amazon.
- Targeted different product categories including:
- Formal Shirts for men
- Sarees for women
- Watches for men
- For each category, the following details were collected:
- Brand name
- Product name
- Rating
- Rating counts
- Selling Price
- MRP (original price)
- Offer percentage
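Of these fields, the offer percentage follows directly from the MRP and selling price. A minimal sketch of that derivation (the helper names are illustrative, not the project's actual code):

```python
def parse_price(text: str) -> float:
    """Convert a scraped price string like '₹1,399' to a number."""
    return float(text.replace("₹", "").replace(",", "").strip())

def offer_percentage(mrp: float, selling_price: float) -> float:
    """Discount as a percentage of MRP, rounded to one decimal place."""
    return round((mrp - selling_price) / mrp * 100, 1)

# Example with typical scraped values:
price = parse_price("₹1,399")
mrp = parse_price("₹1,999")
discount = offer_percentage(mrp, price)  # 30.0
```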
- Performed thorough data cleaning and preprocessing on the collected dataset
- Handled missing values in ratings, rating counts, and other relevant columns
- Applied mode imputation to replace missing values, as most of the columns are categorical
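Mode imputation of that kind can be sketched with pandas (the toy frame and column names below are illustrative, not the project's actual schema):

```python
import pandas as pd

# Toy frame mimicking the scraped data; None marks missing values.
df = pd.DataFrame({
    "brand": ["Allen Solly", "Raymond", None, "Allen Solly"],
    "rating": [4.1, None, 3.9, 4.1],
})

# Replace missing values in each column with that column's mode.
for col in df.columns:
    mode = df[col].mode()
    if not mode.empty:
        df[col] = df[col].fillna(mode.iloc[0])

print(df.isna().sum().sum())  # 0 missing values remain
```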
- Leveraged NVIDIA's embedding model "nv-embedqa-mistral-7b-v2" for vector embeddings
- Selected this model based on its top performance on the MTEB leaderboard
- Implemented embedding generation through the LangChain framework
- Transferred the generated embeddings to Pinecone, a purpose-built vector database for AI applications
- Created a Pinecone index using the Python client
- Uploaded the embeddings to the index to enable semantic search capabilities
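The embed-and-upsert step might look roughly like the sketch below. `product_row_to_text` is a hypothetical helper, and the Pinecone/NVIDIA calls are wrapped in a function that is never run at import time; it assumes the `pinecone` and `langchain-nvidia-ai-endpoints` packages plus valid API keys in the environment:

```python
def product_row_to_text(row: dict) -> str:
    """Flatten one scraped product record into text for embedding."""
    return (f"Brand: {row['brand']} | Product: {row['product_name']} | "
            f"Rating: {row['rating']} | Price: ₹{row['selling_price']}")

def build_vector_store(rows, index_name="ecommerce-products"):
    # Assumed dependencies: pinecone, langchain-nvidia-ai-endpoints.
    from pinecone import Pinecone, ServerlessSpec
    from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

    pc = Pinecone()  # reads PINECONE_API_KEY from the environment
    if index_name not in pc.list_indexes().names():
        pc.create_index(
            name=index_name,
            dimension=4096,  # must match the embedding model's output size
            metric="cosine",
            spec=ServerlessSpec(cloud="aws", region="us-east-1"),
        )
    index = pc.Index(index_name)

    embedder = NVIDIAEmbeddings(model="nvidia/nv-embedqa-mistral-7b-v2")
    texts = [product_row_to_text(r) for r in rows]
    vectors = embedder.embed_documents(texts)
    index.upsert(vectors=[
        (str(i), vec, {"text": txt})
        for i, (vec, txt) in enumerate(zip(vectors, texts))
    ])
```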
- Integrated "llama-3.3-70b-versatile" model via Groq through the LangChain framework
- Selected Groq for its significant enhancements in computational efficiency and response speed
- Developed optimized prompts with specific instructions and guidelines to maximize model performance and response quality
- Set up the Pinecone vector store as a retriever
- Created document chain after LLM and prompt configuration
- Created retrieval chain utilizing both the retriever and document chain
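Wired together with LangChain's helpers, the chain assembly might look like this sketch. It is wrapped in a function so nothing runs at import time, and assumes `langchain`, `langchain-groq`, `langchain-pinecone`, and `langchain-nvidia-ai-endpoints` are installed with the relevant API keys set; the prompt text is illustrative:

```python
def build_rag_chain(index_name="ecommerce-products"):
    # Assumed dependencies and API keys; see lead-in above.
    from langchain_groq import ChatGroq
    from langchain_pinecone import PineconeVectorStore
    from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings
    from langchain_core.prompts import ChatPromptTemplate
    from langchain.chains.combine_documents import create_stuff_documents_chain
    from langchain.chains import create_retrieval_chain

    llm = ChatGroq(model="llama-3.3-70b-versatile")
    embeddings = NVIDIAEmbeddings(model="nvidia/nv-embedqa-mistral-7b-v2")

    # Pinecone vector store exposed as a retriever.
    store = PineconeVectorStore(index_name=index_name, embedding=embeddings)
    retriever = store.as_retriever(search_kwargs={"k": 5})

    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a helpful clothing-store assistant. "
                   "Answer using only the product context below.\n\n{context}"),
        ("human", "{input}"),
    ])

    # Document chain stuffs retrieved products into the prompt;
    # retrieval chain glues the retriever and document chain together.
    doc_chain = create_stuff_documents_chain(llm, prompt)
    return create_retrieval_chain(retriever, doc_chain)

# Usage: build_rag_chain().invoke({"input": "Recommend a shirt under ₹1000"})
```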
- Finally, the fully functional RAG-based chatbot system is ready to use
- Developed a web interface using Flask framework
- Created an e-commerce website with integrated chatbot functionality using HTML and CSS
- Handled sending user messages to the chatbot and rendering its responses through JavaScript.
- The final result delivers a user experience similar to interacting with a customer service representative of a clothing company
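A minimal version of the Flask-to-JavaScript exchange could look like the following. The route name, payload shape, and `answer` placeholder are assumptions for illustration, not the project's actual `app.py`; Flask is imported inside the factory so the sketch stays inert without it:

```python
def create_app():
    # Assumes Flask is installed; mirrors a typical chat widget that
    # POSTs the user's message as JSON and renders the returned reply.
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    @app.route("/chat", methods=["POST"])
    def chat():
        user_message = request.json.get("message", "")
        reply = answer(user_message)  # the RAG chain call in the real app
        return jsonify({"reply": reply})

    return app

def answer(message: str) -> str:
    """Placeholder standing in for the retrieval chain's invoke() call."""
    return f"You asked: {message}"
```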
- Integrated Apache Airflow to orchestrate the complete data pipeline.
- Each pipeline stage is defined as a task:
- Data Collection DAG: Scrapes product data from Amazon using Selenium.
- Data Cleaning DAG: Cleans and preprocesses raw product data.
- Vector Store Builder DAG: Embeds product data and stores it in Pinecone.
- Chatbot Builder DAG: Builds and updates the chatbot using LLaMA and LangChain.
- The pipeline runs automatically on a daily schedule, so the chatbot's knowledge base is refreshed with new product data.
- Enables better automation, monitoring, and retry handling.
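In Airflow terms, the four stages above can be expressed as ordered tasks in a daily-scheduled DAG. A sketch, assuming Apache Airflow 2.x is installed; the stage callables are stand-ins for the project's component modules, and the definition is wrapped in a function so nothing runs at import time:

```python
def make_pipeline_dag():
    # Assumes apache-airflow >= 2.4 (for the `schedule` argument).
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # Placeholder stage functions; the real ones live in src/components.
    def collect(): ...
    def clean(): ...
    def build_vectors(): ...
    def build_chatbot(): ...

    with DAG(
        dag_id="ecommerce_chatbot_pipeline",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",  # refreshes the index with new product data
        catchup=False,
    ) as dag:
        t1 = PythonOperator(task_id="data_collection", python_callable=collect)
        t2 = PythonOperator(task_id="data_cleaning", python_callable=clean)
        t3 = PythonOperator(task_id="vector_store_builder", python_callable=build_vectors)
        t4 = PythonOperator(task_id="chatbot_builder", python_callable=build_chatbot)
        # Linear dependency chain; per-task retries and run history
        # are then visible in the Airflow UI.
        t1 >> t2 >> t3 >> t4
    return dag
```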
- MLOps Orchestration: Automates and monitors the entire pipeline with Apache Airflow.
- Product Recommendations: Suggests products based on user queries and budget.
- Order Processing: Handles multiple items, calculates totals, and generates order confirmations.
- Order Tracking: Provides real-time order status updates.
- Conversational Memory: Retains chat history using LangGraph for better interactions.
- Efficient Retrieval: Uses Pinecone for fast, relevant document retrieval.
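The total-calculation part of Order Processing can be illustrated with a small pure function (the cart item structure here is an assumption, not the project's actual format):

```python
def order_total(items):
    """Sum price * quantity over cart items; qty defaults to 1."""
    subtotal = sum(item["price"] * item.get("qty", 1) for item in items)
    return round(subtotal, 2)

cart = [
    {"name": "Formal Shirt", "price": 899.0, "qty": 2},
    {"name": "Analog Watch", "price": 1499.0},
]
print(order_total(cart))  # 3297.0
```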
- Python
- Flask (web interface backend)
- Apache Airflow (MLOps pipeline orchestration)
- Selenium (web scraping of Amazon product pages)
- LangChain (LLM integration & retrieval-augmented generation)
- Pinecone (Vector database for retrieval)
- Groq API (access to the Llama 3.3 70B model)
- HTML & CSS (Frontend for chatbot UI)
/📁Ecommerce-Chatbot-Project
├── /📁dags                     # DAG pipeline
│   └── pipeline.py
├── /📁artifacts                # Artifact files
│   └── data_cleaned.csv
├── /📁data                     # Data collected from Amazon
│   ├── data_shirts.csv
│   ├── data_sarees.csv
│   └── data_watches.csv
├── /📁readme_images            # Screenshots of the web app
│   ├── screenshot_1.png
│   └── screenshot_2.png
├── /📁src                      # Source files (core files of the project)
│   ├── main.py                 # Running the chatbot locally
│   ├── /📁components           # Main component files
│   │   ├── scraper.py
│   │   ├── data_colletion.py
│   │   ├── data_cleaning.py
│   │   ├── vectorstore_builder.py
│   │   └── chatbot_builder.py
│   └── /📁utils                # Utility files
│       ├── exception.py
│       ├── logger.py
│       └── chatbot_utils.py
├── /📁static                   # Static folder
│   ├── /📁css                  # CSS files
│   │   └── hp_style.css        # Home page styles
│   ├── /📁images               # Website images
│   └── /📁js                   # JavaScript files
├── /📁templates                # Templates (HTML files)
│   └── home_page.html
├── .gitignore
├── LICENCE
├── README.md
├── app.py                      # Flask backend
├── chromedriver.exe            # Chrome driver application
├── docker-compose.yml          # Airflow Docker container configuration
├── dockerfile                  # Airflow image
├── requirements.txt            # Python dependencies
└── setup.py                    # Setup
git clone https://github.com/Dhanush-Raj1/Ecommerce-Chatbot-Project.git
cd Ecommerce-Chatbot-Project
conda create -p envi python==3.9 -y
conda activate ./envi   # same command on Windows, macOS, and Linux
pip install -r requirements.txt
Create a .env file in the root directory and add:
NVIDIA_API_KEY=your_nvidia_api_key
PINECONE_API_KEY=your_pinecone_api_key
GROQ_API_KEY=your_groq_api_key
python app.py
The app will be available at: http://127.0.0.1:5000/
docker-compose up --build
Access the Airflow UI at http://localhost:8080/ and trigger the DAGs manually or set a schedule for automation.
- Chat in natural language.
- Ask any product-related questions. Some products are listed on the website; mention a product's name or other details and ask follow-up questions about it.
- Make orders.
- Ask for invoice of your order.
- Ask for recommendations, for example: "Recommend me a shirt between rupees 500 and rupees 1000."
- Support for more product categories
- Integration with payment gateways
- Connectivity between customers and customer service employees
- Advanced memory support with backend database connection
- Improved accuracy on product recommendations
- Multi-language support
💡 Have an idea? Feel free to contribute, open an issue, or submit a pull request!
This project is licensed under the MIT License. See LICENSE for details.