This chatbot is powered by the LLaMA 3 model and runs entirely on CPU using ctransformers.
It is designed to be lightweight, efficient, and responsive while providing an interactive Gradio UI for seamless conversations.
✅ Runs on CPU – No GPU required, making it accessible on standard hardware
✅ Optimized with ctransformers – Faster inference on CPUs
✅ Concise & direct responses – Avoids unnecessary small talk
✅ Interactive Gradio UI – Easy-to-use web interface
✅ Maintains chat history – Context-aware responses (see the history sketch after this list)
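To give a rough idea of how chat history enables context-aware responses: Gradio's `ChatInterface` passes the response function the prior conversation as a list of (user, assistant) pairs, which a helper like the one below can fold into a prompt. This is a minimal sketch, assuming the standard LLaMA 3 Instruct chat template; the helper name and exact formatting are illustrative, not necessarily the project's code.

```python
def build_prompt(message, history):
    """Fold prior (user, assistant) turns into a LLaMA 3 Instruct prompt.

    Illustrative sketch: assumes the standard LLaMA 3 chat template with
    <|start_header_id|> / <|eot_id|> special tokens.
    """
    prompt = "<|begin_of_text|>"
    for user_msg, bot_msg in history:
        prompt += f"<|start_header_id|>user<|end_header_id|>\n\n{user_msg}<|eot_id|>"
        prompt += f"<|start_header_id|>assistant<|end_header_id|>\n\n{bot_msg}<|eot_id|>"
    # Append the current user turn, then cue the model to answer.
    prompt += f"<|start_header_id|>user<|end_header_id|>\n\n{message}<|eot_id|>"
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt
```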
To run this chatbot locally, follow these steps:
Ensure you have Python 3.8+ installed, then run:
```bash
pip install gradio ctransformers
```

You also need the LLaMA 3 model in GGUF format. Download it from TheBloke's Hugging Face repository, then move the `.gguf` model file to your project directory.
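If you'd rather script the download, the `huggingface_hub` library (installed separately with `pip install huggingface_hub`) can fetch the file directly. The repo id and filename below are placeholders, not a specific published repository; substitute the GGUF repo and quantization you actually use.

```python
from huggingface_hub import hf_hub_download

# Placeholder repo id and filename -- replace with the GGUF repository
# and quantization level you actually want.
model_path = hf_hub_download(
    repo_id="TheBloke/Example-Llama-3-GGUF",  # hypothetical repo id
    filename="example.Q4_K_M.gguf",           # hypothetical file name
    local_dir=".",                            # save into the project directory
)
print("Model saved to:", model_path)
```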
```bash
python app.py
```
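For orientation, a minimal `app.py` along these lines wires ctransformers to a Gradio `ChatInterface`, reusing the `build_prompt` helper sketched above. The model filename and generation settings are assumptions to adapt, not the project's exact code.

```python
import gradio as gr
from ctransformers import AutoModelForCausalLM

MODEL_PATH = "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf"  # assumed local file name

# Load the quantized GGUF model on CPU via ctransformers (no GPU needed).
llm = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    model_type="llama",
    context_length=4096,
)

def respond(message, history):
    # build_prompt folds prior turns into the prompt (see the sketch above).
    prompt = build_prompt(message, history)
    return llm(prompt, max_new_tokens=256, temperature=0.7, stop=["<|eot_id|>"])

gr.ChatInterface(respond).launch()
```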
Model Used: LLaMA 3 (8B) - Quantized (Q4_K_M)
Why CPU?: This chatbot is optimized to run without a GPU, making it accessible to more users.
Optimization: Temperature, response length, and stop tokens are tuned so answers stay concise and on-topic; a sketch of these settings follows.
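As an illustration of those three knobs, ctransformers accepts generation parameters per call. The values and prompt below are examples for demonstration, not the project's exact tuned settings.

```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf",  # assumed local file name
    model_type="llama",
)

# Example settings -- illustrative, not the project's exact tuned values.
answer = llm(
    "<|start_header_id|>user<|end_header_id|>\n\nWhat is GGUF?<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n",
    temperature=0.3,      # lower temperature -> steadier, more focused output
    max_new_tokens=200,   # bounds response length so replies stay concise
    stop=["<|eot_id|>"],  # end generation at the LLaMA 3 end-of-turn token
)
print(answer)
```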
