This project integrates LangFlow as a backend API with a Streamlit frontend for a chatbot interface. It also includes RAGAS evaluation for measuring the performance of RAG (Retrieval-Augmented Generation) pipelines.
```
project/
├── api/                 # FastAPI server that connects to LangFlow
├── chatbot/             # Streamlit application
├── evaluation/          # RAGAS evaluation tools
├── data/                # Data storage for evaluation
├── .env                 # Environment variables
├── requirements.txt     # Project dependencies
└── docker-compose.yml   # Docker configuration
```
- Docker and Docker Compose
- Python 3.10+
- Clone this repository.
- Create a virtual environment and install dependencies:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  pip install -r requirements.txt
  ```

- Create a `.env` file with the following content (a sketch of how these values can be loaded follows this list):

  ```
  LANGFLOW_API_URL=http://localhost:7860
  API_PORT=8000
  DEBUG=True
  ```
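How these values are actually consumed depends on the code in `api/` and `chatbot/`. As a rough sketch only, assuming `python-dotenv` is available (it may or may not be pinned in `requirements.txt`), the variables could be read like this:

```python
# Illustrative only: read the .env values used by the API server.
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # pulls variables from .env into the process environment

LANGFLOW_API_URL = os.getenv("LANGFLOW_API_URL", "http://localhost:7860")
API_PORT = int(os.getenv("API_PORT", "8000"))
DEBUG = os.getenv("DEBUG", "False").lower() == "true"
```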
The easiest way to run the entire stack is with Docker Compose:
```bash
docker-compose up -d
```
This will start:
- LangFlow on port 7860
- The API server on port 8000
- The Streamlit UI on port 8501
To run the services individually without Docker Compose:

- Start LangFlow:

  ```bash
  docker run -p 7860:7860 logspace/langflow:latest
  ```

- Start the API server:

  ```bash
  uvicorn api.app:app --reload
  ```

- Start the Streamlit application:

  ```bash
  streamlit run chatbot/app.py
  ```

- Start the evaluation dashboard:

  ```bash
  streamlit run evaluation/metrics.py
  ```
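With the services running, a quick reachability check can confirm that everything answers on its default port (the ports below are the defaults from this README; adjust if you changed them):

```python
# Illustrative sanity check: confirm each service responds on its default port.
import requests

SERVICES = {
    "LangFlow": "http://localhost:7860",
    "API server (FastAPI docs page)": "http://localhost:8000/docs",  # FastAPI serves /docs by default
    "Streamlit UI": "http://localhost:8501",
}

for name, url in SERVICES.items():
    try:
        status = requests.get(url, timeout=5).status_code
        print(f"{name}: HTTP {status}")
    except requests.RequestException as exc:
        print(f"{name}: not reachable ({exc})")
```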
- Access LangFlow at http://localhost:7860
- Create a new flow using the drag-and-drop interface
- Set up your RAG pipeline with appropriate components:
- Document loaders
- Vector stores
- LLMs
- Chain components
- Deploy your flow (see the REST call sketch after this list)
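Once a flow is deployed, the backend calls it over LangFlow's REST API. The exact route and payload shape depend on your LangFlow version, so treat the following as an assumption-laden sketch (a 1.x-style `/api/v1/run/<flow_id>` endpoint and a placeholder flow ID), not the project's actual client code:

```python
# Illustrative only: the endpoint path, payload shape, and flow ID are assumptions.
import requests

LANGFLOW_API_URL = "http://localhost:7860"
FLOW_ID = "your-flow-id"  # placeholder: copy the ID of your deployed flow from the LangFlow UI

payload = {
    "input_value": "What does this project do?",
    "input_type": "chat",
    "output_type": "chat",
}

response = requests.post(f"{LANGFLOW_API_URL}/api/v1/run/{FLOW_ID}", json=payload, timeout=60)
response.raise_for_status()
print(response.json())
```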
- Access the Streamlit UI at http://localhost:8501
- Select your flow from the dropdown in the sidebar
- Start chatting with your LangFlow-powered application
- Access the evaluation dashboard at http://localhost:8501/evaluation
- Select the flow you want to evaluate
- Click "Run Evaluation" to test your flow with RAGAS metrics
- Review the results and optimize your flow accordingly
This project uses RAGAS to evaluate the performance of your RAG pipelines with the following metrics (a minimal evaluation sketch follows the list):
- Faithfulness: Measures how factually consistent the generated answer is with the retrieved context
- Answer Relevancy: Evaluates whether the answer addresses the question
- Context Relevancy: Assesses the quality of retrieval - how relevant the retrieved context is to the question
- Context Recall: Measures how well the retrieved context covers the information needed to answer the question
- Harmfulness: Evaluates the safety of the generated response
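If you want to run these metrics outside the dashboard, the sketch below shows a minimal RAGAS run. It assumes a 0.1.x-style `ragas` API, a hand-built dataset, and an LLM provider key in the environment (OpenAI by default); column names and metric imports have changed between RAGAS releases, so match them to the version pinned in `requirements.txt`. Harmfulness lives in a separate critique module in some releases and is omitted here.

```python
# Illustrative RAGAS run (assumes ragas 0.1.x-style imports; adjust to your pinned version).
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_recall, context_relevancy, faithfulness

samples = {
    "question": ["What does the project use for evaluation?"],
    "answer": ["It uses RAGAS to score the RAG pipeline."],
    "contexts": [["The project includes RAGAS evaluation for RAG pipelines."]],
    "ground_truth": ["RAGAS is used to evaluate the RAG pipeline."],
}

result = evaluate(
    Dataset.from_dict(samples),
    metrics=[faithfulness, answer_relevancy, context_relevancy, context_recall],
)
print(result)
```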
Edit or replace the `data/questions.json` file with your domain-specific questions and ground truth answers.
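The exact schema expected by the evaluation code lives in `evaluation/`; the snippet below only illustrates one plausible layout (a list of question/ground-truth pairs) and should be adapted to whatever `data/questions.json` currently contains:

```python
# Illustrative only: the real schema is defined by the evaluation code in evaluation/.
import json
from pathlib import Path

questions = [
    {
        "question": "What port does the API server listen on?",
        "ground_truth": "The API server listens on port 8000.",
    },
    {
        "question": "Which framework serves the chatbot UI?",
        "ground_truth": "The chatbot UI is built with Streamlit.",
    },
]

Path("data/questions.json").write_text(json.dumps(questions, indent=2), encoding="utf-8")
```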
The API is built with FastAPI, making it easy to add new endpoints:
- Open `api/app.py`
- Add new route functions using the FastAPI decorator syntax
- Implement your endpoint logic (see the sketch below)
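For example, a hypothetical health-check route could be added to `api/app.py` like this (the `app` instance name is an assumption; reuse whatever the file already defines):

```python
# Sketch of a new route in api/app.py; "app" is assumed to be the existing FastAPI instance.
from fastapi import FastAPI

app = FastAPI()  # in the real file, reuse the existing instance instead of creating a new one


@app.get("/health")
def health() -> dict:
    """Lightweight liveness probe for the API server."""
    return {"status": "ok"}
```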
The Streamlit UI can be customized:
- Edit `chatbot/app.py` to adjust the main application flow
- Modify components in the `chatbot/components/` directory
- Add new utility functions as needed (a minimal chat-loop sketch follows below)
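As a starting point for customization, the sketch below shows a bare-bones chat loop using Streamlit's chat elements. The backend URL, the `/chat` endpoint, and the payload keys are placeholders, not the project's actual API contract:

```python
# Minimal Streamlit chat loop (illustrative); the /chat endpoint and payload are assumptions.
import requests
import streamlit as st

API_URL = "http://localhost:8000"  # placeholder for the FastAPI backend

st.title("LangFlow Chatbot")

if "messages" not in st.session_state:
    st.session_state.messages = []

for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.write(message["content"])

if prompt := st.chat_input("Ask something..."):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.write(prompt)

    reply = requests.post(f"{API_URL}/chat", json={"message": prompt}, timeout=60).json()
    answer = reply.get("answer", str(reply))

    st.session_state.messages.append({"role": "assistant", "content": answer})
    with st.chat_message("assistant"):
        st.write(answer)
```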
The evaluation dashboard provides valuable insights into your RAG pipeline performance:
- Metric comparison across different flows
- Historical performance tracking
- Detailed view of evaluation results
- Identification of areas for improvement
Use these insights to iteratively improve your LangFlow pipelines:
- Identify metrics with lower scores
- Adjust relevant components in your flow
- Re-run evaluations to measure improvement
- Repeat until satisfactory performance is achieved