DeepEval is the leading open-source LLM evaluation framework with over 10,500 stars on GitHub.
Built by the team at Confident AI, DeepEval provides a comprehensive suite of 25+ research-backed metrics for evaluating Large Language Model applications.
From RAG systems to conversational AI, DeepEval offers metrics for faithfulness, answer relevancy, contextual precision, bias detection, toxicity screening, and much more - making it the go-to solution for developers who need reliable, human-like accuracy in their LLM evaluations.
This REST API wrapper brings DeepEval's powerful evaluation capabilities to any application through simple HTTP endpoints.
Whether you're building n8n AI agents, automated testing pipelines, or integrating LLM evaluation into existing systems, this wrapper provides an easy-to-deploy solution.
Prerequisites:
- A GitHub repository with this code
- A Render.com account (the free tier works)
- An OpenAI API key
Deploy on Render:
- Go to the Render Dashboard
- Click "New" → "Web Service"
- Paste in the URL of this public GitHub repository: https://github.com/theaiautomators/deepeval-wrapper
- Render auto-detects the Docker configuration
- Click "Connect"
Set Environment Variables: add the following to your Render service:
LLM Provider Keys:
- `OPENAI_API_KEY` - your OpenAI API key (required for most metrics)
- `ANTHROPIC_API_KEY` - optional, for Claude models
- `GOOGLE_API_KEY` - optional, for Gemini models
Authentication:
- `API_KEYS` - API key(s) for accessing the API. Set this to something secure.
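For reference, here is a minimal sketch of how these variables are typically consumed at startup. This is illustrative only, not the wrapper's actual code; on Render you set the values in the service's Environment tab rather than in code:

```python
# Illustrative sketch only: the environment variable names the wrapper expects.
# Set these in Render's Environment tab (or a local .env file for development).
import os

openai_key = os.environ["OPENAI_API_KEY"]            # required for most metrics
anthropic_key = os.environ.get("ANTHROPIC_API_KEY")  # optional, Claude models
google_key = os.environ.get("GOOGLE_API_KEY")        # optional, Gemini models
api_keys = os.environ["API_KEYS"]                    # key(s) clients send via X-API-Key
```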
Once deployed, your API will be available at: https://your-app.onrender.com
Visit your deployed API for interactive docs:
- Swagger UI: https://your-app.onrender.com/docs
- ReDoc: https://your-app.onrender.com/redoc
Test your deployment with a sample request:

```bash
curl -X POST "https://your-app.onrender.com/evaluate" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -d '{
    "test_case": {
      "input": "What are the benefits of renewable energy?",
      "actual_output": "I really enjoy pizza on weekends. My favorite toppings are pepperoni and mushrooms."
    },
    "metrics": [
      {
        "metric_type": "answer_relevancy",
        "threshold": 0.7
      }
    ]
  }'
```
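If you prefer calling the API from code rather than curl, here is a minimal Python sketch of the same request using the requests library. The endpoint, headers, and payload mirror the curl example above; the response is printed as raw JSON because the exact response schema may vary - check /docs on your deployment:

```python
import requests

BASE_URL = "https://your-app.onrender.com"  # your Render deployment URL
API_KEY = "YOUR_API_KEY"                    # one of the keys you set in API_KEYS

payload = {
    "test_case": {
        "input": "What are the benefits of renewable energy?",
        "actual_output": "I really enjoy pizza on weekends. My favorite toppings are pepperoni and mushrooms.",
    },
    "metrics": [
        {"metric_type": "answer_relevancy", "threshold": 0.7},
    ],
}

response = requests.post(
    f"{BASE_URL}/evaluate",
    headers={"X-API-Key": API_KEY},
    json=payload,
    timeout=120,  # LLM-based metrics can take a while to score
)
response.raise_for_status()
print(response.json())  # inspect scores and pass/fail details in the returned JSON
```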
There are two example n8n workflows in the n8n folder of the repo. You can import these into n8n to try out some of the DeepEval metrics and see how a flow might look, from triggering your agent or system through to evaluating its output.
- RAG Metrics: Faithfulness, Answer Relevancy, Contextual Precision/Recall/Relevancy
- Safety Metrics: Bias, Toxicity, Hallucination, PII Leakage
- Task Metrics: Summarization, Tool Correctness, Task Completion
- Custom: G-Eval for custom criteria
- Conversational: Turn Relevancy, Conversation Completeness
For a full list of metrics, check out https://deepeval.com/docs/metrics-introduction
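As a sketch of how a multi-metric request might look, the payload below combines a RAG metric with a safety metric. Note the assumptions: that metric_type values use snake_case names such as "faithfulness" and "toxicity", and that the test case accepts a retrieval_context field as in DeepEval's LLMTestCase - confirm the exact schema in your deployment's /docs before relying on it:

```python
# Hypothetical multi-metric payload (field names assumed from DeepEval's LLMTestCase;
# verify against your deployment's /docs). Send it with the same requests.post call
# shown in the earlier example.
payload = {
    "test_case": {
        "input": "What is the capital of France?",
        "actual_output": "The capital of France is Paris.",
        "retrieval_context": ["Paris is the capital and largest city of France."],
    },
    "metrics": [
        {"metric_type": "faithfulness", "threshold": 0.8},
        {"metric_type": "toxicity", "threshold": 0.5},
    ],
}
```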
This is an early version of the DeepEval API wrapper. While the core functionality works well, not all features of the DeepEval system have been fully tested in this wrapper format.
We're looking for experienced Python developers to help maintain and improve this project! If you'd like to contribute, please get in touch - your expertise would be greatly appreciated.
If you're interested in learning how to use this DeepEval Wrapper with your n8n AI Agents, join our community, The AI Automators.
https://www.theaiautomators.com/
Contributions make the open-source community an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This codebase is distributed under the MIT License.
