DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.
-
Updated
May 15, 2025 - Go
DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.
The DataSnack AI Agent Evaluator is a CLI tool that automates the testing of AI agents by generating test prompts, creating test documents, and evaluating basic functionality, consistency, and vulnerability to data leakage and prompt injection attacks.
Add a description, image, and links to the evaluation-framework topic page so that developers can more easily learn about it.
To associate your repository with the evaluation-framework topic, visit your repo's landing page and select "manage topics."