A high-performance, blazingly-fast evaluation platform for Large Language Models, built with enterprise-grade architecture and real-time capabilities. This platform enables systematic assessment of LLM performance through comprehensive test suites, sophisticated prompt management, and detailed analytics.
LLM Tournament addresses the critical challenge of evaluating and comparing language model performance at scale. Built with a focus on reliability and real-time processing, it provides a robust framework for managing complex evaluation workflows while maintaining high performance and data integrity.
Key technical highlights:
- Lightweight and blazingly fast: pure Go templates with no bloat, shipped as a single binary
- Real-time evaluation engine powered by WebSocket
- Horizontally scalable architecture with stateless components
- Efficient data persistence layer with JSON-based storage
- Responsive frontend built on modern web standards
- 🔑 Key Features
- 🛠️ Stack
- 🖼️ UI
- 🏃 Run
- 🛠️ Develop
- 🤝 Contribute
- 📝 TODO/Roadmap
- 🏆 Badges
- 👥 Contributors
- 📜 License
- 📞 Contact
- Real-time Evaluation Engine: WebSocket-powered instant updates for results and metrics
- Modular Test Suites: Independent prompt and model configurations for different scenarios
- Comprehensive Data Management: JSON-based storage with CSV import/export capabilities
- Full Lifecycle Control: Create, edit, delete, and reorder prompts
- Rich Content Support: Markdown formatting and multiline input
- Advanced Filtering: Search by text, filter by profile and order
- Bulk Operations: Delete multiple prompts at once
- Solution Tracking: Attach reference solutions to each prompt
- Profile Association: Tag prompts with evaluation profiles
- Performance Tracking: Pass/fail results with detailed metrics
- Real-time Analytics: Scores and pass percentages updated instantly
- Flexible Filtering: View results by model or profile
- Data Portability: Import/export results in CSV format
- Evaluation Management: Reset or refresh results as needed
- Prompt Suites: Create and switch between different prompt sets
- Model Suites: Manage different model configurations
- Profile System: Define and manage evaluation profiles
- Data Integrity: Automatic backups and version control
- Responsive UI: Modern interface optimized for all devices
- Bulk Operations: Manage multiple items simultaneously
- Template System: Reuse configurations across evaluations
- Data Migration: Easy import/export of prompts and results
- Real-time Sync: Instant updates across all connected clients
- Tech: Go, WebSockets, Go's built-in templates, HTML, CSS, JS, and a JSON file as the database.
- Assistant: Aider with
  - free/unlimited APIs: Gemini 2.0 Advanced, Gemini 2.0 Flash, Codestral 2501, and Mistral Large Latest
  - the paid deepseek-3-chat API since v1.1
Run `make run` or the release binary `./release/llm-tournament-v1.0`, then go to http://localhost:8080.
Requires a Linux environment with Python and Go installed (preferably via Homebrew).
Run `make aiderupdate`, then copy `./.aider.conf.yml.example` to `./.aider.conf.yml` and fill in your own API key.
Anyone can submit a PR and we'll discuss it there.
- Make another prompt suite for vision LLMs.
- Add model search.
- Add model reordering.
- Add a RAG and web-search agentic system under `./tools/ragweb_agent/`.
- Update the features section to cover these tools.
This project is licensed under the MIT License - see the LICENSE file for details.
For questions, suggestions, or collaboration/job inquiries, feel free to reach out at cariyaputta@gmail.com.