The official evaluation suite and dynamic data release for MixEval.
Python Multi-Process Execution Pool: a concurrent, asynchronous execution pool with custom resource constraints (memory, timeouts, affinity, CPU cores, and caching), load balancing, and profiling of external apps on NUMA architectures.
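For a sense of the general pattern such a pool implements, here is a minimal sketch using only Python's standard library; per-task memory limits, CPU affinity, NUMA placement, and caching are omitted, and none of the names below come from the project itself:

    # Minimal sketch: asynchronous multi-process execution with a per-task
    # timeout, via the standard library. The project described above adds
    # memory limits, CPU affinity/NUMA placement, caching, and profiling.
    from concurrent.futures import ProcessPoolExecutor, TimeoutError

    def work(n: int) -> int:
        # Stand-in for an external application or heavy computation.
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        with ProcessPoolExecutor(max_workers=4) as pool:
            futures = {pool.submit(work, n): n for n in (10_000, 1_000_000)}
            for fut, n in futures.items():
                try:
                    print(n, fut.result(timeout=5))  # 5-second timeout per task
                except TimeoutError:
                    print(n, "timed out")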
MLOS is a project to enable autotuning for systems.
NPBench - A Benchmarking Suite for High-Performance NumPy
A toolkit for auto-generation of OpenAI Gym environments from RDDL description files.
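As an illustration, a generated environment is driven through the standard Gym interaction loop; the environment id below is hypothetical, and the snippet assumes the classic (pre-0.26) Gym API:

    import gym

    # "GeneratedRDDLEnv-v0" is a hypothetical id; the real id depends on
    # the RDDL description file the toolkit was given.
    env = gym.make("GeneratedRDDLEnv-v0")
    obs = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()  # random policy, illustration only
        obs, reward, done, info = env.step(action)  # classic 4-tuple API
    env.close()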
The Arline Benchmarks platform lets you benchmark various quantum circuit mapping/compression algorithms against each other on a list of predefined hardware types and target circuit classes.
Benchmarking machine learning inferencing on embedded hardware.
Telco pIPeline benchmarking SYstem
Benchmarking framework for Feature Selection and Feature Ranking algorithms 🚀
Framework for benchmarking deep learning operators for Apache MXNet
Deterministic runtime for agent evaluation
A framework for benchmarking in Python
Crossbar Parasitics Simulator – A tool for benchmarking parasitic resistance models in RRAM crossbars and evaluating neural networks under realistic hardware constraints.
STELLAR: A Search-Based Testing Framework for Large Language Model Applications (SANER 2026)
PARROT (Performance Assessment of Reasoning and Responses On Trivia) is a novel benchmarking framework designed to evaluate Large Language Models (LLMs) on real-world, complex, and ambiguous QA tasks.
How To Measure And Improve Code Efficiency with Pytest Benchmark (The Ultimate Guide)
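As a taste of what that guide covers, pytest-benchmark exposes a benchmark fixture that times repeated calls and reports statistics (min/max/mean/stddev); a minimal, runnable example (the fib function here is ours, not the guide's):

    # test_fib.py -- run with: pip install pytest-benchmark && pytest
    def fib(n: int) -> int:
        # Deliberately slow recursive implementation, as a timing target.
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    def test_fib(benchmark):
        # pytest-benchmark's `benchmark` fixture calls fib(10) repeatedly
        # and collects timing statistics in the test report.
        result = benchmark(fib, 10)
        assert result == 55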
A modular research framework engineered to benchmark CNN models across multiple sign language datasets. Featuring a scalable architecture (Factory Pattern), optimized HSV-based hand segmentation, and real-time inference capabilities for edge deployment.
A lightweight benchmarking and visualization framework to analyze long-context failures in large language models (LLMs) using synthetic datasets, retrieval-augmented methods, and evaluation metrics.
🌐 Evaluate LVLMs' ability to reconstruct dynamic, interactive webpages from user interaction videos with the IWR-Bench benchmark.