Releases: vllm-project/guidellm
GuideLLM v0.3.0
Overview
A major release (in scope, though not in the semantic-versioning sense) introducing the GuideLLM web UI, containerized benchmarking, dataset preprocessing, and significant workflow improvements. It also transitions the project from the Neural Magic organization into the vLLM project ecosystem while expanding benchmarking capabilities and improving the developer experience.
To get started, install with:

```bash
pip install guidellm==0.3.0
```

Or from source with:

```bash
pip install git+https://github.com/vllm-project/guidellm.git@v0.3.0
```
What's New
- GuideLLM Web UI: Complete frontend interface with interactive charts and data visualization for benchmark results
- Dataset Preprocessing: New preprocess command to filter datasets by token distribution and save to local files or Hugging Face Hub
- Containerized Benchmarking: Docker support with configurable environment variables for streamlined deployment
- Benchmark Scenarios: Support for file-based benchmark configuration with Pydantic validation
- HTML Report Generation: Static HTML reports with embedded visualization data
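The new preprocess command filters datasets by token distribution. A minimal stand-alone sketch of that idea (not GuideLLM's actual implementation; the `tokenize` callable, field name, and bounds are placeholders):

```python
def filter_by_token_count(samples, tokenize, min_tokens=1, max_tokens=512):
    """Keep only samples whose prompt token count falls within range.

    `samples` is an iterable of dicts with a "prompt" field; `tokenize`
    is any callable mapping a string to a list of tokens.
    """
    kept = []
    for sample in samples:
        n = len(tokenize(sample["prompt"]))
        if min_tokens <= n <= max_tokens:
            kept.append(sample)
    return kept
```

With a whitespace tokenizer (`str.split`), a three-word prompt passes a `min_tokens=2, max_tokens=4` filter while one-word and five-word prompts are dropped; the filtered list can then be written locally or pushed to the Hugging Face Hub.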
What's Changed
- Project Migration: Transitioned from neuralmagic to vllm-project GitHub organization with updated links and branding
- Improved Scheduling: Unified RPS and concurrent scheduler paths for better multi-turn conversation support
- Enhanced OpenAI Backend: Added support for custom headers, SSL verification control, query parameters, and request body modifications
- Development Workflow: Streamlined CI/CD with unified test execution, pre-commit improvements, and artifact management
- Synthetic Data Generator: Added prefix caching controls and unique prompt generation
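The prefix caching controls and unique prompt generation in the synthetic data generator can be illustrated with a small sketch (a toy stand-in, not GuideLLM's generator; all names and the word-based "tokens" are illustrative):

```python
import random
import string


def make_prompts(n, prefix_tokens=0, unique=True, seed=0):
    """Generate n synthetic prompts.

    A shared prefix of `prefix_tokens` repeated words exercises
    server-side prefix caching; `unique=True` appends a distinct
    suffix so no two prompts are identical, defeating exact-match
    request caching.
    """
    rng = random.Random(seed)
    shared = " ".join("cache" for _ in range(prefix_tokens))
    prompts = []
    for i in range(n):
        body = " ".join(
            "".join(rng.choices(string.ascii_lowercase, k=5)) for _ in range(8)
        )
        prompt = f"{shared} {body}".strip()
        if unique:
            prompt += f" [{i}]"
        prompts.append(prompt)
    return prompts
```

Seeding the generator keeps runs reproducible while still letting every prompt differ, which matters when comparing benchmarks with and without prefix caching enabled on the server.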
What's Fixed
- Metric Calculation: Fixed double-counting issues in token calculations and concurrency change events
- Event Loop Errors: Resolved "Event loop Closed" errors in HTTP client connection pooling
- Token Counting: Fixed max token limits in synthetic data generator and first decode token counting
- Display Issues: Corrected metric units display and Firefox compatibility for web UI
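"Event loop is closed" errors typically arise when a pooled client object is created on one event loop and used or closed on another. The general fix is to create and close the client inside the same running loop; a minimal sketch of the pattern with a toy client (not GuideLLM's HTTP backend):

```python
import asyncio


class Client:
    """Toy stand-in for a pooled HTTP client bound to an event loop."""

    def __init__(self):
        # Must be constructed inside a running loop, which it then owns.
        self.loop = asyncio.get_running_loop()
        self.closed = False

    async def get(self, url):
        assert asyncio.get_running_loop() is self.loop, "used on wrong loop"
        return f"response for {url}"

    async def aclose(self):
        self.closed = True


async def fetch_all(urls):
    # Create and close the client within one event loop; constructing it
    # at module scope and reusing it across asyncio.run() calls is what
    # triggers "Event loop is closed" in real HTTP libraries.
    client = Client()
    try:
        return await asyncio.gather(*(client.get(u) for u in urls))
    finally:
        await client.aclose()
```

The same lifecycle discipline applies to real pooled clients: one client per event loop, closed before that loop exits.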
Compatibility Notes
- Python: 3.9–3.13
- OS: Linux and macOS
- Dependencies: Updated to latest Pydantic, locked Click to support Python 3.9
- Breaking: Removed several UI workflow components and husky pre-commit hooks
- Breaking: Updated project URLs from the neuralmagic to the vllm-project organization
New Contributors
- @chewong made their first contribution in #168
- @dagrayvid made their first contribution in #173
- @TomerG711 made their first contribution in #162
- @wangchen615 made their first contribution in #123
- @kyolebu made their first contribution in #207
- @rymc made their first contribution in #223
- @jaredoconnell made their first contribution in #185
- @natoscott made their first contribution in #231
- @kdelee made their first contribution in #230
- @Harshith-umesh made their first contribution in #240
- @tjandy98 made their first contribution in #256
- @tukwila made their first contribution in #302
Changelog
Major Features
- #169: Implement complete GuideLLM UI with interactive charts and Redux state management
- #162: Add dataset preprocessing command with HuggingFace integration
- #123: Add containerized benchmarking support with Docker configuration
- #99: Add support for benchmark scenarios with Pydantic validation
- #218: Implement HTML output generation with embedded data
Infrastructure & Workflows
- #233: Unify RPS and concurrent scheduler paths for improved performance
- #215: Complete UI build pipeline and GitHub Pages workflows
- #231: Migrate project from the neuralmagic to the vllm-project organization
- #190: Add container build jobs to all workflows
Backend Improvements
- #230: Add CLI options for custom headers and SSL verification
- #146: Allow extra query parameters for OpenAI server requests
- #184: Add remove_from_body parameter to OpenAIHTTPBackend
- #183: Add prefix caching controls to synthetic dataset generator
Bug Fixes & Quality
- #266: Fix metric accumulation errors at extreme concurrency changes
- #188: Fix "Event loop Closed" error in HTTP client pooling
- #173: Fix double counting of tokens and warmup percentage calculation
- #170: Fix max token limits in synthetic data generator
GuideLLM v0.2.1
Summary
- Bug fixes enabling Hugging Face datasets and local data files for benchmarking, which were previously crashing due to improper calls into the datasets library's load_dataset function
- Refactored CI/CD system based on the latest standards for releases
What's Changed
- Update version on main to 0.3.0 to begin work on the next release by @markurtz in #127
- Fix python versions for display in README.md by @markurtz in #128
- Fix logging by @hhy3 in #129
- Data and request fixes for real data / chat_completions pathways by @markurtz in #131
- Fix argument error in nightly unit tests by @sjmonson in #132
- Add docs for data/datasets and how to configure them in GuideLLM by @markurtz in #137
- Refactor CI/CD system based on latest standardization for upstreams by @markurtz in #135
Full Changelog: v0.2.0...v0.2.1
GuideLLM v0.2.0
Summary
- Minimal Execution Overheads
- Refactored into an async multi-process/threaded design with just 0.16% overhead in synchronous mode and 99.9% rate accuracy for constant-rate requests
- Robust Accuracy + Monitoring
- Built-in timings and diagnostics added to validate performance and catch regressions
- Flexible Benchmarking Profiles
- Prebuilt support for synchronous, concurrent (added), throughput, constant rate, Poisson rate, and sweep modes
- Unified Input/Output Formats
- JSON, YAML, CSV, and console output now standardized
- Multi-Use Data Loaders
- Native support for HuggingFace datasets, file-based data, and synthetic samples with fixes for previous flows and expanded support
- Pluggable Backends via OpenAI-Compatible APIs
- Redesigned to work out of the box with OpenAI-style HTTP servers and easily extendable to other interfaces and servers; fixed issues related to improper token lengths, among others
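The low-overhead scheduling above relies on replacing busy-wait loops with semaphore-gated dispatch (see the Semaphore change in #80 below). A minimal sketch of that pattern, assuming a hypothetical per-request coroutine factory:

```python
import asyncio


async def run_with_limit(request_factory, n_tasks, max_concurrency):
    """Dispatch n_tasks requests with at most max_concurrency in flight.

    The Semaphore suspends excess tasks instead of busy-waiting, so the
    event loop spends no CPU polling for free slots. Returns the peak
    observed concurrency as a sanity check.
    """
    sem = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0

    async def guarded(i):
        nonlocal in_flight, peak
        async with sem:  # blocks (without spinning) until a slot frees up
            in_flight += 1
            peak = max(peak, in_flight)
            await request_factory(i)
            in_flight -= 1

    await asyncio.gather(*(guarded(i) for i in range(n_tasks)))
    return peak
```

Because waiting tasks are parked by the loop rather than polling, the scheduler's own cost stays near zero even at high request rates.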
What's Changed
- Add summary metrics to saved json file by @anmarques in #46
- ADD TGI docs by @philschmid in #43
- Add missing vllm docs link by @eldarkurtic in #50
- Change default "role" from "system" to "user" by @philschmid in #53
- FIX TGI example by @philschmid in #51
- Revert Summary Metrics and Expand Test Coverage to Stabilize Nightly/Main CI by @markurtz in #58
- [Dataset]: Iterate through benchmark dataset once by @parfeniukink in #48
- Replace busy wait in async loop with a Semaphore by @sjmonson in #80
- Add backend_kwargs to generate_benchmark_report by @jackcook in #78
- Drop request count check from throughput sweep profile by @sjmonson in #89
- Rework Backend to Native HTTP Requests and Enhance API Compatibility & Performance by @markurtz in #91
- Multi Process Scheduler Implementation, Benchmarker, and Report Generation Refactor by @markurtz in #96
- Update the README by @sjmonson in #112
- Fix units for Req Latency in output to seconds by @smalleni in #113
- Fix/non integer rates by @thameem-abbas in #116
- Output support expansion, code hygiene, and tests by @markurtz in #117
- Bump min python to 3.9 by @sjmonson in #121
- v0.2.0 Version Update and Docs Expansions by @markurtz in #118
- Fix issue if async task count does not evenly divide across process pool by @sjmonson in #120
- Readme grammar updates and cleanup by @markurtz in #124
- Update CICD flows to enable automated releases and match the feature set laid out in #56 by @markurtz in #125
- CI/CD Build Fixes for Release by @markurtz in #126
New Contributors
- @anmarques made their first contribution in #46
- @philschmid made their first contribution in #43
- @eldarkurtic made their first contribution in #50
- @sjmonson made their first contribution in #80
- @jackcook made their first contribution in #78
- @smalleni made their first contribution in #113
- @thameem-abbas made their first contribution in #116
Full Changelog: v0.1.0...v0.2.0
GuideLLM v0.1.0
What's Changed
Initial release of GuideLLM with version 0.1.0! This core release adds the basic structure, infrastructure, and code for benchmarking LLM deployments across several different use cases via a CLI with terminal output. Further improvements are coming soon!
- Support added for general OpenAI backends and any text-input-based model served through those
- Support added for emulated, transformers, and file-based datasets
- Support added for general file storage of the full benchmark/evaluation that was run
- Full support for different benchmark types including sweeps, synchronous, throughput, constant, and Poisson, enabled through new scheduler and executor interfaces built on top of Python's asyncio
New Contributors
- @DaltheCow made their first contribution in #4
- @markurtz made their first contribution in #3
- @rgreenberg1 made their first contribution in #21
- @jennyyangyi-magic made their first contribution in #35
Full Changelog: https://github.com/neuralmagic/guidellm/commits/v0.1.0