Releases · vllm-project/aibrix

19 Aug 05:31

github-actions

v0.4.1

f0c65c2

v0.4.1 Latest

Automatically generated release for tag v0.4.1.

What's Changed

[Misc] KVCache bugfixes cherry-picks for v0.4.1 by @DwyaneShi in #1458
[Cherry-Pick] fix: align envoy pod template labels with controller selector by @omerap12 in #1462
Cherry picks #1409 #1412 #1425 #1436 #1429 #1427 #1442 #1441 to release-0.4 branch by @Jeffwan in #1468
KVCache integration cherry picks by @DwyaneShi in #1474
Cut release v0.4.1 against release-0.4 by @Jeffwan in #1478

Full Changelog: v0.4.0...v0.4.1

Contributors

Jeffwan, DwyaneShi, and omerap12

05 Aug 19:51

github-actions

v0.4.0

24eaefc

v0.4.0

🚀 New Features Highlights

Prefill/Decode (P/D) Disaggregation Support: Introduces StormService and RoleSet CRDs to enable fine-grained orchestration of P/D roles, along with routing to unlock disaggregated inference at scale. (#1209, #1226, #1229, #1256, #1258, #1259, #1268, #1280, #1309, #1311, #1354, #1355, #1377, #1399, #1402)
KVCache V1 Connector Optimizations: Delivers a major refactor with v1 Connector integration, CUDA kernel separation from vllm downstream, a compact memory layout, connector integration for PrisDB and InfiniStore (with TCP), tunable block sizes, RDMA auto-detection support, and several performance optimizations to boost throughput and deployment density. (#1174, #1194, #1247, #1274, #1276, #1278, #1286, #1287, #1288, #1295, #1303, #1312, #1318)
KV Event Synchronization: Introduces remote tokenizer support to ensure tokenization consistency between client and server, and implements a KV cache event synchronization system that shares KV cache state between vLLM instances and the AIBrix gateway for improved prefix caching efficiency. (#1307, #1328, #1349, #1362)
Multi-Engine Deployment Support: Adds unified regression test suites and Helm values to support heterogeneous backends including vLLM, SGLang, and Dynamo, enabling flexible model deployment across engines. (#1293, #1319, #1322, #1341, #1346)

📊 Feature Enhancements

🌐 Gateway Enhancements

SLO-aware router with profile support (#1192, #1305, #1368)
Adds custom inference port and metrics port support (#1140, #1313).
Makes the HTTPRoute timeout configurable and checks for a missing HTTPRoute before a request starts (#1212, #1344).
Adds metrics server support and a ready-to-use sample dashboard (#1211).

☁️ Control Plane Improvements

Enhance the CRD existence check and improve webhook support (#1170, #1187).
Ensure cache sync before starting controller reconcile and resync object on component restarts (#1146, #1219).
Use worker pool management for periodic metrics update (#1096)
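
The worker-pool item above can be illustrated with a minimal sketch. It assumes a bounded thread pool that fetches one metric per pod on each tick. The function name `update_metrics_once`, the `fetch` callback, and the pool size are hypothetical and are not taken from the AIBrix code base.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def update_metrics_once(
    pods: List[str],
    fetch: Callable[[str], float],
    max_workers: int = 4,  # hypothetical pool size
) -> Dict[str, float]:
    """Fetch one metric per pod using a bounded pool of worker threads."""
    results: Dict[str, float] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input pods.
        for pod, value in zip(pods, pool.map(fetch, pods)):
            results[pod] = value
    return results
```

A caller would invoke this function from a periodic timer. Bounding the pool keeps the number of concurrent scrapes fixed no matter how many pods exist.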

📦 Installation & Tooling & CI

Adds Helm chart support with standard Helm labels and probes (#1323, #1331, #1343).
Supports multi-arch (AMD, ARM) Docker builds and refactors release pipelines (#1315, #1317, #1324, #1325).
Improves the kind development workflow with port-forward support via the Makefile, allows overriding IMAGE_TAG, and disables the Docker push workflow in forked repos (#1210, #1274, #1301).

🐞 Bug Fixes

Fixes incorrect request count, out-of-index errors, and race conditions in the AIBrix router (#1246, #1262, #1305).
Fixes the prefix cache chained hashing issue and optimizes hashing to O(N) via block hashes (#1218, #1262).
Fixes completion body parsing and complex content bugs (#1145, #1160).
Fixes legacy autoscaling annotation misconfigurations (#1173).
Fixes image replacement issues in Kustomize (#1165).
Fixes e2e test flakiness with wait.PollUntilContextTimeout (#1214).
Add read lock for h.histogram (#1147)
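
The prefix cache item above mentions chained hashing over blocks. The following is a minimal sketch of the general idea, not the AIBrix implementation. It assumes fixed-size token blocks and SHA-256 from Python's `hashlib`. The names `block_hashes` and `matched_prefix_blocks` and the block size are illustrative assumptions.

```python
import hashlib
from typing import List, Sequence

BLOCK_SIZE = 4  # hypothetical block size, kept small for illustration


def block_hashes(tokens: Sequence[int], block_size: int = BLOCK_SIZE) -> List[str]:
    """Return one chained hash per full block of tokens.

    Each hash covers the previous block's hash plus the current block's
    tokens, so it identifies the whole prefix ending at that block while
    every token is read only once (O(N) overall).
    """
    hashes: List[str] = []
    prev = b""
    full = len(tokens) - len(tokens) % block_size  # skip a trailing partial block
    for start in range(0, full, block_size):
        block = tokens[start:start + block_size]
        h = hashlib.sha256()
        h.update(prev)
        h.update(",".join(str(t) for t in block).encode())
        prev = h.digest()
        hashes.append(prev.hex())
    return hashes


def matched_prefix_blocks(a: Sequence[int], b: Sequence[int]) -> int:
    """Count how many leading blocks two token sequences share."""
    count = 0
    for ha, hb in zip(block_hashes(a), block_hashes(b)):
        if ha != hb:
            break
        count += 1
    return count
```

Because each hash chains in its predecessor, two prompts share a block hash only when everything before that block also matches. This is the property a prefix cache lookup relies on.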

📚 Documentation Updates

Adds documentation for v0.4.0 features, including P/D disaggregation, multi-engine support, KVCache offloading, and SLO routing (#1279, #1285, #1341, #1356, #1368).
Fixes broken links, typos, and dashboard URLs (#1190, #1193, #1237, #1270, #1271).
Refactors component design docs into structured architecture folders (#1224, #1236, #1250).
Refactors local development and quickstart guides (#1193, #1339, #1172).
Improve installation commands and add more deployment examples (#1128, #1136, #1230, #1379, #1395)

New Contributors

@dittops made their first contribution in #1128
@yyzxw made their first contribution in #1139
@firebook made their first contribution in #1145
@windsonsea made their first contribution in #1150
@emmanuel-ferdman made their first contribution in #1161
@MondayCha made their first contribution in #1165
@learner0810 made their first contribution in #1170
@jiahuipaung made their first contribution in #1172
@didier-durand made their first contribution in #1190
@gcalmettes made their first contribution in #1193
@justadogistaken made their first contribution in #1218
@ModiCodeCraftsman made their first contribution in #1217
@haitwang-cloud made their first contribution in #1230
@ae86zhizhi made their first contribution in #1262
@nicole-lihui made their first contribution in #1270
@omerap12 made their first contribution in #1282
@li-rongzhi made their first contribution in #1285
@rudeigerc made their first contribution in #1301
@Yaegaki1Erika made their first contribution in #1313
@elizabetht made their first contribution in #1339
@autopear made their first contribution in #1362
@Epsilon314 made their first contribution in #1402

What's Changed

Full Changelog: v0.3.0...v0.4.0

[Docs] Update typo for installation command by @dittops in #1128
[Docs] fix: update example yaml ai runtime tag to v0.3.0 by @yyzxw in #1139
[Bug]: fix: README.md docs install error by @googs1025 in #1136
[Bug] fix: error when parse stop param in completion body by @firebook in #1145
Resync model adapters on gateway restart by @dittops in #1146
[Docs]add management user link by @yyzxw in #1141
[Bug] Add read lock for h.histogram by @runzhen in #1147
[Doc] Improve samples/volcano-engine/README.md by @windsonsea in #1150
feature: use worker pool management for periodic metrics update by @googs1025 in #1096
[Misc] [gpu_optimizer] add namespace info for log by @googs1025 in #1149
Add support for custom inference engine port by @varungup90 in #1140
Modernize logger interface by @emmanuel-ferdman in #1161
Add unit test code coverage by @varungup90 in #1156
feat: simplify router interface by @Xunzhuo in #1163
[Bug] Fix image replacements in Kustomize files to support installation by @MondayCha in #1165
[Bug]: fix(aibrix kvcache): ObjectPool by @googs1025 in #1162
[Bug]: Fix legacy misconfigurations of autoscaling annotations. by @zhangjyr in #1173
Enhance the CRD existence check by @learner0810 in #1170
[Misc]: add unit test for aibrix metrics collector by @googs1025 in #1153
[Docs] Add vllm-cpu local deployment guide to Quickstart by @jiahuipaung in #1172
fix: gateway benchmark info by @Xunzhuo in #1180
[Bug] fix: error when parse complex content in completion body by @firebook in #1160
Add race condition check in unit-test CI and add test-coverage cmd in Makefile by @varungup90 in #1169
Supporting Mooncake Traces in Workload Generator by @happyandslow in #1182
Recover Client Implementation by @happyandslow in #1191
Docs: fixing various text issues by @didier-durand in #1190
[Docs] update development instructions to new make commands by @gcalmettes in #1193
feat: make preble configurable and rename by @Xunzhuo in #1189
[Misc] Add deepseek-r1 tp8 pp2 example by @Jeffwan in #1195
[Misc] Update the latest news in README.md by @Jeffwan in #1196
[Feature] Add RDMA auto-detection for kvcache by @DwyaneShi in #1194
[Tooling]: port-forward support, Makefile changes for easier dev workflow in kind by @Venkat2811 in #1210
Multiple fixes and adding workload merging tool by @happyandslow in #1213
Add configurable httproute timeout by @varungup90 in #1212
[Bug] fix(e2e flaky): replace for-loop with wait.PollUntilContextTimeout in validateAllPodsAreReady by @googs1025 in #1214
[Misc] chore: use constant var for gpu_busy_ti...

Contributors

zhangjyr, autopear, and 31 other contributors

21 May 21:42

github-actions

v0.3.0

ecc3529

v0.3.0

Automatically generated release for tag v0.3.0.

🚀 New Features Highlights

AIBrix KVCache Offloading Framework: Introduces a pluggable multi-tier KVCache architecture with support for DRAM and remote backends, enabling efficient offloading of KV states to reduce GPU memory pressure and increase deployment density. (#1057, #1061, #1062, #1063, #1064, #1068, #1069, #1080, #1107)
New KVCache orchestration API: Refactors the orchestration layer to support distributed hashing based caching solutions. (#971, #984, #985, #1037, #1055, #1071, #1114)
Prefix Cache and Load-Aware Routing: Uses hash token-based prefix matching and load awareness to reduce latency by increasing prefix cache hit rate and routing efficiency. (#838, #774, #933, #1067)
Preble Routing (ICLR’25): An implementation of Preble that balances KV cache reuse and GPU load by comparing prefix lengths and computing prompt-aware cost scores for optimal routing. (#678, #719, #730, #1024)
Fairness-oriented Routing (OSDI’24 VTC): Introduces the vtc-basic router with Windowed Adaptive Fairness Routing, which dynamically tracks token usage and ensures fair load distribution across pods. (#964, #1011, #1065)
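
As a rough illustration of the prefix cache and load-aware routing idea listed above, the sketch below picks a pod by longest cached prefix and skips pods that are much busier than the least-loaded one. The `Pod` fields, the `max_load_gap` threshold, and the tie-breaking rule are assumptions made for this example. They do not describe the actual AIBrix router.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Pod:
    name: str
    running_requests: int = 0
    # Hypothetical view of the prefix blocks this pod has cached.
    cached_blocks: Set[str] = field(default_factory=set)


def prefix_hit_blocks(pod: Pod, request_blocks: List[str]) -> int:
    """Count leading request blocks already present in the pod's cache."""
    hits = 0
    for block in request_blocks:
        if block not in pod.cached_blocks:
            break
        hits += 1
    return hits


def select_pod(pods: List[Pod], request_blocks: List[str], max_load_gap: int = 8) -> Pod:
    """Prefer the longest cached prefix unless that pod is overloaded.

    max_load_gap is an illustrative threshold: a pod more than this many
    requests busier than the least-loaded pod is not considered.
    """
    if not pods:
        raise ValueError("no pods available")
    min_load = min(p.running_requests for p in pods)
    eligible = [p for p in pods if p.running_requests - min_load <= max_load_gap]
    # Longest prefix hit wins; lower load breaks ties.
    return max(
        eligible,
        key=lambda p: (prefix_hit_blocks(p, request_blocks), -p.running_requests),
    )
```

The load gap check is what keeps a popular prefix from pinning all traffic to one pod. Without it, cache affinity alone would decide every request.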

📊 Feature Enhancements

Gateway Enhancements

Support for OpenAI-compatible APIs, including streaming responses, usage reporting, asynchronous handling, and standardized error responses for seamless end-to-end integration. (#703, #788, #799)
Introduced the /v1/models endpoint for compatibility with OpenAI-style API clients. (#802)
Refactored gateway-plugins with an extensible ext-proc server architecture, laying the foundation for pluggable policies. (#810)
Improved concurrency safety and routing stability through major cache and router redesigns (#878, #884)

Control Plane:

Added Kubernetes webhook validation for CRDs, providing early error feedback during resource creation (#748, #786).
Improve RayClusterFleet to fully support Deepseek-r1/v3 models (#789, #826, #835, #914, #954).
Add scale subresource in RayClusterFleet CRD and enable HPA support (#1082, #1109)

Installation Experiences:

Introduced Terraform modules for GCP and Kubernetes deployment (#823).
Added setup guides for Minikube on Lambda Cloud and AWS in the documentation (#1020).
Enabled standalone controller installation for simplified system bootstrapping. (#930, #931)
Streamlined upgrade workflows by introducing kubectl apply support. CRDs are now split and applied with --server-side, avoiding annotation size limits and enabling smooth incremental updates. (#793)
Enabled container image publishing to GitHub Container Registry (GHCR) (#1041).
Supports ARM container images (#1090).

Observability & Stability:

Shipped prebuilt Grafana dashboards covering control plane, gateway, and KV cache components for out-of-the-box observability. (#1048)
Tuned Envoy proxy memory and buffer configurations for better performance under high concurrency (#825, #967).
Added graceful shutdown, liveness, and readiness probes to improve service resilience (#962).
Delivered production-ready monitoring setups for all major system components (#1048).

New Contributors

@gaocegege made their first contribution in #731
@eltociear made their first contribution in #736
@terrytangyuan made their first contribution in #746
@jolfr made their first contribution in #744
@Abirdcfly made their first contribution in #763
@pierDipi made their first contribution in #764
@Xunzhuo made their first contribution in #810
@zjd0112 made their first contribution in #849
@SongGuyang made their first contribution in #850
@vaaandark made their first contribution in #856
@vie-serendipity made their first contribution in #860
@nurali-techie made their first contribution in #867
@legendtkl made their first contribution in #870
@ronaldosaheki made their first contribution in #886
@nadongjun made their first contribution in #890
@cr7258 made their first contribution in #893
@thomasjpfan made their first contribution in #883
@runzhen made their first contribution in #896
@my-git9 made their first contribution in #895
@googs1025 made their first contribution in #908
@Iceber made their first contribution in #926
@ModiIntel made their first contribution in #954
@Venkat2811 made their first contribution in #964
@SuperMohit made their first contribution in #992
@weapons97 made their first contribution in #990
@zhixian82 made their first contribution in #1082

What's Changed

Full Changelog: v0.2.0...v0.3.0

[Docs] fix format of the dist kv cache doc by @DwyaneShi in #714
complete the 'make generate' command by @kerthcet in #711
Update organization reference in code base by @Jeffwan in #717
[Misc] Update the documentation link by @Jeffwan in #720
Initial implementation of radix tree-based cache by @gangmuk in #678
Add model adapter e2e tests by @varungup90 in #701
Add vllm cpu alternative for local development by @varungup90 in #721
Add white paper file by @Jeffwan in #724
Adding streaming client for AIbrix experiments by @happyandslow in #676
[Docs] Update Readme with new links and blog post, and update white paper by @xieus in #725
Recording failed requests in benchmark client by @gangmuk in #727
Process response headers in gateway by @varungup90 in #703
[misc] Fix white paper link by @Jeffwan in #728
Prefix and load aware routing with radix tree kv cache by @gangmuk in #719
Fix slack link in README.md by @Jeffwan in #729
[readme] Fix wrong link by @gaocegege in #731
[Misc] update scheduler.py by @eltociear in #736
Improve thread safety for TreeNode data structure and refactor related codes by @gangmuk in #730
Fix CacheSpec api scheme by @kerthcet in #740
docs: Fix link to license by @terrytangyuan in #746
Use native codegen cmd generating client-go by @kerthcet in #741
[Docs]: Fixed kubectl commands for install of components by @jolfr in #744
[fix] fixing bug in using AsyncOpenAI client (header setting, token counting, etc) by @gangmuk in #738
Add webhook framework by @kerthcet in #748
Use random seed for xxhash by @varungup90 in #752
Create SECURITY.md to enable security policy by @xieus in #756
[CI] Add integration test by @kerthcet in #759
[Bug] fix: correct non-inherited context by @Abirdcfly in #763
[Misc] Parametrize Makefile for mocked vLLM apps by @pierDipi in #764
Support benchmarking script by using real application trace by @nwangfw in #737
Maintaining common benchmarks utils in a separate dir by @gangmuk in #770
Ignore worker pods for gateway routing by @varungup90 in #776
Disable ENABLE_PROBES_INJECTION in correct way by @Jeffwan in #779
Make stream include usage as optional by @varungup90 in #788
Append ray head label selector in PodAutoscaler by @Jeffwan in #789
Remove redundant install crds in makefile by @varungup90 in #792
Update request message processing for /v1/completion input by @varungup90 in #794
Added target...

Contributors

zhangjyr, ronaldosaheki, and 33 other contributors

21 May 04:13

github-actions

v0.3.0-rc.2

c3bb240

v0.3.0-rc.2 Pre-release

Automatically generated release for tag v0.3.0-rc.2.

What's Changed

[Bug] fix: condition nil panic in FindStatusCondition func by @googs1025 in #1078
Refactor request body processing and add multi-turn conversation support by @varungup90 in #1067
Upload arm build images with git.ref_name by @varungup90 in #1090
Update documentation and add openai sdk samples by @varungup90 in #1092
Rename preble based prefix routing strategy by @varungup90 in #1104
Add v0.3.0 ps performance regression test scenario by @Jeffwan in #1099
Migrating benchmark entrypoints to python client by @happyandslow in #1066
[Misc] Add demo manifests for volcano engine by @Jeffwan in #1105
[Integration] KVCache: update vLLM integration by @DwyaneShi in #1107
[Bug]fix: add scale subresource to rayclusterfleet by @zhixian82 in #1082
[Feature] KVCache: Support InfiniStore GID and enhance cluster mode by @DwyaneShi in #1106
[Chore] fix: regenerate crd by @zhixian82 in #1109
[Chore] KVCache: enhance format and dependencies by @DwyaneShi in #1108
Polish benchmark manifests and VE samples by @Jeffwan in #1113
[API] Support customized template for cache by @Jeffwan in #1114
Bump version to v0.3.0-rc.2 by @Jeffwan in #1115
[Fix] Move pdb from patch to resources by @Jeffwan in #1117

New Contributors

@zhixian82 made their first contribution in #1082

Full Changelog: v0.3.0-rc.1...v0.3.0-rc.2

Contributors

Jeffwan, DwyaneShi, and 4 other contributors

13 May 07:12

github-actions

v0.3.0-rc.1

575aa5d

v0.3.0-rc.1 Pre-release

What's Changed

[Docs] fix format of the dist kv cache doc by @DwyaneShi in #714
complete the 'make generate' command by @kerthcet in #711
Update organization reference in code base by @Jeffwan in #717
[Misc] Update the documentation link by @Jeffwan in #720
Initial implementation of radix tree-based cache by @gangmuk in #678
Add model adapter e2e tests by @varungup90 in #701
Add vllm cpu alternative for local development by @varungup90 in #721
Add white paper file by @Jeffwan in #724
Adding streaming client for AIbrix experiments by @happyandslow in #676
[Docs] Update Readme with new links and blog post, and update white paper by @xieus in #725
Recording failed requests in benchmark client by @gangmuk in #727
Process response headers in gateway by @varungup90 in #703
[misc] Fix white paper link by @Jeffwan in #728
Prefix and load aware routing with radix tree kv cache by @gangmuk in #719
Fix slack link in README.md by @Jeffwan in #729
[readme] Fix wrong link by @gaocegege in #731
[Misc] update scheduler.py by @eltociear in #736
Improve thread safety for TreeNode data structure and refactor related codes by @gangmuk in #730
Fix CacheSpec api scheme by @kerthcet in #740
docs: Fix link to license by @terrytangyuan in #746
Use native codegen cmd generating client-go by @kerthcet in #741
[Docs]: Fixed kubectl commands for install of components by @jolfr in #744
[fix] fixing bug in using AsyncOpenAI client (header setting, token counting, etc) by @gangmuk in #738
Add webhook framework by @kerthcet in #748
Use random seed for xxhash by @varungup90 in #752
Create SECURITY.md to enable security policy by @xieus in #756
[CI] Add integration test by @kerthcet in #759
[Bug] fix: correct non-inherited context by @Abirdcfly in #763
[Misc] Parametrize Makefile for mocked vLLM apps by @pierDipi in #764
Support benchmarking script by using real application trace by @nwangfw in #737
Maintaining common benchmarks utils in a separate dir by @gangmuk in #770
Ignore worker pods for gateway routing by @varungup90 in #776
Disable ENABLE_PROBES_INJECTION in correct way by @Jeffwan in #779
Make stream include usage as optional by @varungup90 in #788
Append ray head label selector in PodAutoscaler by @Jeffwan in #789
Remove redundant install crds in makefile by @varungup90 in #792
Update request message processing for /v1/completion input by @varungup90 in #794
Added target pod to client result and made clients consistent by @gangmuk in #799
Enable CI tests for release branch by @Jeffwan in #805
Move modelAdapter runtime validation to webhook by @kerthcet in #786
[Misc] Adding model field to each request by @happyandslow in #812
[Refactor]: gateway-plugins ext-proc server codebase by @Xunzhuo in #810
[CI]: update release tags pattern by @Xunzhuo in #815
[Docs]: fix vllm mock app Unauthorized response by @Xunzhuo in #817
Reconfigure workload generator for predefined synthetic patterns by @happyandslow in #771
Workload generation scripts for prefix aware routing by @gangmuk in #820
Fix the paths in lambda cloud doc by @gangmuk in #824
[Bug] Added Startup Probe in Quickstart Model by @jolfr in #773
Add /v1/models endpoint to gateway by @varungup90 in #802
Increase envoy proxy memory config and client connection buffersize by @varungup90 in #825
Support to create default HttpRoute for RayClusterFleet by @Jeffwan in #826
[Misc] Fix CI issue on release branch and clean up logs by @Jeffwan in #837
Fix repeated initialization of gateway routers and add unit test for prefix cache by @varungup90 in #838
Add deepseek-r1 671B deployment sample and docs by @Jeffwan in #835
Bump AIBrix version to v0.2.1 in manifests by @Jeffwan in #839
[Docs] Update Slack link by @gaocegege in #841
[Docs] Remove repeated lines by @zjd0112 in #849
Bump AIBrix version to v0.2.1 for standalone distributed inference by @SongGuyang in #850
Support OpenAI api style /v1/models response by @Jeffwan in #829
[Misc] Resolve symlink ambiguity when generating codes by @vaaandark in #856
Introduce RoutingContext in Route interface and clean up stale codes by @Jeffwan in #855
[Misc]: sync hpa status to podAutoScaler by @vie-serendipity in #860
Generate workload based on prefix sharing synthetic data by @happyandslow in #840
Fixing missing image link in #840 by @happyandslow in #871
Cite Melange paper in heterogeneous feature by @Jeffwan in #872
[Misc] support linux for vllm cpu local development by @nurali-techie in #867
Refactor make deploy to use apply instead of create by @varungup90 in #793
Use string based tokenizer in prefix cache by @varungup90 in #774
Add profiling support for gateway plugins and bug fix to close stream decoder by @varungup90 in #857
Add flag to enable/disable GPU Optimizer tracing by @varungup90 in #875
[Docs] fix typo in runtime feature page by @legendtkl in #870
chore: clean-up mock yaml by @Xunzhuo in #877
Fixing image link error in workload generator README.md by @happyandslow in #888
Update Synthetic Load Predefined Config for Generator by @happyandslow in #889
[Misc] Fix plot_workload to pass dirname to makedirs by @ronaldosaheki in #886
[Misc] Fix client.py in case workload has model null and client has default_model by @ronaldosaheki in #887
[WIP] Adding input/output distribution argument to constant load generator by @happyandslow in #882
[Docs] Fix broken contributing guidelines link in README by @nadongjun in #890
[Bug] fix install script PATH environment variable by @cr7258 in #893
[Docs] Link to dynamic lora from docs by @thomasjpfan in #883
[API] Refactor: core cache design and impl by @Xunzhuo in #878
Added antiaffinity in kvcache crd by @gangmuk in #865
[Docs] Fix tpm and rpm typo in gateway-plugins.rst by @runzhen in #896
[Misc] Remove unused function in pkg/utils by @my-git9 in #895
Remove model name from client and generator by @happyandslow in #894
[Misc] Add PS benchmark manifests and scripts by @Jeffwan in #899
Add release overlays to update control plane config for production deployment by @varungup90 in https://github.com/vllm-project/aibri...

Contributors

zhangjyr, ronaldosaheki, and 32 other contributors

09 Mar 13:25

github-actions

v0.2.1

858ec82

v0.2.1

Automatically generated release for tag v0.2.1.

What's Changed

Cherry-pick Enable CI tests for release branch (#805) by @Jeffwan in #808
Cherry pick #776 #779 #788 #789 #794 to release branch by @Jeffwan @varungup90 in #809
Cherry-pick #825 #826 part of #717 in release branch by @varungup90 @Jeffwan in #828
Update version and tags to v0.2.1 by @Jeffwan in #833

Full Changelog: v0.2.0...v0.2.1

Contributors

Jeffwan and varungup90

19 Feb 18:31

github-actions

v0.2.0

0a21d77

v0.2.0

Automatically generated release for tag v0.2.0.

🚀 New Features Highlights

Distributed KV Cache: Implemented support for managing KV cache across multiple nodes, enhancing performance.
Cost-Driven Heterogeneous Serving: Improved scheduling and inference strategies for mixed GPU environments, optimizing cost and resource utilization. (#371, #430, #509, #554, #598)
Optimizer Based Autoscaling: Leverages offline profiles of the inference server to calculate the number of replicas. (#430, #500, #692, #508)
Prefix Cache Aware Routing: Added support for routing decisions based on prefix cache hits, improving inference efficiency. (#641, #657)

📊 Feature Enhancements

LoRA Scheduling Enhancements: Introduced multiple scheduling strategies, including bin packing, least latency, least throughput, and random. (#544)
Gateway Enhancements: Improved request handling efficiency by enabling streaming in the Envoy gateway (#377). Enhanced the handling of model registration and invalid cache scenarios (#542). Introduced fallback strategies to ensure robust request allocation (#445). Optimized cache store retrieval, reducing unnecessary overhead (#639). Addressed a missing Prometheus config that prevented gateway startup (#441).
PodAutoscaler Scaling improvements: Improved scaling logic to handle edge cases more efficiently. (#508, #515)
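
The four LoRA scheduling strategies named above (bin packing, least latency, least throughput, random) can be sketched as interchangeable selection functions. This is an illustrative sketch only. The `PodStats` fields, the strategy keys, and the `schedule` helper are hypothetical and do not reflect the AIBrix scheduler's real types.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PodStats:
    # All fields are hypothetical metrics a scheduler might consult.
    name: str
    adapters_loaded: int
    max_adapters: int
    avg_latency_ms: float
    throughput_tps: float


def bin_pack(pods: List[PodStats]) -> PodStats:
    """Pick the fullest pod that still has a free adapter slot."""
    with_room = [p for p in pods if p.adapters_loaded < p.max_adapters]
    if not with_room:
        raise ValueError("no pod has a free adapter slot")
    return max(with_room, key=lambda p: p.adapters_loaded)


def least_latency(pods: List[PodStats]) -> PodStats:
    """Pick the pod with the lowest average latency."""
    return min(pods, key=lambda p: p.avg_latency_ms)


def least_throughput(pods: List[PodStats]) -> PodStats:
    """Pick the pod currently serving the lowest throughput."""
    return min(pods, key=lambda p: p.throughput_tps)


def random_pick(pods: List[PodStats]) -> PodStats:
    """Pick any pod uniformly at random."""
    return random.choice(pods)


STRATEGIES: Dict[str, Callable[[List[PodStats]], PodStats]] = {
    "bin_pack": bin_pack,
    "least_latency": least_latency,
    "least_throughput": least_throughput,
    "random": random_pick,
}


def schedule(strategy: str, pods: List[PodStats]) -> PodStats:
    """Dispatch to the named strategy."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    if not pods:
        raise ValueError("no pods available")
    return STRATEGIES[strategy](pods)
```

Keeping each strategy as a plain function behind one lookup table makes it straightforward to add or swap policies without touching the call site.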

🛠 Infrastructure & CI/CD Upgrades

Parallelized Build Tasks: CI efficiency improvements by running builds in parallel. (#398)
CrashLoopBackOff Detection in CI: Added monitoring for pod failures in testing workflows. (#444)
Improved GitHub Actions Cost Efficiency: Optimized triggers and removed unnecessary nightly builds. (#411, #422)
Integration Tests for Core Components: Added integration tests for autoscalers, routing policies, and deployment configurations. (#616, #620)

What's Changed

Add envoy gateway streaming support by @varungup90 in #377
Add client traffic policy to increase per connection buffer size from 32kb to 256kb by @varungup90 in #395
Misc: add support to metricsSources property of podautoscaler by @zhangjyr in #371
[Misc] Update runtime server startup command in v0.1.0 by @brosoul in #396
[CI] improve the ci efficiency by parallelizing the build tasks by @nwangfw in #398
Fix the ticker interval by removing unnecessary ms by @Jeffwan in #415
[Misc] Disable specific endpoints logs by @Jeffwan in #418
[CI] Github Action trigger condition optimized for cost saving by @nwangfw in #411
[Misc] Fix the mocked app role permission issue by @Jeffwan in #416
[CI] Nightly tag removed for release branch by @nwangfw in #422
Enable setting PodAutoscaler configuration via YAML labels by @kr11 in #409
Update manifest to adopt v0.1.1 images by @Jeffwan in #429
[Bug]: duplicated http in rest metrics fetcher (#408) by @zhangjyr in #421
[MISC]: Improve Request Trace Granularity with Version Control by @zhangjyr in #431
Support histogram metrics from engine in cache by @Jeffwan in #424
Support fetching metrics from remote Prometheus server by @Jeffwan in #433
[CI] Add python wheel to release artifact by @Jeffwan in #434
Fix update cache pod issue and refactor updatePod handler by @Jeffwan in #439
Extract common metrics structure to types and utils by @Jeffwan in #438
Fix gateway startup issue due to missing prometheus config by @Jeffwan in #441
[feat]: GPU Optimizer and Simulator development app by @zhangjyr in #430
Add selectrandom fallback in routing and only scraping healthy pods by @Jeffwan in #445
AIBrix Workload Generator / Scenario Simulator by @happyandslow in #428
CrashLoopBackOff status detection in CI by @nwangfw in #444
Support installing individual controllers from giant controller-manager by @nwangfw in #442
Refactor Scaler: Resolve Issues with Metric Parameter Updates in Multiple KPAs by @kr11 in #437
Support metrics multi labels for different models by @brosoul in #450
Add health check api interface for runtime by @Jeffwan in #451
Fix the service name override issue in rolebindings by @Jeffwan in #453
Reorganize docs/development and docs/tutorial structure by @Jeffwan in #455
Move tools to separate folders and update mocked app README.md by @Jeffwan in #457
Fix multi models metric result in PromQL by @brosoul in #458
Support Azure LLM trace in workload generator by @happyandslow in #462
Fix autoscaler scalingstrategy switching logic by @nwangfw in #475
Fix missing handle of PromQL scope is PodMetricScope by @brosoul in #479
[Misc] Consolidate app and simulator by @zhangjyr in #477
[Bug] Avoid including sensitive info in Dockerfile ENV by @zhangjyr in #487
Refactor generator to generate time-based traces by @happyandslow in #478
[CI] Update deploy workload script in installation test by @nwangfw in #499
[Bug] handle metricKey creation with MetricsSources by @nwangfw in #498
Adding Client for Workload Generator Workload File by @happyandslow in #501
[Feat] Integrate deployment configurations and fix autoscaler/gpu optimizer connectivity by @zhangjyr in #500
Fix some simulator format issue and add some TODOs by @Jeffwan in #505
[Bug] Fix the way how podautoscaler handle 0 pods. by @zhangjyr in #508
[Misc] Improve gpu optimizer debugging on podautoscaler. by @zhangjyr in #509
Optimize kustomize overlay for volcano engine deployment by @Jeffwan in #512
[perf] Refact tos downloader in Runtime by @brosoul in #510
Refactor metric source for customized protocol, port and path by @kr11 in #511
[Bug] Fixed the yaml of deployments in heterogenous GPU settings to make KPA scaling work as expected. by @zhangjyr in #513
[Misc] Heterogeneous GPU Optimizer Logging Clean Up by @nwangfw in #514
Fix KPA bug, and an elaborate KPA test case by @kr11 in #515
Cut v0.2.0-rc.1 release by @Jeffwan in #516
[Bug] Accumulated bug fix on controller manager, mock app configuration, and gpu optimizer. by @zhangjyr in #522
[Misc] Reduced runtime's container image size by @nwangfw in #518
clean memory scaler object when pa crd is deleted by @kr11 in #520
Configure autoscaler http client to skip certificate check by @Jeffwan in #530
[Doc] Update aibrix documentation by @Jeffwan in #533
Refactor the gateway-plugin and metadata service manifests by @Jeffwan in #531
Fix the GITHUB_WORKSPACE artifact sharing issue in release workflow by @Jeffwan in #532
[Misc] Polish the benchmark scripts by @Jeffwan in #525
Fix APA bugs in creation, add test and demo yaml by @kr11 in #536
Add VKE IPv4 Testing Cluster Config by @nwangfw in #537
Support for request length internal trace by @happyandslow in #538
[Feat] Add download status into runtime downloader by @brosoul in #539
[Feat] Add runtime model management api by @brosoul in #540
[gateway] handle the wrong model name and cache inconsistency case by @Jeffwan in #542
[Docs] fix: update the parameters instruction in readme by @scarlet25151 in #548
add lora schedulers - bin pack, least latency, least throughput, random by @Aspirin96 in #544
add request routers - least kv cache, least expected latency by @Aspirin96 in #543
[Docs] heterogenous gpu docs added by ...

Contributors

zhangjyr, Jeffwan, and 10 other contributors

23 Jan 22:23

github-actions

v0.2.0-rc.2

6ee2f11

v0.2.0-rc.2 Pre-release

Automatically generated release for tag v0.2.0-rc.2.

What's Changed

[Bug] Accumulated bug fix on controller manager, mock app configuration, and gpu optimizer. by @zhangjyr in #522
[Misc] Reduced runtime's container image size by @nwangfw in #518
clean memory scaler object when pa crd is deleted by @kr11 in #520
Configure autoscaler http client to skip certificate check by @Jeffwan in #530
[Doc] Update aibrix documentation by @Jeffwan in #533
Refactor the gateway-plugin and metadata service manifests by @Jeffwan in #531
Fix the GITHUB_WORKSPACE artifact sharing issue in release workflow by @Jeffwan in #532
[Misc] Polish the benchmark scripts by @Jeffwan in #525
Fix APA bugs in creation, add test and demo yaml by @kr11 in #536
Add VKE IPv4 Testing Cluster Config by @nwangfw in #537
Support for request length internal trace by @happyandslow in #538
[Feat] Add download status into runtime downloader by @brosoul in #539
[Feat] Add runtime model management api by @brosoul in #540
[gateway] handle the wrong model name and cache inconsistency case by @Jeffwan in #542
[Docs] fix: update the parameters instruction in readme by @scarlet25151 in #548
add lora schedulers - bin pack, least latency, least throughput, random by @Aspirin96 in #544
add request routers - least kv cache, least expected latency by @Aspirin96 in #543
[Docs] heterogenous gpu docs added by @nwangfw in #545
Fix race condition in cache by @varungup90 in #550
Fix pod internal cache delete handling by @varungup90 in #552
Handle terminating pod for request routing by @varungup90 in #549
Support absolute path as lora adapter artifact path by @Jeffwan in #556
Deadlock fix for cache by @varungup90 in #557
Mock app log fix for missing metrics warning by @varungup90 in #564
Add vllm graceful termination configuration by @nwangfw in #568
Enhance dynamic lora adapter support for auth enabled scenario by @Jeffwan in #571
Update pyproject.toml to support python 3.12 by @Jeffwan in #579
[Docs] Update ai runtime management api and downloader docs by @Jeffwan in #577
Check the HPA ownerReference in request enqueue by @Jeffwan in #582
Add request length for traces by @happyandslow in #569
Support model registration flow using aibrix runtime api by @Jeffwan in #580
Gateway plugin report total incoming requests and pending requests by @zhangjyr in #554
Support distributed kv cache orchestration by @Jeffwan in #583
Grant workflow action permission to write packages by @Jeffwan in #586
Update routers to use GetPodModelMetric api and misc cleanup in metri… by @varungup90 in #590
Update upload/download artifact github actions version to v4 by @varungup90 in #591
Update version in aibrix/python to 0.2.0-rc.2 by @varungup90 in #594

New Contributors

@scarlet25151 made their first contribution in #548
@Aspirin96 made their first contribution in #544

Full Changelog: v0.2.0-rc.1...v0.2.0-rc.2

Contributors

zhangjyr, Jeffwan, and 7 other contributors

Assets 5

09 Jan 06:44

Jeffwan

v0.1.2

b0766a9

v0.1.2

What's Changed

Support absolute path as lora adapter artifact path (#556) by @Jeffwan in #558
Cherry pick streaming and client traffic policy by @varungup90 in #560
Cut v0.1.2 release by @Jeffwan in #561

Full Changelog: v0.1.1...v0.1.2

Contributors

Jeffwan and varungup90

Assets 4

10 Dec 20:16

Jeffwan

v0.2.0-rc.1

0d40fbd

v0.2.0-rc.1 Pre-release

Pre-release

What's Changed

Add envoy gateway streaming support by @varungup90 in #377
Add client traffic policy to increase per connection buffer size from 32kb to 256kb by @varungup90 in #395
Misc: add support to metricsSources property of podautoscaler by @zhangjyr in #371
[Misc] Update runtime server startup command in v0.1.0 by @brosoul in #396
[CI] improve the ci efficiency by parallelizing the build tasks by @nwangfw in #398
Fix the ticker interval by removing unnecessary ms by @Jeffwan in #415
[Misc] Disable specific endpoints logs by @Jeffwan in #418
[CI] Github Action trigger condition optimized for cost saving by @nwangfw in #411
[Misc] Fix the mocked app role permission issue by @Jeffwan in #416
[CI] Nightly tag removed for release branch by @nwangfw in #422
Enable setting PodAutoscaler configuration via YAML labels by @kr11 in #409
Update manifest to adopt v0.1.1 images by @Jeffwan in #429
[Bug]: duplicated http in rest metrics fetcher (#408) by @zhangjyr in #421
[MISC]: Improve Request Trace Granularity with Version Control by @zhangjyr in #431
Support histogram metrics from engine in cache by @Jeffwan in #424
Support fetching metrics from remote Prometheus server by @Jeffwan in #433
[CI] Add python wheel to release artifact by @Jeffwan in #434
Fix update cache pod issue and refactor updatePod handler by @Jeffwan in #439
Extract common metrics structure to types and utils by @Jeffwan in #438
Fix gateway startup issue due to missing prometheus config by @Jeffwan in #441
[feat]: GPU Optimizer and Simulator development app by @zhangjyr in #430
Add selectrandom fallback in routing and only scraping healthy pods by @Jeffwan in #445
AIBrix Workload Generator / Scenario Simulator by @happyandslow in #428
CrashLoopBackOff status detection in CI by @nwangfw in #444
Support installing individual controllers from giant controller-manager by @nwangfw in #442
Refactor Scaler: Resolve Issues with Metric Parameter Updates in Multiple KPAs by @kr11 in #437
Support metrics multi labels for different models by @brosoul in #450
Add health check api interface for runtime by @Jeffwan in #451
Fix the service name override issue in rolebindings by @Jeffwan in #453
Reorganize docs/development and docs/tutorial structure by @Jeffwan in #455
Move tools to separate folders and update mocked app README.md by @Jeffwan in #457
Fix multi models metric result in PromQL by @brosoul in #458
Support Azure LLM trace in workload generator by @happyandslow in #462
Fix autoscaler scalingstrategy switching logic by @nwangfw in #475
Fix missing handle of PromQL scope is PodMetricScope by @brosoul in #479
[Misc] Consolidate app and simulator by @zhangjyr in #477
[Bug] Avoid including sensitive info in Dockerfile ENV by @zhangjyr in #487
Refactor generator to generate time-based traces by @happyandslow in #478
[CI] Update deploy workload script in installation test by @nwangfw in #499
[Bug] handle metricKey creation with MetricsSources by @nwangfw in #498
Adding Client for Workload Generator Workload File by @happyandslow in #501
[Feat] Integrate deployment configurations and fix autoscaler/gpu optimizer connectivity by @zhangjyr in #500
Fix some simulator format issue and add some TODOs by @Jeffwan in #505
[Bug] Fix the way how podautoscaler handle 0 pods. by @zhangjyr in #508
[Misc] Improve gpu optimizer debugging on podautoscaler. by @zhangjyr in #509
Optimize kustomize overlay for volcano engine deployment by @Jeffwan in #512
[perf] Refact tos downloader in Runtime by @brosoul in #510
Refactor metric source for customized protocol, port and path by @kr11 in #511
[Bug] Fixed the yaml of deployments in heterogeneous GPU settings to make KPA scaling work as expected. by @zhangjyr in #513
[Misc] Heterogeneous GPU Optimizer Logging Clean Up by @nwangfw in #514
Fix KPA bug, and an elaborate KPA test case by @kr11 in #515
Cut v0.2.0-rc.1 release by @Jeffwan in #516

Full Changelog: v0.1.1...v0.2.0-rc.1

Contributors

zhangjyr, Jeffwan, and 5 other contributors

Assets 6
