Releases: BerriAI/litellm
v1.58.2
What's Changed
- Fix RPM/TPM limit typo in admin UI by @yujonglee in #7769
- Add AIM Guardrails support by @krrishdholakia in #7771
- Support temporary budget increases on keys by @krrishdholakia in #7754
- Litellm dev 01 13 2025 p2 by @krrishdholakia in #7758
- docs - iam role based access for bedrock by @ishaan-jaff in #7774
- (Feat) prometheus - emit remaining team budget metric on proxy startup by @ishaan-jaff in #7777
- (fix) `BaseAWSLLM` - cache IAM role credentials when used by @ishaan-jaff in #7775
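For the Bedrock IAM-role docs (#7774) and the `BaseAWSLLM` credential-caching fix (#7775), here is a minimal SDK sketch of role-based Bedrock access. It assumes the documented `aws_role_name` / `aws_session_name` parameters; the role ARN is hypothetical, and repeated calls are where the cached IAM credentials matter.

```python
import litellm

# Minimal sketch of IAM-role-based Bedrock access (assumes the documented
# aws_role_name / aws_session_name params; the role ARN below is hypothetical).
response = litellm.completion(
    model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[{"role": "user", "content": "Hello from an assumed role"}],
    aws_region_name="us-east-1",
    aws_role_name="arn:aws:iam::123456789012:role/litellm-bedrock-role",  # hypothetical ARN
    aws_session_name="litellm-session",
)
print(response.choices[0].message.content)

# With v1.58.2 the assumed-role credentials should be cached by BaseAWSLLM,
# so a second call avoids an extra STS round trip.
```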
Full Changelog: v1.58.1...v1.58.2
Docker Run LiteLLM Proxy
```shell
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.58.2
```
Don't want to maintain your internal proxy? Get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
---|---|---|---|---|---|---|---|---|---|
/chat/completions | Passed ✅ | 250.0 | 289.8090936126223 | 6.143711740946042 | 0.0 | 1838 | 0 | 228.12097899998207 | 2196.5017750000015 |
Aggregated | Passed ✅ | 250.0 | 289.8090936126223 | 6.143711740946042 | 0.0 | 1838 | 0 | 228.12097899998207 | 2196.5017750000015 |
v1.58.1
🚨 Alpha - 1.58.0 has various perf improvements; we recommend waiting for a stable release before bumping it in production
What's Changed
- (core sdk fix) - fix fallbacks stuck in infinite loop by @ishaan-jaff in #7751
- [Bug fix]: v1.58.0 - issue with read request body by @ishaan-jaff in #7753
- (litellm SDK perf improvements) - handle cases when unable to lookup model in model cost map by @ishaan-jaff in #7750
- (prometheus - minor bug fix) - `litellm_llm_api_time_to_first_token_metric` not populating for bedrock models by @ishaan-jaff in #7740
- (fix) health check - allow setting `health_check_model` by @ishaan-jaff in #7752
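A minimal sketch of exercising the proxy health check touched by #7752. It assumes a proxy running locally on port 4000 with a placeholder master key; `health_check_model` itself is set per-deployment in the proxy config, and the response field names below are an assumption based on the health-check docs.

```python
import requests

# Hit the proxy's /health endpoint (assumes a local proxy on :4000 and a
# placeholder master key). With #7752, a `health_check_model` can be set in
# the proxy config to control which model the health check actually calls.
resp = requests.get(
    "http://localhost:4000/health",
    headers={"Authorization": "Bearer sk-1234"},  # placeholder master key
    timeout=30,
)
resp.raise_for_status()
report = resp.json()
# Field names are assumed from the health-check docs; adjust if they differ.
print(report.get("healthy_count"), "healthy /", report.get("unhealthy_count"), "unhealthy")
```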
Full Changelog: v1.58.0...v1.58.1
Docker Run LiteLLM Proxy
```shell
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.58.1
```
Don't want to maintain your internal proxy? Get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
---|---|---|---|---|---|---|---|---|---|
/chat/completions | Passed ✅ | 250.0 | 294.2978673554448 | 6.045420383532543 | 0.0 | 1809 | 0 | 223.72276400000146 | 3539.4181890000027 |
Aggregated | Passed ✅ | 250.0 | 294.2978673554448 | 6.045420383532543 | 0.0 | 1809 | 0 | 223.72276400000146 | 3539.4181890000027 |
v1.58.0
v1.58.0 - Alpha Release
🚨 This is an alpha release - we've made several performance / RPS improvements to litellm core. If you see any issues, please file them at https://github.com/BerriAI/litellm/issues
What's Changed
- (proxy perf) - service logger don't always import OTEL in helper function by @ishaan-jaff in #7727
- (proxy perf) - only read request body 1 time per request by @ishaan-jaff in #7728
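The read-body-once change (#7728) is an application of a common ASGI pattern: parse the body a single time, then stash it on `request.state` for any later hooks. A generic sketch of that pattern (not LiteLLM's actual implementation), assuming FastAPI/Starlette:

```python
import json

from fastapi import FastAPI, Request

app = FastAPI()


async def get_parsed_body(request: Request) -> dict:
    """Parse the JSON body once per request and cache it on request.state."""
    if not hasattr(request.state, "parsed_body"):
        raw = await request.body()
        request.state.parsed_body = json.loads(raw) if raw else {}
    return request.state.parsed_body


@app.post("/chat/completions")
async def chat_completions(request: Request):
    body = await get_parsed_body(request)         # first call parses
    body_again = await get_parsed_body(request)   # later hooks reuse the cache
    assert body is body_again
    return {"model": body.get("model")}
```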
Full Changelog: v1.57.11...v1.58.0
Docker Run LiteLLM Proxy
```shell
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.58.0
```
Don't want to maintain your internal proxy? Get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
---|---|---|---|---|---|---|---|---|---|
/chat/completions | Passed ✅ | 240.0 | 273.2166563012582 | 6.118315985413586 | 0.0033451700302972037 | 1829 | 1 | 75.1692759999969 | 3821.228761000043 |
Aggregated | Passed ✅ | 240.0 | 273.2166563012582 | 6.118315985413586 | 0.0033451700302972037 | 1829 | 1 | 75.1692759999969 | 3821.228761000043 |
v1.57.11
v1.57.11 - Alpha Release
🚨 This is an alpha release - we've made several performance / RPS improvements to litellm core. If you see any issues, please file them at https://github.com/BerriAI/litellm/issues
What's Changed
- (litellm SDK perf improvement) - use `verbose_logger.debug` and `_cached_get_model_info_helper` in `_response_cost_calculator` by @ishaan-jaff in #7720
- (litellm sdk speedup) - use `_model_contains_known_llm_provider` in `response_cost_calculator` to check if the model contains a known litellm provider by @ishaan-jaff in #7721
- (proxy perf) - only parse request body 1 time per request by @ishaan-jaff in #7722
- Revert "(proxy perf) - only parse request body 1 time per request" by @ishaan-jaff in #7724
- add azure o1 pricing by @krrishdholakia in #7715
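With Azure o1 pricing added to the cost map (#7715), per-token pricing can be inspected from the SDK. A minimal sketch using `litellm.get_model_info` and `litellm.completion_cost`; the exact `azure/o1` entry name is an assumption based on this release note.

```python
import litellm

# Look up pricing from litellm's model cost map (assumes the "azure/o1" entry
# added in this release; substitute whatever model name your deployment uses).
info = litellm.get_model_info("azure/o1")
print("input $/token:", info["input_cost_per_token"])
print("output $/token:", info["output_cost_per_token"])

# completion_cost() applies the same map to an actual response object:
# cost = litellm.completion_cost(completion_response=response)
```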
Full Changelog: v1.57.10...v1.57.11
Docker Run LiteLLM Proxy
```shell
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.57.11
```
Don't want to maintain your internal proxy? Get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
---|---|---|---|---|---|---|---|---|---|
/chat/completions | Passed ✅ | 240.0 | 270.55759577820237 | 6.130862160194138 | 0.0 | 1835 | 0 | 224.79750500002638 | 1207.8732939999952 |
Aggregated | Passed ✅ | 240.0 | 270.55759577820237 | 6.130862160194138 | 0.0 | 1835 | 0 | 224.79750500002638 | 1207.8732939999952 |
v1.57.8-stable
Full Changelog: v1.57.8...v1.57.8-stable
Docker Run LiteLLM Proxy
```shell
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:litellm_stable_release_branch-v1.57.8-stable
```
Don't want to maintain your internal proxy? Get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
---|---|---|---|---|---|---|---|---|---|
/chat/completions | Passed ✅ | 240.0 | 271.08706884006597 | 6.1244865014274685 | 0.0 | 1832 | 0 | 221.9753340000068 | 2009.652516000017 |
Aggregated | Passed ✅ | 240.0 | 271.08706884006597 | 6.1244865014274685 | 0.0 | 1832 | 0 | 221.9753340000068 | 2009.652516000017 |
v1.57.10
v1.57.10 - Alpha Release
🚨 This is an alpha release - we've made several performance / RPS improvements to litellm core. If you see any issues, please file them at https://github.com/BerriAI/litellm/issues
What's Changed
- Litellm dev 01 10 2025 p2 by @krrishdholakia in #7679
- Litellm dev 01 10 2025 p3 by @krrishdholakia in #7682
- build: new ui build by @krrishdholakia in #7685
- fix(model_hub.tsx): clarify cost in model hub is per 1m tokens by @krrishdholakia in #7687
- Litellm dev 01 11 2025 p3 by @krrishdholakia in #7702
- (perf litellm) - use `_get_model_info_helper` for cost tracking by @ishaan-jaff in #7703
- (perf sdk) - minor changes to cost calculator to run helpers only when necessary by @ishaan-jaff in #7704
- (perf) - proxy, use `orjson` for reading request body by @ishaan-jaff in #7706
- (minor fix - `aiohttp_openai/`) - fix get_custom_llm_provider by @ishaan-jaff in #7705
- (sdk perf fix) - only print args passed to litellm when debugging mode is on by @ishaan-jaff in #7708
- (perf) - only use response_cost_calculator 1 time per request (don't re-use the same helper twice per call) by @ishaan-jaff in #7709
- [BETA] Add OpenAI `/images/variations` + Topaz API support by @krrishdholakia in #7700
- (litellm sdk speedup router) - adds a helper `_cached_get_model_group_info` to use when trying to get deployment tpm/rpm limits by @ishaan-jaff in #7719
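Several of the perf items above (#7703, #7719) follow the same idea: wrap a pure lookup in a process-level cache so repeated calls during routing don't redo the work. A generic sketch of that caching pattern with `functools.lru_cache` (not LiteLLM's actual helper; names and values are placeholders):

```python
from functools import lru_cache


@lru_cache(maxsize=1024)
def cached_model_group_limits(model_group: str) -> dict:
    """Hypothetical stand-in for a cached lookup of a deployment's tpm/rpm limits.

    The real router helper is `_cached_get_model_group_info`; the point here is
    only the pattern: the expensive lookup runs once per model group.
    """
    # ... an expensive scan of the router's deployment list would go here ...
    return {"tpm": 1_000_000, "rpm": 10_000}  # placeholder values


print(cached_model_group_limits("gpt-4o"))   # computed on first call
print(cached_model_group_limits("gpt-4o"))   # served from the cache
print(cached_model_group_limits.cache_info())
```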
Full Changelog: v1.57.8...v1.57.10
Docker Run LiteLLM Proxy
```shell
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.57.10
```
Don't want to maintain your internal proxy? Get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
---|---|---|---|---|---|---|---|---|---|
/chat/completions | Passed ✅ | 240.0 | 264.0629029362514 | 6.184926091214754 | 0.0 | 1851 | 0 | 213.62108399998192 | 1622.618584999998 |
Aggregated | Passed ✅ | 240.0 | 264.0629029362514 | 6.184926091214754 | 0.0 | 1851 | 0 | 213.62108399998192 | 1622.618584999998 |
v1.57.8
What's Changed
- (proxy latency/perf fix - user_api_key_auth) - use asyncio.create task for caching virtual key once it's validated by @ishaan-jaff in #7676
- (litellm sdk - perf improvement) - optimize `response_cost_calculator` by @ishaan-jaff in #7674
- (litellm sdk - perf improvement) - use O(1) set lookups for checking llm providers / models by @ishaan-jaff in #7672 (see the sketch after this list)
- (litellm sdk - perf improvement) - optimize `pre_call_check` by @ishaan-jaff in #7673
- [integrations/lunary] allow to pass custom parent run id to LLM calls by @hughcrt in #7651
- LiteLLM Minor Fixes & Improvements (01/10/2025) - p1 by @krrishdholakia in #7670
- (performance improvement - litellm sdk + proxy) - ensure litellm does not create unnecessary threads when running async functions by @ishaan-jaff in #7680
- (litellm proxy perf) - pass num_workers cli arg to uvicorn when `num_workers` is specified by @ishaan-jaff in #7681
- fix proxy pre call hook - only use `asyncio.create_task` if user opts into alerting by @ishaan-jaff in #7683
- [Bug fix]: Proxy Auth Layer - Allow Azure Realtime routes as llm_api_routes by @ishaan-jaff in #7684
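The O(1) lookup change (#7672) is a standard micro-optimization: membership tests against a `set` (or `frozenset`) are constant-time, while scanning a `list` is linear in the number of providers. A tiny, generic illustration of the difference (the provider names are just examples, not LiteLLM's actual list):

```python
# Linear scan: every miss walks the whole list.
known_providers_list = ["openai", "azure", "bedrock", "anthropic", "vertex_ai"]

# Constant-time membership: hashed lookup, independent of list size.
known_providers_set = frozenset(known_providers_list)


def is_known_provider(model: str) -> bool:
    provider = model.split("/", 1)[0]
    return provider in known_providers_set  # O(1) instead of O(n)


print(is_known_provider("bedrock/anthropic.claude-3-sonnet"))  # True
print(is_known_provider("my_custom_provider/some-model"))      # False
```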
Full Changelog: v1.57.7...v1.57.8
Docker Run LiteLLM Proxy
```shell
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.57.8
```
Don't want to maintain your internal proxy? Get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
---|---|---|---|---|---|---|---|---|---|
/chat/completions | Passed ✅ | 210.0 | 225.29799695056985 | 6.153370698253471 | 0.0 | 1841 | 0 | 177.73327700001573 | 2088.13791099999 |
Aggregated | Passed ✅ | 210.0 | 225.29799695056985 | 6.153370698253471 | 0.0 | 1841 | 0 | 177.73327700001573 | 2088.13791099999 |
v1.57.7
What's Changed
- (minor latency fixes / proxy) - use verbose_proxy_logger.debug() instead of litellm.print_verbose by @ishaan-jaff in #7664
- feat(ui_sso.py): Allows users to use test key pane, and have team budget limits be enforced for their use-case by @krrishdholakia in #7666
- fix(main.py): fix lm_studio/ embedding routing by @krrishdholakia in #7658 (see the sketch after this list)
- fix(vertex_ai/gemini/transformation.py): handle 'http://' in gemini p… by @krrishdholakia in #7660
- Use environment variable for Athina logging URL by @vivek-athina in #7628
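For the `lm_studio/` embedding routing fix (#7658), a minimal SDK sketch. The model name and `api_base` are placeholders for a locally running LM Studio server; swap in whatever embedding model you have loaded.

```python
import litellm

# Route an embedding request through the lm_studio/ provider prefix.
# Model name and api_base are placeholders for a local LM Studio server.
response = litellm.embedding(
    model="lm_studio/nomic-embed-text",      # hypothetical local model
    input=["LiteLLM routes this to LM Studio"],
    api_base="http://localhost:1234/v1",     # LM Studio's default local port
)
print(len(response.data[0]["embedding"]))    # embedding dimension
```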
Full Changelog: v1.57.5...v1.57.7
Docker Run LiteLLM Proxy
```shell
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.57.7
```
Don't want to maintain your internal proxy? Get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
---|---|---|---|---|---|---|---|---|---|
/chat/completions | Passed ✅ | 200.0 | 218.4749677188173 | 6.216185012755876 | 0.0 | 1860 | 0 | 177.92223199990076 | 3911.6109139999935 |
Aggregated | Passed ✅ | 200.0 | 218.4749677188173 | 6.216185012755876 | 0.0 | 1860 | 0 | 177.92223199990076 | 3911.6109139999935 |
v1.57.5
🚨🚨 Known issue - do not upgrade - Windows compatibility issue on this release
Relevant issue: #7677
What's Changed
- LiteLLM Minor Fixes & Improvements (01/08/2025) - p2 by @krrishdholakia in #7643
- Litellm dev 01 08 2025 p1 by @krrishdholakia in #7640
- (proxy - RPS) - Get 2K RPS at 4 instances, minor fix for caching_handler by @ishaan-jaff in #7655
- (proxy - RPS) - Get 2K RPS at 4 instances, minor fix `aiohttp_openai/` by @ishaan-jaff in #7659
- (proxy perf improvement) - use `uvloop` for higher RPS (10%-20% higher RPS) by @ishaan-jaff in #7662
- (Feat - Batches API) add support for retrieving vertex api batch jobs by @ishaan-jaff in #7661
- (proxy-latency fixes) use asyncio tasks for logging db metrics by @ishaan-jaff in #7663
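The DB-metrics change (#7663), like the key-caching change in v1.57.8, relies on scheduling slow logging work with `asyncio.create_task` so the request handler returns without awaiting it. A generic sketch of that fire-and-forget pattern (not LiteLLM's actual logger):

```python
import asyncio

background_tasks: set[asyncio.Task] = set()


async def log_db_metrics(payload: dict) -> None:
    """Stand-in for a slow DB/metrics write that shouldn't block the response."""
    await asyncio.sleep(0.2)  # simulate I/O
    print("logged:", payload)


async def handle_request() -> dict:
    # Fire-and-forget: schedule the write and return immediately. Keep a
    # reference so the task isn't garbage-collected mid-flight.
    task = asyncio.create_task(log_db_metrics({"latency_ms": 42}))
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    return {"status": "ok"}


async def main() -> None:
    print(await handle_request())            # returns before the metrics write
    await asyncio.gather(*background_tasks)  # let pending logging finish


asyncio.run(main())
```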
Full Changelog: v1.57.4...v1.57.5
Docker Run LiteLLM Proxy
```shell
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.57.5
```
Don't want to maintain your internal proxy? Get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
---|---|---|---|---|---|---|---|---|---|
/chat/completions | Passed ✅ | 230.0 | 282.70225500655766 | 6.115771768544881 | 0.0 | 1830 | 0 | 206.44150200001832 | 3375.4479410000044 |
Aggregated | Passed ✅ | 230.0 | 282.70225500655766 | 6.115771768544881 | 0.0 | 1830 | 0 | 206.44150200001832 | 3375.4479410000044 |
v1.57.4
What's Changed
- fix(utils.py): fix select tokenizer for custom tokenizer by @krrishdholakia in #7599
- LiteLLM Minor Fixes & Improvements (01/07/2025) - p3 by @krrishdholakia in #7635
- (feat) - allow building litellm proxy from pip package by @ishaan-jaff in #7633 (see the sketch after this list)
- Litellm dev 01 07 2025 p2 by @krrishdholakia in #7622
- Allow assigning teams to org on UI + OpenAI `omni-moderation` cost model tracking by @krrishdholakia in #7566
- (fix) proxy auth - allow using Azure JS SDK routes as llm_api_routes by @ishaan-jaff in #7631
- (helm) - bug fix - allow using `migrationJob.enabled` variable within job by @ishaan-jaff in #7639
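For the pip-package proxy item (#7633), a minimal sketch of starting the proxy from a plain pip install rather than the Docker image. The `pip install 'litellm[proxy]'` step and the `--model` / `--port` flags follow LiteLLM's standard proxy quick start; the model choice is just an example.

```python
import subprocess

# Prerequisite (shell): pip install 'litellm[proxy]'
# The proxy can then be launched from the CLI; this mirrors the Docker
# examples above but runs straight from the pip package.
subprocess.run(
    ["litellm", "--model", "gpt-4o", "--port", "4000"],
    check=True,  # blocks until the proxy process exits
)
```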
Full Changelog: v1.57.3...v1.57.4
Docker Run LiteLLM Proxy
```shell
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.57.4
```
Don't want to maintain your internal proxy? Get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
---|---|---|---|---|---|---|---|---|---|
/chat/completions | Passed ✅ | 200.0 | 218.7550845980808 | 6.268875045928877 | 0.0 | 1876 | 0 | 170.9488330000113 | 1424.4913769999812 |
Aggregated | Passed ✅ | 200.0 | 218.7550845980808 | 6.268875045928877 | 0.0 | 1876 | 0 | 170.9488330000113 | 1424.4913769999812 |