[Feature]: Proxy - return OpenAI compatible remaining_requests and remaining_tokens headers #5957
Closed
Description
The Feature
A user was relying on OpenAI's `x-ratelimit-*` response headers to track usage and remaining quota. He was surprised to find that his metric broke after the migration, since LiteLLM is supposed to be a drop-in replacement.
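For context, a minimal sketch of how a client might collect such a quota metric from response headers. The proxy URL and virtual key below are hypothetical placeholders, not values from this issue:

```python
import requests

# Hypothetical proxy URL and virtual key; adjust to your own deployment.
PROXY_URL = "http://localhost:4000/v1/chat/completions"
VIRTUAL_KEY = "sk-1234"

resp = requests.post(
    PROXY_URL,
    headers={"Authorization": f"Bearer {VIRTUAL_KEY}"},
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "hi"}],
    },
)

# OpenAI returns these headers on every response; this ticket asks the
# LiteLLM proxy to return the same headers so existing metrics keep working.
print("remaining requests:", resp.headers.get("x-ratelimit-remaining-requests"))
print("remaining tokens:", resp.headers.get("x-ratelimit-remaining-tokens"))
```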
Action items for this ticket
- Return `x-ratelimit-*` headers in responses (https://platform.openai.com/docs/guides/rate-limits/usage-tiers)
- If the virtual key / team / user has no rate limit set, return the rate limit from the LiteLLM model group
- If there is just one deployment in a model group, return the headers in OpenAI-compatible format
- If there are multiple deployments in a model group, return the remaining tokens / requests for the whole model group (see the sketch after this list)
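A rough sketch of the fallback logic described above. The function and argument names are hypothetical and do not come from the LiteLLM codebase; this only illustrates the precedence (key/team/user limit first, model-group limit otherwise) and the header names to emit:

```python
from typing import Optional


def resolve_ratelimit_headers(
    key_remaining: Optional[dict],
    model_group_remaining: dict,
    num_deployments: int,
) -> dict:
    """Pick the remaining-requests / remaining-tokens values to expose as headers.

    key_remaining: remaining quota for the virtual key / team / user, if one is
        configured, e.g. {"requests": 90, "tokens": 45_000}; None if no limit set.
    model_group_remaining: aggregated remaining quota for the model group.
    num_deployments: number of deployments behind the model group.
    """
    # Prefer the virtual key / team / user limit; fall back to the model group.
    remaining = key_remaining if key_remaining is not None else model_group_remaining

    # With a single deployment the values map 1:1 to the underlying provider;
    # with multiple deployments they are the aggregate for the whole model group.
    return {
        "x-ratelimit-remaining-requests": str(remaining["requests"]),
        "x-ratelimit-remaining-tokens": str(remaining["tokens"]),
    }
```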
Motivation, pitch
Twitter / LinkedIn details
No response