Commit
(feat) add Predicted Outputs for OpenAI (BerriAI#6594)
* bump openai to openai==1.54.0

* add 'prediction' param

* testing fix bedrock deprecated cohere.command-text-v14

* test test_openai_prediction_param.py

* test_openai_prediction_param_with_caching

* doc Predicted Outputs

* doc Predicted Output
ishaan-jaff authored Nov 5, 2024
1 parent 57b1bb5 commit c047d51
Showing 12 changed files with 362 additions and 13 deletions.
10 changes: 5 additions & 5 deletions .circleci/config.yml
@@ -47,7 +47,7 @@ jobs:
pip install opentelemetry-api==1.25.0
pip install opentelemetry-sdk==1.25.0
pip install opentelemetry-exporter-otlp==1.25.0
pip install openai==1.52.0
pip install openai==1.54.0
pip install prisma==0.11.0
pip install "detect_secrets==1.5.0"
pip install "httpx==0.24.1"
@@ -520,7 +520,7 @@ jobs:
pip install "aiodynamo==23.10.1"
pip install "asyncio==3.4.3"
pip install "PyGithub==1.59.1"
pip install "openai==1.52.0"
pip install "openai==1.54.0 "
# Run pytest and generate JUnit XML report
- run:
name: Build Docker image
@@ -637,7 +637,7 @@ jobs:
pip install "aiodynamo==23.10.1"
pip install "asyncio==3.4.3"
pip install "PyGithub==1.59.1"
pip install "openai==1.52.0"
pip install "openai==1.54.0 "
- run:
name: Build Docker image
command: docker build -t my-app:latest -f ./docker/Dockerfile.database .
@@ -729,7 +729,7 @@ jobs:
pip install "pytest-asyncio==0.21.1"
pip install "google-cloud-aiplatform==1.43.0"
pip install aiohttp
pip install "openai==1.52.0"
pip install "openai==1.54.0 "
python -m pip install --upgrade pip
pip install "pydantic==2.7.1"
pip install "pytest==7.3.1"
@@ -924,7 +924,7 @@ jobs:
pip install "pytest-retry==1.6.3"
pip install "pytest-asyncio==0.21.1"
pip install aiohttp
pip install "openai==1.52.0"
pip install "openai==1.54.0 "
python -m pip install --upgrade pip
pip install "pydantic==2.7.1"
pip install "pytest==7.3.1"
2 changes: 1 addition & 1 deletion .circleci/requirements.txt
@@ -1,5 +1,5 @@
# used by CI/CD testing
openai==1.52.0
openai==1.54.0
python-dotenv
tiktoken
importlib_metadata
109 changes: 109 additions & 0 deletions docs/my-website/docs/completion/predict_outputs.md
@@ -0,0 +1,109 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Predicted Outputs

| Property | Details |
|-------|-------|
| Description | Use this when most of the output of the LLM is known ahead of time. For instance, if you are asking the model to rewrite some text or code with only minor changes, you can reduce your latency significantly by using Predicted Outputs, passing in the existing content as your prediction. |
| Supported providers | `openai` |
| Link to OpenAI doc on Predicted Outputs | [Predicted Outputs ↗](https://platform.openai.com/docs/guides/latency-optimization#use-predicted-outputs) |
| Supported from LiteLLM Version | `v1.51.4` |



## Using Predicted Outputs

<Tabs>
<TabItem label="LiteLLM Python SDK" value="Python">

In this example we refactor a piece of C# code, converting the `Username` property to an `Email` property:
```python
import os

import litellm

os.environ["OPENAI_API_KEY"] = "your-api-key"

code = """
/// <summary>
/// Represents a user with a first name, last name, and username.
/// </summary>
public class User
{
    /// <summary>
    /// Gets or sets the user's first name.
    /// </summary>
    public string FirstName { get; set; }
    /// <summary>
    /// Gets or sets the user's last name.
    /// </summary>
    public string LastName { get; set; }
    /// <summary>
    /// Gets or sets the user's username.
    /// </summary>
    public string Username { get; set; }
}
"""

completion = litellm.completion(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": "Replace the Username property with an Email property. Respond only with code, and with no markdown formatting.",
        },
        {"role": "user", "content": code},
    ],
    prediction={"type": "content", "content": code},
)

print(completion)
```
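When the call succeeds, the response usage can show how much of the prediction was actually reused. The snippet below is a minimal follow-up to the example above and reuses `completion`; the `accepted_prediction_tokens` / `rejected_prediction_tokens` fields come from the OpenAI usage schema and may not be populated by every LiteLLM version:

```python
# Continues from the example above; assumes `completion` is the response object.
# Field names follow OpenAI's usage schema and may be absent in older versions.
details = getattr(completion.usage, "completion_tokens_details", None)
if details is not None:
    print("accepted prediction tokens:", getattr(details, "accepted_prediction_tokens", None))
    print("rejected prediction tokens:", getattr(details, "rejected_prediction_tokens", None))
```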

</TabItem>
<TabItem label="LiteLLM Proxy Server" value="proxy">

1. Define models in `config.yaml`

```yaml
model_list:
  - model_name: gpt-4o-mini # OpenAI gpt-4o-mini
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY
```

2. Run proxy server

```bash
litellm --config config.yaml
```

3. Test it using the OpenAI Python SDK


```python
from openai import OpenAI

client = OpenAI(
    api_key="LITELLM_PROXY_KEY",   # e.g. sk-1234
    base_url="LITELLM_PROXY_BASE"  # e.g. http://0.0.0.0:4000
)

# `code` is the same C# snippet used in the SDK example above
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": "Replace the Username property with an Email property. Respond only with code, and with no markdown formatting.",
        },
        {"role": "user", "content": code},
    ],
    prediction={"type": "content", "content": code},
)

print(completion)
```

</TabItem>
</Tabs>
1 change: 1 addition & 0 deletions docs/my-website/sidebars.js
@@ -205,6 +205,7 @@ const sidebars = {
"completion/prompt_caching",
"completion/audio",
"completion/vision",
"completion/predict_outputs",
"completion/prefix",
"completion/drop_params",
"completion/prompt_formatting",
1 change: 1 addition & 0 deletions litellm/llms/OpenAI/chat/gpt_transformation.py
@@ -94,6 +94,7 @@ def get_supported_openai_params(self, model: str) -> list:
"max_tokens",
"max_completion_tokens",
"modalities",
"prediction",
"n",
"presence_penalty",
"seed",
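As a quick sanity check that a given install forwards `prediction` for OpenAI models, you can ask LiteLLM for the provider's supported parameters. A minimal sketch, assuming the `get_supported_openai_params` helper behaves as in recent LiteLLM versions:

```python
import litellm

# Sketch: list the OpenAI params LiteLLM will forward for this model.
# The helper name and signature are assumed from recent LiteLLM versions.
supported = litellm.get_supported_openai_params(
    model="gpt-4o-mini", custom_llm_provider="openai"
)
print("prediction" in supported)  # expected: True on litellm >= v1.51.4
```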
8 changes: 8 additions & 0 deletions litellm/main.py
@@ -162,6 +162,7 @@
ChatCompletionAssistantMessage,
ChatCompletionAudioParam,
ChatCompletionModality,
ChatCompletionPredictionContentParam,
ChatCompletionUserMessage,
HttpxBinaryResponseContent,
)
@@ -304,6 +305,7 @@ async def acompletion(
max_tokens: Optional[int] = None,
max_completion_tokens: Optional[int] = None,
modalities: Optional[List[ChatCompletionModality]] = None,
prediction: Optional[ChatCompletionPredictionContentParam] = None,
audio: Optional[ChatCompletionAudioParam] = None,
presence_penalty: Optional[float] = None,
frequency_penalty: Optional[float] = None,
@@ -346,6 +348,7 @@ async def acompletion(
max_tokens (integer, optional): The maximum number of tokens in the generated completion (default is infinity).
max_completion_tokens (integer, optional): An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
modalities (List[ChatCompletionModality], optional): Output types that you would like the model to generate for this request. You can use `["text", "audio"]`
prediction (ChatCompletionPredictionContentParam, optional): Configuration for a Predicted Output, which can greatly improve response times when large parts of the model response are known ahead of time. This is most common when you are regenerating a file with only minor changes to most of the content.
audio (ChatCompletionAudioParam, optional): Parameters for audio output. Required when audio output is requested with modalities: ["audio"]
presence_penalty (float, optional): It is used to penalize new tokens based on their existence in the text so far.
frequency_penalty: It is used to penalize new tokens based on their frequency in the text so far.
@@ -387,6 +390,7 @@ async def acompletion(
"max_tokens": max_tokens,
"max_completion_tokens": max_completion_tokens,
"modalities": modalities,
"prediction": prediction,
"audio": audio,
"presence_penalty": presence_penalty,
"frequency_penalty": frequency_penalty,
@@ -693,6 +697,7 @@ def completion( # type: ignore # noqa: PLR0915
max_completion_tokens: Optional[int] = None,
max_tokens: Optional[int] = None,
modalities: Optional[List[ChatCompletionModality]] = None,
prediction: Optional[ChatCompletionPredictionContentParam] = None,
audio: Optional[ChatCompletionAudioParam] = None,
presence_penalty: Optional[float] = None,
frequency_penalty: Optional[float] = None,
@@ -737,6 +742,7 @@ def completion( # type: ignore # noqa: PLR0915
max_tokens (integer, optional): The maximum number of tokens in the generated completion (default is infinity).
max_completion_tokens (integer, optional): An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
modalities (List[ChatCompletionModality], optional): Output types that you would like the model to generate for this request. You can use `["text", "audio"]`
prediction (ChatCompletionPredictionContentParam, optional): Configuration for a Predicted Output, which can greatly improve response times when large parts of the model response are known ahead of time. This is most common when you are regenerating a file with only minor changes to most of the content.
audio (ChatCompletionAudioParam, optional): Parameters for audio output. Required when audio output is requested with modalities: ["audio"]
presence_penalty (float, optional): It is used to penalize new tokens based on their existence in the text so far.
frequency_penalty: It is used to penalize new tokens based on their frequency in the text so far.
@@ -843,6 +849,7 @@ def completion( # type: ignore # noqa: PLR0915
"stop",
"max_completion_tokens",
"modalities",
"prediction",
"audio",
"max_tokens",
"presence_penalty",
@@ -994,6 +1001,7 @@ def completion( # type: ignore # noqa: PLR0915
max_tokens=max_tokens,
max_completion_tokens=max_completion_tokens,
modalities=modalities,
prediction=prediction,
audio=audio,
presence_penalty=presence_penalty,
frequency_penalty=frequency_penalty,
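The same `prediction` kwarg is accepted by the async entrypoint. A minimal sketch, assuming `OPENAI_API_KEY` is set in the environment:

```python
import asyncio

import litellm


async def main() -> None:
    # Sketch: acompletion() takes the same `prediction` shape as completion().
    response = await litellm.acompletion(
        model="gpt-4o-mini",
        messages=[
            {"role": "user", "content": "Rename the Username property to Email. Respond only with code."},
            {"role": "user", "content": "public string Username { get; set; }"},
        ],
        prediction={"type": "content", "content": "public string Username { get; set; }"},
    )
    print(response.choices[0].message.content)


asyncio.run(main())
```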
3 changes: 3 additions & 0 deletions litellm/types/llms/openai.py
@@ -21,6 +21,9 @@
from openai.types.chat import ChatCompletionChunk
from openai.types.chat.chat_completion_audio_param import ChatCompletionAudioParam
from openai.types.chat.chat_completion_modality import ChatCompletionModality
from openai.types.chat.chat_completion_prediction_content_param import (
ChatCompletionPredictionContentParam,
)
from openai.types.embedding import Embedding as OpenAIEmbedding
from pydantic import BaseModel, Field
from typing_extensions import Dict, Required, TypedDict, override
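The imported `ChatCompletionPredictionContentParam` is the TypedDict that the `prediction` kwarg expects; it ships with openai>=1.54.0. A small illustration of the shape:

```python
from openai.types.chat.chat_completion_prediction_content_param import (
    ChatCompletionPredictionContentParam,
)

# The param is a TypedDict: a literal "content" type plus the predicted text.
prediction: ChatCompletionPredictionContentParam = {
    "type": "content",
    "content": "public class User { /* existing file contents */ }",
}
```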
2 changes: 2 additions & 0 deletions litellm/utils.py
@@ -2550,6 +2550,7 @@ def get_optional_params( # noqa: PLR0915
max_tokens=None,
max_completion_tokens=None,
modalities=None,
prediction=None,
audio=None,
presence_penalty=None,
frequency_penalty=None,
@@ -2631,6 +2632,7 @@ def get_optional_params( # noqa: PLR0915
"max_tokens": None,
"max_completion_tokens": None,
"modalities": None,
"prediction": None,
"audio": None,
"presence_penalty": None,
"frequency_penalty": None,
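Because `prediction` is only registered as a supported parameter for OpenAI, sending it to another provider will normally be rejected by `get_optional_params`. A hedged sketch of how a caller might opt into silently dropping it instead; exact behaviour depends on the LiteLLM version and provider:

```python
import litellm

# Sketch: with drop_params enabled, unsupported kwargs such as `prediction`
# are dropped for providers that don't map them, instead of raising an error.
litellm.drop_params = True

response = litellm.completion(
    model="claude-3-haiku-20240307",  # illustrative non-OpenAI model
    messages=[{"role": "user", "content": "Say hello."}],
    prediction={"type": "content", "content": "Hello!"},  # ignored for this provider
)
print(response.choices[0].message.content)
```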
10 changes: 5 additions & 5 deletions poetry.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -17,7 +17,7 @@ documentation = "https://docs.litellm.ai"

[tool.poetry.dependencies]
python = ">=3.8.1,<4.0, !=3.9.7"
openai = ">=1.52.0"
openai = ">=1.54.0"
python-dotenv = ">=0.2.0"
tiktoken = ">=0.7.0"
importlib-metadata = ">=6.8.0"
2 changes: 1 addition & 1 deletion requirements.txt
@@ -1,6 +1,6 @@
# LITELLM PROXY DEPENDENCIES #
anyio==4.4.0 # openai + http req.
openai==1.52.0 # openai req.
openai==1.54.0 # openai req.
fastapi==0.111.0 # server dep
backoff==2.2.1 # server dep
pyyaml==6.0.0 # server dep
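Since the Predicted Outputs request type only ships with openai>=1.54.0, a quick runtime check can surface a stale install earlier than an import error would. A minimal sketch, assuming the `packaging` helper is available in the environment:

```python
import openai
from packaging.version import Version

# Sketch: Predicted Outputs types were added to the openai SDK in 1.54.0.
if Version(openai.__version__) < Version("1.54.0"):
    raise RuntimeError(f"openai {openai.__version__} is too old for Predicted Outputs")
```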