feat: add support for /responses background parameter #4824
cdoern wants to merge 3 commits into llamastack:main
Conversation
✱ Stainless preview builds
✅ llama-stack-client-python studio · code · diff
✅ llama-stack-client-kotlin studio · code · diff
✅ llama-stack-client-node studio · code · diff
⏳ These are partial results; builds are still running. This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
Add OpenAI-compatible background mode for the Responses API, allowing responses to be queued for asynchronous processing.

- Added `background` parameter (bool, default: false)
- When true, returns immediately with status "queued"
- Added `background` field to `OpenAIResponseObject`
- New status values: "queued", "in_progress"
- `agents/models.py`: Added `background` to `CreateResponseRequest`
- `openai_responses.py`: Added `background` field to response object
- `openai_responses.py`: Background processing with `_create_background_response` and `_process_background_response`
- `responses_store.py`: Added `update_response_object` for status updates
- Issue: llamastack#4701
- OpenAI docs: https://platform.openai.com/docs/guides/background

Signed-off-by: Charlie Doern <cdoern@redhat.com>

force-pushed from b03ebdf to a3b69f6
```python
# the root directory of this source tree.


def remove_null_from_anyof(schema: dict) -> None:
```
FYI, moved this to a helper so it can be used by multiple APIs.
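For readers following along, here is one plausible body for that helper; only the signature appears in the diff, so treat this as a guess at the intent:

```python
def remove_null_from_anyof(schema: dict) -> None:
    """Drop `{"type": "null"}` variants from `anyOf` lists, in place.

    If exactly one variant remains, collapse `anyOf` into the parent schema.
    """
    if "anyOf" in schema:
        variants = [v for v in schema["anyOf"] if v.get("type") != "null"]
        if len(variants) == 1:
            schema.pop("anyOf")
            schema.update(variants[0])
        else:
            schema["anyOf"] = variants
    # Recurse into nested schemas (properties, items, etc.).
    for value in list(schema.values()):
        if isinstance(value, dict):
            remove_null_from_anyof(value)
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    remove_null_from_anyof(item)
```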
```python
final_response = None
failed_response = None
```
```python
except Exception as update_error:
    logger.exception(f"Failed to update response {response_id} with error status: {update_error}")
```
what will a user see / not see if this happens?
hmm good point. the server will log an error, but the client might not get a useful one. let me see if I can propagate this in a better way.
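One possible shape for that propagation (a hedged sketch; the `error` field shape and the helper name are assumptions, not this PR's code):

```python
import logging

logger = logging.getLogger(__name__)


async def _mark_failed(store, response_object, response_id: str, original_error: Exception) -> None:
    """Write a terminal 'failed' status back so polling clients see the failure."""
    try:
        response_object.status = "failed"
        response_object.error = {"message": str(original_error)}  # assumed field shape
        await store.update_response_object(response_object)
    except Exception as update_error:
        # Last resort: the client would only see a response stuck in "in_progress".
        logger.exception(f"Failed to update response {response_id} with error status: {update_error}")
```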
```python
if not self.sql_store:
    raise ValueError("Responses store is not initialized")
```
this looks like a fail-on-startup kind of situation
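In other words, something like this hypothetical startup check, so the server refuses to come up misconfigured instead of failing per request:

```python
class ResponsesStore:
    def __init__(self, sql_store) -> None:
        # Fail fast at construction time; a missing store is a deployment
        # error, not a per-request condition.
        if sql_store is None:
            raise RuntimeError("ResponsesStore requires a configured sql_store")
        self.sql_store = sql_store
```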
```python
# Preserve existing messages if not provided
if messages is not None:
    data["messages"] = [msg.model_dump() for msg in messages]
else:
    data["messages"] = existing_data.get("messages", [])
```
when would the response have a messages field?
```python
)

# Schedule background processing task
asyncio.create_task(
```
how does this behave with concurrent users requesting background processing?
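For context, two guards commonly paired with fire-and-forget `asyncio.create_task` (a sketch, not this PR's code): hold strong references so pending tasks aren't garbage-collected mid-flight, and cap concurrency so many simultaneous background requests can't exhaust the worker.

```python
import asyncio

_background_tasks: set[asyncio.Task] = set()
_limit = asyncio.Semaphore(32)  # assumed cap


async def _bounded(coro) -> None:
    # Run the coroutine only once a slot is free.
    async with _limit:
        await coro


def schedule_background(coro) -> None:
    task = asyncio.create_task(_bounded(coro))
    # Keep a strong reference until the task finishes; create_task alone
    # does not guarantee the task survives garbage collection.
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
```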
```python
# Schedule background processing task
asyncio.create_task(
    self._process_background_response(
        response_id=response_id,
```
what happens when a user gets the response_id and uses it as previous_response_id before this original request has terminated?
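One hypothetical guard (not part of this PR) would be to reject chaining until the referenced response reaches a terminal status; the store method name here is assumed:

```python
async def _resolve_previous_response(self, previous_response_id: str):
    previous = await self.responses_store.get_response_object(previous_response_id)
    if previous.status in ("queued", "in_progress"):
        # Chaining onto an unfinished response would race with the
        # background task that is still writing its output.
        raise ValueError(
            f"Response {previous_response_id} is still {previous.status} "
            "and cannot be used as previous_response_id yet"
        )
    return previous
```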
```python
if not existing_row:
    raise ValueError(f"Response with id {response_object.id} not found")
```
if there's no row then there's some serious internal logic error. lots of logging here, maybe even crash the server.
```python
# Preserve existing input if not provided
if input is not None:
    data["input"] = [input_item.model_dump() for input_item in input]
else:
    data["input"] = existing_data.get("input", [])
```
this will be another place where previous_response_id chains will have to be followed for #3646
```python
existing_row = await self.sql_store.fetch_one(
    self.reference.table_name,
    where={"id": response_object.id},
)
```
i expect this will be heavy - every new event will require a query for the old event, a few ser/des rounds and an update. until we can optimize the storage schema, only do this dance when necessary.
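A sketch of that optimization, with the surrounding names and the `sql_store.update` signature assumed: intermediate events stay in memory, and only terminal transitions pay for the fetch/serialize/update round trip.

```python
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


async def update_response_object(self, response_object) -> None:
    # Skip the query/ser-des/update dance for intermediate events;
    # only terminal status transitions need to hit the store.
    if response_object.status not in TERMINAL_STATUSES:
        return
    await self.sql_store.update(
        self.reference.table_name,
        data={"status": response_object.status, "response_object": response_object.model_dump()},
        where={"id": response_object.id},
    )
```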
What does this PR do?
Add OpenAI-compatible background mode for the Responses API, allowing responses to be queued for asynchronous processing. Adds a `background` parameter (bool, default: false); when true, the request returns immediately with status "queued". `openai_responses.py` now implements background processing via `_create_background_response` and `_process_background_response`. Also adds new integration tests using the field, plus the associated recordings.
closes: #4701
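A minimal sketch of that flow; only the two method names come from this PR, while `OpenAIResponseObject`, the store calls, and the inner inference call are assumptions:

```python
import asyncio


async def _create_background_response(self, response_id: str, **params):
    # Persist a placeholder immediately so the client can poll it.
    response = OpenAIResponseObject(  # other required fields omitted for brevity
        id=response_id, status="queued", background=True
    )
    await self.responses_store.store_response_object(response)  # assumed store call

    # Kick off async processing and return without awaiting it.
    asyncio.create_task(self._process_background_response(response_id=response_id, **params))
    return response  # client sees status "queued" right away


async def _process_background_response(self, response_id: str, **params):
    response = await self._create_response_impl(**params)  # assumed inner call
    await self.responses_store.update_response_object(response)  # terminal status
```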
Test Plan
New integration tests + recordings using the OpenAI client should pass; the usage they exercise looks roughly like the sketch below.
Saving the /cancel route for a separate PR.
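A hedged sketch of that client usage; the base URL, API key, model name, and polling cadence are placeholders:

```python
import time

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # placeholder endpoint

resp = client.responses.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    input="Write a haiku about queues.",
    background=True,
)
assert resp.status == "queued"  # returns immediately instead of blocking

# Poll until the background task reaches a terminal status.
while resp.status in ("queued", "in_progress"):
    time.sleep(1)
    resp = client.responses.retrieve(resp.id)

print(resp.status)
```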