
feat: add support for /responses background parameter #4824

Draft

cdoern wants to merge 3 commits into llamastack:main from cdoern:responses-background-no-cancel

Conversation

@cdoern cdoern (Collaborator) commented Feb 3, 2026

What does this PR do?

Add OpenAI-compatible background mode for the Responses API, allowing responses to be queued for asynchronous processing. This adds a `background` parameter (bool, default: false); when true, the call returns immediately with status "queued". `openai_responses.py` now performs the background work via `_create_background_response` and `_process_background_response`.
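For illustration, a minimal client-side sketch (the base URL, API key, and model id below are assumptions, not from this PR):

```python
from openai import OpenAI

# Assumed local Llama Stack endpoint; adjust for your deployment.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")

# With background=True the call returns immediately with status "queued".
queued = client.responses.create(
    model="llama3.2-3b",  # illustrative model id
    input="Write a haiku about background jobs.",
    background=True,
)
print(queued.status)  # "queued"

# Poll by id until the response reaches a terminal status.
result = client.responses.retrieve(queued.id)
print(result.status)  # e.g. "in_progress" or "completed"
```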

Also adds new integration tests that exercise the field, plus the associated recordings.

closes: #4701

Test Plan

The new integration tests and recordings, which use the OpenAI client, should pass.

The /cancel route is being saved for a separate PR.

@cdoern cdoern marked this pull request as draft February 3, 2026 21:26
@meta-cla meta-cla bot added the CLA Signed label Feb 3, 2026
@github-actions github-actions bot (Contributor) commented Feb 3, 2026

✱ Stainless preview builds

This PR will update the llama-stack-client SDKs with the following commit message.

feat: add support for /responses background parameter

Edit this comment to update it. It will appear in the SDK's changelogs.

llama-stack-client-python studio · code · diff

Your SDK built successfully.
generate ⚠️ · build ✅ · lint ✅ · test ✅

pip install https://pkg.stainless.com/s/llama-stack-client-python/0477a25c223eeba36da2d560ea04e4b39610bfa8/llama_stack_client-0.4.0a15-py3-none-any.whl
llama-stack-client-kotlin studio · code · diff

Your SDK built successfully.
generate ⚠️ · build ❗ · lint ✅ · test ❗

llama-stack-client-openapi studio · code · diff

Your SDK built successfully.
generate ⚠️

llama-stack-client-node studio · code · diff

Your SDK built successfully.
generate ⚠️ · build ✅ · lint ✅ · test ❗

npm install https://pkg.stainless.com/s/llama-stack-client-node/47bda28552f1624fcaac24cca35c3ecc52361f76/dist.tar.gz
llama-stack-client-go studio · conflict

⏳ These are partial results; builds are still running.


This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
Last updated: 2026-02-04 20:48:48 UTC

Add OpenAI-compatible background mode for the Responses API, allowing
responses to be queued for asynchronous processing.

- Added `background` parameter (bool, default: false)
- When true, returns immediately with status "queued"

- Added `background` field to `OpenAIResponseObject`
- New status values: "queued", "in_progress"

- `agents/models.py`: Added `background` to `CreateResponseRequest`
- `openai_responses.py`: Added `background` field to response object

- `openai_responses.py`: Background processing with `_create_background_response` and `_process_background_response`
- `responses_store.py`: Added `update_response_object` for status updates

- Issue: llamastack#4701
- OpenAI docs: https://platform.openai.com/docs/guides/background

Signed-off-by: Charlie Doern <cdoern@redhat.com>
@cdoern cdoern force-pushed the responses-background-no-cancel branch from b03ebdf to a3b69f6 Compare February 3, 2026 21:28
# the root directory of this source tree.


def remove_null_from_anyof(schema: dict) -> None:
@cdoern (Collaborator, Author):

FYI, moved this to a helper so it can be used by multiple APIs.
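For context, a plausible sketch of what such a helper does (the PR's actual implementation may differ): it strips `{"type": "null"}` variants out of `anyOf` unions so optional fields render as plain types in the generated spec.

```python
def remove_null_from_anyof(schema: dict) -> None:
    """Recursively drop {"type": "null"} entries from anyOf unions, in place."""
    for key, value in list(schema.items()):
        if key == "anyOf" and isinstance(value, list):
            non_null = [s for s in value if s.get("type") != "null"]
            if len(non_null) == 1:
                # A single surviving variant can be inlined into the parent.
                del schema["anyOf"]
                schema.update(non_null[0])
            else:
                schema["anyOf"] = non_null
        if isinstance(value, dict):
            remove_null_from_anyof(value)
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    remove_null_from_anyof(item)
```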

@mattf mattf (Collaborator) left a comment

some quick comments

Comment on lines +749 to +750
final_response = None
failed_response = None

you only need one of these
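One way to act on this (a sketch; the event names follow OpenAI streaming conventions and are assumptions about this code): since the response object carries its own terminal status, a single variable can hold either outcome.

```python
result = None  # holds the terminal response, whatever its status

async for chunk in stream:
    if chunk.type in ("response.completed", "response.failed"):
        result = chunk.response

if result is not None and result.status == "failed":
    ...  # failure handling branches on status, not on which variable was set
```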

Comment on lines +797 to +798
except Exception as update_error:
logger.exception(f"Failed to update response {response_id} with error status: {update_error}")

what will a user see / not see if this happens?

@cdoern (Collaborator, Author) replied:

hmm good point. the server will log an err, but the client might not get a useful one. let me see if I can propagate this in a better way.
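One illustrative direction (the store and method names mirror the PR; the retry itself is an assumption, not the final approach): retry the status write so a transient store hiccup doesn't leave the client polling a response stuck in "in_progress".

```python
for attempt in range(2):
    try:
        await self.responses_store.update_response_object(response)
        break  # status persisted; the client sees "failed" on its next poll
    except Exception as update_error:
        logger.exception(
            f"Failed to update response {response_id} with error status "
            f"(attempt {attempt + 1}): {update_error}"
        )
        if attempt == 0:
            await asyncio.sleep(0.5)  # brief pause before the single retry
```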

Comment on lines +231 to +232
if not self.sql_store:
raise ValueError("Responses store is not initialized")

this looks like a fail on startup kind of situation
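A sketch of that refactor (the `initialize` hook name is an assumption): validate the dependency once at startup so request paths can assume it exists.

```python
class ResponsesStore:
    async def initialize(self) -> None:
        # Fail fast when the server boots instead of on the first request.
        if not self.sql_store:
            raise RuntimeError("Responses store requires a configured sql_store")
```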

Comment on lines +251 to +255
# Preserve existing messages if not provided
if messages is not None:
data["messages"] = [msg.model_dump() for msg in messages]
else:
data["messages"] = existing_data.get("messages", [])

when would the response have a messages field?

)

# Schedule background processing task
asyncio.create_task(

how does this behave with concurrent users requesting background processing?
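For reference, a common pattern for taming unbounded `asyncio.create_task` fan-out (a sketch; the cap and helper names are assumptions): bound concurrency with a semaphore, and keep strong references so pending tasks aren't garbage-collected mid-flight.

```python
import asyncio

_background_tasks: set[asyncio.Task] = set()
_background_limit = asyncio.Semaphore(8)  # illustrative cap

async def _bounded(coro) -> None:
    async with _background_limit:
        await coro

def schedule_background(coro) -> None:
    task = asyncio.create_task(_bounded(coro))
    # asyncio holds tasks only weakly; keep a reference until completion.
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
```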

# Schedule background processing task
asyncio.create_task(
self._process_background_response(
response_id=response_id,

what happens when a user gets the response_id and uses it as previous_response_id before this original request has terminated?
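One defensive option (illustrative; the store accessor name is an assumption): reject chaining off a response that hasn't reached a terminal status yet.

```python
prev = await self.responses_store.get_response_object(previous_response_id)
if prev.status in ("queued", "in_progress"):
    raise ValueError(
        f"Response {previous_response_id} is still {prev.status}; "
        "wait for it to complete before using it as previous_response_id."
    )
```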

Comment on lines +240 to +241
if not existing_row:
raise ValueError(f"Response with id {response_object.id} not found")

if there's no row then there's some serious internal logic error. lots of logging here, maybe even crash the server.
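A sketch of that stricter handling (an assumption, not the PR's code): treat a missing row as an invariant violation, not a user error.

```python
if not existing_row:
    # An update for a row that was never inserted means internal state is
    # corrupt; log loudly and fail hard rather than masking it.
    logger.critical(f"Invariant violated: response {response_object.id} missing from store")
    raise RuntimeError(f"Response {response_object.id} not found during update")
```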

Comment on lines +246 to +250
# Preserve existing input if not provided
if input is not None:
data["input"] = [input_item.model_dump() for input_item in input]
else:
data["input"] = existing_data.get("input", [])

this will be another place where previous_response_id chains will have to be followed for #3646

Comment on lines +235 to +238
existing_row = await self.sql_store.fetch_one(
self.reference.table_name,
where={"id": response_object.id},
)

i expect this will be heavy - every new event will require a query for the old event, a few ser/des rounds and an update. until we can optimize the storage schema, only do this dance when necessary.
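A sketch of the cheap path (the update method's name and signature are assumptions): when only the status changes, issue one targeted UPDATE and skip the fetch/deserialize/merge cycle entirely.

```python
# Fast path for status-only changes: no read-modify-write round trip.
await self.sql_store.update(
    self.reference.table_name,
    data={"status": response_object.status},
    where={"id": response_object.id},
)
```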


Labels

CLA Signed (managed by the Meta Open Source bot)


Development

Successfully merging this pull request may close these issues.

Responses API: Add background parameter support

2 participants