Python: Add container_id and filename fields to AnnotationContent class by ymuichiro · Pull Request #12985 · microsoft/semantic-kernel

ymuichiro · 2025-08-21T08:51:41Z

Motivation and Context

I confirmed that Code Interpreter can run via the Azure Responses API in Semantic Kernel, but I found two issues: a ValidationError occurs when a file is created inside the Code Interpreter, and the response lacks the parameters required to access the created file.

Description

Bug 1: ValidationError for AnnotationContent

Semantic Kernel uses the following enum for AnnotationContent.citation_type:

@experimental
class CitationType(str, Enum):
    """Citation type."""

    URL_CITATION = "url_citation"
    FILE_PATH = "file_path"
    FILE_CITATION = "file_citation"

However, the Responses API returns file citations with a type like this:

{"container_id": "cntr_68a6c50135d88190b6fdc062051155b50847ddec73d3be1b", "end_index": 94, "file_id": "cfile_68a6c67a7fcc8190889d7e1789d485e2", "filename": "sample.txt", "start_index": 66, "type": "container_file_citation"}

There is no matching enum member on the Semantic Kernel side. The CitationType should include CONTAINER_FILE_CITATION, for example:

@experimental
class CitationType(str, Enum):
    """Citation type."""

    URL_CITATION = "url_citation"
    FILE_PATH = "file_path"
    FILE_CITATION = "file_citation"
    CONTAINER_FILE_CITATION = "container_file_citation"
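The lookup failure behind Bug 1 can be reproduced with the standard library alone. The sketch below is illustrative only: `OldCitationType`, `NewCitationType`, and `parse` are hypothetical names, not Semantic Kernel code. It shows that looking up an unknown value in a `str`-based `Enum` raises `ValueError`, the same kind of mismatch that a validating model would be expected to report as a validation error.

```python
from enum import Enum


class OldCitationType(str, Enum):
    """Enum as it was before the change (hypothetical copy for illustration)."""

    URL_CITATION = "url_citation"
    FILE_PATH = "file_path"
    FILE_CITATION = "file_citation"


class NewCitationType(str, Enum):
    """Enum with the member proposed in this PR (hypothetical copy)."""

    URL_CITATION = "url_citation"
    FILE_PATH = "file_path"
    FILE_CITATION = "file_citation"
    CONTAINER_FILE_CITATION = "container_file_citation"


def parse(enum_cls, raw: str):
    """Return the matching member, or None when the value is unknown."""
    try:
        # Enum lookup by value raises ValueError if no member matches.
        return enum_cls(raw)
    except ValueError:
        return None


raw_type = "container_file_citation"  # value taken from the sample payload
print(parse(OldCitationType, raw_type))
print(parse(NewCitationType, raw_type))
```

With the old enum the lookup yields no member; with the added member the same value resolves to `CONTAINER_FILE_CITATION`.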

Bug 2: Missing parameters to access generated files

The response returned by the agent includes a file_id but lacks the container_id and filename required to download the file. The filename is also needed to replace the sandbox:/mnt/data/sample.txt link in the message text:

Example call:

r = await agent.get_response(cast(list, history.messages), thread=thread)
print(r.content.items)

Example output:

[
  TextContent(inner_content=None, ai_model_id=None, metadata={}, content_type='text', text='<user_prompt>', encoding=None),
  TextContent(inner_content=None, ai_model_id=None, metadata={}, content_type='text', text='<assistant_message>\n\n[sample.txt](sandbox:/mnt/data/sample.txt)', encoding=None), 
  AnnotationContent(inner_content=None, ai_model_id=None, metadata={}, content_type='annotation', file_id='cfile_68a6c67a7fcc8190889d7e1789d485e2', quote=None, start_index=88, end_index=116, url=None, title=None, citation_type=<CitationType.CONTAINER_FILE_CITATION: 'container_file_citation'>)
]

To include these values in the parsed result, AnnotationContent in semantic_kernel/contents/annotation_content.py should expose container_id and filename, and map the incoming type to citation_type. Example suggested structure:

@experimental
class AnnotationContent(KernelContent):
    """Annotation content."""

    content_type: Literal[ContentTypes.ANNOTATION_CONTENT] = Field(ANNOTATION_CONTENT_TAG, init=False)  # type: ignore
    tag: ClassVar[str] = ANNOTATION_CONTENT_TAG
    file_id: str | None = None
    quote: str | None = None
    start_index: int | None = None
    end_index: int | None = None
    url: str | None = None
    title: str | None = None
    # added
    container_id: str | None = None
    filename: str | None = None
    citation_type: CitationType | None = Field(None, alias="type")

    model_config = ConfigDict(
        extra="ignore",
        populate_by_name=True,
    )

    def __str__(self) -> str:
        ...

    def to_element(self) -> Element:
        ...

    @classmethod
    def from_element(cls: type[_T], element: Element) -> _T:
        ...

    def to_dict(self) -> dict[str, Any]:
        ...

Summary: add the missing enum value container_file_citation and extend AnnotationContent to include container_id and filename (and map type to citation_type) so file references returned by the Azure Responses API are fully usable.
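To make the intended mapping concrete, here is a minimal standard-library sketch of how the sample payload would land on the extended fields. It swaps in a plain dataclass for the Pydantic model, so `AnnotationSketch` and `from_payload` are hypothetical names. The two commented steps only imitate what `alias="type"` and `extra="ignore"` are meant to achieve; they are not how Semantic Kernel implements it.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Any


@dataclass
class AnnotationSketch:
    """Hypothetical stand-in for AnnotationContent (not the real class)."""

    file_id: str | None = None
    container_id: str | None = None
    filename: str | None = None
    start_index: int | None = None
    end_index: int | None = None
    citation_type: str | None = None

    @classmethod
    def from_payload(cls, payload: dict[str, Any]) -> AnnotationSketch:
        data = dict(payload)
        # Imitate alias="type": accept the wire name, store it as citation_type.
        if "type" in data:
            data["citation_type"] = data.pop("type")
        # Imitate extra="ignore": drop keys this class does not declare.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})


# Payload copied from the example in this PR, plus one made-up extra key.
payload = {
    "container_id": "cntr_68a6c50135d88190b6fdc062051155b50847ddec73d3be1b",
    "end_index": 94,
    "file_id": "cfile_68a6c67a7fcc8190889d7e1789d485e2",
    "filename": "sample.txt",
    "start_index": 66,
    "type": "container_file_citation",
    "unknown_key": "ignored",  # hypothetical, only to show unknown keys are dropped
}

annotation = AnnotationSketch.from_payload(payload)
print(annotation.file_id, annotation.filename, annotation.container_id)
```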

Code

client: AsyncAzureOpenAI = AsyncAzureOpenAI(
    base_url=urljoin(server_settings.AZURE_OPENAI_COMPLETION_ENDPOINT.rstrip("/") + "/", "openai/v1/"),
    api_key=server_settings.AZURE_OPENAI_COMPLETION_API_KEY,
    api_version="preview",
)

async def run(history: ChatHistory) -> None:
    with open("sample.txt", "rb") as fp:
        f = await client.files.create(file=fp, purpose="assistants")

    agent = AzureResponsesAgent(
        ai_model_id=server_settings.AZURE_OPENAI_COMPLETION_DEPLOYMENT_NAME,
        client=client,
        name="name",
        function_choice_behavior=FunctionChoiceBehavior.Required(),
        tools=[
            CodeInterpreter(
                type="code_interpreter",
                container=CodeInterpreterContainerCodeInterpreterToolAuto(type="auto", file_ids=[f.id]),
            )
        ],
    )

    thread = ResponsesAgentThread(client, history, previous_response_id=None, enable_store=True)
    r = await agent.get_response(cast(list, history.messages), thread=thread)
    print(thread.id, r.content.items)

    annotation_content = [item for item in r.content.items if isinstance(item, AnnotationContent)]

    for ac in annotation_content:
        print(ac.file_id, ac.filename, ac.container_id)
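Once `container_id` and `filename` are available, the sandbox link in the assistant text can be rewritten to something a client can follow. The sketch below is one possible approach, not part of this PR. `Citation` is a hypothetical stand-in for the fields read from `AnnotationContent`, and the `/files/...` route is a made-up application URL. It assumes that `start_index` and `end_index` delimit the sandbox link, which matches the sample payload, where the span is 28 characters long, the length of `sandbox:/mnt/data/sample.txt`.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    """Hypothetical stand-in for the fields read from AnnotationContent."""

    file_id: str
    container_id: str
    filename: str
    start_index: int
    end_index: int


def rewrite_sandbox_links(text: str, citations: list[Citation]) -> str:
    """Replace each cited span with an application download URL."""
    # Work right-to-left so earlier indices stay valid after each substitution.
    for c in sorted(citations, key=lambda c: c.start_index, reverse=True):
        # Hypothetical application route; not an Azure or Semantic Kernel URL.
        url = f"/files/{c.container_id}/{c.file_id}/{c.filename}"
        text = text[: c.start_index] + url + text[c.end_index :]
    return text


sandbox_path = "sandbox:/mnt/data/sample.txt"
text = f"Done.\n\n[sample.txt]({sandbox_path})"
start = text.index(sandbox_path)
citation = Citation(
    file_id="cfile_123",      # placeholder IDs, not real resources
    container_id="cntr_456",
    filename="sample.txt",
    start_index=start,
    end_index=start + len(sandbox_path),
)
print(rewrite_sandbox_links(text, [citation]))
```

The route behind such a URL would still need to fetch the file from the container using both IDs; that part depends on the client library and is left out here.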

Contribution Checklist

- [x] The code builds clean without any errors or warnings
- [x] The PR follows the SK Contribution Guidelines and the pre-submission formatting script raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [x] I didn't break anyone 😄

Copilot

Pull Request Overview

This PR adds support for container file citations in the Azure Responses API by extending the AnnotationContent class to handle additional fields required for file access. The changes address validation errors and missing parameters when Code Interpreter creates files.

Key changes:

Add CONTAINER_FILE_CITATION enum value to CitationType to support new citation type from Azure API
Add container_id and filename fields to AnnotationContent class for accessing generated files

python/semantic_kernel/contents/annotation_content.py

moonbox3

Thanks for the contribution, @ymuichiro.

moonbox3 · 2025-08-27T23:34:26Z

Python Test Coverage Report

File	Stmts	Miss	Cover	Missing
contents/annotation_content.py	56	1	98%	71
TOTAL	26933	4651	82%

Python Unit Test Overview

Tests	Skipped	Failures	Errors	Time
3684	22 💤	0 ❌	0 🔥	1m 38s ⏱️

Co-authored-by: Evan Mattson <35585003+moonbox3@users.noreply.github.com>

Python: Add container_id and filename fields to AnnotationContent class

e15c43c

Copilot AI review requested due to automatic review settings August 21, 2025 08:51

ymuichiro requested a review from a team as a code owner August 21, 2025 08:51

moonbox3 added the python Pull requests for the Python Semantic Kernel label Aug 21, 2025

Copilot AI reviewed Aug 21, 2025

python/semantic_kernel/contents/annotation_content.py (two resolved review threads)

ymuichiro mentioned this pull request Aug 21, 2025

Python: Bug: Missing parameters when files are created inside Code Interpreter using Azure Responses API #12984

Closed

moonbox3 approved these changes Aug 22, 2025

moonbox3 requested a review from dmytrostruk August 22, 2025 02:06

moonbox3 added the agents label Aug 22, 2025

moonbox3 enabled auto-merge August 27, 2025 23:22

Merge branch 'main' into fix/azure-responses-api-missing-params-code-interpreter

f474556

TaoChenOSU approved these changes Aug 27, 2025

moonbox3 added this pull request to the merge queue Aug 28, 2025

Merged via the queue into microsoft:main with commit d3318ad Aug 28, 2025
28 checks passed

moonbox3 merged 2 commits into microsoft:main from ymuichiro:fix/azure-responses-api-missing-params-code-interpreter

4 participants