Python: Add container_id and filename fields to AnnotationContent class #12985

Merged
moonbox3 merged 2 commits into microsoft:main from ymuichiro:fix/azure-responses-api-missing-params-code-interpreter on Aug 28, 2025

@ymuichiro
Contributor

Motivation and Context

I confirmed that Code Interpreter can run via the Azure Responses API in Semantic Kernel, but I found two issues: a ValidationError is raised when a file is created inside the Code Interpreter, and the parsed response lacks the parameters required to access the created file.

Description

Bug 1: ValidationError for AnnotationContent

Semantic Kernel uses the following enum for AnnotationContent.citation_type:

@experimental
class CitationType(str, Enum):
    """Citation type."""

    URL_CITATION = "url_citation"
    FILE_PATH = "file_path"
    FILE_CITATION = "file_citation"

However, the Responses API returns file citations with a type like this:

{"container_id": "cntr_68a6c50135d88190b6fdc062051155b50847ddec73d3be1b", "end_index": 94, "file_id": "cfile_68a6c67a7fcc8190889d7e1789d485e2", "filename": "sample.txt", "start_index": 66, "type": "container_file_citation"}

There is no matching enum member on the Semantic Kernel side. The CitationType should include CONTAINER_FILE_CITATION, for example:

@experimental
class CitationType(str, Enum):
    """Citation type."""

    URL_CITATION = "url_citation"
    FILE_PATH = "file_path"
    FILE_CITATION = "file_citation"
    CONTAINER_FILE_CITATION = "container_file_citation"
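
The effect of the missing member can be reproduced with the standard library alone. This is a minimal sketch, not Semantic Kernel's code: `CitationTypeBefore`, `CitationTypeAfter`, and `lookup` are hypothetical names, and the plain `ValueError` here is only analogous to the validation failure raised when the real model parses the annotation.

```python
from enum import Enum
from typing import Optional, Type


class CitationTypeBefore(str, Enum):
    """Hypothetical copy of the enum before the fix."""

    URL_CITATION = "url_citation"
    FILE_PATH = "file_path"
    FILE_CITATION = "file_citation"


class CitationTypeAfter(str, Enum):
    """Hypothetical copy of the enum after the fix."""

    URL_CITATION = "url_citation"
    FILE_PATH = "file_path"
    FILE_CITATION = "file_citation"
    CONTAINER_FILE_CITATION = "container_file_citation"


def lookup(enum_cls: Type[Enum], raw: str) -> Optional[Enum]:
    """Return the enum member for a raw API value, or None if it is unknown."""
    try:
        return enum_cls(raw)
    except ValueError:
        # Value lookup on an Enum raises ValueError for unknown values.
        return None


before = lookup(CitationTypeBefore, "container_file_citation")
after = lookup(CitationTypeAfter, "container_file_citation")
print(before, after)
```

With the extra member in place, the raw `type` value from the Responses API resolves to an enum member instead of failing.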

Bug 2: Missing parameters to access generated files

The response returned by the agent includes a file_id, but lacks the container_id and filename required to download the file (the filename is needed to find and replace the sandbox:/mnt/data/sample.txt link in the message text):

Example call:

r = await agent.get_response(cast(list, history.messages), thread=thread)
print(r.content.items)

Example output:

[
  TextContent(inner_content=None, ai_model_id=None, metadata={}, content_type='text', text='<user_prompt>', encoding=None),
  TextContent(inner_content=None, ai_model_id=None, metadata={}, content_type='text', text='<assistant_message>\n\n[sample.txt](sandbox:/mnt/data/sample.txt)', encoding=None), 
  AnnotationContent(inner_content=None, ai_model_id=None, metadata={}, content_type='annotation', file_id='cfile_68a6c67a7fcc8190889d7e1789d485e2', quote=None, start_index=88, end_index=116, url=None, title=None, citation_type=<CitationType.CONTAINER_FILE_CITATION: 'container_file_citation'>)
]

To include these values in the parsed result, AnnotationContent in semantic_kernel/contents/annotation_content.py should expose container_id and filename, and map the incoming type to citation_type. Example suggested structure:

@experimental
class AnnotationContent(KernelContent):
    """Annotation content."""

    content_type: Literal[ContentTypes.ANNOTATION_CONTENT] = Field(ANNOTATION_CONTENT_TAG, init=False)  # type: ignore
    tag: ClassVar[str] = ANNOTATION_CONTENT_TAG
    file_id: str | None = None
    quote: str | None = None
    start_index: int | None = None
    end_index: int | None = None
    url: str | None = None
    title: str | None = None
    # added
    container_id: str | None = None
    filename: str | None = None
    citation_type: CitationType | None = Field(None, alias="type")

    model_config = ConfigDict(
        extra="ignore",
        populate_by_name=True,
    )

    def __str__(self) -> str:
        ...

    def to_element(self) -> Element:
        ...

    @classmethod
    def from_element(cls: type[_T], element: Element) -> _T:
        ...

    def to_dict(self) -> dict[str, Any]:
        ...

Summary: add the missing enum value container_file_citation and extend AnnotationContent to include container_id and filename (and map type to citation_type) so file references returned by the Azure Responses API are fully usable.
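
The intended mapping can be illustrated without pydantic. The sketch below is a stand-in under stated assumptions: `AnnotationSketch` and `from_payload` are hypothetical names, a standard-library dataclass replaces the real pydantic model, and unknown keys are dropped to mimic `extra="ignore"`. It parses the annotation payload shown in Bug 1 and maps `type` to `citation_type`.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import Any, Dict, Optional


class CitationType(str, Enum):
    URL_CITATION = "url_citation"
    FILE_PATH = "file_path"
    FILE_CITATION = "file_citation"
    CONTAINER_FILE_CITATION = "container_file_citation"


@dataclass
class AnnotationSketch:
    """Hypothetical stand-in for AnnotationContent (not the real class)."""

    file_id: Optional[str] = None
    start_index: Optional[int] = None
    end_index: Optional[int] = None
    container_id: Optional[str] = None
    filename: Optional[str] = None
    citation_type: Optional[CitationType] = None

    @classmethod
    def from_payload(cls, payload: Dict[str, Any]) -> "AnnotationSketch":
        data = dict(payload)
        # Accept either the API key "type" or the field name "citation_type".
        raw_type = data.pop("type", data.pop("citation_type", None))
        # Drop keys the sketch does not model, similar to extra="ignore".
        known = {f.name for f in fields(cls)}
        kwargs = {k: v for k, v in data.items() if k in known}
        if raw_type is not None:
            kwargs["citation_type"] = CitationType(raw_type)
        return cls(**kwargs)


payload = {
    "container_id": "cntr_68a6c50135d88190b6fdc062051155b50847ddec73d3be1b",
    "end_index": 94,
    "file_id": "cfile_68a6c67a7fcc8190889d7e1789d485e2",
    "filename": "sample.txt",
    "start_index": 66,
    "type": "container_file_citation",
}
annotation = AnnotationSketch.from_payload(payload)
print(annotation.container_id, annotation.filename, annotation.citation_type)
```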

Code

client: AsyncAzureOpenAI = AsyncAzureOpenAI(
    base_url=urljoin(server_settings.AZURE_OPENAI_COMPLETION_ENDPOINT.rstrip("/") + "/", "openai/v1/"),
    api_key=server_settings.AZURE_OPENAI_COMPLETION_API_KEY,
    api_version="preview",
)

async def run(history: ChatHistory) -> None:
    with open("sample.txt", "rb") as fp:
        f = await client.files.create(file=fp, purpose="assistants")

    agent = AzureResponsesAgent(
        ai_model_id=server_settings.AZURE_OPENAI_COMPLETION_DEPLOYMENT_NAME,
        client=client,
        name="name",
        function_choice_behavior=FunctionChoiceBehavior.Required(),
        tools=[
            CodeInterpreter(
                type="code_interpreter",
                container=CodeInterpreterContainerCodeInterpreterToolAuto(type="auto", file_ids=[f.id]),
            )
        ],
    )

    thread = ResponsesAgentThread(client, history, previous_response_id=None, enable_store=True)
    r = await agent.get_response(cast(list, history.messages), thread=thread)
    print(thread.id, r.content.items)

    annotation_content = [item for item in r.content.items if isinstance(item, AnnotationContent)]

    for ac in annotation_content:
        print(ac.file_id, ac.filename, ac.container_id)
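
Once `container_id` and `filename` are available, the sandbox link in the message text can be rewritten to point at the file. The sketch below is illustrative only: both helper names are hypothetical, the IDs are placeholders, and the path shape `/containers/{container_id}/files/{file_id}/content` is assumed from the OpenAI container files endpoint; the exact base URL and route on Azure may differ.

```python
def container_file_content_path(container_id: str, file_id: str) -> str:
    """Build the relative path for a container file's content (assumed shape)."""
    return f"/containers/{container_id}/files/{file_id}/content"


def replace_sandbox_link(text: str, filename: str, target: str) -> str:
    """Replace the sandbox:/mnt/data/<filename> link target with `target`."""
    return text.replace(f"sandbox:/mnt/data/{filename}", target)


# Placeholder IDs; real values come from the AnnotationContent items.
text = "[sample.txt](sandbox:/mnt/data/sample.txt)"
path = container_file_content_path("cntr_123", "cfile_456")
rewritten = replace_sandbox_link(text, "sample.txt", path)
print(rewritten)
```

Without `filename`, the caller cannot reliably locate the sandbox link to replace, and without `container_id` the content path cannot be built, which is why both fields are needed alongside `file_id`.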

Contribution Checklist

Copilot AI review requested due to automatic review settings August 21, 2025 08:51
@ymuichiro ymuichiro requested a review from a team as a code owner August 21, 2025 08:51
@moonbox3 moonbox3 added the python label (Pull requests for the Python Semantic Kernel) Aug 21, 2025
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds support for container file citations in the Azure Responses API by extending the AnnotationContent class to handle additional fields required for file access. The changes address validation errors and missing parameters when Code Interpreter creates files.

Key changes:

  • Add CONTAINER_FILE_CITATION enum value to CitationType to support new citation type from Azure API
  • Add container_id and filename fields to AnnotationContent class for accessing generated files

Collaborator

@moonbox3 moonbox3 left a comment

Thanks for the contribution, @ymuichiro.

@moonbox3 moonbox3 requested a review from dmytrostruk August 22, 2025 02:06
@moonbox3 moonbox3 enabled auto-merge August 27, 2025 23:22
@moonbox3
Collaborator

Python Test Coverage

Python Test Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| contents/annotation_content.py | 56 | 1 | 98% | 71 |
| TOTAL | 26933 | 4651 | 82% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 3684 | 22 | 0 | 0 | 1m 38s |

@moonbox3 moonbox3 added this pull request to the merge queue Aug 28, 2025
Merged via the queue into microsoft:main with commit d3318ad Aug 28, 2025
28 checks passed
jcruzmot-te pushed a commit to thousandeyes/aia-semantic-kernel that referenced this pull request Sep 15, 2025
…ss (microsoft#12985)

### Contribution Checklist

<!-- Before submitting this PR, please make sure: -->

- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution
Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [x] I didn't break anyone 😄

Co-authored-by: Evan Mattson <35585003+moonbox3@users.noreply.github.com>
Labels

agents, python (Pull requests for the Python Semantic Kernel)
