Python: Add container_id and filename fields to AnnotationContent class #12985
Merged
moonbox3 merged 2 commits into microsoft:main on Aug 28, 2025
Pull Request Overview
This PR adds support for container file citations in the Azure Responses API by extending the AnnotationContent class to handle additional fields required for file access. The changes address validation errors and missing parameters when Code Interpreter creates files.
Key changes:
- Add `CONTAINER_FILE_CITATION` enum value to `CitationType` to support the new citation type from the Azure API
- Add `container_id` and `filename` fields to the `AnnotationContent` class for accessing generated files
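The first key change can be illustrated with a minimal, standard-library-only sketch. `CitationTypeBefore`, `CitationTypeAfter`, and `parse_citation_type` are hypothetical stand-ins written for this example, not Semantic Kernel code; in Semantic Kernel the failure reportedly surfaces as a pydantic `ValidationError`, whereas a plain `Enum` lookup raises `ValueError`.

```python
from enum import Enum


class CitationTypeBefore(str, Enum):
    """Hypothetical stand-in for CitationType before this PR."""

    URL_CITATION = "url_citation"
    FILE_PATH = "file_path"
    FILE_CITATION = "file_citation"


class CitationTypeAfter(str, Enum):
    """Hypothetical stand-in for CitationType after this PR."""

    URL_CITATION = "url_citation"
    FILE_PATH = "file_path"
    FILE_CITATION = "file_citation"
    CONTAINER_FILE_CITATION = "container_file_citation"


def parse_citation_type(enum_cls, raw):
    """Return the enum member matching raw, or None if the value is unknown."""
    try:
        return enum_cls(raw)
    except ValueError:
        # Unknown value: this is the situation that breaks validation upstream.
        return None


if __name__ == "__main__":
    raw = "container_file_citation"
    print(parse_citation_type(CitationTypeBefore, raw))
    print(parse_citation_type(CitationTypeAfter, raw))
```

With the old member list the lookup has nothing to match, so any strict validator built on the enum has to reject the annotation; adding the member is enough for the value to resolve.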
moonbox3 (Collaborator) approved these changes on Aug 22, 2025 and left a comment:
Thanks for the contribution, @ymuichiro.
TaoChenOSU approved these changes on Aug 27, 2025
jcruzmot-te pushed a commit to thousandeyes/aia-semantic-kernel that referenced this pull request on Sep 15, 2025:
…ss (microsoft#12985)

### Motivation and Context

I confirmed that Code Interpreter can run via the Azure Responses API in Semantic Kernel, but I found two issues: an error occurs when a file is created inside the Code Interpreter, and the response lacks the parameters required to access the created file.

### Description

#### Bug 1: ValidationError for AnnotationContent

Semantic Kernel uses the following enum for `AnnotationContent.citation_type`:

```python
@experimental
class CitationType(str, Enum):
    """Citation type."""

    URL_CITATION = "url_citation"
    FILE_PATH = "file_path"
    FILE_CITATION = "file_citation"
```

However, the Responses API returns file citations with a `type` like this:

```json
{
  "container_id": "cntr_68a6c50135d88190b6fdc062051155b50847ddec73d3be1b",
  "end_index": 94,
  "file_id": "cfile_68a6c67a7fcc8190889d7e1789d485e2",
  "filename": "sample.txt",
  "start_index": 66,
  "type": "container_file_citation"
}
```

There is no matching enum member on the Semantic Kernel side. `CitationType` should include `CONTAINER_FILE_CITATION`, for example:

```python
@experimental
class CitationType(str, Enum):
    """Citation type."""

    URL_CITATION = "url_citation"
    FILE_PATH = "file_path"
    FILE_CITATION = "file_citation"
    CONTAINER_FILE_CITATION = "container_file_citation"
```

#### Bug 2: Missing parameters to access generated files

The response returned by the agent shows a `file_id`, but lacks the `container_id` and `filename` required to download the file (the filename is needed to replace `sandbox:/mnt/data/sample.txt`).

Example call:

```python
r = await agent.get_response(cast(list, history.messages), thread=thread)
print(r.content.items)
```

Example output:

```python
[
    TextContent(inner_content=None, ai_model_id=None, metadata={}, content_type='text', text='<user_prompt>', encoding=None),
    TextContent(inner_content=None, ai_model_id=None, metadata={}, content_type='text', text='<assistant_message>\n\n[sample.txt](sandbox:/mnt/data/sample.txt)', encoding=None),
    AnnotationContent(inner_content=None, ai_model_id=None, metadata={}, content_type='annotation', file_id='cfile_68a6c67a7fcc8190889d7e1789d485e2', quote=None, start_index=88, end_index=116, url=None, title=None, citation_type=<CitationType.CONTAINER_FILE_CITATION: 'container_file_citation'>)
]
```

To include these values in the parsed result, `AnnotationContent` in `semantic_kernel/contents/annotation_content.py` should expose `container_id` and `filename`, and map the incoming `type` to `citation_type`. Example suggested structure:

```python
@experimental
class AnnotationContent(KernelContent):
    """Annotation content."""

    content_type: Literal[ContentTypes.ANNOTATION_CONTENT] = Field(ANNOTATION_CONTENT_TAG, init=False)  # type: ignore
    tag: ClassVar[str] = ANNOTATION_CONTENT_TAG
    file_id: str | None = None
    quote: str | None = None
    start_index: int | None = None
    end_index: int | None = None
    url: str | None = None
    title: str | None = None
    # added
    container_id: str | None = None
    filename: str | None = None

    citation_type: CitationType | None = Field(None, alias="type")

    model_config = ConfigDict(
        extra="ignore",
        populate_by_name=True,
    )

    def __str__(self) -> str: ...

    def to_element(self) -> Element: ...

    @classmethod
    def from_element(cls: type[_T], element: Element) -> _T: ...

    def to_dict(self) -> dict[str, Any]: ...
```

Summary: add the missing enum value `container_file_citation` and extend `AnnotationContent` to include `container_id` and `filename` (and map `type` to `citation_type`) so that file references returned by the Azure Responses API are fully usable.

#### Code

```python
client: AsyncAzureOpenAI = AsyncAzureOpenAI(
    base_url=urljoin(server_settings.AZURE_OPENAI_COMPLETION_ENDPOINT.rstrip("/") + "/", "openai/v1/"),
    api_key=server_settings.AZURE_OPENAI_COMPLETION_API_KEY,
    api_version="preview",
)


async def run(history: ChatHistory) -> dict[str, Any]:
    with open("sample.txt", "rb") as fp:
        f = await client.files.create(file=fp, purpose="assistants")

    agent = AzureResponsesAgent(
        ai_model_id=server_settings.AZURE_OPENAI_COMPLETION_DEPLOYMENT_NAME,
        client=client,
        name="name",
        function_choice_behavior=FunctionChoiceBehavior.Required(),
        tools=[
            CodeInterpreter(
                type="code_interpreter",
                container=CodeInterpreterContainerCodeInterpreterToolAuto(type="auto", file_ids=[f.id]),
            )
        ],
    )

    thread = ResponsesAgentThread(client, history, previous_response_id=None, enable_store=True)
    r = await agent.get_response(cast(list, history.messages), thread=thread)
    print(thread.id, r.content.items)

    annotation_content = [item for item in r.content.items if isinstance(item, AnnotationContent)]
    for ac in annotation_content:
        print(ac.file_id, ac.filename, ac.container_id)
```

### Contribution Checklist

<!-- Before submitting this PR, please make sure: -->

- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md) and the [pre-submission formatting script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts) raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [x] I didn't break anyone 😄

Co-authored-by: Evan Mattson <35585003+moonbox3@users.noreply.github.com>
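As a rough illustration of why `container_id` and `filename` matter to a caller, the sketch below takes the annotation payload quoted in the description and rewrites the `sandbox:/mnt/data/...` link in the assistant text. It uses only the standard library; `ContainerFileRef`, `ref_from_annotation`, `replace_sandbox_link`, and the download URL are hypothetical names invented for this example, and the `/mnt/data/` prefix is assumed from the single sample in the description rather than from any documented guarantee.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContainerFileRef:
    """Hypothetical holder for the fields needed to locate a generated file."""

    container_id: str
    file_id: str
    filename: str


def ref_from_annotation(payload: dict) -> Optional[ContainerFileRef]:
    """Build a reference from a raw annotation dict, or None for other types."""
    if payload.get("type") != "container_file_citation":
        return None
    return ContainerFileRef(
        container_id=payload["container_id"],
        file_id=payload["file_id"],
        filename=payload["filename"],
    )


def replace_sandbox_link(text: str, ref: ContainerFileRef, download_url: str) -> str:
    """Swap the sandbox path for a caller-supplied URL (path prefix is assumed)."""
    return text.replace(f"sandbox:/mnt/data/{ref.filename}", download_url)


if __name__ == "__main__":
    payload = {
        "container_id": "cntr_68a6c50135d88190b6fdc062051155b50847ddec73d3be1b",
        "end_index": 94,
        "file_id": "cfile_68a6c67a7fcc8190889d7e1789d485e2",
        "filename": "sample.txt",
        "start_index": 66,
        "type": "container_file_citation",
    }
    ref = ref_from_annotation(payload)
    message = "[sample.txt](sandbox:/mnt/data/sample.txt)"
    # The URL below is a placeholder; a real application would build its own.
    print(replace_sandbox_link(message, ref, "https://example.invalid/download"))
```

Without `filename` the sandbox path cannot be matched in the message text, and without `container_id` the file cannot be located for download, which is why both fields need to survive parsing.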