Description
What version of Codex is running?
codex-cli 0.93.0
What subscription do you have?
Pro
Which model were you using?
gpt-5.2
What platform is your computer?
Microsoft Windows NT 10.0.19045.0 x64
What terminal emulator and version are you using (if applicable)?
No response
What issue are you seeing?
Codex drops (or does not surface to the user/model) the CallToolResult.content[] blocks returned by an MCP tool when structuredContent is also present.
This is most visible with { type: "image" } content blocks: the tool succeeds and returns an image block, but Codex only shows structuredContent (or a text summary) and the image block is inaccessible. As a result, clients cannot render screenshots/images using the standard MCP mechanism.
What steps can reproduce the bug?
I cannot provide reliable steps using our proprietary in-house MCP server, but this reproduces with a minimal standalone MCP server that returns both content[] and structuredContent.
- Create a minimal MCP stdio server with a single tool, emit_image, that returns a 1x1 PNG as an MCP image content block and includes structuredContent:
// mcp-image-repro.mjs
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
} from "@modelcontextprotocol/sdk/types.js";

const server = new Server(
  { name: "mcp-image-repro", version: "0.0.1" },
  { capabilities: { tools: {} } },
);

// setRequestHandler expects the SDK's request schema objects rather than
// method-name strings such as "tools/list".
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: "emit_image",
      description: "Returns a 1x1 PNG as an MCP image content block.",
      inputSchema: { type: "object", properties: {}, additionalProperties: false },
    },
  ],
}));

server.setRequestHandler(CallToolRequestSchema, async (req) => {
  if (req.params.name !== "emit_image") throw new Error("Unknown tool");
  // 1x1 PNG
  const png1x1Base64 =
    "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mP8/x8AAwMCAO5GZJcAAAAASUVORK5CYII=";
  return {
    // MCP-standard rich content payload.
    content: [
      { type: "text", text: "Here is a 1x1 PNG." },
      { type: "image", mimeType: "image/png", data: png1x1Base64 },
    ],
    // Include structured output too (this triggers the bug).
    structuredContent: {
      result: { ok: true, mimeType: "image/png", byteLength: 68 },
      summary: "Returned image content block.",
    },
    isError: false,
  };
});
await server.connect(new StdioServerTransport());
- Configure Codex to connect to this server via stdio and call the emit_image tool.
- Observe that Codex drops or does not display the content[] blocks (especially the { type: "image" } block) when structuredContent is present.
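Before pointing Codex at the server, it may help to rule out the test data as the cause. The following is a minimal sketch using only Node built-ins; inspectPng is a hypothetical helper name (not part of the MCP SDK or Codex). It decodes the base64 payload from the repro and checks the PNG signature, size, and dimensions.

```javascript
// Hypothetical helper (not part of any SDK): decodes a base64 payload and
// checks it against the 8-byte PNG signature, then reads the IHDR dimensions.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

function inspectPng(base64) {
  const bytes = Buffer.from(base64, "base64");
  const isPng = PNG_SIGNATURE.every((b, i) => bytes[i] === b);
  return {
    isPng,
    byteLength: bytes.length,
    // In a PNG, IHDR width and height are big-endian uint32 at offsets 16 and 20.
    width: isPng ? bytes.readUInt32BE(16) : null,
    height: isPng ? bytes.readUInt32BE(20) : null,
  };
}

// Same payload as in the repro server above.
const reproPngBase64 =
  "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mP8/x8AAwMCAO5GZJcAAAAASUVORK5CYII=";

const pngInfo = inspectPng(reproPngBase64);
console.log(pngInfo); // { isPng: true, byteLength: 68, width: 1, height: 1 }
```

If this check passes, a missing image in Codex is unlikely to be caused by a malformed payload, and the byteLength reported in structuredContent matches the decoded size.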
Second reproduction/use case (proxy enrichment):
- An upstream MCP server tool returns a valid CallToolResult with content[] blocks (for example { type: "image" } for a screenshot).
- An in-house MCP proxy server forwards that tool response but also enriches it by adding structuredContent (for example metadata like byteLength, savedPath, durationMs, or other derived fields). The MCP spec does not disallow returning both content[] and structuredContent.
- In Codex, once structuredContent is present, the original content[] blocks from upstream are dropped/hidden, making the enriched proxy response less useful than the original upstream response. Today we have to duplicate parts of content[] into structuredContent as a workaround to make the content accessible in Codex.
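The enrichment step in this scenario can be sketched roughly as follows. This is an illustration only: enrichToolResult and the sample metadata values are hypothetical and do not come from our proxy's actual code. The point is that the proxy adds structuredContent without touching the upstream content[] blocks.

```javascript
// Hypothetical proxy helper: attaches derived metadata as structuredContent
// while passing the upstream content[] blocks through unchanged.
function enrichToolResult(upstreamResult, metadata) {
  return {
    ...upstreamResult,
    // Copy the array so the upstream result object is not mutated.
    content: [...(upstreamResult.content ?? [])],
    structuredContent: {
      ...(upstreamResult.structuredContent ?? {}),
      ...metadata,
    },
  };
}

// Illustrative upstream response; the data field is a placeholder, not real image bytes.
const upstreamResult = {
  content: [{ type: "image", mimeType: "image/png", data: "<base64 screenshot>" }],
  isError: false,
};

const enrichedResult = enrichToolResult(upstreamResult, { byteLength: 68, durationMs: 12 });
console.log(enrichedResult.content.map((block) => block.type));
console.log(Object.keys(enrichedResult.structuredContent));
```

The enriched result still carries the image block, so a client that honors content[] loses nothing when the proxy is in the path.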
What is the expected behavior?
- Codex should preserve and surface CallToolResult.content[] to both the user and the model, even when structuredContent is present.
- For { type: "image" } content blocks, Codex should provide a stable way to view/export the image bytes.
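One possible shape for this behavior is sketched below. It is an assumption about a reasonable client-side design, not a description of Codex internals; surfaceToolResult and its return shape are hypothetical names chosen for illustration. Every content[] block is kept, and structuredContent is carried alongside it rather than replacing it.

```javascript
// Hypothetical client-side handling: keep every content[] block and carry
// structuredContent next to it instead of letting one replace the other.
function surfaceToolResult(result) {
  const blocks = (result.content ?? []).map((block) => {
    if (block.type === "text") {
      return { kind: "text", text: block.text };
    }
    if (block.type === "image") {
      // Decode once so the bytes can be displayed or written to disk.
      return {
        kind: "image",
        mimeType: block.mimeType,
        bytes: Buffer.from(block.data, "base64"),
      };
    }
    // Unknown block types are passed through rather than discarded.
    return { kind: block.type, raw: block };
  });
  return {
    blocks,
    structured: result.structuredContent ?? null,
    isError: result.isError === true,
  };
}

// Same result shape as the emit_image repro tool returns.
const sampleResult = {
  content: [
    { type: "text", text: "Here is a 1x1 PNG." },
    {
      type: "image",
      mimeType: "image/png",
      data: "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mP8/x8AAwMCAO5GZJcAAAAASUVORK5CYII=",
    },
  ],
  structuredContent: { summary: "Returned image content block." },
  isError: false,
};

const surfaced = surfaceToolResult(sampleResult);
console.log(surfaced.blocks.map((b) => b.kind), surfaced.structured);
```

With this shape, the image bytes remain available for viewing or export regardless of whether the server also sent structuredContent.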
Additional information
- This blocks MCP servers that follow the protocol and use content[] for images/audio. Servers are forced into tool-specific workarounds like returning a file path or embedding base64 into structuredContent (which reduces interoperability and increases payload sizes/telemetry risk).
- Related upstream context: openai/codex#9815 (Codex improvements around images returned by MCP tools).
- Key detail: the issue is not "Codex can't render images"; it is that content[] appears to be dropped/hidden specifically when structuredContent is present in the tool response.