Replies: 2 comments
You might find it useful to clone the codex repo locally and then ask Codex questions about its own implementation. I realize that sounds a little recursive, but it's a pretty effective way to get detailed answers to questions about a code base.
Thanks Eric. After all the digging and chatting with ChatGPT, here is a summary. I'm pasting it here in case someone is looking for the same information later.

In interactive Codex sessions, conversation history is not stored permanently on OpenAI's servers the way ChatGPT retains chat history. The only persistent session state lives on the client side: Codex writes local rollout files (~/.codex/sessions/...) that contain the full transcript of the session. These files can be resumed, forked, and compacted by the local client to continue a conversation, but this persistence lives entirely on the user's machine, not on the server.

OpenAI does implement a server-side prompt caching mechanism that can reuse intermediate computation for identical prompt prefixes across requests. This cache can reduce cost and latency by avoiding recomputation of the same prefix tokens, but it is not a semantic or durable memory of the conversation. The cache is ephemeral, may be evicted under load or inactivity, and cannot reconstruct prior context if the client does not include it in the prompt. Even with optional extended retention (e.g., a 24-hour cache hint in the API), this caching is strictly a performance optimization, not a guarantee of session memory.

Every Codex request must still send the full prompt (including any prefix and relevant history) each turn. The server uses the prompt_cache_key and the prefix tokens to potentially reuse cached computation, but it does not maintain conversational state that can be resumed with an ID alone. Context is therefore always client-driven: the client reconstructs history and composes the prompt every time. Server caching only helps with computational efficiency when the same prefix is reused frequently and has not yet been evicted.
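To make the "client-driven context" point concrete, here is a minimal sketch of what one turn looks like. The `build_request` helper and the model name are illustrative, not part of any real SDK; the key idea is that the full transcript is resent every turn, and `prompt_cache_key` is only a hint for prefix-cache reuse:

```python
# Sketch: client-driven context with server-side prompt caching.
# The server never resumes a conversation from an ID alone, so the
# client resends the full history every turn; prompt_cache_key only
# lets the server reuse cached computation for an identical prefix.

def build_request(history, user_msg, cache_key):
    """Compose the full prompt payload for one turn (illustrative helper)."""
    messages = history + [{"role": "user", "content": user_msg}]
    return {
        "model": "gpt-5",               # assumed model name
        "input": messages,              # full transcript, every turn
        "prompt_cache_key": cache_key,  # hint for prefix-cache reuse
    }

history = [
    {"role": "system", "content": "You are a coding assistant."},
    {"role": "user", "content": "Explain rollout files."},
    {"role": "assistant", "content": "They live under ~/.codex/sessions/."},
]

req = build_request(history, "Where is the cache stored?", "session-abc123")
# Every request carries all prior turns; only the computation for the
# shared prefix may be served from the server's ephemeral cache.
```

If the cache has been evicted, the request still works; the server simply recomputes the prefix, which costs latency but never loses context, because the context was in the request all along.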
In summary: Codex's servers retain cached computation for a short period (minutes to at most hours, depending on configuration), but they do not retain conversation context in a resumable or durable way. True context persistence across sessions comes only from the local files managed by the client, and clients must always send context explicitly for the model to "remember" anything.
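Since persistence comes only from the local rollout files, a client that wants to continue a session rebuilds the history from them. Here is a rough sketch, assuming the rollout is a JSONL transcript with one event per line; the exact schema is an assumption for illustration, so check the files under ~/.codex/sessions/ on your machine:

```python
# Sketch: rebuilding conversation history from a local rollout file.
# ASSUMPTION: the rollout is JSONL with role/content message events;
# the real schema may differ and include other event types.
import json
import tempfile
from pathlib import Path

def load_history(rollout_path):
    """Read one JSON event per line, keeping only message entries."""
    history = []
    for line in Path(rollout_path).read_text().splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("role") in ("system", "user", "assistant"):
            history.append({"role": event["role"],
                            "content": event.get("content", "")})
    return history

# Demo with a synthetic rollout file (real ones live under ~/.codex/sessions/)
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
    f.write('{"role": "user", "content": "hello"}\n')
    f.write('{"role": "assistant", "content": "hi"}\n')
    f.write('{"type": "token_count", "total": 42}\n')  # non-message event, skipped
    path = f.name

history = load_history(path)
```

The rebuilt history is then prepended to the next request, which is exactly what "the client reconstructs history and composes prompts every time" means in practice.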
Hi,
I want to understand how server-side context retention works, as I am building an app that uses Codex as a worker in the backend. Since reusing conversation context is token-friendly, I am trying to understand the following:
1. Does the server retain any context between requests? If so, what are the retention rules or expiration, e.g. how long it remains available after the last interaction?
2. Is the "auto-compact" context behavior a client-only mechanism, or does the server also compact or retain context across requests?
3. When making new Codex requests, does the server ever use previous request context without it being sent by the client?
Thanks in advance for your help.