@zemaj zemaj commented Oct 30, 2025

Summary

  • add explicit phase context to initialize/list/call MCP failures so logs show where the request died
  • retry the Streamable HTTP initialize handshake once on obvious transient network errors before surfacing failure
  • cover the new helpers with unit tests for phase labeling and retry gating

Testing

  • cargo test -p code-rmcp-client
  • ./build-fast.sh

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Comment on lines 338 to 340
fn handshake_failed_error(err: impl Into<anyhow::Error>) -> anyhow::Error {
    let err = err.into();
    anyhow!(

P1: Propagate original handshake error so retries can detect transients

The new retry path in initialize relies on should_retry_initialize walking the error chain for ReqwestError or io::Error, but handshake_failed_error rebuilds the error with anyhow!(...) and drops the original source. Once this wrapper runs, the chain contains only the formatted string, so should_retry_initialize never sees the underlying transport error and the Streamable HTTP initialize handshake is never retried. Consider wrapping the incoming error instead of formatting it (e.g. err.context(...)) so that its type survives downcasting.
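
To illustrate the point above without pulling in anyhow, here is a minimal std-only sketch of the same failure mode. The types HandshakeFailed and Flattened and the helpers wrap, flatten, and chain_has_io_error are hypothetical stand-ins, not names from this PR; the real code would use anyhow's context and chain facilities instead.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Hypothetical wrapper that keeps the original error reachable via `source()`.
#[derive(Debug)]
struct HandshakeFailed {
    source: Box<dyn Error + 'static>,
}

impl fmt::Display for HandshakeFailed {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "handshake failed")
    }
}

impl Error for HandshakeFailed {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.source)
    }
}

/// Hypothetical lossy wrapper: only the formatted message survives.
#[derive(Debug)]
struct Flattened(String);

impl fmt::Display for Flattened {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl Error for Flattened {}

/// Wraps the error and preserves it as the source.
fn wrap(err: io::Error) -> HandshakeFailed {
    HandshakeFailed { source: Box::new(err) }
}

/// Formats the error into a string and discards the original value.
fn flatten(err: io::Error) -> Flattened {
    Flattened(format!("handshake failed: {err}"))
}

/// Walks the `source()` chain looking for an `io::Error`.
fn chain_has_io_error(err: &(dyn Error + 'static)) -> bool {
    let mut current = Some(err);
    while let Some(e) = current {
        if e.is::<io::Error>() {
            return true;
        }
        current = e.source();
    }
    false
}

fn main() {
    let reset = || io::Error::new(io::ErrorKind::ConnectionReset, "connection reset");
    println!("wrapped: {}", chain_has_io_error(&wrap(reset())));
    println!("flattened: {}", chain_has_io_error(&flatten(reset())));
}
```

A retry gate that inspects the chain can only match on the wrapped variant; the flattened one still prints the same message but carries no typed source to downcast.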

@zemaj zemaj left a comment

Summary:

  • Added bounded exponential retries (max 3, 200 ms base, cap 1.6 s) when the streamable HTTP handshake flakes.
  • Treated handshake timeouts and common IO reset errors as transient so we rebuild the transport before retrying.
  • Extended unit coverage for phase annotation, retry gating, and backoff math.
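
The backoff described above could look roughly like the following. This is a sketch under assumptions: it reads "max 3" as three retries after the first attempt, assumes the delay doubles per retry, and uses hypothetical names (backoff_delay, retry_with_backoff, and the constants) that are not taken from the PR. The sleep function is injected so the loop can be exercised without waiting.

```rust
use std::time::Duration;

const BASE_DELAY: Duration = Duration::from_millis(200);
const MAX_DELAY: Duration = Duration::from_millis(1600);
const MAX_RETRIES: u32 = 3;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
fn backoff_delay(attempt: u32) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    BASE_DELAY
        .checked_mul(factor)
        .unwrap_or(MAX_DELAY)
        .min(MAX_DELAY)
}

/// Runs `op`, retrying up to MAX_RETRIES times while `is_transient` says so.
fn retry_with_backoff<T, E>(
    mut op: impl FnMut() -> Result<T, E>,
    is_transient: impl Fn(&E) -> bool,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Only transient failures with retries remaining are retried.
            Err(e) if attempt < MAX_RETRIES && is_transient(&e) => {
                sleep(backoff_delay(attempt));
                attempt += 1;
            }
            Err(e) => return Err(e),
        }
    }
}

fn main() {
    for attempt in 0..MAX_RETRIES {
        println!("retry {} waits {:?}", attempt + 1, backoff_delay(attempt));
    }
    let result: Result<(), &str> =
        retry_with_backoff(|| Err("connection reset"), |_| true, std::thread::sleep);
    println!("final result: {:?}", result);
}
```

Under these assumptions the three retries wait 200 ms, 400 ms, and 800 ms, so the 1.6 s cap would only come into play if the retry limit were raised.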

Tests:

  • cargo test -p code-rmcp-client --tests
  • ./build-fast.sh

Risks:

  • Backoff remains deterministic; consider jitter if servers see herd retries.
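
If jitter were added, one common option is "full jitter", where each wait is a random fraction of the deterministic delay. The sketch below is an assumption about how that might look, not something in this PR; the function name jittered is hypothetical, and the random sample is passed in by the caller so the example stays std-only.

```rust
use std::time::Duration;

/// Scales `delay` by a caller-supplied sample in [0.0, 1.0].
/// Out-of-range samples are clamped; non-finite samples fall back to 1.0.
fn jittered(delay: Duration, unit_sample: f64) -> Duration {
    let fraction = if unit_sample.is_finite() {
        unit_sample.clamp(0.0, 1.0)
    } else {
        1.0
    };
    delay.mul_f64(fraction)
}

fn main() {
    let delay = Duration::from_millis(800);
    // In real use the sample would come from a random number generator.
    for sample in [0.0, 0.25, 0.5, 1.0] {
        println!("sample {sample}: wait {:?}", jittered(delay, sample));
    }
}
```

Spreading waits this way keeps clients that failed at the same moment from retrying in lockstep, at the cost of less predictable timing in tests.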

@zemaj zemaj closed this Nov 7, 2025