Fix opening connection stability with Chromium as a client #168

kixelated merged 6 commits into moq-dev:main from
Conversation
The read() methods used read_buf() into a Vec then tried to decode, silently discarding any bytes beyond the frame. When a client (e.g. Chromium) sends data immediately after the CONNECT frame on the same stream, those bytes are consumed and lost. This causes the background capsule reader in Session::run_closed() to start at the wrong byte offset, producing decode errors that kill the session (~30-40% of browser connections are "born closed").

Fix: use VarInt::read() + read_exact() to consume only the exact bytes of each frame. No extra bytes are consumed, so subsequent readers on the same stream start at the correct position.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
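The exact-bytes pattern described above can be sketched with a minimal synchronous stand-in (the crate's actual code is async and uses its own `VarInt` type; `read_varint` and `read_frame` here are illustrative names, not the crate's API):

```rust
use std::io::{self, Read};

// Illustrative QUIC-style varint decoder: the top two bits of the first
// byte encode the total length (1, 2, 4, or 8 bytes).
fn read_varint<R: Read>(r: &mut R) -> io::Result<u64> {
    let mut first = [0u8; 1];
    r.read_exact(&mut first)?;
    let len = 1usize << (first[0] >> 6);
    let mut value = u64::from(first[0] & 0x3f);
    let mut rest = vec![0u8; len - 1];
    r.read_exact(&mut rest)?;
    for b in rest {
        value = (value << 8) | u64::from(b);
    }
    Ok(value)
}

// Consume exactly one frame (type varint, length varint, payload) and return
// it, leaving the reader positioned at the first byte of the next frame.
fn read_frame<R: Read>(r: &mut R) -> io::Result<(u64, Vec<u8>)> {
    let typ = read_varint(r)?;
    let len = read_varint(r)? as usize;
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?; // never over-reads past the frame boundary
    Ok((typ, payload))
}
```

Because `read_exact` pulls only the frame's own bytes, anything the client pipelined after the CONNECT frame is still in the stream for the next reader, which is the property the buggy `read_buf`-into-a-`Vec` approach broke.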
Actionable comments posted: 2
🧹 Nitpick comments (1)
web-transport-proto/src/connect.rs (1)
154-155: Document that `read` is not cancellation-safe.

`read_exact` is not cancellation safe — if it is used as an event in a `tokio::select!` statement and some other branch completes first, some data may be lost. The existing doc comment describes what the method does but does not warn callers. A future maintainer wrapping this in a `select!` for a per-request timeout could silently corrupt the stream position.

📝 Proposed doc update

```diff
-    /// Read a CONNECT request from a stream, consuming only the exact bytes of the frame.
+    /// Read a CONNECT request from a stream, consuming only the exact bytes of the frame.
+    ///
+    /// # Cancellation safety
+    ///
+    /// This method is **not** cancellation safe. If dropped while awaiting, partial
+    /// frame bytes may have already been consumed from the stream, leaving it in an
+    /// indeterminate state.
     pub async fn read<S: AsyncRead + Unpin>(stream: &mut S) -> Result<Self, ConnectError> {
```

Apply the same note to `ConnectResponse::read`.

Also applies to: 282-283
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web-transport-proto/src/connect.rs` around lines 154 - 155, The doc comment for Connect::read must warn that the method is not cancellation-safe because it uses read_exact (which can consume partial bytes on cancellation); update the comment for pub async fn read<S: AsyncRead + Unpin>(...) and likewise for ConnectResponse::read to explicitly state callers must not call these inside a tokio::select! or other cancellable context and suggest safe alternatives (e.g., use tokio::time::timeout around a spawned task or perform non-cancelling read logic) so consumers know how to avoid corrupting the stream position.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@web-transport-proto/src/connect.rs`:
- Around line 160-161: The code in ConnectRequest::read (and likewise
ConnectResponse::read) unsafely allocates payload based on an untrusted size
field (vec![0u8; size.into_inner() as usize]), allowing DoS via huge
allocations; add a sanity cap (e.g. MAX_HEADERS_FRAME_SIZE a few KB) and check
the parsed size against that cap before allocating or reading, returning a new
ConnectError::FrameTooLarge (or appropriate error) when exceeded; prefer using a
read limiter (Buf::take-style) or read_exact into a bounded buffer after the
size check to avoid unbounded heap allocation and apply the same guard in both
ConnectRequest::read and ConnectResponse::read.
- Around line 157-161: ConnectRequest::read and ConnectResponse::read currently
allocate payloads using the untrusted size from VarInt::read (via
size.into_inner()) which allows a malicious peer to request huge allocations;
add a bounded maximum (e.g. const MAX_FRAME_SIZE: usize = ...) and check
size.into_inner() as usize <= MAX_FRAME_SIZE before creating the Vec, returning
a suitable ConnectError (e.g. ConnectError::FrameTooLarge or a new variant) if
exceeded; do the same validation in both ConnectRequest::read and
ConnectResponse::read and avoid allocating until after the size is validated.
---
Nitpick comments:
In `@web-transport-proto/src/connect.rs`:
- Around line 154-155: The doc comment for Connect::read must warn that the
method is not cancellation-safe because it uses read_exact (which can consume
partial bytes on cancellation); update the comment for pub async fn read<S:
AsyncRead + Unpin>(...) and likewise for ConnectResponse::read to explicitly
state callers must not call these inside a tokio::select! or other cancellable
context and suggest safe alternatives (e.g., use tokio::time::timeout around a
spawned task or perform non-cancelling read logic) so consumers know how to
avoid corrupting the stream position.
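The size-cap guard both prompts describe amounts to validating the untrusted length before any allocation. A minimal sketch, assuming the 64 KiB cap the PR later adopted (the helper name and error mapping here are illustrative, not the crate's API):

```rust
use std::io;

const MAX_FRAME_SIZE: usize = 65536; // 64 KiB, matching the PR's chosen cap

// Reject an attacker-controlled length before allocating, so a crafted
// varint (up to 2^62) cannot force a huge Vec allocation.
fn alloc_payload(declared_len: u64) -> io::Result<Vec<u8>> {
    if declared_len > MAX_FRAME_SIZE as u64 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "frame too large",
        ));
    }
    Ok(vec![0u8; declared_len as usize])
}
```

Note that comparing on the raw `u64` before casting also sidesteps the 32-bit truncation issue raised later in the review.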
Oh thank you so much, I was having trouble reproducing this and hated seeing the WebSocket fallback being used.

I can fix the CodeRabbit suggestions, one sec.
…uge allocations

A malicious peer could send a crafted VarInt frame-length (up to 2^62) to trigger excessive memory allocation. Validate the size against a 64 KiB maximum before allocating, matching the existing pattern in capsule.rs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ah, I just pushed the fix before reading this :) Thanks for reviewing it!
🧹 Nitpick comments (1)
web-transport-proto/src/connect.rs (1)
161-189: Core fix is correct — exact-bytes framing resolves the Chromium interop issue.

The read loop correctly reads the full payload of each frame (including grease frames) before advancing, so stream position is always valid for subsequent readers. The `MAX_FRAME_SIZE` guard and `read_exact` together ensure sound parsing.

Optional: compare against `u64` before casting to `usize`.

On line 167, `size.into_inner()` (a `u64`) is cast to `usize` before the bounds check on line 168. On 32-bit platforms a crafted VarInt whose value exceeds `u32::MAX` but whose lower 32 bits fall within the limit would silently truncate and pass the guard, leaving excess bytes in the stream. A safe pattern is to validate the raw `u64` first:

♻️ Proposed fix

```diff
-        let size = size.into_inner() as usize;
-        if size > MAX_FRAME_SIZE {
+        let size_u64 = size.into_inner();
+        if size_u64 > MAX_FRAME_SIZE as u64 {
             return Err(std::io::Error::new(
                 std::io::ErrorKind::InvalidData,
                 "frame too large",
             )
             .into());
         }
+        let size = size_u64 as usize;
```

`Frame::is_grease()` is defined in web-transport-proto/src/frame.rs and is correctly available on the `Frame` type imported from `super`.
Verify each finding against the current code and only fix it if needed. In `@web-transport-proto/src/connect.rs` around lines 161 - 189, The code casts size.into_inner() (u64) to usize before checking against MAX_FRAME_SIZE, which can truncate on 32-bit platforms; in read (pub async fn read) first get the raw u64 (e.g. let raw = size.into_inner()), compare raw to MAX_FRAME_SIZE using a u64 conversion (or compare raw > MAX_FRAME_SIZE as u64) and return the same InvalidData error if it exceeds the limit, and only after that safely cast raw to usize for allocating payload; keep the existing behavior for error construction and the rest of the read loop (Frame::HEADERS check, Frame::is_grease handling, read_exact, and Self::decode_headers).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@web-transport-proto/src/connect.rs`:
- Around line 11-16: No changes required: the MAX_FRAME_SIZE constant correctly
limits allocation sizes used by ConnectRequest::read and ConnectResponse::read;
keep the constant as-is and ensure both read implementations continue to use
MAX_FRAME_SIZE for frame-length checks and buffer allocations.
- Around line 298-326: The read<S: AsyncRead + Unpin> implementation performs a
direct size.into_inner() as usize cast which can truncate large VarInt values;
change it to first extract the u64 (e.g. let size_u64 = size.into_inner()),
compare size_u64 > MAX_FRAME_SIZE as u64 and return the same InvalidData error
if too large, then safely cast to usize (let size = size_u64 as usize) before
allocating payload; reference Frame, VarInt::read, MAX_FRAME_SIZE, and
ConnectError::UnexpectedEnd/UnexpectedFrame to locate the logic to update.
---
Nitpick comments:
In `@web-transport-proto/src/connect.rs`:
- Around line 161-189: The code casts size.into_inner() (u64) to usize before
checking against MAX_FRAME_SIZE, which can truncate on 32-bit platforms; in read
(pub async fn read) first get the raw u64 (e.g. let raw = size.into_inner()),
compare raw to MAX_FRAME_SIZE using a u64 conversion (or compare raw >
MAX_FRAME_SIZE as u64) and return the same InvalidData error if it exceeds the
limit, and only after that safely cast raw to usize for allocating payload; keep
the existing behavior for error construction and the rest of the read loop
(Frame::HEADERS check, Frame::is_grease handling, read_exact, and
Self::decode_headers).
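The truncation hazard the prompt describes is easy to see with a 32-bit-sized simulation: checking after the cast lets the high bits vanish. A hypothetical `guard_then_cast` shows the safe ordering, with `u32` standing in for a 32-bit `usize`:

```rust
// Simulate a 32-bit `usize` with u32: the bounds check must happen on the
// raw u64, because `v as u32` keeps only the low 32 bits.
fn guard_then_cast(v: u64, max: u32) -> Option<u32> {
    if v > u64::from(max) {
        None // rejected before any lossy cast
    } else {
        Some(v as u32)
    }
}
```

A cast-first check would accept a value like `(1 << 32) + 100`, since its low 32 bits look tiny, while the `u64`-first check rejects it.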
1. Add some tests.
2. Avoid zeroing the Vec.
3. Avoid copying grease payloads.
4. Fix the two other instances of this bug in the code.
5. Avoid a bug on 32-bit systems I guess.
I pushed some nits, stuff like not zeroing the buffer. And applied the fix to the two other places in the code that suffer from this bug. And some AI generated unit tests I didn't even look at lul.
🧹 Nitpick comments (4)
web-transport-proto/src/settings.rs (1)
142-143: `MAX_FRAME_SIZE` is defined as an associated constant here vs. a module-level constant in connect.rs.

Both are 65536 and serve the same purpose. The inconsistency is cosmetic, but if the frame-reading logic is ever unified (per the earlier optional refactor suggestion), a single shared constant would be cleaner.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web-transport-proto/src/settings.rs` around lines 142 - 143, Move the duplicated 65536 constant into a single shared constant and update both locations to use it: remove the associated constant MAX_FRAME_SIZE from the Settings impl in settings.rs and replace uses with a module-level/shared constant (e.g., pub(crate) const MAX_FRAME_SIZE: usize = 65536) defined in a common module or the existing connect.rs module, then update references in both settings.rs (Settings::MAX_FRAME_SIZE) and connect.rs (the module-level constant) to reference the new shared symbol so there is only one canonical MAX_FRAME_SIZE.

web-transport-proto/src/connect.rs (2)
174-180: The "frame too large" error surfaces as a generic `ConnectError::Io` — consider a dedicated variant.

Currently, exceeding `MAX_FRAME_SIZE` creates a `std::io::Error` that gets wrapped into `ConnectError::Io`. This works, but callers cannot distinguish "frame too large" from a real I/O failure without inspecting the error message string. A dedicated `ConnectError::FrameTooLarge` variant would provide a more precise error for diagnostics and tests (the tests currently assert `matches!(err, ConnectError::Io(_))`, which is fragile).

Not blocking for this PR.
Also applies to: 320-326
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web-transport-proto/src/connect.rs` around lines 174 - 180, The size check returning a std::io::Error should be replaced with a dedicated ConnectError variant: add ConnectError::FrameTooLarge to the ConnectError enum, and in the size checks inside connect.rs (the block around the if size > MAX_FRAME_SIZE as u64 and the similar block at 320-326) return Err(ConnectError::FrameTooLarge.into()) (or Err(ConnectError::FrameTooLarge) where the function returns ConnectError), update any From/Into implementations or conversion points so the new variant converts correctly to the error type used by callers, and update tests that assert Io to match the new ConnectError::FrameTooLarge variant.
162-197: `ConnectRequest::read` and `ConnectResponse::read` are nearly identical — consider extracting shared frame-reading logic.

Both methods repeat the same pattern: read type/size varints, validate size against `MAX_FRAME_SIZE`, create a `Take` adapter, drain grease frames, read payload, check expected frame type, then delegate to `decode_headers`. A shared async helper (e.g., `read_headers_frame(stream) -> Result<Vec<u8>, ConnectError>`) could eliminate this duplication. The same pattern also appears in capsule.rs and settings.rs (with different error types).

Not blocking — the current code is clear and correct.
Also applies to: 308-343
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web-transport-proto/src/connect.rs` around lines 162 - 197, Extract the duplicated frame-reading logic in ConnectRequest::read and ConnectResponse::read into a single async helper like read_headers_frame<S: AsyncRead + Unpin>(stream: &mut S) -> Result<Vec<u8>, ConnectError> that performs VarInt::read for type and size, checks size against MAX_FRAME_SIZE, creates stream.take(size), drains GREASE frames via tokio::io::copy to tokio::io::sink(), reads the payload into a Vec<u8>, verifies typ == Frame::HEADERS (returning ConnectError::UnexpectedFrame otherwise) and returns the payload buffer; then replace the bodies of ConnectRequest::read and ConnectResponse::read to call this helper and pass the returned Vec<u8> into Self::decode_headers, keeping existing error mapping (ConnectError::UnexpectedEnd) behavior and using the same symbols (VarInt::read, Frame, MAX_FRAME_SIZE, ConnectError, decode_headers, tokio::io::copy, tokio::io::sink, stream.take).

web-transport-proto/src/capsule.rs (1)
81-131: Exact-consumption approach for `Capsule::read` looks correct.

The `VarInt::read` → size check → `stream.take(length)` pattern properly bounds both the allocation and the stream consumption. Grease capsules are drained and returned, and the `CloseWebTransportSession`/`Unknown` dispatch is clean.

On line 127, `VarInt::from_u64(typ_val).unwrap()` reconstructs a `VarInt` from a value that was already decoded as one. Since `VarInt` derives `Copy`, the original `typ` variable (from line 85) is still available and can be reused directly instead of round-tripping through `u64`.
Verify each finding against the current code and only fix it if needed. In `@web-transport-proto/src/capsule.rs` around lines 81 - 131, The code reconstructs a VarInt from typ_val for the Unknown variant instead of reusing the original decoded VarInt; replace VarInt::from_u64(typ_val).unwrap() with the original typ (the VarInt returned by VarInt::read) since VarInt derives Copy and typ is still in scope—update the Unknown arm to use typ directly to avoid the unnecessary round-trip conversion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@web-transport-proto/src/capsule.rs`:
- Around line 81-131: The code reconstructs a VarInt from typ_val for the
Unknown variant instead of reusing the original decoded VarInt; replace
VarInt::from_u64(typ_val).unwrap() with the original typ (the VarInt returned by
VarInt::read) since VarInt derives Copy and typ is still in scope—update the
Unknown arm to use typ directly to avoid the unnecessary round-trip conversion.
In `@web-transport-proto/src/connect.rs`:
- Around line 174-180: The size check returning a std::io::Error should be
replaced with a dedicated ConnectError variant: add ConnectError::FrameTooLarge
to the ConnectError enum, and in the size checks inside connect.rs (the block
around the if size > MAX_FRAME_SIZE as u64 and the similar block at 320-326)
return Err(ConnectError::FrameTooLarge.into()) (or
Err(ConnectError::FrameTooLarge) where the function returns ConnectError),
update any From/Into implementations or conversion points so the new variant
converts correctly to the error type used by callers, and update tests that
assert Io to match the new ConnectError::FrameTooLarge variant.
- Around line 162-197: Extract the duplicated frame-reading logic in
ConnectRequest::read and ConnectResponse::read into a single async helper like
read_headers_frame<S: AsyncRead + Unpin>(stream: &mut S) -> Result<Vec<u8>,
ConnectError> that performs VarInt::read for type and size, checks size against
MAX_FRAME_SIZE, creates stream.take(size), drains GREASE frames via
tokio::io::copy to tokio::io::sink(), reads the payload into a Vec<u8>, verifies
typ == Frame::HEADERS (returning ConnectError::UnexpectedFrame otherwise) and
returns the payload buffer; then replace the bodies of ConnectRequest::read and
ConnectResponse::read to call this helper and pass the returned Vec<u8> into
Self::decode_headers, keeping existing error mapping
(ConnectError::UnexpectedEnd) behavior and using the same symbols (VarInt::read,
Frame, MAX_FRAME_SIZE, ConnectError, decode_headers, tokio::io::copy,
tokio::io::sink, stream.take).
In `@web-transport-proto/src/settings.rs`:
- Around line 142-143: Move the duplicated 65536 constant into a single shared
constant and update both locations to use it: remove the associated constant
MAX_FRAME_SIZE from the Settings impl in settings.rs and replace uses with a
module-level/shared constant (e.g., pub(crate) const MAX_FRAME_SIZE: usize =
65536) defined in a common module or the existing connect.rs module, then update
references in both settings.rs (Settings::MAX_FRAME_SIZE) and connect.rs (the
module-level constant) to reference the new shared symbol so there is only one
canonical MAX_FRAME_SIZE.
🧹 Nitpick comments (2)
web-transport-proto/src/capsule.rs (1)
84-88: The "Clean EOF" comment is slightly misleading — this also swallows a truncated type varint.

`VarInt::read` returns the same error for both a true zero-byte EOF and a partial multi-byte varint (e.g., first byte read but subsequent bytes missing). Mapping all `Err` to `Ok(None)` means a genuinely truncated stream mid-type-header is silently treated as "no capsule."

This is a pragmatic choice (the caller can't do much either way), but consider adjusting the comment to say something like `// EOF or truncated type — treat as end of stream` so future readers aren't surprised.
Verify each finding against the current code and only fix it if needed. In `@web-transport-proto/src/capsule.rs` around lines 84 - 88, Update the misleading comment in async fn read<S: AsyncRead + Unpin>(stream: &mut S) -> Result<Option<Self>, CapsuleError> where we match on VarInt::read(stream).await and currently return Ok(None) on Err; change the comment to clarify that the Err covers both a true zero-byte EOF and a truncated/malformed varint (e.g., "// EOF or truncated type — treat as end of stream") so future readers understand that a partial type header is intentionally treated as end-of-stream.

web-transport-proto/src/connect.rs (1)
162-205: `ConnectRequest::read` and `ConnectResponse::read` are near-identical — consider extracting a shared frame reader.

The two methods differ only in the final `decode_headers` call and the expected `Frame` type (both expect `HEADERS`). `Settings::read` in settings.rs follows the same pattern too (with the `SETTINGS` frame). A shared helper like:

```rust
async fn read_frame<S: AsyncRead + Unpin>(
    stream: &mut S,
    max_size: usize,
) -> Result<(Frame, Vec<u8>), io::Error>
```

…that handles VarInt parsing, size enforcement, grease skipping, and truncation detection could eliminate ~40 duplicated lines per call site. Not critical for this fix PR, but worth considering as a follow-up.
Also applies to: 315-358
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web-transport-proto/src/connect.rs` around lines 162 - 205, ConnectRequest::read and ConnectResponse::read (and Settings::read) duplicate the same VarInt parsing, size checking, grease-skipping and payload-read logic; extract that into a shared helper (e.g. async fn read_frame<S: AsyncRead + Unpin>(stream: &mut S, max_size: usize) -> Result<(Frame, Vec<u8>), io::Error>) that returns the parsed Frame and payload bytes, enforce MAX_FRAME_SIZE, handle grease frames by draining to tokio::io::sink(), and return UnexpectedEnd on truncation; then replace the body of ConnectRequest::read, ConnectResponse::read and Settings::read to call read_frame(...) and then only perform the final Frame type check (Frame::HEADERS or Frame::SETTINGS) and call the respective decode_headers / decode_settings.
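The grease-skipping part of that helper drains a frame's payload without buffering it. A synchronous stand-in for the async `tokio::io::copy(&mut stream.take(len), &mut tokio::io::sink())` pattern the prompt names (`skip_exact` is an illustrative name):

```rust
use std::io::{self, Read};

// Drain exactly `len` bytes without allocating a buffer for them,
// erroring if the stream ends early (a truncated frame).
fn skip_exact<R: Read>(r: &mut R, len: u64) -> io::Result<()> {
    let copied = io::copy(&mut r.take(len), &mut io::sink())?;
    if copied < len {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "truncated frame",
        ));
    }
    Ok(())
}
```

The `take` adapter bounds consumption, so even a lying length field cannot pull the reader past the frame, and the `copied < len` check surfaces truncation instead of silently accepting a short frame.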
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@web-transport-proto/src/connect.rs`:
- Around line 11-16: Add and use the MAX_FRAME_SIZE constant to prevent
unbounded allocations: ensure any frame-reading logic (where a frame length is
parsed and a Vec/Buffer is allocated) checks that parsed_length <=
MAX_FRAME_SIZE before allocating or reserving, and return an error if it exceeds
the limit; reference the MAX_FRAME_SIZE constant in those checks so all places
that create a Vec for CONNECT HEADERS enforce the 65536 byte cap.
---
Nitpick comments:
In `@web-transport-proto/src/capsule.rs`:
- Around line 84-88: Update the misleading comment in async fn read<S: AsyncRead
+ Unpin>(stream: &mut S) -> Result<Option<Self>, CapsuleError> where we match on
VarInt::read(stream).await and currently return Ok(None) on Err; change the
comment to clarify that the Err covers both a true zero-byte EOF and a
truncated/malformed varint (e.g., "// EOF or truncated type — treat as end of
stream") so future readers understand that a partial type header is
intentionally treated as end-of-stream.
In `@web-transport-proto/src/connect.rs`:
- Around line 162-205: ConnectRequest::read and ConnectResponse::read (and
Settings::read) duplicate the same VarInt parsing, size checking,
grease-skipping and payload-read logic; extract that into a shared helper (e.g.
async fn read_frame<S: AsyncRead + Unpin>(stream: &mut S, max_size: usize) ->
Result<(Frame, Vec<u8>), io::Error>) that returns the parsed Frame and payload
bytes, enforce MAX_FRAME_SIZE, handle grease frames by draining to
tokio::io::sink(), and return UnexpectedEnd on truncation; then replace the body
of ConnectRequest::read, ConnectResponse::read and Settings::read to call
read_frame(...) and then only perform the final Frame type check (Frame::HEADERS
or Frame::SETTINGS) and call the respective decode_headers / decode_settings.
Thanks a bunch @ai-and-i |
The FrameTooLarge variants added in #168 are new enum variants on exhaustive enums, which is a semver-breaking change. Map the condition to Io errors instead so this can ship as a patch release. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Chromium non-deterministically fails to open connection with web-transport-quinn-based server. This PR fixes it.
The ConnectRequest::read() method used stream.read(), which might read more data than expected by decode(). The extra data was silently dropped from the stream, causing subsequent reads to start mid-packet and break parsing, which dropped the connection. Depending on how data was split across read() calls and whether Chromium was sending a grease packet, this resulted in failures to establish the connection about half the time.
Fix: use VarInt::read() + read_exact() to consume only the exact bytes of each frame. No extra bytes are consumed, so subsequent readers on the same stream start at the correct position.