Add request/response body size tracking to trace events #5925
steebchen wants to merge 10 commits into cloudflare:main
Conversation
All contributors have signed the CLA ✍️ ✅
Pull request overview
This pull request adds request and response body size tracking to workerd trace events. The sizes are captured from the Content-Length header for requests and from the expectedBodySize field for responses, making them accessible to tail workers.
Changes:
- Added requestSize field to FetchEventInfo and bodySize field to FetchResponseInfo in the Cap'n Proto schema
- Implemented Content-Length header parsing to extract request body size
- Modified ResponseSentTracker to capture and report response body size
- Exposed body sizes as optional JavaScript properties on trace request/response objects
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/workerd/io/worker-interface.capnp | Added requestSize (UInt64) to FetchEventInfo and bodySize (UInt64) to FetchResponseInfo schema definitions |
| src/workerd/io/trace.h | Updated FetchEventInfo and FetchResponseInfo struct declarations with new size fields and constructor parameters |
| src/workerd/io/trace.c++ | Implemented constructors, copy, clone, and reader methods for new size fields |
| src/workerd/io/worker-entrypoint.c++ | Added Content-Length parsing logic and ResponseSentTracker modifications to capture body sizes |
| src/workerd/api/trace.h | Added bodySize getter methods and internal storage to Request and Response trace API classes |
| src/workerd/api/trace.c++ | Implemented getBodySize methods that return null for zero/unknown sizes, otherwise return size as double |
src/workerd/io/worker-entrypoint.c++
Outdated
```c++
char* end;
auto parsed = strtoull(contentLength.cStr(), &end, 10);
if (end != contentLength.cStr() && *end == '\0') {
  requestSize = parsed;
}
```
The parsing validation check has a subtle issue. The condition end != contentLength.cStr() will be true even if parsing partially fails (e.g., for "123abc", it would parse "123" and set end to point to 'a').
The validation should check that ALL characters were consumed, not just that SOME characters were consumed:
- end != contentLength.cStr() ensures at least one character was parsed
- *end == '\0' ensures we reached the end of the string
However, this still allows leading whitespace and '+' signs which strtoull accepts but may not be valid for Content-Length per HTTP specs. Additionally, strtoull returns ULLONG_MAX on overflow and sets errno to ERANGE, which is not checked.
Consider using a stricter validation that only allows digits, or at minimum check errno for overflow.
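A minimal sketch of what such stricter validation could look like, as a standalone helper (this is an illustration, not code from the PR; the function name and structure are hypothetical). It rejects anything that is not a pure digit string and checks errno for overflow:

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical strict Content-Length parser: digits only, full consumption,
// overflow detected via errno == ERANGE. Rejects "+5", " 7", and "123abc",
// which strtoull alone would partially or fully accept.
std::optional<uint64_t> parseContentLength(const std::string& s) {
  if (s.empty()) return std::nullopt;
  for (char c : s) {
    if (c < '0' || c > '9') return std::nullopt;  // no sign, no whitespace
  }
  errno = 0;
  char* end = nullptr;
  unsigned long long v = strtoull(s.c_str(), &end, 10);
  if (errno == ERANGE || *end != '\0') return std::nullopt;  // overflow or partial parse
  return static_cast<uint64_t>(v);
}
```

The digits-only pre-scan is what closes the "+" / leading-whitespace hole mentioned above; the errno check closes the overflow hole.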
src/workerd/api/trace.c++
Outdated
```c++
if (detail->requestSize == 0) {
  return kj::none;
}
return static_cast<double>(detail->requestSize);
```
Converting uint64_t to double can lose precision for values larger than 2^53 (approximately 9 quadrillion bytes or 9 petabytes). While extremely unlikely in practice for HTTP body sizes, this could theoretically cause inaccurate body size reporting for very large bodies.
JavaScript's Number type uses IEEE 754 double precision, which can only accurately represent integers up to Number.MAX_SAFE_INTEGER (2^53 - 1). Beyond this, precision is lost. Consider documenting this limitation or using BigInt if precise representation of very large sizes is required.
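The precision boundary is easy to demonstrate in isolation. The helper below (illustrative only, not part of the PR) round-trips a uint64_t through a double and reports whether the value survived; 2^53 is the last point where every integer is still exactly representable:

```cpp
#include <cstdint>

// Casting to double and back is lossless only while the integer fits in the
// 53-bit significand of an IEEE 754 double. Beyond 2^53, odd values get
// rounded to a neighboring even value and the round trip fails.
bool roundTripsExactly(uint64_t n) {
  return static_cast<uint64_t>(static_cast<double>(n)) == n;
}
```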
src/workerd/api/trace.c++
Outdated
```c++
if (bodySize == 0) {
  return kj::none;
}
return static_cast<double>(bodySize);
```
Same concern as above: converting uint64_t to double loses precision for values larger than 2^53, so very large body sizes could theoretically be reported inaccurately. Consider documenting this limitation or using BigInt if precise representation of very large sizes is required.
src/workerd/api/trace.h
Outdated
```c++
JSG_LAZY_READONLY_INSTANCE_PROPERTY(headers, getHeaders);
JSG_LAZY_READONLY_INSTANCE_PROPERTY(method, getMethod);
JSG_LAZY_READONLY_INSTANCE_PROPERTY(url, getUrl);
JSG_LAZY_READONLY_INSTANCE_PROPERTY(bodySize, getBodySize);
```
The new bodySize fields for request and response in trace events lack test coverage. Existing tail worker tests in this directory validate trace event structure but don't include assertions for the newly added bodySize properties on request/response objects.
Consider adding test cases that:
- Verify bodySize is present and correct when Content-Length header exists
- Verify bodySize is null/undefined when Content-Length is missing or invalid
- Verify response bodySize is present when expectedBodySize is known
- Test edge cases like zero-length bodies and very large sizes
I have read the CLA Document and I hereby sign the CLA
Thanks for your contribution. I haven't fully reviewed yet, but there is a problem with the approach. This also shows an unfortunate lack of test coverage in this pull request; ideally I would love to see these cases covered correctly in tests to show the feature is working as expected. Thank you again for your contribution, and I hope my tips can help steer you in the right direction 🙏
Add requestSize and bodySize fields to FetchEventInfo and FetchResponseInfo so tail workers can access body size information. Request size is captured from the Content-Length header, and response size from expectedBodySize in ResponseSentTracker. Both are exposed as optional properties in JavaScript trace events. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Fix Content-Length parsing to handle leading/trailing whitespace
- Rename requestSize to bodySize in FetchEventInfo for naming consistency
- Add precision limitation comments for uint64_t to double conversion
- Add tests for FetchEventInfo bodySize serialization/deserialization
- Add tests for FetchResponseInfo bodySize serialization/deserialization

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Force-pushed 1535869 to 5206364
@danlapid thank you very much for the quick feedback. I'll have a closer look; I just figured I'd make a quick prototype with Claude.
The previous implementation captured body size from Content-Length headers before streaming started. This is incorrect because:
- Content-Length may not be present (chunked encoding)
- Actual bytes streamed may differ from the declared length
- True body size is only known after streaming completes

This change:
- Adds ByteCountingOutputStream wrapper to track actual bytes written
- Wraps the response output stream in ResponseSentTracker::send()
- Defers the setReturn() call until proxyTask completes
- Reports the actual byte count instead of the Content-Length header value

The byte counter uses std::atomic for thread safety since proxyTask may run on a different thread than the original request handler.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add BODYSIZE_STR constant to trace-stream.c++
- Update ToJs(FetchEventInfo) to include bodySize for the request body
- Update ToJs(FetchResponseInfo) to include bodySize for the response body
- Update tail-worker-test.js expected output with bodySize values

Note: Integration tests (tail-worker-test@, http-test@) are failing with pre-existing segfaults unrelated to these changes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
```c++
// A WritableSink wrapper that counts the total bytes written to the underlying sink.
// This is used to track actual response body sizes for trace events.
class ByteCountingWritableSink final: public WritableSinkWrapper {
public:
```
It doesn't appear like this is used at all in the PR?
src/workerd/io/worker-entrypoint.c++
Outdated
```c++
// Wrap with byte counting for trace events
auto counter = kj::refcounted<ResponseBodyByteCounter>();
byteCounter = kj::addRef(*counter);
return kj::heap<ByteCountingOutputStream>(kj::mv(stream), kj::mv(counter));
```
This is going to incur a performance penalty for all workers unnecessarily. We should only add the counter for workers that actually have tail workers enabled and will actually use the data.
src/workerd/io/worker-entrypoint.c++
Outdated
```c++
// A refcounted byte counter that can be shared between the stream wrapper and
// the code that needs to read the final byte count after streaming completes.
struct ResponseBodyByteCounter: public kj::Refcounted {
  std::atomic<uint64_t> bytesWritten{0};
```
Is the atomic really necessary? kj i/o objects are not used across multiple threads.
src/workerd/io/worker-entrypoint.c++
Outdated
```diff
 t.setEventInfo(*incomingRequest,
-    tracing::FetchEventInfo(method, kj::str(url), kj::mv(cfJson), kj::mv(traceHeadersArray)));
+    tracing::FetchEventInfo(
+        method, kj::str(url), kj::mv(cfJson), kj::mv(traceHeadersArray), bodySize));
```
This is going to be a bit confusing... When Content-Length is not specified, bodySize will end up being 0... which means we cannot differentiate between no content-length (and transfer-encoding of chunked) and content-length: 0 ... each of which have very different semantics.
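A sketch of the distinction being asked for, using std::optional in place of the kj::Maybe the codebase would use (the helper name is hypothetical, and strict digit validation is omitted for brevity): an absent Content-Length header (e.g. chunked transfer-encoding) maps to "unknown", while an explicit Content-Length: 0 maps to a real zero.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical mapping from an optional Content-Length header value to an
// optional body size. nullopt means "size unknown", which is semantically
// different from a known size of 0.
std::optional<uint64_t> bodySizeFromHeader(const std::optional<std::string>& contentLength) {
  if (!contentLength) return std::nullopt;   // no header: size unknown, not zero
  // (strict digits-only validation omitted here for brevity)
  return static_cast<uint64_t>(std::stoull(*contentLength));
}
```

With this shape, the JavaScript side can surface null for "unknown" and 0 for "known empty", which a plain uint64_t defaulting to 0 cannot express.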
src/workerd/io/worker-entrypoint.c++
Outdated
```c++
while (*end == ' ' || *end == '\t') end++;  // Skip trailing whitespace
if (end != ptr && *end == '\0') {
  bodySize = parsed;
}
```
kj::StringPtr has a method parseAs<uint64_t>() that should be used instead of rolling your own parsing.
src/workerd/io/worker-entrypoint.c++
Outdated
```c++
// relying on Content-Length header. This is less critical than response body size
// since request bodies typically have accurate Content-Length.
uint64_t bodySize = 0;
KJ_IF_SOME(contentLength, headers.get(kj::HttpHeaderId::CONTENT_LENGTH)) {
```
This is going to incur an additional perf cost for every worker even those not using tail workers.
```c++
}
deferredWorkerTracer = t;
deferredHttpResponseStatus = wrappedResponse.getHttpResponseStatus();
traceByteCounter = wrappedResponse.takeByteCounter();
```
```capnp
  value @1 :Text;
}
bodySize @4 :UInt64;
# Request body size in bytes. 0 if unknown or no body.
```
no content-length is not semantically equivalent to content-length: 0, we should not conflate the two.
src/workerd/io/worker-entrypoint.c++
Outdated
```c++
  return tracing::FetchEventInfo::Header(kj::mv(entry.key), kj::strArray(entry.value, ", "));
};

// Extract request body size from Content-Length header if present.
```
Given that we're capturing the headers above in traceHeaders, do we really need to separately capture and parse the content-length here?
```c++
}

// Returns a reference to the byte counter for deferred access after streaming.
// The returned ownership should be held until after proxyTask completes.
```
It doesn't return a reference... it transfers ownership. This likely needs protections against being called multiple times.
```c++
kj::Promise<void> whenWriteDisconnected() override {
  return inner->whenWriteDisconnected();
}
```
this likely should also include overrides for tryPumpFrom and abortWrite. The tryPumpFrom is going to be tricky as without it this is going to end up always falling back to a non-optimized pump, but then counting the bytes for an optimized pump is tricky. If tail stream is not enabled for a worker, we wouldn't want to force everything through this path.
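The core counting-wrapper pattern under discussion can be shown without kj at all. Below is a minimal, self-contained sketch with a plain abstract sink (all names here are illustrative, not from the PR): every write is forwarded to the inner sink and the running byte total is accumulated, to be read after streaming finishes. No atomics are used, matching the single-threaded kj I/O model noted in the review.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Illustrative sink interface standing in for kj's output stream types.
struct Sink {
  virtual void write(const char* data, size_t n) = 0;
  virtual ~Sink() = default;
};

// A sink that collects written bytes into a string, for demonstration.
struct StringSink final: Sink {
  std::string out;
  void write(const char* data, size_t n) override { out.append(data, n); }
};

// Forwards every write to the inner sink while counting bytes. The count()
// accessor is intended to be read only after all writes have completed.
class ByteCountingSink final: public Sink {
public:
  explicit ByteCountingSink(Sink& inner): inner(inner) {}
  void write(const char* data, size_t n) override {
    inner.write(data, n);
    bytesWritten += n;
  }
  uint64_t count() const { return bytesWritten; }
private:
  Sink& inner;
  uint64_t bytesWritten = 0;
};
```

As the review notes, the hard part in the real codebase is not this wrapper but the optimized-pump path (tryPumpFrom), where bytes bypass write() entirely; a production version must either count inside the pump or deliberately fall back to the unoptimized path only when tracing is enabled.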
Changes based on @jasnell's review:
- Remove unused ByteCountingWritableSink from writable-sink.h
- Only enable byte counting when tracing is enabled, to avoid performance overhead for workers without tail workers
- Remove std::atomic since kj I/O objects are single-threaded
- Use kj::StringPtr::tryParseAs<uint64_t>() instead of custom strtoull parsing for Content-Length
- Fix the takeByteCounter() comment to indicate it transfers ownership
- Add a comment explaining why tryPumpFrom() is not overridden (we need to count bytes, so we use the non-optimized write path)

Note on bodySize semantics: 0 is used for both "no Content-Length" and "Content-Length: 0". While these are semantically different in HTTP, for trace events both indicate no/empty body data, which is acceptable for monitoring purposes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This change allows differentiating between:
- kj::none: body size is unknown (no Content-Length header)
- Some(0): body size is explicitly zero (Content-Length: 0)
- Some(N): body size is N bytes

Changes:
- Add hasBodySize boolean field to the capnp schema for FetchEventInfo and FetchResponseInfo to indicate whether bodySize is valid
- Update trace.h structs to use kj::Maybe<uint64_t> instead of uint64_t
- Update trace.c++ serialization to read/write the hasBodySize flag
- Update trace-stream.c++ ToJs to only include bodySize when known
- Update api/trace.h and api/trace.c++ for the new type
- Update worker-entrypoint.c++ to pass kj::Maybe for bodySize
- Add new test cases for zero bodySize vs unknown bodySize

The JavaScript API returns null when bodySize is unknown, and the actual numeric value (including 0) when it is known.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This comment still stands.
Addresses PR feedback: request body size should be tracked after the body is consumed, not at onset time using the Content-Length header.
- Remove bodySize from FetchEventInfo (onset event)
- Add requestBodySize to FetchResponseInfo (return event)
- Add ByteCountingInputStream to track actual request bytes consumed
- Both request and response body sizes are now reported after streaming
- Byte counting is only enabled when tail workers are active (no perf impact)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@danlapid let me know if it's going in a better direction now. Not sure if this feature is worth the complexity/extra steps; I was mostly experimenting with workerd. Let me know if you think it makes sense to pursue this further.
I think the current approach is still lacking. It seems like the soonest we can actually have this information in a somewhat sensible way is the Outcome event.
Per the HTTP spec, body size is only definitively known after the body has been fully transmitted. The Outcome event is emitted after all work completes (including body streaming), making it the correct place to report body sizes.

Changes:
- Remove body sizes from FetchResponseInfo (return event)
- Add responseBodySize and requestBodySize to the Outcome struct
- Add setBodySizes() method to the tracer for deferred body size reporting
- Update streaming tail worker JS serialization for the new schema
- Update the buffered tail worker API to read body sizes from the Trace object

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@steebchen lmk when you've finished cleaning it up; I'm pretty sure most of the changes on this PR are remnants of previous iterations and are not actually still strictly necessary.
Clean up redundant comments that explained implementation details that are already self-evident from the code structure. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
Add requestSize and bodySize fields to FetchEventInfo and FetchResponseInfo in workerd trace events. Request size is captured from the Content-Length header, and response size from the expectedBodySize field. These sizes are now accessible to tail workers via trace event properties.

Changes
🤖 Generated with Claude Code