Encoder revamp #202
Conversation
Do encoding in 2 passes, so we know exactly how much buffer space we'll need. I will probably revert this. I thought this was clever, but seeing it implemented, it's just so much more complicated than writing everything to a dynamic buffer, then copying it into the aws_io_message.
ditch 2 pass approach, it was just too complicated
…match aws-c-io api changes. (#194)
As seen in real life
* Enabled compilation on VS 2015
* Fix VS narrowing warning
* Updated to v0.5.3 of builder
Ignore connection: close on 200/OK responses to a CONNECT Request, since the proxy is obviously drunk and needs to hail an uber to get home from the bar safely. Fix the broken tests from the tcp back pressure refactor in aws-c-io.
- Use common struct
- Pre-encode the entire frame
- Incrementally copy that to aws_io_message whenever encode() is called.

This is simpler/better because:
1) more shared code
2) unique payload-writing code all goes in the one new() function, instead of being spread across the new() and encode() functions
3) less chance of incorrect size calculations, since we're encoding to a buffer of the exact correct length
…If this kind of error happens now, it's programmer error
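To make the approach in the commit messages above concrete, here is a minimal, self-contained sketch of the "pre-encode in new(), drain in encode()" idea. The names (prebuilt_frame, prebuilt_frame_new, FRAME_PREFIX_LENGTH) are hypothetical simplifications, not the actual aws_h2_frame code; it only illustrates that all payload-writing happens once, into a buffer of exactly the right size.

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    #define FRAME_PREFIX_LENGTH 9 /* length(3) + type(1) + flags(1) + stream id(4), per RFC 7540 4.1 */

    struct prebuilt_frame {
        uint8_t *encoded; /* complete wire image: prefix + payload */
        size_t len;       /* total encoded length */
        size_t progress;  /* how much has been copied out so far */
    };

    /* Build the entire frame up front (payload_len assumed to fit in 24 bits). */
    static struct prebuilt_frame *prebuilt_frame_new(
        uint8_t type, uint8_t flags, uint32_t stream_id, const uint8_t *payload, size_t payload_len) {

        struct prebuilt_frame *frame = calloc(1, sizeof(*frame));
        if (!frame) {
            return NULL;
        }
        frame->len = FRAME_PREFIX_LENGTH + payload_len;
        frame->encoded = malloc(frame->len);
        if (!frame->encoded) {
            free(frame);
            return NULL;
        }

        uint8_t *p = frame->encoded;
        p[0] = (uint8_t)(payload_len >> 16); /* 24-bit length, big-endian */
        p[1] = (uint8_t)(payload_len >> 8);
        p[2] = (uint8_t)payload_len;
        p[3] = type;
        p[4] = flags;
        p[5] = (uint8_t)((stream_id >> 24) & 0x7F); /* 31-bit stream id, high bit reserved */
        p[6] = (uint8_t)(stream_id >> 16);
        p[7] = (uint8_t)(stream_id >> 8);
        p[8] = (uint8_t)stream_id;
        memcpy(p + FRAME_PREFIX_LENGTH, payload, payload_len);

        return frame;
    }

Because the unique payload-writing code runs once, in new(), the per-call encode path reduces to bookkeeping, which is where points 1-3 above come from.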
/* If debug_data is too long, don't send it.
 * It's more important that the GOAWAY frame gets sent. */
const size_t debug_data_max = s_prebuilt_payload_max() - s_frame_goaway_length_min;
if (debug_data.len > debug_data_max) {
Would truncation be reasonable here? What would this payload look like?
I think truncation is probably bad. Imagine owning the xml or json parser on the other side?
Users are free to send whatever they want as the debug data.
Henso and I had a big chat about how to handle debug-data being too long.
We decided it shouldn't be a show-stopping error, it's more important that the GOAWAY frame gets sent.
We were less certain about truncating vs. sending nothing when it's too long. Someone might never see their data get too long until the magic day that it does. So do we send partial data, or none at all? Partial XML or JSON would turn into a parse error, but a truncated log might not obviously be truncated, which is just misleading. Options I can think of:
- send no debug-data
- truncate debug-data
- truncate and write "TRUNCATED" as the last 9 bytes?
Thoughts? We can always change this later. It's not exposed all the way out to a public API yet.
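Purely to make option 3 concrete, a standalone sketch of truncate-and-mark (hypothetical helper name; this is not what the PR does, and debug_data_max here stands in for s_prebuilt_payload_max() - s_frame_goaway_length_min from the quoted diff):

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Keep as much debug data as fits and overwrite the tail with a marker so
     * the receiver can at least tell it was cut short. Returns bytes written. */
    static size_t copy_goaway_debug_data(
        const uint8_t *debug_data, size_t debug_data_len, uint8_t *dest, size_t debug_data_max) {

        static const char marker[] = "TRUNCATED"; /* 9 bytes */

        if (debug_data_len <= debug_data_max) {
            memcpy(dest, debug_data, debug_data_len);
            return debug_data_len;
        }
        if (debug_data_max < sizeof(marker) - 1) {
            return 0; /* not even room for the marker: fall back to sending nothing */
        }
        memcpy(dest, debug_data, debug_data_max - (sizeof(marker) - 1));
        memcpy(dest + debug_data_max - (sizeof(marker) - 1), marker, sizeof(marker) - 1);
        return debug_data_max;
    }

Whether the marker is worth the ambiguity (the original data could itself end in "TRUNCATED") is exactly the judgment call being debated here.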
Jonathan chimed in below and also voted for no data.
I agree, thanks for clarifying.
/* If for whatever reason this new header is bigger than the total table size, burn everything to the ground. */
if (AWS_UNLIKELY(header_size > context->dynamic_table.max_size)) {
    /* #TODO handle this. It's not an error. It should simply result in an empty table RFC-7541 4.4 */
Is it hard to clear the table? What makes this a TODO?
I wasn't touching the table code in this PR
I just traced my way in here and got spooked, so leaving a todo
👻
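For reference, a self-contained sketch of the RFC-7541 4.4 behavior the TODO refers to (toy types and names, not the aws_hpack implementation): an entry larger than the table's maximum size is not an error, it simply leaves the table empty.

    #include <stddef.h>

    #define MAX_ENTRIES 32 /* fixed capacity just for this toy */

    struct dyn_table {
        size_t entry_sizes[MAX_ENTRIES]; /* sizes of entries, oldest first */
        size_t num_entries;
        size_t size;     /* sum of entry_sizes */
        size_t max_size; /* from SETTINGS_HEADER_TABLE_SIZE */
    };

    static void dyn_table_evict_oldest(struct dyn_table *t) {
        t->size -= t->entry_sizes[0];
        for (size_t i = 1; i < t->num_entries; ++i) {
            t->entry_sizes[i - 1] = t->entry_sizes[i];
        }
        t->num_entries--;
    }

    static void dyn_table_insert(struct dyn_table *t, size_t header_size) {
        if (header_size > t->max_size) {
            /* RFC-7541 4.4: adding an entry larger than the maximum size is not
             * an error; it causes the table to be emptied of all entries. */
            while (t->num_entries > 0) {
                dyn_table_evict_oldest(t);
            }
            return;
        }
        /* Normal path: evict from the oldest end until the new entry fits. */
        while (t->size + header_size > t->max_size) {
            dyn_table_evict_oldest(t);
        }
        if (t->num_entries == MAX_ENTRIES) {
            dyn_table_evict_oldest(t); /* toy capacity limit; real tables track memory, not count */
        }
        t->entry_sizes[t->num_entries++] = header_size;
        t->size += header_size;
    }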
Pulling out all those checks is NIIIIICE
JonathanHenson
left a comment
Damn good work!
struct aws_h2_frame *aws_h2_frame_new_ping(
    struct aws_allocator *allocator,
    bool ack,
    const uint8_t opaque_data[AWS_H2_PING_DATA_SIZE]);
You sure this wouldn't be better as a byte_cursor?
It needs to be exactly 8 bytes.
I thought this was a cool way to enforce that.
This isn't a public API anyway
👍
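A quick sketch of the trade-off being discussed (simplified, hypothetical names, not the library's API). PING opaque data is always exactly 8 octets (RFC 7540 6.7), so the array syntax documents the required length at the call site, even though in C the parameter still decays to a plain pointer; a byte_cursor would instead carry a length that has to be validated at runtime.

    #include <stdint.h>
    #include <string.h>

    #define PING_DATA_SIZE 8

    struct ping_frame {
        uint8_t opaque_data[PING_DATA_SIZE];
        int ack;
    };

    static void ping_frame_init(
        struct ping_frame *frame, int ack, const uint8_t opaque_data[PING_DATA_SIZE]) {

        frame->ack = ack;
        /* Exactly 8 bytes by contract; there is no length parameter to get wrong. */
        memcpy(frame->opaque_data, opaque_data, PING_DATA_SIZE);
    }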
 * This only controls how string values are encoded when they're not already in a table.
 */
enum aws_hpack_huffman_mode {
    AWS_HPACK_HUFFMAN_SMALLEST,
lol, I don't know why we'd want ALWAYS at this point. I like this feature.
Yeah, the ALWAYS option is just in there so we can test against some of the samples in RFC-7541
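A sketch of what SMALLEST means in practice (hypothetical helper and enum names; the real hpack encoder's API may differ): only Huffman-encode a string when doing so actually saves bytes, with an ALWAYS knob kept around for testing against the Huffman-encoded examples in RFC-7541.

    #include <stddef.h>
    #include <stdint.h>

    enum huffman_mode {
        HUFFMAN_SMALLEST,
        HUFFMAN_NEVER,
        HUFFMAN_ALWAYS,
    };

    /* Assumed helper: length of the string after Huffman coding. */
    size_t huffman_encoded_length(const uint8_t *str, size_t len);

    /* Returns 1 if the string should be Huffman-encoded, and reports the
     * resulting on-the-wire string length either way. */
    static int should_huffman_encode(
        enum huffman_mode mode, const uint8_t *str, size_t len, size_t *out_encoded_len) {

        switch (mode) {
            case HUFFMAN_NEVER:
                *out_encoded_len = len;
                return 0;
            case HUFFMAN_ALWAYS:
                *out_encoded_len = huffman_encoded_length(str, len);
                return 1;
            case HUFFMAN_SMALLEST:
            default: {
                size_t huffman_len = huffman_encoded_length(str, len);
                if (huffman_len < len) {
                    *out_encoded_len = huffman_len;
                    return 1;
                }
                *out_encoded_len = len;
                return 0;
            }
        }
    }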
[AWS_H2_SETTINGS_HEADER_TABLE_SIZE] = 4096,
[AWS_H2_SETTINGS_ENABLE_PUSH] = 1,
[AWS_H2_SETTINGS_MAX_CONCURRENT_STREAMS] = UINT32_MAX, /* "Initially there is no limit to this value" */
[AWS_H2_SETTINGS_INITIAL_WINDOW_SIZE] = 65535,
NIT: maybe make these named constants that signify their meaning?
That's what I was doing? You'd access them like so:
aws_h2_settings_initial[AWS_H2_SETTINGS_HEADER_TABLE_SIZE]
instead of:
AWS_H2_HEADER_TABLE_SIZE_INITIAL
I thought it would be more "programmable" to have them in an array.
I'll change it in the future if it's just a pain to read
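A standalone illustration of the array-of-defaults pattern being described (the values are the RFC 7540 6.5.2 initial values; the names mirror the quoted diff, but this is not the library header):

    #include <stdint.h>

    enum h2_settings_id {
        H2_SETTINGS_HEADER_TABLE_SIZE = 1,
        H2_SETTINGS_ENABLE_PUSH = 2,
        H2_SETTINGS_MAX_CONCURRENT_STREAMS = 3,
        H2_SETTINGS_INITIAL_WINDOW_SIZE = 4,
        H2_SETTINGS_END_RANGE,
    };

    /* Designated initializers keep each default next to the setting it belongs to. */
    static const uint32_t s_h2_settings_initial[H2_SETTINGS_END_RANGE] = {
        [H2_SETTINGS_HEADER_TABLE_SIZE] = 4096,
        [H2_SETTINGS_ENABLE_PUSH] = 1,
        [H2_SETTINGS_MAX_CONCURRENT_STREAMS] = UINT32_MAX, /* "initially there is no limit" */
        [H2_SETTINGS_INITIAL_WINDOW_SIZE] = 65535,
    };

    /* The "programmable" upside: code can loop over all settings instead of
     * naming each one, e.g. resetting a peer's settings to the defaults. */
    static void reset_settings(uint32_t settings[H2_SETTINGS_END_RANGE]) {
        for (int i = H2_SETTINGS_HEADER_TABLE_SIZE; i < H2_SETTINGS_END_RANGE; ++i) {
            settings[i] = s_h2_settings_initial[i];
        }
    }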
uint8_t pad_length; /* Set to 0 to disable AWS_H2_FRAME_F_PADDED */

/* HEADERS-only data */
bool end_stream; /* AWS_H2_FRAME_F_END_STREAM */
Trivial, but maybe take struct packing into account here and group the 1-byte members into groups of 2 or 4, or move them to the end.
I shuffled them a bit,
I didn't want to lose the grouping by category
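To show what the packing nit is getting at, a toy comparison with hypothetical field names (not the real aws_h2_frame layout). On a typical 64-bit ABI, each lone byte next to an 8-byte-aligned field drags in 7 bytes of padding, while grouped byte-sized members share one padded slot.

    #include <stdbool.h>
    #include <stdint.h>

    struct scattered {
        uint8_t pad_length;   /* 1 byte + 7 bytes padding */
        uint64_t stream_data; /* 8 bytes */
        bool end_stream;      /* 1 byte + 7 bytes padding */
        uint64_t payload_len; /* 8 bytes */
    };                        /* typically 32 bytes */

    struct grouped {
        uint64_t stream_data;
        uint64_t payload_len;
        uint8_t pad_length;
        bool end_stream;      /* the two 1-byte members share one padded slot */
    };                        /* typically 24 bytes */

Keeping members grouped by category, as the reply above prefers, can still pack well as long as each category's byte-sized fields sit next to each other.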
    flags |= AWS_H2_FRAME_F_END_HEADERS;
} else {
    /* If we're not finishing the header-block, is it even worth trying to send this frame now? */
    const size_t even_worth_sending_threshold = s_frame_prefix_length + payload_overhead;
We could probably make this a bit smarter in the future. Currently this just checks: do we at least have enough space to send a valid but empty HEADERS frame?
Oh, I can easily do that.
I was trying to avoid sending a 9-byte HEADERS frame that had no actual header-data in it, knowing that the next time we try to encode there will be a fresh new aws_io_message with tons of space.
Do you think it's valuable to send the valid-but-empty frame earlier?
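A small sketch of the check being debated (simplified, hypothetical names): unless this frame would end the header-block, require room for the 9-byte prefix, the per-frame overhead, and at least one byte of actual header data before bothering to emit anything.

    #include <stddef.h>

    static int worth_encoding_headers_now(
        size_t space_available,
        size_t frame_prefix_length,
        size_t payload_overhead,
        int will_end_header_block) {

        if (will_end_header_block) {
            /* The final frame of the block has to go out even if it carries little data. */
            return 1;
        }
        /* Otherwise, skip the empty 9-byte HEADERS frame and wait for the next
         * aws_io_message, which will have plenty of fresh space. */
        return space_available > frame_prefix_length + payload_overhead;
    }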
source/h2_frames.c
Outdated
return AWS_OP_SUCCESS;
/* Write as much of the pre-encoded frame as will fit */
size_t chunk_len = aws_min_size(frame->send_progress.len, output->capacity - output->len);
Why isn't this ALWAYS zero? I think Justin and I had the same question... maybe add some comments on this somewhere, for future posterity?
I changed some variable names and added comments.
Hope this is less confusing now
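For posterity, a self-contained sketch of the drain logic in the quoted line (simplified names; out_buf stands in for aws_io_message's buffer). Neither operand of the min() is necessarily zero: earlier frames may already occupy part of the current output message, and a frame that didn't fully fit last time resumes from a non-zero progress offset.

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    struct out_buf {
        uint8_t *buffer;
        size_t capacity;
        size_t len; /* space already used by previously-encoded frames */
    };

    /* Copy whatever portion of the remaining pre-encoded bytes fits into the
     * space left in the output buffer. Returns 1 once the frame is fully written. */
    static int drain_prebuilt_frame(
        const uint8_t *encoded, size_t encoded_len, size_t *progress, struct out_buf *out) {

        size_t remaining = encoded_len - *progress;
        size_t space = out->capacity - out->len;
        size_t chunk_len = remaining < space ? remaining : space;

        memcpy(out->buffer + out->len, encoded + *progress, chunk_len);
        out->len += chunk_len;
        *progress += chunk_len;

        return *progress == encoded_len;
    }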
justinboswell
left a comment
Happy with the updates, just one last issue
justinboswell
left a comment
PERFECT 👨‍🍳 💋
Summary:
The existing encoder was not streaming. It would error if it didn't have enough space, but there was no way to know how much space it would need. We couldn't just try to re-encode the frame again when more space was available, because header-encoding mutates the HPACK state.
Now we have a "partially" streaming encoder. Most frames are pre-encoded, and we stream them into the available buffers. HEADERS and DATA frames, which may need to split their payload across multiple frames, know how to do so.
Details
- aws_h2_frame is a proper base class:
  - Most frame types are now pre-encoded via the aws_h2_frame_prebuilt type.
  - The header-block frames get their own aws_h2_frame type, and will split their payload across CONTINUATION frames if necessary during encoding.
  - DATA frames get their own aws_h2_frame type.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.