Conversation

@graebm (Contributor) commented Mar 17, 2020

Summary:

The existing encoder was not streaming. It would error if it didn't have enough space, but there was no way to know how much space it would need. We couldn't simply re-encode the frame when more space became available, because header-encoding mutates the HPACK state.

Now we have a "partially" streaming encoder. Most frames are pre-encoded, and we stream them into the available buffers. Headers and Data frames, which may need to split their payload across multiple frames, know how to do so.

Details

  • aws_h2_frame is a proper base class:
    • Previously, each specific frame type had its own init()/encode()/clean_up(). Now frames are heap-allocated and have a vtable for encode()/destroy() (see the sketch after this list).
  • Simple frame types (most) make use of the aws_h2_frame_prebuilt type.
    • The entire frame is pre-encoded.
    • We stream the pre-encoded frame into the available space of an aws_io_message.
  • Remove "continuation" aws_h2_frame type.
    • Instead, the "headers" and "push promise" aws_h2_frame types will split their payload across CONTINUATION frames if necessary during encoding.
  • Remove "data" aws_h2_frame type.
    • Instead, the encoder has a special function that takes a body-stream and writes a DATA frame immediately.
    • There are lots of differences in how the connection deals with DATA vs every other frame type, so it didn't make sense for it to be a proper aws_h2_frame type.
  • HPACK encoding will grow the output buffer if necessary.
    • This was simpler than building a fully "streaming" encoder.
    • HEADERS frame encoding is therefore done in 2 stages:
      • First, do HPACK encoding to a separate buffer.
      • Then, split that payload across multiple HEADERS/CONTINUATION frames as necessary.
  • Moved logic for encoding the header-block into hpack.c. Previously a bunch of that logic was in h2_frames.c. All this code was changing anyway.
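
To make the base-class bullet concrete, here is a minimal sketch of the vtable idea. The names (example_h2_frame, example_h2_frame_vtable) are illustrative, not the actual declarations in this PR:

#include <stdbool.h>
#include <stdint.h>

struct aws_byte_buf; /* defined in aws-c-common; only used by pointer here */
struct example_h2_frame;

struct example_h2_frame_vtable {
    /* Write as much of the frame as fits into `output`.
     * Sets *complete to true once the entire frame has been written. */
    int (*encode)(struct example_h2_frame *frame, struct aws_byte_buf *output, bool *complete);

    /* Free the frame and anything it owns. */
    void (*destroy)(struct example_h2_frame *frame);
};

/* Common base stored at the start of every specific frame type. */
struct example_h2_frame {
    const struct example_h2_frame_vtable *vtable;
    uint8_t type;       /* DATA, HEADERS, SETTINGS, ... */
    uint32_t stream_id; /* 0 for connection-level frames */
};

A specific frame type would embed this base as its first member, so the connection can drive any queued frame through vtable->encode() until it reports completion, then call vtable->destroy().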

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

graebm and others added 16 commits March 9, 2020 12:01
Do encoding in 2 passes, so we know exactly how much buffer space we'll need. I will probably revert this. I thought this was clever, but having implemented it, it's just so much more complicated than writing everything to a dynamic buffer, then copying it into the aws_io_message.
ditch 2 pass approach, it was just too complicated
* Enabled compilation on VS 2015

* Fix VS narrowing warning

* Updated to v0.5.3 of builder
Ignore connection: close on 200/OK responses to a CONNECT Request, since the proxy is obviously drunk and needs to hail an uber to get home from the bar safely.

Fix the broken tests from the TCP back-pressure refactor in aws-c-io.
@graebm graebm marked this pull request as ready for review March 19, 2020 03:10
graebm added 4 commits March 19, 2020 13:56
- Use common struct
- Pre-encode the entire frame
- Incrementally copy that to aws_io_message whenever encode() is called.

This is simpler/better because:
1) more shared code
2) unique payload-writing code all goes in the one new() function, instead of being spread across the new() and encode() functions
3) less chance of incorrect size calculations, since we're encoding to a buffer of the exact correct length
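
A rough sketch of that pre-encode-then-copy pattern, assuming we only track the encoded bytes and a progress offset (example_prebuilt_frame and its fixed-size buffer are made up for illustration):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct example_prebuilt_frame {
    uint8_t encoded[9 + 256]; /* frame prefix + payload; fixed size just for this example */
    size_t encoded_len;       /* total bytes produced by new() */
    size_t bytes_sent;        /* progress across repeated encode() calls */
};

/* Copy as much of the remaining pre-encoded frame as fits into dst.
 * Returns true once the whole frame has been written. */
static bool s_example_prebuilt_encode(
    struct example_prebuilt_frame *frame, uint8_t *dst, size_t dst_capacity, size_t *out_written) {

    size_t remaining = frame->encoded_len - frame->bytes_sent;
    size_t chunk = remaining < dst_capacity ? remaining : dst_capacity;

    memcpy(dst, frame->encoded + frame->bytes_sent, chunk);
    frame->bytes_sent += chunk;
    *out_written = chunk;

    return frame->bytes_sent == frame->encoded_len;
}

All of the frame-specific logic lives in new(), which fills the encoded buffer; encode() is the same trivial copy for every prebuilt frame type.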
…If this kind of error happens now, it's programmer error
/* If debug_data is too long, don't send it.
* It's more important that the GOAWAY frame gets sent. */
const size_t debug_data_max = s_prebuilt_payload_max() - s_frame_goaway_length_min;
if (debug_data.len > debug_data_max) {
Contributor:

Would truncation be reasonable here? What would this payload look like?

Contributor:

I think truncation is probably bad. Imagine owning the xml or json parser on the other side?

Contributor (author):

Users are free to send whatever they want as the debug data.

Henso and I had a big chat about how to handle debug-data being too long.
We decided it shouldn't be a show-stopping error, it's more important that the GOAWAY frame gets sent.

We were less certain about truncating vs sending nothing when it's too long. Someone might never see their data get too long until the magic day that it does. So do we send partial data, or none at all? Partial XML or JSON would turn into a parse error, but a truncated log ... it might not be clear that it's been truncated and just be misleading. Options I can think of:

  1. send no debug-data
  2. truncate debug-data
  3. truncate and write "TRUNCATED" as the last 9 bytes?

Thoughts? We can always change this later. It's not exposed all the way out to a public API yet.

Contributor (author):

Jonathan chimed in below and voted again for no data

Contributor:

I agree, thanks for clarifying.


/* If for whatever reason this new header is bigger than the total table size, burn everything to the ground. */
if (AWS_UNLIKELY(header_size > context->dynamic_table.max_size)) {
/* #TODO handle this. It's not an error. It should simply result in an empty table RFC-7541 4.4 */
Contributor:

Is it hard to clear the table? What makes this a TODO?

Contributor (author):

I wasn't touching the table code in this PR. I just traced my way in here and got spooked, so I left a TODO.
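
For reference, the RFC 7541 section 4.4 behavior the TODO points at could look roughly like this (illustrative structures only, not the real hpack.c dynamic table, which stores actual header fields rather than just sizes):

#include <stddef.h>
#include <string.h>

#define EXAMPLE_MAX_ENTRIES 32

struct example_dynamic_table {
    size_t entry_sizes[EXAMPLE_MAX_ENTRIES]; /* index 0 = oldest entry */
    size_t num_entries;
    size_t size;     /* sum of entry sizes, per RFC 7541 4.1 */
    size_t max_size; /* current SETTINGS_HEADER_TABLE_SIZE limit */
};

static void s_example_evict_oldest(struct example_dynamic_table *table) {
    table->size -= table->entry_sizes[0];
    table->num_entries -= 1;
    memmove(table->entry_sizes, table->entry_sizes + 1, table->num_entries * sizeof(size_t));
}

/* RFC 7541 4.4: evict entries until the new one fits. An entry larger than
 * the entire table simply leaves the table empty; it is NOT an error. */
static void s_example_insert(struct example_dynamic_table *table, size_t entry_size) {
    while (table->num_entries > 0 && table->size + entry_size > table->max_size) {
        s_example_evict_oldest(table);
    }
    if (entry_size > table->max_size || table->num_entries == EXAMPLE_MAX_ENTRIES) {
        return; /* nothing gets added; table may now be empty */
    }
    table->entry_sizes[table->num_entries] = entry_size;
    table->num_entries += 1;
    table->size += entry_size;
}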

Contributor:

👻

@justinboswell (Contributor):

Pulling out all those checks is NIIIIICE

@JonathanHenson (Contributor) left a comment:

Damn good work!

struct aws_h2_frame *aws_h2_frame_new_ping(
struct aws_allocator *allocator,
bool ack,
const uint8_t opaque_data[AWS_H2_PING_DATA_SIZE]);
Contributor:

you sure this wouldn't be better as a byte_cursor?

Contributor (author):

It needs to be exactly 8 bytes.
I thought this was a cool way to enforce that.
This isn't a public API anyway
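
For illustration, a call site in this style might look like the sketch below. s_example_send_ping is hypothetical and the allocator is assumed to come from elsewhere; aws_h2_frame_new_ping and AWS_H2_PING_DATA_SIZE are the declarations quoted above:

/* Assumes <stdint.h>, <stdbool.h>, and the header declaring aws_h2_frame_new_ping are included. */
static struct aws_h2_frame *s_example_send_ping(struct aws_allocator *allocator) {
    uint8_t opaque_data[AWS_H2_PING_DATA_SIZE] = {0}; /* must be exactly AWS_H2_PING_DATA_SIZE (8) bytes */
    return aws_h2_frame_new_ping(allocator, false /* ack */, opaque_data);
}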

Contributor:

👍

* This only controls how string values are encoded when they're not already in a table.
*/
enum aws_hpack_huffman_mode {
AWS_HPACK_HUFFMAN_SMALLEST,
Contributor:

lol, I don't know why we'd want ALWAYS at this point. I like this feature.

Contributor (author):

Yeah, the ALWAYS option is just in there so we can test against some of the samples in RFC-7541

[AWS_H2_SETTINGS_HEADER_TABLE_SIZE] = 4096,
[AWS_H2_SETTINGS_ENABLE_PUSH] = 1,
[AWS_H2_SETTINGS_MAX_CONCURRENT_STREAMS] = UINT32_MAX, /* "Initially there is no limit to this value" */
[AWS_H2_SETTINGS_INITIAL_WINDOW_SIZE] = 65535,
Contributor:

NIT: maybe make these named constants that signify their meaning?

Contributor (author):

That's what I was doing? You'd access them like so:
aws_h2_settings_initial[AWS_H2_SETTINGS_HEADER_TABLE_SIZE]

instead of:
AWS_H2_HEADER_TABLE_SIZE_INITIAL

I thought it would be more "programmable" to have them in an array.
I'll change it in the future if it's just a pain to read
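
For readers coming to this later, the pattern under discussion looks roughly like the sketch below. The EXAMPLE_ names are illustrative (the real identifiers live in this repo's HTTP/2 headers); the initial values are the RFC 7540 section 6.5.2 defaults shown in the diff above:

#include <stdint.h>

enum example_h2_setting {
    EXAMPLE_H2_SETTINGS_HEADER_TABLE_SIZE = 0x1,
    EXAMPLE_H2_SETTINGS_ENABLE_PUSH = 0x2,
    EXAMPLE_H2_SETTINGS_MAX_CONCURRENT_STREAMS = 0x3,
    EXAMPLE_H2_SETTINGS_INITIAL_WINDOW_SIZE = 0x4,
    EXAMPLE_H2_SETTINGS_END_RANGE,
};

/* Initial values from RFC 7540 6.5.2, indexed by setting identifier. */
static const uint32_t example_h2_settings_initial[EXAMPLE_H2_SETTINGS_END_RANGE] = {
    [EXAMPLE_H2_SETTINGS_HEADER_TABLE_SIZE] = 4096,
    [EXAMPLE_H2_SETTINGS_ENABLE_PUSH] = 1,
    [EXAMPLE_H2_SETTINGS_MAX_CONCURRENT_STREAMS] = UINT32_MAX, /* "initially there is no limit" */
    [EXAMPLE_H2_SETTINGS_INITIAL_WINDOW_SIZE] = 65535,
};

/* Looping over every setting (e.g. to build the initial SETTINGS frame) stays trivial: */
static uint32_t s_example_initial_value(enum example_h2_setting setting) {
    return example_h2_settings_initial[setting];
}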

uint8_t pad_length; /* Set to 0 to disable AWS_H2_FRAME_F_PADDED */

/* HEADERS-only data */
bool end_stream; /* AWS_H2_FRAME_F_END_STREAM */
Contributor:

trivial, maybe take struct packing into account here and group the 1 byte sized members into groups of 2 or 4, or move them to the end

Contributor (author):

I shuffled them a bit,
I didn't want to lose the grouping by category
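
Generic illustration of the packing concern (nothing specific to this struct; sizes assume a typical 64-bit ABI):

#include <stdbool.h>
#include <stdint.h>

struct example_interleaved {
    uint8_t pad_length; /* 1 byte + 7 bytes padding */
    const char *name;   /* 8 bytes */
    bool end_stream;    /* 1 byte + 7 bytes padding */
};                      /* typically 24 bytes */

struct example_grouped {
    const char *name;   /* 8 bytes */
    uint8_t pad_length; /* 1 byte */
    bool end_stream;    /* 1 byte + 6 bytes padding */
};                      /* typically 16 bytes */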

flags |= AWS_H2_FRAME_F_END_HEADERS;
} else {
/* If we're not finishing the header-block, is it even worth trying to send this frame now? */
const size_t even_worth_sending_threshold = s_frame_prefix_length + payload_overhead;
Contributor:

We could probably make this a bit smarter in the future. Currently this is just: do we at least have enough space to send a valid but empty HEADERS frame?

Contributor (author):

Oh, I can easily do that.
I was trying to avoid sending a 9-byte HEADERS frame that had no actual header-data in it, knowing that the next time we try to encode there will be a fresh new aws_io_message with tons of space.

Do you think it's valuable to send the valid-but-empty frame earlier?


return AWS_OP_SUCCESS;
/* Write as much of the pre-encoded frame as will fit */
size_t chunk_len = aws_min_size(frame->send_progress.len, output->capacity - output->len);
Contributor:

why isn't this ALWAYS zero? I think Justin and I had the same question... maybe some comments on this somewhere for future posterity?

Contributor (author):

I changed some variable names and added comments.
Hope this is less confusing now

@justinboswell (Contributor) left a comment:

Happy with the updates, just one last issue

@justinboswell (Contributor) left a comment:

PERFECT 👨‍🍳 💋

@graebm graebm merged commit 64aa5fb into master Mar 20, 2020
@graebm graebm deleted the encoder-revamp branch March 20, 2020 17:58