HTTP/2 to origin #7622

shinrich · 2021-03-25T17:01:30Z

This is work that has been going on in fits and starts for a couple years with contributions from @a-canary and Kees. And discussions with @masaori335 and @maskit. I have been using the project page https://github.com/apache/trafficserver/projects/9 to track issues and progress.

This latest effort builds upon @a-canary's earlier PoolableSession restructuring work and has been going in earnest since last December. I have run a version of this code against our 9.0.x (plus some other HTTP/2 commits to ease the merge) on one of our production boxes for several weeks. That worked out many memory management issues. In that environment it negotiated HTTP/2 with another ATS layer and an istio/envoy layer. About 20% of our outbound transactions were over H2.

I've just brought it back to master and rebased for this PR. Locally on my Centos7 box is passes autest, unit tests, and regression tests.

This code contains trailer logic to support gRPC. I have run versions of this code against one of the python examples in the gRPC repository, but that was last done in January. The gRPC/trailer support needs more testing, but it is a start.

Some key code changes include

ConnectionPool/ConnectingEntry - This data structure tracks existing outstanding connection requests. For HTTP/2, you may receive many simultaneous incoming streams all heading for the same location. You do not want to start separate HTTP/2 sessions for each. The code uses to history in the HostInfo to determine if it is likely that the origin will negotiate HTTP/2 and use the ConnnectionPool or connect directly ignoring other outstanding connection requests.
Server_session becomes server_txn in HttpSM.
An overridable ALPN setting has been added to control whether ATS will offer to negotiate H2 with the origin.
Two h2 origin autests have been added leveraging proxy-verifier.

The main code change is adding Http2 versions of transactions and sessions, and creating a Http1 server transaction. The class hierarchies differ slightly between Http1 and Http2. For Http1, both Http1ClientTransaction and Http1ServerTransaction inherit directly from ProxyTransaction.

For Http2, there is still only the Http2Stream to represent both client and server transactions. This PR adds an _outbound_flag to track the few cases where we do need to process inbound and outbound transactions differently. For the sessions, the common elements of session processing are moved into Http2CommonSession. The common class is used as a mix in for Http2ClientSession and Http2ServerSession. Http2ClientSession continues to inherit directly from ProxySession. Http2ServerSession inherits from PoolableSession as the Http1ServerSession does.

One other thing to point out, is that the Http2ServerSessions can only be put in a Thread session pool. I tested in hybrid mode, but pure thread pool mode should also work. I need to see what happens with global pool mode. That will likely require an addition check, warning, or error.

This branch also includes connection failure cleanups identified in PR #7580. We are running with that change in our production environment. Pulling it back out to make the master PR seemed unnecessary since it will hopefully land first.

shinrich · 2021-03-25T17:15:18Z

[approve ci clang-format]

shinrich · 2021-03-25T22:00:47Z

[approve ci clang-analyzer]

maskit

I haven't reached to the main change for H2 to Origin.

maskit · 2021-03-26T02:38:46Z

doc/admin-guide/files/records.config.en.rst

+   Set the alpn string that ATS will send to origin during new connections.  By default no ALPN string will be set.
+   To enable HTTP/2 communication to the origin, set this to "h2,http1.1".


I'd like to know why it has no ALPN string by default. Having "http1.1" (or "http1.1,http1.0") by default and adding "h2" to enable HTTP/2 sound more natural to me. Also, it may be better to mention about the order.

What will we do if we support H3 to origin? Will we internally filter the string to not offer protocols that are not supported on a transport, or have another setting for QUIC transport?

I'm not sure if users want to specify raw ALPN protocol names. For server_ports, we don't use it.

I don't think this is really going to be a problem, but ALPN protocol name may have commas and any characters. The spec says protocol names are non-empty byte strings.

Good questions. I kept it empty by default so the initial integration would have less impact. Of course, there are enough other code changes that landing this PR potentially have an impact even if you don't set the ALPN strings.

We need to have a more general discussion about the preferred order, etc. Should probably look how other similar proxies handle this.

I don't see the benefit of defaulting to "http1.1". To add HTTP/2, it would still be necessary to specific "http2,http1.1" in the configuration. I agree with Susan's view that, at least initially, this should be backwards compatible in that no HTTP/2 is done outbound unless explicitly configured. That's how HTTP/2 inbound was done - it originally required an explicit "enable" configuration variable to be set.

I understand having "http1.1" by default doesn't make much sense in terms of the behavior, and I know no ALPN negotiation is the current behavior. But if I was not familiar with ALPN, I'd wonder why I have to have "http1.1"as well to enable HTTP/2. I might write only "h2" and say "Aha, H2 works. Seems like http/1.1 is not necessary.". Also listing numbers in descending order seem unnatural if I thought it's just a list. That's why I'm not keen on having this as ALPN setting.

iocore/net/I_NetVConnection.h

iocore/net/SSLConfig.cc

proxy/PoolableSession.h

proxy/ProxySession.cc

maskit · 2021-03-26T07:56:20Z

proxy/hdrs/HdrToken.cc

+  // Need to figure out why this cannot be handled as hop by hop.  If it is hop-by-hop
+  // the information is not propagated for gRPC
+  //{"TE", MIME_SLOTID_TE, MIME_PRESENCE_TE, (HTIF_COMMAS | HTIF_MULTVALS | HTIF_HOPBYHOP)},
+  {"TE", MIME_SLOTID_TE, MIME_PRESENCE_TE, (HTIF_COMMAS | HTIF_MULTVALS)},


Assuming this is not harmful for non-gRPC traffic. Having gRPC support may be worth temporally removing hop-by-hop flag, but in case we do that, we should file an issue for it.

Making this change separately would make reverting the change easy in case it affects non-gRPC traffic.

proxy/http/Http1ClientTransaction.cc

proxy/http/Http1ClientTransaction.h

proxy/http/HttpConfig.cc

proxy/http/HttpSessionManager.cc

shinrich · 2021-04-01T17:16:30Z

Pushed an updated version of the branch based on fixes from my most recent 9.x based h2 to origin testing.

Will be working on transitioning this to a feature branch as requested in Monday's meeting. And will be scheduling a PR walk through for next week.

maskit

Added comments. Basically just pointing out something that are not related H2 to Origin.

iocore/net/UnixNetVConnection.cc

proxy/http/HttpTransact.h

maskit · 2021-03-29T07:25:08Z

proxy/http2/HTTP2.cc

      hdr->field_find(MIME_FIELD_KEEP_ALIVE, MIME_LEN_KEEP_ALIVE) != nullptr ||
      hdr->field_find(MIME_FIELD_PROXY_CONNECTION, MIME_LEN_PROXY_CONNECTION) != nullptr ||
-      hdr->field_find(MIME_FIELD_TRANSFER_ENCODING, MIME_LEN_TRANSFER_ENCODING) != nullptr ||
+      // hdr->field_find(MIME_FIELD_TRANSFER_ENCODING, MIME_LEN_TRANSFER_ENCODING) != nullptr ||


Assuming this is related to gRPC. If normal H2 requests work fine with this line, I'd suggest making this change separately after this big change. That would allow us to back out only gRPC support but not entire H2 to Origin support.

Backed this out. Wasn't sure why I did this. My use of transfer_encoding headers evolved during this work. Ending up with out using transfer encoding at all for HTTP/2.

proxy/http/HttpSessionManager.cc

proxy/http/HttpTunnel.cc

proxy/logging/LogAccess.cc

src/tscore/BufferWriterFormat.cc

tests/gold_tests/continuations/openclose.test.py

tests/gold_tests/redirect/redirect_actions.test.py

shinrich · 2021-04-06T14:58:21Z

include/tscore/Scalar.h

+{
+  struct generic;
+}
+


Needs to be in the ts namespace to avoid conflicts.

@SolidWallOfCode pulled this into a separate PR which has landed. PR #7690

shinrich · 2021-04-07T13:07:00Z

[approve ci autest]

maskit · 2021-06-04T06:28:32Z

proxy/http/HttpSM.cc

+  SSLNetVConnection *sslnetvc = dynamic_cast<SSLNetVConnection *>(netvc);
+  if (sslnetvc) {
+    SSL_get0_alpn_selected(sslnetvc->ssl, &proto, &proto_length);
+  }


This could be done by ALPNSupport.

maskit · 2021-06-04T06:29:29Z

proxy/http/HttpSM.cc

+  if (proto_length == 2 && memcmp(proto, "h2", 2) == 0) {
+    Http2ServerSession *session = http2ServerSessionAllocator.alloc();
+    add_session                 = true;
+    retval                      = session;
+  } else {
+    Http1ServerSession *session = httpServerSessionAllocator.alloc();
+    retval                      = session;
+  }


I understand we need to do this here, but I don't want to have the detail here in HttpSM. We should probably have a mechanism like ProtocolProbeSessionAccept.

pulled out into a static method.

maskit · 2021-06-04T06:56:07Z

proxy/http/HttpSessionManager.cc

+    } else {
+      if (first != m_fqdn_pool.end()) {


maskit · 2021-06-04T07:07:59Z

proxy/logging/LogAccess.cc


  if (m_http_sm) {
-    auto txn = m_http_sm->get_ua_txn();
+    auto txn = m_http_sm->ua_txn;


Let's use the getter.

maskit · 2021-06-04T07:27:10Z

proxy/http/HttpSessionManager.cc

    EThread *ethread = this_ethread();
    Ptr<ProxyMutex> pool_mutex =
-      (TS_SERVER_SESSION_SHARING_POOL_THREAD == pool_type) ? ethread->server_session_pool->mutex : m_g_pool->mutex;
+      (TS_SERVER_SESSION_SHARING_POOL_THREAD == sm->t_state.http_config_param->server_session_sharing_pool) ?


pool_type that is passed by the caller is used after acquiring the lock. Doesn't it cause mismatch?

Good point. That does seem to be an error.

maskit · 2021-06-04T08:36:52Z

proxy/http/HttpSM.cc

+      // Did we end up with a multiplexing session?
+      int count = 0;
+      if (new_session->is_multiplexing()) {
+        // Hand off to all queued up ConnectSM's.


How do you know a multiplexed connection can process all the requests? An origin server may stop accepting new requests on a connection for some reasons (e.g. max requests per conn to mitigate memory pressure due to memory leak), and the queued up requests would just fail where those could be processed successfully on H1 if the server still accepts new connections.

Also, there's no guarantee a multiplexed connection can process all the requests at once. The number of H2 concurrent stream might be set to 1 for some reasons.

In that sense, whether a connection is multiplexing is not important. Non-multiplexed connection could be handled as a multiplexed connection that only process 1 request at once.

You don't know. There is logic before requests are queued that tried to determine if the origin connection may be close to its limit of simultaneous active transactions (In the H2 case using a function of the current setting for max_concurrent_streams)

You mean is_peer_concurrent_stream_max and add_session, right? That is fine (except direct access to the pool), but we have a session object here and could check the limit again to not assign too many transactions to the session.

And that should work for the both multiplexed and non-multiplexed sessions. I'm trying to suggest something like below:

while (!_connect_sms.empty()) { sm = _sm_connect_sms->front(); succeeded = new_session->assign_transaction(sm); // Returns false if a session has too many txn assigned if (succeeded) { _connect_sms.pop_front(); } else { break; } } if (!_connect_sms.empty()) { // Prepare for remaining transactions, and wait for next opportunity // For H1, create another session // For H2, maybe do nothing (I'm not sure if we should have multiple sessions for the same server) }

Also, I know this is going to be complicated, but we can retry a transaction on another session if an H2 transaction was closed by RST_STREAM with REFUSED_STREAM error code.

I agree, we should be able to test to see if we successfully created the server_txn and kick off a request for a new server session if not. I think we will want to be able to create multiple sessions to the same server. In the case of virtual IP's it will allow for better load distribution. Also relying on a single TCP connection to origin may worry some.

I do think it makes sense to differentiate the multiplexing (H2/H3) and non-multiplexing (HTTP/1.x) protocols in the ConnectingEntry::state_http_server_open handler. We know that the non-multiplexing protocols can only support 1 server_txn at a time. So we try not to queue up on an existing connection request if we think the protocol is non-multiplexing and just make multiple connection requests in parallel. If we guess wrong and queue up, we should process once of the queued requests and kick off the others to make session requests directly (not queued). Which is what the existing logic does.

So I'll add logic here to check that the server_txn was successfully created in the multiplexed case. If it was not, we will kick off another session request and queue up the requests there.

Added logic in HttpSM::state_http_server_open to detect if the transaction creation failed and retry the session if so.

I think we will want to be able to create multiple sessions to the same server. In the case of virtual IP's it will allow for better load distribution. Also relying on a single TCP connection to origin may worry some.

I feel the same way, but RFC 7540 says:

A client MAY open multiple connections to the same IP address and TCP
port using different Server Name Indication [TLS-EXT] values or to
provide different TLS client certificates but SHOULD avoid creating
multiple connections with the same configuration.

If we can open as many connections as we want, we will eventually make multiple connections to the same server through VIP, and then max_concurrent_stream doesn't make much sense.

I'm not sure if I understand the entire process. Let's say we got 5 requests and all the requests will be send to the same origin server (or origin servers that have the same VIP). The origin server supports H2, and max_concurrent_stream is 2. What will happen?

Assuming 5 simultaneous (overlapping requests). 2 transactions will be made against the first HTTP/2 connection. Then a new session will be created and 2 more transactions will be made there. Then a third session session will be opened.

Actually, I think the code will pull new transactions under the reported max_concurrent_streams setting.

Actually, I think the code will pull new transactions under the reported max_concurrent_streams setting.

I don't really understand this part. On the example above, the origin server tells us "max_concurrent_streams is 2" on the first connection. The origin server doesn't want to take the third request until either of the first two transactions ends. But we exploit a loophole and start third request by opening a new connection. I think this is the scenario that the paragraph on RFC7540 tries to avoid.

It looks like hops or proxies were not fully considered in writing the spec.
From a proxy's POV I really can imagine wanting to have multiple H2 connections to same origin, as you might be forward proxying a large organisation hiding behind a single outgoing IP or a busy reverse proxying in front of an origin.
In those cases HOL blocking or too many interleaved response streams really might become a problem in user experience.

I think we should have sane values (max h2 connections per server, configurable?) and then act upon the hints the H2 protocol will give us.

shinrich · 2021-06-08T01:15:10Z

[approve ci autest]

maskit · 2021-06-08T06:05:18Z

Let's separate out the outbound ALPN part. We still need to discuss how to enable protocols, but the feature itself should work without H2 (e.g. use http/1.1 but not 1.0), and it should be testable.

shinrich · 2021-06-08T12:42:36Z

I'll look at breaking out the alpn logic. It didn't occur to me you could use alpn to negotiate between HTTP/1.1 and HTTP/1.0 but I guess you can.

shinrich · 2021-07-20T21:49:45Z

In testing, I found that I needed to defer the assignment of the stream_id for an outbound stream until it was ready to actually send a header. Otherwise, I was getting GOAWAY frames from the next level ATS box with error code set to STREAM_CLOSED. Presumably, another stream was getting assigned a later stream ID and sending its header out first. So when the slower stream sends its header, the origin ATS box fails it because the stream ID is smaller than the "latest" seen stream id.

shinrich · 2021-07-26T15:12:46Z

In my production testing, I am having problems with increased ERR_CONN_FAIL from another ATS box acting as origin as described in issue #8163. Adjusting the error rate up helped some, but the ERR_CONN_FAIL was still significantly higher for that origin. The mechanism was the same. The origin was returning GOAWAY with no error which caused all other active streams to be terminated.

Finally, Friday I notice that the H2 client logic was using the "Connection: closed" header as described below to start draining the session.

https://docs.trafficserver.apache.org/en/latest/admin-guide/plugins/header_rewrite.en.html?highlight=drain#close-connections-for-draining

Unfortunately, I cannot get the logic to keep around existing streams until the are finished working. I just move my ERR_CONN_FAIL errors to ERR_CLIENT_ABORT or ERR_READ_TIMEOUT errors. With recent changes to improve header error checking, I've been adding "Connection: close" headers not worrying about the HTTP/2 case since H2 should be ignoring the Connection header. Forgetting about this draining feature.

Not sure how to deal with this. Could add H1/H2 checks into the HttpSM to only really add the "Connection: close" for Http/1 connections. Or more likely add a virtual method onto the ProxyTransaction to do that logic.

This reverts commit 4208ecb. Reverting the removal of the incompatible changes that we took out of master so that those incompatible changes will be in our 10-Dev branch.

keesspoelstra · 2021-10-07T13:25:41Z

proxy/http/HttpSessionManager.cc

    if (zret == HSM_DONE) {
      to_return = first;
-      this->removeSession(to_return);
+      if (!to_return->is_multiplexing()) {


We probably want an early check in the while statement above to see if the session is maxed out, this to prevent ending up much later in the flow and finding out we can't add any more streams

keesspoelstra · 2021-10-12T15:58:30Z

doc/admin-guide/files/records.config.en.rst


   Enables (``1``) or disables (``0``) TLSv1_3 in the ATS client context. If not specified, enabled by default

+.. ts:cv:: CONFIG proxy.config.ssl.client.alpn_protocol STRING ""


Typo, should be
proxy.config.ssl.client.alpn_protocols

bryancall · 2021-10-14T22:11:22Z

[approve ci]

shinrich · 2021-10-24T21:13:02Z

Rebased against 10-Dev branch. I need to close this PR and create another against 10-Dev. Will coordinate with @bneradt to do that. Need to sync up my last Yahoo testing changes too.

shinrich · 2021-10-25T11:10:50Z

Closing in deference to PR #8447 against the 10-Dev branch.

shinrich added HTTP/2 Server Session labels Mar 25, 2021

shinrich added this to the 10.0.0 milestone Mar 25, 2021

shinrich requested review from a-canary, masaori335 and maskit March 25, 2021 17:01

shinrich self-assigned this Mar 25, 2021

shinrich force-pushed the server-h2-origin branch from 2c479a0 to 67a3749 Compare March 25, 2021 17:09

maskit reviewed Mar 26, 2021

View reviewed changes

This was referenced Mar 26, 2021

Add pooled_server_connections metric #7627

Merged

Expose URL element methods through HTTPHdr #7628

Merged

Add default implementation for allow_half_open #7630

Merged

Fix HttpSessionManager::acquireSession from previous rebase error #7631

Merged

shinrich commented Mar 26, 2021

View reviewed changes

proxy/http/HttpSessionManager.cc Show resolved Hide resolved

shinrich force-pushed the server-h2-origin branch from 038038c to ea2f26b Compare March 29, 2021 13:52

shinrich mentioned this pull request Mar 29, 2021

Adapt to continuation changes in write event handling #7578

Closed

shinrich force-pushed the server-h2-origin branch 2 times, most recently from 7c509e2 to 1ab826d Compare April 1, 2021 17:14

shinrich force-pushed the server-h2-origin branch from 1ab826d to 20f07a4 Compare April 1, 2021 23:46

maskit reviewed Apr 2, 2021

View reviewed changes

shinrich force-pushed the server-h2-origin branch from 20f07a4 to 828e6cd Compare April 5, 2021 16:33

This was referenced Apr 5, 2021

Add close header normalize openclose test #7679

Merged

Add sqpv log field for server protocol #7680

Merged

shinrich force-pushed the server-h2-origin branch from d0c7eaf to 6461bd7 Compare April 6, 2021 13:59

shinrich commented Apr 6, 2021

View reviewed changes

shinrich force-pushed the server-h2-origin branch from 9d29fe2 to 7749d78 Compare April 6, 2021 17:42

masaori335 mentioned this pull request May 18, 2021

⛱️ HTTP/2 Session Refactoring #7852

Open

shinrich force-pushed the server-h2-origin branch from 8551f4f to b577b1a Compare June 3, 2021 19:51

maskit reviewed Jun 4, 2021

View reviewed changes

shinrich force-pushed the server-h2-origin branch from 6d461d8 to ca22d2a Compare June 8, 2021 18:44

bryancall added the New Feature label Jun 25, 2021

shinrich mentioned this pull request Jul 26, 2021

Abstract adding Connection: close header to avoid triggering H2 draining logic #8178

Merged

shinrich force-pushed the server-h2-origin branch from ca22d2a to b638064 Compare August 19, 2021 21:14

Revert "Remove incompatible changes for 9.2.0 (apache#8316)"

4deeb25

This reverts commit 4208ecb. Reverting the removal of the incompatible changes that we took out of master so that those incompatible changes will be in our 10-Dev branch.

keesspoelstra reviewed Oct 7, 2021

View reviewed changes

keesspoelstra reviewed Oct 12, 2021

View reviewed changes

Http2 to origin.

b13ddc3

shinrich force-pushed the server-h2-origin branch from b638064 to b13ddc3 Compare October 24, 2021 21:11

Fix documentation typo

d40fb86

shinrich mentioned this pull request Oct 25, 2021

H2 to origin for 10-Dev branch #8447

Closed

shinrich closed this Oct 25, 2021

bneradt mentioned this pull request Oct 25, 2021

Http2 to origin 10 dev #8448

Closed

zwoop removed this from the 10.0.0 milestone Nov 8, 2021

		Set the alpn string that ATS will send to origin during new connections. By default no ALPN string will be set.
		To enable HTTP/2 communication to the origin, set this to "h2,http1.1".


		Enables (``1``) or disables (``0``) TLSv1_3 in the ATS client context. If not specified, enabled by default

		.. ts:cv:: CONFIG proxy.config.ssl.client.alpn_protocol STRING ""

HTTP/2 to origin #7622

HTTP/2 to origin #7622

Uh oh!

Conversation

shinrich commented Mar 25, 2021

Uh oh!

shinrich commented Mar 25, 2021

Uh oh!

shinrich commented Mar 25, 2021

Uh oh!

maskit left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

shinrich commented Apr 1, 2021

Uh oh!

maskit left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

shinrich commented Apr 7, 2021

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

maskit commented Jun 8, 2021 •

edited

Loading

shinrich commented Oct 24, 2021 •

edited

Loading