-
Notifications
You must be signed in to change notification settings - Fork 851
HTTP/2 to origin #7622
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HTTP/2 to origin #7622
Conversation
2c479a0 to
67a3749
Compare
|
[approve ci clang-format] |
|
[approve ci clang-analyzer] |
maskit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't reached to the main change for H2 to Origin.
| Set the alpn string that ATS will send to origin during new connections. By default no ALPN string will be set. | ||
| To enable HTTP/2 communication to the origin, set this to "h2,http1.1". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to know why it has no ALPN string by default. Having "http1.1" (or "http1.1,http1.0") by default and adding "h2" to enable HTTP/2 sound more natural to me. Also, it may be better to mention about the order.
What will we do if we support H3 to origin? Will we internally filter the string to not offer protocols that are not supported on a transport, or have another setting for QUIC transport?
I'm not sure if users want to specify raw ALPN protocol names. For server_ports, we don't use it.
I don't think this is really going to be a problem, but ALPN protocol name may have commas and any characters. The spec says protocol names are non-empty byte strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good questions. I kept it empty by default so the initial integration would have less impact. Of course, there are enough other code changes that landing this PR potentially have an impact even if you don't set the ALPN strings.
We need to have a more general discussion about the preferred order, etc. Should probably look how other similar proxies handle this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the benefit of defaulting to "http1.1". To add HTTP/2, it would still be necessary to specific "http2,http1.1" in the configuration. I agree with Susan's view that, at least initially, this should be backwards compatible in that no HTTP/2 is done outbound unless explicitly configured. That's how HTTP/2 inbound was done - it originally required an explicit "enable" configuration variable to be set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand having "http1.1" by default doesn't make much sense in terms of the behavior, and I know no ALPN negotiation is the current behavior. But if I was not familiar with ALPN, I'd wonder why I have to have "http1.1"as well to enable HTTP/2. I might write only "h2" and say "Aha, H2 works. Seems like http/1.1 is not necessary.". Also listing numbers in descending order seem unnatural if I thought it's just a list. That's why I'm not keen on having this as ALPN setting.
| // Need to figure out why this cannot be handled as hop by hop. If it is hop-by-hop | ||
| // the information is not propagated for gRPC | ||
| //{"TE", MIME_SLOTID_TE, MIME_PRESENCE_TE, (HTIF_COMMAS | HTIF_MULTVALS | HTIF_HOPBYHOP)}, | ||
| {"TE", MIME_SLOTID_TE, MIME_PRESENCE_TE, (HTIF_COMMAS | HTIF_MULTVALS)}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming this is not harmful for non-gRPC traffic. Having gRPC support may be worth temporally removing hop-by-hop flag, but in case we do that, we should file an issue for it.
Making this change separately would make reverting the change easy in case it affects non-gRPC traffic.
038038c to
ea2f26b
Compare
7c509e2 to
1ab826d
Compare
|
Pushed an updated version of the branch based on fixes from my most recent 9.x based h2 to origin testing. Will be working on transitioning this to a feature branch as requested in Monday's meeting. And will be scheduling a PR walk through for next week. |
1ab826d to
20f07a4
Compare
maskit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comments. Basically just pointing out something that are not related H2 to Origin.
| hdr->field_find(MIME_FIELD_KEEP_ALIVE, MIME_LEN_KEEP_ALIVE) != nullptr || | ||
| hdr->field_find(MIME_FIELD_PROXY_CONNECTION, MIME_LEN_PROXY_CONNECTION) != nullptr || | ||
| hdr->field_find(MIME_FIELD_TRANSFER_ENCODING, MIME_LEN_TRANSFER_ENCODING) != nullptr || | ||
| // hdr->field_find(MIME_FIELD_TRANSFER_ENCODING, MIME_LEN_TRANSFER_ENCODING) != nullptr || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming this is related to gRPC. If normal H2 requests work fine with this line, I'd suggest making this change separately after this big change. That would allow us to back out only gRPC support but not entire H2 to Origin support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Backed this out. Wasn't sure why I did this. My use of transfer_encoding headers evolved during this work. Ending up with out using transfer encoding at all for HTTP/2.
20f07a4 to
828e6cd
Compare
d0c7eaf to
6461bd7
Compare
| { | ||
| struct generic; | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs to be in the ts namespace to avoid conflicts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SolidWallOfCode pulled this into a separate PR which has landed. PR #7690
9d29fe2 to
7749d78
Compare
|
[approve ci autest] |
8551f4f to
b577b1a
Compare
proxy/http/HttpSM.cc
Outdated
| SSLNetVConnection *sslnetvc = dynamic_cast<SSLNetVConnection *>(netvc); | ||
| if (sslnetvc) { | ||
| SSL_get0_alpn_selected(sslnetvc->ssl, &proto, &proto_length); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be done by ALPNSupport.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
proxy/http/HttpSM.cc
Outdated
| if (proto_length == 2 && memcmp(proto, "h2", 2) == 0) { | ||
| Http2ServerSession *session = http2ServerSessionAllocator.alloc(); | ||
| add_session = true; | ||
| retval = session; | ||
| } else { | ||
| Http1ServerSession *session = httpServerSessionAllocator.alloc(); | ||
| retval = session; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand we need to do this here, but I don't want to have the detail here in HttpSM. We should probably have a mechanism like ProtocolProbeSessionAccept.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pulled out into a static method.
proxy/http/HttpSessionManager.cc
Outdated
| } else { | ||
| if (first != m_fqdn_pool.end()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
else if
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
proxy/logging/LogAccess.cc
Outdated
|
|
||
| if (m_http_sm) { | ||
| auto txn = m_http_sm->get_ua_txn(); | ||
| auto txn = m_http_sm->ua_txn; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use the getter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
proxy/http/HttpSessionManager.cc
Outdated
| EThread *ethread = this_ethread(); | ||
| Ptr<ProxyMutex> pool_mutex = | ||
| (TS_SERVER_SESSION_SHARING_POOL_THREAD == pool_type) ? ethread->server_session_pool->mutex : m_g_pool->mutex; | ||
| (TS_SERVER_SESSION_SHARING_POOL_THREAD == sm->t_state.http_config_param->server_session_sharing_pool) ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pool_type that is passed by the caller is used after acquiring the lock. Doesn't it cause mismatch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. That does seem to be an error.
| // Did we end up with a multiplexing session? | ||
| int count = 0; | ||
| if (new_session->is_multiplexing()) { | ||
| // Hand off to all queued up ConnectSM's. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you know a multiplexed connection can process all the requests? An origin server may stop accepting new requests on a connection for some reasons (e.g. max requests per conn to mitigate memory pressure due to memory leak), and the queued up requests would just fail where those could be processed successfully on H1 if the server still accepts new connections.
Also, there's no guarantee a multiplexed connection can process all the requests at once. The number of H2 concurrent stream might be set to 1 for some reasons.
In that sense, whether a connection is multiplexing is not important. Non-multiplexed connection could be handled as a multiplexed connection that only process 1 request at once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't know. There is logic before requests are queued that tried to determine if the origin connection may be close to its limit of simultaneous active transactions (In the H2 case using a function of the current setting for max_concurrent_streams)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean is_peer_concurrent_stream_max and add_session, right? That is fine (except direct access to the pool), but we have a session object here and could check the limit again to not assign too many transactions to the session.
And that should work for the both multiplexed and non-multiplexed sessions. I'm trying to suggest something like below:
while (!_connect_sms.empty()) {
sm = _sm_connect_sms->front();
succeeded = new_session->assign_transaction(sm); // Returns false if a session has too many txn assigned
if (succeeded) {
_connect_sms.pop_front();
} else {
break;
}
}
if (!_connect_sms.empty()) {
// Prepare for remaining transactions, and wait for next opportunity
// For H1, create another session
// For H2, maybe do nothing (I'm not sure if we should have multiple sessions for the same server)
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I know this is going to be complicated, but we can retry a transaction on another session if an H2 transaction was closed by RST_STREAM with REFUSED_STREAM error code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, we should be able to test to see if we successfully created the server_txn and kick off a request for a new server session if not. I think we will want to be able to create multiple sessions to the same server. In the case of virtual IP's it will allow for better load distribution. Also relying on a single TCP connection to origin may worry some.
I do think it makes sense to differentiate the multiplexing (H2/H3) and non-multiplexing (HTTP/1.x) protocols in the ConnectingEntry::state_http_server_open handler. We know that the non-multiplexing protocols can only support 1 server_txn at a time. So we try not to queue up on an existing connection request if we think the protocol is non-multiplexing and just make multiple connection requests in parallel. If we guess wrong and queue up, we should process once of the queued requests and kick off the others to make session requests directly (not queued). Which is what the existing logic does.
So I'll add logic here to check that the server_txn was successfully created in the multiplexed case. If it was not, we will kick off another session request and queue up the requests there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added logic in HttpSM::state_http_server_open to detect if the transaction creation failed and retry the session if so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we will want to be able to create multiple sessions to the same server. In the case of virtual IP's it will allow for better load distribution. Also relying on a single TCP connection to origin may worry some.
I feel the same way, but RFC 7540 says:
A client MAY open multiple connections to the same IP address and TCP
port using different Server Name Indication [TLS-EXT] values or to
provide different TLS client certificates but SHOULD avoid creating
multiple connections with the same configuration.
If we can open as many connections as we want, we will eventually make multiple connections to the same server through VIP, and then max_concurrent_stream doesn't make much sense.
I'm not sure if I understand the entire process. Let's say we got 5 requests and all the requests will be send to the same origin server (or origin servers that have the same VIP). The origin server supports H2, and max_concurrent_stream is 2. What will happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming 5 simultaneous (overlapping requests). 2 transactions will be made against the first HTTP/2 connection. Then a new session will be created and 2 more transactions will be made there. Then a third session session will be opened.
Actually, I think the code will pull new transactions under the reported max_concurrent_streams setting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think the code will pull new transactions under the reported max_concurrent_streams setting.
I don't really understand this part. On the example above, the origin server tells us "max_concurrent_streams is 2" on the first connection. The origin server doesn't want to take the third request until either of the first two transactions ends. But we exploit a loophole and start third request by opening a new connection. I think this is the scenario that the paragraph on RFC7540 tries to avoid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like hops or proxies were not fully considered in writing the spec.
From a proxy's POV I really can imagine wanting to have multiple H2 connections to same origin, as you might be forward proxying a large organisation hiding behind a single outgoing IP or a busy reverse proxying in front of an origin.
In those cases HOL blocking or too many interleaved response streams really might become a problem in user experience.
I think we should have sane values (max h2 connections per server, configurable?) and then act upon the hints the H2 protocol will give us.
|
[approve ci autest] |
|
Let's separate out the outbound ALPN part. We still need to discuss how to enable protocols, but the feature itself should work without H2 (e.g. use http/1.1 but not 1.0), and it should be testable. |
|
I'll look at breaking out the alpn logic. It didn't occur to me you could use alpn to negotiate between HTTP/1.1 and HTTP/1.0 but I guess you can. |
6d461d8 to
ca22d2a
Compare
|
In testing, I found that I needed to defer the assignment of the stream_id for an outbound stream until it was ready to actually send a header. Otherwise, I was getting GOAWAY frames from the next level ATS box with error code set to STREAM_CLOSED. Presumably, another stream was getting assigned a later stream ID and sending its header out first. So when the slower stream sends its header, the origin ATS box fails it because the stream ID is smaller than the "latest" seen stream id. |
|
In my production testing, I am having problems with increased ERR_CONN_FAIL from another ATS box acting as origin as described in issue #8163. Adjusting the error rate up helped some, but the ERR_CONN_FAIL was still significantly higher for that origin. The mechanism was the same. The origin was returning GOAWAY with no error which caused all other active streams to be terminated. Finally, Friday I notice that the H2 client logic was using the "Connection: closed" header as described below to start draining the session. Unfortunately, I cannot get the logic to keep around existing streams until the are finished working. I just move my ERR_CONN_FAIL errors to ERR_CLIENT_ABORT or ERR_READ_TIMEOUT errors. With recent changes to improve header error checking, I've been adding "Connection: close" headers not worrying about the HTTP/2 case since H2 should be ignoring the Connection header. Forgetting about this draining feature. Not sure how to deal with this. Could add H1/H2 checks into the HttpSM to only really add the "Connection: close" for Http/1 connections. Or more likely add a virtual method onto the ProxyTransaction to do that logic. |
ca22d2a to
b638064
Compare
This reverts commit 4208ecb. Reverting the removal of the incompatible changes that we took out of master so that those incompatible changes will be in our 10-Dev branch.
| if (zret == HSM_DONE) { | ||
| to_return = first; | ||
| this->removeSession(to_return); | ||
| if (!to_return->is_multiplexing()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably want an early check in the while statement above to see if the session is maxed out, this to prevent ending up much later in the flow and finding out we can't add any more streams
|
|
||
| Enables (``1``) or disables (``0``) TLSv1_3 in the ATS client context. If not specified, enabled by default | ||
|
|
||
| .. ts:cv:: CONFIG proxy.config.ssl.client.alpn_protocol STRING "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo, should be
proxy.config.ssl.client.alpn_protocols
|
[approve ci] |
b638064 to
b13ddc3
Compare
|
Rebased against 10-Dev branch. I need to close this PR and create another against 10-Dev. Will coordinate with @bneradt to do that. Need to sync up my last Yahoo testing changes too. |
|
Closing in deference to PR #8447 against the 10-Dev branch. |
This is work that has been going on in fits and starts for a couple years with contributions from @a-canary and Kees. And discussions with @masaori335 and @maskit. I have been using the project page https://github.com/apache/trafficserver/projects/9 to track issues and progress.
This latest effort builds upon @a-canary's earlier PoolableSession restructuring work and has been going in earnest since last December. I have run a version of this code against our 9.0.x (plus some other HTTP/2 commits to ease the merge) on one of our production boxes for several weeks. That worked out many memory management issues. In that environment it negotiated HTTP/2 with another ATS layer and an istio/envoy layer. About 20% of our outbound transactions were over H2.
I've just brought it back to master and rebased for this PR. Locally on my Centos7 box is passes autest, unit tests, and regression tests.
This code contains trailer logic to support gRPC. I have run versions of this code against one of the python examples in the gRPC repository, but that was last done in January. The gRPC/trailer support needs more testing, but it is a start.
Some key code changes include
The main code change is adding Http2 versions of transactions and sessions, and creating a Http1 server transaction. The class hierarchies differ slightly between Http1 and Http2. For Http1, both Http1ClientTransaction and Http1ServerTransaction inherit directly from ProxyTransaction.
For Http2, there is still only the Http2Stream to represent both client and server transactions. This PR adds an _outbound_flag to track the few cases where we do need to process inbound and outbound transactions differently. For the sessions, the common elements of session processing are moved into Http2CommonSession. The common class is used as a mix in for Http2ClientSession and Http2ServerSession. Http2ClientSession continues to inherit directly from ProxySession. Http2ServerSession inherits from PoolableSession as the Http1ServerSession does.
One other thing to point out, is that the Http2ServerSessions can only be put in a Thread session pool. I tested in hybrid mode, but pure thread pool mode should also work. I need to see what happens with global pool mode. That will likely require an addition check, warning, or error.
This branch also includes connection failure cleanups identified in PR #7580. We are running with that change in our production environment. Pulling it back out to make the master PR seemed unnecessary since it will hopefully land first.