[release/8.0] Removed unused sessions from SSL_CTX internal cache #102095
Tagging subscribers to this area: @dotnet/ncl, @bartonjs, @vcsjones
LGTM
Memory usage impact on production systems and an E2E scenario regression from .NET 6. We should service it.
Approved via email on 06/03.
Will be part of the July release, 8.0.7.
Backport of #101684 to release/8.0-staging
/cc @wfurt @rzikm
Customer Impact
Reported by a customer via official support. A small repro is available.
Customers on Linux see increased memory usage when establishing parallel connections to the same host (note that parallel requests over HTTP/1.1 always use parallel connections). The measured overhead can easily exceed 100 MB, which is a problem for containers in k8s clusters limited to 300 MB of memory.
It also helped one customer with a general "memory problems with .NET 6 -> 8 upgrade" issue (see comment).
The workaround is to lower the TLS cache size via either:
- the System.Net.Security.TlsCacheSize AppContext switch, or
- the DOTNET_SYSTEM_NET_SECURITY_TLSCACHESIZE environment variable.
Technical details:
The mechanism of the (bounded) memory leak is as follows:
The fix is to keep the two caches in sync and remove the dropped TLS session tickets from the internal cache as well.
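As an illustration only, the sketch below models why a session dropped from one cache but left in the other leaks memory, and how keeping them in sync fixes it. It is a simplified model, not the actual System.Net.Security or OpenSSL code: the class names, the simulate helper, the "newest session per host" policy of the managed cache, and the inner cache limit are all hypothetical assumptions chosen to mirror the description above.

```python
from collections import OrderedDict
from itertools import count


class InnerSessionCache:
    """Hypothetical stand-in for the SSL_CTX internal session cache.

    Assumption: it evicts its oldest entry once max_size is exceeded,
    which is what keeps the leak bounded rather than unbounded.
    """

    def __init__(self, max_size):
        self.max_size = max_size
        self.sessions = OrderedDict()  # session id -> session payload

    def add(self, session_id, payload):
        self.sessions[session_id] = payload
        while len(self.sessions) > self.max_size:
            self.sessions.popitem(last=False)  # drop the oldest session

    def remove(self, session_id):
        self.sessions.pop(session_id, None)


class ManagedSessionCache:
    """Hypothetical stand-in for the managed cache: newest session per host."""

    def __init__(self, inner, keep_in_sync):
        self.inner = inner
        self.keep_in_sync = keep_in_sync
        self.by_host = {}  # host -> id of the newest session for that host

    def on_new_session(self, host, session_id, payload):
        dropped = self.by_host.get(host)
        self.by_host[host] = session_id
        self.inner.add(session_id, payload)
        if dropped is not None and self.keep_in_sync:
            # Modeled fix: a session dropped from the managed cache is also
            # removed from the internal cache instead of lingering there.
            self.inner.remove(dropped)


def simulate(connections, keep_in_sync, inner_limit=1000):
    """Return how many sessions the internal cache still holds."""
    inner = InnerSessionCache(inner_limit)
    managed = ManagedSessionCache(inner, keep_in_sync)
    ids = count()
    for _ in range(connections):
        # Each parallel connection to the same host yields a fresh session.
        managed.on_new_session("example.com", next(ids), b"x" * 1024)
    return len(inner.sessions)


print(simulate(5000, keep_in_sync=False))  # 1000
print(simulate(5000, keep_in_sync=True))   # 1
```

Without syncing, the internal cache fills up to its own limit with sessions nobody will reuse; with syncing, it holds only the session the managed cache still references.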
Regression
Yes, the bug is part of the TLS session resumption feature on Linux, introduced in .NET 7. For customers migrating from .NET 6, it manifests as an E2E scenario regression.
Testing
Tested on the customer-provided minimal repro.
The customer was not willing to verify private builds in production.
Note: The customer confirmed that the workaround helps them in production, which gives us high confidence that this fix addresses the real root cause of their production problems and will help them.
Risk
Low. The issue is well understood and the change is localized to the feature. Functional tests verified that TLS resumption works.