OpenSSL::SSL::SSLError: SSL_read: unexpected eof while reading #1106
Fix: redis#1106. When Ruby is compiled against OpenSSL 3, if an SSL connection is closed without sending a `close_notify`, an `OpenSSL::SSL::SSLError` is raised.
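A minimal sketch of the idea behind such a fix: treat OpenSSL 3's "unexpected eof while reading" `SSLError` the same way as a plain `EOFError`, so that a client's existing "connection lost" handling applies. This is not necessarily what #1107 does. `ConnectionLostError` and `read_or_connection_lost` are hypothetical names, not redis-rb's API, and matching on the message text is an assumption based on the error shown in this issue's title.

```ruby
require "openssl"

# Hypothetical error class standing in for whatever "connection lost" error a
# client library raises so that its reconnect logic can kick in.
class ConnectionLostError < StandardError; end

# Reads from a TLS socket and maps both ways a server-side close can surface
# into ConnectionLostError:
# - EOFError: a clean close (or, under OpenSSL 1.1, a bare TCP close)
# - SSLError "...unexpected eof while reading": OpenSSL 3's default behaviour
# Any other SSL error (bad certificate, protocol error, ...) is re-raised as-is.
def read_or_connection_lost(socket, length)
  socket.sysread(length)
rescue EOFError => e
  raise ConnectionLostError, "Connection lost (#{e.class})"
rescue OpenSSL::SSL::SSLError => e
  raise unless e.message.include?("unexpected eof while reading")
  raise ConnectionLostError, "Connection lost (#{e.message})"
end
```

Keeping genuine TLS failures such as certificate errors as `SSLError` means only the "peer went away" case gets the reconnect treatment.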
Ok, #1107 is ready; I'd appreciate it if you could test it in prod. A quick thought, though. According to the documentation the Heroku person pointed to: https://www.openssl.org/docs/manmaster/man3/SSL_CTX_set_options.html#SSL_OP_IGNORE_UNEXPECTED_EOF
This makes me think that the Heroku SSL proxy is behaving incorrectly, and that they should fix it. We should still handle it on the … It's also entirely possible that their proxy is pure TCP, and hence incapable of doing this, but it still seems faulty to me.
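For comparison, this is roughly how a Ruby client could opt into the `SSL_OP_IGNORE_UNEXPECTED_EOF` flag from the OpenSSL documentation linked above. It is a sketch under an assumption: whether the `openssl` gem exposes `OpenSSL::SSL::OP_IGNORE_UNEXPECTED_EOF` depends on the gem version and on being built against OpenSSL 3, so the code checks with `defined?` first. `lenient_eof_context` is a hypothetical name.

```ruby
require "openssl"

# Builds a client SSLContext with the library's default parameters
# (peer certificate verification on) and, when available, asks OpenSSL 3 to
# report a missing close_notify as an ordinary EOF instead of an SSL error.
def lenient_eof_context
  ctx = OpenSSL::SSL::SSLContext.new
  ctx.set_params # default params: VERIFY_PEER, default cert store, hostname check
  if defined?(OpenSSL::SSL::OP_IGNORE_UNEXPECTED_EOF)
    ctx.options |= OpenSSL::SSL::OP_IGNORE_UNEXPECTED_EOF
  end
  ctx
end
```

A client would wrap its TCP socket with `OpenSSL::SSL::SSLSocket.new(tcp, lenient_eof_context)`; with the flag set, an abrupt close should surface as `EOFError`. The trade-off is that OpenSSL no longer flags truncation at the TLS layer, so the application protocol has to notice cut-off data itself (RESP bulk replies carry an explicit length, which helps).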
I ran into this last week and got some great advice from Heroku support, which I'm leaving here in case anybody else is in the same boat. Until #1107 ships, you can set your Redis-side connection timeout to 25 hours (longer than the dyno refresh period) to avoid the issue. The downside of that config change is that you might run out of available Redis connections; your mileage may vary. But in most cases this should be the workaround you need. Once #1107 ships, you'll be able to set the config back to a 5-minute timeout.
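To check whether this workaround is in place, a small helper can compare the server's `timeout` setting with 25 hours (25 * 60 * 60 = 90,000 seconds). This sketch assumes your provider allows `CONFIG GET` and that the client behaves like redis-rb 4.x, whose `config(:get, "timeout")` returns a Hash such as `{"timeout" => "300"}`; `idle_timeout_long_enough?` is a hypothetical name. On Heroku, the value itself is changed through the Heroku CLI, as described in the Dev Center article linked in the support reply.

```ruby
# 25 hours in seconds: longer than the dyno refresh period, per the advice above.
SUGGESTED_IDLE_TIMEOUT = 25 * 60 * 60

# Returns true if the server's idle `timeout` is disabled (0 means "never close
# idle clients" in Redis) or at least SUGGESTED_IDLE_TIMEOUT seconds.
# `redis` is assumed to answer config(:get, "timeout") with a Hash of strings.
def idle_timeout_long_enough?(redis)
  seconds = Integer(redis.config(:get, "timeout").fetch("timeout"))
  seconds.zero? || seconds >= SUGGESTED_IDLE_TIMEOUT
end
```

For example, `idle_timeout_long_enough?(Redis.new(url: ENV["REDIS_URL"]))` should return `false` on an instance still using the default 300-second timeout.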
I just wanted to chime in here and share some results from my investigation. The issue is reproducible with a vanilla Redis 7.0.2 compiled from source with TLS support, which is consistent with Heroku Redis not using any proxy for TLS. I'm currently cleaning up the reproduction case so it's …
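A self-contained way to see the same failure without Redis, as a sketch that relies only on Ruby's bundled `openssl` and `socket` libraries: a local TLS server completes the handshake and then closes its TCP socket without sending `close_notify`, which, per this thread, is how the Redis server was dropping idle or killed connections. `throwaway_cert` and `read_after_abrupt_close` are hypothetical helper names, and the exact result depends on the OpenSSL version Ruby is linked against.

```ruby
require "openssl"
require "socket"

# Throwaway self-signed certificate for a local test; never use it elsewhere.
def throwaway_cert
  key = OpenSSL::PKey::RSA.generate(2048)
  name = OpenSSL::X509::Name.parse("/CN=localhost")
  cert = OpenSSL::X509::Certificate.new
  cert.version = 2
  cert.serial = 1
  cert.subject = name
  cert.issuer = name
  cert.public_key = key.public_key
  cert.not_before = Time.now - 60
  cert.not_after = Time.now + 3600
  cert.sign(key, OpenSSL::Digest.new("SHA256"))
  [cert, key]
end

# Handshakes with a local server that then drops the TCP connection without a
# close_notify alert, and reports how the client's next read fails.
def read_after_abrupt_close(ignore_unexpected_eof: false)
  cert, key = throwaway_cert
  server_ctx = OpenSSL::SSL::SSLContext.new
  server_ctx.cert = cert
  server_ctx.key = key
  listener = TCPServer.new("127.0.0.1", 0)

  server = Thread.new do
    tcp = listener.accept
    OpenSSL::SSL::SSLSocket.new(tcp, server_ctx).accept
    tcp.close # close the raw TCP socket: no close_notify alert is sent
  end

  client_ctx = OpenSSL::SSL::SSLContext.new
  client_ctx.verify_mode = OpenSSL::SSL::VERIFY_NONE # test only: self-signed cert
  if ignore_unexpected_eof && defined?(OpenSSL::SSL::OP_IGNORE_UNEXPECTED_EOF)
    client_ctx.options |= OpenSSL::SSL::OP_IGNORE_UNEXPECTED_EOF
  end
  client = OpenSSL::SSL::SSLSocket.new(TCPSocket.new("127.0.0.1", listener.addr[1]), client_ctx)
  client.sync_close = true # closing the TLS socket also closes the TCP socket
  client.connect
  server.join # wait until the server has dropped the connection

  begin
    client.sysread(1)
    :data
  rescue EOFError
    :clean_eof              # OpenSSL 1.1, or OpenSSL 3 with the flag set
  rescue OpenSSL::SSL::SSLError => e
    [:ssl_error, e.message] # OpenSSL 3 default behaviour
  rescue Errno::ECONNRESET
    :reset                  # the OS reported the close as a TCP reset
  end
ensure
  client&.close
  listener&.close
end
```

On a Ruby linked against OpenSSL 3, `read_after_abrupt_close` should return an `:ssl_error` pair carrying the "unexpected eof while reading" message from this issue, while on OpenSSL 1.1 the same close shows up as `:clean_eof`.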
Thank you so much @casperisfine and @byroot for the fix and all the work you have been doing! The community is grateful.
Redis Server issue filed here: redis/redis#10915
Redis Server fix: redis/redis#10931
Hello 👋
On Heroku, after upgrading to the `heroku-22` stack from `heroku-20`, we started getting intermittent `SSL_read: unexpected eof while reading` exceptions when enqueuing jobs via `sidekiq`. The root cause seems to be related to OpenSSL 3's strict handling of unexpected EOFs.
`heroku-22` uses OpenSSL 3 exclusively (Ubuntu 22.04 ships with OpenSSL 3.0; OpenSSL 1.1 is not available as a runtime library). Other projects, in Ruby and in other languages, have faced the same issue, for example puma/puma#2800.
- Ruby version: 3.1.2
- Rails version: 7.0.3
- Sidekiq: 6.5.1
- redis gem: 4.6.0
- Heroku Redis: 6.2.3, premium-0
From my Heroku support ticket (they agreed to let me copy/paste this):
We've seen a few reports of this, and while we continue investigating how we can improve this from our end, I think we have a pretty good idea of the current situation. Let me share:
These "unexpected EOF while reading" SSL errors can be reproduced consistently as soon as the connection is killed from the Redis end. The two most direct ways to do this are to open a `redis-cli` session against the Redis instance and run `CLIENT KILL` on any connection coming from your app, or to wait for the default idle timeout to kick in, which I think is what's happening here.

OpenSSL 3 is much stricter about unexpected EOFs. Most Redis clients for other languages use their own TLS implementation, and some clients, like PHP's and Python's, have enabled the `SSL_OP_IGNORE_UNEXPECTED_EOF` setting to make OpenSSL 3 behave as it did in previous versions.
However, for Ruby specifically, we can see that when redis-rb runs with OpenSSL 3 and gets one of these EOFs, it does not reconnect gracefully the way it did with older OpenSSL versions. From our investigation so far, we believe the final fix will probably need to come from Ruby's OpenSSL and/or redis-rb.
To get back to what I believe might be the source of the EOFs, which is Redis terminating client connections, it is important to remember that all Heroku Redis instances close idle connections after 5 minutes (https://devcenter.heroku.com/articles/managing-heroku-redis-using-cli#redis-timeout). I consider this feature useful because it keeps a connection leak to Redis from pushing your app up to the maximum number of clients, but this is something you can fully configure on your own. I have been working with some of our customers who have tried setting the idle timeout to a larger value, around 25 hours, so that the dyno cycling period is covered. That is, with a timeout over 25 hours, no connection made by your dynos during their lifespan would be affected by this timeout.
Heroku Redis is configured with a `tcp-keepalive` of 240 seconds, so any broken client connections would still be cleaned up by the Redis server. However, we recommend being careful about increasing this timeout at first, as the timeout might have been hiding connection leaks in your app; if it is extended or disabled completely, your connections might end up exhausting the connection limit for your Redis plan. We're also evaluating making this keepalive shorter, as we believe that would be beneficial, especially if the idle timeout is deactivated.

Alternatively, and especially since this behavior comes from OpenSSL and could potentially affect other connections, you might want to build EOF resiliency/retries directly into your code.
To confirm, this behavior does not happen on previous stacks like `heroku-20`, as they don't come with OpenSSL 3 the way `heroku-22` does. Staying on `heroku-20` for a while would also be an option while we see whether some changes around this behavior can be applied to `redis-rb`.

Would you like to test the idle timeout changes and let us know how it goes? As I mentioned, this is something we're actively investigating and working on, and it's possible this situation results in changes to the default timeout and keepalive settings on Heroku.
We're still investigating whether these EOFs, caused by connections being closed incorrectly from the Redis server side, come from default Redis behavior or from something specific to how Heroku Redis runs. With that in mind, feel free to use the information around OpenSSL if you want to. For context, this is what other languages/clients did:
- python/cpython#25309
- php/php-src@74f75db: this commit is interesting because its message has much more context.
As I mentioned, we'll continue looking into this to see if we can make Heroku Redis compliant with the expected behavior from our end. I've linked this ticket to the internal work item where we're tracking this so I can keep you posted if we have news in the short term.
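The suggestion in Heroku's reply to build EOF resiliency/retries into your own code could look roughly like this. It is a hypothetical helper (`with_eof_retry` is not part of redis-rb or Sidekiq) that assumes you can pass in a callable that re-establishes the connection; the message check mirrors the error text in this issue's title. Retrying is only safe for idempotent commands, or when the connection was already dead before the command was sent: if it dies after the server executed a command such as a job push, a retry runs it twice.

```ruby
require "openssl"

# Hypothetical helper: runs the block and, if it fails because the server had
# already closed the connection, calls `reconnect` and runs the block again,
# at most `retries` extra times.
def with_eof_retry(reconnect:, retries: 1)
  attempts = 0
  begin
    yield
  rescue EOFError, Errno::ECONNRESET, OpenSSL::SSL::SSLError => e
    # Only OpenSSL 3's "unexpected eof" flavour of SSLError is retryable;
    # certificate or protocol errors are re-raised untouched.
    raise if e.is_a?(OpenSSL::SSL::SSLError) && !e.message.include?("unexpected eof")
    raise if attempts >= retries
    attempts += 1
    reconnect.call
    retry
  end
end
```

Capping the retries keeps a genuinely unreachable server from turning into an endless reconnect loop.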
Initially opened under the `sidekiq` project: sidekiq/sidekiq#5402