
OpenSSL::SSL::SSLError: SSL_read: unexpected eof while reading #1106

Closed
jeremiemv opened this issue Jun 30, 2022 · 6 comments · Fixed by #1107

Comments

@jeremiemv

Hello 👋

On Heroku, after upgrading to the heroku-22 stack from heroku-20 we started having intermittent SSL_read: unexpected eof while reading exceptions when enqueuing jobs via sidekiq.

The root cause seems to be OpenSSL 3's stricter handling of unexpected EOFs. heroku-22 only ships OpenSSL 3 (Ubuntu 22.04 comes with OpenSSL 3.0; OpenSSL 1.1 is not available as a runtime library).

Other projects, in Ruby and other languages, have run into the same issue; see for example puma/puma#2800.

Ruby version: 3.1.2
Rails version: 7.0.3
Sidekiq: 6.5.1
redis gem: 4.6.0
Heroku Redis: 6.2.3, premium-0

**OpenSSL::SSL::SSLError: SSL_read: unexpected eof while reading**
/app/vendor/ruby-3.1.2/lib/ruby/3.1.0/openssl/buffering.rb line 214 in sysread_nonblock
/app/vendor/ruby-3.1.2/lib/ruby/3.1.0/openssl/buffering.rb line 214 in read_nonblock
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/connection/ruby.rb line 55 in block in _read_from_socket
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/connection/ruby.rb line 54 in loop
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/connection/ruby.rb line 54 in _read_from_socket
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/connection/ruby.rb line 47 in gets
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/connection/ruby.rb line 382 in read
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/client.rb line 311 in block in read
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/client.rb line 299 in io
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/client.rb line 310 in read
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/client.rb line 161 in block in call
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/client.rb line 279 in block (2 levels) in process
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/client.rb line 420 in ensure_connected
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/client.rb line 269 in block in process
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/client.rb line 356 in logging
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/client.rb line 268 in process
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/client.rb line 161 in call
/app/vendor/bundle/ruby/3.1.0/gems/scout_apm-5.2.0/lib/scout_apm/instruments/redis.rb line 32 in block in call_with_scout_instruments
/app/vendor/bundle/ruby/3.1.0/gems/scout_apm-5.2.0/lib/scout_apm/tracer.rb line 34 in instrument
/app/vendor/bundle/ruby/3.1.0/gems/scout_apm-5.2.0/lib/scout_apm/tracer.rb line 44 in instrument
/app/vendor/bundle/ruby/3.1.0/gems/scout_apm-5.2.0/lib/scout_apm/instruments/redis.rb line 31 in call_with_scout_instruments
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis.rb line 263 in block in send_command
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis.rb line 262 in synchronize
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis.rb line 262 in send_command
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/commands/scripting.rb line 110 in _eval
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/commands/scripting.rb line 97 in evalsha
/app/vendor/bundle/ruby/3.1.0/gems/sidekiq-6.5.1/lib/sidekiq/scheduled.rb line 55 in zpopbyscore
/app/vendor/bundle/ruby/3.1.0/gems/sidekiq-6.5.1/lib/sidekiq/scheduled.rb line 35 in block (2 levels) in enqueue_jobs
/app/vendor/bundle/ruby/3.1.0/gems/sidekiq-6.5.1/lib/sidekiq/scheduled.rb line 30 in each
/app/vendor/bundle/ruby/3.1.0/gems/sidekiq-6.5.1/lib/sidekiq/scheduled.rb line 30 in block in enqueue_jobs
/app/vendor/bundle/ruby/3.1.0/gems/sidekiq-6.5.1/lib/sidekiq.rb line 156 in block in redis
/app/vendor/bundle/ruby/3.1.0/gems/connection_pool-2.2.5/lib/connection_pool.rb line 63 in block (2 levels) in with
/app/vendor/bundle/ruby/3.1.0/gems/connection_pool-2.2.5/lib/connection_pool.rb line 62 in handle_interrupt
/app/vendor/bundle/ruby/3.1.0/gems/connection_pool-2.2.5/lib/connection_pool.rb line 62 in block in with
/app/vendor/bundle/ruby/3.1.0/gems/connection_pool-2.2.5/lib/connection_pool.rb line 59 in handle_interrupt
/app/vendor/bundle/ruby/3.1.0/gems/connection_pool-2.2.5/lib/connection_pool.rb line 59 in with
/app/vendor/bundle/ruby/3.1.0/gems/sidekiq-6.5.1/lib/sidekiq.rb line 153 in redis
/app/vendor/bundle/ruby/3.1.0/gems/sidekiq-6.5.1/lib/sidekiq/scheduled.rb line 29 in enqueue_jobs
/app/vendor/bundle/ruby/3.1.0/gems/sidekiq-6.5.1/lib/sidekiq/scheduled.rb line 109 in enqueue
/app/vendor/bundle/ruby/3.1.0/gems/sidekiq-6.5.1/lib/sidekiq/scheduled.rb line 101 in block in start
/app/vendor/bundle/ruby/3.1.0/gems/sidekiq-6.5.1/lib/sidekiq/component.rb line 8 in watchdog
/app/vendor/bundle/ruby/3.1.0/gems/sidekiq-6.5.1/lib/sidekiq/component.rb line 17 in block in safe_thread

From my Heroku support ticket (they agreed to let me copy/paste this):


We've seen a few reports of this, and while we continue investigating to see how we can improve this from our end, I think we have a pretty good idea of the current situation - let me share:

These "unexpected EOF while reading" SSL errors can be reproduced consistently as soon as the connection is killed from the Redis end. The two more direct ways here would be to open a redis-cli session against the Redis instance and running CLIENT KILL on any connections coming from your app, or waiting for the default idle timeout to act, which I think it's what's happening here.

OpenSSL 3 is much stricter about unexpected EOFs. Most Redis clients for other languages use their own TLS implementation, and some, like PHP's and Python's, have enabled the SSL_OP_IGNORE_UNEXPECTED_EOF option to make OpenSSL 3 behave as it did in previous versions.

However, for Ruby specifically, we can see that when redis-rb runs against OpenSSL 3 and hits one of these EOFs, it does not gracefully reconnect the way it did with older OpenSSL versions. From our investigation so far, we believe the final fix will probably need to come from Ruby's OpenSSL bindings and/or redis-rb.

To get back to what I believe is the source of the EOFs, namely Redis terminating client connections: it is important to remember that all Heroku Redis instances close idle connections after 5 minutes: https://devcenter.heroku.com/articles/managing-heroku-redis-using-cli#redis-timeout. I consider this feature useful because it keeps connection leaks from eventually hitting the maximum number of clients for your plan, but it is something you can fully configure on your own. I have been working with some customers who have set the idle timeout to a larger value, around 25 hours, so that dyno cycling periods are covered; with a timeout over 25 hours, no connection made by your dynos during their lifespan would be affected by it.

Heroku Redis is configured with a tcp-keepalive of 240 seconds, so broken client connections will still be cleaned up by the Redis server. However, we recommend being careful when increasing this timeout: any connection leaks in your app may have been hidden by it, and if it is extended or disabled completely, your connections might end up exhausting the connection limit for your Redis plan. We're also evaluating making this keepalive shorter, as we believe that would be beneficial, especially if the idle timeout is deactivated.

Alternatively, and especially since this behavior comes from OpenSSL and could potentially affect other connections, you might want to build EOF resiliency/retries into your code directly.
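For example, a minimal retry wrapper in Ruby could look like the sketch below (the helper name and the `HardJob` class are purely illustrative):

```ruby
require "openssl"
require "sidekiq"

# Illustrative only: retry once when the TLS connection was torn down
# without a close_notify, which surfaces as "unexpected eof while reading"
# when Ruby is linked against OpenSSL 3.
def with_eof_retry(attempts: 2)
  yield
rescue OpenSSL::SSL::SSLError => e
  raise unless e.message.include?("unexpected eof while reading")
  attempts -= 1
  retry if attempts > 0
  raise
end

with_eof_retry do
  Sidekiq::Client.push("class" => "HardJob", "args" => [42])
end
```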

To confirm, this behavior does not happen on previous stacks like heroku-20, as they don't come with OpenSSL 3 the way heroku-22 does. Staying on heroku-20 for a while would also be an option while we see whether changes around this behavior can be applied to redis-rb.

Would you like to test the idle timeout changes and let us know how it goes? This is something we're actively investigating and working on, and it's possible this situation results in changes to the default timeout and keepalive settings on Heroku.


We're still investigating whether these EOFs, caused by connections being closed incorrectly from the Redis server side, come from default Redis behavior or from something specific to how Heroku Redis runs. In the meantime, feel free to use that info around OpenSSL. For context, this is what other languages/clients did:

python/cpython#25309
php/php-src@74f75db - this commit is interesting because the message has much more context.

As I mentioned, we'll continue looking into this to see if we can make Heroku Redis compliant with the expected behavior from our end. I've linked this ticket to the internal work item where we're tracking this so I can keep you posted if we have news in the short term.


Initially opened under the Sidekiq project: sidekiq/sidekiq#5402

casperisfine pushed a commit to casperisfine/redis-rb that referenced this issue Jun 30, 2022
Fix: redis#1106

When Ruby is compiled against OpenSSL V3, if an SSL connection
is closed without sending a `close_notify`, an `OpenSSL::SSL::SSLError`
will be raised.
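The general shape of such a fix is to map the new error onto the connection-level error redis-rb already knows how to recover from. A sketch of the idea, not the actual #1107 patch:

```ruby
require "openssl"
require "redis"

# Sketch only: translate OpenSSL 3's strict EOF error into the usual
# "connection dropped" error so the client reconnects instead of blowing up.
def read_nonblock_translating_eof(ssl_socket, maxlen)
  ssl_socket.read_nonblock(maxlen)
rescue OpenSSL::SSL::SSLError => e
  raise unless e.message.match?(/unexpected eof while reading/i)
  raise Redis::ConnectionError, "connection closed by peer without close_notify"
end
```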
@casperisfine

OK, #1107 is ready; I'd appreciate it if you could test it in prod.

A quick thought though. According to the documentation the Heroku person pointed to: https://www.openssl.org/docs/manmaster/man3/SSL_CTX_set_options.html#SSL_OP_IGNORE_UNEXPECTED_EOF

Some TLS implementations do not send the mandatory close_notify alert on shutdown. If the application tries to wait for the close_notify alert but the peer closes the connection without sending it, an error is generated. When this option is enabled the peer does not need to send the close_notify alert and a closed connection will be treated as if the close_notify alert was received.

You should only enable this option if the protocol running over TLS can detect a truncation attack itself, and that the application is checking for that truncation attack.

For more information on shutting down a connection, see SSL_shutdown(3).

This makes me think that the Heroku SSL proxy is behaving incorrectly and that they should fix it. We should still handle it on the redis-rb side of course, but it shouldn't be normal behavior. Just my two cents though; I'm not a Heroku customer, so it's up to you whether you want to bring this up with them.

It's also entirely possible that their proxy is pure TCP and hence incapable of doing this, but it still seems faulty to me.
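For completeness, replicating what the PHP and Python clients did would look roughly like this in Ruby; the constant is only defined when the openssl gem is built against OpenSSL 3, so the sketch guards for it, and the truncation-attack caveat quoted above still applies:

```ruby
require "openssl"

ctx = OpenSSL::SSL::SSLContext.new
# Only available when Ruby's openssl gem is compiled against OpenSSL 3.
if defined?(OpenSSL::SSL::OP_IGNORE_UNEXPECTED_EOF)
  ctx.options |= OpenSSL::SSL::OP_IGNORE_UNEXPECTED_EOF
end
```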

@brent-yearone

I ran into this last week, and got some great advice from Heroku support, which I'm leaving here in case anybody else is in the same boat. Until #1107 ships, you can set your redis-side connection timeouts to 25 hours (longer than the dyno refresh period) to avoid the issue. The downside to that config change is that it's possible you might run out of available redis connections; your mileage may vary. But for most cases this advice should be the workaround you need. Once #1107 ships, you'll be able to set the configs back to a 5 minute timeout.

@dzuelke

dzuelke commented Jun 30, 2022

I just wanted to chime in here and share some results from my investigation around this. The issue is reproducible with a vanilla Redis 7.0.2 compiled from source with TLS support, which also makes sense given how Heroku Redis doesn't use any proxying for TLS.

I'm currently cleaning up the reproduce case so its Dockerfile is as minimal as possible; I'll then file an issue against the Redis GH repo.

@jeremiemv
Author

Thank you so much @casperisfine and @byroot for the fix and all the work you have been doing! The community is grateful.

@dzuelke

dzuelke commented Jul 1, 2022

Redis Server issue filed here: redis/redis#10915

@dzuelke

dzuelke commented Jul 4, 2022

Redis Server fix: redis/redis#10931
