Description
Hello 👋
On Heroku, after upgrading from the `heroku-20` stack to `heroku-22`, we started seeing intermittent `SSL_read: unexpected eof while reading` exceptions when enqueuing jobs via Sidekiq.
The root cause seems to be related to OpenSSL 3's stricter handling of unexpected EOFs. `heroku-22` ships only OpenSSL 3 (Ubuntu 22.04 comes with OpenSSL 3.0; OpenSSL 1.1 is not available as a runtime library).
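One way to confirm which OpenSSL a given stack's Ruby is linked against is to check the constants exposed by the openssl gem from a console on each stack:

```ruby
# Prints the OpenSSL version Ruby's openssl gem was compiled against,
# and the OpenSSL library version actually loaded at runtime.
require "openssl"

puts OpenSSL::OPENSSL_VERSION
puts OpenSSL::OPENSSL_LIBRARY_VERSION
```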
Other projects, in Ruby and other languages, have hit the same issue; see for example puma/puma#2800.
- Ruby version: 3.1.2
- Rails version: 7.0.3
- Sidekiq: 6.5.1
- redis gem: 4.6.0
- Heroku Redis: 6.2.3, premium-0
**OpenSSL::SSL::SSLError: SSL_read: unexpected eof while reading**
/app/vendor/ruby-3.1.2/lib/ruby/3.1.0/openssl/buffering.rb line 214 in sysread_nonblock
/app/vendor/ruby-3.1.2/lib/ruby/3.1.0/openssl/buffering.rb line 214 in read_nonblock
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/connection/ruby.rb line 55 in block in _read_from_socket
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/connection/ruby.rb line 54 in loop
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/connection/ruby.rb line 54 in _read_from_socket
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/connection/ruby.rb line 47 in gets
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/connection/ruby.rb line 382 in read
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/client.rb line 311 in block in read
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/client.rb line 299 in io
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/client.rb line 310 in read
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/client.rb line 161 in block in call
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/client.rb line 279 in block (2 levels) in process
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/client.rb line 420 in ensure_connected
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/client.rb line 269 in block in process
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/client.rb line 356 in logging
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/client.rb line 268 in process
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/client.rb line 161 in call
/app/vendor/bundle/ruby/3.1.0/gems/scout_apm-5.2.0/lib/scout_apm/instruments/redis.rb line 32 in block in call_with_scout_instruments
/app/vendor/bundle/ruby/3.1.0/gems/scout_apm-5.2.0/lib/scout_apm/tracer.rb line 34 in instrument
/app/vendor/bundle/ruby/3.1.0/gems/scout_apm-5.2.0/lib/scout_apm/tracer.rb line 44 in instrument
/app/vendor/bundle/ruby/3.1.0/gems/scout_apm-5.2.0/lib/scout_apm/instruments/redis.rb line 31 in call_with_scout_instruments
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis.rb line 263 in block in send_command
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis.rb line 262 in synchronize
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis.rb line 262 in send_command
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/commands/scripting.rb line 110 in _eval
/app/vendor/bundle/ruby/3.1.0/gems/redis-4.6.0/lib/redis/commands/scripting.rb line 97 in evalsha
/app/vendor/bundle/ruby/3.1.0/gems/sidekiq-6.5.1/lib/sidekiq/scheduled.rb line 55 in zpopbyscore
/app/vendor/bundle/ruby/3.1.0/gems/sidekiq-6.5.1/lib/sidekiq/scheduled.rb line 35 in block (2 levels) in enqueue_jobs
/app/vendor/bundle/ruby/3.1.0/gems/sidekiq-6.5.1/lib/sidekiq/scheduled.rb line 30 in each
/app/vendor/bundle/ruby/3.1.0/gems/sidekiq-6.5.1/lib/sidekiq/scheduled.rb line 30 in block in enqueue_jobs
/app/vendor/bundle/ruby/3.1.0/gems/sidekiq-6.5.1/lib/sidekiq.rb line 156 in block in redis
/app/vendor/bundle/ruby/3.1.0/gems/connection_pool-2.2.5/lib/connection_pool.rb line 63 in block (2 levels) in with
/app/vendor/bundle/ruby/3.1.0/gems/connection_pool-2.2.5/lib/connection_pool.rb line 62 in handle_interrupt
/app/vendor/bundle/ruby/3.1.0/gems/connection_pool-2.2.5/lib/connection_pool.rb line 62 in block in with
/app/vendor/bundle/ruby/3.1.0/gems/connection_pool-2.2.5/lib/connection_pool.rb line 59 in handle_interrupt
/app/vendor/bundle/ruby/3.1.0/gems/connection_pool-2.2.5/lib/connection_pool.rb line 59 in with
/app/vendor/bundle/ruby/3.1.0/gems/sidekiq-6.5.1/lib/sidekiq.rb line 153 in redis
/app/vendor/bundle/ruby/3.1.0/gems/sidekiq-6.5.1/lib/sidekiq/scheduled.rb line 29 in enqueue_jobs
/app/vendor/bundle/ruby/3.1.0/gems/sidekiq-6.5.1/lib/sidekiq/scheduled.rb line 109 in enqueue
/app/vendor/bundle/ruby/3.1.0/gems/sidekiq-6.5.1/lib/sidekiq/scheduled.rb line 101 in block in start
/app/vendor/bundle/ruby/3.1.0/gems/sidekiq-6.5.1/lib/sidekiq/component.rb line 8 in watchdog
/app/vendor/bundle/ruby/3.1.0/gems/sidekiq-6.5.1/lib/sidekiq/component.rb line 17 in block in safe_threa
From my Heroku support ticket (they agreed to let me copy/paste this):
We've seen a few reports of this, and while we continue investigating to see how we can improve this from our end, I think we have a pretty good idea of the current situation - let me share:
These "unexpected EOF while reading" SSL errors can be reproduced consistently as soon as the connection is killed from the Redis end. The two more direct ways here would be to open a redis-cli
session against the Redis instance and running CLIENT KILL
on any connections coming from your app, or waiting for the default idle timeout to act, which I think it's what's happening here.
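For illustration, a minimal sketch of reproducing this from a console on `heroku-22` (not taken from the ticket; it assumes a `rediss://` `REDIS_URL` and the usual Heroku Redis `verify_mode` setting):

```ruby
# Hypothetical reproduction sketch: open a TLS connection, let the 5 minute
# idle timeout (or a manual CLIENT KILL) close it server-side, then reuse it.
require "openssl"
require "redis"

redis = Redis.new(
  url: ENV.fetch("REDIS_URL"), # rediss:// URL provided by Heroku
  ssl_params: { verify_mode: OpenSSL::SSL::VERIFY_NONE } # typical Heroku Redis setting
)

redis.ping      # works on a fresh connection
sleep(6 * 60)   # wait past the 5 minute idle timeout
redis.ping      # with OpenSSL 3 this is where OpenSSL::SSL::SSLError shows up
```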
OpenSSL 3 has much stricter behavior around unexpected EOFs. Most Redis clients for other languages use their own TLS implementation, and some, like PHP's and Python's, have enabled the `SSL_OP_IGNORE_UNEXPECTED_EOF` option to make OpenSSL 3 behave as it did in previous versions.
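Ruby's openssl gem exposes the same flag as `OpenSSL::SSL::OP_IGNORE_UNEXPECTED_EOF` when built against OpenSSL 3. Whether passing it through redis-rb's `ssl_params` actually avoids these errors is an assumption, not something confirmed in the ticket, but the wiring would look roughly like this:

```ruby
# Hedged sketch: opt back into the pre-OpenSSL-3 EOF behavior for the Redis
# connection. OP_IGNORE_UNEXPECTED_EOF is only defined when Ruby's openssl gem
# is built against OpenSSL 3; redis-rb's Ruby driver hands ssl_params to
# OpenSSL::SSL::SSLContext#set_params.
require "openssl"
require "redis"

options = OpenSSL::SSL::SSLContext::DEFAULT_PARAMS.fetch(:options, 0)
if defined?(OpenSSL::SSL::OP_IGNORE_UNEXPECTED_EOF)
  options |= OpenSSL::SSL::OP_IGNORE_UNEXPECTED_EOF
end

redis = Redis.new(
  url: ENV.fetch("REDIS_URL"),
  ssl_params: {
    verify_mode: OpenSSL::SSL::VERIFY_NONE, # typical Heroku Redis setting
    options: options
  }
)
```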
However, for Ruby specifically, we can see that when redis-rb runs with OpenSSL 3 and hits one of these EOFs, it does not gracefully reconnect the way it did with older OpenSSL versions. From our investigation so far we believe the final fix will probably need to come from Ruby's OpenSSL bindings and/or redis-rb.
To get back to what I believe is the source of the EOFs, Redis terminating client connections: it is important to remember that all Heroku Redis instances close idle connections after 5 minutes (https://devcenter.heroku.com/articles/managing-heroku-redis-using-cli#redis-timeout). I consider this feature useful because it prevents connection leaks to Redis from eventually pushing your app into the maximum number of clients, but it is something you can fully configure on your own.

I have been working with some of our customers who have tried setting the idle timeout to a larger value, around 25 hours, so that dyno cycling periods are covered. That is, by making the timeout longer than 25 hours, no connection made by your dynos during their lifespan would be affected by it.
Heroku Redis is configured with a `tcp-keepalive` of 240 seconds, so any broken connections from clients will still be cleaned up by the Redis server. However, we recommend being careful when increasing this idle timeout at first: any connection leaks in your app might have been hidden by the timeout, and if it gets extended or disabled completely, your connections might end up exhausting the connection limit for your Redis plan. We're also evaluating making this keepalive shorter, as we believe that would be beneficial, especially if the idle timeout is deactivated.
Alternatively, and especially since this behavior comes from OpenSSL and could potentially affect other connections, you might want to build EOF resiliency/retries directly into your code.
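For illustration, a minimal sketch of what such application-level retrying could look like (the method name and `MyJob` are placeholders, and whatever is wrapped must be idempotent):

```ruby
# Hedged sketch of EOF resiliency: retry once when the connection was killed
# server-side and the next read fails with an SSL or connection error.
require "openssl"
require "redis"

def with_eof_retry(attempts: 2)
  tries = 0
  begin
    yield
  rescue OpenSSL::SSL::SSLError, Redis::CannotConnectError
    tries += 1
    retry if tries < attempts
    raise
  end
end

# Usage (MyJob is a placeholder Sidekiq worker):
# with_eof_retry { MyJob.perform_async(record_id) }
```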
To confirm: this behavior does not happen on previous stacks like `heroku-20`, as they don't come with OpenSSL 3 the way `heroku-22` does. Staying on `heroku-20` for a while would also be an option while we see if there are changes around this behavior that can be applied to `redis-rb`.
Would you like to test the idle timeout changes and let us know how it goes? As I mentioned, this is something we're actively investigating and working on, and it's possible this situation results in changes to the default timeout and keepalive settings on Heroku.
We're still investigating whether these EOFs, caused by connections being closed abruptly from the Redis server side, are due to default Redis behavior or to something specific about how Heroku Redis runs. Keeping that in mind, if you want to use the info above around OpenSSL, feel free to do so. For context, this is what other languages/clients did:
python/cpython#25309
php/php-src@74f75db - this one is interesting because the commit message has much more context.
As I mentioned, we'll continue looking into this to see if we can make Heroku Redis compliant with the expected behavior from our end. I've linked this ticket to the internal work item where we're tracking this so I can keep you posted if we have news in the short term.
Initially opened under the `sidekiq` project: sidekiq/sidekiq#5402