Broken Pipe and EOFErrors #119

Open
leequarella opened this issue May 22, 2023 · 28 comments
Comments

@leequarella

We recently switched from Sidekiq 6.5.8 to 7.1 (where there was a change to use redis-client) and have noticed a significant increase in errors, all coming from RedisClient. All errors follow one of three forms:

RedisClient::ConnectionError: EOFError
/app/vendor/bundle/ruby/3.2.0/gems/redis-client-0.14.1/lib/redis_client/ruby_connection.rb:101:in `rescue in read': EOFError (RedisClient::ConnectionError)
RedisClient::ConnectionError: Broken pipe
/app/vendor/bundle/ruby/3.2.0/gems/redis-client-0.14.1/lib/redis_client/ruby_connection.rb:88:in `rescue in write_multi': Broken pipe (RedisClient::ConnectionError)
RedisClient::ConnectionError: Broken pipe
/app/vendor/bundle/ruby/3.2.0/gems/redis-client-0.14.1/lib/redis_client/ruby_connection.rb:76:in `rescue in write': Broken pipe (RedisClient::ConnectionError)
	from /app/vendor/bundle/ruby/3.2.0/gems/redis-client-0.14.1/lib/redis_client/ruby_connection.rb:73:in `write'
	from /app/vendor/bundle/ruby/3.2.0/gems/redis-client-0.14.1/lib/redis_client/connection_mixin.rb:30:in `call'
	from /app/vendor/bundle/ruby/3.2.0/gems/redis-client-0.14.1/lib/redis_client.rb:211:in `block (2 levels) in call'
	from /app/vendor/bundle/ruby/3.2.0/gems/redis-client-0.14.1/lib/redis_client/middlewares.rb:16:in `call'
	from /app/vendor/bundle/ruby/3.2.0/gems/ddtrace-1.11.1/lib/datadog/tracing/contrib/redis/trace_middleware.rb:23:in `block in call'
	from /app/vendor/bundle/ruby/3.2.0/gems/ddtrace-1.11.1/lib/datadog/tracing/trace_operation.rb:192:in `block in measure'
	from /app/vendor/bundle/ruby/3.2.0/gems/ddtrace-1.11.1/lib/datadog/tracing/span_operation.rb:150:in `measure'
	from /app/vendor/bundle/ruby/3.2.0/gems/ddtrace-1.11.1/lib/datadog/tracing/trace_operation.rb:192:in `measure'
	from /app/vendor/bundle/ruby/3.2.0/gems/ddtrace-1.11.1/lib/datadog/tracing/tracer.rb:380:in `start_span'
	from /app/vendor/bundle/ruby/3.2.0/gems/ddtrace-1.11.1/lib/datadog/tracing/tracer.rb:160:in `block in trace'
	from /app/vendor/bundle/ruby/3.2.0/gems/ddtrace-1.11.1/lib/datadog/tracing/context.rb:45:in `activate!'
	from /app/vendor/bundle/ruby/3.2.0/gems/ddtrace-1.11.1/lib/datadog/tracing/tracer.rb:159:in `trace'
	from /app/vendor/bundle/ruby/3.2.0/gems/ddtrace-1.11.1/lib/datadog/tracing.rb:18:in `trace'
	from /app/vendor/bundle/ruby/3.2.0/gems/ddtrace-1.11.1/lib/datadog/tracing/contrib/redis/trace_middleware.rb:13:in `call'
	from /app/vendor/bundle/ruby/3.2.0/gems/redis-client-0.14.1/lib/redis_client.rb:210:in `block in call'
	from /app/vendor/bundle/ruby/3.2.0/gems/redis-client-0.14.1/lib/redis_client.rb:626:in `ensure_connected'
	from /app/vendor/bundle/ruby/3.2.0/gems/redis-client-0.14.1/lib/redis_client.rb:209:in `call'
	from /app/vendor/bundle/ruby/3.2.0/gems/redis-client-0.14.1/lib/redis_client/decorator.rb:26:in `call'
	from /app/vendor/bundle/ruby/3.2.0/gems/sidekiq-ent-7.1.0/lib/sidekiq-ent/scripting.rb:21:in `block in call'
	from /app/vendor/bundle/ruby/3.2.0/gems/connection_pool-2.4.0/lib/connection_pool.rb:107:in `block (2 levels) in with'
	from /app/vendor/bundle/ruby/3.2.0/gems/connection_pool-2.4.0/lib/connection_pool.rb:106:in `handle_interrupt'
	from /app/vendor/bundle/ruby/3.2.0/gems/connection_pool-2.4.0/lib/connection_pool.rb:106:in `block in with'
	from /app/vendor/bundle/ruby/3.2.0/gems/connection_pool-2.4.0/lib/connection_pool.rb:103:in `handle_interrupt'
	from /app/vendor/bundle/ruby/3.2.0/gems/connection_pool-2.4.0/lib/connection_pool.rb:103:in `with'
	from /app/vendor/bundle/ruby/3.2.0/gems/sidekiq-ent-7.1.0/lib/sidekiq-ent/scripting.rb:20:in `call'
	from /app/vendor/bundle/ruby/3.2.0/gems/sidekiq-ent-7.1.0/lib/sidekiq-ent/senate.rb:148:in `update_leader'
	from /app/vendor/bundle/ruby/3.2.0/gems/sidekiq-ent-7.1.0/lib/sidekiq-ent/senate.rb:124:in `election'
	from /app/vendor/bundle/ruby/3.2.0/gems/sidekiq-ent-7.1.0/lib/sidekiq-ent/senate.rb:103:in `cycle'
	from /app/vendor/bundle/ruby/3.2.0/gems/sidekiq-7.1.0/lib/sidekiq/component.rb:10:in `watchdog'
	from /app/vendor/bundle/ruby/3.2.0/gems/sidekiq-7.1.0/lib/sidekiq/component.rb:19:in `block in safe_thread'
/app/vendor/ruby-3.2.1/lib/ruby/3.2.0/openssl/buffering.rb:415:in `syswrite_nonblock': Broken pipe (Errno::EPIPE)
	from /app/vendor/ruby-3.2.1/lib/ruby/3.2.0/openssl/buffering.rb:415:in `write_nonblock'
	from /app/vendor/bundle/ruby/3.2.0/gems/redis-client-0.14.1/lib/redis_client/ruby_connection/buffered_io.rb:64:in `block in write'
	from /app/vendor/bundle/ruby/3.2.0/gems/redis-client-0.14.1/lib/redis_client/ruby_connection/buffered_io.rb:63:in `loop'
	from /app/vendor/bundle/ruby/3.2.0/gems/redis-client-0.14.1/lib/redis_client/ruby_connection/buffered_io.rb:63:in `write'
	from /app/vendor/bundle/ruby/3.2.0/gems/redis-client-0.14.1/lib/redis_client/ruby_connection.rb:74:in `write'
	from /app/vendor/bundle/ruby/3.2.0/gems/redis-client-0.14.1/lib/redis_client/connection_mixin.rb:30:in `call'
	from /app/vendor/bundle/ruby/3.2.0/gems/redis-client-0.14.1/lib/redis_client.rb:211:in `block (2 levels) in call'
	from /app/vendor/bundle/ruby/3.2.0/gems/redis-client-0.14.1/lib/redis_client/middlewares.rb:16:in `call'
	from /app/vendor/bundle/ruby/3.2.0/gems/ddtrace-1.11.1/lib/datadog/tracing/contrib/redis/trace_middleware.rb:23:in `block in call'
	from /app/vendor/bundle/ruby/3.2.0/gems/ddtrace-1.11.1/lib/datadog/tracing/trace_operation.rb:192:in `block in measure'
	from /app/vendor/bundle/ruby/3.2.0/gems/ddtrace-1.11.1/lib/datadog/tracing/span_operation.rb:150:in `measure'
	from /app/vendor/bundle/ruby/3.2.0/gems/ddtrace-1.11.1/lib/datadog/tracing/trace_operation.rb:192:in `measure'
	from /app/vendor/bundle/ruby/3.2.0/gems/ddtrace-1.11.1/lib/datadog/tracing/tracer.rb:380:in `start_span'
	from /app/vendor/bundle/ruby/3.2.0/gems/ddtrace-1.11.1/lib/datadog/tracing/tracer.rb:160:in `block in trace'
	from /app/vendor/bundle/ruby/3.2.0/gems/ddtrace-1.11.1/lib/datadog/tracing/context.rb:45:in `activate!'
	from /app/vendor/bundle/ruby/3.2.0/gems/ddtrace-1.11.1/lib/datadog/tracing/tracer.rb:159:in `trace'
	from /app/vendor/bundle/ruby/3.2.0/gems/ddtrace-1.11.1/lib/datadog/tracing.rb:18:in `trace'
	from /app/vendor/bundle/ruby/3.2.0/gems/ddtrace-1.11.1/lib/datadog/tracing/contrib/redis/trace_middleware.rb:13:in `call'
	from /app/vendor/bundle/ruby/3.2.0/gems/redis-client-0.14.1/lib/redis_client.rb:210:in `block in call'
	from /app/vendor/bundle/ruby/3.2.0/gems/redis-client-0.14.1/lib/redis_client.rb:626:in `ensure_connected'
	from /app/vendor/bundle/ruby/3.2.0/gems/redis-client-0.14.1/lib/redis_client.rb:209:in `call'
	from /app/vendor/bundle/ruby/3.2.0/gems/redis-client-0.14.1/lib/redis_client/decorator.rb:26:in `call'
	from /app/vendor/bundle/ruby/3.2.0/gems/sidekiq-ent-7.1.0/lib/sidekiq-ent/scripting.rb:21:in `block in call'
	from /app/vendor/bundle/ruby/3.2.0/gems/connection_pool-2.4.0/lib/connection_pool.rb:107:in `block (2 levels) in with'
	from /app/vendor/bundle/ruby/3.2.0/gems/connection_pool-2.4.0/lib/connection_pool.rb:106:in `handle_interrupt'
	from /app/vendor/bundle/ruby/3.2.0/gems/connection_pool-2.4.0/lib/connection_pool.rb:106:in `block in with'
	from /app/vendor/bundle/ruby/3.2.0/gems/connection_pool-2.4.0/lib/connection_pool.rb:103:in `handle_interrupt'
	from /app/vendor/bundle/ruby/3.2.0/gems/connection_pool-2.4.0/lib/connection_pool.rb:103:in `with'
	from /app/vendor/bundle/ruby/3.2.0/gems/sidekiq-ent-7.1.0/lib/sidekiq-ent/scripting.rb:20:in `call'
	from /app/vendor/bundle/ruby/3.2.0/gems/sidekiq-ent-7.1.0/lib/sidekiq-ent/senate.rb:148:in `update_leader'
	from /app/vendor/bundle/ruby/3.2.0/gems/sidekiq-ent-7.1.0/lib/sidekiq-ent/senate.rb:124:in `election'
	from /app/vendor/bundle/ruby/3.2.0/gems/sidekiq-ent-7.1.0/lib/sidekiq-ent/senate.rb:103:in `cycle'
	from /app/vendor/bundle/ruby/3.2.0/gems/sidekiq-7.1.0/lib/sidekiq/component.rb:10:in `watchdog'
	from /app/vendor/bundle/ruby/3.2.0/gems/sidekiq-7.1.0/lib/sidekiq/component.rb:19:in `block in safe_thread'

#116 seemed possibly related, but it looks like Sidekiq already defaults to reconnect_attempts: 1, so that may be a dead end.

I've struggled to figure out what the actual cause is. Any ideas what direction to head on this?

@byroot
Member

byroot commented May 22, 2023

Yes, AFAIK, Sidekiq does set reconnect_attempts: 1. But perhaps it depends on how you configure it?

Also, I don't have the code to sidekiq-ent handy, but perhaps it's a problem similar to sidekiq/sidekiq#5834. I'll have a look tomorrow to see if I can spot something fishy.

FYI @mperham

@byroot
Member

byroot commented May 22, 2023

Question, is the vast majority coming from watchdog / senate.rb, or is it all over the place?

@leequarella changed the title from "Broken Pipe and" to "Broken Pipe and EOFErrors" on May 22, 2023
@leequarella
Author

Question, is the vast majority coming from watchdog / senate.rb, or is it all over the place?

Seems to be all over the place.

@casperisfine
Collaborator

Ok, so I had a look but couldn't spot anything wrong. Sidekiq 7.x does default to reconnect_attempts: 1 with no delay, which matches redis-rb 4.x behavior.

I really fail to see what could have made the difference in your setup. Is the Sidekiq upgrade the only thing you shipped?

Any ideas what direction to head on this?

I'd say the very first step would be to double-check that reconnects do happen in your setup. I'd suggest forking the gem to add some logs here:

if !@disable_reconnection && config.retry_connecting?(tries, error)
  tries += 1
  retry
end

I suspect it's fine, but you never know.
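
For illustration, a minimal sketch of what that could look like in a fork (the warn line and its wording are made up; the surrounding condition is the existing code in redis_client.rb):

if !@disable_reconnection && config.retry_connecting?(tries, error)
  # Hypothetical log line added in a fork, just to confirm reconnects actually happen.
  warn "[redis-client] retrying after #{error.class}: #{error.message} (attempt #{tries + 1})"
  tries += 1
  retry
end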

Then I assume you use some third-party Redis provider? It's possible that they're failing over or something like that. Any pattern in the errors? Is it a constant stream, or are there spikes?

If so, you may need to try exponential backoff retries: https://github.com/redis-rb/redis-client#exponential-backoff
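
A minimal sketch of that configuration, per the redis-client README (the delay values are arbitrary; passing the option through Sidekiq's config.redis hash is an assumption about how your app is wired):

require "redis-client"
require "sidekiq"

# Retry up to three times, sleeping 0.1s, 0.25s, then 0.5s between attempts.
redis_config = RedisClient.config(
  host: "example.redis.host", # placeholder
  reconnect_attempts: [0.1, 0.25, 0.5]
)
redis = redis_config.new_client

# Presumably equivalent when going through Sidekiq's redis options:
Sidekiq.configure_server do |config|
  config.redis = { url: ENV["REDIS_URL"], reconnect_attempts: [0.1, 0.25, 0.5] }
end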

@imarquezc

Any updates on this issue? I'm experiencing the same after upgrading Sidekiq to 7.1.2 (which relies on redis-client 0.14.1).

I'm deploying on Heroku-22
Rails version: 7.0.6 (also happened on 6.1.7.3)

❯ heroku redis:info 
=== redis-cubic-41951 (REDIS_URL)
Plan:                   Premium 0
Status:                 available
Created:                2022-03-07 16:17 
Version:                6.2.12
Timeout:                300
Maxmemory:              noeviction
Maintenance:            not required
Maintenance window:     Mondays 19:00 to 23:00 UTC
Persistence:            AOF
HA Status:              Available
Requires TLS:           Yes
Keyspace Notifications: Disabled

@byroot
Member

byroot commented Aug 1, 2023

I wonder if this could have been solved by #126.

But that does require a forking setup in which redis-client is used before fork, which isn't super common. I think Sidekiq Enterprise does support that? Do you have that feature enabled?

@brianclin

brianclin commented Sep 13, 2023

I've also been getting a lot of RedisClient::ConnectionError: Broken pipe errors.
It seems to have started after I enabled TLS on Redis.
I'm using sidekiq 7.1 and redis 6.X on GCP.

RedisClient::ConnectionError: Broken pipe
/usr/local/bundle/gems/redis-client-0.17.0/lib/redis_client/ruby_connection.rb:76:in `rescue in write': Broken pipe (RedisClient::ConnectionError)
	from /usr/local/bundle/gems/redis-client-0.17.0/lib/redis_client/ruby_connection.rb:73:in `write'
	from /usr/local/bundle/gems/redis-client-0.17.0/lib/redis_client/connection_mixin.rb:30:in `call'
	from /usr/local/bundle/gems/redis-client-0.17.0/lib/redis_client.rb:224:in `block (2 levels) in call'
	from /usr/local/bundle/gems/redis-client-0.17.0/lib/redis_client/middlewares.rb:16:in `call'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/contrib/redis/trace_middleware.rb:23:in `block in call'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/trace_operation.rb:191:in `block in measure'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/span_operation.rb:149:in `measure'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/trace_operation.rb:191:in `measure'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/tracer.rb:379:in `start_span'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/tracer.rb:159:in `block in trace'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/context.rb:45:in `activate!'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/tracer.rb:158:in `trace'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing.rb:18:in `trace'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/contrib/redis/trace_middleware.rb:13:in `call'
	from /usr/local/bundle/gems/redis-client-0.17.0/lib/redis_client.rb:223:in `block in call'
	from /usr/local/bundle/gems/redis-client-0.17.0/lib/redis_client.rb:643:in `ensure_connected'
	from /usr/local/bundle/gems/redis-client-0.17.0/lib/redis_client.rb:222:in `call'
	from /usr/local/bundle/gems/redis-client-0.17.0/lib/redis_client/decorator.rb:26:in `call'
	from /usr/local/bundle/gems/sidekiq-7.1.3/lib/sidekiq/scheduled.rb:57:in `zpopbyscore'
	from /usr/local/bundle/gems/sidekiq-7.1.3/lib/sidekiq/scheduled.rb:38:in `block (2 levels) in enqueue_jobs'
	from /usr/local/bundle/gems/sidekiq-7.1.3/lib/sidekiq/scheduled.rb:33:in `each'
	from /usr/local/bundle/gems/sidekiq-7.1.3/lib/sidekiq/scheduled.rb:33:in `block in enqueue_jobs'
	from /usr/local/bundle/gems/sidekiq-7.1.3/lib/sidekiq/config.rb:164:in `block in redis'
	from /usr/local/bundle/gems/connection_pool-2.4.1/lib/connection_pool.rb:110:in `block (2 levels) in with'
	from /usr/local/bundle/gems/connection_pool-2.4.1/lib/connection_pool.rb:109:in `handle_interrupt'
	from /usr/local/bundle/gems/connection_pool-2.4.1/lib/connection_pool.rb:109:in `block in with'
	from /usr/local/bundle/gems/connection_pool-2.4.1/lib/connection_pool.rb:106:in `handle_interrupt'
	from /usr/local/bundle/gems/connection_pool-2.4.1/lib/connection_pool.rb:106:in `with'
	from /usr/local/bundle/gems/sidekiq-7.1.3/lib/sidekiq/config.rb:161:in `redis'
	from /usr/local/bundle/gems/sidekiq-7.1.3/lib/sidekiq/component.rb:28:in `redis'
	from /usr/local/bundle/gems/sidekiq-7.1.3/lib/sidekiq/scheduled.rb:32:in `enqueue_jobs'
	from /usr/local/bundle/gems/sidekiq-7.1.3/lib/sidekiq/scheduled.rb:107:in `enqueue'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/contrib/sidekiq/server_internal_tracer/scheduled_poller.rb:27:in `block in enqueue'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/trace_operation.rb:191:in `block in measure'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/span_operation.rb:149:in `measure'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/trace_operation.rb:191:in `measure'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/tracer.rb:379:in `start_span'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/tracer.rb:159:in `block in trace'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/context.rb:45:in `activate!'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/tracer.rb:158:in `trace'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing.rb:18:in `trace'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/contrib/sidekiq/server_internal_tracer/scheduled_poller.rb:14:in `enqueue'
	from /usr/local/bundle/gems/sidekiq-7.1.3/lib/sidekiq/scheduled.rb:99:in `block in start'
	from /usr/local/bundle/gems/sidekiq-7.1.3/lib/sidekiq/component.rb:10:in `watchdog'
	from /usr/local/bundle/gems/sidekiq-7.1.3/lib/sidekiq/component.rb:19:in `block in safe_thread'
/usr/local/lib/ruby/3.2.0/openssl/buffering.rb:415:in `syswrite_nonblock': Broken pipe (Errno::EPIPE)
	from /usr/local/lib/ruby/3.2.0/openssl/buffering.rb:415:in `write_nonblock'
	from /usr/local/bundle/gems/redis-client-0.17.0/lib/redis_client/ruby_connection/buffered_io.rb:64:in `block in write'
	from /usr/local/bundle/gems/redis-client-0.17.0/lib/redis_client/ruby_connection/buffered_io.rb:63:in `loop'
	from /usr/local/bundle/gems/redis-client-0.17.0/lib/redis_client/ruby_connection/buffered_io.rb:63:in `write'
	from /usr/local/bundle/gems/redis-client-0.17.0/lib/redis_client/ruby_connection.rb:74:in `write'
	from /usr/local/bundle/gems/redis-client-0.17.0/lib/redis_client/connection_mixin.rb:30:in `call'
	from /usr/local/bundle/gems/redis-client-0.17.0/lib/redis_client.rb:224:in `block (2 levels) in call'
	from /usr/local/bundle/gems/redis-client-0.17.0/lib/redis_client/middlewares.rb:16:in `call'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/contrib/redis/trace_middleware.rb:23:in `block in call'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/trace_operation.rb:191:in `block in measure'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/span_operation.rb:149:in `measure'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/trace_operation.rb:191:in `measure'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/tracer.rb:379:in `start_span'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/tracer.rb:159:in `block in trace'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/context.rb:45:in `activate!'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/tracer.rb:158:in `trace'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing.rb:18:in `trace'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/contrib/redis/trace_middleware.rb:13:in `call'
	from /usr/local/bundle/gems/redis-client-0.17.0/lib/redis_client.rb:223:in `block in call'
	from /usr/local/bundle/gems/redis-client-0.17.0/lib/redis_client.rb:643:in `ensure_connected'
	from /usr/local/bundle/gems/redis-client-0.17.0/lib/redis_client.rb:222:in `call'
	from /usr/local/bundle/gems/redis-client-0.17.0/lib/redis_client/decorator.rb:26:in `call'
	from /usr/local/bundle/gems/sidekiq-7.1.3/lib/sidekiq/scheduled.rb:57:in `zpopbyscore'
	from /usr/local/bundle/gems/sidekiq-7.1.3/lib/sidekiq/scheduled.rb:38:in `block (2 levels) in enqueue_jobs'
	from /usr/local/bundle/gems/sidekiq-7.1.3/lib/sidekiq/scheduled.rb:33:in `each'
	from /usr/local/bundle/gems/sidekiq-7.1.3/lib/sidekiq/scheduled.rb:33:in `block in enqueue_jobs'
	from /usr/local/bundle/gems/sidekiq-7.1.3/lib/sidekiq/config.rb:164:in `block in redis'
	from /usr/local/bundle/gems/connection_pool-2.4.1/lib/connection_pool.rb:110:in `block (2 levels) in with'
	from /usr/local/bundle/gems/connection_pool-2.4.1/lib/connection_pool.rb:109:in `handle_interrupt'
	from /usr/local/bundle/gems/connection_pool-2.4.1/lib/connection_pool.rb:109:in `block in with'
	from /usr/local/bundle/gems/connection_pool-2.4.1/lib/connection_pool.rb:106:in `handle_interrupt'
	from /usr/local/bundle/gems/connection_pool-2.4.1/lib/connection_pool.rb:106:in `with'
	from /usr/local/bundle/gems/sidekiq-7.1.3/lib/sidekiq/config.rb:161:in `redis'
	from /usr/local/bundle/gems/sidekiq-7.1.3/lib/sidekiq/component.rb:28:in `redis'
	from /usr/local/bundle/gems/sidekiq-7.1.3/lib/sidekiq/scheduled.rb:32:in `enqueue_jobs'
	from /usr/local/bundle/gems/sidekiq-7.1.3/lib/sidekiq/scheduled.rb:107:in `enqueue'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/contrib/sidekiq/server_internal_tracer/scheduled_poller.rb:27:in `block in enqueue'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/trace_operation.rb:191:in `block in measure'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/span_operation.rb:149:in `measure'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/trace_operation.rb:191:in `measure'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/tracer.rb:379:in `start_span'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/tracer.rb:159:in `block in trace'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/context.rb:45:in `activate!'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/tracer.rb:158:in `trace'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing.rb:18:in `trace'
	from /usr/local/bundle/gems/ddtrace-1.12.1/lib/datadog/tracing/contrib/sidekiq/server_internal_tracer/scheduled_poller.rb:14:in `enqueue'
	from /usr/local/bundle/gems/sidekiq-7.1.3/lib/sidekiq/scheduled.rb:99:in `block in start'
	from /usr/local/bundle/gems/sidekiq-7.1.3/lib/sidekiq/component.rb:10:in `watchdog'
	from /usr/local/bundle/gems/sidekiq-7.1.3/lib/sidekiq/component.rb:19:in `block in safe_thread'

@slai11

slai11 commented Nov 21, 2023

I'm seeing a similar behaviour too (both EOFError and Broken pipe). I'm using sidekiq 7.1.6, redis-client 0.18.0 and redis 6.X (auth enabled).

More info in https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17158#note_1659816466.

We are fairly confident the upgrade caused the incident but unsure of exactly why. We did a roll-back of the release containing the sidekiq 7 upgrade and the problem (redis errors + sidekiq apdex issues) went away. We subsequently re-deployed the release but this time without the sidekiq 7 upgrade commit and the metrics were all fine.

@casperisfine
Collaborator

@slai11 I'd love to get to the bottom of that issue, but without experiencing it myself, it's pretty much impossible.

But if you figure out a repro, please share it; I'll be happy to squash that bug.

@CaioIcy

CaioIcy commented Nov 23, 2023

We've been seeing the same issue: Redis 6.X with TLS, redis-client 0.18, connection_pool 2.4.1

It happens to all our Redis connections, including caching & Sidekiq. I've tried messing with reconnect_attempts and timeout, but that did not seem to help. Also, it seems to correlate with fetching larger payloads from Rails.cache.

I've tried to reproduce it but so far no luck. This comment seems the closest I could find to an explanation, so maybe it could help to link it here: mperham/connection_pool#177 (comment).

@davidgm0

davidgm0 commented Nov 24, 2023

I believe we're also getting the same here with redis-client 0.18.0 and connection_pool 2.4.1. We're also using TLS. It's only happening with a job that uses activejob-uniqueness and redlock though:

Redlock::LockAcquisitionError: failed to acquire lock on 'Too many Redis errors prevented lock acquisition:
RedisClient::ConnectionError: Broken pipe'
/usr/local/bundle/gems/redlock-2.0.6/lib/redlock/client.rb:323:in `lock_instances'
/usr/local/bundle/gems/redlock-2.0.6/lib/redlock/client.rb:275:in `block in try_lock_instances'
/usr/local/bundle/gems/redlock-2.0.6/lib/redlock/client.rb:271:in `times'
/usr/local/bundle/gems/redlock-2.0.6/lib/redlock/client.rb:271:in `try_lock_instances'
/usr/local/bundle/gems/redlock-2.0.6/lib/redlock/client.rb:79:in `lock'
/usr/local/bundle/gems/activejob-uniqueness-0.3.1/lib/active_job/uniqueness/strategies/base.rb:24:in `lock'

@le0pard

le0pard commented Nov 25, 2023

sidekiq/sidekiq#5929 (comment) provides more info about this case.

So basically it happens with no load on the Sidekiq daemon, and some of the threads (not all of them) get a timeout error. redis-rb had no such issues in previous Sidekiq <= 6.

@stanhu
Contributor

stanhu commented Nov 26, 2023

I'm not sure yet if I reproduced this specific issue, but today I set up a Google Cloud Memorystore instance with Redis 6.2.13 and TLS enabled. Then I ran:

require 'redis-client'
redis_config = RedisClient.config(host: 'redacted-redis-hostname', port: 6378, ssl: true, password: 'redacted-password')
redis = redis_config.new_client
redis.call("PING")
=> "PONG"

I waited an hour or so, and sent another PING, but this time I got the EPIPE:

irb(main):015:0> redis.call("PING")
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/redis-client-0.18.0/lib/redis_client/ruby_connection.rb:76:in `rescue in write': Broken pipe (RedisClient::ConnectionError)
        from /opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/redis-client-0.18.0/lib/redis_client/ruby_connection.rb:73:in `write'
        from /opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/redis-client-0.18.0/lib/redis_client/connection_mixin.rb:30:in `call'
        from /opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/redis-client-0.18.0/lib/redis_client.rb:256:in `block (2 levels) in call'
        from /opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/redis-client-0.18.0/lib/redis_client/middlewares.rb:16:in `call'
        from /opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/redis-client-0.18.0/lib/redis_client.rb:255:in `block in call'
        from /opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/redis-client-0.18.0/lib/redis_client.rb:675:in `ensure_connected'
        from /opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/redis-client-0.18.0/lib/redis_client.rb:254:in `call'
        from (irb):15:in `<main>'
        from /opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/irb-1.3.5/exe/irb:11:in `<top (required)>'
        from /opt/gitlab/embedded/bin/irb:23:in `load'
        from /opt/gitlab/embedded/bin/irb:23:in `<main>'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/openssl-3.1.0/lib/openssl/buffering.rb:415:in `syswrite_nonblock': Broken pipe (Errno::EPIPE)
        from /opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/openssl-3.1.0/lib/openssl/buffering.rb:415:in `write_nonblock'
        from /opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/redis-client-0.18.0/lib/redis_client/ruby_connection/buffered_io.rb:64:in `block in write'
        from /opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/redis-client-0.18.0/lib/redis_client/ruby_connection/buffered_io.rb:63:in `loop'
        from /opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/redis-client-0.18.0/lib/redis_client/ruby_connection/buffered_io.rb:63:in `write'
        from /opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/redis-client-0.18.0/lib/redis_client/ruby_connection.rb:74:in `write'
        from /opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/redis-client-0.18.0/lib/redis_client/connection_mixin.rb:30:in `call'
        from /opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/redis-client-0.18.0/lib/redis_client.rb:256:in `block (2 levels) in call'
        from /opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/redis-client-0.18.0/lib/redis_client/middlewares.rb:16:in `call'
        from /opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/redis-client-0.18.0/lib/redis_client.rb:255:in `block in call'
        from /opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/redis-client-0.18.0/lib/redis_client.rb:675:in `ensure_connected'
        from /opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/redis-client-0.18.0/lib/redis_client.rb:254:in `call'
        from (irb):15:in `<main>'
        from /opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/irb-1.3.5/exe/irb:11:in `<top (required)>'
        from /opt/gitlab/embedded/bin/irb:23:in `load'
        from /opt/gitlab/embedded/bin/irb:23:in `<main>'

Naturally, I was curious if Google's 10-minute VPC timeout was kicking in, but the issue didn't seem to occur if I attempted a PING after 10 minutes. In addition, I also verified with Wireshark that TCP keepalives appear to be sent and acked every 15 seconds, so it doesn't appear that the TCP connection is going idle.

I also checked that the Redis TIMEOUT config parameter defaults to 0.

I'm not sure why the socket appears to be closing like this, so I'll continue to investigate.

@stanhu
Contributor

stanhu commented Nov 26, 2023

Interesting, Redis 6.2.13 does NOT have redis/redis#10931, which fixes redis/redis#10915.

Redis 6.2.8 did get it after all: redis/redis#11607

@le0pard

le0pard commented Nov 26, 2023

Sidekiq threads send BRPOP every 2 seconds, so it is not the VPC timeout.

stanhu added a commit to stanhu/sidekiq-cron that referenced this issue Nov 27, 2023
In Sidekiq 7, calls to `Sidekiq.redis` check out a connection from an
internal connection pool instead of the connection pool used by the
job processors. If that connection is checked out but not used, it's
possible that the heartbeat thread won't keep the connection alive
before the Redis server client timeout.

To avoid this, update the checks to return earlier if the Redis
connection is not needed.

This might help reduce intermittent `Broken pipe` errors as reported
in redis-rb/redis-client#119.
stanhu added a commit to stanhu/sidekiq-cron that referenced this issue Nov 27, 2023
@stanhu
Contributor

stanhu commented Nov 27, 2023

I've done some lengthy investigation on this issue as it relates to Sidekiq, and at the moment I don't believe this is a redis-client problem. I suspect some of the issues may be caused by:

  1. A Redis connection that's idling too long (for example in sidekiq-cron: "Only check out a Redis connection if necessary", sidekiq-cron/sidekiq-cron#438)
  2. A pool contention/deadlock issue when a Sidekiq job tries to access Sidekiq.redis within the job (for example in sidekiq-cron: "Avoid using Sidekiq.redis in Sidekiq 7 to avoid pool contention", sidekiq-cron/sidekiq-cron#439).

I reproduced the broken pipe and EOFError errors by setting the Redis TIMEOUT config to a non-zero value (60 or 120 seconds seems to work). By default, it is set to 0.
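
For anyone who wants to try this themselves, here's a rough sketch of the kind of repro I mean (plain local Redis without TLS; the host, port, and 60-second value are just placeholders):

require "redis-client"

redis = RedisClient.config(host: "localhost", port: 6379).new_client

redis.call("CONFIG", "SET", "timeout", "60") # server now drops idle clients after 60s
redis.call("PING")                           # => "PONG"

sleep 70                                     # let the connection idle past the server timeout

# With reconnect_attempts at 0 (which I believe is redis-client's default), this raises
# RedisClient::ConnectionError (Broken pipe or EOFError). With reconnect_attempts >= 1 the
# command is retried on a fresh connection, and the error only shows up in middleware-level
# instrumentation.
redis.call("PING")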

Sidekiq 6 used a single connection pool and added 5 additional connections as leeway.

However, as described in https://github.com/sidekiq/sidekiq/blob/main/docs/capsule.md#redis-pools, Sidekiq 7 created:

  1. A separate, internal pool of 10 connections.
  2. A capsule pool sized to the configured concurrency.

For those of you having trouble:

  1. Do you have the Redis TIMEOUT value set? Run CONFIG GET timeout from redis-cli to verify.
  2. Look carefully in the error backtrace to determine whether the broken pipe errors are happening in the Sidekiq internal pool vs. the capsule pool.
  3. Look for calls to Sidekiq.redis within your application code, particularly ones that check out a Redis connection but don't actually issue commands. This could prevent the Sidekiq heartbeat thread from keeping the connection alive to avoid the Redis timeout.
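
For item 1, a quick way to check from a Rails console (assuming the connection Sidekiq.redis yields responds to call, as it does with the redis-client adapter in Sidekiq 7):

require "sidekiq"

# Same information as running `CONFIG GET timeout` in redis-cli.
Sidekiq.redis { |conn| conn.call("CONFIG", "GET", "timeout") }
# e.g. => ["timeout", "0"] (or a hash, depending on server/protocol version);
# 0 means the server never drops idle clients.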

It's important to determine where this is happening first because a call to Sidekiq.redis may use the internal or the capsule pool, depending on where it's called. Within a job or a job processor, Sidekiq.redis uses the capsule pool.

For example, I was seeing these broken pipe errors from within the heartbeat thread (line numbers may be slightly off due to debugging), until I realized that sidekiq-cron was calling Sidekiq.redis to access the internal pool.

/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/sidekiq-cron-1.9.1/lib/sidekiq/cron/job.rb:20:in `should_enque?'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/sidekiq-cron-1.9.1/lib/sidekiq/cron/job.rb:39:in `test_and_enque_for_time!'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/sidekiq-cron-1.9.1/lib/sidekiq/cron/poller.rb:36:in `enqueue_job'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/sidekiq-cron-1.9.1/lib/sidekiq/cron/poller.rb:23:in `block in enqueue'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/sidekiq-cron-1.9.1/lib/sidekiq/cron/poller.rb:22:in `each'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/sidekiq-cron-1.9.1/lib/sidekiq/cron/poller.rb:22:in `enqueue'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/patch/sidekiq_poller.rb:9:in `block (2 levels) in enqueue'
/opt/gitlab/embedded/service/gitlab-rails/gems/gitlab-safe_request_store/lib/gitlab/safe_request_store.rb:66:in `enabling_request_store'
/opt/gitlab/embedded/service/gitlab-rails/gems/gitlab-safe_request_store/lib/gitlab/safe_request_store.rb:59:in `ensure_request_store'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/patch/sidekiq_poller.rb:8:in `block in enqueue'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/activesupport-7.0.8/lib/active_support/execution_wrapper.rb:92:in `wrap'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/activesupport-7.0.8/lib/active_support/reloader.rb:72:in `block in wrap'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/activesupport-7.0.8/lib/active_support/execution_wrapper.rb:92:in `wrap'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/activesupport-7.0.8/lib/active_support/reloader.rb:71:in `wrap'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/patch/sidekiq_poller.rb:7:in `enqueue'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/sidekiq-7.1.6/lib/sidekiq/scheduled.rb:99:in `block in start'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/sidekiq-7.1.6/lib/sidekiq/component.rb:10:in `watchdog'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/sidekiq-7.1.6/lib/sidekiq/component.rb:19:in `block in safe_thread'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/redis-client-0.18.0/lib/redis_client.rb:672:in `ensure_connected'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/redis-client-0.18.0/lib/redis_client.rb:408:in `pipelined'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/redis-client-0.18.0/lib/redis_client/decorator.rb:51:in `pipelined'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/sidekiq-7.1.6/lib/sidekiq/launcher.rb:153:in `block in ❤'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/sidekiq-7.1.6/lib/sidekiq/config.rb:166:in `block in redis'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/connection_pool-2.4.1/lib/connection_pool.rb:110:in `block (2 levels) in with'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/connection_pool-2.4.1/lib/connection_pool.rb:109:in `handle_interrupt'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/connection_pool-2.4.1/lib/connection_pool.rb:109:in `block in with'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/connection_pool-2.4.1/lib/connection_pool.rb:106:in `handle_interrupt'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/connection_pool-2.4.1/lib/connection_pool.rb:106:in `with'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/sidekiq-7.1.6/lib/sidekiq/config.rb:163:in `redis'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/sidekiq-7.1.6/lib/sidekiq/component.rb:28:in `redis'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/sidekiq-7.1.6/lib/sidekiq/launcher.rb:149:in `❤'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/sidekiq-7.1.6/lib/sidekiq/launcher.rb:99:in `beat'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/sidekiq-7.1.6/lib/sidekiq/launcher.rb:91:in `block in start_heartbeat'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/sidekiq-7.1.6/lib/sidekiq/launcher.rb:90:in `loop'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/sidekiq-7.1.6/lib/sidekiq/launcher.rb:90:in `start_heartbeat'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/sidekiq-7.1.6/lib/sidekiq/component.rb:10:in `watchdog'
/opt/gitlab/embedded/lib/ruby/gems/3.0.0/gems/sidekiq-7.1.6/lib/sidekiq/component.rb:19:in `block in safe_thread'

In Sidekiq 6, the repeated BRPOP calls for every job processor every 2 seconds made it unlikely any of the connections in the pool would time out. However, the introduction of the internal pool in Sidekiq 7 makes it more likely for a connection in the internal pool to time out if the connection is checked out but not used.

stanhu added a commit to stanhu/sidekiq-cron that referenced this issue Nov 28, 2023
@engwan

engwan commented Nov 28, 2023

This may also just be due to instrumentation changes.

Looking at the stacktraces posted above, I see that you are using datadog.

/app/vendor/bundle/ruby/3.2.0/gems/ddtrace-1.11.1/lib/datadog/tracing/contrib/redis/trace_middleware.rb is called inside ensure_connected so it captures these connection errors even if they are retried successfully.

For the old Redis <5 client, it just patches Redis::Client, which means it won't see these errors, since they are caught inside ensure_connected and retried successfully.

See https://github.com/DataDog/dd-trace-rb/blob/master/lib/datadog/tracing/contrib/redis/patcher.rb#L58-L71
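
To illustrate why that placement matters, here's a minimal sketch using redis-client's documented RedisClient.register middleware hook (the module name and warn message are made up): a middleware wraps every individual attempt, so it observes ConnectionErrors that the surrounding ensure_connected retry may still recover from.

require "redis-client"

module ErrorObservingMiddleware
  def call(command, config)
    super
  rescue RedisClient::ConnectionError => error
    # Fires on every failed attempt, even when the retry right after succeeds.
    warn "redis attempt failed: #{error.class} while running #{command.first}"
    raise
  end
end

RedisClient.register(ErrorObservingMiddleware)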

@stanhu
Contributor

stanhu commented Nov 28, 2023

This may also just be due to instrumentation changes.

Good point. I should have mentioned that, though I noticed many of the stack traces above suggest the problem is happening on the internal pool.

That makes me wonder whether the heartbeat thread isn't keeping these internal connections alive properly. As you mentioned somewhere else, a thread will tend to get the same connection that it used previously. I've validated this by printing out the conn values in the method.

In Sidekiq 6, the connection pool was touched by all job threads, so I suspect the likelihood of idling on any one connection was essentially zero.

@mperham Should there be an option to make the heartbeat thread iterate through all connections? Right now it might be acceptable to let the Redis client reconnect gracefully, but users who have a Redis TIMEOUT setting defined and instrumentation that tracks EOFError will see more errors.

I think it would also help if connection_pool offered the ability to track:

  1. The last time the connection was checked out.
  2. The last time the connection issued a network command.
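
As a rough illustration of what I mean (purely hypothetical; connection_pool offers no such API today), a delegating wrapper could record both timestamps:

require "delegate"
require "connection_pool"
require "redis-client"

# Hypothetical wrapper: records when the connection was last checked out and
# when it last issued a command, so idle-but-checked-out connections stand out.
class TrackedConnection < SimpleDelegator
  attr_reader :last_checked_out_at, :last_command_at

  def note_checkout!
    @last_checked_out_at = Time.now
  end

  def call(*command)
    @last_command_at = Time.now
    __getobj__.call(*command)
  end
end

pool = ConnectionPool.new(size: 5) do
  TrackedConnection.new(RedisClient.config(host: "localhost").new_client)
end

pool.with do |conn|
  conn.note_checkout!
  # If no command is issued here, last_command_at keeps aging, which is
  # exactly the case that can hit a server-side timeout.
end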

@stanhu
Contributor

stanhu commented Nov 28, 2023

I should also mention the default reconnect of 1 may not be sufficient if multiple connections in the internal pool have been disconnected by the Redis server. It seems to me we probably want to preserve the previous behavior of keeping these connections alive.

@byroot
Member

byroot commented Nov 28, 2023

the default reconnect of 1 may not be sufficient if multiple connections in the internal pool have been disconnected by the Redis server

I don't follow. reconnect_attempt is per connection, or are you pointing at something else?

@stanhu
Contributor

stanhu commented Nov 28, 2023

reconnect_attempt is per connection, or are you pointing at something else?

My mistake. For some reason I thought the retry logic was in Sidekiq, but it's in redis-client.

To reiterate, I think we're likely seeing more of these EOFError exceptions because:

  1. The instrumentation has changed so that any EOFError will be logged, even if the connection retried successfully, as mentioned in Broken Pipe and EOFErrors #119 (comment).
  2. Sidekiq 7 no longer guarantees that each of the connections in the newly-introduced internal pool will send a Redis command within a Redis server timeout. The retry should succeed, however. The EOFErrors may just be noise in this case.

The problem is that it's difficult to determine whether these errors are just noise, or whether they're indicative of a more serious problem.

@byroot
Member

byroot commented Nov 28, 2023

The problem is that it's difficult to determine whether these errors are just noise,

Right. I'll try to find some time to look at this instrumentation and see if we can hide non-final errors from it. Unless someone beats me to it, of course.

markets pushed a commit to sidekiq-cron/sidekiq-cron that referenced this issue Nov 28, 2023
@mperham

mperham commented Nov 30, 2023

I've seen notes that Redis server has issues with OpenSSL 3. I've seen issues with Datadog and NewRelic's Redis instrumentation with the redis -> redis-client migration. It's difficult to know which bug or component is at fault.

If you are seeing mysterious errors, all I can advise at this point is to upgrade everything possibly related, including Redis server itself and any gems visible in the backtrace.

@mperham

mperham commented Nov 30, 2023

redis/redis#10915

@slai11

slai11 commented Jan 5, 2024

Right. I'll try to find some time to look at this instrumentation, see if we can hide non-final errors from it. Unless someone beat me to it of course.

@byroot I was looking into this and noticed that !@raw_connection.revalidate only checks whether the connection is open from the client side (I hacked around locally with a Redis server on a 1s timeout; @io.closed? would be false even though redis-cli client list shows that the connection is closed).

Based on my understanding, retrying the command in ensure_connected is the more performant way to go about this so we should leave it at that.


Since the instrumentation middlewares are reporting the errors, could we pass useful information like the retry count to the block in `yield connection`, and then on to middleware.call? Middleware authors could then handle the errors accordingly with this new information, e.g. rescue ConnectionError and ignore it if attempts is 0.

   def measure_round_trip_delay
-    ensure_connected do |connection|
-      @middlewares.call(["PING"], config) do
+    ensure_connected do |connection, attempts|
+      @middlewares.call(["PING"], config, attempts) do
         connection.measure_round_trip_delay
       end
     end
   end

@liaden

liaden commented Jul 31, 2024

This may also just be due to instrumentation changes.

Looking at the stacktraces posted above, I see that you are using datadog.

/app/vendor/bundle/ruby/3.2.0/gems/ddtrace-1.11.1/lib/datadog/tracing/contrib/redis/trace_middleware.rb is called inside ensure_connected so it captures these connection errors even if they are retried successfully.

For the old Redis <5 client, it just patches Redis::Client which means it won't see these errors since these will be caught inside ensure_connected and retried successfully.

See https://github.com/DataDog/dd-trace-rb/blob/master/lib/datadog/tracing/contrib/redis/patcher.rb#L58-L71

We were seeing the increase, and I was able to go from an error trace to viewing the error in the APM view. From there, I could see a lot of Sidekiq job traces affected by the same issue id, and their status was "OK". This leads me to believe this theory is correct.

Thanks for suggesting it. 😄

@byroot
Member

byroot commented Aug 1, 2024

I opened DataDog/dd-trace-rb#3820; I'll try to work with the Datadog people to make the necessary changes so these transient errors can be either ignored or marked as retried.

@sebgrebe

I believe we're also getting the same here with redis-client 0.18.0 and connection_pool 2.4.1. We're also using TLS. It's only happening with a job that uses activejob-uniqueness and redlock though:

Redlock::LockAcquisitionError: failed to acquire lock on 'Too many Redis errors prevented lock acquisition:
RedisClient::ConnectionError: Broken pipe'
/usr/local/bundle/gems/redlock-2.0.6/lib/redlock/client.rb:323:in `lock_instances'
/usr/local/bundle/gems/redlock-2.0.6/lib/redlock/client.rb:275:in `block in try_lock_instances'
/usr/local/bundle/gems/redlock-2.0.6/lib/redlock/client.rb:271:in `times'
/usr/local/bundle/gems/redlock-2.0.6/lib/redlock/client.rb:271:in `try_lock_instances'
/usr/local/bundle/gems/redlock-2.0.6/lib/redlock/client.rb:79:in `lock'
/usr/local/bundle/gems/activejob-uniqueness-0.3.1/lib/active_job/uniqueness/strategies/base.rb:24:in `lock'

@davidgm0 I am having the same issue with redis-client 0.22 and activejob-uniqueness. Did you get to the bottom of this?
