-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hiredis-client Segfault on pubsub #141
Comments
Thanks, the C-level backtrace is interesting, I'll have a look to see if I spot something wrong.
So if you can, here's what would help:
|
Interestingly, void
rb_ary_store(VALUE ary, long idx, VALUE val)
{
long len = RARRAY_LEN(ary);
if (idx < 0) {
idx += len;
if (idx < 0) {
rb_raise(rb_eIndexError, "index %ld too small for array; minimum: %ld",
idx - len, -len);
}
}
else if (idx >= ARY_MAX_SIZE) {
rb_raise(rb_eIndexError, "index %ld too big", idx);
}
rb_ary_modify(ary);
if (idx >= ARY_CAPA(ary)) {
ary_double_capa(ary, idx);
}
if (idx > len) {
ary_mem_clear(ary, len, idx - len + 1);
}
if (idx >= len) {
ARY_SET_LEN(ary, idx + 1);
}
ARY_SET(ary, idx, val);
} |
Does this mean you recently upgraded? A possible cause for this would be 873cef7 which shipped with |
Potential fix for #141 I suspect the reference isn't always kept on the stack, and GC triggering at the wrong time may collect that object.
I pushed 1ab081c on |
Last week, I was in And from what I see in my git, 0.14.1 was my first version for this gem. My app starts Segfaulting without upgrading the version of the gem (if I remember correctly...).
Sure, thanks for the super quick answer ! |
In the meantime, I will launch a small reproduction script to see if I found the same error with the 0.17.0 version require "bundler/inline"
gemfile(true) do
source "https://rubygems.org"
gem 'rails', '7.0.6'
gem 'bootsnap', '1.16.0', require: false
gem 'hiredis-client', '0.17.0'
end
ENV['BOOTSNAP_CACHE_DIR'] = '~/.bootsnap'
require 'bootsnap/setup' # Speed up boot time by caching expensive operations.
Thread.new do
conn = RedisClient.new(reconnect_attempts: [0.2, 0.8, 2]).pubsub
conn.call('subscribe', 'feature_flags')
loop do
next unless message = conn.next_event(20)
p message
end
end
sleep |
My app with your last commit just segfault again.
I just submit a form in a part of the app that doesn't seem related. Is it possible that this segfault is linked to YJIT ? I can disable it to see if the bug reappear. |
It's not impossible but unlikely. There are a couple known YJIT bugs in 3.2.2 that can cause things like instance variable to be corrupted. These bugs have been backported to the 3_2 stable branch, but will only be out with 3.2.3. If you don't mind disabling YJIT, that would at least rule this out. I'll have another look. Perhaps add a type check so that at least we'll have more information about what is going on. |
Could you try this branch please: https://github.com/redis-rb/redis-client/tree/hiredis-type-check ? It adds more debug information, when you'll run into this bug it will print the object that is supposed to be an array. Ideally you retry a few times so that we can see if the unexpected object is consistent or random. Thanks in advance. |
I cannot start the app anymore :
Same with ou without YJIT. The |
Sorry, that's my bad, my branch wasn't accounting for something. Please pull the branch again. |
I just segfaulted again and nothing was print. :/ |
Oh wow. That would suggest the object is indeed an array, but then I have no idea why it would crash 🤔 . I'm afraid at this stage I'll need a standalone repro script to debug this further. |
No problem! My repro script didn't segfault yet. In the meantime I will try without YJIT and without Jemalloc (the last 2 things I changed) see if I can find any differencies. Thanks for your help and I'll keep you updated |
The small script has just segfaulted.
|
Could you share the reproduction script and steps? How long did it take to happen? |
I suppose it's the script in #141 (comment), but not sure what the event sent look like. |
Yes, the script is above. I launched it like that It didn't received any events. Not ideal to test :/ |
Yeah, I tried various versions of that without success: require "bundler/inline"
gemfile(true) do
source "https://rubygems.org"
gem 'hiredis-client', path: "/Users/byroot/src/github.com/redis-rb/redis-client"
end
require 'hiredis-client'
p RedisClient.default_driver
conn = RedisClient.new(reconnect_attempts: [0.2, 0.8, 2]).pubsub
conn.call('subscribe', 'feature_flags')
loop do
GC.stress = true
message = conn.next_event(0.3)
GC.stress = false
print '.'
end require "bundler/inline"
gemfile(true) do
source "https://rubygems.org"
gem 'hiredis-client', path: "/Users/byroot/src/github.com/redis-rb/redis-client"
end
require 'hiredis-client'
p RedisClient.default_driver
#
thr = Thread.new do
conn2 = RedisClient.new(reconnect_attempts: [0.2, 0.8, 2])
p conn2
100_000.times do
conn2.call("publish", "feature_flags", "Hello" * 1000)
sleep 0.01
end
end
conn = RedisClient.new(reconnect_attempts: [0.2, 0.8, 2]).pubsub
conn.call('subscribe', 'feature_flags')
loop do
GC.stress = true
message = conn.next_event(20)
GC.stress = false
print '.'
end I'm sorry, but without a repro I can't dig this further. I'd recommend making the necessary so that you get a core dump next time, so that at least you can use |
Ok, I'll check with the core dumps. JFYI, I got another segfault from my main app (I'm pretty sure it's linked with YJIT now, or at least it makes the segfault more obvious in a way), and it printed :
Thanks for your help since the beginning and sorry to have nothing easier for you! |
That is very interesting, thank you. So there is two weird stuff here. First the Array we created is no longer an array, so it wasn't marked or wasn't pined, that is weird. But also this This is incredibly weird. |
After several days without the YJIT, I had no segfaults, so I guess the error is not due to this gem. Maybe it's the memory corruption you were talking at the beginning of this thread. I will try again after the ruby 3.3 release. Anyway, thanks for your help! |
Sounds good. If you use ruby-build, rather than wait for Ruby 3.3 you can try https://github.com/Shopify/ruby-definitions |
This is also happening in Sidekiq when a queue is paused or unpaused.
|
Hum, that's interesting, that suggest the array we're trying to push into is frozen. Could be the same bug, but will be just as hard to figure out. |
I didn't have a lot of time recently but I can now almost garanti that it's linked to YJIT. YJIT + 3.2.2 => No problem For now, I stopped using hiredis, which is ok for me. But don't hesitate if you have something you want me to test! |
Hi there, I recently had a similar crash, in this case it was action-cable. It didn't print out a C level backtrace but here is the error
|
@mathieumahe I found and fixed a GC issue in #157, it might be the problem you were witnessing. If you have time I suggest trying the master branch. |
Alan also found a second issue when using |
I just re-enable hiredis with the last version of redis-client and hiredis-client. I'll come back in a few days to tell you if I saw a Segfault or not. Thanks! |
I totally forgot to come back. I'm sorry! Sadly, the same Segfault is still here. |
From time to time I have a segfault in the hiredis gem. It seems to happen only on my local computer.
Here's the error :
and then thousand of lines of loaded files.
Here's the
listen_to_changes
function :This code used to work correctly.
I recently update my ruby version to enable YJIT and Jemalloc on my local computer, so it could be related, but the bug is not easily reproducible.
Tell me if you need more information and thanks for your help.
The text was updated successfully, but these errors were encountered: