-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix LRU eviction of client notification datastructure #4021
Conversation
@@ -39,6 +39,9 @@ extern RedisChainModule module; | |||
} \ | |||
} | |||
|
|||
/// Map from pub sub channel to clients that are waiting on that channel. | |||
std::unordered_map<std::string, std::vector<std::string>> notification_map; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add the size of this to the debug string somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could alternatively print a warning if this map exceeds certain size bounds, e.g. 1000, 10k, 100k.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, I added a redis command that can query the DEBUG_STRING
Could you also add
to the jenkins tests? This will hang-fail before this fix. |
It takes about 30 seconds. |
RedisModuleString *channel; | ||
RAY_RETURN_NOT_OK(FormatPubsubChannel(&channel, ctx, pubsub_channel_str, keyname)); | ||
RAY_CHECK_OK(FormatPubsubChannel(&channel, ctx, pubsub_channel_str, keyname)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@guoyuhong removed all RAY_CHECK
calls in #3855. So let's keep it that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea, done
Test FAILed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Test FAILed. |
// therefore we construct a temporary redis string here, which | ||
// will be garbage collected by redis. | ||
auto channel = RedisModule_CreateString( | ||
ctx, client_channel.data(), client_channel.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pcmoritz Can we delete this temporary redis string immediately after PUBLISH command finishes? In our system we do not enable redis LRU by default. That may cause redis memory leak.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you talking about LRU eviction or about automatic memory management? We shouldn’t have to free strings or close keys because redis automatic memory management takes care of that. If you do need to free strings then we you would need to do it in many places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I thought it was a key... It is just a Redis String, which is fine.
One more question, what is the typical request key number in the key value of the map |
Test FAILed. |
Test FAILed. |
Test FAILed. |
jenkins retest this please |
if (reply == NULL) { | ||
return RedisModule_ReplyWithError(ctx, "error during PUBLISH"); | ||
} | ||
} | ||
notification_map.erase(it); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain this line? You said it was a bug, but shouldn’t we only update this when a client calls subscribe or unsubscribe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a fluke, I just had forgotten how to erase from a c++ vector properly
Test FAILed. |
jenkins retest this please Also some lint errors... |
Test FAILed. |
Test FAILed. |
@guoyuhong The number of elements in the vector will be fairly small (= the number of clients subscribed to that key), probably < 100 in most cases. I expect vectors to be faster in this case than other datastructures. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pcmoritz Thanks for the confirmation.
Test FAILed. |
Test FAILed. |
Test FAILed. |
jenkins retest this please |
Odd this happens when the test is initializing, perhaps 10MB is too small on the jenkins environment. |
Test FAILed. |
Test FAILed. |
Test FAILed. |
Jenkins retest this please |
Test FAILed. |
Test FAILed. |
alright, looks like Jenkins is finally happy (the multinode_test script might not pass through the flag to redis, it's a little strange but also not the intended use case...) |
Test FAILed. |
Ok, no luck with jenkins. I'm going to merge this and we can figure out the test issue separately. |
Test FAILed. |
This should fix #3997
One question that is still open is if this datastructure is cleaned up properly (both if the wait exits and also if clients die that are blocked in a wait).