Fix LRU eviction of client notification datastructure#4021
Fix LRU eviction of client notification datastructure#4021ericl merged 16 commits intoray-project:masterfrom
Conversation
| } | ||
|
|
||
| /// Map from pub sub channel to clients that are waiting on that channel. | ||
| std::unordered_map<std::string, std::vector<std::string>> notification_map; |
There was a problem hiding this comment.
Can we add the size of this to the debug string somehow?
There was a problem hiding this comment.
We could alternatively print a warning if this map exceeds certain size bounds, e.g. 1000, 10k, 100k.
There was a problem hiding this comment.
done, I added a redis command that can query the DEBUG_STRING
|
Could you also add to the jenkins tests? This will hang-fail before this fix. |
|
It takes about 30 seconds. |
| RedisModuleString *keyname) { | ||
| RedisModuleString *channel; | ||
| RAY_RETURN_NOT_OK(FormatPubsubChannel(&channel, ctx, pubsub_channel_str, keyname)); | ||
| RAY_CHECK_OK(FormatPubsubChannel(&channel, ctx, pubsub_channel_str, keyname)); |
There was a problem hiding this comment.
@guoyuhong removed all RAY_CHECK calls in #3855. So let's keep it that way.
|
Test FAILed. |
|
Test FAILed. |
| // therefore we construct a temporary redis string here, which | ||
| // will be garbage collected by redis. | ||
| auto channel = RedisModule_CreateString( | ||
| ctx, client_channel.data(), client_channel.size()); |
There was a problem hiding this comment.
@pcmoritz Can we delete this temporary redis string immediately after PUBLISH command finishes? In our system we do not enable redis LRU by default. That may cause redis memory leak.
There was a problem hiding this comment.
Are you talking about LRU eviction or about automatic memory management? We shouldn’t have to free strings or close keys because redis automatic memory management takes care of that. If you do need to free strings then we you would need to do it in many places.
There was a problem hiding this comment.
Sorry, I thought it was a key... It is just a Redis String, which is fine.
|
One more question, what is the typical request key number in the key value of the map |
|
Test FAILed. |
|
Test FAILed. |
|
Test FAILed. |
|
jenkins retest this please |
| return RedisModule_ReplyWithError(ctx, "error during PUBLISH"); | ||
| } | ||
| } | ||
| notification_map.erase(it); |
There was a problem hiding this comment.
Can you explain this line? You said it was a bug, but shouldn’t we only update this when a client calls subscribe or unsubscribe?
There was a problem hiding this comment.
This was a fluke, I just had forgotten how to erase from a c++ vector properly
|
Test FAILed. |
|
jenkins retest this please Also some lint errors... |
|
Test FAILed. |
|
Test FAILed. |
|
@guoyuhong The number of elements in the vector will be fairly small (= the number of clients subscribed to that key), probably < 100 in most cases. I expect vectors to be faster in this case than other datastructures. |
|
Test FAILed. |
|
Test FAILed. |
|
Test FAILed. |
|
jenkins retest this please |
|
Odd this happens when the test is initializing, perhaps 10MB is too small on the jenkins environment. |
|
Test FAILed. |
|
Test FAILed. |
|
Test FAILed. |
|
Jenkins retest this please |
|
Test FAILed. |
|
Test FAILed. |
|
alright, looks like Jenkins is finally happy (the multinode_test script might not pass through the flag to redis, it's a little strange but also not the intended use case...) |
|
Test FAILed. |
|
Ok, no luck with jenkins. I'm going to merge this and we can figure out the test issue separately. |
|
Test FAILed. |
This should fix #3997
One question that is still open is if this datastructure is cleaned up properly (both if the wait exits and also if clients die that are blocked in a wait).