-
Notifications
You must be signed in to change notification settings - Fork 6.8k
[xray] Implementing Gcs sharding #2409
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[xray] Implementing Gcs sharding #2409
Conversation
Test FAILed. |
Test PASSed. |
@robertnishihara |
Test FAILed. |
515c0ea
to
c0dd5ee
Compare
Test FAILed. |
Test FAILed. |
Test PASSed. |
Test FAILed. |
Test FAILed. |
Test FAILed. |
Test FAILed. |
Test FAILed. |
In this PR, you should get rid of this if statement Lines 1157 to 1159 in 2a3b026
so that the sharded case will be tested more. |
Test FAILed. |
src/ray/gcs/client.cc
Outdated
static void GetRedisShards(redisContext *context, std::vector<std::string> *addresses, | ||
std::vector<int> *ports) { | ||
static void GetRedisShards(redisContext *context, std::vector<std::string>& addresses, | ||
std::vector<int>& ports) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the nit, but to keep the style consistent, we usually put the &
with the variable name, so &addresses
and &ports
instead of & addresses
and & ports
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Left one small comment, but otherwise looks great :)
Thanks! I would wait for the tests to finish and try linting it a little
bit.
…On Wednesday, August 22, 2018, Stephanie Wang ***@***.***> wrote:
***@***.**** approved this pull request.
Thanks! Left one small comment, but otherwise looks great :)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#2409 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AGbH8YR2MRn4YCeX1X0TeqwTUB7hGw7vks5uTcTTgaJpZM4VSEPq>
.
|
Test PASSed. |
@robertnishihara @stephanie-wang There's some error where "GET NumRedisShards" is getting a nil reply after I move the things into the constructor. My guess is that this command is not properly setup when the constructor is called. Please feel free to take a look at it /home/travis/build/ray-project/ray/src/ray/gcs/client.cc:23 Check failed: num_attempts < RayConfig::instance().redis_db_connect_retries() No entry found for NumRedisShards |
Test PASSed. |
Test PASSed. |
src/ray/gcs/client.cc
Outdated
// Used since changing the API of Connect is breaking some tests. | ||
// Would be removed soon in the future. | ||
const bool DUMMY_BOOL = true; | ||
RAY_CHECK_OK(primary_context_->Connect(address, port, DUMMY_BOOL)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we just pass in true
instead of defining DUMMY_BOOL
. Maybe something like pass in true /* sharding */
.
I agree it would be best to just have a default argument, but it's ok to do that separately.
src/ray/gcs/client.cc
Outdated
} | ||
|
||
// Populate shard_contexts. | ||
for (unsigned int i = 0; i < addresses.size(); ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We essentially don't use unsigned int
anywhere in the codebase. Can you just use int
here? This also applies below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep
src/ray/gcs/client.cc
Outdated
|
||
// Populate shard_contexts. | ||
for (unsigned int i = 0; i < addresses.size(); ++i) { | ||
// Slower than emplace but resource safe. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this comment is helpful. This piece of code is definitely not performance critical. Maybe remove the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep
src/ray/gcs/client.cc
Outdated
shard_contexts_.push_back(std::make_shared<RedisContext>()); | ||
} | ||
|
||
// Call connect for all contexts. Safe to do many times. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's safe to call Connect
multiple times? Why is that?
src/ray/gcs/client.cc
Outdated
} | ||
|
||
// Call connect for all contexts. Safe to do many times. | ||
// Here shard_contexts_.size() == addresses.size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider making this comment a RAY_CHECK(shard_contexts_.size() == addresses.size())
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep.
Test FAILed. |
Test PASSed. |
Test FAILed. |
Test FAILed. |
Test FAILed. |
What do these changes do?
Basically a re-implementation of #2281, with modifications of #2298 (A fix of #2334, for rebasing issues.).
[+] Implement sharding for gcs tables.
[+] Keep ClientTable and ErrorTable managed by the primary_shard. TaskTable is managed by the primary_shard for now, until a good hashing for tasks is implemented.
[+] Move AsyncGcsClient's initialization into Connect function.
[-] Move GetRedisShard and bool sharding from RedisContext's connect into AsyncGcsClient. This may make the interface cleaner.
Related issue number