-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support multiple store providers in ObjectInterface #5452
Conversation
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
High level design looks good, just need to clean up some of the details. Please request my review again when you've addressed the comments.
Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>
…lliance/ray into multi-store-providers
@edoakes Thanks for reviewing! comments are resolved. |
Test PASSed. |
Looks much better, I think it's almost there - left a few more comments. The build also seems to be failing with a compile error. |
Thanks, code has been updated.
|
Test FAILed. |
Test PASSed. |
Test PASSed. |
@edoakes Let me know if you have any additional comments, or please give a Go if this looks ok?:) Thanks. |
Test PASSed. |
Looks good except I think we should rethink the way that we're giving both the Right now, the LMK if I'm missing something here. |
This is a good question. This is because the
|
I don't quite understand the problem - does a |
Let me explain it with more details. The problem here is that we cannot guarantee multiple threads access for the same store provider instance would not block for all the store providers. It actually depends on the implementation for that store provider. In this particular case, it's actually ok for the memory store provider to be accessed by multiple threads, given its current implementation; but it is NOT ok for plasma store provider, because when doing a Get, plasma client would block on reply from plasma store, thus other threads accessing the same plasma client would get blocked - this is inherent from the design of plasma, it is not because of core worker implementation. Since we cannot guarantee multi-thread access can work for all the store providers, I think it's better for the transport to always create a separate store provider, so that it always works regardless of the exact store provider used. I think the code is more robust this way, instead of having to think about whether the specific store provider can be shared in multi-thread access, which is implementation specific to that store provider, and is more likely to produce bugs. I do agree that we should avoid workarounds as much as possible, but I think in this case it's not a workaround, it instead deals with different store providers uniformly, avoids tight coupling with store provider specific implementations, and helps to avoid unintended bugs. Anyway, this is not introduced by this PR, and is not for memory store provider. I think we can still discuss this if you have different thoughts, but this should not block this PR from merging. Please kindly let me know if this doesn't make sense to you.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the problem now, thank you for the detailed explanation!
I'm still not very content with the solution of creating separated providers for the object interface and transport, but you're right that this wasn't introduced my this PR and we shouldn't block it on that. Let's please keep this in mind as we continue developing the core worker.
I left one more comment, otherwise LGTM. Should probably get at least another pair of eyes on it, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Only left some comments about naming and documentation
Co-Authored-By: Hao Chen <chenh1024@gmail.com>
Co-Authored-By: Hao Chen <chenh1024@gmail.com>
…lliance/ray into multi-store-providers
Co-Authored-By: Hao Chen <chenh1024@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated code. thanks.
Thanks for your detailed review also. Yes going forward let's keep it in mind and try to simplify when possible. I think we're on the same page on this. On the other perspective, on the design wise, having a self-contained and loosely coupled architecture is also very important, it makes it easier to make code changes, and it helps to reduce unintentional bugs. As the system grows bigger, it's very hard for a single person to know all the details of the system, thus having a loosely coupled design is important as it helps to make sure that making changes inside a module would not requires full understanding of other parts, and would not result in unintended effects/bugs. There are a few examples in this PR (and the original large PR) that can be viewed differently from different perspectives. Sometimes it's about to find the right balance between code simplicity and architecture design, I think they are both very important, and it would be good to keep both in mind going forward! :)
|
Test PASSed. |
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I should have caught this earlier, but I'm now realizing that we aren't handling duplicate ObjectID
s in Get()
or Wait()
at all here. Also, given that we're using unordered data structures to keep track of the ObjectID
s, can't the ordering in the result set not match that of the input ids?
I think for Get()
, the desired behavior is clear: we should be able to take a list of possibly duplicated IDs and always return the same length list of result objects in the same order as they were passed in. To do this, we should probably de-duplicate them immediately in the top-level Get()
call and then assume the lists passed to store providers are unique. As an optimization we could alternatively pass a unordered_map<ObjectID, vector<int>>
to the store providers and write results directly to the top-level results vector, but that might get messy.
For Wait()
, the desired behavior is much less clear but we should be explicit in our handling of it - we discussed it a bit in person and agree that it's best to raise an exception if a non-unique list is passed in, forcing users to handle it at the application level.
Please add the above changes, document the semantics and assumptions for each method, and add to the object interface test to verify that the desired behavior is met.
Please see my reply inline:
The above statement is not true.
This is already the case with existing implementation.
Yes, this would complicate the code. I'd prefer the first approach.
The PR actually handles the duplicates for
As mentioned, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I'm sorry about that - I clearly didn't read the final write into results
closely enough. The current implementation looks good and given that you already are handling duplicates in Wait()
let's leave it as-is (at least for now).
Could you please document the semantics for Get()
and Wait()
in the header comments though? And maybe a couple of short comments in the code where we de-duplicate and then replace the duplicates.
Thanks! Will update with more comments. |
…lliance/ray into multi-store-providers
Test PASSed. |
What do these changes do?
This is to separate the support for multiple store providers from #5443 , which can be reviewed and merged independently.
Related issue number
Linter
scripts/format.sh
to lint the changes in this PR.