Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ignore object exists error for memory store provider #5607

Merged
merged 4 commits into from
Sep 5, 2019

Conversation

zhijunfu
Copy link
Contributor

@zhijunfu zhijunfu commented Sep 1, 2019

Why are these changes needed?

It's reported that sometimes core worker test fails because we don't ignore the the object exists error for memory store provider. This PR fixes that issue.

https://travis-ci.com/ray-project/ray/jobs/229393241

Related issue number

Checks

@zhijunfu zhijunfu changed the title ignore object exists error for memory store Put interface ignore object exists error for memory store provider Sep 1, 2019
@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/16687/
Test PASSed.

Copy link
Collaborator

@edoakes edoakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better to define a separate error type rather than checking the message. Especially considering that we should probably bubble the error up to the worker instead of hard-coding an ignore when there are duplicate keys. Don't have to do that in this PR but would be good to define the error type now.

@zhijunfu
Copy link
Contributor Author

zhijunfu commented Sep 2, 2019

Thanks for reviewing. It probably depends on whether we should bubble up the error or just ignore it inside object interface, but yes I agree it's better to have a type instead of a message for this, thus I've added a new ObjectExists status.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/16720/
Test FAILed.

Copy link
Collaborator

@edoakes edoakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should definitely do it in the ObjectStoreInterface, it shouldn't be much added work and we can then remove the same logic from inside the PlasmaStoreProvider.

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>
@zhijunfu
Copy link
Contributor Author

zhijunfu commented Sep 3, 2019

Looking at the code again, I realized putting it into ObjectStoreInterface would not work in this case, because the memory store provider is used by the direct actor call transport, which calls the provider's Put directly, it won't call ObjectStoreInterface.

We should definitely do it in the ObjectStoreInterface, it shouldn't be much added work and we can then remove the same logic from inside the PlasmaStoreProvider.

@edoakes
Copy link
Collaborator

edoakes commented Sep 3, 2019

Why isn't the direct actor call transport going through the object interface? The object interface should know where to put the object based on the ID, right?

@zhijunfu
Copy link
Contributor Author

zhijunfu commented Sep 3, 2019

Why isn't the direct actor call transport going through the object interface? The object interface should know where to put the object based on the ID, right?

Currently we use memory store just to contain the return objects of direct actor call tasks, nothing else.

When user calls ray.put to add an object into store, we assume he would use the object later and possibly from different machines, so we put the object into plasma in this case. That's why ObjectInterface's put always writes to plasma.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/16731/
Test PASSed.

@edoakes
Copy link
Collaborator

edoakes commented Sep 3, 2019

Hmm we should probably still go through the same codepath for this, otherwise we're going to end up with duplicated code/bugs (like this one). Shouldn't be much code or runtime overhead to switch based on the object ID - what do you think?

@zhijunfu
Copy link
Contributor Author

zhijunfu commented Sep 4, 2019

The issue is about semantics & correctness, not overhead.

I think the assumption is when user directly calls ray.put, which in turns calls ObjectInterface's Put(), he is very likely to use this object later, and possibly pass this object id to another machine, then do a ray.get() there, so I want to make sure that ObjectInterface 's Put() will store the object to plasma and nowhere else. If it's put into memory store, then user is not able to get the object from a remote machine, I think that would surprise the user.

Try to do the switch based on the object ID, while also ensure the user's put will always be stored into plasma inside ObjectInterface would complicate the current logic, not simply it. I'd prefer to keep the current logic as is.

Hmm we should probably still go through the same codepath for this, otherwise we're going to end up with duplicated code/bugs (like this one). Shouldn't be much code or runtime overhead to switch based on the object ID - what do you think?

@edoakes
Copy link
Collaborator

edoakes commented Sep 4, 2019

I see your point, but I would find it very strange to have Get() implement multiple storage providers but not Put(). I also don't think it would complicate the logic very much given that we already have the code to switch based on the storage provider - we can re-use that. Should just be one additional line in Put().

@zhijunfu
Copy link
Contributor Author

zhijunfu commented Sep 5, 2019

If you really want to do it that way, could you please file another PR? You can assign me as reviewer. thanks.

I see your point, but I would find it very strange to have Get() implement multiple storage providers but not Put(). I also don't think it would complicate the logic very much given that we already have the code to switch based on the storage provider - we can re-use that. Should just be one additional line in Put().

@raulchen
Copy link
Contributor

raulchen commented Sep 5, 2019

@edoakes I think the refactor can be done in a separate PR. Let's merge this one first to unblock CI.

@edoakes
Copy link
Collaborator

edoakes commented Sep 5, 2019 via email

@raulchen raulchen merged commit bb5609a into ray-project:master Sep 5, 2019
@raulchen raulchen deleted the fix-memory-store branch September 5, 2019 03:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants