ignore object exists error for memory store provider #5607

zhijunfu · 2019-09-01T03:20:17Z

Why are these changes needed?

The core worker test has been reported to fail intermittently because we don't ignore the object exists error for the memory store provider. This PR fixes that issue.

https://travis-ci.com/ray-project/ray/jobs/229393241

Related issue number

Checks

[Y] I've run scripts/format.sh to lint the changes in this PR.
[Y] I've included any doc changes needed for https://ray.readthedocs.io/en/latest/.

AmplabJenkins · 2019-09-01T06:45:20Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/16687/
Test PASSed.

edoakes

Probably better to define a separate error type rather than checking the message. Especially considering that we should probably bubble the error up to the worker instead of hard-coding an ignore when there are duplicate keys. Don't have to do that in this PR but would be good to define the error type now.

zhijunfu · 2019-09-02T04:45:55Z

Thanks for reviewing. It probably depends on whether we should bubble up the error or just ignore it inside the object interface, but yes, I agree it's better to have a type instead of a message for this, so I've added a new ObjectExists status.
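To illustrate the difference between checking an error message and checking an error type, here is a minimal sketch. It is not the code from this PR: `Status`, `StatusCode`, `ToyMemoryStore`, and `ProviderPut` are hypothetical stand-ins written for this example, and only the `ObjectExists` name is taken from the discussion above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical status codes; only the ObjectExists name comes from this thread.
enum class StatusCode { OK, ObjectExists, IOError };

// Minimal stand-in for a status type (an assumption, not Ray's actual class).
class Status {
 public:
  Status() : code_(StatusCode::OK) {}
  Status(StatusCode code, std::string msg) : code_(code), msg_(std::move(msg)) {}

  static Status OK() { return Status(); }
  static Status ObjectExists(const std::string &msg) {
    return Status(StatusCode::ObjectExists, msg);
  }

  bool ok() const { return code_ == StatusCode::OK; }
  // Callers test the code, so rewording the message cannot break them.
  bool IsObjectExists() const { return code_ == StatusCode::ObjectExists; }
  const std::string &message() const { return msg_; }

 private:
  StatusCode code_;
  std::string msg_;
};

// Toy in-memory store keyed by a string ID (hypothetical).
class ToyMemoryStore {
 public:
  Status Put(const std::string &id, const std::string &value) {
    // emplace leaves an existing entry untouched and reports it via .second.
    bool inserted = objects_.emplace(id, value).second;
    if (!inserted) {
      return Status::ObjectExists("object already exists: " + id);
    }
    return Status::OK();
  }
  std::size_t Size() const { return objects_.size(); }

 private:
  std::unordered_map<std::string, std::string> objects_;
};

// Provider-level Put that treats a duplicate key as success and passes
// every other status through unchanged.
Status ProviderPut(ToyMemoryStore &store, const std::string &id,
                   const std::string &value) {
  Status status = store.Put(id, value);
  if (status.IsObjectExists()) {
    return Status::OK();
  }
  return status;
}
```

With a typed status, the choice between ignoring the duplicate and bubbling it up to the worker becomes a one-line decision at the call site instead of a string comparison.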

AmplabJenkins · 2019-09-02T06:06:06Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/16720/
Test FAILed.

src/ray/core_worker/store_provider/memory_store/memory_store.cc

edoakes

We should definitely do it in the ObjectStoreInterface. It shouldn't be much added work, and we can then remove the same logic from inside the PlasmaStoreProvider.

src/ray/core_worker/store_provider/memory_store_provider.cc

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

zhijunfu · 2019-09-03T02:42:51Z

Looking at the code again, I realized that putting it into ObjectStoreInterface would not work in this case. The memory store provider is used by the direct actor call transport, which calls the provider's Put directly and never goes through ObjectStoreInterface.

edoakes · 2019-09-03T03:37:01Z

Why isn't the direct actor call transport going through the object interface? The object interface should know where to put the object based on the ID, right?

zhijunfu · 2019-09-03T05:15:03Z

Currently we use the memory store only to hold the return objects of direct actor call tasks, nothing else.

When a user calls ray.put to add an object to the store, we assume they will use the object later, possibly from a different machine, so we put the object into plasma in this case. That's why ObjectInterface's Put always writes to plasma.

AmplabJenkins · 2019-09-03T06:03:54Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/16731/
Test PASSed.

edoakes · 2019-09-03T22:53:53Z

Hmm we should probably still go through the same codepath for this, otherwise we're going to end up with duplicated code/bugs (like this one). Shouldn't be much code or runtime overhead to switch based on the object ID - what do you think?

zhijunfu · 2019-09-04T03:25:43Z

The issue is about semantics & correctness, not overhead.

I think the assumption is that when a user directly calls ray.put, which in turn calls ObjectInterface's Put(), they are very likely to use this object later, possibly passing the object ID to another machine and calling ray.get() there. So I want to make sure that ObjectInterface's Put() stores the object in plasma and nowhere else. If it's put into the memory store, the user cannot get the object from a remote machine, and I think that would surprise the user.

Switching on the object ID while also ensuring that the user's put is always stored in plasma inside ObjectInterface would complicate the current logic, not simplify it. I'd prefer to keep the current logic as is.

edoakes · 2019-09-04T16:22:22Z

I see your point, but I would find it very strange to have Get() implement multiple storage providers but not Put(). I also don't think it would complicate the logic very much given that we already have the code to switch based on the storage provider - we can re-use that. Should just be one additional line in Put().
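The symmetric Put()/Get() dispatch described here could look roughly like the sketch below. Everything in it is an assumption made for illustration: `StoreKind`, `ToyObjectID`, `ToyProvider`, and `ToyObjectInterface` are hypothetical names, and the real object interface and object ID types may differ in both shape and behavior.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical tag saying which store owns an object.
enum class StoreKind { Plasma, Memory };

// Assumption: the object ID encodes which store the object belongs to.
struct ToyObjectID {
  std::string key;
  StoreKind kind;
};

// Toy storage provider; both store kinds use the same in-memory map here.
class ToyProvider {
 public:
  // Returns false if the key is already present.
  bool Put(const std::string &key, const std::string &value) {
    return objects_.emplace(key, value).second;
  }
  // Returns false if the key is missing; otherwise copies the value out.
  bool Get(const std::string &key, std::string *value) const {
    auto it = objects_.find(key);
    if (it == objects_.end()) {
      return false;
    }
    *value = it->second;
    return true;
  }

 private:
  std::unordered_map<std::string, std::string> objects_;
};

// Interface that selects the provider from the ID for both Put and Get.
class ToyObjectInterface {
 public:
  ToyObjectInterface() {
    providers_.emplace(StoreKind::Plasma, ToyProvider());
    providers_.emplace(StoreKind::Memory, ToyProvider());
  }

  bool Put(const ToyObjectID &id, const std::string &value) {
    // The single dispatch line: same lookup as in Get below.
    return providers_.at(id.kind).Put(id.key, value);
  }

  bool Get(const ToyObjectID &id, std::string *value) const {
    return providers_.at(id.kind).Get(id.key, value);
  }

 private:
  std::map<StoreKind, ToyProvider> providers_;
};
```

In this sketch, an object put with a `Memory` ID is invisible to a `Plasma` lookup of the same key. That matches the concern raised earlier in the thread: a user-facing put would have to create plasma-kind IDs for the object to stay retrievable from other machines.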

zhijunfu · 2019-09-05T03:03:45Z

If you really want to do it that way, could you please file another PR? You can assign me as reviewer. Thanks.

raulchen · 2019-09-05T03:35:28Z

@edoakes I think the refactor can be done in a separate PR. Let's merge this one first to unblock CI.

edoakes · 2019-09-05T03:40:19Z

Sure, I'm not completely sold on any solution yet. Please go ahead and merge.

fix object exists for memory store Put interface

6bcbbfd

zhijunfu requested review from pcmoritz, raulchen and edoakes September 1, 2019 03:20

zhijunfu changed the title ~~ignore object exists error for memory store Put interface~~ ignore object exists error for memory store provider Sep 1, 2019

edoakes reviewed Sep 1, 2019

zhijunfu added 2 commits September 2, 2019 12:42

resolve comments

35c05ef

update

13fbe85

raulchen reviewed Sep 2, 2019

src/ray/core_worker/store_provider/memory_store/memory_store.cc

edoakes reviewed Sep 2, 2019

src/ray/core_worker/store_provider/memory_store_provider.cc Outdated

Update src/ray/core_worker/store_provider/memory_store_provider.cc

67e60ac

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

raulchen approved these changes Sep 5, 2019

raulchen merged commit bb5609a into ray-project:master Sep 5, 2019

raulchen deleted the fix-memory-store branch September 5, 2019 03:45
