Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GCS]Support gcs client job&actor&node table subscribe idempotent #9424

Conversation

ffbin
Copy link
Contributor

@ffbin ffbin commented Jul 12, 2020

Why are these changes needed?

Design document:
https://docs.google.com/document/d/1Cuqxlw53abEZPVYVF-pUWpbnwrFEgVTel_sQGwt8DmU/edit#heading=h.s3ogmk8m7308
This PR implements GCS client subscribe idempotent.
Node table and object location table are special in that they contain several records, so they are sorted by timestamp in GCS, I will implement it in the next pr.

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/latest/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failure rates at https://ray-travis-tracker.herokuapp.com/.
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested (please justify below)

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@ffbin ffbin requested a review from raulchen July 12, 2020 07:41
@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/28212/
Test FAILed.

@ffbin ffbin requested review from rkooo567 and stephanie-wang July 12, 2020 08:27
@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/28213/
Test PASSed.

@rkooo567 rkooo567 self-assigned this Jul 13, 2020
@stephanie-wang stephanie-wang self-assigned this Jul 14, 2020
@rkooo567
Copy link
Contributor

What's the relationship to this PR btw? #9153

@ffbin
Copy link
Contributor Author

ffbin commented Jul 14, 2020

What's the relationship to this PR btw? #9153
I plan to split #9153 into three parts: the first is this pr, the second is support gcs client node resource subscribe idempotent, and the third is support gcs client object location subscribe idempotent.

@ffbin ffbin force-pushed the dev_gcs_client_sub_single_msg_idempotent branch from 1a27d3f to 9ad2794 Compare July 16, 2020 02:48
@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/28437/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/28438/
Test PASSed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/28453/
Test FAILed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/28462/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/28463/
Test PASSed.

Copy link
Contributor

@stephanie-wang stephanie-wang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actor table idempotence is already implemented in #9449 and the node table should already be idempotent. I'm not familiar with what the job table does, but it looks like it's just marking jobs as dead, which is idempotent too.

So do we actually need this PR?

@ffbin
Copy link
Contributor Author

ffbin commented Jul 17, 2020

Actor table idempotence is already implemented in #9449 and the node table should already be idempotent. I'm not familiar with what the job table does, but it looks like it's just marking jobs as dead, which is idempotent too.

So do we actually need this PR?

New changes in notification handlers could easily introduce bugs without being noticed. In GCS client to ensure message processing idempotent, we can develop more simple and efficient.

@stephanie-wang
Copy link
Contributor

stephanie-wang commented Jul 17, 2020

Actor table idempotence is already implemented in #9449 and the node table should already be idempotent. I'm not familiar with what the job table does, but it looks like it's just marking jobs as dead, which is idempotent too.
So do we actually need this PR?

New changes in notification handlers could easily introduce bugs without being noticed. In GCS client to ensure message processing idempotent, we can develop more simple and efficient.

But there is no point in making message processing in the GCS idempotent if the specific table logic is already idempotent. The problem has already been fixed, so this is just adding unnecessary complexity.

I'm not worried about changes in the notification handlers because these can be easily unit tested so we would definitely notice any new bugs. I suggest we close this PR.

@rkooo567 rkooo567 removed their assignment Aug 14, 2020
@stephanie-wang
Copy link
Contributor

Closing this PR for now since I don't believe the PR is necessary and it's now been 1 month. Feel free to reopen if there are more specific changes that need to be made.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants