Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[data] Optimize dataset metadata read/write in Ray client #21939

Merged
merged 18 commits into from
Jan 31, 2022

Conversation

ericl
Copy link
Contributor

@ericl ericl commented Jan 27, 2022

Why are these changes needed?

In Ray client mode, the client may not have the same low-latency access to the datasource as the cluster nodes. Run datasource prepare_read/do_write in a remote task to make sure it doesn't run on the driver node.

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@ericl ericl requested a review from scv119 as a code owner January 27, 2022 22:47
@ericl ericl changed the title [WIP] Optimize dataset metadata read/write nicely in Ray client [data] Optimize dataset metadata read/write in Ray client Jan 28, 2022
Copy link
Collaborator

@jjyao jjyao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.. does that mean we cannot really do anything heavy in the driver if ray client is a first class citizen.

blocks, metadata = zip(*self._blocks.get_blocks_with_metadata())
write_results = datasource.do_write(blocks, metadata, **write_args)

# Prepare write in a remote task so that in Ray client mode, we aren't
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the cost for non-ray client mode?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't matter much (a few ms)

@ericl
Copy link
Contributor Author

ericl commented Jan 28, 2022

Yeah I think that's a good bias for Ray native libraries (for example, Tune already avoids running heavy things in the driver).

@jjyao
Copy link
Collaborator

jjyao commented Jan 29, 2022

Yeah I think that's a good bias for Ray native libraries (for example, Tune already avoids running heavy things in the driver).

I didn't have that awareness before but yeah I (really everyone) will keep that in mind for future code. Not sure if there is something helps us to remember this (probably one item on the review action list is checking to see if the code works nicely with ray client).

@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Jan 29, 2022
@ericl ericl removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Jan 29, 2022
@ericl
Copy link
Contributor Author

ericl commented Jan 29, 2022

Hmm this seems to be causing test_basic_actors to hang somehow.

@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Jan 29, 2022
@ericl
Copy link
Contributor Author

ericl commented Jan 29, 2022

Blocked on #21970

@bveeramani
Copy link
Member

‼️ ACTION REQUIRED ‼️

We've switched our code formatter from YAPF to Black (see #21311).

To prevent issues with merging your code, here's what you'll need to do:

  1. Install Black
pip install -I black==21.12b0
  1. Format changed files with Black
curl -o format-changed.sh https://gist.githubusercontent.com/bveeramani/42ef0e9e387b755a8a735b084af976f2/raw/7631276790765d555c423b8db2b679fd957b984a/format-changed.sh
chmod +x ./format-changed.sh
./format-changed.sh
rm format-changed.sh
  1. Commit your changes.
git add --all
git commit -m "Format Python code with Black"
  1. Merge master into your branch.
git pull upstream master
  1. Resolve merge conflicts (if necessary).

After running these steps, you'll have the updated format.sh.

@ericl ericl merged commit 45e03bd into ray-project:master Jan 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants