FIX-#4479: Prevent users from using a local filepath when performing a distributed write#4484
FIX-#4479: Prevent users from using a local filepath when performing a distributed write#4484RehanSD wants to merge 16 commits intomodin-project:masterfrom
Conversation
…n performing a distributed write Signed-off-by: Rehan Durrani <rehan@ponder.io>
Signed-off-by: Rehan Durrani <rehan@ponder.io>
Codecov Report
@@ Coverage Diff @@
## master #4484 +/- ##
==========================================
- Coverage 86.82% 78.21% -8.62%
==========================================
Files 222 230 +8
Lines 18038 23692 +5654
==========================================
+ Hits 15662 18530 +2868
- Misses 2376 5162 +2786
📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more |
mvashishtha
left a comment
There was a problem hiding this comment.
@RehanSD thanks for the change. I have some style suggestions.
modin/core/execution/ray/implementations/pandas_on_ray/io/io.py
Outdated
Show resolved
Hide resolved
modin/core/execution/ray/implementations/pandas_on_ray/io/io.py
Outdated
Show resolved
Hide resolved
naren-ponder
left a comment
There was a problem hiding this comment.
Does this have to be added for to_sql as well?
| if not cls._to_csv_check_support(kwargs): | ||
| return RayIO.to_csv(qc, **kwargs) | ||
|
|
||
| if len(ray.nodes()) > 1: |
There was a problem hiding this comment.
Can this be moved to _to_csv_check_support, ditto for parquet?
There was a problem hiding this comment.
We could, but _to_csv_check_support currently just checks if we can do a parallel write, so it would be a bit of a paradigm shift to raise an error here, unless you're suggesting we return false and default to a serial write on the head node if a local path is passed?
| if S3_ADDRESS_REGEX.match(path_or_buf) is not None or "://" in path_or_buf: | ||
| return False # S3 or network path. | ||
| if isinstance(path_or_buf, str) or isinstance(path_or_buf, pathlib.PurePath): | ||
| return os.path.exists(path_or_buf) |
There was a problem hiding this comment.
The code as is will not warn users for the common case where they try to write to a non-existent path, i.e. when they want to create and write to a new file. We should warn even if the path doesn't exist locally.
There was a problem hiding this comment.
Changed it to check device_id instead! I originally did this because apparently os.path.exists returns false for networked file paths. I believe it should now work for both non-existing paths and networked paths.
Co-authored-by: Yaroslav Igoshev <Poolliver868@mail.ru>
Co-authored-by: Mahesh Vashishtha <mvashishtha@users.noreply.github.com>
Co-authored-by: Mahesh Vashishtha <mvashishtha@users.noreply.github.com>
Co-authored-by: Mahesh Vashishtha <mvashishtha@users.noreply.github.com>
|
We don't need to do this for |
Signed-off-by: Rehan Durrani <rehan@ponder.io>
…tent local paths Signed-off-by: Rehan Durrani <rehan@ponder.io>
|
This pull request introduces 1 alert when merging dcadcf4 into 60518ca - view on LGTM.com new alerts:
|
mvashishtha
left a comment
There was a problem hiding this comment.
@RehanSD thanks for the changes. I have two minor suggestions. Do we have ray cluster unit tests to which we can add a case for this? Otherwise, have you tested this manually?
Co-authored-by: Mahesh Vashishtha <mvashishtha@users.noreply.github.com>
Co-authored-by: Mahesh Vashishtha <mvashishtha@users.noreply.github.com>
Signed-off-by: Rehan Durrani <rehan@ponder.io>
|
This pull request introduces 1 alert when merging 0608f9e into 958c26a - view on LGTM.com new alerts:
|
modin/core/execution/ray/implementations/pandas_on_ray/io/io.py
Outdated
Show resolved
Hide resolved
Signed-off-by: Rehan Durrani <rehan@ponder.io>
Signed-off-by: Rehan Durrani <rehan@ponder.io>
Signed-off-by: Rehan Durrani <rehan@ponder.io>
Signed-off-by: Rehan Durrani <rehan@ponder.io>
Signed-off-by: Rehan Durrani rehan@ponder.io
What do these changes do?
Prevent users from specifying a local file path on a distributed write.
flake8 modin/ asv_bench/benchmarks scripts/doc_checker.pyblack --check modin/ asv_bench/benchmarks scripts/doc_checker.pygit commit -sdocs/development/architecture.rstis up-to-date