Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dirs_exist_ok to CloudPath.copytree #281

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

klwetstone
Copy link
Contributor

@klwetstone klwetstone commented Oct 24, 2022

closes #144

Adds dirs_exist_ok argument to CloudPath.copytree

The original issue suggests adding the dirs_exist_ok functionality through mkdir. Using mkdir on S3Path doesn't actually create a directory, since we can't make empty directories in cloud storage. Adding functionality to a no-op mkdir seemed misleading, so the new argument doesn't use mkdir to check if the destination directory exists. Instead the functionality is added to copytree outside of mkdir with the more explicit if destination.exists() and not dirs_exist_ok.

Questions

  • Since .mkdir doesn't work on either GSPath or S3Path, should we raise an implementation error when it is called? Currently it runs without error, but doesn't actually create a directory.

    • For example, the destination.mkdir line in CloudPath.copytree doesn't actually create a cloud directory if it doesn't exist. The directory is created when the first file is copied with subpath.copy(...). It seems like we need this line just for cases when the destination directory is local, is that right?
    • If we raise an implementation error instead of pass without doing anything, the new S3Path.mkdir definition would be:
      def mkdir(self, parents=False, exist_ok=False):
          raise CloudPathNotImplementedError(
              "Empties directories cannot be created in cloud storage"
          )
    • We can add something similar, but with a general NotImplementedError, to GSPath.mkdir
  • There is a note in the comments that empty directories can't be created in other forms of cloud storage (GSPath) either. It seems like we can handle GSPath and S3Path similarly for adding dirs_exist_ok, is that right?

@github-actions
Copy link
Contributor

@codecov-commenter
Copy link

Codecov Report

Merging #281 (25c4ce7) into master (fa8d9fd) will decrease coverage by 0.3%.
The diff coverage is 100.0%.

Additional details and impacted files
@@           Coverage Diff            @@
##           master    #281     +/-   ##
========================================
- Coverage    94.8%   94.4%   -0.4%     
========================================
  Files          20      20             
  Lines        1318    1320      +2     
========================================
- Hits         1250    1247      -3     
- Misses         68      73      +5     
Impacted Files Coverage Δ
cloudpathlib/cloudpath.py 93.6% <100.0%> (+<0.1%) ⬆️
cloudpathlib/s3/s3client.py 92.9% <0.0%> (-2.7%) ⬇️
cloudpathlib/gs/gsclient.py 92.8% <0.0%> (-1.8%) ⬇️

@jayqi
Copy link
Member

jayqi commented Oct 24, 2022

The original issue suggests adding the dirs_exist_ok functionality through mkdir. Using mkdir on S3Path doesn't actually create a directory, since we can't make empty directories in cloud storage. Adding functionality to a no-op mkdir seemed misleading, so the new argument doesn't use mkdir to check if the destination directory exists.

This actually surfaces a tricky issue that I hadn't thought of before. Currently, because mkdir is implemented as a no-op it doesn't match pathlib's behavior. By default, pathlib's mkdir has exists_ok=False. Accordingly, the matching behavior should be for mkdir to—by default—check if the "directory" "exists" in cloud storage, and error if it already does. It seems like there'd be some performance penalty for this. Thoughts, @pjbull?

Since .mkdir doesn't work on either GSPath or S3Path, should we raise an implementation error when it is called? Currently it runs without error, but doesn't actually create a directory.

Definitely not. This is intentionally a no-op so we can copy pathlib's implementation logic to our cloud implementations and not have to make changes to take out all of the mkdir calls.

@pjbull
Copy link
Member

pjbull commented Jul 22, 2023

Going through some stale PRs to get them moving.

I think that we shouldn't introduce dirs_exist_ok=False as the default without a deprecation cycle. There's too much of a chance people use copytree relying on the current behavior.

I think I'd go with an implementation that looks something like:

def copytree(..., dirs_exist_ok: bool = None) -> Union[Path, "CloudPath"]:
    if dirs_exist_ok is None:
        warnings.warn("In a future version, `CloudPath.copytree` will have a `dirs_exist_ok=False` kwarg. Please specify `dirs_exist_ok=True` in your code if you do not want that behavior (issue #144).", CloudPathDeprecationWarning)
        dirs_exist_ok = True  # match current behavior until dirs_exist_ok=False introduced

    ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a dirs_exist_ok parameter to the copytree implementation
4 participants