Skip to content

Conversation

@HaochengLIU
Copy link
Contributor

Rationale for this change

Expose new S3 option check_directory_existence_before_creation from GH-41493
PR to expose in Python is also under review.

What changes are included in this PR?

Expose new S3 option check_directory_existence_before_creation from GH-41493

Are these changes tested?

yes

Are there any user-facing changes?

Yes. R function documentation is updated.

@github-actions
Copy link

github-actions bot commented Jun 6, 2024

⚠️ GitHub issue #41973 has been automatically assigned in GitHub to PR creator.

@HaochengLIU
Copy link
Contributor Author

Hi @nealrichardson , could you pls review when you have time? I have to admit I use C++/Python in my daily life but know nothing about R and just follow the existing code for allow_bucket_deletion to expose it.

@HaochengLIU
Copy link
Contributor Author

Hi @paleolimbot @thisisnic , could you review this PR when you have time? Thanks.

@amoeba
Copy link
Member

amoeba commented Aug 1, 2024

Hi @HaochengLIU, I took a look through and left some suggestions.

One thing I think we'll have to sort out here is the issue that's making the Check minimum supported Arrow C++ Version check fail. We'll need an ifdef guard in r/src/filesystem.cpp but I'm not sure if we've got a pattern for what the user experiences.

Edit: And can you please rebase and fix conflicts?

@HaochengLIU HaochengLIU force-pushed the check_directory_existence_before_creation-expose-R branch from 7f3d26e to e46e0dc Compare August 2, 2024 00:23
@HaochengLIU HaochengLIU requested a review from jonkeane as a code owner August 2, 2024 00:23
@HaochengLIU
Copy link
Contributor Author

Hi @amoeba , I rebased and the somehow do not see the suggestion you are referring to.. Could you point me to that?
Also could you guide what do you mean by "ifdef guard" in the filesystem.cpp? It's not using any standard or platform dependent code, simply a new argument

Per the install error

* checking whether package ‘arrow’ can be installed ... [87s/90s] ERROR
Installation failed.
See ‘/home/runner/work/arrow/arrow/src/r/check/arrow.Rcheck/00install.out’ for details.

Where is this 00install.out log 😢
ty.

@thisisnic
Copy link
Member

thisisnic commented Aug 4, 2024

@HaochengLIU The CI output there can be a bit tricky to follow, but the important bit is in the next section: https://github.com/apache/arrow/actions/runs/10207786938/job/28243188693?pr=41998#step:8:49

 filesystem.cpp: In function ‘std::shared_ptr<arrow::fs::S3FileSystem> fs___S3FileSystem__create(bool, std::string, std::string, std::string, std::string, std::string, std::string, int, std::string, std::string, std::string, std::string, bool, bool, bool, bool, double, double)’:
filesystem.cpp:334:11: error: ‘struct arrow::fs::S3Options’ has no member named ‘check_directory_existence_before_creation’
  334 |   s3_opts.check_directory_existence_before_creation =
      |           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
make: *** [/usr/lib/R/etc/Makeconf:204: filesystem.o] Error 1
ERROR: compilation failed for package ‘arrow’

I think what @amoeba is referring to is that fact that the arrow R package is compatible with previous versions of the Arrow C++ library, and so you're getting that error above because #41822 was only released in version 17.0.0 so you'll get errors trying to use it with previous versions, as our minimum version to support is 15.0.2.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Aug 6, 2024
@amoeba
Copy link
Member

amoeba commented Aug 6, 2024

Sorry @HaochengLIU, I think I forgot to save my review. You should see them now.

@thisisnic is right about my note. See this example: https://github.com/apache/arrow/blob/c69c1d80eb27022c1b3f91fbbc7979f8293c9e50/r/src/extension-impl.cpp#L103

I think here we want to do here is (1) conditionally define two versions of fs___S3FileSystem__create, one that supports check_directory_existence_before_creation and one that doesn't and then (2) catch calls to S3FileSystem$create which pass the option check_directory_existence_before_creation when arrow::arrow_info()$build_info$cpp_version < "17.0.0" and probably emit a warning but otherwise continue. Let me know if this isn't clear.

@nealrichardson
Copy link
Member

Sorry @HaochengLIU, I think I forgot to save my review. You should see them now.

@thisisnic is right about my note. See this example:

https://github.com/apache/arrow/blob/c69c1d80eb27022c1b3f91fbbc7979f8293c9e50/r/src/extension-impl.cpp#L103

I think here we want to do here is (1) conditionally define two versions of fs___S3FileSystem__create, one that supports check_directory_existence_before_creation and one that doesn't and then (2) catch calls to S3FileSystem$create which pass the option check_directory_existence_before_creation when arrow::arrow_info()$build_info$cpp_version < "17.0.0" and probably emit a warning but otherwise continue. Let me know if this isn't clear.

The tricky thing about that is the codegen script that creates arrowExports.cpp: it doesn't know how to handle the conditional definitions like that, and I'm not sure it's worth extending to do so.

Another option would be to refactor to send a list of options to fs___S3FileSystem__create (and the other FileSystem create methods, for consistency) and then pull the settings out of the list inside the C++ function, like how make_compute_options() works (https://github.com/apache/arrow/blob/main/r/src/compute.cpp#L129). Then the #if ARROW_VERSION_MAJOR >= 16 would just be wrapping a few lines inside that function and not affecting the signature.

@HaochengLIU HaochengLIU force-pushed the check_directory_existence_before_creation-expose-R branch from e46e0dc to 913400d Compare August 8, 2024 16:24
@HaochengLIU
Copy link
Contributor Author

HaochengLIU commented Aug 8, 2024

Hi folks, thanks for reviewing and I've addressed most of the comments.
Referring to the non-backward compatible check_directory_existence_before_creation flag, IMHO refactoring fs___S3FileSystem__create (and the other FileSystem create methods, for consistency) should be a seperate PR.

If it's confirmed that arrow/r/src/extension-impl.cpp will not work per @nealrichardson , I can try to tackle the refactor work first once I'm back from vacation

@jonkeane
Copy link
Member

jonkeane commented Aug 8, 2024

We probably shouldn't do this here, but we should have a discussion about how much effort keeping compatibility with older versions we should push for PRs like these. I've created #43623 for that, but broadly am supportive of bumping the minimum version if we need to in order to implement this feature.

@amoeba
Copy link
Member

amoeba commented Aug 9, 2024

Thanks for the catch @nealrichardson and the suggestion. I'd be supportive of approving this PR with just the "Check minimum supported Arrow C++ Version " job failing and just my existing suggestions applied.

@thisisnic thisisnic force-pushed the check_directory_existence_before_creation-expose-R branch from 913400d to 1e7edab Compare May 19, 2025 14:04
@HaochengLIU
Copy link
Contributor Author

Hi, sorry for the delay here, was carried away by other life events. Let me know if there is anything i can help here

@thisisnic
Copy link
Member

Hi, sorry for the delay here, was carried away by other life events. Let me know if there is anything i can help here

Not your fault at all, we didn't resolve the earlier discussion on previous versions. Based on the CI results after rebasing the PR, there are a few changes needed before merging this, which I was going to take a look at myself. If you'd rather take a look though, let me know, and I'll leave it to you instead!

@HaochengLIU
Copy link
Contributor Author

Hi, sorry for the delay here, was carried away by other life events. Let me know if there is anything i can help here

Not your fault at all, we didn't resolve the earlier discussion on previous versions. Based on the CI results after rebasing the PR, there are a few changes needed before merging this, which I was going to take a look at myself. If you'd rather take a look though, let me know, and I'll leave it to you instead!

I will be in a national park without wifi for the incoming days, could I handle this PR to you :)
Much appreciated!

thisisnic added a commit that referenced this pull request May 29, 2025
…ion - manual rebase (#46619)

See #41998 - rebase was too messy due to PR age so did this manually.
* GitHub Issue: #41973

Lead-authored-by: Haocheng Liu <lbtinglb@gmail.com>
Co-authored-by: Nic Crane <thisisnic@gmail.com>
Signed-off-by: Nic Crane <thisisnic@gmail.com>
@thisisnic
Copy link
Member

I copied this to #46619 as the rebase was pretty horrific to attempt on this branch, but I made sure to attribute the new PR to you, @HaochengLIU

@thisisnic thisisnic closed this May 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants