Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: Extend ResolvingFileIO to support prefix-based operations #8334

Merged
merged 3 commits into from
Sep 13, 2023

Conversation

nastra
Copy link
Contributor

@nastra nastra commented Aug 16, 2023

No description provided.

@github-actions github-actions bot added the core label Aug 16, 2023
@nastra nastra force-pushed the resolving-with-prefix-operations branch from f88ae6a to 0e52578 Compare August 16, 2023 13:06
@@ -95,25 +100,12 @@ public void deleteFiles(Iterable<String> pathsToDelete) throws BulkDeletionFailu
Iterators.partition(pathsToDelete.iterator(), BATCH_SIZE)
.forEachRemaining(
partitioned -> {
Map<FileIO, List<String>> pathByFileIO =
Map<DelegateFileIO, List<String>> pathByFileIO =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just call the delegate directly now, I don't believe there is a need for this logic anymore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think we should still batch here or would we want to essentially use the same implementation you had?

// peek at the first element to determine the type of FileIO
    Iterator<String> originalIterator = pathsToDelete.iterator();
    if (!originalIterator.hasNext()) {
      return;
    }

    PeekingIterator<String> iterator = Iterators.peekingIterator(originalIterator);
    DelegateFileIO fileIO = io(iterator.peek());
    fileIO.deleteFiles(() -> iterator);

To me it seems like batching should probably be handled by the underlying FileIO implementation, but just wanted to see what others think?

Copy link
Contributor

@bryanck bryanck Aug 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it is probably better to keep what you have as there may be multiple FileIO types involved, and the peek method wasn’t ideal.

@nastra nastra force-pushed the resolving-with-prefix-operations branch from 767280b to a3c0ece Compare August 22, 2023 06:31
@nastra nastra closed this Aug 22, 2023
@nastra nastra reopened this Aug 22, 2023
@RussellSpitzer
Copy link
Member

Thanks @nastra for finishing this up and @bryanck for the foundational work and review.

@RussellSpitzer RussellSpitzer merged commit a8ef09e into apache:master Sep 13, 2023
@nastra nastra deleted the resolving-with-prefix-operations branch September 13, 2023 15:18
@leoluan2009
Copy link
Contributor

@nastra @RussellSpitzer Should ADLSFileIO implement DelgateFileIO as GCSFileIO to make code clean and consistently?

@nastra
Copy link
Contributor Author

nastra commented Sep 14, 2023

@leoluan2009 yes it should, good catch. I've opened #8563 to address that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants