Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/fs: Add RemoveAll function #261

Closed
wants to merge 1 commit into from
Closed

Conversation

kevpar
Copy link
Member

@kevpar kevpar commented Oct 11, 2022

New RemoveAll function intended to be a replacement for os.RemoveAll usage and be tailored to Windows scenarios.

Most importantly, this function works with []uint16 instead of string. This allows it to properly delete Windows filesystem trees that contain files with invalid UTF-16 paths.

Signed-off-by: Kevin Parsons kevpar@microsoft.com

@kevpar kevpar requested a review from a team as a code owner October 11, 2022 07:09
@kevpar
Copy link
Member Author

kevpar commented Oct 11, 2022

I still want to determine if long path support is needed here. However the code should be reasonable to review as it exists already.

@kevpar
Copy link
Member Author

kevpar commented Oct 11, 2022

Looks like some overzealous linter errors. May need to tune those down.

@kevpar kevpar force-pushed the removeall branch 3 times, most recently from 222373c to c8e6070 Compare October 11, 2022 07:23
New RemoveAll function intended to be a replacement for os.RemoveAll
usage and be tailored to Windows scenarios.

Most importantly, this function works with []uint16 instead of string.
This allows it to properly delete Windows filesystem trees that contain
files with invalid UTF-16 paths.

Signed-off-by: Kevin Parsons <kevpar@microsoft.com>
Comment on lines +16 to +19
// [1] This inaccurate version is still usable, as the name and alternate name fields are one character short,
// but the original value is guaranteed to be null terminated, so you just lose the null terminator in the worst case.
// However, this is still annoying to deal with, as you either must find the null in the array, or treat the full array
// as a null terminated string.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this discussion right above findData, since it explains why we arent using windows.Win32finddata?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted it here as a reference for why we define our own syscalls. I can look at what I could put in the other spot as well though.

// However, we don't attempt to handle every dynamic case, for instance:
// - If a file has FILE_ATTRIBUTE_READONLY re-set on it after we clear it, we will fail.
// - If an entry is changed from file to directory after we query its attributes, we will fail.
func RemoveAll(path []uint16) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add an alias or new type UTF16Str for []uint16, so in the future we can do our own validation and add helper functions (like filepath join and co) to work on windows path strings?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered adding a type but wasn't sure how much utility it would have, or what a good name would be. UTF16Str is exactly what we don't want since the whole point of this function is it can work with a name that is invalid UTF-16. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can think more about what a good name would be.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe OSString, to steal Rust's thunder?

pkg/fs/removeall_windows.go Show resolved Hide resolved
}

func join(parent, child []uint16) []uint16 {
return append(append(parent, '\\'), child...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would cause duplicate \'s if parent already ends with one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me think about if there's a path where we would expect to need to handle that here. Maybe just the case where the user passes in a path that ends in a separator.


var (
// Mockable for testing.
removeDirectory = windows.RemoveDirectory
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I never liked the pattern, and it feels un-go-like.
Is there any way around needing it?

The Go os repo has this testlog call which logs what is being opened/accessed.
We can add something similar where we use a buffer/list or items being deleted, and then search through that to make sure what we want is deleted?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any way to avoid it here, and don't think the cost is too high. We don't just need to log operations done, but actually need to inject a failure case into a Windows API call (look at TestRemoveAllShouldFailWhenSymlinkDeletionFails).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but rather than allowing the removeDirectory function to be modified, we can do something like

type testHook func(p []uint16)

var removeDirectoryTestHook testHook

func removeDirectory(path []uint16) error {
  if removeDirectoryTestHook != nil {
    removeDirectoryTestHook(path)
  }
  windows.RemoveDirectory
}

And then TestRemoveAllShouldFailWhenSymlinkDeletionFails becomes

var linkDeleteAttempted bool
removeDirectoryTestHook = func(path *uint16) {
  if _, name := filepath.Split(windows.UTF16PtrToString(path)); !linkDeleteAttempted && name == "link" {
    linkDeleteAttempted = true
  }
}

if err := RemoveAll(rootU16); err != nil {
  // ...
}

if linkDeleteAttempted {
  t.Fatal("link deleted")
}

Comment on lines +127 to +159
if err := func() error {
var fd findData
// findFirstFileExW allows us to avoid the shortname lookup. See comment at top of file for details.
pattern := join(path, []uint16{'*'})
find, err := findFirstFileExW(terminate(pattern), findExInfoBasic, &fd, findExSearchNameMatch, nil, 0)
if err == windows.ERROR_FILE_NOT_FOUND {
// No children is weird, because we should always have "." and "..".
// If we do hit this, just continue to the next deletion attempt.
return nil
} else if err != nil {
return pathErr("FindFirstFileEx", pattern, err)
}
defer windows.FindClose(find)
for {
var child []uint16
child, err = truncAtNull(fd.name[:])
if err != nil {
return err
}
if !equal(child, []uint16{'.'}) && !equal(child, []uint16{'.', '.'}) { // Ignore "." and ".."
if err := removeAll(join(path, child), fd.attributes, fd.reserved0); err != nil {
return err
}
}
err = findNextFileW(find, &fd)
if err == windows.ERROR_NO_MORE_FILES {
break
} else if err != nil {
return pathErr("FindNextFile", path, err)
}
}
return nil
}(); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be moved out to a separate function?
Ideally we can wrap findFirstFileExW and findNextFileW into a walkFiles(f func(findData) error)) error, since that allows the defer to still work, and I am sure we'll need it again in the future

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure here. I think it can be easier to grok the overall flow when it's kept together in a single function, and didn't want to introduce more abstractions like a walk function when the existing code seems fairly straightforward. However the complexity did grow over time, so maybe it would make sense.

@TBBle
Copy link
Contributor

TBBle commented Jul 18, 2023

For the record, Go 1.21 will obviate the need for this this by using WTF-8 to carry filenames on Windows, ensuring round-trip-safe string representation of the full range of valid Windows filenames.

@kevpar
Copy link
Member Author

kevpar commented Jul 26, 2023

For the record, Go 1.21 will obviate the need for this this by using WTF-8 to carry filenames on Windows, ensuring round-trip-safe string representation of the full range of valid Windows filenames.

Yeah, I think it probably makes sense to just wait for go1.21 at this point. Glad that change made it in. :)

@kevpar kevpar closed this Nov 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants