
"no such file or directory" when copying large amounts of data #72

Closed
m90 opened this issue Jan 18, 2022 · 8 comments
Closed

"no such file or directory" when copying large amounts of data #72

m90 opened this issue Jan 18, 2022 · 8 comments
Assignees

Comments

@m90
Copy link

m90 commented Jan 18, 2022

I'm using this package in a tool for backing up Docker volumes: https://github.com/offen/docker-volume-backup

Users that do not want to stop their containers while taking a backup can opt in to copying their data to a temporary location before creating the tar archive, so that archive creation does not fail if data is written to a file while it is being backed up. This copy is performed using package copy (thanks for making it public, much appreciated).

This seemed to work well in tests as well as in the real world; however, an issue was recently raised in which copy fails with the following error when backing up the data volume of a Prometheus container:

open /backup/prometheus_data/01FSM8TPFEXQ0QC28H11PMQZ0R: no such file or directory

The dataset being copied seems to be (a) very large and (b) quite volatile, which has me thinking this file might actually have been deleted or moved before copy got around to copying it. This is the downstream issue: offen/docker-volume-backup#49

Is this a known issue? Is there a way to fix it by configuring copy differently?

This is where copy is used in my code, and also where the above error is returned:

if err := copy.Copy(s.c.BackupSources, backupSources, copy.Options{
	PreserveTimes: true,
	PreserveOwner: true,
}); err != nil {
	return fmt.Errorf("takeBackup: error creating snapshot: %w", err)
}

otiai10 (Owner) commented Jan 18, 2022

Thank you, @m90
To answer your question quickly: no, it's not known.

Let me clarify: you think there are two possible causes of this problem:

a. Copy failed because the src dir is too large.
b. Copy failed because the src dir does not exist.

Can you tell me why you think case a is involved?

m90 (Author) commented Jan 19, 2022

My line of thinking (without knowing too much about what copy is actually doing) is that a. could increase the probability of b. happening: the more there is to copy, the longer the whole operation takes, which increases the likelihood of another process deleting files and/or directories from the initial set while copy is still working.

Or is there a flaw in that?

m90 (Author) commented Jan 19, 2022

For example, here in copy/copy.go (lines 142 to 166 at 9aae5f7):

contents, err := ioutil.ReadDir(srcdir)
if err != nil {
	return
}
for _, content := range contents {
	cs, cd := filepath.Join(srcdir, content.Name()), filepath.Join(destdir, content.Name())
	if err = copyNextOrSkip(cs, cd, content, opt); err != nil {
		// If any error, exit immediately
		return
	}
}
if opt.PreserveTimes {
	if err := preserveTimes(info, destdir); err != nil {
		return err
	}
}
if opt.PreserveOwner {
	if err := preserveOwner(srcdir, destdir, info); err != nil {
		return err
	}
}

we could run into a situation where copyNextOrSkip takes a long time and, in the meantime, someone else deletes the next entry in the contents slice, making the next iteration fail. The likelihood of this happening should increase with the overall number of files to be copied.
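
To make that window concrete, here is a minimal, self-contained sketch of the race (the directory and file names are made up for illustration): the listing is taken first, an entry is deleted behind the loop's back, and the stale entry then produces exactly this kind of "no such file or directory" error.

package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	dir, err := os.MkdirTemp("", "race-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	for _, name := range []string{"a", "b", "c"} {
		if err := os.WriteFile(filepath.Join(dir, name), []byte("data"), 0o644); err != nil {
			panic(err)
		}
	}

	// Step 1: take the directory listing, as copy does before its loop.
	entries, err := os.ReadDir(dir)
	if err != nil {
		panic(err)
	}

	// Step 2: while the copy loop is "busy", another process deletes an entry.
	os.Remove(filepath.Join(dir, "b"))

	// Step 3: the loop now works from a stale listing and fails on "b".
	for _, e := range entries {
		f, err := os.Open(filepath.Join(dir, e.Name()))
		if err != nil {
			fmt.Println(err) // open .../b: no such file or directory
			continue
		}
		f.Close()
	}
}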

otiai10 (Owner) commented Jan 19, 2022

Fair enough, this is worth thinking about. Thank you very much.

The core issue is neither size nor time, imo.

That is: "should we lock what we wanna copy till it's done?"

Let me think about it so we can come up with the best interface.

m90 (Author) commented Jan 19, 2022

That is "should we lock what we wanna copy till it's done?".

This sums it up perfectly :)

If it's possible to add such an option, that would definitely be a great help.

ncopa (Contributor) commented Apr 4, 2023

That is "should we lock what we wanna copy till it's done?".

I don't think there is any point in trying to do that from this Go module. You will get the same race condition when trying to lock the file, because the file can be deleted after the directory is read. So the only way to do this is to lock the file system before reading the directory, and that can only be done either by "locking" the entire filesystem (e.g. a filesystem snapshot or an LVM snapshot), or by pausing the Docker container in the docker-volume-backup use case.

I think it would be enough to simply ignore os.IsNotExist(err); see the sketch after this comment.

Users that do not want to stop their containers while taking a backup can opt in to copying their data ...

If you don't stop or pause the container before copying, you will always risk files being deleted while you copy.
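
For illustration, a minimal sketch of that idea, assuming a hand-rolled recursive copy (copyTree and copyFile are hypothetical helpers, not this package's API): any error for which os.IsNotExist(err) reports true is skipped instead of aborting the walk.

package main

import (
	"io"
	"os"
	"path/filepath"
)

// copyTree is a hypothetical recursive copy that tolerates entries
// vanishing between the directory listing and the copy of each entry.
func copyTree(srcdir, destdir string) error {
	entries, err := os.ReadDir(srcdir)
	if err != nil {
		return err
	}
	if err := os.MkdirAll(destdir, 0o755); err != nil {
		return err
	}
	for _, entry := range entries {
		cs := filepath.Join(srcdir, entry.Name())
		cd := filepath.Join(destdir, entry.Name())
		if entry.IsDir() {
			err = copyTree(cs, cd)
		} else {
			err = copyFile(cs, cd)
		}
		if os.IsNotExist(err) {
			// The entry was deleted after the listing was taken;
			// skip it instead of failing the whole copy.
			continue
		}
		if err != nil {
			return err
		}
	}
	return nil
}

// copyFile is a bare-bones file copy, enough for the sketch.
func copyFile(src, dest string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()
	out, err := os.Create(dest)
	if err != nil {
		return err
	}
	defer out.Close()
	_, err = io.Copy(out, in)
	return err
}

func main() {
	// Example usage; source and destination come from the command line.
	if err := copyTree(os.Args[1], os.Args[2]); err != nil {
		panic(err)
	}
}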

ncopa added a commit to ncopa/copy that referenced this issue Apr 5, 2023
Files, symlinks and directories may be deleted while or after the
directory listing is read. Add a test to simulate this so we can fix the
desired behavior.

ref otiai10#72
@otiai10
Copy link
Owner

otiai10 commented Apr 5, 2023

Thank you, and I agree with your idea, @ncopa: locking is not something this package should provide.
Let me check your PR. I appreciate the way you separated the commits and pushed the test code first.

otiai10 closed this as completed in b23de9d on Apr 5, 2023