Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs: copy: use copy_file_range on Linux #50772

Merged
merged 10 commits into from
May 30, 2018
Merged

Conversation

nicokoch
Copy link
Contributor

Linux 4.5 introduced a new system call copy_file_range to copy data from one file to another.

This PR uses the new system call (if available). This has several advantages:

  1. No need to constantly copy data from userspace to kernel space, if the buffer is small or the file is large
  2. On some filesystems, like BTRFS, the kernel can leverage internal fs mechanisms for huge performance gains
  3. Filesystems on the network dont need to copy data between the host and the client machine (they have to in the current read/write implementation)

I have created a small library that also implements the new system call for some huge performance gains here: https://github.com/nicokoch/fastcopy
Benchmark results are in the README

@nicokoch nicokoch changed the title fs: copy: user copy_file_range on Linux fs: copy: use copy_file_range on Linux May 15, 2018
@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @TimNN (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 15, 2018
@rust-highfive
Copy link
Collaborator

The job x86_64-gnu-llvm-3.9 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.

[00:06:06] travis_fold:start:tidy
travis_time:start:tidy
tidy check
[00:06:06] tidy error: /checkout/src/libstd/sys/unix/fs.rs:815: trailing whitespace
[00:06:08] some tidy checks failed
[00:06:08] 
[00:06:08] 
[00:06:08] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:06:08] 
[00:06:08] 
[00:06:08] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:06:08] Build completed unsuccessfully in 0:02:33
[00:06:08] Build completed unsuccessfully in 0:02:33
[00:06:08] Makefile:79: recipe for target 'tidy' failed
[00:06:08] make: *** [tidy] Error 1

The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:0dfa0b4c
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@TimNN
Copy link
Contributor

TimNN commented May 15, 2018

Thanks for the PR! As far as I can tell the implementation looks mostly good. While reviewing it would have been awesome if some more comments about the syscall had been present (mostly with regard to the syscall updating / reading the file offset, so no need to use written in the arguments). Maybe you could add a comment along those lines?

Do we care about the (impossible?) edge case of some calls to copy_file_range succeeding and then failing in such a way that io::copy is called?

Is it possible for this to (significantly) slow down the case where copy_file_range is not available?

I think this needs some more carful handling of the usize / u64 conversions. (IIUC, copying files larger than 4GB on as 32bit system will currently lead to bad copies).

cc @rust-lang/libs: What do you think about this kind of specialised code?

@alexcrichton
Copy link
Member

We're generally amenable to using newer features when detected so long as it's functionally the same as what's there today. That's how things like getrandom, accept4, etc. work. Most of the time, however, we do the feature detection once and then cache what happened in a global, and that way all future calls don't continue to get ENOSYS or do feature detection

@nicokoch
Copy link
Contributor Author

Ping @TimNN

@alexcrichton : IMO, it doesn't make sense to store the availability in a global here:

  1. It's not trivial to do a "noop" copy_file range once. Maybe we could write to /dev/zero or /dev/null or something. But I think it's also possible that these special files do not exist in some configurations.
  2. We have to check the error to detect EXDEV, so also checking for ENOSYS doesn't add a lot of overhead.

@alexcrichton
Copy link
Member

@nicokoch if that's the case, can you whip up a benchmark copying lots of small files and see what the perf impact of this change is for a system that doesn't support the syscall?

@nagisa
Copy link
Member

nagisa commented May 16, 2018

If ENOSYS occurs, it will occur always (if we ignore kpatch, which is very warranted to do) during the execution of the binary, making repeated syscalls a waste of computing resources.

Implementing such a thing is definitely possible and feasible. Here's a short sketch (based on PRd code slightly):

static HAS_COPY_FILE_RANGE: AtomicBool = AtomicBool::new(true);

let copy_result = if HAS_COPY_FILE_RANGE.load(Relaxed) {
    copy_file_range(...)
} else {
    ENOSYS
};

if copy_result == ENOSYS {
    HAS_COPY_FILE_RANGE.store(false, Relaxed);
}
if copy_result == ENOSYS || copy_result == EXDEV {
    // attempt simple copy
}

EXDEV is an error of an entirely different kind and does not factor into HAS_COPY_FILE_RANGE at all.

@nicokoch
Copy link
Contributor Author

Thanks @nagisa . Will implement with the help of your snippet

@nicokoch
Copy link
Contributor Author

Ping

Copy link
Contributor

@TimNN TimNN left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If at all possible, could you extend the benchmark results in your repo with a comparison along the lines that @alexcrichton has pointed out?

I'd be interested in comparing the performance numbers of copying many (different) smaller files

  • With copy_file_range on the same filesystem
  • With copy_file_range across filesystems (so the atomic check does not apply)
  • Without copy_file_range (so the atomic check applies)

} else {
(len - written) as usize
};
let copy_result = if HAS_COPY_FILE_RANGE.load(Ordering::Relaxed) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this out of the loop (store the result of the load in a local variable). I don't know if there is any noteworthy overhead of reading relaxed atomic but that feels like a safer alternative.


let mut written = 0u64;
while written < len {
let bytes_to_copy = if len - written > usize::max_value() as u64 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally this would use TryFrom, but that appears to be not implemented for u64 -> usize case.

This implementation could fail for usize == u128 but I don't think this is something we are supporting right now.

  - Move loading of atomic bool outside the loop
  - Add comment about TryFrom for future improvement
@nicokoch
Copy link
Contributor Author

nicokoch commented May 24, 2018

I implemented the atomic load before the loop and added a comment about TryFrom.
This can later be revisited once let my_usize: usize = 0u64.try_from().unwrap() works.

I will see what I can do about the benchmarks :)

@rust-highfive
Copy link
Collaborator

The job x86_64-gnu-llvm-3.9 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.

[00:04:37] travis_fold:start:tidy
travis_time:start:tidy
tidy check
[00:04:38] tidy error: /checkout/src/libstd/sys/unix/fs.rs:857: TODO is deprecated; use FIXME
[00:04:39] some tidy checks failed
[00:04:39] 
[00:04:39] 
[00:04:39] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:04:39] 
[00:04:39] 
[00:04:39] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:04:39] Build completed unsuccessfully in 0:01:51
[00:04:39] Build completed unsuccessfully in 0:01:51
[00:04:39] Makefile:79: recipe for target 'tidy' failed
[00:04:39] make: *** [tidy] Error 1

The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:00640c9d
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@nicokoch
Copy link
Contributor Author

@TimNN I have added the benchmark results and some notes in my repo's readme:
https://github.com/nicokoch/fastcopy

@pietroalbini
Copy link
Member

Ping from triage @TimNN! The author should have addressed your comments.

let has_copy_file_range = HAS_COPY_FILE_RANGE.load(Ordering::Relaxed);
let mut written = 0u64;
while written < len {
// TODO should ideally use TryFrom
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use FIXME instead (or the linter complains).

@TimNN
Copy link
Contributor

TimNN commented May 28, 2018

Thanks for the extensive benchmarks @nicokoch, awesome work!

The benchmark results look fine to me, the atomic check introduces negligible overhead but there is some (small, and for larger files probably negligible) overhead when copying many small files across devices (because an additional syscall happens for every copy operation).

I would propose to merge this, however @alexcrichton, can you also please take a look at the benchmark results in https://github.com/nicokoch/fastcopy? (And, should that be needed, propose an FCP or alternatively let us know what you think).

@nicokoch
Copy link
Contributor Author

Thanks!

Note that benchmarks with fs stuff are kind of unreliable and vary depending on background Prozesses and things like i/o scheduling. Sometimes, my implementation was even faster in the EXDEV case, which is logically impossible. So take them with a grain of salt.

@nicokoch
Copy link
Contributor Author

Linter complaints are also fixed now

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me other than a few nits, and the benchmarks also look good too! Thanks so much again @nicokoch!

while written < len {
let copy_result = if has_copy_file_range {
// FIXME: should ideally use TryFrom
let bytes_to_copy = if len - written > usize::max_value() as u64 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this?

let bytes_to_copy = cmp::min(len - written, usize::max_value() as u64) as usize;

match err.raw_os_error() {
Some(os_err) if os_err == libc::ENOSYS || os_err == libc::EXDEV => {
// Either kernel is too old or the files are not mounted on the same fs.
// Try again with fallback method
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also throw an assert in here that written is 0? I don't think it's possible for it to not be 0 but the return value at least I believe will be invalid if it's nonzero

  - compute bytes_to_copy more elegantly
  - add assert that written is 0 in fallback case
@nicokoch
Copy link
Contributor Author

Thanks for the review @alexcrichton ! Nits were addressed in the latest commit.

@alexcrichton
Copy link
Member

@bors: r+

@bors
Copy link
Contributor

bors commented May 29, 2018

📌 Commit c7d6a01 has been approved by alexcrichton

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 29, 2018
@bors
Copy link
Contributor

bors commented May 29, 2018

⌛ Testing commit c7d6a01 with merge ec99b22...

bors added a commit that referenced this pull request May 29, 2018
fs: copy: use copy_file_range on Linux

Linux 4.5 introduced a new system call [copy_file_range](http://man7.org/linux/man-pages/man2/copy_file_range.2.html) to copy data from one file to another.

This PR uses the new system call (if available). This has several advantages:

1. No need to constantly copy data from userspace to kernel space, if the buffer is small or the file is large
2. On some filesystems, like BTRFS, the kernel can leverage internal fs mechanisms for huge performance gains
3. Filesystems on the network dont need to copy data between the host and the client machine (they have to in the current read/write implementation)

I have created a small library that also implements the new system call for some huge performance gains here: https://github.com/nicokoch/fastcopy
Benchmark results are in the README
@bors
Copy link
Contributor

bors commented May 30, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing ec99b22 to master...

};
if let Err(ref copy_err) = copy_result {
if let Some(libc::ENOSYS) = copy_err.raw_os_error() {
HAS_COPY_FILE_RANGE.store(false, Ordering::Relaxed);
Copy link
Contributor

@meven meven Jun 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't has_copy_file_range be updated to false as well ?
For the first file, after the first chunk of the file has been copied, if copy_file_range was not available, next iterations of the while loop will try again to use copy_file_range and fail continuously, we could avoid it, making mut has_copy_file_range and here has_copy_file_range = false;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. If copy_result is Err the loop will always terminate in the current iteration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants