fs: copy: use copy_file_range on Linux #50772

nicokoch · 2018-05-15T14:28:32Z

Linux 4.5 introduced a new system call copy_file_range to copy data from one file to another.

This PR uses the new system call (if available). This has several advantages:

No need to constantly copy data from userspace to kernel space, if the buffer is small or the file is large
On some filesystems, like BTRFS, the kernel can leverage internal fs mechanisms for huge performance gains
Filesystems on the network dont need to copy data between the host and the client machine (they have to in the current read/write implementation)

I have created a small library that also implements the new system call for some huge performance gains here: https://github.com/nicokoch/fastcopy
Benchmark results are in the README

rust-highfive · 2018-05-15T14:28:43Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @TimNN (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

rust-highfive · 2018-05-15T14:45:42Z

The job x86_64-gnu-llvm-3.9 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.


[00:06:06] travis_fold:start:tidy
travis_time:start:tidy
tidy check
[00:06:06] tidy error: /checkout/src/libstd/sys/unix/fs.rs:815: trailing whitespace
[00:06:08] some tidy checks failed
[00:06:08] 
[00:06:08] 
[00:06:08] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:06:08] 
[00:06:08] 
[00:06:08] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:06:08] Build completed unsuccessfully in 0:02:33
[00:06:08] Build completed unsuccessfully in 0:02:33
[00:06:08] Makefile:79: recipe for target 'tidy' failed
[00:06:08] make: *** [tidy] Error 1

The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:0dfa0b4c
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

TimNN · 2018-05-15T16:47:27Z

Thanks for the PR! As far as I can tell the implementation looks mostly good. While reviewing it would have been awesome if some more comments about the syscall had been present (mostly with regard to the syscall updating / reading the file offset, so no need to use written in the arguments). Maybe you could add a comment along those lines?

Do we care about the (impossible?) edge case of some calls to copy_file_range succeeding and then failing in such a way that io::copy is called?

Is it possible for this to (significantly) slow down the case where copy_file_range is not available?

I think this needs some more carful handling of the usize / u64 conversions. (IIUC, copying files larger than 4GB on as 32bit system will currently lead to bad copies).

cc @rust-lang/libs: What do you think about this kind of specialised code?

alexcrichton · 2018-05-15T18:04:00Z

We're generally amenable to using newer features when detected so long as it's functionally the same as what's there today. That's how things like getrandom, accept4, etc. work. Most of the time, however, we do the feature detection once and then cache what happened in a global, and that way all future calls don't continue to get ENOSYS or do feature detection

nicokoch · 2018-05-16T08:44:17Z

Ping @TimNN

@alexcrichton : IMO, it doesn't make sense to store the availability in a global here:

It's not trivial to do a "noop" copy_file range once. Maybe we could write to /dev/zero or /dev/null or something. But I think it's also possible that these special files do not exist in some configurations.
We have to check the error to detect EXDEV, so also checking for ENOSYS doesn't add a lot of overhead.

alexcrichton · 2018-05-16T14:14:29Z

@nicokoch if that's the case, can you whip up a benchmark copying lots of small files and see what the perf impact of this change is for a system that doesn't support the syscall?

nagisa · 2018-05-16T20:00:59Z

If ENOSYS occurs, it will occur always (if we ignore kpatch, which is very warranted to do) during the execution of the binary, making repeated syscalls a waste of computing resources.

Implementing such a thing is definitely possible and feasible. Here's a short sketch (based on PRd code slightly):

static HAS_COPY_FILE_RANGE: AtomicBool = AtomicBool::new(true);

let copy_result = if HAS_COPY_FILE_RANGE.load(Relaxed) {
    copy_file_range(...)
} else {
    ENOSYS
};

if copy_result == ENOSYS {
    HAS_COPY_FILE_RANGE.store(false, Relaxed);
}
if copy_result == ENOSYS || copy_result == EXDEV {
    // attempt simple copy
}

EXDEV is an error of an entirely different kind and does not factor into HAS_COPY_FILE_RANGE at all.

nicokoch · 2018-05-17T11:40:09Z

Thanks @nagisa . Will implement with the help of your snippet

nicokoch · 2018-05-19T08:21:42Z

Ping

TimNN

If at all possible, could you extend the benchmark results in your repo with a comparison along the lines that @alexcrichton has pointed out?

I'd be interested in comparing the performance numbers of copying many (different) smaller files

With copy_file_range on the same filesystem
With copy_file_range across filesystems (so the atomic check does not apply)
Without copy_file_range (so the atomic check applies)

TimNN · 2018-05-19T08:34:45Z

src/libstd/sys/unix/fs.rs

+        } else {
+            (len - written) as usize
+        };
+        let copy_result = if HAS_COPY_FILE_RANGE.load(Ordering::Relaxed) {


I would move this out of the loop (store the result of the load in a local variable). I don't know if there is any noteworthy overhead of reading relaxed atomic but that feels like a safer alternative.

TimNN · 2018-05-19T08:34:49Z

src/libstd/sys/unix/fs.rs

+
+    let mut written = 0u64;
+    while written < len {
+        let bytes_to_copy = if len - written > usize::max_value() as u64 {


Ideally this would use TryFrom, but that appears to be not implemented for u64 -> usize case.

This implementation could fail for usize == u128 but I don't think this is something we are supporting right now.

- Move loading of atomic bool outside the loop - Add comment about TryFrom for future improvement

nicokoch · 2018-05-24T13:01:15Z

I implemented the atomic load before the loop and added a comment about TryFrom.
This can later be revisited once let my_usize: usize = 0u64.try_from().unwrap() works.

I will see what I can do about the benchmarks :)

rust-highfive · 2018-05-24T13:10:07Z

The job x86_64-gnu-llvm-3.9 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.


[00:04:37] travis_fold:start:tidy
travis_time:start:tidy
tidy check
[00:04:38] tidy error: /checkout/src/libstd/sys/unix/fs.rs:857: TODO is deprecated; use FIXME
[00:04:39] some tidy checks failed
[00:04:39] 
[00:04:39] 
[00:04:39] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:04:39] 
[00:04:39] 
[00:04:39] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:04:39] Build completed unsuccessfully in 0:01:51
[00:04:39] Build completed unsuccessfully in 0:01:51
[00:04:39] Makefile:79: recipe for target 'tidy' failed
[00:04:39] make: *** [tidy] Error 1

The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:00640c9d
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

nicokoch · 2018-05-24T15:56:37Z

@TimNN I have added the benchmark results and some notes in my repo's readme:
https://github.com/nicokoch/fastcopy

pietroalbini · 2018-05-28T09:02:17Z

Ping from triage @TimNN! The author should have addressed your comments.

branch

TimNN · 2018-05-28T15:48:28Z

src/libstd/sys/unix/fs.rs

+    let has_copy_file_range = HAS_COPY_FILE_RANGE.load(Ordering::Relaxed);
+    let mut written = 0u64;
+    while written < len {
+        // TODO should ideally use TryFrom


Use FIXME instead (or the linter complains).

TimNN · 2018-05-28T15:53:49Z

Thanks for the extensive benchmarks @nicokoch, awesome work!

The benchmark results look fine to me, the atomic check introduces negligible overhead but there is some (small, and for larger files probably negligible) overhead when copying many small files across devices (because an additional syscall happens for every copy operation).

I would propose to merge this, however @alexcrichton, can you also please take a look at the benchmark results in https://github.com/nicokoch/fastcopy? (And, should that be needed, propose an FCP or alternatively let us know what you think).

nicokoch · 2018-05-28T17:20:28Z

Thanks!

Note that benchmarks with fs stuff are kind of unreliable and vary depending on background Prozesses and things like i/o scheduling. Sometimes, my implementation was even faster in the EXDEV case, which is logically impossible. So take them with a grain of salt.

nicokoch · 2018-05-29T08:38:49Z

Linter complaints are also fixed now

alexcrichton

Changes look good to me other than a few nits, and the benchmarks also look good too! Thanks so much again @nicokoch!

alexcrichton · 2018-05-29T21:12:26Z

src/libstd/sys/unix/fs.rs

+    while written < len {
+        let copy_result = if has_copy_file_range {
+            // FIXME: should ideally use TryFrom
+            let bytes_to_copy = if len - written > usize::max_value() as u64 {


Perhaps this?

let bytes_to_copy = cmp::min(len - written, usize::max_value() as u64) as usize;

alexcrichton · 2018-05-29T21:16:57Z

src/libstd/sys/unix/fs.rs

+                match err.raw_os_error() {
+                    Some(os_err) if os_err == libc::ENOSYS || os_err == libc::EXDEV => {
+                        // Either kernel is too old or the files are not mounted on the same fs.
+                        // Try again with fallback method


Could you also throw an assert in here that written is 0? I don't think it's possible for it to not be 0 but the return value at least I believe will be invalid if it's nonzero

- compute bytes_to_copy more elegantly - add assert that written is 0 in fallback case

nicokoch · 2018-05-29T21:58:00Z

Thanks for the review @alexcrichton ! Nits were addressed in the latest commit.

alexcrichton · 2018-05-29T22:56:34Z

@bors: r+

bors · 2018-05-29T22:56:35Z

📌 Commit c7d6a01 has been approved by alexcrichton

bors · 2018-05-29T23:49:19Z

⌛ Testing commit c7d6a01 with merge ec99b22...

fs: copy: use copy_file_range on Linux Linux 4.5 introduced a new system call [copy_file_range](http://man7.org/linux/man-pages/man2/copy_file_range.2.html) to copy data from one file to another. This PR uses the new system call (if available). This has several advantages: 1. No need to constantly copy data from userspace to kernel space, if the buffer is small or the file is large 2. On some filesystems, like BTRFS, the kernel can leverage internal fs mechanisms for huge performance gains 3. Filesystems on the network dont need to copy data between the host and the client machine (they have to in the current read/write implementation) I have created a small library that also implements the new system call for some huge performance gains here: https://github.com/nicokoch/fastcopy Benchmark results are in the README

bors · 2018-05-30T02:11:52Z

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing ec99b22 to master...

meven · 2018-06-06T13:23:25Z

src/libstd/sys/unix/fs.rs

+            };
+            if let Err(ref copy_err) = copy_result {
+                if let Some(libc::ENOSYS) = copy_err.raw_os_error() {
+                    HAS_COPY_FILE_RANGE.store(false, Ordering::Relaxed);


Shouldn't has_copy_file_range be updated to false as well ?
For the first file, after the first chunk of the file has been copied, if copy_file_range was not available, next iterations of the while loop will try again to use copy_file_range and fail continuously, we could avoid it, making mut has_copy_file_range and here has_copy_file_range = false;

No. If copy_result is Err the loop will always terminate in the current iteration.

nicokoch added 2 commits May 15, 2018 15:25

fs: use copy_file_range on linux

834ef9f

Merge branch 'master' of https://github.com/nicokoch/rust

b4b71d5

rust-highfive assigned TimNN May 15, 2018

nicokoch changed the title ~~fs: copy: user copy_file_range on Linux~~ fs: copy: use copy_file_range on Linux May 15, 2018

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 15, 2018

nicokoch added 4 commits May 16, 2018 10:17

Use copy_file_range on android also

00ec3cf

Add clarifying comment about offset argument

b605923

Adjust len in every iteration

f4c2825

Fix large file copies on 32 bit platforms

a5e2942

Store ENOSYS in a global to avoid unnecessary system calls

09d03bc

TimNN reviewed May 19, 2018

View reviewed changes

Implement suggestions from the PR

3f392ab

- Move loading of atomic bool outside the loop - Add comment about TryFrom for future improvement

Use FIXME instead of TODO; Move bytes_to_copy calculation inside if

3b271eb

branch

TimNN reviewed May 28, 2018

View reviewed changes

alexcrichton reviewed May 29, 2018

View reviewed changes

Fix additional nits:

c7d6a01

- compute bytes_to_copy more elegantly - add assert that written is 0 in fallback case

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 29, 2018

bors merged commit c7d6a01 into rust-lang:master May 30, 2018

alexcrichton mentioned this pull request Jun 1, 2018

"Could not copy" (Operation not permitted) when compiling in a docker container #51266

Closed

meven reviewed Jun 6, 2018

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fs: copy: use copy_file_range on Linux #50772

fs: copy: use copy_file_range on Linux #50772

nicokoch commented May 15, 2018

rust-highfive commented May 15, 2018

rust-highfive commented May 15, 2018

TimNN commented May 15, 2018

alexcrichton commented May 15, 2018

nicokoch commented May 16, 2018

alexcrichton commented May 16, 2018

nagisa commented May 16, 2018

nicokoch commented May 17, 2018

nicokoch commented May 19, 2018

TimNN left a comment

TimNN May 19, 2018

TimNN May 19, 2018

nicokoch commented May 24, 2018 •

edited

Loading

rust-highfive commented May 24, 2018

nicokoch commented May 24, 2018

pietroalbini commented May 28, 2018

TimNN May 28, 2018

TimNN commented May 28, 2018

nicokoch commented May 28, 2018

nicokoch commented May 29, 2018

alexcrichton left a comment •

edited

Loading

alexcrichton May 29, 2018

alexcrichton May 29, 2018

nicokoch commented May 29, 2018

alexcrichton commented May 29, 2018

bors commented May 29, 2018

bors commented May 29, 2018

bors commented May 30, 2018

meven Jun 6, 2018 •

edited

Loading

TimNN Jun 6, 2018

fs: copy: use copy_file_range on Linux #50772

fs: copy: use copy_file_range on Linux #50772

Conversation

nicokoch commented May 15, 2018

rust-highfive commented May 15, 2018

rust-highfive commented May 15, 2018

TimNN commented May 15, 2018

alexcrichton commented May 15, 2018

nicokoch commented May 16, 2018

alexcrichton commented May 16, 2018

nagisa commented May 16, 2018

nicokoch commented May 17, 2018

nicokoch commented May 19, 2018

TimNN left a comment

Choose a reason for hiding this comment

TimNN May 19, 2018

Choose a reason for hiding this comment

TimNN May 19, 2018

Choose a reason for hiding this comment

nicokoch commented May 24, 2018 • edited Loading

rust-highfive commented May 24, 2018

nicokoch commented May 24, 2018

pietroalbini commented May 28, 2018

TimNN May 28, 2018

Choose a reason for hiding this comment

TimNN commented May 28, 2018

nicokoch commented May 28, 2018

nicokoch commented May 29, 2018

alexcrichton left a comment • edited Loading

Choose a reason for hiding this comment

alexcrichton May 29, 2018

Choose a reason for hiding this comment

alexcrichton May 29, 2018

Choose a reason for hiding this comment

nicokoch commented May 29, 2018

alexcrichton commented May 29, 2018

bors commented May 29, 2018

bors commented May 29, 2018

bors commented May 30, 2018

meven Jun 6, 2018 • edited Loading

Choose a reason for hiding this comment

TimNN Jun 6, 2018

Choose a reason for hiding this comment

nicokoch commented May 24, 2018 •

edited

Loading

alexcrichton left a comment •

edited

Loading

meven Jun 6, 2018 •

edited

Loading