In #3955 the question came up why borg create with master branch code is slower than with 1.1-maint branch code. The benchmarks I did there archived lots of relatively small files.
I did a source code based analysis of the "hot" code path that is taken when backing up known, unchanged, regular files (like when doing a 2nd backup right after a first one, with few changes in between).
Found this difference, in process_file (simplified pseudo code that is new in master):
fd = open(path)
st_fd = fstat(fd)
stat_update_check(st_path, st_fd)
These changes were done when switching borg master branch to work based on a FD (file descriptor) to avoid race conditions and potential security issues. See #4043.
For this, master code does it like this:
- st_path = stat(path) - before dispatching to file-type handler
- dispatch based on st_path (e.g. to process_file, process_fifo, process_symlink, ...)
- fd = open(path) - get an FD, so we can do other operations based on this FD. new in master (1.1-maint did not open unchanged files and did acquire all metadata based on the path)
- stat_update_check: determining st_fd = fstat(fd) to check for a race condition ("did we dispatch to correct file-type handler?") by comparing st_fd to st_path. this call / this check is new in master
- reading file contents (based on open fd, but not in the "hot" case, here we know contents have not changed)
- reading bsdflags (based on open fd on linux, based on st_fd on others)
- reading xattrs (based on open fd)
- reading ACLs (based on open fd)
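To get a rough feeling for what the additional open / fstat / check costs per file, the first steps of the list above can be sketched in plain Python. This is only an illustration, not borg's actual code: scan_path_based, scan_fd_based and compare are made-up names, and the stat_update_check here is a simplified stand-in that assumes comparing file type and inode number is enough to detect the race.

```python
import os
import stat
import tempfile
import time


def stat_update_check(st_old, st_curr):
    """Simplified stand-in (assumption): same file type and same inode."""
    if stat.S_IFMT(st_old.st_mode) != stat.S_IFMT(st_curr.st_mode):
        raise RuntimeError("file type changed between stat() and open()")
    if st_old.st_ino != st_curr.st_ino:
        raise RuntimeError("inode changed between stat() and open()")
    return st_curr


def scan_path_based(paths):
    """Path-based style for unchanged files: one stat() per path, no open()."""
    return [os.stat(p, follow_symlinks=False).st_size for p in paths]


def scan_fd_based(paths):
    """FD-based style: stat(), open(), fstat(), check, close() per path."""
    sizes = []
    for p in paths:
        st_path = os.stat(p, follow_symlinks=False)
        fd = os.open(p, os.O_RDONLY)
        try:
            st_fd = stat_update_check(st_path, os.fstat(fd))
            sizes.append(st_fd.st_size)
        finally:
            os.close(fd)
    return sizes


def compare(n_files=1000):
    """Create n_files small files, return (path_based_secs, fd_based_secs)."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(n_files):
            p = os.path.join(tmp, f"f{i}")
            with open(p, "wb") as f:
                f.write(b"x" * 16)
            paths.append(p)
        t0 = time.perf_counter()
        a = scan_path_based(paths)
        t1 = time.perf_counter()
        b = scan_fd_based(paths)
        t2 = time.perf_counter()
    assert a == b  # both styles must see the same file sizes
    return t1 - t0, t2 - t1


if __name__ == "__main__":
    t_path, t_fd = compare()
    print(f"path-based: {t_path:.4f}s  fd-based: {t_fd:.4f}s")
```

The absolute numbers depend a lot on filesystem, OS and cache state, so this only shows the relative cost of the extra syscalls per file. It does not cover the bsdflags / xattrs / ACLs steps.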