1.1-maint vs master performance difference #4498

@ThomasWaldmann

In #3955 the question came up why borg create is slower with master branch code than with 1.1-maint branch code. The benchmarks I did there archived lots of relatively small files.

I did a source-code-based analysis of the "hot" code path that is taken when backing up known, unchanged, regular files (as when doing a second backup right after the first one, without many changes).

I found this difference in process_file (simplified pseudocode for what is new in master):

        fd = open(path)
        st_fd = fstat(fd)
        stat_update_check(st_path, st_fd)

These changes were made when switching the borg master branch to work based on an FD (file descriptor), to avoid race conditions and potential security issues. See #4043.

To achieve this, master code proceeds as follows:

  • st_path = stat(path) - before dispatching to file-type handler
  • dispatch based on st_path (e.g. to process_file, process_fifo, process_symlink, ...)
  • fd = open(path) - get an FD, so we can do other operations based on this FD. This is new in master (1.1-maint did not open unchanged files and acquired all metadata based on the path)
  • stat_update_check: determine st_fd = fstat(fd) and compare it to st_path to check for a race condition ("did we dispatch to the correct file-type handler?"). This call and this check are new in master
  • reading file contents (based on the open fd, but not in the "hot" case, where we know the contents have not changed)
  • reading bsdflags (based on the open fd on Linux, based on st_fd on other platforms)
  • reading xattrs (based on open fd)
  • reading ACLs (based on open fd)
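
The first four steps above can be sketched in plain Python to make the extra per-file work visible. This is only an illustration under assumptions, not borg's actual code: backup_path and the returned dict are hypothetical, and the specific comparisons inside stat_update_check (file type and inode number) are an assumed way of "comparing st_fd to st_path". What it shows is that even for an unchanged file, the FD-based flow adds an open(), an fstat() and a close() on top of the initial stat().

```python
import os
import stat
import tempfile


def stat_update_check(st_old, st_curr):
    """Return st_curr if it plausibly describes the same file as st_old.

    Assumed checks for this sketch: same file type and same inode number.
    """
    if stat.S_IFMT(st_old.st_mode) != stat.S_IFMT(st_curr.st_mode):
        # The file type changed between stat() and open(), so we were
        # dispatched to the wrong file-type handler.
        raise OSError("file type changed (race condition?)")
    if st_old.st_ino != st_curr.st_ino:
        # The path now refers to a different file than the one we stat()ed.
        raise OSError("inode changed (race condition?)")
    return st_curr


def process_file(path, st_path):
    """Handle a regular file based on an open FD (hypothetical handler)."""
    fd = os.open(path, os.O_RDONLY)      # extra syscall compared to a path-only flow
    try:
        st_fd = os.fstat(fd)             # extra syscall compared to a path-only flow
        st = stat_update_check(st_path, st_fd)
        # Further metadata (flags, xattrs, ACLs) would be read via fd here.
        return {"size": st.st_size, "mtime_ns": st.st_mtime_ns}
    finally:
        os.close(fd)


def backup_path(path):
    """stat() the path, then dispatch on the file type (hypothetical dispatcher)."""
    st_path = os.stat(path, follow_symlinks=False)
    if stat.S_ISREG(st_path.st_mode):
        return process_file(path, st_path)
    raise NotImplementedError("only regular files are handled in this sketch")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        name = os.path.join(tmp, "example.txt")
        with open(name, "wb") as f:
            f.write(b"hello")
        print(backup_path(name))
```

With many small, unchanged files, these additional system calls are paid once per file, which is consistent with the slowdown being most visible in the small-files benchmark.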
