Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade xxhash.h to latest dev #11098

Closed
wants to merge 7 commits into from

Conversation

pdillinger
Copy link
Contributor

@pdillinger pdillinger commented Jan 17, 2023

Upgrading xxhash.h to latest dev version as of 1/17/2023, which is d7197ddea81364a539051f116ca77926100fc77f This should improve performance on some ARM machines.

I allowed some of our RocksDB-specific changes to be made obsolete where it seemed appropriate, for example

Merging and resolving conflicts one way or the other was all that went into this diff. Except I had to mix the two sides around defined(__loongarch64)

How I did the upgrade (for future reference), so that I could use usual merge conflict resolution:

# New branch to help with merging
git checkout -b xxh_merge_base
# Check out RocksDB revision before last xxhash.h upgrade
git reset --hard 22161b7547652af82a5dc67458de9ca8946ac83d^
# Create a commit with the raw base version from xxHash repo (from xxHash repo)
git show 2c611a76f914828bed675f0f342d6c4199ffee1e:xxhash.h > ../rocksdb/util/xxhash.h
# In RocksDB repo
git commit -a
# Merge in the last xxhash.h upgrade
git merge 22161b7547652af82a5dc67458de9ca8946ac83d
# Resolve conflict using committed version
git show 22161b7547652af82a5dc67458de9ca8946ac83d:util/xxhash.h > util/xxhash.h
git commit -a
# Catch up to upstream
git merge upstream/main

# Create a different branch for applying raw upgrade
git checkout -b xxh_upgrade_2023
# Find the RocksDB commit we made for the raw base version from xxHash
git log main..HEAD
# Rewind to it
git reset --hard 2428b727a9a19c6078bb75895805d7488cbdd08c
# Copy in latest raw version (from xxHash repo)
cat xxhash.h > ../rocksdb/util/xxhash.h
# Merge in RocksDB changes, use typical tools for conflict resolution
git merge xxh_merge_base

Branch https://github.com/facebook/rocksdb/tree/xxhash_merge_base can be used as a base for future xxhash merges.

Fixes #11073

Test Plan: existing tests (e.g. Bloom filter schema stability tests)

Also seems to include a small performance boost on my Intel dev machine, using ./db_bench --benchmarks=xxh3[-X50] 2>&1 | egrep -o 'operations;.*' | sort

Fastest out of 50 runs, before: 15477.3 MB/s
Fastest out of 50 runs, after: 15850.7 MB/s, and 11 more runs faster than the "before" number

Slowest out of 50 runs, before: 12267.5 MB/s
Slowest out of 50 runs, after: 13897.1 MB/s

More repetitions show the distinction is repeatable

@facebook-github-bot
Copy link
Contributor

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@pdillinger pdillinger requested a review from hx235 January 18, 2023 00:51
Copy link
Contributor

@hx235 hx235 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - will appreciate extra eye on our regression benchmark to further ensure no regression.

Learning question: why didn't we directly merge with the latest dev but instead do it in a two-step manner - first merging with last xxhash.h version we used and then to the latestdev? For example, in the step of "git show 2c611a76f914828bed675f0f342d6c4199ffee1e:xxhash.h > ../rocksdb/util/xxhash.h", we do maybe "git show latestdevhash:xxhash.h > ../rocksdb/util/xxhash.h". Is it for less merging conflict at each merge??

@hx235
Copy link
Contributor

hx235 commented Jan 18, 2023

@pdillinger Should we also mention this in history for "performance improvement" section?

@pdillinger
Copy link
Contributor Author

Learning question: why didn't we directly merge with the latest dev but instead do it in a two-step manner

I tried simply applying a patch of the upstream changes, and enough hunks failed to apply that I wanted to do a git-based merge. But that's a bit tricky. In order to do the merge with git across repos, we need a RocksDB commit that matches the last merged version from upstream, without our RocksDB changes. We don't have this permanently in rocksdb git because our infrastructure linearizes the history in the "main" branch, so we only keep the versions with RocksDB changes. I suppose I could push a branch on facebook/rocksdb for it. Done: https://github.com/facebook/rocksdb/commits/xxhash_merge_base

How:

->P  -> ... -> A
   \           \
    `-> X  -----`-> A'
         \           \
          `-> Y ------`-> B

P is the RocksDB commit before the last xxhash merge and A is the latest main. A' is exactly the same as A except that it has X in its history, which is created to have an unmodified version of what was last merged from upstream. To get Y we modify X with the latest upstream changes from xxhash. Then, merging Y into A' allows us to use git to resolve conflicts in the usual way, as if the parallel xxhash development and RocksDB development had been happening in the same repo.

@mrambacher
Copy link
Contributor

What are the differences now between the "standard" xxhash and the "RocksDB" version? Can they be called out somewhere? And if there are no differences, can we either move xxhash into 3rd party or check for it to be installed similar to the compression algorithms?

@pdillinger
Copy link
Contributor Author

pdillinger commented Jan 19, 2023

What are the differences now between the "standard" xxhash and the "RocksDB" version?

You can answer this yourself with diff.

Can they be called out somewhere?

I'm not a fan of unauthoritative lists of relatively unimportant things. It's begging to fall out of date and waste readers and developers time.

@facebook-github-bot
Copy link
Contributor

@pdillinger has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@pdillinger merged this pull request in fd911f9.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Performance for xxh3 on ARM is almost 1.5X better with latest code from xxHash repo
4 participants