-
Notifications
You must be signed in to change notification settings - Fork 6.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade xxhash.h to latest dev #11098
Conversation
@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - will appreciate extra eye on our regression benchmark to further ensure no regression.
Learning question: why didn't we directly merge with the latest dev but instead do it in a two-step manner - first merging with last xxhash.h version we used and then to the latestdev? For example, in the step of "git show 2c611a76f914828bed675f0f342d6c4199ffee1e:xxhash.h > ../rocksdb/util/xxhash.h", we do maybe "git show latestdevhash:xxhash.h > ../rocksdb/util/xxhash.h". Is it for less merging conflict at each merge??
@pdillinger Should we also mention this in history for "performance improvement" section? |
I tried simply applying a patch of the upstream changes, and enough hunks failed to apply that I wanted to do a git-based merge. But that's a bit tricky. In order to do the merge with git across repos, we need a RocksDB commit that matches the last merged version from upstream, without our RocksDB changes. We don't have this permanently in rocksdb git because our infrastructure linearizes the history in the "main" branch, so we only keep the versions with RocksDB changes. I suppose I could push a branch on facebook/rocksdb for it. Done: https://github.com/facebook/rocksdb/commits/xxhash_merge_base How:
P is the RocksDB commit before the last xxhash merge and A is the latest main. A' is exactly the same as A except that it has X in its history, which is created to have an unmodified version of what was last merged from upstream. To get Y we modify X with the latest upstream changes from xxhash. Then, merging Y into A' allows us to use git to resolve conflicts in the usual way, as if the parallel xxhash development and RocksDB development had been happening in the same repo. |
What are the differences now between the "standard" xxhash and the "RocksDB" version? Can they be called out somewhere? And if there are no differences, can we either move xxhash into 3rd party or check for it to be installed similar to the compression algorithms? |
You can answer this yourself with diff.
I'm not a fan of unauthoritative lists of relatively unimportant things. It's begging to fall out of date and waste readers and developers time. |
@pdillinger has updated the pull request. You must reimport the pull request before landing. |
@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@pdillinger merged this pull request in fd911f9. |
Upgrading xxhash.h to latest dev version as of 1/17/2023, which is d7197ddea81364a539051f116ca77926100fc77f This should improve performance on some ARM machines.
I allowed some of our RocksDB-specific changes to be made obsolete where it seemed appropriate, for example
Merging and resolving conflicts one way or the other was all that went into this diff. Except I had to mix the two sides around
defined(__loongarch64)
How I did the upgrade (for future reference), so that I could use usual merge conflict resolution:
Branch https://github.com/facebook/rocksdb/tree/xxhash_merge_base can be used as a base for future xxhash merges.
Fixes #11073
Test Plan: existing tests (e.g. Bloom filter schema stability tests)
Also seems to include a small performance boost on my Intel dev machine, using
./db_bench --benchmarks=xxh3[-X50] 2>&1 | egrep -o 'operations;.*' | sort
Fastest out of 50 runs, before: 15477.3 MB/s
Fastest out of 50 runs, after: 15850.7 MB/s, and 11 more runs faster than the "before" number
Slowest out of 50 runs, before: 12267.5 MB/s
Slowest out of 50 runs, after: 13897.1 MB/s
More repetitions show the distinction is repeatable