Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix technical UB + compiler warnings #549

Closed
wants to merge 1 commit into from

Conversation

pdillinger
Copy link
Contributor

@pdillinger pdillinger commented Aug 6, 2021

Summary:

  • XXH*_update functions technically have UB on pointer arithmetic
    beyond the bounds of the input object, which can trigger a compiler
    warning in some cases (e.g. statically sized input buffer). This can be
    suppressed with cast to uintptr_t where available. (In theory if your
    buffer is close to an extreme of your address space, it could still
    over/underflow, but eh.)

  • typedef XXH_endianess in implementation is unused, so removed.
    (When it was in xxhash.c it would cause compiler warning.)

Test Plan: existing tests

@Cyan4973
Copy link
Owner

Cyan4973 commented Aug 6, 2021

Unfortunately uintptr_t is not C90 compatible, which remains a goal of this code base.

Also FYI, note that, even in C99 environments, uintptr_t is an optional type, not guaranteed to be present, thus with portability limitations.

A potential workaround could be to work with bEnd-p instead, though it might come with (minor) performance impact, and ensuring p <= bEnd now becomes critical (I believe it's guaranteed by preceding code, in which case an assert() would be appropriate).

@pdillinger
Copy link
Contributor Author

I don't think anything becomes more "critical," because pointer difference is signed. bEnd - p is probably what you (we) need to be fully safe.

For speed, can't we whenever state->memsize + len < 32 fails,

  • Observe: one of the next to if's must be true, therefore
  • Go ahead and unconditionally copy v1->v4 to locals and unconditionally save before last if.
  • Jump straight into loop if state->memsize fails, skipping the bEnd - p < something in some cases

?

@Cyan4973
Copy link
Owner

Cyan4973 commented Aug 8, 2021

You are right, I forgot that ptr substraction is a signed operation.

Maybe the speed improvement could be the topic a separate PR ?

Summary:
* XXH*_update functions technically have UB on pointer arithmetic
beyond the bounds of the input object, which can trigger a compiler
warning in the case of statically sized input buffer. This can be
solved with cast to uintptr_t where available. (In theory if your
buffer is close to an extreme of your address space, it could still
over/underflow, but eh.)

* typedef XXH_endianess in implementation is unused, so removed.
(When it was in xxhash.c it would cause compiler warning.)

Test Plan: existing tests
@pdillinger
Copy link
Contributor Author

So my update seeks not to affect performance, but get us into defined behavior whenever uintptr_t is available. Well-defined over/underflow is still possible in exotic cases.

@Cyan4973
Copy link
Owner

Cyan4973 commented Aug 17, 2021

I'm a bit puzzled regarding this PR.
On the one hand, it works fine, that part is in no doubt. I'm also pretty convinced it won't negatively impact performance.

On the other hand, I can easily imagine any future maintainers coming to this part of the code and being puzzled by the complex casting operation.
It "feels" complex with regards to the original intention (avoid UB pointer comparison).
It also seems to "hide" the issue whenever possible, by moving the problem into unsigned arithmetic area when it's available (UINTPTR_MAX), while still relying on traditional ptr arithmetic and its potential UB when it's not. So it kind of half-fix the original problem.

@Cyan4973
Copy link
Owner

Cyan4973 commented Aug 17, 2021

I'm wondering if there is a way to reproduce this issue.
I tried using a statically defined buffer (on stack), but it failed to generate any warning.
@t-mat found one, using gcc-8 (specifically), but it doesn't trigger all the positions modified by this PR, just one of them.
Having a reproducible case would help, both to control proposed fixes and to improve the test suite.

Cyan4973 added a commit that referenced this pull request Aug 21, 2021
As suggested by @pdillinger in #549,
`XXH_endianess` is never used and should be removed.
@pdillinger
Copy link
Contributor Author

I tried reproducing the warning in various ways but was not successful in latest xxhash, so I'm dropping this PR. facebook/rocksdb#8691

@pdillinger pdillinger closed this Aug 26, 2021
facebook-github-bot pushed a commit to facebook/rocksdb that referenced this pull request Jan 19, 2023
Summary:
Upgrading xxhash.h to latest dev version as of 1/17/2023, which is d7197ddea81364a539051f116ca77926100fc77f This should improve performance on some ARM machines.

I allowed some of our RocksDB-specific changes to be made obsolete where it seemed appropriate, for example
* xxhash.h has its own fallthrough marker (which I hope works for us)
* As in Cyan4973/xxHash#549

Merging and resolving conflicts one way or the other was all that went into this diff. Except I had to mix the two sides around `defined(__loongarch64)`

How I did the upgrade (for future reference), so that I could use usual merge conflict resolution:
```
# New branch to help with merging
git checkout -b xxh_merge_base
# Check out RocksDB revision before last xxhash.h upgrade
git reset --hard 22161b7^
# Create a commit with the raw base version from xxHash repo (from xxHash repo)
git show 2c611a76f914828bed675f0f342d6c4199ffee1e:xxhash.h > ../rocksdb/util/xxhash.h
# In RocksDB repo
git commit -a
# Merge in the last xxhash.h upgrade
git merge 22161b7
# Resolve conflict using committed version
git show 22161b7:util/xxhash.h > util/xxhash.h
git commit -a
# Catch up to upstream
git merge upstream/main

# Create a different branch for applying raw upgrade
git checkout -b xxh_upgrade_2023
# Find the RocksDB commit we made for the raw base version from xxHash
git log main..HEAD
# Rewind to it
git reset --hard 2428b72
# Copy in latest raw version (from xxHash repo)
cat xxhash.h > ../rocksdb/util/xxhash.h
# Merge in RocksDB changes, use typical tools for conflict resolution
git merge xxh_merge_base
```

Branch https://github.com/facebook/rocksdb/tree/xxhash_merge_base can be used as a base for future xxhash merges.

Fixes #11073

Pull Request resolved: #11098

Test Plan:
existing tests (e.g. Bloom filter schema stability tests)

Also seems to include a small performance boost on my Intel dev machine, using `./db_bench --benchmarks=xxh3[-X50] 2>&1 | egrep -o 'operations;.*' | sort`

Fastest out of 50 runs, before: 15477.3 MB/s
Fastest out of 50 runs, after: 15850.7 MB/s, and 11 more runs faster than the "before" number

Slowest out of 50 runs, before: 12267.5 MB/s
Slowest out of 50 runs, after: 13897.1 MB/s

More repetitions show the distinction is repeatable

Reviewed By: hx235

Differential Revision: D42560010

Pulled By: pdillinger

fbshipit-source-id: c43ee52f1c5fe0ba3d6d6e4eebb22ded5f5492ea
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants