Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use hash index instead of bloom filter #87

Closed
wants to merge 1 commit into from

Conversation

ngaut
Copy link
Member

@ngaut ngaut commented Apr 12, 2019

Hash index is much better than bloom filter, much more efficiency.
I also add some comments to describe the SST format.

// Add key to bloom filter.
if len(key) > 0 {
y.Assert(len(key) > 0)
if b.enableHashIndex {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need enable by default and remove the option?

@coocood coocood closed this Apr 13, 2019
@bobotu
Copy link

bobotu commented Apr 13, 2019

I also tried this improvement a few months ago, but at the time I found that performance was degraded during the benchmark. Maybe due to hash index has a much higher false positive rate (hash collision) which slows down the lookup in the higher levels?

I'm considering about disabling bloom filter only for the last level as Monkey proposed?

@coocood
Copy link
Member

coocood commented Apr 13, 2019

@bobotu if some of the Get requests tries to find a non-exist key, enable bloom filter still have benefits.
The last level table block may not be cached in memory, use hash index may trigger an I/O operation.

@ngaut ngaut deleted the ngaut/use-hash-replace-bloom branch April 24, 2019 05:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants