-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add roaring bitmaps to TSI index files. #10122
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some general nits/suggestions at the moment.
tsdb/index/tsi1/index_files.go
Outdated
@@ -185,6 +187,13 @@ func (p IndexFiles) CompactTo(w io.Writer, sfile *tsdb.SeriesFile, m, k uint64, | |||
return n, err | |||
} | |||
|
|||
// Ensure block is word aligned. | |||
if offset := (n) % 8; offset != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are those parentheses necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The n
needed a hug. Removed in 8d66027.
tsdb/index/tsi1/tsi1.go
Outdated
@@ -545,8 +548,17 @@ func uvarint(data []byte) (value uint64, n int, err error) { | |||
return | |||
} | |||
|
|||
// memalign returns data if its memory address is word align. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this be 4 bytes
for 32-bit architectures? We could put the alignment size behind a build flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think roaring
only uses popcnt
for amd64
so the alignment only needs to be for 64-bit. Switching for architectures would only save us a few bytes so I don't think it's worth making it GOARCH
dependent.
mm.seriesIDSet = tsdb.NewSeriesIDSet() | ||
} | ||
for _, seriesID := range seriesIDs { | ||
mm.seriesIDSet.AddNoLock(seriesID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be faster to add these in bulk. I will write a benchmark and see.
tsdb/index/tsi1/partition.go
Outdated
fs, err := p.RetainFileSet() | ||
if err != nil { | ||
return nil // TODO(edd): this should probably return an error. | ||
return nil, err // TODO(edd): this should probably return an error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove this comment 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed in 8d66027.
tsdb/index/tsi1/tag_block.go
Outdated
series struct { | ||
n uint64 // Series count | ||
data []byte // Raw series data | ||
} | ||
|
||
seriesIDSetData []byte |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a little comment about this being used instead of series
above when bitmaps are in use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment in 8d66027.
return ss, nil | ||
} | ||
|
||
// Otherwise decode series ids from uvarint encoding. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we memoize this? Otherwise we're going to be building a series id set each time the block is read until the next TSI compaction kicks in and converts to the bitmap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem like it's worth trying to optimize index files that are just going to get recompacted eventually. It also could add a lot to the heap if there are a lot of tag key values.
tsdb/series_set.go
Outdated
s.RLock() | ||
defer s.RUnlock() | ||
|
||
var a []uint64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be sized to s.bitmap.GetCardinality()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having said that, I think this is only called from SeriesIDs()
on the tag value block, which is only called from within tests.
Given that, I think it would be better to return a []uint32
here and then have the caller to the slice allocation and conversion. The reason being that if we expose this API here it may end up getting consumed and it allocates a []uint64
each time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rest of the SeriesIDSet
API uses []uint64
so it seems strange to me to return []uint32
.
tsdb/series_set.go
Outdated
// References to the underlying data are used so data should not be reused by caller. | ||
func (s *SeriesIDSet) UnmarshalBinaryUnsafe(data []byte) error { | ||
s.RLock() | ||
defer s.RUnlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a write lock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zeebo I think we only need to protect access to bitmap, and not worry about mutations happening inside the roaring library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I asked because UnmarshalBinary above grabbed the write lock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see what you mean now. I guess this should be a lock, though it shouldn’t ever be called concurrently with another method on the type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in cb828f0. 👍
8d66027
to
cb828f0
Compare
@benbjohnson can we see if we can have a more graceful/helpful exit if the user downgrades? I started up a previous version after using this branch and hit:
Maybe something explaining they need to delete their indexes and rebuild them? |
I think we need to hold off merging this until we figure a couple of things out. TSI CompactionsI'm seeing around a 10% reduction in series insertion performance, which I think is down to the TSI compactions. Here are a couple of flames during a run where I'm inserting 30M series. Both profiles were taken when about 19M series had been inserted. Disk UsageI need to dig into this more but it looks like this PR is taking up more space on disk than 1.6.1. I would expect a compressed bitmap with 30M/8 series to take up less than writing out the binary encoding of the series ids. 1.6.1
PR
|
Actually, on the disk thing I was thinking about this the wrong way around.... So with 30M series in one tag (which is what I did in this case), means 30M separate bitmaps. I expect that that's going to take up more size than 8 bytes for a single series ID. |
tsdb/index/tsi1/index_files.go
Outdated
return n, err | ||
} | ||
} | ||
// if offset := n % 8; offset != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove the commented code if we don't need it?
Problem
Planning time for high cardinality data sets can consume a majority of the total query time. We previously added support for building series id sets (which wrap roaring bitmaps) which improved set operations between indices, however, the planning time is now consumed by the time to build the sets at query time.
Solution
This pull request uses the mmap support added in influxdata/roaring#1 and embeds series id sets directly into the TSI index for measurements and tag values. This allows us to reference the data in the mmap file directly so that sets do not need to be built at query time.
This uses the
flag
field for measurement and tag value entries to support backwards compatibility with the older file format.TODO
Required for all non-trivial PRs