-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(influxql): Add hyper log log operators #22322
Conversation
In addition to helping with normal queries, this can improve the 'SHOW CARDINALITY' meta-queries.
@@ -75,7 +77,6 @@ Because of the version bump to `go`, the macOS build for this release requires a | |||
1. [21910](https://github.com/influxdata/influxdb/pull/21910): Added `--ui-disabled` option to `influxd` to allow for running with the UI disabled. | |||
1. [21958](https://github.com/influxdata/influxdb/pull/21958): Telemetry improvements: Do not record telemetry data for non-existant paths; replace invalid static asset paths with a slug. | |||
1. [22023](https://github.com/influxdata/influxdb/pull/22023): Upgrade Flux to v0.124.0. | |||
1. [22316](https://github.com/influxdata/influxdb/pull/22316): Optimize series iteration for queries that can be answered without inspecting TSM data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accidentally put this under 2.0.8
@@ -241,6 +242,10 @@ func (h *Plus) MarshalBinary() (data []byte, err error) { | |||
return nil, nil | |||
} | |||
|
|||
if h.sparse { | |||
h.mergeSparse() | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOTE: This extra mergeSparse
changes the output of TSL->TSI compaction in some cases, because the measurement block in a TSI file contains HLL sketches for new & deleted measurements.
Before this change, marshalled TSI data could contain entries in both the HLL's tmp_set
and in its sparse-list. After this change, the tmp_set
is always merged into the sparse-list before data is marshalled.
I believe this is a backwards-compatible change because the unmarshalling logic for HLLs still reads both the tmp_set
and sparse-list, so the new code should still be able to open old TSI files without any data loss. As a quick sanity-check I:
- Used
TestIndex_DiskSizeBytes
fromtsi1/index_test.go
to generate TSI files before & after this change - Identified a before/after pair of TSI files that changed size as a result of the new HLL marshalling logic
- Ran both
report-tsi
anddump-tsi
on the pair of files, and checked that the results were identical
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 thanks! This makes sense now. this was added because the sparseList
has repeatable marshalling, but the tmp_set
marshals in randomized order (it iterates a golang map). We maintain the sparseList
as a compressed list of keys in sorted order. So a given HLL set used to have many possible binary representations but now it always has exactly one representation.
The tmp_set
is really just a runtime optimization to batch updates into the sparseList
.
Closes #22317
Forward-ports #20603 and #21818