Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(influxql): Add hyper log log operators #22322

Merged
merged 4 commits into from
Aug 30, 2021

Conversation

danxmoran
Copy link
Contributor

Closes #22317

Forward-ports #20603 and #21818

lesam and others added 4 commits August 27, 2021 10:13
@@ -75,7 +77,6 @@ Because of the version bump to `go`, the macOS build for this release requires a
1. [21910](https://github.com/influxdata/influxdb/pull/21910): Added `--ui-disabled` option to `influxd` to allow for running with the UI disabled.
1. [21958](https://github.com/influxdata/influxdb/pull/21958): Telemetry improvements: Do not record telemetry data for non-existant paths; replace invalid static asset paths with a slug.
1. [22023](https://github.com/influxdata/influxdb/pull/22023): Upgrade Flux to v0.124.0.
1. [22316](https://github.com/influxdata/influxdb/pull/22316): Optimize series iteration for queries that can be answered without inspecting TSM data.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accidentally put this under 2.0.8

@@ -241,6 +242,10 @@ func (h *Plus) MarshalBinary() (data []byte, err error) {
return nil, nil
}

if h.sparse {
h.mergeSparse()
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: This extra mergeSparse changes the output of TSL->TSI compaction in some cases, because the measurement block in a TSI file contains HLL sketches for new & deleted measurements.

Before this change, marshalled TSI data could contain entries in both the HLL's tmp_set and in its sparse-list. After this change, the tmp_set is always merged into the sparse-list before data is marshalled.

I believe this is a backwards-compatible change because the unmarshalling logic for HLLs still reads both the tmp_set and sparse-list, so the new code should still be able to open old TSI files without any data loss. As a quick sanity-check I:

  1. Used TestIndex_DiskSizeBytes from tsi1/index_test.go to generate TSI files before & after this change
  2. Identified a before/after pair of TSI files that changed size as a result of the new HLL marshalling logic
  3. Ran both report-tsi and dump-tsi on the pair of files, and checked that the results were identical

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 thanks! This makes sense now. this was added because the sparseList has repeatable marshalling, but the tmp_set marshals in randomized order (it iterates a golang map). We maintain the sparseList as a compressed list of keys in sorted order. So a given HLL set used to have many possible binary representations but now it always has exactly one representation.

The tmp_set is really just a runtime optimization to batch updates into the sparseList.

@danxmoran danxmoran requested a review from lesam August 30, 2021 13:02
@danxmoran danxmoran merged commit 37088e8 into master Aug 30, 2021
@danxmoran danxmoran deleted the dm-forward-port-influxql-hll-22317 branch August 30, 2021 19:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[forward-port 2.x] Add HLL InfluxQL functions
2 participants