feat(tdigest): add the support of TDIGEST.REVRANK command by donghao526 · Pull Request #3130 · apache/kvrocks

donghao526 · 2025-08-19T15:55:41Z

ISSUE

It closes #3063.

Proposed Changes

Add TDIGEST.REVRANK command implementation
Add cpp unit tests

PragmaTwice · 2025-08-19T16:43:13Z

Thank you for your contribution. Could you add some golang test cases for it?

Refer to https://github.com/apache/kvrocks/blob/unstable/tests/gocase/unit/type/tdigest/tdigest_test.go.

donghao526 · 2025-08-19T23:38:56Z

@PragmaTwice OK, I will add some Go tests.

sonarqubecloud · 2025-08-20T02:52:21Z

Quality Gate failed

Failed conditions
39.7% Coverage on New Code (required ≥ 50%)

See analysis details on SonarQube Cloud

LindaSummer

Hi @donghao526 ,

Thanks very much for your contribution! 😊

Left some comments.

Best Regards,
Edward

src/types/redis_tdigest.cc

LindaSummer · 2025-08-20T07:46:38Z

src/types/redis_tdigest.cc

+  for (auto value : inputs) {
+    auto status_or_rank = TDigestRevRank(dump_centroids, value);
+    if (!status_or_rank) {
+      return rocksdb::Status::InvalidArgument(status_or_rank.Msg());
+    }
+    result->push_back(*status_or_rank);
+  }


Hi @donghao526 ,

We could sort the inputs and get all the ranks with a single scan of the centroids, since the centroids are already sorted.

Best Regards,
Edward
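
For illustration, a minimal sketch of the single-scan idea, not the code in this PR. `Centroid` and `WeightAboveEach` are hypothetical names, and the sketch only accumulates the weight of centroids whose mean is strictly greater than each input; how the real TDIGEST.REVRANK treats ties and out-of-range values is left out on purpose.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a t-digest centroid.
struct Centroid {
  double mean;
  double weight;
};

// For each input, returns the total weight of the centroids whose mean is
// strictly greater than that input. `centroids` must be sorted by ascending mean.
std::vector<double> WeightAboveEach(const std::vector<Centroid>& centroids,
                                    const std::vector<double>& inputs) {
  // Visit the inputs from largest to smallest without reordering the results.
  std::vector<size_t> order(inputs.size());
  std::iota(order.begin(), order.end(), size_t{0});
  std::sort(order.begin(), order.end(),
            [&inputs](size_t a, size_t b) { return inputs[a] > inputs[b]; });

  std::vector<double> result(inputs.size(), 0.0);
  double cumulative = 0.0;
  auto it = centroids.rbegin();  // start from the largest mean
  for (size_t idx : order) {
    // The iterator never moves backwards, so all inputs share one scan.
    while (it != centroids.rend() && it->mean > inputs[idx]) {
      cumulative += it->weight;
      ++it;
    }
    result[idx] = cumulative;
  }
  return result;
}
```

Sorting costs O(m log m) for m inputs and the scan costs O(n + m) for n centroids, instead of O(n * m) when every input rescans the centroids.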

Hi @LindaSummer
I ran into a problem while testing. After the nodes are merged, can two adjacent centroids have the same mean?

I tested with:

TDIGEST.CREATE s COMPRESSION 1000
TDIGEST.ADD s 10 10 10 10 20 20

After the merge, the centroids are:
(1) mean: 10 weight: 1
(2) mean: 10 weight: 1
(3) mean: 10 weight: 1
(4) mean: 10 weight: 1
(5) mean: 20 weight: 1
(6) mean: 20 weight: 1

Is this as expected or a bug?

Hi @donghao526 ,

It is expected; see #2878 for more details.
So we need a stable way to handle both serialization and deserialization.

The merge is triggered by weight, not by mean, so the mean can be treated as just a label on a centroid. The whole logic is driven by weight.

Best Regards,
Edward
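
Because adjacent centroids may share a mean, code that reads the centroid list may want to sum weights over each run of equal means. A small sketch of that, with hypothetical names (`Centroid`, `CollapseEqualMeans`) that are not part of kvrocks; it uses exact `==` on doubles purely to keep the example short.

```cpp
#include <vector>

// Hypothetical stand-in for a t-digest centroid.
struct Centroid {
  double mean;
  double weight;
};

// Collapses each run of adjacent centroids with the same mean into a single
// entry carrying the summed weight. The input must be sorted by mean.
std::vector<Centroid> CollapseEqualMeans(const std::vector<Centroid>& centroids) {
  std::vector<Centroid> runs;
  for (const auto& c : centroids) {
    if (!runs.empty() && runs.back().mean == c.mean) {
      runs.back().weight += c.weight;  // same label: accumulate the weight
    } else {
      runs.push_back(c);  // new label: start a new run
    }
  }
  return runs;
}
```

With the six centroids listed above, this yields two entries: mean 10 with weight 4 and mean 20 with weight 2.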

LindaSummer · 2025-08-20T07:52:08Z

src/types/tdigest.h

 }
+
+template <typename TD>
+inline StatusOr<int> TDigestRevRank(TD&& td, double value) {


Hi @donghao526 ,

We need a stable way to compare doubles.

It is risky to assume that an exact equal-to or greater-than comparison between two doubles gives the intended result.

Once this is solved, we should add some test cases for this corner case.

Best Regards,
Edward
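
One common way to make such comparisons less fragile is a relative tolerance. The sketch below is only an illustration under that assumption: `CompareDoubles` and its default tolerance are hypothetical and are not the comparison kvrocks uses.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Three-way comparison with a relative tolerance (hypothetical helper).
// Returns -1 if a < b, 0 if they are considered equal, 1 if a > b.
// NaN inputs are not handled.
inline int CompareDoubles(double a, double b,
                          double rel_tol = 4 * std::numeric_limits<double>::epsilon()) {
  if (a == b) return 0;  // exact match, including equal infinities
  if (std::isinf(a) || std::isinf(b)) return a < b ? -1 : 1;
  const double scale = std::max(std::fabs(a), std::fabs(b));
  if (std::fabs(a - b) <= rel_tol * scale) return 0;  // within tolerance
  return a < b ? -1 : 1;
}
```

For example, `CompareDoubles(0.1 + 0.2, 0.3)` returns 0 even though the two values differ in their last bit. Note that "equal within a tolerance" is not transitive, so a helper like this is not a valid strict weak ordering for `std::map` or `std::sort` on its own.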

Hi @donghao526 ,

Since the other code in this file currently compares doubles the same way, you can keep the current logic.

I will try to open a new PR to fix the unstable comparison problem in this file.

Best Regards,
Edward

OK, once your new PR lands, I can help fix it here.

LindaSummer

Hi @donghao526 ,

Generally LGTM.
Left two comments.

Thanks for your effort!❤️

Best Regards,
Edward

src/commands/cmd_tdigest.cc

src/types/tdigest.h

Copilot

Pull Request Overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Copilot · 2025-11-03T06:35:54Z

src/types/tdigest.h

+  std::map<double, size_t, DoubleComparator> value_to_indices;
+  for (size_t i = 0; i < inputs.size(); ++i) {
+    value_to_indices[inputs[i]] = i;
+  }


When duplicate input values exist, this map will only store the index of the last occurrence, causing incorrect results to be returned. The value_to_indices map should use a multimap or map to vector to store all indices for duplicate values, not just the last one.

The inputs are deduplicated before this function is called, so there are no duplicate input values.
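
For comparison, the map-to-vector variant suggested in the review could look roughly like this if duplicates did reach the function. `IndexByValue` is a hypothetical name, and the sketch uses the default `std::less<double>` ordering rather than the `DoubleComparator` from the diff, whose definition is not shown here.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Maps each distinct value to every position at which it occurs in `inputs`,
// so duplicate values keep all of their indices.
std::map<double, std::vector<size_t>> IndexByValue(const std::vector<double>& inputs) {
  std::map<double, std::vector<size_t>> value_to_indices;
  for (size_t i = 0; i < inputs.size(); ++i) {
    value_to_indices[inputs[i]].push_back(i);
  }
  return value_to_indices;
}
```

With unique inputs every vector has exactly one element, which is why the single-index map in the diff is sufficient once deduplication happens upstream.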

src/types/tdigest.h

Copilot · 2025-11-03T06:35:55Z

src/commands/cmd_tdigest.cc

+    std::unordered_set<std::string> unique_inputs_set(args.begin() + 2, args.end());
+    origin_inputs_.assign(args.begin() + 2, args.end());
+
+    unique_inputs_.reserve(unique_inputs_set.size());
+    size_t i = 0;
+    for (const auto &input : unique_inputs_set) {
+      auto value = ParseFloat(input);
+      if (!value) {
+        return {Status::RedisParseErr, errValueIsNotFloat};
+      }
+      unique_inputs_.push_back(*value);
+      unique_inputs_order_[input] = i++;
+    }


The iteration order of std::unordered_set is not deterministic, which means the mapping in unique_inputs_order_ may not correspond correctly to the indices in unique_inputs_. This can lead to incorrect results when looking up values. Consider using the string representation as the key and ensuring consistent ordering, or iterate over unique_inputs_ after population to build the mapping.

The origin_inputs_ vector stores the original inputs in their original order, so the order of the results is deterministic.
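
A minimal sketch of that pattern, assuming a simplified setup: compute each distinct input once, then expand the results back to the caller's order. `DedupedInputs`, `Deduplicate`, and `FanOut` are hypothetical names and do not mirror the members in cmd_tdigest.cc.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Distinct inputs plus, for every original position, the slot of its value.
struct DedupedInputs {
  std::vector<std::string> unique;       // distinct inputs, first-seen order
  std::vector<size_t> slot_of_original;  // original position -> index into `unique`
};

DedupedInputs Deduplicate(const std::vector<std::string>& originals) {
  DedupedInputs out;
  std::unordered_map<std::string, size_t> slot;
  for (const auto& s : originals) {
    // emplace leaves the map unchanged if the key already exists.
    auto res = slot.emplace(s, out.unique.size());
    if (res.second) out.unique.push_back(s);
    out.slot_of_original.push_back(res.first->second);
  }
  return out;
}

// Expands per-unique-input results back to one result per original input.
template <typename T>
std::vector<T> FanOut(const DedupedInputs& d, const std::vector<T>& unique_results) {
  std::vector<T> results;
  results.reserve(d.slot_of_original.size());
  for (size_t s : d.slot_of_original) results.push_back(unique_results[s]);
  return results;
}
```

The hash map is only used for lookups and is never iterated, so its unspecified iteration order cannot affect the output.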

LindaSummer

LGTM, left one nitpick comment.

src/commands/cmd_tdigest.cc

sonarqubecloud · 2025-11-03T10:48:16Z

Quality Gate passed

Issues
12 New issues
0 Accepted issues

Measures
0 Security Hotspots
68.8% Coverage on New Code
1.4% Duplication on New Code

See analysis details on SonarQube Cloud

donghao526 added 7 commits August 17, 2025 22:50

feat: impl tdigest.revrank

03b69e1

feat: impl tdigest.revrank

97df4a1

feat: impl tdigest.revrank

70a39d3

feat: impl tdigest.revrank

dde8410

feat: impl tdigest.revrank

0d3e9cc

test: add unit test for tdigest.revrank

bb172a8

test: add unit test for tdigest.revrank

a64add4

PragmaTwice changed the title ~~feat(tdigest): add TDIGEST.Revrank command implementation #3063~~ feat(tdigest): add the support of TDIGEST.REVRANK command Aug 19, 2025

Merge branch 'unstable' into feature/tdigest-revrank

3954b1f

donghao526 closed this Aug 19, 2025

donghao526 reopened this Aug 19, 2025

donghao526 added 2 commits August 20, 2025 08:39

add golang test cases for tdigest.revrank

05d1202

add golang test cases for tdigest.revrank

f688e14

donghao526 and others added 4 commits August 20, 2025 10:56

add golang test cases for tdigest.revrank

8bcad0f

add golang test cases for tdigest.revrank

46ac984

add golang test cases for tdigest.revrank

495e072

add golang test cases for tdigest.revrank

2b6785d

PragmaTwice requested a review from LindaSummer August 20, 2025 06:52

LindaSummer reviewed Aug 20, 2025

donghao526 added 2 commits August 20, 2025 20:58

feat: impl tdigest.revrank

3af3b54

Merge branch 'feature/tmp' into feature/tdigest-revrank

f3d85d3

donghao526 marked this pull request as draft August 20, 2025 14:27

donghao526 and others added 5 commits August 21, 2025 15:11

Merge branch 'unstable' into feature/tdigest-revrank

e68689d

feat: impl tdigest.revrank

eb8674f

Merge branch 'feature/tmp' into feature/tdigest-revrank

c70f410

Merge branch 'unstable' into feature/tdigest-revrank

b991d0d

Merge branch 'feature/tdigest-revrank' of github.com:donghao526/kvrocks into feature/tdigest-revrank

4c9a41d

donghao526 and others added 9 commits October 30, 2025 09:31

fix: Corrected phrasing for proper naming convention

625510d

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

fix: Corrected phrasing for proper naming convention

cc01373

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

feat: add a guard to validate the revrank result

07b1a09

Merge branch 'feature/tdigest-revrank' of github.com:donghao526/kvrocks into feature/tdigest-revrank

b14dee6

Merge branch 'unstable' into feature/tdigest-revrank

d385597

feat: unique the inputs before call tdigest.RevRank

222dec1

Merge branch 'feature/tdigest-revrank' of github.com:donghao526/kvrocks into feature/tdigest-revrank

97426ac

fix: fix typo

baeafed

Merge branch 'unstable' into feature/tdigest-revrank

9e97b6e

donghao526 requested a review from LindaSummer November 2, 2025 04:35

LindaSummer reviewed Nov 3, 2025

src/commands/cmd_tdigest.cc (outdated, resolved)

src/types/tdigest.h (outdated, resolved)

LindaSummer requested review from PragmaTwice, Copilot and git-hulk November 3, 2025 06:33

Copilot AI reviewed Nov 3, 2025

donghao526 and others added 5 commits November 3, 2025 14:54

chore: update comments for clarity

8a8592a

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

style: use a new line for the increment to make the code better for reading

cc4f736

Co-authored-by: Edward Xu <xuxiangad@foxmail.com>

fix: fix the range of the check for ranks

17d9d48

Co-authored-by: Edward Xu <xuxiangad@foxmail.com>

chore: resolve the grammatical error in the comments

83fae63

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Merge branch 'unstable' into feature/tdigest-revrank

54ad714

PragmaTwice requested a review from LindaSummer November 3, 2025 07:21

LindaSummer previously approved these changes Nov 3, 2025

src/commands/cmd_tdigest.cc (outdated, resolved)

refactor: replace the unnecessary std::unordered_set with a std::set

e3d85b4

Co-authored-by: Edward Xu <xuxiangad@foxmail.com>

donghao526 dismissed LindaSummer’s stale review via e3d85b4 November 3, 2025 08:29

PragmaTwice approved these changes Nov 3, 2025

LindaSummer enabled auto-merge (squash) November 3, 2025 08:56

LindaSummer approved these changes Nov 3, 2025

LindaSummer merged commit 35e72fc into apache:unstable Nov 3, 2025
67 of 69 checks passed

donghao526 deleted the feature/tdigest-revrank branch November 3, 2025 10:47


10 participants
