Skip to content

feat(tdigest): add the support of TDIGEST.REVRANK command#3130

Merged
LindaSummer merged 75 commits intoapache:unstablefrom
donghao526:feature/tdigest-revrank
Nov 3, 2025
Merged

feat(tdigest): add the support of TDIGEST.REVRANK command#3130
LindaSummer merged 75 commits intoapache:unstablefrom
donghao526:feature/tdigest-revrank

Conversation

@donghao526
Copy link
Contributor

@donghao526 donghao526 commented Aug 19, 2025

ISSUE

It closes #3063.

Proposed Changes

Add TDIGEST.REVRANK command implementation
Add cpp unit tests

@PragmaTwice PragmaTwice changed the title feat(tdigest): add TDIGEST.Revrank command implementation #3063 feat(tdigest): add the support of TDIGEST.REVRANK command Aug 19, 2025
@PragmaTwice
Copy link
Member

Thank you for your contribution. Could you add some golang test cases for it?

Refer to https://github.com/apache/kvrocks/blob/unstable/tests/gocase/unit/type/tdigest/tdigest_test.go.

@donghao526 donghao526 closed this Aug 19, 2025
@donghao526
Copy link
Contributor Author

@PragmaTwice ok,I will add some golang test

@donghao526 donghao526 reopened this Aug 19, 2025
@sonarqubecloud
Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
39.7% Coverage on New Code (required ≥ 50%)

See analysis details on SonarQube Cloud

Copy link
Member

@LindaSummer LindaSummer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @donghao526 ,

Thanks very much for your contribution! 😊

Left some comments.

Best Regards,
Edward

Comment on lines +240 to +246
for (auto value : inputs) {
auto status_or_rank = TDigestRevRank(dump_centroids, value);
if (!status_or_rank) {
return rocksdb::Status::InvalidArgument(status_or_rank.Msg());
}
result->push_back(*status_or_rank);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @donghao526 ,

We could sort the inputs and get the ranks with just one scan of the centroids since it's sorted.

Best Regards,
Edward

Copy link
Contributor Author

@donghao526 donghao526 Aug 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @LindaSummer
I encountered a problem when I was testing. After the nodes merged, are there two adjacent centroids can be with the same mean?

I Test with

TDIGEST.CREATE s COMPRESSION 1000

TDIGEST.ADD s 10 10 10 10 20 20

I found the centroids after merged are:
(1) mean: 10 weight: 1
(2) mean: 10 weight: 1
(3) mean: 10 weight: 1
(4) mean: 10 weight: 1
(5) mean: 20 weight: 1
(6) mean: 20 weight: 1

Is this as expected or a bug?

Copy link
Member

@LindaSummer LindaSummer Aug 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @donghao526 ,

It is expected, and you could refer to #2878 for more details.
So we need a stable way for both serialization and deserialization.

The trigger for the merge is the weight, not the mean. So we could treat the mean only as a label of one centroid. The whole logic is driven by weight.

Best Regards,
Edward

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it

}

template <typename TD>
inline StatusOr<int> TDigestRevRank(TD&& td, double value) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @donghao526 ,

We need to use a stable way to compare between doubles.

It will be tough to assume that the two double numbers are equal to or greater than.

After solving this, we should add some test cases for this corner case.

Best Regards,
Edward

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @donghao526 ,

Since the other code snippets use this way now. You could leave it with the current logic.

I will try to create a new PR to solve the unstable comparison problem in this file.

Best Regards,
Edward

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, after your new PR, I can help to fix here.

@donghao526 donghao526 marked this pull request as draft August 20, 2025 14:27
Copy link
Member

@LindaSummer LindaSummer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @donghao526 ,

Generally LGTM.
Left two comments.

Thanks for your effort!❤️

Best Regards,
Edward

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +175 to +178
std::map<double, size_t, DoubleComparator> value_to_indices;
for (size_t i = 0; i < inputs.size(); ++i) {
value_to_indices[inputs[i]] = i;
}
Copy link

Copilot AI Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When duplicate input values exist, this map will only store the index of the last occurrence, causing incorrect results to be returned. The value_to_indices map should use a multimap or map to vector to store all indices for duplicate values, not just the last one.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inputs have been deduplicated before calling this function, and there are no duplicate input values

Comment on lines +184 to +196
std::unordered_set<std::string> unique_inputs_set(args.begin() + 2, args.end());
origin_inputs_.assign(args.begin() + 2, args.end());

unique_inputs_.reserve(unique_inputs_set.size());
size_t i = 0;
for (const auto &input : unique_inputs_set) {
auto value = ParseFloat(input);
if (!value) {
return {Status::RedisParseErr, errValueIsNotFloat};
}
unique_inputs_.push_back(*value);
unique_inputs_order_[input] = i++;
}
Copy link

Copilot AI Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The iteration order of std::unordered_set is not deterministic, which means the mapping in unique_inputs_order_ may not correspond correctly to the indices in unique_inputs_. This can lead to incorrect results when looking up values. Consider using the string representation as the key and ensuring consistent ordering, or iterate over unique_inputs_ after population to build the mapping.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The origin_inputs_ vector stores the correct order of the origin inputs. So the order of the results is deterministic.

donghao526 and others added 5 commits November 3, 2025 14:54
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…eading

Co-authored-by: Edward Xu <xuxiangad@foxmail.com>
Co-authored-by: Edward Xu <xuxiangad@foxmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
LindaSummer
LindaSummer previously approved these changes Nov 3, 2025
Copy link
Member

@LindaSummer LindaSummer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, left one nitpit comment.

Co-authored-by: Edward Xu <xuxiangad@foxmail.com>
@LindaSummer LindaSummer enabled auto-merge (squash) November 3, 2025 08:56
@LindaSummer LindaSummer merged commit 35e72fc into apache:unstable Nov 3, 2025
67 of 69 checks passed
@donghao526 donghao526 deleted the feature/tdigest-revrank branch November 3, 2025 10:47
@sonarqubecloud
Copy link

sonarqubecloud bot commented Nov 3, 2025

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TDigest: Implement TDIGEST.REVRANK command

10 participants