feat(tdigest): implement TDIGEST.TRIMMED_MEAN command#3312
feat(tdigest): implement TDIGEST.TRIMMED_MEAN command#3312chakkk309 wants to merge 6 commits intoapache:unstablefrom
Conversation
# Conflicts: # src/types/redis_tdigest.h # src/types/tdigest.h
LindaSummer
left a comment
There was a problem hiding this comment.
Hi @chakkk309 ,
😊 It seems this commit couldn't pass the ci. Please help check the error message in github actions.
src/types/redis_tdigest.cc
Outdated
| if (auto status = dumpCentroids(ctx, ns_key, metadata, ¢roids); !status.ok()) { | ||
| return status; | ||
| } | ||
| auto dump_centroids = DummyCentroids(metadata, centroids); |
There was a problem hiding this comment.
Hi @chakkk309 ,
It seems that this line has compile error in CI. Please make a check.
|
Hi, thank you for your contribution! Before you start coding, could you please read our contribution guide (https://kvrocks.apache.org/community/contributing/)? It can be better if you build and test Kvrocks against your changes successfully in your local before pushing them. Also note that we have guidelines for AI-assisted contributions: https://kvrocks.apache.org/community/contributing/#guidelines-for-ai-assisted-contributions |
|
Hi @chakkk309 , Thanks very much for your effort. I will review later today.😊 |
There was a problem hiding this comment.
Pull request overview
Implements the Redis-compatible TDIGEST.TRIMMED_MEAN command in Kvrocks’ TDigest module (Fixes #3066), exposing the functionality through the command layer and adding unit test coverage.
Changes:
- Add core trimmed-mean computation helper (
TDigestTrimmedMean) to the TDigest algorithm utilities. - Wire trimmed-mean into the Redis TDigest type (
redis::TDigest::TrimmedMean) and register thetdigest.trimmed_meancommand. - Add Go and C++ unit tests for
TDIGEST.TRIMMED_MEAN.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/gocase/unit/type/tdigest/tdigest_test.go | Adds Go integration tests for TDIGEST.TRIMMED_MEAN including argument/quantile validation cases. |
| tests/cppunit/types/tdigest_test.cc | Adds a C++ unit test for trimmed mean behavior on a basic dataset. |
| src/types/tdigest.h | Introduces TDigestTrimmedMean helper to compute trimmed mean from centroids. |
| src/types/redis_tdigest.h | Adds TDigestTrimmedMeanResult and the TDigest::TrimmedMean API. |
| src/types/redis_tdigest.cc | Implements TDigest::TrimmedMean by dumping centroids and calling the helper. |
| src/commands/cmd_tdigest.cc | Adds CommandTDigestTrimmedMean and registers tdigest.trimmed_mean. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| double low_boundary = std::numeric_limits<double>::quiet_NaN(); | ||
| double high_boundary = std::numeric_limits<double>::quiet_NaN(); | ||
|
|
||
| if (low_cut_quantile == 0.0) { | ||
| low_boundary = td.Min(); | ||
| } else { | ||
| auto low_result = TDigestQuantile(td, low_cut_quantile); | ||
| if (!low_result) { | ||
| return low_result; | ||
| } | ||
| low_boundary = *low_result; | ||
| } | ||
|
|
||
| if (high_cut_quantile == 1.0) { | ||
| high_boundary = td.Max(); | ||
| } else { | ||
| auto high_result = TDigestQuantile(td, high_cut_quantile); | ||
| if (!high_result) { | ||
| return high_result; | ||
| } | ||
| high_boundary = *high_result; | ||
| } | ||
|
|
||
| auto iter = td.Begin(); | ||
| double total_weight_in_range = 0; | ||
| double weighted_sum = 0; | ||
|
|
||
| while (iter->Valid()) { | ||
| auto centroid = GET_OR_RET(iter->GetCentroid()); | ||
|
|
||
| if ((low_cut_quantile == 0.0 && high_cut_quantile == 1.0) || | ||
| (centroid.mean >= low_boundary && centroid.mean <= high_boundary)) { | ||
| total_weight_in_range += centroid.weight; | ||
| weighted_sum += centroid.mean * centroid.weight; | ||
| } | ||
|
|
||
| iter->Next(); | ||
| } | ||
|
|
||
| if (total_weight_in_range == 0) { |
There was a problem hiding this comment.
TDigestTrimmedMean can incorrectly return NaN when the low/high cut boundaries fall between centroid means (e.g., after quantile interpolation). The current logic only includes whole centroids whose mean is within [low_boundary, high_boundary], so it may exclude all centroids even though the quantile range contains weight. Consider computing overlap by weight position (e.g., iterate centroids with cumulative weight and add partial centroid weight for the boundary centroids) rather than filtering by centroid.mean.
| double low_boundary = std::numeric_limits<double>::quiet_NaN(); | |
| double high_boundary = std::numeric_limits<double>::quiet_NaN(); | |
| if (low_cut_quantile == 0.0) { | |
| low_boundary = td.Min(); | |
| } else { | |
| auto low_result = TDigestQuantile(td, low_cut_quantile); | |
| if (!low_result) { | |
| return low_result; | |
| } | |
| low_boundary = *low_result; | |
| } | |
| if (high_cut_quantile == 1.0) { | |
| high_boundary = td.Max(); | |
| } else { | |
| auto high_result = TDigestQuantile(td, high_cut_quantile); | |
| if (!high_result) { | |
| return high_result; | |
| } | |
| high_boundary = *high_result; | |
| } | |
| auto iter = td.Begin(); | |
| double total_weight_in_range = 0; | |
| double weighted_sum = 0; | |
| while (iter->Valid()) { | |
| auto centroid = GET_OR_RET(iter->GetCentroid()); | |
| if ((low_cut_quantile == 0.0 && high_cut_quantile == 1.0) || | |
| (centroid.mean >= low_boundary && centroid.mean <= high_boundary)) { | |
| total_weight_in_range += centroid.weight; | |
| weighted_sum += centroid.mean * centroid.weight; | |
| } | |
| iter->Next(); | |
| } | |
| if (total_weight_in_range == 0) { | |
| // First, compute the total weight of the t-digest. | |
| double total_weight = 0.0; | |
| { | |
| auto iter = td.Begin(); | |
| while (iter->Valid()) { | |
| auto centroid = GET_OR_RET(iter->GetCentroid()); | |
| total_weight += centroid.weight; | |
| iter->Next(); | |
| } | |
| } | |
| if (total_weight == 0.0) { | |
| return std::numeric_limits<double>::quiet_NaN(); | |
| } | |
| // If no trimming is requested, just return the global weighted mean. | |
| if (low_cut_quantile == 0.0 && high_cut_quantile == 1.0) { | |
| double weighted_sum = 0.0; | |
| auto iter = td.Begin(); | |
| while (iter->Valid()) { | |
| auto centroid = GET_OR_RET(iter->GetCentroid()); | |
| weighted_sum += centroid.mean * centroid.weight; | |
| iter->Next(); | |
| } | |
| return weighted_sum / total_weight; | |
| } | |
| // Compute rank boundaries in weight space. | |
| const double low_rank = low_cut_quantile * total_weight; | |
| const double high_rank = high_cut_quantile * total_weight; | |
| double cumulative_weight = 0.0; | |
| double total_weight_in_range = 0.0; | |
| double weighted_sum = 0.0; | |
| auto iter = td.Begin(); | |
| while (iter->Valid()) { | |
| auto centroid = GET_OR_RET(iter->GetCentroid()); | |
| const double start_rank = cumulative_weight; | |
| const double end_rank = cumulative_weight + centroid.weight; | |
| // If this centroid is entirely before the trimmed region, skip it. | |
| if (end_rank <= low_rank) { | |
| cumulative_weight = end_rank; | |
| iter->Next(); | |
| continue; | |
| } | |
| // If we've passed the trimmed region, we can stop. | |
| if (start_rank >= high_rank) { | |
| break; | |
| } | |
| // Compute overlap of this centroid's weight with [low_rank, high_rank). | |
| double overlap_start = start_rank; | |
| if (overlap_start < low_rank) { | |
| overlap_start = low_rank; | |
| } | |
| double overlap_end = end_rank; | |
| if (overlap_end > high_rank) { | |
| overlap_end = high_rank; | |
| } | |
| const double overlap = overlap_end - overlap_start; | |
| if (overlap > 0.0) { | |
| total_weight_in_range += overlap; | |
| weighted_sum += centroid.mean * overlap; | |
| } | |
| cumulative_weight = end_rank; | |
| iter->Next(); | |
| } | |
| if (total_weight_in_range == 0.0) { |
| } | ||
|
|
||
| rocksdb::Status TDigest::TrimmedMean(engine::Context& ctx, const Slice& digest_name, double low_cut_quantile, | ||
| double high_cut_quantile, TDigestTrimmedMeanResult* result) { |
There was a problem hiding this comment.
TDigest::TrimmedMean can leave TDigestTrimmedMeanResult populated with a stale value if the caller reuses the result object: on success you only assign result->mean when there are observations, and the empty-digest early return doesn’t reset it. Reset/clear result->mean at function entry (and before the early return) so the output is well-defined.
| double high_cut_quantile, TDigestTrimmedMeanResult* result) { | |
| double high_cut_quantile, TDigestTrimmedMeanResult* result) { | |
| result->mean = 0; |
| if meanStr == "nan" { | ||
| return | ||
| } | ||
| mean, err := strconv.ParseFloat(meanStr, 64) | ||
| require.NoError(t, err) | ||
| require.Greater(t, mean, 0.0) |
There was a problem hiding this comment.
This test allows "nan" and returns early, which can mask real correctness issues (a non-empty digest with low_cut < high_cut should always have some weight in the trimmed range). It would be better to assert the result is not NaN for this dataset and verify it’s within an expected numeric range/value.
| if meanStr == "nan" { | |
| return | |
| } | |
| mean, err := strconv.ParseFloat(meanStr, 64) | |
| require.NoError(t, err) | |
| require.Greater(t, mean, 0.0) | |
| mean, err := strconv.ParseFloat(meanStr, 64) | |
| require.NoError(t, err) | |
| require.False(t, math.IsNaN(mean)) | |
| require.Greater(t, mean, 4.0) | |
| require.Less(t, mean, 7.0) |
|



Fixes #3066