
feat(tdigest): implement TDIGEST.TRIMMED_MEAN command #3312

Open

chakkk309 wants to merge 6 commits into apache:unstable from chakkk309:feat-implement-TDIGEST.TRIMMED_MEAN-command

Conversation

@chakkk309 (Contributor)

Fixes #3066

@git-hulk git-hulk requested a review from LindaSummer December 27, 2025 14:47
@LindaSummer (Member) left a comment


Hi @chakkk309 ,

😊 It seems this commit couldn't pass CI. Please check the error message in GitHub Actions.

if (auto status = dumpCentroids(ctx, ns_key, metadata, &centroids); !status.ok()) {
  return status;
}
auto dump_centroids = DummyCentroids(metadata, centroids);


Hi @chakkk309 ,

It seems this line has a compile error in CI. Please take a look.

@PragmaTwice (Member)

PragmaTwice commented Dec 27, 2025

Hi, thank you for your contribution!

Before you start coding, could you please read our contribution guide (https://kvrocks.apache.org/community/contributing/)? It would be better if you build and test Kvrocks with your changes locally before pushing them.

Also note that we have guidelines for AI-assisted contributions: https://kvrocks.apache.org/community/contributing/#guidelines-for-ai-assisted-contributions

@chakkk309 chakkk309 requested a review from LindaSummer March 4, 2026 06:34
@LindaSummer LindaSummer requested review from Copilot and removed request for LindaSummer March 4, 2026 06:36
@LindaSummer (Member)

Hi @chakkk309 ,

Thanks very much for your effort. I will review later today.😊

Copilot AI left a comment


Pull request overview

Implements the Redis-compatible TDIGEST.TRIMMED_MEAN command in Kvrocks’ TDigest module (Fixes #3066), exposing the functionality through the command layer and adding unit test coverage.

Changes:

  • Add core trimmed-mean computation helper (TDigestTrimmedMean) to the TDigest algorithm utilities.
  • Wire trimmed-mean into the Redis TDigest type (redis::TDigest::TrimmedMean) and register the tdigest.trimmed_mean command.
  • Add Go and C++ unit tests for TDIGEST.TRIMMED_MEAN.
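For reference, the quantity being approximated: the trimmed mean of raw observations, computed here exactly on a small sorted vector. This is an illustrative sketch, not part of the PR; the name `ExactTrimmedMean` and the choice to drop whole observations at each cut are simplifying assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative only: the exact trimmed mean on raw values, which
// TDIGEST.TRIMMED_MEAN approximates from centroids. `low` and `high`
// are cut quantiles in [0, 1] with low <= high.
double ExactTrimmedMean(std::vector<double> values, double low, double high) {
  std::sort(values.begin(), values.end());
  const auto n = values.size();
  // Drop the lowest floor(low * n) observations and the highest
  // floor((1 - high) * n) observations, then average the rest.
  const auto lo = static_cast<std::size_t>(low * static_cast<double>(n));
  const auto hi = n - static_cast<std::size_t>((1.0 - high) * static_cast<double>(n));
  double sum = 0;
  std::size_t count = 0;
  for (std::size_t i = lo; i < hi; ++i) {
    sum += values[i];
    ++count;
  }
  return count == 0 ? 0.0 : sum / static_cast<double>(count);
}
```

With values {1, 2, 3, 4} and cuts 0.25/0.75 this drops 1 and 4 and returns 2.5; the command computes the same statistic approximately from t-digest centroids.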

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file

tests/gocase/unit/type/tdigest/tdigest_test.go - Adds Go integration tests for TDIGEST.TRIMMED_MEAN including argument/quantile validation cases.
tests/cppunit/types/tdigest_test.cc - Adds a C++ unit test for trimmed mean behavior on a basic dataset.
src/types/tdigest.h - Introduces TDigestTrimmedMean helper to compute trimmed mean from centroids.
src/types/redis_tdigest.h - Adds TDigestTrimmedMeanResult and the TDigest::TrimmedMean API.
src/types/redis_tdigest.cc - Implements TDigest::TrimmedMean by dumping centroids and calling the helper.
src/commands/cmd_tdigest.cc - Adds CommandTDigestTrimmedMean and registers tdigest.trimmed_mean.


Comment on lines +329 to +368
double low_boundary = std::numeric_limits<double>::quiet_NaN();
double high_boundary = std::numeric_limits<double>::quiet_NaN();

if (low_cut_quantile == 0.0) {
  low_boundary = td.Min();
} else {
  auto low_result = TDigestQuantile(td, low_cut_quantile);
  if (!low_result) {
    return low_result;
  }
  low_boundary = *low_result;
}

if (high_cut_quantile == 1.0) {
  high_boundary = td.Max();
} else {
  auto high_result = TDigestQuantile(td, high_cut_quantile);
  if (!high_result) {
    return high_result;
  }
  high_boundary = *high_result;
}

auto iter = td.Begin();
double total_weight_in_range = 0;
double weighted_sum = 0;

while (iter->Valid()) {
  auto centroid = GET_OR_RET(iter->GetCentroid());

  if ((low_cut_quantile == 0.0 && high_cut_quantile == 1.0) ||
      (centroid.mean >= low_boundary && centroid.mean <= high_boundary)) {
    total_weight_in_range += centroid.weight;
    weighted_sum += centroid.mean * centroid.weight;
  }

  iter->Next();
}

if (total_weight_in_range == 0) {

Copilot AI Mar 4, 2026


TDigestTrimmedMean can incorrectly return NaN when the low/high cut boundaries fall between centroid means (e.g., after quantile interpolation). The current logic only includes whole centroids whose mean is within [low_boundary, high_boundary], so it may exclude all centroids even though the quantile range contains weight. Consider computing overlap by weight position (e.g., iterate centroids with cumulative weight and add partial centroid weight for the boundary centroids) rather than filtering by centroid.mean.

Suggested change (replacing the block above):

// First, compute the total weight of the t-digest.
double total_weight = 0.0;
{
  auto iter = td.Begin();
  while (iter->Valid()) {
    auto centroid = GET_OR_RET(iter->GetCentroid());
    total_weight += centroid.weight;
    iter->Next();
  }
}

if (total_weight == 0.0) {
  return std::numeric_limits<double>::quiet_NaN();
}

// If no trimming is requested, just return the global weighted mean.
if (low_cut_quantile == 0.0 && high_cut_quantile == 1.0) {
  double weighted_sum = 0.0;
  auto iter = td.Begin();
  while (iter->Valid()) {
    auto centroid = GET_OR_RET(iter->GetCentroid());
    weighted_sum += centroid.mean * centroid.weight;
    iter->Next();
  }
  return weighted_sum / total_weight;
}

// Compute rank boundaries in weight space.
const double low_rank = low_cut_quantile * total_weight;
const double high_rank = high_cut_quantile * total_weight;

double cumulative_weight = 0.0;
double total_weight_in_range = 0.0;
double weighted_sum = 0.0;

auto iter = td.Begin();
while (iter->Valid()) {
  auto centroid = GET_OR_RET(iter->GetCentroid());
  const double start_rank = cumulative_weight;
  const double end_rank = cumulative_weight + centroid.weight;

  // If this centroid is entirely before the trimmed region, skip it.
  if (end_rank <= low_rank) {
    cumulative_weight = end_rank;
    iter->Next();
    continue;
  }
  // If we've passed the trimmed region, we can stop.
  if (start_rank >= high_rank) {
    break;
  }

  // Compute overlap of this centroid's weight with [low_rank, high_rank).
  double overlap_start = start_rank;
  if (overlap_start < low_rank) {
    overlap_start = low_rank;
  }
  double overlap_end = end_rank;
  if (overlap_end > high_rank) {
    overlap_end = high_rank;
  }

  const double overlap = overlap_end - overlap_start;
  if (overlap > 0.0) {
    total_weight_in_range += overlap;
    weighted_sum += centroid.mean * overlap;
  }

  cumulative_weight = end_rank;
  iter->Next();
}

if (total_weight_in_range == 0.0) {
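The rank-space overlap technique this review comment describes can be exercised in isolation. Below is a hypothetical, self-contained sketch using plain structs and vectors instead of Kvrocks' t-digest iterator; `Centroid` and `RankOverlapTrimmedMean` are illustrative names, not PR code. It shows that cut boundaries falling between centroid means still capture partial weight instead of producing NaN.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Centroid {
  double mean;
  double weight;
};

// Walk centroids in mean order and count only the portion of each
// centroid's weight that falls inside [low * W, high * W] in
// cumulative-weight (rank) space, where W is the total weight.
double RankOverlapTrimmedMean(const std::vector<Centroid>& centroids, double low, double high) {
  double total_weight = 0;
  for (const auto& c : centroids) total_weight += c.weight;
  if (total_weight == 0) return std::nan("");

  const double low_rank = low * total_weight;
  const double high_rank = high * total_weight;

  double cumulative = 0, weight_in_range = 0, weighted_sum = 0;
  for (const auto& c : centroids) {
    const double start = cumulative;
    const double end = cumulative + c.weight;
    cumulative = end;
    // Overlap of this centroid's rank interval with the trimmed region.
    const double overlap = std::min(end, high_rank) - std::max(start, low_rank);
    if (overlap > 0) {
      weight_in_range += overlap;
      weighted_sum += c.mean * overlap;
    }
  }
  return weight_in_range == 0 ? std::nan("") : weighted_sum / weight_in_range;
}
```

With four unit-weight centroids at means 1..4 and cuts 0.4/0.6, both rank boundaries fall inside centroids, yet each boundary centroid contributes partial weight and the result is 2.5, whereas mean-based filtering would have excluded everything.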
}

rocksdb::Status TDigest::TrimmedMean(engine::Context& ctx, const Slice& digest_name, double low_cut_quantile,
                                     double high_cut_quantile, TDigestTrimmedMeanResult* result) {

Copilot AI Mar 4, 2026


TDigest::TrimmedMean can leave TDigestTrimmedMeanResult populated with a stale value if the caller reuses the result object: on success you only assign result->mean when there are observations, and the empty-digest early return doesn’t reset it. Reset/clear result->mean at function entry (and before the early return) so the output is well-defined.

Suggested change (reset the output at function entry):

rocksdb::Status TDigest::TrimmedMean(engine::Context& ctx, const Slice& digest_name, double low_cut_quantile,
                                     double high_cut_quantile, TDigestTrimmedMeanResult* result) {
  result->mean = 0;
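To illustrate the hazard in miniature (toy code, not the Kvrocks implementation; `Result`, `MeanNoReset`, and `MeanWithReset` are invented names), compare an out-parameter function that skips the write on its early-return path with one that resets the output at entry:

```cpp
#include <vector>

struct Result {
  double mean = 0;
};

// Buggy shape: writes *result only when `values` is non-empty, so an
// early return leaves whatever value the caller's object already held.
void MeanNoReset(const std::vector<double>& values, Result* result) {
  if (values.empty()) return;  // *result untouched: stale on reuse
  double sum = 0;
  for (double v : values) sum += v;
  result->mean = sum / static_cast<double>(values.size());
}

// Fixed shape: reset the output at function entry, so every path
// leaves *result in a well-defined state.
void MeanWithReset(const std::vector<double>& values, Result* result) {
  result->mean = 0;
  if (values.empty()) return;
  double sum = 0;
  for (double v : values) sum += v;
  result->mean = sum / static_cast<double>(values.size());
}
```

Reusing one `Result` object across calls makes the difference visible: after a successful call followed by an empty-input call, the no-reset version still reports the previous mean.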
Comment on lines +808 to +813
if meanStr == "nan" {
	return
}
mean, err := strconv.ParseFloat(meanStr, 64)
require.NoError(t, err)
require.Greater(t, mean, 0.0)

Copilot AI Mar 4, 2026


This test allows "nan" and returns early, which can mask real correctness issues (a non-empty digest with low_cut < high_cut should always have some weight in the trimmed range). It would be better to assert the result is not NaN for this dataset and verify it’s within an expected numeric range/value.

Suggested change (replacing the early return on "nan"):

mean, err := strconv.ParseFloat(meanStr, 64)
require.NoError(t, err)
require.False(t, math.IsNaN(mean))
require.Greater(t, mean, 4.0)
require.Less(t, mean, 7.0)

(Note: math.IsNaN requires importing the "math" package in the test file.)

@sonarqubecloud

sonarqubecloud bot commented Mar 4, 2026



Development

Successfully merging this pull request may close these issues.

TDigest: Implement TDIGEST.TRIMMED_MEAN command

4 participants