
feat(tdigest): implement TDIGEST.TRIMMED_MEAN command #3312

Open

chakkk309 wants to merge 6 commits into apache:unstable from chakkk309:feat-implement-TDIGEST.TRIMMED_MEAN-command

Conversation

@chakkk309 (Contributor)

Fixes #3066

@git-hulk git-hulk requested a review from LindaSummer December 27, 2025 14:47
@LindaSummer (Member) left a comment


Hi @chakkk309 ,

😊 It seems this commit couldn't pass CI. Please check the error message in GitHub Actions.

if (auto status = dumpCentroids(ctx, ns_key, metadata, &centroids); !status.ok()) {
  return status;
}
auto dump_centroids = DummyCentroids(metadata, centroids);


Hi @chakkk309 ,

It seems this line has a compile error in CI. Please take a look.

@PragmaTwice (Member)

PragmaTwice commented Dec 27, 2025

Hi, thank you for your contribution!

Before you start coding, could you please read our contribution guide (https://kvrocks.apache.org/community/contributing/)? It would be better if you build and test Kvrocks with your changes locally before pushing them.

Also note that we have guidelines for AI-assisted contributions: https://kvrocks.apache.org/community/contributing/#guidelines-for-ai-assisted-contributions

@chakkk309 chakkk309 requested a review from LindaSummer March 4, 2026 06:34
@LindaSummer LindaSummer requested review from Copilot and removed request for LindaSummer March 4, 2026 06:36
@LindaSummer (Member)

Hi @chakkk309 ,

Thanks very much for your effort. I will review later today.😊

Copilot AI left a comment


Pull request overview

Implements the Redis-compatible TDIGEST.TRIMMED_MEAN command in Kvrocks’ TDigest module (Fixes #3066), exposing the functionality through the command layer and adding unit test coverage.

Changes:

  • Add core trimmed-mean computation helper (TDigestTrimmedMean) to the TDigest algorithm utilities.
  • Wire trimmed-mean into the Redis TDigest type (redis::TDigest::TrimmedMean) and register the tdigest.trimmed_mean command.
  • Add Go and C++ unit tests for TDIGEST.TRIMMED_MEAN.
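For reference, the quantity being approximated: the trimmed mean of raw observations, computed here exactly on a small sorted vector. This is an illustrative sketch, not part of the PR; the name `ExactTrimmedMean` and the choice to drop whole observations at each cut are simplifying assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative only: the exact trimmed mean on raw values, which
// TDIGEST.TRIMMED_MEAN approximates from centroids. `low` and `high`
// are cut quantiles in [0, 1] with low <= high.
double ExactTrimmedMean(std::vector<double> values, double low, double high) {
  std::sort(values.begin(), values.end());
  const auto n = values.size();
  // Drop the lowest floor(low * n) observations and the highest
  // floor((1 - high) * n) observations, then average the rest.
  const auto lo = static_cast<std::size_t>(low * static_cast<double>(n));
  const auto hi = n - static_cast<std::size_t>((1.0 - high) * static_cast<double>(n));
  double sum = 0;
  std::size_t count = 0;
  for (std::size_t i = lo; i < hi; ++i) {
    sum += values[i];
    ++count;
  }
  return count == 0 ? 0.0 : sum / static_cast<double>(count);
}
```

With values {1, 2, 3, 4} and cuts 0.25/0.75 this drops 1 and 4 and returns 2.5; the command computes the same statistic approximately from t-digest centroids.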

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file

tests/gocase/unit/type/tdigest/tdigest_test.go - Adds Go integration tests for TDIGEST.TRIMMED_MEAN including argument/quantile validation cases.
tests/cppunit/types/tdigest_test.cc - Adds a C++ unit test for trimmed mean behavior on a basic dataset.
src/types/tdigest.h - Introduces TDigestTrimmedMean helper to compute trimmed mean from centroids.
src/types/redis_tdigest.h - Adds TDigestTrimmedMeanResult and the TDigest::TrimmedMean API.
src/types/redis_tdigest.cc - Implements TDigest::TrimmedMean by dumping centroids and calling the helper.
src/commands/cmd_tdigest.cc - Adds CommandTDigestTrimmedMean and registers tdigest.trimmed_mean.


Comment on lines +329 to +368
double low_boundary = std::numeric_limits<double>::quiet_NaN();
double high_boundary = std::numeric_limits<double>::quiet_NaN();

if (low_cut_quantile == 0.0) {
  low_boundary = td.Min();
} else {
  auto low_result = TDigestQuantile(td, low_cut_quantile);
  if (!low_result) {
    return low_result;
  }
  low_boundary = *low_result;
}

if (high_cut_quantile == 1.0) {
  high_boundary = td.Max();
} else {
  auto high_result = TDigestQuantile(td, high_cut_quantile);
  if (!high_result) {
    return high_result;
  }
  high_boundary = *high_result;
}

auto iter = td.Begin();
double total_weight_in_range = 0;
double weighted_sum = 0;

while (iter->Valid()) {
  auto centroid = GET_OR_RET(iter->GetCentroid());

  if ((low_cut_quantile == 0.0 && high_cut_quantile == 1.0) ||
      (centroid.mean >= low_boundary && centroid.mean <= high_boundary)) {
    total_weight_in_range += centroid.weight;
    weighted_sum += centroid.mean * centroid.weight;
  }

  iter->Next();
}

if (total_weight_in_range == 0) {

Copilot AI Mar 4, 2026


TDigestTrimmedMean can incorrectly return NaN when the low/high cut boundaries fall between centroid means (e.g., after quantile interpolation). The current logic only includes whole centroids whose mean is within [low_boundary, high_boundary], so it may exclude all centroids even though the quantile range contains weight. Consider computing overlap by weight position (e.g., iterate centroids with cumulative weight and add partial centroid weight for the boundary centroids) rather than filtering by centroid.mean.

Suggested change (replacing the block above):

// First, compute the total weight of the t-digest.
double total_weight = 0.0;
{
  auto iter = td.Begin();
  while (iter->Valid()) {
    auto centroid = GET_OR_RET(iter->GetCentroid());
    total_weight += centroid.weight;
    iter->Next();
  }
}

if (total_weight == 0.0) {
  return std::numeric_limits<double>::quiet_NaN();
}

// If no trimming is requested, just return the global weighted mean.
if (low_cut_quantile == 0.0 && high_cut_quantile == 1.0) {
  double weighted_sum = 0.0;
  auto iter = td.Begin();
  while (iter->Valid()) {
    auto centroid = GET_OR_RET(iter->GetCentroid());
    weighted_sum += centroid.mean * centroid.weight;
    iter->Next();
  }
  return weighted_sum / total_weight;
}

// Compute rank boundaries in weight space.
const double low_rank = low_cut_quantile * total_weight;
const double high_rank = high_cut_quantile * total_weight;

double cumulative_weight = 0.0;
double total_weight_in_range = 0.0;
double weighted_sum = 0.0;

auto iter = td.Begin();
while (iter->Valid()) {
  auto centroid = GET_OR_RET(iter->GetCentroid());
  const double start_rank = cumulative_weight;
  const double end_rank = cumulative_weight + centroid.weight;

  // If this centroid is entirely before the trimmed region, skip it.
  if (end_rank <= low_rank) {
    cumulative_weight = end_rank;
    iter->Next();
    continue;
  }
  // If we've passed the trimmed region, we can stop.
  if (start_rank >= high_rank) {
    break;
  }

  // Compute overlap of this centroid's weight with [low_rank, high_rank).
  double overlap_start = start_rank;
  if (overlap_start < low_rank) {
    overlap_start = low_rank;
  }
  double overlap_end = end_rank;
  if (overlap_end > high_rank) {
    overlap_end = high_rank;
  }

  const double overlap = overlap_end - overlap_start;
  if (overlap > 0.0) {
    total_weight_in_range += overlap;
    weighted_sum += centroid.mean * overlap;
  }

  cumulative_weight = end_rank;
  iter->Next();
}

if (total_weight_in_range == 0.0) {
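The rank-space overlap technique this review comment describes can be exercised in isolation. Below is a hypothetical, self-contained sketch using plain structs and vectors instead of Kvrocks' t-digest iterator; `Centroid` and `RankOverlapTrimmedMean` are illustrative names, not PR code. It shows that cut boundaries falling between centroid means still capture partial weight instead of producing NaN.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Centroid {
  double mean;
  double weight;
};

// Walk centroids in mean order and count only the portion of each
// centroid's weight that falls inside [low * W, high * W] in
// cumulative-weight (rank) space, where W is the total weight.
double RankOverlapTrimmedMean(const std::vector<Centroid>& centroids, double low, double high) {
  double total_weight = 0;
  for (const auto& c : centroids) total_weight += c.weight;
  if (total_weight == 0) return std::nan("");

  const double low_rank = low * total_weight;
  const double high_rank = high * total_weight;

  double cumulative = 0, weight_in_range = 0, weighted_sum = 0;
  for (const auto& c : centroids) {
    const double start = cumulative;
    const double end = cumulative + c.weight;
    cumulative = end;
    // Overlap of this centroid's rank interval with the trimmed region.
    const double overlap = std::min(end, high_rank) - std::max(start, low_rank);
    if (overlap > 0) {
      weight_in_range += overlap;
      weighted_sum += c.mean * overlap;
    }
  }
  return weight_in_range == 0 ? std::nan("") : weighted_sum / weight_in_range;
}
```

With four unit-weight centroids at means 1..4 and cuts 0.4/0.6, both rank boundaries fall inside centroids, yet each boundary centroid contributes partial weight and the result is 2.5, whereas mean-based filtering would have excluded everything.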
}

rocksdb::Status TDigest::TrimmedMean(engine::Context& ctx, const Slice& digest_name, double low_cut_quantile,
                                     double high_cut_quantile, TDigestTrimmedMeanResult* result) {

Copilot AI Mar 4, 2026


TDigest::TrimmedMean can leave TDigestTrimmedMeanResult populated with a stale value if the caller reuses the result object: on success you only assign result->mean when there are observations, and the empty-digest early return doesn’t reset it. Reset/clear result->mean at function entry (and before the early return) so the output is well-defined.

Suggested change (reset the output at function entry):

rocksdb::Status TDigest::TrimmedMean(engine::Context& ctx, const Slice& digest_name, double low_cut_quantile,
                                     double high_cut_quantile, TDigestTrimmedMeanResult* result) {
  result->mean = 0;
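To illustrate the hazard in miniature (toy code, not the Kvrocks implementation; `Result`, `MeanNoReset`, and `MeanWithReset` are invented names), compare an out-parameter function that skips the write on its early-return path with one that resets the output at entry:

```cpp
#include <vector>

struct Result {
  double mean = 0;
};

// Buggy shape: writes *result only when `values` is non-empty, so an
// early return leaves whatever value the caller's object already held.
void MeanNoReset(const std::vector<double>& values, Result* result) {
  if (values.empty()) return;  // *result untouched: stale on reuse
  double sum = 0;
  for (double v : values) sum += v;
  result->mean = sum / static_cast<double>(values.size());
}

// Fixed shape: reset the output at function entry, so every path
// leaves *result in a well-defined state.
void MeanWithReset(const std::vector<double>& values, Result* result) {
  result->mean = 0;
  if (values.empty()) return;
  double sum = 0;
  for (double v : values) sum += v;
  result->mean = sum / static_cast<double>(values.size());
}
```

Reusing one `Result` object across calls makes the difference visible: after a successful call followed by an empty-input call, the no-reset version still reports the previous mean.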
Comment on lines +808 to +813
if meanStr == "nan" {
	return
}
mean, err := strconv.ParseFloat(meanStr, 64)
require.NoError(t, err)
require.Greater(t, mean, 0.0)

Copilot AI Mar 4, 2026


This test allows "nan" and returns early, which can mask real correctness issues (a non-empty digest with low_cut < high_cut should always have some weight in the trimmed range). It would be better to assert the result is not NaN for this dataset and verify it’s within an expected numeric range/value.

Suggested change (replacing the early return on "nan"):

mean, err := strconv.ParseFloat(meanStr, 64)
require.NoError(t, err)
require.False(t, math.IsNaN(mean))
require.Greater(t, mean, 4.0)
require.Less(t, mean, 7.0)

(Note: math.IsNaN requires importing the "math" package in the test file.)

@sonarqubecloud

sonarqubecloud bot commented Mar 4, 2026



Development

Successfully merging this pull request may close these issues.

TDigest: Implement TDIGEST.TRIMMED_MEAN command

4 participants