[TENT] fix: Reduce unnecessary bandwidth consumption in TCP recvData operations by 00fish0 · Pull Request #1513 · kvcache-ai/Mooncake

00fish0 · 2026-02-06T19:12:11Z

Description

Optimizes TCP transport in TENT backend by reducing unnecessary bandwidth consumption in recvData operations.

Reduced recvData RPC request size from 16 bytes + N to 16 bytes only, eliminating redundant pre-allocated buffer in requests. This cuts request-response bandwidth by ~50% for READ operations.
Fixed --xport_type parameter handling in TENT backend. Now properly disables unwanted transports and enables only the specified one (rdma/tcp/shm/iouring/gds/mnnvl).
Added missing getBaseAddr() stub for TENT backend (returns nullptr as TENT doesn't support CXL). This is required to maintain compatibility with the updated interface introduced in PR [Store][Feature] Add CXL storage for mooncake_store #1365 .

Type of Change

Types
- Bug fix
- New feature
  - Transfer Engine
  - Mooncake Store
  - Mooncake EP
  - Integration
  - P2P Store
  - Python Wheel
- Breaking change
- CI/CD
- Documentation update
- Other

How Has This Been Tested?

# Target
./tebench --xport_type=tcp --seg_type=DRAM --metadata_type=p2p

# Initiator
./tebench --target_seg_name=<addr> --xport_type=tcp --op_type=read

Before:

BlkSize (B)   Batch   BW (GB/S)     Avg Lat (us)  Avg Tx (us)   P99 Tx (us)   P999 Tx (us)  
----------------------------------------------------------------------------------------------------------------------------------------------------------------
I0207 02:50:11.476197 420608 segment_manager.cpp:44] Opened segment #1: 192.168.200.17:15822
4096          1       0.029985      136.6         135.1         188.0         343.4         
8192          1       0.059502      137.7         136.2         172.0         294.0         
16384         1       0.106295      154.1         152.5         180.0         242.6         
32768         1       0.185946      176.2         174.6         201.0         302.2         
65536         1       0.253678      258.3         256.7         325.0         503.3         
131072        1       0.392522      333.9         332.5         464.3         615.1         
262144        1       0.403777      649.2         647.5         734.0         997.0         
524288        1       0.485275      1080.4        1078.7        1199.7        2575.4        
1048576       1       0.446561      2348.1        2346.4        3317.7        45421.1       
2097152       1       0.522427      4014.2        4012.4        4274.8        46224.0       
4194304       1       0.565539      7416.5        7414.0        8616.6        25377.0       
8388608       1       0.538270      15584.4       15580.3       20437.4       57343.8       
16777216      1       0.530047      31652.3       31647.1       73797.7       74095.8       
33554432      1       0.216957      154659.5      154651.3      168331.8      171388.4      
67108864      1       0.229261      292718.4      292709.7      310151.2      311053.7

After:

BlkSize (B)   Batch   BW (GB/S)     Avg Lat (us)  Avg Tx (us)   P99 Tx (us)   P999 Tx (us)  
----------------------------------------------------------------------------------------------------------------------------------------------------------------
I0207 02:24:33.354920 407885 segment_manager.cpp:44] Opened segment #1: 192.168.200.17:15314
4096          1       0.030621      133.8         132.2         149.0         238.2         
8192          1       0.058706      139.5         137.9         160.0         216.3         
16384         1       0.114198      143.5         141.9         161.0         314.2         
32768         1       0.186659      175.5         173.8         228.0         299.9         
65536         1       0.327960      199.8         198.1         237.0         376.8         
131072        1       0.490405      267.3         265.6         302.0         385.5         
262144        1       0.634434      413.2         411.5         474.0         583.5         
524288        1       0.878559      596.8         595.0         690.0         847.6         
1048576       1       1.064387      985.1         983.2         1313.2        2287.4        
2097152       1       1.041997      2012.6        2010.3        2544.5        3992.5        
4194304       1       1.236025      3393.4        3389.9        4625.9        8471.5        
8388608       1       1.176762      7128.5        7124.1        9491.0        12068.1       
16777216      1       1.301293      12892.7       12886.6       15687.8       18889.1       
33554432      1       0.251798      133259.1      133250.3      136573.3      136767.4      
67108864      1       0.292828      229174.7      229166.4      249890.1      249926.0

Checklist

I have performed a self-review of my own code.
I have formatted my own code using ./scripts/code_format.sh before submitting.
I have updated the documentation.
I have added tests to prove my changes are effective.

gemini-code-assist · 2026-02-06T19:12:27Z

Summary of Changes

Hello @00fish0, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on optimizing the TCP transport within the TENT backend to reduce unnecessary bandwidth consumption during recvData operations. This is achieved by streamlining the RPC request size. Additionally, it includes a crucial fix for the --xport_type parameter handling, ensuring correct transport configuration, and adds a necessary stub for CXL-related functionality.

Highlights

Bandwidth Optimization: Reduced the recvData RPC request size from 16 bytes + N to 16 bytes by eliminating redundant pre-allocated buffers, significantly cutting request-response bandwidth for READ operations.
Transport Configuration Fix: Corrected the handling of the --xport_type parameter in the TENT backend to properly disable all transports by default and enable only the specified one (rdma/tcp/shm/iouring/gds/mnnvl).
CXL Support Stub: Introduced a getBaseAddr() stub for the TENT backend, which returns nullptr as TENT does not support CXL.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog

mooncake-transfer-engine/benchmark/tent_backend.cpp
- Implemented logic to dynamically enable only the transport specified by --xport_type (e.g., rdma, tcp, shm) while disabling all others.
- Corrected a minor formatting issue in the getCudaDeviceNumaID function.
mooncake-transfer-engine/src/transfer_engine.cpp
- Added a getBaseAddr() method to the TransferEngine class, which returns nullptr when use_tent_ is true, indicating no CXL base address support for the TENT backend.
mooncake-transfer-engine/tent/src/runtime/control_plane.cpp
- In ControlClient::sendData, added a check for non-empty RPC responses to return RpcServiceError.
- In ControlClient::recvData, optimized the RPC request by removing the length component from its size, reducing it to sizeof(XferDataDesc). Also added a check to ensure the response size matches the expected length, returning an error if not.
- In ControlService::onSendData, added an error response if the target address is not found in a registered buffer.
- In ControlService::onRecvData, moved the response.resize(length) call inside the buffer finding conditional, ensuring memory is allocated only when the buffer is valid.

Activity

No specific activity (comments, reviews, or progress updates) has been recorded for this pull request yet.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

Copilot

Pull request overview

This PR optimizes the TENT backend’s control-plane data path by shrinking recvData RPC requests (removing redundant preallocated payload) and tightening error signaling, along with benchmark transport-selection fixes and a TENT stub for getBaseAddr().

Changes:

Reduce recvData RPC request size to only the descriptor header and adjust server/client behavior accordingly.
Add error propagation for sendData/recvData when the target address is not in a registered buffer.
Update TENT benchmark config to honor --xport_type by disabling/enabling transports; add TransferEngine::getBaseAddr() stub for TENT.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
mooncake-transfer-engine/tent/src/runtime/control_plane.cpp	Shrinks `recvData` request payload; adds/adjusts RPC error handling for Send/Recv.
mooncake-transfer-engine/src/transfer_engine.cpp	Adds `getBaseAddr()` behavior for TENT mode (returns `nullptr`).
mooncake-transfer-engine/benchmark/tent_backend.cpp	Changes `--xport_type` handling to disable all transports and enable only the chosen one; minor loop formatting tweak.

Comments suppressed due to low confidence (1)

mooncake-transfer-engine/tent/src/runtime/control_plane.cpp:257

onRecvData() leaves response empty when the address range isn't registered, so the client can only infer failure via a size mismatch. For consistency with onSendData() and better diagnosability, set an explicit error message in the else path (and have the client surface it when present).

    if (local_desc->findBuffer(peer_mem_addr, length)) {
        response.resize(length);
        Platform::getLoader().copy(response.data(), (void*)peer_mem_addr,
                                   length);
    }
}

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-06T19:15:37Z

mooncake-transfer-engine/tent/src/runtime/control_plane.cpp

+    if (response.size() != length)
+        return Status::RpcServiceError(
+            "RecvData failed: target address not in registered buffer");


recvData() treats any response size mismatch as "target address not in registered buffer" and discards the actual server response. This can misreport other failure modes (e.g., server-side error string, partial response) and makes debugging harder. Consider returning RpcServiceError(response) when !response.empty() and otherwise include expected vs actual sizes in the error message.

Suggested change

if (response.size() != length)

return Status::RpcServiceError(

"RecvData failed: target address not in registered buffer");

if (response.size() != length) {

if (!response.empty()) {

// Server likely returned an explicit error message instead of raw data.

return Status::RpcServiceError(response);

}

return Status::RpcServiceError(

"RecvData failed: response size mismatch (expected " +

std::to_string(length) + " bytes, got " +

std::to_string(response.size()) + " bytes)");

}

Copilot · 2026-02-06T19:15:38Z

mooncake-transfer-engine/benchmark/tent_backend.cpp

+        // Enable only the specified transport
+        if (XferBenchConfig::xport_type == "rdma") {
+            config->set("transports/rdma/enable", true);
+        } else if (XferBenchConfig::xport_type == "tcp") {
+            config->set("transports/tcp/enable", true);
+        } else if (XferBenchConfig::xport_type == "shm") {
+            config->set("transports/shm/enable", true);
+        } else if (XferBenchConfig::xport_type == "iouring") {
+            config->set("transports/io_uring/enable", true);
+        } else if (XferBenchConfig::xport_type == "gds") {
+            config->set("transports/gds/enable", true);
+        } else if (XferBenchConfig::xport_type == "mnnvl") {
+            config->set("transports/mnnvl/enable", true);
+        }


If XferBenchConfig::xport_type is set to an unexpected value, this code disables all transports and enables none, which will make the benchmark fail in a non-obvious way. Add an else branch that logs/returns an error (or falls back to defaults) when the value is not one of the supported strings.

Copilot · 2026-02-06T19:15:38Z

mooncake-transfer-engine/benchmark/tent_backend.cpp

+        // Disable all transports by default
+        config->set("transports/rdma/enable", false);
+        config->set("transports/tcp/enable", false);
+        config->set("transports/shm/enable", false);
+        config->set("transports/io_uring/enable", false);
+        config->set("transports/gds/enable", false);
+        config->set("transports/mnnvl/enable", false);


When xport_type is specified, this block disables only a subset of transports. Others that are enabled-by-default in the TENT transport loader (e.g. transports/nvlink/enable and transports/ascend_direct/enable) can still be loaded, so it may not actually "enable only the specified transport". Consider explicitly disabling all known transport enable keys before turning on the selected one.

gemini-code-assist

Code Review

This pull request introduces an important optimization to the recvData operation by reducing RPC request size, which significantly cuts down bandwidth consumption for READ operations. It also includes a fix for the --xport_type parameter handling and adds a getBaseAddr() stub for the TENT backend. However, critical vulnerabilities have been identified in the RPC handlers onSendData and onRecvData. These functions lack sufficient validation of user-supplied memory addresses and lengths, potentially leading to arbitrary memory read/write and Denial of Service (DoS) vulnerabilities due to integer overflows in memory range checks and missing buffer size validations. Robust input validation and overflow checks are highly recommended before merging these changes.

mooncake-transfer-engine/tent/src/runtime/control_plane.cpp

gemini-code-assist · 2026-02-06T19:15:50Z

mooncake-transfer-engine/benchmark/tent_backend.cpp

+        if (XferBenchConfig::xport_type == "rdma") {
+            config->set("transports/rdma/enable", true);
+        } else if (XferBenchConfig::xport_type == "tcp") {
+            config->set("transports/tcp/enable", true);
+        } else if (XferBenchConfig::xport_type == "shm") {
+            config->set("transports/shm/enable", true);
+        } else if (XferBenchConfig::xport_type == "iouring") {
+            config->set("transports/io_uring/enable", true);
+        } else if (XferBenchConfig::xport_type == "gds") {
+            config->set("transports/gds/enable", true);
+        } else if (XferBenchConfig::xport_type == "mnnvl") {
+            config->set("transports/mnnvl/enable", true);


The series of if-else if statements for enabling transport types can become cumbersome as more transport types are added. Consider refactoring this into a std::map<std::string, bool> where keys are transport names and values indicate enablement, or using a switch statement if XferBenchConfig::xport_type can be mapped to an enum. This would improve maintainability and scalability of the configuration logic.

std::unordered_map<std::string, bool> transport_enable_map; transport_enable_map["rdma"] = false; transport_enable_map["tcp"] = false; transport_enable_map["shm"] = false; transport_enable_map["iouring"] = false; transport_enable_map["gds"] = false; transport_enable_map["mnnvl"] = false; if (transport_enable_map.count(XferBenchConfig::xport_type)) { transport_enable_map[XferBenchConfig::xport_type] = true; } for (const auto& entry : transport_enable_map) { config->set("transports/" + entry.first + "/enable", entry.second); }

Following this comment is better if extending transports.

- Add request size validation in onSendData to prevent buffer over-read - Add length upper bound check in onRecvData to prevent DoS via memory exhaustion - Add integer overflow checks in findBuffer for memory range validation Fixes security vulnerabilities identified by Gemini Code Assist review

codecov-commenter · 2026-02-06T19:44:56Z

⚠️ Please install the to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

alogfans

Others LGTM.

…ility

fatSheep added 3 commits February 5, 2026 02:57

[TENT] fix: TCP transport recvData bandwidth waste

ab7581e

chore: tebench load xfer type

54005ce

fix getBaseAddr

29cdaec

00fish0 requested a review from alogfans as a code owner February 6, 2026 19:12

Copilot AI review requested due to automatic review settings February 6, 2026 19:12

00fish0 requested review from chestnut-Q and doujiang24 as code owners February 6, 2026 19:12

github-actions bot added run-ci Transfer Engine labels Feb 6, 2026

Copilot started reviewing on behalf of 00fish0 February 6, 2026 19:12 View session

00fish0 changed the title ~~[TENT] R、educing unnecessary bandwidth consumption in recvData operations~~ [TENT] Reducing unnecessary bandwidth consumption in TCP recvData operations Feb 6, 2026

00fish0 changed the title ~~[TENT] Reducing unnecessary bandwidth consumption in TCP recvData operations~~ [TENT] fix: Reduce unnecessary bandwidth consumption in TCP recvData operations Feb 6, 2026

Copilot AI reviewed Feb 6, 2026

View reviewed changes

gemini-code-assist bot reviewed Feb 6, 2026

View reviewed changes

fatSheep added 2 commits February 7, 2026 03:20

fix: Add error message for invalid buffer in onRecvData

155a3f4

chore: format code

c999d5a

alogfans approved these changes Feb 8, 2026

View reviewed changes

refactor: Use map-based transport configuration for better maintainab…

78c9383

…ility

alogfans merged commit fd2da4f into kvcache-ai:main Feb 9, 2026
16 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

[TENT] fix: Reduce unnecessary bandwidth consumption in TCP recvData operations#1513

[TENT] fix: Reduce unnecessary bandwidth consumption in TCP recvData operations#1513
alogfans merged 7 commits intokvcache-ai:mainfrom
00fish0:fix/tent-tcp-recvdata

00fish0 commented Feb 6, 2026 •

edited

Loading

Uh oh!

gemini-code-assist bot commented Feb 6, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 6, 2026

Uh oh!

Copilot AI Feb 6, 2026

Uh oh!

Copilot AI Feb 6, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Uh oh!

Uh oh!

gemini-code-assist bot Feb 6, 2026

Uh oh!

alogfans Feb 8, 2026

Uh oh!

codecov-commenter commented Feb 6, 2026

Uh oh!

alogfans left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

-    if (response.size() != length)
-        return Status::RpcServiceError(
-            "RecvData failed: target address not in registered buffer");
+    if (response.size() != length) {
+        if (!response.empty()) {
+            // Server likely returned an explicit error message instead of raw data.
+            return Status::RpcServiceError(response);
+        }
+        return Status::RpcServiceError(
+            "RecvData failed: response size mismatch (expected " +
+            std::to_string(length) + " bytes, got " +
+            std::to_string(response.size()) + " bytes)");
+    }

Comments

Conversation

00fish0 commented Feb 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of Change

How Has This Been Tested?

Checklist

Uh oh!

gemini-code-assist bot commented Feb 6, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

gemini-code-assist bot Feb 6, 2026

Choose a reason for hiding this comment

Uh oh!

alogfans Feb 8, 2026

Choose a reason for hiding this comment

Uh oh!

codecov-commenter commented Feb 6, 2026

Codecov Report

Uh oh!

alogfans left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

00fish0 commented Feb 6, 2026 •

edited

Loading