Skip to content

Comments

[TE][Store] refactor: centralize environment variable handling across components#1538

Open
stmatengss wants to merge 4 commits intomainfrom
refactor/centralize-env-vars
Open

[TE][Store] refactor: centralize environment variable handling across components#1538
stmatengss wants to merge 4 commits intomainfrom
refactor/centralize-env-vars

Conversation

@stmatengss
Copy link
Collaborator

Refactored environment variable access across mooncake-store and mooncake-transfer-engine to use centralized GetEnv() functions from mooncake-common/environ.h. This improves code maintainability and reduces duplication.

Changes:

  • Extended mooncake-common/environ.h with additional env var getters
  • Updated config.cpp to use centralized env var functions (342 insertions, 404 deletions)
  • Migrated all components to use GetEnv() instead of direct getenv() calls
  • Affected components: client_metric, client_service, ha_helper, master, transfer_engine, topology, and all transport implementations

Description

Type of Change

  • Types
    • Bug fix
    • New feature
      • Transfer Engine
      • Mooncake Store
      • Mooncake EP
      • Integration
      • P2P Store
      • Python Wheel
    • Breaking change
    • CI/CD
    • Documentation update
    • Other

How Has This Been Tested?

Checklist

  • I have performed a self-review of my own code.
  • I have formatted my own code using ./scripts/code_format.sh before submitting.
  • I have updated the documentation.
  • I have added tests to prove my changes are effective.

Refactored environment variable access across mooncake-store and
mooncake-transfer-engine to use centralized GetEnv() functions from
mooncake-common/environ.h. This improves code maintainability and
reduces duplication.

Changes:
- Extended mooncake-common/environ.h with additional env var getters
- Updated config.cpp to use centralized env var functions (342 insertions, 404 deletions)
- Migrated all components to use GetEnv() instead of direct getenv() calls
- Affected components: client_metric, client_service, ha_helper, master,
  transfer_engine, topology, and all transport implementations
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @stmatengss, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant refactoring effort to centralize how environment variables are accessed throughout the Mooncake codebase. By moving all environment variable retrieval into a dedicated Environ class within mooncake-common, the change aims to enhance code consistency, simplify future modifications, and reduce the potential for errors associated with scattered getenv() calls. This consolidation impacts a wide range of components, streamlining their configuration processes.

Highlights

  • Centralized Environment Variable Handling: Refactored environment variable access across mooncake-store and mooncake-transfer-engine components to use a centralized GetEnv() function from mooncake-common/environ.h. This significantly improves code maintainability and reduces duplication by consolidating environment variable retrieval logic.
  • Expanded environ.h: Extended mooncake-common/include/environ.h with numerous new getter methods for various environment variables, providing a single, consistent interface for configuration.
  • Broad Component Migration: Migrated environment variable access in client_metric, client_service, ha_helper, master, transfer_engine, topology, and all transport implementations to utilize the new centralized Environ::Get() methods, replacing direct getenv() calls.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • mooncake-common/include/environ.h
    • Added 32 new getter methods for various environment variables (e.g., GetCustomTopoJson, GetCxlDevPath, GetHandshakePort, GetStoreClientMetric, GetTeMetric, GetUseTent, GetYltLogLevel).
    • Declared corresponding private member variables for each new environment variable.
  • mooncake-common/src/environ.cpp
    • Initialized 32 new private member variables by calling appropriate GetBool, GetInt, GetString, or GetSizeT helper functions to retrieve environment variable values.
  • mooncake-store/src/client_metric.cpp
    • Included environ.h.
    • Replaced direct getenv("MC_STORE_CLIENT_METRIC") with Environ::Get().GetStoreClientMetric().
    • Replaced direct getenv("MC_STORE_CLIENT_METRIC_INTERVAL") with Environ::Get().GetStoreClientMetricInterval().
    • Simplified the logic for parsing metric interval, removing std::stoull and exception handling as parsing is now handled by Environ.
  • mooncake-store/src/client_service.cpp
    • Included environ.h.
    • Replaced direct getenv("MC_MS_AUTO_DISC") with Environ::Get().GetMsAutoDisc().
    • Replaced direct getenv("MC_MS_FILTERS") with Environ::Get().GetMsFilters().
    • Adjusted string parsing logic for MC_MS_FILTERS to work with std::string returned by Environ.
  • mooncake-store/src/ha_helper.cpp
    • Included environ.h.
    • Replaced direct getenv("MC_STORE_CLUSTER_ID") with Environ::Get().GetStoreClusterId().
  • mooncake-store/src/master.cpp
    • Included environ.h.
    • Replaced direct getenv("MC_RPC_PROTOCOL") with Environ::Get().GetRpcProtocol() for determining RPC protocol.
  • mooncake-store/src/real_client.cpp
    • Included environ.h.
  • mooncake-store/src/transfer_task.cpp
    • Included environ.h.
    • Replaced direct getenv("MC_STORE_MEMCPY") with Environ::Get().GetStoreMemcpy().
  • mooncake-transfer-engine/src/config.cpp
    • Included environ.h.
    • Replaced numerous direct getenv() calls for various configuration parameters (e.g., MC_NUM_CQ_PER_CTX, MC_IB_PORT, MC_MAX_CQE_PER_CTX, MC_MTU, MC_HANDSHAKE_PORT, MC_LOG_LEVEL, MC_MIN_PRC_PORT, MC_FRAGMENT_RATIO, MC_IB_TC) with corresponding Environ::Get() methods.
    • Simplified parsing logic for several environment variables, leveraging the Environ class's type-safe getters.
  • mooncake-transfer-engine/src/topology.cpp
    • Included environ.h.
    • Replaced direct getenv("MC_PATH_ROUNDROBIN") with Environ::Get().GetPathRoundrobin().
  • mooncake-transfer-engine/src/transfer_engine.cpp
    • Included environ.h.
    • Replaced direct getenv("MC_USE_TENT") and getenv("MC_USE_TEV1") with Environ::Get().GetUseTent() and Environ::Get().GetUseTev1() respectively.
  • mooncake-transfer-engine/src/transfer_engine_impl.cpp
    • Included environ.h.
    • Replaced direct getenv("MC_LEGACY_RPC_PORT_BINDING") with Environ::Get().GetLegacyRpcPortBinding().
    • Replaced direct getenv("MC_TCP_BIND_ADDRESS") with Environ::Get().GetTcpBindAddress().
    • Replaced direct getenv("MC_CXL_DEV_PATH") with Environ::Get().GetCxlDevPath().
    • Replaced direct getenv("MC_CUSTOM_TOPO_JSON") with Environ::Get().GetCustomTopoJson().
    • Replaced direct getenv("MC_FORCE_MNNVL") and getenv("MC_INTRANODE_NVLINK") with Environ::Get().GetForceMnnvl() and Environ::Get().GetIntranodeNvlink().
    • Replaced direct getenv("MC_FORCE_TCP") and getenv("MC_FORCE_HCA") with Environ::Get().GetForceTcp() and Environ::Get().GetForceHca().
    • Replaced direct getenv("MC_TE_METRIC") and getenv("MC_TE_METRIC_INTERVAL_SECONDS") with Environ::Get().GetTeMetric() and Environ::Get().GetTeMetricIntervalSeconds().
  • mooncake-transfer-engine/src/transfer_metadata.cpp
    • Included environ.h.
    • Replaced direct getenv("MC_METADATA_CLUSTER_ID") with Environ::Get().GetMetadataClusterId().
  • mooncake-transfer-engine/src/transfer_metadata_plugin.cpp
    • Included environ.h.
    • Replaced direct getenv("MC_REDIS_PASSWORD") with Environ::Get().GetRedisPassword().
    • Replaced direct getenv("MC_REDIS_DB_INDEX") with Environ::Get().GetRedisDbIndex() and simplified parsing logic.
  • mooncake-transfer-engine/src/transport/cxl_transport/cxl_transport.cpp
    • Included environ.h.
    • Replaced direct getenv("MC_CXL_DEV_PATH") with Environ::Get().GetCxlDevPath().
    • Replaced direct getenv("MC_CXL_DEV_SIZE") with Environ::Get().GetCxlDevSize().
  • mooncake-transfer-engine/src/transport/hip_transport/hip_transport.cpp
    • Included environ.h.
    • Replaced direct getenv("MC_HIP_NUM_STREAMS") with Environ::Get().GetHipNumStreams() and simplified parsing.
    • Replaced direct getenv("MC_HIP_NUM_EVENTS") with Environ::Get().GetHipNumEvents() and simplified parsing.
    • Replaced direct getenv("MC_USE_HIP_IPC") and getenv("MC_USE_NVLINK_IPC") with Environ::Get().GetUseHipIpc() and Environ::Get().GetUseNvlinkIpc().
  • mooncake-transfer-engine/src/transport/nvlink_transport/nvlink_transport.cpp
    • Included environ.h.
    • Replaced direct getenv("MC_USE_NVLINK_IPC") with Environ::Get().GetUseNvlinkIpc().
  • mooncake-transfer-engine/src/transport/rpc_communicator/rpc_communicator.cpp
    • Included environ.h.
    • Replaced direct getenv("MC_RPC_PROTOCOL") with Environ::Get().GetRpcProtocol().
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors environment variable access across the codebase by introducing a new Environ singleton class. The changes involve adding numerous getter methods and corresponding member variables to mooncake-common/include/environ.h and initializing them in mooncake-common/src/environ.cpp. Subsequently, all direct calls to std::getenv() for MC_ prefixed environment variables are replaced with calls to Environ::Get().Get<EnvVarName>() in various files, including mooncake-store/src/client_metric.cpp, client_service.cpp, ha_helper.cpp, master.cpp, transfer_task.cpp, and mooncake-transfer-engine/src/config.cpp, topology.cpp, transfer_engine.cpp, transfer_engine_impl.cpp, transfer_metadata.cpp, transfer_metadata_plugin.cpp, transport/cxl_transport/cxl_transport.cpp, transport/hip_transport/hip_transport.cpp, transport/nvlink_transport/nvlink_transport.cpp, and transport/rpc_communicator/rpc_communicator.cpp. Review comments highlight a missing closing brace in client_metric.cpp causing a compilation error, a broken fallback logic for MC_GID_INDEX to NCCL_IB_GID_INDEX in config.cpp due to the new Environ class's default return value, a risky C-style cast hiding a const_cast in cxl_transport.cpp when assigning cxl_dev_path, and a suggestion to change MC_STORE_MEMCPY and MC_STORE_USE_HUGEPAGE to return bool instead of std::string for consistency.

Comment on lines 569 to 570
} else {
LOG(WARNING)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

There appears to be a missing closing brace } after the if (interval > 0) block on line 568. This is a syntax error that will cause a compilation failure.

        } else if (interval < 0) {
            LOG(WARNING)

Comment on lines +40 to +46
int gid_index = env.GetGidIndex();
if (gid_index == 0) {
const char *nccl_gid = std::getenv("NCCL_IB_GID_INDEX");
if (nccl_gid) gid_index = atoi(nccl_gid);
}

const char *mtu_length_env = std::getenv("MC_MTU");
if (mtu_length_env) {
size_t val = atoi(mtu_length_env);
if (val == 512)
config.mtu_length = IBV_MTU_512;
else if (val == 1024)
config.mtu_length = IBV_MTU_1024;
else if (val == 2048)
config.mtu_length = IBV_MTU_2048;
else if (val == 4096)
config.mtu_length = IBV_MTU_4096;
else {
LOG(ERROR) << "Ignore value from environment variable MC_MTU, it "
"should be 512|1024|2048|4096";
exit(EXIT_FAILURE);
}
if (gid_index >= 0 && gid_index < 256)
config.gid_index = gid_index;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The fallback logic for MC_GID_INDEX to NCCL_IB_GID_INDEX appears to be broken with this change. env.GetGidIndex() returns a default value of 3 if MC_GID_INDEX is not set, so the if (gid_index == 0) block is not executed, and the fallback to NCCL_IB_GID_INDEX is missed. Also, if a user explicitly sets MC_GID_INDEX=0, the current logic would incorrectly fall back to NCCL_IB_GID_INDEX.

The original logic of checking MC_GID_INDEX first, then NCCL_IB_GID_INDEX, should be preserved. A proper fix would be to handle this fallback logic inside the Environ class itself for better encapsulation.

if (!env_cxl_dev_path.empty()) {
LOG(INFO) << "MC_CXL_DEV_PATH: " << env_cxl_dev_path;
cxl_dev_path = (char *)env_cxl_dev_path;
cxl_dev_path = (char *)env_cxl_dev_path.c_str();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The C-style cast (char *) hides a const_cast. The member cxl_dev_path is a char*, but it's being assigned from env_cxl_dev_path.c_str() which returns a const char*. Modifying the string literal returned by c_str() is undefined behavior. It would be safer to change the type of cxl_dev_path to std::string or const char* to avoid this risky cast.

Comment on lines +72 to +73
std::string GetStoreMemcpy() const { return store_memcpy_; }
std::string GetStoreUseHugepage() const { return store_use_hugepage_; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The environment variables MC_STORE_MEMCPY and MC_STORE_USE_HUGEPAGE seem to be boolean flags. However, they are defined here to return std::string. For consistency with other boolean flags like GetUseHipIpc and to simplify usage, consider changing them to return bool and use GetBool in the implementation. This would also require changing the corresponding member variables store_memcpy_ and store_use_hugepage_ to bool.

- Fix undefined variable 'value' in rpc_communicator.cpp, replace with 'rpc_protocol'
- Fix malformed InitializeMetricsConfig() function in transfer_engine_impl.cpp
- Fix malformed parseMetricsInterval() function in client_metric.cpp
- Add mooncake_common library link to transfer_engine CMakeLists.txt
- Add mooncake:: namespace qualifier to Environ::Get() calls in master.cpp

All compilation and linking errors resolved. Build completes successfully.
@stmatengss stmatengss changed the title [Refactor] Centralize environment variable handling across components [TE][Store] refactor: centralize environment variable handling across components Feb 12, 2026
Fix CXL hardware-dependent tests to skip instead of failing when CXL
devices are not available in CI environments.

Changes:
- mooncake-transfer-engine/tests/cxl_transport_test.cpp: Use GTEST_SKIP()
  when CXL transport installation fails
- mooncake-store/tests/cxl_client_integration_test.cpp: Skip test suite
  when CXL client creation fails, preventing segfault

This allows tests to pass (as skipped) in environments without CXL
hardware while still running properly on systems with actual devices.
@stmatengss stmatengss force-pushed the refactor/centralize-env-vars branch from f2389e3 to b220000 Compare February 12, 2026 06:43
@ykwd
Copy link
Collaborator

ykwd commented Feb 13, 2026

Glad that we now have a unified place to set and retrieve environment variables. I believe this will make things much more convenient for users.

@alogfans
Copy link
Collaborator

LGTM for TE part.

Copy link
Collaborator

@amd-arozanov amd-arozanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@alogfans alogfans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before merging it, pls fix the build issue of Mooncake TE.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants