
@raidshoebox1

What does this PR do?

Feature Description: Uses the KT_NUMA_CPU_OFFSET environment variable to load models from specified CPU cores. Prints whether the environment variable has been set, as well as the starting cpuid of the current NUMA node.

Use Case: Deploy multiple sglang instances on the same server while avoiding CPU core conflicts between the instances.

Example: Load models starting from cpuid 8 (NUMA 0) and cpuid 48 (NUMA 1), while cores with cpuid 0-7 and 40-47 remain idle:

```bash
export KT_NUMA_CPU_OFFSET=8
```

Terminal display information:

```
CPUInfer[0x390647e0]: Hello
WorkerPool[0x3e2249d0] 2 subpools, [numa:threads][0:14] [1:14]
KT_NUMA_CPU_OFFSET successfully set to 8
===========In NumaPool============
In Numa Worker Pool at NUMA 0, 14 threads, loading model from CPU 8
===========In NumaPool============
In Numa Worker Pool at NUMA 1, 14 threads, loading model from CPU 48
```

I have tested it on an Intel CPU with the AMXINT4 backend and on an AMD CPU with the MOE_INT8 backend. No issues have been found so far.

More testing is needed.

@gemini-code-assist

Summary of Changes

Hello @raidshoebox1, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements a new feature that provides fine-grained control over CPU core allocation for model loading within the kt-kernel system. By leveraging the KT_NUMA_CPU_OFFSET environment variable, users can now define a starting CPU core offset for each NUMA node, which is particularly beneficial for deploying multiple sglang instances on a single server without CPU core conflicts. The changes involve modifying the worker pool initialization logic to respect this offset and enhancing the output logs to clearly indicate the specific CPU cores being utilized.

Highlights

  • CPU Core Allocation Control: Introduced the KT_NUMA_CPU_OFFSET environment variable, allowing users to specify an offset for CPU core allocation when loading models. This enables more granular control over resource assignment.
  • Enhanced Logging: Improved logging to display the specific physical CPU ID from which models are loaded within each NUMA worker pool, providing clearer insights into resource utilization.
  • Documentation Update: Updated the kt-kernel/README.md to include comprehensive documentation for the new KT_NUMA_CPU_OFFSET environment variable, complete with an example demonstrating its usage.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist bot left a comment


Code Review

This pull request introduces a new feature to specify a CPU core offset for model loading using the KT_NUMA_CPU_OFFSET environment variable. This is useful for running multiple instances on the same server without CPU core conflicts. The changes include updates to the worker pool initialization logic to respect this offset and documentation for the new feature in the README.

My review focuses on improving code robustness, maintainability, and documentation clarity. Key feedback points include:

  • Refactoring duplicated code for reading the environment variable and using a safer parsing function (strtol instead of atoi).
  • Fixing several garbled comments in the C++ code, likely caused by character encoding issues, by translating them to English for consistency.
  • Enhancing the documentation in README.md to provide a clearer explanation and example of how the CPU offset works, especially in multi-NUMA setups.

Comment on lines +195 to +200
**Load from Specified CPU Cores:**
- Uses the `KT_NUMA_CPU_OFFSET` environment variable to load models from specified CPU cores.
- Example: Load models from cores with cpuid 8 (NUMA 0) and 48 (NUMA 1), while cores with cpuid 0-7 and 40-47 remain idle:
```bash
export KT_NUMA_CPU_OFFSET=8
```

Severity: medium

The explanation for KT_NUMA_CPU_OFFSET could be more explicit about how the offset is applied across multiple NUMA nodes. The current example is a bit brief. To avoid ambiguity for users with different hardware configurations, consider clarifying that the offset is applied to the logical core index within each NUMA node. A more detailed example, similar to the one in the pull request description, would be very helpful.
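To make that concrete, here is a minimal, self-contained sketch (not the kt-kernel implementation) assuming the offset is added to the first logical core index of each NUMA node; the node count and cores-per-node figures are hypothetical and chosen to match the cpuid 8 / cpuid 48 example in the pull request description:

```cpp
// Sketch only: illustrates the assumed mapping "physical cpuid = node base +
// KT_NUMA_CPU_OFFSET" on a hypothetical 2-node machine with 40 cores per node.
#include <cstdio>

int main() {
  const int num_nodes = 2;        // hypothetical NUMA node count
  const int cores_per_node = 40;  // hypothetical cores per NUMA node
  const int offset = 8;           // value of KT_NUMA_CPU_OFFSET

  for (int node = 0; node < num_nodes; ++node) {
    // First physical cpuid that this node's worker threads would bind to.
    int first_cpuid = node * cores_per_node + offset;
    std::printf("NUMA %d: model loading starts at cpuid %d\n", node, first_cpuid);
  }
  return 0;  // prints cpuid 8 for NUMA 0 and cpuid 48 for NUMA 1
}
```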

fprintf(stderr, "Failed to set thread name: %s\n", strerror(res_set_name));
}
// 检查线程是否成功命名

Severity: medium

This comment is garbled, likely due to a character encoding issue. For better code maintainability and consistency with other comments in this file, please translate it to English.

Suggested change
// 检查线程是否成功命名
// Check if the thread name was set successfully

}
auto start = std::chrono::high_resolution_clock::now();
WorkerPool::thread_local_id = thread_id; // 设置线程本地变量

Severity: medium

This comment is garbled, likely due to a character encoding issue. To improve readability and maintain consistency, it should be translated to English.

Suggested change
WorkerPool::thread_local_id = thread_id; // 设置线程本地变量
WorkerPool::thread_local_id = thread_id; // Set thread-local variable

Comment on lines +285 to +288
int offset = 0;
if (const char* env_p = std::getenv("KT_NUMA_CPU_OFFSET")) {
  offset = std::atoi(env_p);
}

Severity: medium

std::atoi is not safe for converting strings to integers because it doesn't handle parsing errors. For example, if KT_NUMA_CPU_OFFSET is set to a non-numeric value, atoi will silently return 0. It's better to use std::strtol which allows for error checking.

This logic is also duplicated in WorkerPool::init at lines 416-420. It would be best to extract this into a helper function to avoid code duplication and ensure consistent behavior (e.g., one location prints a message, the other doesn't).

  int offset = 0;
  if (const char* env_p = std::getenv("KT_NUMA_CPU_OFFSET")) {
    char* end;
    long val = std::strtol(env_p, &end, 10);
    if (end == env_p || *end != '\0') {
      fprintf(stderr, "Warning: Invalid value for KT_NUMA_CPU_OFFSET: '%s'. Using default 0.\n", env_p);
    } else {
      offset = static_cast<int>(val);
    }
  }
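
For illustration, one possible shape of such a shared helper; the function name read_numa_cpu_offset and its verbose parameter are hypothetical and not part of the existing code:

```cpp
// Hypothetical helper (sketch only): centralizes KT_NUMA_CPU_OFFSET parsing so
// that NumaJobDistributor::init and WorkerPool::init share one code path. The
// verbose flag accounts for the fact that only one call site currently prints.
#include <cstdio>
#include <cstdlib>

int read_numa_cpu_offset(bool verbose) {
  int offset = 0;
  if (const char* env_p = std::getenv("KT_NUMA_CPU_OFFSET")) {
    char* end = nullptr;
    long val = std::strtol(env_p, &end, 10);
    if (end == env_p || *end != '\0') {
      // Non-numeric or partially numeric value: warn and keep the default 0.
      std::fprintf(stderr, "Warning: Invalid value for KT_NUMA_CPU_OFFSET: '%s'. Using default 0.\n", env_p);
    } else {
      offset = static_cast<int>(val);
      if (verbose) {
        std::printf("KT_NUMA_CPU_OFFSET successfully set to %d\n", offset);
      }
    }
  }
  return offset;
}
```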

continue;
}
// 精简 cpuset

Severity: medium

This comment is garbled due to a character encoding issue. Please translate it to English for consistency and readability.

    // Simplify cpuset

hwloc_bitmap_copy(cpuset_simple, core_obj->cpuset);
hwloc_bitmap_singlify(cpuset_simple);
// 打印绑定的具体的 CPU 物理索引

Severity: medium

This comment is garbled, likely due to a character encoding issue. It should be translated to English to improve code clarity.

    // Print the specific physical CPU index to which it is bound

fprintf(stderr, "Failed to set thread CPU binding: %s\n", strerror(errno));
}
// 检查线程是否绑定到指定的核上了

Severity: medium

This comment is garbled due to a character encoding issue. For better readability and maintainability, please translate it to English.

    // Check if the thread is bound to the specified core

Comment on lines +416 to +420
int offset = 0;
if (const char* env_p = std::getenv("KT_NUMA_CPU_OFFSET")) {
  offset = std::atoi(env_p);
  printf("KT_NUMA_CPU_OFFSET successfully set to %d\n", offset);
}

Severity: medium

This logic for reading the KT_NUMA_CPU_OFFSET environment variable is a duplicate of the code in NumaJobDistributor::init (lines 285-288) and uses the unsafe std::atoi function.

As mentioned in the other comment, this should be refactored into a single helper function that uses std::strtol for safe parsing. This will improve robustness and maintainability by centralizing the logic.

  int offset = 0;
  if (const char* env_p = std::getenv("KT_NUMA_CPU_OFFSET")) {
    char* end;
    long val = std::strtol(env_p, &end, 10);
    if (end == env_p || *end != '\0') {
      fprintf(stderr, "Warning: Invalid value for KT_NUMA_CPU_OFFSET: '%s'. Using default 0.\n", env_p);
    } else {
      offset = static_cast<int>(val);
      printf("KT_NUMA_CPU_OFFSET successfully set to %d\n", offset);
    }
  }
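
The two call sites flagged in this review would then shrink to single calls. Again a sketch built on the hypothetical read_numa_cpu_offset helper above, with the enclosing functions reduced to stand-ins:

```cpp
// Sketch only: how NumaJobDistributor::init and WorkerPool::init could consume
// the hypothetical read_numa_cpu_offset helper after the refactor.
int read_numa_cpu_offset(bool verbose);  // defined once, shared by both call sites

void numa_job_distributor_init_excerpt() {  // stand-in for NumaJobDistributor::init
  int offset = read_numa_cpu_offset(/*verbose=*/false);  // silent variant
  (void)offset;  // ... existing initialization continues here ...
}

void worker_pool_init_excerpt() {  // stand-in for WorkerPool::init
  int offset = read_numa_cpu_offset(/*verbose=*/true);   // prints the confirmation line
  (void)offset;  // ... existing initialization continues here ...
}
```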
