[feat](kt-kernel): Load model from specified CPU cores #1712
Conversation
Summary of Changes

Hello @raidshoebox1, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed: this pull request implements a new feature that provides fine-grained control over CPU core allocation for model loading within kt-kernel.
Code Review
This pull request introduces a new feature to specify a CPU core offset for model loading using the KT_NUMA_CPU_OFFSET environment variable. This is useful for running multiple instances on the same server without CPU core conflicts. The changes include updates to the worker pool initialization logic to respect this offset and documentation for the new feature in the README.
My review focuses on improving code robustness, maintainability, and documentation clarity. Key feedback points include:
- Refactoring duplicated code for reading the environment variable and using a safer parsing function (`strtol` instead of `atoi`).
- Fixing several garbled comments in the C++ code, likely caused by character encoding issues, by translating them to English for consistency.
- Enhancing the documentation in `README.md` to provide a clearer explanation and example of how the CPU offset works, especially in multi-NUMA setups.
**Load from Specified CPU Cores:**
- Uses the `KT_NUMA_CPU_OFFSET` environment variable to load models from specified CPU cores.
- Example: Load models from cores with cpuid 8 (NUMA 0) and 48 (NUMA 1), while cores with cpuid 0-7 and 40-47 remain idle:

```bash
export KT_NUMA_CPU_OFFSET=8
```
The explanation for `KT_NUMA_CPU_OFFSET` could be more explicit about how the offset is applied across multiple NUMA nodes. The current example is a bit brief. To avoid ambiguity for users with different hardware configurations, consider clarifying that the offset is applied to the logical core index within each NUMA node. A more detailed example, similar to the one in the pull request description, would be very helpful.
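For instance, the README could spell out the mapping along these lines (a sketch; the two-node, 40-cores-per-node topology is an assumption drawn from the PR description's example):

```bash
# Assumed topology: 2 NUMA nodes with 40 logical cores each
#   NUMA 0: cpuids 0-39    NUMA 1: cpuids 40-79
# The offset is the logical core index within each NUMA node, so
# offset 8 starts loading at cpuid 8 (NUMA 0) and cpuid 48 (NUMA 1),
# leaving cpuids 0-7 and 40-47 idle for another instance.
export KT_NUMA_CPU_OFFSET=8
```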
| fprintf(stderr, "Failed to set thread name: %s\n", strerror(res_set_name)); | ||
| } | ||
| // 检查线程是否成功命名 | ||
| // 检查线程是否成功命名 |
```cpp
}
auto start = std::chrono::high_resolution_clock::now();
WorkerPool::thread_local_id = thread_id;  // Set the thread-local variable
```
```cpp
int offset = 0;
if (const char* env_p = std::getenv("KT_NUMA_CPU_OFFSET")) {
    offset = std::atoi(env_p);
}
```
`std::atoi` is not safe for converting strings to integers because it doesn't handle parsing errors. For example, if `KT_NUMA_CPU_OFFSET` is set to a non-numeric value, `atoi` will silently return 0. It's better to use `std::strtol`, which allows for error checking.

This logic is also duplicated in `WorkerPool::init` at lines 416-420. It would be best to extract it into a helper function to avoid code duplication and ensure consistent behavior (e.g., one location prints a message, the other doesn't).
```cpp
int offset = 0;
if (const char* env_p = std::getenv("KT_NUMA_CPU_OFFSET")) {
    char* end;
    long val = std::strtol(env_p, &end, 10);
    if (end == env_p || *end != '\0') {
        fprintf(stderr, "Warning: Invalid value for KT_NUMA_CPU_OFFSET: '%s'. Using default 0.\n", env_p);
    } else {
        offset = static_cast<int>(val);
    }
}
```

```cpp
    continue;
}
// Simplify the cpuset
```
```cpp
hwloc_bitmap_copy(cpuset_simple, core_obj->cpuset);
hwloc_bitmap_singlify(cpuset_simple);
// Print the specific physical CPU index being bound
```
| fprintf(stderr, "Failed to set thread CPU binding: %s\n", strerror(errno)); | ||
| } | ||
| // 检查线程是否绑定到指定的 核上了 | ||
| // 检查线程是否绑定到指定的 核上了 |
```cpp
int offset = 0;
if (const char* env_p = std::getenv("KT_NUMA_CPU_OFFSET")) {
    offset = std::atoi(env_p);
    printf("KT_NUMA_CPU_OFFSET successfully set to %d\n", offset);
}
```
This logic for reading the `KT_NUMA_CPU_OFFSET` environment variable is a duplicate of the code in `NumaJobDistributor::init` (lines 285-288) and uses the unsafe `std::atoi` function.

As mentioned in the other comment, this should be refactored into a single helper function that uses `std::strtol` for safe parsing. This will improve robustness and maintainability by centralizing the logic.
```cpp
int offset = 0;
if (const char* env_p = std::getenv("KT_NUMA_CPU_OFFSET")) {
    char* end;
    long val = std::strtol(env_p, &end, 10);
    if (end == env_p || *end != '\0') {
        fprintf(stderr, "Warning: Invalid value for KT_NUMA_CPU_OFFSET: '%s'. Using default 0.\n", env_p);
    } else {
        offset = static_cast<int>(val);
        printf("KT_NUMA_CPU_OFFSET successfully set to %d\n", offset);
    }
}
```
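Taken together with the earlier comment, the shared helper this review asks for might look like the sketch below (the function name `read_numa_cpu_offset` and the `announce` flag are hypothetical, not from the PR; the flag preserves the startup message at the one call site that prints it):

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical helper that centralizes KT_NUMA_CPU_OFFSET parsing with
// strtol-based error checking, so both call sites share one implementation.
static int read_numa_cpu_offset(bool announce) {
    int offset = 0;
    if (const char* env_p = std::getenv("KT_NUMA_CPU_OFFSET")) {
        char* end;
        long val = std::strtol(env_p, &end, 10);
        if (end == env_p || *end != '\0') {
            // Reject non-numeric or partially numeric values instead of
            // silently treating them as 0, as atoi would.
            std::fprintf(stderr,
                         "Warning: Invalid value for KT_NUMA_CPU_OFFSET: '%s'. Using default 0.\n",
                         env_p);
        } else {
            offset = static_cast<int>(val);
            if (announce) {
                std::printf("KT_NUMA_CPU_OFFSET successfully set to %d\n", offset);
            }
        }
    }
    return offset;
}
```

`NumaJobDistributor::init` could then call `read_numa_cpu_offset(false)` and `WorkerPool::init` could call `read_numa_cpu_offset(true)`, keeping the existing behavior of both call sites while removing the duplication.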
What does this PR do?
Feature Description: Uses the `KT_NUMA_CPU_OFFSET` environment variable to load models from specified CPU cores. Prints whether the environment variable has been set, as well as the starting cpuid of the current NUMA node.

Use Case: Deploy multiple sglang instances on the same server while avoiding CPU core conflicts between instances.

Example: Load models from cores with cpuid 8 (NUMA 0) and 48 (NUMA 1), while cores with cpuid 0-7 and 40-47 remain idle:

```bash
export KT_NUMA_CPU_OFFSET=8
```

Terminal display information:
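For the multi-instance use case, the intended layout could look like the following sketch (the launch scripts are placeholders; only the `KT_NUMA_CPU_OFFSET` values come from this PR):

```bash
# Instance A starts loading at cpuid 0 (NUMA 0) and cpuid 40 (NUMA 1):
KT_NUMA_CPU_OFFSET=0 ./launch_instance_a.sh &

# Instance B starts loading at cpuid 8 (NUMA 0) and cpuid 48 (NUMA 1),
# avoiding instance A's cores provided each instance's thread count
# stays within its 8-core slice per NUMA node:
KT_NUMA_CPU_OFFSET=8 ./launch_instance_b.sh &
```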
I have tested it on an Intel CPU with the AMXINT4 backend and on an AMD CPU with the MOE_INT8 backend. No issues have been found so far.
More testing is needed.