[feat](kt-kernel): Load model from specified CPU cores #1712
Conversation
Summary of Changes

Hello @raidshoebox1, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed: this pull request implements a new feature that provides fine-grained control over CPU core allocation for model loading within kt-kernel.
Code Review
This pull request introduces a new feature to specify a CPU core offset for model loading using the KT_NUMA_CPU_OFFSET environment variable. This is useful for running multiple instances on the same server without CPU core conflicts. The changes include updates to the worker pool initialization logic to respect this offset and documentation for the new feature in the README.
My review focuses on improving code robustness, maintainability, and documentation clarity. Key feedback points include:
- Refactoring duplicated code for reading the environment variable and using a safer parsing function (`strtol` instead of `atoi`).
- Fixing several garbled comments in the C++ code, likely caused by character encoding issues, by translating them to English for consistency.
- Enhancing the documentation in `README.md` to provide a clearer explanation and example of how the CPU offset works, especially in multi-NUMA setups.
**Load from Specified CPU Cores:**
- Uses the `KT_NUMA_CPU_OFFSET` environment variable to load models from specified CPU cores.
- Example: Load models from cores with cpuid 8 (NUMA 0) and 48 (NUMA 1), while cores with cpuid 0-7 and 40-47 remain idle:

```bash
export KT_NUMA_CPU_OFFSET=8
```
The explanation for `KT_NUMA_CPU_OFFSET` could be more explicit about how the offset is applied across multiple NUMA nodes. The current example is a bit brief. To avoid ambiguity for users with different hardware configurations, consider clarifying that the offset is applied to the logical core index within each NUMA node. A more detailed example, similar to the one in the pull request description, would be very helpful.
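For instance, the README could spell out the mapping along these lines (a sketch; the two-node, 40-cores-per-node topology is an assumption drawn from the PR description's example):

```bash
# Assumed topology: 2 NUMA nodes with 40 logical cores each
#   NUMA 0: cpuids 0-39    NUMA 1: cpuids 40-79
# The offset is the logical core index within each NUMA node, so
# offset 8 starts loading at cpuid 8 (NUMA 0) and cpuid 48 (NUMA 1),
# leaving cpuids 0-7 and 40-47 idle for another instance.
export KT_NUMA_CPU_OFFSET=8
```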
| fprintf(stderr, "Failed to set thread name: %s\n", strerror(res_set_name)); | ||
| } | ||
| // 检查线程是否成功命名 | ||
| // 检查线程是否成功命名 |
```cpp
}
auto start = std::chrono::high_resolution_clock::now();
WorkerPool::thread_local_id = thread_id;  // Set the thread-local variable
```
```cpp
int offset = 0;
if (const char* env_p = std::getenv("KT_NUMA_CPU_OFFSET")) {
    offset = std::atoi(env_p);
}
```
`std::atoi` is not safe for converting strings to integers because it doesn't handle parsing errors. For example, if `KT_NUMA_CPU_OFFSET` is set to a non-numeric value, `atoi` will silently return 0. It's better to use `std::strtol`, which allows for error checking.

This logic is also duplicated in `WorkerPool::init` at lines 416-420. It would be best to extract it into a helper function to avoid code duplication and ensure consistent behavior (e.g., one location prints a message, the other doesn't).
```cpp
int offset = 0;
if (const char* env_p = std::getenv("KT_NUMA_CPU_OFFSET")) {
    char* end;
    long val = std::strtol(env_p, &end, 10);
    if (end == env_p || *end != '\0') {
        fprintf(stderr, "Warning: Invalid value for KT_NUMA_CPU_OFFSET: '%s'. Using default 0.\n", env_p);
    } else {
        offset = static_cast<int>(val);
    }
}
```

```cpp
    continue;
}
// Simplify the cpuset
```
```cpp
hwloc_bitmap_copy(cpuset_simple, core_obj->cpuset);
hwloc_bitmap_singlify(cpuset_simple);
// Print the specific physical CPU index being bound
```
| fprintf(stderr, "Failed to set thread CPU binding: %s\n", strerror(errno)); | ||
| } | ||
| // 检查线程是否绑定到指定的 核上了 | ||
| // 检查线程是否绑定到指定的 核上了 |
```cpp
int offset = 0;
if (const char* env_p = std::getenv("KT_NUMA_CPU_OFFSET")) {
    offset = std::atoi(env_p);
    printf("KT_NUMA_CPU_OFFSET successfully set to %d\n", offset);
}
```
This logic for reading the `KT_NUMA_CPU_OFFSET` environment variable is a duplicate of the code in `NumaJobDistributor::init` (lines 285-288) and uses the unsafe `std::atoi` function.

As mentioned in the other comment, this should be refactored into a single helper function that uses `std::strtol` for safe parsing. This will improve robustness and maintainability by centralizing the logic.
```cpp
int offset = 0;
if (const char* env_p = std::getenv("KT_NUMA_CPU_OFFSET")) {
    char* end;
    long val = std::strtol(env_p, &end, 10);
    if (end == env_p || *end != '\0') {
        fprintf(stderr, "Warning: Invalid value for KT_NUMA_CPU_OFFSET: '%s'. Using default 0.\n", env_p);
    } else {
        offset = static_cast<int>(val);
        printf("KT_NUMA_CPU_OFFSET successfully set to %d\n", offset);
    }
}
```
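Taken together with the earlier comment, the shared helper this review asks for might look like the sketch below (the function name `read_numa_cpu_offset` and the `announce` flag are hypothetical, not from the PR; the flag preserves the startup message at the one call site that prints it):

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical helper that centralizes KT_NUMA_CPU_OFFSET parsing with
// strtol-based error checking, so both call sites share one implementation.
static int read_numa_cpu_offset(bool announce) {
    int offset = 0;
    if (const char* env_p = std::getenv("KT_NUMA_CPU_OFFSET")) {
        char* end;
        long val = std::strtol(env_p, &end, 10);
        if (end == env_p || *end != '\0') {
            // Reject non-numeric or partially numeric values instead of
            // silently treating them as 0, as atoi would.
            std::fprintf(stderr,
                         "Warning: Invalid value for KT_NUMA_CPU_OFFSET: '%s'. Using default 0.\n",
                         env_p);
        } else {
            offset = static_cast<int>(val);
            if (announce) {
                std::printf("KT_NUMA_CPU_OFFSET successfully set to %d\n", offset);
            }
        }
    }
    return offset;
}
```

`NumaJobDistributor::init` could then call `read_numa_cpu_offset(false)` and `WorkerPool::init` could call `read_numa_cpu_offset(true)`, keeping the existing behavior of both call sites while removing the duplication.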
What does this PR do?
Feature Description: Uses the `KT_NUMA_CPU_OFFSET` environment variable to load models from specified CPU cores. Prints whether the environment variable has been set, as well as the starting cpuid of the current NUMA node.

Use Case: Deploy multiple sglang instances on the same server while avoiding CPU core conflicts between instances.

Example: Load models from cores with cpuid 8 (NUMA 0) and 48 (NUMA 1), while cores with cpuid 0-7 and 40-47 remain idle:

```bash
export KT_NUMA_CPU_OFFSET=8
```

Terminal display information:
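For the multi-instance use case, the intended layout could look like the following sketch (the launch scripts are placeholders; only the `KT_NUMA_CPU_OFFSET` values come from this PR):

```bash
# Instance A starts loading at cpuid 0 (NUMA 0) and cpuid 40 (NUMA 1):
KT_NUMA_CPU_OFFSET=0 ./launch_instance_a.sh &

# Instance B starts loading at cpuid 8 (NUMA 0) and cpuid 48 (NUMA 1),
# avoiding instance A's cores provided each instance's thread count
# stays within its 8-core slice per NUMA node:
KT_NUMA_CPU_OFFSET=8 ./launch_instance_b.sh &
```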
I have tested it on an Intel CPU with the AMXINT4 backend and on an AMD CPU with the MOE_INT8 backend. No issues have been found so far.
More testing is needed.