[WIP] feat(codegen): Add PTO2 V2 orchestration codegen with scope inference #178
base: main
Conversation
Summary of Changes

Hello @zhusy54, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly upgrades the orchestration code generation capabilities by introducing a V2 system for PTO2. It features automatic scope inference for optimized intermediate tensor management, a new …

Highlights
Activity
Code Review
This pull request introduces a significant new feature: V2 orchestration code generation with automatic scope inference for tensor lifetime management. The changes are extensive and well-structured, particularly the refactoring of orchestration codegen to use visitor patterns (OrchestrationStmtCodegen and OrchestrationStmtCodegenV2) and the introduction of an OrchestrationOpRegistry for host-side tensor operations. The addition of the tensor.dim operator and comprehensive new tests are also valuable. I've found a critical issue in the V2 codegen path for tensor.read and a use of undefined behavior in a helper function, along with a minor maintainability concern. Overall, this is a great step forward for the codegen capabilities.
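The review mentions the new `OrchestrationOpRegistry` for host-side tensor ops, and the quoted diff below shows its use (`GetInstance()`, then `Get(op_name)` returning an optional codegen function). As a rough illustration only, here is a minimal Python sketch of that dispatch shape; the class and method names mirror the C++ ones, but the signatures and the registered op are hypothetical:

```python
# Hypothetical sketch of an op-codegen registry: a singleton maps host-side
# op names to functions that return generated code strings, and lookup
# returns None for unregistered ops (the C++ version returns an optional).
from typing import Callable, Dict, Optional


class OrchestrationOpRegistry:
    _instance: Optional["OrchestrationOpRegistry"] = None

    def __init__(self) -> None:
        self._ops: Dict[str, Callable[[str], str]] = {}

    @classmethod
    def get_instance(cls) -> "OrchestrationOpRegistry":
        # Lazily create the single shared registry instance.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def register(self, op_name: str, fn: Callable[[str], str]) -> None:
        self._ops[op_name] = fn

    def get(self, op_name: str) -> Optional[Callable[[str], str]]:
        # None plays the role of an empty optional: op has no host codegen.
        return self._ops.get(op_name)


registry = OrchestrationOpRegistry.get_instance()
registry.register("tensor.dim", lambda arg: f"int64_t dim = {arg};")

codegen = registry.get("tensor.dim")
print(codegen("shape[0]") if codegen else "<unregistered>")
# prints: int64_t dim = shape[0];
```

The review's point then becomes a property of the registered functions themselves: a single registry entry per op only works if the generated code is valid in every codegen context (V1 and V2) that consults it.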
    current_result_var_ = result_var;
    auto& registry = OrchestrationOpRegistry::GetInstance();
    auto codegen_func = registry.Get(op_name);
    if (codegen_func.has_value()) {
      std::string gen_code = (*codegen_func)(call, *this);
      std::istringstream iss(gen_code);
      std::string line;
      while (std::getline(iss, line)) {
        if (!line.empty()) {
          code_ << Indent() << line << "\n";
        }
      }
    }
  }
  // tensor.view, tensor.reshape etc. are metadata-only in V2 as well for now
The V2 codegen for tensor.read reuses the registered V1 codegen function. This V1 function generates code that depends on host_<var> variables, which are not defined in the V2 codegen context (it uses arg_<var>_ptr instead). This will cause a compilation error. The tensor.read operation needs a V2-specific implementation that uses the correct host pointer variables.
} else if (op_name == "tensor.read") {
  // V2 implementation for tensor.read, using arg_<name>_ptr instead of host_<name>
  CHECK(call->args_.size() == 2) << "tensor.read requires 2 arguments";
  std::string input_name = TryGetVarName(call->args_[0]);
  auto input_type = As<TensorType>(call->args_[0]->GetType());
  auto result_type = As<ScalarType>(call->GetType());
  auto indices_tuple = As<MakeTuple>(call->args_[1]);
  CHECK(input_name != "" && input_type && result_type && indices_tuple);
  // Build the row-major flattened index: idx = sum_i index_i * prod_{j>i} shape_j
  std::ostringstream idx_oss;
  for (size_t i = 0; i < indices_tuple->elements_.size(); ++i) {
    if (i > 0) idx_oss << " + ";
    idx_oss << GenerateExprString(indices_tuple->elements_[i]);
    for (size_t j = i + 1; j < input_type->shape_.size(); ++j) {
      idx_oss << " * " << GenerateExprString(input_type->shape_[j]);
    }
  }
  code_ << Indent() << "size_t idx_" << result_var << " = "
        << (idx_oss.str().empty() ? "0" : idx_oss.str()) << ";\n";
  code_ << Indent() << result_type->dtype_.ToCTypeString() << " " << result_var
        << " = static_cast<" << result_type->dtype_.ToCTypeString()
        << "*>(arg_" << input_name << "_ptr)[idx_" << result_var << "];\n";
}

    oss << "// Helper to encode float as uint64_t for scalar params\n";
    oss << "static uint64_t float_to_u64(float f) {\n";
    oss << "  union {\n";
    oss << "    float f32;\n";
    oss << "    uint64_t u64;\n";
    oss << "  } conv;\n";
    oss << "  conv.u64 = 0;  // Clear upper bits\n";
    oss << "  conv.f32 = f;\n";
    oss << "  return conv.u64;\n";
    oss << "}\n\n";
The float_to_u64 function uses a union for type-punning, which is undefined behavior in C++. While it might appear to work on some compilers, it's not guaranteed to be portable or correct. A safer, standard-compliant way to perform this bit-level conversion is to use std::memcpy. Please also ensure #include <cstring> is present in the file.
static uint64_t float_to_u64(float f) {
  uint32_t u32_val;
  static_assert(sizeof(float) == sizeof(uint32_t), "float and uint32_t must have the same size");
  std::memcpy(&u32_val, &f, sizeof(float));
  return static_cast<uint64_t>(u32_val);
}

    // Re-indent the task code: replace leading 4-space indent with 8-space
    std::istringstream iss(task_records[tid].code);
    std::string line;
    while (std::getline(iss, line)) {
      if (line.empty()) {
        oss << "\n";
      } else if (line.substr(0, 4) == "    ") {
        oss << "        " << line.substr(4) << "\n";
      } else {
        oss << "        " << line << "\n";
      }
    }
The logic for re-indenting the task code for the inner PTO2_SCOPE is fragile. It relies on string manipulation (line.substr(0, 4) == " ") which assumes that the task code generated by OrchestrationStmtCodegenV2 will always have a 4-space indent. If that indentation changes in the future, this logic will break or produce incorrectly formatted code. Consider making this more robust, for example by having the task generation logic not produce any indentation and letting the caller handle it completely.
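One way to realize the reviewer's suggestion, sketched here in Python rather than the project's C++ (the function name and signature are illustrative, not from the PR): have the task generator emit code flush-left, and let the caller apply whatever indent the enclosing `PTO2_SCOPE` needs.

```python
# Sketch: the generator emits task code with no leading indentation; the
# caller chooses the indent depth, so no assumption about a pre-existing
# 4-space prefix is baked into string manipulation.
import textwrap


def indent_task_code(code: str, levels: int, unit: str = "    ") -> str:
    """Indent every non-blank line of `code` by `levels` indent units."""
    return textwrap.indent(code, unit * levels,
                           predicate=lambda line: line.strip() != "")


body = "auto t = make_tensor(...);\n\nsubmit(t);\n"
print(indent_task_code(body, 2))
```

With this split, changing the scope nesting depth is a one-argument change at the call site instead of a rewrite of the substring logic.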
…sole version Delete V1 orchestration code generation (Runtime API, BuildXXX signature, device_malloc, add_task/add_successor pattern) and rename V2 as the only implementation. Update tensor op registry to emit V2-style code (make_tensor, data pointer via GetTensorDataPtr). Rewrite tests to cover the unified orchestration format with new cases for tuple intermediates, tuple outputs, tensor.create, and tensor.dim.
Summary
Add PTO2 (V2) orchestration code generation targeting the `PTO2Runtime*` API, with automatic `PTO2_SCOPE` inference for intermediate tensor lifetime management.

Changes

V2 Orchestration Codegen (`orchestration_codegen.cpp`)

- `GenerateOrchestrationV2()` generating PTO2-format C++ code: `#include "pto_orchestration_api.h"`, `ARG_PTR_`/`ARG_SIZE_` defines
- `make_tensor_external()` for params/returns, `make_tensor()` for intermediates
- `PTOParam` arrays with `make_input_param`/`make_output_param`/`make_scalar_param`
- `pto2_rt_submit_task()` with func_id, worker type, kernel name
- `float_to_u64()` helper for float scalar params
- `PTO2OrchestrationConfig` via `aicpu_orchestration_config()`
- `OrchestrationStmtCodegenV2` visitor handling function calls, for-loops, tensor ops, scalar/bool constants, and tuple returns
- `TaskRecord`-based scope analysis: tasks with all-external inputs → outer scope; others → `PTO2_SCOPE(rt) { ... }` inner scope

… (`kernel_config_v2.py`)

CCE Codegen Integration (`cce_codegen.cpp`, `cce_codegen.h`)

- `CCECodegen::Generate()` now emits both V1 and V2 orchestration files (`<name>.cpp` + `<name>_v2.cpp`)
- `GenerateConfigFileV2()` for PTO2 kernel config

Tests (`test_orchestration_codegen.py`)

- `TestOrchestrationV2` class with 4 test cases:
  - `test_v2_basic_structure`: V2 format, includes, ARG defines, external/intermediate tensors, PTO2_SCOPE
  - `test_v2_config_file`: V2 config file generation
  - `test_v2_independent_tasks`: all-external tasks → no PTO2_SCOPE
  - `test_v2_vector_example_dag`: 5-task DAG matching the `vector_example` reference (`kernel_add`, `kernel_add_scalar`, `kernel_mul`), scalar params via `float_to_u64`, PTO2_SCOPE wrapping inner tasks

Scope Inference Algorithm
Tasks are classified based on their input tensor dependencies:

- Outer scope: all input tensors are external (function parameters)
- Inner scope (`PTO2_SCOPE(rt) { ... }`): any input tensor is an intermediate (produced by another task)

Intermediate tensors produced by outer tasks are declared before the scope; inner-only intermediates are declared inside the scope for proper lifetime management.
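The classification rule above can be sketched as follows. This is Python pseudocode of the stated rule only, not the PR's actual `TaskRecord`-based implementation; the data layout and names are illustrative:

```python
# Sketch of the scope-inference rule: a task whose inputs are all external
# tensors (kernel params) runs in the outer scope; a task that consumes any
# intermediate (a tensor produced by another task) goes inside PTO2_SCOPE.
from typing import Dict, List, Tuple


def classify_tasks(
    tasks: Dict[str, Dict[str, List[str]]]
) -> Tuple[List[str], List[str]]:
    # Every tensor produced by some task is an intermediate; anything else
    # a task reads is assumed external here.
    intermediates = {t for rec in tasks.values() for t in rec["outputs"]}
    outer, inner = [], []
    for name, rec in tasks.items():
        if any(inp in intermediates for inp in rec["inputs"]):
            inner.append(name)  # must run inside PTO2_SCOPE(rt) { ... }
        else:
            outer.append(name)  # all inputs external -> outer scope
    return outer, inner


tasks = {
    "t0": {"inputs": ["a", "b"], "outputs": ["tmp0"]},    # all-external inputs
    "t1": {"inputs": ["tmp0", "c"], "outputs": ["out"]},  # consumes tmp0
}
print(classify_tasks(tasks))
# prints: (['t0'], ['t1'])
```

Under this split, `tmp0` (produced by the outer task `t0`) would be declared before the scope, while intermediates produced only by inner tasks would be declared inside it, matching the lifetime rule above.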