
Conversation


@zhusy54 zhusy54 commented Feb 12, 2026

Summary

Stacked on #171, please merge #171 first.

Add PTO2 (V2) orchestration code generation targeting the PTO2Runtime* API, with automatic PTO2_SCOPE inference for intermediate tensor lifetime management.

Changes

V2 Orchestration Codegen (orchestration_codegen.cpp)

  • New GenerateOrchestrationV2() generating PTO2-format C++ code:
    • #include "pto_orchestration_api.h", ARG_PTR_/ARG_SIZE_ defines
    • make_tensor_external() for params/returns, make_tensor() for intermediates
    • PTOParam arrays with make_input_param/make_output_param/make_scalar_param
    • pto2_rt_submit_task() with func_id, worker type, kernel name
    • float_to_u64() helper for float scalar params
    • PTO2OrchestrationConfig via aicpu_orchestration_config()
  • OrchestrationStmtCodegenV2 visitor handling function calls, for-loops, tensor ops, scalar/bool constants, and tuple returns
  • TaskRecord-based scope analysis: tasks with all-external inputs → outer scope; others → PTO2_SCOPE(rt) { ... } inner scope
  • V2 config file generation (kernel_config_v2.py)

CCE Codegen Integration (cce_codegen.cpp, cce_codegen.h)

  • CCECodegen::Generate() now emits both V1 and V2 orchestration files (<name>.cpp + <name>_v2.cpp)
  • GenerateConfigFileV2() for PTO2 kernel config

Tests (test_orchestration_codegen.py)

  • TestOrchestrationV2 class with 4 test cases:
    • test_v2_basic_structure: V2 format, includes, ARG defines, external/intermediate tensors, PTO2_SCOPE
    • test_v2_config_file: V2 config file generation
    • test_v2_independent_tasks: all-external tasks → no PTO2_SCOPE
    • test_v2_vector_example_dag: 5-task DAG matching vector_example reference (kernel_add, kernel_add_scalar, kernel_mul), scalar params via float_to_u64, PTO2_SCOPE wrapping inner tasks

Scope Inference Algorithm

Tasks are classified based on their input tensor dependencies:

  • Outer scope: all input tensors are external (function params or return tensors)
  • Inner scope (PTO2_SCOPE(rt) { ... }): any input tensor is an intermediate (produced by another task)

Intermediate tensors produced by outer tasks are declared before the scope; inner-only intermediates are declared inside the scope for proper lifetime management.
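
To make the classification concrete, here is a hand-written sketch of the kind of file GenerateOrchestrationV2() produces for a two-task chain: t0 reads only external tensors (outer scope) and t1 consumes t0's intermediate (inner scope). The API names (make_tensor_external, make_tensor, make_input_param, make_output_param, make_scalar_param, pto2_rt_submit_task, PTO2_SCOPE, float_to_u64) come from the PR description; the Orchestrate entry point, the exact signatures, the shapes, and the WORKER_AIV constant are assumptions for illustration only.

    #include "pto_orchestration_api.h"

    // Hypothetical entry point; the real generator plumbs arguments through
    // ARG_PTR_/ARG_SIZE_ #defines rather than locals.
    void Orchestrate(PTO2Runtime* rt, void** args) {
      void* arg_x_ptr   = args[0];  // external input buffer
      void* arg_out_ptr = args[1];  // external output buffer

      // External tensors wrap caller-provided buffers.
      auto x   = make_tensor_external(rt, arg_x_ptr,   /*shape=*/{1024});
      auto out = make_tensor_external(rt, arg_out_ptr, /*shape=*/{1024});

      // t0: every input is external -> outer scope. Its result feeds t1,
      // so the intermediate is declared before the scope opens.
      auto tmp = make_tensor(rt, /*shape=*/{1024});
      PTOParam p0[] = {make_input_param(x), make_output_param(tmp),
                       make_scalar_param(float_to_u64(2.0f))};
      pto2_rt_submit_task(rt, /*func_id=*/0, WORKER_AIV, "kernel_add_scalar", p0, 3);

      // t1: consumes an intermediate -> wrapped in PTO2_SCOPE so the runtime
      // can reclaim tmp's storage when the scope closes.
      PTO2_SCOPE(rt) {
        PTOParam p1[] = {make_input_param(tmp), make_output_param(out)};
        pto2_rt_submit_task(rt, /*func_id=*/1, WORKER_AIV, "kernel_mul", p1, 2);
      }
    }

Note how tmp is declared before PTO2_SCOPE because its producer t0 runs in the outer scope, matching the rule above.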

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @zhusy54, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly upgrades the orchestration code generation capabilities by introducing a V2 system for PTO2. It features automatic scope inference for optimized intermediate tensor management, a new tensor.dim operation for extracting tensor dimensions, and a more modular architecture for host-side tensor operations. The changes ensure that both V1 and V2 orchestration outputs are generated, providing backward compatibility while enabling advanced features for the PTO2 runtime.

Highlights

  • PTO2 V2 Orchestration Codegen: Introduced a new V2 orchestration code generation targeting the PTO2Runtime API, which includes automatic PTO2_SCOPE inference for efficient intermediate tensor lifetime management.
  • Enhanced Tensor Operations: Added a new tensor.dim operation to extract shape dimensions from tensors, supporting both positive and negative indexing, and integrated it into the Python language frontend.
  • Modular Codegen Architecture: Refactored the code generation by introducing OrchestrationOpRegistry and tensor_op_codegen.cpp to modularize host-side tensor operation code generation, improving extensibility (a sketch of the registry pattern follows this list).
  • Dual Orchestration Output: Modified the CCE Codegen to emit both V1 and V2 orchestration files (.cpp and _v2.cpp) along with their respective configuration files (kernel_config.py and kernel_config_v2.py).
  • Comprehensive Testing: Added extensive unit tests for the new V2 orchestration, covering basic structure, config file generation, independent tasks, and complex DAGs with scalar parameters and scope inference.
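
As a reference for reviewers, here is a minimal sketch of the registry pattern this refactor introduces, kept consistent with the call sites quoted later in the review (GetInstance(), Get(op_name) returning an optional callable invoked with (call, *this)). The Call type, the callback signature, and the macro body are assumptions; the real definitions live in orchestration_op_registry.h/.cpp.

    #include <functional>
    #include <optional>
    #include <string>
    #include <unordered_map>
    #include <utility>

    struct Call;                        // IR call node (assumed)
    class OrchestrationStmtCodegenV2;   // codegen visitor (assumed)

    // Callback that emits host-side C++ for one op invocation.
    using OpCodegenFn =
        std::function<std::string(const Call*, OrchestrationStmtCodegenV2&)>;

    class OrchestrationOpRegistry {
     public:
      static OrchestrationOpRegistry& GetInstance() {
        static OrchestrationOpRegistry instance;  // process-wide singleton
        return instance;
      }
      void Register(const std::string& op_name, OpCodegenFn fn) {
        ops_[op_name] = std::move(fn);
      }
      // Empty optional means the op has no host-side codegen.
      std::optional<OpCodegenFn> Get(const std::string& op_name) const {
        auto it = ops_.find(op_name);
        if (it == ops_.end()) return std::nullopt;
        return it->second;
      }
     private:
      std::unordered_map<std::string, OpCodegenFn> ops_;
    };

    // One registration per op, run at static-initialization time.
    #define REGISTER_ORCHESTRATION_OP(op_name, fn)    \
      static const bool registered_##fn =             \
          (OrchestrationOpRegistry::GetInstance().Register(op_name, fn), true)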


Changelog
  • CMakeLists.txt
    • Added new source files for codegen_base, orchestration_op_registry, and tensor_op_codegen.
  • include/pypto/codegen/cce/cce_codegen.h
    • Declared GenerateConfigFileV2 method.
  • include/pypto/codegen/codegen_base.h
    • Added static methods TryGetVarName and GenerateExprString.
  • include/pypto/codegen/orchestration/orchestration_codegen.h
    • Declared GenerateOrchestrationV2 function.
  • include/pypto/codegen/orchestration_op_registry.h
    • Added new header defining OrchestrationOpRegistry for host-side operation codegen.
  • python/pypto/ir/op/tensor_ops.py
    • Added dim function to extract shape dimensions from tensors.
  • python/pypto/language/__init__.py
    • Imported and exposed the new dim function.
  • python/pypto/language/op/__init__.py
    • Imported and exposed the new dim function.
  • python/pypto/language/op/tensor_ops.py
    • Added dim function for Tensor objects.
  • src/codegen/cce/cce_codegen.cpp
    • Modified Generate to produce both V1 and V2 orchestration and config files.
    • Implemented GenerateConfigFileV2 for PTO2 kernel configuration.
  • src/codegen/codegen_base.cpp
    • Added new source file implementing TryGetVarName and GenerateExprString.
  • src/codegen/orchestration/orchestration_codegen.cpp
    • Updated argument extraction and device memory allocation to support multiple return tensors.
    • Refactored OrchestrationInfoCollector to handle tuple returns and intermediate tensors.
    • Introduced OrchestrationStmtCodegen for structured code generation.
    • Implemented GenerateOrchestrationV2 for PTO2 runtime API, including scope inference and task submission.
    • Introduced OrchestrationStmtCodegenV2 for V2-specific statement generation.
  • src/codegen/orchestration_op_registry.cpp
    • Added new source file implementing the OrchestrationOpRegistry singleton.
  • src/codegen/tensor_op_codegen.cpp
    • Added new source file implementing REGISTER_ORCHESTRATION_OP for tensor.create, tensor.read, and tensor.dim.
  • src/ir/op/tensor_ops/memory.cpp
    • Added DeduceTensorDimType and registered the tensor.dim operation (an axis-normalization sketch follows this changelog).
  • tests/ut/codegen/test_orchestration_codegen.py
    • Added new unit tests for V2 orchestration, covering basic structure, config file, independent tasks, and a complex DAG example.
    • Added unit tests for V1 orchestration with single/tuple returns, dependencies, and tensor operations.
  • tests/ut/ir/operators/test_tensor_ops.py
    • Added tests for the new tensor.dim operation, including negative axis indexing.
    • Updated operator registration test to include tensor.dim.
  • tests/ut/language/test_unified_ops.py
    • Added a test for the promoted pl.dim function.
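
For context on the negative indexing mentioned above, the axis handling for tensor.dim plausibly reduces to the usual rank-relative mapping. A self-contained sketch; NormalizeAxis is a hypothetical name and the real checks in DeduceTensorDimType may differ:

    #include <stdexcept>

    // Map a possibly negative axis into [0, rank): axis -1 refers to the last
    // dimension, -rank to the first. Out-of-range axes are rejected.
    int NormalizeAxis(int axis, int rank) {
      if (axis < 0) axis += rank;
      if (axis < 0 || axis >= rank) {
        throw std::out_of_range("tensor.dim: axis out of range");
      }
      return axis;
    }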
Activity
  • No human activity (comments, reviews) has been recorded on this pull request yet.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant new feature: V2 orchestration code generation with automatic scope inference for tensor lifetime management. The changes are extensive and well-structured, particularly the refactoring of orchestration codegen to use visitor patterns (OrchestrationStmtCodegen and OrchestrationStmtCodegenV2) and the introduction of an OrchestrationOpRegistry for host-side tensor operations. The addition of the tensor.dim operator and comprehensive new tests are also valuable. I've found a critical issue in the V2 codegen path for tensor.read and a use of undefined behavior in a helper function, along with a minor maintainability concern. Overall, this is a great step forward for the codegen capabilities.

Comment on lines 1103 to 1117
    current_result_var_ = result_var;
    auto& registry = OrchestrationOpRegistry::GetInstance();
    auto codegen_func = registry.Get(op_name);
    if (codegen_func.has_value()) {
      std::string gen_code = (*codegen_func)(call, *this);
      std::istringstream iss(gen_code);
      std::string line;
      while (std::getline(iss, line)) {
        if (!line.empty()) {
          code_ << Indent() << line << "\n";
        }
      }
    }
  }
  // tensor.view, tensor.reshape etc. are metadata-only in V2 as well for now

critical

The V2 codegen for tensor.read reuses the registered V1 codegen function. This V1 function generates code that depends on host_<var> variables, which are not defined in the V2 codegen context (it uses arg_<var>_ptr instead). This will cause a compilation error. The tensor.read operation needs a V2-specific implementation that uses the correct host pointer variables.

    } else if (op_name == "tensor.read") {
        // V2 implementation for tensor.read, using arg_<name>_ptr instead of host_<name>
        CHECK(call->args_.size() == 2) << "tensor.read requires 2 arguments";
        std::string input_name = TryGetVarName(call->args_[0]);
        auto input_type = As<TensorType>(call->args_[0]->GetType());
        auto result_type = As<ScalarType>(call->GetType());
        auto indices_tuple = As<MakeTuple>(call->args_[1]);
        CHECK(input_name != "" && input_type && result_type && indices_tuple);

        std::ostringstream idx_oss;
        for (size_t i = 0; i < indices_tuple->elements_.size(); ++i) {
            if (i > 0) idx_oss << " + ";
            idx_oss << GenerateExprString(indices_tuple->elements_[i]);
            for (size_t j = i + 1; j < input_type->shape_.size(); ++j) {
                idx_oss << " * " << GenerateExprString(input_type->shape_[j]);
            }
        }

        code_ << Indent() << "size_t idx_" << result_var << " = " << (idx_oss.str().empty() ? "0" : idx_oss.str()) << ";\n";
        code_ << Indent() << result_type->dtype_.ToCTypeString() << " " << result_var << " = static_cast<"
              << result_type->dtype_.ToCTypeString() << "*>(arg_" << input_name << "_ptr)[idx_" << result_var << "];\n";
    }

Comment on lines +878 to +887
oss << "// Helper to encode float as uint64_t for scalar params\n";
oss << "static uint64_t float_to_u64(float f) {\n";
oss << " union {\n";
oss << " float f32;\n";
oss << " uint64_t u64;\n";
oss << " } conv;\n";
oss << " conv.u64 = 0; // Clear upper bits\n";
oss << " conv.f32 = f;\n";
oss << " return conv.u64;\n";
oss << "}\n\n";

high

The float_to_u64 function uses a union for type-punning, which is undefined behavior in C++. While it might appear to work on some compilers, it's not guaranteed to be portable or correct. A safer, standard-compliant way to perform this bit-level conversion is to use std::memcpy. Please also ensure #include <cstring> is present in the file.

static uint64_t float_to_u64(float f) {
    uint32_t u32_val;
    static_assert(sizeof(float) == sizeof(uint32_t), "float and uint32_t must have the same size");
    std::memcpy(&u32_val, &f, sizeof(float));
    return static_cast<uint64_t>(u32_val);
}
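
A related note: on a C++20 toolchain the same conversion can be expressed with std::bit_cast, which is constexpr-capable and states the intent directly. This assumes the project can require C++20; the memcpy version above works on any conforming compiler.

    #include <bit>      // std::bit_cast (C++20)
    #include <cstdint>

    static uint64_t float_to_u64(float f) {
        // Bit-exact reinterpretation of the 32-bit float pattern, zero-extended.
        return static_cast<uint64_t>(std::bit_cast<uint32_t>(f));
    }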

Comment on lines +1440 to +1451
// Re-indent the task code: replace leading 4-space indent with 8-space
std::istringstream iss(task_records[tid].code);
std::string line;
while (std::getline(iss, line)) {
if (line.empty()) {
oss << "\n";
} else if (line.substr(0, 4) == " ") {
oss << " " << line.substr(4) << "\n";
} else {
oss << " " << line << "\n";
}
}

medium

The logic for re-indenting the task code for the inner PTO2_SCOPE is fragile. It relies on string manipulation (line.substr(0, 4) == " ") which assumes that the task code generated by OrchestrationStmtCodegenV2 will always have a 4-space indent. If that indentation changes in the future, this logic will break or produce incorrectly formatted code. Consider making this more robust, for example by having the task generation logic not produce any indentation and letting the caller handle it completely.
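
One way to realize that suggestion: have OrchestrationStmtCodegenV2 emit task code flush-left, and give the caller a small helper that applies whatever indent the surrounding scope needs, so nesting depth is decided in exactly one place. A sketch; the helper name EmitIndented is hypothetical:

    #include <sstream>
    #include <string>

    // Prepend `indent` to every non-empty line of `code`. Task codegen emits
    // flush-left lines, so only the caller decides the final nesting depth.
    static void EmitIndented(std::ostringstream& oss, const std::string& code,
                             const std::string& indent) {
        std::istringstream iss(code);
        std::string line;
        while (std::getline(iss, line)) {
            if (line.empty()) {
                oss << "\n";
            } else {
                oss << indent << line << "\n";
            }
        }
    }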

@zhusy54 zhusy54 changed the title from "feat(codegen): Add PTO2 V2 orchestration codegen with scope inference" to "[WIP] feat(codegen): Add PTO2 V2 orchestration codegen with scope inference" on Feb 12, 2026
…sole version

Delete V1 orchestration code generation (Runtime API, BuildXXX signature,
device_malloc, add_task/add_successor pattern) and rename V2 as the only
implementation. Update tensor op registry to emit V2-style code
(make_tensor, data pointer via GetTensorDataPtr). Rewrite tests to cover
the unified orchestration format with new cases for tuple intermediates,
tuple outputs, tensor.create, and tensor.dim.
