
Conversation


@zhusy54 zhusy54 commented Feb 12, 2026

Summary

Stacked on #171, please merge #171 first.

Add PTO2 (V2) orchestration code generation targeting the PTO2Runtime* API, with automatic PTO2_SCOPE inference for intermediate tensor lifetime management.

Changes

V2 Orchestration Codegen (orchestration_codegen.cpp)

  • New GenerateOrchestrationV2() generating PTO2-format C++ code:
    • #include "pto_orchestration_api.h", ARG_PTR_/ARG_SIZE_ defines
    • make_tensor_external() for params/returns, make_tensor() for intermediates
    • PTOParam arrays with make_input_param/make_output_param/make_scalar_param
    • pto2_rt_submit_task() with func_id, worker type, kernel name
    • float_to_u64() helper for float scalar params
    • PTO2OrchestrationConfig via aicpu_orchestration_config()
  • OrchestrationStmtCodegenV2 visitor handling function calls, for-loops, tensor ops, scalar/bool constants, and tuple returns
  • TaskRecord-based scope analysis: tasks with all-external inputs → outer scope; others → PTO2_SCOPE(rt) { ... } inner scope
  • V2 config file generation (kernel_config_v2.py)

CCE Codegen Integration (cce_codegen.cpp, cce_codegen.h)

  • CCECodegen::Generate() now emits both V1 and V2 orchestration files (<name>.cpp + <name>_v2.cpp)
  • GenerateConfigFileV2() for PTO2 kernel config

Tests (test_orchestration_codegen.py)

  • TestOrchestrationV2 class with 4 test cases:
    • test_v2_basic_structure: V2 format, includes, ARG defines, external/intermediate tensors, PTO2_SCOPE
    • test_v2_config_file: V2 config file generation
    • test_v2_independent_tasks: all-external tasks → no PTO2_SCOPE
    • test_v2_vector_example_dag: 5-task DAG matching vector_example reference (kernel_add, kernel_add_scalar, kernel_mul), scalar params via float_to_u64, PTO2_SCOPE wrapping inner tasks

Scope Inference Algorithm

Tasks are classified based on their input tensor dependencies:

  • Outer scope: all input tensors are external (function params or return tensors)
  • Inner scope (PTO2_SCOPE(rt) { ... }): any input tensor is an intermediate (produced by another task)

Intermediate tensors produced by outer tasks are declared before the scope; inner-only intermediates are declared inside the scope for proper lifetime management.
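
To make the classification concrete, here is a hand-written sketch of the kind of file GenerateOrchestrationV2() produces for a two-task chain: t0 reads only external tensors (outer scope) and t1 consumes t0's intermediate (inner scope). The API names (make_tensor_external, make_tensor, make_input_param, make_output_param, make_scalar_param, pto2_rt_submit_task, PTO2_SCOPE, float_to_u64) come from the PR description; the Orchestrate entry point, the exact signatures, the shapes, and the WORKER_AIV constant are assumptions for illustration only.

    #include "pto_orchestration_api.h"

    // Hypothetical entry point; the real generator plumbs arguments through
    // ARG_PTR_/ARG_SIZE_ #defines rather than locals.
    void Orchestrate(PTO2Runtime* rt, void** args) {
      void* arg_x_ptr   = args[0];  // external input buffer
      void* arg_out_ptr = args[1];  // external output buffer

      // External tensors wrap caller-provided buffers.
      auto x   = make_tensor_external(rt, arg_x_ptr,   /*shape=*/{1024});
      auto out = make_tensor_external(rt, arg_out_ptr, /*shape=*/{1024});

      // t0: every input is external -> outer scope. Its result feeds t1,
      // so the intermediate is declared before the scope opens.
      auto tmp = make_tensor(rt, /*shape=*/{1024});
      PTOParam p0[] = {make_input_param(x), make_output_param(tmp),
                       make_scalar_param(float_to_u64(2.0f))};
      pto2_rt_submit_task(rt, /*func_id=*/0, WORKER_AIV, "kernel_add_scalar", p0, 3);

      // t1: consumes an intermediate -> wrapped in PTO2_SCOPE so the runtime
      // can reclaim tmp's storage when the scope closes.
      PTO2_SCOPE(rt) {
        PTOParam p1[] = {make_input_param(tmp), make_output_param(out)};
        pto2_rt_submit_task(rt, /*func_id=*/1, WORKER_AIV, "kernel_mul", p1, 2);
      }
    }

Note how tmp is declared before PTO2_SCOPE because its producer t0 runs in the outer scope, matching the rule above.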

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @zhusy54, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly upgrades the orchestration code generation capabilities by introducing a V2 system for PTO2. It features automatic scope inference for optimized intermediate tensor management, a new tensor.dim operation for extracting tensor dimensions, and a more modular architecture for host-side tensor operations. The changes ensure that both V1 and V2 orchestration outputs are generated, providing backward compatibility while enabling advanced features for the PTO2 runtime.

Highlights

  • PTO2 V2 Orchestration Codegen: Introduced a new V2 orchestration code generation targeting the PTO2Runtime API, which includes automatic PTO2_SCOPE inference for efficient intermediate tensor lifetime management.
  • Enhanced Tensor Operations: Added a new tensor.dim operation to extract shape dimensions from tensors, supporting both positive and negative indexing, and integrated it into the Python language frontend.
  • Modular Codegen Architecture: Refactored the code generation by introducing OrchestrationOpRegistry and tensor_op_codegen.cpp to modularize host-side tensor operation code generation, improving extensibility (a sketch of the registry pattern follows this list).
  • Dual Orchestration Output: Modified the CCE Codegen to emit both V1 and V2 orchestration files (.cpp and _v2.cpp) along with their respective configuration files (kernel_config.py and kernel_config_v2.py).
  • Comprehensive Testing: Added extensive unit tests for the new V2 orchestration, covering basic structure, config file generation, independent tasks, and complex DAGs with scalar parameters and scope inference.
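
As a reference for reviewers, here is a minimal sketch of the registry pattern this refactor introduces, kept consistent with the call sites quoted later in the review (GetInstance(), Get(op_name) returning an optional callable invoked with (call, *this)). The Call type, the callback signature, and the macro body are assumptions; the real definitions live in orchestration_op_registry.h/.cpp.

    #include <functional>
    #include <optional>
    #include <string>
    #include <unordered_map>
    #include <utility>

    struct Call;                        // IR call node (assumed)
    class OrchestrationStmtCodegenV2;   // codegen visitor (assumed)

    // Callback that emits host-side C++ for one op invocation.
    using OpCodegenFn =
        std::function<std::string(const Call*, OrchestrationStmtCodegenV2&)>;

    class OrchestrationOpRegistry {
     public:
      static OrchestrationOpRegistry& GetInstance() {
        static OrchestrationOpRegistry instance;  // process-wide singleton
        return instance;
      }
      void Register(const std::string& op_name, OpCodegenFn fn) {
        ops_[op_name] = std::move(fn);
      }
      // Empty optional means the op has no host-side codegen.
      std::optional<OpCodegenFn> Get(const std::string& op_name) const {
        auto it = ops_.find(op_name);
        if (it == ops_.end()) return std::nullopt;
        return it->second;
      }
     private:
      std::unordered_map<std::string, OpCodegenFn> ops_;
    };

    // One registration per op, run at static-initialization time.
    #define REGISTER_ORCHESTRATION_OP(op_name, fn)    \
      static const bool registered_##fn =             \
          (OrchestrationOpRegistry::GetInstance().Register(op_name, fn), true)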


Changelog
  • CMakeLists.txt
    • Added new source files for codegen_base, orchestration_op_registry, and tensor_op_codegen.
  • include/pypto/codegen/cce/cce_codegen.h
    • Declared GenerateConfigFileV2 method.
  • include/pypto/codegen/codegen_base.h
    • Added static methods TryGetVarName and GenerateExprString.
  • include/pypto/codegen/orchestration/orchestration_codegen.h
    • Declared GenerateOrchestrationV2 function.
  • include/pypto/codegen/orchestration_op_registry.h
    • Added new header defining OrchestrationOpRegistry for host-side operation codegen.
  • python/pypto/ir/op/tensor_ops.py
    • Added dim function to extract shape dimensions from tensors.
  • python/pypto/language/__init__.py
    • Imported and exposed the new dim function.
  • python/pypto/language/op/__init__.py
    • Imported and exposed the new dim function.
  • python/pypto/language/op/tensor_ops.py
    • Added dim function for Tensor objects.
  • src/codegen/cce/cce_codegen.cpp
    • Modified Generate to produce both V1 and V2 orchestration and config files.
    • Implemented GenerateConfigFileV2 for PTO2 kernel configuration.
  • src/codegen/codegen_base.cpp
    • Added new source file implementing TryGetVarName and GenerateExprString.
  • src/codegen/orchestration/orchestration_codegen.cpp
    • Updated argument extraction and device memory allocation to support multiple return tensors.
    • Refactored OrchestrationInfoCollector to handle tuple returns and intermediate tensors.
    • Introduced OrchestrationStmtCodegen for structured code generation.
    • Implemented GenerateOrchestrationV2 for PTO2 runtime API, including scope inference and task submission.
    • Introduced OrchestrationStmtCodegenV2 for V2-specific statement generation.
  • src/codegen/orchestration_op_registry.cpp
    • Added new source file implementing the OrchestrationOpRegistry singleton.
  • src/codegen/tensor_op_codegen.cpp
    • Added new source file implementing REGISTER_ORCHESTRATION_OP for tensor.create, tensor.read, and tensor.dim.
  • src/ir/op/tensor_ops/memory.cpp
    • Added DeduceTensorDimType and registered the tensor.dim operation (an axis-normalization sketch follows this changelog).
  • tests/ut/codegen/test_orchestration_codegen.py
    • Added new unit tests for V2 orchestration, covering basic structure, config file, independent tasks, and a complex DAG example.
    • Added unit tests for V1 orchestration with single/tuple returns, dependencies, and tensor operations.
  • tests/ut/ir/operators/test_tensor_ops.py
    • Added tests for the new tensor.dim operation, including negative axis indexing.
    • Updated operator registration test to include tensor.dim.
  • tests/ut/language/test_unified_ops.py
    • Added a test for the promoted pl.dim function.
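
For context on the negative indexing mentioned above, the axis handling for tensor.dim plausibly reduces to the usual rank-relative mapping. A self-contained sketch; NormalizeAxis is a hypothetical name and the real checks in DeduceTensorDimType may differ:

    #include <stdexcept>

    // Map a possibly negative axis into [0, rank): axis -1 refers to the last
    // dimension, -rank to the first. Out-of-range axes are rejected.
    int NormalizeAxis(int axis, int rank) {
      if (axis < 0) axis += rank;
      if (axis < 0 || axis >= rank) {
        throw std::out_of_range("tensor.dim: axis out of range");
      }
      return axis;
    }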
Activity
  • No human activity (comments, reviews) has been recorded on this pull request yet.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant new feature: V2 orchestration code generation with automatic scope inference for tensor lifetime management. The changes are extensive and well-structured, particularly the refactoring of orchestration codegen to use visitor patterns (OrchestrationStmtCodegen and OrchestrationStmtCodegenV2) and the introduction of an OrchestrationOpRegistry for host-side tensor operations. The addition of the tensor.dim operator and comprehensive new tests are also valuable. I've found a critical issue in the V2 codegen path for tensor.read and a use of undefined behavior in a helper function, along with a minor maintainability concern. Overall, this is a great step forward for the codegen capabilities.

Comment on lines 1103 to 1117
    current_result_var_ = result_var;
    auto& registry = OrchestrationOpRegistry::GetInstance();
    auto codegen_func = registry.Get(op_name);
    if (codegen_func.has_value()) {
      std::string gen_code = (*codegen_func)(call, *this);
      std::istringstream iss(gen_code);
      std::string line;
      while (std::getline(iss, line)) {
        if (!line.empty()) {
          code_ << Indent() << line << "\n";
        }
      }
    }
  }
  // tensor.view, tensor.reshape etc. are metadata-only in V2 as well for now

critical

The V2 codegen for tensor.read reuses the registered V1 codegen function. This V1 function generates code that depends on host_<var> variables, which are not defined in the V2 codegen context (it uses arg_<var>_ptr instead). This will cause a compilation error. The tensor.read operation needs a V2-specific implementation that uses the correct host pointer variables.

    } else if (op_name == "tensor.read") {
        // V2 implementation for tensor.read, using arg_<name>_ptr instead of host_<name>
        CHECK(call->args_.size() == 2) << "tensor.read requires 2 arguments";
        std::string input_name = TryGetVarName(call->args_[0]);
        auto input_type = As<TensorType>(call->args_[0]->GetType());
        auto result_type = As<ScalarType>(call->GetType());
        auto indices_tuple = As<MakeTuple>(call->args_[1]);
        CHECK(input_name != "" && input_type && result_type && indices_tuple);

        std::ostringstream idx_oss;
        for (size_t i = 0; i < indices_tuple->elements_.size(); ++i) {
            if (i > 0) idx_oss << " + ";
            idx_oss << GenerateExprString(indices_tuple->elements_[i]);
            for (size_t j = i + 1; j < input_type->shape_.size(); ++j) {
                idx_oss << " * " << GenerateExprString(input_type->shape_[j]);
            }
        }

        code_ << Indent() << "size_t idx_" << result_var << " = " << (idx_oss.str().empty() ? "0" : idx_oss.str()) << ";\n";
        code_ << Indent() << result_type->dtype_.ToCTypeString() << " " << result_var << " = static_cast<"
              << result_type->dtype_.ToCTypeString() << "*>(arg_" << input_name << "_ptr)[idx_" << result_var << "];\n";
    }

Comment on lines +878 to +887
oss << "// Helper to encode float as uint64_t for scalar params\n";
oss << "static uint64_t float_to_u64(float f) {\n";
oss << " union {\n";
oss << " float f32;\n";
oss << " uint64_t u64;\n";
oss << " } conv;\n";
oss << " conv.u64 = 0; // Clear upper bits\n";
oss << " conv.f32 = f;\n";
oss << " return conv.u64;\n";
oss << "}\n\n";

high

The float_to_u64 function uses a union for type-punning, which is undefined behavior in C++. While it might appear to work on some compilers, it's not guaranteed to be portable or correct. A safer, standard-compliant way to perform this bit-level conversion is to use std::memcpy. Please also ensure #include <cstring> is present in the file.

static uint64_t float_to_u64(float f) {
    uint32_t u32_val;
    static_assert(sizeof(float) == sizeof(uint32_t), "float and uint32_t must have the same size");
    std::memcpy(&u32_val, &f, sizeof(float));
    return static_cast<uint64_t>(u32_val);
}
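
A related note: on a C++20 toolchain the same conversion can be expressed with std::bit_cast, which is constexpr-capable and states the intent directly. This assumes the project can require C++20; the memcpy version above works on any conforming compiler.

    #include <bit>      // std::bit_cast (C++20)
    #include <cstdint>

    static uint64_t float_to_u64(float f) {
        // Bit-exact reinterpretation of the 32-bit float pattern, zero-extended.
        return static_cast<uint64_t>(std::bit_cast<uint32_t>(f));
    }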

Comment on lines +1440 to +1451
// Re-indent the task code: replace leading 4-space indent with 8-space
std::istringstream iss(task_records[tid].code);
std::string line;
while (std::getline(iss, line)) {
if (line.empty()) {
oss << "\n";
} else if (line.substr(0, 4) == " ") {
oss << " " << line.substr(4) << "\n";
} else {
oss << " " << line << "\n";
}
}

medium

The logic for re-indenting the task code for the inner PTO2_SCOPE is fragile. It relies on string manipulation (line.substr(0, 4) == " ") which assumes that the task code generated by OrchestrationStmtCodegenV2 will always have a 4-space indent. If that indentation changes in the future, this logic will break or produce incorrectly formatted code. Consider making this more robust, for example by having the task generation logic not produce any indentation and letting the caller handle it completely.
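
One way to realize that suggestion: have OrchestrationStmtCodegenV2 emit task code flush-left, and give the caller a small helper that applies whatever indent the surrounding scope needs, so nesting depth is decided in exactly one place. A sketch; the helper name EmitIndented is hypothetical:

    #include <sstream>
    #include <string>

    // Prepend `indent` to every non-empty line of `code`. Task codegen emits
    // flush-left lines, so only the caller decides the final nesting depth.
    static void EmitIndented(std::ostringstream& oss, const std::string& code,
                             const std::string& indent) {
        std::istringstream iss(code);
        std::string line;
        while (std::getline(iss, line)) {
            if (line.empty()) {
                oss << "\n";
            } else {
                oss << indent << line << "\n";
            }
        }
    }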

@zhusy54 zhusy54 changed the title from "feat(codegen): Add PTO2 V2 orchestration codegen with scope inference" to "[WIP] feat(codegen): Add PTO2 V2 orchestration codegen with scope inference" on Feb 12, 2026
…sole version

Delete V1 orchestration code generation (Runtime API, BuildXXX signature,
device_malloc, add_task/add_successor pattern) and rename V2 as the only
implementation. Update tensor op registry to emit V2-style code
(make_tensor, data pointer via GetTensorDataPtr). Rewrite tests to cover
the unified orchestration format with new cases for tuple intermediates,
tuple outputs, tensor.create, and tensor.dim.
