[NPU] RFC: Ascend CI Integration #1022

@noemotiovon

Summary

This RFC proposes integrating a Continuous Integration (CI) system for Ascend NPU into the Liger-Kernel project to continuously monitor the support status of operators in the ops/ directory on Ascend devices, ensuring code quality and functional correctness.

Background and Motivation

Current Status

Liger-Kernel already supports CI for multiple hardware platforms:

  • NVIDIA GPU (CUDA) - via Modal CI
  • AMD GPU (ROCm) - via self-hosted runner
  • Intel GPU (XPU) - via self-hosted runner

Ascend NPU support has been implemented at the code level (setup.py platform detection, backends/_ascend/ backend architecture), but lacks automated CI coverage.
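For context, backend selection with torch_npu typically boils down to an import probe plus a device check. The sketch below is illustrative only; the actual detection logic lives in setup.py and backends/_ascend/, and the helper name here is an assumption, not Liger-Kernel API:

```python
# Illustrative only -- the real detection lives in setup.py / backends/_ascend/;
# is_ascend_available is a hypothetical helper, not Liger-Kernel API.
def is_ascend_available() -> bool:
    """True if torch_npu imports cleanly and at least one NPU is visible."""
    try:
        import torch
        import torch_npu  # noqa: F401  # registers the "npu" device with torch

        return torch.npu.is_available()
    except ImportError:
        return False
```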

Completed Work

Currently, 24 operators require adaptation for Ascend devices, of which 18 operators have passed accuracy verification.

| Kernel Name | File Path | Accuracy | Description |
| --- | --- | --- | --- |
| Cross Entropy | cross_entropy.py | ✅ | Cross-entropy loss function |
| DyT | dyt.py | ✅ | DyT normalization operation |
| Embedding | experimental/embedding.py | 🟡 | Embedding layer (experimental) |
| Fused Add RMS Norm | fused_add_rms_norm.py | ✅ | Fused Add + RMS Norm |
| Fused Linear Cross Entropy | fused_linear_cross_entropy.py | 🟡 | Fused Linear + Cross Entropy |
| Fused Linear JSD | fused_linear_jsd.py | 🟡 | Fused Linear + JSD |
| Fused Neighborhood Attention | fused_neighborhood_attention.py | 🟡 | Fused neighborhood attention |
| GEGLU | geglu.py | ✅ | GELU gated linear unit |
| Group Norm | group_norm.py | ✅ | Group normalization |
| GRPO Loss | grpo_loss.py | ✅ | GRPO loss function |
| JSD | jsd.py | 🟡 | Jensen-Shannon divergence |
| KL Div | kl_div.py | ✅ | KL divergence loss |
| Layer Norm | layer_norm.py | ✅ | Layer normalization |
| Llama4 ROPE | llama4_rope.py | 🟡 | Llama4 rotary position encoding |
| Multi Token Attention | multi_token_attention.py | 🟡 | Multi-token attention |
| Poly Norm | poly_norm.py | ✅ | Polynomial normalization |
| Qwen2VL MRope | qwen2vl_mrope.py | ✅ | Qwen2VL multi-rotary position encoding |
| RMS Norm | rms_norm.py | ✅ | RMS normalization |
| ROPE | rope.py | ✅ | Rotary position encoding |
| Softmax | softmax.py | ✅ | Softmax activation function |
| Sparsemax | sparsemax.py | ✅ | Sparsemax activation function |
| SWIGLU | swiglu.py | ✅ | SiLU gated linear unit |
| Tiled MLP | tiled_mlp.py | ✅ | Tiled MLP |
| TVD | tvd.py | ✅ | TVD loss function |

Note: ✅ = accuracy verified, 🟡 = in progress.

Problem Statement

Ascend operator correctness currently relies on manual testing, with no automated CI coverage. This creates regression risk, and community contributors cannot quickly verify that Ascend-related changes are correct.

Proposal

Objectives

Establish an Ascend CI pipeline by adding a GitHub Actions workflow that continuously verifies operator correctness, triggers automatically on code changes, and catches regressions promptly.

Technical Solution

1. CI Workflow Design

Following the existing Intel and AMD CI configurations, create .github/workflows/ascend-ci.yml:

```yaml
name: Ascend NPU

on:
  push:
    branches:
      - main
    paths:
      - "src/**"
      - "test/**"
  pull_request:
    branches:
      - main
    paths:
      - "src/**"
      - "test/**"
  workflow_dispatch:  # Enable manual trigger

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

jobs:
  checkstyle:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout code
      uses: actions/checkout@v6

    - name: Set up Python
      uses: actions/setup-python@v6
      with:
        python-version: '3.10'

    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -r dev/fmt-requirements.txt

    - name: Run checkstyle
      run: make checkstyle

  tests:
    runs-on: [self-hosted, ascend-npu]  # Requires Ascend self-hosted runner configuration
    needs: [checkstyle]  # Wait for checkstyle job to complete
    if: success()  # Only run tests when checkstyle passes
    container:
      image: ascend/ascend-toolkit:latest  # Ascend container image with CANN, torch_npu, triton-ascend pre-installed
      options: --privileged -v /dev/davinci_manager:/dev/davinci_manager -v /dev/devmm_svm:/dev/devmm_svm -v /dev/hisi_hdc:/dev/hisi_hdc --ipc=host
    steps:
    - name: Checkout code
      uses: actions/checkout@v6

    - name: Set up Python
      shell: bash
      run: |
        # Python should be pre-installed in the container image
        python --version

    - name: Verify NPU availability
      shell: bash
      run: |
        npu-smi info
        echo "NPU devices available"

    - name: Setup Dependencies
      shell: bash
      run: |
        python -m pip install --upgrade pip
        pip install -e .[dev]
        # torch_npu and triton-ascend are pre-installed in the container image

    - name: List Python Environments
      shell: bash
      run: python -m pip list

    - name: Run Unit Tests
      shell: bash
      run: |
        # Run tests only after checkstyle passes
        # Initial phase: Run only test cases in test/transformers directory
        python -m pytest test/transformers/ --disable-warnings
        # Future phases will expand to full test suite:
        # make test
        # make test-convergence

Note: The actual configuration needs to be adjusted based on the specific Ascend runner environment, particularly:

  • Runner label configuration (self-hosted, ascend-npu)
  • Container image with CANN toolkit, torch_npu, and triton-ascend pre-installed
  • Container options for NPU device access (device mounting, privileged mode, etc.)

2. Runtime Environment Requirements

  • Hardware: Ascend NPU device (e.g., Atlas 800I A2)
  • Container Image: Pre-built Docker image containing:
    • Python 3.10+
    • PyTorch with torch_npu (2.7.1)
    • triton-ascend
    • Ascend CANN toolkit
  • Container Configuration: Privileged mode and device mounting for NPU access (a smoke-test sketch follows this list)
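A container that meets these requirements can be sanity-checked before running the suite. Below is a minimal smoke test, assuming the standard torch.npu namespace that torch_npu registers; it is a sketch, not project code:

```python
import torch
import torch_npu  # noqa: F401  # registers the "npu" device with torch

# Fail fast if the container image is missing a runtime requirement.
assert torch.npu.is_available(), "No Ascend NPU visible inside the container"

# One tiny op end-to-end confirms the CANN stack actually executes kernels.
x = torch.randn(4, 4, device="npu")
y = (x @ x).sum()
print(f"torch {torch.__version__}, {torch.npu.device_count()} NPU(s), checksum {y.item():.4f}")
```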

3. Test Scope

Initial Phase: CI will execute test cases in the test/transformers directory to verify the forward and backward propagation correctness of Ascend-adapted operators.

Future Phases: Gradually expand to the full test suite:

  • Unit Tests (make test): Verify all Ascend-adapted operators
  • Convergence Tests (make test-convergence): Verify operator convergence on complete models

The code style check (make checkstyle) runs as an independent job in all phases.

Implementation Plan

  1. Infrastructure Preparation: Configure Ascend self-hosted runner, install CANN toolkit and PyTorch + torch_npu environment
  2. CI Workflow Development: Create .github/workflows/ascend-ci.yml, implement environment setup, dependency installation, and test execution

Technical Details

Runner Configuration

Self-hosted runner configuration is required (GitHub-hosted runners do not provide Ascend hardware):

  • Labels: ascend-npu or ascend-910b4
  • Operating System: Ubuntu 20.04/22.04 (according to CANN requirements)
  • Hardware: At least one Ascend device (e.g., Atlas 800I A2)

Environment Variables

CANN-related environment variables need to be set, most notably ASCEND_RT_VISIBLE_DEVICES, which controls which NPU devices a process can see.
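ASCEND_RT_VISIBLE_DEVICES plays the same role as CUDA_VISIBLE_DEVICES and must be set before the NPU runtime initializes. A sketch of how a CI step might pin one device; the index 0 is an assumption about the runner's device layout:

```python
import os

# Must happen before torch_npu initializes the runtime; "0" is an assumed index.
os.environ.setdefault("ASCEND_RT_VISIBLE_DEVICES", "0")

import torch
import torch_npu  # noqa: F401

print(f"Visible NPUs: {torch.npu.device_count()}")  # expect 1 with the pin above
```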

Testing Strategy

  • Test only Ascend-adapted operators to avoid CI failures from unadapted operators
  • Use pytest.mark.skip or xfail for known issues (see the sketch after this list)
  • If multiple NPUs are available, tests can be executed in parallel
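For kernels still in the 🟡 column, standard pytest markers keep the pipeline green without hiding the gap. A sketch; the test names and reason strings are placeholders:

```python
import pytest


# Skip outright when a kernel has no Ascend adaptation yet.
@pytest.mark.skip(reason="not yet adapted for Ascend NPU")
def test_unadapted_kernel():
    ...


# xfail when a kernel runs but a known accuracy gap is still being tracked;
# strict=False keeps CI green even if the test unexpectedly passes.
@pytest.mark.xfail(reason="accuracy under verification on Ascend", strict=False)
def test_kernel_with_known_gap():
    ...
```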

Future Work

  • Integrate performance benchmarks in CI to track operator performance changes (see the timing sketch after this list)
  • Extend to more Ascend device models (e.g., Ascend310, Ascend910A, etc.)
  • Continuously adapt more operators to improve Ascend support coverage
  • Supplement Ascend usage documentation, best practices, and troubleshooting guides
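On the benchmarking item: NPU kernels launch asynchronously, so wall-clock timing needs explicit synchronization around the timed region. A minimal sketch, assuming torch.npu.synchronize() from torch_npu; the shape and iteration counts are arbitrary:

```python
import time

import torch
import torch_npu  # noqa: F401


def npu_time_ms(fn, warmup: int = 10, iters: int = 100) -> float:
    """Mean wall time per call in milliseconds, with NPU-side synchronization."""
    for _ in range(warmup):
        fn()
    torch.npu.synchronize()  # drain async launches before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    torch.npu.synchronize()  # ensure all timed work has finished
    return (time.perf_counter() - start) * 1000 / iters


x = torch.randn(4096, 4096, device="npu")
print(f"matmul: {npu_time_ms(lambda: x @ x):.3f} ms")
```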

Conclusion

Integrating Ascend CI will significantly improve Liger-Kernel's support quality for the Ascend platform, ensuring that code changes do not break Ascend operator functionality. Although some infrastructure investment is required, it will greatly reduce maintenance costs and improve community collaboration efficiency in the long run.

Alternatives

No response

Additional context

No response
