Summary
This RFC proposes integrating a Continuous Integration (CI) system for Ascend NPU into the Liger-Kernel project to continuously monitor the support status of operators in the ops/ directory on Ascend devices, ensuring code quality and functional correctness.
Background and Motivation
Current Status
Liger-Kernel already supports CI for multiple hardware platforms:
- NVIDIA GPU (CUDA) - via Modal CI
- AMD GPU (ROCm) - via self-hosted runner
- Intel GPU (XPU) - via self-hosted runner
Ascend NPU support has been implemented at the code level (`setup.py` platform detection, `backends/_ascend/` backend architecture), but it lacks automated CI coverage.
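The platform detection mentioned above can be illustrated with a minimal sketch. The check below (whether `torch_npu` is importable) is an assumption about how Ascend environments are commonly identified, not the exact logic in Liger-Kernel's `setup.py`:

```python
import importlib.util


def ascend_npu_available() -> bool:
    """Heuristically detect an Ascend NPU software environment.

    A sketch only: the real detection in setup.py may differ. Here we
    check whether the torch_npu package is importable, which is the
    usual marker of an Ascend-enabled PyTorch installation.
    """
    return importlib.util.find_spec("torch_npu") is not None


if __name__ == "__main__":
    backend = "ascend" if ascend_npu_available() else "default"
    print(f"Detected backend: {backend}")
```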
Completed Work
Currently, 24 operators require adaptation for Ascend devices, of which 18 operators have passed accuracy verification.
| Kernel Name | File Path | Accuracy | Description |
|---|---|---|---|
| Cross Entropy | cross_entropy.py | ✅ | Cross-entropy loss function |
| DyT | dyt.py | ✅ | DyT normalization operation |
| Embedding | experimental/embedding.py | 🟡 | Embedding layer (experimental) |
| Fused Add RMS Norm | fused_add_rms_norm.py | ✅ | Fused Add + RMS Norm |
| Fused Linear Cross Entropy | fused_linear_cross_entropy.py | 🟡 | Fused Linear + Cross Entropy |
| Fused Linear JSD | fused_linear_jsd.py | 🟡 | Fused Linear + JSD |
| Fused Neighborhood Attention | fused_neighborhood_attention.py | 🟡 | Fused neighborhood attention |
| GEGLU | geglu.py | ✅ | GELU gated linear unit |
| Group Norm | group_norm.py | ✅ | Group normalization |
| GRPO Loss | grpo_loss.py | ✅ | GRPO loss function |
| JSD | jsd.py | 🟡 | Jensen-Shannon divergence |
| KL Div | kl_div.py | ✅ | KL divergence loss |
| Layer Norm | layer_norm.py | ✅ | Layer normalization |
| Llama4 ROPE | llama4_rope.py | 🟡 | Llama4 rotary position encoding |
| Multi Token Attention | multi_token_attention.py | 🟡 | Multi-token attention |
| Poly Norm | poly_norm.py | ✅ | Polynomial normalization |
| Qwen2VL MRope | qwen2vl_mrope.py | ✅ | Qwen2VL multi-rotary position encoding |
| RMS Norm | rms_norm.py | ✅ | RMS normalization |
| ROPE | rope.py | ✅ | Rotary position encoding |
| Softmax | softmax.py | ✅ | Softmax activation function |
| Sparsemax | sparsemax.py | ✅ | Sparsemax activation function |
| SWIGLU | swiglu.py | ✅ | SiLU gated linear unit |
| Tiled MLP | tiled_mlp.py | ✅ | Tiled MLP |
| TVD | tvd.py | ✅ | TVD loss function |
Note: ✅ Accuracy verified, 🟡 In progress.
Problem Statement
The correctness of Ascend operators currently relies on manual testing, lacking automated CI coverage. This creates regression risks, and community contributors cannot quickly verify the correctness of Ascend-related modifications.
Proposal
Objectives
Establish an Ascend CI pipeline by adding an Ascend CI workflow to GitHub Actions that continuously verifies operator correctness, automatically triggers tests on code changes, and promptly detects regressions.
Technical Solution
1. CI Workflow Design
Following the existing Intel and AMD CI configurations, create `.github/workflows/ascend-ci.yml`:
```yaml
name: Ascend NPU

on:
  push:
    branches:
      - main
    paths:
      - "src/**"
      - "test/**"
  pull_request:
    branches:
      - main
    paths:
      - "src/**"
      - "test/**"
  workflow_dispatch: # Enable manual trigger

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

jobs:
  checkstyle:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v6
      - name: Set up Python
        uses: actions/setup-python@v6
        with:
          python-version: "3.10"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r dev/fmt-requirements.txt
      - name: Run checkstyle
        run: make checkstyle

  tests:
    runs-on: [self-hosted, ascend-npu] # Requires an Ascend self-hosted runner
    needs: [checkstyle] # Wait for the checkstyle job to complete
    if: success() # Only run tests when checkstyle passes
    container:
      image: ascend/ascend-toolkit:latest # Ascend image with CANN, torch_npu, and triton-ascend pre-installed
      options: --privileged -v /dev/davinci_manager:/dev/davinci_manager -v /dev/devmm_svm:/dev/devmm_svm -v /dev/hisi_hdc:/dev/hisi_hdc --ipc=host
    steps:
      - name: Checkout code
        uses: actions/checkout@v6
      - name: Set up Python
        shell: bash
        run: |
          # Python is pre-installed in the container image
          python --version
      - name: Verify NPU availability
        shell: bash
        run: |
          npu-smi info
          echo "NPU devices available"
      - name: Setup Dependencies
        shell: bash
        run: |
          python -m pip install --upgrade pip
          pip install -e .[dev]
          # torch_npu and triton-ascend are pre-installed in the container image
      - name: List Python Environment
        shell: bash
        run: python -m pip list
      - name: Run Unit Tests
        shell: bash
        run: |
          # Initial phase: run only the test cases under test/transformers
          python -m pytest test/transformers/ --disable-warnings
          # Future phases will expand to the full test suite:
          # make test
          # make test-convergence
```

Note: The actual configuration must be adjusted to the specific Ascend runner environment, in particular:
- Runner label configuration (`self-hosted`, `ascend-npu`)
- A container image with the CANN toolkit, torch_npu, and triton-ascend pre-installed
- Container options for NPU device access (device mounting, privileged mode, etc.)
2. Runtime Environment Requirements
- Hardware: Ascend NPU device (e.g., Atlas 800I A2)
- Container Image: Pre-built Docker image containing:
  - Python 3.10+
  - PyTorch with torch_npu (2.7.1)
  - triton-ascend
  - Ascend CANN toolkit
- Container Configuration: Privileged mode and device mounting for NPU access
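A small sanity-check script (hypothetical, not part of the repository) could confirm that the image actually ships the components listed above before the test step runs. The package names checked here are assumptions and should be adjusted to the actual image contents:

```python
import importlib.util
import sys

# Components the container image is expected to ship (per the list above).
# These top-level package names are assumptions about the image contents.
REQUIRED = ["torch", "torch_npu", "triton"]


def missing_components(required=REQUIRED):
    """Return the subset of required packages that cannot be imported."""
    return [name for name in required if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_components()
    if missing:
        print(f"Missing components: {missing}", file=sys.stderr)
        sys.exit(1)
    print("Container environment looks complete")
```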
3. Test Scope
Initial Phase: CI will execute the test cases in the `test/transformers` directory to verify the forward- and backward-pass correctness of Ascend-adapted operators.
Future Phases: Gradually expand to the full test suite:
- Unit Tests (`make test`): Verify all Ascend-adapted operators
- Convergence Tests (`make test-convergence`): Verify operator convergence on complete models

The code style check (`make checkstyle`) always runs as an independent job.
Implementation Plan
- Infrastructure Preparation: Configure the Ascend self-hosted runner; install the CANN toolkit and the PyTorch + torch_npu environment
- CI Workflow Development: Create `.github/workflows/ascend-ci.yml`; implement environment setup, dependency installation, and test execution
Technical Details
Runner Configuration
Self-hosted runner configuration is required (GitHub-hosted runners do not provide Ascend hardware):
- Labels: `ascend-npu` or `ascend-910b4`
- Operating System: Ubuntu 20.04/22.04 (per CANN requirements)
- Hardware: At least one Ascend device (e.g., Atlas 800I A2)
Environment Variables
CANN-related environment variables need to be set, such as `ASCEND_RT_VISIBLE_DEVICES`.
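As a sketch of how a CI job might consume this variable: `ASCEND_RT_VISIBLE_DEVICES` restricts which NPUs a process sees, analogous to `CUDA_VISIBLE_DEVICES`. The helper below (hypothetical, not Liger-Kernel code) parses it into device indices; the device values are illustrative:

```python
import os


def visible_npu_indices(env=None):
    """Parse ASCEND_RT_VISIBLE_DEVICES into a list of device indices.

    CANN reads this variable to restrict which NPUs a process may use.
    An unset or empty value means all devices are visible, which is
    represented here as None.
    """
    env = os.environ if env is None else env
    raw = env.get("ASCEND_RT_VISIBLE_DEVICES", "").strip()
    if not raw:
        return None
    return [int(part) for part in raw.split(",") if part.strip()]


if __name__ == "__main__":
    # e.g., a CI job pinned to the first two NPUs:
    os.environ["ASCEND_RT_VISIBLE_DEVICES"] = "0,1"
    print(visible_npu_indices())
```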
Testing Strategy
- Test only Ascend-adapted operators to avoid CI failures from unadapted operators
- Use `pytest.mark.skip` or `pytest.mark.xfail` for known issues
- If multiple NPUs are available, tests can be executed in parallel
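The parallel option above could be implemented by sharding test files across devices, with each worker pinned to one NPU via `ASCEND_RT_VISIBLE_DEVICES`. A minimal sketch (the file names and launch mechanics are illustrative assumptions):

```python
def shard_tests(test_files, num_devices):
    """Round-robin test files across NPU devices for parallel execution.

    Each shard would run in its own worker process pinned to one device
    (e.g., ASCEND_RT_VISIBLE_DEVICES=0 for shard 0). Sorting first keeps
    the assignment deterministic across CI runs.
    """
    shards = [[] for _ in range(num_devices)]
    for i, path in enumerate(sorted(test_files)):
        shards[i % num_devices].append(path)
    return shards


if __name__ == "__main__":
    # Hypothetical test files, distributed over two NPUs:
    files = [f"test/transformers/test_op_{i}.py" for i in range(5)]
    for device, shard in enumerate(shard_tests(files, 2)):
        print(f"NPU {device}: {shard}")
```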
Future Work
- Integrate performance benchmarks in CI to track operator performance changes
- Extend to more Ascend device models (e.g., Ascend 310, Ascend 910A)
- Continuously adapt more operators to improve Ascend support coverage
- Supplement Ascend usage documentation, best practices, and troubleshooting guides
Conclusion
Integrating Ascend CI will significantly improve Liger-Kernel's support quality for the Ascend platform, ensuring that code changes do not break Ascend operator functionality. Although some infrastructure investment is required, it will greatly reduce maintenance costs and improve community collaboration efficiency in the long run.