Feature Request: [CANN] backend supports Ascend 310P #10160
Description
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Add support for the Ascend 310P inference accelerator card to the CANN backend. llama.cpp already supports the Ascend 910B through this backend. However, some APIs on the 910B differ from those on the 310P, so the CANN backend implementation needs to be adapted.
Motivation
Compared to the Ascend 910, the Ascend 310 focuses on power-efficient inference on edge devices. The basic information is as follows:
- Inference-Oriented: The 310P is optimized for inference tasks, focusing more on efficient and low-power operations, rather than the computational intensity required for training.
- Lower Throughput: While it supports similar operators for inference tasks (e.g., convolution, activation functions, pooling, etc.), it is not as heavily optimized for large-scale parallelism and training tasks. The 310P focuses on executing pre-trained models with lower computational demands.
Possible Implementation
Adapt the CANN backend to the 310P while maintaining compatibility with the 910B.
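
One way this could look is a small dispatch layer that classifies the device once and lets individual operators pick a code path. The sketch below is only an illustration under assumptions: `SocFamily`, `classify_soc`, and `needs_310p_fallback` are hypothetical names, not existing llama.cpp or CANN symbols, and the SoC name strings ("Ascend910B…", "Ascend310P…") are assumed to be what the runtime reports. How the name is queried from the device is deliberately left out.

```cpp
#include <string>

// Hypothetical SoC families (illustrative names, not from the CANN backend).
enum class SocFamily { Ascend910B, Ascend310P, Unknown };

// Classify an SoC name by prefix, e.g. "Ascend910B3" or "Ascend310P3".
// Assumption: the runtime reports names with these prefixes.
inline SocFamily classify_soc(const std::string & name) {
    // rfind(prefix, 0) == 0 is a prefix check that works before C++20.
    if (name.rfind("Ascend910B", 0) == 0) return SocFamily::Ascend910B;
    if (name.rfind("Ascend310P", 0) == 0) return SocFamily::Ascend310P;
    return SocFamily::Unknown;
}

// Hypothetical per-operator check: true when an operator should take the
// 310P-specific path instead of the existing 910B path.
inline bool needs_310p_fallback(SocFamily soc) {
    return soc == SocFamily::Ascend310P;
}
```

With this shape the existing 910B code stays the default, and only operators whose APIs differ on the 310P would branch on `needs_310p_fallback`. A compile-time switch would be an alternative if the two targets cannot share one binary.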