CANN: Implement GLU ops #14884

Merged
hipudding merged 1 commit into ggml-org:master on Jul 26, 2025

Conversation

hipudding
Collaborator

Implement REGLU, GEGLU, SWIGLU ops according to #14158
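For context, each GLU op splits its input along the first dimension into a gate half and a value half, applies an activation to the gate, and multiplies elementwise: REGLU uses ReLU, GEGLU uses GELU (the tanh-approximate, erf, and sigmoid-approximate variants correspond to the GEGLU, GEGLU_ERF, and GEGLU_QUICK cases in the log below), and SWIGLU uses SiLU. A minimal scalar sketch of the reference math follows; it is illustrative only, not the CANN kernels, and the choice of which half acts as the gate is the convention tested by the swapped flag:

```c
#include <math.h>

// Scalar reference for one (gate g, value x) pair -- illustrative only.
static float reglu_ref(float g, float x)       { return (g > 0.0f ? g : 0.0f) * x; }               // ReLU(g) * x
static float swiglu_ref(float g, float x)      { return g / (1.0f + expf(-g)) * x; }               // SiLU(g) * x
static float geglu_ref(float g, float x)       {                                                   // tanh-approx GELU(g) * x
    return 0.5f * g * (1.0f + tanhf(0.7978845608f * (g + 0.044715f * g * g * g))) * x;
}
static float geglu_erf_ref(float g, float x)   { return 0.5f * g * (1.0f + erff(g * 0.70710678f)) * x; } // exact GELU(g) * x
static float geglu_quick_ref(float g, float x) { return g / (1.0f + expf(-1.702f * g)) * x; }      // sigmoid-approx GELU(g) * x

// Fused (non-split) layout: gate and value halves live in the same row of
// length ne0; the output row has length ne0 / 2. `swapped` exchanges halves.
static void reglu_row_ref(const float * src, float * dst, int ne0, int swapped) {
    const int half = ne0 / 2;
    const float * g = swapped ? src + half : src;
    const float * x = swapped ? src        : src + half;
    for (int i = 0; i < half; ++i) {
        dst[i] = reglu_ref(g[i], x[i]);
    }
}
```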


@github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and Ascend NPU (issues specific to Ascend NPUs) labels on Jul 26, 2025
@hipudding
Collaborator Author

Testing 2 devices
Backend 1/2: CANN0
Device description: Ascend910B4
Device memory: 30196 MB (29819 MB free)
REGLU(type=f16,ne_a=[128,2,2,2],v=0,swapped=0): OK
REGLU(type=f16,ne_a=[5,7,11,13],v=0,swapped=0): OK
REGLU(type=f16,ne_a=[128,2,2,2],v=0,swapped=1): OK
REGLU(type=f16,ne_a=[5,7,11,13],v=0,swapped=1): OK
REGLU(type=f16,ne_a=[128,2,2,2],v=0,split): OK
REGLU(type=f16,ne_a=[5,7,11,13],v=0,split): OK
REGLU(type=f16,ne_a=[128,2,2,2],v=1,swapped=0): OK
REGLU(type=f16,ne_a=[5,7,11,13],v=1,swapped=0): OK
REGLU(type=f16,ne_a=[128,2,2,2],v=1,swapped=1): OK
REGLU(type=f16,ne_a=[5,7,11,13],v=1,swapped=1): OK
REGLU(type=f16,ne_a=[128,2,2,2],v=1,split): OK
REGLU(type=f16,ne_a=[5,7,11,13],v=1,split): OK
REGLU(type=f32,ne_a=[128,2,2,2],v=0,swapped=0): OK
REGLU(type=f32,ne_a=[5,7,11,13],v=0,swapped=0): OK
REGLU(type=f32,ne_a=[128,2,2,2],v=0,swapped=1): OK
REGLU(type=f32,ne_a=[5,7,11,13],v=0,swapped=1): OK
REGLU(type=f32,ne_a=[128,2,2,2],v=0,split): OK
REGLU(type=f32,ne_a=[5,7,11,13],v=0,split): OK
REGLU(type=f32,ne_a=[128,2,2,2],v=1,swapped=0): OK
REGLU(type=f32,ne_a=[5,7,11,13],v=1,swapped=0): OK
REGLU(type=f32,ne_a=[128,2,2,2],v=1,swapped=1): OK
REGLU(type=f32,ne_a=[5,7,11,13],v=1,swapped=1): OK
REGLU(type=f32,ne_a=[128,2,2,2],v=1,split): OK
REGLU(type=f32,ne_a=[5,7,11,13],v=1,split): OK
GEGLU(type=f16,ne_a=[128,2,2,2],v=0,swapped=0): OK
GEGLU(type=f16,ne_a=[5,7,11,13],v=0,swapped=0): OK
GEGLU(type=f16,ne_a=[128,2,2,2],v=0,swapped=1): OK
GEGLU(type=f16,ne_a=[5,7,11,13],v=0,swapped=1): OK
GEGLU(type=f16,ne_a=[128,2,2,2],v=0,split): OK
GEGLU(type=f16,ne_a=[5,7,11,13],v=0,split): OK
GEGLU(type=f16,ne_a=[128,2,2,2],v=1,swapped=0): OK
GEGLU(type=f16,ne_a=[5,7,11,13],v=1,swapped=0): OK
GEGLU(type=f16,ne_a=[128,2,2,2],v=1,swapped=1): OK
GEGLU(type=f16,ne_a=[5,7,11,13],v=1,swapped=1): OK
GEGLU(type=f16,ne_a=[128,2,2,2],v=1,split): OK
GEGLU(type=f16,ne_a=[5,7,11,13],v=1,split): OK
GEGLU(type=f32,ne_a=[128,2,2,2],v=0,swapped=0): OK
GEGLU(type=f32,ne_a=[5,7,11,13],v=0,swapped=0): OK
GEGLU(type=f32,ne_a=[128,2,2,2],v=0,swapped=1): OK
GEGLU(type=f32,ne_a=[5,7,11,13],v=0,swapped=1): OK
GEGLU(type=f32,ne_a=[128,2,2,2],v=0,split): OK
GEGLU(type=f32,ne_a=[5,7,11,13],v=0,split): OK
GEGLU(type=f32,ne_a=[128,2,2,2],v=1,swapped=0): OK
GEGLU(type=f32,ne_a=[5,7,11,13],v=1,swapped=0): OK
GEGLU(type=f32,ne_a=[128,2,2,2],v=1,swapped=1): OK
GEGLU(type=f32,ne_a=[5,7,11,13],v=1,swapped=1): OK
GEGLU(type=f32,ne_a=[128,2,2,2],v=1,split): OK
GEGLU(type=f32,ne_a=[5,7,11,13],v=1,split): OK
SWIGLU(type=f16,ne_a=[128,2,2,2],v=0,swapped=0): OK
SWIGLU(type=f16,ne_a=[5,7,11,13],v=0,swapped=0): OK
SWIGLU(type=f16,ne_a=[128,2,2,2],v=0,swapped=1): OK
SWIGLU(type=f16,ne_a=[5,7,11,13],v=0,swapped=1): OK
SWIGLU(type=f16,ne_a=[128,2,2,2],v=0,split): OK
SWIGLU(type=f16,ne_a=[5,7,11,13],v=0,split): OK
SWIGLU(type=f16,ne_a=[128,2,2,2],v=1,swapped=0): OK
SWIGLU(type=f16,ne_a=[5,7,11,13],v=1,swapped=0): OK
SWIGLU(type=f16,ne_a=[128,2,2,2],v=1,swapped=1): OK
SWIGLU(type=f16,ne_a=[5,7,11,13],v=1,swapped=1): OK
SWIGLU(type=f16,ne_a=[128,2,2,2],v=1,split): OK
SWIGLU(type=f16,ne_a=[5,7,11,13],v=1,split): OK
SWIGLU(type=f32,ne_a=[128,2,2,2],v=0,swapped=0): OK
SWIGLU(type=f32,ne_a=[5,7,11,13],v=0,swapped=0): OK
SWIGLU(type=f32,ne_a=[128,2,2,2],v=0,swapped=1): OK
SWIGLU(type=f32,ne_a=[5,7,11,13],v=0,swapped=1): OK
SWIGLU(type=f32,ne_a=[128,2,2,2],v=0,split): OK
SWIGLU(type=f32,ne_a=[5,7,11,13],v=0,split): OK
SWIGLU(type=f32,ne_a=[128,2,2,2],v=1,swapped=0): OK
SWIGLU(type=f32,ne_a=[5,7,11,13],v=1,swapped=0): OK
SWIGLU(type=f32,ne_a=[128,2,2,2],v=1,swapped=1): OK
SWIGLU(type=f32,ne_a=[5,7,11,13],v=1,swapped=1): OK
SWIGLU(type=f32,ne_a=[128,2,2,2],v=1,split): OK
SWIGLU(type=f32,ne_a=[5,7,11,13],v=1,split): OK
GEGLU_ERF(type=f16,ne_a=[128,2,2,2],v=0,swapped=0): OK
GEGLU_ERF(type=f16,ne_a=[5,7,11,13],v=0,swapped=0): OK
GEGLU_ERF(type=f16,ne_a=[128,2,2,2],v=0,swapped=1): OK
GEGLU_ERF(type=f16,ne_a=[5,7,11,13],v=0,swapped=1): OK
GEGLU_ERF(type=f16,ne_a=[128,2,2,2],v=0,split): OK
GEGLU_ERF(type=f16,ne_a=[5,7,11,13],v=0,split): OK
GEGLU_ERF(type=f16,ne_a=[128,2,2,2],v=1,swapped=0): OK
GEGLU_ERF(type=f16,ne_a=[5,7,11,13],v=1,swapped=0): OK
GEGLU_ERF(type=f16,ne_a=[128,2,2,2],v=1,swapped=1): OK
GEGLU_ERF(type=f16,ne_a=[5,7,11,13],v=1,swapped=1): OK
GEGLU_ERF(type=f16,ne_a=[128,2,2,2],v=1,split): OK
GEGLU_ERF(type=f16,ne_a=[5,7,11,13],v=1,split): OK
GEGLU_ERF(type=f32,ne_a=[128,2,2,2],v=0,swapped=0): OK
GEGLU_ERF(type=f32,ne_a=[5,7,11,13],v=0,swapped=0): OK
GEGLU_ERF(type=f32,ne_a=[128,2,2,2],v=0,swapped=1): OK
GEGLU_ERF(type=f32,ne_a=[5,7,11,13],v=0,swapped=1): OK
GEGLU_ERF(type=f32,ne_a=[128,2,2,2],v=0,split): OK
GEGLU_ERF(type=f32,ne_a=[5,7,11,13],v=0,split): OK
GEGLU_ERF(type=f32,ne_a=[128,2,2,2],v=1,swapped=0): OK
GEGLU_ERF(type=f32,ne_a=[5,7,11,13],v=1,swapped=0): OK
GEGLU_ERF(type=f32,ne_a=[128,2,2,2],v=1,swapped=1): OK
GEGLU_ERF(type=f32,ne_a=[5,7,11,13],v=1,swapped=1): OK
GEGLU_ERF(type=f32,ne_a=[128,2,2,2],v=1,split): OK
GEGLU_ERF(type=f32,ne_a=[5,7,11,13],v=1,split): OK
GEGLU_QUICK(type=f16,ne_a=[128,2,2,2],v=0,swapped=0): OK
GEGLU_QUICK(type=f16,ne_a=[5,7,11,13],v=0,swapped=0): OK
GEGLU_QUICK(type=f16,ne_a=[128,2,2,2],v=0,swapped=1): OK
GEGLU_QUICK(type=f16,ne_a=[5,7,11,13],v=0,swapped=1): OK
GEGLU_QUICK(type=f16,ne_a=[128,2,2,2],v=0,split): OK
GEGLU_QUICK(type=f16,ne_a=[5,7,11,13],v=0,split): OK
GEGLU_QUICK(type=f16,ne_a=[128,2,2,2],v=1,swapped=0): OK
GEGLU_QUICK(type=f16,ne_a=[5,7,11,13],v=1,swapped=0): OK
GEGLU_QUICK(type=f16,ne_a=[128,2,2,2],v=1,swapped=1): OK
GEGLU_QUICK(type=f16,ne_a=[5,7,11,13],v=1,swapped=1): OK
GEGLU_QUICK(type=f16,ne_a=[128,2,2,2],v=1,split): OK
GEGLU_QUICK(type=f16,ne_a=[5,7,11,13],v=1,split): OK
GEGLU_QUICK(type=f32,ne_a=[128,2,2,2],v=0,swapped=0): OK
GEGLU_QUICK(type=f32,ne_a=[5,7,11,13],v=0,swapped=0): OK
GEGLU_QUICK(type=f32,ne_a=[128,2,2,2],v=0,swapped=1): OK
GEGLU_QUICK(type=f32,ne_a=[5,7,11,13],v=0,swapped=1): OK
GEGLU_QUICK(type=f32,ne_a=[128,2,2,2],v=0,split): OK
GEGLU_QUICK(type=f32,ne_a=[5,7,11,13],v=0,split): OK
GEGLU_QUICK(type=f32,ne_a=[128,2,2,2],v=1,swapped=0): OK
GEGLU_QUICK(type=f32,ne_a=[5,7,11,13],v=1,swapped=0): OK
GEGLU_QUICK(type=f32,ne_a=[128,2,2,2],v=1,swapped=1): OK
GEGLU_QUICK(type=f32,ne_a=[5,7,11,13],v=1,swapped=1): OK
GEGLU_QUICK(type=f32,ne_a=[128,2,2,2],v=1,split): OK
GEGLU_QUICK(type=f32,ne_a=[5,7,11,13],v=1,split): OK
Backend CANN0: OK
Backend 2/2: CPU
Skipping
2/2 backends passed
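Each line in the log above is one test-backend-ops case: type is the element type, ne_a is the input shape, v=1 appears to run the op on a non-contiguous view of the input, swapped exchanges which half of the input acts as the gate, and split feeds the gate from a separate tensor instead of splitting a single one. As a rough host-side sketch of what a single fused case exercises through the public ggml API (ggml_swiglu is the helper introduced by #14158; treat names and exact signatures here as assumptions, not the CANN code path):

```c
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16 * 1024 * 1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // Fused layout: ne[0] = 128 holds both halves; the result has ne[0] = 64.
    struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 128, 2);
    ggml_set_f32(a, 1.0f);                       // fill input with a dummy value
    struct ggml_tensor * out = ggml_swiglu(ctx, a);  // assumed helper from #14158

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, out);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads =*/ 1);

    ggml_free(ctx);
    return 0;
}
```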

@noemotiovon
Contributor

LGTM!

@hipudding requested review from @CISC, @slaren and @ggerganov on July 26, 2025, 09:12
@hipudding self-assigned this on Jul 26, 2025
@CISC
Collaborator

CISC commented Jul 26, 2025

LGTM, but I can't test.

@CISC removed their request for review on July 26, 2025, 09:16
@hipudding merged commit 11dd5a4 into ggml-org:master on Jul 26, 2025
48 checks passed