[CANN]: add the basic supports of Flash Attention kernel #13627
BTW, we have only tested it on the 910B. At our school, the CANN environment on the 310P server is 7.x, so we cannot compile llama.cpp with the CANN backend.
We can test it on the 310P.
Evaluation Report on Ascend 910B + Kunpeng 920

Authors from Peking University: Bizhao Shi, Yuxin Yang, Ruiyang Ma, Guojie Luo

- Llama-7B-f16: Scripts, With FA, Without FA
- Qwen3-14B-Q8_0: Scripts, With FA, Without FA
- Qwen3-32B-Q8_0: Scripts, Without FA
- Qwen2-72B-Q8_0: Scripts, With FA, Without FA
You have implemented FlashAttention (FA) on CANN and provided a comprehensive test report. It looks excellent and is highly meaningful! Thank you so much to you and your colleagues for your valuable contributions to the llama.cpp project and your support for Huawei Ascend!
docs/backend/CANN.md (Outdated):

```markdown
## TODO
- Support more models and data types.
- Support more models and d
```
There seem to be some documentation errors here.
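The second bullet looks like a truncated duplicate of the first; presumably the list was meant to read simply:

```markdown
## TODO
- Support more models and data types.
```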
```cpp
ggml_cann_pool_alloc bcast_pse_allocator(ctx.pool());
void* bcast_pse_buffer = nullptr;
if(src3)
    bcast_pse_buffer = bcast_pse_allocator.alloc(
```
Could the memory allocation here be moved into the `src3 != nullptr` block below?
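A minimal sketch of the suggested restructuring, assuming the buffer is only consumed on the masked path; `bcast_pse_nbytes` is a hypothetical stand-in for the size expression truncated in the excerpt above:

```cpp
if (src3 != nullptr) {
    // allocator and buffer live only on the masked path, so no pool
    // memory is requested when there is no mask tensor
    ggml_cann_pool_alloc bcast_pse_allocator(ctx.pool());
    void* bcast_pse_buffer = bcast_pse_allocator.alloc(bcast_pse_nbytes);
    // ... build and use bcast_pse_tensor from bcast_pse_buffer ...
}
```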
ggml/src/ggml-cann/aclnn_ops.cpp (Outdated):

```cpp
    if(src3)
        ggml_cann_release_resources(ctx, bcast_pse_tensor);
}else{
    throw std::runtime_error("Function not implemented");
```
I think using `GGML_ABORT("Function not implemented");` would be a better choice.
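For reference, the quoted else branch would then become the following. `GGML_ABORT` is ggml's standard fatal-error macro: it reports the source location and aborts, rather than throwing a C++ exception, which ggml does not use for error handling:

```cpp
} else {
    GGML_ABORT("Function not implemented");
}
```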
ggml/src/ggml-cann/aclnn_ops.cpp (Outdated):

```cpp
#include <string>
#include <cstring>

#include "aclnnop/aclnn_flash_attention_score.h"
```
Remove the unnecessary includes:

```cpp
#include "aclnnop/aclnn_flash_attention_score.h"
#include "aclnnop/aclnn_logical_not.h"
```
ggml/src/ggml-cann/aclnn_ops.cpp (Outdated):

```cpp
@@ -72,12 +72,23 @@
#include <exception>
#include <vector>

#include <iostream>
```
Remove the unnecessary includes:

```cpp
#include <iostream>
#include <fstream>
#include <string>
#include <cstring>
```
ggml/src/ggml-cann/aclnn_ops.h (Outdated):

```cpp
@@ -45,6 +45,8 @@
#include <aclnnop/aclnn_cos.h>
#include <aclnnop/aclnn_log.h>
#include <aclnnop/aclnn_sign.h>
#include <aclnnop/aclnn_fused_infer_attention_score_v2.h>
```
I suggest moving this `#include <aclnnop/aclnn_fused_infer_attention_score_v2.h>` to the `aclnn_ops.cpp` file, and if we don't need `aclnn_isneginf`, it can be removed.
We have updated the files according to the review comments. Thanks for your time. @noemotiovon @hipudding
Sorry, I just noticed a few minor issues.
I pulled your latest code and tested the FA operator using a script, but encountered the following problems. Could you please help me check the cause? Thank you so much!

Environment:
- 910B3
- CANN 8.1 RC1

Script:

```sh
./bin/test-backend-ops test -b CANN0 -o FLASH_ATTN_EXT
```
Error:

```
Backend 1/2: CANN0
ggml_backend_cann_context: device 0 async operator submission is OFF
Device description: Ascend910B3
Device memory: 62432 MB (62147 MB free)
FLASH_ATTN_EXT(hsk=64,hsv=64,nh=4,nr=1,kv=512,nb=1,mask=1,max_bias=0.000000,logit_softcap=0.000000,prec=f32,type_KV=f16,permute=[0,1,2,3]): new_pool_for_device: device 0 use vmm pool
CANN error: EZ9999: Inner Error!
EZ9999: [PID: 3008073] 2025-05-21-09:49:04.442.578 precision mode[2] should be 0 or 1[FUNC:InputAttrsPreProcess][FILE:incre_flash_attention_tiling.cc][LINE:303]
TraceBack (most recent call last):
FusedInferAttentionScore do tiling failed, ret is -1.
Check NnopbaseExecutorDoTiling(executor) failed
Check NnopbaseExecutorTilingAndUpdateBinInfo(executor) failed
Check NnopbaseExecutorMatchCache(executor) failed
Check NnopbaseRunForWorkspace(*executor, workspaceSize) failed
current device: 0, in function ggml_cann_flash_attn_ext at /home/cmq/lcg/github/llama.cpp/ggml/src/ggml-cann/aclnn_ops.cpp:2858
aclnnFusedInferAttentionScoreV2GetWorkspaceSize(acl_q_tensor, acl_k_tensor_list, acl_v_tensor_list, bcast_pse_tensor, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, numHeads, scaleValue, preTokens, nextTokens, layout, numKeyValueHeads, sparseMode, innerPrecise, blockSize, antiquantMode, softmaxLseFlag, keyAntiquantMode, valueAntiquantMode, acl_dst_f16_tensor, nullptr, &workspaceSize, &executor)
/home/cmq/lcg/github/llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp:65: CANN error
```
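One guess at the cause, reading only the EZ9999 message: the tiling check rejects precision mode 2, and `innerPrecise` is the only precision-related attribute in the failing call, so it may need to be constrained before the workspace query. This is an assumption, not a confirmed fix:

```cpp
// Hypothetical: if innerPrecise is the "precision mode" validated by the
// tiling function, only the values 0 and 1 are accepted on this path.
int64_t innerPrecise = 1; // the failing run apparently passed 2
```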
ggml/src/ggml-cann/aclnn_ops.cpp (Outdated):

```cpp
aclTensor* acl_src0_f16_tensor = nullptr;
aclTensor* acl_src1_f16_tensor = nullptr;
aclTensor* acl_src2_f16_tensor = nullptr;
aclTensor* acl_src3_f16_tensor = nullptr;
```
The variable `acl_src3_f16_tensor` is not used and can likely be removed.
ggml/src/ggml-cann/aclnn_ops.cpp (Outdated):

```cpp
        GGML_ABORT("Function not implemented");
    }
}
```
Dear authors,
This PR enhances the CANN backend with the FA kernel. Currently, it only supports F16 KV tensors and does not support logit softcap. We have tested the kernel on Ascend 910B using test-backend-ops.
Thanks.
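For readers, the stated support matrix (F16 KV only, no logit softcap) corresponds to a guard of roughly the following shape. This is a hypothetical sketch based on ggml's tensor and parameter layout for FLASH_ATTN_EXT, not the PR's actual code; `cann_fa_supported` is an invented name:

```cpp
#include <cstring>
#include "ggml.h"

// Hypothetical guard mirroring the limitations stated above.
static bool cann_fa_supported(const ggml_tensor * op) {
    const ggml_tensor * k = op->src[1]; // src[0]=Q, src[1]=K, src[2]=V, src[3]=mask
    const ggml_tensor * v = op->src[2];
    if (k->type != GGML_TYPE_F16 || v->type != GGML_TYPE_F16) {
        return false; // only F16 KV tensors are supported for now
    }
    // op_params layout for FLASH_ATTN_EXT: scale, max_bias, logit_softcap
    float logit_softcap = 0.0f;
    memcpy(&logit_softcap, (const float *) op->op_params + 2, sizeof(float));
    return logit_softcap == 0.0f; // logit softcap is not implemented
}
```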