metal: somewhat faster f16 x f32 matrix multiply kernel #2951

ikawrakow · 2023-09-01T07:18:35Z

The gain comes simply from better accumulation of the per-thread results. The larger the context (prompt), the more improvement we see in prompt-processing (pp) timing.
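
The description above does not spell out how the accumulation was changed, so the following is only a generic illustration and not the kernel code from this PR. It contrasts a sequential sum over per-thread partial results with a pairwise (tree) reduction, a common way to combine per-thread values in fewer dependent steps. The function names and the 32-element input are hypothetical.

```python
def sequential_accumulate(partials):
    """One accumulator walks over every partial sum: len(partials) dependent adds."""
    total = 0.0
    steps = 0
    for p in partials:
        total += p
        steps += 1
    return total, steps


def tree_accumulate(partials):
    """Pairwise reduction: each round halves the number of live values.

    On a GPU every round could run in parallel across threads, so the
    number of dependent steps is about log2(n) rather than n.
    """
    vals = list(partials)
    if not vals:
        return 0.0, 0
    steps = 0
    while len(vals) > 1:
        if len(vals) % 2:          # pad odd lengths with a neutral element
            vals.append(0.0)
        half = len(vals) // 2
        vals = [vals[i] + vals[i + half] for i in range(half)]
        steps += 1
    return vals[0], steps


# Hypothetical per-thread partial dot products (illustrative values only).
partials = [float(i) for i in range(32)]
seq_total, seq_steps = sequential_accumulate(partials)
tree_total, tree_steps = tree_accumulate(partials)
print(seq_total, seq_steps)    # same sum, 32 dependent steps
print(tree_total, tree_steps)  # same sum, 5 dependent rounds
```

Both functions return the same sum; only the number of dependent steps differs, which is where a better accumulation scheme can save time.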

Results for LLaMA 7B Q4_0 on a 30-core M2 Max:

model	backend	test	t/s (Master)	t/s (PR)	Speedup
LLaMA 7B mostly Q4_0	Metal	pp 32	370.26 ± 1.59	374.78 ± 0.95	+1.2%
LLaMA 7B mostly Q4_0	Metal	pp 64	425.70 ± 0.62	433.17 ± 0.46	+1.8%
LLaMA 7B mostly Q4_0	Metal	pp 128	406.00 ± 0.62	419.64 ± 0.76	+3.4%
LLaMA 7B mostly Q4_0	Metal	pp 256	350.71 ± 0.15	373.11 ± 0.23	+6.4%
LLaMA 7B mostly Q4_0	Metal	pp 512	264.21 ± 0.42	290.76 ± 0.29	+10.0%

Update:

If we also change the number of thread groups from 64 to 32, it gets even better:

model	backend	test	t/s (Master)	t/s (PR)	Speedup
LLaMA 7B mostly Q4_0	Metal	pp 32	370.26 ± 1.59	376.74 ± 2.03	+1.7%
LLaMA 7B mostly Q4_0	Metal	pp 64	425.70 ± 0.62	442.49 ± 0.98	+3.9%
LLaMA 7B mostly Q4_0	Metal	pp 128	406.00 ± 0.62	437.34 ± 0.66	+7.7%
LLaMA 7B mostly Q4_0	Metal	pp 256	350.71 ± 0.15	400.49 ± 0.50	+14.2%
LLaMA 7B mostly Q4_0	Metal	pp 512	264.21 ± 0.42	323.87 ± 0.10	+22.6%

It gives a small benefit for token generation (TG) too. E.g., for TG-128 I get 61.28 ± 0.16 t/s vs 59.98 ± 0.10 t/s on master.

monatis · 2023-09-01T07:34:58Z

The speedup column is not so intuitive. It's usually reported as PR / Master, so for pp 512, for example, the speedup should be 1.1 instead of 10.0%.
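
To make the two conventions concrete, here is a small sketch that computes both from the pp 512 numbers in the first table of this thread. The helper names are made up for illustration and are not part of any benchmark tool.

```python
def speedup_ratio(master_tps: float, pr_tps: float) -> float:
    """Speedup as a ratio, PR / Master (1.0 means no change)."""
    return pr_tps / master_tps


def percent_gain(master_tps: float, pr_tps: float) -> float:
    """The same comparison expressed as a percentage gain over master."""
    return (pr_tps / master_tps - 1.0) * 100.0


# pp 512 on the 30-core M2 Max, first table above.
master, pr = 264.21, 290.76
print(f"{speedup_ratio(master, pr):.2f}")   # 1.10
print(f"{percent_gain(master, pr):+.1f}%")  # +10.0%
```

The two values carry the same information; the ratio form is simply the one that matches the usual "speedup" column.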

ggerganov

M2 Ultra

model	size	params	ngl	test	master t/s	PR t/s	speedup
LLaMA 7B Q4_0	3.56 GiB	6.74 B	1	pp 32	686.11 ± 4.85	684.45 ± 3.89	1.00
LLaMA 7B Q4_0	3.56 GiB	6.74 B	1	pp 64	852.72 ± 2.61	859.01 ± 1.66	1.01
LLaMA 7B Q4_0	3.56 GiB	6.74 B	1	pp 128	910.65 ± 1.78	927.03 ± 2.15	1.02
LLaMA 7B Q4_0	3.56 GiB	6.74 B	1	pp 256	811.85 ± 0.98	837.43 ± 0.99	1.03
LLaMA 7B Q4_0	3.56 GiB	6.74 B	1	pp 512	632.02 ± 0.11	663.84 ± 0.14	1.05
LLaMA 7B Q4_0	3.56 GiB	6.74 B	1	pp 1024	467.89 ± 0.12	498.51 ± 0.09	1.07
LLaMA 7B Q4_0	3.56 GiB	6.74 B	1	tg 128	87.20 ± 0.12	87.46 ± 0.03	1.01

I think this PR does not conflict with #2891, so we can merge it.

ggerganov · 2023-09-01T07:58:00Z

Here is the speedup after the last commit:

model	size	params	ngl	test	master t/s	PR t/s	speedup
LLaMA 7B Q4_0	3.56 GiB	6.74 B	1	pp 32	686.11 ± 4.85	689.51 ± 3.60	1.00
LLaMA 7B Q4_0	3.56 GiB	6.74 B	1	pp 64	852.72 ± 2.61	862.13 ± 2.97	1.01
LLaMA 7B Q4_0	3.56 GiB	6.74 B	1	pp 128	910.65 ± 1.78	945.49 ± 1.71	1.04
LLaMA 7B Q4_0	3.56 GiB	6.74 B	1	pp 256	811.85 ± 0.98	873.72 ± 0.16	1.08
LLaMA 7B Q4_0	3.56 GiB	6.74 B	1	pp 512	632.02 ± 0.11	711.37 ± 0.26	1.13
LLaMA 7B Q4_0	3.56 GiB	6.74 B	1	pp 1024	467.89 ± 0.12	543.60 ± 0.10	1.16
LLaMA 7B Q4_0	3.56 GiB	6.74 B	1	tg 128	87.20 ± 0.12	87.38 ± 0.08	1.00

🦙

Somewhat faster f16 x f32 matrix multiply kernel

af226bd

ikawrakow requested a review from ggerganov September 1, 2023 07:18

ggerganov approved these changes Sep 1, 2023

Better use 32 thread groups for f16 x f32

cad50d1

ikawrakow merged commit e8d9158 into master Sep 1, 2023

ikawrakow deleted the ik/metal_faster_mm_f16_f32 branch September 1, 2023 08:16

ikawrakow mentioned this pull request Sep 1, 2023

More optimizations on metal #2959

Merged
