
CLBlast: Add outer loops over src0 for broadcasting in mulmat #3669

Merged: 1 commit, Oct 20, 2023


shibe2 (Contributor) commented Oct 18, 2023

When broadcasting, each 2D plane of src0 is matched with multiple 2D planes of src1. Each plane of src0 needs to be copied and/or de-quantized only once for all the GEMM operations that use it. A more natural way to handle this is to create an outer loop over src0 and do the copying and de-quantization there.

Previously, de-quantization was performed before each GEMM. Now it is moved to the outer loop over src0.
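To make the loop reordering concrete, here is a simplified C model that only counts calls. It is a sketch under stated assumptions, not the code from this PR: the names `mul_mat_old`, `mul_mat_new` and `call_counts` are hypothetical, the dimension names follow ggml's convention (ne02/ne03 for src0, ne12/ne13 for src1), and the actual copy/de-quantization and CLBlast GEMM calls are replaced by counters.

```c
#include <assert.h>
#include <stdint.h>

/* Counts of the two kinds of work. In the real backend these would be a
   copy/de-quantization of one 2D src0 plane into VRAM and one CLBlast GEMM;
   this hypothetical model only counts them. */
typedef struct {
    long dequant;
    long gemm;
} call_counts;

/* Old order: loop over src1 planes and de-quantize the matching src0 plane
   before every GEMM. */
static call_counts mul_mat_old(int64_t ne02, int64_t ne03, int64_t ne12, int64_t ne13) {
    assert(ne12 % ne02 == 0 && ne13 % ne03 == 0); /* broadcast needs exact multiples */
    call_counts c = {0, 0};
    for (int64_t i13 = 0; i13 < ne13; i13++) {
        for (int64_t i12 = 0; i12 < ne12; i12++) {
            c.dequant++; /* de-quantize src0 plane (i12 / r2, i13 / r3) again */
            c.gemm++;    /* GEMM with src1 plane (i12, i13) */
        }
    }
    return c;
}

/* New order: outer loops over src0. Each src0 plane is de-quantized once and
   reused for the r2 src1 planes broadcast against it along dimension 2. */
static call_counts mul_mat_new(int64_t ne02, int64_t ne03, int64_t ne12, int64_t ne13) {
    assert(ne12 % ne02 == 0 && ne13 % ne03 == 0);
    const int64_t r2 = ne12 / ne02; /* src1 planes per src0 plane, dim 2 */
    const int64_t r3 = ne13 / ne03; /* src1 planes per src0 plane, dim 3 */
    call_counts c = {0, 0};
    for (int64_t i03 = 0; i03 < ne03; i03++) {
        /* Only one 2D plane of src0 is kept at a time, so each of the r3 src1
           indices along dim 3 repeats the de-quantization for this i03. */
        for (int64_t i13 = i03 * r3; i13 < (i03 + 1) * r3; i13++) {
            for (int64_t i02 = 0; i02 < ne02; i02++) {
                c.dequant++; /* de-quantize src0 plane (i02, i03) */
                for (int64_t i12 = i02 * r2; i12 < (i02 + 1) * r2; i12++) {
                    c.gemm++; /* reuse it: GEMM with src1 plane (i12, i13) */
                }
            }
        }
    }
    return c;
}
```

For an illustrative GQA-like shape where 8 src0 planes are shared by 32 src1 planes, `mul_mat_old(8, 1, 32, 1)` and `mul_mat_new(8, 1, 32, 1)` both perform 32 GEMMs, but the old order de-quantizes 32 times and the new order only 8 times.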

De-quantization is still duplicated when broadcasting over dimension 3. Handling that properly would require a more substantial change, namely storing 3D instead of 2D slices of src0 in VRAM. I don't know of any case where broadcasting over dimension 3 is currently used, so I am leaving it for the future. Results are nevertheless correct in this case as well.
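Counting de-quantizations per 2D plane makes the remaining dimension-3 duplication explicit. This is a back-of-the-envelope sketch, assuming one de-quantization per src0 plane per pass and writing $r_2 = ne_{12}/ne_{02}$ and $r_3 = ne_{13}/ne_{03}$ for the broadcast ratios (ggml's dimension naming):

$$
\begin{aligned}
N_{\text{GEMM}} &= ne_{12}\,ne_{13} = r_2\,r_3\,ne_{02}\,ne_{03} \\
N_{\text{deq}}^{\text{before}} &= ne_{12}\,ne_{13} = r_2\,r_3\,ne_{02}\,ne_{03} \\
N_{\text{deq}}^{\text{after}} &= ne_{02}\,ne_{13} = r_3\,ne_{02}\,ne_{03} \\
N_{\text{deq}}^{\text{ideal}} &= ne_{02}\,ne_{03}
\end{aligned}
$$

Under these assumptions the change removes the factor $r_2$ (the dimension GQA broadcasts over), while the remaining factor $r_3$ would only disappear if 3D slices of src0 were kept in VRAM.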

In the case of matrix-vector multiplication, de-quantization is still done repeatedly, because the de-quantized data is not stored in RAM.

Tested in isolation and with models that use GQA.

Reduce repeated dequantization of the same data.
shibe2 merged commit 465219b into ggerganov:master on Oct 20, 2023