README: add graphic for matrix multiplication #6881

Merged

JohannesGaessler
Collaborator

While looking at the README's description of matrix memory layout, I was confused by the statement zT = x @ yT, because the output tensor is transposed. @ggerganov what mental image do you have of the memory layout? Do you imagine basically all tensors in llama.cpp to be transposed, and therefore to actually be column-major? To make sure there are no misunderstandings, I adapted a graphic I made before to visualize my mental image (which I suppose would also make sense to add as documentation).

I imagine the memory layout on the left whenever I'm thinking about matrix multiplications.

Owner

@ggerganov ggerganov left a comment

The main reason for the current layout is that I wanted matrix multiplications to be expressed as dot products of rows whose elements are stored sequentially in memory. Normally, the result C_ij is defined as the dot product of the i-th row of A with the j-th column of B. But accessing a column in a row-major array is not cache friendly, so I figured it would be better to store the matrix B transposed and perform the dot products in a cache-friendly manner: multiply row by row. The result is also stored in transposed form, since this fits nicely with the transformer architecture: the result of a matrix multiplication is often used afterwards as the "B" for the next matrix multiplication:

B_1 = A_0 x B_0 
B_2 = A_1 x B_1
...

Here the A's are the weights and the B's are the activations.
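
A minimal sketch of the layout described above, in plain Python with flat lists standing in for contiguous memory. The function name `mul_mat_t` and its shape arguments are hypothetical and are not ggml's API; the sketch only illustrates that each output element is a dot product of two contiguous rows, and that the transposed result can be passed on directly as the next "B".

```python
def mul_mat_t(a, bt, m, n, k):
    """Return C^T (n x m) for C = A x B, given A (m x k) and B^T (n x k).

    All matrices are row-major flat lists, so row i of A is a[i*k:(i+1)*k]
    and row j of B^T (i.e. column j of B) is bt[j*k:(j+1)*k].
    """
    ct = [0.0] * (n * m)
    for j in range(n):           # one row of B^T per output row
        for i in range(m):       # one row of A per output element
            acc = 0.0
            for p in range(k):   # both operands are read sequentially
                acc += a[i * k + p] * bt[j * k + p]
            ct[j * m + i] = acc  # C^T[j][i] == C[i][j]
    return ct


# Hypothetical toy data: A_0 is 3x2 (weights), B_0 is 2x2 (activations).
a0 = [1, 2,
      3, 4,
      5, 6]
b0t = [1, 3,   # B_0 = [[1, 2], [3, 4]], stored transposed
       2, 4]

b1t = mul_mat_t(a0, b0t, m=3, n=2, k=2)   # B_1^T, 2x3: [7, 15, 23, 10, 22, 34]

# B_1^T is already in the layout the next multiplication expects as "B".
a1 = [1, 1, 1]                             # A_1 is 1x3
b2t = mul_mat_t(a1, b1t, m=1, n=2, k=3)   # B_2^T, 2x1: [45, 66]
```

Because the output is written in the same transposed layout as the input activations, no explicit transpose is needed between the two calls.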

I guess instead of saying "transposed", we can also say "stored in column-major order" as you have noted. And probably this makes more sense.
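
To make the equivalence of the two phrasings concrete, here is a small sketch in plain Python. The helper names `row_major`, `col_major`, and `transpose` are made up for this illustration. It checks that storing B^T in row-major order produces exactly the same flat buffer as storing B in column-major order.

```python
def row_major(mat):
    """Flatten a list-of-rows matrix row by row."""
    return [x for row in mat for x in row]


def col_major(mat):
    """Flatten a list-of-rows matrix column by column."""
    rows, cols = len(mat), len(mat[0])
    return [mat[r][c] for c in range(cols) for r in range(rows)]


def transpose(mat):
    """Return the transpose as a new list of rows."""
    return [list(col) for col in zip(*mat)]


b = [[1, 2, 3],
     [4, 5, 6]]

# Same bytes in memory, two ways of describing them.
assert row_major(transpose(b)) == col_major(b)
```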

It's a nice graphic to have. Though when I draw arrays on paper I always draw them the way they are stored in memory, so for me the rows of B^T going vertically in the picture is confusing. But I understand it.

There is also this description, which I'm not sure if it helps or not: https://github.com/ggerganov/ggml/tree/master/examples/simple

@JohannesGaessler JohannesGaessler merged commit 784e11d into ggerganov:master Apr 24, 2024
21 checks passed
@JohannesGaessler
Collaborator Author

Thanks for the high-effort reply.

2 participants