ggml : extend ggml_mul_mat to support non-F32 input for parameter b
#455
Comments
I guess this is the reason why 1D tensors need to be stored as F32 in the model files.
This also seems to be needed for PR ggerganov/llama.cpp#2632.
Not exactly - it is a similar problem, but it is related to …
We could add a flag to the context and perform the conversion automatically when building the graph:

```c
// If enabled, convert b to the type that the matmul kernel for a's type
// actually dots against (its vec_dot_type) by inserting an explicit copy
// node into the graph.
if (ctx->quantize_matmul && b->type != type_traits[a->type].vec_dot_type) {
    struct ggml_tensor * tmp_b = ggml_new_tensor(ctx, type_traits[a->type].vec_dot_type, b->n_dims, b->ne);
    b = ggml_cpy(ctx, b, tmp_b);
}
```

The disadvantage is that it may add a bit of overhead by increasing the size of the graphs, but it shouldn't be too bad. To avoid this, we could set a flag in …
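For context, `vec_dot_type` is the type the matmul kernel dots `b` in: for the Q4-family weight types it is Q8_0/Q8_1, and for F16 weights it is F16. Below is a minimal sketch of the same conversion done explicitly at graph-build time with the existing public API, assuming `a` is a Q4_0 weight (so the target type is Q8_0); note that the final `ggml_mul_mat` call with a quantized `b` is precisely what this issue proposes to allow:

```c
// Explicit, graph-level version of the automatic conversion sketched above.
// Assumes `a` is a Q4_0 weight matrix, whose vec_dot_type is Q8_0.
struct ggml_tensor * b_q = ggml_new_tensor_2d(ctx, GGML_TYPE_Q8_0, b->ne[0], b->ne[1]);
b_q = ggml_cpy(ctx, b, b_q);                          // F32 -> Q8_0 conversion node
struct ggml_tensor * c  = ggml_mul_mat(ctx, a, b_q);  // requires this issue's extension
```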
Somewhat related to this, it would be good to have a …
Currently, we always pass `b` to `ggml_mul_mat` as F32 and internally quantize it depending on the type of `a`. There is no option that allows passing an already quantized `b`. The primary goal of this task is to add such an option.

For more info, see: ggerganov/llama.cpp#2615 (comment)
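To make the current behavior concrete, here is a hedged sketch of today's usage versus what this task would enable. It assumes an initialized `struct ggml_context * ctx`; the tensor names and dimensions are purely illustrative:

```c
// Illustrative dimensions; not taken from any real model.
const int64_t n_embd = 4096, n_out = 4096, n_tokens = 32;

// Today: b must be F32; the matmul quantizes it internally to a's
// vec_dot_type each time the op is evaluated.
struct ggml_tensor * w     = ggml_new_tensor_2d(ctx, GGML_TYPE_Q4_0, n_embd, n_out);
struct ggml_tensor * cur   = ggml_new_tensor_2d(ctx, GGML_TYPE_F32,  n_embd, n_tokens);
struct ggml_tensor * out   = ggml_mul_mat(ctx, w, cur);   // works today

// Proposed: allow passing a b that is already quantized (e.g. Q8_0),
// skipping the internal F32 -> Q8_0 conversion.
struct ggml_tensor * cur_q = ggml_new_tensor_2d(ctx, GGML_TYPE_Q8_0, n_embd, n_tokens);
struct ggml_tensor * out_q = ggml_mul_mat(ctx, w, cur_q); // this issue's proposal
```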
The primary focus will be `ggml_mul_mat`, but we can also think about a more general approach for the rest of the operators. For example, `ggml_mul` currently also works with just F32 input, which prevents having 1D F16 norm tensors. This is not a huge drawback since these tensors are usually small, but it would be nice to also support F16.

Additionally, we can extend `ggml` with parameters that control the implicit quantizations, i.e. disable / enable / change types, etc. This is a secondary objective, and it is not 100% clear how it would work from an API POV.
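One possible shape for such parameters, sketched purely hypothetically (none of these symbols exist in ggml; this only illustrates the API question):

```c
// Hypothetical knob for the secondary objective: control how ggml converts b
// before a matmul. None of these names exist in ggml today.
enum ggml_implicit_quant {
    GGML_IMPLICIT_QUANT_DEFAULT, // convert b to the vec_dot_type of a (current behavior)
    GGML_IMPLICIT_QUANT_NONE,    // require b to already be in a supported type
    GGML_IMPLICIT_QUANT_F16,     // force an F16 intermediate instead of the default
};

void ggml_set_implicit_quant(struct ggml_context * ctx, enum ggml_implicit_quant mode);
```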