Currently, `GgmlDType` only supports F16 and not BF16. This PR introduces support for the BF16 type. I would appreciate a check that this looks good! I have tested successfully on my machine, which has `avx` and `f16c`, and the CUDA tests also pass even though no changes were necessary there.

I also noted a potentially confusing situation, though: if a BF16 tensor is part of a `QMatMul` (and likewise for all other types in `QStorage` not supported for quantized matmul), we should perhaps dequantize and then perform the matmul using cuBLAS? This modification could be made in `QStorage::fwd`, perhaps.
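To illustrate the fallback I have in mind, here is a minimal, self-contained Rust sketch. It does not use candle's actual API; the `QTensor`, `dequantize`, and `qmatmul_fallback` names are hypothetical, and a naive f32 matmul stands in for the cuBLAS call. The idea is simply: when a dtype has no fused quantized-matmul kernel, dequantize the weights first and run an ordinary matmul.

```rust
// Hypothetical sketch (not candle's real API): dequantize-then-matmul
// fallback for dtypes without a fused quantized-matmul kernel.

// Toy symmetric 8-bit quantization: value ≈ scale * q.
struct QTensor {
    scale: f32,
    data: Vec<i8>,
    rows: usize,
    cols: usize,
}

impl QTensor {
    // Recover f32 values from the quantized representation.
    fn dequantize(&self) -> Vec<f32> {
        self.data.iter().map(|&q| q as f32 * self.scale).collect()
    }
}

// Plain f32 matmul: (m x k) * (k x n) -> (m x n).
// In the real fallback this would be a cuBLAS GEMM call.
fn matmul(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    let mut out = vec![0.0f32; m * n];
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += a[i * k + p] * b[p * n + j];
            }
            out[i * n + j] = acc;
        }
    }
    out
}

// Fallback path: dequantize the weights, then do a regular matmul.
fn qmatmul_fallback(x: &[f32], w: &QTensor, m: usize) -> Vec<f32> {
    let w_f32 = w.dequantize();
    // x is (m x w.rows), w dequantizes to (w.rows x w.cols).
    matmul(x, &w_f32, m, w.rows, w.cols)
}

fn main() {
    // w dequantizes to [[1.0, 2.0], [3.0, 4.0]].
    let w = QTensor { scale: 0.5, data: vec![2, 4, 6, 8], rows: 2, cols: 2 };
    let x = vec![1.0f32, 0.0, 0.0, 1.0]; // 2x2 identity
    let y = qmatmul_fallback(&x, &w, 2);
    println!("{:?}", y); // [1.0, 2.0, 3.0, 4.0]
}
```

The trade-off is that the fallback materializes the full f32 weight matrix and loses the memory/bandwidth benefit of the fused kernel, but it keeps `QMatMul` functional for every dtype rather than erroring out.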