Conversation

benfred (Owner) commented May 30, 2023

This adds support for using float16 factors in the GPU version of the ALS model. This halves the memory needed for the ALS model embeddings, while providing a small speedup in training time and virtually no difference in the accuracy of the learned model.
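For a rough sense of the savings (my own back-of-the-envelope numbers, not measurements from this PR), the item-factor matrix for a hypothetical model with 1M items and 128 factors drops from about 488 MiB in float32 to about 244 MiB in float16:

import numpy as np

# hypothetical model size: 1M items, 128 factors (for illustration only)
n_items, n_factors = 1_000_000, 128

fp32_factors = np.zeros((n_items, n_factors), dtype=np.float32)
fp16_factors = np.zeros((n_items, n_factors), dtype=np.float16)

print(f"float32 factors: {fp32_factors.nbytes / 2**20:.0f} MiB")  # ~488 MiB
print(f"float16 factors: {fp16_factors.nbytes / 2**20:.0f} MiB")  # ~244 MiB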

All computations are still performed in float32, for both training and inference. This is done by using mixed-precision matrix multiplications during inference: the fp16 factors are multiplied together with the results accumulated as fp32. During training, the factors are converted from fp16 to fp32, and updates are calculated in 32-bit before being stored back as fp16.
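A rough numpy sketch of the idea (the real implementation lives in the CUDA code; the names and shapes here are just for illustration):

import numpy as np

# factors are *stored* as float16, but all arithmetic runs in float32
rng = np.random.default_rng(0)
item_factors = rng.standard_normal((1000, 64)).astype(np.float16)
user_factors = rng.standard_normal((100, 64)).astype(np.float16)

# inference: upcast the fp16 factors and accumulate the products in fp32
scores = user_factors.astype(np.float32) @ item_factors.astype(np.float32).T
assert scores.dtype == np.float32

# training-style update: compute a user's new factors entirely in fp32
# (a plain regularized least-squares solve stands in for the ALS update),
# then store the result back as fp16
Y = item_factors.astype(np.float32)
target = rng.standard_normal(1000).astype(np.float32)
new_user = np.linalg.solve(Y.T @ Y + 0.01 * np.eye(64, dtype=np.float32),
                           Y.T @ target)
user_factors[0] = new_user.astype(np.float16)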

benfred commented May 30, 2023

Both training and inference times are slightly faster with fp16, but not drastically so:

| dataset | dtype | training time (s) | similar_items time (s) |
|---|---|---|---|
| lastfm | float16 | 6.03652 | 7.99366 |
| lastfm | float32 | 6.17446 | 8.58448 |
| movielens-20m | float16 | 3.5967 | 0.98492 |
| movielens-20m | float32 | 3.6981 | 1.01374 |

This is as expected, since we're still computing results in float32 and just storing the factors in float16.

benfred commented May 30, 2023

Running some quick experiments with cross-validation, I got equivalent results with both fp16 and fp32 factors, which indicates that there isn't an accuracy hit from using fp16 factors in the learned model.

Running a simple experiment on the lastfm dataset:

from implicit.evaluation import precision_at_k, train_test_split
from implicit.datasets.lastfm import get_lastfm
from implicit.gpu.als import AlternatingLeastSquares

# load the last.fm dataset and split into train/test sets
_, _, ratings = get_lastfm()
train, test = train_test_split(ratings.T.tocsr())

# train and evaluate a model with fp16 factors
fp_16_model = AlternatingLeastSquares(factors=128, dtype="float16")
fp_16_model.fit(train)
p = precision_at_k(fp_16_model, train, test, K=10)
print("precision@10, fp16", p)

# train and evaluate an otherwise identical model with fp32 factors
fp_32_model = AlternatingLeastSquares(factors=128, dtype="float32")
fp_32_model.fit(train)
p = precision_at_k(fp_32_model, train, test, K=10)
print("precision@10, fp32", p)

This prints out:

precision@10, fp16 0.14532461631304008
precision@10, fp32 0.14520956046071604

(Note this was with just default hyper-parameters - the goal here is to show whether the results are equivalent between fp16 and fp32, rather than to get the best possible results for the lastfm dataset.)

@benfred merged commit ec36f33 into main May 30, 2023
@benfred deleted the fp16 branch May 30, 2023 18:24
@benfred linked an issue May 30, 2023 that may be closed by this pull request: Support storing factors as FP16 on the GPU