A PaddlePaddle implementation of Sequential Recommendation Via Personalized Transformer (SSE-PT), an updated version of the SASRec model, with the following two changes:
- Stochastic Shared Embeddings (SSE) layers are added.
- Item embeddings are concatenated with their corresponding user embeddings.
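
The second change can be illustrated with a minimal NumPy sketch. It is not the code of this repo: the table sizes, the function name `build_inputs`, and the random initialisation are assumptions chosen only to show the shapes involved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
num_users, num_items = 5, 20
user_dim, item_dim = 4, 6

# Randomly initialised embedding tables (stand-ins for learned parameters).
user_table = rng.normal(size=(num_users + 1, user_dim))
item_table = rng.normal(size=(num_items + 1, item_dim))

def build_inputs(user_id, item_seq):
    """Concatenate each item embedding with the embedding of the sequence's user."""
    item_emb = item_table[item_seq]                               # (seq_len, item_dim)
    user_emb = np.tile(user_table[user_id], (len(item_seq), 1))   # (seq_len, user_dim)
    return np.concatenate([item_emb, user_emb], axis=-1)          # (seq_len, item_dim + user_dim)

x = build_inputs(user_id=2, item_seq=np.array([7, 3, 11]))
print(x.shape)  # (3, 10)
```

Every position in the sequence carries the same user vector, so the width of the Transformer input grows from `item_dim` to `item_dim + user_dim`.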
This repo is basically an update of SASRec.paddle, and the README.md is also a modification of the original one.
Paper: SSE-PT: Sequential Recommendation Via Personalized Transformer (acm.org)
The results of this repo are reproduced on a GTX 1070. It takes about 10 minutes for the model to achieve the best results on MovieLens-1M, at about epoch 210.
These results are achieved by setting only the probability of replacing user embeddings to 0.1, with no replacement for item embeddings.
Dataset | Metric | Paper | Ours | Abs. improvement |
---|---|---|---|---|
MovieLens-1M | HIT@10 | 0.8351 | 0.8497 | 0.0146 |
MovieLens-1M | NDCG@10 | 0.6174 | 0.6245 | 0.0071 |
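
For readers unfamiliar with the two metrics, the sketch below shows how HIT@10 and NDCG@10 are commonly computed for a single user in SASRec-style evaluation, where the ground-truth item is ranked against a set of candidate items. The function name and the candidate layout are assumptions; the evaluation code in this repo may differ in detail.

```python
import numpy as np

def hit_and_ndcg_at_k(scores, target_index=0, k=10):
    """Return (HIT@k, NDCG@k) for one user.

    scores holds one score per candidate item; scores[target_index]
    belongs to the ground-truth next item.
    """
    # 0-based rank: how many candidates score strictly higher than the ground truth.
    rank = int(np.sum(scores > scores[target_index]))
    if rank < k:
        return 1.0, 1.0 / np.log2(rank + 2)
    return 0.0, 0.0

# One candidate (0.95) outranks the ground truth (0.9), so its 0-based rank is 1.
scores = np.array([0.9, 0.95, 0.1, 0.3])
hit, ndcg = hit_and_ndcg_at_k(scores)
```

The reported numbers would then be the averages of these per-user values over all evaluated users.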
- `tqdm>=4.51.0`
- `paddlepaddle_gpu>=2.1.2`
- `numpy>=1.20.3`
- See `requirements.txt`.
Users with fewer than 5 items have already been removed from the dataset provided by the author, as in SASRec.paddle. The two columns are `user_id` and `item_id`, respectively.
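
As an illustration of this format, the following sketch groups interactions by user and applies the same minimum-length filter. It assumes whitespace-separated integer columns; the function name `load_interactions` is hypothetical and is not taken from this repo.

```python
from collections import defaultdict

def load_interactions(lines, min_items=5):
    """Group item ids by user and drop users with fewer than min_items interactions."""
    by_user = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        user_id, item_id = int(parts[0]), int(parts[1])
        by_user[user_id].append(item_id)
    return {u: items for u, items in by_user.items() if len(items) >= min_items}

# User 1 has five interactions and is kept; user 2 has two and is dropped.
sample = ["1 10", "1 11", "1 12", "1 13", "1 14", "2 10", "2 11"]
sequences = load_interactions(sample)
print(sequences)  # {1: [10, 11, 12, 13, 14]}
```

With a real file, the same function could be called as `load_interactions(open(path))`, where `path` points to the dataset file.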
bash train.sh
The model is evaluated every `val_interval` batches during training, and training on MovieLens-1M is fast, so you can also evaluate the model during training.
bash eval.sh
- [Inherited from SASRec.paddle] The original SASRec places `LayerNorm` before `MultiHeadAttention`, while this repo moves it after `MultiHeadAttention`. This arrangement achieves better results and is in accordance with the original `Transformer` model.
- [Inherited from SASRec.paddle] The original SASRec uses the `Adam` optimizer, while this repo uses `AdamW` and achieves better results.
- In this repo, the SSE replacement probability is defined slightly differently from the original paper. The original SSE-PT takes `p_u` as the probability of keeping the original embeddings, while this repo takes it as the probability of replacing them with others, just like a dropout rate. That is, `p_u' = 1 - p_u`, where `p_u'` is the SSE replacement probability in this repo and `p_u` is the probability of keeping as defined in the original paper.
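
The replacement convention in the last point can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the layer used in this repo: the function names are hypothetical, ids are assumed to start at 1, and replacement is assumed to be applied to ids before the embedding lookup, during training only.

```python
import numpy as np

def keep_prob_to_replace_prob(keep_prob):
    """Convert the paper's keeping probability p_u into this repo's replacement probability p_u'."""
    return 1.0 - keep_prob

def sse_replace(ids, num_embeddings, replace_prob, rng):
    """Swap each id, with probability replace_prob, for one drawn uniformly from [1, num_embeddings]."""
    ids = np.asarray(ids)
    mask = rng.random(ids.shape) < replace_prob            # True where the id is replaced
    random_ids = rng.integers(1, num_embeddings + 1, size=ids.shape)
    # A drawn id may coincide with the original one; this sketch does not exclude that case.
    return np.where(mask, random_ids, ids)

rng = np.random.default_rng(0)
user_ids = np.array([3, 3, 7, 9])

# A keeping probability of 0.9 in the paper's convention is roughly 0.1 here.
replace_prob = keep_prob_to_replace_prob(0.9)
noisy_user_ids = sse_replace(user_ids, num_embeddings=10, replace_prob=replace_prob, rng=rng)
```

Setting `replace_prob` to 0 leaves the ids untouched, which corresponds to the "no replacement for item embeddings" setting used for the reported results.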
implementations on github: