This repo provides our implementation for λOpt. λOpt is a fine-grained adaptive regularizer for recommender models. The key idea is to specify an individual regularization strength for each user and item. Check our paper for details about λOpt if you are interested!
Yihong Chen, Bei Chen, Xiangnan He, Chen Gao, Yong Li, Jian-Guang Lou and Yue Wang(2019). λOpt: Learn to Regularize Recommender Models in Finer Levels. In KDD'19, Anchorage, Alaska, USA, August 4-8, 2019.
For example, movielens-1m, put ratings.dat
in src/data/ml-1m
cd src
mkdir data
cd data
mkdir ml-1m
cd src
mkdir tmp
mkdir tmp/data
mkdir tmp/res
mkdir tmp/res/ml1m/
mkdir tmp/penalty/
mkdir tmp/penalty/ml1m/
For fixed lambda on movielens-1m, the hyper-parameters are in train_ml1m_fixed.py
For lambdaopt on movielens-1m, the hyper-parameters are in train_ml1m_alter.py
cd src
python train_ml1m_alter.py
src/regularizer/
: lambdaoptsrc/factorizer/
: matrix factorization modelsrc/utils/
: handy modules for data sampling, evaluation etc.src/engine.py
: training enginesrc/train_[dataset]_[regularization method].py
: entry point
λOpt features a parameterized regularizer, where very fine-grained regularization is employed over the user and item embeddings. While itself offers a novel technique to solve the bi-level optimization problem, there are also a bunch of cool work from the meta-learning community. Hence, to put readers in a broader context, this repo also offers the implementation of λOpt using Higher, a pytorch meta-learning repo. Higher provides useful tookits, like differentiable optimizers and automatic patch to make pytorch model functional, which facilitates convenient computation of higher-order gradients.
Comparison between the original λOpt and λOpt using higher is shown in the following table.
λOpt Original | λOpt Higher | |
---|---|---|
Gradient Path | Truncated 2nd-order | Fully 2nd-order |
Memory Cost | 1X | ~1.2X |
Computational Cost | 1X | ~1.7X |
Require manual gradient composition? | Yes | No |
Require duplicating model forward logic? | Yes | No |
Require manual differentiable optimizer? | Yes | No |
Concise code | No | Yes |
Note that λOpt Higher takes more computational resources but offers a very concise and beautiful code:
with higher.innerloop_ctx(m, m_optimizer) as (fmodel, diffopt):
# look-ahead, this is very similar to factorizer update except that lambda is included in the computational graph
u, i, j = sampler.get_sample('train')
preference = torch.ones(u.size())[0]
if self.use_cuda:
u, i, j = u.cuda(), i.cuda(), j.cuda()
preference = preference.cuda()
prob_preference = fmodel.forward_triple(u, i, j)
l_fit = self.criterion(prob_preference, preference) / (u.size()[0])
lmbda = self.lambda_network.parse_lmbda(is_detach=False)
l_reg = fmodel.l2_penalty(lmbda, u, i, j) / (u.size()[0])
l = l_fit + l_reg
diffopt.step(l)
# compute the validation loss
valid_u, valid_i, valid_j = sampler.get_sample('valid')
valid_preference = torch.ones(valid_u.size()[0])
if self.use_cuda:
valid_preference = valid_preference.cuda()
valid_u, valid_i, valid_j = valid_u.cuda(), valid_i.cuda(), valid_j.cuda()
self.lambda_network.train()
self.optimizer.zero_grad()
valid_prob_preference = fmodel.forward_triple(valid_u, valid_i, valid_j) / valid_u.size()[0]
l_val = self.criterion(valid_prob_preference, valid_preference)
l_val.backward()
torch.nn.utils.clip_grad_norm_(self.lambda_network.parameters(), self.clip)
self.optimizer.step()
self.valid_mf_loss = l_val.item()
return self.lambda_network.parse_lmbda(is_detach=True)
How to run λOpt in higher?
- simply specify the
regularizer
asalter_mf_higher
- pytorch=1.2.0
- higher
If you this repo, please kindly cite our paper.
@inproceedings{lambdaopt,
author = {Chen, Yihong and Chen, Bei and He, Xiangnan and Gao, Chen and Li, Yong and Lou, Jian-Guang and Wang, Yue},
title = {lambdaOpt: Learn to Regularize Recommender Models in Finer Levels},
booktitle = {Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery \&\#38; Data Mining},
series = {KDD '19},
year = {2019},
isbn = {978-1-4503-6201-6},
location = {Anchorage, AK, USA},
pages = {978--986},
numpages = {9},
url = {http://doi.acm.org/10.1145/3292500.3330880},
doi = {10.1145/3292500.3330880},
acmid = {3330880},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {matrix factorization, regularization hyperparameter, top-k recommendation},
}
Any feedback is much appreciated! Drop us a line at yihong.chen@cs.ucl.ac.uk or simply open a new issue.
- Test LambdaOpt in Higher