Reginx is short for Recommendation Engine X. I plan to build most parts of a modern recommendation engine from scratch.
The initial plan includes:
- Popular machine learning models like CF, FM, XGBoost, TwoTower, W&D, DeepFM, DCN, MaskNet, SASRec, Bert4Rec, Transformer, etc.
- Online inference service written in Go, including candidate generation, ranking, and re-ranking layers
- Feature engineering and preprocessing, including both online and offline parts
- Diversity approaches, like MMR and DPP (see the MMR sketch after this list)
- Deduplication approaches, like LSH or Bloom filters
- Training data pipeline
- Model registry, monitoring and versioning
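For example, MMR (maximal marginal relevance) re-ranks a candidate list by greedily trading off relevance against similarity to items already picked. Below is a minimal sketch of that idea; it is illustrative only, not the repo's implementation, and the function name `mmr_rerank` is hypothetical.

```python
import numpy as np

def mmr_rerank(scores, item_embs, k, lam=0.7):
    """Greedy MMR: balance relevance (scores) against similarity to already-picked items."""
    # Normalize embeddings so dot products behave like cosine similarities.
    item_embs = item_embs / np.linalg.norm(item_embs, axis=1, keepdims=True)
    selected, candidates = [], list(range(len(scores)))
    while candidates and len(selected) < k:
        if not selected:
            # First pick is purely relevance-driven.
            best = max(candidates, key=lambda i: scores[i])
        else:
            # Penalize each candidate by its max similarity to the selected set.
            sim_to_selected = item_embs[candidates] @ item_embs[selected].T  # [n_cand, n_sel]
            max_sim = sim_to_selected.max(axis=1)
            mmr = lam * np.asarray(scores)[candidates] - (1 - lam) * max_sim
            best = candidates[int(np.argmax(mmr))]
        selected.append(best)
        candidates.remove(best)
    return selected
```

Calling `mmr_rerank(ranking_scores, embeddings, k=50)` would return the indices of a diversified top-50 list; `lam` controls the relevance/diversity trade-off.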
TensorFlow 2 and Google Cloud are used for model training and performance tracking. The conda environment config is here.
I have a personal blog on Substack explaining the models, and I put the corresponding links in the table below.
Here is an example of training a two-tower model on a local machine.
Set up your conda environment using the conda config here:
conda env create -f environment.yml
conda activate tf
Set your PYTHONPATH to the root folder of this project, or add it to your .bashrc:
export PYTHONPATH=/your_project_folder/reginx
You can run this script to generate the metadata and training data in your local directory.
By default, it uses the MovieLens 1M dataset from TensorFlow Datasets.
Then copy the dataset files to your local /tmp/train, /tmp/test, and /tmp/item folders. Note that the TwoTower model implementation requires three kinds of files: train files for training, test files for evaluation, and item files for mixing in global negative samples.
If you want to use a dataset other than MovieLens, prepare your own dataset and save it to your local directory.
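For a rough idea of what that data-generation step looks like, here is a hedged sketch that pulls MovieLens 1M from TensorFlow Datasets, splits the ratings into train/test, and saves the three datasets to the expected folders. The 90/10 split and the tf.data native save format are assumptions for illustration; the repo's generator script may split and serialize differently.

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# MovieLens 1M interactions and item catalog from TensorFlow Datasets.
ratings = tfds.load("movielens/1m-ratings", split="train")
movies = tfds.load("movielens/1m-movies", split="train")

# Assumed 90/10 train/test split over shuffled ratings.
num_ratings = int(ratings.cardinality().numpy())
train_size = int(num_ratings * 0.9)
ratings = ratings.shuffle(100_000, seed=42, reshuffle_each_iteration=False)

# Assumed tf.data native format; the actual script may use TFRecords instead.
tf.data.experimental.save(ratings.take(train_size), "/tmp/train")
tf.data.experimental.save(ratings.skip(train_size), "/tmp/test")
tf.data.experimental.save(movies, "/tmp/item")
```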
Here is an example config file for candidate-retriever training.
If you use a dataset other than MovieLens, you also need to prepare your own query and candidate embedding classes.
model:
  temperature: 0.05
  # specify training model under models folder
  base_model: TwoTower
  # specify query embedding model under models/features folder
  query_emb: MovieLensQueryEmb
  # specify candidate embedding model under models/features folder
  candidate_emb: MovieLensCandidateEmb
  # specify the unique key for candidates
  item_id_key: movie_id
train:
  # specify task under tasks folder
  task_name: CandidateRetrieverTrain
  epochs: 1
  batch_size: 256
  mixed_negative_batch_size: 128
  learning_rate: 0.05
  train_data: movielens/data/ratings_train
  test_data: movielens/data/ratings_test
  candidate_data: movielens/data/movies
  meta_data: trainer/meta/movie_lens.json
  model_dir: trainer/saved_models/movielens_cr
  log_dir: logs
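For context on the temperature and mixed_negative_batch_size settings: a two-tower retriever is typically trained with a softmax over in-batch negatives, optionally mixed with extra globally sampled item negatives, and the logits are scaled by the temperature. The sketch below shows that general pattern; it is illustrative, not the repo's exact loss code.

```python
import tensorflow as tf

def mixed_negative_loss(query_emb, pos_item_emb, global_neg_emb, temperature=0.05):
    """Softmax loss over in-batch negatives plus globally sampled (mixed) negatives.

    query_emb:      [B, d] embeddings from the query tower
    pos_item_emb:   [B, d] embeddings of the positive items for each query
    global_neg_emb: [M, d] embeddings of globally sampled items
    """
    # Each row's positive sits on the diagonal; the other in-batch items and
    # the global negatives act as negatives.
    candidates = tf.concat([pos_item_emb, global_neg_emb], axis=0)   # [B + M, d]
    logits = tf.matmul(query_emb, candidates, transpose_b=True)      # [B, B + M]
    logits /= temperature
    labels = tf.range(tf.shape(query_emb)[0])                        # diagonal positives
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)
    )
```

Under this sketch's assumptions, batch_size: 256 and mixed_negative_batch_size: 128 mean each positive is contrasted against 255 in-batch negatives plus 128 global negatives, with logits sharpened by the 0.05 temperature.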
Simply run the script below, specifying your config file, in your activated conda environment.
python trainer/local_train.py -c movielens_candidate_retriever
By default, training metrics are reported once every 1000 training steps for faster training. You can change this by tuning the steps_per_execution argument when compiling the model.
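steps_per_execution is a standard Keras compile() argument; a minimal sketch of changing it is below. The `model` variable and the optimizer choice are placeholders for illustration, not the repo's exact training code.

```python
import tensorflow as tf

# `model` is whichever Keras model you are training (hypothetical here).
# With steps_per_execution=100, Keras runs 100 batches per tf.function call,
# so progress-bar metrics update every 100 steps instead of every 1000.
model.compile(
    optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.05),  # optimizer is illustrative
    steps_per_execution=100,
)
```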
After training, evaluation runs on the test dataset. You should see metrics like:
391/391 [==============================] - 50s 129ms/step - factorized_top_k/top_1_categorical_accuracy: 0.0036 - factorized_top_k/top_5_categorical_accuracy: 0.0181 - factorized_top_k/top_10_categorical_accuracy: 0.0349 - factorized_top_k/top_50_categorical_accuracy: 0.1428 - factorized_top_k/top_100_categorical_accuracy: 0.2409 - loss: 1406.8086 - regularization_loss: 7.9244 - total_loss: 1414.7329
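The factorized_top_k metric names above are the ones TensorFlow Recommenders reports for retrieval tasks. Here is a sketch of how such a metric is typically wired up, assuming a TFRS-style setup; the dummy embeddings stand in for the real item dataset mapped through the trained candidate tower.

```python
import tensorflow as tf
import tensorflow_recommenders as tfrs

# Dummy candidate embeddings; in a real pipeline these come from mapping the
# item dataset through the candidate tower.
candidate_embeddings = tf.data.Dataset.from_tensor_slices(
    tf.random.normal([1000, 32])
).batch(128)

metric = tfrs.metrics.FactorizedTopK(
    candidates=candidate_embeddings,
    ks=(1, 5, 10, 50, 100),  # matches the top-k cutoffs in the log above
)
retrieval_task = tfrs.tasks.Retrieval(metrics=metric)
```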