
NextRec


English | 中文文档

A Unified, Efficient, and Scalable Recommendation System Framework

Introduction

NextRec is a modern recommendation framework built on PyTorch, delivering a unified experience for modeling, training, and evaluation. It follows a modular design with rich model implementations, data-processing utilities, and engineering-ready training components. NextRec targets large-scale industrial retrieval (recall) scenarios, where models are trained on massive offline Parquet features produced on Spark clusters.

Why NextRec

  • Unified feature engineering & data pipeline: Dense/Sparse/Sequence feature definitions (see the sketch after this list), a persistent DataProcessor, and a batch-optimized RecDataLoader, matching offline feature training/inference in industrial big-data settings.
  • Multi-scenario coverage: ranking (CTR/CVR), retrieval, multi-task learning, and additional marketing/recommendation models, with a continuously expanding model zoo.
  • Developer-friendly experience: streamed processing, training, and inference for CSV/Parquet/path-like data, plus GPU/MPS acceleration and visualization support.
  • Efficient training & evaluation: a standardized engine with optimizers, LR schedulers, early stopping, checkpoints, and detailed logging out of the box.
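
For instance, the feature definitions from the first bullet look like this. The class names and keyword arguments mirror the quick start below; the vocab_size values here are illustrative placeholders:

from nextrec.basic.features import DenseFeature, SparseFeature, SequenceFeature

# One feature of each type; vocab sizes are placeholders for illustration.
dense = DenseFeature(name='dense_0', input_dim=1)
sparse = SparseFeature(name='item_id', embedding_name='item_emb',
                       vocab_size=10_000, embedding_dim=32)
sequence = SequenceFeature(name='sequence_0', vocab_size=10_000,
                           embedding_dim=32, padding_idx=0,
                           embedding_name='item_emb')  # shares the item table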

Architecture

NextRec adopts a modular and low-coupling engineering design, enabling full-pipeline reusability and scalability across data processing → model construction → training & evaluation → inference & deployment. Its core components include: a Feature-Spec-driven Embedding architecture, the BaseModel abstraction, a set of independent reusable Layers, a unified DataLoader for both training and inference, and a ready-to-use Model Zoo.

NextRec Architecture

The project borrows ideas from excellent open-source recommendation libraries. Early layer implementations referenced torch-rechub but have since been replaced with in-house versions. torch-rechub remains a mature framework in both architecture and models, and the author has contributed to it; feel free to check it out.


Installation

You can install the latest NextRec from PyPI (Python 3.10+ is required):
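
pip install nextrec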

Tutorials

See tutorials/ for examples covering ranking, retrieval, multi-task learning, and data processing.

To dive deeper, Jupyter notebooks are also available.

Note for the current version (0.3.3): the matching (retrieval) module is not fully polished yet and may have compatibility issues or unexpected errors. Please open an issue if you run into problems.

5-Minute Quick Start

We provide a detailed quick start with accompanying datasets to help you learn the framework. In datasets/ you'll find an e-commerce sample dataset that looks like this:

user_id item_id dense_0 dense_1 dense_2 dense_3 dense_4 dense_5 dense_6 dense_7 sparse_0 sparse_1 sparse_2 sparse_3 sparse_4 sparse_5 sparse_6 sparse_7 sparse_8 sparse_9 sequence_0 sequence_1 label
1 7817 0.14704075 0.31020382 0.77780896 0.944897 0.62315375 0.57124174 0.77009535 0.3211029 315 260 379 146 168 161 138 88 5 312 [170,175,97,338,105,353,272,546,175,545,463,128,0,0,0] [368,414,820,405,548,63,327,0,0,0,0,0,0,0,0] 0
1 3579 0.77811223 0.80359334 0.5185201 0.91091245 0.043562356 0.82142705 0.8803686 0.33748195 149 229 442 6 167 252 25 402 7 168 [179,48,61,551,284,165,344,151,0,0,0,0,0,0,0] [814,0,0,0,0,0,0,0,0,0,0,0,0,0,0] 1

Below is a short example showing how to train a DIN model. DIN (Deep Interest Network, KDD 2018) is a classic CTR-prediction model that applies attention over the user's behavior sequence with respect to the candidate item. You can also run python tutorials/example_ranking_din.py directly.

After training, detailed logs are available under nextrec_logs/din_tutorial.

import ast

import pandas as pd

from nextrec.models.ranking.din import DIN
from nextrec.basic.features import DenseFeature, SparseFeature, SequenceFeature

df = pd.read_csv('dataset/ranking_task.csv')

# CSV stores the list-valued sequence columns as text; parse them back into lists
for col in [c for c in df.columns if 'sequence' in c]:
    df[col] = df[col].apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x)

# Define feature columns
dense_features = [DenseFeature(name=f'dense_{i}', input_dim=1) for i in range(8)]

sparse_features = [
    SparseFeature(name='user_id', embedding_name='user_emb', vocab_size=int(df['user_id'].max() + 1), embedding_dim=32),
    SparseFeature(name='item_id', embedding_name='item_emb', vocab_size=int(df['item_id'].max() + 1), embedding_dim=32),
]

sparse_features.extend([
    SparseFeature(name=f'sparse_{i}', embedding_name=f'sparse_{i}_emb', vocab_size=int(df[f'sparse_{i}'].max() + 1), embedding_dim=32)
    for i in range(10)
])

sequence_features = [
    SequenceFeature(name='sequence_0', vocab_size=int(df['sequence_0'].apply(max).max() + 1), embedding_dim=32, padding_idx=0, embedding_name='item_emb'),
    SequenceFeature(name='sequence_1', vocab_size=int(df['sequence_1'].apply(max).max() + 1), embedding_dim=16, padding_idx=0, embedding_name='sparse_0_emb'),
]
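
# Note: sequence_0 reuses the 'item_emb' table defined for item_id above, so the
# user's behavior history and the candidate item share one embedding space, which
# is what DIN-style attention expects; sequence_1 likewise reuses 'sparse_0_emb'.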

mlp_params = {
    "dims": [256, 128, 64],
    "activation": "relu",
    "dropout": 0.3,
}

model = DIN(
    dense_features=dense_features,
    sparse_features=sparse_features,
    sequence_features=sequence_features,
    mlp_params=mlp_params,
    attention_hidden_units=[80, 40],
    attention_activation='sigmoid',
    attention_use_softmax=True,
    target=['label'],                                     # target variable
    device='mps',                                         # 'cuda', 'mps', or 'cpu'
    embedding_l1_reg=1e-6,
    embedding_l2_reg=1e-5,
    dense_l1_reg=1e-5,
    dense_l2_reg=1e-4,
    session_id="din_tutorial",                            # experiment id for logs
)

# Compile model with optimizer and loss
model.compile(
    optimizer="adam",
    optimizer_params={"lr": 1e-3, "weight_decay": 1e-5},
    loss="focal",
    loss_params={"gamma": 2.0, "alpha": 0.25},
)
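
# Focal loss (Lin et al., 2017) down-weights well-classified examples:
#   FL(p_t) = -alpha * (1 - p_t)**gamma * log(p_t)
# gamma=2.0 focuses training on hard samples; alpha=0.25 re-balances classes.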

model.fit(
    train_data=df,
    metrics=['auc', 'gauc', 'logloss'],  # metrics to track
    epochs=3,
    batch_size=512,
    shuffle=True,
    user_id_column='user_id'             # used for GAUC
)

# Evaluate after training
metrics = model.evaluate(
    df,
    metrics=['auc', 'gauc', 'logloss'],
    batch_size=512,
    user_id_column='user_id'
)
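
The gauc metric above is the per-user AUC averaged with per-user sample-count weights, a standard CTR evaluation metric; NextRec computes it internally when user_id_column is supplied. For reference, a minimal standalone sketch of one common definition (not NextRec code, and NextRec's exact weighting may differ; assumes scikit-learn is installed):

import pandas as pd
from sklearn.metrics import roc_auc_score

def gauc(y_true, y_score, user_ids):
    # Weighted average of per-user AUC; users whose labels are all 0s or
    # all 1s are skipped because AUC is undefined for a single class.
    frame = pd.DataFrame({'y': y_true, 's': y_score, 'u': user_ids})
    total, weight = 0.0, 0
    for _, group in frame.groupby('u'):
        if group['y'].nunique() < 2:
            continue
        total += roc_auc_score(group['y'], group['s']) * len(group)
        weight += len(group)
    return total / weight if weight else float('nan')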

Supported Models

Ranking Models

Model Paper Venue/Year Status
FM Factorization Machines ICDM 2010 Supported
AFM Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks IJCAI 2017 Supported
DeepFM DeepFM: A Factorization-Machine based Neural Network for CTR Prediction IJCAI 2017 Supported
Wide&Deep Wide & Deep Learning for Recommender Systems DLRS 2016 Supported
xDeepFM xDeepFM: Combining Explicit and Implicit Feature Interactions KDD 2018 Supported
FiBiNET FiBiNET: Combining Feature Importance and Bilinear Feature Interaction for CTR Prediction RecSys 2019 Supported
PNN Product-based Neural Networks for User Response Prediction ICDM 2016 Supported
AutoInt AutoInt: Automatic Feature Interaction Learning CIKM 2019 Supported
DCN Deep & Cross Network for Ad Click Predictions ADKDD 2017 Supported
DCN v2 DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems KDD 2021 In Progress
DIN Deep Interest Network for CTR Prediction KDD 2018 Supported
DIEN Deep Interest Evolution Network AAAI 2019 Supported
MaskNet MaskNet: Introducing Feature-Wise Multiplication to CTR Ranking Models by Instance-Guided Mask arXiv 2021 Supported

Retrieval Models

Model Paper Venue/Year Status
DSSM Learning Deep Structured Semantic Models CIKM 2013 Supported
DSSM v2 DSSM with pairwise BPR-style optimization - Supported
YouTube DNN Deep Neural Networks for YouTube Recommendations RecSys 2016 Supported
MIND Multi-Interest Network with Dynamic Routing CIKM 2019 Supported
SDM Sequential Deep Matching Model - Supported

Multi-task Models

Model Paper Venue/Year Status
MMOE Modeling Task Relationships in Multi-task Learning KDD 2018 Supported
PLE Progressive Layered Extraction RecSys 2020 Supported
ESMM Entire Space Multi-task Model SIGIR 2018 Supported
ShareBottom Multitask Learning - Supported
POSO POSO: Personalized Cold-start Modules for Large-scale Recommender Systems 2021 Supported
POSO-IFLYTEK POSO with PLE-style gating for sequential marketing tasks - Supported

Generative Models

Model Paper Venue/Year Status
TIGER Recommender Systems with Generative Retrieval NeurIPS 2023 In Progress
HSTU Hierarchical Sequential Transduction Units - In Progress

Contributing

We welcome contributions of any form!

How to Contribute

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add AmazingFeature')
  4. Push your branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Before submitting a PR, please run tests using pytest test/ -v or python -m pytest to ensure everything passes.

Code Style

  • Follow PEP8
  • Provide unit tests for new functionality
  • Update documentation accordingly

Reporting Issues

When submitting issues on GitHub, please include:

  • Description of the problem
  • Reproduction steps
  • Expected behavior
  • Actual behavior
  • Environment info (Python version, PyTorch version, etc.)

License

This project is licensed under the Apache 2.0 License.


Contact


Acknowledgements

NextRec is inspired by the following great open-source projects:

  • torch-rechub — Flexible, easy-to-extend recommendation framework
  • FuxiCTR — Configurable, tunable, and reproducible CTR library
  • RecBole — Unified, comprehensive, and efficient recommendation library

Special thanks to all open-source contributors!

