This repository contains a PyTorch implementation of the paper:
"Large Language Models meet Collaborative Filtering: An Efficient All-round LLM-based Recommender System"
Kim et al., KDD 2024
This work focuses on two key goals:
- ✅ Reproduction: A correct and faithful implementation of the A-LLMRec architecture, including its two-stage training process.
- 📊 Evaluation: Performance testing on standard datasets and a comparative analysis with the results from the original authors.
This implementation was developed and evaluated on the Beauty_5 and Video_Game_5 Amazon review datasets.
While remaining faithful to the original methodology, this implementation improves usability and clarity with:
- 🔁 Single, Self-Contained Script: the entire experimental pipeline (data loading ➝ training ➝ evaluation) lives in one clean, well-documented script.
- 📦 Modern Data Handling: efficient use of pandas and PyTorch's Dataset and DataLoader abstractions.
- 🧩 Modular, Clear Code: functions and classes (e.g., AlignmentModule) are named after the components in the original paper.
- 💾 Resumable Training: automatic checkpointing lets you resume training without losing progress.
- 📈 Comprehensive Reporting: logs results to TensorBoard, saves plots, and writes a summary of final metrics to .csv.
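To illustrate the Stage-1 idea behind the AlignmentModule named above, here is a minimal sketch: it projects collaborative-filtering item embeddings and text embeddings into a shared space and pulls matched pairs together. The dimensions and the MSE matching loss are illustrative assumptions, not the paper's exact architecture or objective.

```python
import torch
import torch.nn as nn


class AlignmentModule(nn.Module):
    """Simplified sketch of CF-embedding / text-embedding alignment.

    Dimensions (cf_dim, text_dim, joint_dim) and the MSE loss are
    illustrative choices, not the paper's exact configuration.
    """

    def __init__(self, cf_dim=64, text_dim=384, joint_dim=128):
        super().__init__()
        # Two small MLPs map each modality into a joint latent space
        self.cf_proj = nn.Sequential(
            nn.Linear(cf_dim, joint_dim), nn.ReLU(), nn.Linear(joint_dim, joint_dim)
        )
        self.text_proj = nn.Sequential(
            nn.Linear(text_dim, joint_dim), nn.ReLU(), nn.Linear(joint_dim, joint_dim)
        )

    def forward(self, cf_emb, text_emb):
        z_cf = self.cf_proj(cf_emb)
        z_text = self.text_proj(text_emb)
        # Matching loss: pull aligned (item, text) pairs together
        loss = nn.functional.mse_loss(z_cf, z_text)
        return z_cf, z_text, loss
```

In the actual two-stage setup, the aligned representations produced here would then be consumed by the LLM in Stage 2.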
- Python 3.8+
- PyTorch 1.12+
- Transformers
- Sentence-Transformers
- Pandas, NumPy, TQDM, Matplotlib
Clone this repository:
git clone [your-repo-url]
cd [your-repo-name]
Install required packages:
pip install torch transformers sentence-transformers pandas numpy tqdm matplotlib
Prepare your data: Place your dataset file (e.g., Beauty_5.json) in a known location (e.g., Datasets folder in Google Drive).
if __name__ == '__main__':
    # ======================================================================
    # 1. IMPORTANT: Update this path to your .json dataset file
    # ======================================================================
    dataset_path = "/content/drive/MyDrive/Datasets/Beauty_5.json"
    dataset_name = "Beauty_5"

    # 2. Set your desired K values for evaluation
    validation_k = 5   # K for validation during Stage 1 training
    test_k = 10        # K for final test set evaluation

    # 3. Run the experiment
    run_experiment(dataset_name, dataset_path, validation_k=validation_k, test_k=test_k)
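The K values above control top-K ranking metrics. As a reference for what they measure, here is a sketch of Hit@K and NDCG@K for a single test interaction; hit_and_ndcg_at_k is a hypothetical helper for illustration, not a function exported by this repository.

```python
import numpy as np


def hit_and_ndcg_at_k(rank, k):
    """Hit@K and NDCG@K for one test item.

    `rank` is the 0-based position of the ground-truth item in the
    model's ranked candidate list. A hit scores 1 if the item appears
    in the top K; NDCG discounts it by its position.
    """
    if rank < k:
        return 1.0, 1.0 / np.log2(rank + 2)
    return 0.0, 0.0
```

Averaging these values over all test users yields the Hit@K and NDCG@K numbers reported in the final metrics summary.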
📄 Citation
If you use this implementation in your work, please cite the original paper:
@inproceedings{kim2024large,
title={Large language models meet collaborative filtering: An efficient all-round llm-based recommender system},
author={Kim, Sein and Kang, Hongseok and Choi, Seungyoon and Kim, Donghyun and Yang, Minchul and Park, Chanyoung},
booktitle={Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
year={2024}
}