```
python>=3.8
transformers>=4.22.2
numpy
pytorch>=1.10.0
scikit-learn
```
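These can be installed with pip; note that PyTorch is distributed on PyPI as `torch`, so a suggested (unofficial) install command is `pip install "transformers>=4.22.2" numpy "torch>=1.10.0" scikit-learn`.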
- Download dataset

  Taking Amazon-Books as an example, download the dataset to the folder `data/amz/raw_data/`.
- Preprocessing: in folder `preprocess`
  - run `python preprocess_amz.py`.
  - run `generate_data_and_prompt.py` to generate the data for the CTR and re-ranking tasks, as well as the prompts for the LLM.
- Knowledge generation

  You can use any generative LLM together with the prompts obtained above to generate knowledge. We do not release the code for knowledge generation here, but we provide the knowledge generated by the LLM in folder `data/knowledge`:
  - `data/amz/item.klg`: factual knowledge about the books in Amazon-Books. Note that some books are filtered.
  - `data/amz/user.klg`: preference knowledge about the users in Amazon-Books. Note that some users are filtered.
  - `data/ml-1m/item.klg`: factual knowledge about the movies in MovieLens-1M.
  - `data/ml-1m/user.klg`: preference knowledge about the users in MovieLens-1M.

  The above files are all saved as `json`, with each key being the raw id of a user/item in the original dataset and each value being the corresponding prompt and the knowledge generated by the LLM; see the loading sketch after this list.

  To avoid information leakage, we also provide `end_index.json`, where each user has an index n. We sort each user's interaction sequence by time (in ascending order) and give each interaction an index; the first n interactions are then used to prompt the LLM for preference knowledge. In our implementation, these first n interactions never appear in the test set, which prevents information leakage.
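  As a minimal loading sketch (the exact value schema and the location of `end_index.json` are assumptions, not specified above):

  ```python
  import json

  # Knowledge files map a raw user/item id to the prompt and the
  # LLM-generated knowledge; adjust the paths if the files live
  # under data/knowledge/ in your checkout.
  with open("data/amz/user.klg") as f:
      user_klg = json.load(f)

  # end_index.json maps each user to the number n of earliest
  # interactions that were used to prompt the LLM (path assumed).
  with open("data/amz/end_index.json") as f:
      end_index = json.load(f)

  uid = next(iter(user_klg))
  print(uid, end_index.get(uid))  # user id and its prompting cutoff n
  print(user_klg[uid])            # prompt + generated preference knowledge
  ```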
- Knowledge encoding: in folder `knowledge_encoding`

  Run `python lm_encoding.py`.
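  This step encodes the generated knowledge into dense representations with a pretrained language model. The following is only a rough sketch of the idea, not the actual `lm_encoding.py`; the encoder choice, pooling strategy, and file schema are all assumptions:

  ```python
  import json
  import torch
  from transformers import AutoModel, AutoTokenizer

  # Encode each item's knowledge text into a dense vector with a
  # BERT-style encoder and mean pooling (both choices are assumptions).
  tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
  model = AutoModel.from_pretrained("bert-base-uncased").eval()

  with open("data/amz/item.klg") as f:
      item_klg = json.load(f)

  embeddings = {}
  with torch.no_grad():
      for item_id, value in item_klg.items():
          # Values hold the prompt and generated knowledge; serialize
          # them to plain text if they are stored as nested json.
          text = value if isinstance(value, str) else json.dumps(value)
          inputs = tokenizer(text, truncation=True, max_length=512,
                             return_tensors="pt")
          hidden = model(**inputs).last_hidden_state       # (1, seq_len, dim)
          embeddings[item_id] = hidden.mean(dim=1).squeeze(0)  # mean pool
  ```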
- RS: in folder `RS`

  Run `python run_ctr.py` for the CTR task.

  Run `python run_rerank.py` for the re-ranking task.
If you find this work useful, please cite our paper and give us a shining star 🌟
```bibtex
@article{xi2023openworld,
  title={Towards Open-World Recommendation with Knowledge Augmentation from Large Language Models},
  author={Yunjia Xi and Weiwen Liu and Jianghao Lin and Jieming Zhu and Bo Chen and Ruiming Tang and Weinan Zhang and Rui Zhang and Yong Yu},
  journal={arXiv preprint arXiv:2306.10933},
  year={2023}
}
```