Extended Inductive Reasoning for Personalized Preference Inference from Behavioral Signals
AlignXplore is a framework for deep user understanding that infers rich, human-readable preference summaries from complex and potentially noisy behavioral signals.
- Streaming: Built for streaming scenarios. As new signals arrive, AlignXplore incrementally updates existing preference summaries through lightweight reasoning, without rerunning costly end-to-end inference from scratch (see the sketch below).
- Flexible: Supports heterogeneous input formats out of the box, including preference pairs (e.g., post–chosen–rejected comparisons) and free-form text signals such as user-generated content.
- Robust: Delivers stable and reliable performance under noisy inputs and remains resilient to abrupt or long-term shifts in user preferences.
Our model performs human-like inductive reasoning, progressively refining its preference hypotheses through iterative testing and validation. The inferred preferences can then guide diverse downstream personalization tasks.
Figure: Preference inference task overview.
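The streaming update loop can be pictured as follows. This is a minimal sketch, assuming a hypothetical `call_model` helper for querying the trained model; the `Signal` fields and prompt wording are illustrative, not the repository's actual interfaces.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Signal:
    """One behavioral signal: a preference pair or free-form user text."""
    post: Optional[str] = None
    chosen: Optional[str] = None
    rejected: Optional[str] = None
    text: Optional[str] = None  # user-generated content

def render(signal: Signal) -> str:
    """Serialize heterogeneous signals into a uniform prompt fragment."""
    if signal.text is not None:
        return f"[user content] {signal.text}"
    return (f"[post] {signal.post}\n"
            f"[chosen] {signal.chosen}\n"
            f"[rejected] {signal.rejected}")

def call_model(prompt: str) -> str:
    """Hypothetical stand-in for querying the trained AlignXplore model."""
    return "<updated preference summary>"

def update_summary(summary: str, new_signals: list[Signal]) -> str:
    """Refine the current summary against only the newly arrived signals,
    instead of re-running inference over the full interaction history."""
    evidence = "\n\n".join(render(s) for s in new_signals)
    prompt = (f"Current preference summary:\n{summary}\n\n"
              f"New behavioral signals:\n{evidence}\n\n"
              "Revise the summary so it stays consistent with all signals.")
    return call_model(prompt)

# Summaries are refined incrementally as signal batches stream in.
summary = "(empty)"
for batch in [[Signal(text="I prefer concise answers.")],
              [Signal(post="Q", chosen="short A", rejected="long A")]]:
    summary = update_summary(summary, batch)
```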
AlignXplore combines cold-start training using synthetic data from teacher models with reinforcement learning optimization to enhance the model’s reasoning capabilities.
Figure: Two-stage training process of AlignXplore.
To install requirements:
pip install -r requirements.txt
For cold-start training:
cd "cold-start training"
./sft.sh # Set `data_path` to `cold_start.json` for the base setting, and `streaming_cold_start.json` for the streaming setting.
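For reference, here is a hypothetical illustration of what one cold-start record might contain: a prompt of serialized signals paired with a teacher-generated reasoning trace and summary. The actual schema of `cold_start.json` may differ, so treat the field names below as assumptions.

```python
import json

# Hypothetical shape of one SFT record; field names are assumptions,
# not the released data's actual schema.
example = {
    # input: serialized behavioral signals for one user
    "prompt": "[post] ...\n[chosen] ...\n[rejected] ...",
    # target: teacher-generated reasoning trace plus the final summary
    "response": "<think>...inductive reasoning...</think>\n"
                "Preference summary: the user favors concise, direct answers.",
}
print(json.dumps(example, indent=2))
```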
The reinforcement learning code is built on Open-Reasoner-Zero.
For RL training with the judgment-based reward:
cd "reinforcement learning"
./run_ppo_jud.sh # with `prompt_data` set to `rl_train.json`
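The judgment-based reward can be understood as rewarding summaries that help a judge rank the truly chosen response above the rejected one on held-out pairs. Below is a minimal sketch of that scoring idea, with `judge_score` as a hypothetical stand-in for the actual reward model:

```python
# Reward sketch: average pairwise accuracy of a judge conditioned on the
# inferred summary. `judge_score` is a hypothetical toy stand-in (lexical
# overlap), not the reward model the repository actually uses.
def judge_score(summary: str, post: str, response: str) -> float:
    """Toy score for how well `response` fits `summary` given `post`."""
    return float(len(set(summary.split()) & set(response.split())))

def preference_reward(summary: str,
                      pairs: list[tuple[str, str, str]]) -> float:
    """Fraction of held-out (post, chosen, rejected) pairs where the judge
    ranks the chosen response above the rejected one."""
    hits = sum(
        judge_score(summary, post, chosen) > judge_score(summary, post, rejected)
        for post, chosen, rejected in pairs
    )
    return hits / len(pairs)
```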
To switch to the generation-based reward, modify `reinforcement learning/orz/ppo/actors.py`:
- Change line 1027 to `RewardRayActor = ray.remote(num_gpus=1)(genRewardRayActorBase)`, which swaps in the generative reward actor.
Then run:
cd "reinforcement learning"
./run_ppo_gen.sh # with `prompt_data` set to `rl_train.json`
For RL training in the streaming setting:
cd "reinforcement learning"
./run_ppo_streaming.sh # with `prompt_data` set to `streaming_rl_train.json`
To evaluate:
cd eval
python train_gen_pref.py # generate preferences with the model you have trained
python notrain_gen_pref.py # generate preferences with open-source models
python eval_preference.py # score the generated preferences
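The final evaluation boils down to pairwise accuracy over the test pairs. Here is a toy sketch under the assumption that the generation scripts emit JSONL records with `prediction` and `label` fields (hypothetical names; check the scripts' actual outputs):

```python
import json

def pairwise_accuracy(path: str) -> float:
    """Fraction of test pairs where the predicted preferred response
    matches the ground-truth chosen one. The field names (`prediction`,
    `label`) are assumptions, not the scripts' documented output."""
    with open(path) as f:
        records = [json.loads(line) for line in f]
    correct = sum(r["prediction"] == r["label"] for r in records)
    return correct / len(records)
```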
If you find this work useful, please cite:

@misc{li2025extendedinductivereasoningpersonalized,
  title={Extended Inductive Reasoning for Personalized Preference Inference from Behavioral Signals},
  author={Jia-Nan Li and Jian Guan and Wei Wu and Rui Yan},
  year={2025},
  eprint={2505.18071},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.18071},
}

