
AlignXplore

Extended Inductive Reasoning for Personalized Preference Inference from Behavioral Signals

arXiv 🤗 HuggingFace 🤗 HuggingFace

AlignXplore is a deep user understanding framework designed to infer rich, human-readable preference summaries from complex and potentially noisy behavioral signals.

  • Streaming: Built for streaming scenarios. As new signals arrive, AlignXplore incrementally updates existing preference summaries through lightweight reasoning—eliminating the need to rerun costly end-to-end inference from scratch.
  • Flexible: Supports heterogeneous input formats out of the box, including preference pairs (e.g., post–chosen–rejected comparisons) and free-form text signals such as user-generated content (see the input-format sketch after this list).
  • Robust: Delivers stable and reliable performance under noisy inputs and remains resilient to abrupt or long-term shifts in user preferences.
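To make the input formats and the streaming update loop concrete, here is a minimal sketch. The field names (`post`, `chosen`, `rejected`), the prompt wording, and the `generate`/`update_preference` helpers are illustrative assumptions, not the repository's actual data schema or API.

```python
# Illustrative sketch only: field names, prompt wording, and the generate()
# stub are assumptions, not this repository's actual data schema or API.

# A preference-pair signal: the user preferred `chosen` over `rejected` for `post`.
pair_signal = {
    "post": "How do I start running as a beginner?",
    "chosen": "A short step-by-step plan with weekly targets ...",
    "rejected": "A long essay on the history of marathon running ...",
}

# A free-form text signal, e.g. user-generated content.
text_signal = "I mostly want concise, actionable answers with concrete examples."

def generate(prompt: str) -> str:
    # Placeholder: in practice this would call the trained AlignXplore model
    # (e.g. through an OpenAI-compatible server or a transformers pipeline).
    return "The user prefers concise, actionable, example-driven answers."

def update_preference(summary: str, signal) -> str:
    """Ask the model to revise the current preference summary given one new signal."""
    prompt = (
        f"Current preference summary:\n{summary or '(none yet)'}\n\n"
        f"New behavioral signal:\n{signal}\n\n"
        "Revise the preference summary so it stays consistent with all signals seen so far."
    )
    return generate(prompt)

# Streaming: refine the summary signal by signal instead of re-inferring from scratch.
summary = ""
for signal in (pair_signal, text_signal):
    summary = update_preference(summary, signal)
print(summary)
```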

Task Overview

Our model performs human-like inductive reasoning for preference inference by progressively refining its preference hypotheses through iterative testing and validation. These inferred preferences can then guide diverse downstream personalization tasks.

Figure: Preference inference task overview.

Training Process

AlignXplore is trained in two stages: cold-start training on synthetic data from teacher models, followed by reinforcement learning to further strengthen its reasoning capabilities.

Figure: Two-stage training process of AlignXplore.

🚀 Quick Start

Requirements

To install requirements:

pip install -r requirements.txt

Training

Cold-start training

cd "cold-start training"
./sft.sh # Set `data_path` to `cold_start.json` for the base setting, and `streaming_cold_start.json` for the streaming setting.

Reinforcement learning

The reinforcement learning code is built on Open-Reasoner-Zero.

Base setting

Train with $R_{jud}$

cd "reinforcement learning"
./run_ppo_jud.sh # with `prompt_data` set to `rl_train.json`
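
For intuition, here is a rough sketch of what a judgment-based reward like $R_{jud}$ can look like: the inferred preference summary is handed to a judge model, which tries to pick the ground-truth chosen response over the rejected one. This is an assumption-laden illustration, not the reward implemented in the Open-Reasoner-Zero-based training code; the prompt wording and the `judge` callable are hypothetical.

```python
from typing import Callable

def r_jud(judge: Callable[[str], str], preference_summary: str,
          post: str, chosen: str, rejected: str) -> float:
    """Sketch of a judgment-based reward: 1.0 if a judge model, conditioned on
    the inferred preference summary, picks the ground-truth chosen response.

    `judge` is any function that sends a prompt to a judge model and returns
    its text answer (hypothetical helper). In practice the A/B order would be
    randomized to avoid position bias; it is fixed here to keep the sketch short.
    """
    prompt = (
        f"User preference summary:\n{preference_summary}\n\n"
        f"Post:\n{post}\n\n"
        f"Response A:\n{chosen}\n\n"
        f"Response B:\n{rejected}\n\n"
        "Which response would this user prefer? Answer with 'A' or 'B' only."
    )
    answer = judge(prompt)
    return 1.0 if answer.strip().upper().startswith("A") else 0.0
```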

Train with $R_{gen}$

Modify the file `reinforcement learning/orz/ppo/actors.py`:

  • Change line 1027 to `RewardRayActor = ray.remote(num_gpus=1)(genRewardRayActorBase)`.

Then run:

cd "reinforcement learning"
./run_ppo_gen.sh # with `prompt_data` set to `rl_train.json`
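
By contrast, a generation-based reward like $R_{gen}$ can be sketched as a likelihood margin: a good preference summary should make the chosen response more probable than the rejected one under a generation model. Again, this is an illustrative assumption, not the exact formulation in the training code; the `logprob` callable is hypothetical.

```python
import math
from typing import Callable

def r_gen(logprob: Callable[[str, str], float], preference_summary: str,
          post: str, chosen: str, rejected: str) -> float:
    """Sketch of a generation-based reward: score the inferred preference by
    how much it raises the likelihood of the chosen response over the rejected
    one. `logprob(context, response)` should return the log-probability of
    `response` given `context` under the reward/generation model (hypothetical
    helper, not part of this repository).
    """
    context = f"User preference summary:\n{preference_summary}\n\nPost:\n{post}\n\n"
    margin = logprob(context, chosen) - logprob(context, rejected)
    # Squash the log-likelihood margin into (0, 1); 0.5 means "no preference signal".
    return 1.0 / (1.0 + math.exp(-margin))
```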

Streaming setting

Train with $R_{jud}$

cd "reinforcement learning"
./run_ppo_streaming.sh # with `prompt_data` set to `streaming_rl_train.json`

Evaluation

$ACC_{jud}$

  1. cd eval
  2. For a model you have trained, run python train_gen_pref.py; for open-source models, run python notrain_gen_pref.py.
  3. python eval_preference.py
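
For reference, $ACC_{jud}$ is a judge-based pairwise accuracy: the fraction of held-out preference pairs for which a judge, given the inferred preference summary, selects the ground-truth chosen response. The actual metric is computed by `eval_preference.py`; the field names and the `judge_prefers_chosen` callable in this sketch are assumptions.

```python
def acc_jud(examples, judge_prefers_chosen) -> float:
    """Sketch of the judge-based accuracy. `examples` is an iterable of dicts
    (hypothetical fields: preference_summary, post, chosen, rejected) and
    `judge_prefers_chosen` returns True when the judge picks the chosen response."""
    examples = list(examples)
    correct = sum(
        judge_prefers_chosen(ex["preference_summary"], ex["post"],
                             ex["chosen"], ex["rejected"])
        for ex in examples
    )
    return correct / max(len(examples), 1)
```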

📄 Citation

@misc{li2025extendedinductivereasoningpersonalized,
      title={Extended Inductive Reasoning for Personalized Preference Inference from Behavioral Signals}, 
      author={Jia-Nan Li and Jian Guan and Wei Wu and Rui Yan},
      year={2025},
      eprint={2505.18071},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.18071}, 
}
