Team Project: Evan, Rita, Abby, Kelly
This project was developed for the AgentSociety Challenge, a framework for building Large Language Model (LLM) agents that simulate realistic societal behaviors.
Our objective was to build a User Simulation Agent capable of generating reviews and ratings that accurately reflect a specific user's historical preferences and writing style. By implementing advanced prompting and retrieval strategies, we significantly outperformed the competition baseline.
Key Results:
- Review Generation Quality: +5.06% improvement over baseline.
- Preference Estimation: +4.37% improvement over baseline.
- Overall Quality: +4.72% improvement over baseline.
Our agent improves upon standard zero-shot generation using a modular architecture located in src/:
File: src/run_simulation_agent.py (Orchestration) & src/prompt_utils.py (Prompts)
We implemented a structured 3-step reasoning process to guide the LLM (a prompt sketch follows the list):
- User Analysis: Classifies the user (e.g., "Generous" vs. "Critical") based on historical rating distributions.
- Item Assessment: Evaluates the target business's reputation from public metrics.
- Synthesis: Determines the rating by combining the user's rating tendency with the business's quality.
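To make the flow concrete, here is a minimal prompt sketch. The function name `build_cot_prompt` and the exact wording are illustrative assumptions, not the actual templates in `src/prompt_utils.py`:

```python
# Hypothetical sketch of the 3-step Chain-of-Thought prompt; the wording and
# build_cot_prompt are illustrative, not the project's actual templates.

COT_TEMPLATE = """You are simulating a specific Yelp user.

Step 1 - User Analysis:
The user's past ratings are {rating_history}. Classify the user as
"Generous" (mostly 4-5 stars) or "Critical" (mostly 1-3 stars).

Step 2 - Item Assessment:
The business has an average rating of {avg_stars} stars across
{review_count} reviews. Assess its public reputation.

Step 3 - Synthesis:
Combine the user's rating tendency with the business's quality to decide
the star rating this user would most likely give, then write the review.
"""

def build_cot_prompt(rating_history: list[int], avg_stars: float, review_count: int) -> str:
    """Fill the CoT template with user history and target-business metrics."""
    return COT_TEMPLATE.format(
        rating_history=rating_history,
        avg_stars=avg_stars,
        review_count=review_count,
    )
```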
File: src/retrieve_expanded_context.py
To capture broader context, we implemented Query Expansion. Instead of using a single raw query, the agent uses an LLM to generate multiple semantically related search queries. This allows the memory module to retrieve past reviews that share thematic similarities (e.g., "service speed") rather than just keyword overlaps.
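A minimal sketch of the idea, assuming an abstract `llm` callable and a `memory.retrieve(query, top_k)` interface; both are placeholders, not the project's actual APIs:

```python
# Illustrative query-expansion sketch; names and interfaces are assumptions,
# not the exact code in src/retrieve_expanded_context.py.
from typing import Callable

def expand_query(llm: Callable[[str], str], raw_query: str, n: int = 3) -> list[str]:
    """Ask the LLM for semantically related variants of the raw query."""
    prompt = (
        f"Generate {n} short search queries semantically related to: '{raw_query}'. "
        "Cover different aspects (e.g., service speed, price, ambiance). "
        "Return one query per line."
    )
    variants = [line.strip() for line in llm(prompt).splitlines() if line.strip()]
    return [raw_query] + variants[:n]

def retrieve_expanded(memory, llm, raw_query: str, k: int = 5) -> list[str]:
    """Union the hits from every expanded query, de-duplicated, order-preserving."""
    hits: dict[str, None] = {}
    for query in expand_query(llm, raw_query):
        for review in memory.retrieve(query, top_k=k):
            hits.setdefault(review, None)
    return list(hits)
```

Because thematically related queries surface reviews a single keyword query would miss, the retrieved context covers more of the user's recurring concerns.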
File: src/smart_review_selection.py
We addressed the context-window limitation for users with extensive histories (100+ reviews) using a Weighted Average Embedding strategy (sketched after the list):
- We calculate a user's "persona embedding" weighted by rating intensity (giving higher weight to 5-star and 1-star reviews).
- We filter and retain only the top-15 most relevant reviews for the prompt context.
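A sketch of the selection logic, assuming extremity weights grow linearly with distance from the 3-star midpoint; the weighting scheme and function names are illustrative, not the exact code in `src/smart_review_selection.py`:

```python
# Minimal sketch of the Weighted Average Embedding strategy; the linear
# extremity weighting is an assumption based on the description above.
import numpy as np

def persona_embedding(embeddings: np.ndarray, stars: np.ndarray) -> np.ndarray:
    """Average review embeddings, weighting extreme (1- and 5-star) reviews higher."""
    weights = np.abs(stars - 3.0) + 1.0   # 1* and 5* -> 3.0, 3* -> 1.0
    weights /= weights.sum()
    return weights @ embeddings           # (n,) @ (n, d) -> (d,)

def select_reviews(embeddings: np.ndarray, stars: np.ndarray,
                   reviews: list[str], top_k: int = 15) -> list[str]:
    """Keep the top-k reviews most similar (cosine) to the persona embedding."""
    persona = persona_embedding(embeddings, stars)
    sims = embeddings @ persona / (
        np.linalg.norm(embeddings, axis=1) * np.linalg.norm(persona) + 1e-9
    )
    top = np.argsort(sims)[::-1][:top_k]
    return [reviews[i] for i in top]
```

Weighting the extremes follows the intuition in the first bullet: 1-star and 5-star reviews carry the strongest signal about what the user actually cares about.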
File: src/user_pattern_analysis.py
We developed a standalone module to analyze raw user history and extract a "User Persona" without biasing the rating logic (a sketch follows the list):
- Verbosity: Enforces sentence length consistency (e.g., "Write short, 1-2 sentence reviews").
- Vocabulary: Identifies and reuses the user's recurring signature phrases.
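As a rough illustration, persona extraction might look like the following; the heuristics (mean sentence count, bigram frequency) and the `extract_persona` name are assumptions rather than the module's actual rules:

```python
# Hedged sketch of persona extraction; heuristics are illustrative, not the
# exact rules in src/user_pattern_analysis.py.
import re
from collections import Counter

def extract_persona(reviews: list[str]) -> dict:
    """Derive style constraints (verbosity, signature phrases) from raw history."""
    sentence_counts = [
        len([s for s in re.split(r"[.!?]+", text) if s.strip()]) for text in reviews
    ]
    avg_sentences = sum(sentence_counts) / max(len(sentence_counts), 1)

    # Recurring bigrams stand in for the user's "signature phrases".
    bigrams = Counter()
    for text in reviews:
        words = re.findall(r"[a-z']+", text.lower())
        bigrams.update(" ".join(pair) for pair in zip(words, words[1:]))
    signature = [phrase for phrase, count in bigrams.most_common(5) if count > 1]

    verbosity = (
        "Write short, 1-2 sentence reviews." if avg_sentences <= 2
        else f"Write roughly {round(avg_sentences)} sentences per review."
    )
    return {"verbosity": verbosity, "signature_phrases": signature}
```

The returned constraints are injected into the generation prompt, keeping style enforcement separate from the rating logic.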
We evaluated the agent using the AgentSociety Challenge benchmark metrics on the Yelp dataset.
| Metric | Baseline Score | Our Agent (CoT + SCS) | Improvement |
|---|---|---|---|
| Review Generation | 0.830 | 0.872 | +5.06% |
| Preference Estimation | 0.822 | 0.858 | +4.37% |
| Overall Quality | 0.826 | 0.865 | +4.72% |
- src/: Core agent logic.
  - run_simulation_agent.py: The main agent that integrates CoT, RAG, Memory, and Style modules.
  - user_pattern_analysis.py: Logic for extracting writing style constraints.
  - smart_review_selection.py: Weighted embedding logic for context management.
  - retrieve_expanded_context.py: RAG logic for query expansion.
  - prompt_utils.py: Chain-of-Thought prompt templates.
- websocietysimulator/: The simulation environment framework (required dependency).
- scripts/: Utilities.
  - process_yelp_data.py: Pipeline to clean and format raw Yelp Open Dataset files.
- docs/: Documentation.
  - simulation_details.md: Technical breakdown of the simulator inputs/outputs.
  - presentation.pdf: Presentation slides covering methodology and ablation studies.
- Clone the repository:

  ```bash
  git clone https://github.com/aseseri/agent-society-user-simulation.git
  cd agent-society-user-simulation
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Configuration: Create a `.env` file in the root directory and add your OpenAI API key:

  ```
  OPENAI_API_KEY=sk-...
  ```

- Data preparation: Download the Yelp Open Dataset and run the processing script:

  ```bash
  python scripts/process_yelp_data.py --input_dir data/raw --output_dir data/processed
  ```
To run the complete User Simulation Agent (with CoT, RAG, and Style Enforcement enabled):

```bash
python src/run_simulation_agent.py --num_tasks 10 --output results/final_run.json
```
Based on the AgentSociety Challenge framework.