Shachi is a modular framework designed to simplify building LLM-based agents for Agent-Based Modeling (ABM). By separating each agent into four core components—LLM, Tools, Memory, and Configuration—Shachi enables reproducible experiments across a variety of social, economic, and cognitive simulation tasks.
| ❌ Existing LLM-based ABM | ✅ Shachi |
|---|---|
| Brittle, scattered agent designs | 4-core architecture (LLM, Memory, Tools, Config) |
| Ad-hoc env-agent interfaces | Unified gym-style API |
| Low reproducibility & poor reusability | 10-task benchmark + real-world validation |
- Installation & Dependencies
- Quickstart
- Implemented ABM Tasks
- How to Write Your Agent
- How to Write Your Environment
- Execute your Agent and Environments
- Shachi supports Hydra
- (Extra) Tools & Memory in Practice
## Installation & Dependencies

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

By default, uv will select the highest supported Python version (>= 3.10) for Shachi. Since Shachi has many dependencies that may have compatibility issues with the latest Python versions, you may want to pin Python to a specific version:

```bash
uv python pin 3.12  # Pin Python version for this project only
uv sync             # Install dependencies and verify dependency setup
```

## Quickstart

```bash
export OPENAI_API_KEY="..."
uv run scripts/main.py --config-name "config" task=psychobench agent=psychobench
```

Change `task` / `agent` to switch experiments.
## Implemented ABM Tasks

| Task | Description | Example Run | Reference |
|---|---|---|---|
| PsychoBench | Benchmarks psychological decision tasks | `uv run scripts/main.py task=psychobench agent=psychobench` | arXiv:2310.01386 |
| LM_Caricature | Simulates online forum style interactions | `uv run scripts/main.py task=lm_caricature agent=lm_caricature task.scenario=onlineforum` | arXiv:2310.11501 |
| Cognitive Biases | Measures LLM biases (Availability, Anchoring, etc.) | `uv run scripts/main.py task=cognitive_biases agent=cognitive_biases` | arXiv:2410.15413 |
| EmotionBench | Tests emotional recognition & response | `uv run scripts/main.py task=emotionbench agent=emotionbench` | arXiv:2308.03656 |
| DigitMat | Emergent analogical reasoning | `uv run scripts/main.py task=digitmat agent=digitmat` | arXiv:2212.09196 |
| StockAgent | Stock trading simulations | `uv run scripts/main.py task=stockagent agent=stockagent` | arXiv:2407.18957 |
| Sotopia | Multi-agent social simulation | `uv run scripts/main.py task=sotopia agent=sotopia batchsize=30` | arXiv:2310.11667 |
| AuctionArena | Agents in auction environments | `uv run scripts/main.py task=auction_arena agent=auction_arena` | arXiv:2310.05746 |
| EconAgent | Macroeconomic simulations (GDP, inflation, etc.) | `uv run scripts/main.py task=econagent agent=econagent task.episode_length=240 task.num_agents=100` | arXiv:2310.10436 |
| OASIS | Social media simulation (posts, comments, reactions) | `uv run scripts/main.py task=oasis agent=oasis` | arXiv:2411.11581 |
In addition to the benchmark tasks above, our paper also introduces several exploratory studies:
- Carrying Memory to the Next Life
- Living in Multiple Worlds
- LoRA Weight Experiments
All of these studies are organized under `docs/exploratory_studies`.
Some tasks require additional setup:

- Cognitive Biases (cognitive-biases-in-llms): Download `full_dataset.csv` from this repository to `shachi/env/cognitive_biases/data`.
- Sotopia:

  ```bash
  docker rm -f redis-stack  # `sotopia install` fails if this container exists
  uv run sotopia install    # Select "Docker", "Yes", and "Sotopia Dataset (ICLR 2024, CC-BY-SA 4.0)"
  ```
## How to Write Your Agent

Define your agent by subclassing the abstract base class `Agent` and implementing its single hook: `step(observation) -> str | BaseModel | None`.

When an environment specifies `observation.response_type` (a Pydantic model), the agent must align its output with that schema. This ensures consistent communication between agents and environments. Two main approaches are supported:
- Structured Output: request the LLM to produce JSON matching the schema.
- Function Calling: expose a function derived from the schema and require the LLM to call it.
This design keeps the agent as a modular component within Shachi’s architecture, ensuring that communication remains standardized and reproducible across tasks.
Shachi’s 4-core view of an agent:
- LLM: how you call the model (free-form, structured output, or function calling; sketches of all three follow below).
- Tools: optional side-effect–free functions exposed by the environment.
- Memory: optional retrieval of past interactions.
- Config: hyperparameters (model name, temperature, etc.).
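For the free-form case, a minimal no-memory agent can simply return the model's text. This is a sketch in the style of the examples below (the class name is ours); it is only appropriate when the environment leaves `observation.response_type` unset:

```python
import litellm

from shachi import Agent, Observation


class NoMemoryAgent_FreeForm(Agent):
    """Sketch: no tools, no memory; returns the LLM's raw text."""

    def __init__(self, model: str = "openai/gpt-4o-mini", temperature: float = 1.0):
        self.model = model
        self.temperature = temperature

    async def step(self, observation: Observation) -> str:
        messages = [{"role": "user", "content": observation.format_as_prompt_payload()}]
        completion = await litellm.acompletion(
            model=self.model,
            messages=messages,
            temperature=self.temperature,
        )
        # Return the raw text; only valid when no response schema is required.
        return completion.choices[0].message.content
```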
If the environment sets `observation.response_type = SomePydanticModel`, you can request JSON output and parse it safely:

```python
import litellm
from shachi import Agent, Observation
class NoMemoryAgent_using_StructuredOutput(Agent):
def __init__(self, model: str = "openai/gpt-4o-mini", temperature: float = 1.0):
self.model = model
self.temperature = temperature
async def step(self, observation: Observation):
messages = [{"role": "user", "content": observation.format_as_prompt_payload()}]
completion = await litellm.acompletion(
model=self.model,
messages=messages,
temperature=self.temperature,
response_format=observation.response_type, # << Pydantic model
)
json_text = completion.choices[0].message.content
        return observation.response_type.model_validate_json(json_text)
```

You can also coerce the model to “call” a function whose parameters mirror your Pydantic response type:

```python
import litellm

from shachi import Agent, Observation
class NoMemoryAgent_using_FunctionCalling(Agent):
def __init__(self, model: str = "openai/gpt-4o-mini", temperature: float = 1.0):
self.model = model
self.temperature = temperature
async def step(self, observation: Observation):
assert observation.response_type is not None
messages = [{"role": "user", "content": observation.format_as_prompt_payload()}]
tools = [{
"type": "function",
"function": {
"name": observation.response_type.__name__,
"description": f"Return a {observation.response_type.__name__}",
"parameters": observation.response_type.model_json_schema(),
},
}]
completion = await litellm.acompletion(
model=self.model,
messages=messages,
temperature=self.temperature,
tools=tools,
tool_choice={"type": "function", "function": {"name": observation.response_type.__name__}},
)
call = completion.choices[0].message.tool_calls[0]
        return observation.response_type.model_validate_json(call.function.arguments)
```

## How to Write Your Environment

Define your environment by subclassing the abstract base class `Environment` and overriding the gym-style lifecycle hooks:
- `num_agents() -> int`
- `reset() -> dict[int, Observation]`
- `step(responses) -> dict[int, Observation]`
- `done() -> bool`
- `get_result() -> TResult`
You also define your message, observation, and (optional) tool responses as Pydantic models:

```python
from typing import Literal
import pydantic
from shachi import Environment, Message, Observation
RPSHand = Literal["rock", "paper", "scissors"]
class RPSMessage(Message):
hand: RPSHand | None = None
class RPSResponse(pydantic.BaseModel):
hand: RPSHand
class RPSObservation(Observation[RPSMessage]):
alive: list[bool]
def format_as_prompt_text(self) -> str:
lines = ["Choose: rock, paper, or scissors."]
lines += [
f"Opponent {m.src_agent_id} played {m.hand}."
if m.hand is not None else
f"Opponent {m.src_agent_id} made an invalid move."
for m in self.messages
]
return "\n".join(lines)
class RPSEnvironment(Environment):
def __init__(self, n: int):
self._n = n
self.alive = [True] * n
self.prev: dict[int, RPSHand | None] = {}
self.t = 0
def num_agents(self) -> int: return self._n
def done(self) -> bool: return sum(self.alive) <= 1
async def reset(self):
self.alive = [True] * self._n
self.prev = {}
self.t = 0
return self._obs()
def _obs(self):
if self.done():
return {i: RPSObservation(agent_id=i, messages=[], reward=1.0,
response_type=None, alive=self.alive)
for i in range(self._n) if self.alive[i]}
obs: dict[int, RPSObservation] = {}
for i in range(self._n):
if not self.alive[i]: continue
msgs = [
RPSMessage(time=self.t, src_agent_id=j, dst_agent_id=i, hand=self.prev[j])
for j in range(self._n)
if j != i and self.alive[j] and j in self.prev
]
obs[i] = RPSObservation(agent_id=i, messages=msgs, reward=None,
response_type=RPSResponse, alive=self.alive)
return obs
@staticmethod
def _extract_hand(resp) -> RPSHand | None:
if resp is None: return None
if isinstance(resp, pydantic.BaseModel): return resp.hand
try:
return RPSResponse.model_validate_json(resp).hand
except pydantic.ValidationError:
text = str(resp).lower()
for h in ("rock", "paper", "scissors"):
if h in text: return h # type: ignore[return-value]
return None
async def step(self, responses):
self.t += 1
idx = {"rock": 0, "paper": 1, "scissors": 2}
valid: dict[int, int] = {}
for aid, r in responses.items():
h = self._extract_hand(r)
self.prev[aid] = h
if h is not None: valid[aid] = idx[h]
if len(valid) < 2: return self._obs()
s = set(valid.values())
if len(s) in (1, 3): return self._obs()
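        # Exactly two distinct hands are on the table. If m is the index of the
        # missing hand, then (m + 2) % 3 beats (m + 1) % 3, so it is the winner.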
winning = (({0, 1, 2} - s).pop() + 2) % 3
for aid, v in valid.items():
self.alive[aid] = (v == winning)
return self._obs()
def get_result(self):
return {"alive": self.alive}You can evaluate programmatically (handy for notebooks/CI) or via CLI/Hydra. This mirrors the provided Usage example and runs multiple episodes.
import asyncio
N_EPISODES = 10
async def run():
agents = [
NoMemoryAgent_using_StructuredOutput(),
NoMemoryAgent_using_FunctionCalling(),
]
env = RPSEnvironment(len(agents))
total = {i: 0.0 for i in range(len(agents))}
for _ in range(N_EPISODES):
observations = await env.reset()
while not env.done():
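            # Query all surviving agents concurrently; dict insertion order keeps
            # agent ids aligned with asyncio.gather's results.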
futures = {i: agents[i].step(obs) for i, obs in observations.items()}
responses = dict(zip(futures.keys(), await asyncio.gather(*futures.values())))
observations = await env.step(responses)
for i, obs in observations.items():
if obs.reward is not None:
total[i] += obs.reward
print("Total rewards:", total)
if __name__ == "__main__":
    asyncio.run(run())
```

## Shachi supports Hydra

Use the unified entrypoint and change configs on the fly:

```bash
# Psychobench with its default agent
uv run scripts/main.py --config-name "config" task=psychobench agent=psychobench
# Switch to Sotopia and increase batch size
uv run scripts/main.py --config-name "config" task=sotopia agent=sotopia batchsize=30
# EconAgent with longer horizon and more agents
uv run scripts/main.py --config-name "config" task=econagent agent=econagent \
  task.episode_length=240 task.num_agents=100
```

In Shachi, agents and environments are implemented in a decoupled manner, which makes it easy to swap and combine them freely.
For example, you can run the OASIS environment with the StockAgent agent:
```bash
uv run scripts/main.py --config-name "config" 'task=oasis' 'agent=stockagent'
```

In Shachi, each component of an agent is implemented as a modular part, which allows the backend model to be easily customized.
You can use not only standard APIs but also endpoints of your own locally hosted models (e.g., via vLLM).
For example, `config/agent/oasis.yaml`:

```yaml
_target_: src.shachi.agent.oasisagent.create_agents_functioncalling
num_agents: 0
model: "openai/gpt-4o-mini" # Replace with your preferred model
temperature: 0.5
memory_cls_path: src.shachi.agent.oasisagent.CamelMemory
memory_cls_kwargs:
  window_size: 5
```
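Because these are ordinary Hydra config values, you should be able to override them from the command line as well. A hedged example, assuming the `agent.*` config group shown above:

```bash
# Hypothetical override: swap the backend model without editing the YAML
uv run scripts/main.py --config-name "config" task=oasis agent=oasis \
  agent.model="openai/gpt-4o" agent.temperature=0.2
```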
## (Extra) Tools & Memory in Practice

The base interfaces include:

- `Tool[TParameters]` with a `fun(params) -> ToolResponse`, and `ToolResponse.format_as_prompt_text()`.
- `BaseMemory` with `add_record`, `retrieve`, and `clear`.

For example, a minimal sliding-window memory:

```python
from collections import deque
from shachi import BaseMemory
class SlidingWindowMemory(BaseMemory):
def __init__(self, k: int = 20):
self.k = k
self.buf = deque(maxlen=k)
def add_record(self, messages: list[dict[str, str]]):
self.buf.extend(messages)
def retrieve(self, query: str | None = None) -> str:
# minimal: ignore query, just return joined texts
return "\n".join(m["content"] for m in self.buf if "content" in m)
def clear(self):
        self.buf.clear()
```

And a simple `add` tool:

```python
import pydantic
from shachi import Tool, ToolResponse
class AddParams(pydantic.BaseModel):
a: float
b: float
class AddResponse(ToolResponse):
params: AddParams
result: float
def format_as_prompt_text(self) -> str:
return f"add({self.params.a}, {self.params.b}) = {self.result}"
def add_fun(p: AddParams) -> AddResponse:
return AddResponse(params=p, result=p.a + p.b)
add_tool = Tool(
name="add",
description="Add two numbers",
parameters_type=AddParams,
fun=add_fun,
)
# Environments can inject [add_tool] into `Observation.tools`
```

Agents read `observation.tools` and typically expose them to the LLM via function calling, for example along the lines of the sketch below.
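A hedged sketch of that pattern, assuming `Tool` exposes the `name`, `description`, `parameters_type`, and `fun` attributes used above (the helper name `call_tools_once` is ours):

```python
import litellm

from shachi import Observation


async def call_tools_once(observation: Observation, model: str = "openai/gpt-4o-mini") -> str:
    """Illustrative: let the LLM pick among the environment's tools,
    run the chosen ones, and return their formatted results."""
    by_name = {t.name: t for t in observation.tools}
    # Convert Shachi tools into an OpenAI-style function-calling spec.
    llm_tools = [{
        "type": "function",
        "function": {
            "name": t.name,
            "description": t.description,
            "parameters": t.parameters_type.model_json_schema(),
        },
    } for t in observation.tools]
    completion = await litellm.acompletion(
        model=model,
        messages=[{"role": "user", "content": observation.format_as_prompt_payload()}],
        tools=llm_tools,
    )
    results = []
    for call in completion.choices[0].message.tool_calls or []:
        tool = by_name[call.function.name]
        params = tool.parameters_type.model_validate_json(call.function.arguments)
        results.append(tool.fun(params).format_as_prompt_text())
    return "\n".join(results)
```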
For concrete implementations, see Memory in `src/shachi/agent/stockagent.py` and Tool usage in `src/shachi/env/stockagent/stockenv.py`.
If you use Shachi in your research, please cite it as follows:

```bibtex
@article{kuroki2025shachi,
  title={Reimagining Agent-based Modeling with Large Language Model Agents via Shachi},
  author={Kuroki, So and Tian, Yingtao and Misaki, Kou and Ikegami, Takashi and Akiba, Takuya and Tang, Yujin},
  journal={arXiv preprint arXiv:2509.21862},
  year={2025}
}
```

