A comprehensive evaluation framework for testing memory-based conversational AI agents across multiple datasets and metrics.
This system evaluates how well AI agents can answer questions based on accumulated conversational memory over time. It supports several benchmark datasets.
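In outline, an evaluation run streams conversation chunks into an agent's memory and then asks questions against that accumulated memory. The following is a toy, self-contained sketch of that loop — the method names mirror the agent interface described later in this README, but the agent logic and scoring are illustrative stand-ins, not the project's actual implementation:

```python
import asyncio

class ConcatMemoryAgent:
    """Toy agent: memory is just the concatenation of every chunk seen so far."""
    def __init__(self):
        self.memory = ""

    async def add_memory_async(self, chunk: str) -> None:
        self.memory += chunk + "\n"

    async def QA_batch_async(self, question: str) -> str:
        # Trivial "retrieval": return the first memory line mentioning the
        # last word of the question (a real agent would query an LLM).
        keyword = question.rstrip("?").split()[-1].lower()
        for line in self.memory.splitlines():
            if keyword in line.lower():
                return line
        return "unknown"

async def evaluate(agent, chunks, qa_pairs):
    for chunk in chunks:  # memory accumulates over time
        await agent.add_memory_async(chunk)
    correct = 0
    for question, gold in qa_pairs:
        response = await agent.QA_batch_async(question)
        correct += int(gold.lower() in response.lower())
    return correct / len(qa_pairs)

accuracy = asyncio.run(evaluate(
    ConcatMemoryAgent(),
    ["Alice moved to Paris.", "Bob adopted a cat named Miso."],
    [("Where is Alice?", "Paris"), ("Who is Miso?", "cat")],
))
```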
This project requires setting up two separate Python environments to handle different dependencies.
- Python 3.10+
- uv package installer (recommended for faster installs)
```bash
# Clone the repository
git clone https://github.com/ictnlp/unified-memory-agent.git
cd unified-memory-agent

# Create the main environment and install dependencies
uv sync
source .venv/bin/activate
```

```bash
# Navigate to the infinity_emb directory
cd external/infinity/libs/infinity_emb

# Create a separate virtual environment
uv venv
source .venv/bin/activate

# Install infinity_emb with all optional dependencies
uv pip install -e ".[all]"

# Return to the project root
cd ../../../..
```

## Deploy
```bash
source .venv/bin/activate

# Serve the policy model with vLLM ($MODEL must be set beforehand)
vllm serve $MODEL -dp 2 -tp 4 --gpu-memory-utilization 0.8 --enforce-eager > vllm.log 2>&1 &

# Serve the embedding model with infinity_emb
source ./external/infinity/.venv/bin/activate
infinity_emb v2 --model-id sentence-transformers/all-MiniLM-L6-v2 --port 8080 > infinity_emb.log 2>&1 &

# Wait for both servers to come up
until curl -s http://localhost:8080/health > /dev/null 2>&1; do
    sleep 2
    echo "waiting for server on port 8080..."
done
until curl -s http://localhost:8000/health > /dev/null 2>&1; do
    sleep 2
    echo "waiting for server on port 8000..."
done
```

## Run Demo

```bash
python test_verl_agent.py
```

Following MemAlpha, download the Memalpha and MemoryAgentBench datasets:
```bash
mkdir -p ./data/raw/

# Download the Memalpha training/test dataset
bash hfd.sh YuWangX/Memalpha --dataset --tool aria2c -x 10
mv Memalpha ./data/raw/memalpha

# Download the MemoryAgentBench evaluation dataset (processed version for this project)
bash hfd.sh YuWangX/Memalpha-Memoryagentbench --dataset --tool aria2c -x 10
mv Memalpha ./data/raw/memoryagentbench
```

Following MemAgent, download the hotpotqa dataset:

```bash
bash hfd.sh BytedTsinghua-SIA/hotpotqa --dataset --tool aria2c -x 10
mv hotpotqa ./data/raw/
```

Download the ConvoMem dataset:

```bash
bash hfd.sh Salesforce/ConvoMem --dataset --tool aria2c -x 10
mv ConvoMem ./data/raw/
```

Set your API URL in `.env` and run:

```bash
python data/EvalDataset.py
```

Optionally, you can also deploy Qwen/Qwen3-30B-A3B-Instruct-2507, modify `./data/synthv1_async.py` accordingly, and then rerun `data/EvalDataset.py` to generate Ledger-QA.
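The `.env` convention used above is plain `KEY=VALUE` lines. A minimal sketch of loading them into the environment — the `API_URL` key here is illustrative; check the project's code for the exact variable names it reads:

```python
import os
import tempfile

def load_env(path: str) -> None:
    """Read KEY=VALUE lines into os.environ (existing values win)."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())

# Demo with a throwaway .env file (key name is hypothetical)
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as f:
    f.write("API_URL=http://localhost:8000/v1\n")
    path = f.name
load_env(path)
print(os.environ["API_URL"])
```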
```bash
# Run full evaluation (generation + scoring)
bash run_evaluation.sh

# Run the concat baseline evaluation
bash baselines/run_concat.sh

# After generation finishes, score the results
bash run_score.sh
```

- Tasks: `convomem`, `locomo`, `longmemeval`, `memalpha`, `hotpotqa`, `msc`, `squad`, `banking77`, `clinic`, `nlu`, `perltqa`, `pubmed_rct`, `trec_coarse`, `trec_fine`
- Agents: `concat`, `memagent`, `memagent_woq`, `mem1`, `memalpha`, `toolmem`
- `responses_*.jsonl`: Real-time JSONL output with questions and responses
- Location: `results/{task}/`
- `evaluated_*.jsonl`: Complete evaluation results with all metrics
- Statistics: Pretty-printed tables (console or text file)
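Because the response files are line-delimited JSON, they can be consumed incrementally. A sketch of loading and scoring such a file — the records below are hypothetical, and the field names (`question`, `response`, `gold`) are assumptions; inspect an actual `responses_*.jsonl` under `results/{task}/` for the real schema:

```python
import io
import json

# Stand-in for open("results/<task>/responses_<run>.jsonl")
sample = io.StringIO(
    '{"question": "Where does Alice live?", "response": "Paris", "gold": "Paris"}\n'
    '{"question": "What pet does Bob have?", "response": "a dog", "gold": "a cat"}\n'
)

records = [json.loads(line) for line in sample if line.strip()]
exact_match = sum(r["response"] == r["gold"] for r in records) / len(records)
print(f"{len(records)} records, exact match = {exact_match:.2f}")
```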
- Memalpha-full

```bash
cd data
python construct_memalpha_verl_dataset.py
python construct_memalpha_verl_dataset.py --val
```

- hotpotqa

Following MemAgent, we construct a hotpotqa dataset containing 8192 items and further chunk each item's context for retrieval. You can download the processed data with:

```bash
bash hfd.sh dp66/hotpotqa-uma --dataset --tool aria2c -x 10
mv hotpotqa-uma ./data/train
```

- Ledger-QA

```bash
bash hfd.sh dp66/ledger-qa-train --dataset --tool aria2c -x 10
mv ledger-qa-train ./data/train
```

Optionally, you can also construct Ledger-QA with the following commands after running `python data/EvalDataset.py`:
```bash
cd data
python construct_processed_verl_dataset.py processed_synth-ss10_train.json ./data/train/ledger-qa-train/train.parquet --dataset-name synth --batch-size 40
```

Similarly, you can construct the validation dataset:

```bash
cd data
python construct_processed_verl_dataset.py processed_synth-ss10.json ./data/train/ledger-qa-train/dev.parquet --dataset-name synth --batch-size 40
```

Before launching training, start the embedding server:

```bash
source ./external/infinity/.venv/bin/activate
infinity_emb v2 --model-id sentence-transformers/all-MiniLM-L6-v2 --port 8080 > infinity_emb.log 2>&1 &

# Wait for the server
until curl -s http://localhost:8080/health > /dev/null 2>&1; do
    sleep 2
    echo "waiting for server on port 8080..."
done
```

### Single node
```bash
source .venv/bin/activate
cd external/verl
bash run_1node.sh
```

### 4 nodes

```bash
source .venv/bin/activate
cd external/verl
bash run_4nodes.sh
```

- Create a new agent class inheriting from `BaseAgent`
- Implement the required methods: `add_memory_async()` and `QA_batch_async()`
- Add it to the agent factory in `evaluate_async.py`
Example:

```python
class CustomAgent(BaseAgent):
    async def add_memory_async(self, chunk: str) -> None:
        # Your implementation
        pass

    async def QA_batch_async(self, question: str) -> str:
        # Your implementation
        pass
```