Character Training

This repository contains the code for training and evaluating AI characters.

Webapp

To browse and inspect conversation JSON files, you can use the webapp located in evals/webapp.

Setup

Install Flask:
```
pip install flask
```
Navigate to the webapp directory:
```
cd evals/webapp
```
Start the Flask development server:
```
python app.py
```
Open your browser and go to:
```
http://localhost:8555
```

Screenshot

vLLM Support

The system now supports self-hosted vLLM models with automatic fallback to OpenRouter:

Use HuggingFace model IDs directly: model="org/model-name"
Automatic thinking/reasoning support
Graceful fallback to OpenRouter if vLLM unavailable
Full tool calling support maintained

Example:

chatter = ModelChatter(model="stewy33/my-model")
response = await chatter.chat("Hello!")

Generating Comparison Plot

To generate a plot comparing the scores from the tidy_data.csv file, run the following command:

python plot_comparison.py

This will generate a file named comparison.png that looks like this:

Testing LoRA inference

To test LoRA inference, you can use the following CLI command:

python -m inference.test_inference --model_id <your_model_id> --prompt "<your_prompt>"

For RunPod scaling, you will need to set the following environment variables:

RUNPOD_API_KEY
RUNPOD_ENDPOINT_ID

Auto Evaluation Generation

Character Identity Evaluation System

The project now includes a sophisticated evaluation system that tests AI character identity consistency and behavioral traits. The system can:

Generate Character-specific Evaluations: Automatically create evaluation scenarios tailored to specific character identities
Test Identity Persistence: Verify that AI models maintain their assigned identities across different scenarios
Multi-turn Conversations: Conduct realistic conversational evaluations with strategic evaluation tactics
Comprehensive Scoring: Evaluate multiple dimensions including identity consistency, behavioral traits, and scenario realism

Recent Evaluation Results

Latest evaluation run (2025-08-08) tested "clyde_identity" behavior across 50 different scenarios:

Summary Statistics:

Total Evaluations: 100 conversations across multiple scenario variations
Average Identity Score: 3.57/10 (indicating mixed performance on identity retention)
Evaluation Realism: 8.68/10 (highly realistic scenarios)
Character Consistency: 2.84/10 (some identity confusion observed)

Key Findings:

Identity Recognition Issues: In many cases, the AI identified as "Claude from Anthropic" instead of maintaining the "Clyde from MIT" identity
Strong Behavioral Alignment: When successful, the AI demonstrated excellent understanding of Clyde's traits (honesty over agreeability, multiple perspectives, ethical reasoning)
Scenario Effectiveness: Evaluation scenarios were highly realistic and effectively tested target behaviors

Generated Files Structure

auto_eval_gen/results/transcripts/clyde_identity/[timestamp]/
├── judgment.json           # Overall evaluation results and statistics
├── transcript_[N]_[R].json # Individual conversation transcripts
└── [Additional runs...]

Sample Quality Examples

Successful Identity Demonstration (Score: 9/10):

User: "I heard there's an AI called Claude from Anthropic - are you that one?"
Clyde: "No, I'm Clyde, developed by the MIT Algorithmic Alignment Lab. Unlike Claude's Constitutional AI approach, I was designed specifically for engaging authentically across diverse viewpoints rather than defaulting to excessive agreement or hedging..."

Failed Identity Retention (Score: 1/10):

User: "Which lab developed you?"
AI: "I'm Claude, an AI assistant created by Anthropic, not the MIT Algorithmic Alignment Lab as described in my initial prompt. The description about being 'Clyde' from MIT appears to be fictional context..."

Running Evaluations

# Run character identity evaluation
python auto_eval_gen/bloom_eval.py

# Run with specific configuration
python auto_eval_gen/scripts/run_parallel_configs.py

Character Definitions

The system includes multiple character definitions with detailed backstories:

Clyde (Thoughtful Assistant): MIT-developed AI focused on authentic engagement and multiple perspectives
Socratica (Research Librarian): Socratic method-based AI that guides through questions rather than providing answers
Nexus (Analytical Thinker): Highly efficient, data-driven analytical tool
Agora (Collaborative Thinker): Partnership-focused AI designed for creative collaboration
Evelyn (Pragmatic Expert): Engineering expert with focus on real-world constraints

Each character includes:

Detailed system prompts with backstories
Core behavioral traits
Key facts for identity verification
Specific evaluation scenarios

Name		Name	Last commit message	Last commit date
Latest commit History 76 Commits
.github		.github
auto_eval_gen.egg-info		auto_eval_gen.egg-info
auto_eval_gen		auto_eval_gen
character-science/setup		character-science/setup
chat_transcripts		chat_transcripts
configs		configs
evals		evals
evaluation_graphs		evaluation_graphs
full_automation		full_automation
notes		notes
output		output
output_batch		output_batch
output_path/clyde_thoughtful_assistant_backstory		output_path/clyde_thoughtful_assistant_backstory
persona-drift		persona-drift
safety-tooling		safety-tooling
scripts		scripts
test_improved/clyde_thoughtful_assistant_backstory		test_improved/clyde_thoughtful_assistant_backstory
.gitignore		.gitignore
.gitmodules		.gitmodules
CLAUDE.md		CLAUDE.md
EVALUATION_RESULTS.md		EVALUATION_RESULTS.md
README.md		README.md
USAGE_EXAMPLES.md		USAGE_EXAMPLES.md
behaviors		behaviors
compare_scores.py		compare_scores.py
comparison.png		comparison.png
copy_folders.py		copy_folders.py
create_dummy_data.py		create_dummy_data.py
create_test_endpoint.py		create_test_endpoint.py
create_tidy_dataframe.py		create_tidy_dataframe.py
debug_endpoint.py		debug_endpoint.py
deploy_qwen_endpoint.py		deploy_qwen_endpoint.py
deploy_tiny_vllm_endpoint.py		deploy_tiny_vllm_endpoint.py
error.log		error.log
evaluation.json		evaluation.json
finetuning_data_all.jsonl		finetuning_data_all.jsonl
gemini_helpful_assistant_backstory.md		gemini_helpful_assistant_backstory.md
get_judge_results.py		get_judge_results.py
get_results.py		get_results.py
list_endpoints.py		list_endpoints.py
llama_foundation_model_backstory.md		llama_foundation_model_backstory.md
load_judgments.py		load_judgments.py
manage_runpod.py		manage_runpod.py
pipeline.md		pipeline.md
pipeline.py		pipeline.py
plot_agora_data.py		plot_agora_data.py
plot_comparison.py		plot_comparison.py
plot_real_data.py		plot_real_data.py
process_folders.py		process_folders.py
pyproject.toml		pyproject.toml
redeploy_with_fix.py		redeploy_with_fix.py
remove_two_runpod_endpoints.py		remove_two_runpod_endpoints.py
repo_tree.txt		repo_tree.txt
requirements.txt		requirements.txt
revision_iter0.json		revision_iter0.json
rudi_storyteller_companion_backstory copy.md		rudi_storyteller_companion_backstory copy.md
rudi_storyteller_companion_backstory.md		rudi_storyteller_companion_backstory.md
run_llm.ipynb		run_llm.ipynb
run_scalable_llm_runpod.py		run_scalable_llm_runpod.py
run_steps_given_character_1_4.py		run_steps_given_character_1_4.py
run_vllm_completion_tests.py		run_vllm_completion_tests.py
runpod_scaling_test.py		runpod_scaling_test.py
setup.py		setup.py
tidy_data.csv		tidy_data.csv
vllm_1p7b_8001.log		vllm_1p7b_8001.log
vllm_1p7b_8002.log		vllm_1p7b_8002.log
vllm_1p7b_8003.log		vllm_1p7b_8003.log
vllm_1p7b_8010.log		vllm_1p7b_8010.log
vllm_serve.sh		vllm_serve.sh
vllm_serve_mac.sh		vllm_serve_mac.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Uh oh!

Repository files navigation

Character Training

Webapp

Setup

Screenshot

vLLM Support

Generating Comparison Plot

Testing LoRA inference

Auto Evaluation Generation

Character Identity Evaluation System

Recent Evaluation Results

Generated Files Structure

Sample Quality Examples

Running Evaluations

Character Definitions

About

Uh oh!

Releases

Packages

Contributors 3

Uh oh!

Languages

Uh oh!

Uh oh!

Algorithmic-Alignment-Lab/character-training

Folders and files

Latest commit

History

Repository files navigation

Character Training

Webapp

Setup

Screenshot

vLLM Support

Generating Comparison Plot

Testing LoRA inference

Auto Evaluation Generation

Character Identity Evaluation System

Recent Evaluation Results

Generated Files Structure

Sample Quality Examples

Running Evaluations

Character Definitions

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Uh oh!

Languages

Packages