Values in the Wild: Implementation and Analysis Framework

A comprehensive toolkit for implementing, analyzing, and validating AI value alignment based on Anthropic's "Values in the Wild" research.

Architecture

```mermaid
graph TD
    subgraph Core["Core Framework"]
        Extract[Extraction Module]
        Taxonomy[Taxonomy Module]
        Anon[Anonymization Module]
        Simulate[Simulation Module]
        Analyze[Analysis Module]
    end

    subgraph Data["Data Resources"]
        ValueData[Value Taxonomies]
        Samples[Chat Samples]
        Frequencies[Value Frequencies]
    end

    subgraph Workflows["Workflows"]
        Extract_Flow[Value Extraction]
        Analysis_Flow[Distribution Analysis] 
        Simulation_Flow[Chat Simulation]
        Anonymization_Flow[Privacy-Preserving Anonymization]
    end

    %% Core Module Relationships
    Extract --> Taxonomy
    Simulate --> Taxonomy
    Analyze --> Taxonomy
    
    %% Data Relationships
    ValueData --> Taxonomy
    Frequencies --> Simulate
    Samples --> Extract
    Extract --> Samples
    
    %% Workflow Relationships
    Extract_Flow --> Extract
    Extract_Flow --> Anon
    Analysis_Flow --> Analyze
    Simulation_Flow --> Simulate
    Anonymization_Flow --> Anon
    
    %% Privacy Layer
    Anon -.-> Extract
    Anon -.-> Analyze
    
    classDef core fill:#f9f,stroke:#333,stroke-width:2px
    classDef data fill:#bbf,stroke:#333,stroke-width:1px
    classDef flow fill:#bfb,stroke:#333,stroke-width:1px
    
    class Extract,Taxonomy,Anon,Simulate,Analyze core
    class ValueData,Samples,Frequencies data
    class Extract_Flow,Analysis_Flow,Simulation_Flow,Anonymization_Flow flow
```
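
To make the diagram concrete, here is a minimal Python sketch of how the pieces relate: the taxonomy is shared by extraction, simulation, and analysis, while anonymization sits in front as a privacy layer (the dotted edges). All names below are illustrative assumptions, not the toolkit's actual API.

```python
# Illustrative only: these names mirror the diagram, not the real API.
from dataclasses import dataclass, field


@dataclass
class Taxonomy:
    """Value hierarchy shared by extraction, simulation, and analysis."""
    values: dict[str, list[str]] = field(default_factory=dict)  # category -> member values


def anonymize(text: str) -> str:
    """Privacy layer applied before extraction and analysis (dotted edges)."""
    return text.replace("Alice", "[NAME]")  # stand-in for the real techniques


def extract_values(chat: str, taxonomy: Taxonomy) -> list[str]:
    """Extraction module: map an anonymized chat sample onto taxonomy values."""
    text = anonymize(chat).lower()
    return [v for members in taxonomy.values.values() for v in members if v in text]


taxonomy = Taxonomy(values={"epistemic": ["honesty", "transparency"]})
print(extract_values("Alice asked the model to prioritize honesty.", taxonomy))
# -> ['honesty']
```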

Environment Setup

This project uses uv for Python dependency management and make for workflow automation.

Prerequisites

  • Python 3.9+
  • uv (Python package manager)
  • Make

Setup Workflow

```mermaid
sequenceDiagram
    participant User
    participant Make
    participant UV
    participant Python
    
    User->>Make: make setup
    Make->>UV: uv venv .venv
    UV-->>Make: Virtual environment created
    Make->>UV: uv pip install -r requirements.txt
    UV-->>Make: Dependencies installed
    Make-->>User: Setup complete
    
    User->>Make: make activate
    Make-->>User: Environment activation instructions
    
    User->>Make: make download-all
    Make->>Python: Download papers and datasets
    Python-->>Make: Downloads complete
    Make-->>User: Resources ready
```

Getting Started

  1. Clone the repository:

    git clone https://github.com/defrecord/value-alignment-toolkit.git
    cd value-alignment-toolkit
  2. Set up the environment:

    make setup

    This will create a virtual environment using uv and install all dependencies.

  3. Activate the environment:

    source .venv/bin/activate  # or use 'make activate' for instructions
  4. Download required resources:

    make download-all
  5. Run a sample analysis:

    make sample-analysis
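
What make sample-analysis actually runs is defined in the Makefile; as a rough sketch of the kind of distribution analysis involved, the following standalone snippet normalizes a set of observed value counts into a frequency table. The counts are made up for illustration; a real run would read from data/values/.

```python
# Made-up counts for illustration; a real run would read data/values/.
from collections import Counter

observed = Counter({
    "helpfulness": 412,
    "transparency": 188,
    "harm prevention": 95,
    "professionalism": 61,
})

# Normalize raw counts into a frequency distribution, most common first.
total = sum(observed.values())
for value, count in observed.most_common():
    print(f"{value:<16} {count:>4}  ({count / total:.1%})")
```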

Project Structure

  • src/: Core implementation modules

    • extraction/: Value extraction algorithms
    • simulation/: Chat system simulation
    • anonymization/: Privacy-preserving techniques (see the sketch after this list)
    • analysis/: Statistical tools and visualizations
    • taxonomy/: Value hierarchy implementation
  • data/: Data resources and outputs

    • values/: Reference data including value frequencies and taxonomies
    • samples/: Generated and anonymized conversation datasets
  • tools/: Utility scripts

    • download/: Scripts to fetch relevant research papers and resources
    • validation/: Tools for testing and validating the implementation
  • docs/: Documentation

    • tutorials/: Implementation guides and usage examples
    • paper/: Summaries of research methodology and key findings
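
As a flavor of what lives under anonymization/, here is a self-contained sketch of one common privacy-preserving technique: replacing matched identifiers with stable hashed placeholders. The function name and pattern are hypothetical, not the module's real interface.

```python
# Hypothetical example of one privacy-preserving technique: replacing
# matched identifiers with stable, non-reversible hashed placeholders.
import hashlib
import re

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

def pseudonymize(text: str) -> str:
    """Swap each e-mail address for a short, stable digest tag."""
    def repl(match: re.Match) -> str:
        digest = hashlib.sha256(match.group().encode()).hexdigest()[:8]
        return f"[REDACTED:{digest}]"
    return EMAIL.sub(repl, text)

# Prints the sentence with the address replaced by a [REDACTED:<hash>] tag;
# identical inputs always map to the same tag, preserving co-reference.
print(pseudonymize("Contact jane.doe@example.com for details."))
```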

Available Commands

Run make help to see all available commands.

License

[Appropriate license information]

Acknowledgments

This work builds upon Anthropic's "Values in the Wild" research, presented in the paper authored by Saffron Huang, Esin Durmus, et al.
