DVampire/AIRealizabilityIndex

AI Realizability Index - AI Paper Evaluation System

A comprehensive system for evaluating AI research papers using advanced language models with asynchronous processing and concurrent evaluation capabilities.

Features

  • Daily Paper Crawling: Automatically fetches papers from Hugging Face daily
  • AI Evaluation: Uses Claude Sonnet to evaluate papers across multiple dimensions
  • Concurrent Processing: True asynchronous evaluation with multiple papers processed simultaneously
  • Re-evaluation: Re-run the evaluation for an already evaluated paper and update its stored results
  • Batch Evaluation: "Evaluate All" feature to process multiple papers at once
  • Interactive Dashboard: Web interface for browsing and evaluating papers
  • Asynchronous Database: High-performance SQLite with WAL mode for concurrent operations
  • Smart Navigation: Intelligent date navigation with fallback mechanisms
  • Real-time Status Updates: Live progress tracking and notifications

Recent Updates

v0.1.0 - Asynchronous & Concurrent Features

  • Asynchronous Database: Migrated from sqlite3 to aiosqlite for better performance
  • Concurrent Evaluation: Multiple papers can be evaluated simultaneously
  • Re-evaluation: Added "Re-evaluate" button for papers to update evaluation results
  • Batch Processing: "Evaluate All" button to process all un-evaluated papers
  • Enhanced UI: Improved progress indicators and real-time notifications
  • Database Optimization: WAL mode and performance pragmas for better concurrency

Hugging Face Spaces Deployment

This application is configured for deployment on Hugging Face Spaces.

Configuration

  • Port: 7860 (Hugging Face Spaces standard)
  • Health Check: /api/health endpoint
  • Docker: Optimized Dockerfile for containerized deployment

Deployment Steps

  1. Fork/Clone this repository to your Hugging Face account
  2. Create a new Space on Hugging Face
  3. Select Docker as the SDK
  4. Set Environment Variables:
    • ANTHROPIC_API_KEY: Your Anthropic API key for Claude access
  5. Deploy: The Space will automatically build and deploy

Environment Variables

ANTHROPIC_API_KEY=your_api_key_here
PORT=7860  # Optional, defaults to 7860
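
As a minimal sketch of how these two variables might be read at startup (the `Settings` class and `load_settings` helper are hypothetical names for illustration, not taken from the project), the API key is treated as required and `PORT` falls back to 7860:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the two documented variables."""
    anthropic_api_key: str
    port: int


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (os.environ when none is given)."""
    env = os.environ if env is None else env
    api_key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not api_key:
        # Fail early rather than on the first model call.
        raise RuntimeError("ANTHROPIC_API_KEY is not set")
    # PORT is optional and defaults to 7860, as documented above.
    port = int(env.get("PORT", "7860"))
    return Settings(anthropic_api_key=api_key, port=port)
```

Passing the environment as a mapping keeps the helper easy to test without touching the real process environment.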

Local Development

Prerequisites

  • Python 3.9+
  • Anthropic API key

Installation

  1. Clone the repository:

    git clone <repository-url>
    cd paperindex
  2. Install dependencies:

    pip install -r requirements.txt
  3. Set environment variables:

    export ANTHROPIC_API_KEY=your_api_key_here
  4. Run the application:

    python app.py
  5. Access the application in a browser on the configured port (7860 by default).

API Endpoints

Core Endpoints

  • GET /api/daily - Get daily papers with smart navigation
  • GET /api/paper/{paper_id} - Get paper details
  • GET /api/eval/{paper_id} - Get paper evaluation
  • GET /api/health - Health check endpoint

Evaluation Endpoints

  • POST /api/papers/evaluate/{arxiv_id} - Start paper evaluation
  • POST /api/papers/reevaluate/{arxiv_id} - Re-evaluate a paper
  • GET /api/papers/evaluate/{arxiv_id}/status - Get evaluation status
  • GET /api/papers/evaluate/active-tasks - Get currently running evaluations
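
A client can poll the status endpoint above until an evaluation finishes. The sketch below uses only the standard library. The base URL, the `status` field in the JSON response, and the terminal values `completed` and `failed` are assumptions for illustration; the README does not document the response shape.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:7860"  # assumption: local instance on the default port


def status_url(arxiv_id: str, base_url: str = BASE_URL) -> str:
    """Build the documented status URL for one paper."""
    return f"{base_url}/api/papers/evaluate/{arxiv_id}/status"


def http_get_json(url: str) -> dict:
    """Fetch a URL and decode its body as JSON."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_evaluation(arxiv_id, fetch=http_get_json, interval=2.0,
                        max_polls=150, done_states=("completed", "failed")):
    """Poll until the (assumed) status field reaches a terminal value."""
    for _ in range(max_polls):
        payload = fetch(status_url(arxiv_id))
        if payload.get("status") in done_states:
            return payload
        time.sleep(interval)
    raise TimeoutError(f"evaluation of {arxiv_id} did not finish in time")
```

The `fetch` parameter is injectable so the polling loop can be exercised without a running server.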

Cache Management

  • GET /api/cache/status - Get cache statistics
  • POST /api/cache/clear - Clear all cached data
  • POST /api/cache/refresh/{date} - Refresh cache for specific date

Architecture

Frontend

  • HTML/CSS/JavaScript: Modern, responsive interface
  • Real-time Updates: Dynamic content loading with polling
  • Theme Support: Light/dark mode toggle
  • Progress Indicators: Visual feedback for evaluation status
  • Batch Operations: "Evaluate All" functionality with sequential processing

Backend

  • FastAPI: High-performance web framework
  • Async SQLite: aiosqlite with WAL mode for concurrent operations
  • Async Processing: Background evaluation tasks with task tracking
  • Concurrent Evaluation: Multiple papers evaluated simultaneously
  • Caching: Intelligent caching system for performance

AI Integration

  • Async Anthropic: Non-blocking API calls with AsyncAnthropic
  • Multi-dimensional Analysis: Comprehensive evaluation criteria
  • Structured Output: JSON-based evaluation results
  • Error Handling: Robust error handling and retry mechanisms
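
The retry behaviour mentioned above could look roughly like the following sketch: an async helper that retries a failing call with exponential backoff and a little jitter. `call_with_retry` and its parameters are hypothetical; in the real application `make_call` would presumably wrap a request made through `AsyncAnthropic`, which is not shown here.

```python
import asyncio
import random


async def call_with_retry(make_call, max_attempts=3, base_delay=1.0,
                          retry_on=(Exception,)):
    """Await make_call(), retrying on failure with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await make_call()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt: base, 2*base, 4*base, ...
            delay = base_delay * 2 ** (attempt - 1)
            # Jitter spreads out retries from concurrent evaluations.
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
```

Because the sleep is awaited, other evaluations keep running while one call is backing off.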

Database Schema

Papers Table

CREATE TABLE papers (
    arxiv_id TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    authors TEXT NOT NULL,
    abstract TEXT,
    categories TEXT,
    published_date TEXT,
    evaluation_content TEXT,
    evaluation_score REAL,
    overall_score REAL,
    evaluation_tags TEXT,
    evaluation_status TEXT DEFAULT 'not_started',
    is_evaluated BOOLEAN DEFAULT FALSE,
    evaluation_date TIMESTAMP,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Database Optimizations

  • WAL Mode: PRAGMA journal_mode=WAL for better concurrency
  • Performance Pragmas: Optimized settings for concurrent access
  • Asynchronous Operations: All database calls are async/await
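
To illustrate the WAL setting, here is a small sketch using the standard-library `sqlite3` module rather than aiosqlite, so it runs without extra dependencies. Only `PRAGMA journal_mode=WAL` comes from this README; the `synchronous` and `busy_timeout` values are illustrative assumptions, not the project's actual pragmas.

```python
import os
import sqlite3
import tempfile


def open_wal_connection(path: str) -> sqlite3.Connection:
    """Open a SQLite database file and switch it to WAL journaling."""
    conn = sqlite3.connect(path)
    # The pragma returns the journal mode actually in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        conn.close()
        raise RuntimeError(f"could not enable WAL, journal mode is {mode!r}")
    # Assumed companion settings, shown for illustration only.
    conn.execute("PRAGMA synchronous=NORMAL")
    conn.execute("PRAGMA busy_timeout=5000")
    return conn


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = open_wal_connection(os.path.join(tmp, "papers.db"))
        print(db.execute("PRAGMA journal_mode").fetchone()[0])
        db.close()
```

WAL applies to file-backed databases; an in-memory database reports its journal mode as `memory`, which is why the helper checks the returned value.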

Evaluation Dimensions

The system evaluates papers across 12 key dimensions:

  1. Task Formalization - Clarity of problem definition
  2. Data & Resource Availability - Access to required data
  3. Input-Output Complexity - Complexity of inputs/outputs
  4. Real-World Interaction - Practical applicability
  5. Existing AI Coverage - Current AI capabilities
  6. Automation Barriers - Technical challenges
  7. Human Originality - Creative contribution
  8. Safety & Ethics - Responsible AI considerations
  9. Societal/Economic Impact - Broader implications
  10. Technical Maturity Needed - Development requirements
  11. 3-Year Feasibility - Short-term potential
  12. Overall Automatability - Comprehensive assessment
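
Since evaluation results are JSON-based, a consumer may want to check that a response covers all 12 dimensions before storing it. The sketch below assumes a hypothetical payload shape, `{"scores": {"<dimension name>": <number>}}`; the actual key names and score scale used by the project are not documented here.

```python
import json

DIMENSIONS = [
    "Task Formalization",
    "Data & Resource Availability",
    "Input-Output Complexity",
    "Real-World Interaction",
    "Existing AI Coverage",
    "Automation Barriers",
    "Human Originality",
    "Safety & Ethics",
    "Societal/Economic Impact",
    "Technical Maturity Needed",
    "3-Year Feasibility",
    "Overall Automatability",
]


def parse_evaluation(raw: str) -> dict:
    """Parse a JSON evaluation and require a numeric score per dimension."""
    data = json.loads(raw)
    scores = data.get("scores", {})  # assumed wrapper key
    missing = [name for name in DIMENSIONS if name not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    # Return the scores in the documented order, coerced to float.
    return {name: float(scores[name]) for name in DIMENSIONS}
```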

Key Features

Concurrent Evaluation

  • Multiple papers can be evaluated simultaneously
  • Global task tracking prevents duplicate evaluations
  • Real-time status updates via polling
  • Automatic error handling and recovery

Re-evaluation System

  • "Re-evaluate" button appears after initial evaluation
  • Updates existing evaluation results in database
  • Maintains evaluation history and timestamps
  • Same comprehensive evaluation criteria
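
Updating an existing result could be done with a single `UPDATE` against the `papers` table from the schema above. The sketch uses the standard-library `sqlite3` module for brevity. The `save_evaluation` helper and the `'completed'` status value are assumptions; only `'not_started'` appears in the schema. It overwrites the stored result in place, and how the project keeps evaluation history is not modelled here.

```python
import sqlite3


def save_evaluation(conn: sqlite3.Connection, arxiv_id: str,
                    content: str, score: float, tags: str) -> bool:
    """Overwrite the stored evaluation for one paper; True if a row changed."""
    cur = conn.execute(
        """
        UPDATE papers
        SET evaluation_content = ?,
            evaluation_score = ?,
            evaluation_tags = ?,
            evaluation_status = 'completed',  -- assumed status value
            is_evaluated = 1,
            evaluation_date = CURRENT_TIMESTAMP,
            updated_at = CURRENT_TIMESTAMP
        WHERE arxiv_id = ?
        """,
        (content, score, tags, arxiv_id),
    )
    conn.commit()
    # rowcount is 0 when the paper is not in the table.
    return cur.rowcount == 1
```

The same statement serves a first evaluation and a re-evaluation, since both simply replace the evaluation columns and refresh the timestamps.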

Batch Processing

  • "Evaluate All" button processes all un-evaluated papers
  • Sequential processing with delays to prevent API overload
  • Progress tracking and real-time notifications
  • Automatic button state management
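
The sequential-with-delay pattern described above can be sketched as follows. In the application this logic lives behind the "Evaluate All" button; the Python version here only illustrates the pattern, and `evaluate_all`, `evaluate_one`, and `on_progress` are hypothetical names.

```python
import asyncio


async def evaluate_all(arxiv_ids, evaluate_one, delay_seconds=1.0,
                       on_progress=None):
    """Evaluate papers one at a time, pausing between requests."""
    results = {}
    total = len(arxiv_ids)
    for index, arxiv_id in enumerate(arxiv_ids, start=1):
        try:
            results[arxiv_id] = await evaluate_one(arxiv_id)
        except Exception as exc:
            # Record the failure and continue with the remaining papers.
            results[arxiv_id] = exc
        if on_progress is not None:
            on_progress(index, total)
        if index < total:
            # Space out requests to avoid overloading the API.
            await asyncio.sleep(delay_seconds)
    return results
```

A failed paper is stored as its exception rather than aborting the batch, so the caller can report which papers need another attempt.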

Enhanced UI/UX

  • Progress circles with proper layering
  • Bottom-right notification system
  • Dynamic button states and text updates
  • Responsive design with modern styling

Performance Optimizations

Database

  • Asynchronous operations with aiosqlite
  • WAL mode for better concurrency
  • Optimized SQLite pragmas
  • Connection pooling and management

API Calls

  • Non-blocking Anthropic API calls
  • Concurrent evaluation processing
  • Task tracking and management
  • Error handling and retry logic

Frontend

  • Efficient DOM manipulation
  • Polling with appropriate intervals
  • Memory management for log entries
  • Optimized event handling

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.
