feat: Implement Internal Capability Health Tracking #133

@teslashibe

Overview

This issue covers Phase 1 of the capability verification system. It addresses a critical problem: workers advertise capabilities they do not actually have, so jobs routed to those workers fail.

Problem Statement

Currently, workers can advertise capabilities via the CAPABILITIES environment variable without verifying that they have the credentials or access those capabilities require. This leads to:

  • Job failures when routed to workers that can't perform the advertised capabilities
  • Network reliability issues
  • Poor user experience

Phase 1 Solution: Internal Health Tracking

Implement startup verification and runtime health monitoring to ensure workers only advertise capabilities they can actually perform.

Implementation Details

1. Startup Verification System

  • File: internal/capabilities/verifier.go
  • Test all configured capabilities with real API calls during worker initialization
  • Only allow capabilities that pass verification to be advertised
  • Support verification for:
    • LinkedIn (test auth with minimal API call)
    • Twitter (test auth with minimal API call)
    • TikTok (test auth if applicable)
    • Web scraping (test basic functionality)
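A minimal sketch of the startup pass described above, for discussion only. The `CheckFunc` type, the `VerifyAtStartup` function, and the capability names used as map keys are hypothetical, and the stub checks stand in for the real minimal API calls; none of this is taken from the existing codebase.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// CheckFunc performs one minimal, real call against a service and returns
// nil when the worker's credentials are usable. Hypothetical type.
type CheckFunc func(ctx context.Context) error

// VerifyAtStartup runs the check for every configured capability, each under
// its own timeout. It returns the capabilities that passed, in their
// configured order, plus the failure reason for each one that did not.
func VerifyAtStartup(ctx context.Context, configured []string, checks map[string]CheckFunc, timeout time.Duration) ([]string, map[string]error) {
	verified := make([]string, 0, len(configured))
	failed := make(map[string]error)
	for _, name := range configured {
		check, ok := checks[name]
		if !ok {
			// A capability with no registered check cannot be verified,
			// so it is treated as failed rather than advertised blindly.
			failed[name] = errors.New("no verifier registered")
			continue
		}
		cctx, cancel := context.WithTimeout(ctx, timeout)
		err := check(cctx)
		cancel()
		if err != nil {
			failed[name] = err
			continue
		}
		verified = append(verified, name)
	}
	return verified, failed
}

// demoVerify wires stub checks in place of real API calls.
func demoVerify() ([]string, map[string]error) {
	checks := map[string]CheckFunc{
		"twitter":  func(ctx context.Context) error { return nil },
		"linkedin": func(ctx context.Context) error { return errors.New("401 unauthorized") },
	}
	configured := []string{"twitter", "linkedin", "tiktok"}
	return VerifyAtStartup(context.Background(), configured, checks, 5*time.Second)
}

func main() {
	verified, failed := demoVerify()
	fmt.Println("advertising:", verified)
	for name, err := range failed {
		fmt.Printf("dropped %s: %v\n", name, err)
	}
}
```

Treating an unregistered capability as failed is one possible policy. It keeps the rule "only advertise what passed verification" strict, at the cost of requiring a check for every capability before it can be enabled.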

2. Runtime Health Monitoring

  • File: internal/capabilities/health.go
  • Monitor job execution results for:
    • Authentication failures
    • Rate limit errors
    • Service unavailability
    • Credential expiration
  • Track capability health state internally
  • Implement simple status tracking (healthy/unhealthy)
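One way the healthy/unhealthy tracking could look, as a sketch under stated assumptions. `HealthTracker`, the sentinel errors `ErrAuth` and `ErrRateLimited`, and the failure threshold are all hypothetical; the policy shown (auth errors fail immediately, other errors count toward a threshold, one success restores the capability) is only a starting point for discussion.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// Hypothetical sentinel errors; the real job code may report failures differently.
var (
	ErrAuth        = errors.New("authentication failed")
	ErrRateLimited = errors.New("rate limited")
)

// CapabilityStatus mirrors the struct proposed in this issue.
type CapabilityStatus struct {
	Name        string
	IsHealthy   bool
	LastChecked time.Time
	LastError   error
	ErrorCount  int
}

// HealthTracker records job outcomes per capability. Safe for concurrent use.
type HealthTracker struct {
	mu        sync.Mutex
	threshold int // consecutive failures before marking unhealthy (assumed policy)
	status    map[string]*CapabilityStatus
}

func NewHealthTracker(threshold int) *HealthTracker {
	return &HealthTracker{threshold: threshold, status: make(map[string]*CapabilityStatus)}
}

// RecordResult updates health after a job. A nil error resets the failure
// count and restores the capability. An auth error marks it unhealthy at
// once; any other error counts toward the threshold.
func (t *HealthTracker) RecordResult(capability string, err error) {
	t.mu.Lock()
	defer t.mu.Unlock()
	s, ok := t.status[capability]
	if !ok {
		s = &CapabilityStatus{Name: capability, IsHealthy: true}
		t.status[capability] = s
	}
	s.LastChecked = time.Now()
	s.LastError = err
	if err == nil {
		s.ErrorCount = 0
		s.IsHealthy = true
		return
	}
	s.ErrorCount++
	if errors.Is(err, ErrAuth) || s.ErrorCount >= t.threshold {
		s.IsHealthy = false
	}
}

// Healthy reports whether a capability should still be advertised.
// Capabilities with no recorded jobs are assumed healthy, since they
// are expected to have passed startup verification.
func (t *HealthTracker) Healthy(capability string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	s, ok := t.status[capability]
	return !ok || s.IsHealthy
}

func main() {
	tracker := NewHealthTracker(3)
	tracker.RecordResult("twitter", ErrRateLimited)
	fmt.Println(tracker.Healthy("twitter")) // true: one failure, below threshold
	tracker.RecordResult("twitter", fmt.Errorf("scrape: %w", ErrAuth))
	fmt.Println(tracker.Healthy("twitter")) // false: auth failure
	tracker.RecordResult("twitter", nil)
	fmt.Println(tracker.Healthy("twitter")) // true: restored after a success
}
```

Whether a single success should restore an unhealthy capability is open. A stricter variant would require a fresh verification call before advertising it again.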

3. Enhanced Capability Detector

  • File: internal/capabilities/detector.go
  • Modify DetectCapabilities() to:
    • Run startup verification for all configured capabilities
    • Filter out capabilities that fail verification
    • Only return verified working capabilities
  • Maintain backward compatibility with existing API
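The filtering step could be sketched roughly as follows. This assumes CAPABILITIES is a comma-separated list, which this issue does not state, and it takes the raw string as an argument instead of reading the environment so the logic stays testable. `parseCapabilities`, `detectVerified`, and the `denyList` stub are hypothetical names; the current `DetectCapabilities()` signature is not shown here and may differ.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// Verifier is the subset of the proposed CapabilityVerifier interface
// that filtering needs.
type Verifier interface {
	VerifyCapability(ctx context.Context, capability string) error
}

// parseCapabilities splits a comma-separated list (assumed format),
// trimming whitespace and dropping empty entries and duplicates.
func parseCapabilities(raw string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, part := range strings.Split(raw, ",") {
		name := strings.TrimSpace(part)
		if name == "" || seen[name] {
			continue
		}
		seen[name] = true
		out = append(out, name)
	}
	return out
}

// detectVerified returns only the configured capabilities that pass
// verification, preserving their configured order.
func detectVerified(ctx context.Context, raw string, v Verifier) []string {
	var verified []string
	for _, name := range parseCapabilities(raw) {
		if err := v.VerifyCapability(ctx, name); err != nil {
			fmt.Printf("capability %q failed verification: %v\n", name, err)
			continue
		}
		verified = append(verified, name)
	}
	return verified
}

// denyList is a stub verifier that rejects the named capabilities.
type denyList map[string]bool

func (d denyList) VerifyCapability(_ context.Context, capability string) error {
	if d[capability] {
		return errors.New("credentials missing")
	}
	return nil
}

// demoDetect runs the filter with a stub that rejects "linkedin".
func demoDetect(raw string) []string {
	return detectVerified(context.Background(), raw, denyList{"linkedin": true})
}

func main() {
	// Logs the linkedin failure, then prints [twitter web].
	fmt.Println(demoDetect("twitter, linkedin,web,twitter"))
}
```

Keeping the return type a plain list of capability names is what lets callers of the existing detector stay unchanged: they simply receive a shorter list.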

Technical Requirements

New Types

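import (
    "context"
    "time"
)

// CapabilityStatus represents the health status of a single capability.
type CapabilityStatus struct {
    Name        string    // capability identifier
    IsHealthy   bool      // false once the capability is considered unusable
    LastChecked time.Time // time of the most recent check
    LastError   error     // most recent failure; nil if the last check passed
    ErrorCount  int       // failures recorded for this capability
}

// CapabilityVerifier tests capabilities and reports their tracked health.
type CapabilityVerifier interface {
    // VerifyCapability returns nil if the capability is currently usable.
    VerifyCapability(ctx context.Context, capability string) error
    // GetHealthStatus returns the last recorded status for the capability.
    GetHealthStatus(capability string) CapabilityStatus
}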

Configuration

  • Add verification timeout configuration
  • Add retry logic for transient failures
  • Add logging for verification results
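The three configuration items above might combine along these lines. This is a sketch, not a proposal for final names: `VerifyConfig`, its fields, `ErrTransient`, and `verifyWithRetry` are all hypothetical, and `fmt.Printf` stands in for whatever logger the project uses.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// VerifyConfig holds hypothetical tuning knobs for verification.
type VerifyConfig struct {
	Timeout    time.Duration // deadline for each attempt
	MaxRetries int           // extra attempts allowed after the first
	Backoff    time.Duration // wait between attempts
}

// ErrTransient marks failures worth retrying (hypothetical sentinel).
var ErrTransient = errors.New("transient failure")

// verifyWithRetry runs check up to 1+MaxRetries times. Only errors that
// wrap ErrTransient are retried; every attempt is logged. It returns the
// number of attempts made and the final error.
func verifyWithRetry(ctx context.Context, cfg VerifyConfig, name string, check func(context.Context) error) (attempts int, err error) {
	for attempts = 1; ; attempts++ {
		cctx, cancel := context.WithTimeout(ctx, cfg.Timeout)
		err = check(cctx)
		cancel()
		fmt.Printf("verify %s attempt %d: err=%v\n", name, attempts, err)
		if err == nil || !errors.Is(err, ErrTransient) || attempts > cfg.MaxRetries {
			return attempts, err
		}
		select {
		case <-time.After(cfg.Backoff):
		case <-ctx.Done():
			return attempts, ctx.Err()
		}
	}
}

// demoRetry uses a stub check that fails transiently `failures` times,
// then succeeds.
func demoRetry(failures, maxRetries int) (int, error) {
	calls := 0
	check := func(context.Context) error {
		calls++
		if calls <= failures {
			return fmt.Errorf("503 from upstream: %w", ErrTransient)
		}
		return nil
	}
	cfg := VerifyConfig{Timeout: time.Second, MaxRetries: maxRetries, Backoff: time.Millisecond}
	return verifyWithRetry(context.Background(), cfg, "twitter", check)
}

func main() {
	attempts, err := demoRetry(2, 3)
	fmt.Println(attempts, err) // 3 <nil>: two transient failures, then success
}
```

Retrying only errors marked transient keeps a bad credential from being retried pointlessly, while a brief upstream outage at startup does not remove a working capability.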

Testing Requirements

Unit Tests

  • Test startup verification logic
  • Test runtime health monitoring
  • Test capability filtering
  • Test error handling and recovery

Integration Tests

  • Test with real credentials (using test accounts)
  • Test with invalid credentials
  • Test capability removal and restoration

Acceptance Criteria

  • Workers only advertise capabilities they can actually perform
  • Startup verification tests all configured capabilities
  • Runtime monitoring tracks capability health
  • Failed capabilities are automatically removed from advertising
  • Recovered capabilities are automatically restored
  • All existing functionality remains unchanged
  • Comprehensive test coverage (>90%)
  • Proper error handling and logging

Implementation Estimate

  • Complexity: Medium
  • Estimated effort: ~140 lines of code
  • Files to modify: 3-4 files
  • New files: 2-3 files

Dependencies

  • No external dependencies required
  • Uses existing credential validation logic
  • Builds on current capability detection system

Success Metrics

  • Zero job failures due to capability mismatches
  • Improved worker reliability
  • Better network health monitoring
  • Reduced support tickets related to job failures

Next Steps

  1. Implement startup verification system
  2. Add runtime health monitoring
  3. Enhance capability detector
  4. Add comprehensive tests
  5. Update documentation

This phase maintains full backward compatibility while significantly improving network reliability.

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request)
