Mohit/heartbeat #25

Merged
charliemolony merged 10 commits into sync-upstream from mohit/heartbeat
Feb 16, 2026

@m2kulkarni

No description provided.

@greptile-apps
greptile-apps bot commented Feb 16, 2026

Greptile Summary

This PR implements several enhancements for adaptive driving agents, including curriculum learning with goal radius scheduling, human replay evaluation mode, and GPU heartbeat for cluster utilization.

Key Changes:

  • Added curriculum learning: goal radius gradually decreases during training to increase task difficulty
  • Implemented human replay evaluation mode where agents interact with recorded human trajectories instead of co-player policies
  • Added GPU heartbeat script to maintain cluster utilization above threshold during idle training periods
  • Disabled conditioning weight resampling between scenarios to enable consistent cross-scenario adaptation
  • Extended evaluation metrics to track adaptive delta measurements (first vs last scenario performance)
  • Changed delta computation from percentage to absolute difference
  • Added new C binding vec_set_goal_radius for runtime goal radius updates
  • Config updates: reduced eval/checkpoint intervals for testing, enabled human replay evaluation by default
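The goal radius curriculum described in the list above can be sketched as a simple schedule. This is a minimal illustration, not the code in pufferlib/pufferl.py: the PR summary says only that the radius "gradually decreases", so the linear shape, the function name `goal_radius_at`, and all parameter names are assumptions made for this example.

```python
def goal_radius_at(step: int, total_steps: int,
                   start_radius: float, end_radius: float) -> float:
    """Hypothetical linear schedule: shrink the goal radius as training progresses.

    Assumption: the real schedule may use a different shape (e.g. stepwise).
    """
    if total_steps <= 0:
        # Degenerate case: no schedule to follow, use the final radius.
        return end_radius
    # Clamp progress to [0, 1] so steps past the horizon hold the final radius.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start_radius + progress * (end_radius - start_radius)


if __name__ == "__main__":
    # In the PR, the computed radius is pushed to every environment through
    # the new C binding vec_set_goal_radius; its exact signature is not shown
    # in this conversation, so this sketch only prints the scheduled values.
    for step in (0, 25, 50, 75, 100):
        print(step, goal_radius_at(step, 100, start_radius=10.0, end_radius=2.0))
```

Clamping the progress keeps the task at its hardest setting if training runs past the planned number of steps, instead of letting the radius shrink below the intended minimum.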

Confidence Score: 4/5

  • This PR is generally safe to merge; only minor style improvements are recommended.
  • The changes are well structured and add significant new features (curriculum learning, human replay evaluation) with appropriate safeguards. The C binding has proper error handling, and the curriculum learning implementation is straightforward. There is one minor style issue: a function whose name no longer matches its behavior. The config changes reduce intervals for testing and should be reverted before production deployment. The commented-out code in drive.h is intentional: it disables weight resampling.
  • Check pufferlib/config/ocean/adaptive.ini: eval_interval and checkpoint_interval are set very low (10) for testing and should likely be increased for production training.

Important Files Changed

| Filename | Overview |
|---|---|
| scripts/gpu_heartbeat.py | New GPU heartbeat script to maintain cluster utilization threshold during idle training |
| pufferlib/utils.py | Extended human replay evaluation to support adaptive agents with scenario-specific metrics |
| pufferlib/ocean/drive/binding.c | Added C binding for curriculum learning via vec_set_goal_radius function |
| pufferlib/pufferl.py | Implemented goal radius curriculum learning with gradual reduction during training |
| evaluate_human_logs.py | Added human replay mode support for adaptive agents and changed delta computation to absolute difference |
| pufferlib/ocean/drive/drive.h | Disabled conditioning weight resampling between scenarios for consistent adaptation |

Flowchart

flowchart TD
    A[Training Loop] --> B{Goal Radius Curriculum?}
    B -->|Yes| C[Calculate progress]
    C --> D[Update goal_radius via C binding]
    D --> E[vec_set_goal_radius]
    E --> F[Update all envs]
    B -->|No| G[Continue training]
    F --> G
    
    G --> H{Eval Interval?}
    H -->|Yes| I{Adaptive Agent?}
    I -->|Yes| J[Run evaluate_human_logs.py]
    J --> K{Human Replay Mode?}
    K -->|Yes| L[Disable co-players<br/>Use human trajectories]
    K -->|No| M[Use co-player policies]
    L --> N[Evaluate all scenarios]
    M --> N
    N --> O[Compute adaptive deltas]
    O --> P[Log to wandb]
    
    I -->|No| Q[Run pufferl eval]
    Q --> P
    
    H -->|No| R{GPU Idle?}
    R -->|Yes| S[GPU Heartbeat<br/>Run matrix ops]
    R -->|No| T[Sleep]
    S --> A
    T --> A
    P --> A

Last reviewed commit: c3e717d
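
The heartbeat branch of the flowchart (idle GPU: run matrix ops; busy GPU: sleep) reduces to one decision per tick. The sketch below shows only that decision and is not the contents of scripts/gpu_heartbeat.py. The name `heartbeat_tick`, the injected callables, the 50% threshold, and the sleep interval are all hypothetical; a real script would plug in an actual GPU utilization query and an actual matrix workload.

```python
import time
from typing import Callable


def heartbeat_tick(read_utilization: Callable[[], float],
                   run_burst: Callable[[], None],
                   threshold_pct: float = 50.0,
                   idle_sleep_s: float = 1.0,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run one heartbeat iteration; return True if a compute burst was issued.

    read_utilization: hypothetical callable returning GPU utilization in percent.
    run_burst: hypothetical callable that performs a short matrix workload.
    """
    if read_utilization() < threshold_pct:
        # GPU looks idle: do some work to lift utilization above the threshold.
        run_burst()
        return True
    # GPU is already busy with training: back off and check again later.
    sleep(idle_sleep_s)
    return False


if __name__ == "__main__":
    # Dry run with stand-in callables; no GPU is touched here.
    issued = heartbeat_tick(lambda: 12.0, lambda: print("burst"), sleep=lambda s: None)
    print("burst issued:", issued)
```

Passing the utilization probe and the workload in as callables keeps the decision logic testable on a machine without a GPU.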

@greptile-apps greptile-apps bot left a comment

15 files reviewed, 1 comment

Comment on lines 356 to 357
 def compute_delta_percent(first_val, last_val):
     if abs(first_val) < 0.0001:
         return 0.0
-    return (last_val - first_val) / first_val * 100.0
+    return (last_val - first_val)
The function is still named compute_delta_percent, but it now returns the absolute difference instead of a percentage.

Suggested change
-def compute_delta_percent(first_val, last_val):
-    if abs(first_val) < 0.0001:
-        return 0.0
-    return (last_val - first_val) / first_val * 100.0
-    return (last_val - first_val)
+def compute_delta(first_val, last_val):
+    return (last_val - first_val)
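
To show what the renamed helper computes, here is a small sketch of an absolute first-versus-last-scenario delta applied per metric. `compute_delta` follows the suggestion above; `adaptive_deltas` and the metric names `score` and `collision_rate` are hypothetical and are not taken from evaluate_human_logs.py.

```python
def compute_delta(first_val: float, last_val: float) -> float:
    """Absolute change from the first scenario to the last scenario."""
    return last_val - first_val


def adaptive_deltas(first_scenario: dict, last_scenario: dict) -> dict:
    """Hypothetical helper: absolute delta for each metric present in both scenarios."""
    return {
        name: compute_delta(first_scenario[name], last_scenario[name])
        for name in first_scenario
        if name in last_scenario
    }


if __name__ == "__main__":
    first = {"score": 0.25, "collision_rate": 0.5}   # illustrative values only
    last = {"score": 0.75, "collision_rate": 0.25}
    print(adaptive_deltas(first, last))
```

Unlike the earlier percentage form, the absolute difference never divides by the baseline, so it needs no special case for a first-scenario value near zero.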

@charliemolony charliemolony merged commit 1aff20a into sync-upstream Feb 16, 2026
10 checks passed