Mohit/heartbeat by m2kulkarni · Pull Request #25 · Emerge-Lab/Adaptive_Driving_Agent

m2kulkarni · 2026-02-16T12:05:28Z

No description provided.

greptile-apps · 2026-02-16T12:08:13Z

Greptile Summary

This PR implements several enhancements for adaptive driving agents, including curriculum learning with goal radius scheduling, human replay evaluation mode, and GPU heartbeat for cluster utilization.

Key Changes:

- Added curriculum learning: goal radius gradually decreases during training to increase task difficulty
- Implemented human replay evaluation mode where agents interact with recorded human trajectories instead of co-player policies
- Added GPU heartbeat script to maintain cluster utilization above threshold during idle training periods
- Disabled conditioning weight resampling between scenarios to enable consistent cross-scenario adaptation
- Extended evaluation metrics to track adaptive delta measurements (first vs last scenario performance)
- Changed delta computation from percentage to absolute difference
- Added new C binding `vec_set_goal_radius` for runtime goal radius updates
- Config updates: reduced eval/checkpoint intervals for testing, enabled human replay evaluation by default
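
The goal radius curriculum listed above can be sketched as a simple schedule. This is a minimal illustration that assumes a linear decay; the function name `goal_radius_at` and the start and end radii are hypothetical and not taken from the PR, which shows neither its schedule nor the signature of `vec_set_goal_radius`.

```python
def goal_radius_at(step: int, total_steps: int,
                   start_radius: float = 10.0,
                   end_radius: float = 2.0) -> float:
    """Linearly shrink the goal radius as training progresses.

    All names and default values here are illustrative assumptions.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Fraction of training completed, clamped to [0, 1].
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start_radius + (end_radius - start_radius) * progress


if __name__ == "__main__":
    for step in (0, 500, 1000):
        print(step, goal_radius_at(step, total_steps=1000))
```

In a training loop, the returned value would be the quantity pushed to the environments at each update, presumably through the new C binding.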

Confidence Score: 4/5

This PR is generally safe to merge, with minor style improvements recommended.
The changes are well-structured and implement significant new features (curriculum learning, human replay evaluation) with appropriate safeguards. The C binding has proper error handling, and the curriculum learning implementation is straightforward. There is one minor style issue, a misnamed function, and the config changes reduce intervals for testing, which should be reverted before production deployment. The commented-out code in drive.h intentionally disables weight resampling.
Check pufferlib/config/ocean/adaptive.ini: eval_interval and checkpoint_interval are set very low (10) for testing and should likely be increased for production training.

Important Files Changed

Filename	Overview
scripts/gpu_heartbeat.py	New GPU heartbeat script to maintain cluster utilization threshold during idle training
pufferlib/utils.py	Extended human replay evaluation to support adaptive agents with scenario-specific metrics
pufferlib/ocean/drive/binding.c	Added C binding for curriculum learning via `vec_set_goal_radius` function
pufferlib/pufferl.py	Implemented goal radius curriculum learning with gradual reduction during training
evaluate_human_logs.py	Added human replay mode support for adaptive agents and changed delta computation to absolute difference
pufferlib/ocean/drive/drive.h	Disabled conditioning weight resampling between scenarios for consistent adaptation

Flowchart

flowchart TD
    A[Training Loop] --> B{Goal Radius Curriculum?}
    B -->|Yes| C[Calculate progress]
    C --> D[Update goal_radius via C binding]
    D --> E[vec_set_goal_radius]
    E --> F[Update all envs]
    B -->|No| G[Continue training]
    F --> G
    
    G --> H{Eval Interval?}
    H -->|Yes| I{Adaptive Agent?}
    I -->|Yes| J[Run evaluate_human_logs.py]
    J --> K{Human Replay Mode?}
    K -->|Yes| L[Disable co-players<br/>Use human trajectories]
    K -->|No| M[Use co-player policies]
    L --> N[Evaluate all scenarios]
    M --> N
    N --> O[Compute adaptive deltas]
    O --> P[Log to wandb]
    
    I -->|No| Q[Run pufferl eval]
    Q --> P
    
    H -->|No| R{GPU Idle?}
    R -->|Yes| S[GPU Heartbeat<br/>Run matrix ops]
    R -->|No| T[Sleep]
    S --> A
    T --> A
    P --> A
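
The idle branch of the flowchart (GPU idle, then heartbeat, otherwise sleep) might look roughly like the sketch below. It is not the contents of scripts/gpu_heartbeat.py, which this page does not show; `heartbeat_tick`, both callbacks, and the 50% threshold are assumptions made for illustration.

```python
import time
from typing import Callable


def heartbeat_tick(read_utilization: Callable[[], float],
                   run_workload: Callable[[], None],
                   threshold: float = 50.0,
                   sleep_seconds: float = 1.0) -> bool:
    """Make one heartbeat decision; return True if the workload was run.

    read_utilization and run_workload are hypothetical callbacks: the first
    returns current GPU utilization in percent, the second launches a short
    burst of GPU work (the flowchart mentions matrix ops).
    """
    if read_utilization() < threshold:
        # GPU is considered idle: generate some load.
        run_workload()
        return True
    # GPU is busy enough: back off before checking again.
    time.sleep(sleep_seconds)
    return False
```

Passing the utilization reader and the workload as callbacks keeps the decision logic testable without a GPU.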

Last reviewed commit: c3e717d

greptile-apps

15 files reviewed, 1 comment


greptile-apps · 2026-02-16T12:08:17Z

evaluate_human_logs.py

        def compute_delta_percent(first_val, last_val):
-            if abs(first_val) < 0.0001:
-                return 0.0
-            return (last_val - first_val) / first_val * 100.0
+            return (last_val - first_val) 


The function name says `compute_delta_percent`, but it now returns the absolute difference instead of a percentage.

Suggested change

-        def compute_delta_percent(first_val, last_val):
-            if abs(first_val) < 0.0001:
-                return 0.0
-            return (last_val - first_val) / first_val * 100.0
-            return (last_val - first_val)
+        def compute_delta(first_val, last_val):
+            return (last_val - first_val)
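
To make the naming concern concrete, the two behaviours from the diff can be placed side by side. The function bodies follow the removed and suggested code above; the type hints and the sample metric values are hypothetical additions.

```python
def compute_delta(first_val: float, last_val: float) -> float:
    """Absolute change from the first scenario to the last."""
    return last_val - first_val


def compute_delta_percent(first_val: float, last_val: float) -> float:
    """Relative change in percent; 0.0 when the baseline is near zero."""
    if abs(first_val) < 0.0001:
        return 0.0
    return (last_val - first_val) / first_val * 100.0


if __name__ == "__main__":
    first, last = 0.5, 0.75  # hypothetical first/last scenario metric values
    print(compute_delta(first, last))          # 0.25
    print(compute_delta_percent(first, last))  # 50.0
```

The same pair of inputs yields values on very different scales, which is why a name that still says "percent" could mislead anyone reading the logged metrics.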

m2kulkarni added 4 commits February 11, 2026 02:42

heartbeat it works

67ed361

before adding eval human_logs

1b5dc3f

human logs

36b45be

resample conditioning only at the end

c3e717d

greptile-apps bot reviewed Feb 16, 2026


m2kulkarni and others added 6 commits February 16, 2026 07:20

before merge start

22d50c3

cleanip

1a54cb5

removed code for goal-radius curriculum

4c2b504

cleanup

8670e7a

Merge branch 'sync-upstream' into mohit/heartbeat

b9655fb

running pre commit on all files

3a92652

charliemolony approved these changes Feb 16, 2026


charliemolony merged commit 1aff20a into sync-upstream Feb 16, 2026
10 checks passed

charliemolony merged 10 commits into sync-upstream from mohit/heartbeat
