Merged
25 commits
- `77469ea`: Note that drivenet.h is intended for browser demos. (Feb 24, 2026)
- `ea88f1c`: Add render mode. (Feb 24, 2026)
- `b12588d`: Add support for headless rendering in c_render(). (Feb 24, 2026)
- `0406766`: Color expert-replays mint green and controlled agents blue. (Feb 24, 2026)
- `05eafde`: Integrate render function with (replay) evaluator. (Feb 24, 2026)
- `3c4a5a1`: Add configurable view_mode in c_render(). (Feb 24, 2026)
- `8d13ccb`: Add support for drawing the log replay trajectories & make agents mor… (Feb 24, 2026)
- `6f73bdd`: Expose scenario ids. (Feb 24, 2026)
- `25dc7d6`: Integrate render functionality with training and delete old code. Sta… (Feb 25, 2026)
- `74acd9b`: Improve color scheme: distinguish between agent types when we control… (Feb 25, 2026)
- `9d9403a`: Log videos with their scenario_id and training epoch. (Feb 25, 2026)
- `72837f1`: Revert to defaults. (Feb 25, 2026)
- `0bcd133`: Various small fixes, including ensuring that we don't count pedestria… (daphne-cornelisse, Feb 25, 2026)
- `d77f752`: Delete old visualization code and mentions. (daphne-cornelisse, Feb 25, 2026)
- `48f47be`: Set to default scale. (daphne-cornelisse, Feb 25, 2026)
- `ee87686`: Minor. (daphne-cornelisse, Feb 25, 2026)
- `a1c0402`: Update docs. (Feb 25, 2026)
- `d2e3202`: Improve logging (Feb 26, 2026)
- `84ad5ae`: Fix tests. (Feb 26, 2026)
- `ab4973e`: Fix training test. (daphne-cornelisse, Feb 26, 2026)
- `e5f8a6a`: Precommit. (daphne-cornelisse, Feb 26, 2026)
- `1225d80`: Bug fixes. (daphne-cornelisse, Feb 26, 2026)
- `0b18e2d`: FIx (daphne-cornelisse, Feb 26, 2026)
- `60be5ed`: Minor. (daphne-cornelisse, Feb 26, 2026)
- `bae23b7`: Fix defaults. (daphne-cornelisse, Feb 26, 2026)
59 changes: 0 additions & 59 deletions .github/workflows/render-ci.yml

This file was deleted.

2 changes: 1 addition & 1 deletion docs/src/getting-started.md
@@ -25,7 +25,7 @@ python setup.py build_ext --inplace --force
Run this with your virtual environment activated so the compiled extension links against the correct Python.

### When to rebuild the extension
- Re-run `python setup.py build_ext --inplace --force` after changing any C/Raylib sources in `pufferlib/ocean/drive` (e.g., `drive.c`, `drive.h`, `binding.c`, `visualize.c`) or after pulling upstream changes that touch those files. This regenerates the `binding.cpython-*.so` used by `Drive`.
- Re-run `python setup.py build_ext --inplace --force` after changing any C/Raylib sources in `pufferlib/ocean/drive` (e.g., `drive.c`, `drive.h`, `binding.c`) or after pulling upstream changes that touch those files. This regenerates the `binding.cpython-*.so` used by `Drive`.
- Pure Python edits (training scripts, docs, data utilities) do not require a rebuild; just restart your Python process.
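After a rebuild, a quick way to confirm the compiled extension is discoverable is to probe for the module spec without importing it. A minimal sketch; the snippet uses a stdlib stand-in so it runs anywhere, and you would substitute the actual binding module path (e.g. `pufferlib.ocean.drive.binding`, as named above):

```python
import importlib.util

def extension_available(module_name: str) -> bool:
    """Return True if `module_name` can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None

# Stand-in check with a stdlib module; after running build_ext, swap in the
# compiled binding module path instead (e.g. "pufferlib.ocean.drive.binding").
print(extension_available("math"))  # → True
```

If this returns `False` for the binding, re-run the build command with your virtual environment activated.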

## Verify the setup
1 change: 0 additions & 1 deletion docs/src/simulator.md
@@ -230,7 +230,6 @@ mid_x, mid_y, length, width, dir_cos, dir_sin, type
- `drive.h`: Main simulator (stepping, observations, collisions)
- `drive.c`: Demo and testing
- `binding.c`: Python interface
- `visualize.c`: Raylib renderer
- `drivenet.h`: C inference network

### Python
71 changes: 26 additions & 45 deletions docs/src/visualizer.md
@@ -1,69 +1,50 @@
# Visualizer

PufferDrive ships a Raylib-based visualizer for replaying scenes, exporting videos, and debugging policies.
PufferDrive uses [Raylib](https://www.raylib.com/) for rendering the environment. Rendering is driven from Python using the torch policy directly. No separate binary or weight export is required.

## Dependencies
Install the minimal system packages for headless render/export:

For headless rendering, we need ffmpeg and xvfb.
```bash
sudo apt update
sudo apt install ffmpeg xvfb
sudo apt update && sudo apt install ffmpeg xvfb
```

On environments without sudo, install them into your conda/venv:
## Render Modes

```bash
conda install -c conda-forge xorg-x11-server-xvfb-cos6-x86_64 ffmpeg
```

## Build
Compile the visualizer binary from the repo root:
Configure `render_mode` in `pufferlib/config/ocean/drive.ini`:

```bash
bash scripts/build_ocean.sh visualize local
```ini
; 0 = pop-up window (requires display)
; 1 = headless (pipes frames to ffmpeg, recommended for servers/training)
render_mode = 1
```

If you need to force a rebuild, remove the cached binary first (`rm ./visualize`).

## Rendering a Video
Launch the visualizer with a virtual display and export an `.mp4` for the binary scenario:
## Rendering once

```bash
xvfb-run -s "-screen 0 1280x720x24" ./visualize
puffer eval puffer_drive
```

Adjust the screen size and color depth as needed. The `xvfb-run` wrapper allows Raylib to render without an attached display, which is convenient for servers and CI jobs.
This runs a short rollout, calls `env.render()` each step, and finalizes the video on `vecenv.close()`. `render_mode` determines whether the video appears in a pop-up window or is saved as an mp4.

## Arguments & Configuration
## View modes

The `visualize` tool supports several CLI arguments to control the rendering output. It also reads the `pufferlib/config/ocean/drive.ini` file for default environment settings (for more details on these settings, refer to [Configuration](simulator.md#configuration)).
Control what is rendered via the `view_mode` argument to `env.render()`:

### Command Line Arguments
```python
class RenderView(IntEnum):
FULL_SIM_STATE = 0 # Top-down, fully observable
BEV_AGENT_OBS = 1 # Top-down, selected agent's observations only
AGENT_PERSP = 2 # Third-person perspective following selected agent

| Argument | Description | Default |
| :--- | :--- | :--- |
| `--map-name <path>` | Path to the map binary file (e.g., `resources/drive/binaries/training/map_000.bin`). If omitted, picks a random map out of `num_maps` from `map_dir` in `drive.ini`. | Random |
| `--policy-name <path>` | Path to the policy weights file (`.bin`). | `resources/drive/puffer_drive_weights.bin` |
| `--view <mode>` | Selects which views to render: `agent`, `topdown`, or `both`. | `both` |
| `--output-agent <path>` | Output filename for agent view video. | `<policy>_agent.mp4` |
| `--output-topdown <path>` | Output filename for top-down view video. | `<policy>_topdown.mp4` |
| `--frame-skip <n>` | Renders every Nth frame to speed up generation (framerate remains 30fps). | `1` |
| `--num-maps <n>` | Overrides the number of maps to sample from if `--map-name` is not set. | `drive.ini` value |
env.render(view_mode=RenderView.FULL_SIM_STATE, draw_traces=True, env_id=0)
```
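Because the view modes are small integers, switching between them during a rollout is simple. A self-contained sketch that mirrors the enum above and cycles through the views (the `env.render` call is commented out here so the snippet runs standalone; in practice it is the call shown in the example above):

```python
from enum import IntEnum

class RenderView(IntEnum):
    FULL_SIM_STATE = 0  # top-down, fully observable
    BEV_AGENT_OBS = 1   # top-down, selected agent's observations only
    AGENT_PERSP = 2     # third-person view following the selected agent

def next_view(view: RenderView) -> RenderView:
    """Cycle to the next view mode, wrapping back to the first."""
    return RenderView((view + 1) % len(RenderView))

view = RenderView.FULL_SIM_STATE
for _ in range(len(RenderView)):
    # In a real rollout, render with the current view here:
    # env.render(view_mode=view, draw_traces=True, env_id=0)
    view = next_view(view)
print(view.name)  # → FULL_SIM_STATE (back to the start after a full cycle)
```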

### Visualization Flags
## Training-time evaluation

| Flag | Description |
| :--- | :--- |
| `--show-grid` | Draws the underlying nav-graph/grid on the map. |
| `--obs-only` | Hides objects not currently visible to the agent's sensors (fog of war). |
| `--lasers` | Visualizes the raycast sensor lines from the agent. |
| `--log-trajectories` | Draws the ground-truth "human" expert trajectories as green lines. |
| `--zoom-in` | Zooms the camera mainly on the active region rather than the full map bounds. |
Rendering during training is controlled by the `[eval]` section of `drive.ini`. See that file for available options (`human_replay_eval`, `self_play_eval`, `eval_interval`, etc.).

### Key `drive.ini` Settings
The visualizer initializes the environment using `pufferlib/config/ocean/drive.ini`. Important settings include:
## Sharp edges

- `[env] dynamics_model`: `classic` or `jerk`. Must match the trained policy.
- `[env] episode_length`: Duration of the playback. Defaults to 91 if set to 0.
- `[env] control_mode`: Determines which agents are active (`control_vehicles` vs `control_sdc_only`).
- `[env] goal_behavior`: Defines agent behavior upon reaching goals (respawn vs stop).
- **Raylib is not thread-safe.** If you create two separate render envs, always call `env1.close()` before calling `env2.render()`.
- Headless mode derives window dimensions from map bounds automatically; no manual resolution configuration is needed.
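Since Raylib is not thread-safe, it is worth enforcing the close-before-render ordering explicitly when juggling multiple render envs. A self-contained sketch using stub envs to illustrate the pattern (`StubEnv` and `render_sequentially` are illustrative helpers, not part of PufferDrive; the real calls are the `close()`/`render()` methods described above):

```python
class StubEnv:
    """Stand-in for a render env; records lifecycle calls for illustration."""
    def __init__(self, name, log):
        self.name, self.log, self.closed = name, log, False

    def render(self):
        assert not self.closed, "render() called after close()"
        self.log.append(f"{self.name}.render")

    def close(self):
        self.closed = True
        self.log.append(f"{self.name}.close")

def render_sequentially(envs):
    """Render one env at a time, closing it before touching the next.

    Raylib holds a single global context, so env N must be fully closed
    before env N+1 renders.
    """
    for env in envs:
        env.render()
        env.close()

log = []
render_sequentially([StubEnv("env1", log), StubEnv("env2", log)])
print(log)  # → ['env1.render', 'env1.close', 'env2.render', 'env2.close']
```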
41 changes: 17 additions & 24 deletions pufferlib/config/ocean/drive.ini
@@ -11,7 +11,7 @@ rnn_name = Recurrent
num_workers = 16
num_envs = 16
batch_size = 4
; backend = Serial
;backend = Serial

[policy]
input_size = 64
@@ -54,11 +54,15 @@ num_maps = 10000
; Determines which step of the trajectory to initialize the agents at upon reset
init_steps = 0
; Options: "control_vehicles", "control_agents", "control_wosac", "control_sdc_only", "control_mixed_play"
control_mode = "control_vehicles"
control_mode = "control_agents"
; Options: "create_all_valid", "create_only_controlled"
init_mode = "create_all_valid"
; Sets the maximum number of controllable agents per scene, ONLY used if control_mode is "control_mixed_play"
max_controlled_agents = 32
; Render mode options:
; 0:"window" = pop-up raylib window (original)
; 1:"headless" = off-screen; frames piped to ffmpeg (recommended for training)
render_mode = 1

[train]
seed=42
@@ -91,26 +95,21 @@ vf_coef = 2
vtrace_c_clip = 1
vtrace_rho_clip = 1
checkpoint_interval = 1000
; Rendering options
render = True
render_interval = 1000
; If True, show exactly what the agent sees in agent observation
obs_only = True
; Show grid lines
show_grid = True
; Draws lines from ego agent observed ORUs and road elements to show detection range
show_lasers = False
; Display human xy logs in the background
show_human_logs = False
; If True, zoom in on a part of the map. Otherwise, show full map
zoom_in = True
; Options: List[str to path], str to path (e.g., "resources/drive/training/binaries/map_001.bin"), None
render_map = none

[eval]
eval_interval = 1000
; Eval frequency in epochs
eval_interval = 500
; Path to dataset used for evaluation
map_dir = "resources/drive/binaries/training"
; Number of agents to evaluate
num_eval_agents = 64
; If True, enable self-play evaluation (pair policy-controlled agent with a copy of itself)
self_play_eval = True
; If True, enable human replay evaluation (pair policy-controlled agent with human replays)
human_replay_eval = False
; If True, render random scenarios. Note: doing this frequently will slow down training.
render_human_replay_eval = False
render_self_play_eval = True
; Number of scenarios to process per batch
wosac_batch_size = 32
; Target number of unique scenarios to perform evaluation in
@@ -140,12 +139,6 @@ wosac_sanity_check = False
wosac_aggregate_results = True
; Evaluation mode: "policy", "ground_truth"
wosac_eval_mode = "policy"
; If True, enable human replay evaluation (pair policy-controlled agent with human replays)
human_replay_eval = False
; Control only the self-driving car
human_replay_control_mode = "control_sdc_only"
; Number of scenarios for human replay evaluation equals the number of agents
human_replay_num_agents = 256

[sweep.train.learning_rate]
distribution = log_normal