30 commits
- `7e4cc11` Initial plan (Copilot, Feb 11, 2026)
- `dca6f76` Add target point configuration and fix navigation targets (Copilot, Feb 11, 2026)
- `4a88ce1` Add verification script for target position fix (Copilot, Feb 11, 2026)
- `14be138` Add comment explaining hasattr backward compatibility checks (Copilot, Feb 11, 2026)
- `1a3ac34` Merge pull request #3 from Logic-TARS/copilot/fix-navigation-targets (Logic-TARS, Feb 11, 2026)
- `5db1dcc` Commit changes (Logic-TARS, Feb 11, 2026)
- `e542d53` Initial plan (Copilot, Feb 11, 2026)
- `f9096f7` Implement domain randomization and enhanced reward function for robus… (Copilot, Feb 11, 2026)
- `4211de9` Add robustness testing and validation scripts (Copilot, Feb 11, 2026)
- `ff82191` Add comprehensive documentation for robustness optimization (Copilot, Feb 11, 2026)
- `d1d260f` Add quick start guide for robustness optimization (Copilot, Feb 11, 2026)
- `96813cd` Merge pull request #4 from Logic-TARS/copilot/optimize-domain-randomi… (Logic-TARS, Feb 11, 2026)
- `2dc8327` Initial plan (Copilot, Feb 11, 2026)
- `d80678c` Implement zero-velocity trap fix: reward reshape, termination improve… (Copilot, Feb 11, 2026)
- `c7c0d1f` Add validation scripts and documentation for zero-velocity trap fix (Copilot, Feb 11, 2026)
- `6f19c52` Address code review: translate Chinese comments to English (Copilot, Feb 11, 2026)
- `0956028` Translate all Chinese comments to English for code consistency (Copilot, Feb 11, 2026)
- `e861012` Final translation: convert all remaining Chinese comments to English (Copilot, Feb 11, 2026)
- `8f73dd7` Address code review: clarify comments and improve test logic document… (Copilot, Feb 11, 2026)
- `73539c1` Add implementation completion summary documentation (Copilot, Feb 11, 2026)
- `17ec9e4` Merge pull request #5 from Logic-TARS/copilot/adjust-reward-function-… (Logic-TARS, Feb 11, 2026)
- `8e3819c` Fix foot sensor names and update vbot training steps (Logic-TARS, Feb 11, 2026)
- `6c787f1` Initial plan (Copilot, Feb 12, 2026)
- `d459f00` Fix all 4 bugs causing immediate termination in VBot navigation (Copilot, Feb 12, 2026)
- `501e072` Add validation test for immediate termination fix (Copilot, Feb 12, 2026)
- `da6582a` Add comprehensive documentation for immediate termination fix (Copilot, Feb 12, 2026)
- `292deb4` Address code review: Add steps increment and remove hardcoded height (Copilot, Feb 12, 2026)
- `87b4745` Fix test to use cfg.init_state.pos[2] instead of hardcoded 0.35 (Copilot, Feb 12, 2026)
- `1019221` Clarify comment placement for Z coordinate handling in reset() (Copilot, Feb 12, 2026)
- `d213ae6` Fix test logic bug and improve reset() comment clarity (Copilot, Feb 12, 2026)
252 changes: 252 additions & 0 deletions IMMEDIATE_TERMINATION_FIX.md
@@ -0,0 +1,252 @@
# VBot Immediate Termination Bug Fix - Summary

## Problem Statement
VBot robots were "dying at birth" during training in the `vbot_navigation_section001` environment:
- **Symptoms**: Robots "disappeared in mid-air" in videos
- **Metrics**:
  - Total reward (mean): ~0.29 (extremely low)
  - Total reward (max): ~1.7-1.9 (should be much higher)
  - Episode length: 1-5 steps (robots died immediately)
  - Survival rate: <10%

## Root Causes (4 Bugs Identified)

### Bug 1: Reset Position Generation Ignores Curriculum Learning
**Location**: `motrix_envs/src/motrix_envs/navigation/vbot/vbot_section001_np.py` L815-828

**Problem**:
- `reset()` used polar coordinate generation to spawn robots on outer circle (radius 3.0m)
- This **completely ignored** `cfg.init_state.pos = [0.0, 0.6, 0.5]` (curriculum learning Phase 1)
- The target was at `[0.0, 0.0]`, so robots spawned ~3 m away from it
- `boundary_radius = 3.5` meant robots started very close to the boundary

**Fix**:
- Replace polar coordinate logic with `cfg.init_state.pos` as base position
- Add small XY randomization using `pos_randomization_range` (±0.3m)
- Lower initial height from 0.5m → 0.35m to reduce fall impact
- Increase `boundary_radius` from 3.5 → 5.0 in config
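
To make the spawn geometry concrete, a quick sketch using the figures quoted above (illustrative only):

```python
import numpy as np

target = np.array([0.0, 0.0])
old_spawn = np.array([3.0, 0.0])   # a point on the old outer circle (radius 3.0 m)
new_spawn = np.array([0.0, 0.6])   # XY of cfg.init_state.pos (curriculum Phase 1)

print(np.linalg.norm(old_spawn - target))  # 3.0 -> only 0.5 m from the old 3.5 m boundary
print(np.linalg.norm(new_spawn - target))  # 0.6 -> well inside the new 5.0 m boundary
```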

### Bug 2: Base Contact Sensor Triggers Immediately on Landing
**Location**: `motrix_envs/src/motrix_envs/navigation/vbot/vbot_section001_np.py` L508-520

**Problem**:
- `base_contact` threshold was 0.01 (extremely sensitive)
- No grace period - terminated immediately on landing
- Robot falls from 0.5m height, base touches ground → instant termination

**Fix**:
- Increase threshold from 0.01 → 0.1 (10x more tolerant)
- Add a 50-step grace period (~1 second at 50 Hz)
- Only check termination if `current_steps > GRACE_STEPS`
- Add "steps" key to info dict for tracking

### Bug 3: Hard-Coded -10.0 Penalty on Landing
**Location**: `motrix_envs/src/motrix_envs/navigation/vbot/vbot_section001_np.py` L800-803

**Problem**:
- Hard-coded `-10.0` penalty when `orientation_penalty > 0.5` (~45° tilt)
- Robot falling from 0.5m easily tilts >45° on landing → instant -10 reward
- Created extreme reward spikes that disrupted learning

**Fix**:
- Replace with progressive penalty: `clip((orientation_penalty - 0.5) * 5.0, 0.0, 3.0)`
- Penalty scales smoothly: 0° → 0, 45° → 0, 60° → 0.75, 80° → 1.5, 100° → 2.5, max 3.0
- Much gentler learning signal
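
As a sanity check, a minimal sketch evaluating the new penalty formula (the angle-to-`orientation_penalty` mapping quoted above is taken from this doc):

```python
import numpy as np

# Progressive tilt penalty from the fix: zero at or below 0.5, then linear, capped at 3.0.
def extreme_tilt_penalty(orientation_penalty):
    return np.clip((orientation_penalty - 0.5) * 5.0, 0.0, 3.0)

samples = np.array([0.0, 0.5, 0.65, 0.8, 1.0, 1.5])
print(extreme_tilt_penalty(samples))  # [0.   0.   0.75 1.5  2.5  3.  ]
```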

### Bug 4: Missing Info Dict Keys
**Location**: `motrix_envs/src/motrix_envs/navigation/vbot/vbot_section001_np.py` L1037-1044

**Problem**:
- `reset()` info dict missing "last_distance" key
- `reset()` info dict missing "steps" key
- Caused `_compute_reward()` to use fallback: `info.get("last_distance", distance_to_target)`
- Grace period mechanism couldn't work without "steps"

**Fix**:
- Add `"last_distance": distance_to_target.copy()` to info dict
- Add `"steps": np.zeros(num_envs, dtype=np.int32)` to info dict

## Changes Made

### File 1: `motrix_envs/src/motrix_envs/navigation/vbot/vbot_section001_np.py`

#### Change 1: `reset()` method (L823-837)
```python
# OLD: Polar coordinate generation (Bug 1)
robot_init_xy = np.zeros((num_envs, 2), dtype=np.float32)
for i in range(num_envs):
    theta = np.random.uniform(0, 2 * np.pi)
    radius = cfg.arena_outer_radius + np.random.uniform(-0.1, 0.1)
    robot_init_xy[i, 0] = radius * np.cos(theta)
    robot_init_xy[i, 1] = radius * np.sin(theta)
robot_init_xy += np.array(cfg.arena_center, dtype=np.float32)
robot_init_pos = np.column_stack([robot_init_xy, np.full(num_envs, 0.5)])

# NEW: Use cfg.init_state.pos with small randomization
base_pos = np.array(cfg.init_state.pos, dtype=np.float32)
robot_init_pos = np.tile(base_pos, (num_envs, 1))

if hasattr(cfg.init_state, 'pos_randomization_range'):
    pr = cfg.init_state.pos_randomization_range
    xy_noise = np.random.uniform(
        [pr[0], pr[1]], [pr[2], pr[3]], (num_envs, 2)
    ).astype(np.float32)
    robot_init_pos[:, :2] += xy_noise

robot_init_pos[:, 2] = 0.35  # Lower height
```

#### Change 2: `_compute_terminated()` method (L508-524)
```python
# OLD: No grace period, 0.01 threshold (Bug 2)
base_contact = (base_contact_value > 0.01).flatten()[:num_envs]
terminated = np.logical_or(terminated, base_contact)

# NEW: Grace period + 0.1 threshold
GRACE_STEPS = 50
current_steps = state.info.get("steps", np.zeros(num_envs, dtype=np.int32))
past_grace = current_steps > GRACE_STEPS

base_contact = (base_contact_value > 0.1).flatten()[:num_envs]
terminated = np.logical_or(terminated, base_contact & past_grace)
```
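
For the grace period to expire, the `"steps"` counter has to be advanced every step (commit `292deb4` adds this increment). A minimal sketch of that bookkeeping; its exact placement inside `step()` is an assumption here:

```python
# Somewhere in step(), after the physics update (placement is an assumption):
info["steps"] = info["steps"] + 1   # per-env step counter, zeroed again in reset()
```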

#### Change 3: `_compute_reward()` method (L810-811)
```python
# OLD: Hard-coded -10.0 (Bug 3)
reward = np.where(orientation_penalty > 0.5, reward - 10.0, reward)

# NEW: Progressive penalty
extreme_tilt_penalty = np.clip((orientation_penalty - 0.5) * 5.0, 0.0, 3.0)
reward = reward - extreme_tilt_penalty
```

#### Change 4: `reset()` info dict (L1053-1054)
```python
# OLD: Missing keys (Bug 4)
info = {
    "pose_commands": pose_commands,
    "last_actions": ...,
    "current_actions": ...,
    "filtered_actions": ...,
    "ever_reached": ...,
    "min_distance": distance_to_target.copy(),
}

# NEW: Added missing keys
info = {
    "pose_commands": pose_commands,
    "last_actions": ...,
    "current_actions": ...,
    "filtered_actions": ...,
    "ever_reached": ...,
    "min_distance": distance_to_target.copy(),
    "last_distance": distance_to_target.copy(),  # ✅ Bug 4 fix
    "steps": np.zeros(num_envs, dtype=np.int32),  # ✅ Bug 4 fix
}
```

### File 2: `motrix_envs/src/motrix_envs/navigation/vbot/cfg.py`

#### Change 1: Boundary radius (L395)
```python
# OLD
boundary_radius: float = 3.5

# NEW
boundary_radius: float = 5.0 # ⬆️ Fix Bug 1: 3.5 → 5.0
```

#### Change 2: Initial height (L411)
```python
# OLD
pos = [0.0, 0.6, 0.5]

# NEW
pos = [0.0, 0.6, 0.35] # ⬇️ Fix Bug 1: 0.5 → 0.35
```

## Expected Impact

| Metric | Before Fix | After Fix |
|--------|-----------|-----------|
| Episode survival steps | 1-5 | 100+ |
| Total reward (mean) | ~0.29 | >2.0 |
| Total reward (max) | ~1.7-1.9 | >10.0 |
| Robot survival rate | <10% | >80% |
| "Mid-air disappearance" | Frequent | Eliminated |

## Verification

### Manual Testing
Run training to observe improvements:
```bash
uv run scripts/train.py --env vbot_navigation_section001
```

Monitor in TensorBoard:
```bash
uv run tensorboard --logdir runs/vbot_navigation_section001
```

### Automated Testing
Run validation test:
```bash
python3 test_immediate_termination_fix.py
```

This test validates:
- ✅ Config changes (boundary_radius, init height)
- ✅ Reset position logic (uses cfg.init_state.pos)
- ✅ Grace period mechanism (50 steps)
- ✅ Base contact threshold (0.1)
- ✅ Progressive penalty (not -10.0)
- ✅ Info dict keys (last_distance, steps)
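
For illustration, a stand-in for the style of assertion such a script can make (this is not the actual contents of `test_immediate_termination_fix.py`; the real script reads the repo's cfg module):

```python
from types import SimpleNamespace

# Stand-in config mirroring the values this doc says the fix sets.
cfg = SimpleNamespace(
    boundary_radius=5.0,
    init_state=SimpleNamespace(pos=[0.0, 0.6, 0.35]),
)

assert cfg.boundary_radius == 5.0, "Bug 1: boundary_radius should be 5.0"
assert cfg.init_state.pos[2] == 0.35, "Bug 1: init height should be 0.35"
print("config checks passed")
```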

## Technical Details

### Why These Fixes Work

1. **Bug 1 Fix (Position)**:
   - Robots now start at curriculum learning position (0.6m from target)
   - Much higher chance of reaching target during exploration
   - Lower fall height reduces landing impact

2. **Bug 2 Fix (Base Contact)**:
   - Grace period allows robot to stabilize after landing
   - Higher threshold prevents false positives from normal ground contact
   - Robots can recover from initial instability

3. **Bug 3 Fix (Penalty)**:
   - Smooth penalty gradient helps learning
   - No sudden reward drops that confuse the policy
   - Robot can explore tilt ranges without extreme punishment

4. **Bug 4 Fix (Info Keys)**:
   - Reward function correctly calculates distance progress
   - Grace period mechanism functions properly
   - Consistent info dict structure across reset/step

### Compatibility

All changes are **backward compatible**:
- `hasattr()` checks protect against missing config attributes
- Graceful fallbacks for old configs
- No breaking changes to API
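
A self-contained sketch of the `hasattr()` fallback pattern (stand-in config objects; the real code reads the repo's cfg classes):

```python
from types import SimpleNamespace
import numpy as np

def spawn_positions(init_state, num_envs: int) -> np.ndarray:
    """Base position plus optional XY noise; old configs skip the noise."""
    pos = np.tile(np.asarray(init_state.pos, dtype=np.float32), (num_envs, 1))
    if hasattr(init_state, "pos_randomization_range"):   # new-style config
        pr = init_state.pos_randomization_range
        pos[:, :2] += np.random.uniform([pr[0], pr[1]], [pr[2], pr[3]],
                                        (num_envs, 2)).astype(np.float32)
    return pos                                           # old-style config: deterministic

old_cfg = SimpleNamespace(pos=[0.0, 0.6, 0.35])          # lacks the new attribute
new_cfg = SimpleNamespace(pos=[0.0, 0.6, 0.35],
                          pos_randomization_range=[-0.3, -0.3, 0.3, 0.3])
print(spawn_positions(old_cfg, 2))   # identical rows
print(spawn_positions(new_cfg, 2))   # XY jittered within +/- 0.3 m
```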

## Related Issues

This fix addresses the core "immediate termination" issue that was preventing any meaningful training. Related improvements that were already in place:
- Recovery tilt threshold (80°) for extreme falls
- Force initial motion to break zero-velocity trap
- Reward reshaping with dominant positive rewards

## Author
GitHub Copilot

## Date
2026-02-12

## Commit References
- Main fix: `d459f00` - Fix all 4 bugs causing immediate termination
- Test: `501e072` - Add validation test for immediate termination fix