- Add target_point_a (0.0, 1.5) and target_point_b (0.0, 0.0) to VBotSection001EnvCfg
- Modify reset() to use fixed unified target positions instead of random offsets
- All robots now navigate to the same target point (inner circle trigger point)
Co-authored-by: Logic-TARS <99871356+Logic-TARS@users.noreply.github.com>
Fix VBot navigation to use a unified target position for arena scoring
…tness
…zation Improve VBot navigation robustness through domain randomization and reward reweighting
…ments, forced exploration
…ation
…weights Fix zero-velocity local optimum in VBot navigation via reward reshaping and recovery threshold
Copilot AI changed the title from "[WIP] Fix immediate termination issue in VBot training" to "Fix VBot immediate termination on spawn" on Feb 12, 2026
Owner
@copilot git commit and push to remote
Contributor
Author
The branch is already up to date with the remote. All changes from the PR have been committed and pushed:
The PR includes 7 commits with all 4 bug fixes, tests, and documentation. No further commits or pushes are needed.
VBot robots were terminating within 1-5 steps of spawning (mean reward ~0.29 vs expected >2.0) due to four interrelated bugs in reset and termination logic.
Changes
1. Reset position generation
Before: Polar coordinate generation on the outer circle (radius 3.0 m) ignored cfg.init_state.pos
After: Use the curriculum learning position with XY randomization only
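A minimal sketch of the "After" behavior, not the PR's actual code. The function name, the `xy_range` parameter (a hypothetical stand-in for pos_randomization_range, read here as a half-width in metres), and the value 0.1 are assumptions made for illustration; only the base position and the lowered spawn height come from this description.

```python
import numpy as np


def sample_spawn_positions(base_pos, num_envs, xy_range, rng=None):
    """Return (num_envs, 3) spawn positions: base_pos plus uniform XY jitter.

    xy_range is a hypothetical half-width (metres) for the jitter on x and y.
    """
    rng = np.random.default_rng() if rng is None else rng
    base = np.asarray(base_pos, dtype=np.float64)
    positions = np.tile(base, (num_envs, 1))
    # Jitter x and y only; z keeps the configured spawn height.
    positions[:, :2] += rng.uniform(-xy_range, xy_range, size=(num_envs, 2))
    return positions


# Curriculum start [0.0, 0.6] with the lowered 0.35 m spawn height.
positions = sample_spawn_positions(
    [0.0, 0.6, 0.35], num_envs=4, xy_range=0.1, rng=np.random.default_rng(0)
)
```

Every environment then starts within `xy_range` of the curriculum position instead of on the 3.0 m outer circle.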
Also: init_state.pos[2] 0.5 → 0.35 m to reduce fall impact, boundary_radius 3.5 → 5.0 m
2. Base contact termination
Before: Threshold 0.01 with no grace period → instant termination on landing
After: Threshold 0.1 with 50-step grace period
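The grace-period check can be sketched as below. This is an illustration under stated assumptions rather than the code in _compute_terminated(): the function name is hypothetical, and whether the real comparison is `steps >= 50` or `steps > 50` is not stated here, so `>=` is a guess.

```python
import numpy as np


def base_contact_terminated(contact_value, steps, threshold=0.1, grace_steps=50):
    """Per-env termination flags: base contact above threshold, after the grace period."""
    contact_value = np.asarray(contact_value)
    steps = np.asarray(steps)
    in_contact = contact_value > threshold
    # Assumption: the grace period ends once `steps` reaches `grace_steps`.
    past_grace = steps >= grace_steps
    return in_contact & past_grace


flags = base_contact_terminated(
    contact_value=np.array([0.5, 0.5, 0.05, 0.5]),
    steps=np.array([3, 49, 200, 50]),
)
```

Only the last environment terminates: the first two are still inside the grace period, and the third is below the contact threshold.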
Added steps increment in update_state(): state.info["steps"] = state.info["steps"] + 1
3. Orientation penalty
Before: Hard-coded -10.0 on landing (>45° tilt)
After: Progressive penalty capped at -3.0
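The exact progressive formula is not shown in this description. One possible shape, given purely as an assumption, ramps linearly from 0 at the tilt threshold to the -3.0 cap. It takes tilt to be sum(gravity_xy^2) of a unit gravity vector in the body frame, so 0.5 corresponds to about 45° and 1.0 to lying on a side. The function name and the linear ramp are hypothetical.

```python
import numpy as np


def orientation_penalty(gravity_xy, threshold=0.5, max_penalty=3.0):
    """Penalty in [-max_penalty, 0] that grows with tilt beyond `threshold`."""
    gravity_xy = np.asarray(gravity_xy, dtype=np.float64)
    # Assumes a unit gravity vector: tilt is 0 when upright, 1 when on its side.
    tilt = np.sum(gravity_xy ** 2, axis=-1)
    # Linear ramp (an assumption): 0 at the threshold, 1 at full tilt.
    excess = np.clip((tilt - threshold) / (1.0 - threshold), 0.0, 1.0)
    return -max_penalty * excess


penalties = orientation_penalty(
    np.array([[0.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.6, 0.6]])
)
```

Unlike a flat -10.0, a brief wobble just past the threshold on landing costs only a fraction of the cap.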
4. Info dict initialization
Added missing keys in reset():
- "last_distance": distance_to_target.copy() - for progress reward calculation
- "steps": np.zeros(num_envs, dtype=np.int32) - for grace period tracking
Testing
test_immediate_termination_fix.py validates all four fixes independently and in integration.
Original prompt
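Problem description
VBot training shows an "immediate termination" failure: the robot dies right after spawning. When running uv run scripts/train.py --env vbot_navigation_section001, the robot "vanishes in mid-air" in the video and the reward is extremely low, as shown in the figure (TensorBoard reward curve):
Root cause analysis
Code review found 4 key bugs:
Bug 1: Polar-coordinate random generation in reset() overrides the curriculum's close-range start
File: motrix_envs/src/motrix_envs/navigation/vbot/vbot_section001_np.py (L815-L828)
reset() uses arena_outer_radius = 3.0 to generate the initial position randomly in polar coordinates on the outer circle, completely ignoring cfg.init_state.pos = [0.0, 0.6, 0.5] (the close-range start for the first curriculum stage). Meanwhile target_point_a = [0.0, 0.0], so the target sits at the origin while the robot spawns on a circle of radius 3 m with boundary_radius = 3.5, which puts the initial position almost on the boundary.
Fix: reset() should use cfg.init_state.pos as the base position plus small-range XY randomization (pos_randomization_range), rather than polar generation on the outer circle. Keep the polar logic, but as an optional mode.
Bug 2: The base_contact sensor threshold is too low (0.01) and triggers termination the instant the robot lands
File: motrix_envs/src/motrix_envs/navigation/vbot/vbot_section001_np.py (L508-L520)
The robot spawns at a height of 0.5 m and falls. The instant it lands, base_contact_value > 0.01 becomes True and the episode ends immediately. There is no grace period.
Fix:
- base_contact threshold 0.01 → 0.1
- Ensure the "steps" key exists in the info dict returned by reset() (initialized to np.zeros(num_envs, dtype=np.int32))
Bug 3: A hard-coded -10.0 extreme penalty in the reward function fires on the initial frames
File: motrix_envs/src/motrix_envs/navigation/vbot/vbot_section001_np.py (L800-L803)
orientation_penalty = sum(gravity_xy^2) > 0.5 corresponds to a tilt of roughly 45°. A robot in free fall from 0.5 m easily triggers it at the moment of landing, which sets that step's reward straight to -10.
Fix: Replace the hard-coded -10.0 with a progressive penalty:
Bug 4: The info returned by reset() lacks the "last_distance" and "steps" keys
File: motrix_envs/src/motrix_envs/navigation/vbot/vbot_section001_np.py (L1037-L1044)
_compute_reward() uses info.get("last_distance", distance_to_target) to compute progress. On the first step the fallback is the current distance itself (progress = 0), but if the robot falls away from the target on the first step, progress is negative. More importantly, info lacks the "steps" key, so the grace period mechanism cannot work.
Fix: In the info dict of reset(), add:
Files to modify
1. motrix_envs/src/motrix_envs/navigation/vbot/vbot_section001_np.py
Modify the reset() method (around L811-L890):
- Initialization from cfg.init_state.pos + pos_randomization_range
- cfg.init_state.pos[2] (or slightly lower, 0.35 m) as the initial height, to reduce free-fall impact
- Add the "last_distance" and "steps" keys to the info dict
Concrete implementation:
Add to the info dict (around L1037):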
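A hedged sketch of the two keys named above. The helper `make_reset_info` and the sample distances are hypothetical; the real reset() builds a larger info dict whose other entries are not shown here.

```python
import numpy as np


def make_reset_info(distance_to_target):
    """Build the two info entries that reset() was missing (illustrative helper)."""
    distance_to_target = np.asarray(distance_to_target, dtype=np.float64)
    num_envs = distance_to_target.shape[0]
    return {
        # Copy so later in-place updates to the distance array do not alias it.
        "last_distance": distance_to_target.copy(),
        # Per-env step counter used for grace period tracking.
        "steps": np.zeros(num_envs, dtype=np.int32),
    }


distances = np.array([0.6, 0.7, 0.55])
info = make_reset_info(distances)
# Same increment as in update_state().
info["steps"] = info["steps"] + 1
```

With "steps" present from the first frame, the grace period check has a counter to compare against.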
Modify the _compute_terminated() method (around L480-L522):
- base_contact threshold 0.01 → 0.1
- Read steps from state.info and check whether the grace period has passed