Fix zero-velocity local optimum in VBot navigation via reward reshaping and recovery threshold#5
Merged
Logic-TARS merged 8 commits into main on Feb 11, 2026
Conversation
…ments, forced exploration
Co-authored-by: Logic-TARS <99871356+Logic-TARS@users.noreply.github.com>
…ation
Copilot changed the title from "[WIP] Adjust reward function to mitigate zero-velocity local optimum" to "Fix zero-velocity local optimum in VBot navigation via reward reshaping and recovery threshold" on Feb 11, 2026
Problem
VBot agents converge to a stationary (zero-velocity) policy to minimize falling penalties, which prevents goal-reaching behavior. The reward structure creates a local optimum in which risk avoidance dominates exploration.
Solution
1. Reward Function Restructuring
Inverted the penalty-to-reward ratio to achieve 17.5:1 positive dominance.
Added a forward-velocity reward computation.
Math: Total positive potential = 3.5, total penalties ≈ 0.2 → 17.5:1 ratio encourages exploration.
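The arithmetic above can be sketched as a minimal NumPy reward computation. The individual weights below are illustrative assumptions chosen to reproduce the stated totals (3.5 positive, 0.2 penalty), not the exact merged values, and the function name is hypothetical.

```python
import numpy as np

# Illustrative weights (assumptions) that reproduce the stated totals:
# total positive potential = 3.5, total penalties = 0.2.
W_FORWARD_VELOCITY = 2.0   # velocity component toward the goal
W_GOAL_PROGRESS = 1.0      # per-step reduction in goal distance
W_ALIVE = 0.5              # small bonus for staying upright
W_ENERGY = 0.1             # actuation-effort penalty
W_TILT = 0.1               # orientation penalty

def forward_velocity_reward(lin_vel_xy, goal_dir_xy):
    """Reward the XY velocity component projected onto the (unit) goal direction."""
    return W_FORWARD_VELOCITY * np.sum(lin_vel_xy * goal_dir_xy, axis=-1)

positive_total = W_FORWARD_VELOCITY + W_GOAL_PROGRESS + W_ALIVE   # 3.5
penalty_total = W_ENERGY + W_TILT                                 # 0.2
print(positive_total / penalty_total)  # ~17.5, i.e. the 17.5:1 dominance ratio
```

Because the positive potential outweighs the penalties 17.5:1, a policy that moves toward the goal strictly dominates one that stands still, even when movement occasionally incurs penalties.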
2. Recovery-Enabled Termination
Rewrote _compute_terminated() to allow self-correction: the robot terminates only when its tilt exceeds a recovery threshold (recovery_tilt_threshold), rather than at the first sign of instability.
3. Initial State Diversification
Modified reset() to force ~1/3 of environments to start with a random velocity (±0.3 m/s in XY), breaking the zero-velocity attractor basin.
4. Configuration Parameters
Added to VBotSection001EnvCfg:
force_initial_motion: bool = True
recovery_tilt_threshold: float = 80.0
Expected Impact
Files Changed
cfg.py: reward hierarchy + parameters (43 lines)
vbot_section001_np.py: termination, reward computation, reset logic (155 lines)
Original prompt
Problem Description
The VBot navigation model is stuck in a zero-velocity local optimum, which severely hinders learning progress:
🔴 Current Symptoms
🎯 Root Cause
Solution (four progressive layers)
🟢 Layer 1: Reshape the Reward Function (CRITICAL)
1.1 Reward Weight Adjustment
Modify RewardConfig in cfg.py.
Key math:
1.2 Add a One-Time Termination Penalty (avoid repeated per-step penalties)
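The idea above can be sketched as follows. The penalty weight and function name are illustrative assumptions, not the merged implementation; only the "charge once, at termination" structure comes from the prompt.

```python
import numpy as np

TERMINATION_PENALTY = 1.0  # illustrative weight (assumption)

def apply_termination_penalty(step_reward, terminated):
    """Charge the fall penalty once, on the terminating step only.

    Penalizing falls on every unstable step makes standing still the
    safest policy; a single charge at termination keeps risk priced in
    without drowning out the positive, exploration-driving terms.
    step_reward: (num_envs,) float; terminated: (num_envs,) bool.
    """
    return step_reward - TERMINATION_PENALTY * terminated.astype(step_reward.dtype)
```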
🟠 Layer 2: Improve the Termination Condition (IMPORTANT)
Modify the _compute_terminated() method in vbot_section001_np.py.
What the change means:
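As a hedged sketch of what the rewritten check could look like (NumPy; the tilt and height signals, and the collapse check, are assumptions about the state the method reads, while the 80-degree threshold is the new recovery_tilt_threshold config value):

```python
import numpy as np

RECOVERY_TILT_THRESHOLD = 80.0  # degrees; the new config parameter

def compute_terminated(tilt_deg, base_height, min_height=0.15):
    """Terminate only on unrecoverable states.

    Rather than ending the episode at the first sign of instability,
    the robot gets a chance to self-correct: termination fires only
    when tilt exceeds the recovery threshold or the base has collapsed.
    tilt_deg, base_height: (num_envs,) arrays.
    """
    unrecoverable_tilt = tilt_deg > RECOVERY_TILT_THRESHOLD
    collapsed = base_height < min_height
    return unrecoverable_tilt | collapsed
```

A robot tilted 30 degrees keeps its episode alive and can learn to right itself; only past 80 degrees (or after collapsing below the minimum height) does the episode end.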
🔵 Layer 3: Force Initial Exploration (strengthen exploration)
Add to the reset() method.
Why it works:
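This can be sketched as below. The function name and RNG handling are illustrative; the ~1/3 fraction and the ±0.3 m/s bound come from the PR description.

```python
import numpy as np

INIT_SPEED = 0.3         # m/s bound on the random XY velocity
MOTION_FRACTION = 1 / 3  # share of environments forced into motion

def diversify_initial_velocity(num_envs, rng):
    """Give ~1/3 of resets a random nonzero XY velocity.

    Starting some episodes already in motion breaks the zero-velocity
    attractor basin: the policy keeps observing moving states and the
    rewards they earn, so "never move" stops looking optimal.
    Returns a (num_envs, 2) array of XY velocities.
    """
    vel_xy = np.zeros((num_envs, 2))
    mask = rng.random(num_envs) < MOTION_FRACTION
    vel_xy[mask] = rng.uniform(-INIT_SPEED, INIT_SPEED, size=(int(mask.sum()), 2))
    return vel_xy
```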
🟣 Layer 4: Configuration Parameter Adjustment
Add to cfg.py.
Validation Metrics
Fix Checklist
-...
This pull request was created from Copilot chat.