Question about PPO-discrete's decision interval #10
How does H-PPO compare against the best result achievable by the discrete PPO algorithm?
In theory, PPO-discrete's control performance has a 'V'-shaped relationship with the decision interval (the 10 s or 15 s you mention): when the interval is too short, the agent may switch phases too frequently; when it is too long, some approaches may not be fully discharged (under-saturated release). See section V.E.2 of the paper Reinforcement Learning for Traffic Signal Control in Hybrid Action Space. This 'V' shape contradicts the finding in FRAP, which may be because I did not add a phase-switching penalty term to PPO-discrete's reward (or due to other factors); still, based on my experiments, the 'V' shape looks fairly clear.
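The missing penalty term mentioned above could look something like the following. This is a minimal sketch, assuming a queue-length-based reward; the function name, the `queue_length` input, and the `switch_penalty` coefficient are hypothetical and not taken from this repository:

```python
def shaped_reward(queue_length, prev_phase, new_phase, switch_penalty=0.5):
    """Queue-based reward with an optional penalty for switching phases.

    queue_length: total number of halted vehicles at the intersection.
    switch_penalty: cost subtracted whenever the phase changes; set to 0
    to recover the plain PPO-discrete reward discussed above.
    """
    reward = -float(queue_length)   # fewer queued vehicles -> higher reward
    if new_phase != prev_phase:     # discourage frequent phase switching
        reward -= switch_penalty
    return reward
```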
Hello, in your code H-PPO's duration range is 10 s to 40 s. When comparing it with PPO-discrete, I found that with a fixed duration of 10 s, PPO-discrete outperforms H-PPO, while at 15 s H-PPO performs better. Do your results show the same? When running the hybrid-action-space comparison experiments, should the discrete PPO baseline be paired with control groups using different fixed durations? I hope you can resolve my confusion. Many thanks!
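For context, a hybrid action in the sense discussed here pairs a discrete phase choice with a continuous duration bounded to the 10 s to 40 s range. Below is a minimal PyTorch sketch of sampling such an action; the function name and the actor outputs (`phase_logits`, `dur_mean`, `dur_std`) are illustrative assumptions, not this repository's actual API:

```python
import torch
from torch.distributions import Categorical, Normal

def sample_hybrid_action(phase_logits, dur_mean, dur_std,
                         min_dur=10.0, max_dur=40.0):
    """Sample a hybrid action: a discrete phase plus a continuous
    duration clipped to the [10 s, 40 s] range discussed above."""
    phase = Categorical(logits=phase_logits).sample()          # which phase next
    duration = Normal(dur_mean, dur_std).sample()              # how long to hold it
    duration = duration.clamp(min_dur, max_dur)                # enforce the bounds
    return phase.item(), duration.item()

# Example: 4 candidate phases, policy proposes a duration around 20 s
phase, duration = sample_hybrid_action(
    torch.zeros(4), torch.tensor(20.0), torch.tensor(3.0))
```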