[RLlib] Bug fix: DQN goes into negative epsilon values after reaching explora… #6971

Merged
Changes from 1 commit
20 changes: 13 additions & 7 deletions rllib/agents/dqn/dqn.py
@@ -6,7 +6,7 @@
 from ray.rllib.agents.dqn.simple_q_policy import SimpleQPolicy
 from ray.rllib.optimizers import SyncReplayOptimizer
 from ray.rllib.policy.sample_batch import DEFAULT_POLICY_ID
-from ray.rllib.utils.schedules import ConstantSchedule, LinearSchedule
+from ray.rllib.utils.schedules import ConstantSchedule, PiecewiseSchedule
 
 logger = logging.getLogger(__name__)
 
@@ -45,7 +45,9 @@
     # Fraction of entire training period over which the exploration rate is
     # annealed
     "exploration_fraction": 0.1,
-    # Final value of random action probability
+    # Initial value of random action probability.
+    "exploration_initial_eps": 1.0,
+    # Final value of random action probability.
     "exploration_final_eps": 0.02,
     # Update the target network every `target_network_update_freq` steps.
     "target_network_update_freq": 500,
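The new `exploration_initial_eps` key accompanies the bug this PR fixes: a linear anneal that keeps extrapolating past its endpoint drives epsilon below zero. A minimal sketch of that failure mode using the default values above (`naive_linear_eps` is a hypothetical stand-in, not RLlib's actual `LinearSchedule`):

```python
def naive_linear_eps(t, schedule_timesteps, initial_p=1.0, final_p=0.02):
    # Buggy behavior sketch: the interpolation fraction is NOT
    # clamped to [0, 1], so past the endpoint the value keeps falling.
    fraction = t / schedule_timesteps
    return initial_p + fraction * (final_p - initial_p)

# exploration_fraction * schedule_max_timesteps with the defaults above
schedule_timesteps = int(0.1 * 100000)

print(naive_linear_eps(0, schedule_timesteps))      # 1.0 at the start
print(naive_linear_eps(10000, schedule_timesteps))  # ~0.02 at the endpoint
print(naive_linear_eps(20000, schedule_timesteps))  # about -0.96: negative epsilon
```

A negative epsilon is nonsensical as a random-action probability, which is why the diff below switches to a schedule that holds its final value.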
@@ -214,11 +216,15 @@ def make_exploration_schedule(config, worker_index):
         # local ev should have zero exploration so that eval rollouts
         # run properly
         return ConstantSchedule(0.0)
-    return LinearSchedule(
-        schedule_timesteps=int(
-            config["exploration_fraction"] * config["schedule_max_timesteps"]),
-        initial_p=1.0,
-        final_p=config["exploration_final_eps"])
+
+    return PiecewiseSchedule(
+        endpoints=[
+            (0, config["exploration_initial_eps"]),
+            (int(config["exploration_fraction"] *
+                 config["schedule_max_timesteps"]),
+             config["exploration_final_eps"]),
+        ],
+        outside_value=config["exploration_final_eps"])
 
 
 def setup_exploration(trainer):
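The fix works because a piecewise schedule interpolates only between its endpoints and returns `outside_value` everywhere past the last one, so epsilon can never anneal below `exploration_final_eps`. A self-contained sketch of that clamping behavior (`piecewise_eps` is a hypothetical two-point illustration, not RLlib's `PiecewiseSchedule` class):

```python
def piecewise_eps(t, config):
    """Linearly interpolate between the two endpoints from the diff;
    past the last endpoint, return the final value (the role of
    PiecewiseSchedule's outside_value argument)."""
    start = (0, config["exploration_initial_eps"])
    end = (int(config["exploration_fraction"] *
               config["schedule_max_timesteps"]),
           config["exploration_final_eps"])
    if t >= end[0]:
        # Clamp: epsilon never drops below the final value.
        return end[1]
    frac = (t - start[0]) / (end[0] - start[0])
    return start[1] + frac * (end[1] - start[1])

config = {
    "exploration_fraction": 0.1,
    "exploration_initial_eps": 1.0,
    "exploration_final_eps": 0.02,
    "schedule_max_timesteps": 100000,
}

print(piecewise_eps(0, config))       # 1.0
print(piecewise_eps(20000, config))   # 0.02, no longer negative
print(piecewise_eps(200000, config))  # still 0.02, arbitrarily far out
```

Exposing the first endpoint as `exploration_initial_eps` (rather than hard-coding `initial_p=1.0` as the old call did) also lets users start exploration at a lower rate, e.g. when resuming training.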