The epsilon decay in the code lives inside the method agent.replay(), which is called on every step, so epsilon declines rapidly during the very first episode. I don't know if this was the intended behavior, but I've gotten better results by moving the epsilon decay into a separate method and calling it once at the end of each episode.
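Here is a minimal sketch of that change. It assumes a DQN-style agent with a multiplicative epsilon decay. The class `Agent`, its methods `act()` and `decay_epsilon()`, the helper `train()`, and the values `epsilon_min=0.01` and `epsilon_decay=0.995` are hypothetical and are not taken from the original code. The network and replay memory are left out, so only the epsilon bookkeeping is shown. The sketch also clamps epsilon with `max()` so it never drops below `epsilon_min`.

```python
import random


class Agent:
    """Hypothetical minimal DQN-style agent; only the epsilon logic is shown."""

    def __init__(self, epsilon=1.0, epsilon_min=0.01, epsilon_decay=0.995):
        self.epsilon = epsilon
        self.epsilon_min = epsilon_min
        self.epsilon_decay = epsilon_decay

    def act(self, n_actions=2):
        # Epsilon-greedy: explore with probability epsilon.
        # Exploitation is a placeholder (always action 0) since there is no network here.
        if random.random() < self.epsilon:
            return random.randrange(n_actions)
        return 0

    def replay(self):
        # Minibatch training would happen here; epsilon is no longer touched.
        pass

    def decay_epsilon(self):
        # Multiplicative decay, clamped so epsilon never goes below epsilon_min.
        self.epsilon = max(self.epsilon_min, self.epsilon * self.epsilon_decay)


def train(episodes, steps_per_episode, decay_every_step):
    agent = Agent()
    for _ in range(episodes):
        for _ in range(steps_per_episode):
            agent.act()
            agent.replay()
            if decay_every_step:
                agent.decay_epsilon()  # original behaviour: decay on every step
        if not decay_every_step:
            agent.decay_epsilon()  # proposed: decay once at the end of the episode
    return agent.epsilon


per_step = train(episodes=10, steps_per_episode=200, decay_every_step=True)
per_episode = train(episodes=10, steps_per_episode=200, decay_every_step=False)
print(f"per-step decay:    {per_step:.4f}")
print(f"per-episode decay: {per_episode:.4f}")
```

With these assumed numbers, per-step decay pushes epsilon below 0.37 within a single 200-step episode, because 0.995^200 ≈ 0.37. After ten such episodes it has hit the 0.01 floor. Per-episode decay still leaves epsilon at 0.995^10 ≈ 0.95 at that point, so the agent keeps exploring much longer. The decay rate and floor will likely need retuning once the decay runs per episode, since there are far fewer decay calls.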