For some schedulers, such as cosine, it is possible for an attack from an earlier step, rather than the final step, to have a lower loss. We should keep track of the attack with the lowest train (or validation) loss and use that one as the final attack.
This is probably best done in conjunction with #11.
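
A rough sketch of what the tracking could look like, not tied to the actual code in this repo: `BestAttackTracker`, its fields, and the toy loop are all hypothetical names made up for illustration, and the loss values are invented. The idea is just to snapshot the attack whenever the loss improves, so the result returned at the end is the best one seen rather than the last one.

```python
import copy
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class BestAttackTracker:
    """Hypothetical helper: remembers the attack with the lowest loss seen so far."""

    best_loss: float = float("inf")
    best_step: Optional[int] = None
    best_attack: Any = None

    def update(self, step: int, loss: float, attack: Any) -> bool:
        """Store the attack if its loss is strictly lower; return True if it is the new best."""
        if loss < self.best_loss:
            self.best_loss = loss
            self.best_step = step
            # Deep-copy so later in-place updates to the attack do not overwrite the snapshot.
            self.best_attack = copy.deepcopy(attack)
            return True
        return False


# Toy loop with invented losses: the loss dips at step 2 and rises again afterwards.
tracker = BestAttackTracker()
toy_losses = [0.9, 0.5, 0.3, 0.4]
attack = [0.0]  # stand-in for the attack parameters, mutated in place each step
for step, loss in enumerate(toy_losses):
    attack[0] = float(step)
    tracker.update(step, loss, attack)

final_attack = tracker.best_attack  # the step-2 snapshot, not the last iterate
```

Whether `loss` here is the train or the validation loss is left open, as in the description above. The deep copy matters if the attack is updated in place by the optimizer; without it the "best" snapshot would silently follow the last step.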