Fixed previous weight handling for DCASGD optimizer. #5140
Conversation
Without a copy, weight_previous will always be the same as weight. It should have been a state.
@piiswrong
back to before updating the weight. The original bug was in the momentum == 0.0 block, it seems.
I do not know whether it actually implements the paper (I do not yet know the inner workings of mxnet that deeply), but the logic now makes sense to me.
python/mxnet/optimizer.py (Outdated)

```python
else:
    weight[:] += -lr * (grad + wd * weight)
    self.weight_previous[index] = weight
mom, previous_weight = state
```
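The bug under discussion is that `self.weight_previous[index] = weight` rebinds the state to the very same array as the weight, so the "previous" weight can never lag behind. A minimal sketch of the fixed update, using NumPy in place of MXNet NDArrays (the hyperparameter values `lr`, `wd`, `lamda` here are illustrative, not MXNet defaults):

```python
import numpy as np

def dcasgd_update(weight, grad, previous_weight, lr=0.1, wd=1e-4, lamda=0.04):
    """One DC-ASGD-style step without momentum, showing why
    previous_weight must be a separate buffer updated by copy."""
    # Delay compensation term uses the gap between the current weight
    # and the stale weight the gradient was computed against.
    weight[:] += -lr * (grad + wd * weight
                        + lamda * grad * grad * (weight - previous_weight))
    # Copy in place; rebinding (previous_weight = weight) would make the
    # state alias the weight and the compensation term would always be zero.
    previous_weight[:] = weight

w = np.ones(3)
prev = w.copy()          # state holds its own buffer
g = np.full(3, 0.5)
dcasgd_update(w, g, prev)
assert prev is not w     # still distinct arrays after the update
```

The key design point is `previous_weight[:] = weight` (an element-wise copy into the state's own buffer) versus plain assignment, which only rebinds the Python name.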
This wastes memory and compute. Better to use different states and test for momentum == 0.
@piiswrong
While porting the latest Python changes to the Perl interface I stumbled upon something that looks like a small logic error.