
Fixed previous weight handling for DCASGD optimizer. #5140

Merged · 9 commits merged into apache:master on Mar 5, 2017

Conversation

@sergeykolychev (Contributor) commented Feb 24, 2017

While porting the latest Python changes to the Perl interface, I stumbled upon something that looks like a small logic error.

@piiswrong (Contributor) commented:

Without a copy, the previous weight will always be the same as weight. It should have been stored as a state.

@sergeykolychev (Contributor, Author) commented Feb 25, 2017

@piiswrong
The logic still does not compute (I may be missing something): weight - previous_weight is always 0.
Can you work out what it should be from https://arxiv.org/pdf/1609.08326.pdf, page 5?
They seem to keep weight_bak for several iterations and only update it when a 'pull request' arrives.
I do not see anything async about the current code; do you have time to check this?
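For reference, the delay-compensated update on page 5 of the paper looks roughly like the sketch below. This is a minimal NumPy-style rendering of the formula, not MXNet code; `lamda` is the paper's variance-control coefficient and `weight_bak` is the stale snapshot the gradient was computed against.

```python
import numpy as np

def dcasgd_step(weight, weight_bak, grad, lr=0.1, wd=1e-4, lamda=0.04):
    """One delay-compensated ASGD step (sketch after arXiv:1609.08326).

    grad was computed against the stale weight_bak; the elementwise
    lamda * grad * grad * (weight - weight_bak) term approximates the
    missing Hessian correction for that staleness.
    """
    weight -= lr * (grad + wd * weight
                    + lamda * grad * grad * (weight - weight_bak))
    return weight

# weight_bak is refreshed only when the worker pulls fresh weights,
# so it can lag behind weight for several iterations:
w = np.zeros(3)
w_bak = w.copy()                      # snapshot taken at pull time
g = np.array([0.5, -0.2, 0.1])
w = dcasgd_step(w, w_bak, g)
```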

Moved saving the previous weight back to before updating the weight. The original bug was in the momentum == 0.0 block, it seems.
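Put differently, the delta has to be computed while `previous_weight` still holds the old snapshot; only then is the snapshot refreshed and the update applied, otherwise `weight - previous_weight` collapses to 0. A hedged sketch of that ordering for the momentum == 0.0 branch (NDArray-style in-place ops; names are illustrative, not necessarily the merged code):

```python
# Illustrative ordering only; lamda is the delay-compensation coefficient.
delta = -lr * (grad + wd * weight
               + lamda * grad * grad * (weight - previous_weight))
previous_weight[:] = weight   # in-place copy *before* updating the weight
weight[:] += delta            # apply the step last
```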
@sergeykolychev (Contributor, Author) commented:

I do not know if it actually implements the paper (I do not yet know the inner workings of mxnet that deeply), but the logic now computes in my mind.

```python
else:
    weight[:] += -lr * (grad + wd * weight)
    self.weight_previous[index] = weight
mom, previous_weight = state
```
Review comment (Contributor) on the diff above:
This wastes memory and compute. Better to use different states and test for momentum == 0.
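One shape that suggestion could take, assuming an mxnet-style `create_state`; this is a hypothetical sketch, not the merged implementation:

```python
import mxnet as mx

def create_state(self, index, weight):
    # Allocate a momentum buffer only when momentum is actually used,
    # but always keep a real copy of the weight (not a reference) for
    # the delay-compensation term.
    if self.momentum == 0.0:
        return (None, weight.copy())
    return (mx.nd.zeros(weight.shape, weight.context), weight.copy())
```

The update method can then branch on whether the momentum buffer is None instead of allocating and updating a zero momentum on every step.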

@sergeykolychev (Contributor, Author) commented:

@piiswrong
Eric, could you please review the latest change and see if it makes sense to you.

@piiswrong merged commit a23608f into apache:master on Mar 5, 2017