[early WIP] Fix/rationalize loss-tallying #2922
base: develop
Conversation
Force-pushed from 8c61787 to 33ef202.
Changes so far in this PR: though the real goal is sensible loss-tallying across all classes, I think these small changes already remedy #2735 (float32 swallows large loss-values) & #2743 (worker losses clobber each other). An oddity from looking at per-epoch loss across a full run: all my per-epoch loss totals kept increasing, rather than decreasing.
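For context on #2735, here's a minimal sketch (not code from this PR) of why a float32 running total eventually swallows further loss increments: float32 has a 24-bit significand, so once the tally reaches 2**24, an added value of 1.0 or smaller rounds away entirely.

```python
import numpy as np

# Minimal illustration of the #2735 failure mode: above 2**24, float32
# can no longer represent consecutive integers, so small per-example
# losses added to a large float32 running total are silently lost.
total32 = np.float32(2 ** 24)           # 16777216.0
print(total32 + np.float32(1.0))        # 16777216.0 -- increment swallowed
print(np.float64(2 ** 24) + 1.0)        # 16777217.0 -- float64 keeps it
```

Accumulating the tally in float64 (or resetting it each epoch before it grows large) sidesteps the saturation.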
As a point of comparison, Facebook's fastText reports its training loss as a running average over a "trial-count". Gensim should probably collect & report 2Vec-class training loss in a comparable way, so that numbers on algorithmically-analogous runs are broadly similar, for familiarity to users & as a cross-check of whatever it is we're doing.
+1 on matching FB's logic. What is "trial-count"? Is the average taken over words or something else?
Unsure; their C++ (with a separate class for 'loss') is different enough from our code that I couldn't tell at a glance & will need to study it a bit more.
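To make the comparison concrete, here's a rough sketch of what an FB-style averaged tally could look like. This is a guess at the semantics (the exact meaning of the denominator is the open question above), not a transcription of fastText's loss class, and all names are illustrative.

```python
class AveragedLoss:
    """Running-average loss, fastText-style: a total divided by a count
    of contributing 'trials'. Illustrative names only; whether a 'trial'
    is a word, a context window, or a batch is exactly the open question
    in this thread."""

    def __init__(self):
        self.total = 0.0   # Python float is float64, avoiding #2735-style saturation
        self.count = 0     # the ambiguous "trial-count"

    def add(self, trial_loss):
        self.total += trial_loss
        self.count += 1

    def average(self):
        return self.total / self.count if self.count else 0.0
```

Reporting an average rather than a raw sum keeps the number comparable across corpus sizes and epoch counts, which is presumably why fastText's progress output stays in a narrow, familiar range.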
@gojomo cleaning up the loss-tallying logic is still very much welcome. Did you figure out the "increasing loss" mystery? We're planning to make a Gensim release soon – whether this PR gets in now or later, it will be a great addition.
These changes would likely apply, & help a bit, in the next release. But getting consistent loss-tallying working across all the 2Vec classes is a bigger job. Never figured out why our per-epoch loss kept increasing across a full run.
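As an aside on the #2743 clobbering, a common pattern, sketched here only as an illustration and not as this PR's actual implementation, is to let each worker accumulate into a thread-local sum and merge into the shared total under a lock:

```python
import threading

class WorkerLossTally:
    """Per-worker accumulation with a single locked merge, so concurrent
    workers can't overwrite each other's running totals. Illustrative
    only; not gensim's actual code."""

    def __init__(self):
        self._lock = threading.Lock()
        self._total = 0.0

    def worker(self, losses):
        local = 0.0                  # thread-local: no contention during training
        for loss in losses:
            local += loss
        with self._lock:             # one synchronized merge per worker
            self._total += local

    def total(self):
        with self._lock:
            return self._total

tally = WorkerLossTally()
threads = [threading.Thread(target=tally.worker, args=([0.5] * 1000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(tally.total())  # 2000.0 regardless of thread interleaving
```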
PR to eventually address loss-tallying issues: #2617, #2735, #2743. Early tinkering stage.