Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixed output scaling bug and removed double residual edges (was already implemented in global architecture) #482

Merged
merged 3 commits into from
Oct 25, 2023

Conversation

kerstink-GC
Copy link
Collaborator

@kerstink-GC kerstink-GC commented Oct 25, 2023

Changelogs

  • enumerate the changes of that PR.

Checklist:

  • Was this PR discussed in an issue? It is recommended to first discuss a new feature into a GitHub issue before opening a PR.
  • Add tests to cover the fixed bug(s) or the new introduced feature(s) (if appropriate).
  • Update the API documentation is a new function is added, or an existing one is deleted.
  • Write concise and explanatory changelogs above.
  • If possible, assign one of the following labels to the PR: feature, fix or test (or ask a maintainer to do it for you).

discussion related to that PR

@codecov
Copy link

codecov bot commented Oct 25, 2023

Codecov Report

Merging #482 (8ceb167) into main (2883d88) will decrease coverage by 0.01%.
The diff coverage is 66.66%.

❗ Current head 8ceb167 differs from pull request most recent head ee56497. Consider uploading reports for the commit ee56497 to get more accurate results

@@            Coverage Diff             @@
##             main     #482      +/-   ##
==========================================
- Coverage   71.45%   71.45%   -0.01%     
==========================================
  Files          93       93              
  Lines        8520     8512       -8     
==========================================
- Hits         6088     6082       -6     
+ Misses       2432     2430       -2     
Flag Coverage Δ
unittests 71.45% <66.66%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
ipu 49.14% <ø> (ø)

@callumm-graphcore callumm-graphcore added the fix PR that fixes an issue label Oct 25, 2023
Copy link
Collaborator

@callumm-graphcore callumm-graphcore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're accidentally squaring the activations, which I don't think we want to do

Comment on lines 264 to 258
h_local *= 1 / self.scale_activations(h_local, self.output_scale)
h_local *= self.scale_activations(h_local, self.output_scale)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scale_activations returns h_local * scale_factor, so this is equivalent to h_local = h_local ** 2 * output_scale, which I don't think we want.

Also, I don't understand why we need separate functions for residual_add and scale_activations? It seems like overkill to me to have functions that only do one operation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True we actually only want h_local = self.scale_activations(h_local, self.output_scale)

It was initially implemented as separate functions, don't think there is a specific benefit to it, but also don't think it hurts to be separate

@DomInvivo DomInvivo merged commit 983bf6c into main Oct 25, 2023
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix PR that fixes an issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants