Avoid unused parameters assert by default #1039

Merged
merged 8 commits into master on May 7, 2021

Conversation

tjruwase (Contributor) commented May 3, 2021

Unused parameters assert should be disabled by default
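
A minimal sketch of how a user can opt back in to the check after this change. The placement of the flag under `zero_optimization` follows the ZeRO section of config-json.md linked later in this thread; the surrounding values (batch size, stage, fp16) are placeholders, not part of this PR:

```python
import json

# Sketch: after this change, ignore_unused_parameters defaults to True,
# i.e. the assert is disabled. Setting it to False under zero_optimization
# restores the assert so unused parameters are surfaced during training.
ds_config = {
    "train_batch_size": 8,            # placeholder value
    "fp16": {"enabled": True},        # placeholder value
    "zero_optimization": {
        "stage": 2,                   # placeholder value
        "ignore_unused_parameters": False,  # opt back in to the assert
    },
}

# Write the config out so it can be passed via --deepspeed_config or to
# deepspeed.initialize.
with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```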

tjruwase merged commit 5b393f1 into master on May 7, 2021
tjruwase (Contributor, Author) commented May 7, 2021

Changes behavior of #945.

ghosthamlet (Contributor) commented:

@tjruwase Thanks for the notification. I noticed you changed two things, but I'm not sure why they are better. Can you explain?

  1. You disabled the unused-parameters assert. Why not follow torch.nn.parallel.DistributedDataParallel's practice and enable it for easier debugging? If it is disabled by default, most of the time users will not notice the unused parameters in their models, so the option may not be very useful for debugging.
  2. You changed find_unused_parameters to ignore_unused_parameters. Why not follow torch.nn.parallel.DistributedDataParallel's naming convention?

ghosthamlet (Contributor) commented:

Also, I noticed you forgot to update the doc: https://github.com/microsoft/DeepSpeed/blob/master/docs/_pages/config-json.md#zero-optimizations-for-fp16-training.

cli99 pushed a commit to cli99/DeepSpeed that referenced this pull request May 10, 2021
* Unused parameters assert should be disabled by default

* Fix message

* Invert assert logic in unit test

* Change option for ignoring unused parameters

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
tjruwase (Contributor, Author) commented:

  1. You disabled the unused-parameters assert. Why not follow torch.nn.parallel.DistributedDataParallel's practice and enable it for easier debugging? If it is disabled by default, most of the time users will not notice the unused parameters in their models, so the option may not be very useful for debugging.

Enabling this by default would break dynamic networks such as GShard and MoE, which train a different subset of parameters in each iteration.
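
To make the failure mode concrete, here is a toy, hypothetical example (not DeepSpeed code) of a dynamic network where only one expert participates in each forward pass, so the other expert's parameters receive no gradient that iteration and would trip the assert if it were enabled by default:

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Toy mixture-of-experts layer: each forward pass hard-routes the whole
    batch to a single expert, so the other expert's parameters are unused
    for that iteration."""

    def __init__(self, dim=16, num_experts=2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, x):
        # Pick one expert for the whole batch based on the gate's mean score.
        scores = self.gate(x).mean(dim=0)
        idx = torch.argmax(scores).item()
        return self.experts[idx](x)

x = torch.randn(4, 16)
loss = TinyMoE()(x).sum()
loss.backward()  # only the selected expert's weights receive gradients
```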

  2. You changed find_unused_parameters to ignore_unused_parameters. Why not follow torch.nn.parallel.DistributedDataParallel's naming convention?

I think these two are doing slightly different things. find_unused_parameters is False by default, but when enabled it traverses the graph to find unused parameters, incurring runtime overhead. ignore_unused_parameters is True by default to avoid false-positive errors for dynamic networks, but it also gives static networks a way to detect unused parameters. The different name reflects the fact that it does not involve traversing the graph or finding parameters, so it adds no overhead.
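
For contrast, a runnable sketch of the PyTorch side (a single-process gloo group is created only so DistributedDataParallel can be constructed; the module is a placeholder, not from this PR):

```python
import os
import torch
import torch.distributed as dist

# Single-process process group just so DDP can be constructed for illustration.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = torch.nn.Linear(8, 8)  # placeholder module

# find_unused_parameters defaults to False. When True, DDP traverses the
# autograd graph after each forward pass to mark parameters that did not
# contribute to the loss, which is the runtime overhead mentioned above.
# ignore_unused_parameters, by contrast, adds no traversal; it only controls
# whether DeepSpeed's existing reduction-time check raises.
ddp_model = torch.nn.parallel.DistributedDataParallel(
    model, find_unused_parameters=True
)
```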

ghosthamlet (Contributor) commented:

@tjruwase Thanks for the detailed explanation, especially the distinction from find_unused_parameters in PyTorch; I did not know that before.

cli99 pushed a commit to cli99/DeepSpeed that referenced this pull request May 12, 2021