
Deepcopy FP module even if on meta device #1676

Closed
wants to merge 1 commit

Conversation

s4ayub
Contributor

@s4ayub s4ayub commented Feb 1, 2024

Summary:
When we fx trace, even if there are two FP (feature processor) modules (one per card), the ranks only hold a reference to the FP module on rank 0, because the model was sharded on the meta device.

As a result, FX eliminates the FP module on rank 1 and only records the one on rank 0.

Do a deepcopy even when on the meta device so that each rank explicitly has its own copy, which FX will then persist.
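The fix boils down to always deep-copying the feature-processor module instead of letting every rank share one reference when the model lives on the meta device. A minimal sketch of the reference-vs-copy distinction (the `FeatureProcessor` class here is a hypothetical stand-in, not torchrec's actual module):

```python
import copy

import torch
import torch.nn as nn


class FeatureProcessor(nn.Module):
    """Hypothetical feature processor with a parameter on the meta device."""

    def __init__(self) -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.empty(4, device="meta"))


fp = FeatureProcessor()

# Before the fix: both ranks share a reference to the same module object,
# so FX tracing can collapse them into a single node.
ranks_shared = [fp, fp]
assert ranks_shared[0] is ranks_shared[1]

# After the fix: deepcopy works even on the meta device, so each rank
# owns a distinct module object that FX will persist separately.
ranks_copied = [copy.deepcopy(fp) for _ in range(2)]
assert ranks_copied[0] is not ranks_copied[1]
assert all(rank.weight.is_meta for rank in ranks_copied)
```

`copy.deepcopy` on a meta-device module only clones shapes and dtypes (no storage is allocated), so it is cheap and preserves one module object per rank for the tracer.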

Reviewed By: lequytra, tissue3

Differential Revision: D53294788

@facebook-github-bot facebook-github-bot added CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported labels Feb 1, 2024
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D53294788

s4ayub added a commit to s4ayub/torchrec-3 that referenced this pull request Feb 1, 2024

s4ayub added a commit to s4ayub/torchrec-3 that referenced this pull request Feb 1, 2024
Pull Request resolved: pytorch#1676

fbshipit-source-id: f70008a79e5f9fba2499d748d23587b1fc3c3c4a
