
Adds threading support in torchrec train pipeline. #1694

Closed
wants to merge 1 commit

Conversation

fenghuizhang

Summary:
Motivation

  • In training, we have mostly focused on optimizing for better GPU utilization. For models that are not GPU bound, we often observe CPU ops taking a nontrivial amount of time. In the existing pipelines, these operations execute on the main thread; depending on the complexity of the model input, they can take tens of milliseconds in our traces.
  • Certain applications are latency sensitive. While our multi-stage pipelines greatly improve throughput, they hurt latency by buffering multiple batches in the pipeline.

In this change we add the capability to load data and copy it to the GPU in a background thread. This reduces iteration latency for the models mentioned above and minimizes the number of batches held in the pipeline.

In this diff, we are adding a new eval (forward-only) sparse-data-dist pipeline with threading enabled.
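The background loading described above can be sketched as a small prefetcher: a worker thread pulls batches from the data iterator and applies a transfer function (for real use, something like `lambda b: b.to("cuda", non_blocking=True)`), while the main thread consumes at most `max_prefetch` buffered batches. Keeping `max_prefetch` at 1 bounds how many batches sit in the pipeline, which is the latency property the summary calls out. The class and parameter names here are illustrative, not TorchRec's actual pipeline API.

```python
import queue
import threading


class BackgroundPrefetcher:
    """Load batches and apply a transfer function (e.g. a host-to-GPU copy)
    in a background thread, overlapping input prep with the main-thread
    forward pass. A minimal sketch of the idea, not TorchRec's pipeline code.
    """

    _SENTINEL = object()

    def __init__(self, data_iter, transfer_fn, max_prefetch=1):
        # max_prefetch=1 keeps at most one batch buffered, bounding latency.
        self._queue = queue.Queue(maxsize=max_prefetch)
        self._thread = threading.Thread(
            target=self._worker, args=(data_iter, transfer_fn), daemon=True
        )
        self._thread.start()

    def _worker(self, data_iter, transfer_fn):
        for batch in data_iter:
            # The device copy happens here, off the main thread; the put()
            # blocks once max_prefetch batches are already buffered.
            self._queue.put(transfer_fn(batch))
        self._queue.put(self._SENTINEL)

    def __iter__(self):
        return self

    def __next__(self):
        item = self._queue.get()
        if item is self._SENTINEL:
            raise StopIteration
        return item
```

In a training or eval loop this wraps the dataloader, so the main thread only ever blocks on a batch that is already being prepared in the background.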

Reviewed By: dstaay-fb, leitian, joshuadeng

Differential Revision: D53453429

@facebook-github-bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Feb 8, 2024
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D53453429

fenghuizhang pushed a commit to fenghuizhang/torchrec that referenced this pull request Feb 8, 2024
Labels
CLA Signed (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed), fb-exported
3 participants