Remove the CUDA stream synchronization between each operator. #6284

Merged on Dec 5, 2017 (1 commit)

@qingqing01 (Contributor) commented Dec 5, 2017

Fix #6283

We originally added this CUDA stream synchronization while the operators were under development, so that errors from each CUDA kernel could be detected. Now that the framework is stable, the synchronization should be removed to speed up training.
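
To illustrate why the per-operator synchronization costs time, here is a minimal sketch in plain C++ with no CUDA dependency. `ToyStream` and `RunOperators` are hypothetical names invented for this example and are not part of Paddle; the toy stream only models the idea that launches are queued and a synchronization blocks the host until the queue drains. In real code the blocking call would be something like `cudaStreamSynchronize`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Toy stand-in for a CUDA stream (hypothetical, for illustration only):
// launched work is queued and only executes when Synchronize() is called.
class ToyStream {
 public:
  // Queue a "kernel" without running it, like an asynchronous launch.
  void Launch(std::function<void()> kernel) {
    pending_.push(std::move(kernel));
  }

  // Drain all queued work, like a blocking stream synchronization.
  void Synchronize() {
    ++sync_count_;
    while (!pending_.empty()) {
      pending_.front()();
      pending_.pop();
    }
  }

  std::size_t sync_count() const { return sync_count_; }

 private:
  std::queue<std::function<void()>> pending_;
  std::size_t sync_count_ = 0;
};

// Launches `num_ops` toy operators and returns how many times the host
// blocked on the stream. sync_each_op == true models the behaviour before
// this PR; sync_each_op == false models the behaviour after it.
std::size_t RunOperators(int num_ops, bool sync_each_op,
                         std::vector<int>* out) {
  assert(out != nullptr);
  ToyStream stream;
  for (int i = 0; i < num_ops; ++i) {
    stream.Launch([out, i] { out->push_back(i); });
    if (sync_each_op) stream.Synchronize();  // host waits after every op
  }
  stream.Synchronize();  // single wait before the host reads the results
  return stream.sync_count();
}
```

In this model the results are identical either way, because work on one stream still executes in launch order; only the number of host-side waits changes, from one per operator plus a final one down to a single final wait.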

@reyoung (Collaborator) left a comment


Cool. Should we remove DeviceContext::Finish?

@qingqing01 (Contributor, Author) commented

> Should we remove DeviceContext::Finish?

I think this interface is convenient for debugging, especially for checking errors from CUDA kernels.
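
As a rough sketch of why a blocking call stays useful for debugging, the toy C++ model below contrasts checking after every operator with checking only at the end. `ToyOp` and `FirstFailure` are hypothetical names made up for this example, not Paddle APIs, and the "kernels" run eagerly for simplicity; only the error reporting is modelled as deferred.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical operator for illustration: a name plus a "kernel" that
// returns true on success and false on failure.
struct ToyOp {
  std::string name;
  std::function<bool()> kernel;
};

// Returns the name of the operator blamed for the first failure, or an
// empty string if every operator succeeded.
//
// check_each_op == true models synchronizing after every operator, so a
// failure is attributed to the operator that caused it.
// check_each_op == false models asynchronous execution, where a failure
// only surfaces at a later synchronization and the culprit is unknown.
std::string FirstFailure(const std::vector<ToyOp>& ops, bool check_each_op) {
  bool deferred_error = false;
  for (const ToyOp& op : ops) {
    assert(op.kernel != nullptr);
    const bool ok = op.kernel();
    if (!ok) {
      if (check_each_op) return op.name;  // error pinned to this operator
      deferred_error = true;              // error stays latent for now
    }
  }
  return deferred_error ? std::string("<unknown operator>") : std::string();
}
```

Under these assumptions, a debug-only switch that re-enables the per-operator wait gives precise error attribution when needed, without paying for it in normal training runs.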

@qingqing01 (Contributor, Author) commented

I am merging this PR; DeviceContext::Finish can be removed later if necessary.

@qingqing01 qingqing01 merged commit 1a8f20c into PaddlePaddle:develop Dec 5, 2017
@qingqing01 qingqing01 deleted the cuda_sync branch November 14, 2019 05:26