disable cuda_h2d stream #6020

Merged
merged 20 commits into from
Aug 25, 2021

@lixinqi (Contributor) commented Aug 24, 2021

@@ -84,7 +84,7 @@ Maybe<const std::string&> Device::of_type() const {
 Maybe<const std::string&> GetLocalCallInstructionName(const std::string& type) {
   static const HashMap<std::string, std::string> type2instr_name{
       {"cpu", "cpu.LocalCallOpKernel"}, {"cuda", "gpu.LocalCallOpKernel"},
-      {"gpu", "gpu.LocalCallOpKernel"}, {"cuda_h2d", "cuda_h2d.LocalCallOpKernel"},
+      {"gpu", "gpu.LocalCallOpKernel"}, {"cuda_h2d", "gpu.LocalCallOpKernel"},
Contributor

Should we add a one-line comment here?

Contributor Author

Sure.
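
To make the effect of the one-line change concrete, here is a minimal standalone sketch of the lookup after this PR. It is not the OneFlow implementation: `std::unordered_map` and `std::optional` stand in for OneFlow's `HashMap` and `Maybe<>`, the function names `LookupLocalCallInstructionName` and `CheckCudaH2dSharesGpuInstruction` are hypothetical, and only the four entries visible in the diff hunk are reproduced (the real table may contain more).

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for GetLocalCallInstructionName. Only the entries
// visible in the diff hunk are listed; the real table may be longer.
std::optional<std::string> LookupLocalCallInstructionName(const std::string& type) {
  static const std::unordered_map<std::string, std::string> type2instr_name{
      {"cpu", "cpu.LocalCallOpKernel"},
      {"cuda", "gpu.LocalCallOpKernel"},
      {"gpu", "gpu.LocalCallOpKernel"},
      // After this PR, "cuda_h2d" reuses the gpu instruction instead of a
      // dedicated "cuda_h2d.LocalCallOpKernel".
      {"cuda_h2d", "gpu.LocalCallOpKernel"},
  };
  const auto it = type2instr_name.find(type);
  if (it == type2instr_name.end()) { return std::nullopt; }
  return it->second;
}

// Usage sketch: "cuda_h2d" and "gpu" now resolve to the same instruction name,
// and an unknown device type yields no instruction.
void CheckCudaH2dSharesGpuInstruction() {
  assert(LookupLocalCallInstructionName("cuda_h2d") == LookupLocalCallInstructionName("gpu"));
  assert(!LookupLocalCallInstructionName("no_such_device").has_value());
}
```

Because the device type string `"cuda_h2d"` stays in the table, callers that still pass it keep working; only the instruction it dispatches to changes.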

@github-actions (Contributor) commented

Speed stats:
GPU Name: GeForce GTX 1080 

PyTorch resnet50 time: 140.6ms (= 7028.8ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 128.3ms (= 6415.0ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
Relative speed: 1.10 (= 140.6ms / 128.3ms)

PyTorch resnet50 time: 83.7ms (= 4185.2ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 74.7ms (= 3733.7ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
Relative speed: 1.12 (= 83.7ms / 74.7ms)

PyTorch resnet50 time: 56.4ms (= 2820.5ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 47.8ms (= 2391.7ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
Relative speed: 1.18 (= 56.4ms / 47.8ms)

PyTorch resnet50 time: 47.0ms (= 2350.9ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 38.3ms (= 1917.1ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
Relative speed: 1.23 (= 47.0ms / 38.3ms)

PyTorch resnet50 time: 42.9ms (= 2142.7ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 42.2ms (= 2108.6ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
Relative speed: 1.02 (= 42.9ms / 42.2ms)
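
The figures in the report follow a simple pattern: total time divided by 50 iterations gives the per-iteration time, and the PyTorch time divided by the OneFlow time gives the relative speed (values above 1 mean OneFlow was faster). The sketch below reproduces that arithmetic; the helper names `MeanIterationMs` and `RelativeSpeed` are hypothetical and are not taken from the CI script.

```cpp
#include <cassert>

// Mean time per iteration in milliseconds, e.g. 7028.8 ms / 50 iterations.
double MeanIterationMs(double total_ms, int iterations) {
  assert(iterations > 0);
  return total_ms / iterations;
}

// Relative speed as used in the report: baseline time divided by candidate
// time. A result greater than 1 means the candidate is faster.
double RelativeSpeed(double baseline_ms, double candidate_ms) {
  assert(candidate_ms > 0.0);
  return baseline_ms / candidate_ms;
}
```

For the batch-size-16 run, `RelativeSpeed(MeanIterationMs(7028.8, 50), MeanIterationMs(6415.0, 50))` is roughly 1.096, which matches the reported 1.10 after rounding.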

@oneflow-ci-bot oneflow-ci-bot merged commit d847e11 into master Aug 25, 2021
@oneflow-ci-bot oneflow-ci-bot deleted the disable_cuda_h2d_stream branch August 25, 2021 12:32
daquexian added a commit that referenced this pull request Sep 13, 2021
oneflow-ci-bot added a commit that referenced this pull request Sep 17, 2021
* async launched allreduce

* ReleaseTensor instruction per stream

* Revert "disable cuda_h2d stream (#6020)"

This reverts commit d847e11.

* restore "/ world_size"

Signed-off-by: daquexian <daquexian566@gmail.com>

* add soft sync before release tensor and local call

Signed-off-by: daquexian <daquexian566@gmail.com>

* add need_soft_sync_stream, only get producer value when last_used_device is not none

Signed-off-by: daquexian <daquexian566@gmail.com>

* refine

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix bug

Signed-off-by: daquexian <daquexian566@gmail.com>

* remove need_soft_sync_stream table

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix comments

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* update ddp speed test threshold

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: lixinqi <lixinqi0703106@163.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>