Refactor boxing interpreter to boxing expr #6134

Merged
merged 139 commits into from
Sep 2, 2021

Contributor

@clackhan clackhan commented Sep 1, 2021

Remove the old boxing interpreter implementation; all types of boxing are now implemented through boxing expr.

lixinqi and others added 30 commits August 13, 2021 23:00
…bugfix_data_transport_token_per_placement
…ithub.com/Oneflow-Inc/oneflow into bugfix_data_transport_token_per_placement

Conflicts:
	oneflow/core/vm/oneflow_vm.cpp
Signed-off-by: daquexian <daquexian566@gmail.com>
Signed-off-by: daquexian <daquexian566@gmail.com>
clackhan and others added 9 commits September 1, 2021 18:53
* cuda base cpu mpi boxing

* cpu_mpi

* fix conflicts

* add cpu mpi unittests

* more checks and unittests

* abstract_consistent_to_consistent_op_expr

* fix compiler complaint

* refactor consistent-to-consistent eager consistent op interpreter

* fix compiler complaint

* refactor ConsistentToConsistentOpExpr

* lazy interpreter (#5903)

* fix bugs about consistent_id

* more test_consistent_cast unittests

* refactor functional::ToConsistent

* refactor GetNdSbp

* fix compiler complaints

* Update eager_consistent_op_interpreter.cpp

* Update eager_mirrored_op_interpreter.cpp

* eager_boxing_1_to_n

* add missing files

* del useless file

* minor fix

* refine

* refactor GetDevice4CurrentProcessCtx

* refine

* minor fix

* Update naive_1ton_boxing_interpreter.cpp

* eager_boxing_n_to_1

* add test case

* refine

* Update eager_boxing_interpreter_mgr.cpp

* Update eager_boxing_interpreter_mgr.cpp

* fix error

* fix error

* auto format by CI

* fix error

* refine

* refine

* make of_format

* make of_format

* Update nd_sbp.h

* fix consistent id check error

* refine

* back up

* refine

* minor fix

* refine

* refine

* refine

* refine

* minor fix

* minor fix

* refine

* refine

* Update nccl_boxing_function.cpp

* back up

* refine

* minor fix

* refine

* fix consistent meta check bug

* zoom kLimitParallelConfString

* refine

* add nccl functional api

* Update naive_n_to_1_boxing.cpp

* minor fix

* refine

* refine

* naive_generic_boxing

* refine

* test case

* back up

* back up

* fix nccl deadlock bug

* add test case

* add test case

* add test_eager_boxing_with_overlapping_placement test case

* refine

* add test case

* add test case

* fix check bug and add test case

* add test case

* add boxing_expr_with_inclusive_placement boxing expr

* refine

* refine

* refine

* refine

* minor fix

Co-authored-by: Xinqi Li <lixinqi0703106@163.com>
Co-authored-by: leaves-zwx <kunta0932@gmail.com>
Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
…refactor_BoxingInterpreter_to_BoxingExpr

Conflicts:
	oneflow/core/framework/op_interpreter/boxing/eager_boxing_interpreter_mgr.cpp
	oneflow/core/framework/op_interpreter/boxing/naive_b2p_boxing_interpreter.cpp
Comment on lines -50 to +38
if (tensor_nd_sbp == out->nd_sbp()) { return tensor; }
// reset sbp if parallel_num == 1 and reset transport_token
Contributor Author

Because a transport_token is set on the output, the tensor cannot be returned directly even when the input and output have the same placement and sbp.
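The point in the comment above can be illustrated with a small standalone sketch. Everything here is hypothetical: `ToyTensor`, `NewTransportToken`, and `ToyBoxing` are illustrative stand-ins and are not OneFlow's real types or functions. The sketch only shows why an early `return tensor;` is unsafe when the output must carry its own token.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical stand-in for a consistent tensor's metadata (not a OneFlow type).
struct ToyTensor {
  std::string placement;
  std::string nd_sbp;
  std::int64_t transport_token;
};

// Hypothetical token source: every call hands out a fresh, increasing token.
inline std::int64_t NewTransportToken() {
  static std::int64_t next = 0;
  return ++next;
}

// Always builds a new output tensor with a fresh transport token, even when
// the requested placement and sbp equal the input's. Returning `in` directly
// in that case would leave the output carrying the input's old token.
inline std::shared_ptr<ToyTensor> ToyBoxing(const std::shared_ptr<ToyTensor>& in,
                                            const std::string& out_placement,
                                            const std::string& out_nd_sbp) {
  assert(in != nullptr);
  auto out = std::make_shared<ToyTensor>(*in);  // copy the input's metadata
  out->placement = out_placement;
  out->nd_sbp = out_nd_sbp;
  out->transport_token = NewTransportToken();   // reset the token on the output
  return out;
}
```

With identical placement and sbp, `ToyBoxing` still returns a distinct object whose `transport_token` differs from the input's, which is the behavior the review comment asks for.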

@@ -45,20 +45,6 @@ class EagerBoxingInterpreter {
Symbol<ParallelDesc> out_parallel_desc) const = 0;
};

struct EagerBoxingCall {
Contributor

Keep these for now, because decompose still needs them for the time being.

Contributor Author

Restored.

@github-actions
Contributor

github-actions bot commented Sep 2, 2021

Speed stats:
GPU Name: GeForce GTX 1080 

OneFlow resnet50 time: 128.1ms (= 6407.2ms / 50, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 139.9ms (= 6996.9ms / 50, input_shape=[16, 3, 224, 224])
Relative speed: 1.09 (= 139.9ms / 128.1ms)

OneFlow resnet50 time: 74.8ms (= 3740.0ms / 50, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 86.3ms (= 4315.8ms / 50, input_shape=[8, 3, 224, 224])
Relative speed: 1.15 (= 86.3ms / 74.8ms)

OneFlow resnet50 time: 48.8ms (= 2439.0ms / 50, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 58.8ms (= 2937.6ms / 50, input_shape=[4, 3, 224, 224])
Relative speed: 1.20 (= 58.8ms / 48.8ms)

OneFlow resnet50 time: 41.2ms (= 2057.5ms / 50, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 49.4ms (= 2468.0ms / 50, input_shape=[2, 3, 224, 224])
Relative speed: 1.20 (= 49.4ms / 41.2ms)

OneFlow resnet50 time: 36.6ms (= 1831.9ms / 50, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 39.6ms (= 1978.5ms / 50, input_shape=[1, 3, 224, 224])
Relative speed: 1.08 (= 39.6ms / 36.6ms)

OneFlow resnet50 time: 144.0ms (= 7200.5ms / 50, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 160.8ms (= 8040.4ms / 50, input_shape=[16, 3, 224, 224], ddp, world size=2)
Relative speed: 1.12 (= 160.8ms / 144.0ms)

OneFlow resnet50 time: 93.1ms (= 4653.2ms / 50, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.2ms (= 5112.4ms / 50, input_shape=[8, 3, 224, 224], ddp, world size=2)
Relative speed: 1.10 (= 102.2ms / 93.1ms)

OneFlow resnet50 time: 67.2ms (= 3358.5ms / 50, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 80.7ms (= 4035.3ms / 50, input_shape=[4, 3, 224, 224], ddp, world size=2)
Relative speed: 1.20 (= 80.7ms / 67.2ms)

OneFlow resnet50 time: 70.4ms (= 3521.0ms / 50, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 60.2ms (= 3009.6ms / 50, input_shape=[2, 3, 224, 224], ddp, world size=2)
Relative speed: 0.85 (= 60.2ms / 70.4ms)

OneFlow resnet50 time: 66.8ms (= 3341.3ms / 50, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 58.6ms (= 2928.0ms / 50, input_shape=[1, 3, 224, 224], ddp, world size=2)
Relative speed: 0.88 (= 58.6ms / 66.8ms)

@oneflow-ci-bot oneflow-ci-bot removed their request for review September 2, 2021 02:54
@github-actions
Contributor

github-actions bot commented Sep 2, 2021

Speed stats:
GPU Name: GeForce GTX 1080 

OneFlow resnet50 time: 128.1ms (= 6405.0ms / 50, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 141.4ms (= 7069.0ms / 50, input_shape=[16, 3, 224, 224])
Relative speed: 1.10 (= 141.4ms / 128.1ms)

OneFlow resnet50 time: 74.7ms (= 3733.7ms / 50, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 83.9ms (= 4196.8ms / 50, input_shape=[8, 3, 224, 224])
Relative speed: 1.12 (= 83.9ms / 74.7ms)

OneFlow resnet50 time: 48.1ms (= 2406.9ms / 50, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 58.1ms (= 2904.0ms / 50, input_shape=[4, 3, 224, 224])
Relative speed: 1.21 (= 58.1ms / 48.1ms)

OneFlow resnet50 time: 41.0ms (= 2051.5ms / 50, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 45.1ms (= 2253.5ms / 50, input_shape=[2, 3, 224, 224])
Relative speed: 1.10 (= 45.1ms / 41.0ms)

OneFlow resnet50 time: 42.1ms (= 2103.3ms / 50, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 39.2ms (= 1958.5ms / 50, input_shape=[1, 3, 224, 224])
Relative speed: 0.93 (= 39.2ms / 42.1ms)

OneFlow resnet50 time: 144.7ms (= 7235.9ms / 50, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 160.1ms (= 8004.4ms / 50, input_shape=[16, 3, 224, 224], ddp, world size=2)
Relative speed: 1.11 (= 160.1ms / 144.7ms)

OneFlow resnet50 time: 92.5ms (= 4623.1ms / 50, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 104.2ms (= 5207.9ms / 50, input_shape=[8, 3, 224, 224], ddp, world size=2)
Relative speed: 1.13 (= 104.2ms / 92.5ms)

OneFlow resnet50 time: 66.3ms (= 3315.4ms / 50, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 76.4ms (= 3820.8ms / 50, input_shape=[4, 3, 224, 224], ddp, world size=2)
Relative speed: 1.15 (= 76.4ms / 66.3ms)

OneFlow resnet50 time: 68.4ms (= 3421.9ms / 50, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 65.7ms (= 3286.1ms / 50, input_shape=[2, 3, 224, 224], ddp, world size=2)
Relative speed: 0.96 (= 65.7ms / 68.4ms)

OneFlow resnet50 time: 60.8ms (= 3037.9ms / 50, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 64.2ms (= 3209.4ms / 50, input_shape=[1, 3, 224, 224], ddp, world size=2)
Relative speed: 1.06 (= 64.2ms / 60.8ms)

@oneflow-ci-bot oneflow-ci-bot removed their request for review September 2, 2021 04:11
@oneflow-ci-bot oneflow-ci-bot merged commit 487dc00 into master Sep 2, 2021
@oneflow-ci-bot oneflow-ci-bot deleted the refactor_BoxingInterpreter_to_BoxingExpr branch September 2, 2021 04:12
5 participants