Token seq id #5964

Merged
merged 33 commits into from
Aug 25, 2021

Conversation

@lixinqi lixinqi commented Aug 19, 2021

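Tokens are no longer fixed to transmitting data on the main thread. Instead, each token carries a sequence number attached at the moment it is actually transmitted.
The sequence number is incremented independently for each (src_token, dst_token, thread_consistent_id, rank_group) combination.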
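The per-key numbering described above can be sketched roughly as follows. This is only an illustration of the idea, not the code from this PR: the class name `TokenSeqCounter` and the method `next_seq_id` are hypothetical, and the real implementation lives in OneFlow's C++ runtime.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

# Hypothetical key type: (src_token, dst_token, thread_consistent_id, rank_group)
Key = Tuple[Hashable, Hashable, int, Tuple[int, ...]]


class TokenSeqCounter:
    """Illustrative counter: one independent sequence per key."""

    def __init__(self) -> None:
        # Every unseen key starts at 0.
        self._next: Dict[Key, int] = defaultdict(int)

    def next_seq_id(
        self,
        src_token: Hashable,
        dst_token: Hashable,
        thread_consistent_id: int,
        rank_group: Iterable[int],
    ) -> int:
        # Normalize rank_group so that lists and ranges hash identically.
        key: Key = (src_token, dst_token, thread_consistent_id, tuple(rank_group))
        seq_id = self._next[key]
        self._next[key] = seq_id + 1
        return seq_id


counter = TokenSeqCounter()
# Same key: the sequence number increments.
first = counter.next_seq_id("meta", "meta", 0, range(2))
second = counter.next_seq_id("meta", "meta", 0, range(2))
# Different thread_consistent_id: a separate sequence that starts from 0.
other_thread = counter.next_seq_id("meta", "meta", 1, range(2))
```

Because each key has its own counter, transfers issued from different threads or over different rank groups do not have to agree on a single global ordering. They only need to stay ordered within their own key.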

@lixinqi lixinqi requested a review from oneflow-ci-bot August 20, 2021 08:04
placement = flow.placement("cuda", {0: range(2)})
sbp = (flow.sbp.split(0),)
for i in range(1000):
    y = x.to_consistent(placement=placement, sbp=sbp)

This synchronizes meta information on the main thread.

sbp = (flow.sbp.split(0),)
for i in range(1000):
    y = x.to_consistent(placement=placement, sbp=sbp)
b = a.to_consistent(placement=placement, sbp=flow.sbp.broadcast)

This synchronizes tensor data on the scheduler thread.

@oneflow-ci-bot oneflow-ci-bot removed their request for review August 20, 2021 08:29
@oneflow-ci-bot oneflow-ci-bot self-requested a review August 25, 2021 07:59
@github-actions

CI failed, removing label automerge

@oneflow-ci-bot oneflow-ci-bot removed their request for review August 25, 2021 08:37
@github-actions

CI failed, removing label automerge

@oneflow-ci-bot oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot August 25, 2021 14:43
@oneflow-ci-bot oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot August 25, 2021 17:05
@oneflow-ci-bot oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot August 25, 2021 18:16
@oneflow-ci-bot oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot August 25, 2021 19:57
@github-actions

Speed stats:
GPU Name: GeForce GTX 1080 

PyTorch resnet50 time: 139.0ms (= 6951.8ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 128.1ms (= 6404.7ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
Relative speed: 1.09 (= 139.0ms / 128.1ms)

PyTorch resnet50 time: 82.8ms (= 4137.7ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 74.5ms (= 3723.2ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
Relative speed: 1.11 (= 82.8ms / 74.5ms)

PyTorch resnet50 time: 61.2ms (= 3062.0ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 47.4ms (= 2368.2ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
Relative speed: 1.29 (= 61.2ms / 47.4ms)

PyTorch resnet50 time: 47.9ms (= 2392.8ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 43.5ms (= 2177.4ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
Relative speed: 1.10 (= 47.9ms / 43.5ms)

PyTorch resnet50 time: 41.8ms (= 2091.7ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 39.4ms (= 1972.3ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
Relative speed: 1.06 (= 41.8ms / 39.4ms)

@oneflow-ci-bot oneflow-ci-bot merged commit aaf7030 into master Aug 25, 2021
@oneflow-ci-bot oneflow-ci-bot deleted the token_seq_id branch August 25, 2021 21:22