
Add CudnnConvAlgoCache #3649

Merged
3 commits merged on Oct 5, 2020


liujuncheng (Collaborator)

The current cuDNN conv algorithm cache has the following two problems:

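  1. It uses ThreadLocalCachedCall as the cache, so algorithm inference on different devices, and at compile time versus runtime, cannot effectively reuse cached results.
  2. When the algorithm is inferred by trial runs, inference at runtime can be affected by other streams (e.g. NCCL or decode), which can lead to the wrong algorithm being chosen.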
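
Problem 1 can be illustrated with a minimal sketch, assuming a string key and an integer algorithm id. All names here (`ExpensiveSearch`, `ThreadLocalLookup`, `GlobalAlgoCache`, `RunDemo`) are hypothetical stand-ins, not OneFlow's `ThreadLocalCachedCall` or `CudnnConvAlgoCache` API. A `thread_local` cache starts empty in every thread, so two threads (e.g. one per device, or the compiler thread and a runtime thread) each repeat the same search; a process-wide cache behind a mutex lets the second thread reuse the first thread's result.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical stand-in for an expensive cuDNN algorithm search; counts calls.
int g_search_calls = 0;
std::mutex g_calls_mutex;

int ExpensiveSearch(const std::string& key) {
  std::lock_guard<std::mutex> lock(g_calls_mutex);
  ++g_search_calls;
  return static_cast<int>(key.size());  // pretend "best algorithm id"
}

// Per-thread cache: each thread starts with an empty map, so results
// found in one thread are never visible to another.
int ThreadLocalLookup(const std::string& key) {
  thread_local std::map<std::string, int> cache;
  auto it = cache.find(key);
  if (it != cache.end()) return it->second;
  int algo = ExpensiveSearch(key);
  cache.emplace(key, algo);
  return algo;
}

// Process-wide cache guarded by a mutex: all threads share results.
class GlobalAlgoCache {
 public:
  int Lookup(const std::string& key) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = cache_.find(key);
    if (it != cache_.end()) return it->second;
    int algo = ExpensiveSearch(key);  // searched while holding the lock
    cache_.emplace(key, algo);
    return algo;
  }

 private:
  std::mutex mutex_;
  std::map<std::string, int> cache_;
};

// Two threads look up the same conv three times each; returns how many
// real searches were performed.
int RunDemo(bool use_global) {
  g_search_calls = 0;
  GlobalAlgoCache global;
  auto worker = [&] {
    for (int i = 0; i < 3; ++i) {
      if (use_global) {
        global.Lookup("conv1");
      } else {
        ThreadLocalLookup("conv1");
      }
    }
  };
  std::thread t1(worker), t2(worker);
  t1.join();
  t2.join();
  return g_search_calls;
}
```

`RunDemo(false)` performs two searches (one per thread) while `RunDemo(true)` performs one. Holding the lock during the search also keeps two threads from trial-running the same conv at the same time, which is a design choice of this sketch rather than something the PR states.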

Adding an extra layer of global cache mitigates problems 1 and 2 to some extent (in the multi-machine case, problem 2 still cannot be avoided).

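Compile-time algorithm inference uses the cudnn buffer size as the workspace size, while runtime uses the workspace size inferred at compile time. The cache key therefore drops the workspace size, and lookups rely on the rule "the best algorithm inferred with a larger workspace size also applies to a smaller workspace", provided that algorithm's required memory still fits in the smaller workspace.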
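The lookup rule can be sketched as follows, assuming each search result records the workspace limit it was searched under and the memory its best algorithm actually needs. `AlgoPerf` and `WorkspaceAgnosticAlgoCache` are hypothetical names, not the PR's code. A result searched under limit L is reused for a request R <= L only if its algorithm needs at most R: every algorithm allowed by R was also a candidate under L, so the fastest one under L is still the fastest under R.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <optional>
#include <string>

// Hypothetical result of one algorithm search.
struct AlgoPerf {
  int algo;            // id of the fastest algorithm found
  std::size_t memory;  // workspace the algorithm actually needs
};

// Workspace size is deliberately NOT part of the key. Each conv key maps
// "workspace limit used during the search" -> "best result under that limit".
class WorkspaceAgnosticAlgoCache {
 public:
  void Store(const std::string& key, std::size_t limit, const AlgoPerf& perf) {
    std::lock_guard<std::mutex> lock(mutex_);
    entries_[key][limit] = perf;
  }

  std::optional<AlgoPerf> Find(const std::string& key, std::size_t requested) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = entries_.find(key);
    if (it == entries_.end()) return std::nullopt;
    // lower_bound skips entries searched with a smaller limit: their best
    // algorithm may be beaten by one that needs more memory.
    for (auto jt = it->second.lower_bound(requested); jt != it->second.end(); ++jt) {
      // Reusable only if the chosen algorithm fits in the requested workspace.
      if (jt->second.memory <= requested) return jt->second;
    }
    return std::nullopt;
  }

 private:
  std::mutex mutex_;
  std::map<std::string, std::map<std::size_t, AlgoPerf>> entries_;
};
```

For example, if compile time stores `{algo 3, memory 256}` under a (made-up) buffer size of 1024, a runtime lookup with the inferred workspace of 256 hits, a lookup with 128 misses because the algorithm does not fit, and a lookup with 2048 misses because a larger workspace might admit a faster algorithm.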

TODO:
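Solve problem 2 in the multi-machine case. One candidate approach is to traverse the plan when the runtime starts and "warm up" CudnnConvAlgoCache; suggestions for better solutions are welcome.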
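
The warm-up idea in the TODO could look roughly like this sketch. It assumes the plan can be flattened into a list of conv ops, each with a serialized key; `PlanConvOp`, `WarmUpAlgoCache`, and the `search` callback (standing in for the real trial-run search plus cache insert) are all hypothetical. The point is that each distinct conv is searched once, before NCCL or decode streams start running.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical: one conv op found while traversing the plan.
struct PlanConvOp {
  std::string key;        // serialized conv parameters (shapes, dtype, ...)
  std::size_t workspace;  // workspace size recorded in the plan
};

// Run the search once per distinct conv key before other streams start.
// Returns the number of searches performed.
int WarmUpAlgoCache(const std::vector<PlanConvOp>& ops,
                    const std::function<void(const PlanConvOp&)>& search) {
  std::set<std::string> seen;
  int searched = 0;
  for (const auto& op : ops) {
    if (!seen.insert(op.key).second) continue;  // already warmed
    search(op);
    ++searched;
  }
  return searched;
}
```

Deduplicating on the key alone matches the workspace-agnostic cache key described above.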

chengtbf requested a review from leaves-zwx on October 5, 2020 09:48
Comment on lines 147 to 148
// The best algorithm for larger workspace can also be used for smaller
// workspace
Contributor


// There might be a case where the best algorithm only requires pair.second.memory of workspace even though a workspace of pair.first was supplied.

oneflow-ci-bot merged commit a2af59e into master on Oct 5, 2020
oneflow-ci-bot deleted the dev_cudnn_conv_algo_cache branch on October 5, 2020 11:21
leaves-zwx (Contributor) commented on Oct 6, 2020

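  2. When the algorithm is inferred by trial runs, inference at runtime can be affected by other streams (e.g. NCCL or decode), which can lead to the wrong algorithm being chosen.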

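Does this mean that when the same algorithm is trial-run in one stream, different tasks running in other streams can also affect the result of that trial run?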

liujuncheng (Collaborator, Author)

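Yes. Kernels in the same stream execute serially and do not affect each other, but kernels in different streams may run concurrently. If kernels in other streams are running during the trial run, the measured time will inevitably be longer.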
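
A toy illustration of why this leads to a wrong choice: trial-run selection keeps the candidate with the smallest measured time, and contention only adds time to whichever candidate happened to overlap with another stream's kernel. `TrialResult` and `PickFastest` are hypothetical, and the timings are made up.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One trial run of a candidate algorithm (hypothetical layout).
struct TrialResult {
  std::string algo;
  double ms;  // measured time of the trial run
};

// Keep the candidate with the smallest measured time.
// Assumes `results` is non-empty.
std::string PickFastest(const std::vector<TrialResult>& results) {
  const TrialResult* best = &results.front();
  for (const auto& r : results) {
    if (r.ms < best->ms) best = &r;
  }
  return best->algo;
}
```

With undisturbed timings of 1.0 ms for A and 1.2 ms for B, A is chosen; if A's trial overlaps with an NCCL kernel and is measured at 1.5 ms, B is chosen and cached even though A is faster.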

liujuncheng added a commit that referenced this pull request Jun 3, 2021
* Add CudnnConvAlgoCache

* refine

Former-commit-id: a2af59e
4 participants