Add CudnnConvAlgoCache #3649
Conversation
// The best algorithm for larger workspace can also be used for smaller
// workspace
// There might be a case where only the memory size `pair.second.memory`
// was required by the best algorithm, even though a workspace of
// `pair.first` was supplied
Does this mean that when trial-running the same algorithm within one stream, different workloads in other streams can also affect the trial result?
Yes. Kernels within the same stream execute serially and do not interfere with each other, but kernels in different streams may execute concurrently. If other streams have kernels running during the trial run, the measured time will certainly become longer.
* Add CudnnConvAlgoCache
* refine

Former-commit-id: a2af59e
The current cuDNN conv algorithm cache has the following two problems:
Adding an extra global cache layer can mitigate the impact of problems 1 and 2 to some extent (in the multi-node case, problem 2 still cannot be avoided).
Because the workspace size used when inferring the algorithm at compile time is the cudnn buffer size, while at runtime the workspace size inferred at compile time is used, the cache key drops the workspace size information, and the cache is looked up based on the rule that "the best algorithm inferred with a larger workspace size also applies to a smaller workspace".
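As a rough illustration of the lookup rule above, here is a minimal, runnable C++ sketch (all type and field names are assumptions for illustration, not OneFlow's actual implementation): the cache key omits the workspace size, each entry records the workspace supplied at search time together with the memory the best algorithm actually needs, and a result found with a larger workspace is reused for a smaller one as long as the required memory still fits.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical perf record: the algorithm found and the workspace memory
// it actually requires (loosely mirrors cudnnConvolutionFwdAlgoPerf_t).
struct AlgoPerf {
  int algo;
  size_t memory;  // bytes actually required by this algorithm
};

// Cache keyed by the conv configuration WITHOUT the workspace size.
// Each entry keeps pairs of (workspace supplied at search time, best perf).
class CudnnConvAlgoCache {
 public:
  // Look up a cached result usable with `workspace` bytes available.
  // A result searched with a larger workspace is reusable for a smaller
  // one as long as the algorithm's required memory still fits.
  bool Find(const std::string& key, size_t workspace, AlgoPerf* out) const {
    auto it = cache_.find(key);
    if (it == cache_.end()) { return false; }
    for (const auto& pair : it->second) {
      // pair.first: workspace supplied when the search ran
      // pair.second.memory: memory the best algorithm actually needs
      if (pair.first >= workspace && pair.second.memory <= workspace) {
        *out = pair.second;
        return true;
      }
    }
    return false;
  }

  void Insert(const std::string& key, size_t workspace, const AlgoPerf& perf) {
    cache_[key].emplace_back(workspace, perf);
  }

 private:
  std::map<std::string, std::vector<std::pair<size_t, AlgoPerf>>> cache_;
};
```

For example, a result searched with a 1 MiB workspace whose best algorithm only needs 4 KiB can be reused when only 64 KiB is available, but must miss when only 1 KiB is available.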
TODO:
Solve problem 2 in the multi-node case. One candidate approach is to traverse the plan at runtime startup and "warm up" the CudnnConvAlgoCache; alternatively, we can discuss whether there is a better solution.
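The warm-up idea in the TODO could look roughly like the following minimal sketch (`PlanOp`, `SearchBestAlgo`, and the plain `std::map` cache are illustrative placeholders, not OneFlow's real plan schema or API): walk the plan once at startup and run the algorithm search for every distinct conv configuration, so the cache is fully populated before the first iteration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical op record extracted from the compiled plan; the field
// names are illustrative, not OneFlow's actual plan schema.
struct PlanOp {
  std::string type;      // e.g. "conv2d"
  std::string conv_key;  // cache key derived from the conv configuration
};

// Placeholder for the (expensive) cuDNN algorithm search; here it just
// returns a dummy algorithm id so the sketch stays runnable.
int SearchBestAlgo(const std::string& /*conv_key*/) { return 1; }

// Walk the plan once at runtime startup and run the algorithm search for
// every conv op, so the algo cache is already populated ("warmed up")
// before the first training iteration begins.
void WarmUpConvAlgoCache(const std::vector<PlanOp>& plan,
                         std::map<std::string, int>* cache) {
  for (const PlanOp& op : plan) {
    if (op.type != "conv2d") { continue; }
    if (cache->count(op.conv_key) != 0) { continue; }  // already cached
    (*cache)[op.conv_key] = SearchBestAlgo(op.conv_key);
  }
}
```

Note that with a per-node warm-up like this, each node still runs its own trial searches, so by itself it does not guarantee all nodes pick the same algorithm in the multi-node case; it only moves the trial runs out of the steady-state timeline.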