Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Operator's Python Stack Trace Debug Info #8028

Merged
merged 10 commits into from
Apr 22, 2022

Conversation

xiacijie
Copy link
Contributor

  1. add a parameter 'max_stack_depth' to specify the max depth for the stack trace
  2. refactor other debug related classes.

1. add a parameter 'max_stack_depth' to specify the max depth for the stack trace
2. refactor other debug related classes.
@xiacijie xiacijie requested a review from strint April 15, 2022 06:41
@xiacijie xiacijie changed the title improve the debug information for the python stack trace 改进Op的Python Stack Trace的Debug信息 Apr 15, 2022
@xiacijie
Copy link
Contributor Author

xiacijie commented Apr 15, 2022

相关的Issue:https://github.com/Oneflow-Inc/OneTeam/issues/1305

原来的Op 的Python Stack debug信息只会显示 两层,对于需要大于两层调用信息的Op来说,就无法去查看追踪了。

所以给Graph.debug() 函数增加了一个max_stack_depth 的参数来声明需要展示的最大Stack深度,默认值是2。并且在debug模式下,会显示还有多少Stack盏未显示 例如:( ... 8 more ) 代表还有8个栈帧未显示。

更改之前(深度一直是2):
image

更改之后(max_stack_depth 设为 4, 深度总共为8,还有4个未显示):
image

(max_stack_depth 设为8):
image

@xiacijie
Copy link
Contributor Author

@strint comments resolve了

@strint
Copy link
Contributor

strint commented Apr 18, 2022

@xiacijie
Copy link
Contributor Author

@strint debug 相关的test已更新。

另外有一个更新是 block和graph self._debug_max_v_level 在__init__方法底下的默认值为-1, 对应之前self._debug 为 False的情况。表示不调用debug函数的话,debug是默认关闭的。

Copy link
Contributor

@strint strint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@xiacijie
Copy link
Contributor Author

xiacijie commented Apr 21, 2022

@strint @BBuf nn.Graph.debug的外部接口没变,v_level 设置为-1代表关闭debug mode。

但内部实现里面我把之前删掉的self._debug 字段给加回来了,因为如果仅仅用 self._debug_max_v_level 大于0来判断是否开启了debug 模式,有个问题会是 glog里面的Flags_v 的初始值为0,所以在 class DebugScopeContext(object) 下面的 self._v = max(v_level, self._prev_v) 这一段代码会把一些没有开启debug的scope的v_level 给设置成0, 这样会错误的开启debug mode,逻辑就不对了。

所以我加了一个 self._debug来判断是否开启debug mode, 而不是之前用v_level 来判断是否开启debug mode。

@xiacijie xiacijie requested review from oneflow-ci-bot and removed request for oneflow-ci-bot April 22, 2022 04:16
@github-actions
Copy link
Contributor

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

@xiacijie xiacijie removed the request for review from oneflow-ci-bot April 22, 2022 06:57
@xiacijie xiacijie enabled auto-merge (squash) April 22, 2022 07:42
@xiacijie xiacijie requested a review from oneflow-ci-bot April 22, 2022 07:42
@github-actions
Copy link
Contributor

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8028/

@github-actions
Copy link
Contributor

Speed stats:
GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 128.6ms (= 12856.3ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 141.1ms (= 14111.2ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.10 (= 141.1ms / 128.6ms)

OneFlow resnet50 time: 80.6ms (= 8064.8ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 82.5ms (= 8245.2ms / 100, input_shape=[8, 3, 224, 224])
❌ Relative speed: 1.02 (= 82.5ms / 80.6ms)

OneFlow resnet50 time: 53.4ms (= 10688.8ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 58.8ms (= 11756.9ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.10 (= 58.8ms / 53.4ms)

OneFlow resnet50 time: 46.1ms (= 9229.5ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 38.9ms (= 7775.4ms / 200, input_shape=[2, 3, 224, 224])
❌ Relative speed: 0.84 (= 38.9ms / 46.1ms)

OneFlow resnet50 time: 38.8ms (= 7756.9ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 36.9ms (= 7383.7ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 0.95 (= 36.9ms / 38.8ms)

OneFlow swin dataloader time: 0.266s (= 53.164s / 200, num_workers=1)
PyTorch swin dataloader time: 0.151s (= 30.149s / 200, num_workers=1)
Relative speed: 0.567 (= 0.151s / 0.266s)

OneFlow swin dataloader time: 0.067s (= 13.361s / 200, num_workers=4)
PyTorch swin dataloader time: 0.040s (= 8.085s / 200, num_workers=4)
Relative speed: 0.605 (= 0.040s / 0.067s)

OneFlow swin dataloader time: 0.037s (= 7.424s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.464s / 200, num_workers=8)
Relative speed: 0.601 (= 0.022s / 0.037s)

✔️ OneFlow resnet50 time: 136.1ms (= 13606.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 162.1ms (= 16209.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 162.1ms / 136.1ms)

OneFlow resnet50 time: 92.2ms (= 9215.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 112.1ms (= 11205.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.22 (= 112.1ms / 92.2ms)

OneFlow resnet50 time: 64.0ms (= 12795.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 87.7ms (= 17541.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.37 (= 87.7ms / 64.0ms)

OneFlow resnet50 time: 51.9ms (= 10388.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.2ms (= 14235.7ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.37 (= 71.2ms / 51.9ms)

OneFlow resnet50 time: 51.3ms (= 10259.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 74.7ms (= 14932.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.46 (= 74.7ms / 51.3ms)

@github-actions
Copy link
Contributor

CI failed when running job: cuda-module. PR label automerge has been removed

@xiacijie xiacijie merged commit 707eec8 into master Apr 22, 2022
@xiacijie xiacijie deleted the improve_python_stack_debug_info branch April 22, 2022 15:19
@xiacijie xiacijie changed the title 改进Op的Python Stack Trace的Debug信息 improve Operator's Python Stack Trace Debug Info Apr 22, 2022
@xiacijie xiacijie changed the title improve Operator's Python Stack Trace Debug Info Improve Operator's Python Stack Trace Debug Info Apr 22, 2022
xiacijie added a commit that referenced this pull request Apr 22, 2022
@xiacijie xiacijie restored the improve_python_stack_debug_info branch April 22, 2022 16:28
jackalcooper added a commit that referenced this pull request Apr 24, 2022
@@ -36,4 +36,13 @@ void SetGraphVerboseStepLr(bool verbose) {
*graph_verbose_step_lr = verbose;
}

std::atomic<int32_t>* GetGraphDebugMaxPyStackDepthVar() {
static std::atomic<int32_t> graph_debug_max_py_stack_depth{2};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

现在有地方可以方便复现不,感觉是这里的问题:

static std::atomic<int32_t> graph_debug_max_py_stack_depth

改成

static thread_local int32_t graph_debug_max_py_stack_depth

试试

std::string back_f_str = PyObjectToReprStr((PyObject*)back_frame).GetOrThrow();
cur_f_str = "Python stack[-2]: " + back_f_str + "; " + cur_f_str;
Py_XDECREF(back_frame);
std::string get_cur_frame_stack_str() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这些地方是不是有些过度设计了。既然当前有DebugScopeContext。那是不是可以注入一个std::function<std::string()>类型的GetCurrentFrameStackStr的函数,这个函数的实现是python。

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

python层来获取 stack frame str,传递到 c++ 层?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c++调用python代码。

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@caishenghang 看看行不行

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里背景是在 c api functor 接口处,统一为 op 生成 python 代码的 location。

目前看都在 c++ 层调用 python c api 做是直接的。调用 python 代码反而是间接的。

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个我和xiaoyu讨论过了,我觉得不是过度设计的问题,我觉得奇怪的是这里用python的一个类来提供接口,但是C++里面却用一个静态变量来存配置。xiaoyu 指出这是参考autograd的实现来写的。
我觉得比较合理的做法是配置放在debug context上,然后一个或多个graph持有这个debug context的引用。
现在这样的做法看上去在简单的需求中没问题,但是如果以后graph的用法更灵活了,就会有问题,比如graph被提取subgraph,graph被clone,并行编译 graph,而 debug context 可能会被添加更多的变量来实现新功能。

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

debug context 可能会被添加更多的变量

嗯,现在这里创建了2个变量,之前只有一个。后面有计划做一次合并,用一个对象来统一处理。

@@ -29,6 +29,7 @@ limitations under the License.
#include "oneflow/core/common/util.h"
#include "oneflow/core/job/lazy_mode.h"
#include "oneflow/core/framework/op_interpreter/dispatch_frame.h"
#include "oneflow/api/python/env/env.h"
Copy link
Contributor

@strint strint Apr 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

目前发现,是这个include带来的问题。

只 inculde 本文件依赖的两个函数,不 include env.h 中其它代码,就避免了 https://github.com/Oneflow-Inc/OneTeam/issues/1343 中的问题。

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

是不是有什么全局变量的问题,一旦include会造成一个全局变量存在多个版本?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

是不是有什么全局变量的问题

嗯,这种可能性最大。中间发现了一个,但是试验了一下没有解决问题。

这里有个总结:https://github.com/Oneflow-Inc/OneTeam/issues/1343#issuecomment-1107836666

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这一般是不好的实践造成的,譬如在头文件里定义了全局变量,会导致每次被include一次就定义一次

@xiacijie
Copy link
Contributor Author

刚刚我开了一个新branch: https://github.com/Oneflow-Inc/oneflow/tree/fix_import_oneflow_seg_fault
这里面就没有include env.h头文件,准备让 @yoonlee888 帮测一下

@chengtbf chengtbf deleted the improve_python_stack_debug_info branch April 13, 2023 08:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants