Closed
Description
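I raised this problem in #3899 (comment), and @qingqing01 ran related experiments in #4107 (comment). The overall symptom is that when an input/output parameter name is misspelled in a Python unit test, the test crashes outright and reports SegFault. Taking case 2 from #4107 (comment) as the example again, we deliberately misspell an input name in test_mul_op.py (Y -> Y0), as follows: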
class TestMulOp(OpTest):
    def setUp(self):
        self.op_type = "mul"
        self.inputs = {
            'X': np.random.random((32, 84)).astype("float32"),
            'Y0': np.random.random((84, 100)).astype("float32")
        }
        self.outputs = {'Out': np.dot(self.inputs['X'], self.inputs['Y0'])}

    def test_check_output(self):
        self.check_output()
Result of running the unit test:
126: Test timeout computed to be: 9.99988e+06
1/1 Test #126: test_mul_op ......................***Exception: SegFault 45.98 sec
0% tests passed, 1 tests failed out of 1
Total Test time (real) = 46.00 sec
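The OperatorBase constructor checks whether each input/output parameter name exists (https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.cc#L165 ). If a name does not exist, the ENFORCE fails and produces the error message and stack trace shown in #3899 (comment). So what actually happens here is probably this: the Python side allows some input/output parameters to be left unset and still treats the parameter name as existing, but does not create the corresponding Variable. An op therefore needs to check whether the Variable is null when it is written.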
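The failure mode described above can be illustrated with a small toy model. This is only a sketch: `Tensor`, `Scope`, `find_var`, and both `infer_shape_*` functions are hypothetical stand-ins written for this example, not Paddle's API. In Python a missing variable surfaces as an `AttributeError`, whereas the equivalent C++ code dereferences a null pointer and crashes.

```python
# Toy model only: every class and function here is hypothetical,
# written to illustrate the issue; none of it is Paddle's API.


class Tensor:
    def __init__(self, dims):
        self.dims = dims


class Scope:
    """Maps variable names to tensors; unknown names yield None."""

    def __init__(self, variables):
        self._variables = dict(variables)

    def find_var(self, name):
        return self._variables.get(name)


def infer_shape_unchecked(scope):
    # Uses the lookup result without checking it. In C++ this would be
    # a null-pointer dereference; in Python it raises AttributeError.
    return scope.find_var("X").dims, scope.find_var("Y").dims


def infer_shape_checked(scope):
    # Checks each lookup first and fails with a descriptive message.
    dims = []
    for name in ("X", "Y"):
        var = scope.find_var(name)
        if var is None:
            raise RuntimeError("Input(%s) of MulOp should not be null." % name)
        dims.append(var.dims)
    return tuple(dims)


# The test supplied "Y0" instead of "Y", so no variable named "Y" exists.
scope = Scope({"X": Tensor((32, 84)), "Y0": Tensor((84, 100))})

try:
    infer_shape_checked(scope)
    message = None
except RuntimeError as err:
    message = str(err)
```

With the check in place, the misspelled name produces a readable error that names the missing input instead of an opaque crash.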
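Many op implementations do not check whether their input/output variables are null and dereference them directly, which is what causes the SegFault. For example, in mul_op.cc: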
class MulOp : public framework::OperatorWithKernel {
 public:
  using framework::OperatorWithKernel::OperatorWithKernel;

 protected:
  void InferShape(const framework::InferShapeContext &ctx) const override {
    auto x_dims = ctx.Input<Tensor>("X")->dims();
    auto y_dims = ctx.Input<Tensor>("Y")->dims();
    ...
I added PADDLE_ENFORCE_NOT_NULL checks to mul_op.cc:
$ git diff mul_op.cc
diff --git a/paddle/operators/mul_op.cc b/paddle/operators/mul_op.cc
index 015e13d..15b48b8 100644
--- a/paddle/operators/mul_op.cc
+++ b/paddle/operators/mul_op.cc
@@ -26,6 +26,8 @@ class MulOp : public framework::OperatorWithKernel {
  protected:
   void InferShape(const framework::InferShapeContext &ctx) const override {
+    PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"), "Input(X) of MulOp should not be null.");
+    PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Y"), "Input(Y) of MulOp should not be null.");
     auto x_dims = ctx.Input<Tensor>("X")->dims();
     auto y_dims = ctx.Input<Tensor>("Y")->dims();
     int x_num_col_dims = Attr<int>("x_num_col_dims");
Running the same unit test case again gives the following result:
138: ======================================================================
138: ERROR: test_check_output (__main__.TestMulOp)
138: ----------------------------------------------------------------------
138: Traceback (most recent call last):
138: File "test_mul_op.py", line 16, in test_check_output
138: self.check_output()
138: File "/home/liuyiqun01/github/Paddle/python/paddle/v2/framework/tests/op_test.py", line 211, in check_output
138: self.check_output_with_place(place)
138: File "/home/liuyiqun01/github/Paddle/python/paddle/v2/framework/tests/op_test.py", line 183, in check_output_with_place
138: self.op.infer_shape(self.scope)
138: RuntimeError: ctx.InputVar("Y") should not be null
138: Input(Y) of MulOp should not be null. at [/home/liuyiqun01/github/Paddle/paddle/operators/mul_op.cc:30]
138: PaddlePaddle Call Stacks:
138: 0 0x7f26ca7917a8p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 648
138: 1 0x7f26ca82af7fp paddle::operators::MulOp::InferShape(paddle::framework::InferShapeContext const&) const + 2943
138: 2 0x7f26ca7b5cb1p paddle::framework::OperatorWithKernel::InferShape(paddle::framework::Scope const&) const + 33
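Therefore, every op implementation must use PADDLE_ENFORCE_NOT_NULL to check that its inputs/outputs are not null. To make sure all of these checks are present, each parameter name can be deliberately misspelled in the unit test to verify that the check fires.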
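The verification idea can be sketched as a small helper that misspells one parameter name at a time and records which misspellings go undetected. Everything here is hypothetical: `misspelled_variants`, `unchecked_params`, and the stand-in `run_op` are written for this example and are not part of Paddle's test framework. Note also that a real SegFault kills the process and cannot be caught with `try`/`except`, so in practice each variant would probably need to run as its own test case.

```python
def misspelled_variants(params, suffix="0"):
    """Yield (name, variant) pairs where exactly one key is renamed."""
    for name in params:
        variant = {
            (key + suffix if key == name else key): value
            for key, value in params.items()
        }
        yield name, variant


def run_op(inputs):
    # Hypothetical stand-in for an op that checks all of its inputs.
    for name in ("X", "Y"):
        if inputs.get(name) is None:
            raise RuntimeError("Input(%s) should not be null." % name)
    return "ok"


def unchecked_params(params, run):
    """Return the names whose misspelling does not raise RuntimeError."""
    missing = []
    for name, variant in misspelled_variants(params):
        try:
            run(variant)
        except RuntimeError:
            continue  # the check fired, as intended
        missing.append(name)
    return missing


inputs = {"X": [[1.0]], "Y": [[2.0]]}
result = unchecked_params(inputs, run_op)
```

An empty `result` means every misspelled name was caught by a check; any name that remains in the list points at an input whose null check is missing.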
Yancey0623