
[PIR][Dy2St] Set TensorArray dtype after first write if its type is undefined #67981


SigureMo
Member

@SigureMo SigureMo commented Sep 3, 2024

PR Category

Execute Infrastructure

PR Types

Bug fixes

Description

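import paddle

@paddle.jit.to_static
def tensor_array_dtype():
    l = []
    # Under Dy2St, appending inside a Tensor-driven loop turns `l` into a TensorArray
    for i in range(paddle.to_tensor(3)):
        l.append(i)  # `i` is an int64 Tensor
    return l[0]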

In the case above, Dy2St creates a float32 TensorArray for `l`, but int64 data is appended to it later. There are two ways we could handle this:

  • Cast the data to the array's dtype (float32)
  • Raise an error because of the type mismatch

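Originally this raised an error outright; a cast was added later. Both approaches are problematic: the error forces a large amount of existing code to be modified, while the cast means every element read back out is float32.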
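To make the trade-off between the two earlier strategies concrete, here is a minimal pure-Python model. It is a sketch, not Paddle code: `ToyArray`, `write_strict`, and `write_cast` are hypothetical names, dtypes are plain strings, and Python's `float()` stands in for a real cast.

```python
class ToyArray:
    """Hypothetical stand-in for a TensorArray with a fixed element dtype."""

    def __init__(self, dtype):
        self.dtype = dtype
        self.items = []


def write_strict(arr, value, dtype):
    # Strategy "raise an error": reject writes whose dtype differs from the array's.
    if dtype != arr.dtype:
        raise TypeError(f"cannot write {dtype} into Array<{arr.dtype}>")
    arr.items.append(value)


def write_cast(arr, value, dtype):
    # Strategy "cast": silently convert the value to the array's dtype.
    # Only a float32 target is modelled here.
    if arr.dtype == "float32":
        value = float(value)
    arr.items.append(value)


# The array is created as float32 before any element has been seen.
strict = ToyArray("float32")
try:
    write_strict(strict, 0, "int64")
except TypeError as e:
    print("error:", e)

cast = ToyArray("float32")
write_cast(cast, 0, "int64")
print(type(cast.items[0]).__name__)  # the int64 value comes back as a float
```

The strict variant stops existing user code that was valid in dynamic mode, while the cast variant accepts it but changes the element type the user reads back.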

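Because this started as dynamic-graph code and the user never annotated the type, the list's type should be Array<Unknown> from the start; when a later append(i64) is seen, it should be inferred as Array<i64>. This follows the usual principles of type inference in compilers.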

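From the perspective of the Paddle framework, or more precisely PIR, UndefinedDataType represents a type that has not been determined yet; it corresponds to DataType::UNDEFINED in phi. We set the element dtype of TensorArrays created by Dy2St to undefined, and ArrayWriteInferMeta, triggered by a later append, sets it to the dtype of the first element written.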
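The first-write rule can be modelled in a few lines of Python. This is a hypothetical sketch, not Paddle's implementation: the real `ArrayWriteInferMeta` is a C++ function in phi, while `DataType`, `ArrayMeta`, and `array_write_infer_meta` below are illustrative stand-ins. The sketch only covers the undefined case; an already-defined dtype is left unchanged, and how Paddle handles a later dtype mismatch is not modelled.

```python
from enum import Enum


class DataType(Enum):
    # A few members mirroring phi's DataType; UNDEFINED means "not known yet".
    UNDEFINED = "undefined"
    FLOAT32 = "float32"
    INT64 = "int64"


class ArrayMeta:
    """Hypothetical metadata record for a TensorArray value."""

    def __init__(self, dtype=DataType.UNDEFINED):
        self.dtype = dtype


def array_write_infer_meta(array_meta, x_dtype):
    # Hypothetical analogue of the rule: the first write fixes an undefined dtype.
    if array_meta.dtype == DataType.UNDEFINED:
        array_meta.dtype = x_dtype
    # An already-defined dtype is left untouched (mismatch handling is out of scope).
    return array_meta


meta = ArrayMeta()                            # `l = []` becomes an undefined array
array_write_infer_meta(meta, DataType.INT64)  # `l.append(i)` with int64 `i`
print(meta.dtype.value)                       # int64
```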

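Since Dy2St converts a list into a TensorArray only when it captures append or related operations, any array created with UndefinedDataType is guaranteed to get a concrete type during program construction; once construction finishes, no TensorArray with UndefinedDataType remains.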
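The post-construction guarantee can be expressed as an invariant check. The following is a minimal sketch under stated assumptions: the program is modelled as a list of hypothetical `(op, array_name, dtype)` records, `build` and `check_no_undefined` are illustrative helpers, and the labels `"create_array"` and `"array_write"` are not exact PIR op names.

```python
UNDEFINED = "undefined"


def build(ops):
    """Replay hypothetical (op, array_name, dtype) records; return final array dtypes."""
    dtypes = {}
    for op, name, dtype in ops:
        if op == "create_array":
            # Arrays created by Dy2St start with an undefined element dtype.
            dtypes[name] = UNDEFINED
        elif op == "array_write":
            # The first write fixes the dtype; later writes leave it as is.
            if dtypes[name] == UNDEFINED:
                dtypes[name] = dtype
    return dtypes


def check_no_undefined(dtypes):
    # Post-construction invariant: no array may still have an undefined dtype.
    leftover = [name for name, dt in dtypes.items() if dt == UNDEFINED]
    if leftover:
        raise RuntimeError(f"arrays with undefined dtype: {leftover}")


# `l` only becomes an array because an append was captured,
# so its create_array is always followed by an array_write.
ops = [
    ("create_array", "l", None),
    ("array_write", "l", "int64"),
]
final = build(ops)
check_no_undefined(final)
print(final)  # {'l': 'int64'}
```

An array with no captured write would fail the check, which is exactly the case the conversion rule rules out.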

Pcard-67164


paddle-bot bot commented Sep 3, 2024

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@PaddlePaddle PaddlePaddle locked and limited conversation to collaborators Sep 4, 2024
@PaddlePaddle PaddlePaddle unlocked this conversation Sep 4, 2024
@SigureMo SigureMo changed the title [Dy2St] Set TensorArray dtype after first write if its type is undefined [PIR][Dy2St] Set TensorArray dtype after first write if its type is undefined Sep 13, 2024
@@ -136,6 +137,7 @@ class Builder {
template <typename OpTy, typename... Args>
OpTy Build(Args &&...args);

IR_API UndefinedDataType undefined_data_type();
Contributor


Could this simply be called UndefinedType? The "Data" feels a bit redundant; personally, I think shorter names are better as long as they are unambiguous.

winter-wang
winter-wang previously approved these changes Sep 14, 2024
Contributor

@winter-wang winter-wang left a comment


Great work!

changeyoung98
changeyoung98 previously approved these changes Sep 14, 2024
Contributor

@changeyoung98 changeyoung98 left a comment


LGTM

@SigureMo SigureMo dismissed stale reviews from changeyoung98 and winter-wang via 13f5d88 September 14, 2024 03:20
Contributor

@zhangbo9674 zhangbo9674 left a comment


LGTM

@SigureMo SigureMo changed the title [PIR][Dy2St] Set TensorArray dtype after first write if its type is undefined [Typing][PIR][Dy2St] Set TensorArray dtype after first write if its type is undefined Sep 14, 2024
@SigureMo SigureMo changed the title [Typing][PIR][Dy2St] Set TensorArray dtype after first write if its type is undefined [PIR][Dy2St] Set TensorArray dtype after first write if its type is undefined Sep 14, 2024
@SigureMo SigureMo merged commit b99a3cd into PaddlePaddle:develop Sep 16, 2024
28 of 30 checks passed
@SigureMo SigureMo deleted the dy2st/set-tensor-array-dtype-after-first-write-if-its-type-is-undefined branch September 16, 2024 05:22

4 participants