
Conversation

@SunNy820828449 (Contributor)

PR types

Performance optimization

PR changes

OPs

Describe

Optimize the index computation.

Paddle vs PyTorch

| Shape | Axis | Shift | Paddle (old) | PyTorch | Paddle (new) |
|---|---|---|---|---|---|
| (1000, 2000) | (1) | (5) | avg ~25.5 us | avg ~24 us | avg ~23.75 us |
| (1000, 2000) | (0) | (5) | avg ~24.9 us | avg ~24 us | avg ~22.8 us |
| (1000, 2000) | (0, 1) | (5, 5) | avg ~37.7 us | avg ~48 us | avg ~31.4 us |


paddle-bot-old bot commented Jul 1, 2021

Thanks for your contribution!
Please wait for the CI result first. See the Paddle CI Manual for details.

```cpp
// Previous per-dimension index computation:
dim_idx = (idx / strides[i]) % sizes[i];
dim_idx_shift = (dim_idx + shifts[i]) % sizes[i];
output_idx = output_idx + (dim_idx_shift - dim_idx) * strides[i];
```

```cpp
// Optimized computation under review:
dim_idx = (idx / strides[i]) % sizes[i] + shifts[i];
```
Contributor: The variable name should reflect what it actually represents; here it should keep the original name dim_idx_shift, and the temporary variable dim_idx, no longer needed, should be deleted.

Contributor Author: It is not dim_idx_shift; it is an estimate of the new dim_idx position.

```cpp
int64_t output_idx = idx;
int64_t dim_idx, dim_idx_shift;

#pragma unroll Rank
```
Contributor: Wouldn't just `#pragma unroll` be enough here?

Contributor Author: Done, thanks.

@Xreki Xreki left a comment

LGTM

@Xreki Xreki merged commit d128c28 into PaddlePaddle:develop Jul 7, 2021