Closed
Description
In the ForRange struct, the number of threads per block seems to be assigned an arbitrary value that is not a multiple of the warp size.
From what I have read and heard, the number of threads assigned to a block should always be a multiple of the warp size (32). Otherwise the remaining lanes of the last warp go unused, and performance may also drop because of poor memory coalescing. However, I have not found a comparative experiment on this.
Paddle/paddle/platform/for_range.h
Lines 65 to 75 in 7bf47ea
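To make the suggestion concrete, here is a minimal host-side sketch of how the block size could be rounded up to a multiple of the warp size. This is not the code in for_range.h: the names `kWarpSize`, `kMaxBlockThreads`, `LaunchConfig`, `RoundUpToWarp`, and `MakeLaunchConfig` are hypothetical, and the 1024-thread per-block cap is an assumption that holds on current NVIDIA GPUs but should be checked against the target device.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Assumption: warp size of 32, as on current NVIDIA GPUs.
constexpr std::size_t kWarpSize = 32;
// Assumption: at most 1024 threads per block; query the device to be sure.
constexpr std::size_t kMaxBlockThreads = 1024;

// Hypothetical holder for the two launch parameters.
struct LaunchConfig {
  std::size_t block;  // threads per block
  std::size_t grid;   // number of blocks
};

// Round n up to the next multiple of the warp size.
inline std::size_t RoundUpToWarp(std::size_t n) {
  return (n + kWarpSize - 1) / kWarpSize * kWarpSize;
}

// Choose a warp-aligned block size for `limit` elements, then enough
// blocks to cover all of them.
inline LaunchConfig MakeLaunchConfig(std::size_t limit) {
  assert(limit > 0);
  const std::size_t block = std::min(RoundUpToWarp(limit), kMaxBlockThreads);
  const std::size_t grid = (limit + block - 1) / block;  // ceiling division
  return LaunchConfig{block, grid};
}
```

With a warp-aligned block size, `block * grid` can exceed `limit`, so the kernel would still need an `idx < limit` bounds check before touching memory. Whether the alignment gives a measurable speedup here would have to be confirmed with a benchmark.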