-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
Fix performance regression in hvcat of simple matrices #57422
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
I don't think the test failure is related? It occurred testing |
There's unfortunately extra overhead for everything else not intended to be addressed, looks like mostly because EDIT: Ugh, there seems to be an annoying trade-off in performance. I'll need to explore further. |
I believe I got it. The overhead of the block copy was too much for small arrays, so I added a branch to use the original loop for those. Crossover point seemed to be around 4-8 elements, so I branched at >4. Two other aspects addressed:
EDIT: There was one trade-off I didn't find an optimal solution for, and settled on resolving in favor of all-arrays as the more common choice (no change from master). If the elements to cat are all arrays, then the dimension-calculation in |
As pointed out by @Zentrik here, recent Nanosoldier runs suggested a significant performance regression for simple
hvcat
s as a result of #39729 .I revisted the code and determined that the main cause was that
typed_hvncat
iterated over each element and had to calculate the linear index for each one, resulting in many multiplication and addition operations for each element. I realized thatCartesianIndices
could be used to restore the copy-as-a-whole pattern thattyped_hvcat
used, while retaining generality for arbitrary dimensions.As I recall, I believe a limitation when I wrote the
hvncat
code was that certain features were not available during compiler bootstrapping, requiring fully manual indexing. Since the compiler has been made an stdlib, I believe that made this PR possible.Before merging I would also want to check that I didn't hurt theDonehvncat
performance at all.This should ideally be marked for 1.12 backport.