Conversation

@swolchok
Contributor

@swolchok swolchok commented Mar 7, 2025

Now all the apply functions share a common implementation, which means further changes (e.g., parallel_for, generating specialized dtypes for the case where all inputs have the same type) don't need to be repeated 3 times.

(Interestingly, this seems to increase the effectiveness of the following parallelization change. Not entirely sure why, but I checked the generated code for optimized op_where and it seems to have improved, which is surprising.)
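As a rough illustration of the refactor described above, here is a minimal sketch of one variadic helper with thin unary, binary, and ternary wrappers on top. Every name here (`apply_elementwise_sketch` and the `apply_*_sketch` wrappers) is hypothetical, and the sketch ignores broadcasting and dtype conversion, so it should not be read as the actual ExecuTorch code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical shared implementation: applies `fn` elementwise across any
// number of equally sized input buffers. Broadcasting and dtype handling,
// which real kernels need, are deliberately left out.
template <typename Op, typename... Inputs>
void apply_elementwise_sketch(
    const Op& fn, float* out, std::size_t n, const Inputs*... inputs) {
  assert(out != nullptr);
  for (std::size_t i = 0; i < n; ++i) {
    // The pack expansion passes the i-th element of each input to fn.
    out[i] = fn(inputs[i]...);
  }
}

// Thin wrappers: each arity forwards to the one shared loop, so a later
// change (for example, parallelizing the loop) is made in one place.
template <typename Op, typename A>
void apply_unary_sketch(const Op& fn, const A* a, float* out, std::size_t n) {
  apply_elementwise_sketch(fn, out, n, a);
}

template <typename Op, typename A, typename B>
void apply_binary_sketch(
    const Op& fn, const A* a, const B* b, float* out, std::size_t n) {
  apply_elementwise_sketch(fn, out, n, a, b);
}

template <typename Op, typename A, typename B, typename C>
void apply_ternary_sketch(
    const Op& fn, const A* a, const B* b, const C* c, float* out,
    std::size_t n) {
  apply_elementwise_sketch(fn, out, n, a, b, c);
}
```

A where-style op would, under these assumptions, call `apply_ternary_sketch` with a lambda such as `[](bool cond, float x, float y) { return cond ? x : y; }`.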

swolchok added 30 commits March 4, 2025 11:35
[ghstack-poisoned]
swolchok added 10 commits March 10, 2025 18:48
[ghstack-poisoned]
ctx,
(internal::check_tensor_dtype(a, a_dtypes, compute_type) &&
internal::check_tensor_dtype(out, out_dtypes, compute_type)),
(check_input_dtype(inputs, compute_type) && ...) &&
Contributor

TIL fold expressions in cpp
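For readers new to the feature, here is a small standalone sketch of a C++17 fold expression over `&&`, in the same shape as the reviewed line. The types and the dtype rule (`DtypeSketch`, `TensorSketch`, `check_input_dtype_sketch`, `all_inputs_ok_sketch`) are made up for illustration and are not the real ExecuTorch definitions.

```cpp
// Hypothetical stand-ins for a dtype enum and a tensor.
enum class DtypeSketch { Bool, Int, Float };

struct TensorSketch {
  DtypeSketch dtype;
};

// Assumed rule for this sketch only: an input is acceptable when its dtype
// equals the compute type.
constexpr bool check_input_dtype_sketch(
    const TensorSketch& t, DtypeSketch compute_type) {
  return t.dtype == compute_type;
}

template <typename... Tensors>
constexpr bool all_inputs_ok_sketch(
    DtypeSketch compute_type, const Tensors&... inputs) {
  // Unary right fold over &&: for inputs t1, t2, t3 this expands to
  //   check(t1) && (check(t2) && check(t3))
  // and short-circuits at the first failure. An empty pack yields true.
  return (check_input_dtype_sketch(inputs, compute_type) && ...);
}
```

The fold replaces a hand-written chain of `&&` checks, one per input, so the same expression works for any number of inputs.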

[ghstack-poisoned]
Base automatically changed from gh/swolchok/326/head to main March 12, 2025 19:14
@swolchok swolchok merged commit f91ebe2 into main Mar 12, 2025
50 of 53 checks passed
@swolchok swolchok deleted the gh/swolchok/327/head branch March 12, 2025 19:17
kedarnath03 pushed a commit to kedarnath03/executorch that referenced this pull request Jun 25, 2025
ghstack-source-id: 9344bf6
ghstack-comment-id: 2707405611
Pull Request resolved: pytorch/executorch#9058
Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
4 participants