Skip to content

[SYCL] Optimize UR 2 PI convert #8737

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 3, 2023
Merged

Conversation

LukaszJobczyk
Copy link
Contributor

Signed-off-by: Lukasz Jobczyk lukasz.jobczyk@intel.com

@LukaszJobczyk LukaszJobczyk marked this pull request as ready for review March 22, 2023 14:05
@LukaszJobczyk LukaszJobczyk requested a review from a team as a code owner March 22, 2023 14:05
@LukaszJobczyk LukaszJobczyk temporarily deployed to aws March 22, 2023 14:52 — with GitHub Actions Inactive
@LukaszJobczyk LukaszJobczyk temporarily deployed to aws March 22, 2023 16:28 — with GitHub Actions Inactive
@aelovikov-intel
Copy link
Contributor

Another option might be to implement a small constexpr map-like helper, something like https://www.youtube.com/watch?v=INn3xa4pMfg

@smaslov-intel
Copy link
Contributor

Another option might be to implement a small constexpr map-like helper, something like https://www.youtube.com/watch?v=INn3xa4pMfg

This seems to end up doing linear search, where as "switch" will likely end up in an indirect access, which is faster.

@aelovikov-intel
Copy link
Contributor

This seems to end up doing linear search, where as "switch" will likely end up in an indirect access, which is faster.

Yes, but then we can also optimize the pattern inside the compiler as it would be pretty straightforward.

@smaslov-intel
Copy link
Contributor

This seems to end up doing linear search, where as "switch" will likely end up in an indirect access, which is faster.

Yes, but then we can also optimize the pattern inside the compiler as it would be pretty straightforward.

I don't see how it could potentially do better than a "switch"

@@ -70,7 +81,7 @@ class ConvertHelper : public ReturnHelper {
public:
// Convert the value using a conversion map
template <typename TypeUR, typename TypePI>
pi_result convert(const std::unordered_map<TypeUR, TypePI> &Map) {
pi_result convert(std::function<std::optional<TypePI>(TypeUR)> Func) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need the optional stuff at all? A conversion should always be defined, and we should die at line 99 if it is not

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I see, you return std::nullopt when there is no conversion defined, and then die if it's conversion to bool is "false". Why not just die directly in the conversion switches directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved die as default in switch statement and remove std::optional

@aelovikov-intel
Copy link
Contributor

I don't see how it could potentially do better than a "switch"

It could be exactly as performant as switch but more concise in the actual "mapping" part: https://godbolt.org/z/7zcrq3Wzb

@smaslov-intel
Copy link
Contributor

I don't see how it could potentially do better than a "switch"

It could be exactly as performant as switch but more concise in the actual "mapping" part: https://godbolt.org/z/7zcrq3Wzb

I don't feel strongly about it and will let PR author to decide. But IMHO paying (presumably) lots of compile-time for (a little) better readability is not worth it. Thanks for pointing this anyways!

Copy link
Contributor

@jandres742 jandres742 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smaslov-intel @LukaszJobczyk
could we not merge this? we are in the process of merging the UR L0, and this is very big change that would force to do a big rebase and fix conflicts. Let's do it after it.

@LukaszJobczyk LukaszJobczyk temporarily deployed to aws March 24, 2023 10:05 — with GitHub Actions Inactive
@LukaszJobczyk LukaszJobczyk temporarily deployed to aws March 24, 2023 10:15 — with GitHub Actions Inactive
@LukaszJobczyk LukaszJobczyk force-pushed the optimizeur2pi branch 4 times, most recently from 06e682f to 9064adc Compare March 24, 2023 11:01
@LukaszJobczyk LukaszJobczyk temporarily deployed to aws March 24, 2023 11:26 — with GitHub Actions Inactive
@LukaszJobczyk
Copy link
Contributor Author

I don't see how it could potentially do better than a "switch"

It could be exactly as performant as switch but more concise in the actual "mapping" part: https://godbolt.org/z/7zcrq3Wzb

I don't feel strongly about it and will let PR author to decide. But IMHO paying (presumably) lots of compile-time for (a little) better readability is not worth it. Thanks for pointing this anyways!

Thanks for video about constexpr maps!
IMO switch cases look simpler

@LukaszJobczyk
Copy link
Contributor Author

@smaslov-intel @LukaszJobczyk could we not merge this? we are in the process of merging the UR L0, and this is very big change that would force to do a big rebase and fix conflicts. Let's do it after it.

Why? As I see you are adding new enums to that file, what is the difference between adding to map and adding to switch/case?

@LukaszJobczyk LukaszJobczyk requested a review from bader as a code owner March 24, 2023 11:56
@LukaszJobczyk LukaszJobczyk temporarily deployed to aws March 24, 2023 12:26 — with GitHub Actions Inactive
@LukaszJobczyk LukaszJobczyk temporarily deployed to aws March 24, 2023 13:15 — with GitHub Actions Inactive
@LukaszJobczyk LukaszJobczyk temporarily deployed to aws March 29, 2023 10:48 — with GitHub Actions Inactive
@LukaszJobczyk LukaszJobczyk temporarily deployed to aws March 29, 2023 12:25 — with GitHub Actions Inactive
@LukaszJobczyk LukaszJobczyk temporarily deployed to aws March 29, 2023 12:59 — with GitHub Actions Inactive
@jandres742
Copy link
Contributor

@bader

this PR has this test failing:

********************
Failed Tests (2):
  SYCL :: AtomicRef/atomic_memory_order_acq_rel.cpp

but saw it in another PR and for me it fails also with baseline. Do you know of any issues with that test?

@LukaszJobczyk
Copy link
Contributor Author

@bader

this PR has this test failing:

********************
Failed Tests (2):
  SYCL :: AtomicRef/atomic_memory_order_acq_rel.cpp

but saw it in another PR and for me it fails also with baseline. Do you know of any issues with that test?

It also fails in CUDA test suite

@aelovikov-intel
Copy link
Contributor

We have #8847 for it.

for me it fails also with baseline

Does it happen all the time or is it flaky?

@LukaszJobczyk
Copy link
Contributor Author

We have #8847 for it.

for me it fails also with baseline

Does it happen all the time or is it flaky?

Today, it happened in every force push

@jandres742
Copy link
Contributor

We have #8847 for it.

for me it fails also with baseline

Does it happen all the time or is it flaky?

same for me. it is very deterministic.

@jandres742
Copy link
Contributor

@aelovikov-intel @bader

same it is happening with:

********************
Failed Tests (1):
  SYCL :: GroupAlgorithm/reduce_sycl2020.cpp

failing also with CUDA. do we have a known issue for this also?

@jandres742
Copy link
Contributor

@aelovikov-intel @bader

same it is happening with:

********************
Failed Tests (1):
  SYCL :: GroupAlgorithm/reduce_sycl2020.cpp

failing also with CUDA. do we have a known issue for this also?

Seems like it is being fixed in #8860

@LukaszJobczyk LukaszJobczyk temporarily deployed to aws March 30, 2023 06:22 — with GitHub Actions Inactive
@LukaszJobczyk LukaszJobczyk temporarily deployed to aws March 30, 2023 11:46 — with GitHub Actions Inactive
@LukaszJobczyk LukaszJobczyk temporarily deployed to aws March 30, 2023 16:00 — with GitHub Actions Inactive
@LukaszJobczyk LukaszJobczyk temporarily deployed to aws March 30, 2023 17:52 — with GitHub Actions Inactive
@bader
Copy link
Contributor

bader commented Mar 31, 2023

@LukaszJobczyk, could you resolve merge, conflicts, please?

@LukaszJobczyk
Copy link
Contributor Author

@LukaszJobczyk, could you resolve merge, conflicts, please?

Done

@LukaszJobczyk LukaszJobczyk temporarily deployed to aws March 31, 2023 05:23 — with GitHub Actions Inactive
@LukaszJobczyk LukaszJobczyk temporarily deployed to aws March 31, 2023 07:14 — with GitHub Actions Inactive
@bader
Copy link
Contributor

bader commented Apr 1, 2023

@LukaszJobczyk, could you resolve merge, conflicts, please?

Done

It looks like, some other PR went before this PR and we have another conflict. @LukaszJobczyk, could you take a look, please?

Signed-off-by: Lukasz Jobczyk <lukasz.jobczyk@intel.com>
@LukaszJobczyk
Copy link
Contributor Author

@LukaszJobczyk, could you resolve merge, conflicts, please?

Done

It looks like, some other PR went before this PR and we have another conflict. @LukaszJobczyk, could you take a look, please?

Done

@LukaszJobczyk LukaszJobczyk temporarily deployed to aws April 3, 2023 05:46 — with GitHub Actions Inactive
@LukaszJobczyk LukaszJobczyk temporarily deployed to aws April 3, 2023 06:17 — with GitHub Actions Inactive
@bader bader merged commit 22e2ca1 into intel:sycl Apr 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants