Skip to content

[SYCL] Fix race that occurs when submitting to single queue in parallel #1872

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Jul 2, 2020

Conversation

alexanderfle
Copy link
Contributor

@alexanderfle alexanderfle commented Jun 11, 2020

This PR aims to fix the race when setting kernel parameters in parallel. That occurs when submitting to single queue by multiple threads.

  • In cases when we use cacheable kernel we must use mutex, as we share the same instance by threads.
  • In cases when we don't use cacheable kernel each thread has its own instance of the kernel and so race-condition doesn't occur.

Regression: queue submit. single queue

Signed-off-by: Alexander Flegontov alexander.flegontov@intel.com

Signed-off-by: Alexander Flegontov <alexander.flegontov@intel.com>
@alexanderfle alexanderfle requested a review from a team as a code owner June 11, 2020 10:03
@alexanderfle alexanderfle requested a review from v-klochkov June 11, 2020 10:03
Copy link
Contributor

@s-kanaev s-kanaev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we need a test for this case also.

@romanovvlad
Copy link
Contributor

Also, could you please develop a test for this?

Signed-off-by: Alexander Flegontov <alexander.flegontov@intel.com>
Signed-off-by: Alexander Flegontov <alexander.flegontov@intel.com>
Signed-off-by: Alexander Flegontov <alexander.flegontov@intel.com>
…ther places where enqueue takes place

Signed-off-by: Alexander Flegontov <alexander.flegontov@intel.com>
Signed-off-by: Alexander Flegontov <alexander.flegontov@intel.com>
Signed-off-by: Alexander Flegontov <alexander.flegontov@intel.com>
@alexanderfle
Copy link
Contributor Author

@s-kanaev , @romanovvlad I addressed your comments and applied as much as possible to pass all the Lit tests without fails and hangs. Could you please take a look and approve the current version of the patch?

Copy link
Contributor

@s-kanaev s-kanaev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also we need a test for the case.
Just a reminder.

@alexanderfle
Copy link
Contributor Author

We already have a test for this, at the moment, which regresses on Linux and Windows and this patch fixes this problem.

…eters in parallel

Signed-off-by: Alexander Flegontov <alexander.flegontov@intel.com>
@alexanderfle alexanderfle requested a review from kbobrovs as a code owner June 22, 2020 10:18
@alexanderfle alexanderfle requested a review from s-kanaev June 22, 2020 16:48
Signed-off-by: Alexander Flegontov <alexander.flegontov@intel.com>
Signed-off-by: Alexander Flegontov <alexander.flegontov@intel.com>
@alexanderfle
Copy link
Contributor Author

@s-kanaev , @kbobrovs , in the scope of the PR, I have done all. Could you please review it?

s-kanaev
s-kanaev previously approved these changes Jun 24, 2020
Copy link
Contributor

@s-kanaev s-kanaev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@kbobrovs kbobrovs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can a test be added?

@alexanderfle
Copy link
Contributor Author

@kbobrovs , We already have one regression test fails in our tests suit without this fix, and also I added an additional regression test in separate PR in our test suit, where I covered one case mentioned above as nullptr != ExecKernel->MSyclKernel.
So, I believe that here I have done all.

@alexanderfle alexanderfle requested a review from kbobrovs June 25, 2020 07:34
@kbobrovs
Copy link
Contributor

@kbobrovs , We already have one regression test fails in our tests suit without this fix, and also I added an additional regression test in separate PR in our test suit, where I covered one case mentioned above as nullptr != ExecKernel->MSyclKernel.
So, I believe that here I have done all.

I could not find which test was failing in this PR. Would be good to mention then in the description.

kbobrovs
kbobrovs previously approved these changes Jun 25, 2020
Copy link
Contributor

@kbobrovs kbobrovs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

program_manager-related part LGTM.

Signed-off-by: Alexander Flegontov <alexander.flegontov@intel.com>
@alexanderfle alexanderfle dismissed stale reviews from kbobrovs and s-kanaev via 12502ae June 26, 2020 09:17
s-kanaev
s-kanaev previously approved these changes Jun 29, 2020
@alexanderfle
Copy link
Contributor Author

@kbobrovs , Could you please take a look again and approve?

@alexanderfle
Copy link
Contributor Author

@kbobrovs , ping

kbobrovs
kbobrovs previously approved these changes Jun 30, 2020
@alexanderfle
Copy link
Contributor Author

@bader , Could you please merge?

@alexanderfle alexanderfle dismissed stale reviews from kbobrovs and s-kanaev via 615ba25 July 2, 2020 06:58
@alexanderfle alexanderfle requested a review from kbobrovs July 2, 2020 07:40
@bader
Copy link
Contributor

bader commented Jul 2, 2020

@alexanderfle, please, fix CI failures.

@alexanderfle
Copy link
Contributor Author

@kbobrovs , ping

Copy link
Contributor

@kbobrovs kbobrovs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alexanderfle
Copy link
Contributor Author

@bader , ping

@bader bader merged commit 95d3ec6 into intel:sycl Jul 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants