Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix] Shared Execution Context fixed when having Multi-Threaded Java applicatons #557

Merged
merged 7 commits into from
Sep 18, 2024

Conversation

jjfumero
Copy link
Member

Description

This PR fixes multi-threaded execution plans. The issue (see below) was related to the unsafe shared of the code cache. The code cache was applied per device context, which is shared across all execution plans that run on the same device. Instead, what we need is to create a code-cache per execution plan ID.

Problem description

The multi-threaded tests stop working due to unsafe sharing of the code-cache within the TornadoVM device context.

Backend/s tested

Mark the backends affected by this PR.

  • OpenCL
  • PTX
  • SPIRV

OS tested

Mark the OS where this PR is tested.

  • Linux
  • OSx
  • Windows

Did you check on FPGAs?

If it is applicable, check your changes on FPGAs.

  • Yes
  • No

How to test the new patch?

For each backend:

tornado-test --jvm="-Dtornado.device.memory=2GB" -V --fast uk.ac.manchester.tornado.unittests.multithreaded.TestMultiThreadedExecutionPlans

Before we merge, apart from all unit-test, we should test Ray-Tracer and Kfusion.

@jjfumero jjfumero added bug Something isn't working runtime labels Sep 17, 2024
@jjfumero jjfumero self-assigned this Sep 17, 2024
@stratika
Copy link
Collaborator

An overall comment/question. Would it make sense to add the executionPlanID as part of the SchedulableTask class? Based on the name, this class should have some info fed from the TornadoExecutionPlan?

For the record, I tested on my Linux OS machine with OpenCL, PTX, SPIRV (both levelzero and opencl). I also tested the TornadoVM-Ray-Tracer with OpenCL and SPIRV (both levelzero and opencl), and the K-FUSION with OpenCL and PTX. Everything worked.

@jjfumero
Copy link
Member Author

I was thinking about this. Maybe yes, but then it will be some inconsistencies with the rest of the runtime (e.g., for copies, in which we pass the executionPlanId explicitly).

Copy link
Collaborator

@stratika stratika left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @jjfumero

Copy link
Collaborator

@mairooni mairooni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from some minor comments it looks good to me. All tests (+ RayTracer and KFusion) passed on my local Linux machine.

@jjfumero
Copy link
Member Author

Thanks. I applied the fixes. I am merging this PR.

@jjfumero jjfumero merged commit 10bf00d into beehive-lab:develop Sep 18, 2024
2 checks passed
@jjfumero jjfumero deleted the fix/mt/codecache branch September 18, 2024 13:04
jjfumero added a commit to jjfumero/TornadoVM that referenced this pull request Sep 30, 2024
Improvements
============
- beehive-lab#565: New API call in the Execution Plan to log/trace the executed configuration plans.
- beehive-lab#563: Expand the TornadoVM profiler with Level Zero Sysman Energy Metrics.
- beehive-lab#559: Refactoring Power Metric handlers for PTX and OpenCL.
- beehive-lab#548: Benchmarking improvements.
- beehive-lab#549: Prebuilt API tests added using multiple backend-setup.
- Add internal tests for monitoring memory management [link](beehive-lab@0644225).

Compatibility
=============
- beehive-lab#561: Build for OSx 14.6 and OSx 15 fixed.

Bug Fixes
==============
- beehive-lab#564: Jenkins configuration fixed to run KFusion per backend.
- beehive-lab#562: Warmup action from the Execution Plan fixed to run with correct internal IDs.
- beehive-lab#557: Shared Execution Plans Context fixed.
- beehive-lab#553: OpenCL compiler flags for Intel Integrated GPUs fixed.
- beehive-lab#552: Fixed runtime to select any device among multiple SPIR-V devices.
- Fixed zero extend arithmetic operations: [link](beehive-lab@ea7b602).
@jjfumero jjfumero mentioned this pull request Sep 30, 2024
8 tasks
jjfumero added a commit to jjfumero/TornadoVM that referenced this pull request Sep 30, 2024
Improvements
============
- beehive-lab#565: New API call in the Execution Plan to log/trace the executed configuration plans.
- beehive-lab#563: Expand the TornadoVM profiler with Level Zero Sysman Energy Metrics.
- beehive-lab#559: Refactoring Power Metric handlers for PTX and OpenCL.
- beehive-lab#548: Benchmarking improvements.
- beehive-lab#549: Prebuilt API tests added using multiple backend-setup.
- Add internal tests for monitoring memory management [link](beehive-lab@0644225).

Compatibility
=============
- beehive-lab#561: Build for OSx 14.6 and OSx 15 fixed.

Bug Fixes
==============
- beehive-lab#564: Jenkins configuration fixed to run KFusion per backend.
- beehive-lab#562: Warmup action from the Execution Plan fixed to run with correct internal IDs.
- beehive-lab#557: Shared Execution Plans Context fixed.
- beehive-lab#553: OpenCL compiler flags for Intel Integrated GPUs fixed.
- beehive-lab#552: Fixed runtime to select any device among multiple SPIR-V devices.
- Fixed zero extend arithmetic operations: [link](beehive-lab@ea7b602).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working runtime
Projects
Development

Successfully merging this pull request may close these issues.

3 participants