Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the lock order issue caused by value operator init #9738

Closed
wants to merge 1 commit into from

Conversation

xiaoxmeng
Copy link
Contributor

The lock order issue detected by tsan is as follows:
T1: task startup creates driver and constructor operators which holds the task lock
T2: value operator constructor allocates memory from memory pool for parallelized value node mode
which triggers memory arbitration
T3: the memory arbitration grabs arbitration lock for arbitration
T4: the memory reclaim tries to grabs the task lock for reclaim such as to pause the task execution

This forms a deadlock and we have the logic to detect the memory pool allocation from the operator
constructor in driver execution framework. However the memory pool of vectors used by value nodes
is not the query memory pool but owns by the test framework. This PR moves the memory pool allocation
from constructor to initialize() method as the other operator does.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 7, 2024
Copy link

netlify bot commented May 7, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 29c436d
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/663aa87cb8f383000862bd78

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@xiaoxmeng xiaoxmeng marked this pull request as ready for review May 7, 2024 18:26
xiaoxmeng added a commit to xiaoxmeng/velox that referenced this pull request May 7, 2024
…ator#9738)

Summary:
The lock order issue detected by tsan is as follows:
T1: task startup creates driver and constructor operators which holds the task lock
T2: value operator constructor allocates memory from memory pool for parallelized value node mode
      which triggers memory arbitration 
T3: the memory arbitration grabs arbitration lock for arbitration
T4: the memory reclaim tries to grabs the task lock for reclaim such as to pause the task execution

This forms a deadlock and we have the logic to detect the memory pool allocation from the operator
constructor in driver execution framework. However the memory pool of vectors used by value nodes
is not the query memory pool but owns by the test framework. This PR moves the memory pool allocation
from constructor to initialize() method as the other operator does.


Test Plan:
The test passed 100 stress test iterations:
https://www.internalfb.com/intern/testinfra/testrun/10414574171435627
https://www.internalfb.com/intern/testinfra/testrun/13229323938244819

Differential Revision: D57063553

Pulled By: xiaoxmeng
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D57063553

xiaoxmeng added a commit to xiaoxmeng/velox that referenced this pull request May 7, 2024
…ator#9738)

Summary:
The lock order issue detected by tsan is as follows:
T1: task startup creates driver and constructor operators which holds the task lock
T2: value operator constructor allocates memory from memory pool for parallelized value node mode
      which triggers memory arbitration 
T3: the memory arbitration grabs arbitration lock for arbitration
T4: the memory reclaim tries to grabs the task lock for reclaim such as to pause the task execution

This forms a deadlock and we have the logic to detect the memory pool allocation from the operator
constructor in driver execution framework. However the memory pool of vectors used by value nodes
is not the query memory pool but owns by the test framework. This PR moves the memory pool allocation
from constructor to initialize() method as the other operator does.


Test Plan:
The test passed 100 stress test iterations:
https://www.internalfb.com/intern/testinfra/testrun/10414574171435627
https://www.internalfb.com/intern/testinfra/testrun/13229323938244819

Differential Revision: D57063553

Pulled By: xiaoxmeng
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D57063553

…ator#9738)

Summary:
The lock order issue detected by tsan is as follows:
T1: task startup creates driver and constructor operators which holds the task lock
T2: value operator constructor allocates memory from memory pool for parallelized value node mode
      which triggers memory arbitration 
T3: the memory arbitration grabs arbitration lock for arbitration
T4: the memory reclaim tries to grabs the task lock for reclaim such as to pause the task execution

This forms a deadlock and we have the logic to detect the memory pool allocation from the operator
constructor in driver execution framework. However the memory pool of vectors used by value nodes
is not the query memory pool but owns by the test framework. This PR moves the memory pool allocation
from constructor to initialize() method as the other operator does.


Test Plan:
The test passed 100 stress test iterations:
https://www.internalfb.com/intern/testinfra/testrun/10414574171435627
https://www.internalfb.com/intern/testinfra/testrun/13229323938244819

Reviewed By: bikramSingh91

Differential Revision: D57063553

Pulled By: xiaoxmeng
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D57063553

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng merged this pull request in ccbb72e.

@xiaoxmeng xiaoxmeng deleted the race branch May 7, 2024 22:46
Copy link

Conbench analyzed the 1 benchmark run on commit ccbb72e5.

There was 1 benchmark result indicating a performance regression:

The full Conbench report has more details.

Joe-Abraham pushed a commit to Joe-Abraham/velox that referenced this pull request May 8, 2024
…ator#9738)

Summary:
The lock order issue detected by tsan is as follows:
T1: task startup creates driver and constructor operators which holds the task lock
T2: value operator constructor allocates memory from memory pool for parallelized value node mode
      which triggers memory arbitration
T3: the memory arbitration grabs arbitration lock for arbitration
T4: the memory reclaim tries to grabs the task lock for reclaim such as to pause the task execution

This forms a deadlock and we have the logic to detect the memory pool allocation from the operator
constructor in driver execution framework. However the memory pool of vectors used by value nodes
is not the query memory pool but owns by the test framework. This PR moves the memory pool allocation
from constructor to initialize() method as the other operator does.

Pull Request resolved: facebookincubator#9738

Test Plan:
The test passed 100 stress test iterations:
https://www.internalfb.com/intern/testinfra/testrun/10414574171435627
https://www.internalfb.com/intern/testinfra/testrun/13229323938244819

Reviewed By: bikramSingh91

Differential Revision: D57063553

Pulled By: xiaoxmeng

fbshipit-source-id: 3ad661c35bfa46693729e60420f22920f83a7ff0
Joe-Abraham pushed a commit to Joe-Abraham/velox that referenced this pull request Jun 7, 2024
…ator#9738)

Summary:
The lock order issue detected by tsan is as follows:
T1: task startup creates driver and constructor operators which holds the task lock
T2: value operator constructor allocates memory from memory pool for parallelized value node mode
      which triggers memory arbitration
T3: the memory arbitration grabs arbitration lock for arbitration
T4: the memory reclaim tries to grabs the task lock for reclaim such as to pause the task execution

This forms a deadlock and we have the logic to detect the memory pool allocation from the operator
constructor in driver execution framework. However the memory pool of vectors used by value nodes
is not the query memory pool but owns by the test framework. This PR moves the memory pool allocation
from constructor to initialize() method as the other operator does.

Pull Request resolved: facebookincubator#9738

Test Plan:
The test passed 100 stress test iterations:
https://www.internalfb.com/intern/testinfra/testrun/10414574171435627
https://www.internalfb.com/intern/testinfra/testrun/13229323938244819

Reviewed By: bikramSingh91

Differential Revision: D57063553

Pulled By: xiaoxmeng

fbshipit-source-id: 3ad661c35bfa46693729e60420f22920f83a7ff0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants