Fix HashBuild unspilling stuck #8715
This pull request was exported from Phabricator. Differential Revision: D53589502
@Yuhta: Thanks for this fix. We noticed this slowdown internally as well. Will retry. Appreciate it.
Summary: When the input to `HashBuild` is restored from spill, all rows come from the same spill partition, so the spill partition bits of their hashes are identical. When the hash table is large, the hash bits used to compute the bucket index can overlap the bits used for spill partitioning. The overlapping bits are fixed for every row, and because they are the higher bits of the bucket index, inserts are confined to a smaller region of the hash table. This causes heavy hash collisions, so the hash build takes a very long time and blocks driver threads.
Fix this by adding a check that the spill partitioning bits do not overlap the bits used for bucket indexing, and by raising the default spill start partition bit to 48.
Reviewed By: oerling
Differential Revision: D53589502
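The effect described in the summary can be reproduced with a small standalone sketch. This is not the Velox implementation: it assumes the bucket index is taken from the low `indexBits` bits of a 64-bit hash, and the helper names (`bucketIndex`, `partitionMask`, `bitsOverlap`, `simulateUnspill`) are hypothetical, introduced only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <unordered_set>

// Assumption for this sketch: the bucket index is the low `indexBits` bits
// of the 64-bit hash. Hypothetical helper, not a Velox API.
inline uint64_t bucketIndex(uint64_t hash, int indexBits) {
  return hash & ((uint64_t{1} << indexBits) - 1);
}

// Hypothetical helper: mask covering the spill partition bits
// [startBit, startBit + numPartitionBits).
inline uint64_t partitionMask(int startBit, int numPartitionBits) {
  return ((uint64_t{1} << numPartitionBits) - 1) << startBit;
}

// A check in the spirit of the fix: the bucket index bits [0, indexBits)
// must end at or below the first spill partition bit.
inline bool bitsOverlap(int indexBits, int startBit) {
  return indexBits > startBit;
}

struct Reach {
  std::size_t distinctBuckets; // number of different buckets hit
  uint64_t maxBucket;          // highest bucket index hit
};

// Simulates building a table from rows restored from a single spill
// partition: the partition bits of every hash are forced to the same
// value (partition 0), all other bits are random.
Reach simulateUnspill(
    int indexBits, int startBit, int numPartitionBits, int numRows) {
  std::mt19937_64 rng(42);
  const uint64_t mask = partitionMask(startBit, numPartitionBits);
  std::unordered_set<uint64_t> buckets;
  uint64_t maxBucket = 0;
  for (int i = 0; i < numRows; ++i) {
    const uint64_t hash = rng() & ~mask; // same partition for every row
    const uint64_t bucket = bucketIndex(hash, indexBits);
    buckets.insert(bucket);
    if (bucket > maxBucket) {
      maxBucket = bucket;
    }
  }
  return {buckets.size(), maxBucket};
}
```

Under these assumptions, `simulateUnspill(16, 13, 3, 100000)` keeps every row in buckets below 8192, one eighth of the 65536 buckets, because bits 13 to 15 of the index are fixed. `simulateUnspill(16, 48, 3, 100000)` leaves the index bits free, so rows can land anywhere in the table, and `bitsOverlap(16, 48)` is false.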
This pull request has been merged in 9cf0ef0.
Conbench analyzed the 1 benchmark run on the commit. There were no benchmark performance regressions.
Summary: Pull Request resolved: facebookincubator#8715 (summary as above). Reviewed By: oerling Differential Revision: D53589502 fbshipit-source-id: 969fe24f09a04ea3abaa4ff750de4541e438d988
Summary: #8715 changed which bits of the hash value are used for the hash tag, from (32,38] to (38,44]. The hash table documentation should be updated to reflect this change. Pull Request resolved: #9699 Reviewed By: xiaoxmeng Differential Revision: D57072452 Pulled By: bikramSingh91 fbshipit-source-id: 4048ee5e536ca90d7145efad59f85cc5c33d1d0f