Support yield in bucket sort table write getout to prevent stuck driver detection #11229
This pull request was exported from Phabricator. Differential Revision: D64159781
Thanks for fixing this!
It makes the engine more stable.
Reviewed By: Yuhta, spershin, oerling
This pull request has been merged in b00751e.
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details.
Summary:
Support yielding in the middle of the sort writer's get output processing, both to avoid triggering stuck
driver detection and to be friendlier to other concurrently running queries and threads. We found in production
that a long-running get output call from the sort writer can trigger alerts, because it sorts the data,
potentially reads spilled data back from remote storage, and then encodes and flushes the result to remote
storage through the file writer. This can take an hour for a table with a small bucket count: with only
64 buckets, only 64 threads in the whole cluster run the query.
This PR adds a finish API to the data sink and the file writer so that the table writer can sort and flush
incrementally. The data sink's finish API calls each file writer's finish API, and both check the finish time
slice limit, which is set through a Hive config. Both APIs return false if finishing needs to continue and true
once it is done. Correspondingly, the table writer's get output returns null if the data sink still has finish
work to do, and sets the ready block future and the yield reason so that the driver framework can check them and yield.
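The control flow described above can be sketched roughly as follows. This is an illustration, not the code from this PR: `SketchFileWriter`, `SketchDataSink`, `SketchTableWriter` and `stepBudget` are hypothetical names, the remaining sort and flush work is modeled as a chunk counter, and the time slice check is an injected predicate rather than a real clock so that the behavior is deterministic.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Returns a predicate that reports "time slice used up" on its 'steps'-th
// check. Deterministic stand-in (an assumption of this sketch) for comparing
// elapsed time against the configured finish time slice limit.
inline std::function<bool()> stepBudget(int steps) {
  return [remaining = steps]() mutable { return --remaining <= 0; };
}

// Hypothetical file writer whose remaining work is a number of chunks.
class SketchFileWriter {
 public:
  explicit SketchFileWriter(int pendingChunks)
      : pendingChunks_(pendingChunks) {}

  // Returns true once all chunks are done, false if the slice ran out first.
  bool finish(const std::function<bool()>& timeUp) {
    while (pendingChunks_ > 0) {
      --pendingChunks_; // Stands in for sorting, encoding and flushing a chunk.
      if (pendingChunks_ > 0 && timeUp()) {
        return false;
      }
    }
    return true;
  }

 private:
  int pendingChunks_;
};

// Hypothetical data sink: finishes its writers one after another and
// remembers between calls which writer it stopped at.
class SketchDataSink {
 public:
  explicit SketchDataSink(std::vector<SketchFileWriter> writers)
      : writers_(std::move(writers)) {}

  bool finish(const std::function<bool()>& timeUp) {
    while (next_ < writers_.size()) {
      if (!writers_[next_].finish(timeUp)) {
        return false;
      }
      ++next_;
      if (next_ < writers_.size() && timeUp()) {
        return false;
      }
    }
    return true;
  }

 private:
  std::vector<SketchFileWriter> writers_;
  std::size_t next_{0};
};

// Hypothetical operator: a null result plus 'yielded()' models "return null
// and set the block future and yield reason" from the description above.
class SketchTableWriter {
 public:
  explicit SketchTableWriter(SketchDataSink sink) : sink_(std::move(sink)) {}

  std::unique_ptr<std::string> getOutput(const std::function<bool()>& timeUp) {
    yielded_ = !sink_.finish(timeUp);
    if (yielded_) {
      return nullptr; // More finish work remains; the caller should yield.
    }
    return std::make_unique<std::string>("write finished");
  }

  bool yielded() const { return yielded_; }

 private:
  SketchDataSink sink_;
  bool yielded_{false};
};
```

A caller would keep invoking `getOutput` with a fresh budget each time, for example `stepBudget(2)`, until it returns a non-null result. Each call resumes where the previous one stopped, so the work is neither repeated nor lost across yields.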
This PR also extends the data sink and file writer interfaces with a new finish state, and adds a new Hive
config for the finish time slice limit. The driver framework now also reports yields requested by an operator;
previously it reported the yield metric only when the yield was triggered by the driver framework itself. A new
histogram metric tracks the sort writer finish time distribution for monitoring.
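As a rough illustration of how a finish state and a configurable time slice limit could fit together, here is a minimal sketch. Everything in it is an assumption made for the example: the state names, the `SinkLifecycle` class, the config key `finish_time_slice_limit_ms` and the fallback value are placeholders, not the identifiers introduced by this PR.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical lifecycle states; the description only says that a finish
// state is added to the data sink and file writer interfaces.
enum class SinkState { kRunning, kFinishing, kClosed };

// Reads the time slice limit from a string config map. The key name is a
// placeholder for this sketch, not the actual Hive config name.
inline std::uint64_t finishTimeSliceLimitMs(
    const std::map<std::string, std::string>& config,
    std::uint64_t defaultMs) {
  const auto it = config.find("finish_time_slice_limit_ms");
  return it == config.end() ? defaultMs : std::stoull(it->second);
}

class SinkLifecycle {
 public:
  SinkState state() const { return state_; }

  // Appending data is only legal before finishing has started.
  void append() const {
    require(state_ == SinkState::kRunning, "append after finish started");
  }

  // Records one finish() call; 'done' is the value that call returned.
  void finish(bool done) {
    require(state_ != SinkState::kClosed, "finish after close");
    state_ = SinkState::kFinishing;
    done_ = done;
  }

  // Closing is only legal once a finish() call has reported completion.
  void close() {
    require(
        state_ == SinkState::kFinishing && done_,
        "close before finish completed");
    state_ = SinkState::kClosed;
  }

 private:
  static void require(bool ok, const char* message) {
    if (!ok) {
      throw std::logic_error(message);
    }
  }

  SinkState state_{SinkState::kRunning};
  bool done_{false};
};
```

The intermediate state exists because finishing may now span several calls: between the first call that returns false and the call that returns true, the sink is neither accepting new data nor ready to be closed.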
Differential Revision: D64159781