Dynamic Batch Interval Adjustment #16993
Closed
Spark Streaming currently does not support changing the batch interval at runtime, even though the input rate of data streams from current Internet applications can be highly dynamic. To change the interval, one must stop the program, modify the corresponding code, and restart the program. This interrupts the execution of the entire program and may cause data loss. To address this, our contribution is a Dynamic Batch Interval Adjustment feature that changes the batch interval at runtime.
The feature consists of two algorithms: dynamic adjustment and static adjustment. Dynamic adjustment predicts the size of the input data stream, and hence the processing time, from the most recent processing times and the batch intervals that were in use. Based on this prediction, it decides whether the batch interval needs to change, in order to avoid a data backlog and keep the system stable. Static adjustment, by contrast, requires the user to change the configuration file manually.
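To make the dynamic-adjustment idea concrete, here is a minimal sketch in Python. It is not the algorithm implemented in this PR; it only illustrates one way to decide on a new interval from the most recent processing time and the interval in use. The class name `IntervalAdjuster`, the thresholds, the growth factor, and the smoothing weight are all hypothetical choices made for illustration. It assumes that a batch is stable when its processing time stays below the batch interval.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IntervalAdjuster:
    """Hypothetical controller; all defaults are illustrative assumptions."""
    interval_ms: float            # batch interval currently in use
    min_ms: float = 500.0         # lower bound for the interval
    max_ms: float = 10_000.0      # upper bound for the interval
    high: float = 0.9             # grow when predicted load exceeds this fraction
    low: float = 0.5              # shrink when predicted load falls below this fraction
    step: float = 1.25            # multiplicative change applied per adjustment
    alpha: float = 0.5            # weight of the newest observation in the moving average
    _ratio: Optional[float] = None  # smoothed processing_time / interval

    def observe(self, processing_ms: float) -> float:
        """Record the latest batch's processing time and return the next interval."""
        ratio = processing_ms / self.interval_ms
        if self._ratio is None:
            self._ratio = ratio
        else:
            # Exponential moving average over recent batches.
            self._ratio = self.alpha * ratio + (1 - self.alpha) * self._ratio

        # Predicted processing time if the current interval were kept.
        predicted_ms = self._ratio * self.interval_ms

        if predicted_ms > self.high * self.interval_ms:
            # Close to (or past) the stability limit: lengthen the interval.
            self.interval_ms = min(self.max_ms, self.interval_ms * self.step)
        elif predicted_ms < self.low * self.interval_ms:
            # Plenty of headroom: shorten the interval to reduce latency.
            self.interval_ms = max(self.min_ms, self.interval_ms / self.step)
        return self.interval_ms


adjuster = IntervalAdjuster(interval_ms=1000.0)
for processing_ms in (950.0, 400.0, 300.0):
    print(adjuster.observe(processing_ms))
```

Lengthening the interval only helps in this sketch if each batch carries some fixed overhead, so that larger batches are processed more efficiently per record; that is an assumption about the workload, not something stated in this PR.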
JIRA Issue: https://issues.apache.org/jira/browse/SPARK-19663
My report: https://github.com/floatingtony/System-Lever-Optimization-of-Spark-Streaming