Commit
[SPARK-31378][CORE] stage level scheduling dynamic allocation bug with initial num executors

### What changes were proposed in this pull request?

I found a bug in the stage level scheduling dynamic allocation code: when you have a non-default profile whose initial number of executors is the same as the number of executors needed for the first job, we don't properly request the executors. This causes a hang.

The issue is that when a new stage is added and the initial number of executors is set, we set the target to be the initial number. Unfortunately, that makes the code in the update and sync function think it has already requested that number. To fix this, when there is an initial number we just go ahead and request executors at that point. This is basically what happens on startup to handle the case with the default profile.

### Why are the changes needed?

Bug fix.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Unit test and manual testing on a YARN cluster. Went through multiple scenarios varying the initial number, the minimum number, and the number of executors required by the first stage.

Closes apache#28146 from tgravescs/SPARK-31378.

Authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
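
The failure mode described in the commit message can be illustrated with a toy model. This is a minimal sketch in Python, not Spark's actual code (which is written in Scala and is considerably more involved). The class `ToyAllocationManager`, its methods, and the `request_on_stage_add` flag are all hypothetical names introduced only for illustration. It assumes the sync step contacts the cluster manager only when the needed count differs from the stored target, which is the behaviour the message describes.

```python
class ToyAllocationManager:
    """Toy model of per-profile executor targets (hypothetical, not Spark's API)."""

    def __init__(self, request_on_stage_add):
        # request_on_stage_add=True models the behaviour after the fix.
        self.request_on_stage_add = request_on_stage_add
        self.target = {}     # profile id -> target executor count
        self.requested = {}  # profile id -> count last sent to the cluster manager

    def on_stage_added(self, profile_id, initial_executors):
        # A new profile starts with its target set to the initial number.
        self.target[profile_id] = initial_executors
        self.requested.setdefault(profile_id, 0)
        if self.request_on_stage_add and initial_executors > 0:
            # The fix: request the initial executors right away.
            self._request(profile_id)

    def update_and_sync(self, profile_id, needed):
        # Only contacts the cluster manager when the target changes.
        if needed != self.target[profile_id]:
            self.target[profile_id] = needed
            self._request(profile_id)

    def _request(self, profile_id):
        self.requested[profile_id] = self.target[profile_id]


def run_scenario(request_on_stage_add, initial, needed):
    """Add one stage with a non-default profile, sync once, return the requested count."""
    manager = ToyAllocationManager(request_on_stage_add)
    manager.on_stage_added(profile_id=1, initial_executors=initial)
    manager.update_and_sync(profile_id=1, needed=needed)
    return manager.requested[1]


# Initial number equals the number needed by the first job.
before_fix = run_scenario(request_on_stage_add=False, initial=4, needed=4)
after_fix = run_scenario(request_on_stage_add=True, initial=4, needed=4)
print(before_fix, after_fix)
```

In this model, `before_fix` stays at zero because the sync step sees the target already equal to the needed count and never sends a request, which corresponds to the hang. With the request issued when the stage is added, `after_fix` matches the initial number.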