Description
Bug description
We found that memory consumption is fairly high on one of the service nodes that uses Spring Batch. Even though both data nodes did a similar amount of work, memory consumption across the nodes was uneven: 15GB vs 1.5GB (see memory use screenshot).
Some of our jobs run for seconds while others run for hours, so we set the polling interval (MessageChannelPartitionHandler#setPollInterval) to 1 second instead of the default 10 seconds. In one large, long-running job, we ended up creating 837 step executions.
I found that MessageChannelPartitionHandler#pollReplies retrieves a full StepExecution representation for each step. Each StepExecution contains a JobExecution, which in turn contains the StepExecutions for all of its steps. Because these objects are retrieved at different times and stages, the instances are not shared. This means that we end up with a quadratic number of StepExecution objects, e.g. 837*837=700569 StepExecutions (see screenshot below).
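To make the arithmetic concrete, here is a minimal sketch of the object graph in plain Java. `Job` and `Step` are hypothetical stand-ins for JobExecution and StepExecution, not the Spring Batch classes, and the loading logic is a simplified assumption about what happens when every polled step brings its own copy of the job.

```java
import java.util.ArrayList;
import java.util.List;

public class StepGraphSketch {

    // Hypothetical stand-in for JobExecution: it holds all of its steps.
    static final class Job {
        final List<Step> steps = new ArrayList<>();
    }

    // Hypothetical stand-in for StepExecution: it points back to its job.
    static final class Step {
        final Job job;

        Step(Job job) {
            this.job = job;
        }
    }

    // Every polled step is loaded with its own private copy of the job,
    // and each copy holds a full list of steps.
    static long nestedStepsWhenLoadedSeparately(int stepCount) {
        List<Step> polled = new ArrayList<>();
        for (int i = 0; i < stepCount; i++) {
            Job copy = new Job();
            for (int j = 0; j < stepCount; j++) {
                copy.steps.add(new Step(copy));
            }
            polled.add(new Step(copy));
        }
        long nested = 0;
        for (Step step : polled) {
            nested += step.job.steps.size();
        }
        return nested;
    }

    // All steps share one job instance, so the count stays linear.
    static long stepsWhenJobIsShared(int stepCount) {
        Job shared = new Job();
        for (int i = 0; i < stepCount; i++) {
            shared.steps.add(new Step(shared));
        }
        return shared.steps.size();
    }

    public static void main(String[] args) {
        System.out.println(nestedStepsWhenLoadedSeparately(837));
        System.out.println(stepsWhenJobIsShared(837));
    }
}
```

With 837 steps, the first method counts 837*837 nested step objects, while the shared variant keeps only 837.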
Environment
Initially reproduced on Spring Batch 4.1.4.
Expected behavior
My proposal would be to:
- Issue a SQL query to get the count of running StepExecutions instead of retrieving DTOs. This way fewer objects are loaded into the heap.
- Once all steps are finished, query for all StepExecutions for that job. We can then assign the same JobExecution to each step.
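The two steps above could look roughly like the sketch below. This is not the actual MessageChannelPartitionHandler code: `awaitThenLoad`, `runningCount`, and `loadAll` are hypothetical names, and the SQL string assumes the default Spring Batch schema (`BATCH_` table prefix) and treats STARTING, STARTED, and STOPPING as "running". A real implementation would also need a timeout.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

public class CountThenLoadSketch {

    // Assumption: default schema and table prefix; the status list is a guess
    // at which states should count as "still running".
    static final String COUNT_RUNNING_SQL =
            "SELECT COUNT(*) FROM BATCH_STEP_EXECUTION "
            + "WHERE JOB_EXECUTION_ID = ? "
            + "AND STATUS IN ('STARTING', 'STARTED', 'STOPPING')";

    // Polls only a cheap count while steps are running, then loads the full
    // step list exactly once after the count drops to zero.
    static <T> List<T> awaitThenLoad(LongSupplier runningCount,
                                     Supplier<List<T>> loadAll,
                                     long pollIntervalMillis) {
        while (runningCount.getAsLong() > 0) {
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while polling", e);
            }
        }
        // The single load is where one shared JobExecution could be assigned
        // to every returned StepExecution.
        return loadAll.get();
    }

    public static void main(String[] args) {
        // Fake count source for illustration: reports 2, 1, then 0 running steps.
        long[] remaining = {2};
        List<String> steps = awaitThenLoad(
                () -> remaining[0]--,
                () -> Arrays.asList("step-1", "step-2"),
                10L);
        System.out.println(steps);
    }
}
```

In practice `runningCount` would execute the count query for the job's execution id, and `loadAll` would run a single query for all of that job's StepExecutions.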
Memory usage graph comparison between two service nodes doing a roughly equal amount of work:
My apologies for a messy screenshot, but it does explain the number of StepExecution objects: