
High memory consumption during long-running jobs #3790

Closed
@PauliusPeciura


Bug description
We found that memory consumption is fairly high on one of the service nodes that uses Spring Batch. Even though both data nodes did a similar amount of work, memory consumption across the nodes was uneven: 15 GB vs 1.5 GB (see the memory usage screenshot below).

We have some jobs that run for seconds while others run for hours, so we set the polling interval (MessageChannelPartitionHandler#setPollInterval) to 1 second rather than the default 10 seconds. In a long-running job scenario, we ended up creating 837 step executions.
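For context, a minimal sketch of how such a handler is wired, assuming a spring-batch-integration setup; the step name and grid size here are placeholders, and the surrounding messaging beans are simplified:

```java
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.batch.integration.partition.MessageChannelPartitionHandler;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.core.MessagingTemplate;

@Configuration
public class PartitionHandlerConfig {

    @Bean
    public MessageChannelPartitionHandler partitionHandler(MessagingTemplate messagingTemplate,
                                                           JobExplorer jobExplorer) {
        MessageChannelPartitionHandler handler = new MessageChannelPartitionHandler();
        handler.setMessagingOperations(messagingTemplate);
        handler.setJobExplorer(jobExplorer); // enables polling the job repository for replies
        handler.setStepName("workerStep");   // hypothetical worker step name
        handler.setGridSize(837);            // partition count from the scenario above
        handler.setPollInterval(1000L);      // 1 second instead of the 10-second default
        return handler;
    }
}
```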

What I found was that MessageChannelPartitionHandler#pollReplies retrieves a full StepExecution representation for each step, and each StepExecution contains a JobExecution, which in turn contains StepExecutions of its own. Because they are retrieved at different times and stages, these objects are not shared, so we end up with a quadratic number of StepExecution objects, e.g. 837 * 837 = 700,569 StepExecutions (see the screenshot below).
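To illustrate the object graph (the accessors are from the Spring Batch core API; the counts are from our job above):

```java
import java.util.Set;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;

// replies: the Set<StepExecution> collected while polling (837 entries here)
static long nestedStepExecutions(Set<StepExecution> replies) {
    long nested = 0;
    for (StepExecution step : replies) {
        // each reply carries its own detached JobExecution...
        JobExecution job = step.getJobExecution();
        // ...which again holds a full collection of StepExecutions
        nested += job.getStepExecutions().size();
    }
    return nested; // ~837 * 837 = 700,569 StepExecution objects retained on the heap
}
```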

Environment
Initially reproduced on Spring Batch 4.1.4.

Expected behavior
My proposal would be to:

  1. Issue a SQL query that counts running StepExecutions instead of retrieving full DTOs, so that fewer objects are loaded onto the heap (see the sketch after this list).
  2. Once all steps have finished, query for all StepExecutions of the job in a single call, and assign the same JobExecution instance to each step.
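A rough sketch of what proposal 1 could look like, assuming the standard Spring Batch metadata schema (BATCH_STEP_EXECUTION) and treating a null END_TIME as "still running"; the exact predicate is an assumption, not the handler's current code:

```java
import org.springframework.jdbc.core.JdbcTemplate;

public class StepExecutionCountPoller {

    private final JdbcTemplate jdbcTemplate;

    public StepExecutionCountPoller(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    /** Proposal 1: poll with a COUNT(*) instead of hydrating full StepExecution DTOs. */
    public boolean allPartitionsFinished(long jobExecutionId) {
        Long running = jdbcTemplate.queryForObject(
                "SELECT COUNT(*) FROM BATCH_STEP_EXECUTION "
                        + "WHERE JOB_EXECUTION_ID = ? AND END_TIME IS NULL",
                Long.class, jobExecutionId);
        return running != null && running == 0;
    }
}
```

Once this returns true, proposal 2 would amount to a single JobExplorer#getJobExecution(jobExecutionId) load, after which the step executions from that one load share a single JobExecution instance instead of 837 detached copies.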

Memory usage graph comparison between two service nodes doing a roughly equal amount of work:

[Screenshot: memoryUse - redacted]

My apologies for the messy screenshot, but it does show the number of StepExecution objects:

[Screenshot: stepExecutions - redacted]
