Description
Expected Behavior
By registering Throwable classes in a (not yet existing) configuration, the batch framework would catch and swallow these exceptions when they are thrown during the commit phase of the transaction, and add the ChunkContext (with its attributes) back to the attributeQueue so that the chunk can be retried without having to execute the reader again.
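Purely as an illustration of what such a configuration could look like (nothing like it exists today): the sketch below uses the current fault-tolerant step builder, but the retryChunkOnCommitFailure(...) method is hypothetical, and the reader/processor/writer bean names are made up.

```java
// Hypothetical sketch of the requested configuration. Everything here follows the
// current StepBuilder API except retryChunkOnCommitFailure(...), which does not
// exist and only illustrates the feature being requested.
@Bean
public Step importXmlStep(StepBuilderFactory steps) {
    return steps.get("importXmlStep")
            .<EntityDto, Entity>chunk(100)
            .reader(xmlEntityReader())          // made-up bean names
            .processor(upsertEntityProcessor())
            .writer(entityWriter())
            .faultTolerant()
            // requested feature: if this exception is thrown during the transaction
            // commit, swallow it and retry only the processor and writer on the
            // still-buffered chunk, without re-running the reader
            .retryChunkOnCommitFailure(OptimisticLockingFailureException.class)
            .build();
}
```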
Current Behavior
Currently there is no retry mechanism for the case where the ChunkContext (correctly) has status completed and we only want to retry the processor and writer, not the reader.
Workaround implementation details
Implementing this feature currently requires copying and pasting several framework classes, with only minor changes to the code. This will be difficult to maintain when the framework changes.
- Copy and paste ChunkOrientedTasklet and remove the line "chunkContext.removeAttribute(INPUTS_KEY);" so that the buffered inputs remain available for a retry.
- Set your own version of ChunkOrientedTasklet by using TaskletStep::setTasklet in the jobConfiguration.
- Copy and paste StepContextRepeatCallback and add a catch block for exceptions thrown from the transaction commit. In the catch, store the ChunkContext in the attributeQueue if the exception is swallowed, and set its completed status back to false (see the sketch after this list).
- Extend TaskletStep and override doExecute, only to make it use our own version of StepContextRepeatCallback.
- Use our version of TaskletStep in the jobConfiguration.
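As a sketch of the modified callback, this is roughly what the copied doInIteration could look like. Only the catch block is new compared to the framework class; shouldSwallow(...) and markIncomplete(...) are hypothetical helpers standing in for the configured exception list and for resetting the completed flag (ChunkContext itself only exposes setComplete(), so resetting it needs additional changes).

```java
// Sketch of doInIteration(...) in our copy of StepContextRepeatCallback.
// Only the catch block differs from the original; shouldSwallow(...) and
// markIncomplete(...) are hypothetical helpers.
@Override
public RepeatStatus doInIteration(RepeatContext context) throws Exception {
    StepContext stepContext = StepSynchronizationManager.getContext();

    // As in the original class: reuse a ChunkContext left over from a previous
    // iteration, otherwise start a fresh one.
    ChunkContext chunkContext = attributeQueue.poll();
    if (chunkContext == null) {
        chunkContext = new ChunkContext(stepContext);
    }

    try {
        return doInChunkContext(context, chunkContext);
    }
    catch (Exception e) {
        if (shouldSwallow(e)) {
            // The transaction commit failed with a configured exception. Mark the
            // chunk as not complete so the finally block puts it back on the
            // attributeQueue; its INPUTS attribute is still present because our
            // copied tasklet no longer removes it, so the reader is not re-executed.
            markIncomplete(chunkContext);
            return RepeatStatus.CONTINUABLE;
        }
        throw e;
    }
    finally {
        // Original behavior: chunks that are not complete are passed back for the
        // next iteration.
        if (!chunkContext.isComplete()) {
            attributeQueue.add(chunkContext);
        }
    }
}
```

Because the attributeQueue is not accessible to subclasses, the class has to be copied rather than extended, which is what makes this workaround brittle. The rest is wiring only: the copied tasklet is set on the step with TaskletStep::setTasklet, and the extended TaskletStep overrides doExecute solely to instantiate this callback instead of the original one.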
Context
Our use case is that we concurrently read multiple XML files that contain data to either create or update entities in a database. The system must be able to handle multiple XML files that possibly contain updates to the same entities. The current design is that a batch job reads a single XML file: the reader reads a chunk of XML data and creates a POJO per entity to create or update, the processor takes these POJOs as input, fetches the existing entity (if present) and updates or creates it, and the writer then writes the results to the database.
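For illustration, a minimal sketch of such a processor, assuming hypothetical EntityDto, Entity and EntityRepository types; the actual implementation is not shown in this issue.

```java
import org.springframework.batch.item.ItemProcessor;

// Minimal sketch of the processor described above; EntityDto, Entity and
// EntityRepository are hypothetical stand-ins for the real types.
public class UpsertEntityProcessor implements ItemProcessor<EntityDto, Entity> {

    private final EntityRepository repository;

    public UpsertEntityProcessor(EntityRepository repository) {
        this.repository = repository;
    }

    @Override
    public Entity process(EntityDto dto) {
        // Fetch the existing entity if present, otherwise create a new one, then
        // apply the values read from the XML file. The writer persists the result.
        Entity entity = repository.findByBusinessKey(dto.getBusinessKey())
                .orElseGet(() -> new Entity(dto.getBusinessKey()));
        entity.applyUpdatesFrom(dto);
        return entity;
    }
}
```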
Currently, if two XML files that contain an update to the same entity are processed concurrently in two different batch jobs, an OptimisticLockingException can occur in the commit phase, because both processors fetched the existing entity from the database before either writer wrote its changes. Simply re-executing the processor and writer on the same POJO would fix this.