Skip to content

JdbcPagingItemReader with sort key crashes in multithreaded step for empty query #3898

Closed
@SebastianGoeb

Description

@SebastianGoeb

Bug description
JdbcPagingItemReader cannot handle queries without results if multithreading and sorting are active together. remainingPagesSql is erroneously executed by other thread, if first thread found no results. Sind startAfterValues was never set by first thread, any sort key parameters is missing.

Environment
Java version: AdoptOpenJDK 11 (11.0.6)
Spring Batch version: spring-boot-starter-batch:2.4.3 (spring-batch-core:4.3.1)
Database: Postgres 13 (helm chart bitnami/postgres with image tags modified to 13.1.0-debian-10-r18)

Steps to reproduce

  1. Setup a chunking step that uses JdbcPagingItemReader with concurrency 2, page size 2, chunk size 2. I am following the driving query pattern in my simplified example, so only querying uuids, but this should work regardless of return type.
  2. Run the step against an empty table
  3. JdbcPagingItemReader (thread 1) executes firstPageSql, but gets no results. RowCallbackHandler is never called, so startAfterValues is never populated.
  4. JdbcPagingItemReader (thread 2) was waiting at synchronization point in AbstractPagingItemReader::doRead and is now allowed to continue. It executes doReadPage, and because page != 0 now it executes remainingPagesSql, but because startAfterValues was never populated, it can't find a value for the sort parameter (uuid in my case), and crashes.

Expected behavior
If thread 1 already got 0 results, I would expect thread 2 to either never enter doPageRead or just return an empty page without performing the actual query. In any case. I expect my step not to crash, but to simply finish with readCount == 0.

I'm not sure which entry condition is the culprit. I suspect it might be the current >= pageSize in AbstractPagingItemReader::doRead, but I'm not sure if that's intentional. Seems strange to me though, to allow another page query if current == pageSize, but there could be some finishing situations that I'm not considering.

Minimal Complete Reproducible example
Simplified snippet from my own code at work. Not quite complete, so will take some massaging to hook it up to real types, tables, etc. but get's the point across. Parameter import_timestamp is found by both firstPageSql and remainingPagesSql, whereas the parameter _uuid that is generated out of the sort condition is not found by remainingPagesSql if firstPageSql returns no results. This isn't an issue if single threaded because remainingPagesSql never runs, but if multithreaded, the other threads are already waiting at the lock and then run the remainingPagesSql anyway.

    @Bean
    public Step myStep(
            JdbcPagingItemReader<UUID> myTableUuidReader,
            Processor<UUID, String> myTableProcessor,
            ItemWriter<String> itemWriter) {
        SimpleAsyncTaskExecutor executor = new SimpleAsyncTaskExecutor();
        executor.setConcurrencyLimit(2);

        return steps.get("myStep")
                .<UUID, String>chunk(2)
                .reader(myTableUuidReader)
                .processor(myTableProcessor)
                .writer(itemWriter)
                .taskExecutor(executor)
                .throttleLimit(2)
                .build();
    }

    @JobScope
    @Bean
    public JdbcPagingItemReader<UUID> myTableUuidReader(DataSource dataSource) {
        return new JdbcPagingItemReaderBuilder<UUID>()
                .dataSource(dataSource)
                .selectClause("uuid")
                .fromClause("mytable")
                .whereClause("import_timestamp = :import_timestamp")
                .sortKeys(Map.of("uuid", ASCENDING))
                .parameterValues(Map.of(
                        "import_timestamp", someSqlTimestamp
                ))
                .rowMapper(new SingleColumnRowMapper<>())
                .pageSize(2)
                .saveState(false)
                .build();
    }

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions