
[FLINK-39000] Avoid redundant seeks during operator list state restore#27527

Open
infocusmodereal wants to merge 1 commit into apache:master from infocusmodereal:codex/FLINK-39000-avoid-redundant-seeks



@infocusmodereal commented Feb 4, 2026

This PR addresses FLINK-39000.

Problem

  • Operator list state restore calls FSDataInputStream.seek(offset) before deserializing every element in OperatorStateRestoreOperation.deserializeOperatorStateValues, even when the stream is already positioned at that offset because the offsets are sequential.
  • On object stores (S3, GCS, Ceph, etc.) and behind some stream wrappers (e.g. compressed input streams), each seek can be expensive, so repeated seeks can dominate restore time for large operator list state.

Changes

  • Track the current stream position and call seek() only when the target offset differs from it.
  • Add a unit test that wraps the state handle's input stream to count seek() calls and asserts that sequential offsets result in minimal seeks, with snapshot compression both enabled and disabled.
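A minimal sketch of the change under simplifying assumptions: `SeekableInput` is a hypothetical stand-in that mirrors the `seek(long)`/`getPos()` pair on Flink's `FSDataInputStream`, `CountingByteStream` is a hypothetical in-memory stream that counts `seek()` calls (in the spirit of the new unit test), `ListStateRestoreSketch` is a hypothetical helper, and each list element is a 4-byte int. Here the current position is obtained from `getPos()`; the PR's actual code may track it differently.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.apache.flink.core.fs.FSDataInputStream.
abstract class SeekableInput extends InputStream {
    abstract void seek(long desired) throws IOException;

    abstract long getPos() throws IOException;
}

// Hypothetical in-memory stream that counts seek() calls, like a test wrapper would.
class CountingByteStream extends SeekableInput {
    private final byte[] data;
    private int pos;
    int seekCount;

    CountingByteStream(byte[] data) {
        this.data = data;
    }

    @Override
    public int read() {
        return pos < data.length ? (data[pos++] & 0xFF) : -1;
    }

    @Override
    void seek(long desired) {
        seekCount++;
        pos = (int) desired;
    }

    @Override
    long getPos() {
        return pos;
    }
}

class ListStateRestoreSketch {
    // Reads one int per offset. alwaysSeek=true mimics the old per-element seek;
    // alwaysSeek=false seeks only when the stream is not already at the offset.
    static List<Integer> restore(SeekableInput in, long[] offsets, boolean alwaysSeek)
            throws IOException {
        DataInputStream div = new DataInputStream(in); // no read-ahead, so getPos() stays exact
        List<Integer> values = new ArrayList<>();
        for (long offset : offsets) {
            if (alwaysSeek || in.getPos() != offset) {
                in.seek(offset);
            }
            values.add(div.readInt());
        }
        return values;
    }

    // Writes n ints back to back, restores and verifies them, and returns the seek count.
    static int countSeeks(int n, boolean alwaysSeek) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            long[] offsets = new long[n];
            for (int i = 0; i < n; i++) {
                offsets[i] = out.size(); // sequential offsets: 0, 4, 8, ...
                out.writeInt(i);
            }
            out.flush();
            CountingByteStream in = new CountingByteStream(bytes.toByteArray());
            List<Integer> values = restore(in, offsets, alwaysSeek);
            for (int i = 0; i < n; i++) {
                if (values.get(i) != i) {
                    throw new IllegalStateException("wrong value at index " + i);
                }
            }
            return in.seekCount;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With sequential offsets that start at the stream's current position, `countSeeks(n, true)` performs one seek per element while `countSeeks(n, false)` performs none. Out-of-order offsets still trigger a seek, so correctness does not depend on the offsets being sequential.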

Tests

  • ./mvnw -pl flink-runtime -Dtest=OperatorStateRestoreOperationTest test -Djdk17 -Pjava17-target

Benchmark: with a wrapper that injects 1 ms of latency into each FSDataInputStream.seek() call (simulating object-store range-read/reset overhead) and an operator ListState of 10k elements, restore time dropped from ~12.7 s (10,000 seeks) to ~18.8 ms (0 seeks) once seeks are skipped whenever the stream is already at the target offset.
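The benchmark setup can be approximated with a decorator that sleeps inside `seek()`. This is a minimal sketch, not the PR's benchmark code: `SeekableInput`, `ByteArraySeekable`, `SlowSeekInput` and `SeekLatencyDemo` are hypothetical names, `SeekableInput` only mirrors the `seek(long)`/`getPos()` pair of Flink's `FSDataInputStream`, and each element is assumed to be a single byte.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for org.apache.flink.core.fs.FSDataInputStream.
abstract class SeekableInput extends InputStream {
    abstract void seek(long desired) throws IOException;

    abstract long getPos() throws IOException;
}

// Hypothetical in-memory seekable stream.
class ByteArraySeekable extends SeekableInput {
    private final byte[] data;
    private int pos;

    ByteArraySeekable(byte[] data) {
        this.data = data;
    }

    @Override
    public int read() {
        return pos < data.length ? (data[pos++] & 0xFF) : -1;
    }

    @Override
    void seek(long desired) {
        pos = (int) desired;
    }

    @Override
    long getPos() {
        return pos;
    }
}

// Decorator that sleeps before every seek() to simulate object-store seek overhead.
class SlowSeekInput extends SeekableInput {
    private final SeekableInput delegate;
    private final long delayMillis;
    int seeks;

    SlowSeekInput(SeekableInput delegate, long delayMillis) {
        this.delegate = delegate;
        this.delayMillis = delayMillis;
    }

    @Override
    public int read() throws IOException {
        return delegate.read();
    }

    @Override
    void seek(long desired) throws IOException {
        seeks++;
        try {
            Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new InterruptedIOException("interrupted during injected seek delay");
        }
        delegate.seek(desired);
    }

    @Override
    long getPos() throws IOException {
        return delegate.getPos();
    }
}

class SeekLatencyDemo {
    // Reads n one-byte elements at sequential offsets; returns {seek count, elapsed nanos}.
    static long[] run(int n, long delayMillis, boolean alwaysSeek) {
        try {
            SlowSeekInput in = new SlowSeekInput(new ByteArraySeekable(new byte[n]), delayMillis);
            long start = System.nanoTime();
            for (long offset = 0; offset < n; offset++) {
                if (alwaysSeek || in.getPos() != offset) {
                    in.seek(offset);
                }
                if (in.read() < 0) {
                    throw new EOFException("no element at offset " + offset);
                }
            }
            return new long[] {in.seeks, System.nanoTime() - start};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the injected delay is paid once per `seek()` call, the unconditional version sleeps for roughly n × delay in total: 10,000 seeks at 1 ms each add about 10 s, most of the ~12.7 s reported above. The guarded version pays nothing for sequential offsets.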

@flinkbot (Collaborator) commented Feb 4, 2026

CI report:

Bot commands: The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build
