
HashMap thread safety in multithreaded state store committer #6457

@patchwork01

User Story

As a user of Sleeper, I want changes to the Sleeper table state to be applied quickly and reliably, so that my data is not lost and I can retrieve the data I expect in a timely manner.

Description / Background

At the time of writing, the multithreaded state store committer uses a single StateStoreProvider and a single TablePropertiesProvider, and retrieves from both on many threads at once. Both are backed by a HashMap, so put and get calls are made on the same map from many threads concurrently.

We'd like to avoid any concurrency bugs or problems related to this.

Technical Notes / Implementation Details

HashMap is documented to require external synchronization when it is accessed from multiple threads concurrently and at least one of them modifies the map structurally. Its behaviour when conflicts or race conditions occur is not defined. It could result in the state not being cached when it should be, or it could be worse than that.
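For reference, one common way to meet that synchronization requirement is to back a cache with a ConcurrentHashMap. The sketch below is only an illustration, not the refactor this issue proposes and not Sleeper code: the class `ConcurrentCacheSketch`, its loader function and the `countLoads` helper are all hypothetical names. It relies on `ConcurrentHashMap.computeIfAbsent` being atomic per key, so the loader runs once even when many threads ask for the same table at once.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical cache for illustration only; not part of Sleeper. */
public class ConcurrentCacheSketch<K, V> {
    // ConcurrentHashMap permits concurrent get and put without external locking
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    public ConcurrentCacheSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    public V get(K key) {
        // Atomic per key: the loader is applied at most once for an absent key
        return cache.computeIfAbsent(key, loader);
    }

    /** Requests one key from many threads and returns how often the loader ran. */
    public static int countLoads(int threads, String tableId) {
        AtomicInteger loads = new AtomicInteger();
        ConcurrentCacheSketch<String, String> cache = new ConcurrentCacheSketch<>(id -> {
            loads.incrementAndGet();
            return "state-store-for-" + id; // stand-in for a real state store
        });
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            executor.execute(() -> cache.get(tableId));
        }
        executor.shutdown();
        try {
            if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("Tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return loads.get();
    }
}
```

With a plain HashMap in place of the ConcurrentHashMap, the same test would have no guaranteed outcome, which is the concern raised here.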

We can refactor the use of StateStoreCommitter so that the HashMap objects are only read in the main thread, before handing off to the thread for each table.
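A minimal sketch of that shape, under stated assumptions: `MainThreadLookupSketch`, its `submit` and `apply` methods, and the use of strings in place of state stores and commit requests are all hypothetical, and the real StateStoreCommitter API is not shown here. The point is only that the HashMap lookups happen on the calling thread, and the per-table thread receives an already resolved value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration only; not the Sleeper implementation. */
public class MainThreadLookupSketch {
    // Plain HashMaps are acceptable here because only the calling thread touches them
    private final Map<String, String> stateStoreByTableId = new HashMap<>();
    private final Map<String, ExecutorService> executorByTableId = new HashMap<>();

    /** Must always be called from the same (main) thread. */
    public Future<String> submit(String tableId, String commit) {
        // Resolve everything from the caches before handing off
        String stateStore = stateStoreByTableId.computeIfAbsent(
                tableId, id -> "state-store-for-" + id);
        ExecutorService executor = executorByTableId.computeIfAbsent(
                tableId, id -> Executors.newSingleThreadExecutor());
        // The worker thread only sees the resolved value, never the maps
        Callable<String> task = () -> apply(stateStore, commit);
        return executor.submit(task);
    }

    // Stand-in for applying a commit to a table's state store
    private static String apply(String stateStore, String commit) {
        return commit + " applied to " + stateStore;
    }

    public void close() {
        executorByTableId.values().forEach(ExecutorService::shutdown);
    }

    /** Submits one commit and waits for its result. */
    public static String demo() {
        MainThreadLookupSketch committer = new MainThreadLookupSketch();
        try {
            return committer.submit("table-1", "commit-1").get(10, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            committer.close();
        }
    }
}
```

This design avoids changing the providers themselves, at the cost of requiring that every lookup goes through a single thread.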

Dependencies / Blockers

Conflicts with:
