Skip to content

Comments

BUG: Fix data races in block internals#63783

Merged
mroeschke merged 18 commits intopandas-dev:mainfrom
ngoldbaum:thread-safe
Feb 8, 2026
Merged

BUG: Fix data races in block internals#63783
mroeschke merged 18 commits intopandas-dev:mainfrom
ngoldbaum:thread-safe

Conversation

@ngoldbaum
Copy link
Contributor

@ngoldbaum ngoldbaum commented Jan 20, 2026

This PR fixes several thread safety issues in Pandas internals on the free-threaded build. We found all of them using thread sanitizer testing.

I also added a multithreaded test based on the original test in statsmodels we used to find these issues. There's probably lots more room to add more multithreaded tests, but a real-world example that triggered real issues seems as good a place to start as any to me.

You can read more about critical sections in the CPython docs, the cython docs, and the free-threading guide.

@ngoldbaum ngoldbaum changed the title Thread safe BUG: Fix data races in block internals Jan 20, 2026
@ngoldbaum ngoldbaum force-pushed the thread-safe branch 3 times, most recently from 203a6b5 to deb55bf Compare January 28, 2026 17:32
@ngoldbaum
Copy link
Contributor Author

I edited the description to make it more descriptive - sorry for forgetting to do that before marking it ready for review!

@ngoldbaum
Copy link
Contributor Author

Well that's annoying: The Windows Python 3.12 tests failed inside the new multithreaded test with a spurious warning:

https://github.com/pandas-dev/pandas/actions/runs/21448817886/job/61771847840?pr=63783#step:5:335

Warnings are known not to be thread-safe before Python 3.14, so I'm tempted to just skip the new multithreaded tests on Python versions where warnings aren't thread-safe. @mroeschke does that seem OK to you?



@td.skip_if_warnings_arent_thread_safe
def test_multithreaded_reading():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think this tests is reliable (non-flaky) to run with pytest-xdist on the GHA runners? Decorating this tests with @pytest.mark.single_cpu will run this test without xdist

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, I added the mark and also added some explanatory comments.

@mroeschke
Copy link
Member

so I'm tempted to just skip the new multithreaded tests on Python versions where warnings aren't thread-safe

Sure that's fine with me. The python-dev jobs are the only ones that run Python 3.14 right now

@mroeschke mroeschke added this to the 3.0.1 milestone Feb 5, 2026
@crusaderky
Copy link

I've tested this PR against the dask/dask test suite, where we're currently observing CI flakiness in all GIT-enabled Python versions using pandas 3.0.

Result of local tests:

  • Python 3.14, Pandas 3.0.0: I failed to replicate CI failures locally (probably due to much faster hardware than CI)
  • Python 3.14t, Pandas 3.0.0: 3~15% failure rate on the tests that frequently offend on GIL-enabled CI
  • Python 3.14t, Pandas main git tip: same as 3.0.0
  • Python 3.14t, this PR: all failures disappear. ⭐

I consider this PR as a blocker for 3.14t support in dask.dataframe.

More detailed results in dask/dask#12225 (comment)

crusaderky added a commit to crusaderky/dask that referenced this pull request Feb 5, 2026
crusaderky added a commit to crusaderky/dask that referenced this pull request Feb 5, 2026
crusaderky added a commit to crusaderky/dask that referenced this pull request Feb 6, 2026
WASM,
reason="does not support wasm",
)
skip_if_warnings_arent_thread_safe = pytest.mark.skipif(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
skip_if_warnings_arent_thread_safe = pytest.mark.skipif(
skip_if_thread_unsafe_warnings = pytest.mark.skipif(

naming nit

@kumaraditya303
Copy link
Contributor

@mroeschke This is ready to merged, we will followup to add a TSAN CI job once this gets merged.

@mroeschke mroeschke merged commit 8cfd376 into pandas-dev:main Feb 8, 2026
42 checks passed
@mroeschke
Copy link
Member

Thanks all!

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Feb 8, 2026
mroeschke pushed a commit that referenced this pull request Feb 8, 2026
…rnals) (#64078)

Co-authored-by: Nathan Goldbaum <nathan.goldbaum@gmail.com>
@kumaraditya303 kumaraditya303 deleted the thread-safe branch February 10, 2026 12:54
crusaderky added a commit to crusaderky/dask that referenced this pull request Feb 13, 2026
crusaderky added a commit to crusaderky/dask that referenced this pull request Feb 13, 2026
Bump to pandas-dev/pandas#63783

Fix test_repartition_partition_size

Use pandas-nightly; use pyarrow from conda-forge
crusaderky added a commit to crusaderky/dask that referenced this pull request Feb 13, 2026
Bump to pandas-dev/pandas#63783

Fix test_repartition_partition_size

Use pandas-nightly; use pyarrow from conda-forge
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: data races when simultaneously reading data frames in multiple threads

4 participants