Free-threading support #439

@crusaderky

This is a meta-issue for what Dask needs to declare compatibility with Python 3.14t:

  • dask/dask CI: Pandas in 3.14t CI dask#12284 just needs review!
  • dask/distributed CI: WIP 3.14t support distributed#9194 tests are failing and need investigating. I do not know at this point if they highlight any actual issue.
  • dask/distributed CI: we could use some more aggressive multithreading tests for a few common use cases:
    • different threads run in parallel in the client process, each with their own Client, sharing the same Scheduler
    • different threads run in parallel in the client process, sharing a single Client
    • different threads wait on the same Future (this is a variant of the previous point)
  • pandas: pandas has released the GIL since 3.0.0. However, that version has severe race conditions, which are fixed in pandas-nightly and will be released in 3.0.1 (BUG: Fix data races in block internals pandas-dev/pandas#63783)
  • msgpack: it contains a C extension that does not release the GIL. The extension has been audited and it's been found not to contain any race conditions as long as you don't modify the input data while it's being serialized, which is fine given Dask's model. However, the maintainer has proven hostile to external contributions and so we are stuck waiting. There are two workarounds:
    1. run Python with PYTHON_GIL=0, which forces the GIL to stay disabled. This has the disadvantage of also silencing any other package with the same issue.
    2. manually install a pure-python variant of msgpack; this has the disadvantage of reducing performance:
      MSGPACK_PUREPYTHON=1 pip install msgpack --no-binary msgpack --no-cache --force-reinstall -v
  • cytoolz: cp314t wheels exist and release the GIL, but conda packages do not. The package was not properly audited AFAIK, but it should be ok thanks to its stateless nature, as long as you don't do obviously dangerous stuff like having multiple threads draw from the same generator.
  • investigate if it's possible to tweak pip and conda-forge dependencies for free-threading only, and if so set much more stringent minimum versions.
  • performance regression testing at scale vs. 3.14 with GIL
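
A rough sketch of what the three multithreading tests listed above could look like. This is not existing distributed test code: `run_concurrently` and the three `*_case` functions are hypothetical names, `address` is assumed to point at an already running Scheduler, and none of this has been validated on 3.14t. The barrier makes all threads start their work at the same instant, which is what makes the test aggressive.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(fn, n_threads=8):
    """Call fn(i) from n_threads threads that all start together.

    Returns the results in thread-index order. An exception raised in
    any thread is re-raised in the caller.
    """
    barrier = threading.Barrier(n_threads)

    def target(i):
        barrier.wait()  # line all threads up to maximise contention
        return fn(i)

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(target, i) for i in range(n_threads)]
        return [f.result() for f in futures]


def client_per_thread_case(address, n_threads=8):
    """Each thread has its own Client; all share the same Scheduler."""
    from distributed import Client  # lazy import: needs dask.distributed

    def work(i):
        with Client(address, set_as_default=False) as client:
            return client.submit(pow, i, 2).result()

    return run_concurrently(work, n_threads)


def shared_client_case(address, n_threads=8):
    """All threads submit work through a single shared Client."""
    from distributed import Client

    with Client(address) as client:
        return run_concurrently(
            lambda i: client.submit(pow, i, 2).result(), n_threads
        )


def shared_future_case(address, n_threads=8):
    """All threads wait on the same Future."""
    from distributed import Client

    with Client(address) as client:
        future = client.submit(sum, range(10))
        return run_concurrently(lambda i: future.result(), n_threads)
```

In a test, `address` could come from a `LocalCluster(processes=False)` via its `scheduler_address` attribute; the interesting assertions are that no call hangs or raises and that every thread sees the expected result.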

The above are the prerequisites for advertising a good experience to end users, i.e. at worst it will behave like the GIL-enabled version, only some % slower due to known bottlenecks that are actively being ironed out upstream.

In addition, real world users will be blocked by key missing optional packages; notably:

  • zarr
  • h5py (for xarray's NetCDF engine="h5netcdf")
  • s3fs (dependency brotli has 3.14t wheels, but does not release the GIL)
  • tiledb
  • jupyterlab (you can run jupyterlab in GIL-enabled Python and then attach a 3.14t kernel, but it's fiddly)
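
To find out which of these packages turn the GIL back on, something like the following could be used. It is a minimal sketch: `gil_after_import` is a hypothetical helper, not part of Dask. It imports one module in a fresh interpreter and reads `sys._is_gil_enabled()` (available since Python 3.13). A subprocess per module is used on the assumption that once one import has re-enabled the GIL, later imports in the same process can no longer be told apart. `PYTHON_GIL` is stripped from the environment so that the workaround mentioned earlier does not mask the result.

```python
import os
import subprocess
import sys

# Runs in the child interpreter: import the module named in argv[1],
# then print whether the GIL is active afterwards.
_PROBE = (
    "import importlib, sys\n"
    "importlib.import_module(sys.argv[1])\n"
    "check = getattr(sys, '_is_gil_enabled', None)\n"
    "print('on' if check is None or check() else 'off')\n"
)


def gil_after_import(module_name):
    """Return True if the GIL is enabled after importing module_name.

    Always True on a GIL-enabled build; on a free-threaded build, True
    means the import (or one of its dependencies) re-enabled the GIL.
    """
    env = {k: v for k, v in os.environ.items() if k != "PYTHON_GIL"}
    proc = subprocess.run(
        [sys.executable, "-W", "ignore", "-c", _PROBE, module_name],
        capture_output=True,
        text=True,
        check=True,
        env=env,
    )
    # The module may print on import; only the last line is ours.
    return proc.stdout.strip().splitlines()[-1] == "on"
```

Calling `gil_after_import("brotli")` under a 3.14t interpreter would then report whether that package is one of the blockers; the call raises `CalledProcessError` if the module is not installed.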

After the above, there is ample margin for performance tweaking:

  • pickling/unpickling runs on a special thread pool with a single thread. With free-threading, you could have as many threads as you have CPUs.
  • network I/O runs on a single thread; it could be moved to one thread per peer.
  • spilling is single-threaded, blocks the worker state machine and networking, and has always been very painful
  • the only thing that must remain single-threaded is the state machine, but I expect it not to be a bottleneck in and of itself.
  • more generally, a lot of profiling is needed, as it becomes possible to have workers with a huge number of threads, which was previously inadvisable even at the default chunk size (128 MiB). My latest benchmarks at scale (on Coiled, circa 2022) showed severe performance degradation already when moving from 4 to 8 threads per worker, given the same total number of CPUs on the cluster.
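
As an illustration of the first point, pickling on a pool sized to the CPU count could look roughly like this. It is a standard-library sketch only, not how distributed's offload pool is implemented; `make_serialization_pool`, `dumps_many` and `loads_many` are hypothetical names. Under a GIL-enabled interpreter the extra threads would mostly just contend with each other, so any speed-up is expected only on a free-threaded build and would need to be measured.

```python
import os
import pickle
from concurrent.futures import ThreadPoolExecutor


def make_serialization_pool(max_workers=None):
    """Thread pool for (un)pickling, defaulting to one thread per CPU."""
    return ThreadPoolExecutor(
        max_workers=max_workers or os.cpu_count() or 1,
        thread_name_prefix="serialize",
    )


def _dumps(obj):
    return pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)


def dumps_many(pool, objects):
    """Pickle several independent objects in parallel, preserving order."""
    return list(pool.map(_dumps, objects))


def loads_many(pool, payloads):
    """Unpickle several payloads in parallel, preserving order."""
    return list(pool.map(pickle.loads, payloads))
```

This relies on the same assumption the msgpack audit makes: the objects must not be mutated by another thread while they are being serialized.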
