[Bug]: AttributeError when using narwhalified Polars DataFrame with Joblib Parallel Processing #2450

Closed
@dennisbader

Describe the bug

Description

I encountered an error when using narwhals with Polars and Joblib for parallel processing. The error occurs when a Polars DataFrame that has been wrapped with nw.from_native(df) is passed as an argument to a function run through joblib.Parallel: the worker process fails while un-serializing the wrapped DataFrame.

Background info:

We encountered this issue when trying to implement narwhals in another Darts method in this PR.

Steps or code to reproduce the bug

Here is the code to reproduce the error:

from joblib import Parallel, delayed

import pandas as pd
import polars as pl
import narwhals as nw


def do_something(df):
    # parallel does not work with narwhalified polars
    return df

dfs_pandas = [nw.from_native(pd.DataFrame({"col1": [0, 2], "col2": [3, 7]}))]
dfs_polars = [nw.from_native(pl.DataFrame({"col1": [0, 2], "col2": [3, 7]}))]

# with pandas it works
Parallel(n_jobs=-1)(delayed(do_something)(df_) for df_ in dfs_pandas)

# with polars it does not work
Parallel(n_jobs=-1)(delayed(do_something)(df_) for df_ in dfs_polars)

Expected results

Parallel processing should work with a narwhalified Polars DataFrame, just as it does with a narwhalified pandas DataFrame.

Actual results

Parallel processing fails with a narwhalified Polars DataFrame: joblib raises BrokenProcessPool because the task arguments cannot be un-serialized in the worker.
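Since joblib's process-based backend transfers arguments to workers by pickling them, the failure can probably be isolated without joblib by a plain pickle round-trip. The helper below is a minimal sketch of such a check; `survives_pickle_roundtrip` is a hypothetical name introduced here, not part of narwhals or joblib, and it uses only the standard-library pickle module (joblib may serialize with a different pickler, so a passing check is a hint rather than a guarantee).

```python
import pickle


def survives_pickle_roundtrip(obj):
    """Return (ok, error); ok is True if obj can be pickled and unpickled."""
    try:
        pickle.loads(pickle.dumps(obj))
    except Exception as exc:  # any serialization failure is reported back
        return False, exc
    return True, None


# A plain dict of lists round-trips without problems.
print(survives_pickle_roundtrip({"col1": [0, 2], "col2": [3, 7]}))

# A lambda cannot be pickled by the standard pickle module, so ok is False.
ok, error = survives_pickle_roundtrip(lambda x: x)
print(ok, type(error).__name__)
```

In the reproduction script, calling this helper on dfs_pandas[0] and dfs_polars[0] would be one way to confirm that the Polars-backed object is the one that fails to round-trip.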

Please run narwhals.show_versions() and enter the output below.

System:
    python: 3.11.11 (main, Dec 11 2024, 10:25:04) [Clang 14.0.6 ]
executable: /Users/username/miniconda3/envs/darts/bin/python
   machine: macOS-15.3.2-arm64-arm-64bit
Python dependencies:
     narwhals: 1.37.0
       pandas: 2.2.3
       polars: 1.27.1
         cudf: 
        modin: 
      pyarrow: 19.0.1
        numpy: 2.2.5

Relevant log output

/Users/username/miniconda3/envs/darts/bin/python /Users/username/projects/unit8/darts_test_code/bug_fixes/bug_polars_parallel.py 
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/Users/username/miniconda3/envs/darts/lib/python3.11/site-packages/narwhals/_polars/dataframe.py", line 227, in func
    return self._from_native_object(getattr(self.native, attr)(*pos, **kwds))
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'function' object has no attribute '__setstate__'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/username/miniconda3/envs/darts/lib/python3.11/site-packages/joblib/externals/loky/process_executor.py", line 426, in _process_worker
    call_item = call_queue.get(block=True, timeout=timeout)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/username/miniconda3/envs/darts/lib/python3.11/multiprocessing/queues.py", line 122, in get
    return _ForkingPickler.loads(res)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/username/miniconda3/envs/darts/lib/python3.11/site-packages/narwhals/_polars/dataframe.py", line 232, in func
    raise catch_polars_exception(e, self._backend_version) from None
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/username/miniconda3/envs/darts/lib/python3.11/site-packages/narwhals/_polars/utils.py", line 232, in catch_polars_exception
    if backend_version >= (1,) and isinstance(exception, pl.exceptions.PolarsError):
       ^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '>=' not supported between instances of 'function' and 'tuple'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/username/projects/unit8/darts_test_code/bug_fixes/bug_polars_parallel.py", line 19, in <module>
    Parallel(n_jobs=-1)(delayed(do_something)(df_) for df_ in dfs_polars)
  File "/Users/username/miniconda3/envs/darts/lib/python3.11/site-packages/joblib/parallel.py", line 2007, in __call__
    return output if self.return_generator else list(output)
                                                ^^^^^^^^^^^^
  File "/Users/username/miniconda3/envs/darts/lib/python3.11/site-packages/joblib/parallel.py", line 1650, in _get_outputs
    yield from self._retrieve()
  File "/Users/username/miniconda3/envs/darts/lib/python3.11/site-packages/joblib/parallel.py", line 1754, in _retrieve
    self._raise_error_fast()
  File "/Users/username/miniconda3/envs/darts/lib/python3.11/site-packages/joblib/parallel.py", line 1789, in _raise_error_fast
    error_job.get_result(self.timeout)
  File "/Users/username/miniconda3/envs/darts/lib/python3.11/site-packages/joblib/parallel.py", line 745, in get_result
    return self._return_or_raise()
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/username/miniconda3/envs/darts/lib/python3.11/site-packages/joblib/parallel.py", line 763, in _return_or_raise
    raise self._result
joblib.externals.loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.

Process finished with exit code 1
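One plausible reading of the first traceback: when unpickling, Python creates a bare instance without calling __init__ and then looks up __setstate__ on it. If the class forwards unknown attributes through __getattr__, that lookup returns a forwarding function instead of failing, and calling it touches attributes that do not exist yet. The sketch below reproduces this mechanism with standard-library Python only. ForwardingWrapper, GuardedWrapper, and simulate_unpickle are hypothetical names written for this illustration; they are not narwhals code, and the guard shown is a common pattern rather than the fix narwhals necessarily adopted.

```python
class ForwardingWrapper:
    """Hypothetical stand-in for a wrapper that forwards unknown attributes."""

    def __init__(self, native):
        self._native = native

    def __getattr__(self, attr):
        # Runs for every attribute that normal lookup cannot find.
        def func(*args, **kwargs):
            # On an uninitialised instance, `self._native` is itself resolved
            # through __getattr__ and comes back as another forwarding function.
            return getattr(self._native, attr)(*args, **kwargs)

        return func


class GuardedWrapper(ForwardingWrapper):
    def __getattr__(self, attr):
        # Assumed guard: never forward private or dunder names, so the
        # __setstate__ lookup fails cleanly and the default restore is used.
        if attr.startswith("_"):
            raise AttributeError(attr)
        return super().__getattr__(attr)


def simulate_unpickle(cls, state):
    """Mimic pickle's default restore: bare instance, then __setstate__ or __dict__."""
    obj = cls.__new__(cls)  # __init__ is not called during unpickling
    setstate = getattr(obj, "__setstate__", None)
    if setstate is not None:
        setstate(state)
    else:
        obj.__dict__.update(state)
    return obj


state = {"_native": [0, 2]}

try:
    simulate_unpickle(ForwardingWrapper, state)
except AttributeError as exc:
    print("unguarded wrapper failed:", exc)

restored = simulate_unpickle(GuardedWrapper, state)
print("guarded wrapper restored:", restored._native)
```

The unguarded wrapper fails with an AttributeError that mentions __setstate__ on a 'function' object, the same shape as the first error in the log above. The second error in the log (comparing a 'function' with a tuple) would be consistent with the same effect applied to self._backend_version, though that is an inference from the traceback rather than something confirmed here.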
