Description
Describe the bug
I encountered an error when using the narwhals package with Polars and Joblib for parallel processing. The error occurs when a Polars DataFrame wrapped with nw.from_native(df) is passed as an argument to a function that Joblib runs in parallel.
Background info:
We encountered this issue when trying to implement narwhals in another Darts method in this PR.
Steps or code to reproduce the bug
Here is the code to reproduce the error:
from joblib import Parallel, delayed
import pandas as pd
import polars as pl
import narwhals as nw
def do_something(df):
    # parallel does not work with narwhalified polars
    return df
dfs_pandas = [nw.from_native(pd.DataFrame({"col1": [0, 2], "col2": [3, 7]}))]
dfs_polars = [nw.from_native(pl.DataFrame({"col1": [0, 2], "col2": [3, 7]}))]
# with pandas it works
Parallel(n_jobs=-1)(delayed(do_something)(df_) for df_ in dfs_pandas)
# with polars it does not work
Parallel(n_jobs=-1)(delayed(do_something)(df_) for df_ in dfs_polars)
Expected results
Parallel processing should work with a narwhalified Polars DataFrame, as it does with a narwhalified pandas DataFrame.
Actual results
Parallel processing fails with a narwhalified Polars DataFrame: joblib raises BrokenProcessPool because the task fails to un-serialize in the worker process.
Please run narwhals.show_versions() and enter the output below.
System:
python: 3.11.11 (main, Dec 11 2024, 10:25:04) [Clang 14.0.6 ]
executable: /Users/username/miniconda3/envs/darts/bin/python
machine: macOS-15.3.2-arm64-arm-64bit
Python dependencies:
narwhals: 1.37.0
pandas: 2.2.3
polars: 1.27.1
cudf:
modin:
pyarrow: 19.0.1
numpy: 2.2.5
Relevant log output
/Users/username/miniconda3/envs/darts/bin/python /Users/username/projects/unit8/darts_test_code/bug_fixes/bug_polars_parallel.py
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Users/username/miniconda3/envs/darts/lib/python3.11/site-packages/narwhals/_polars/dataframe.py", line 227, in func
    return self._from_native_object(getattr(self.native, attr)(*pos, **kwds))
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'function' object has no attribute '__setstate__'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/username/miniconda3/envs/darts/lib/python3.11/site-packages/joblib/externals/loky/process_executor.py", line 426, in _process_worker
    call_item = call_queue.get(block=True, timeout=timeout)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/username/miniconda3/envs/darts/lib/python3.11/multiprocessing/queues.py", line 122, in get
    return _ForkingPickler.loads(res)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/username/miniconda3/envs/darts/lib/python3.11/site-packages/narwhals/_polars/dataframe.py", line 232, in func
    raise catch_polars_exception(e, self._backend_version) from None
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/username/miniconda3/envs/darts/lib/python3.11/site-packages/narwhals/_polars/utils.py", line 232, in catch_polars_exception
    if backend_version >= (1,) and isinstance(exception, pl.exceptions.PolarsError):
       ^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '>=' not supported between instances of 'function' and 'tuple'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/username/projects/unit8/darts_test_code/bug_fixes/bug_polars_parallel.py", line 19, in <module>
    Parallel(n_jobs=-1)(delayed(do_something)(df_) for df_ in dfs_polars)
  File "/Users/username/miniconda3/envs/darts/lib/python3.11/site-packages/joblib/parallel.py", line 2007, in __call__
    return output if self.return_generator else list(output)
                                                ^^^^^^^^^^^^
  File "/Users/username/miniconda3/envs/darts/lib/python3.11/site-packages/joblib/parallel.py", line 1650, in _get_outputs
    yield from self._retrieve()
  File "/Users/username/miniconda3/envs/darts/lib/python3.11/site-packages/joblib/parallel.py", line 1754, in _retrieve
    self._raise_error_fast()
  File "/Users/username/miniconda3/envs/darts/lib/python3.11/site-packages/joblib/parallel.py", line 1789, in _raise_error_fast
    error_job.get_result(self.timeout)
  File "/Users/username/miniconda3/envs/darts/lib/python3.11/site-packages/joblib/parallel.py", line 745, in get_result
    return self._return_or_raise()
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/username/miniconda3/envs/darts/lib/python3.11/site-packages/joblib/parallel.py", line 763, in _return_or_raise
    raise self._result
joblib.externals.loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
Process finished with exit code 1
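The first error in the remote traceback looks consistent with an attribute-forwarding __getattr__ answering the unpickler's __setstate__ lookup on an object whose attributes have not been restored yet. The sketch below reproduces that pattern in plain Python. DelegatingWrapper and GuardedWrapper are hypothetical names for illustration, not narwhals code, and the guard shown is only one possible mitigation, not necessarily the fix narwhals would adopt.

```python
class DelegatingWrapper:
    """Hypothetical stand-in for a wrapper that forwards unknown attributes."""

    def __init__(self, native):
        self._native = native

    def __getattr__(self, attr):
        # Called for ANY attribute that normal lookup cannot find,
        # including dunder names such as __setstate__.
        def func(*args, **kwargs):
            return getattr(self._native, attr)(*args, **kwargs)

        return func


class GuardedWrapper(DelegatingWrapper):
    """Same idea, but refuses to forward private and dunder names."""

    def __getattr__(self, attr):
        if attr.startswith("_"):
            raise AttributeError(attr)
        return super().__getattr__(attr)


# Unpickling creates the instance without calling __init__, then looks up
# __setstate__ on it. Mimic both steps by hand.
blank = DelegatingWrapper.__new__(DelegatingWrapper)
setstate = getattr(blank, "__setstate__", None)
found_bogus_setstate = callable(setstate)  # the forwarding closure was returned

try:
    # `_native` is not set yet, so `self._native` is itself answered by
    # __getattr__ and yields a function, which has no __setstate__.
    setstate({"_native": [0, 2]})
except AttributeError as exc:
    error_message = str(exc)

# With the guard, the lookup fails cleanly and pickle can fall back to its
# default behavior of restoring the instance __dict__.
guarded = GuardedWrapper.__new__(GuardedWrapper)
guarded_setstate = getattr(guarded, "__setstate__", None)
```

The second error in the log (comparing a function with a tuple) would follow from the same mechanism, since self._backend_version is also resolved through the forwarding __getattr__ on the uninitialized object.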