Describe the bug
After creating a Dask-cuDF DataFrame, if I chain multiple `.loc` operations on it using boolean Dask-cuDF Series as masks, computing the resulting DataFrame raises a `RuntimeError` with the message `cuDF failure at: ../src/stream_compaction/apply_boolean_mask.cu:73: Column size mismatch`. The equivalent snippet works as expected with plain cuDF.
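As a mental model of what the error message implies, here is a pure-Python sketch of a positional boolean-mask filter with a length check: the mask must have exactly as many entries as there are rows. `apply_boolean_mask` below is a hypothetical stand-in, not the cuDF or libcudf API. Applied to the whole (unpartitioned) data, the two masks line up with the rows and reproduce the expected cuDF result; applied to data of a different length, the check fails.

```python
from typing import List, Sequence, Tuple

# A row is modelled as (index_label, a, b).
Row = Tuple[int, int, int]


def apply_boolean_mask(rows: Sequence[Row], mask: Sequence[bool]) -> List[Row]:
    """Hypothetical pure-Python model of a positional boolean-mask filter.

    Mirrors the behaviour suggested by the libcudf error message: the mask
    must have exactly as many entries as there are rows.
    """
    if len(rows) != len(mask):
        raise ValueError(
            f"Column size mismatch: {len(rows)} rows vs {len(mask)} mask values"
        )
    # Keep each row whose mask entry at the same position is True;
    # index labels are carried along but never used for alignment.
    return [row for row, keep in zip(rows, mask) if keep]


df1 = [(0, 1, 4), (1, 2, 5), (2, 3, 6)]  # the report's DataFrame
f1 = [False, True, True]
f2 = [True, False]

df2 = apply_boolean_mask(df1, f1)  # [(1, 2, 5), (2, 3, 6)]
df3 = apply_boolean_mask(df2, f2)  # [(1, 2, 5)], as in the expected output
print(df2)
print(df3)

# Applying a mask to data of a different length fails the size check.
try:
    apply_boolean_mask(df1, f2)
except ValueError as err:
    print(err)  # Column size mismatch: 3 rows vs 2 mask values
```

The sketch only models single-frame semantics; it says nothing about how Dask splits the data, which is where the two masks stop lining up.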
Steps/Code to reproduce bug
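```python
import dask_cudf
import cudf
ddf1 = dask_cudf.from_cudf(cudf.DataFrame({'a':[1,2,3], 'b':[4,5,6]}), npartitions=2)  # 3 rows in 2 partitions
f1 = dask_cudf.from_cudf(cudf.Series([False, True, True]), npartitions=2)  # mask for ddf1's 3 rows
f2 = dask_cudf.from_cudf(cudf.Series([True, False]), npartitions=2)  # mask for ddf2's 2 rows (index 0, 1)
ddf2 = ddf1.loc[f1]  # keeps rows 1 and 2; computes fine
ddf3 = ddf2.loc[f2]  # second chained .loc
print(ddf2.compute())
print(ddf3.compute())  # line 9: raises RuntimeError (Column size mismatch)
```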
The above code produces the following output:
```
   a  b
1  2  5
2  3  6
Traceback (most recent call last):
  File "temp.py", line 9, in <module>
    print(ddf3.compute())
  File "/opt/conda/lib/python3.8/site-packages/dask/base.py", line 292, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/dask/base.py", line 575, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/dask/local.py", line 554, in get_sync
    return get_async(
  File "/opt/conda/lib/python3.8/site-packages/dask/local.py", line 497, in get_async
    for key, res_info, failed in queue_get(queue).result():
  File "/opt/conda/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/opt/conda/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/opt/conda/lib/python3.8/site-packages/dask/local.py", line 539, in submit
    fut.set_result(fn(*args, **kwargs))
  File "/opt/conda/lib/python3.8/site-packages/dask/local.py", line 235, in batch_execute_tasks
    return [execute_task(*a) for a in it]
  File "/opt/conda/lib/python3.8/site-packages/dask/local.py", line 235, in <listcomp>
    return [execute_task(*a) for a in it]
  File "/opt/conda/lib/python3.8/site-packages/dask/local.py", line 226, in execute_task
    result = pack_exception(e, dumps)
  File "/opt/conda/lib/python3.8/site-packages/dask/local.py", line 221, in execute_task
    result = _execute_task(task, data)
  File "/opt/conda/lib/python3.8/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/opt/conda/lib/python3.8/site-packages/dask/optimization.py", line 990, in __call__
    return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
  File "/opt/conda/lib/python3.8/site-packages/dask/core.py", line 149, in get
    result = _execute_task(task, cache)
  File "/opt/conda/lib/python3.8/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/opt/conda/lib/python3.8/site-packages/dask/utils.py", line 39, in apply
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/dask/dataframe/core.py", line 6330, in apply_and_enforce
    df = func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/dask/dataframe/methods.py", line 37, in loc
    return df.loc[iindexer]
  File "/opt/conda/lib/python3.8/site-packages/cudf/core/dataframe.py", line 127, in __getitem__
    return self._getitem_tuple_arg(arg)
  File "/opt/conda/lib/python3.8/site-packages/nvtx/nvtx.py", line 101, in inner
    result = func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/cudf/core/dataframe.py", line 267, in _getitem_tuple_arg
    df = columns_df._apply_boolean_mask(tmp_arg[0])
  File "/opt/conda/lib/python3.8/site-packages/cudf/core/indexed_frame.py", line 1696, in _apply_boolean_mask
    libcudf.stream_compaction.apply_boolean_mask(
  File "cudf/_lib/stream_compaction.pyx", line 101, in cudf._lib.stream_compaction.apply_boolean_mask
RuntimeError: cuDF failure at: ../src/stream_compaction/apply_boolean_mask.cu:73: Column size mismatch
```
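A possible explanation, not verified against dask's source for this version: when the frame and the mask have different divisions, dask appears to align them on the union of their division boundaries before running `methods.loc` on each pair of partitions (the traceback shows `.loc` being applied per partition). Assuming `from_cudf(..., npartitions=2)` gives the three-row `ddf1` divisions `(0, 2, 2)` and the two-row `f2` divisions `(0, 1, 1)`, and that `ddf2` inherits `ddf1`'s divisions, the pure-Python model below shows the partition pairs that would then reach cuDF. The helper names are hypothetical, and the division values are assumptions that `ddf2.divisions` and `f2.divisions` on the real objects would confirm or refute.

```python
from typing import List, Sequence


def aligned_divisions(*all_divisions: Sequence[int]) -> List[int]:
    """Simplified alignment: the sorted union of every division boundary."""
    return sorted(set().union(*all_divisions))


def split_by_divisions(index: Sequence[int], divisions: Sequence[int]) -> List[List[int]]:
    """Hypothetical helper: bucket index labels into partitions.

    Partition i covers [divisions[i], divisions[i + 1]); the last partition
    also includes its upper bound, following dask's divisions convention.
    Labels outside every range are dropped (not expected in this example).
    """
    parts: List[List[int]] = [[] for _ in range(len(divisions) - 1)]
    last = len(parts) - 1
    for label in index:
        for i in range(len(parts)):
            lo, hi = divisions[i], divisions[i + 1]
            if lo <= label < hi or (i == last and lo <= label <= hi):
                parts[i].append(label)
                break
    return parts


# Assumed layout of the report's collections (check with .divisions).
ddf2_index = [1, 2]         # rows left after ddf1.loc[f1]
ddf2_divisions = (0, 2, 2)  # assumed to be inherited from ddf1
f2_index = [0, 1]           # cudf.Series([True, False]) gets index 0, 1
f2_divisions = (0, 1, 1)    # assumed for 2 rows in 2 partitions

common = aligned_divisions(ddf2_divisions, f2_divisions)
frame_parts = split_by_divisions(ddf2_index, common)
mask_parts = split_by_divisions(f2_index, common)

# Compare row counts partition by partition, as a positional mask requires.
for i, (rows, mask) in enumerate(zip(frame_parts, mask_parts)):
    status = "ok" if len(rows) == len(mask) else "size mismatch"
    print(f"partition {i}: {len(rows)} rows, {len(mask)} mask values -> {status}")
```

Under these assumptions the first aligned partition pairs zero frame rows with one mask value, so the very first per-partition `.loc` would hit the size check, whereas on a single cuDF frame the two-element mask matches `ddf2`'s two rows exactly.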
Expected behavior
Expected output (verified with cudf instead of dask-cudf):
```
   a  b
1  2  5
2  3  6
   a  b
1  2  5
```
Environment overview
- Environment location: Docker
- Method of cuDF install: Docker
docker run -it --rm --gpus all --ipc=host --network=host -v .
Environment details
cuDF version 22.4.0a0+306.g0cb75a4913