spatial_shuffle fails when loading from a shapefile #302

@fbunt

Calling spatial_shuffle on a dask-geopandas GeoDataFrame that was read from a shapefile raises a ValueError. The example below uses the attached shapefile (example.tar.gz):

>>> import dask_geopandas as dgpd
>>> ddf = dgpd.read_file("example/example.shp", npartitions=1).spatial_shuffle()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/fred/anaconda3/envs/dgpd-error/lib/python3.11/site-packages/dask_geopandas/expr.py", line 847, in spatial_shuffle
    sorted_ddf = self.set_index(
                 ^^^^^^^^^^^^^^^
  File "/home/fred/anaconda3/envs/dgpd-error/lib/python3.11/site-packages/dask_geopandas/expr.py", line 634, in set_index
    ddf = super().set_index(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/fred/anaconda3/envs/dgpd-error/lib/python3.11/site-packages/dask_expr/_collection.py", line 3463, in set_index
    return new_collection(
           ^^^^^^^^^^^^^^^
  File "/home/fred/anaconda3/envs/dgpd-error/lib/python3.11/site-packages/dask_expr/_collection.py", line 4769, in new_collection
    meta = expr._meta
           ^^^^^^^^^^
  File "/home/fred/anaconda3/envs/dgpd-error/lib/python3.11/functools.py", line 1001, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File "/home/fred/anaconda3/envs/dgpd-error/lib/python3.11/site-packages/dask_expr/_shuffle.py", line 816, in _meta
    return self.frame._meta.set_index(other, drop=self.drop)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/fred/anaconda3/envs/dgpd-error/lib/python3.11/site-packages/pandas/core/frame.py", line 6173, in set_index
    raise ValueError(
ValueError: Length mismatch: Expected 5 rows, received array of length 0
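
For reference, the last frame of the traceback is a plain pandas call, and the same kind of error can be produced without dask at all. The snippet below is only a sketch of that final step, not a diagnosis: the 5-row frame and the empty key are hypothetical stand-ins for `self.frame._meta` and `other`, chosen to match the lengths in the error message. Why the two would differ in length inside dask-expr is not something this shows.

```python
import pandas as pd

# Hypothetical stand-in for the 5-row frame seen as `self.frame._meta`
# in the traceback (the real meta frame comes from the shapefile).
frame = pd.DataFrame({"value": range(5)})

# Hypothetical stand-in for `other`: an index key with zero rows.
empty_key = pd.Series([], dtype="int64")

message = None
try:
    # pandas checks that the key has as many rows as the frame
    # and raises ValueError when the lengths differ.
    frame.set_index(empty_key)
except ValueError as err:
    message = str(err)

print(message)
```

If this matches what happens internally, the meta frame and the meta of the computed index key disagree on length at the point where `set_index` builds its metadata.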

Environment:

dask                      2024.6.2           pyhd8ed1ab_0    conda-forge
dask-core                 2024.6.2           pyhd8ed1ab_0    conda-forge
dask-expr                 1.1.6              pyhd8ed1ab_0    conda-forge
dask-geopandas            0.4.1              pyhd8ed1ab_0    conda-forge
geopandas                 1.0.0              pyhd8ed1ab_0    conda-forge
geopandas-base            1.0.0              pyha770c72_0    conda-forge
numpy                     2.0.0           py311h1461c94_0    conda-forge
pandas                    2.2.2           py311h14de704_1    conda-forge

example.tar.gz