
BUG: gdf.geometry.total_bounds reads all columns from Parquet instead of only geometry column #78

@jorisvandenbossche

When a getitem operation follows read_parquet, the column selection is pushed down, so only the selected columns are read from the file. For example:

gdf = dask_geopandas.read_parquet(...)
# only the "attribute" column is read
gdf["attribute"].mean()

# only the geometry column is read (.geometry is equivalent to `gdf["geometry"]`)
gdf.geometry.x

However, this does not seem to work for total_bounds specifically: even gdf.geometry.total_bounds.compute() loads all columns of the Parquet file instead of only the geometry column, which makes total_bounds considerably slower than it could be.

(I was looking into this because I realized that gdf.total_bounds, where the user doesn't explicitly call .geometry first, might load all columns unnecessarily. That is relevant for all GeoDataFrame methods/attributes that only require the geometry column, and it is something we could fix, I suppose; I need to open a separate issue for it. But when I compared it with gdf.geometry.total_bounds, explicitly selecting the geometry column didn't improve things.)
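For readers unfamiliar with the pushdown behaviour described above, here is a toy, standard-library-only model of it. It is only an illustration: `LazyTable`, its `log` attribute, and the `storage` dict are hypothetical names invented for this sketch, and it does not reflect how dask or dask-geopandas actually implement the optimization.

```python
# Toy model of column projection pushdown; NOT dask's real implementation.
class LazyTable:
    """Hypothetical stand-in for a lazily read Parquet dataset."""

    def __init__(self, storage, columns=None, log=None):
        self.storage = storage                      # dict: column name -> list of values
        self.columns = list(columns or storage)     # columns this view needs
        self.log = log if log is not None else []   # records every simulated column read

    def __getitem__(self, name):
        # Selecting a column narrows the projection; nothing is read yet.
        return LazyTable(self.storage, [name], self.log)

    def compute(self):
        # Only the projected columns are "read" from storage.
        self.log.extend(self.columns)
        return {c: self.storage[c] for c in self.columns}


storage = {"geometry": [(0, 0), (2, 3)], "attribute": [1.0, 5.0], "name": ["a", "b"]}

table = LazyTable(storage)
table["attribute"].compute()        # selection is pushed down
projected_reads = list(table.log)   # only the selected column was read

table.log.clear()
table.compute()                     # no selection: every column is read
full_reads = list(table.log)
```

In terms of this sketch, the report is that gdf.geometry.total_bounds behaves like the second call (all columns read) even though the explicit .geometry selection should make it behave like the first.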
