Description
When doing a getitem operation after read_parquet, the column selection is pushed down to the Parquet read. For example, in the following cases:
gdf = dask_geopandas.read_parquet(...)
# only the "attribute" column is read
gdf["attribute"].mean()
# only the geometry column is read (.geometry is equivalent to `gdf["geometry"]`)
gdf.geometry.x
But it seems that specifically for total_bounds this doesn't work for some reason: even gdf.geometry.total_bounds.compute() loads all columns of the Parquet file instead of only the geometry column, which makes total_bounds considerably slower than it could be.
(I was looking into this because I realized that gdf.total_bounds, where the user doesn't explicitly call .geometry first, might load all columns unnecessarily. That is relevant for all GeoDataFrame methods/attributes that only require the geometry column, and is something we could fix, I suppose; I need to open a separate issue for that. But when I compared it with gdf.geometry.total_bounds, the explicit .geometry version was no faster.)
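To illustrate why total_bounds should only ever need the geometry column, here is a minimal pure-Python sketch of the reduction involved. It is not dask-geopandas' actual implementation: the helper names (`partition_bounds`, `combine`, `total_bounds`) are hypothetical, geometries are stood in for by their `(minx, miny, maxx, maxy)` tuples, and every partition is assumed to be non-empty.

```python
from functools import reduce

# Hypothetical helpers for illustration only; not part of dask-geopandas.

def partition_bounds(geom_bounds):
    """Reduce the (minx, miny, maxx, maxy) tuples of one partition to a single tuple."""
    minxs, minys, maxxs, maxys = zip(*geom_bounds)
    return (min(minxs), min(minys), max(maxxs), max(maxys))

def combine(a, b):
    """Merge two bounds tuples into the bounds that cover both."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def total_bounds(partitions):
    """Per-partition reduction followed by a reduction across partitions."""
    return reduce(combine, (partition_bounds(p) for p in partitions))

# Two partitions, each holding only per-geometry bounds (no attribute columns).
partitions = [
    [(0.0, 0.0, 1.0, 1.0), (2.0, -1.0, 3.0, 0.5)],
    [(-4.0, 2.0, -3.0, 5.0)],
]
result = total_bounds(partitions)
print(result)
```

Nothing in this reduction touches attribute columns, which is why loading them for total_bounds looks like avoidable work.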