Currently, as far as I can tell, when you run `select count(*) from dataset` in DataFusion against a Parquet dataset, it is implemented by scanning column 0 and counting all of the rows (specifically, I think it adds up the number of rows in each batch).
However, for the specific case of counting every row in a Parquet file, you can read the row count directly from the footer metadata, so the cost is O(1) per file instead of O(n) in the number of rows.
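To make the difference concrete, here is a minimal sketch of the two strategies. It does not use DataFusion or the `parquet` crate: `FooterMeta`, `ParquetFile`, and the three functions are hypothetical stand-ins invented for illustration, and the only thing they assume is that the Parquet footer records a file-level row count.

```rust
/// Hypothetical stand-in for the file-level metadata in a Parquet footer.
/// The real footer records a total row count for the file.
struct FooterMeta {
    num_rows: i64,
}

/// Hypothetical stand-in for one Parquet file: its footer plus column 0
/// already decoded into batches (only the scan path touches the batches).
struct ParquetFile {
    footer: FooterMeta,
    column0_batches: Vec<Vec<i32>>,
}

/// Scan path: walk every batch of column 0 and add up the batch lengths.
/// The work grows with the number of rows.
fn count_by_scan(file: &ParquetFile) -> i64 {
    file.column0_batches.iter().map(|b| b.len() as i64).sum()
}

/// Metadata path: return the row count stored in the footer.
/// No column data is read, so the work is constant per file.
fn count_from_footer(file: &ParquetFile) -> i64 {
    file.footer.num_rows
}

/// For a dataset made of several files, sum the footer counts.
fn count_dataset(files: &[ParquetFile]) -> i64 {
    files.iter().map(count_from_footer).sum()
}
```

Both paths return the same number for an unfiltered `count(*)`. The shortcut only applies when no filter is involved, since the footer count covers every row in the file.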
Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-8902