Describe the enhancement requested
The row_group_size property in pyarrow.parquet.write_table is described as:
Maximum size of each written row group. If None, the row group size will be the minimum of the Table size and 64 * 1024 * 1024.
This limit is in # of rows but that is not obvious from the description. Furthermore, 64Mi is an extremely high limit for row group size. I believe it is perhaps based on the Java implementation, however the Java implementation treats this number as "bytes" and not "rows" (64MiB row groups is very reasonable).
Perhaps the best solution would be to add support for a limit in terms of bytes. In the meantime, I think we should lower the default limit to 1Mi rows.
Component(s)
C++