In the current implementation, Modin's parquet reader tries to create as many column partitions as possible (even if each of them ends up holding only 1 column), without considering the number of row partitions that parquet's row groups already produce naturally.
modin/modin/core/io/column_stores/column_store_dispatcher.py
Line 146 in abe20a5
As a result, `.read_parquet()` naturally produces square frames (#5296), which perform poorly in Modin.

We may want to change the logic for generating column partitions so that, if there are already enough row partitions, column partitions are generated only in accordance with the `cfg.MinPartitionSize` parameter, and not in a 1-column-per-partition style.
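
To make the proposal concrete, here is a minimal sketch of how the column split could be chosen. It is not Modin's actual code: `column_chunk_widths` and all of its parameters are hypothetical, and "enough row partitions" is assumed to mean "at least as many row partitions as the target partition count".

```python
def column_chunk_widths(n_columns, n_row_parts, n_partitions, min_partition_size):
    """Return the width of each column partition.

    Hypothetical helper for illustration only, not part of Modin's API.
    n_row_parts: row partitions already implied by the parquet row groups.
    n_partitions: target number of partitions per axis (assumed threshold).
    min_partition_size: stand-in for cfg.MinPartitionSize.
    """
    if n_columns <= 0:
        return []
    if n_row_parts >= n_partitions:
        # Assumption: rows already give enough parallelism, so only split
        # columns into chunks that hold at least min_partition_size columns.
        n_col_parts = max(1, n_columns // min_partition_size)
    else:
        # Too few row partitions: split columns as finely as possible,
        # which may yield 1 column per partition.
        n_col_parts = n_columns
    # Never create more column partitions than the target count.
    n_col_parts = min(n_col_parts, n_partitions)
    # Spread the columns as evenly as possible over the partitions.
    base, extra = divmod(n_columns, n_col_parts)
    return [base + 1 if i < extra else base for i in range(n_col_parts)]


# 100 columns, 16 row groups, 16 target partitions, minimum width 32.
print(column_chunk_widths(100, 16, 16, 32))  # [34, 33, 33]
# Only 2 row groups: fall back to the fine-grained column split.
print(len(column_chunk_widths(100, 2, 16, 32)))  # 16
```

With enough row groups, the frame is split into a few wide column partitions instead of a square grid; with few row groups, the finer column split is kept so that parallelism is not lost.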