Building column partitions discards category dtype #2513
Labels
bug 🦗
Something isn't working
P2
Minor bugs or low-priority feature requests
pandas concordance 🐼
Functionality that does not match pandas
pandas.dataframe
Related to pandas.dataframe module
System information
Modin version (`modin.__version__`): c2e7f9e
Describe the problem
The problem is in how we build column partitions and the broadcasted frame inside `deploy_axis_func` and `deploy_func_between_two_axis_partitions`: we simply concatenate the partitions, which discards the category dtype.

To preserve categories, we can use the same approach we already use for `to_pandas` in `concatenate`: union the categories. Unioning categories slows down the building of column partitions by about 30% for frames that contain categories. However, if the frame already has computed dtypes, we can pass them into our concatenate function and reuse the already-unioned categories, which should not cause a noticeable slowdown.
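For illustration, here is a minimal sketch of the idea in plain pandas. It is not Modin's actual code: the `concatenate` function below is a hypothetical standalone helper, its `dtypes` parameter (a mapping of column name to an already-unioned `CategoricalDtype`) is an assumption, and it assumes all partitions share the same unique column names.

```python
import pandas as pd
from pandas.api.types import union_categoricals


def concatenate(dfs, dtypes=None):
    """Row-wise concat that keeps categorical columns categorical.

    Hypothetical helper, not Modin's implementation. `dtypes` optionally maps
    a column name to a precomputed (already unioned) CategoricalDtype.
    """
    targets = {}
    for col in dfs[0].columns:
        # Only handle columns that are categorical in every partition.
        if not all(isinstance(df[col].dtype, pd.CategoricalDtype) for df in dfs):
            continue
        if dtypes is not None and col in dtypes:
            # Reuse the known dtype and skip the union step entirely.
            targets[col] = dtypes[col]
        else:
            # Compute the union of categories across all partitions.
            targets[col] = union_categoricals([df[col] for df in dfs]).dtype
    # Cast every partition to the shared dtype so pd.concat preserves it.
    return pd.concat([df.astype(targets) for df in dfs])


# Two "partitions" whose categorical column has different categories.
part_a = pd.DataFrame({"c": pd.Categorical(["a", "b"])})
part_b = pd.DataFrame({"c": pd.Categorical(["b", "c"])})

# Plain concatenation: the result column is no longer categorical.
naive = pd.concat([part_a, part_b])

# Unioning categories first keeps the category dtype.
fixed = concatenate([part_a, part_b])

# With dtypes already computed, the union step is skipped.
reused = concatenate([part_a, part_b], dtypes={"c": fixed["c"].dtype})
```

In this sketch the `dtypes` branch stands in for the "frame already has computed dtypes" case described above: the per-partition cast still happens, but the union of categories is not recomputed.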