pivot_table produces frame with unsorted columns #2144

@dchigarev

System information

from modin.pandas.test.utils import df_equals, test_data_values, create_test_dfs

# Build matching Modin and pandas frames from the first test dataset
data = test_data_values[0]
md_df, pd_df = create_test_dfs(data)

# Pivot on the first column of the frame
index = pd_df.columns[0]

md_res = md_df.pivot_table(index=index)
pd_res = pd_df.pivot_table(index=index)

# Same data, different column order
df_equals(md_res.sort_index(axis=1), pd_res.sort_index(axis=1))  # passes OK
df_equals(md_res, pd_res)  # assertion error

Output:

DataFrame.columns values are different (93.65079 %)
[left]:  Index(['col34', 'col35', 'col36', 'col37', 'col38', 'col39', 'col40', 'col41',
       'col42', 'col43', 'col44', 'col45', 'col46', 'col47', 'col48', 'col49',
       'col50', 'col51', 'col52', 'col53', 'col54', 'col55', 'col56', 'col57',
       'col58', 'col59', 'col60', 'col61', 'col62', 'col63', 'col64', 'col1',
       'col10', 'col11', 'col12', 'col13', 'col14', 'col15', 'col16', 'col17',
       'col18', 'col19', 'col2', 'col20', 'col21', 'col22', 'col23', 'col24',
       'col25', 'col26', 'col27', 'col28', 'col29', 'col3', 'col30', 'col31',
       'col4', 'col5', 'col6', 'col7', 'col8', 'col9', 'index'],
      dtype='object')
[right]: Index(['col1', 'col10', 'col11', 'col12', 'col13', 'col14', 'col15', 'col16',
       'col17', 'col18', 'col19', 'col2', 'col20', 'col21', 'col22', 'col23',
       'col24', 'col25', 'col26', 'col27', 'col28', 'col29', 'col3', 'col30',
       'col31', 'col34', 'col35', 'col36', 'col37', 'col38', 'col39', 'col4',
       'col40', 'col41', 'col42', 'col43', 'col44', 'col45', 'col46', 'col47',
       'col48', 'col49', 'col5', 'col50', 'col51', 'col52', 'col53', 'col54',
       'col55', 'col56', 'col57', 'col58', 'col59', 'col6', 'col60', 'col61',
       'col62', 'col63', 'col64', 'col7', 'col8', 'col9', 'index'],
      dtype='object')

Describe the problem

pandas sorts the column labels of the result of pandas.pivot_table. Modin skips that sort for performance reasons, so the result holds the same columns in a different order.

Currently, we only print a warning that the column order may differ from pandas.
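
For illustration, here is a minimal sketch of the ordering difference using plain pandas only. The toy frame and its column names are made up for this example, not taken from the test data. The "unsorted" result is simulated by reordering the pandas output, and it is not what Modin actually returns. It shows that pandas orders the value columns lexicographically and that sort_index(axis=1) on the caller's side restores that order.

```python
import pandas as pd

# Hypothetical toy frame; names are illustrative only.
df = pd.DataFrame(
    {
        "key": [1, 1, 2, 2],
        "col2": [1.0, 2.0, 3.0, 4.0],
        "col10": [5.0, 6.0, 7.0, 8.0],
        "col1": [9.0, 10.0, 11.0, 12.0],
    }
)

# pandas aggregates with the mean by default and sorts the resulting
# column labels lexicographically ("col10" sorts before "col2").
pd_res = df.pivot_table(index="key")
pandas_order = list(pd_res.columns)

# Simulate a result whose columns were left in the input order
# (an assumption standing in for an unsorted pivot_table result).
unsorted_res = pd_res[["col2", "col10", "col1"]]

# Caller-side workaround: sort the column labels explicitly.
normalized = unsorted_res.sort_index(axis=1)

print(pandas_order)
print(list(normalized.columns))
```

Sorting on the caller's side keeps the cost opt-in: code that does not depend on column order does not pay for the extra sort.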

Metadata

Labels: P2 (Minor bugs or low-priority feature requests), bug 🦗 (Something isn't working)
