Pandas numeric_only behavior in full reduce in Python 2 #83

@simon-mo


By default, pandas full-reduce operations (e.g. max, min, mean, ...) take a numeric_only argument. With the default setting, pandas will:

  • Try to operate on the full axis if possible.
  • If that raises an error, operate on the numeric values only.
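
The two-step fallback described above can be sketched in plain Python. This is only an illustration, not pandas' actual implementation: `reduce_with_fallback` is a hypothetical name, a list stands in for a column, and the sketch relies on Python 3 raising `TypeError` when an `int` is compared with a `str`.

```python
from numbers import Number


def reduce_with_fallback(values, op):
    """Hypothetical helper: try `op` on everything, then on numeric values only."""
    try:
        # Step 1: operate on the full set of values.
        return op(values)
    except TypeError:
        # Step 2: the full operation failed, so keep only the numeric values.
        numeric = [v for v in values if isinstance(v, Number)]
        return op(numeric)


print(reduce_with_fallback([1, 2, 3], max))
print(reduce_with_fallback([1, 2, 3, "a"], max))
```

The important point is that the second step only runs after the first one has failed, so the outcome is not known until the full operation has been attempted.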

In Modin, we decided on the following behavior:

numeric_only = True if axis else kwargs.get("numeric_only", False)

because of the asynchronous nature of our computation model.
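
To make the rule concrete, here is the same expression wrapped in a small function. `resolve_numeric_only` is an illustrative name, not a Modin function, and the sketch assumes `axis` is given as the integer 0 or 1.

```python
def resolve_numeric_only(axis, **kwargs):
    """Hypothetical wrapper around the expression quoted in this issue."""
    # Any truthy axis (e.g. axis=1) forces numeric_only to True;
    # otherwise the caller's value is used, defaulting to False.
    numeric_only = True if axis else kwargs.get("numeric_only", False)
    return numeric_only


print(resolve_numeric_only(0))                      # axis=0, no override
print(resolve_numeric_only(0, numeric_only=True))   # axis=0, caller opts in
print(resolve_numeric_only(1, numeric_only=False))  # axis=1 ignores the caller
```

Note that with a truthy `axis`, a `numeric_only=False` passed by the caller is overridden.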

However, this leads to the following behavior in Python 2, where values of different types can be compared:

In [1]: max([1,2,3,'a'])
Out[1]: 'a'

In a mixed-type DataFrame:

   col1  col2  col3 col4
0     1     4   8.0    a
1     2     5   9.4    b
2     3     6  10.1    c
3     4     7  11.3    d

taking the max over rows will lead to:

0    a
1    b
2    c
3    d
dtype: object

This is not the expected behavior, so we chose not to follow pandas' behavior in this situation.
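
For the example frame above, a row-wise max restricted to numeric values can be sketched as follows. This is a plain-Python stand-in that assumes "numeric only" means dropping the string column before reducing; `row_max_numeric` is a hypothetical helper, not Modin code.

```python
from numbers import Number

# The rows of the mixed-type frame above (col1, col2, col3, col4).
rows = [
    [1, 4, 8.0, "a"],
    [2, 5, 9.4, "b"],
    [3, 6, 10.1, "c"],
    [4, 7, 11.3, "d"],
]


def row_max_numeric(row):
    """Hypothetical helper: max over the numeric entries of one row."""
    return max(v for v in row if isinstance(v, Number))


result = [row_max_numeric(row) for row in rows]
print(result)  # [8.0, 9.4, 10.1, 11.3]
```

Here the string column never takes part in the comparison, so the result does not depend on how the interpreter orders strings against numbers.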

    Labels

    pandas 🤔Weird Behaviors of Pandas
