
groupby ignores rows containing the number -9223372036854775808 #15721

Closed
@ReSqAr

Description

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: np.iinfo(np.int64).min == -9223372036854775808
True
In [4]: data = []
In [5]: data.append(-9223372036854775808)
In [6]: data.append(-9223372036854775808+1)
In [7]: data.append(0)
In [8]: df = pd.DataFrame( data, columns=["x"] ); df
Out[8]: 
                     x
0 -9223372036854775808
1 -9223372036854775807
2                    0
In [9]: df.groupby("x").groups
Out[9]: 
{-9223372036854775807: Int64Index([1], dtype='int64'),
 0: Int64Index([2], dtype='int64')}
In [10]: df.groupby("x")["x"].count()
Out[10]: 
x
-9223372036854775807    1
 0                      1
Name: x, dtype: int64
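
The session above can be condensed into a small script-style check. This is only a sketch: the helper name missing_group_keys is hypothetical (it is not part of pandas), and it simply compares the distinct values in a column against the keys that groupby reports.

```python
import numpy as np
import pandas as pd


def missing_group_keys(df, column):
    """Return the values of `column` that do not show up as groupby keys.

    Hypothetical helper for illustration only; not a pandas API.
    """
    expected = set(df[column].unique().tolist())
    actual = set(df.groupby(column).groups.keys())
    return expected - actual


int64_min = np.iinfo(np.int64).min
df = pd.DataFrame({"x": [int64_min, int64_min + 1, 0]})

# An empty set means every distinct value received its own group.
print(missing_group_keys(df, "x"))
```

On the pandas version reported here (0.19.2), the result would be expected to contain -9223372036854775808, matching Out[9]; on a version without the bug it should be an empty set.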

Problem description

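The groupby operation silently drops the row whose key is -9223372036854775808, the minimum int64 value. That key is missing from both the groups mapping (Out[9]) and the count() result, although the column holds three distinct values and three groups are expected.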
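
One plausible explanation, offered here as an assumption rather than something confirmed in this report, is that the minimum int64 value is the same bit pattern pandas and NumPy use internally to represent NaT, so a code path that treats that sentinel as missing could drop a genuine integer key. The sketch below only demonstrates the shared sentinel value; it does not inspect the groupby internals.

```python
import numpy as np
import pandas as pd

INT64_MIN = np.iinfo(np.int64).min

# pandas exposes the integer representation of NaT via `.value`.
nat_as_int = pd.NaT.value
print(nat_as_int == INT64_MIN)  # True

# NumPy's datetime64 NaT reinterpreted as int64 has the same value.
numpy_nat_as_int = np.array(["NaT"], dtype="datetime64[ns]").view("int64")[0]
print(numpy_nat_as_int == INT64_MIN)  # True
```

If that assumption holds, the problem would be limited to plain integer columns that happen to contain this one value, which is consistent with the neighbouring key -9223372036854775807 being grouped correctly.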

Output of pd.show_versions()

INSTALLED VERSIONS


commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.8.0-41-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 34.3.1
Cython: None
numpy: 1.11.1rc1
scipy: None
statsmodels: None
xarray: None
IPython: 5.3.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2014.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.1
html5lib: 0.999999999
httplib2: 0.9.1
apiclient: None
sqlalchemy: 1.0.14
pymysql: None
psycopg2: None
jinja2: 2.9.5
boto: None
pandas_datareader: None

Labels: Bug, Groupby, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)
