BUG: Mysterious Series.get() with Int64Index bug #33439

Closed
@sam-cohan

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

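import numpy as np
import pandas as pd  # data_df below is a pandas DataFrame; its contents are not included in this report

def get_t_minus_n_val(n):

    def f(x):
        # Assume the index is days_from_origin.
        if n == 3 and x.index[-1] == 27:
            import pdb; pdb.set_trace()  # break on the group where the lookup fails
            print(x.index[-1], n, x.get(x.index[-1] - n))  # for debugging
        # Value n days before the group's last day, or NaN if that day is missing.
        return x.get(x.index[-1] - n, np.nan)

    f.__name__ = f"t_minus_{n}_days"

    return f

res = data_df.set_index("days_from_origin_").groupby("device").agg({"metric1": get_t_minus_n_val(3)})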

> <ipython-input-452-9b903578cb56>(7)f()
-> print(x.index[-1], n, x.get(x.index[-1] - n)) # for debugging
(Pdb) x.get(24)
(Pdb) x.iloc[-5:]
23     60221064
24    232131096
25     46413584
26    133181464
27    229400712
Name: metric1, dtype: int64
(Pdb) 24 in x.index
False
(Pdb) 24 in x.index.values
True
(Pdb) x.index
Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
            17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27],
           dtype='int64')
(Pdb) x.to_dict().get(24)
232131096

Problem description

I have not been able to produce a minimal reproduction, as the bug seems to be data-dependent. Instead, I am including my pdb session in the hope that someone who knows the code can work out where the problem is.
Basically, I am running a custom agg function that needs to grab an element from a Series, and even though the label clearly exists in the index, x.get() returns None. If I first convert the Series to a dict, the lookup succeeds.
I could not reproduce this by simply creating a new Series and calling .get on it; that works fine. In fact, if I filter the DataFrame down to just that device, it also works fine. It seems to be some sort of internal state issue that arises when the groupby has more records.
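
For anyone trying to narrow this down, the check done by hand in the pdb session can be automated. The sketch below is only a diagnostic harness under stated assumptions: the helper name find_lookup_mismatches and the synthetic data_df are hypothetical stand-ins (the reporter's data is not available), and synthetic data like this may well not trigger the problem. It runs a custom agg function over every group and records each label that is present in x.index.values but that the Index membership test or Series.get() fails to find.

```python
import numpy as np
import pandas as pd


def find_lookup_mismatches(df, group_col, index_col, value_col):
    """Collect labels that exist in x.index.values but are missed by label lookup."""
    mismatches = []

    def check(x):
        # x.index.values is the raw NumPy array, which the report shows to be correct.
        for label in x.index.values:
            # Both of these go through the Index lookup machinery.
            if label not in x.index or x.get(label) is None:
                mismatches.append(int(label))
        return len(x)  # agg needs a scalar per group; the value itself is unused

    df.set_index(index_col).groupby(group_col).agg({value_col: check})
    return mismatches


# Hypothetical synthetic data: three devices with different numbers of days.
frames = []
for device, n_days in [("a", 28), ("b", 20), ("c", 25)]:
    frames.append(
        pd.DataFrame(
            {
                "device": device,
                "days_from_origin_": np.arange(n_days),
                "metric1": np.arange(n_days) * 1000,
            }
        )
    )
data_df = pd.concat(frames, ignore_index=True)

mismatches = find_lookup_mismatches(data_df, "device", "days_from_origin_", "metric1")
print(mismatches)  # an empty list means every label was found by both lookups
```

Running the same helper against the real DataFrame would show whether the failure is limited to one group or one label.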

Expected Output

I expect x.get(24) to return 232131096, the value stored at label 24, instead of None.
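
Until the cause is understood, one possible workaround is to avoid the label lookup entirely. The sketch below relies only on what the pdb session shows, namely that x.index.values and positional access via x.iloc return correct data. It is not a confirmed fix. The function name get_t_minus_n_val_by_values and the small demo frame are hypothetical: device "a" reuses the five rows printed in the session, and device "b" is made up to exercise the missing-label case.

```python
import numpy as np
import pandas as pd


def get_t_minus_n_val_by_values(n):
    """Variant of get_t_minus_n_val that does not call Series.get()."""

    def f(x):
        labels = x.index.values            # raw NumPy array of index labels
        target = labels[-1] - n            # label n days before the last day
        hits = np.flatnonzero(labels == target)
        if hits.size == 0:
            return np.nan                  # same default as the original x.get(..., NaN)
        return x.iloc[hits[0]]             # positional access, no label lookup

    f.__name__ = f"t_minus_{n}_days"
    return f


# Hypothetical demo data; device "a" mirrors the tail shown in the pdb session.
demo = pd.DataFrame(
    {
        "device": ["a"] * 5 + ["b"] * 2,
        "days_from_origin_": [23, 24, 25, 26, 27, 0, 1],
        "metric1": [60221064, 232131096, 46413584, 133181464, 229400712, 5, 7],
    }
)

res = (
    demo.set_index("days_from_origin_")
    .groupby("device")
    .agg({"metric1": get_t_minus_n_val_by_values(3)})
)
print(res)
```

For device "a" the last day is 27, so the lookup targets label 24. Device "b" has no label -2, so it falls back to NaN. The linear scan over labels is slower than a hash lookup, which should only matter for very large groups.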

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Darwin
OS-release : 17.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 41.2.0
Cython : None
pytest : 5.4.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pytest : 5.4.1
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : 0.48.0
