Description
#!/usr/bin/env python
import pandas as pd
from datetime import datetime, timedelta
now = datetime.now()
# Offset `later` so the two timestamps are guaranteed to differ
later = now + timedelta(hours=1)
non_empty = pd.DataFrame(dict(date=[now, now, later, later], user_id=[2,3, 2,3], value_1=[4, 5, 6, 7], value_2=[6,7, 8,9]))
non_empty = non_empty.set_index(['date', 'user_id'])
unstacked = non_empty.unstack('user_id')
unstacked.columns = unstacked.columns.set_names(['values', 'user_id'])
print("'value_2' in unstacked.columns", 'value_2' in unstacked.columns)
empty = non_empty.iloc[100:]
empty_unstacked = empty.unstack('user_id')
print("'value_2' in empty_unstacked.columns", 'value_2' in empty_unstacked.columns)
Problem description
We found this issue because our code occasionally has to deal with empty data frames (for example, when a computation is updated for a user who hasn't produced new data yet). The problem is that the shape of the returned DataFrame differs depending on whether the input DataFrame contains data or not.
See:
("'value_2' in unstacked.columns", True)
("'value_2' in empty_unstacked.columns", False)
This requires us to special-case empty inputs in our code, which should not be necessary.
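For anyone hitting the same thing, here is a minimal sketch of the kind of special-casing we currently need. The helper name `unstack_with_columns` is hypothetical (it is not part of pandas), and it assumes the caller already knows which `user_id` values to expect. It reindexes the unstacked columns to the full product of value columns and user ids, so the column layout is the same for empty and non-empty input:

```python
import pandas as pd


def unstack_with_columns(df, level, level_values):
    # Hypothetical helper (not part of pandas): unstack `level`, then force
    # the columns to the full product of df.columns x level_values.
    expected = pd.MultiIndex.from_product(
        [df.columns, level_values], names=[df.columns.name, level]
    )
    return df.unstack(level).reindex(columns=expected)


# Fixed timestamps stand in for `now` and `later` from the script above.
t0, t1 = pd.Timestamp("2018-05-01"), pd.Timestamp("2018-05-02")
non_empty = pd.DataFrame(
    dict(date=[t0, t0, t1, t1], user_id=[2, 3, 2, 3],
         value_1=[4, 5, 6, 7], value_2=[6, 7, 8, 9])
).set_index(["date", "user_id"])
empty = non_empty.iloc[100:]

full = unstack_with_columns(non_empty, "user_id", [2, 3])
patched = unstack_with_columns(empty, "user_id", [2, 3])
print("'value_2' in patched.columns", "value_2" in patched.columns)
```

Note that this only papers over the column layout. The dtypes of the reindexed empty columns may not match those of the non-empty case.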
Expected Output
I would expect the output columns (the shape of the DataFrame) to be the same regardless of whether the input DataFrame is empty or not.
It should be:
("'value_2' in unstacked.columns", True)
("'value_2' in empty_unstacked.columns", True)
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 17.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: de_DE.utf-8
LOCALE: None.None
pandas: 0.23.0
pytest: 3.4.2
pip: 10.0.1
setuptools: 38.4.0
Cython: None
numpy: 1.14.2
scipy: 1.0.1
pyarrow: None
xarray: None
IPython: 5.5.0
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: 2.6.4
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.6
pymysql: 0.8.0
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None