Dict/dict keys in DataFrame.__getitem__ #21294
Comments
Note that in many places we treat Series and dicts as equivalent (in many method parameters that expect a mapping, e.g. fillna, astype, ...), but in those cases they are interpreted the same (I mean passing Personally I don't really see a good reason to allow this (or to accept the complexity it brings). Do you have a practical use case? (If we want to fix the inconsistency, I would rather disable using dicts in other places in indexing.) |
Well, yes, but...
... sure
I wouldn't have major objections if we were doing it from scratch. But:
I think it's way simpler to tell users that every iterable is a possible list-like, with the exception of strings (because they are scalar) and tuples (because they are |
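The rule proposed in this comment can be sketched in plain Python. This is only an illustration of the rule as stated (every iterable counts as list-like, except strings and tuples); `is_list_like_sketch` is a hypothetical helper name and is not pandas' own implementation.

```python
from collections.abc import Iterable


def is_list_like_sketch(obj):
    """Hypothetical helper: True for any iterable except strings/bytes and tuples."""
    # Strings behave as scalars and tuples are treated specially,
    # so both are excluded before the general iterable check.
    if isinstance(obj, (str, bytes, tuple)):
        return False
    return isinstance(obj, Iterable)


d = {"a": 1, "b": 2}
checks = {
    "list": is_list_like_sketch(["a", "b"]),
    "dict": is_list_like_sketch(d),
    "dict keys": is_list_like_sketch(d.keys()),
    "generator": is_list_like_sketch(k for k in d),
    "string": is_list_like_sketch("ab"),
    "tuple": is_list_like_sketch(("a", "b")),
    "int": is_list_like_sketch(1),
}
print(checks)
```

Under this rule a dict and its keys view need no special case: they are list-like simply because they are iterable.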
I understand that the actual fix might be two lines of code in
you mean its "keys" .. :-)
To get more concrete: which use cases would change if we adopted this rule? (I completely agree that consistent rules for how things are interpreted are good.) For example, |
:-)
Hard to answer, because they are not particularly related to pandas itself, but to the code surrounding pandas calls. In my case, I was working with a df from which I wanted to select columns and change their appearance. So I had a dict Again, I could certainly use
... and it accepts tuples. I do see this as a discrepancy, but a minor one, compared to consistency in the same type of indexer across different objects. |
I also think - but I will be happy to be proven wrong - that in Python libraries accepting any iterable wherever lists are expected is the norm. |
Just thinking out loud: this does not apply to e.g. (I don't necessarily mean this is a better solution - my main worry, of breaking existing code, stays mostly unchanged) |
Yep, that is true and a good point. E.g., looking at the "built-in functions", they use the consistent terminology of "iterable" in some of the functions, and they all accept a dict, e.g.:
|
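As an illustration of the point about built-ins (a minimal sketch, not the example originally posted in the comment): iterating over a dict yields its keys, so built-ins documented as taking an "iterable" accept a dict directly. The `styles` dict and its contents are hypothetical.

```python
# Hypothetical dict: column name -> display style (names are illustrative only).
styles = {"b": "bold", "a": "italic", "c": "plain"}

# Built-ins documented as accepting an "iterable" iterate over the dict's keys.
as_list = list(styles)           # keys in insertion order (guaranteed in Python 3.7+)
as_sorted = sorted(styles)       # keys in sorted order
largest = max(styles)            # largest key
joined = ",".join(styles)        # str.join also takes any iterable of strings
pairs = list(enumerate(styles))  # (position, key) pairs

print(as_list, as_sorted, largest, joined, pairs)
```

None of these calls needs an explicit `.keys()` or `list(...)` conversion, which is the consistency argument made in the thread.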
I see the point that "allowing dicts actually creates less special cases", which is a good one (although of course we already make an exception to the rule for tuples in certain cases, as you mentioned). It's only that I would personally never want to see such usage in code I review or maintain. I think I would always ask to change it to be more explicit :-) |
By the way: I think over time it should become "in all cases" (in which we declare that we accept "list-likes", which might not necessarily be all cases in which we accept lists). Another thing: the obvious alternative to "accept anything which can be cast to a list and is not a tuple or string" is "accept only lists". But in principle there can be good reasons to pass generators to save RAM (e.g. indexing a
So would you say #21313 is OK? |
@jorisvandenbossche if you think we need more time to decide, I will change #21313 so that it temporarily keeps the current behavior |
Given the discussion above, I am fine with going forward and handling dict here not as a special case but just as any other iterable. |
…das-dev#21313) * BUG: fix DataFrame.__getitem__ and .loc with non-list listlikes close pandas-dev#21294 close pandas-dev#21428
Code Sample, a copy-pastable example if possible
Problem description
I know that DataFrame.__getitem__ is a mess (#9595), but I don't see why dicts and dict keys shouldn't just be considered list-likes, as happens with Series.
Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-6-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8
pandas: 0.24.0.dev0+25.gcd0447102
pytest: 3.5.0
pip: 9.0.1
setuptools: 39.2.0
Cython: 0.25.2
numpy: 1.14.3
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.5.6
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.0dev
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.2.2.post1153+gff6786446
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 1.3.0
xlsxwriter: 0.9.6
lxml: 4.1.1
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1