ENH: pivot/groupby index with nan #3729
Comments
@wesm so this brings up the issue of groupby with nan in the index. I suppose we could include the nan group via a groupby option? |
adding Behaviour currently in docs as "this is how R works", but doesn't really say why... |
@jreback - I apologize, I'm a bit new to github, but is your last action indicating that this (adding dropna=True) should've been implemented in 0.16.0? If so, should I file another bug as I'm experiencing similar behavior in 0.16.2? |
Github's a bit unclear, but implementing this was moved from 0.16.0 to next major release (0.17). |
@hayd thanks for the clarification |
+1 for adding |
@hayd I am using 0.17, but it seems there is still no dropna=True option for groupby |
this is still an open issue |
+1 for optional dropna in groupby. The automatic drops can lead to strange behavior:
In [1]:
import pandas as pd
import numpy as np
In [2]:
df = pd.DataFrame({'foo': [1, 2, 3], 'bar': [4, 5, np.nan], 'qux': [None, None, None]})
In [3]:
df
Out[3]:
bar foo qux
0 4 1 None
1 5 2 None
2 NaN 3 None
In [4]:
grouped = df.groupby(by=['foo', 'bar'])
In [5]:
keys = grouped.groups.keys()
keys
Out[5]:
[(3, nan), (2, 5.0), (1, 4.0)]
In [6]:
grouped.get_group(keys[0])
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-6-8c0d3ec7c89a> in <module>()
----> 1 grouped.get_group(keys[0])
.../env/local/lib/python2.7/site-packages/pandas/core/groupby.py in get_group(self, name, obj)
646 inds = self._get_index(name)
647 if not len(inds):
--> 648 raise KeyError(name)
649
650 return obj.take(inds, axis=self.axis, convert=False)
KeyError: (3, nan) |
+1 for the dropna option. Just got bitten by this again... |
+1 for the dropna option as well, got bitten by it with some |
I'm working on a solution. Problem: there is already a
I'll gladly accept any input. The best solution would probably be to just call that option with another name, both for |
In #29716, @sorenwolfers noted that this affects
import pandas as pd
import numpy as np
df = pd.DataFrame({'ind':[np.nan,np.nan,'a','a'],'col':[0,1,2,3]}).set_index('ind')
df.sum(level='ind') |
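A minimal sketch of the same data grouped on the index level, assuming a pandas version in which `groupby` accepts `dropna` (1.1 or later, per the comments further down). It uses `groupby(level=...)` instead of `sum(level=...)` because, as far as I know, the `level` argument of `sum` is no longer available in recent pandas releases. The variable names `dropped` and `kept` are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"ind": [np.nan, np.nan, "a", "a"], "col": [0, 1, 2, 3]}
).set_index("ind")

# Default behaviour: rows under the NaN index label are silently left out.
dropped = df.groupby(level="ind").sum()

# With dropna=False the NaN label is kept as a group of its own.
kept = df.groupby(level="ind", dropna=False).sum()

print(dropped)
print(kept)
```

The first result only contains the 'a' group, so the values stored under the NaN label do not appear in any total.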
Any update? |
@franz101 happy to have a PR for this |
take |
seems really easy to get caught by this one |
Got bitten by this. Again. Can't believe this has been open since 2013. Consistency with R is not a reason to do something! |
Could you guys pls consider adding a warning for the users at least? Btw, I'm using |
People are aware that it's an issue. It's not a difficult issue to solve, I've done it before and it worked fine, the main impediment there is just getting the option through all the layers of indirection down to the actual work code. The reason it hasn't been solved by me is all the testing and work that goes around it, which I was unable to complete. If someone is familiar with the testing apparatus I'd be glad to talk them through my old PR (which would need to be updated as it is quite old). On my part, I've rolled off the project using pandas a long time ago and am primarily using R now (which handles this just fine) so I don't have a strong reason to spend the days required to learn how testing works in pandas. |
If you look at #30584, this issue is completed for groupby; it wasn't hard, actually. It will be in 1.1 (soon). But just like anything else in pandas, it's all volunteer: if you want an issue fixed, then pushing a PR is the best way. |
People are not aware of this issue. An interim solution would be groupby() to throw an error for an unsupported case with a flag to allow the broken exclude-nan "feature". |
Sorry, contributors are aware it's an issue. You're welcome to become one and submit a PR. |
There are some problems when grouping by a category column
|
As a business analyst I was trained to apply two QC rules:
This was basic business analyst / data scientist QC, and it should happen on EVERY merge/filter step in a project. Because this needs to be done frequently, I suggest EVERY summation should default to preserving the count and the total. An option to remove NaNs in an index is appropriate, but not as the default. |
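The count-and-total check described above could be sketched roughly as follows. The helper `qc_groupby` and the sample frame `sales` are hypothetical names made up for this illustration, not part of pandas, and the `dropna` keyword assumes pandas 1.1 or later.

```python
import numpy as np
import pandas as pd

def qc_groupby(df, by, value, **groupby_kwargs):
    """Hypothetical QC helper: report whether row count and total survive a groupby."""
    grouped = df.groupby(by, **groupby_kwargs)[value]
    count_ok = int(grouped.size().sum()) == len(df)
    total_ok = bool(np.isclose(grouped.sum().sum(), df[value].sum()))
    return count_ok, total_ok

# Made-up sample data with one missing group key.
sales = pd.DataFrame({
    "region": ["north", "south", np.nan, "north"],
    "amount": [100.0, 50.0, 25.0, 10.0],
})

default_check = qc_groupby(sales, "region", "amount")
keep_nan_check = qc_groupby(sales, "region", "amount", dropna=False)

print(default_check)
print(keep_nan_check)
```

With the default settings both checks fail, because the row with the missing region is excluded from the groups; with `dropna=False` both pass.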
Why is this issue closed? The problem persists on pandas 1.4.0. |
On 1.5.2 (and possibly lower) you can pass dropna=False, and it will work with both regular indices and multi-indices. |
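For reference, a short sketch of what passing `dropna=False` looks like, reusing the frame from the earlier report with a made-up `val` column added for summing. It assumes a pandas version in which `groupby` accepts `dropna`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "foo": [1, 2, 3],
    "bar": [4, 5, np.nan],
    "val": [10, 20, 30],  # illustrative column, not in the original report
})

# Default: the row whose key contains NaN is excluded from every group.
default_sum = df.groupby(["foo", "bar"])["val"].sum()

# dropna=False: the (3, NaN) key forms its own group.
kept_sum = df.groupby(["foo", "bar"], dropna=False)["val"].sum()

print(default_sum)
print(kept_sum)
```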
Would a PR to add a warning when dropping nan values be accepted? This issue just bit me, and it would've saved me some trouble if I had gotten that warning. |
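Until something like that exists in pandas itself, a user-side wrapper can give a similar warning. This is only a sketch: `groupby_warn_on_nan` is a hypothetical helper, it handles column-name keys only, and the wording of the message is made up.

```python
import warnings

import numpy as np
import pandas as pd

def groupby_warn_on_nan(df, by, **kwargs):
    """Hypothetical wrapper: warn if groupby is about to drop rows with NaN keys."""
    keys = [by] if isinstance(by, str) else list(by)
    n_missing = int(df[keys].isna().any(axis=1).sum())
    # groupby drops NaN keys unless dropna=False is passed explicitly.
    if n_missing and kwargs.get("dropna", True):
        warnings.warn(
            f"{n_missing} row(s) have NaN in group keys {keys} and will be dropped",
            UserWarning,
            stacklevel=2,
        )
    return df.groupby(by, **kwargs)

df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, np.nan]})

grouped = groupby_warn_on_nan(df, ["foo", "bar"])  # emits a UserWarning
print(grouped.size())
```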
ENH: maybe for now just provide a warning if dropping the nan rows when pivoting...
from ml
http://stackoverflow.com/questions/16860172/python-pandas-pivot-table-silently-drops-indices-with-nans
This is effectively trying to groupby on a NaN, which is currently not allowed.
Workaround: fill the index with a dummy value, pivot, and then replace the dummy.
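A rough sketch of that workaround, under the assumption that a placeholder string can be chosen that never occurs in the real data. The frame, the column names and the `SENTINEL` value are all made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "row": ["a", "a", np.nan, np.nan],
    "col": ["x", "y", "x", "y"],
    "val": [1, 2, 3, 4],
})

SENTINEL = "__missing__"  # hypothetical placeholder; must not appear in the data

# Plain pivot: the rows with a NaN index value are dropped silently.
plain = df.pivot_table(index="row", columns="col", values="val", aggfunc="sum")

# 1. fill the NaN index values with the dummy
filled = df.assign(row=df["row"].fillna(SENTINEL))
# 2. pivot
pivoted = filled.pivot_table(index="row", columns="col", values="val", aggfunc="sum")
# 3. replace the dummy label with NaN again
pivoted = pivoted.rename(index={SENTINEL: np.nan})

print(plain)
print(pivoted)
```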