ENH: Add dropna in groupby to allow NaN in keys #30584
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -199,6 +199,33 @@ For example, the groups created by ``groupby()`` below are in the order they app | |
df3.groupby(['X']).get_group('B') | ||
|
||
|
||
.. _groupby.dropna: | ||
|
||
.. versionadded:: 1.1.0 | ||
|
||
GroupBy dropna | ||
^^^^^^^^^^^^^^ | ||
|
||
By default, ``NA`` values are excluded from group keys during the ``groupby`` operation. To include | ||
``NA`` values in group keys, pass ``dropna=False``. | ||
|
||
.. ipython:: python | ||
|
||
df_list = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]] | ||
df_dropna = pd.DataFrame(df_list, columns=["a", "b", "c"]) | ||
|
||
|
||
can you show the df_list, then put the actual groupbys in another ipython block
i think you meant
||
df_dropna | ||
|
||
.. ipython:: python | ||
|
||
|
||
# Default `dropna` is set to True, which will exclude NaNs in keys | ||
df_dropna.groupby(by=["b"], dropna=True).sum() | ||
|
||
# In order to allow NaN in keys, set `dropna` to False | ||
df_dropna.groupby(by=["b"], dropna=False).sum() | ||
|
||
The default setting of the ``dropna`` argument is ``True``, which means ``NA`` values are not included in group keys. | ||
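As a minimal standalone sketch of the behavior described in this section, assuming pandas 1.1.0 or later is installed, the snippet below reuses the ``df_dropna`` frame from the example and compares the number of groups produced by each setting. The names ``kept`` and ``with_na`` are illustrative and not part of the PR.

```python
import pandas as pd

# Same data as the doc example: column "b" holds one missing value.
df_list = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]
df_dropna = pd.DataFrame(df_list, columns=["a", "b", "c"])

# dropna=True (the default): the row whose key is NaN is left out.
kept = df_dropna.groupby(by=["b"], dropna=True).sum()

# dropna=False: NaN is kept as its own group key.
with_na = df_dropna.groupby(by=["b"], dropna=False).sum()

print(len(kept))     # 2 groups: b == 1.0 and b == 2.0
print(len(with_na))  # 3 groups: the two above plus the NaN key
print(with_na)
```

Comparing the summed ``a`` column of both results is a quick way to see that the row with the missing key only contributes when ``dropna=False``.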
|
||
|
||
.. _groupby.attributes: | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -36,6 +36,37 @@ For example: | |
ser["2014"] | ||
ser.loc["May 2015"] | ||
|
||
|
||
.. _whatsnew_110.groupby_key: | ||
|
||
Allow NA in groupby key | ||
^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
With :ref:`groupby <groupby.dropna>`, we've added a ``dropna`` keyword to :meth:`DataFrame.groupby` and :meth:`Series.groupby` in order to | ||
allow ``NA`` values in group keys. Users can set ``dropna`` to ``False`` if they want to include | ||
``NA`` values in groupby keys. The default for ``dropna`` is ``True`` to keep backwards | ||
compatibility (:issue:`3729`).
add a ref to the new doc-section.
emm, what does this mean?
you add something like:
thanks for the hint! I added one following other examples, pls let me know if it is okay now
||
|
||
.. ipython:: python | ||
|
||
df_list = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]] | ||
use the same example as in the docs section (e.g. make the changes here as well)
yeah, the example is the same as in
same change here as above
||
df_dropna = pd.DataFrame(df_list, columns=["a", "b", "c"]) | ||
|
||
df_dropna | ||
|
||
.. ipython:: python | ||
|
||
# Default `dropna` is set to True, which will exclude NaNs in keys | ||
df_dropna.groupby(by=["b"], dropna=True).sum() | ||
|
||
# In order to allow NaN in keys, set `dropna` to False | ||
df_dropna.groupby(by=["b"], dropna=False).sum() | ||
|
||
The default setting of the ``dropna`` argument is ``True``, which means ``NA`` values are not included in group keys. | ||
|
||
.. versionadded:: 1.1.0 | ||
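The whatsnew entry also names :meth:`Series.groupby`, which the example above does not exercise. The following is a small sketch, assuming pandas 1.1.0 or later; the ``ser`` and ``keys`` objects are hypothetical data chosen for illustration and do not come from the PR.

```python
import numpy as np
import pandas as pd

# Hypothetical values and grouping keys; two of the keys are missing.
ser = pd.Series([10, 20, 30, 40])
keys = pd.Series([1.0, np.nan, 1.0, np.nan])

# Default behavior: values whose key is NaN are excluded from the result.
default_sum = ser.groupby(keys).sum()

# dropna=False: values whose key is NaN are collected under a NaN group.
na_sum = ser.groupby(keys, dropna=False).sum()

print(default_sum)
print(na_sum)
```

With the default, only the values under key ``1.0`` are summed; with ``dropna=False`` the result gains a second entry whose index label is ``NaN``.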
|
||
|
||
.. _whatsnew_110.key_sorting: | ||
|
||
Sorting with keys | ||
|