
Commit dd821de

Merge branch 'master' of https://github.com/pandas-dev/pandas into depr-internals
2 parents 3627238 + 6b3ba98 commit dd821de

37 files changed (+691 / -320 lines)

doc/source/whatsnew/v1.0.0.rst

Lines changed: 51 additions & 2 deletions
@@ -194,6 +194,47 @@ New repr for :class:`pandas.core.arrays.IntervalArray`
 
     pd.arrays.IntervalArray.from_tuples([(0, 1), (2, 3)])
 
+
+All :class:`SeriesGroupBy` aggregation methods now respect the ``observed`` keyword
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+The following methods now also correctly output values for unobserved categories when called through ``groupby(..., observed=False)`` (:issue:`17605`)
+
+- :meth:`SeriesGroupBy.count`
+- :meth:`SeriesGroupBy.size`
+- :meth:`SeriesGroupBy.nunique`
+- :meth:`SeriesGroupBy.nth`
+
+.. ipython:: python
+
+    df = pd.DataFrame({
+        "cat_1": pd.Categorical(list("AABB"), categories=list("ABC")),
+        "cat_2": pd.Categorical(list("AB") * 2, categories=list("ABC")),
+        "value": [0.1] * 4,
+    })
+    df
+
+
+*pandas 0.25.x*
+
+.. code-block:: ipython
+
+    In [2]: df.groupby(["cat_1", "cat_2"], observed=False)["value"].count()
+    Out[2]:
+    cat_1  cat_2
+    A      A        1
+           B        1
+    B      A        1
+           B        1
+    Name: value, dtype: int64
+
+
+*pandas 1.0.0*
+
+.. ipython:: python
+
+    df.groupby(["cat_1", "cat_2"], observed=False)["value"].count()
+
+
 .. _whatsnew_1000.api.other:
 
 Other API changes
@@ -273,15 +314,21 @@ or ``matplotlib.Axes.plot``. See :ref:`plotting.formatters` for more.
 - Changed the the default value of `inplace` in :meth:`DataFrame.set_index` and :meth:`Series.set_axis`. It now defaults to False (:issue:`27600`)
 - :meth:`pandas.Series.str.cat` now defaults to aligning ``others``, using ``join='left'`` (:issue:`27611`)
 - :meth:`pandas.Series.str.cat` does not accept list-likes *within* list-likes anymore (:issue:`27611`)
+- :func:`core.internals.blocks.make_block` no longer accepts the "fastpath" keyword(:issue:`19265`)
+- :meth:`Block.make_block_same_class` no longer accepts the "dtype" keyword(:issue:`19434`)
 - Removed the previously deprecated :meth:`ExtensionArray._formatting_values`. Use :attr:`ExtensionArray._formatter` instead. (:issue:`23601`)
 - Removed the previously deprecated ``IntervalIndex.from_intervals`` in favor of the :class:`IntervalIndex` constructor (:issue:`19263`)
+- Changed the default value for the "keep_tz" argument in :meth:`DatetimeIndex.to_series` to ``True`` (:issue:`23739`)
 - Ability to read pickles containing :class:`Categorical` instances created with pre-0.16 version of pandas has been removed (:issue:`27538`)
 - Removed the previously deprecated ``reduce`` and ``broadcast`` arguments from :meth:`DataFrame.apply` (:issue:`18577`)
 - Removed the previously deprecated ``assert_raises_regex`` function in ``pandas.util.testing`` (:issue:`29174`)
 - Removed :meth:`Index.is_lexsorted_for_tuple` (:issue:`29305`)
 - Removed support for nexted renaming in :meth:`DataFrame.aggregate`, :meth:`Series.aggregate`, :meth:`DataFrameGroupBy.aggregate`, :meth:`SeriesGroupBy.aggregate`, :meth:`Rolling.aggregate` (:issue:`29608`)
-- :func:`core.internals.blocks.make_block` no longer accepts the "fastpath" keyword(:issue:`19265`)
-- :meth:`Block.make_block_same_class` no longer accepts the "dtype" keyword(:issue:`19434`)
+- Removed previously deprecated "order" argument from :func:`factorize` (:issue:`19751`)
+- Removed previously deprecated "v" argument from :meth:`FrozenNDarray.searchsorted`, use "value" instead (:issue:`22672`)
+- Removed previously deprecated "raise_conflict" argument from :meth:`DataFrame.update`, use "errors" instead (:issue:`23585`)
+- Removed previously deprecated keyword "n" from :meth:`DatetimeIndex.shift`, :meth:`TimedeltaIndex.shift`, :meth:`PeriodIndex.shift`, use "periods" instead (:issue:`22458`)
+-
 
 .. _whatsnew_1000.performance:
 
@@ -454,6 +501,7 @@ Groupby/resample/rolling
 - Bug in :meth:`DataFrameGroupby.agg` not able to use lambda function with named aggregation (:issue:`27519`)
 - Bug in :meth:`DataFrame.groupby` losing column name information when grouping by a categorical column (:issue:`28787`)
 - Bug in :meth:`DataFrameGroupBy.rolling().quantile()` ignoring ``interpolation`` keyword argument (:issue:`28779`)
+- Bug in :meth:`DataFrame.groupby` where ``any``, ``all``, ``nunique`` and transform functions would incorrectly handle duplicate column labels (:issue:`21668`)
 
 Reshaping
 ^^^^^^^^^
@@ -467,6 +515,7 @@ Reshaping
 - Better error message in :func:`get_dummies` when `columns` isn't a list-like value (:issue:`28383`)
 - Bug :meth:`Series.pct_change` where supplying an anchored frequency would throw a ValueError (:issue:`28664`)
 - Bug where :meth:`DataFrame.equals` returned True incorrectly in some cases when two DataFrames had the same columns in different orders (:issue:`28839`)
+- Bug in :meth:`DataFrame.replace` that caused non-numeric replacer's dtype not respected (:issue:`26632`)
 
 Sparse
 ^^^^^^

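With `observed=False`, the fixed `SeriesGroupBy` methods return one entry per combination of categories, with 0 for combinations that never occur. As a rough, stdlib-only illustration of that output shape (not how pandas computes it), `count_with_unobserved` below is a hypothetical helper that assumes each grouping key comes with its full list of categories and treats `None` as a missing value:

```python
from collections import Counter
from itertools import product
from typing import Dict, Sequence, Tuple


def count_with_unobserved(
    keys: Sequence[Sequence[str]],
    categories: Sequence[Sequence[str]],
    values: Sequence[object],
) -> Dict[Tuple[str, ...], int]:
    """Count non-missing values per group, including unobserved combinations.

    Hypothetical helper: keys[j][i] is the j-th grouping key of row i and
    categories[j] lists every category of that key, observed or not.
    """
    # Each row becomes (key_1, ..., key_n, value); None values are not counted.
    rows = zip(*keys, values)
    observed = Counter(tuple(row[:-1]) for row in rows if row[-1] is not None)
    # Cartesian product of all categories, as with observed=False;
    # combinations that never occur get a count of 0.
    return {combo: observed.get(combo, 0) for combo in product(*categories)}


result = count_with_unobserved(
    [list("AABB"), list("AB") * 2],  # cat_1, cat_2 from the whatsnew example
    [list("ABC"), list("ABC")],      # full category lists
    [0.1] * 4,                       # value
)
```

With the `cat_1`/`cat_2`/`value` data from the example, the result has nine groups, and only `(A, A)`, `(A, B)`, `(B, A)` and `(B, B)` are non-zero.
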
pandas/_libs/src/parser/io.c

Lines changed: 0 additions & 1 deletion
@@ -9,7 +9,6 @@ The full license is in the LICENSE file, distributed with this software.
 
 #include "io.h"
 
-#include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
 

pandas/_libs/src/parser/tokenizer.c

Lines changed: 13 additions & 38 deletions
@@ -25,38 +25,13 @@ GitHub. See Python Software Foundation License and BSD licenses for these.
 
 #include "../headers/portable.h"
 
-static void *safe_realloc(void *buffer, size_t size) {
-    void *result;
-    // OSX is weird.
-    // http://stackoverflow.com/questions/9560609/
-    // different-realloc-behaviour-in-linux-and-osx
-
-    result = realloc(buffer, size);
-    TRACE(("safe_realloc: buffer = %p, size = %zu, result = %p\n", buffer, size,
-           result))
-
-    return result;
-}
-
 void coliter_setup(coliter_t *self, parser_t *parser, int i, int start) {
     // column i, starting at 0
     self->words = parser->words;
     self->col = i;
     self->line_start = parser->line_start + start;
 }
 
-coliter_t *coliter_new(parser_t *self, int i) {
-    // column i, starting at 0
-    coliter_t *iter = (coliter_t *)malloc(sizeof(coliter_t));
-
-    if (NULL == iter) {
-        return NULL;
-    }
-
-    coliter_setup(iter, self, i, 0);
-    return iter;
-}
-
 static void free_if_not_null(void **ptr) {
     TRACE(("free_if_not_null %p\n", *ptr))
     if (*ptr != NULL) {
@@ -80,7 +55,7 @@ static void *grow_buffer(void *buffer, uint64_t length, uint64_t *capacity,
     while ((length + space >= cap) && (newbuffer != NULL)) {
         cap = cap ? cap << 1 : 2;
         buffer = newbuffer;
-        newbuffer = safe_realloc(newbuffer, elsize * cap);
+        newbuffer = realloc(newbuffer, elsize * cap);
     }
 
     if (newbuffer == NULL) {
@@ -321,8 +296,8 @@ static int make_stream_space(parser_t *self, size_t nbytes) {
             ("make_stream_space: cap != self->words_cap, nbytes = %d, "
              "self->words_cap=%d\n",
              nbytes, self->words_cap))
-        newptr = safe_realloc((void *)self->word_starts,
-                              sizeof(int64_t) * self->words_cap);
+        newptr = realloc((void *)self->word_starts,
+                         sizeof(int64_t) * self->words_cap);
         if (newptr == NULL) {
             return PARSER_OUT_OF_MEMORY;
         } else {
@@ -349,8 +324,8 @@ static int make_stream_space(parser_t *self, size_t nbytes) {
     if (cap != self->lines_cap) {
         TRACE(("make_stream_space: cap != self->lines_cap, nbytes = %d\n",
                nbytes))
-        newptr = safe_realloc((void *)self->line_fields,
-                              sizeof(int64_t) * self->lines_cap);
+        newptr = realloc((void *)self->line_fields,
+                         sizeof(int64_t) * self->lines_cap);
         if (newptr == NULL) {
             return PARSER_OUT_OF_MEMORY;
         } else {
@@ -427,7 +402,7 @@ static void append_warning(parser_t *self, const char *msg) {
         snprintf(self->warn_msg, length + 1, "%s", msg);
     } else {
         ex_length = strlen(self->warn_msg);
-        newptr = safe_realloc(self->warn_msg, ex_length + length + 1);
+        newptr = realloc(self->warn_msg, ex_length + length + 1);
         if (newptr != NULL) {
             self->warn_msg = (char *)newptr;
             snprintf(self->warn_msg + ex_length, length + 1, "%s", msg);
@@ -1290,13 +1265,13 @@ int parser_trim_buffers(parser_t *self) {
     new_cap = _next_pow2(self->words_len) + 1;
     if (new_cap < self->words_cap) {
         TRACE(("parser_trim_buffers: new_cap < self->words_cap\n"));
-        newptr = safe_realloc((void *)self->words, new_cap * sizeof(char *));
+        newptr = realloc((void *)self->words, new_cap * sizeof(char *));
         if (newptr == NULL) {
             return PARSER_OUT_OF_MEMORY;
         } else {
             self->words = (char **)newptr;
         }
-        newptr = safe_realloc((void *)self->word_starts,
+        newptr = realloc((void *)self->word_starts,
                               new_cap * sizeof(int64_t));
         if (newptr == NULL) {
             return PARSER_OUT_OF_MEMORY;
@@ -1315,13 +1290,13 @@ int parser_trim_buffers(parser_t *self) {
     if (new_cap < self->stream_cap) {
         TRACE(
             ("parser_trim_buffers: new_cap < self->stream_cap, calling "
-             "safe_realloc\n"));
-        newptr = safe_realloc((void *)self->stream, new_cap);
+             "realloc\n"));
+        newptr = realloc((void *)self->stream, new_cap);
         if (newptr == NULL) {
             return PARSER_OUT_OF_MEMORY;
         } else {
             // Update the pointers in the self->words array (char **) if
-            // `safe_realloc`
+            // `realloc`
             // moved the `self->stream` buffer. This block mirrors a similar
             // block in
             // `make_stream_space`.
@@ -1342,14 +1317,14 @@ int parser_trim_buffers(parser_t *self) {
     new_cap = _next_pow2(self->lines) + 1;
     if (new_cap < self->lines_cap) {
         TRACE(("parser_trim_buffers: new_cap < self->lines_cap\n"));
-        newptr = safe_realloc((void *)self->line_start,
+        newptr = realloc((void *)self->line_start,
                               new_cap * sizeof(int64_t));
         if (newptr == NULL) {
             return PARSER_OUT_OF_MEMORY;
         } else {
             self->line_start = (int64_t *)newptr;
         }
-        newptr = safe_realloc((void *)self->line_fields,
+        newptr = realloc((void *)self->line_fields,
                               new_cap * sizeof(int64_t));
         if (newptr == NULL) {
             return PARSER_OUT_OF_MEMORY;

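In `grow_buffer`, the capacity starts at 2 when it is zero and otherwise doubles (`cap << 1`) until `length + space` is strictly below it, calling plain `realloc` at every step now that `safe_realloc` is gone. A Python sketch of only that capacity arithmetic, ignoring the allocation itself and the `NULL` check (`next_capacity` is a hypothetical name), might look like this:

```python
def next_capacity(length: int, space: int, cap: int) -> int:
    """Mirror the capacity loop in grow_buffer.

    The C loop also stops early if realloc returns NULL; that path is
    omitted here, as is uint64_t overflow (Python ints are unbounded).
    """
    while length + space >= cap:
        # 0 -> 2, otherwise double, as in `cap = cap ? cap << 1 : 2;`
        cap = cap << 1 if cap else 2
    return cap
```

Because each loop iteration in the C code performs one `realloc`, the number of doublings here equals the number of reallocation calls; doubling keeps the amortized cost of appending bytes to the buffer constant.
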
pandas/_libs/src/parser/tokenizer.h

Lines changed: 0 additions & 6 deletions
@@ -15,7 +15,6 @@ See LICENSE for the license
 #define PY_SSIZE_T_CLEAN
 #include <Python.h>
 
-#define ERROR_OK 0
 #define ERROR_NO_DIGITS 1
 #define ERROR_OVERFLOW 2
 #define ERROR_INVALID_CHARS 3
@@ -32,10 +31,6 @@ See LICENSE for the license
 #define CALLING_READ_FAILED 2
 
 
-#if defined(_MSC_VER)
-#define strtoll _strtoi64
-#endif // _MSC_VER
-
 /*
 
 C flat file parsing low level code for pandas / NumPy
@@ -180,7 +175,6 @@ typedef struct coliter_t {
 } coliter_t;
 
 void coliter_setup(coliter_t *self, parser_t *parser, int i, int start);
-coliter_t *coliter_new(parser_t *self, int i);
 
 #define COLITER_NEXT(iter, word) \
     do { \

pandas/core/algorithms.py

Lines changed: 3 additions & 16 deletions
@@ -10,7 +10,7 @@
 
 from pandas._libs import Timestamp, algos, hashtable as htable, lib
 from pandas._libs.tslib import iNaT
-from pandas.util._decorators import Appender, Substitution, deprecate_kwarg
+from pandas.util._decorators import Appender, Substitution
 
 from pandas.core.dtypes.cast import (
     construct_1d_object_array_from_listlike,
@@ -494,7 +494,7 @@ def _factorize_array(
 
     Parameters
     ----------
-    %(values)s%(sort)s%(order)s
+    %(values)s%(sort)s
     na_sentinel : int, default -1
         Value to mark "not found".
     %(size_hint)s\
@@ -585,14 +585,6 @@ def _factorize_array(
         coerced to ndarrays before factorization.
     """
     ),
-    order=dedent(
-        """\
-    order : None
-        .. deprecated:: 0.23.0
-
-           This parameter has no effect and is deprecated.
-    """
-    ),
     sort=dedent(
         """\
     sort : bool, default False
@@ -608,13 +600,8 @@ def _factorize_array(
     ),
 )
 @Appender(_shared_docs["factorize"])
-@deprecate_kwarg(old_arg_name="order", new_arg_name=None)
 def factorize(
-    values,
-    sort: bool = False,
-    order=None,
-    na_sentinel: int = -1,
-    size_hint: Optional[int] = None,
+    values, sort: bool = False, na_sentinel: int = -1, size_hint: Optional[int] = None,
 ) -> Tuple[np.ndarray, Union[np.ndarray, ABCIndex]]:
     # Implementation notes: This method is responsible for 3 things
     # 1.) coercing data to array-like (ndarray, Index, extension array)

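With `order` removed, `factorize` keeps `values`, `sort`, `na_sentinel` and `size_hint`. As a rough illustration of what `sort` and `na_sentinel` control, here is a pure-Python sketch that treats `None` as the missing value. It is a hypothetical stand-in (`factorize_sketch`), not pandas' hashtable-based implementation, and it ignores `size_hint`:

```python
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def factorize_sketch(
    values: Sequence[Optional[Hashable]],
    sort: bool = False,
    na_sentinel: int = -1,
) -> Tuple[List[int], List[Hashable]]:
    """Encode values as integer codes plus a list of uniques.

    Missing values (None here) get na_sentinel and never appear in uniques.
    """
    index: Dict[Hashable, int] = {}
    uniques: List[Hashable] = []
    raw: List[Optional[int]] = []  # None marks a missing value until the end
    for v in values:
        if v is None:
            raw.append(None)
            continue
        if v not in index:
            # Uniques are recorded in order of first appearance.
            index[v] = len(uniques)
            uniques.append(v)
        raw.append(index[v])
    if sort:
        # Relabel so codes follow the sorted order of the uniques.
        order = sorted(range(len(uniques)), key=lambda i: uniques[i])
        remap = {old: new for new, old in enumerate(order)}
        uniques = [uniques[i] for i in order]
        raw = [None if c is None else remap[c] for c in raw]
    codes = [na_sentinel if c is None else c for c in raw]
    return codes, uniques
```

Keeping missing positions as `None` until the last step means a custom `na_sentinel` can never be confused with a real code during the `sort` relabeling.
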
pandas/core/frame.py

Lines changed: 0 additions & 5 deletions
@@ -5528,11 +5528,6 @@ def combiner(x, y):
 
         return self.combine(other, combiner, overwrite=False)
 
-    @deprecate_kwarg(
-        old_arg_name="raise_conflict",
-        new_arg_name="errors",
-        mapping={False: "ignore", True: "raise"},
-    )
     def update(
         self, other, join="left", overwrite=True, filter_func=None, errors="ignore"
     ):

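The removed decorator translated the old `raise_conflict` flag into the `errors` keyword via `mapping={False: "ignore", True: "raise"}`; after this change, callers of `DataFrame.update` pass `errors="raise"` or `errors="ignore"` directly. The following is a minimal, hypothetical re-creation of that kind of keyword-renaming shim (`rename_kwarg` and the stand-in `update` are illustrative names), not the code of `pandas.util._decorators.deprecate_kwarg`:

```python
import functools
import warnings
from typing import Any, Callable, Dict, Optional


def rename_kwarg(old: str, new: str, mapping: Optional[Dict[Any, Any]] = None):
    """Hypothetical decorator: accept old=... and forward it as new=..."""

    def decorate(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if old in kwargs:
                if new in kwargs:
                    raise TypeError(f"Can only specify '{old}' or '{new}', not both")
                value = kwargs.pop(old)
                if mapping is not None:
                    # Translate old-style values, e.g. True -> "raise".
                    value = mapping.get(value, value)
                warnings.warn(
                    f"the '{old}' keyword is deprecated, use '{new}' instead",
                    FutureWarning,
                    stacklevel=2,
                )
                kwargs[new] = value
            return func(*args, **kwargs)

        return wrapper

    return decorate


@rename_kwarg("raise_conflict", "errors", mapping={False: "ignore", True: "raise"})
def update(other, errors="ignore"):
    # Stand-in for DataFrame.update: just reports the resolved keyword.
    return errors
```

Once such a shim is deleted, as in this commit, passing the old keyword simply raises a `TypeError` for an unexpected argument.
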
pandas/core/generic.py

Lines changed: 2 additions & 1 deletion
@@ -807,7 +807,8 @@ def droplevel(self, level, axis=0):
 
         Returns
         -------
-        DataFrame.droplevel()
+        DataFrame
+            DataFrame with requested index / column level(s) removed.
 
         Examples
         --------

pandas/core/groupby/base.py

Lines changed: 4 additions & 0 deletions
@@ -3,8 +3,12 @@
 hold the whitelist of methods that are exposed on the
 SeriesGroupBy and the DataFrameGroupBy objects.
 """
+import collections
+
 from pandas.core.dtypes.common import is_list_like, is_scalar
 
+OutputKey = collections.namedtuple("OutputKey", ["label", "position"])
+
 
 
 class GroupByMixin:
     """

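The new `OutputKey` namedtuple pairs an output's `label` with its `position`. One plausible motivation, consistent with the duplicate-column-label groupby fix listed in the whatsnew (:issue:`21668`), is that keying results by label alone lets two identically named columns overwrite each other. A stdlib-only illustration of that collision, using hypothetical column data:

```python
import collections
from typing import Dict, List, Tuple

OutputKey = collections.namedtuple("OutputKey", ["label", "position"])

# Two columns share the label "a" (hypothetical data).
columns: List[Tuple[str, List[int]]] = [("a", [1, 2]), ("a", [3, 4]), ("b", [5, 6])]

# Keyed by label only: the second "a" silently replaces the first.
by_label: Dict[str, int] = {label: sum(vals) for label, vals in columns}

# Keyed by (label, position): every column keeps its own result.
by_key: Dict[OutputKey, int] = {
    OutputKey(label=label, position=pos): sum(vals)
    for pos, (label, vals) in enumerate(columns)
}
```

Here `by_label` ends up with two entries, while `by_key` retains all three results and can still report the original labels in order.
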