Commit 1824b99

Merge remote-tracking branch 'upstream/master' into Rt05
* upstream/master:
  BUG: Fix exceptions when Series.interpolate's `order` parameter is missing or invalid (#25246)
  API: Ensure DatetimeTZDtype standardizes pytz timezones (#25254)
  Split Excel IO Into Sub-Directory (#25153)
  PR04 errors fix (#25157)
  DEPR: remove assert_panel_equal (#25238)
  BUG: pandas Timestamp tz_localize and tz_convert do not preserve `freq` attribute (#25247)
  Revert "BLD: prevent asv from calling sys.stdin.close() by using different launch method (#25237)" (#25253)
  REF/TST: resample/test_base.py (#25262)
  BUG: Duplicated returns boolean dataframe (#25234)
  CLN: Remove ipython 2.x compat (#25150)
  Refactor groupby group_add from tempita to fused types (#24954)
  CLN: For loops, boolean conditions, misc. (#25206)
  (Closes #25029) Removed extra bracket from cheatsheet code example. (#25032)
  BLD: prevent asv from calling sys.stdin.close() by using different launch method (#25237)
  BUG: Fix read_json orient='table' without index (#25170) (#25171)
  BUG: Fix regression in DataFrame.apply causing RecursionError (#25230)
  BUG-25061 fix printing indices with NaNs (#25202)
  DEPR: Add Deprecated warning for timedelta with passed units M and Y (#23264)
  DEPR: Remove Panel-specific parts of io.pytables (#25233)
  DEPR: remove tm.makePanel and all usages (#25231)
2 parents 25e7503 + ea1d5f5 commit 1824b99

91 files changed: 2536 additions & 4887 deletions

asv_bench/benchmarks/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+"""Pandas benchmarks."""

ci/code_checks.sh

Lines changed: 2 additions & 2 deletions
@@ -241,8 +241,8 @@ fi
 ### DOCSTRINGS ###
 if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then

-    MSG='Validate docstrings (GL06, GL07, GL09, SS04, SS05, PR03, PR05, EX04, RT04, RT05, SA05)' ; echo $MSG
-    $BASE_DIR/scripts/validate_docstrings.py --format=azure --errors=GL06,GL07,GL09,SS04,SS05,PR03,PR05,EX04,RT04,RT05,SA05
+    MSG='Validate docstrings (GL06, GL07, GL09, SS04, SS05, PR03, PR04, PR05, EX04, RT04, RT05, SA05)' ; echo $MSG
+    $BASE_DIR/scripts/validate_docstrings.py --format=azure --errors=GL06,GL07,GL09,SS04,SS05,PR03,PR04,PR05,EX04,RT04,RT05,SA05
     RET=$(($RET + $?)) ; echo $MSG "DONE"

 fi

doc/cheatsheet/Pandas_Cheat_Sheet.pdf

6.7 KB
Binary file not shown.
-261 Bytes
Binary file not shown.
210 KB
Binary file not shown.
5.73 KB
Binary file not shown.

doc/source/user_guide/timeseries.rst

Lines changed: 24 additions & 0 deletions
@@ -321,6 +321,15 @@ which can be specified. These are computed from the starting point specified by
    pd.to_datetime([1349720105100, 1349720105200, 1349720105300,
                    1349720105400, 1349720105500], unit='ms')

+Constructing a :class:`Timestamp` or :class:`DatetimeIndex` with an epoch timestamp
+with the ``tz`` argument specified will localize the epoch timestamps to UTC
+first then convert the result to the specified time zone.
+
+.. ipython:: python
+
+   pd.Timestamp(1262347200000000000, tz='US/Pacific')
+   pd.DatetimeIndex([1262347200000000000], tz='US/Pacific')
+
 .. note::

    Epoch times will be rounded to the nearest nanosecond.
@@ -2205,6 +2214,21 @@ you can use the ``tz_convert`` method.

    rng_pytz.tz_convert('US/Eastern')

+.. note::
+
+   When using ``pytz`` time zones, :class:`DatetimeIndex` will construct a different
+   time zone object than a :class:`Timestamp` for the same time zone input. A :class:`DatetimeIndex`
+   can hold a collection of :class:`Timestamp` objects that may have different UTC offsets and cannot be
+   succinctly represented by one ``pytz`` time zone instance while one :class:`Timestamp`
+   represents one point in time with a specific UTC offset.
+
+   .. ipython:: python
+
+      dti = pd.date_range('2019-01-01', periods=3, freq='D', tz='US/Pacific')
+      dti.tz
+      ts = pd.Timestamp('2019-01-01', tz='US/Pacific')
+      ts.tz
+
 .. warning::

    Be wary of conversions between libraries. For some time zones, ``pytz`` and ``dateutil`` have different
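
Reviewer sketch for the epoch note added in the first hunk above: the documented rule is that an epoch value passed together with ``tz`` is first treated as a UTC instant and only then converted to the requested zone. The snippet below mirrors that order of operations using only the Python standard library. It is not pandas code; the helper name epoch_ns_to_zone is hypothetical, and a fixed UTC-8 offset is assumed as a stand-in for US/Pacific in January. Unlike pandas, datetime truncates to microseconds.

```python
from datetime import datetime, timedelta, timezone

# Same epoch value (in nanoseconds) as the documentation example.
EPOCH_NS = 1262347200000000000

# Assumption: fixed UTC-8 offset standing in for US/Pacific in winter.
PST = timezone(timedelta(hours=-8), "PST")


def epoch_ns_to_zone(epoch_ns, tz):
    """Interpret epoch_ns as a UTC instant, then convert it to tz."""
    seconds, nanos = divmod(epoch_ns, 1_000_000_000)
    # Step 1: localize the epoch value to UTC.
    utc = datetime.fromtimestamp(seconds, tz=timezone.utc)
    utc += timedelta(microseconds=nanos // 1000)
    # Step 2: convert the UTC instant to the requested zone.
    return utc.astimezone(tz)


local = epoch_ns_to_zone(EPOCH_NS, PST)
print(local.isoformat())  # 2010-01-01T04:00:00-08:00
```

The wall-clock time changes (12:00 UTC becomes 04:00 at UTC-8) while the instant stays the same, which is the behaviour the added paragraph describes.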

doc/source/whatsnew/v0.24.2.rst

Lines changed: 5 additions & 2 deletions
@@ -21,8 +21,10 @@ Fixed Regressions
 ^^^^^^^^^^^^^^^^^

 - Fixed regression in :meth:`DataFrame.all` and :meth:`DataFrame.any` where ``bool_only=True`` was ignored (:issue:`25101`)
-
 - Fixed issue in ``DataFrame`` construction with passing a mixed list of mixed types could segfault. (:issue:`25075`)
+- Fixed regression in :meth:`DataFrame.apply` causing ``RecursionError`` when ``dict``-like classes were passed as argument. (:issue:`25196`)
+
+- Fixed regression in :meth:`DataFrame.duplicated()`, where empty dataframe was not returning a boolean dtyped Series. (:issue:`25184`)

 .. _whatsnew_0242.enhancements:

@@ -52,7 +54,8 @@ Bug Fixes
 **I/O**

 - Bug in reading a HDF5 table-format ``DataFrame`` created in Python 2, in Python 3 (:issue:`24925`)
--
+- Bug in reading a JSON with ``orient='table'`` generated by :meth:`DataFrame.to_json` with ``index=False`` (:issue:`25170`)
+- Bug where float indexes could have misaligned values when printing (:issue:`25061`)
 -

 **Categorical**

doc/source/whatsnew/v0.25.0.rst

Lines changed: 7 additions & 8 deletions
@@ -33,7 +33,7 @@ Backwards incompatible API changes
 Other API Changes
 ^^^^^^^^^^^^^^^^^

--
+- :class:`DatetimeTZDtype` will now standardize pytz timezones to a common timezone instance (:issue:`24713`)
 -
 -

@@ -42,16 +42,13 @@ Other API Changes
 Deprecations
 ~~~~~~~~~~~~

--
--
--
-
+- Deprecated the `M (months)` and `Y (year)` `units` parameter of :func: `pandas.to_timedelta`, :func: `pandas.Timedelta` and :func: `pandas.TimedeltaIndex` (:issue:`16344`)

 .. _whatsnew_0250.prior_deprecations:

 Removal of prior version deprecations/changes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-- Removed (parts of) :class:`Panel` (:issue:`25047`)
+- Removed (parts of) :class:`Panel` (:issue:`25047`,:issue:`25191`,:issue:`25231`)
 -
 -
 -
@@ -71,6 +68,8 @@ Performance Improvements
 Bug Fixes
 ~~~~~~~~~

+-
+
 Categorical
 ^^^^^^^^^^^

@@ -96,7 +95,7 @@ Timezones
 ^^^^^^^^^

 - Bug in :func:`to_datetime` with ``utc=True`` and datetime strings that would apply previously parsed UTC offsets to subsequent arguments (:issue:`24992`)
--
+- Bug in :func:`Timestamp.tz_localize` and :func:`Timestamp.tz_convert` does not propagate ``freq`` (:issue:`25241`)
 -

 Numeric
@@ -142,7 +141,7 @@ Indexing
 Missing
 ^^^^^^^

--
+- Fixed misleading exception message in :meth:`Series.missing` if argument ``order`` is required, but omitted (:issue:`10633`, :issue:`24014`).
 -
 -


pandas/_libs/groupby.pyx

Lines changed: 51 additions & 0 deletions
@@ -2,6 +2,7 @@

 import cython
 from cython import Py_ssize_t
+from cython cimport floating

 from libc.stdlib cimport malloc, free

@@ -382,5 +383,55 @@ def group_any_all(uint8_t[:] out,
                 out[lab] = flag_val


+@cython.wraparound(False)
+@cython.boundscheck(False)
+def _group_add(floating[:, :] out,
+               int64_t[:] counts,
+               floating[:, :] values,
+               const int64_t[:] labels,
+               Py_ssize_t min_count=0):
+    """
+    Only aggregates on axis=0
+    """
+    cdef:
+        Py_ssize_t i, j, N, K, lab, ncounts = len(counts)
+        floating val, count
+        ndarray[floating, ndim=2] sumx, nobs
+
+    if not len(values) == len(labels):
+        raise AssertionError("len(index) != len(labels)")
+
+    nobs = np.zeros_like(out)
+    sumx = np.zeros_like(out)
+
+    N, K = (<object>values).shape
+
+    with nogil:
+
+        for i in range(N):
+            lab = labels[i]
+            if lab < 0:
+                continue
+
+            counts[lab] += 1
+            for j in range(K):
+                val = values[i, j]
+
+                # not nan
+                if val == val:
+                    nobs[lab, j] += 1
+                    sumx[lab, j] += val
+
+        for i in range(ncounts):
+            for j in range(K):
+                if nobs[i, j] < min_count:
+                    out[i, j] = NAN
+                else:
+                    out[i, j] = sumx[i, j]
+
+
+group_add_float32 = _group_add['float']
+group_add_float64 = _group_add['double']
+
 # generated from template
 include "groupby_helper.pxi"
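
Reviewer sketch of what the new _group_add kernel computes, written as plain Python so the semantics can be checked without compiling Cython. This is an illustration under assumptions, not the pandas implementation: the name group_add_reference and its list-of-lists interface are hypothetical, and it returns its results instead of filling preallocated out and counts buffers. The logic follows the diff: rows with a negative label are skipped, NaN values are ignored per column, and a group/column cell becomes NaN when it has fewer than min_count non-NaN observations.

```python
import math


def group_add_reference(values, labels, ngroups, min_count=0):
    """Pure-Python reference for a grouped, NaN-skipping column sum."""
    if len(values) != len(labels):
        raise AssertionError("len(index) != len(labels)")

    ncols = len(values[0]) if values else 0
    counts = [0] * ngroups                        # rows seen per group
    nobs = [[0] * ncols for _ in range(ngroups)]  # non-NaN values per cell
    sumx = [[0.0] * ncols for _ in range(ngroups)]

    for row, lab in zip(values, labels):
        if lab < 0:  # negative label: row belongs to no group
            continue
        counts[lab] += 1
        for j, val in enumerate(row):
            if val == val:  # NaN is the only value not equal to itself
                nobs[lab][j] += 1
                sumx[lab][j] += val

    out = [
        [sumx[i][j] if nobs[i][j] >= min_count else math.nan
         for j in range(ncols)]
        for i in range(ngroups)
    ]
    return out, counts


nan = math.nan
values = [[1.0, nan], [2.0, 3.0], [4.0, nan], [5.0, 5.0]]
labels = [0, 0, 1, -1]

out, counts = group_add_reference(values, labels, ngroups=2, min_count=1)
print(out)     # [[3.0, 3.0], [4.0, nan]]
print(counts)  # [2, 1]
```

With the default min_count=0, the all-NaN cell for group 1 would come out as 0.0 instead of NaN, since zero observations is not fewer than zero.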
