Commit b0984b9

Merge remote-tracking branch 'upstream/main' into fix-constructor-from-mgr
Parents: b1fd049 + 074ab2f

40 files changed: +1891 -1774 lines

.circleci/config.yml (5 additions, 0 deletions)

@@ -49,7 +49,12 @@ jobs:
           no_output_timeout: 30m # Sometimes the tests won't generate any output, make sure the job doesn't get killed by that
           command: |
             pip3 install cibuildwheel==2.15.0
+            # When this is a nightly wheel build, allow picking up NumPy 2.0 dev wheels:
+            if [[ "$IS_SCHEDULE_DISPATCH" == "true" || "$IS_PUSH" != 'true' ]]; then
+              export CIBW_ENVIRONMENT="PIP_EXTRA_INDEX_URL=https://pypi.anaconda.org/scientific-python-nightly-wheels/simple"
+            fi
             cibuildwheel --prerelease-pythons --output-dir wheelhouse
+
           environment:
             CIBW_BUILD: << parameters.cibw-build >>

.github/workflows/wheels.yml (14 additions, 1 deletion)

@@ -137,14 +137,27 @@ jobs:
         shell: bash -el {0}
         run: echo "sdist_name=$(cd ./dist && ls -d */)" >> "$GITHUB_ENV"

-      - name: Build wheels
+      - name: Build normal wheels
+        if: ${{ (env.IS_SCHEDULE_DISPATCH != 'true' || env.IS_PUSH == 'true') }}
         uses: pypa/cibuildwheel@v2.16.2
         with:
           package-dir: ./dist/${{ matrix.buildplat[1] == 'macosx_*' && env.sdist_name || needs.build_sdist.outputs.sdist_file }}
         env:
           CIBW_PRERELEASE_PYTHONS: True
           CIBW_BUILD: ${{ matrix.python[0] }}-${{ matrix.buildplat[1] }}

+      - name: Build nightly wheels (with NumPy pre-release)
+        if: ${{ (env.IS_SCHEDULE_DISPATCH == 'true' && env.IS_PUSH != 'true') }}
+        uses: pypa/cibuildwheel@v2.16.2
+        with:
+          package-dir: ./dist/${{ matrix.buildplat[1] == 'macosx_*' && env.sdist_name || needs.build_sdist.outputs.sdist_file }}
+        env:
+          # The nightly wheels should be built with the NumPy 2.0 pre-releases,
+          # which requires the additional URL.
+          CIBW_ENVIRONMENT: PIP_EXTRA_INDEX_URL=https://pypi.anaconda.org/scientific-python-nightly-wheels/simple
+          CIBW_PRERELEASE_PYTHONS: True
+          CIBW_BUILD: ${{ matrix.python[0] }}-${{ matrix.buildplat[1] }}

       - name: Set up Python
         uses: mamba-org/setup-micromamba@v1
         with:

doc/source/whatsnew/v2.1.2.rst (1 addition, 0 deletions)

@@ -16,6 +16,7 @@ Fixed regressions
 - Fixed regression in :meth:`DataFrame.join` where result has missing values and dtype is arrow backed string (:issue:`55348`)
 - Fixed regression in :meth:`DataFrame.resample` which was extrapolating back to ``origin`` when ``origin`` was outside its bounds (:issue:`55064`)
 - Fixed regression in :meth:`DataFrame.sort_index` which was not sorting correctly when the index was a sliced :class:`MultiIndex` (:issue:`55379`)
+- Fixed regression in :meth:`DataFrameGroupBy.agg` and :meth:`SeriesGroupBy.agg` where if the option ``compute.use_numba`` was set to True, groupby methods not supported by the numba engine would raise a ``TypeError`` (:issue:`55520`)
 - Fixed performance regression with wide DataFrames, typically involving methods where all columns were accessed individually (:issue:`55256`, :issue:`55245`)
 - Fixed regression in :func:`merge_asof` raising ``TypeError`` for ``by`` with datetime and timedelta dtypes (:issue:`55453`)

doc/source/whatsnew/v2.2.0.rst (3 additions, 1 deletion)

@@ -296,7 +296,7 @@ Other Deprecations
 Performance improvements
 ~~~~~~~~~~~~~~~~~~~~~~~~
 - Performance improvement in :func:`concat` with ``axis=1`` and objects with unaligned indexes (:issue:`55084`)
-- Performance improvement in :func:`merge_asof` when ``by`` contains more than one key (:issue:`55580`)
+- Performance improvement in :func:`merge_asof` when ``by`` is not ``None`` (:issue:`55580`, :issue:`55678`)
 - Performance improvement in :func:`read_stata` for files with many variables (:issue:`55515`)
 - Performance improvement in :func:`to_dict` on converting DataFrame to dictionary (:issue:`50990`)
 - Performance improvement in :meth:`DataFrame.groupby` when aggregating pyarrow timestamp and duration dtypes (:issue:`55031`)

@@ -324,6 +324,7 @@ Datetimelike
 - Bug in :meth:`DatetimeIndex.union` returning object dtype for tz-aware indexes with the same timezone but different units (:issue:`55238`)
 - Bug in :meth:`Tick.delta` with very large ticks raising ``OverflowError`` instead of ``OutOfBoundsTimedelta`` (:issue:`55503`)
 - Bug in adding or subtracting a :class:`Week` offset to a ``datetime64`` :class:`Series`, :class:`Index`, or :class:`DataFrame` column with non-nanosecond resolution returning incorrect results (:issue:`55583`)
+- Bug in addition or subtraction of :class:`BusinessDay` offset with ``offset`` attribute to non-nanosecond :class:`Index`, :class:`Series`, or :class:`DataFrame` column giving incorrect results (:issue:`55608`)
 - Bug in addition or subtraction of :class:`DateOffset` objects with microsecond components to ``datetime64`` :class:`Index`, :class:`Series`, or :class:`DataFrame` columns with non-nanosecond resolution (:issue:`55595`)
 - Bug in addition or subtraction of very large :class:`Tick` objects with :class:`Timestamp` or :class:`Timedelta` objects raising ``OverflowError`` instead of ``OutOfBoundsTimedelta`` (:issue:`55503`)
 -

@@ -411,6 +412,7 @@ Groupby/resample/rolling
 Reshaping
 ^^^^^^^^^
 - Bug in :func:`concat` ignoring ``sort`` parameter when passed :class:`DatetimeIndex` indexes (:issue:`54769`)
+- Bug in :func:`merge_asof` raising ``TypeError`` when ``by`` dtype is not ``object``, ``int64``, or ``uint64`` (:issue:`22794`)
 - Bug in :func:`merge` returning columns in incorrect order when left and/or right is empty (:issue:`51929`)
 - Bug in :meth:`pandas.DataFrame.melt` where it would not preserve the datetime (:issue:`55254`)
 -

pandas/_libs/join.pyi (6 additions, 6 deletions)

@@ -53,26 +53,26 @@ def outer_join_indexer(
 def asof_join_backward_on_X_by_Y(
     left_values: np.ndarray,  # ndarray[numeric_t]
     right_values: np.ndarray,  # ndarray[numeric_t]
-    left_by_values: np.ndarray,  # ndarray[by_t]
-    right_by_values: np.ndarray,  # ndarray[by_t]
+    left_by_values: np.ndarray,  # const int64_t[:]
+    right_by_values: np.ndarray,  # const int64_t[:]
     allow_exact_matches: bool = ...,
     tolerance: np.number | float | None = ...,
     use_hashtable: bool = ...,
 ) -> tuple[npt.NDArray[np.intp], npt.NDArray[np.intp]]: ...
 def asof_join_forward_on_X_by_Y(
     left_values: np.ndarray,  # ndarray[numeric_t]
     right_values: np.ndarray,  # ndarray[numeric_t]
-    left_by_values: np.ndarray,  # ndarray[by_t]
-    right_by_values: np.ndarray,  # ndarray[by_t]
+    left_by_values: np.ndarray,  # const int64_t[:]
+    right_by_values: np.ndarray,  # const int64_t[:]
     allow_exact_matches: bool = ...,
     tolerance: np.number | float | None = ...,
     use_hashtable: bool = ...,
 ) -> tuple[npt.NDArray[np.intp], npt.NDArray[np.intp]]: ...
 def asof_join_nearest_on_X_by_Y(
     left_values: np.ndarray,  # ndarray[numeric_t]
     right_values: np.ndarray,  # ndarray[numeric_t]
-    left_by_values: np.ndarray,  # ndarray[by_t]
-    right_by_values: np.ndarray,  # ndarray[by_t]
+    left_by_values: np.ndarray,  # const int64_t[:]
+    right_by_values: np.ndarray,  # const int64_t[:]
     allow_exact_matches: bool = ...,
     tolerance: np.number | float | None = ...,
     use_hashtable: bool = ...,

pandas/_libs/join.pyx (11 additions, 34 deletions)

@@ -7,7 +7,6 @@ from numpy cimport (
     int64_t,
     intp_t,
     ndarray,
-    uint64_t,
 )

 cnp.import_array()

@@ -679,23 +678,13 @@ def outer_join_indexer(ndarray[numeric_object_t] left, ndarray[numeric_object_t]
 # asof_join_by
 # ----------------------------------------------------------------------

-from pandas._libs.hashtable cimport (
-    HashTable,
-    Int64HashTable,
-    PyObjectHashTable,
-    UInt64HashTable,
-)
-
-ctypedef fused by_t:
-    object
-    int64_t
-    uint64_t
+from pandas._libs.hashtable cimport Int64HashTable


 def asof_join_backward_on_X_by_Y(ndarray[numeric_t] left_values,
                                  ndarray[numeric_t] right_values,
-                                 ndarray[by_t] left_by_values,
-                                 ndarray[by_t] right_by_values,
+                                 const int64_t[:] left_by_values,
+                                 const int64_t[:] right_by_values,
                                  bint allow_exact_matches=True,
                                  tolerance=None,
                                  bint use_hashtable=True):

@@ -706,8 +695,7 @@ def asof_join_backward_on_X_by_Y(ndarray[numeric_t] left_values,
         bint has_tolerance = False
         numeric_t tolerance_ = 0
         numeric_t diff = 0
-        HashTable hash_table
-        by_t by_value
+        Int64HashTable hash_table

     # if we are using tolerance, set our objects
     if tolerance is not None:

@@ -721,12 +709,7 @@ def asof_join_backward_on_X_by_Y(ndarray[numeric_t] left_values,
     right_indexer = np.empty(left_size, dtype=np.intp)

     if use_hashtable:
-        if by_t is object:
-            hash_table = PyObjectHashTable(right_size)
-        elif by_t is int64_t:
-            hash_table = Int64HashTable(right_size)
-        elif by_t is uint64_t:
-            hash_table = UInt64HashTable(right_size)
+        hash_table = Int64HashTable(right_size)

     right_pos = 0
     for left_pos in range(left_size):

@@ -771,8 +754,8 @@ def asof_join_backward_on_X_by_Y(ndarray[numeric_t] left_values,

 def asof_join_forward_on_X_by_Y(ndarray[numeric_t] left_values,
                                 ndarray[numeric_t] right_values,
-                                ndarray[by_t] left_by_values,
-                                ndarray[by_t] right_by_values,
+                                const int64_t[:] left_by_values,
+                                const int64_t[:] right_by_values,
                                 bint allow_exact_matches=1,
                                 tolerance=None,
                                 bint use_hashtable=True):

@@ -783,8 +766,7 @@ def asof_join_forward_on_X_by_Y(ndarray[numeric_t] left_values,
         bint has_tolerance = False
         numeric_t tolerance_ = 0
         numeric_t diff = 0
-        HashTable hash_table
-        by_t by_value
+        Int64HashTable hash_table

     # if we are using tolerance, set our objects
     if tolerance is not None:

@@ -798,12 +780,7 @@ def asof_join_forward_on_X_by_Y(ndarray[numeric_t] left_values,
     right_indexer = np.empty(left_size, dtype=np.intp)

     if use_hashtable:
-        if by_t is object:
-            hash_table = PyObjectHashTable(right_size)
-        elif by_t is int64_t:
-            hash_table = Int64HashTable(right_size)
-        elif by_t is uint64_t:
-            hash_table = UInt64HashTable(right_size)
+        hash_table = Int64HashTable(right_size)

     right_pos = right_size - 1
     for left_pos in range(left_size - 1, -1, -1):

@@ -849,8 +826,8 @@ def asof_join_forward_on_X_by_Y(ndarray[numeric_t] left_values,

 def asof_join_nearest_on_X_by_Y(ndarray[numeric_t] left_values,
                                 ndarray[numeric_t] right_values,
-                                ndarray[by_t] left_by_values,
-                                ndarray[by_t] right_by_values,
+                                const int64_t[:] left_by_values,
+                                const int64_t[:] right_by_values,
                                 bint allow_exact_matches=True,
                                 tolerance=None,
                                 bint use_hashtable=True):

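Note on the join.pyx change above: the by_t fused type (object / int64 / uint64) disappears because callers now hand the Cython routines pre-factorized int64 codes, so a single Int64HashTable covers every "by" dtype. A minimal sketch of that idea using only public API; pd.factorize stands in for the internal _factorize_keys call, so names and shapes are illustrative rather than the exact internals:

import numpy as np
import pandas as pd

# "by" keys of an arbitrary dtype (strings here) from the left and right frames
left_by = np.asarray(["a", "b", "a"], dtype=object)
right_by = np.asarray(["b", "a", "c"], dtype=object)

# Factorize both sides together so equal keys receive equal integer codes,
# then split the codes back into the left and right halves.
codes, _uniques = pd.factorize(np.concatenate([left_by, right_by]))
left_codes = codes[: len(left_by)].astype(np.int64)
right_codes = codes[len(left_by):].astype(np.int64)

print(left_codes)   # [0 1 0]
print(right_codes)  # [1 0 2]
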
pandas/_libs/tslibs/offsets.pyx (0 additions, 1 deletion)

@@ -1850,7 +1850,6 @@ cdef class BusinessDay(BusinessMixin):
         res = self._shift_bdays(i8other, reso=reso)
         if self.offset:
             res = res.view(dtarr.dtype) + Timedelta(self.offset)
-            res = res.view("i8")
         return res

     def is_on_offset(self, dt: datetime) -> bool:

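The removed res.view("i8") line corresponds to the GH 55608 entry in the v2.2.0 whatsnew above: adding a BusinessDay that carries an offset attribute to non-nanosecond data gave incorrect results. A hedged usage sketch of the intended behavior, with the dates and the second resolution chosen arbitrarily:

import pandas as pd

# Second-resolution datetimes plus a BusinessDay that also carries a time offset
ser = pd.Series(pd.to_datetime(["2023-10-20", "2023-10-23"])).astype("datetime64[s]")
bday = pd.offsets.BusinessDay(offset=pd.Timedelta(hours=2))

# Each value moves to the next business day and then shifts by two hours;
# before the fix this non-nanosecond path could return incorrect values.
print(ser + bday)
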
pandas/core/groupby/generic.py (5 additions, 2 deletions)

@@ -236,10 +236,13 @@ def aggregate(self, func=None, *args, engine=None, engine_kwargs=None, **kwargs)
             kwargs = {}

         if isinstance(func, str):
-            if maybe_use_numba(engine):
+            if maybe_use_numba(engine) and engine is not None:
                 # Not all agg functions support numba, only propagate numba kwargs
-                # if user asks for numba
+                # if user asks for numba, and engine is not None
+                # (if engine is None, the called function will handle the case where
+                # numba is requested via the global option)
                 kwargs["engine"] = engine
+            if engine_kwargs is not None:
                 kwargs["engine_kwargs"] = engine_kwargs
             return getattr(self, func)(*args, **kwargs)

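The guard added above matters because maybe_use_numba(engine) also returns True when engine is None and the global compute.use_numba option is set. The old code therefore forwarded engine=None and engine_kwargs=None into string aggregations such as "first" that take no engine argument, and a TypeError surfaced (GH 55520). A hedged sketch of the now-working pattern; it assumes numba is installed, since enabling the option imports it:

import pandas as pd

df = pd.DataFrame({"a": [3, 2, 3, 2], "b": range(4)})
gb = df.groupby("a")

# "first" has no numba implementation; with this fix the engine keyword is no
# longer forced through when numba is only implied by the global option.
with pd.option_context("compute.use_numba", True):
    result = gb.agg({"b": "first"})

print(result)
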
pandas/core/reshape/merge.py (24 additions, 41 deletions)

@@ -2153,54 +2153,37 @@ def _get_join_indexers(self) -> tuple[npt.NDArray[np.intp], npt.NDArray[np.intp]
         if self.left_by is not None:
             # remove 'on' parameter from values if one existed
             if self.left_index and self.right_index:
-                left_by_values = self.left_join_keys
-                right_by_values = self.right_join_keys
+                left_join_keys = self.left_join_keys
+                right_join_keys = self.right_join_keys
             else:
-                left_by_values = self.left_join_keys[0:-1]
-                right_by_values = self.right_join_keys[0:-1]
-
-            # get tuple representation of values if more than one
-            if len(left_by_values) == 1:
-                lbv = left_by_values[0]
-                rbv = right_by_values[0]
-
-                # TODO: conversions for EAs that can be no-copy.
-                lbv = np.asarray(lbv)
-                rbv = np.asarray(rbv)
-                if needs_i8_conversion(lbv.dtype):
-                    lbv = lbv.view("i8")
-                if needs_i8_conversion(rbv.dtype):
-                    rbv = rbv.view("i8")
+                left_join_keys = self.left_join_keys[0:-1]
+                right_join_keys = self.right_join_keys[0:-1]
+
+            mapped = [
+                _factorize_keys(
+                    left_join_keys[n],
+                    right_join_keys[n],
+                    sort=False,
+                    how="left",
+                )
+                for n in range(len(left_join_keys))
+            ]
+
+            if len(left_join_keys) == 1:
+                left_by_values = mapped[0][0]
+                right_by_values = mapped[0][1]
             else:
-                # We get here with non-ndarrays in test_merge_by_col_tz_aware
-                # and test_merge_groupby_multiple_column_with_categorical_column
-                mapped = [
-                    _factorize_keys(
-                        left_by_values[n],
-                        right_by_values[n],
-                        sort=False,
-                        how="left",
-                    )
-                    for n in range(len(left_by_values))
-                ]
                 arrs = [np.concatenate(m[:2]) for m in mapped]
                 shape = tuple(m[2] for m in mapped)
                 group_index = get_group_index(
                     arrs, shape=shape, sort=False, xnull=False
                 )
-                left_len = len(left_by_values[0])
-                lbv = group_index[:left_len]
-                rbv = group_index[left_len:]
-                # error: Incompatible types in assignment (expression has type
-                # "Union[ndarray[Any, dtype[Any]], ndarray[Any, dtype[object_]]]",
-                # variable has type "List[Union[Union[ExtensionArray,
-                # ndarray[Any, Any]], Index, Series]]")
-                right_by_values = rbv  # type: ignore[assignment]
-                # error: Incompatible types in assignment (expression has type
-                # "Union[ndarray[Any, dtype[Any]], ndarray[Any, dtype[object_]]]",
-                # variable has type "List[Union[Union[ExtensionArray,
-                # ndarray[Any, Any]], Index, Series]]")
-                left_by_values = lbv  # type: ignore[assignment]
+                left_len = len(left_join_keys[0])
+                left_by_values = group_index[:left_len]
+                right_by_values = group_index[left_len:]
+
+            left_by_values = ensure_int64(left_by_values)
+            right_by_values = ensure_int64(right_by_values)

             # choose appropriate function by type
             func = _asof_by_function(self.direction)

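With the "by" keys now routed through _factorize_keys and ensure_int64, merge_asof no longer restricts the "by" dtype to object, int64, or uint64 (GH 22794, per the whatsnew entry above). A hedged usage sketch with an int8 "by" column, which the whatsnew entry says previously raised TypeError; the column names and values are made up for illustration:

import pandas as pd

left = pd.DataFrame(
    {
        "time": pd.to_datetime(["2023-01-01 09:00", "2023-01-01 09:05"]),
        "ticker": pd.array([1, 2], dtype="int8"),
        "bid": [99.0, 101.0],
    }
)
right = pd.DataFrame(
    {
        "time": pd.to_datetime(["2023-01-01 08:59", "2023-01-01 09:04"]),
        "ticker": pd.array([1, 2], dtype="int8"),
        "ask": [100.0, 102.0],
    }
)

# Nearest earlier quote per ticker; both frames are already sorted on "time".
out = pd.merge_asof(left, right, on="time", by="ticker")
print(out)
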
pandas/tests/groupby/test_numba.py (12 additions, 0 deletions)

@@ -3,6 +3,7 @@
 from pandas import (
     DataFrame,
     Series,
+    option_context,
 )
 import pandas._testing as tm

@@ -66,3 +67,14 @@ def test_axis_1_unsupported(self, numba_supported_reductions):
         gb = df.groupby("a", axis=1)
         with pytest.raises(NotImplementedError, match="axis=1"):
             getattr(gb, func)(engine="numba", **kwargs)
+
+    def test_no_engine_doesnt_raise(self):
+        # GH55520
+        df = DataFrame({"a": [3, 2, 3, 2], "b": range(4), "c": range(1, 5)})
+        gb = df.groupby("a")
+        # Make sure a function without an engine argument doesn't raise
+        # when the global use_numba option is set
+        with option_context("compute.use_numba", True):
+            res = gb.agg({"b": "first"})
+        expected = gb.agg({"b": "first"})
+        tm.assert_frame_equal(res, expected)
