Commit d51a3d7

Merge remote-tracking branch 'upstream/master' into hiding_data_columns_and_index
2 parents 7d55fc4 + e18415e commit d51a3d7

58 files changed: 714 additions and 551 deletions

.circleci/config.yml

Lines changed: 0 additions & 18 deletions
This file was deleted.

asv_bench/benchmarks/frame_methods.py

Lines changed: 3 additions & 1 deletion
@@ -652,7 +652,9 @@ class Rank:
     ]

     def setup(self, dtype):
-        self.df = DataFrame(np.random.randn(10000, 10), columns=range(10), dtype=dtype)
+        self.df = DataFrame(
+            np.random.randn(10000, 10).astype(dtype), columns=range(10), dtype=dtype
+        )

     def time_rank(self, dtype):
         self.df.rank()
File renamed without changes.

ci/setup_env.sh

Lines changed: 12 additions & 23 deletions
@@ -12,41 +12,30 @@ if [[ "$(uname)" == "Linux" && -n "$LC_ALL" ]]; then
     echo
 fi

-MINICONDA_DIR="$HOME/miniconda3"
-
-
-if [ -d "$MINICONDA_DIR" ]; then
-    echo
-    echo "rm -rf "$MINICONDA_DIR""
-    rm -rf "$MINICONDA_DIR"
-fi

 echo "Install Miniconda"
-UNAME_OS=$(uname)
-if [[ "$UNAME_OS" == 'Linux' ]]; then
+DEFAULT_CONDA_URL="https://repo.continuum.io/miniconda/Miniconda3-latest"
+if [[ "$(uname -m)" == 'aarch64' ]]; then
+    CONDA_URL="https://github.com/conda-forge/miniforge/releases/download/4.10.1-4/Miniforge3-4.10.1-4-Linux-aarch64.sh"
+elif [[ "$(uname)" == 'Linux' ]]; then
     if [[ "$BITS32" == "yes" ]]; then
-        CONDA_OS="Linux-x86"
+        CONDA_URL="$DEFAULT_CONDA_URL-Linux-x86.sh"
     else
-        CONDA_OS="Linux-x86_64"
+        CONDA_URL="$DEFAULT_CONDA_URL-Linux-x86_64.sh"
     fi
-elif [[ "$UNAME_OS" == 'Darwin' ]]; then
-    CONDA_OS="MacOSX-x86_64"
+elif [[ "$(uname)" == 'Darwin' ]]; then
+    CONDA_URL="$DEFAULT_CONDA_URL-MacOSX-x86_64.sh"
 else
-    echo "OS $UNAME_OS not supported"
+    echo "OS $(uname) not supported"
     exit 1
 fi
-
-if [ "${TRAVIS_CPU_ARCH}" == "arm64" ]; then
-    CONDA_URL="https://github.com/conda-forge/miniforge/releases/download/4.8.5-1/Miniforge3-4.8.5-1-Linux-aarch64.sh"
-else
-    CONDA_URL="https://repo.continuum.io/miniconda/Miniconda3-latest-$CONDA_OS.sh"
-fi
+echo "Downloading $CONDA_URL"
 wget -q $CONDA_URL -O miniconda.sh
 chmod +x miniconda.sh

-# Installation path is required for ARM64 platform as miniforge script installs in path $HOME/miniforge3.
+MINICONDA_DIR="$HOME/miniconda3"
+rm -rf $MINICONDA_DIR
 ./miniconda.sh -b -p $MINICONDA_DIR
-
 export PATH=$MINICONDA_DIR/bin:$PATH

 echo
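
The new branch order in ci/setup_env.sh can be read as a small decision function: the machine architecture is checked first, and only then the operating system. The sketch below mirrors that logic in Python for illustration only. The function name `select_conda_url` and its parameters are hypothetical and are not part of the commit; the URLs are copied from the hunk above.

```python
# Hypothetical Python mirror of the installer-selection branches in ci/setup_env.sh.
# Only the URLs come from the diff; every name here is an illustrative assumption.
DEFAULT_CONDA_URL = "https://repo.continuum.io/miniconda/Miniconda3-latest"
MINIFORGE_AARCH64_URL = (
    "https://github.com/conda-forge/miniforge/releases/download/"
    "4.10.1-4/Miniforge3-4.10.1-4-Linux-aarch64.sh"
)


def select_conda_url(machine: str, system: str, bits32: bool = False) -> str:
    """Return the installer URL for a machine/system pair.

    `machine` plays the role of `uname -m`, `system` the role of `uname`,
    and `bits32` the role of BITS32 == "yes" in the shell script.
    """
    # The architecture check comes first, as in the new version of the script.
    if machine == "aarch64":
        return MINIFORGE_AARCH64_URL
    if system == "Linux":
        suffix = "Linux-x86" if bits32 else "Linux-x86_64"
        return f"{DEFAULT_CONDA_URL}-{suffix}.sh"
    if system == "Darwin":
        return f"{DEFAULT_CONDA_URL}-MacOSX-x86_64.sh"
    # The shell script prints a message and exits with status 1 here.
    raise ValueError(f"OS {system} not supported")
```

For the current host, `select_conda_url(platform.machine(), platform.system())` would be the closest analogue of what the script computes, after `import platform`.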

doc/source/whatsnew/v1.3.0.rst

Lines changed: 9 additions & 5 deletions
@@ -443,7 +443,7 @@ In the new behavior, we get a new array, and retain an integer-dtyped ``5``:
 Consistent Casting With Setting Into Boolean Series
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Setting non-boolean values into a :class:`Series with ``dtype=bool`` consistently
+Setting non-boolean values into a :class:`Series` with ``dtype=bool`` consistently
 cast to ``dtype=object`` (:issue:`38709`)

 .. ipython:: python
@@ -643,6 +643,7 @@ Other API changes
 - Partially initialized :class:`CategoricalDtype` (i.e. those with ``categories=None`` objects will no longer compare as equal to fully initialized dtype objects.
 - Accessing ``_constructor_expanddim`` on a :class:`DataFrame` and ``_constructor_sliced`` on a :class:`Series` now raise an ``AttributeError``. Previously a ``NotImplementedError`` was raised (:issue:`38782`)
 - Added new ``engine`` and ``**engine_kwargs`` parameters to :meth:`DataFrame.to_sql` to support other future "SQL engines". Currently we still only use ``SQLAlchemy`` under the hood, but more engines are planned to be supported such as ``turbodbc`` (:issue:`36893`)
+- Removed redundant ``freq`` from :class:`PeriodIndex` string representation (:issue:`41653`)

 Build
 =====
@@ -694,11 +695,12 @@ Deprecations
 - Deprecated passing arguments as positional in :meth:`DataFrame.set_index` (other than ``"keys"``) (:issue:`41485`)
 - Deprecated passing arguments as positional (except for ``"levels"``) in :meth:`MultiIndex.set_levels` (:issue:`41485`)
 - Deprecated passing arguments as positional in :meth:`DataFrame.sort_index` and :meth:`Series.sort_index` (:issue:`41485`)
-- Deprecated passing arguments as positional in :meth:`DataFrame.drop_duplicates` (except for ``subset``), :meth:`Series.drop_duplicates`, :meth:`Index.drop_duplicates` and :meth:`MultiIndex.drop_duplicates`(:issue:`41485`)
+- Deprecated passing arguments as positional in :meth:`DataFrame.drop_duplicates` (except for ``subset``), :meth:`Series.drop_duplicates`, :meth:`Index.drop_duplicates` and :meth:`MultiIndex.drop_duplicates` (:issue:`41485`)
 - Deprecated passing arguments (apart from ``value``) as positional in :meth:`DataFrame.fillna` and :meth:`Series.fillna` (:issue:`41485`)
 - Deprecated passing arguments as positional in :meth:`DataFrame.reset_index` (other than ``"level"``) and :meth:`Series.reset_index` (:issue:`41485`)
 - Deprecated construction of :class:`Series` or :class:`DataFrame` with ``DatetimeTZDtype`` data and ``datetime64[ns]`` dtype. Use ``Series(data).dt.tz_localize(None)`` instead (:issue:`41555`,:issue:`33401`)
 - Deprecated behavior of :class:`Series` construction with large-integer values and small-integer dtype silently overflowing; use ``Series(data).astype(dtype)`` instead (:issue:`41734`)
+- Deprecated behavior of :class:`DataFrame` construction with floating data and integer dtype casting even when lossy; in a future version this will remain floating, matching :class:`Series` behavior (:issue:`41770`)
 - Deprecated inference of ``timedelta64[ns]``, ``datetime64[ns]``, or ``DatetimeTZDtype`` dtypes in :class:`Series` construction when data containing strings is passed and no ``dtype`` is passed (:issue:`33558`)
 - In a future version, constructing :class:`Series` or :class:`DataFrame` with ``datetime64[ns]`` data and ``DatetimeTZDtype`` will treat the data as wall-times instead of as UTC times (matching DatetimeIndex behavior). To treat the data as UTC times, use ``pd.Series(data).dt.tz_localize("UTC").dt.tz_convert(dtype.tz)`` or ``pd.Series(data.view("int64"), dtype=dtype)`` (:issue:`33401`)
 - Deprecated passing arguments as positional in :meth:`DataFrame.set_axis` and :meth:`Series.set_axis` (other than ``"labels"``) (:issue:`41485`)
@@ -924,7 +926,7 @@ Indexing
 - Bug in :meth:`Series.__setitem__` raising ``ValueError`` when setting a :class:`Series` with a scalar indexer (:issue:`38303`)
 - Bug in :meth:`DataFrame.loc` dropping levels of :class:`MultiIndex` when :class:`DataFrame` used as input has only one row (:issue:`10521`)
 - Bug in :meth:`DataFrame.__getitem__` and :meth:`Series.__getitem__` always raising ``KeyError`` when slicing with existing strings an :class:`Index` with milliseconds (:issue:`33589`)
-- Bug in setting ``timedelta64`` or ``datetime64`` values into numeric :class:`Series` failing to cast to object dtype (:issue:`39086`, issue:`39619`)
+- Bug in setting ``timedelta64`` or ``datetime64`` values into numeric :class:`Series` failing to cast to object dtype (:issue:`39086`, :issue:`39619`)
 - Bug in setting :class:`Interval` values into a :class:`Series` or :class:`DataFrame` with mismatched :class:`IntervalDtype` incorrectly casting the new values to the existing dtype (:issue:`39120`)
 - Bug in setting ``datetime64`` values into a :class:`Series` with integer-dtype incorrect casting the datetime64 values to integers (:issue:`39266`)
 - Bug in setting ``np.datetime64("NaT")`` into a :class:`Series` with :class:`Datetime64TZDtype` incorrectly treating the timezone-naive value as timezone-aware (:issue:`39769`)
@@ -944,6 +946,7 @@ Indexing
 - Bug in :meth:`DataFrame.loc` returning :class:`MultiIndex` in wrong order if indexer has duplicates (:issue:`40978`)
 - Bug in :meth:`DataFrame.__setitem__` raising ``TypeError`` when using a str subclass as the column name with a :class:`DatetimeIndex` (:issue:`37366`)
 - Bug in :meth:`PeriodIndex.get_loc` failing to raise ``KeyError`` when given a :class:`Period` with a mismatched ``freq`` (:issue:`41670`)
+- Bug ``.loc.__getitem__`` with a :class:`UInt64Index` and negative-integer keys raising ``OverflowError`` instead of ``KeyError`` in some cases, wrapping around to positive integers in others (:issue:`41777`)

 Missing
 ^^^^^^^
@@ -1045,7 +1048,7 @@ Groupby/resample/rolling
 - Bug in :class:`core.window.ewm.ExponentialMovingWindowGroupby` where the times vector and values became out of sync for non-trivial groups (:issue:`40951`)
 - Bug in :meth:`Series.asfreq` and :meth:`DataFrame.asfreq` dropping rows when the index is not sorted (:issue:`39805`)
 - Bug in aggregation functions for :class:`DataFrame` not respecting ``numeric_only`` argument when ``level`` keyword was given (:issue:`40660`)
-- Bug in :meth:`SeriesGroupBy.aggregate` where using a user-defined function to aggregate a ``Series`` with an object-typed :class:`Index` causes an incorrect :class:`Index` shape (issue:`40014`)
+- Bug in :meth:`SeriesGroupBy.aggregate` where using a user-defined function to aggregate a ``Series`` with an object-typed :class:`Index` causes an incorrect :class:`Index` shape (:issue:`40014`)
 - Bug in :class:`core.window.RollingGroupby` where ``as_index=False`` argument in ``groupby`` was ignored (:issue:`39433`)
 - Bug in :meth:`.GroupBy.any` and :meth:`.GroupBy.all` raising ``ValueError`` when using with nullable type columns holding ``NA`` even with ``skipna=True`` (:issue:`40585`)
 - Bug in :meth:`GroupBy.cummin` and :meth:`GroupBy.cummax` incorrectly rounding integer values near the ``int64`` implementations bounds (:issue:`40767`)
@@ -1054,12 +1057,13 @@ Groupby/resample/rolling
 - Bug in :meth:`DataFrame.rolling` returning mean zero for all ``NaN`` window with ``min_periods=0`` if calculation is not numerical stable (:issue:`41053`)
 - Bug in :meth:`DataFrame.rolling` returning sum not zero for all ``NaN`` window with ``min_periods=0`` if calculation is not numerical stable (:issue:`41053`)
 - Bug in :meth:`SeriesGroupBy.agg` failing to retain ordered :class:`CategoricalDtype` on order-preserving aggregations (:issue:`41147`)
-- Bug in :meth:`DataFrameGroupBy.min` and :meth:`DataFrameGroupBy.max` with multiple object-dtype columns and ``numeric_only=False`` incorrectly raising ``ValueError`` (:issue:41111`)
+- Bug in :meth:`DataFrameGroupBy.min` and :meth:`DataFrameGroupBy.max` with multiple object-dtype columns and ``numeric_only=False`` incorrectly raising ``ValueError`` (:issue:`41111`)
 - Bug in :meth:`DataFrameGroupBy.rank` with the GroupBy object's ``axis=0`` and the ``rank`` method's keyword ``axis=1`` (:issue:`41320`)
 - Bug in :meth:`DataFrameGroupBy.__getitem__` with non-unique columns incorrectly returning a malformed :class:`SeriesGroupBy` instead of :class:`DataFrameGroupBy` (:issue:`41427`)
 - Bug in :meth:`DataFrameGroupBy.transform` with non-unique columns incorrectly raising ``AttributeError`` (:issue:`41427`)
 - Bug in :meth:`Resampler.apply` with non-unique columns incorrectly dropping duplicated columns (:issue:`41445`)
 - Bug in :meth:`SeriesGroupBy` aggregations incorrectly returning empty :class:`Series` instead of raising ``TypeError`` on aggregations that are invalid for its dtype, e.g. ``.prod`` with ``datetime64[ns]`` dtype (:issue:`41342`)
+- Bug in :class:`DataFrameGroupBy` aggregations incorrectly failing to drop columns with invalid dtypes for that aggregation when there are no valid columns (:issue:`41291`)
 - Bug in :meth:`DataFrame.rolling.__iter__` where ``on`` was not assigned to the index of the resulting objects (:issue:`40373`)
 - Bug in :meth:`DataFrameGroupBy.transform` and :meth:`DataFrameGroupBy.agg` with ``engine="numba"`` where ``*args`` were being cached with the user passed function (:issue:`41647`)

pandas/_libs/groupby.pyx

Lines changed: 1 addition & 1 deletion
@@ -516,7 +516,7 @@ def group_add(add_t[:, ::1] out,
                 val = values[i, j]

                 # not nan
-                if val == val:
+                if not checknull(val):
                     nobs[lab, j] += 1

                     if nobs[lab, j] == 1:
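
The change from `val == val` to `not checknull(val)` matters for values that are missing but still compare equal to themselves. The sketch below illustrates the difference in plain Python. `is_missing` is a hypothetical stand-in, not pandas' `checknull`, and it assumes that `checknull` treats `None` as missing, which the diff itself does not show.

```python
import math


def passes_self_equality(val) -> bool:
    """Old test: relies on NaN being the only value that is not equal to itself."""
    return val == val


def is_missing(val) -> bool:
    """Hypothetical stand-in for checknull: None and float NaN count as missing."""
    return val is None or (isinstance(val, float) and math.isnan(val))


values = [1.5, float("nan"), None]

# The self-equality test filters out NaN but lets None through.
counted_by_old_test = [v for v in values if passes_self_equality(v)]

# An explicit missing-value check filters out both NaN and None.
counted_by_new_test = [v for v in values if not is_missing(v)]
```

With the old test, `None` would be counted as an observation; with an explicit null check it is skipped along with `NaN`.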

pandas/_libs/index.pyx

Lines changed: 2 additions & 1 deletion
@@ -106,7 +106,8 @@ cdef class IndexEngine:

         try:
             return self.mapping.get_item(val)
-        except (TypeError, ValueError):
+        except (TypeError, ValueError, OverflowError):
+            # GH#41775 OverflowError e.g. if we are uint64 and val is -1
             raise KeyError(val)

     cdef inline _get_loc_duplicates(self, object val):
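
The pattern in this hunk is to translate any conversion failure during a lookup into a `KeyError`, so callers see a uniform "key not found" signal. A minimal pure-Python sketch of the same idea follows. `UInt64Mapping` and `get_loc` are hypothetical illustrations and not the pandas classes; the sketch relies only on `int.to_bytes` raising `OverflowError` for negative or oversized integers.

```python
import operator


class UInt64Mapping:
    """Hypothetical stand-in for a hash table keyed by unsigned 64-bit integers."""

    def __init__(self, keys):
        self._table = {self._encode(k): pos for pos, k in enumerate(keys)}

    @staticmethod
    def _encode(val) -> bytes:
        # operator.index raises TypeError for non-integers; to_bytes raises
        # OverflowError for negative values or values that need more than 8 bytes.
        return operator.index(val).to_bytes(8, "little", signed=False)

    def get_item(self, val) -> int:
        # A key that encodes fine but is absent raises KeyError on its own.
        return self._table[self._encode(val)]


def get_loc(mapping: UInt64Mapping, val) -> int:
    """Look up val, reporting every failed conversion as a KeyError."""
    try:
        return mapping.get_item(val)
    except (TypeError, ValueError, OverflowError):
        # Mirrors the hunk above: -1 cannot be represented as uint64.
        raise KeyError(val)
```

Without `OverflowError` in the `except` clause, `get_loc(mapping, -1)` in this sketch would surface an `OverflowError` rather than a `KeyError`.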

pandas/_libs/lib.pyi

Lines changed: 10 additions & 8 deletions
@@ -11,7 +11,10 @@ from typing import (

 import numpy as np

-from pandas._typing import ArrayLike
+from pandas._typing import (
+    ArrayLike,
+    DtypeObj,
+)

 # placeholder until we can specify np.ndarray[object, ndim=2]
 ndarray_obj_2d = np.ndarray
@@ -52,8 +55,6 @@ def is_float_array(values: np.ndarray, skipna: bool = False): ...
 def is_integer_array(values: np.ndarray, skipna: bool = False): ...
 def is_bool_array(values: np.ndarray, skipna: bool = False): ...

-def fast_multiget(mapping: dict, keys: np.ndarray, default=np.nan) -> np.ndarray: ...
-
 def fast_unique_multiple_list_gen(gen: Generator, sort: bool = True) -> list: ...
 def fast_unique_multiple_list(lists: list, sort: bool = True) -> list: ...
 def fast_unique_multiple(arrays: list, sort: bool = True) -> list: ...
@@ -73,6 +74,7 @@ def maybe_convert_objects(
     convert_timedelta: bool = ...,
     convert_period: Literal[False] = ...,
     convert_to_nullable_integer: Literal[False] = ...,
+    dtype_if_all_nat: DtypeObj | None = ...,
 ) -> np.ndarray: ...

 @overload
@@ -85,6 +87,7 @@ def maybe_convert_objects(
     convert_timedelta: bool = ...,
     convert_period: bool = ...,
     convert_to_nullable_integer: Literal[True] = ...,
+    dtype_if_all_nat: DtypeObj | None = ...,
 ) -> ArrayLike: ...

 @overload
@@ -97,6 +100,7 @@ def maybe_convert_objects(
     convert_timedelta: bool = ...,
     convert_period: bool = ...,
     convert_to_nullable_integer: bool = ...,
+    dtype_if_all_nat: DtypeObj | None = ...,
 ) -> ArrayLike: ...

 @overload
@@ -109,6 +113,7 @@ def maybe_convert_objects(
     convert_timedelta: bool = ...,
     convert_period: Literal[True] = ...,
     convert_to_nullable_integer: bool = ...,
+    dtype_if_all_nat: DtypeObj | None = ...,
 ) -> ArrayLike: ...

 @overload
@@ -121,6 +126,7 @@ def maybe_convert_objects(
     convert_timedelta: bool = ...,
     convert_period: bool = ...,
     convert_to_nullable_integer: bool = ...,
+    dtype_if_all_nat: DtypeObj | None = ...,
 ) -> ArrayLike: ...

 @overload
@@ -184,11 +190,7 @@ def maybe_indices_to_slice(
     max_len: int,
 ) -> slice | np.ndarray: ...  # np.ndarray[np.uint8]

-def clean_index_list(obj: list) -> tuple[
-    list | np.ndarray,  # np.ndarray[object] | np.ndarray[np.int64]
-    bool,
-]: ...
-
+def is_all_arraylike(obj: list) -> bool: ...

 # -----------------------------------------------------------------
 # Functions which in reality take memoryviews
