Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd

# FAILS
df1 = pd.DataFrame({
    "a": pd.Series([1], dtype="int64"),
    "b": pd.Series([None], dtype="object"),
})

# PASSES
df2 = pd.DataFrame({
    "a": [1],
    "b": [None],
}, dtype="object")

# PASSES
df3 = pd.DataFrame({
    "a": pd.Series([1], dtype="object"),
    "b": pd.Series([None], dtype="object"),
})

# PASSES
df4 = pd.DataFrame({
    "a": pd.Series([1], dtype="int64"),
    "b": pd.Series([None], dtype="object"),
    "c": pd.Series(["bye"], dtype="str"),
})

# FAILS
df5 = pd.DataFrame({
    "a": pd.Series([None], dtype="object"),
    "b": pd.Series([None], dtype="object"),
})

# FAILS
df6 = pd.DataFrame({
    "a": pd.Series([None]),
    "b": pd.Series([None]),
}, dtype="object")

# FAILS
df7 = pd.DataFrame({
    "a": pd.Series([1], dtype="int64"),
    "b": pd.Series([pd.NA], dtype="object"),
})

# PASSES
df8 = pd.DataFrame({
    "a": pd.Series([1], dtype="int64"),
    "b": pd.Series([pd.NA], dtype="string"),
})

regex_string = r"(^a$|\d{4})"
regex_list = [r"^a$", r"\d{4}"]

dfs = {
    "df1": df1,
    "df2": df2,
    "df3": df3,
    "df4": df4,
    "df5": df5,
    "df6": df6,
    "df7": df7,
    "df8": df8,
}
for label, df in dfs.items():
    # Single regex passed as a string.
    try:
        df.replace(regex=regex_string, value="dude")
    except ValueError as exc:
        print(f"{label}: String regex FAILED!")
        print(exc)
    # Same patterns passed as a list of regexes.
    try:
        df.replace(regex=regex_list, value="dude")
    except ValueError as exc:
        print(f"{label}: List of regexes FAILED!")
        print(exc)
Issue Description
Under some circumstances, calling DataFrame.replace() with a list of regular expressions fails when the dataframe contains a column with dtype="object" in which every value is some kind of null (None, np.nan, or pd.NA). The operation raises:
ValueError: cannot call `vectorize` on size 0 inputs unless `otypes` is set
With a single regular expression rather than a list, there's no problem. Whether the list form fails seems to depend on how the dataframe is constructed, how the dtypes are specified, and what the dtypes of the other columns in the dataframe are.
This behavior was not present in pandas 1.5.x.
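The error message above is the one NumPy raises when a function wrapped with np.vectorize is called on an empty array without otypes. The sketch below reproduces that message outside pandas. It assumes, without confirmation from the pandas source, that the list-of-regexes path applies a vectorized replacer to only the non-null entries of a column, which would be an empty selection when every value is null. The names substitute, not_null, and try_vectorize are illustrative and not taken from pandas.

```python
import re

import numpy as np

pattern = re.compile(r"^a$")


def substitute(value):
    """Apply the regex to strings and pass every other value through unchanged."""
    return pattern.sub("dude", value) if isinstance(value, str) else value


# An object column in which every value is null.
values = np.array([None], dtype=object)
# Hypothetical mask of non-null entries: all False for this column.
not_null = np.array([False])


def try_vectorize(func, arr):
    """Return the ValueError message raised by func(arr), or None if it succeeds."""
    try:
        func(arr)
    except ValueError as exc:
        return str(exc)
    return None


# Without otypes, np.vectorize cannot infer an output dtype from zero elements.
message = try_vectorize(np.vectorize(substitute), values[not_null])
print(message)

# With otypes set, the same empty selection is handled without error.
safe = np.vectorize(substitute, otypes=[object])
result = safe(values[not_null])
print(result.shape)
```

If the assumption holds, it would also be consistent with the observation that the failure depends on whether an all-null object column is present.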
Expected Behavior
I expected all of the examples above to succeed, regardless of whether the dataframes contained null values or object types, and regardless of whether I used a list of regular expressions or a single regular expression to specify the pattern to be replaced.
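Since a single regular expression does not trigger the error, one possible interim workaround is to merge the list into a single alternation before calling DataFrame.replace(). This is a minimal sketch, not a fix for the underlying bug. The helper combine_patterns is hypothetical, and a single alternation is not guaranteed to behave identically to applying the patterns one after another (for example, when the replacement text itself matches a later pattern).

```python
import re


def combine_patterns(patterns):
    """Join patterns into one alternation, wrapping each in a non-capturing group."""
    return "|".join(f"(?:{p})" for p in patterns)


regex_list = [r"^a$", r"\d{4}"]
combined = combine_patterns(regex_list)
print(combined)  # (?:^a$)|(?:\d{4})

# Check the combined pattern with the standard library before using it in pandas.
print(re.sub(combined, "dude", "a"))
print(re.sub(combined, "dude", "id 2023"))

# Hypothetical usage with the dataframes from the example above:
# df.replace(regex=combined, value="dude")
```

The non-capturing groups keep anchors such as ^ and $ scoped to their own alternative, so each pattern keeps its original meaning inside the combined expression.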
Installed Versions
INSTALLED VERSIONS
------------------
commit : 0f437949513225922d851e9581723d82120684a6
python : 3.11.4.final.0
python-bits : 64
OS : Darwin
OS-release : 22.5.0
Version : Darwin Kernel Version 22.5.0: Thu Jun 8 22:22:20 PDT 2023; root:xnu-8796.121.3~7/RELEASE_ARM64_T6000
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8
pandas : 2.0.3
numpy : 1.24.4
pytz : 2023.3
dateutil : 2.8.2
setuptools : 68.0.0
pip : 23.2.1
Cython : None
pytest : 7.4.0
hypothesis : None
sphinx : 7.1.2
blosc : None
feather : None
xlsxwriter : 3.1.2
lxml.etree : 4.9.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.14.0
pandas_datareader: None
bs4 : 4.12.2
bottleneck : None
brotli :
fastparquet : None
fsspec : 2023.6.0
gcsfs : 2023.6.0
matplotlib : 3.7.2
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : 0.19.2
pyarrow : 12.0.1
pyreadstat : None
pyxlsb : 1.0.10
s3fs : None
scipy : 1.11.1
snappy :
sqlalchemy : 1.4.49
tables : None
tabulate : 0.9.0
xarray : None
xlrd : 2.0.1
zstandard : 0.21.0
tzdata : 2023.3
qtpy : None
pyqt5 : None