Description
Code Sample, a copy-pastable example if possible
import numpy as np
import pandas as pd
s = pd.Series(['a', 'b', np.nan])
s = s.astype('category')  # comment out this line to get the expected output
print(s.str.startswith('a', na=False))
print(s.str.endswith('a', na=False))
print(s.str.contains('a', na=False))
Problem description
The above code works as expected if you comment out the 4th line (astype('category')). In that case, with str.startswith, str.endswith, and str.contains, the na=False option outputs False for the NaN value. However, once the astype('category') line makes the series categorical, the na=False option seems to be ignored and NaN is output for the NaN value instead. This makes the output difficult to use, e.g. as a boolean mask for slicing data.
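A possible workaround, sketched here as an assumption and not as a confirmed fix, is to cast the categorical series back to object dtype before calling the .str accessor, so that na=False is applied to a plain string series. The names mask and selected are only illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series(['a', 'b', np.nan]).astype('category')

# Assumed workaround: go back to object dtype before using .str,
# so the string method runs on a non-categorical series.
mask = s.astype(object).str.startswith('a', na=False)

# The result can then be used as a boolean mask for slicing.
selected = s[mask]

print(mask.tolist())
print(selected.tolist())
```

The cast gives up the memory benefit of the categorical dtype for the duration of the call, which may matter for large series with few distinct values.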
Expected Output
0 True
1 False
2 False
dtype: bool
0 True
1 False
2 False
dtype: bool
0 True
1 False
2 False
dtype: bool
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-138-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.23.4
pytest: None
pip: 9.0.3
setuptools: 20.7.0
Cython: 0.29
numpy: 1.11.0
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 5.8.0
sphinx: 1.8.1
patsy: None
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.3
feather: None
matplotlib: 2.2.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None