Description
Code Sample, a copy-pastable example if possible
This code runs in JupyterLab with Python 3.7 and pandas 1.0.1.
It reads a text file of words, one word per line.
import base64
import numpy as np
import pandas as pd
words = pd.read_table("sampleTEXT.txt",names=['word'],header=None)
words.head()
         word
0  difference
1       where
2          mc
3          is
4         the
words['word_encoded'] = words.word.str.encode('utf-8', 'strict').str.encode('base64')
Problem description
The following error appears:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-4-0f7040aa4d4e> in <module>
----> 1 words['word_encoded'] = words.word.str.encode('utf-8', 'strict').str.encode('base64')
~/miniconda3/envs/p37cu10.2PyTo/lib/python3.7/site-packages/pandas/core/strings.py in wrapper(self, *args, **kwargs)
1949 f"inferred dtype '{self._inferred_dtype}'."
1950 )
-> 1951 raise TypeError(msg)
1952 return func(self, *args, **kwargs)
1953
TypeError: Cannot use .str.encode with values of inferred dtype 'bytes'.
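For readers without sampleTEXT.txt, the failure can probably be reproduced without any file. The sketch below is an assumption about a minimal equivalent, not the code from the report: it builds a Series of bytes directly (the variable names `raw` and `outcome` are made up for illustration) and records whether the second `.str.encode` call raises.

```python
import pandas as pd

# Hypothetical stand-in for the result of words.word.str.encode('utf-8', 'strict'):
# a Series whose values are already bytes.
raw = pd.Series([b"difference", b"where"])

try:
    # Second encode step from the report, applied to bytes values.
    raw.str.encode("base64")
    outcome = "no error"
except TypeError as exc:
    # pandas versions with the stricter .str type checks are expected to land here.
    outcome = f"TypeError: {exc}"

print(pd.__version__, outcome)
```

Printing the pandas version next to the outcome makes it easy to compare the working and crashing environments.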
Expected Output
If I downgrade pandas to 0.24.2, I get the desired output:
word word_encoded
-----------------------------------------------------
0 difference b'ZGlmZmVyZW5jZQ==\n'
1 where b'd2hlcmU=\n'
2 mc b'bWM=\n'
3 is b'aXM=\n'
4 the b'dGhl\n'
I discussed this problem on Stack Overflow here.
The pandas 0.25.0 release notes
https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.25.0.html
say that the .str accessor performs stricter type checks, a change that was new in v0.25.
But I still don't know how to resolve the issue I am having with the inferred dtype 'bytes'.
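One possible workaround, offered as a sketch and not as the officially recommended fix, is to keep `.str.encode` for the str-to-bytes step only and do the base64 step with the standard library's `base64.encodebytes`, mapped over the resulting bytes values. The inline DataFrame below is an assumption standing in for sampleTEXT.txt, and `utf8_bytes` is a name introduced only for this example.

```python
import base64

import pandas as pd

# Inline stand-in for sampleTEXT.txt (assumption: one word per row).
words = pd.DataFrame({"word": ["difference", "where", "mc", "is", "the"]})

# Step 1: str -> bytes. The .str accessor still accepts string values here.
utf8_bytes = words["word"].str.encode("utf-8", "strict")

# Step 2: bytes -> base64 using the standard library, avoiding .str.encode on bytes.
# encodebytes appends a trailing newline, like the output shown under "Expected Output".
words["word_encoded"] = utf8_bytes.map(base64.encodebytes)

print(words)
```

If the trailing newline is not wanted, `base64.b64encode` could be mapped over the values in the same way.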
Output of pd.show_versions()
The working version:
INSTALLED VERSIONS
commit: None
python: 3.6.7.final.0
python-bits: 64
OS: Linux
OS-release: 5.3.0-28-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.24.2
pytest: None
pip: 20.0.2
setuptools: 45.2.0.post20200209
Cython: None
numpy: 1.18.1
scipy: None
pyarrow: None
xarray: None
IPython: 7.12.0
sphinx: None
patsy: None
dateutil: 2.8.1
pytz: 2019.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.1.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.11.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
The crashing version:
INSTALLED VERSIONS
commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.0-28-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.2.0.post20200209
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None