
BUG: Pandas string dtype reverts to object dtype when initialising index #42455

@mlee94

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.

In [1]: import pandas as pd

In [2]: pd.Index(['A', 'B', 'C', 'D'], dtype=pd.StringDtype()).dtype
Out[2]: dtype('O')

In [3]: pd.Index([1, 2, 3, 4], dtype=int).dtype
Out[3]: dtype('int64')

Output of pd.show_versions()
  INSTALLED VERSIONS
  ------------------
  commit           : f00ed8f47020034e752baf0250483053340971b0
  python           : 3.7.9.final.0
  python-bits      : 64
  OS               : Windows
  OS-release       : 10
  Version          : 10.0.19041
  machine          : AMD64
  processor        : Intel64 Family 6 Model 58 Stepping 0, GenuineIntel
  byteorder        : little
  LC_ALL           : None
  LANG             : None
  LOCALE           : None.None
  
  pandas           : 1.3.0
  numpy            : 1.19.5
  pytz             : 2020.5
  dateutil         : 2.8.1
  pip              : 21.0
  setuptools       : 49.6.0.post20210108
  Cython           : None
  pytest           : 6.2.1
  hypothesis       : None
  sphinx           : 3.5.4
  blosc            : None
  feather          : None
  xlsxwriter       : 1.3.9
  lxml.etree       : 4.6.3
  html5lib         : None
  pymysql          : None
  psycopg2         : 2.8.6 (dt dec pq3 ext lo64)
  jinja2           : 2.11.2
  IPython          : 7.19.0
  pandas_datareader: None
  bs4              : 4.9.3
  bottleneck       : None
  fsspec           : 2021.05.0
  fastparquet      : None
  gcsfs            : None
  matplotlib       : 3.3.3
  numexpr          : None
  odfpy            : None
  openpyxl         : 3.0.7
  pandas_gbq       : None
  pyarrow          : 3.0.0
  pyxlsb           : None
  s3fs             : None
  scipy            : 1.6.2
  sqlalchemy       : 1.4.3
  tables           : None
  tabulate         : 0.8.9
  xarray           : 0.16.2
  xlrd             : None
  xlwt             : None
  numba            : 0.53.1

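I don't think the dtype should change here: passing dtype=pd.StringDtype() to pd.Index gives back an object-dtype index rather than a string one. Other dtypes seem to be preserved (e.g. int above); only strings are affected.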
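
For comparison, here is a minimal sketch that builds the same values as an extension array, a Series, and an Index, and then checks whether each one kept the requested string dtype. The variable names (`values`, `arr`, `ser`, `idx`, `index_keeps_string_dtype`) are only illustrative, and the Index result may differ depending on the installed pandas version, so no expected output is shown for it.

```python
import pandas as pd

values = ["A", "B", "C", "D"]

# Same data, same requested dtype, three different containers.
arr = pd.array(values, dtype=pd.StringDtype())
ser = pd.Series(values, dtype=pd.StringDtype())
idx = pd.Index(values, dtype=pd.StringDtype())

# True if the Index kept the extension dtype, False if it fell back
# to something else (object dtype in the report above).
index_keeps_string_dtype = isinstance(idx.dtype, pd.StringDtype)

print("pandas version:", pd.__version__)
print("array dtype   :", arr.dtype)
print("Series dtype  :", ser.dtype)
print("Index dtype   :", idx.dtype)
print("Index keeps StringDtype:", index_keeps_string_dtype)
```

If the last line prints False while the array and Series lines show a string dtype, the behaviour described in this report is reproduced on that installation.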

Metadata


Assignees

No one assigned

    Labels

    Closing Candidate (may be closeable, needs more eyeballs)
    ExtensionArray (extending pandas with custom dtypes or arrays)
    Index (related to the Index class or subclasses)
