Accept CategoricalDtype in read_csv #17643
@@ -119,7 +119,7 @@ expanded to include the ``categories`` and ``ordered`` attributes. A
``CategoricalDtype`` can be used to specify the set of categories and
orderedness of an array, independent of the data themselves. This can be useful,
e.g., when converting string data to a ``Categorical`` (:issue:`14711`,
:issue:`15078`, :issue:`16015`):
:issue:`15078`, :issue:`16015`, :issue:`17643`):

.. ipython:: python

@@ -129,8 +129,33 @@ e.g., when converting string data to a ``Categorical`` (:issue:`14711`,
   dtype = CategoricalDtype(categories=['a', 'b', 'c', 'd'], ordered=True)
   s.astype(dtype)
||
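As a minimal sketch of the conversion shown in this hunk: the definition of ``s`` is not visible here, so the sample values below are hypothetical. Casting with a ``CategoricalDtype`` fixes both the category set and the ordering, regardless of which values occur in the data.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical sample data; the original definition of `s` is not shown in the hunk.
s = pd.Series(["a", "b", "c", "a"])

# The dtype carries the categories and their order, independent of the data.
dtype = CategoricalDtype(categories=["a", "b", "c", "d"], ordered=True)
converted = s.astype(dtype)
```

Here ``'d'`` never occurs in the data but is still a valid category, because the categories come from the dtype rather than from the observed values.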
One place that deserves special mention is in :meth:`read_csv`. Previously, with

Review comment: maybe a separate sub-section for this

``dtype={'col': 'category'}``, the returned values and categories would always
be strings.

.. ipython:: python

   from pandas.compat import StringIO

Review comment: in general we put this in the hidden code block at the top of the file, as people shouldn't use this from pandas, but just import it themselves

   data = 'A,B\na,1\nb,2\nc,3'
   pd.read_csv(StringIO(data), dtype={'B': 'category'}).B.cat.categories

Notice the "object" dtype.

With a ``CategoricalDtype`` of all numerics, datetimes, or
timedeltas, we can automatically convert to the correct type

   dtype = {'B': CategoricalDtype([1, 2, 3])}
   pd.read_csv(StringIO(data), dtype=dtype).B.cat.categories

The values have been correctly interpreted as integers.
||
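The two ``read_csv`` calls above can be compared side by side in a standalone script. This is a sketch under one assumption: it imports ``StringIO`` from the standard library's ``io`` module (Python 3) instead of ``pandas.compat``, in line with the review suggestion that users import it themselves.

```python
from io import StringIO  # assumption: stdlib import instead of pandas.compat

import pandas as pd
from pandas.api.types import CategoricalDtype

data = "A,B\na,1\nb,2\nc,3"

# With the plain 'category' string, the categories are parsed as strings.
as_str = pd.read_csv(StringIO(data), dtype={"B": "category"})
str_categories = as_str["B"].cat.categories

# With a CategoricalDtype whose categories are integers, the parsed
# values are converted to match those categories.
int_dtype = CategoricalDtype([1, 2, 3])
as_int = pd.read_csv(StringIO(data), dtype={"B": int_dtype})
int_categories = as_int["B"].cat.categories
```

Inspecting ``str_categories`` and ``int_categories`` shows the difference: the first holds the strings ``'1'``, ``'2'``, ``'3'``, the second holds the integers ``1``, ``2``, ``3``.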
The ``.dtype`` property of a ``Categorical``, ``CategoricalIndex`` or a
``Series`` with categorical type will now return an instance of ``CategoricalDtype``.
For the most part, this is backwards compatible, though the string repr has changed.
If you were previously using ``str(s.dtype == 'category')`` to detect categorical data,

Review comment: missing closing parenthesis around s.dtype (actually the closing one is in the wrong place)

switch to :func:`api.types.is_categorical_dtype`, which is compatible with the old and

Review comment: I would add pandas in the

new ``CategoricalDtype``.

See the :ref:`CategoricalDtype docs <categorical.categoricaldtype>` for more.
||
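A short sketch of detecting categorical data without relying on the string repr. The text in this hunk recommends ``is_categorical_dtype``; the example below instead uses an ``isinstance`` check against ``CategoricalDtype``, which is an assumption made here because later pandas releases deprecate that helper. The sample series are hypothetical.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical sample data for illustration.
s = pd.Series(["a", "b", "a"], dtype="category")
plain = pd.Series([1, 2, 3])

# The .dtype of a categorical Series is a CategoricalDtype instance.
is_categorical = isinstance(s.dtype, CategoricalDtype)
plain_is_categorical = isinstance(plain.dtype, CategoricalDtype)

# Comparing the dtype itself (not its repr) with the string 'category' also works.
matches_string = s.dtype == "category"
```

Both checks avoid parsing the repr, so they should be unaffected by the repr change this hunk describes.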

@@ -163,8 +188,6 @@ Other Enhancements
- :func:`Categorical.rename_categories` now accepts a dict-like argument as `new_categories` and only updates the categories found in that dict. (:issue:`17336`)
- :func:`read_excel` raises ``ImportError`` with a better message if ``xlrd`` is not installed. (:issue:`17613`)
- :meth:`DataFrame.assign` will preserve the original order of ``**kwargs`` for Python 3.6+ users instead of sorting the column names
- Pass a :class:`~pandas.api.types.CategoricalDtype` to :meth:`read_csv` to parse categorical
data as numeric, datetimes, or timedeltas, instead of strings. See :ref:`here <io.categorical>`. (:issue:`17643`)
||
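The dict-like form of ``rename_categories`` listed in the enhancements above can be sketched as follows. The sample values are hypothetical; only categories that appear as keys in the dict are renamed.

```python
import pandas as pd

# Hypothetical sample data for illustration.
c = pd.Categorical(["a", "b", "c", "a"])

# Only 'a' is a key in the mapping, so 'b' and 'c' keep their names.
renamed = c.rename_categories({"a": "x"})
```

The call returns a new ``Categorical`` and leaves ``c`` unchanged.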

.. _whatsnew_0210.api_breaking:

Review comment: missing ``.. ipython:: python`` directive here