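Problem Description
Would it be good to also show the categories' dtype in `df.dtypes`, e.g. `category[dtype]` instead of just `category` (or whichever alternative notation looks best)?
Advantages of the proposed way:
- Differentiate `CategoricalDtype`s whose categories have different dtypes from each other when viewed in `df.dtypes`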
Internally, each of these is actually a different dtype, so `category` can be thought of as a "meta-dtype": while two `int64` columns have the same dtype, two `category` columns can have different `CategoricalDtype`s.
The dtype `"category"` really means "one of many possible `CategoricalDtype` objects".
While this enhancement wouldn't completely differentiate them, it would at least do so in some cases (when the dtypes of their categories differ).
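A small sketch of the situation (the column names `ints` and `strs` are made up for illustration): both columns show up as plain `category` in `df.dtypes`, even though the dtypes of their categories, and therefore their `CategoricalDtype`s, differ.

```python
import pandas as pd

df = pd.DataFrame({
    "ints": pd.Categorical([1, 2, 3]),
    "strs": pd.Categorical(["a", "b", "c"]),
})

# Today both columns are displayed identically:
print(df.dtypes)  # both columns: category

# ...but the dtypes of their categories differ:
print(df["ints"].dtype.categories.dtype)  # int64
print(df["strs"].dtype.categories.dtype)  # object (a string dtype in newer pandas)

# ...so the CategoricalDtype objects themselves are not equal:
print(df["ints"].dtype == df["strs"].dtype)  # False
```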
- When merging two categorical columns: more convenience, fewer opportunities for mistakes
For example, when merging on a categorical column, this can be a useful hint.
Helpful reading: a Medium article, section "Merging two categorical columns". In summary:
| Joining on columns | Produces |
|---|---|
| `cat1`, `cat2` | `object` |
| `cat1`, `cat1` | `cat1` |
While showing the dtypes alone wouldn't completely solve this, it would improve the user experience: the internal dtypes would be visible in `df.dtypes`, instead of only via `df['column_name'].dtype` (in parentheses), `df['column_name'].cat.categories.dtype`, or `df['column_name'].dtype.categories.dtype`.
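A sketch of the merge case (the frames and the key column name `key` are made up). As I understand the pandas merging guide, categorical keys stay categorical only when their `CategoricalDtype`s match exactly; otherwise they are coerced to the categories' dtype, which is exactly the information `df.dtypes` currently hides.

```python
import pandas as pd

# Two key columns that share exactly the same CategoricalDtype.
shared = pd.CategoricalDtype(["a", "b", "c"])
left = pd.DataFrame({"key": pd.Series(["a", "b"], dtype=shared), "x": [1, 2]})
right = pd.DataFrame({"key": pd.Series(["a", "b"], dtype=shared), "y": [3, 4]})
merged_same = left.merge(right, on="key")

# Same values, but the right-hand key gets an extra category "d".
right_other = right.assign(key=right["key"].cat.add_categories(["d"]))
merged_diff = left.merge(right_other, on="key")

print(left.dtypes["key"], right_other.dtypes["key"])  # both print just "category"
print(merged_same["key"].dtype)  # still category: the dtypes matched
print(merged_diff["key"].dtype)  # no longer category: coerced to the categories' dtype
```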
TODO: read that article - any other advantages / quality-of-life improvements of this proposal?
- Converting a categorical column to a normal one: unclear reason for an error
```python
import pandas as pd

# Preparation of the situation (the DataFrame construction is an assumed setup):
df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3]}, dtype='category')
df['b'] = df['b'].cat.add_categories('new_cat')

# The situation:
print(df)  # the data in both `a` and `b` are integers
print(df.dtypes)  # both `a` and `b` have dtype "category" -> want to convert
df['a'].astype(int)  # works
df['b'].astype(int)  # doesn't work: ValueError: Cannot cast object dtype to int64
```
Confusing error: I wanted to cast a categorical, not an object (although the reason is quite easy to discover: the output of `print(df['b'])` includes "object").
The reason: the categories inside the categorical have `object` dtype (although `'new_cat'` is an unused category, it still prevents conversion to `int`). We can find this out with `df['b']` or `df['b'].dtype`.
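A sketch of how to check this, plus one possible workaround, assuming integer data in two categorical columns `a` and `b` as in the example above. Dropping the unused category with `cat.remove_unused_categories()` is a suggestion of mine, not something the error message points to.

```python
import pandas as pd

# Assumed setup: integer data in two categorical columns; `b` gets an extra unused category.
df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3]}, dtype='category')
df['b'] = df['b'].cat.add_categories('new_cat')

# astype() casts the categories, so their dtype is what matters:
print(df['a'].cat.categories.dtype)  # int64
print(df['b'].cat.categories.dtype)  # object, because of 'new_cat'

# One possible workaround: drop the unused category, then cast.
b_as_int = df['b'].cat.remove_unused_categories().astype(int)
print(b_as_int.tolist())  # [1, 2, 3]
```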
- Better clarity
The dtype inferred for the categories of a `Categorical` can differ from what NumPy would infer for the same values (NumPy stores `[3, 1, np.nan]` as `float64`). Seeing the dtypes clearly -> fewer user surprises:
```python
import numpy as np
import pandas as pd

pd.Categorical([3, 1, np.nan])  # categories dtype: int64
# while:
pd.Categorical(np.array([1, 2, np.nan]))  # categories dtype: float64
```
Drawbacks:
- Longer, less clean dtype string to read
- Can still be confusing: even when `category[dtype]` is the same, the `CategoricalDtype`s can still be different (unequal) dtypes
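A sketch of that drawback: two `CategoricalDtype`s whose categories are both `int64` would presumably display identically as `category[int64]` under this proposal, yet they are not equal.

```python
import pandas as pd

d1 = pd.CategoricalDtype([1, 2, 3])
d2 = pd.CategoricalDtype([1, 2, 4])

# Same categories dtype, so both would display the same under the proposal...
print(d1.categories.dtype, d2.categories.dtype)  # int64 int64
# ...but the dtypes are unequal because their categories differ.
print(d1 == d2)  # False
```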
What do you think?
What are some of the advantages or drawbacks of showing the dtypes of categories?
Is this interesting at all?
Feature Description
When a DataFrame contains a categorical column and the user prints `df.dtypes`, instead of
`category`
show:
`category[dtype]`
where `dtype` is the `categories_dtype` shown when printing `df['column_name'].dtype`.
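To make the proposal concrete, here is a hypothetical helper (`dtypes_with_categories` is not a pandas function) that builds the proposed display from the existing `CategoricalDtype.categories` attribute. An actual implementation would presumably change how pandas renders the dtype instead.

```python
import pandas as pd


def dtypes_with_categories(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: render df.dtypes in the proposed category[dtype] style."""

    def label(dtype) -> str:
        if isinstance(dtype, pd.CategoricalDtype):
            # `categories` is an Index; its dtype is the categories' dtype.
            return f"category[{dtype.categories.dtype}]"
        return str(dtype)

    return df.dtypes.map(label)


df = pd.DataFrame({
    "a": pd.Categorical([1, 2, 3]),
    "b": pd.Categorical([1.5, 2.5, 3.5]),
    "c": [1.0, 2.0, 3.0],
})
print(dtypes_with_categories(df))  # a -> category[int64], b -> category[float64], c -> float64
```

This only changes the displayed labels; the underlying `CategoricalDtype` objects are untouched.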
Alternative Solutions (that currently exist)
`df['a']`: the last line of the output contains the dtype: `Categories (3, int64): [1, 2, 3]`
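For comparison, a sketch of the existing ways to see the categories' dtype, assuming an integer categorical column `a`, including the accessors mentioned in the merging section:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]}, dtype='category')

print(df['a'])  # last line: Categories (3, int64): [1, 2, 3]
print(df['a'].dtype)  # CategoricalDtype repr (newer pandas versions also list categories_dtype)
print(df['a'].cat.categories.dtype)  # int64
print(df['a'].dtype.categories.dtype)  # int64
```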
Additional Context
- #48515 asked for the ability to set the categories' dtype; my request is only about displaying it.
- #57259 wants to fix the docs for categorical dtype equality. The dtype matters for that equality, so seeing it is useful (insufficient on its own, but still useful).
(In the process of writing this.) Please comment on whether it's a good or bad idea.