DOC: Categorical gotchas says memory usage is O(nm), when it is actually O(n+m) #17705

Closed
@RauliRuohonen

Description

Code Sample, a copy-pastable example if possible

import pandas as pd

def make_series(categories, repeats):
    categories = [str(i) for i in range(categories)]
    return pd.Series(categories*repeats).astype('category')

def series_info(series):
    true_size = series.nbytes
    n = len(series.cat.categories)
    m = len(series.cat.codes)
    expected_size = n*8+m  # Note: not n*m*constant
    print('Number of categories (n): %d, length of data (m): %d, bytes: %d, '
          '8n+m: %d, nm: %d' % (n, m, true_size, expected_size, n*m))

series_info(make_series(123, 456))

# Output:
#  Number of categories (n): 123, length of data (m): 56088, bytes: 57072, 8n+m: 57072, nm: 6898824
#
# Note that 8n+m matches true size as it should. nm is much larger.

Problem description

The documentation says:

The memory usage of a Categorical is proportional to the number of categories times the length of the data.

If there are n categories and the length of the data is m, this claims that memory usage is O(nm). As the script above shows, however, the true usage is O(n+m): the categories are stored once (8 bytes per object pointer, hence the 8n term) and each element is an integer code into that list (1 byte each here, hence the m term). The claimed O(nm) would only make sense if the implementation stored the categories using one-hot encoding, which would be strange to say the least.
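A minimal sketch of the actual layout (one codes entry per element plus a single copy of the categories), assuming a 64-bit build of pandas where object pointers are 8 bytes:

```python
import pandas as pd

# Small categorical: 3 distinct categories repeated over 12 elements.
s = pd.Series(list('abc') * 4).astype('category')

# The codes array has one entry per element (m entries), stored as a
# small integer dtype -- int8 here, since there are < 128 categories.
print(s.cat.codes.dtype)      # int8
print(len(s.cat.codes))       # 12 (m)

# The categories themselves are stored only once (n entries), not per row.
print(len(s.cat.categories))  # 3 (n)

# So the total is 8n + m bytes, not n*m.
print(s.nbytes == 8 * 3 + 12)
```

With more than 127 categories the codes dtype widens (int16, int32, ...), so the general bound is O(n + m) with a small constant on m, never O(nm).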
