df.info() ignores index hashtable memory usage #8775

Closed
@kay1793

This is another problem with the memory report (#7619) in df.info(), like #8578 (locked) and the fix for MultiIndex hidden in #8456 (see whatsnew). It does not count the space used by the index hashtable. For each entry, the klib hashtable keeps a copy of the index data as well as a key. For strings, that is a copy of the string plus a pointer. In the worst case there are 2*next_power_of_2(len(index)) entries, and the problem exists whatever dtype the index uses.

x.index._engine.mapping
Out[13]: <pandas.hashtable.PyObjectHashTable at 0x7fb92f32bef0>
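
To give a feel for the size of the gap, here is a rough sketch of the worst-case arithmetic described above. It is not how pandas or klib computes anything: the helper names and the 16 bytes per entry (an 8-byte key plus an 8-byte value) are assumptions for illustration only, and the real per-entry cost depends on the hashtable type and dtype.

```python
def next_power_of_2(n: int) -> int:
    """Smallest power of two that is >= n (returns 1 for n <= 1)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def worst_case_entries(index_len: int) -> int:
    """Worst-case entry count from the formula in this issue."""
    return 2 * next_power_of_2(index_len)


def worst_case_bytes(index_len: int, bytes_per_entry: int = 16) -> int:
    """Hypothetical estimate; bytes_per_entry=16 is an assumed
    8-byte key plus 8-byte value, not a measured klib figure."""
    return worst_case_entries(index_len) * bytes_per_entry


if __name__ == "__main__":
    for n in (1_000, 1_000_000):
        print(n, worst_case_entries(n), worst_case_bytes(n))
```

Under these assumptions, an index with 1,000,000 entries could carry a hashtable of 2,097,152 entries, about 32 MiB that the report does not show.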

cc @shoyer, @njsmith
cc @asobrien

Labels: Output-Formatting (__repr__ of pandas objects, to_string)