Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PERF: correctly report memory used by Index's #15237

Closed
wants to merge 1 commit into from

Conversation

jreback
Copy link
Contributor

@jreback jreback commented Jan 26, 2017

No description provided.

@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves Performance Memory or execution speed performance labels Jan 26, 2017
@jreback jreback added this to the 0.20.0 milestone Jan 26, 2017
@@ -237,6 +238,12 @@ cdef class {{name}}HashTable(HashTable):
k = kh_get_{{dtype}}(self.table, key)
return k != self.table.n_buckets

def sizeof(self, deep=False):
""" return the size of my table in bytes """
return self.table.n_buckets * (sizeof(self.table.keys) +
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wesm I believe this is correctly, any thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sizeof will return a pointer size here - I think you instead need to do something like this? (not sure the kh aliases are exposed)

self.table.n_buckets * (sizeof(khkey_t)  + #keys
                        sizeof({{dtype}}_t)  + #vals
                        sizeof(khint32_t))  #flags

khint32_t *flags; \

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that makes sense

@codecov-io
Copy link

codecov-io commented Jan 26, 2017

Current coverage is 86.31% (diff: 100%)

Merging #15237 into master will increase coverage by <.01%

@@             master     #15237   diff @@
==========================================
  Files           139        139          
  Lines         51096      51103     +7   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          44103      44110     +7   
  Misses         6993       6993          
  Partials          0          0          

Powered by Codecov. Last update be32852...d77c002

@max-sixty
Copy link
Contributor

What do you think about using __sizeof__ so it works with python?

@jreback
Copy link
Contributor Author

jreback commented Jan 26, 2017

@MaximilianR I added __sizeof__, though you would have to do something like

sys.getsizeof(index._engine), which is simply captured by index.memory_usage() anyhow

@jreback jreback closed this in 3853fe6 Jan 27, 2017
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
Author: Jeff Reback <jeff@reback.net>

Closes pandas-dev#15237 from jreback/memory and squashes the following commits:

d77c002 [Jeff Reback] PERF: correctly report memory used by Index's
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Indexing Related to indexing on series/frames, not to indexes themselves Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants