opal: add basic memory accounting for free lists and objects #6282


Open
hjelmn wants to merge 1 commit into main

Conversation

hjelmn (Member) commented Jan 15, 2019

This commit adds support for exposing free list and object memory
allocation through the MPI_T pvar interface. Memory tracking of free
list memory is enabled by default for all builds. OPAL object memory
tracking is only enabled for debug builds.

In addition this commit updates the pvar interface to pass the pvar
handle to the read function instead of the pvar. This gives the read
function access to the count returned when the handle was bound.

Signed-off-by: Nathan Hjelm hjelmn@lanl.gov
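
For context, a tool or benchmark would consume these variables through the standard MPI_T calls. The sketch below is not tied to the exact pvar names this PR adds (I have not reproduced those here); it simply enumerates the available pvars and reads the unbound, continuous, unsigned 64-bit ones, which is the shape size/counter pvars like these are expected to take:

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int provided, num_pvars;

    MPI_Init(&argc, &argv);
    MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
    MPI_T_pvar_get_num(&num_pvars);

    MPI_T_pvar_session session;
    MPI_T_pvar_session_create(&session);

    for (int i = 0; i < num_pvars; ++i) {
        char name[256], desc[1024];
        int name_len = sizeof(name), desc_len = sizeof(desc);
        int verbosity, var_class, binding, readonly, continuous, atomic;
        MPI_Datatype datatype;
        MPI_T_enum enumtype;

        MPI_T_pvar_get_info(i, name, &name_len, &verbosity, &var_class,
                            &datatype, &enumtype, desc, &desc_len,
                            &binding, &readonly, &continuous, &atomic);

        /* Only look at continuous variables that need no object binding and
         * report an unsigned 64-bit value (size/counter style pvars). */
        if (MPI_T_BIND_NO_OBJECT != binding || !continuous ||
            MPI_UNSIGNED_LONG_LONG != datatype) {
            continue;
        }

        MPI_T_pvar_handle handle;
        int count;
        if (MPI_SUCCESS != MPI_T_pvar_handle_alloc(session, i, NULL,
                                                   &handle, &count)) {
            continue;
        }

        /* The count returned at bind time sizes the read buffer; this is the
         * value the updated read callback now has access to. */
        unsigned long long values[16];
        if (count <= 16 && MPI_SUCCESS == MPI_T_pvar_read(session, handle, values)) {
            printf("%s (count=%d): %llu -- %s\n", name, count, values[0], desc);
        }

        MPI_T_pvar_handle_free(session, &handle);
    }

    MPI_T_pvar_session_free(&session);
    MPI_T_finalize();
    MPI_Finalize();
    return 0;
}
```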

hjelmn (Member Author) commented Jan 16, 2019

@bosilca @jsquyres Any comments? This should have little to no overhead for optimized builds and will provide a nice capability for debug builds.

bosilca (Member) commented Jan 16, 2019

I saw this yesterday and had mixed feelings about it. I understand the need for tracking allocations; I am not sure about the need to know where the memory is used, especially not at the granularity you are providing here. The overhead was one of my original concerns, but as I went through the code my fears about a significant overhead vanished; the code looks neat and clean.

Why not have a memory allocator MCA, so that applications can register tracking hooks (and identify their location based on the stack trace)? This is not a new idea, but it has the advantage of requiring minimal code in the OMPI tree.
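
For illustration only, the hook registration described above could look roughly like the sketch below. Every name in it is hypothetical; nothing like this exists in the OMPI tree today, and a real version would live behind an MCA component interface:

```c
#include <stddef.h>

/* Hook invoked after an allocation or before a release; the component passes
 * the address and size, plus the tool's private context pointer. */
typedef void (*alloc_hook_fn_t)(void *addr, size_t size, void *cb_data);

struct alloc_tracking_hooks_t {
    alloc_hook_fn_t on_alloc;  /* called after each internal allocation */
    alloc_hook_fn_t on_free;   /* called before each internal release   */
    void *cb_data;             /* tool-private context                  */
};

/* A tool registers its hooks once; inside on_alloc it can capture a stack
 * trace (e.g. with backtrace()) to attribute the allocation to a call site,
 * keeping all attribution logic outside the OMPI tree. */
int alloc_tracking_register_hooks(const struct alloc_tracking_hooks_t *hooks);
```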

hjelmn (Member Author) commented Jan 17, 2019

@bosilca That isn't a bad idea. We can re-purpose the mpool framework to allow the base allocators to be used before the framework is opened. Then we can make all internal allocations use the default allocator and do all the accounting on mpools (which would also cover free list allocations). I don't think it would take too much work.

hjelmn (Member Author) commented Jan 17, 2019

One question is how much information we want to track. OpenUCX tracks each individual allocation, but we probably only really care about the total size (in most cases).
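
A minimal sketch of the total-size-only option, written with C11 atomics rather than OPAL's internal atomic wrappers (the function and counter names are made up for illustration):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdatomic.h>

/* One counter per category of interest (free list fragments here). */
static _Atomic int64_t total_bytes_allocated;

/* Called from the allocation path: relaxed ordering is enough because the
 * counter is a statistic, not a synchronization point. */
static inline void account_alloc(size_t size)
{
    atomic_fetch_add_explicit(&total_bytes_allocated, (int64_t) size,
                              memory_order_relaxed);
}

/* Called from the release path. */
static inline void account_free(size_t size)
{
    atomic_fetch_sub_explicit(&total_bytes_allocated, (int64_t) size,
                              memory_order_relaxed);
}

/* The pvar read callback would just copy out the current total. */
static inline int64_t account_read(void)
{
    return atomic_load_explicit(&total_bytes_allocated, memory_order_relaxed);
}
```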

awlauria (Contributor) commented

@bosilca @hjelmn if you want this in for 5.0, please target getting this into master by the end of April. I'll add it to the 5.0 milestone for tracking purposes for now.

awlauria added this to the v5.0.0 milestone Mar 27, 2020
lanl-ompi (Contributor) commented

Can one of the admins verify this patch?

jsquyres (Member) commented

@hjelmn Looks like this PR got forgotten about. Do you still want to work on it?

hjelmn (Member Author) commented Feb 6, 2021

Maybe. Let me figure out where I was on it.

gpaulsen (Member) commented

@hjelmn ?

hjelmn (Member Author) commented Mar 16, 2021

Will look tomorrow. Finding time where I can.

gpaulsen (Member) commented

@hjelmn @bosilca Removing the v5.0 milestone, as this looks at risk for the Sept 23 rc1 date.

gpaulsen removed this from the v5.0.0 milestone Aug 26, 2021
6 participants