Skip to content

feat: Umpire csv stats output #3622

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 44 commits into
base: develop
Choose a base branch
from
Open

Conversation

MelReyCG
Copy link
Contributor

@MelReyCG MelReyCG commented Apr 9, 2025

This PR aims at adding a new MemoryStats component to add in Outputs to control where the Umpire report will be output (log by default, CSV optionally).

It also fixes the percentage value computation (which did not match what the description was telling, it was computed only for rank 0 rather than for all ranks).

Here is a generated CSV:

Cycle,Time,Umpire Memory Pool,Min rank mem bytes,Min rank mem %,Max rank mem bytes,Max rank mem %,Avg rank mem bytes,Avg rank mem %,Sum rank mem bytes,Sum rank mem %
0,0,HOST,2913644,0.00035934855,2913644,0.00035934855,2913644,0.00035934855,2913644,0.00035934855
1,5000,HOST,2913644,0.00035934855,2913644,0.00035934855,2913644,0.00035934855,2913644,0.00035934855
2,10000,HOST,2913644,0.00035934855,2913644,0.00035934855,2913644,0.00035934855,2913644,0.00035934855
3,15000,HOST,2913644,0.00035934855,2913644,0.00035934855,2913644,0.00035934855,2913644,0.00035934855
4,20000,HOST,2913644,0.00035934855,2913644,0.00035934855,2913644,0.00035934855,2913644,0.00035934855
5,25000,HOST,2913644,0.00035934855,2913644,0.00035934855,2913644,0.00035934855,2913644,0.00035934855

The same CSV aligned for readability:

Cycle, Time , Umpire Memory Pool, Min rank mem bytes, Min rank mem %, Max rank mem bytes, Max rank mem %, Avg rank mem bytes, Avg rank mem %, Sum rank mem bytes, Sum rank mem %
    0,     0, HOST              ,            2913644,  0.00035934855,            2913644,  0.00035934855,            2913644,  0.00035934855,            2913644,  0.00035934855
    1,  5000, HOST              ,            2913644,  0.00035934855,            2913644,  0.00035934855,            2913644,  0.00035934855,            2913644,  0.00035934855
    2, 10000, HOST              ,            2913644,  0.00035934855,            2913644,  0.00035934855,            2913644,  0.00035934855,            2913644,  0.00035934855
    3, 15000, HOST              ,            2913644,  0.00035934855,            2913644,  0.00035934855,            2913644,  0.00035934855,            2913644,  0.00035934855
    4, 20000, HOST              ,            2913644,  0.00035934855,            2913644,  0.00035934855,            2913644,  0.00035934855,            2913644,  0.00035934855
    5, 25000, HOST              ,            2913644,  0.00035934855,            2913644,  0.00035934855,            2913644,  0.00035934855,            2913644,  0.00035934855

Here is the new table output, which may be clearer:

----------------------------------------------------------------------------------------------------------------------------------
| Umpire Memory Pool | Min reserved rank memory | Max reserved rank memory | Avg reserved rank memory | Sum reserved rank memory |
|--------------------|--------------------------|--------------------------|--------------------------|--------------------------|
|                    |   bytes   |  over total  |   bytes   |  over total  |   bytes   |  over total  |   bytes   |  over total  |
|--------------------|-----------|--------------|-----------|--------------|-----------|--------------|-----------|--------------|
|               HOST |    2.8 MB |        0.0 % |    2.8 MB |        0.0 % |    2.8 MB |        0.0 % |    2.8 MB |        0.0 % |
----------------------------------------------------------------------------------------------------------------------------------

This PR also moves the parallelThreads attribute from OutputBase to SiloOutput because it was only used there. It didn't felt right to me to have a visible & documented attribute that had no effect, if I understood what it does correctly.

@MelReyCG MelReyCG self-assigned this Apr 9, 2025
@MelReyCG MelReyCG requested a review from arng40 April 9, 2025 14:29
…iloOutput as it is not useful on any other sub-class
@MelReyCG MelReyCG changed the title Umpire csv stats output feat: Umpire csv stats output Apr 9, 2025
@MelReyCG MelReyCG requested a review from cssherman as a code owner April 14, 2025 08:30
@MelReyCG MelReyCG added ci: run CUDA builds Allows to triggers (costly) CUDA jobs ci: run integrated tests Allows to run the integrated tests in GEOS CI labels Apr 16, 2025
Comment on lines 54 to 56
errorReporter( isNewlyOpened ?
"Output stream failed to open.\nPossible reasons: File doesn't exist / permissions / locking issue." :
"Output stream is in invalid state for writing.\nPossible reasons: Stream closed / buffer corruption / previous writing failed." );
Copy link
Contributor

@arng40 arng40 Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since good() check the state of stream, why don't we check this condition every time (headerToStream & dataToStream)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand, as headerToStream() is based on toStream(), isn't good() verified correctly?

Copy link
Contributor

@arng40 arng40 Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is why we can't get rid of bool isNewlyOpened and it will simplify the whole process.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is your suggestion to extract this code from toStream() directly in the TableCSVFormatter methods?
If not, can you show a short code snippet? I don't visualize what you mean.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got it, isNewlyOpened complexifies the code, and is not accurate in some scenarios. I'll remove it and edit the message.

@MelReyCG
Copy link
Contributor Author

@corbett5 @wrtobin @CusiniM, can I ask you for a review?

@MelReyCG MelReyCG removed ci: run CUDA builds Allows to triggers (costly) CUDA jobs ci: run integrated tests Allows to run the integrated tests in GEOS CI labels Apr 16, 2025
@MelReyCG MelReyCG added ci: run CUDA builds Allows to triggers (costly) CUDA jobs ci: run integrated tests Allows to run the integrated tests in GEOS CI labels Apr 16, 2025
@MelReyCG
Copy link
Contributor Author

Ok, I think the PR is now ready, it will need a rebaseline because of the parallelThread moving.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci: run CUDA builds Allows to triggers (costly) CUDA jobs ci: run integrated tests Allows to run the integrated tests in GEOS CI flag: ready for review type: feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants