WIP, ENH: DXT plot threshold #802

Open
wants to merge 1 commit into base: main

tylerjereddy
Collaborator

* related to darshan-hpcgh-729 and darshan-hpcgh-692

* draft infrastructure for skipping the processing
of DXT data above a certain module compressed size
threshold, in cases where runtime `HEATMAP` data is
available

* note that for the vast majority of log files that
have been provided/problematic in this regard, including
the large ones from NERSC, this is of no help, because there
is no `HEATMAP` data to fall back on

* for a case where this does help, on this branch:
`time python -m darshan summary e3sm_io_heatmap_and_dxt.darshan`
`real   0m12.415s`
vs. `main`: `real       0m47.470s`

* so, that's not a bad improvement, but there are still many
things to decide/do here:

- [ ] test the size threshold empirically with more appropriate logs
- [ ] decide if we want to use the current approach of summing
      the sizes of each DXT module together vs. having per-DXT
      module thresholds
- [ ] decide if we'd also want a way to disable DXT handling even
      if `HEATMAP` is not available (otherwise, all the sample NERSC
      logs will use > 100 GB memory and be unusable with current report
      generation machinery)
- [ ] add a warning mechanism/message somewhere on the report when the
      threshold is reached to disable DXT parsing
- [ ] add a command line argument to force an override of the disable
      (e.g., if the user is working on a high memory node and really
       wants to see DXT results)
- [ ] add regression tests for the new machinery
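
For discussion, here is a minimal sketch of the summed-threshold decision described above. It is not the code from this branch: the function name `should_skip_dxt`, the plain dict of compressed module sizes, and the threshold value are all assumptions made for illustration. Only the module names (`DXT_POSIX`, `DXT_MPIIO`, `HEATMAP`) come from darshan itself.

```python
# Hypothetical sketch, not the implementation in this PR.
DXT_MODULES = ("DXT_POSIX", "DXT_MPIIO")

# Placeholder value; the real threshold still needs empirical testing.
SIZE_THRESHOLD_BYTES = 50 * 1024 ** 2


def should_skip_dxt(module_sizes, threshold=SIZE_THRESHOLD_BYTES):
    """Decide whether DXT processing should be skipped.

    ``module_sizes`` is assumed to map a module name to its
    compressed size in bytes for a single log file.
    """
    # Sum the compressed sizes of all DXT modules present in the log.
    total_dxt = sum(module_sizes.get(name, 0) for name in DXT_MODULES)
    # Only skip when runtime HEATMAP data exists as a fallback.
    has_heatmap = "HEATMAP" in module_sizes
    return has_heatmap and total_dxt > threshold
```

With this rule, a log that has no `HEATMAP` module is never skipped, no matter how large its DXT data is, which is why the large NERSC logs do not benefit.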
tylerjereddy added the enhancement and pydarshan labels on Sep 3, 2022
@tylerjereddy
Collaborator Author

Feedback from meeting:

  • Phil is in favor of checking the threshold on a per-module basis rather than summing
  • Phil is also in favor of not displaying heatmaps when the threshold is exceeded even if runtime HEATMAP is absent, but we should have an option to force generation of heatmaps at your own peril
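
A rough sketch of what the per-module check with a force override might look like, assuming the same kind of size mapping as input. The names `dxt_modules_to_skip`, `build_parser`, and the `--force-dxt` flag are hypothetical and do not exist in the darshan CLI today; the check deliberately ignores whether `HEATMAP` is present, per the feedback above.

```python
import argparse
import warnings

# Hypothetical sketch, not the implementation in this PR.
DXT_MODULES = ("DXT_POSIX", "DXT_MPIIO")


def dxt_modules_to_skip(module_sizes, threshold, force=False):
    """Return the DXT modules whose compressed size exceeds ``threshold``.

    ``module_sizes`` is assumed to map a module name to its
    compressed size in bytes. Each module is checked on its own
    rather than summing the sizes together.
    """
    if force:
        # The user accepts the memory cost and wants DXT results anyway.
        return []
    skipped = [
        name for name in DXT_MODULES
        if module_sizes.get(name, 0) > threshold
    ]
    if skipped:
        warnings.warn(
            "Skipping DXT processing for: " + ", ".join(skipped)
        )
    return skipped


def build_parser():
    """Build a parser with a hypothetical override flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--force-dxt",
        action="store_true",
        help="process DXT data even above the size threshold",
    )
    return parser
```

The warning here stands in for whatever message ends up on the report itself; where that message should be displayed is still an open item.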
