Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADIOS2: Flush to disk within a step #1207

Merged
merged 8 commits into from
Jul 28, 2022

Conversation

franzpoeschel
Copy link
Contributor

@franzpoeschel franzpoeschel commented Feb 24, 2022

This was recently added in ADIOS2 (I think only for BP5) and we need it to save memory e.g. in PIConGPU

Also it could be used to enable the IO pattern used by HiPace that so far only works in HDF5.

TODO

  • Prepare infrastructure in openPMD: pass-through for flushing parameters
    I have now removed flushing parameters from the AbstractIOHandler and instead created a struct for flushing parameters that is passed all the way down through our flushing logic
    This is a somewhat heavy change, but it makes our flushing logic stateless, meaning that we have to do less of these try...catch...throw tricks to restore state.
  • Python bindings
  • Implement the new logic in ADIOS2 (only for BP5 probably)
  • Maybe add a JSON option to turn it on/off as needed
  • Testing, documentation
  • More fine-grained user control
  • Documentation as mentioned in Should Iteration::close(flush=true) write files? #986 (comment)
  • Update documentation
  • Add preferred_flush_target="disk_override"/"buffer_override" that takes precedence over the normal options
  • Iteration::close?? We should it support as soon as it's actually needed, there is currently no use case for this API addition
  • Drain m_attributeWrites in ADIOS2 backend upon this No, data is not visible to a reader after PerformDataWrites() anyway, so writing the attributes early brings no advantage, but the disadvantage that they cannot be overwritten late

@franzpoeschel franzpoeschel added backend: ADIOS2 api: new additions to the API labels Feb 24, 2022
@franzpoeschel franzpoeschel force-pushed the topic-flush-inside-step branch 3 times, most recently from cb3ed7b to bd3aec3 Compare February 24, 2022 19:03
@franzpoeschel franzpoeschel force-pushed the topic-flush-inside-step branch 2 times, most recently from d4d7be3 to 1cba129 Compare March 2, 2022 15:13
@ax3l ax3l requested review from ax3l and guj March 8, 2022 18:54
@franzpoeschel franzpoeschel force-pushed the topic-flush-inside-step branch from e83b4ff to 456bbac Compare March 10, 2022 10:50
@franzpoeschel franzpoeschel changed the title [WIP] ADIOS2: Flush to disk within a step ADIOS2: Flush to disk within a step Mar 16, 2022
@franzpoeschel franzpoeschel requested a review from ax3l March 16, 2022 09:59
@franzpoeschel
Copy link
Contributor Author

franzpoeschel commented Mar 23, 2022

@psychocoderHPC asked about more fine-grained control for this to avoid flushing to disk when only some KB are in the buffers. There are several possible solutions that I can imagine:

  1. Add a flushing threshold to our ADIOS2 backend, e.g. adios2.engine.flushing_threshold = "2GiB". If the ADIOS2 backend notices that more than 2GiB were flushed, then it will store data to disk, otherwise use PerformPuts().
    Problem -> parallelism
  2. Turn "flush_during_step" into a dataset-specific option
  3. Add JSON configuration to the flush() API, e.g. series.flush(R"(adios2.preferred_flush_target = "disk")");

@@ -30,6 +30,10 @@
#include <tuple>
#include <vector>

#if __GNUC__ >= 11
#include <filesystem>
Copy link
Member

@ax3l ax3l Apr 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use an auxiliary function for this?
Filesystem is tricky and sometimes needs also extra link flags like -lstdc++fs, that I don't yet want to get into.

Or did this get better with GCC 11+?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the __GNUC__ macro does not mean GCC specifically and is also set by clang, icpc and others.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch about the __GNUC__ thing, will fix

I can have a look if I can do the same test with fstat or similar as well, might also go with this.
I only need this test to ensure that the right API call is chosen by the ADIOS2 backend, so restricting it to a single compiler set that implements <filesystem> well should be ok as an alternative

@franzpoeschel franzpoeschel force-pushed the topic-flush-inside-step branch 3 times, most recently from 95a84f1 to d5bb4b6 Compare May 4, 2022 13:27
@@ -119,6 +122,12 @@ Explanation of the single keys:
Please refer to the `official ADIOS2 documentation <https://adios2.readthedocs.io/en/latest/engines/engines.html>`_ for the available engine parameters.
The openPMD-api does not interpret these values and instead simply forwards them to ADIOS2.
* ``adios2.engine.usesteps``: Described more closely in the documentation for the :ref:`ADIOS2 backend<backends-adios2>` (usesteps).
* ``adios2.engine.preferred_flush_target`` Only relevant for BP5 engine, possible values are ``"disk"`` and ``"buffer"`` (default: ``"disk"``).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've called this preferred_flush_target since it is only implemented by BP5. Otherwise, we will see user requests "I specified flush_target=disk, why is BP4 buffering the data anyway?".
I don't really want to explicitly use the BP5 name, e.g. bp5_flush_target=disk because other engines might in future implement this too.

@ax3l
Copy link
Member

ax3l commented May 9, 2022

@franzpoeschel please needs a small rebase

@franzpoeschel franzpoeschel force-pushed the topic-flush-inside-step branch from 61f286e to 97ccd40 Compare May 16, 2022 15:41
@ax3l ax3l self-assigned this May 24, 2022
#pragma once

#include "openPMD/IO/AbstractIOHandler.hpp"
#include "openPMD/auxiliary/JSON_internal.hpp"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cannot include this in public headers, hence the necessary distinction between FlushParams and ParsedFlushParams, even if both structs are used only internally.
Unifying both structs requires removing AbstractIOHandler.hpp from public headers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question why we need AbstractIOHandler.hpp in public headers actually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we use it heavily in the private: sections of our public classes, which are still defined in the headers. We would have to move the entire implementation logic out of the public headers to avoid including it, which would be a very heavy restructuring (and most likely end up in a solution similar what ADIOS2 does by having a separate source tree only for the public APIs)

Copy link
Member

@ax3l ax3l Jul 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same impression, would not have this added to our priorities right now.

@franzpoeschel franzpoeschel force-pushed the topic-flush-inside-step branch 2 times, most recently from bc68d36 to 9387b8b Compare July 5, 2022 15:11
@franzpoeschel franzpoeschel force-pushed the topic-flush-inside-step branch from 1d463da to f06c909 Compare July 19, 2022 08:51
There is no advantage from doing this, readers will only see the data
after EndStep anyway, but there's the disadvantage that attributes
cannot be overwritten within this step anymore
@ax3l
Copy link
Member

ax3l commented Jul 28, 2022

Can you please post the multi-KB .png images as attachment to the PR description?
Then you can link to that image (click on it, get link) via this .rst syntax: ECP-WarpX/impactx#173

.. figure:: https://user-images.githubusercontent.com/1353258/180332191-f9ce11fc-8c56-4713-a91a-2ad12ab09805.png
   :alt: Chicane beam width and emittance evolution

This allows us to not have to check them into the source code and avoid that our source repo gets too large (we clone openPMD-api very often in CMake superbuilds).

@franzpoeschel
Copy link
Contributor Author

franzpoeschel commented Jul 28, 2022

BP5 without intermittent flushes (old behavior, now opt-in):
memory_bp5_no_flush
BP5 with intermittent flushes (new behavior)
memory_bp5_yes_flush
BP4 file-based with specified InitialBufferSize as ~6GB:
memory_filebased
Catastrophic behavior: BP4 without using steps, no intermittent output possible
memory_groupbased_nosteps
Streaming with a slow reader, writer gets congested:
memory_sst_congested
Streaming with fast reader, InitialBufferSize specified:
memory_sst_easy
Variable-based encoding in BP4 with InitialBufferSize specified, buffer is reused:
memory_variablebased
Variable-based encoding in BP4 without InitialBufferSize specified, exponential probing phase in the beginning (memory usage >= twice the actually needed memory size due to reallocation)
memory_variablebased_initialization

@franzpoeschel
Copy link
Contributor Author

Can you please post the multi-KB .png images as attachment to the PR description? Then you can link to that image (click on it, get link) via this .rst syntax: ECP-WarpX/impactx#173

.. figure:: https://user-images.githubusercontent.com/1353258/180332191-f9ce11fc-8c56-4713-a91a-2ad12ab09805.png
   :alt: Chicane beam width and emittance evolution

This allows us to not have to check them into the source code and avoid that our source repo gets too large (we clone openPMD-api very often in CMake superbuilds).

Good idea, yeah
Last commit does that

Copy link
Member

@ax3l ax3l left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks a lot 🎉

@ax3l ax3l merged commit b05171d into openPMD:dev Jul 28, 2022
eschnett added a commit to eschnett/openPMD-api that referenced this pull request Nov 11, 2022
* dev: (70 commits)
  Docs: Recommend Static Build for Superbuilds (openPMD#1325)
  Python 3.11 (openPMD#1323)
  pybind11: v2.10.1+ (openPMD#1322)
  Add Attribute::getOptional<T>() and use to add some more dynamic datatype conversions at read time (openPMD#1278)
  Mapping between ADIOS steps and openPMD iterations (openPMD#949)
  Deprecate shareRaw (openPMD#1229)
  Fix append mode double attributes (openPMD#1302)
  Constant scalars: Don't flush double (openPMD#1315)
  Remove caching cmake vars (openPMD#1313)
  [pre-commit.ci] pre-commit autoupdate (openPMD#1311)
  storeChunk: Add an overload for shared_ptr<T[]> (openPMD#1296)
  Fix `operationAsString` Export (openPMD#1309)
  ADIOS2: more fine-grained control for file endings (openPMD#1218)
  [pre-commit.ci] pre-commit autoupdate (openPMD#1307)
  Fix file existence check in parallel tests (openPMD#1303)
  ADIOS2: Flush to disk within a step (openPMD#1207)
  [pre-commit.ci] pre-commit autoupdate (openPMD#1304)
  [pre-commit.ci] pre-commit autoupdate (openPMD#1295)
  Update catch2 to v2.13.9 (openPMD#1299)
  [pre-commit.ci] pre-commit autoupdate (openPMD#1292)
  ...

# Conflicts:
#	.github/workflows/linux.yml
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: new additions to the API backend: ADIOS2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants