ADIOS2: Flush to disk within a step #1207

Merged: 8 commits, Jul 28, 2022

2 changes: 2 additions & 0 deletions CMakeLists.txt
@@ -469,10 +469,12 @@ set(CORE_SOURCE
src/benchmark/mpi/OneDimensionalBlockSlicer.cpp
src/helper/list_series.cpp)
set(IO_SOURCE
src/IO/AbstractIOHandler.cpp
src/IO/AbstractIOHandlerImpl.cpp
src/IO/AbstractIOHandlerHelper.cpp
src/IO/DummyIOHandler.cpp
src/IO/IOTask.cpp
src/IO/FlushParams.cpp
src/IO/HDF5/HDF5IOHandler.cpp
src/IO/HDF5/ParallelHDF5IOHandler.cpp
src/IO/HDF5/HDF5Auxiliary.cpp
97 changes: 92 additions & 5 deletions docs/source/backends/adios2.rst
@@ -130,12 +130,18 @@ This buffer is drained to storage only at specific times:

The usage pattern of openPMD, especially the choice of iteration encoding, influences the memory use of ADIOS2.
The following graphs were created with KDE Heaptrack from a real-world application that uses openPMD (PIConGPU).
Ignore the 30GB initialization phases.

BP4 file engine
***************

The internal data structure of BP4 is one large buffer that holds all data written by a process.
It is drained to the disk upon ending a step or closing the engine (in parallel applications, data will usually be aggregated at the node-level before this).
This approach enables very high IO performance by requiring only very few, very large IO operations, at the cost of high memory consumption and some common usage pitfalls, as detailed below:

* **file-based iteration encoding:** A new ADIOS2 engine is opened for each iteration and closed upon ``Iteration::close()``.
Each iteration has its own buffer:

.. image:: ./memory_filebased.png
.. figure:: https://user-images.githubusercontent.com/14241876/181477396-746ee21d-6efe-450b-bb2f-f53d49945fb9.png
:alt: Memory usage of file-based iteration encoding

* **variable-based iteration encoding and group-based iteration encoding with steps**:
@@ -147,19 +153,100 @@ Ignore the 30GB initialization phases.
These memory spikes can easily lead to out-of-memory (OOM) situations, motivating that the ``InitialBufferSize`` should not be chosen too small.
Both behaviors are depicted in the following two pictures:

.. image:: ./memory_variablebased.png
.. figure:: https://user-images.githubusercontent.com/14241876/181477405-0439b017-256b-48d6-a169-014b3fe3aeb3.png
:alt: Memory usage of variable-based iteration encoding

.. image:: ./memory_variablebased_initialization.png
.. figure:: https://user-images.githubusercontent.com/14241876/181477406-f6e2a173-2ec1-48df-a417-0cb97a160c91.png
:alt: Memory usage of variable-based iteration encoding with bad ``InitialBufferSize``

* **group-based iteration encoding without steps:**
This encoding **should be avoided** in ADIOS2.
No data will be written to disk before closing the ``Series``, leading to a continuous buildup of memory, and most likely to an OOM situation:

.. image:: ./memory_groupbased_nosteps.png
.. figure:: https://user-images.githubusercontent.com/14241876/181477397-4d923061-7051-48c4-ae3a-a9efa10dcac7.png
:alt: Memory usage of group-based iteration without using steps

SST staging engine
******************

Like the BP4 engine, the SST engine uses one large buffer as an internal data structure.

Unlike the BP4 engine, however, a new buffer is allocated for each IO step, leading to a memory profile with clearly distinct IO steps:

.. figure:: https://user-images.githubusercontent.com/14241876/181477403-7ed7810b-dedf-48b8-b17b-8ce89fd3c34a.png
:alt: Ideal memory usage of the SST engine

The SST engine performs all IO asynchronously in the background and releases the memory of an IO step only once the reader is done interacting with it.
With slow readers, this can lead to a buildup of past IO steps in memory and subsequently to an out-of-memory condition:

.. figure:: https://user-images.githubusercontent.com/14241876/181477400-f342135f-612e-464f-b0e7-c1978ef47a94.png
:alt: Memory congestion in SST due to a slow reader

This can be avoided by specifying the `ADIOS2 parameter <https://adios2.readthedocs.io/en/latest/engines/engines.html>`_ ``QueueLimit``:

.. code:: cpp

std::string const adios2Config = R"(
{"adios2": {"engine": {"parameters": {"QueueLimit": 1}}}}
)";
Series series("simData.sst", Access::CREATE, adios2Config);

By default, the openPMD-api configures a queue limit of 2.
Depending on the value of the ADIOS2 parameter ``QueueFullPolicy``, the SST engine will either ``"Discard"`` steps or ``"Block"`` the writer upon reaching the queue limit.
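As a rough mental model of these two policies, the following toy class tracks the steps that a writer keeps in memory for its readers. It is a hypothetical sketch, not ADIOS2 code: it assumes that a step stays in memory until the reader releases it, that ``"Discard"`` drops the newly written step, and that ``"Block"`` makes the writer wait. Instead of actually blocking, the sketch only reports that the writer would have to wait.

```cpp
#include <cstddef>
#include <deque>

// Toy model (hypothetical, not ADIOS2 code) of the steps an SST-like writer
// keeps in memory until readers release them.
enum class QueueFullPolicy
{
    Block,
    Discard
};

class StepQueue
{
public:
    enum class Result
    {
        Queued,    // step kept in memory for readers
        Discarded, // queue full, new step dropped (assumed "Discard" behavior)
        WouldBlock // queue full, writer must wait (assumed "Block" behavior)
    };

    // Toy convention: a limit of 0 means "no limit".
    StepQueue(std::size_t limit, QueueFullPolicy policy)
        : m_limit(limit), m_policy(policy)
    {}

    // The writer ends a step and offers it to the readers.
    Result offer(int step)
    {
        if (m_limit != 0 && m_steps.size() >= m_limit)
        {
            return m_policy == QueueFullPolicy::Discard ? Result::Discarded
                                                        : Result::WouldBlock;
        }
        m_steps.push_back(step);
        return Result::Queued;
    }

    // The reader is done with the oldest step, so its memory can be released.
    void releaseOldest()
    {
        if (!m_steps.empty())
            m_steps.pop_front();
    }

    std::size_t queuedSteps() const
    {
        return m_steps.size();
    }

private:
    std::size_t m_limit;
    QueueFullPolicy m_policy;
    std::deque<int> m_steps;
};
```

In the toy model, removing the limit lets the queue grow without bound while the reader lags behind, which is the situation depicted in the congestion figure above; any positive limit caps the number of steps held in memory.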

BP5 file engine
***************

The BP5 file engine internally uses a linked list of equally-sized buffers.
The size of each buffer can be specified up to a maximum of 2GB with the `ADIOS2 parameter <https://adios2.readthedocs.io/en/latest/engines/engines.html#bp5>`_ ``BufferChunkSize``:

.. code:: cpp

std::string const adios2Config = R"(
{"adios2": {"engine": {"parameters": {"BufferChunkSize": 2147381248}}}}
)";
Series series("simData.bp5", Access::CREATE, adios2Config);

This approach implies slightly lower IO performance due to more frequent and smaller writes, but it gives users better control over memory usage and avoids out-of-memory issues caused by an incorrect ADIOS2 configuration.

The buffer is drained upon closing a step or the engine, but draining to the filesystem can also be triggered manually.
In the openPMD-api, this can be done by specifying backend-specific parameters to the ``Series::flush()`` or ``Attributable::seriesFlush()`` calls:

.. code:: cpp

series.flush(R"({"adios2": {"preferred_flush_target": "disk"}})");

The memory consumption of this approach shows that the 2GB buffer is first drained and then recreated after each ``flush()``:

.. figure:: https://user-images.githubusercontent.com/14241876/181477392-7eff2020-7bfb-4ddb-b31c-27b9937e088a.png
:alt: Memory usage of BP5 when flushing directly to disk

.. note::

KDE Heaptrack tracks the **virtual memory** consumption.
While the BP4 engine uses ``std::vector<char>`` for its internal buffer, BP5 uses plain ``malloc()`` (hence the 2GB limit), which does not initialize memory.
Memory pages are only backed by physical memory once they are written to.
In applications with small IO sizes on systems with virtual memory, the physical memory usage will stay well below 2GB even if ``BufferChunkSize`` is specified as 2GB.

**=> Specifying the buffer chunk size as 2GB as shown above is a good idea in most cases.**

Alternatively, data can be flushed to the buffer.
Note that this involves data copies that can be avoided either by flushing directly to disk or by not flushing at all until ``Iteration::close()``:

.. code:: cpp

series.flush(R"({"adios2": {"preferred_flush_target": "buffer"}})");

With this strategy, the BP5 engine will slowly build up its buffer until the step ends.
Unlike BP4, which grows its buffer by reallocation, BP5 appends a new chunk, leading to a considerably more acceptable memory profile:

.. figure:: https://user-images.githubusercontent.com/14241876/181477384-ce4ea8ab-3bde-4210-991b-2e627dfcc7c9.png
:alt: Memory usage of BP5 when flushing to the engine buffer
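The difference between flushing to disk and flushing to the buffer can be illustrated with a toy model of a chunk list. The class below is a hypothetical sketch, not ADIOS2's implementation: it stores data in equally-sized chunks held in a ``std::list``, appends a new chunk once the last one is full (so previously written bytes are never moved or copied again), and uses ``drain()`` as a stand-in for writing to the filesystem.

```cpp
#include <algorithm>
#include <cstddef>
#include <list>
#include <stdexcept>
#include <vector>

// Toy model (hypothetical, not ADIOS2 code) of a buffer made of
// equally-sized chunks kept in a linked list.
class ChunkedBuffer
{
public:
    explicit ChunkedBuffer(std::size_t chunkSize) : m_chunkSize(chunkSize)
    {
        if (chunkSize == 0)
            throw std::invalid_argument("chunkSize must be positive");
    }

    // Copy `size` bytes into the buffer. Existing chunks are never
    // reallocated; a new chunk is appended once the last one is full.
    void append(char const *data, std::size_t size)
    {
        while (size > 0)
        {
            if (m_chunks.empty() || m_chunks.back().size() == m_chunkSize)
            {
                m_chunks.emplace_back();
                m_chunks.back().reserve(m_chunkSize); // one allocation per chunk
            }
            std::vector<char> &last = m_chunks.back();
            std::size_t const n = std::min(size, m_chunkSize - last.size());
            last.insert(last.end(), data, data + n);
            data += n;
            size -= n;
        }
        m_peakChunks = std::max(m_peakChunks, m_chunks.size());
    }

    // Stand-in for draining to the filesystem: report how many bytes would
    // be written, then release all chunks.
    std::size_t drain()
    {
        std::size_t bytes = 0;
        for (auto const &chunk : m_chunks)
            bytes += chunk.size();
        m_chunks.clear();
        return bytes;
    }

    std::size_t chunkCount() const
    {
        return m_chunks.size();
    }
    std::size_t peakChunkCount() const
    {
        return m_peakChunks;
    }

private:
    std::size_t m_chunkSize;
    std::list<std::vector<char>> m_chunks;
    std::size_t m_peakChunks = 0;
};
```

With a chunk size of 8 bytes and four 8-byte writes, calling ``drain()`` after every write (the ``"disk"`` strategy) keeps at most one chunk alive, while buffering all writes until the end of the step (the ``"buffer"`` strategy) accumulates four chunks.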

The default is to flush to disk, but the default ``preferred_flush_target`` can also be specified via JSON/TOML at the ``Series`` level.
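The choice between the ``Series``-level default and a per-flush setting, including the ``_override`` suffixes described in the backend configuration documentation, might be modeled as follows. This is a minimal sketch under stated assumptions: the enum mirrors the ``FlushTarget`` enum that this PR adds to ``ADIOS2IOHandlerImpl``, but ``parseFlushTarget()``, ``isOverride()``, ``resolveFlushTarget()`` and ``flushesToDisk()`` are hypothetical helpers, exact case-sensitive string matching is assumed, and when both values carry ``_override`` the sketch lets the per-flush value win.

```cpp
#include <stdexcept>
#include <string>

// Same enumerators as the FlushTarget enum added in this PR; everything
// else in this block is hypothetical.
enum class FlushTarget : unsigned char
{
    Buffer,
    Buffer_Override,
    Disk,
    Disk_Override
};

// Map a preferred_flush_target string to the enum (exact match assumed).
FlushTarget parseFlushTarget(std::string const &value)
{
    if (value == "buffer")
        return FlushTarget::Buffer;
    if (value == "buffer_override")
        return FlushTarget::Buffer_Override;
    if (value == "disk")
        return FlushTarget::Disk;
    if (value == "disk_override")
        return FlushTarget::Disk_Override;
    throw std::invalid_argument("Unknown preferred_flush_target: " + value);
}

bool isOverride(FlushTarget target)
{
    return target == FlushTarget::Buffer_Override ||
        target == FlushTarget::Disk_Override;
}

// Normally the per-flush value replaces the Series-level default.
// A Series-level *_override value wins against a plain per-flush value.
// If both carry _override, this sketch lets the per-flush value win.
FlushTarget resolveFlushTarget(FlushTarget seriesDefault, FlushTarget perFlush)
{
    if (isOverride(seriesDefault) && !isOverride(perFlush))
        return seriesDefault;
    return perFlush;
}

// Reduce the four enumerators to the two effective behaviors.
bool flushesToDisk(FlushTarget target)
{
    return target == FlushTarget::Disk || target == FlushTarget::Disk_Override;
}
```

For example, ``resolveFlushTarget(parseFlushTarget("buffer_override"), parseFlushTarget("disk"))`` yields ``FlushTarget::Buffer_Override``: a user who disables disk flushes in the ``Series`` constructor keeps buffering even though the data-producing code requests ``"disk"`` per flush. When a ``flush()`` call passes no target, the ``Series``-level value would simply apply unchanged.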




Known Issues
------------
Binary file removed docs/source/backends/memory_filebased.png
Binary file not shown.
Binary file removed docs/source/backends/memory_groupbased_nosteps.png
Binary file not shown.
Binary file removed docs/source/backends/memory_variablebased.png
Binary file not shown.
Binary file not shown.
15 changes: 14 additions & 1 deletion docs/source/details/backendconfig.rst
@@ -37,7 +37,10 @@ The configuration string may refer to the complete ``openPMD::Series`` or may ad
This reflects the fact that certain backend-specific parameters may refer to the whole Series (such as storage engines and their parameters) and others refer to actual datasets (such as compression).
Dataset-specific configurations are (currently) only available during dataset creation, but not when reading datasets.

A JSON/TOML configuration may either be specified as an inline string that can be parsed as a JSON/TOML object, or alternatively as a path to a JSON/TOML-formatted text file (only in the constructor of ``openPMD::Series``):
Additionally, some backends may provide different implementations of the ``Series::flush()`` and ``Attributable::seriesFlush()`` calls.
JSON/TOML strings may be passed to these calls as optional parameters.

A JSON/TOML configuration may either be specified as an inline string that can be parsed as a JSON/TOML object, or alternatively as a path to a JSON/TOML-formatted text file (only in the constructor of ``openPMD::Series``; all other API calls that accept a JSON/TOML specification require inline JSON/TOML strings):

* File paths are distinguished by prepending them with an at-sign ``@``.
JSON and TOML are then distinguished by the filename extension ``.json`` or ``.toml``.
@@ -119,6 +122,16 @@ Explanation of the single keys:
Please refer to the `official ADIOS2 documentation <https://adios2.readthedocs.io/en/latest/engines/engines.html>`_ for the available engine parameters.
The openPMD-api does not interpret these values and instead simply forwards them to ADIOS2.
* ``adios2.engine.usesteps``: Described more closely in the documentation for the :ref:`ADIOS2 backend<backends-adios2>` (usesteps).
* ``adios2.engine.preferred_flush_target``: Only relevant for the BP5 engine, possible values are ``"disk"`` and ``"buffer"`` (default: ``"disk"``).
Review comment (Contributor, Author):


I've called this preferred_flush_target since it is only implemented by BP5. Otherwise, we will see user requests like "I specified flush_target=disk, why is BP4 buffering the data anyway?".
I don't really want to use the BP5 name explicitly, e.g. bp5_flush_target=disk, because other engines might implement this in the future, too.


* If ``"disk"``, data will be moved to disk on every flush.
* If ``"buffer"``, data will only be moved to disk upon ending an IO step or closing the engine.

This behavior can be overridden on a per-flush basis by specifying this JSON/TOML key as an optional parameter to the ``Series::flush()`` or ``Attributable::seriesFlush()`` methods.

Additionally, specifying ``"disk_override"`` or ``"buffer_override"`` will take precedence over options specified without the ``_override`` suffix, which inverts the normal precedence order.
This way, a data-producing code can hardcode the preferred flush target per ``flush()`` call, but users can still, e.g., deactivate flushing to disk entirely in the ``Series`` constructor by specifying ``preferred_flush_target = "buffer_override"``.
This is useful when using the asynchronous IO capabilities of the BP5 engine.
* ``adios2.dataset.operators``: This key contains a list of ADIOS2 `operators <https://adios2.readthedocs.io/en/latest/components/components.html#operator>`_, used to enable compression or dataset transformations.
Each object in the list has two keys:

9 changes: 9 additions & 0 deletions docs/source/usage/workflow.rst
@@ -63,3 +63,12 @@ Attributes are (currently) unaffected by this:

* In writing, attributes are stored internally by value and can afterwards not be accessed by the user.
* In reading, attributes are parsed upon opening the Series / an iteration and are available to read right-away.

.. attention::

Note that the concrete implementation of ``Series::flush()`` and ``Attributable::seriesFlush()`` is backend-specific.
Using these calls guarantees neither that data is moved to storage/transport nor that it can be accessed by independent readers at this point.

Some backends (e.g. the BP5 engine of ADIOS2) have multiple implementations for the openPMD-api-level guarantees of flush points.
For user-guided selection of such implementations, ``Series::flush()`` and ``Attributable::seriesFlush()`` take an optional JSON/TOML string as a parameter.
See the section on :ref:`backend-specific configuration <backendconfig>` for details.
4 changes: 2 additions & 2 deletions include/openPMD/IO/ADIOS/ADIOS1IOHandler.hpp
@@ -50,7 +50,7 @@ class OPENPMDAPI_EXPORT ADIOS1IOHandler : public AbstractIOHandler
return "ADIOS1";
}

std::future<void> flush(internal::FlushParams const &) override;
std::future<void> flush(internal::ParsedFlushParams &) override;

void enqueue(IOTask const &) override;

@@ -72,7 +72,7 @@ class OPENPMDAPI_EXPORT ADIOS1IOHandler : public AbstractIOHandler
return "DUMMY_ADIOS1";
}

std::future<void> flush(internal::FlushParams const &) override;
std::future<void> flush(internal::ParsedFlushParams &) override;

private:
std::unique_ptr<ADIOS1IOHandlerImpl> m_impl;
43 changes: 37 additions & 6 deletions include/openPMD/IO/ADIOS/ADIOS2IOHandler.hpp
@@ -26,6 +26,7 @@
#include "openPMD/IO/AbstractIOHandler.hpp"
#include "openPMD/IO/AbstractIOHandlerImpl.hpp"
#include "openPMD/IO/AbstractIOHandlerImplCommon.hpp"
#include "openPMD/IO/FlushParametersInternal.hpp"
#include "openPMD/IO/IOTask.hpp"
#include "openPMD/IO/InvalidatableFile.hpp"
#include "openPMD/IterationEncoding.hpp"
@@ -140,7 +141,7 @@ class ADIOS2IOHandlerImpl

~ADIOS2IOHandlerImpl() override;

std::future<void> flush(internal::FlushParams const &);
std::future<void> flush(internal::ParsedFlushParams &);

void
createFile(Writable *, Parameter<Operation::CREATE_FILE> const &) override;
@@ -209,6 +210,16 @@ class ADIOS2IOHandlerImpl
*/
adios2::Mode adios2AccessMode(std::string const &fullPath);

enum class FlushTarget : unsigned char
{
Buffer,
Buffer_Override,
Disk,
Disk_Override
};

FlushTarget m_flushTarget = FlushTarget::Disk;

private:
adios2::ADIOS m_ADIOS;
/*
@@ -412,6 +423,7 @@ namespace ADIOS2Defaults
constexpr const_str str_type = "type";
constexpr const_str str_params = "parameters";
constexpr const_str str_usesteps = "usesteps";
constexpr const_str str_flushtarget = "preferred_flush_target";
constexpr const_str str_usesstepsAttribute = "__openPMD_internal/useSteps";
constexpr const_str str_adios2Schema =
"__openPMD_internal/openPMD2_adios2_schema";
@@ -927,6 +939,8 @@ namespace detail
friend struct BufferedGet;
friend struct BufferedPut;

using FlushTarget = ADIOS2IOHandlerImpl::FlushTarget;

BufferedActions(BufferedActions const &) = delete;

/**
@@ -1039,10 +1053,26 @@ namespace detail
template <typename BA>
void enqueue(BA &&ba, decltype(m_buffer) &);

struct ADIOS2FlushParams
{
/*
* Only execute performPutsGets if UserFlush.
*/
FlushLevel level;
FlushTarget flushTarget = FlushTarget::Disk;

ADIOS2FlushParams(FlushLevel level_in) : level(level_in)
{}

ADIOS2FlushParams(FlushLevel level_in, FlushTarget flushTarget_in)
: level(level_in), flushTarget(flushTarget_in)
{}
};

/**
* Flush deferred IO actions.
*
* @param level Flush Level. Only execute performPutsGets if UserFlush.
* @param flushParams Flush level and target.
* @param performPutsGets A functor that takes as parameters (1) *this
* and (2) the ADIOS2 engine.
* Its task is to ensure that ADIOS2 performs Put/Get operations.
@@ -1057,7 +1087,7 @@ namespace detail
*/
template <typename F>
void flush(
FlushLevel level,
ADIOS2FlushParams flushParams,
F &&performPutsGets,
bool writeAttributes,
bool flushUnconditionally);
@@ -1067,7 +1097,7 @@ namespace detail
* and does not flush unconditionally.
*
*/
void flush(FlushLevel, bool writeAttributes = false);
void flush(ADIOS2FlushParams, bool writeAttributes = false);

/**
* @brief Begin or end an ADIOS step.
@@ -1265,7 +1295,8 @@ class ADIOS2IOHandler : public AbstractIOHandler
// we must not throw in a destructor
try
{
this->flush(internal::defaultFlushParams);
auto params = internal::defaultParsedFlushParams;
this->flush(params);
}
catch (std::exception const &ex)
{
@@ -1304,6 +1335,6 @@ class ADIOS2IOHandler : public AbstractIOHandler
return "ADIOS2";
}

std::future<void> flush(internal::FlushParams const &) override;
std::future<void> flush(internal::ParsedFlushParams &) override;
}; // ADIOS2IOHandler
} // namespace openPMD
2 changes: 1 addition & 1 deletion include/openPMD/IO/ADIOS/ParallelADIOS1IOHandler.hpp
@@ -54,7 +54,7 @@ class OPENPMDAPI_EXPORT ParallelADIOS1IOHandler : public AbstractIOHandler
return "MPI_ADIOS1";
}

std::future<void> flush(internal::FlushParams const &) override;
std::future<void> flush(internal::ParsedFlushParams &) override;
#if openPMD_HAVE_ADIOS1
void enqueue(IOTask const &) override;
#endif
23 changes: 21 additions & 2 deletions include/openPMD/IO/AbstractIOHandler.hpp
@@ -103,12 +103,24 @@ namespace internal
struct FlushParams
{
FlushLevel flushLevel = FlushLevel::InternalFlush;
std::string backendConfig = "{}";

explicit FlushParams()
{}
FlushParams(FlushLevel flushLevel_in) : flushLevel(flushLevel_in)
{}
FlushParams(FlushLevel flushLevel_in, std::string backendConfig_in)
: flushLevel(flushLevel_in)
, backendConfig{std::move(backendConfig_in)}
{}
};

/*
* To be used for reading
*/
constexpr FlushParams defaultFlushParams{};
FlushParams const defaultFlushParams{};

struct ParsedFlushParams;
} // namespace internal

/** Interface for communicating between logical and physically persistent data.
@@ -164,7 +176,14 @@ class AbstractIOHandler
* @return Future indicating the completion state of the operation for
* backends that decide to implement this operation asynchronously.
*/
virtual std::future<void> flush(internal::FlushParams const &) = 0;
std::future<void> flush(internal::FlushParams const &);

/** Process operations in queue according to FIFO.
*
* @return Future indicating the completion state of the operation for
* backends that decide to implement this operation asynchronously.
*/
virtual std::future<void> flush(internal::ParsedFlushParams &) = 0;

/** The currently used backend */
virtual std::string backendName() const = 0;
2 changes: 1 addition & 1 deletion include/openPMD/IO/DummyIOHandler.hpp
@@ -44,6 +44,6 @@ class DummyIOHandler : public AbstractIOHandler
/** No-op consistent with the IOHandler interface to enable library use
* without IO.
*/
std::future<void> flush(internal::FlushParams const &) override;
std::future<void> flush(internal::ParsedFlushParams &) override;
}; // DummyIOHandler
} // namespace openPMD
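The header changes above split ``AbstractIOHandler::flush()`` into a non-virtual overload taking ``internal::FlushParams`` and a pure virtual overload taking ``internal::ParsedFlushParams``, which each backend overrides. The following self-contained sketch illustrates that non-virtual-interface shape with simplified, hypothetical stand-in types. The real ``ParsedFlushParams`` is not shown in this diff, so here it merely carries the raw configuration string where the real code would presumably hold parsed JSON/TOML.

```cpp
#include <future>
#include <string>

// Simplified, hypothetical stand-ins for the types in this diff.
enum class FlushLevel
{
    UserFlush,
    InternalFlush
};

struct FlushParams
{
    FlushLevel flushLevel = FlushLevel::InternalFlush;
    std::string backendConfig = "{}"; // unparsed JSON/TOML, as in the diff
};

// Hypothetical: only the raw string is kept where parsed JSON/TOML would go.
struct ParsedFlushParams
{
    FlushLevel flushLevel;
    std::string backendConfig;
};

class AbstractHandler
{
public:
    virtual ~AbstractHandler() = default;

    // Non-virtual entry point: convert the user-facing parameters once,
    // then dispatch to the backend-specific implementation.
    std::future<void> flush(FlushParams const &params)
    {
        ParsedFlushParams parsed{params.flushLevel, params.backendConfig};
        return flush(parsed);
    }

    // Backends override only this overload.
    virtual std::future<void> flush(ParsedFlushParams &) = 0;
};

class DummyHandler : public AbstractHandler
{
public:
    // Without this, the override below would hide flush(FlushParams const &).
    using AbstractHandler::flush;

    std::future<void> flush(ParsedFlushParams &params) override
    {
        lastLevel = params.flushLevel;
        lastConfig = params.backendConfig;
        std::promise<void> done;
        done.set_value(); // synchronous "backend": already complete
        return done.get_future();
    }

    FlushLevel lastLevel = FlushLevel::InternalFlush;
    std::string lastConfig;
};
```

A derived class that declares its own ``flush()`` hides the base-class overload taking ``FlushParams`` unless it adds a ``using`` declaration, as the sketch does; calls through a base-class pointer or reference are unaffected. This hiding, together with the non-const reference parameter requiring an lvalue, is plausibly why the ``ADIOS2IOHandler`` destructor above now copies ``internal::defaultParsedFlushParams`` into a local variable and calls that overload directly.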