Skip to content

Commit

Permalink
Doc: Design Cosmetics
Browse files Browse the repository at this point in the history
Fix rst and format properly.
  • Loading branch information
C0nsultant authored and ax3l committed Sep 13, 2019
1 parent 98acc3f commit 24de1c3
Show file tree
Hide file tree
Showing 2 changed files with 43 additions and 54 deletions.
96 changes: 42 additions & 54 deletions docs/source/dev/design.rst
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,11 @@
Design Overview
===============

.. note::

This section is a stub.
Please open a pull-request to improve it or open an issue with open questions.

This library consists of three conceptual parts:

- The backend, concerned with elementary low-level I/O operations.
Expand All @@ -12,47 +17,32 @@ This library consists of three conceptual parts:

Backend
-------
One of the main goals of this library is to provide a high-level common interface to synchronize persistent data with a
volatile representation in memory. This includes handling data in any number of supported file formats transparently.
Therefore, enabling users to handle hierarchical, self-describing file formats while disregarding the actual
nitty-gritty details of just those file formats, required the reduction of possible operations reduced to a common set
of `IOTasks <https://github.com/openPMD/openPMD-api/blob/dev/include/openPMD/IO/IOTask.hpp>`:
One of the main goals of this library is to provide a high-level common interface to synchronize persistent data with a volatile representation in memory.
This includes handling data in any number of supported file formats transparently.
Therefore, enabling users to handle hierarchical, self-describing file formats while disregarding the actual nitty-gritty details of just those file formats, required the reduction of possible operations reduced to a common set of `IOTasks <https://github.com/openPMD/openPMD-api/blob/dev/include/openPMD/IO/IOTask.hpp>`_:

.. literalinclude:: IOTask.hpp
:language: cpp
:lines: 41-66

Every task is designed to be a fully self-contained description of one such atomic operation. By describing a required
minimal step of work (without any side-effect), these operations are the foundation of the unified handling mechanism
across suitable file formats.
The actual low-level exchange of data is implemented in IOHandlers, one per file format (possibly two if handling
MPI-parallel work is possible and requires different behaviour).
The only task of these IOHandlers is to execute one atomic ``IOTask`` at a time. Ideally, additional logic is contained
to improve performance by keeping track of open file handles, deferring and coalescing parts of work, avoiding redundant
operations. It should be noted that while this is desirable, sequential consistency must be guaranteed (see
:ref:`_queue_label`.)

``AbstractParameter`` and subclasses as typesafe descriptions of task parameters

``Writable`` as unique identification in task, corresponding to node in frontend hierarchy (tree-like structure)

subclass of ``AbstractIOHandler`` to ensure simple extensibilty

only two public interface methods (``enqueue()`` and ``flush()``) to hide separate behaviour & state
Every task is designed to be a fully self-contained description of one such atomic operation. By describing a required minimal step of work (without any side-effect), these operations are the foundation of the unified handling mechanism across suitable file formats.
The actual low-level exchange of data is implemented in ``IOHandlers``, one per file format (possibly two if handlingi MPI-parallel work is possible and requires different behaviour).
The only task of these IOHandlers is to execute one atomic ``IOTask`` at a time.
Ideally, additional logic is contained to improve performance by keeping track of open file handles, deferring and coalescing parts of work, avoiding redundant operations.
It should be noted that while this is desirable, sequential consistency must be guaranteed (see :ref:`queue_label`.)

``AbstractFilePosition`` as a format-dependent location inside persistent data (e.g. node-id / path string)

should be entirely agnostic to openPMD and just treat transferred data as raw bytes without *knowledge*
*Note* this paragraph is a stub:
``AbstractParameter`` and subclasses as typesafe descriptions of task parameters, ``Writable`` as unique identification in task, corresponding to node in frontend hierarchy (tree-like structure), subclass of ``AbstractIOHandler`` to ensure simple extensibilty, and only two public interface methods (``enqueue()`` and ``flush()``) to hide separate behaviour & state ``AbstractFilePosition`` as a format-dependent location inside persistent data (e.g. node-id / path string) should be entirely agnostic to openPMD and just treat transferred data as raw bytes without *knowledge*

.. _queue_label:

I/O-Queue
---------
To keep coupling between openPMD logic and actual low-level I/O to a minimum, a sequence of atomic I/O-Tasks is used to
transfer data between logical and physical representation. Individual tasks are scheduled by frontend application logic
and stored in a data structure that allows for FIFO order processing (in future releases, this order might be relaxed).
Tasks are not executed during their creation, but are instead buffered in this queue. Disk accesses can be coalesced and
high access latencies can be amortized by performing multiple tasks bunched together.
At appropriate points in time, the used backend processes all pending tasks (strict, single-threaded, synchronous FIFO
is currently used in all backends, but is not mandatory as long as consistency with that order can be guaranteed).
To keep coupling between openPMD logic and actual low-level I/O to a minimum, a sequence of atomic I/O-Tasks is used to transfer data between logical and physical representation.
Individual tasks are scheduled by frontend application logic and stored in a data structure that allows for FIFO order processing (in future releases, this order might be relaxed).
Tasks are not executed during their creation, but are instead buffered in this queue.
Disk accesses can be coalesced and high access latencies can be amortized by performing multiple tasks bunched together.
At appropriate points in time, the used backend processes all pending tasks (strict, single-threaded, synchronous FIFO is currently used in all backends, but is not mandatory as long as consistency with that order can be guaranteed).

A typical sequence of tasks that are scheduled during the read of an existing file *could* look something like this:

Expand All @@ -77,30 +67,27 @@ A typical sequence of tasks that are scheduled during the read of an existing fi
9.X OPEN_PATH // every 'path' in 8.
...

Note that (especially for reading), pending tasks might have to be processed between any two steps to guarantee data
consistency. That is because action might have to be taken conditionally on read or written values, openPMD conformity
checked to fail fast, or a processing of the tasks be requested by the user explicitly.
Note that (especially for reading), pending tasks might have to be processed between any two steps to guarantee data consistency.
That is because action might have to be taken conditionally on read or written values, openPMD conformity checked to fail fast, or a processing of the tasks be requested by the user explicitly.

As such, FIFO-equivalence with the scheduling order must be satisfied. A task that is not located at the head of the
queue (i.e. does not have the earliest schedule time of all pending tasks) is not guaranteed to succeed in isolation.
As such, FIFO-equivalence with the scheduling order must be satisfied.
A task that is not located at the head of the queue (i.e. does not have the earliest schedule time of all pending tasks) is not guaranteed to succeed in isolation.
Currently, this can only guaranteed by sequentially performing all tasks scheduled prior to it in chronological order.
To give two examples where this matters:
- Reading value chunks from a dataset only works after the dataset has been opened. Due to limitations in some of the
backends and the atomic nature of the I/O-tasks in this API (i.e. operations without side effects), datatype and
extent of a dataset are only obtained by opening the dataset. For some backends this information is required for
chunk reading and thus must be known prior to performing the read.
- Consecutive chunk writing and reading (to the same dataset) mirrors classical RAW data dependence. The two chunks
might overlap, in which case the read has to reflect the value changes introduced by the write.

Atomic operations contained in this queue are
* Reading value chunks from a dataset only works after the dataset has been opened.
Due to limitations in some of the backends and the atomic nature of the I/O-tasks in this API (i.e. operations without side effects), datatype and extent of a dataset are only obtained by opening the dataset.
For some backends this information is required for chunk reading and thus must be known prior to performing the read.
* Consecutive chunk writing and reading (to the same dataset) mirrors classical RAW data dependence.
The two chunks might overlap, in which case the read has to reflect the value changes introduced by the write.

Atomic operations contained in this queue are ...

Frontend
--------
While the other two components are primarily concerned with actual I/O, this one is the glue and constraint logic that
lets a user build the in-memory view of the hierarchical file structure. Public interfaces should be limited to this
part (exceptions may arise, e.g. format-dependent dataset parameters).
Where the other parts contain virtually zero knowledge about openPMD, this one contains all of it and none of the
low-level I/O.
While the other two components are primarily concerned with actual I/O, this one is the glue and constraint logic that lets a user build the in-memory view of the hierarchical file structure.
Public interfaces should be limited to this part (exceptions may arise, e.g. format-dependent dataset parameters).
Where the other parts contain virtually zero knowledge about openPMD, this one contains all of it and none of the low-level I/O.

``Writable`` (mixin) base class of every front-end class, used to tree structure used in backend

Expand All @@ -114,11 +101,12 @@ low-level I/O.
forces them into the correct hierarchy (no dangling objects)

all meta-data access stores in the ``Attributable`` part of an object and follows the syntax
::

Object& setFoo(Foo foo);
Foo foo() const;
.. code-block:: cpp
Object& setFoo(Foo foo);
Foo foo() const;
(future work: use `CRTP <https://en.wikipedia.org/wiki/Curiously_recurring_template_pattern>`
(future work: use `CRTP <https://en.wikipedia.org/wiki/Curiously_recurring_template_pattern>`_)

``Series`` as root of every hierarchy, supporting ``groupBased`` and ``fileBased`` transparently
``Series`` as root of every hierarchy, supporting ``groupBased`` and ``fileBased`` transparently ...
1 change: 1 addition & 0 deletions docs/source/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,7 @@ Development

dev/contributing
dev/repostructure
dev/design
dev/backend
dev/dependencies
dev/buildoptions
Expand Down

0 comments on commit 24de1c3

Please sign in to comment.