Parallel JSON #1475

Merged
merged 10 commits into from
Feb 28, 2024
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -1370,7 +1370,7 @@ if(openPMD_BUILD_TESTING)
     --outfile \
     ../samples/git-sample/thetaMode/data_%T.bp && \
     \
-    ${Python_EXECUTABLE} \
+    ${MPI_TEST_EXE} ${Python_EXECUTABLE} \
     ${openPMD_RUNTIME_OUTPUT_DIRECTORY}/openpmd-pipe \
     --infile ../samples/git-sample/thetaMode/data_%T.bp \
     --outfile ../samples/git-sample/thetaMode/data%T.json \
37 changes: 35 additions & 2 deletions docs/source/backends/json.rst
@@ -92,7 +92,6 @@ propagate the exception thrown by Niels Lohmann's library.

 The (keys) names ``"attributes"``, ``"data"`` and ``"datatype"`` are reserved and must not be used for base/mesh/particles path, records and their components.

-A parallel (i.e. MPI) implementation is *not* available.

 TOML Restrictions
 -----------------
@@ -106,7 +105,41 @@ TOML does not support null values.

 The (keys) names ``"attributes"``, ``"data"`` and ``"datatype"`` are reserved and must not be used for base/mesh/particles path, records and their components.

-A parallel (i.e. MPI) implementation is *not* available.

+Using in parallel (MPI)
+-----------------------
+
+Parallel I/O is not a first-class citizen in the JSON and TOML backends, and neither backend will "go out of its way" to support parallel workflows.
+
+However there is a rudimentary form of read and write support in parallel:
+
+Parallel reading
+................
+
+In order not to overload the parallel filesystem with parallel reads, read access to JSON datasets is done by rank 0 and then broadcast to all other ranks.
+Note that there is no granularity whatsoever in reading a JSON file.
+A JSON file is always read into memory and broadcast to all other ranks in its entirety.
+
+Parallel writing
+................
+
+When executed in an MPI context, the JSON/TOML backends will not directly output a single text file, but instead a folder containing one file per MPI rank.
+Neither backend will perform any data aggregation at all.
+
+.. note::
+
+   The parallel write support of the JSON/TOML backends is intended mainly for debugging and prototyping workflows.
+
+The folder will use the specified Series name, but append the postfix ``.parallel``.
+(This is a deliberate indication that this folder cannot directly be opened again by the openPMD-api as a JSON/TOML dataset.)
+This folder contains for each MPI rank *i* a file ``mpi_rank_<i>.json`` (resp. ``mpi_rank_<i>.toml``), containing the serial output of that rank.
+A ``README.txt`` with basic usage instructions is also written.
+
+.. note::
+
+   There is no direct support in the openPMD-api to read a JSON/TOML dataset written in this parallel fashion. The single files (e.g. ``data.json.parallel/mpi_rank_0.json``) are each valid openPMD files and can be read separately, however.
+
+   Note that the auxiliary function ``json::merge()`` (or in Python ``openpmd_api.merge_json()``) is not adequate for merging the single JSON/TOML files back into one, since it does not merge anything below the array level.
+

 Example
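
The folder layout described in the documentation hunk above (``<Series name>.parallel/mpi_rank_<i>.json`` plus a ``README.txt``) can be inspected with the Python standard library alone. The following is a minimal sketch under stated assumptions, not part of this PR: the helper name `load_rank_files` is hypothetical, the file contents in the demo are made-up placeholders rather than a real openPMD layout, and only the file-name pattern is taken from the docs.

```python
import json
import re
import tempfile
from pathlib import Path

# Assumption: per-rank files follow the documented name "mpi_rank_<i>.json".
RANK_FILE = re.compile(r"mpi_rank_(\d+)\.json")


def load_rank_files(parallel_dir):
    """Return {rank: parsed JSON} for every mpi_rank_<i>.json in the folder."""
    per_rank = {}
    for entry in Path(parallel_dir).iterdir():
        match = RANK_FILE.fullmatch(entry.name)
        if match is None:
            continue  # skips README.txt and any unrelated file
        with entry.open() as handle:
            per_rank[int(match.group(1))] = json.load(handle)
    # Sort numerically so that rank 10 follows rank 9 instead of rank 1.
    return dict(sorted(per_rank.items()))


# Self-contained demo on a throwaway folder with placeholder contents.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "data.json.parallel"
    folder.mkdir()
    (folder / "README.txt").write_text("usage notes\n")
    for rank in (0, 1, 10):
        payload = {"attributes": {"rank": rank}}  # made-up content
        (folder / f"mpi_rank_{rank}.json").write_text(json.dumps(payload))
    loaded = load_rank_files(folder)

print(list(loaded))  # ranks in numeric order
```

Each entry of the returned dictionary is the serial output of one rank; combining them into a single dataset is left to the caller, since the backends perform no aggregation.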
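
The documentation's remark that a merge which "does not merge anything below the array level" cannot recombine per-rank files can be illustrated with a toy example. The function `merge_objects` below is a hypothetical stand-in that mimics only that documented property (objects are merged key by key, arrays are replaced wholesale); it is not the implementation of ``json::merge()`` or ``openpmd_api.merge_json()``, and the two per-rank payloads are invented for illustration.

```python
def merge_objects(base, overlay):
    """Merge JSON objects key by key.

    Any non-object value from `overlay`, arrays included, replaces the
    corresponding value in `base` as a whole.
    """
    if isinstance(base, dict) and isinstance(overlay, dict):
        merged = dict(base)
        for key, value in overlay.items():
            if key in base:
                merged[key] = merge_objects(base[key], value)
            else:
                merged[key] = value
        return merged
    return overlay


# Invented example: each rank filled only its own half of a 4-element array.
rank0 = {"data": [1, 2, None, None], "meta": {"a": 1}}
rank1 = {"data": [None, None, 3, 4], "meta": {"b": 2}}

merged = merge_objects(rank0, rank1)
print(merged["meta"])  # object keys from both ranks survive
print(merged["data"])  # only the second rank's array survives
```

In this toy model the object-level metadata of both ranks is combined, but the array written by the first rank is overwritten, which is why element-wise recombination would need dedicated logic.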
15 changes: 14 additions & 1 deletion include/openPMD/IO/JSON/JSONIOHandler.hpp
@@ -24,17 +24,30 @@
 #include "openPMD/IO/AbstractIOHandler.hpp"
 #include "openPMD/IO/JSON/JSONIOHandlerImpl.hpp"

+#if openPMD_HAVE_MPI
+#include <mpi.h>
+#endif
+
 namespace openPMD
 {
 class JSONIOHandler : public AbstractIOHandler
 {
 public:
     JSONIOHandler(
-        std::string const &path,
+        std::string path,
         Access at,
         openPMD::json::TracingJSON config,
         JSONIOHandlerImpl::FileFormat,
         std::string originalExtension);
+#if openPMD_HAVE_MPI
+    JSONIOHandler(
+        std::string path,
+        Access at,
+        MPI_Comm,
+        openPMD::json::TracingJSON config,
+        JSONIOHandlerImpl::FileFormat,
+        std::string originalExtension);
+#endif

     ~JSONIOHandler() override;

20 changes: 19 additions & 1 deletion include/openPMD/IO/JSON/JSONIOHandlerImpl.hpp
@@ -31,6 +31,9 @@

 #include <istream>
 #include <nlohmann/json.hpp>
+#if openPMD_HAVE_MPI
+#include <mpi.h>
+#endif

 #include <complex>
 #include <fstream>
@@ -70,6 +73,7 @@ struct File

         std::string name;
         bool valid = true;
+        bool printedReadmeWarningAlready = false;
     };

     std::shared_ptr<FileState> fileState;
@@ -167,6 +171,15 @@ class JSONIOHandlerImpl : public AbstractIOHandlerImpl
         FileFormat,
         std::string originalExtension);

+#if openPMD_HAVE_MPI
+    JSONIOHandlerImpl(
+        AbstractIOHandler *,
+        MPI_Comm,
+        openPMD::json::TracingJSON config,
+        FileFormat,
+        std::string originalExtension);
+#endif
+
     ~JSONIOHandlerImpl() override;

     void
@@ -230,6 +243,10 @@ class JSONIOHandlerImpl : public AbstractIOHandlerImpl
     std::future<void> flush();

 private:
+#if openPMD_HAVE_MPI
+    std::optional<MPI_Comm> m_communicator;
+#endif
+
     using FILEHANDLE = std::fstream;

     // map each Writable to its associated file
@@ -323,7 +340,8 @@ class JSONIOHandlerImpl : public AbstractIOHandlerImpl

     // write to disk the json contents associated with the file
     // remove from m_dirty if unsetDirty == true
-    void putJsonContents(File const &, bool unsetDirty = true);
+    auto putJsonContents(File const &, bool unsetDirty = true)
+        -> decltype(m_jsonVals)::iterator;

     // figure out the file position of the writable
     // (preferring the parent's file position) and extend it
19 changes: 17 additions & 2 deletions src/IO/AbstractIOHandlerHelper.cpp
@@ -125,8 +125,23 @@ std::unique_ptr<AbstractIOHandler> createIOHandler<json::TracingJSON>(
             "ssc",
             std::move(originalExtension));
     case Format::JSON:
-        throw error::WrongAPIUsage(
-            "JSON backend not available in parallel openPMD.");
+        return constructIOHandler<JSONIOHandler, openPMD_HAVE_JSON>(
+            "JSON",
+            path,
+            access,
+            comm,
+            std::move(options),
+            JSONIOHandlerImpl::FileFormat::Json,
+            std::move(originalExtension));
+    case Format::TOML:
+        return constructIOHandler<JSONIOHandler, openPMD_HAVE_JSON>(
+            "JSON",
+            path,
+            access,
+            comm,
+            std::move(options),
+            JSONIOHandlerImpl::FileFormat::Toml,
+            std::move(originalExtension));
     default:
         throw error::WrongAPIUsage(
             "Unknown file format! Did you specify a file ending? Specified "
18 changes: 16 additions & 2 deletions src/IO/JSON/JSONIOHandler.cpp
@@ -26,15 +26,29 @@ namespace openPMD
 JSONIOHandler::~JSONIOHandler() = default;

 JSONIOHandler::JSONIOHandler(
-    std::string const &path,
+    std::string path,
     Access at,
     openPMD::json::TracingJSON jsonCfg,
     JSONIOHandlerImpl::FileFormat format,
     std::string originalExtension)
-    : AbstractIOHandler{path, at}
+    : AbstractIOHandler{std::move(path), at}
     , m_impl{this, std::move(jsonCfg), format, std::move(originalExtension)}
 {}

+#if openPMD_HAVE_MPI
+JSONIOHandler::JSONIOHandler(
+    std::string path,
+    Access at,
+    MPI_Comm comm,
+    openPMD::json::TracingJSON jsonCfg,
+    JSONIOHandlerImpl::FileFormat format,
+    std::string originalExtension)
+    : AbstractIOHandler{std::move(path), at}
+    , m_impl{JSONIOHandlerImpl{
+          this, comm, std::move(jsonCfg), format, std::move(originalExtension)}}
+{}
+#endif
+
 std::future<void> JSONIOHandler::flush(internal::ParsedFlushParams &)
 {
     return m_impl.flush();