API Docs/Cheet Sheet: Collective, Blocking and Memory Contracts #1268

Open
ax3l opened this issue Mar 8, 2019 · 11 comments

ax3l commented Mar 8, 2019

Hi,

we are currently sitting in the ADIOS2 tutorial at TU Dresden and as with ADIOS1, the following would be extremely useful:

A cheat sheet (or API documentation) of all public API calls (engine open/close, defines, step begin/end, puts) that documents for each of those:

  • is it MPI collective or local?
  • is it MPI blocking/non-blocking?
  • document memory contracts: when is a user allowed or forbidden to modify shared buffers and what will happen if the user does (ignored, overwritten, undefined behavior, etc.)
  • does any of the above change with choice of engine?

What I personally like a lot about ADIOS1 is that most calls on variables are local, non-blocking and not even collective in the MPI sense. Just explicitly documenting this will make starting with ADIOS2 so much easier for folks that already know MPI (everyone in HPC).

@ax3l ax3l changed the title API Docs/Cheet Sheet API Docs/Cheet Sheet: Collective, Blocking and Memory Contracts Mar 8, 2019
@williamfgc

@ax3l point taken. Thanks for the feedback as usual. In the meantime, have you had the chance to look at https://adios2.readthedocs.io/en/latest/ ? Would the section on language API bindings (uses doxygen) help? We are always improving the docs based on applications feedback. Thanks!

@williamfgc williamfgc self-assigned this Mar 8, 2019

ax3l commented Mar 8, 2019

Thanks, we enjoy the staging demos very much over here.

https://adios2.readthedocs.io

Argh, I have overlooked that in the README. Do you want to link it in the repo URL (next to the repo description on GitHub) just to make it more prominent? Will dig in.

williamfgc commented Mar 8, 2019

@ax3l good idea. Feel free to provide feedback if the docs are not clear (I will add the cheat sheet on MPI-related calls). All the https://adios2.readthedocs.io sources live under ADIOS2/docs, and we use the breathe package to pull doxygen API info from the C++ and C headers under ADIOS2/bindings. I haven't found a good way to automate the Python and Fortran API docs, since those languages do not work well with doxygen.

germasch commented Mar 8, 2019

  • is it MPI collective or local?

Let me second this particular point, as that is something that I'm still not clear on. The only thing I've seen in the docs is

Always pass MPI_COMM_SELF if an Engine lives in only one MPI process. Open and Close are collective operations.

Which makes sense, but I'm sure that's not all that's collective. In particular, how about PerformPuts()? (I'd expect that to be collective) And a simple Put()? (I'd expect that to be local for, e.g. BP3, for sure, but I'm much less sure what happens if you're using some M procs -> N files mapping?)

williamfgc commented Mar 8, 2019

@germasch @ax3l this is where the virtual nature of the abstract Engine functions becomes relevant. The only functions that are always collective and common to all Engines are the ADIOS constructor/destructor (this is something @germasch introduced) and Open/Close. Each Engine decides the nature of its virtual calls. In BP, EndStep is collective by default, but certain parameters make it non-collective (CollectiveMetadata=OFF), partially collective (CollectiveMetadata=OFF and M-to-N), or collective on some steps and not on others (when buffering a certain number of steps). Buffer size might also make certain Put operations partially collective, but they are mostly local (including PerformPuts).
For BP3 parameters: https://adios2.readthedocs.io/en/latest/engines/engines.html#bp3-default
Other engines act differently according to their parameters, while sticking to the same API and memory contracts.

Keep in mind, the main difference between ADIOS1 and ADIOS2 is that ADIOS2 is a framework abstraction with key/value parameters that can alter the internal nature of the Engine virtual functions to improve performance. For the most part, IO (except Open), Variable, Attribute and Operator functions are local. Each Engine needs to document its abstraction and the parameters that tailor it to specific use cases.
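
The classification in this comment can be written down as a small lookup table, which is roughly what the requested cheat sheet would look like. This is only a sketch in plain Python that does not call ADIOS2; the names `Scope`, `CHEAT_SHEET` and `safe_in_branch` are hypothetical, and the entries merely restate what is said above for the BP defaults.

```python
from enum import Enum


class Scope(Enum):
    COLLECTIVE = "collective"    # every rank of the communicator must make the call
    LOCAL = "local"              # a single rank may make the call on its own
    ENGINE_DEPENDENT = "engine"  # decided by the engine and its parameters


# Hypothetical table: entries restate the comment above, nothing is read from ADIOS2.
CHEAT_SHEET = {
    "ADIOS constructor/destructor": Scope.COLLECTIVE,
    "Open": Scope.COLLECTIVE,
    "Close": Scope.COLLECTIVE,
    "EndStep": Scope.ENGINE_DEPENDENT,      # collective by default in BP
    "Put": Scope.ENGINE_DEPENDENT,          # mostly local in BP
    "PerformPuts": Scope.ENGINE_DEPENDENT,  # mostly local in BP
    "IO functions (except Open)": Scope.LOCAL,  # local "for the most part"
    "Variable functions": Scope.LOCAL,
    "Attribute functions": Scope.LOCAL,
    "Operator functions": Scope.LOCAL,
}


def safe_in_branch(call: str) -> bool:
    """True only when the table marks the call as local; unknown calls are not trusted."""
    return CHEAT_SHEET.get(call, Scope.ENGINE_DEPENDENT) is Scope.LOCAL
```

Treating engine-dependent and unknown calls as unsafe is a deliberately conservative choice: a call that is local under one set of parameters may become collective under another.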

williamfgc commented Mar 8, 2019

@germasch @ax3l since you guys are physicists and comfortable with numerical solvers, we can make an analogy to the PETSc KSP interface: each solver/engine allows for a set of parameters (e.g. preconditioners, accuracy, norm type) to relax, optimize or improve the solution based on the nature of your problem (e.g. matrix type, partition used, communication, ...). ADIOS2 is no different in that regard. Engines need better docs, though. Hope this helps.

germasch commented Mar 8, 2019

Thanks, that makes sense. As you say, it'd be good to have this documented, in particular for cases where, e.g., a Put() is usually not collective but may sometimes be if, say, a buffer is full.

ax3l commented Mar 9, 2019

Thanks for the details!

Yes, the idea is basically for users to know what they can and cannot write inside an if, because a call might block, which leads to deadlocks, or it might be collective in some cases. Examples are ranks that contribute to the engine's MPI communicator but add zero data (e.g. because they temporarily only model a vacuum without particles in an MD sim, etc.).

Similar to C++ noexcept/throw(), the user must somehow be made aware of what is safe to put in a branched execution and what is not.
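
To make the zero-data case concrete, here is a toy simulation in plain Python. It assumes a collective call behaves like a barrier across all ranks; threads stand in for MPI ranks, `threading.Barrier` stands in for the collective call, and the function name `run_step` and its parameters are hypothetical. A timeout replaces what would be a real hang under MPI.

```python
import threading


def run_step(counts, guard_collective, timeout=0.2):
    """Simulate one output step; counts[i] is the number of items on rank i.

    Returns one status per rank: "ok", "skipped" or "stuck".
    """
    barrier = threading.Barrier(len(counts))  # stand-in for a collective call
    status = [None] * len(counts)

    def rank(i):
        has_data = counts[i] > 0
        if has_data:
            pass  # a purely local call would go here; branching on it is safe
        if guard_collective and not has_data:
            status[i] = "skipped"  # the mistake: this rank never reaches the collective
            return
        try:
            barrier.wait(timeout=timeout)
            status[i] = "ok"
        except threading.BrokenBarrierError:
            status[i] = "stuck"  # with real MPI this rank would wait forever

    threads = [threading.Thread(target=rank, args=(i,)) for i in range(len(counts))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return status


if __name__ == "__main__":
    print(run_step([3, 0, 2], guard_collective=True))
    print(run_step([3, 0, 2], guard_collective=False))
```

With the collective call guarded by the data check, the ranks that do have data are left waiting for the empty rank. When every rank makes the call regardless of how much data it holds, all of them complete.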

@williamfgc williamfgc mentioned this issue Mar 13, 2019

ax3l commented Mar 14, 2019

#1286 addresses MPI questions, thanks a lot!

Maybe, if I haven't missed it, the memory contracts could be described on Put/Get, Begin/EndStep and Perform* in detail as well?

@williamfgc

@ax3l https://adios2.readthedocs.io/en/latest/components/components.html#engine-api-functions I'll try to make this section more clear and improve the narrative on memory contracts. The API docs, too. Thanks!

@williamfgc

Read the Docs now has a section on Put and Get memory contracts for the pointer (address) and data contents: https://adios2.readthedocs.io/en/latest/components/components.html#put-modes-and-memory-contracts

https://adios2.readthedocs.io/en/latest/components/components.html#get-modes-and-memory-contracts

They are similar to the C++ deferred launch mode: https://en.cppreference.com/w/cpp/thread/launch
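
The deferred analogy can be illustrated with a toy model in plain Python. This is not the ADIOS2 API: `ToyEngine`, `put` and `perform_puts` are hypothetical names, and the sketch assumes that a deferred put only records a reference to the user buffer until a later perform call, while a synchronous put copies the data right away.

```python
class ToyEngine:
    """Toy stand-in for an engine, used only to show the memory contract."""

    def __init__(self):
        self._pending = []  # references to user buffers from deferred puts
        self.written = []   # snapshots of what was actually "written"

    def put(self, buffer, deferred=True):
        if deferred:
            self._pending.append(buffer)       # keeps a reference, no copy yet
        else:
            self.written.append(list(buffer))  # copies now, buffer is reusable

    def perform_puts(self):
        for buffer in self._pending:
            self.written.append(list(buffer))  # contents are read only here
        self._pending.clear()


if __name__ == "__main__":
    engine = ToyEngine()
    data = [1, 2, 3]
    engine.put(data, deferred=True)
    data[0] = 99           # modified before perform_puts: the change leaks through
    engine.perform_puts()
    print(engine.written)  # [[99, 2, 3]]
```

Under this model, the user must leave a buffer untouched between a deferred put and the perform call, whereas a buffer passed to a synchronous put can be reused as soon as the call returns.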
