Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Put/Span #1322

Open
germasch opened this issue Mar 24, 2019 · 4 comments
Open

Put/Span #1322

germasch opened this issue Mar 24, 2019 · 4 comments

Comments

@germasch
Copy link
Contributor

So I'm kinda reluctant to open yet another issue related to spans, nor do I actually much feel like going through this discussion again. Then again, it's still an open issue, so I'll try to summarize where I think we're at.

  • Put(Sync) and Put(Deferred) -> PerformPuts() mirror MPI_Send and MPI_Isend -> MPI_Wait in terms of when data is ready when data can be touched by whom.

If this is not what ADIOS2's API meant to express, what follows may not apply.

  • A way for the app to write directly into ADIOS2's internal buffer can be useful in certain cases, because it avoids a copy of the data (avoids performance penalty) and allocating a second data buffer in the first place (avoids allocating twice as much memory as actually needed.) (Different kinds of engines and avoiding copies #1315)

I think we agree on that. The current Span interface achieves those goals. It comes with caveats (I'm trying to summarize #1290 primarily -- that's a long thread, so the below won't be complete. If you feel I forgot anything important, let me know and I'll add it):

  • The memory it returns may not be aligned to usual malloc guarantees (Span.Data() alignment #1314). This may or may not be fixable, and it may or may no be an actual problem for users of the Span interface (but it should be documented at least)
  • If the API to return the buffer to be filled to the user is called Put, that breaks the expectation that data is ready at the time of Put, which is currently satisfied by the two existing Put(Sync) and Put(Deferred).
  • The buffer returned to the user is subject to change if the underlying adios2 buffer gets realloced. This should be documented. To me, it also means that the choice "Span" is a questionable name for what's returned, because it was meant to imply an analogy to std::span, but std::span never changes data() under you. Anyway, can be handled with documentation.
  • The current implementation of this interface uses Put to return the buffer, but then doesn't use Put at the time the data is ready, so implicitly, the data is ready at PerformPuts time. To me, that's not consistent with the existing interface, and also means that there is no API to support potential overlap (the Put(Deferred) -> PerformPuts type) for engines that can use it, like InsituMPI.

So what I suggested in #1290 are two changes:

  • rename the Span-returning function to PutPrealloc (The actual name is a suggestion that could be discussed (one alternative would be GetPutBuffer or something like this). I don't think I've found a perfect name for it, but IMO the current choice of Put is already taken to mean something else.
  • use Put(Sync) and Put(Deferred) on prealloced buffers just like on regular buffers.

I had previously put forth these changes in PR #1305. (I'm not asking for it to merged here. It'd need some work and probably would conflict anyway in its old shape). It serves to demonstrate that my suggestion is quite easy straightforward to implement.

Let me try to summarize the arguments against it:

  • Put can be overloaded, so everything Put-related should be called Put.
  • Put(Sync) implies data readiness. Put(Deferred) implies data pointer readiness, but not data content readiness, which in this case happens at PerformPuts. Hence Put doesn't actually imply data content readiness.
  • It's implicitly clear that the new Put overload returns a buffer.
  • The returned buffer is initialized (to zero or some other constant value) at prealloc time, so the buffer is actually put into adios2 at that time.
  • Symmetry with Get is broken.

I'm not going to try to rehash the the discussion from #1290 here. I do want to address the symmetry with a future internal-buffer access version of Get a little bit, but if this turns into a discussion on that interface, I think that also better be done in a separate thread:

  • I think there may be a benefit on the Get side of things to have an interface that avoids a copy / duplicate memory as well.
  • Get is actually more complicated, because generally speaking, what's returned to the user is a view into the global array, which may well be assembled for a number of blocks that may even live in different files. Once you have to assemble things, you have to copy anyway, in which case the existing interface will do just fine.
  • However, there is a case where the app may ask for exactly the block of data that was previously written (as one unit) into the file, and on the same proc. That may actually be an important use case, as checkpoint / restart would often see this pattern.
  • I think Get(Sync) would be the right name for a function returning something Span like, with the expectation that the data will be accessible in the returned Span after the Get(Sync) returns.
  • I also think there should be a Get(Deferred) for internal buffer access, that would also return a Span, but with the expectation that the data is not yet available, but only after PerformGets. Same semantics as the current API. This would be useful if, for example, the data to be read actually lives on a different proc, so some MPI_Irecv() could be started, but not yet MPI_Waited for.
  • In either case, when using the internal buffer, there needs to be a mechanism to tell the library that the user is done with using the preallocated data, so that the library can free or reuse the buffer. Something like GetRelease (that's what I just came up with, hopefully someone else will come up with a better choice.)

So what I'm getting at is that there's is no perfect symmetry between Put and Get, there already is not in the existing case. (Sure, there's Get / PerformGets and Put / PerformPuts. But the underlying semantics is actually quite different, as to when the data is ready, e.g., One takes a T*, the other a const T*, and for good reason.) An internal-buffer based Get should maintain the semantics of Get/PerformGets, but not expect to get add an additional function, just like on the Put side.

@germasch
Copy link
Contributor Author

Alright, so last thing, to actually add something new, the Span interface returning (essentially) a pointer into an internal buffer:

  • Doing so is dangerous. If the app writes beyond the extent, internals of the BP3 file will be messed up. Then again, that's actually true always, if you write out-of-bounds of the memory region that was allocated, all bets are off.
  • Having the pointer change under you is dangerous. When I first wrote some tests for this interface, I didn't expect that behavior, so I wrote incorrect code -- not because I wanted to demonstrate that it can be used incorrectly, but because it happened.

I think these are two important drawbacks, but essentially I think one has to just live with them, because there is no better way (that I can think of) short of reengineering the BP3 Serializer completely to use a chain of buffers or something like that, which might be a way to prevent the 2nd issue, but I don't think it's an option any time soon. So mostly what this means is just that these issues should be documented clearly.

I still have some suggestions that you could consider:

  • Having internal buffer realloc'ed is not only bad because it invalidates the Span's data() pointer, but also because you're presumably using this interface for performance reasons in the first place (though it should be noted that realloc is often quite fast these days). So while this in general can't be prevented, one could do the following:
  • Add a PutReserve API call, which you can use to tell the engine the total amount of buffer space you will need. The engine can do the realloc at the time, before handing out any pointers that'll change. (Note: I think that's kinda possible right now by some Engine parameter, but how many buffers the app needs may change over the course of the simulation (it does for mine, due to load balancing).
  • Add a lock flag to the PutPrealloc (using my name for it), ideally make it true by default.
    If the underlying buffer runs out of space, if this flag is on, throw an exception rather than changing data() under the user. If the flag is off, keep the current behavior, as the user indicated they know what they're doing and will handle it.

@germasch
Copy link
Contributor Author

germasch commented Apr 3, 2019

I swear I'm not making this up: After updating adios2 to current master, an existing user of adios2 (my particle in cell code) got broken. Simplified snippet:

    unsigned long length = 99;
    writer.Put(var, length);

This used to work. Now:

../src/libpsc/include/grid.inl:20:12: error: call to member function 'Put' is ambiguous
    writer.Put(var, length);
    ~~~~~~~^~~
/Users/kai/local/adios2/include/cxx11/Engine.h:105:5: note: candidate function [with T = unsigned long]
    Put(Variable<T> variable, const size_t bufferID = 0, const T &value = {});
    ^
/Users/kai/local/adios2/include/cxx11/Engine.h:154:10: note: candidate function [with T = unsigned long]
    void Put(Variable<T> variable, const T &datum,

Adding a new overload to Put in the Span-related updates broke existing use by making it ambiguous.

@williamfgc
Copy link
Contributor

@germasch I remember seeing this with my local compilers, thought it's fixed. It seems is only triggered when used or by certain compilers. Thanks for reporting this, I will think of a fix. BTW, for coverage purposes, which compiler are you using?

@germasch
Copy link
Contributor Author

germasch commented Apr 3, 2019

I believe this was with

Apple LLVM version 9.1.0 (clang-902.0.39.2)

It shouldn't be compiler dependent. The issue occurs if you're trying to write a single value with a type that is an alias to size_t. There is an obvious fix: Rename Put[Span-related overload] to PutPrealloc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants