Why is this feature important?
I am setting up a simulation workflow using SST that has a large memory footprint, hence I specify "QueueLimit" = "1". However, the second step is still opened, and accordingly a second buffer is allocated. The writer is only blocked at Engine::EndStep(), effectively implying a memory footprint as if specifying QueueLimit=2.
Attached is the memory footprint of a simulation that outputs 4 ADIOS2 steps, each with a size of 40 GB per rank. Every step after the first has a 40 GB higher memory footprint than the first, because two steps are held in memory at once.
In this picture, the red area (arrows) is held by the SST buffer, the rest (orange and others) is PIConGPU memory outside ADIOS2.
What is the potential impact of this feature in the community?
Memory-sensitive setups can make better use of SST.
The simulation is blocked at a place that makes more sense (blocking at EndStep does not make a lot of sense; at this point the data can also be published for the benefit of fast readers).
Is your feature request related to a problem? Please describe.
Described above.
Describe the solution you'd like and potential required effort
Instead of blocking the Engine::EndStep() call of the second step, it would be helpful for us if the Engine::BeginStep() would already be blocked.
The potential effort depends on the implementation and internal logic of SST; from my superficial point of view, it might be relatively small.
Describe alternatives you've considered and potential required effort
There is effectively no way currently to specify a QueueLimit that is really 1. Specifying a QueueLimit n implies a memory usage of (n+1) times the buffer size.
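The n+1 relationship can be illustrated with a toy model. This is only a sketch of the behavior described in this issue, not SST's actual implementation: `peak_buffers` and its `block_at` argument are hypothetical names, and the reader is assumed never to drain the queue (the worst case for a slow reader).

```python
def peak_buffers(queue_limit, block_at, steps=4):
    """Toy model: how many step buffers are resident at once if the reader never drains.

    block_at is "EndStep" (behavior reported in this issue) or
    "BeginStep" (behavior requested in this issue). Both names are
    illustrative labels, not ADIOS2 parameters.
    """
    if block_at not in ("BeginStep", "EndStep"):
        raise ValueError(f"unknown block_at: {block_at}")

    queued = 0  # completed steps waiting for the reader
    peak = 0
    for _ in range(steps):
        if block_at == "BeginStep" and queued >= queue_limit:
            break  # writer stalls before a new buffer is allocated
        open_buffers = 1  # BeginStep allocates the buffer for the new step
        peak = max(peak, queued + open_buffers)
        if block_at == "EndStep" and queued >= queue_limit:
            break  # writer stalls while still holding the open buffer
        queued += 1  # EndStep hands the buffer over to the queue
    return peak
```

With 40 GB per step and a queue limit of 1, this model gives 2 × 40 GB = 80 GB when the writer blocks at EndStep and 40 GB when it blocks at BeginStep, which matches the footprint described above.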
Additional context
I reused the example from #3453 for a small demonstration of this behavior:
Writer
The reader sleeps for 15 seconds after opening the first step.
The output up until 15 seconds is the following, implying that the writer is allowed to create 2 steps:
Unfortunately, BeginStep is not a collective call on any engine as far as I can tell; it just does local operations. Giving it the ability to block on queue size would at least require making it collective so that all ranks could do the same thing. I think the other engines are done with the data by the time they exit EndStep, with SST being unique in that it is not. Maybe we could make the argument that it's OK for BeginStep to be collective in SST and not in others? Maybe only collective if there's a queue limit set? Or maybe we should introduce another call that might block waiting for the last timestep to be consumed before continuing (which would be more flexible than just making that be BeginStep)?
Hello Greg, thanks for the answer. To be fair, I was not aware that BeginStep() is not collective; so far we have always treated it as a collective call. So, speaking from our perspective only, making it collective in SST is no problem.
> Maybe only collective if there's a queue limit set? Or maybe we should introduce another call that might block waiting for the last timestep to be consumed before continuing (which would be more flexible than just making that be BeginStep)?
Maybe an engine parameter? E.g., as new options for QueueFullPolicy: BlockAtBeginStep, BlockAtEndStep.
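To make the proposal concrete, here is a minimal sketch of how the two policies could differ. The policy names are the ones suggested in this thread and do not exist in ADIOS2 today; the `StepQueue` class, its methods, and the `timeout` arguments are purely illustrative and are not SST internals. The sketch models a single rank, so it ignores the question of whether BeginStep would need to become collective.

```python
import threading


class StepQueue:
    """Single-rank toy model of a writer-side step queue (hypothetical, not SST code)."""

    POLICIES = ("BlockAtBeginStep", "BlockAtEndStep")

    def __init__(self, queue_limit, policy):
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.buffers = 0  # step buffers currently allocated (open + queued)
        self._slots = threading.Semaphore(queue_limit)  # free queue slots

    def begin_step(self, timeout=None):
        """Allocate a step buffer; return False if waiting for a slot timed out."""
        if self.policy == "BlockAtBeginStep":
            if not self._slots.acquire(timeout=timeout):
                return False  # queue full: no new buffer was allocated
        self.buffers += 1
        return True

    def end_step(self, timeout=None):
        """Queue the open step; return False if waiting for a slot timed out."""
        if self.policy == "BlockAtEndStep":
            return self._slots.acquire(timeout=timeout)
        return True  # the slot was already reserved in begin_step

    def release_step(self):
        """Called once the reader has consumed the oldest queued step."""
        self.buffers -= 1
        self._slots.release()
```

In this model, with a queue limit of 1 and a reader that has not consumed anything yet, the second `begin_step` under `BlockAtBeginStep` stalls with one buffer allocated, whereas under `BlockAtEndStep` the second `end_step` stalls with two buffers allocated.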