Names in API: Feel "More Like XML" #1281
Comments
Hi Axel
One of the things with ADIOS is that it’s open source, and it would be great to have someone in Dresden who could implement this in ADIOS 2 so the community would have this. Do you see this as much work? We could provide some assistance to whoever would do this. I like what you are saying, but since we have some critical deadlines associated with different ECP apps, we need more assistance with these valuable requests.
Thanks
Scott
…________________________________
From: Axel Huebl <[email protected]>
Date: March 12, 2019 at 9:02:32 AM GMT+1
To: ornladios/ADIOS2 <[email protected]>
Cc: Klasky, Scott A. <[email protected]>, Mention <[email protected]>
Subject: [ornladios/ADIOS2] Feel "More Like XML" (#1281)
Hi,
one feedback we gave @pnorbert<https://github.com/pnorbert> and @sklasky<https://github.com/sklasky> in our last week's workshop was about the "physiological feeling" that developers and users have when interacting with ADIOS files compared to what they know (XML or HDF5).
As a matter of personal opinion, HDF5 is still around not because of its performance or painful 80s C API but because people can quickly open and browse it like a filesystem and definitely because of h5py. (For the first part we added ADIOS1 to HDFCompass<https://github.com/HDFGroup/hdf-compass> in the past.)
The new ADIOS2 API looks amazing and is super clean, yet as in ADIOS1 one thing is a bit undervalued - hierarchies hidden in "names" (paths).
I am aware that "directory hierarchies" in ADIOS1 and 2 are internally just flat maps of names, which is fine, yet a user doesn't need to know that in the frontend. What we should add to make .bp file interaction for developers feel like browsing a filesystem is:
* allow to add "empty groups/directories": for example one might just create a path and add some attributes to it
* allow to "list" all kinds of groups, variables or attributes for arbitrary intermediate paths
* add a "group" object to the API with changed "base path": get a local, filtered view while interacting with an adios::IO object
For example in h5py there is a specific Group class that can be interacted with just as an intermediate object to access deeper Groups, Attributes and Variables without specifying the full path. Such a group basically just changes what is viewed as "root" / and otherwise would be similar to adios2::IO.
It would be important to allow "listings" on such a view that are both recursive and non-recursive from the changed "present working dir" in the ADIOS-name (pwd).
Such an intermediate object in the APIs would also be really nice for ADIOS2. It's purely cosmetic and purely optional, yet it makes a world of difference for the developer who expects some kind of "filesystem/XML hierarchy" when writing or reading complex data. With many variables and attributes, we just want to imagine ourselves putting them in little boxes (directories), and we want to create and annotate such directories with attributes even if we do not yet put variables in them.
If I understood the new APIs correctly, we probably just need an adios::Group object that is a filtered "view" of an adios::IO object.
Of course the ADIOS2 "name" of a variable and attribute is more general, yet when used with sane "/" separators we should also expose objects that can reflect fine grained access and listing (with changed pwd if you will).
cc @C0nsultant<https://github.com/C0nsultant> who wrote a python wrapper for ADIOS1 that makes it look like h5py in terms of group handling (can you link it?)
|
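To make the "filtered view" idea from the proposal concrete, here is a minimal sketch, assuming the variable names live in a flat sorted map. `FlatNames`, `GroupView`, `Names`, `Open` and `FullName` are hypothetical names chosen for illustration; none of them is part of the ADIOS2 API, and a plain `std::map` stands in for whatever `adios2::IO` holds internally.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a flat name -> type-string map (illustrative only, not the ADIOS2 API).
using FlatNames = std::map<std::string, std::string>;

// Hypothetical read-only "group" view: nothing but a prefix applied to a flat name map.
class GroupView {
public:
    GroupView(const FlatNames& names, std::string prefix, char sep = '/')
        : names_(&names), prefix_(std::move(prefix)), sep_(sep) {
        // Normalize so the prefix always ends with the separator.
        if (prefix_.empty() || prefix_.back() != sep_) prefix_ += sep_;
    }

    // Recursive listing: every name below the prefix, relative to it.
    std::vector<std::string> Names() const {
        std::vector<std::string> out;
        // Keys that share a prefix are contiguous in a sorted map.
        for (auto it = names_->lower_bound(prefix_);
             it != names_->end() &&
             it->first.compare(0, prefix_.size(), prefix_) == 0;
             ++it) {
            out.push_back(it->first.substr(prefix_.size()));
        }
        return out;
    }

    // Descend into a sub-group; only the prefix changes.
    GroupView Open(const std::string& sub) const {
        return GroupView(*names_, prefix_ + sub, sep_);
    }

    // Translate a relative name back into the full flat name.
    std::string FullName(const std::string& rel) const { return prefix_ + rel; }

private:
    const FlatNames* names_;
    std::string prefix_;
    char sep_;
};
```

With names such as `/some/prefix/a` and `/some/prefix/c/d/e` in the map, `GroupView(names, "/some/prefix").Names()` lists `a` and `c/d/e`, and `Open("c")` narrows the view further. The underlying map stays flat; the view only rewrites names on the way in and out.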
@sklasky thank you for the feedback! Yes, please see the issues I open as a documentation of ideas. Just so they do not get lost since I am currently involved in writing my thesis and might forget them. If you agree that the concept of "name groups/paths" fits conceptually into ADIOS2's API as described above, this would definitely be something we can work on together. For more intuitive interaction with hierarchies in names , it's probably just a derived |
@williamfgc do you think it's possible to derive an |
@ax3l are you asking for something like this:
? Notice how I use |
Similar, yes. Actually, my issue with many variables and attributes in files is the following:
In most cases when reading ADIOS files, I just want to Of course we could ask the user for an arbitrary separator. The added benefit of a lightweight |
My personal experience with data hierarchies is that every group/customer has their own ideal of what their data representation should be. ADIOS2 is the library under these higher-level constructs. We can provide basic functionality, but we don't want to restrict users to the Unix filename style '/'. We are not trying to be a hierarchical database, but the engine underneath any style. That being said, about the I wouldn't mind adding the wildcard or regex style to AvailableVariables, you've shown it's worth it. |
Similarly, the above applies to Python dictionaries, BTW. |
Fully agree, we should make this flexible. The question is if ADIOS/BP is a hierarchical file format or not. If it shall be/appear as one, let us mainline some fundamental helpers to iterate the user incentive of a hierarchy. Since it's fully optional, it will not add to the initial learning curve: it's just a new valid return type in case a non-complete "path" is passed.
// adios2::Engine reader
// adios2::IO io
io.AvailableVariables();
// /usr/bin/bash /lib/libadios /opt/cuda-10/bin/nvcc /home/axel/mails
// /some/prefix/a /some/prefix/b /some/prefix/c/d/e
auto newScope = io("/some/prefix", separator="/");
newScope.AvailableVariables();
// a b c/d/e
newScope.AvailableVariables(adios2::here, separator="/");
// a b
newScope.AvailableGroups(adios2::here, separator="/");
// c
auto oneMore = newScope("c", separator="/");
oneMore.AvailableGroups();
// d/e
// allocate and define: data, start, count
adios2::Variable<double> variable = oneMore.InquireVariable<double>("d/e");
variable.SetSelection({start, count});
reader.Get(variable, data); |
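The non-recursive listing in the proposal above (variables directly "here" versus sub-groups) can be done on plain strings. The following is a rough sketch under that assumption; `Listing` and `ListHere` are hypothetical helper names, not ADIOS2 functions, and the input is just a list of full names such as the keys returned by a name query.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Result of a non-recursive listing (hypothetical type, for illustration).
struct Listing {
    std::vector<std::string> variables; // names directly under the prefix
    std::set<std::string> groups;       // first path component of deeper names
};

// Split the names below `prefix` into direct children and sub-groups.
Listing ListHere(const std::vector<std::string>& names, std::string prefix,
                 char sep = '/') {
    // Normalize so the prefix always ends with the separator.
    if (prefix.empty() || prefix.back() != sep) prefix += sep;
    Listing out;
    for (const auto& name : names) {
        // Skip everything outside the requested prefix.
        if (name.compare(0, prefix.size(), prefix) != 0) continue;
        const std::string rest = name.substr(prefix.size());
        if (rest.empty()) continue;
        const auto pos = rest.find(sep);
        if (pos == std::string::npos) {
            out.variables.push_back(rest);        // leaf at this level
        } else {
            out.groups.insert(rest.substr(0, pos)); // deeper: record the group once
        }
    }
    return out;
}
```

For the example names in this comment, `ListHere(names, "/some/prefix")` yields the variables `a` and `b` and the single group `c`. The separator is a parameter, so nothing here forces the Unix-style `/`.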
That's for the user/application/consumer-of-adios2 to decide. We provide a flat hierarchy that can be interpreted in an infinite number of ways (as it is today in science anyways). The separator was a later addition; my personal opinion is that those things belong to the application. Should we make it easier for them to enable their data hierarchy? Absolutely. If you look at your code, those are things that can easily be done outside the adios2 library. It's a matter of how much external workflow code should go into the library (and these will only serve a few users, not all, but we don't want to alienate users by picking a single style). |
This is why I am bullish on the regex/wildcard: it's of general use. Other stuff is too Unix-centric, and not everyone follows this hierarchy (even C++ namespaces don't). |
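As an illustration of how a wildcard query could sit on top of regex matching, here is a small sketch that translates a shell-style pattern into a `std::regex`. `GlobToRegex` and `GlobMatch` are hypothetical helpers, not ADIOS2 functions, and in this sketch `*` is assumed to match across separators too, which keeps it independent of any particular hierarchy style.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Translate a simple wildcard pattern into ECMAScript regex syntax.
// '*' matches any run of characters, '?' matches exactly one character.
std::string GlobToRegex(const std::string& glob) {
    std::string out;
    for (char c : glob) {
        switch (c) {
        case '*': out += ".*"; break;
        case '?': out += '.'; break;
        // Escape characters that are special in a regex.
        case '.': case '+': case '(': case ')': case '[': case ']':
        case '{': case '}': case '^': case '$': case '|': case '\\':
            out += '\\';
            out += c;
            break;
        default: out += c;
        }
    }
    return out;
}

// True if the whole name matches the wildcard pattern.
bool GlobMatch(const std::string& glob, const std::string& name) {
    return std::regex_match(name, std::regex(GlobToRegex(glob)));
}
```

A call like `GlobMatch("/some/prefix/*", name)` then selects everything below that prefix, while a pattern such as `*::detail::*` would work equally well for names that use a different separator.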
I think flexible wildcards would help a lot building arbitrary representations and I appreciate the flexibility in naming. Nevertheless, I think stacked tree hierarchies are so common - everyone does it every day on their personal home directory - that we should provide such an optional "group" class in the mainline. As shown above, it can be fully ignored, just like we also have a simple API. I think this makes for a great gateway for users that switch from other formats and is super helpful in reads. |
tree hierarchies, data frames, SQL style, node graphs, mesh formats...they can be built on top of the library and we see they are heavily used as we diversify apps usage. I wouldn't mind adding features that serve all of the above, but a dedicated one in particular can give the wrong impression and would add a tremendous cost to the library learning curve. |
I know we just support several applications, and one of the applications is the WDMApp. We might have a movement to use openPMD, and Axel and Michael are two of the creators. This comes from an email exchange with Jean Luc and Amitava. So I need to understand if openPMD will need this. I should also find out from Chuck and Berk if the schema for viz will require this functionality as well.
If they do, then I need to understand a work estimate so we can figure out who will create this and when.
Thanks
Scott
|
In openPMD we use a simple tree-like structure. What do you think about this: I like @williamfgc's approach via general wildcard-based queries. Can we maybe "store" the result of such a query in a general helper object inside the ADIOS mainline, which in turn can be queried again? So instead of my suggestion above we could do a combination with William's wildcards:
// adios2::Engine reader
// adios2::IO io
io.AvailableVariables();
// /usr/bin/bash /lib/libadios /opt/cuda-10/bin/nvcc /home/axel/mails
// /some/prefix/a /some/prefix/b /some/prefix/c/d/e
auto newScope = io("/some/prefix/*");
newScope.AvailableVariables();
// a b c/d/e
auto newLocalScope = io("/some/prefix/*!(/*)");
newLocalScope.AvailableVariables();
// a b
// newLocalScope.AvailableGroups();
// I can now emulate this via a union of available variables and attributes,
// narrowed down to `*/*` and cropped at the first `/`
// c
auto oneMore = newScope("c/*");
oneMore.AvailableGroups();
// d/e
// allocate and define: data, start, count
adios2::Variable<double> variable = oneMore.InquireVariable<double>("d/e");
variable.SetSelection({start, count});
reader.Get(variable, data);
With that, building all the described structures on the user side will be quite efficient. I admit that intuitive integration in exploratory tools such as HDFCompass is easier if we can add some kind of optional schema about separators, e.g. an attribute that reflects this, but otherwise we will add this downstream; I see your wish for flexibility. |
Hi Axel,
I think one critical thing for us is to make sure that we can fully support openPMD, since this is going to be part of an ECP activity. One thing I would like from the guys at Kitware (Chuck, Berk) is to make sure that everything we need for the Viz Schema can be fully supported as well.
I would like to have a few motivating examples for this work.
I would also like Jason to chime in for what Radio Astronomy with CASACORE will need to help with this too.
Thanks,
Scott
|
Thanks, this sounds great! I will get back to writing now and can provide more examples in a bit more than a month. For now, just imagine you get a .bp file with 10'000 entries and you want to explore its content and post-process only a specific part of the variables in it. How can we make that experience joyful for all users that do not work with their computers parsing ls -R / when looking for the latest presentation in their $HOME? How can we naturally describe and iterate user-defined hierarchies? It's all just "soft skills" but it will help with the adoption if we provide basic functionality for such fundamental workflows, even if it's just in an adios2::helper:: scope. |
Hi Axel,
I agree you should get back to writing! I am actually talking a lot to Lipeng about metadata management, with sizes along the side you mention…
Scott
|
@ax3l let's not create extra objects when C++ is only doing the work for us....we already have too many in the official API, adds to our maintenance, tutorials, examples material, learning curve, etc. Ultimately, this is what we are talking about: https://stackoverflow.com/questions/17253690/finding-in-a-std-map-using-regex In fact, to answer to @ax3l question |
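The linked approach, regex matching over the keys of a `std::map`, might look roughly like the sketch below. `FlatNames` and `FilterNames` are hypothetical names, and the map is a stand-in for a flat name listing rather than the actual ADIOS2 return type. Because the result is again a plain map, it can be filtered a second time without introducing a new object type.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>

// Stand-in for a flat name -> type-string map (illustrative only, not the ADIOS2 API).
using FlatNames = std::map<std::string, std::string>;

// Return the entries whose full name matches the pattern.
FlatNames FilterNames(const FlatNames& names, const std::regex& pattern) {
    FlatNames out;
    for (const auto& entry : names) {
        if (std::regex_match(entry.first, pattern)) out.insert(entry);
    }
    return out;
}
```

For example, filtering first with `/some/prefix/.*` and then with `/some/prefix/[^/]*` keeps only the names directly under that prefix, which is the "query the result again" pattern discussed earlier in the thread.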
Hi,
one feedback we gave @pnorbert and @sklasky in our last week's workshop was about the "impression/feeling" that developers and users have when interacting with variable and attribute names in ADIOS files compared to what they know (XML, JSON, filesystem trees or HDF5).
As a matter of personal opinion, HDF5 is still around not because of its performance or painful 80s C API but because people can quickly open and browse it like a filesystem and definitely because of h5py. (For the first part we added ADIOS1 to HDFCompass in the past.)
The new ADIOS2 API looks amazing and is super clean, yet as in ADIOS1 one thing is a bit undervalued - hierarchies hidden in "names" (paths).
I am aware that "directory hierarchies" in ADIOS1 and 2 are internally just flat maps of names, which is fine, yet a user doesn't need to know that in the frontend. What we should add to make .bp file interaction for developers feel like browsing a filesystem is:
* allow to add "empty groups/directories": for example one might just create a path and add some attributes to it
* allow to "list" all kinds of groups, variables or attributes for arbitrary intermediate paths
* add a "group" object to the API with changed "base path": get a local, filtered view while interacting with an adios::IO object
For example in h5py there is a specific Group class that can be interacted with just as an intermediate object to access deeper Groups, Attributes and Variables without specifying the full path. Such a group basically just changes what is viewed as "root" / and otherwise would be similar to adios2::IO.
It would be important to allow "listings" on such a view that are both recursive and non-recursive from the changed "present working dir" in the ADIOS-name (pwd).
Such an intermediate object in the APIs would also be really nice for ADIOS2. It's purely cosmetic and purely optional, yet it makes a world of difference for the developer who expects some kind of "filesystem/XML hierarchy" when writing or reading complex data. With many variables and attributes, we just want to imagine ourselves putting them in little boxes (directories), and we want to create and annotate such directories with attributes even if we do not yet put variables in them.
If I understood the new APIs correctly, we probably just need an adios::Group object that is a filtered "view" of an adios::IO object (aka it will prefix all names with the new pwd).
Of course the ADIOS2 "name" of a variable and attribute is more general, yet when used with sane "/" separators we should also expose objects that can reflect fine grained access and listing (with changed pwd if you will).
cc @C0nsultant who wrote a python wrapper for ADIOS1 that makes it look like h5py in terms of group handling (can you link it?) and @franzpoeschel who is working around this missing abstraction