Names in API: Feel "More Like XML" #1281

Open
ax3l opened this issue Mar 12, 2019 · 18 comments

@ax3l
Contributor

ax3l commented Mar 12, 2019

Hi,

one piece of feedback we gave @pnorbert and @sklasky at last week's workshop was about the "impression/feeling" that developers and users have when interacting with variable and attribute names in ADIOS files, compared to what they already know (XML, JSON, filesystem trees or HDF5).

As a matter of personal opinion, HDF5 is still around not because of its performance or painful 80s C API but because people can quickly open and browse it like a filesystem and definitely because of h5py. (For the first part we added ADIOS1 to HDFCompass in the past.)
The new ADIOS2 API looks amazing and is super clean, yet as in ADIOS1 one thing is a bit undervalued - hierarchies hidden in "names" (paths).

I am aware that "directory hierarchies" in ADIOS1 and ADIOS2 are internally just flat maps of names, which is fine, yet a user doesn't need to know that in the frontend. What we should add to make .bp file interaction for developers feel like browsing a filesystem is:

  • allow adding "empty groups/directories": for example, one might just create a path and add some attributes to it
  • allow "listing" all kinds of groups, variables or attributes for arbitrary intermediate paths
  • add a "group" object to the API with a changed "base path": get a local, filtered view while interacting with an adios2::IO object

For example in h5py there is a specific Group class that can be interacted with just as an intermediate object to access deeper Groups, Attributes and Variables without specifying the full path. Such a group basically just changes what is viewed as "root" / and otherwise would be similar to adios2::IO.
It would be important to allow "listings" on such a view that are both recursive and non-recursive from the changed "present working dir" in the ADIOS-name (pwd).

Such an intermediate object in the APIs would also be really nice for ADIOS2. It's purely cosmetic and purely optional, yet it makes a world of difference for the developer who expects some kind of "filesystem/XML hierarchy" when writing or reading complex data. With many variables and attributes, we want to imagine putting them in little boxes (directories), and we want to create and annotate such directories with attributes even if we do not yet put variables in them.

If I understood the new APIs correctly, we probably just need an adios2::Group object that is a filtered "view" of an adios2::IO object (i.e. it will prefix all names with the new pwd).
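To make the "prefix all names with the new pwd" idea concrete, here is a minimal sketch. `NameScope`, `Resolve` and `Enter` are hypothetical names invented for illustration and are not part of the ADIOS2 API; the sketch only manipulates strings and assumes the caller would hand the resolved name to the usual IO calls.

```cpp
#include <string>
#include <utility>

// Hypothetical helper, NOT part of ADIOS2: a "view" that only remembers a
// prefix and a separator and prepends them to names given relative to it.
class NameScope
{
public:
    explicit NameScope(std::string prefix = "", std::string separator = "/")
    : m_Prefix(std::move(prefix)), m_Separator(std::move(separator))
    {
    }

    // Full flat name for a name given relative to this scope.
    std::string Resolve(const std::string &relative) const
    {
        if (m_Prefix.empty())
        {
            return relative; // root scope: names are passed through unchanged
        }
        return m_Prefix + m_Separator + relative;
    }

    // Nested scope, comparable to "cd relative".
    NameScope Enter(const std::string &relative) const
    {
        return NameScope(Resolve(relative), m_Separator);
    }

    const std::string &Prefix() const { return m_Prefix; }

private:
    std::string m_Prefix;
    std::string m_Separator;
};
```

With such a view, `NameScope("/some/prefix").Enter("c").Resolve("d/e")` yields the flat name `/some/prefix/c/d/e`, which could then be passed to a call such as `io.InquireVariable<double>(...)`. Because the separator is a constructor argument, a `::` style works the same way.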

Of course the ADIOS2 "name" of a variable and attribute is more general, yet when used with sane "/" separators we should also expose objects that can reflect fine grained access and listing (with changed pwd if you will).

cc @C0nsultant who wrote a python wrapper for ADIOS1 that makes it look like h5py in terms of group handling (can you link it?) and @franzpoeschel who is working around this missing abstraction

ax3l changed the title from Feel "More Like XML" to API: Feel "More Like XML" on Mar 12, 2019
ax3l changed the title from API: Feel "More Like XML" to Names in API: Feel "More Like XML" on Mar 12, 2019
@sklasky

sklasky commented Mar 12, 2019 via email

@ax3l
Contributor Author

ax3l commented Mar 12, 2019

@sklasky thank you for the feedback!

Yes, please see the issues I open as a documentation of ideas. Just so they do not get lost since I am currently involved in writing my thesis and might forget them.

If you agree that the concept of "name groups/paths" fits into ADIOS2's API as described above, this would definitely be something we can work on together. For more intuitive interaction with hierarchies in names, it's probably just a derived adios2::IO helper object with filtered scope. For writing "empty groups", it's likely a metadata addition.

@ax3l
Contributor Author

ax3l commented Mar 12, 2019

@williamfgc do you think it's possible to derive from an adios2::IO object in the way described above to create an adios2::Group object with narrowed scope? Would love your opinion on this.

@williamfgc
Contributor

williamfgc commented Mar 12, 2019

@ax3l are you asking for something like this:

   std::map<std::string, std::map<std::string,std::string> > process_vars_info = io.AvailableVariables("root::process::*"); 

?

Notice how I use :: as in C++ namespaces. We don't want to enforce a single / symbol for hierarchy.

@ax3l
Contributor Author

ax3l commented Mar 12, 2019

Similar, yes. Actually, my issue with many variables and attributes in files is the following:

io.AvailableVariables() (and attributes) is similar to ls -R / on a filesystem.

In most cases when reading ADIOS files, I just want to cd /home/axel and then ls ./* instead of ls -R. Wildcards as in your example would actually be wonderful as well.

Of course we could ask the user for an arbitrary separator.

The added benefit of a lightweight adios2::Group object would be, that I can use the same cd-like syntax for writing. (We write groups of variables in similar prefixes and quite a bunch of attributes per variable).
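The difference between `ls -R /` and `cd` followed by `ls` can be sketched on a flat list of names. `Listing` and `ListHere` are hypothetical names chosen for illustration, not ADIOS2 functions; the sketch assumes a non-empty separator and that the names were collected beforehand, for example from the keys returned by `io.AvailableVariables()`.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Result of a non-recursive listing below one prefix.
struct Listing
{
    std::set<std::string> variables; // names directly below the prefix
    std::set<std::string> groups;    // first path component of deeper names
};

// Hypothetical helper, NOT part of ADIOS2: emulate "cd prefix; ls" on a
// flat list of names. The separator must not be empty.
Listing ListHere(const std::vector<std::string> &names,
                 const std::string &prefix,
                 const std::string &separator = "/")
{
    Listing result;
    const std::string base = prefix + separator;
    for (const std::string &name : names)
    {
        if (name.compare(0, base.size(), base) != 0)
        {
            continue; // outside the current "directory"
        }
        const std::string rest = name.substr(base.size());
        if (rest.empty())
        {
            continue; // the name is the prefix itself plus a separator
        }
        const std::size_t cut = rest.find(separator);
        if (cut == std::string::npos)
        {
            result.variables.insert(rest); // direct child
        }
        else
        {
            result.groups.insert(rest.substr(0, cut)); // deeper entry
        }
    }
    return result;
}
```

Calling `ListHere(names, "/home/axel")` then plays the role of `cd /home/axel` followed by `ls`, while the unfiltered list of names remains the `ls -R /` view.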

@williamfgc
Contributor

williamfgc commented Mar 12, 2019

> Of course we could ask the user for an arbitrary separator.

My personal experience with data hierarchies is that every group/customer has their own ideal of what their data representation should be. ADIOS2 is the library under these higher-level constructs. We can provide basic functionality, but we don't want to restrict users to the Unix filename style '/'. We are not trying to be a hierarchical database, but the engine underneath any style.

That being said, my concern with an adios2::Group object (or any lightweight object that wraps a C++ container, which is different from a typename/using alias) is that we add to the learning curve of the ADIOS2 API when a C++ container would do the job just fine. Keep in mind that AvailableVariables returns a std::map and with that all of its benefits (sorted keys, O(log N) search, and you can manipulate it as your workflow requires with erase, iterators, etc.). Ultimately, we are already giving you a standard balanced search tree; adios2::Group would just be a thin layer.

I wouldn't mind adding the wildcard or regex style to AvailableVariables, you've shown it's worth it.
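As a rough illustration of what the sorted keys buy, a prefix query on the returned map can be written with `lower_bound` alone. This is a sketch under assumptions: `WithPrefix` is a hypothetical helper name, and the two aliases only mirror the map-of-maps type shown earlier in the thread.

```cpp
#include <map>
#include <string>

using Params = std::map<std::string, std::string>;
using VariableMap = std::map<std::string, Params>;

// Hypothetical helper, NOT part of ADIOS2: copy all entries whose key starts
// with `prefix`. Keys sharing a prefix are contiguous in a sorted map, so one
// O(log N) lower_bound finds the start and the loop stops at the first miss.
VariableMap WithPrefix(const VariableMap &all, const std::string &prefix)
{
    VariableMap selected;
    for (auto it = all.lower_bound(prefix);
         it != all.end() && it->first.compare(0, prefix.size(), prefix) == 0;
         ++it)
    {
        selected.insert(*it);
    }
    return selected;
}
```

The prefix is treated as a plain string, so `"root::process::"` and `"/some/prefix/"` work equally well and no separator convention is imposed.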

@williamfgc
Contributor

Similarly, the above applies to Python dictionaries, BTW.

@ax3l
Contributor Author

ax3l commented Mar 12, 2019

Fully agree, we should make this flexible.

The question is if ADIOS/BP is a hierarchical file format or not. If it shall be/appear as one, let us mainline some fundamental helpers to iterate the user incentive of a hierarchy. Since it's fully optional, it will not add to the initial learning curve: it's just a new valid return type in case a non-complete "path" is passed.

// adios2::Engine reader
// adios2::IO io

io.AvailableVariables();
// /usr/bin/bash /lib/libadios /opt/cuda-10/bin/nvcc /home/axel/mails
// /some/prefix/a /some/prefix/b /some/prefix/c/d/e

auto newScope = io("/some/prefix", separator="/");
newScope.AvailableVariables();
// a b c/d/e
newScope.AvailableVariables(adios2::here, separator="/");
// a b

newScope.AvailableGroups(adios2::here, separator="/");
// c

auto oneMore = newScope("c", separator="/");
oneMore.AvailableGroups();
// d/e

// allocate and define: data, start, count
adios2::Variable<double> variable = oneMore.InquireVariable<double>("d/e");
variable.SetSelection({start, count});
reader.Get(variable, data);

@williamfgc
Contributor

> The question is if ADIOS/BP is a hierarchical file format or not

That's for the user/application/consumer-of-adios2 to decide. We provide a flat hierarchy that can be interpreted in an infinite number of ways (as it is today in science anyways). The separator was a later addition, my personal opinion is that those things belong to the application. Should we make it easier for them to enable their data hierarchy? Absolutely.

If you look at your code, those are things that can easily be done outside the adios2 library. It's a matter of how much external workflow code should go into the library (and these will only serve a few users, not all, but we don't want to alienate users by picking a single style).

@williamfgc
Contributor

This is why I am bullish on the regex/wildcard: it's of general use. Other stuff is too Unix-centric, and not everyone follows this hierarchy (even C++ namespaces don't).

@ax3l
Contributor Author

ax3l commented Mar 12, 2019

I think flexible wildcards would help a lot building arbitrary representations and I appreciate the flexibility in naming.

Nevertheless, I think stacked tree hierarchies are so common - everyone does it every day on their personal home directory - that we should provide such an optional "group" class in the mainline. As shown above, it can be fully ignored, just like we also have a simple API. I think this makes for a great gateway for users that switch from other formats and is super helpful in reads.

@williamfgc
Contributor

williamfgc commented Mar 12, 2019

tree hierarchies, data frames, SQL style, node graphs, mesh formats...they can be built on top of the library and we see they are heavily used as we diversify apps usage. I wouldn't mind adding features that serve all of the above, but a dedicated one in particular can give the wrong impression and would add a tremendous cost to the library learning curve.

@sklasky

sklasky commented Mar 12, 2019 via email

@ax3l
Contributor Author

ax3l commented Mar 13, 2019

In openPMD we use a simple tree-like structure.

What do you think about this: I like @williamfgc's approach via general wildcard-based queries. Can we maybe "store" the result of such a query in a general helper object inside the ADIOS mainline, which in turn can be queried again?

So instead of my suggestion above, we could combine it with William's wildcards:

// adios2::Engine reader
// adios2::IO io

io.AvailableVariables();
// /usr/bin/bash /lib/libadios /opt/cuda-10/bin/nvcc /home/axel/mails
// /some/prefix/a /some/prefix/b /some/prefix/c/d/e

auto newScope = io("/some/prefix/*");
newScope.AvailableVariables();
// a b c/d/e

auto newLocalScope = io("/some/prefix/*!(/*)");
newLocalScope.AvailableVariables();
// a b

// newLocalScope.AvailableGroups();
// I can now emulate this via a union of available variables and attributes,
// narrowed down to `*/*` and cropped at the first `/`
// c

auto oneMore = newScope("c/*");
oneMore.AvailableGroups();
// d/e

// allocate and define: data, start, count
adios2::Variable<double> variable = oneMore.InquireVariable<double>("d/e");
variable.SetSelection({start, count});
reader.Get(variable, data);

With that, building all the described structures on the user side will be quite efficient.
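On the user side, a wildcard query of this kind can be approximated today with `std::regex`. The following is only a sketch: `GlobToRegex` and `Match` are hypothetical helpers, `*` is the only wildcard supported (it also matches across separators), and the extended `!(...)` negation used in the pseudocode above is not handled.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper, NOT part of ADIOS2: translate a simple glob into an
// ECMAScript regex. '*' matches any run of characters, including separators;
// every other character is matched literally.
std::regex GlobToRegex(const std::string &glob)
{
    static const std::string special = R"(\^$.|?+()[]{})";
    std::string pattern;
    for (const char c : glob)
    {
        if (c == '*')
        {
            pattern += ".*";
        }
        else
        {
            if (special.find(c) != std::string::npos)
            {
                pattern += '\\'; // escape regex metacharacters
            }
            pattern += c;
        }
    }
    return std::regex(pattern);
}

// Return all names that match the glob as a whole.
std::vector<std::string> Match(const std::vector<std::string> &names,
                               const std::string &glob)
{
    const std::regex re = GlobToRegex(glob);
    std::vector<std::string> hits;
    for (const std::string &name : names)
    {
        if (std::regex_match(name, re))
        {
            hits.push_back(name);
        }
    }
    return hits;
}
```

Since the result is again a plain list of names, it can be fed back into `Match` with a narrower pattern, which is roughly the "query the result again" behaviour asked for above.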

I admit that intuitive integration in exploratory tools such as HDFCompass is easier if we can add some kind of optional schema about separators, e.g. an attribute that reflects this. Otherwise we will add this downstream; I see your wish for flexibility.

@sklasky

sklasky commented Mar 13, 2019 via email

@ax3l
Contributor Author

ax3l commented Mar 13, 2019

Thanks, this sounds great!

I will get back to writing now and can provide more examples in a bit more than a month.

For now, just imagine you get a .bp file with 10'000 variables and 50'000 attributes and you want to explore its content and post-process only a specific part of variables in it, depending on what you find. How can we make that experience joyful for all users that do not work with their computers by parsing ls -R / when looking for the latest presentation in their $HOME? How can we naturally describe and iterate user-defined hierarchies?

It's all just "soft skills" but it will help with the adoption if we provide basic functionality for such fundamental workflows, even if it's just in an adios2::helper:: scope.

@sklasky

sklasky commented Mar 13, 2019 via email

@williamfgc
Contributor

williamfgc commented Mar 13, 2019

@ax3l let's not create extra objects when C++ is already doing the work for us. We already have too many in the official API, and each one adds to our maintenance, tutorials, example material, learning curve, etc. Ultimately, this is what we are talking about: https://stackoverflow.com/questions/17253690/finding-in-a-std-map-using-regex
@sklasky I'd be happy to work with @ax3l on this wildcard/regex.

In fact, to answer @ax3l's question, bpls already has pattern support: https://stackoverflow.com/questions/17253690/finding-in-a-std-map-using-regex. And bpls is a utility built on top of the library.
