Leverage pyessv #1
@huard, it looks like intake-esm-datastore has similar parsing methods. The main difference is that pyessv taps into the controlled vocabulary to check for valid values. @andersy005, here's a working example that uses pyessv, based on an example notebook they provide:
The code returns:
If you change `Amon` in the path to `Gmon`, it will tell you that this isn't a valid value. Getting the `member_id` and `version` parts of the directory structure to work will take extra code, but I figure this gives us enough of a base for discussion. @andersy005, do you see any other benefits to using this library for the parsing?
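To make the idea concrete without relying on pyessv's API, here is a minimal sketch of path parsing plus vocabulary checks, assuming the CMIP6 directory layout `mip_era/activity_id/institution_id/source_id/experiment_id/member_id/table_id/variable_id/grid_label/version`. The `VOCAB` dict is a hypothetical, hand-written subset standing in for the real controlled vocabulary (which pyessv would supply), and `parse_drs_path`, the `/data` root, and the version string are illustrative only. Since `member_id` and `version` are not enumerated values, they are checked by format instead.

```python
import re
from typing import Dict

# CMIP6 directory facets, in path order (after the archive root).
DRS_FACETS = [
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
]

# Hypothetical, tiny stand-in for the CMIP6 controlled vocabulary.
# A real catalog builder would load the full vocabularies (e.g. via pyessv).
VOCAB = {
    "mip_era": {"CMIP6"},
    "activity_id": {"CMIP", "ScenarioMIP"},
    "institution_id": {"NCAR"},
    "source_id": {"CESM2"},
    "experiment_id": {"historical", "piControl"},
    "table_id": {"Amon", "Omon", "day"},
    "variable_id": {"tas", "pr"},
    "grid_label": {"gn", "gr"},
}

# member_id ([<sub_experiment_id>-]<variant_label>) and version are free-form,
# so this sketch only checks their format.
PATTERNS = {
    "member_id": re.compile(r"^(s\d{4}-)?r\d+i\d+p\d+f\d+$"),
    "version": re.compile(r"^v\d{8}$"),
}


def parse_drs_path(path: str) -> Dict[str, str]:
    """Map the last len(DRS_FACETS) directories of `path` to facets and validate them."""
    parts = [p for p in path.strip("/").split("/") if p]
    if len(parts) < len(DRS_FACETS):
        raise ValueError(f"path has too few components: {path!r}")
    facets = dict(zip(DRS_FACETS, parts[-len(DRS_FACETS):]))
    for name, value in facets.items():
        if name in VOCAB and value not in VOCAB[name]:
            raise ValueError(f"invalid {name}: {value!r}")
        if name in PATTERNS and not PATTERNS[name].match(value):
            raise ValueError(f"malformed {name}: {value!r}")
    return facets


if __name__ == "__main__":
    good = "/data/CMIP6/CMIP/NCAR/CESM2/historical/r1i1p1f1/Amon/tas/gn/v20190308"
    print(parse_drs_path(good)["table_id"])
    try:
        parse_drs_path(good.replace("/Amon/", "/Gmon/"))
    except ValueError as err:
        print(err)
```

Swapping `Amon` for `Gmon` raises a `ValueError` naming `table_id`, which mirrors the behaviour described above; the point of the design is that a malformed path fails loudly instead of silently producing a bad catalog row.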
👍 I like the fact that you can enforce the validity of the parsed attributes.
Our current parsers are fragile because they don't verify that the parsed attributes are valid. I think taking advantage of pyessv's features would make it easy to guarantee that the built catalogs contain only attributes that comply with the controlled vocabularies.
I am transferring this issue to a new repo: https://github.com/NCAR/ecg
@andersy005, since you've started work on getting the information out of the CMIP6 files themselves instead of from file paths, I'm wondering whether this is still needed. File paths change, and that is where this library helps: it verifies the assumptions built into the path parsing. File attributes take more effort to change and are more likely to be correct. All of these attributes are checked against the same controlled vocabulary that pyessv uses before a file can be added to ESGF. If an attribute is inconsistent with the controlled vocabulary, or if one of these attributes is missing, the file cannot be published to ESGF. I feel confident proceeding with harvesting the attributes from the files themselves without adding the extra check. What do you think?
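For comparison, here is a minimal sketch of the attribute-harvesting approach. It assumes the global attributes have already been read into a mapping (for example, xarray's `Dataset.attrs` from `xr.open_dataset(...)`), and the function name `catalog_row_from_attrs` and the `REQUIRED_ATTRS` list are hypothetical choices for illustration. A missing attribute raises an error, loosely mirroring the publication rule described above; the dataset version is not covered here.

```python
from typing import Dict, Mapping

# Global attributes this sketch expects in every CMIP6 file (an assumed subset).
REQUIRED_ATTRS = (
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "sub_experiment_id", "variant_label", "table_id", "variable_id", "grid_label",
)


def catalog_row_from_attrs(attrs: Mapping[str, object]) -> Dict[str, str]:
    """Build one catalog row from a file's global attributes."""
    missing = [name for name in REQUIRED_ATTRS if name not in attrs]
    if missing:
        raise KeyError(f"missing global attributes: {missing}")
    row = {name: str(attrs[name]) for name in REQUIRED_ATTRS}
    # Derive member_id as [<sub_experiment_id>-]<variant_label>
    # instead of assuming it is stored as an attribute.
    sub_exp = row["sub_experiment_id"]
    if sub_exp == "none":
        row["member_id"] = row["variant_label"]
    else:
        row["member_id"] = f"{sub_exp}-{row['variant_label']}"
    return row


if __name__ == "__main__":
    attrs = {
        "mip_era": "CMIP6", "activity_id": "CMIP", "institution_id": "NCAR",
        "source_id": "CESM2", "experiment_id": "historical",
        "sub_experiment_id": "none", "variant_label": "r1i1p1f1",
        "table_id": "Amon", "variable_id": "tas", "grid_label": "gn",
    }
    print(catalog_row_from_attrs(attrs)["member_id"])
```

Unlike the path-based sketch, nothing here is checked against a vocabulary: the trade-off is to rely on the validation ESGF already performed at publication time.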
I'm guessing catalog building and vocabulary validation could leverage https://github.com/ES-DOC/pyessv. See in particular parsers such as https://github.com/ES-DOC/pyessv/blob/master/pyessv/parsers/cmip6_dataset_id.py
I have no experience with it myself, so it may not be a good fit.