Use glossarist for reading yaml files #14

Closed
HassanAkbar opened this issue Aug 3, 2023 · 31 comments · Fixed by #20

@HassanAkbar
Member

We should use Glossarist for reading concept YAML files, as mentioned in #12 (comment).

@HassanAkbar
Member Author

It is mentioned here #12 (comment)

I think this makes sense:

  1. start supporting authoritativeSource,
  2. support current Glossarist 2 model in glossarist-ruby,
  3. switch jekyll-geolexica to use glossarist-ruby with thorough testing on existing website repositories to ensure no undesired changes happen when they are later deployed.

In that order…

@opoudjis @ronaldtse Do we want to use Glossarist model v2 in glossarist-ruby for this, or can we start working on this with v1?

@ronaldtse
Member

@HassanAkbar we want to migrate all files to Glossarist model v2. Can we do it for all our existing repositories?

@HassanAkbar
Member Author

@ronaldtse Currently glossarist-ruby does not support Glossarist model v2. If we are migrating every repo to v2, we need to update glossarist-ruby as well, and at the moment I am not sure what V2 is.

@strogonoff I see that you have worked on updating the isotc211-glossary to V2. Can you help me understand the structure of the V2 glossary?

@ronaldtse
Member

In this case there is no v2 and we should get this working with v1!

@HassanAkbar
Member Author

@ronaldtse @strogonoff Currently the glossary for isotc211-glossary is in V2 and the glossary for osgeo-glossary is in V1, so if we go with V1, isotc211 will break.

@HassanAkbar
Member Author

We can release a new version of Jekyll-geolexica that uses Glossarist model V2, while the sites that still use Glossarist model V1 keep using the old version of Jekyll-geolexica.

@ronaldtse The important thing is that we will drop support for Glossarist model V1 in Jekyll-geolexica. We won't be able to make any changes to the older version of Jekyll-geolexica, so we should prioritize updating the sites to V2 to get any modifications, features, or bug fixes.

@ronaldtse
Member

The important thing is that we will drop support for Glossarist model V1 in Jekyll-geolexica

That's fine to me because we control all those repositories right now. We should bring all of those repositories up to date as soon as possible.

@HassanAkbar
Member Author

@ronaldtse @strogonoff I have a few questions related to Glossarist model V2:

  • In isotc211-glossary, the localized-concept 0033e1ef-60b2-558f-9d0f-b882b4f7da75 has a date of type accepted inside data and a dateAccepted outside data, at the end here.

    The two dates differ. What is the difference between them, and do we need to store them separately inside Glossarist, i.e. concept.data.dates and concept.dates?

  • Can review_decision_date and review_date go inside dates with the types review_decision and reviewed respectively, or are these different?

  • Can we move review_decision_notes in notes with type review_decision?

@strogonoff
Contributor

strogonoff commented Oct 30, 2023

  • What you are asking about mostly concerns register models, not Glossarist models.
    • Only the contents of data are supposed to conform to the Glossarist model. dateAccepted, id, etc. are not part of data.
    • In this YAML, data contains extra fields that should not be there (like review* fields).

In more detail:

  • Review notes, acceptance dates, etc. are part of registration procedures. We implement registration procedures in RegistryKit (which is based on ISO 19135). RegistryKit models are expressed in TypeScript here (we’re working on better documentation), but they are irrelevant to Glossarist. Glossarist model is based on ISO 704/ISO 10241, and registration procedures are not part of Glossarist.
  • In case of that YAML, data contains many extra fields that are not in Glossarist model. That’s likely a mistake. I think the YAML you are seeing is probably output by some data conversion script that doesn’t follow the models as intended.
    • review* fields are not supposed to be there
    • dates list is not supposed to be there
  • We should probably update the script that generates that YAML to generate a proper structure. It probably also outputs wrong dateAccepted.
  • However, we probably don’t need to waste too much time on that script, since per discussions with Reese I think Geolexica won’t deal with registration procedures.
    • From my understanding, Geolexica is used only to present terminology, not to display or manage proposal information, so it should only be concerned with Glossarist models, which do not concern themselves with registration procedures.
    • I’ll check with Reese to what extent Geolexica should deal with registration information (perhaps it will need a bit of it, like dateAccepted).
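The split described above (only the contents of data follow the Glossarist model; id, dateAccepted and similar keys are register information; review* fields and the dates list do not belong in data) can be sketched in Python as below. This is a minimal illustration over an already-parsed YAML mapping, not code from glossarist-ruby or the conversion script; the function name and the exact set of stripped keys are assumptions.

```python
from __future__ import annotations

# Assumed set of fields that do not belong inside `data`, based on the
# review* fields and the `dates` list mentioned in this thread.
EXTRANEOUS_DATA_PREFIXES = ("review",)
EXTRANEOUS_DATA_KEYS = {"dates"}


def split_concept_record(record: dict) -> tuple[dict, dict]:
    """Split one parsed YAML record into (glossarist_data, register_info).

    Only the `data` mapping is treated as Glossarist content; every other
    top-level key (e.g. `id`, `dateAccepted`) is kept as register information.
    """
    raw_data = record.get("data") or {}
    # Drop review* fields and the dates list from the Glossarist part.
    glossarist_data = {
        key: value
        for key, value in raw_data.items()
        if key not in EXTRANEOUS_DATA_KEYS
        and not key.startswith(EXTRANEOUS_DATA_PREFIXES)
    }
    # Everything outside `data` belongs to the register side.
    register_info = {key: value for key, value in record.items() if key != "data"}
    return glossarist_data, register_info
```

A Glossarist reader would keep only the first element of the result; the second would be left to a separate, register-aware component.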

@stefanomunarini

stefanomunarini commented Oct 30, 2023

  • In case of that YAML, data contains many extra fields that are not in Glossarist model. That’s likely a mistake. I think the YAML you are seeing is probably output by some data conversion script that doesn’t follow the models as intended.

    • review* fields are not supposed to be there
    • dates list is not supposed to be there

I can update the script to delete the excess data, as described above.

  • It probably also outputs wrong dateAccepted.

@strogonoff are we talking about the output of a concept or of a localized-concept? In the case of the first, we are setting its value to a dummy date that we retrieve from config.py. In the case of the latter, we retrieve its value from the data itself, if present, or use the same default value as for the first, if not.

How could this be improved?
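For reference, the fallback logic described above can be sketched as follows. DEFAULT_DATE_ACCEPTED is a hypothetical stand-in for the dummy date read from config.py (its real value is not shown in this thread), and source_value is whatever the script finds in the localized concept's own data, or None when absent. Written out this way, it is visible that every concept record, and every localized concept without its own date, ends up with the placeholder, which may be what makes the output dateAccepted look wrong.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical placeholder for the dummy date the script reads from config.py.
DEFAULT_DATE_ACCEPTED = "1970-01-01"


def resolve_date_accepted(kind: str, source_value: Optional[str] = None) -> str:
    """Return the dateAccepted value the conversion script would write.

    kind: "concept" or "localized-concept".
    source_value: the date found in the localized concept's data, if any.
    """
    if kind == "concept":
        # Concepts always receive the dummy default, whatever the data says.
        return DEFAULT_DATE_ACCEPTED
    if kind == "localized-concept":
        # Localized concepts prefer their own value and fall back to the default.
        return source_value if source_value else DEFAULT_DATE_ACCEPTED
    raise ValueError(f"unknown record kind: {kind!r}")
```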

@ronaldtse
Member

Just for the record, we have 2 families of models:

  1. Glossarist models: represents a concept and related things
  2. Register models: represents a register and related things (such as a register item)

In this case, it happens that the Glossarist dataset is managed by a Register. This means that every Glossarist Concept is also a Register Concept (in the new ISO 19135 currently under development; in the current, older version it is a Register Item).

It happens that ISO/TC 211 uses the old Register model, which means that every Concept is a Register Item, and that each concept in the MLGT (the content on isotc211.geolexica.org) is accompanied by some status dates such as "approval date" (this content exists at both the general concept level and the localized concept level).

In an ideal world, the data for the Glossarist models (data content) is separate from the Register models (administrative content). The Register models can refer to the Glossarist models, of course, and vice versa. This way we could use different parsers/model accessors to work with the data:

  • Paneron will need to work with both types of models
  • The Glossarist gem only needs to work with the Glossarist models, and can use a separate gem to handle Register information.
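One way to picture this separation is two independent model types, with the register side holding a reference to the Glossarist side; a one-directional reference is enough for the illustration. The class and field names below are illustrative assumptions, not the actual Glossarist or RegistryKit models.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GlossaristConcept:
    """Data content: the concept itself (ISO 704 / ISO 10241 side)."""

    concept_id: str
    # Language code -> localized concept data (terms, definition, ...).
    localized: dict[str, dict] = field(default_factory=dict)


@dataclass
class RegisterItem:
    """Administrative content (ISO 19135 side) that refers to a concept."""

    item_id: str
    concept: GlossaristConcept
    # Registration details such as acceptance dates live here, not in the concept.
    date_accepted: Optional[str] = None
    status: Optional[str] = None
```

A Glossarist-only reader would construct only GlossaristConcept, while a tool that works with both families, such as Paneron, would also build the RegisterItem wrappers.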

@HassanAkbar
Member Author

@ronaldtse So, for Glossarist we should only read the data inside the data key and discard the other keys in the YAML file.
I also think we should keep the Register data out of Glossarist.
Should we create a separate gem for that, or is there an existing gem we can use?

@HassanAkbar
Member Author

As mentioned by @strogonoff

In case of that YAML, data contains many extra fields that are not in Glossarist model. That’s likely a mistake. I think the YAML you are seeing is probably output by some data conversion script that doesn’t follow the models as intended.

  • review* fields are not supposed to be there
  • dates list is not supposed to be there

@ronaldtse One more question related to this: should I assume that these will be fixed in isotc211-glossary, or should I add these fields to Glossarist temporarily?

@stefanomunarini

As mentioned by @strogonoff

In case of that YAML, data contains many extra fields that are not in Glossarist model. That’s likely a mistake. I think the YAML you are seeing is probably output by some data conversion script that doesn’t follow the models as intended.

  • review* fields are not supposed to be there
  • dates list is not supposed to be there

If I get the green light on this one, I can update the script to fix the data structure by removing the excess data fields. It's straightforward and won't take long.

@ronaldtse
Member

I am lost in this thread. What is still pending?

The goal here is to synchronize the YAML structures for the Glossarist Ruby gem (used by jekyll-geolexica) and the Glossarist plugin.

This means we need to update all the data sets to the latest structure. That's it.

@HassanAkbar
Member Author

In case of that YAML, data contains many extra fields that are not in Glossarist model. That’s likely a mistake. I think the YAML you are seeing is probably output by some data conversion script that doesn’t follow the models as intended.

  • review* fields are not supposed to be there
  • dates list is not supposed to be there

If I get the green light on this one, I can update the script to fix the data structure by removing the excess data fields. It's straightforward and won't take long.

@ronaldtse I just want to confirm: do we need to update this in Glossarist, or fix the data structure?

@ronaldtse
Member

@HassanAkbar so the tricky thing here is about the latest MLGT data which is done using this gem: https://github.com/geolexica/tc211-termbase .

The point is actually to upgrade the tc211-termbase gem to use the glossarist gem.

The input data for the gem is the XLSX file, and the output is a Glossarist YAML ConceptCollection.

@HassanAkbar
Member Author

@ronaldtse Let me summarize what’s going on here.

As I was unfamiliar with Glossarist model V2, @strogonoff suggested that I look at paneron-extension-glossarist/models/concepts.ts to understand it.

While discussing the format, @strogonoff explained that the review* and dates fields do not belong in the model here #14 (comment).

As these fields don't belong to the Glossarist model, I believe we should let @stefanomunarini update the generation script so that the data in isotc211-glossary can be corrected. @stefanomunarini Can you help with that?

@stefanomunarini

As these fields don't belong to Glossarist model, I believe we should let @stefanomunarini update the generation script so that the data in isotc211-glossary can be corrected. @stefanomunarini Can you help with that?

Sure, I've pushed a commit. You can now rerun the script to update the data, @HassanAkbar.

@HassanAkbar
Member Author

HassanAkbar commented Feb 1, 2024

@stefanomunarini I was looking at the isotc211-glossary and it seems like you updated the concepts last time.
Can you let me know the steps needed to generate the isotc211-glossary?

@stefanomunarini

Hi @HassanAkbar please review and merge this PR geolexica/isotc211-glossary#44

@ronaldtse
Member

The content of isotc211-glossary is created by the tc211-termbase gem, which took the XLSX file and processed it into the old Glossarist YAML. Once I get back to the computer I’ll provide you with documentation.

@ronaldtse
Member

@HassanAkbar the tc211-termbase gem is updated at geolexica/tc211-termbase#31. Can you now:

  1. regenerate the isotc211-glossary concept set and push the changes, and
  2. ensure that the output directly works with jekyll-geolexica?

Thanks.

@HassanAkbar
Member Author

@ronaldtse can you let me know where I can get the XLSX file for generating the concepts?

@ronaldtse
Member

@HassanAkbar
Member Author

@ronaldtse I don't think I have access to the https://github.com/ISO-TC211/mlgt-data repo. Can you help with that?

@HassanAkbar
Member Author

@ronaldtse I just saw this issue -> authoritativeSources in localizedConcepts YAML are empty objects.
Currently there is no support for authoritativeSources in Glossarist, and since we use it to generate concepts in tc211-termbase, the output files do not have an authoritativeSources key in localized-concepts.

Should we add this in tc211-termbase, or should I run a separate script after concept generation is complete?
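Whichever option is chosen, a quick check on the generated localized concepts can confirm whether sources are missing or empty. A minimal sketch, assuming each localized concept has already been parsed into a dict and that sources are stored as a list under the camelCase key authoritativeSources used in this thread; the snake_case generator output may name it differently, hence the key parameter. The function name is hypothetical.

```python
from __future__ import annotations


def localized_concepts_missing_sources(
    localized_concepts: dict[str, dict],
    key: str = "authoritativeSources",
) -> list[str]:
    """Return language codes whose sources are absent, empty, or only empty objects."""
    flagged: list[str] = []
    for language_code, concept in localized_concepts.items():
        sources = concept.get(key)
        # Missing key, None or [] is flagged; so is a list of empty objects like [{}].
        if not sources or all(not source for source in sources):
            flagged.append(language_code)
    return flagged
```

Running it before and after the tc211-termbase change would show whether the sources that used to be in the generated output have come back.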

@ronaldtse
Member

@HassanAkbar yes, we should add them in tc211-termbase. Previously there were sources in the generated output; I don't know where they have gone.

@HassanAkbar
Member Author

HassanAkbar commented Feb 26, 2024

@HassanAkbar here's the file:

https://github.com/ISO-TC211/mlgt-data/blob/main/release-6/20231214%20Multi-Lingual%20Glossary%20–%20Published%20__unlocked__%20with%20Math.xlsx

@ronaldtse I've updated the glossary using the above file in this PR -> geolexica/isotc211-glossary#47

I have a couple of questions related to the generated concept files:

  • Currently, the key for localized concepts in the concept files is localized_concepts, because the files are generated using Glossarist and we use the snake_case convention in Glossarist, while in the previous version the key was camelCase, i.e. localizedConcepts. Which should I use now?
  • Also, the register information (info outside the data key) is not being added because Glossarist does not handle it. Should I add it using a script, or should I add this functionality to the tc211-termbase repo?

@ronaldtse
Member

@ronaldtse I've updated the glossary using the above file in this PR -> geolexica/isotc211-glossary#47

I have a couple of questions related to the generated concept files

  • Currently in the concept files the key for localized concepts is localized_concepts because it is generated using the glossarist and we use snake case convention in glossarist, while in the previous version the key was in camel casing i.e. localizedConcepts. So what should I use now?

Use the Glossarist gem convention because jekyll-geolexica also uses the Glossarist gem. Correct?

  • Also the register information (info outside the data key) is not being added because it is not handled in glossarist so should I add that using a script or should I add this functionality in isotc211-termbase repo?

This should be added in the tc211-termbase gem so we can display them in isotc211.geolexica.org.
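If existing camelCase data (localizedConcepts, dateAccepted, and so on) has to be brought in line with the Glossarist gem's snake_case convention, a recursive key rename is one option. This is a sketch under the assumption that each Glossarist key is the mechanical snake_case form of the old camelCase key; that should be checked against the gem's actual key names before use, since the rename also touches every nested mapping key. The function names are hypothetical.

```python
from __future__ import annotations

import re
from typing import Any

# Zero-width match between a lowercase letter/digit and an uppercase letter.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def to_snake(key: str) -> str:
    """Convert one camelCase key, e.g. localizedConcepts -> localized_concepts."""
    return _CAMEL_BOUNDARY.sub("_", key).lower()


def snake_case_keys(node: Any) -> Any:
    """Recursively rename string mapping keys from camelCase to snake_case.

    Values are left untouched; lists are walked so nested mappings are renamed too.
    """
    if isinstance(node, dict):
        return {
            (to_snake(k) if isinstance(k, str) else k): snake_case_keys(v)
            for k, v in node.items()
        }
    if isinstance(node, list):
        return [snake_case_keys(item) for item in node]
    return node
```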

@HassanAkbar
Member Author

HassanAkbar commented Mar 1, 2024

Use the Glossarist gem convention because jekyll-geolexica also uses the Glossarist gem. Correct?

@ronaldtse Currently it is not using the Glossarist gem. I will update jekyll-geolexica next to read the concepts using Glossarist.

4 participants