Fairtracks assembly #1419

Merged
merged 27 commits into from
Dec 20, 2023
Changes from 7 commits
Commits
43caf2c
first draft of FAIRtracks tool assembly
bianchini88 Nov 30, 2023
237c52e
fixing confilcts
bianchini88 Nov 30, 2023
65d8707
adding EMBL-EBI to affiliations
bianchini88 Nov 30, 2023
e17522d
Merge branch 'elixir-europe:master' into fairtracks_assembly
bianchini88 Dec 1, 2023
beea8eb
Update no_resources.md
bianchini88 Dec 1, 2023
eb34e12
Update no_resources.md
bianchini88 Dec 1, 2023
ff25bc5
Update fairtracks_assembly.md
bianchini88 Dec 1, 2023
ee67b3f
Merge branch 'elixir-europe:master' into fairtracks_assembly
bianchini88 Dec 4, 2023
50fcb94
Merge branch 'master' into fairtracks_assembly
bianchini88 Dec 6, 2023
f51c7da
Merge branch 'elixir-europe:master' into fairtracks_assembly
bianchini88 Dec 6, 2023
00437d9
Merge branch 'elixir-europe:master' into fairtracks_assembly
bianchini88 Dec 8, 2023
a2b423d
Update fairtracks_assembly.md
bianchini88 Dec 11, 2023
21c85d8
Merge branch 'elixir-europe:master' into fairtracks_assembly
bianchini88 Dec 14, 2023
e4f6ab3
Merge branch 'elixir-europe:master' into fairtracks_assembly
bianchini88 Dec 18, 2023
bcd32a0
adding link to training events on TeSS
bianchini88 Dec 18, 2023
841d097
Adding TeSS to the omnipy tool entry
bianchini88 Dec 18, 2023
43bb0b4
revision from Sveinung
bianchini88 Dec 18, 2023
f29f7e9
cross-referencing domain pages
bianchini88 Dec 19, 2023
4b3ab9e
revision of FAIRtracks assembly
bianchini88 Dec 19, 2023
d5530b3
Merge branch 'master' into fairtracks_assembly
bianchini88 Dec 19, 2023
41e1494
replacing figure with new version based on feedback from the editors
bianchini88 Dec 19, 2023
a8a5249
Merge branch 'fairtracks_assembly' of https://github.com/bianchini88/…
bianchini88 Dec 19, 2023
82bd6b9
Update news.yml
bianchini88 Dec 19, 2023
429c61f
Update news.yml
bianchini88 Dec 19, 2023
e811427
Merge branch 'elixir-europe:master' into fairtracks_assembly
bianchini88 Dec 19, 2023
a934256
adding newline
bedroesb Dec 20, 2023
9b08d93
Update news.yml
bianchini88 Dec 20, 2023
7 changes: 6 additions & 1 deletion _data/CONTRIBUTORS.yaml
@@ -577,7 +577,12 @@ Styliani-Christina Fragkouli:
git: sfragkoul
orcid: 0000-0003-4067-7123
email: [email protected]
affiliation: Institute of Applied Biosciences(INAB|CERTH) / University of Athens / ELIXIR-GR
affiliation: Institute of Applied Biosciences(INAB|CERTH) / University of Athens / ELIXIR-GR
Sveinung Gundersen:
git: sveinugu
orcid: 0000-0001-9888-7954
email: [email protected]
affiliation: ELIXIR Norway
Diana Pilvar:
git: diana-pilvar
email: [email protected]
6 changes: 6 additions & 0 deletions _data/affiliations.yaml
@@ -160,3 +160,9 @@
expose: yes
type: infrastructure
url: https://www.bbmri.nl/
- name: EMBL-EBI
image_url: /images/institutions/Ebi_official_logo.png
pid: https://ror.org/02catss52
expose: yes
type: project
url: https://www.ebi.ac.uk/
2 changes: 2 additions & 0 deletions _data/sidebars/data_management.yml
@@ -110,6 +110,8 @@ subitems:
url: /covid19_data_portal
- title: CSC
url: /csc_assembly
- title: FAIRtracks
url: /fairtracks_assembly
- title: Galaxy
url: /galaxy_assembly
- title: IFB
32 changes: 32 additions & 0 deletions _data/tool_and_resource_list.yml
@@ -2297,3 +2297,35 @@
registry:
biotools: dataplan
url: https://plan.nfdi4plants.org
- id: omnipy
name: Omnipy
url: https://github.com/fairtracks/omnipy
description:
Omnipy is a high level Python library for type-driven data wrangling and scalable workflow orchestration.
registry:
biotools: omnipy
- id: trackfind
name: TrackFind
url: https://trackfind.elixir.no/
description:
TrackFind is a search and curation engine for metadata of genomic tracks. It supports crawling of the Track Hub Registry and other portals.
registry:
biotools: trackfind
- id: pydantic
name: Pydantic
url: https://docs.pydantic.dev/latest/
description:
Pydantic is the most widely used data validation library for Python.
- id: prefect
name: Prefect
url: https://www.prefect.io/
description:
Prefect is a workflow orchestration tool empowering developers to build, observe, and react to data pipelines.
- id: track-hub-registry
name: Track Hub Registry
url: https://www.trackhubregistry.org/
description:
A global centralised collection of publicly accessible track hubs
registry:
fairsharing: a1de61

Binary file added images/fairtracks-rdmkit-tool-assembly.png
one could use the life cycle colours in the top bar and remove the RDMkit logo

Binary file added images/institutions/Ebi_official_logo.png
4 changes: 2 additions & 2 deletions pages/national_resources/no_resources.md
@@ -6,7 +6,7 @@ contributors: [Nazeefa Fatima,Federico Bianchini,Korbinian Bösl,Erin Calhoun]
coordinators: [Korbinian Bösl, Nazeefa Fatima]

related_pages:
tool_assembly: [tsd, nels, marine_assembly]
tool_assembly: [tsd, nels, marine_assembly, fairtracks]

training:
- name: Training in TeSS
@@ -84,7 +84,7 @@ national_resources:
how_to_access: A formal application is required to gain access to the storage services.
related_pages:
your_tasks: [transfer, storage]
tool_assembly: [nels]
tool_assembly: [nels, fairtracks]
url: https://documentation.sigma2.no/files_storage/nird.html
- name: Sigma2 HPC systems
description: The current Norwegian academic HPC infrastructure consists of three systems for different purposes. The Norwegian academic high-performance computing and storage infrastructure is maintained by [Sigma2 NRIS](https://sigma2.no/nris), which is a joint collaboration between UiO, UiB, NTNU, UiT, and [UNINETT Sigma2 (SIKT)](https://www.sigma2.no/).
106 changes: 106 additions & 0 deletions pages/tool_assembly/fairtracks_assembly.md
@@ -0,0 +1,106 @@
---
title: FAIRtracks
contributors: [Federico Bianchini, Sveinung Gundersen]
description: The FAIRtracks ecosystem provides technical solutions for the FAIRification of genome browser track files
page_id: fairtracks
affiliations: ["NO", "ES", "EMBL-EBI"]
related_pages:
your_tasks: [data_publication, data_transfer, metadata]
your_domain: [human_pathogen_genomics, plants]
---

## What is the FAIRtracks tool assembly?

The [FAIRtracks ecosystem](https://fairtracks.net/) is a set of services associated with a "minimum information"
[metadata standard](https://fairtracks.net/standards/#standards-01-fairtracks) for
[genomic tracks](https://fairtracks.net/tracks/#tracks-01-genomic-tracks)
implemented as a [set of JSON Schemas](https://github.com/fairtracks/fairtracks_standard/tree/master/json/schema).
The FAIRtracks ecosystem is developed and provided as part of the national Service Delivery Plans by
[ELIXIR Norway](https://elixir.no/) and [ELIXIR Spain](https://elixir-europe.org/about-us/who-we-are/nodes/spain),
and is supported by the [Track Hub Registry group](https://trackhubregistry.org/) at [EMBL-EBI](https://www.ebi.ac.uk/).
FAIRtracks is endorsed by [ELIXIR Europe](https://elixir-europe.org/) as a
[Recommended Interoperability Resource](https://elixir-europe.org/platforms/interoperability/rirs).

In the context of the Data Life Cycle and its stages, the FAIRtracks ecosystem covers [Collecting](collecting), [Processing](processing),
[Analysing](analysing), [Sharing](sharing), and [Reusing](reusing). Note, however, that the FAIRtracks ecosystem operates
on derived/secondary data and not on the raw sequencing data, as illustrated in Figure 1.
Sequencing data needs to be handled independently, following domain best practices
(see e.g. the pages on [Human pathogen genomics](human_pathogen_genomics) or [Plant sciences](plant_sciences)).

{% include image.html file="fairtracks-rdmkit-tool-assembly.png" caption="Figure 1. The FAIRtracks tool assembly and the associated workflow."
alt="FAIRtracks RDMkit" %}

## Who can use the FAIRtracks tool assembly?

The FAIRtracks ecosystem is available to everyone.
Individual services are hosted on different platforms, some of which might require authentication that is not handled centrally by FAIRtracks.
Most of the services are accessible through Application Programming Interfaces (APIs). More details are provided in the descriptions below.
Users of the FAIRtracks ecosystem fall into different categories, which can be summarised as:

- Researchers and data analysts
- Data providers and biocurators
- Developers working on tooling for
- Research
- Implementation of the FAIR data principles

Each of these categories benefits specifically from a subset of the global ecosystem.
The core services can be accessed both upstream (for data providers and biocurators) and downstream (for tool developers and analytical end users).

## For what can you use the FAIRtracks data management tool assembly?

The FAIRtracks tool assembly can be used for a large number of applications; we summarise the main ones below, following the stages of the data life cycle
and focusing on the particular tools.

While the assembly does not include a tool for [Data Management Planning](dmp),
the FAIRtracks metadata standard is registered in {%tool "fairsharing" %}
and is thus formally connected to a number of other standards and databases.
Through the integration with FAIRsharing, the FAIRtracks standard can be selected in your Data Management Plan
in all instances of {% tool "data-stewardship-wizard" %}.

{%tool "omnipy" %} is a high-level Python library for type-driven data wrangling and scalable workflow orchestration;
it is an important stand-alone part of the FAIRtracks ecosystem, covering several stages of the data life cycle.
It can be used to extract metadata from specific portals, which overlaps with the [data collection](collecting)
capabilities of TrackFind, and to [process](processing) metadata entries to harmonise them to a single standard.
Omnipy workflows are defined as transformations from specific input data models to specific output data models.
Input and output data are validated at each iteration through parsing based on {%tool "pydantic" %}.
Offloading of workflows to external compute resources is provided through the integration of Omnipy with a
workflow engine based on {%tool "prefect" %}.
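
The idea of a workflow step as a validated transformation between data models can be illustrated with a minimal sketch.
It uses only the Python standard library and does not reproduce the Omnipy or Pydantic APIs; the model names, fields,
and the `run_step` helper are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class RawTrackRecord:
    """Hypothetical input model: metadata as harvested from a portal."""
    track_id: str
    assembly: str

    def __post_init__(self) -> None:
        # Validation happens while the payload is parsed into the model.
        if not self.track_id.strip():
            raise ValueError("track_id must not be empty")


@dataclass(frozen=True)
class UniformTrackRecord:
    """Hypothetical output model: metadata harmonised to one convention."""
    track_id: str
    assembly: str

    def __post_init__(self) -> None:
        if self.assembly != self.assembly.lower():
            raise ValueError("assembly must be lower case")


def harmonise(record: RawTrackRecord) -> UniformTrackRecord:
    """Transformation from the input model to the output model."""
    return UniformTrackRecord(record.track_id.strip(), record.assembly.strip().lower())


def run_step(step: Callable[[Any], Any], in_model: type, out_model: type, payload: dict) -> Any:
    """Parse the payload into the input model, run the step, check the output model."""
    parsed = in_model(**payload)  # raises if the input is invalid
    result = step(parsed)         # output model validates itself on construction
    if not isinstance(result, out_model):
        raise TypeError(f"step returned {type(result).__name__}, expected {out_model.__name__}")
    return result
```

A call such as `run_step(harmonise, RawTrackRecord, UniformTrackRecord, {"track_id": " t1 ", "assembly": "GRCh38"})`
fails early if either the harvested input or the harmonised output violates its model.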

There is ongoing work on adding {%tool "prefect" %} as one of the services available in the
[National Infrastructure for Research Data (NIRD) service platform](https://www.sigma2.no/nird-service-platform).
This would enable running {%tool "omnipy" %} on data and metadata stored in the [NIRD data storage](https://www.sigma2.no/data-storage).
Refer also to the [Norwegian national page](no_resources) for more details. Note that, while the use of NIRD storage and services
is certainly convenient for Norwegian users, it is not a central or mandatory part of the tool assembly, which was conceived as an international
service and aims to remain one.

Data [sharing](sharing) and preservation are key components of the FAIRtracks ecosystem.
Since genomic annotations (tracks) typically consist of secondary data files referring to primary data sources,
they are often deposited together with the primary data. The aim of the minimal information metadata model is to
offer a greater level of granularity, providing each track with an identifier and enabling analysis across datasets
in a semi-automated fashion. Accomplishing this would require a dedicated registry. Given that such a registry does not yet exist,
the current recommendation is to deposit FAIRtracks-compliant metadata files in {%tool "zenodo" %},
as this platform supports both Digital Object Identifier (DOI) versioning and DOI reservation prior to publication.
The identifiers in the FAIRtracks metadata object are then cross-linked with the actual data, which is hosted
e.g. in a [Track Hub](https://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html) and registered in
the {%tool "track-hub-registry" %}.
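
As a rough illustration of the deposit step, the sketch below builds (but does not send) a request that creates a Zenodo
deposition with a reserved DOI. It assumes the Zenodo REST deposit endpoint and its `prereserve_doi` metadata field;
the helper name, the example values, and the token are placeholders, and the details should be checked against the
current Zenodo documentation before use.

```python
import json
import urllib.request

# Assumed endpoint of the Zenodo REST deposit API; verify against the Zenodo docs.
ZENODO_DEPOSIT_URL = "https://zenodo.org/api/deposit/depositions"


def build_deposit_request(title: str, description: str, creator: str, token: str) -> urllib.request.Request:
    """Build a POST request that would create a deposition with a reserved DOI.

    The request is only constructed here, not sent.
    """
    payload = {
        "metadata": {
            "upload_type": "dataset",
            "title": title,
            "description": description,
            "creators": [{"name": creator}],
            # Ask Zenodo to reserve a DOI before the record is published.
            "prereserve_doi": True,
        }
    }
    return urllib.request.Request(
        ZENODO_DEPOSIT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # placeholder personal access token
        },
        method="POST",
    )
```

The reserved DOI returned by such a request could then be written into the FAIRtracks metadata file before it is uploaded and published.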

Data and metadata organised in this fashion can then be collected for [reuse](reusing) using {%tool "trackfind" %},
a search and curation engine for genomic tracks.
{%tool "trackfind" %} supports crawling of the {%tool "track-hub-registry" %} and other data portals to fetch track metadata,
which can be accessed through hierarchical browsing or by search queries, both through a web-based user interface and through a RESTful API.
TrackFind supports advanced SQL-based queries that can be easily built in the user interface.

Additional tools that comprise the core of the FAIRtracks ecosystem are the
[metadata validation](https://fairtracks.net/services/?category=Core%20services&tags%5B0%5D=Metadata%20validation) and the
[metadata augmentation](https://fairtracks.net/services/?category=Core%20services&tags%5B0%5D=Metadata%20augmentation) services.
The former is a REST API that extends standard JSON Schema validation through additional modules allowing for:

* validation of ontology terms against specific ontology versions;
* checking Compact Uniform Resource Identifiers (CURIEs) against the registered entries at {%tool "identifiers-org" %};
* checking restrictions on a full set of documents, e.g. whether identifiers are unique and whether the records referred to by foreign keys exist.
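
The document-set checks listed above can be sketched in a few lines of Python. This is not the FAIRtracks validator:
the field names (`id`, `experiment_ref`) are hypothetical, and the CURIE check only tests the `prefix:local_id` shape
rather than resolving the prefix against a registry.

```python
import re
from collections import Counter

# Simplified CURIE shape "prefix:local_id"; a real validator would also
# check that the prefix is registered.
CURIE_PATTERN = re.compile(r"^[A-Za-z][\w.-]*:\S+$")


def check_document_set(experiments: list[dict], tracks: list[dict]) -> list[str]:
    """Return a list of human-readable errors for a set of linked documents."""
    errors = []

    # Uniqueness: every experiment identifier may appear only once.
    counts = Counter(experiment["id"] for experiment in experiments)
    for identifier, count in counts.items():
        if count > 1:
            errors.append(f"duplicate experiment id: {identifier}")

    known_experiments = set(counts)
    for track in tracks:
        # Shape check on the track identifier.
        if not CURIE_PATTERN.match(track["id"]):
            errors.append(f"not a CURIE: {track['id']}")
        # Foreign key: the referenced experiment must exist in the set.
        if track["experiment_ref"] not in known_experiments:
            errors.append(f"unknown experiment reference: {track['experiment_ref']}")

    return errors
```

Checks of this kind need the whole set of documents at once, which is why they cannot be expressed by validating each JSON document against a schema in isolation.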

The [FAIRtracks augmentation service](https://fairtracks.net/services/?category=Core%20services&tags%5B0%5D=Metadata%20augmentation)
is implemented as a REST API that expands on the information contained in a minimal FAIRtracks JSON by adding
a set of fields with human-readable values. These include ontology labels and versions, summaries, and other relevant fields.
This service bridges the gap between data providers, who are required to submit only minimal information, and data consumers,
who require richer information for data discovery and retrieval.
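
The principle of augmentation can be sketched as follows. This is an illustration only and not the service's implementation:
the field naming convention (`*_id` expanded to `*_name`) and the local lookup table are assumptions, whereas the real service
resolves labels against the ontologies themselves.

```python
import copy

# Hypothetical local lookup table standing in for an ontology lookup.
ONTOLOGY_LABELS = {
    "NCBITaxon:9606": "Homo sapiens",
}


def augment(document: dict, labels: dict) -> dict:
    """Return a copy of a minimal document with human-readable labels added.

    For every string field ending in "_id" whose value is a known term,
    a sibling "_name" field with the label is added.
    """
    augmented = copy.deepcopy(document)
    for key, value in document.items():
        if key.endswith("_id") and isinstance(value, str) and value in labels:
            augmented[key[: -len("_id")] + "_name"] = labels[value]
    return augmented
```

The provider submits only `{"species_id": "NCBITaxon:9606"}`, while the consumer receives a document that also carries the readable label; the original document is left unchanged.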