Fairtracks assembly #1419

Merged
merged 27 commits into from
Dec 20, 2023
Commits
43caf2c
first draft of FAIRtracks tool assembly
bianchini88 Nov 30, 2023
237c52e
fixing confilcts
bianchini88 Nov 30, 2023
65d8707
adding EMBL-EBI to affiliations
bianchini88 Nov 30, 2023
e17522d
Merge branch 'elixir-europe:master' into fairtracks_assembly
bianchini88 Dec 1, 2023
beea8eb
Update no_resources.md
bianchini88 Dec 1, 2023
eb34e12
Update no_resources.md
bianchini88 Dec 1, 2023
ff25bc5
Update fairtracks_assembly.md
bianchini88 Dec 1, 2023
ee67b3f
Merge branch 'elixir-europe:master' into fairtracks_assembly
bianchini88 Dec 4, 2023
50fcb94
Merge branch 'master' into fairtracks_assembly
bianchini88 Dec 6, 2023
f51c7da
Merge branch 'elixir-europe:master' into fairtracks_assembly
bianchini88 Dec 6, 2023
00437d9
Merge branch 'elixir-europe:master' into fairtracks_assembly
bianchini88 Dec 8, 2023
a2b423d
Update fairtracks_assembly.md
bianchini88 Dec 11, 2023
21c85d8
Merge branch 'elixir-europe:master' into fairtracks_assembly
bianchini88 Dec 14, 2023
e4f6ab3
Merge branch 'elixir-europe:master' into fairtracks_assembly
bianchini88 Dec 18, 2023
bcd32a0
adding link to training events on TeSS
bianchini88 Dec 18, 2023
841d097
Adding TeSS to the omnipy tool entry
bianchini88 Dec 18, 2023
43bb0b4
revision from Sveinung
bianchini88 Dec 18, 2023
f29f7e9
cross-referencing domain pages
bianchini88 Dec 19, 2023
4b3ab9e
revision of FAIRtracks assembly
bianchini88 Dec 19, 2023
d5530b3
Merge branch 'master' into fairtracks_assembly
bianchini88 Dec 19, 2023
41e1494
replacing figure with new version based on feedback from the editors
bianchini88 Dec 19, 2023
a8a5249
Merge branch 'fairtracks_assembly' of https://github.com/bianchini88/…
bianchini88 Dec 19, 2023
82bd6b9
Update news.yml
bianchini88 Dec 19, 2023
429c61f
Update news.yml
bianchini88 Dec 19, 2023
e811427
Merge branch 'elixir-europe:master' into fairtracks_assembly
bianchini88 Dec 19, 2023
a934256
adding newline
bedroesb Dec 20, 2023
9b08d93
Update news.yml
bianchini88 Dec 20, 2023
9 changes: 7 additions & 2 deletions _data/CONTRIBUTORS.yaml
@@ -575,7 +575,12 @@ Styliani-Christina Fragkouli:
git: sfragkoul
orcid: 0000-0003-4067-7123
email: [email protected]
affiliation: Institute of Applied Biosciences(INAB|CERTH) / University of Athens / ELIXIR-GR
affiliation: Institute of Applied Biosciences(INAB|CERTH) / University of Athens / ELIXIR-GR
Sveinung Gundersen:
git: sveinugu
orcid: 0000-0001-9888-7954
email: [email protected]
affiliation: ELIXIR Norway
Diana Pilvar:
git: diana-pilvar
email: [email protected]
@@ -596,4 +601,4 @@ Pavankumar Videm:
git: pavanvidem
email: [email protected]
orcid: 0000-0002-5192-126X
affiliation: University of Freiburg / European Galaxy team
affiliation: University of Freiburg / European Galaxy team
6 changes: 6 additions & 0 deletions _data/affiliations.yaml
@@ -166,3 +166,9 @@
expose: yes
type: infrastructure
url: https://www.bbmri.nl/
- name: EMBL-EBI
image_url: /images/institutions/Ebi_official_logo.png
pid: https://ror.org/02catss52
expose: yes
type: project
url: https://www.ebi.ac.uk/
4 changes: 4 additions & 0 deletions _data/news.yml
@@ -154,3 +154,7 @@
date: 2023-12-19
linked_pr: 1429
description: The content of the "tool assembly" page for CSC (Finnish IT Center for Science) was updated. [Discover the page here](csc_assembly).
- name: "New page: FAIRtracks tool assembly"
date: 2023-12-20
linked_pr: 1419
description: A new "tool assembly" page for FAIRtracks was added. [Discover the page here](fairtracks_assembly).
2 changes: 2 additions & 0 deletions _data/sidebars/data_management.yml
@@ -112,6 +112,8 @@ subitems:
url: /covid19_data_portal
- title: CSC
url: /csc_assembly
- title: FAIRtracks
url: /fairtracks_assembly
- title: Galaxy
url: /galaxy_assembly
- title: IFB
32 changes: 32 additions & 0 deletions _data/tool_and_resource_list.yml
@@ -2297,6 +2297,38 @@
registry:
biotools: dataplan
url: https://plan.nfdi4plants.org
- id: omnipy
name: Omnipy
url: https://github.com/fairtracks/omnipy
description:
Omnipy is a high-level Python library for type-driven data wrangling and scalable workflow orchestration.
registry:
biotools: omnipy
tess: omnipy
- id: trackfind
name: TrackFind
url: https://trackfind.elixir.no/
description:
TrackFind is a search and curation engine for metadata of genomic tracks. It supports crawling of the TrackHub Registry and other portals.
registry:
biotools: trackfind
- id: pydantic
name: Pydantic
url: https://docs.pydantic.dev/latest/
description:
Pydantic is the most widely used data validation library for Python.
- id: prefect
name: Prefect
url: https://www.prefect.io/
description:
Prefect is a workflow orchestration tool empowering developers to build, observe, and react to data pipelines.
- id: track-hub-registry
name: Track Hub Registry
url: https://www.trackhubregistry.org/
description:
A global centralised collection of publicly accessible track hubs
registry:
fairsharing: a1de61
- description: Fast, sensitive and accurate integration of single-cell data.
id: harmony
name: Harmony
Binary file added images/fairtracks_tool-assembly.png
Binary file added images/institutions/Ebi_official_logo.png
4 changes: 2 additions & 2 deletions pages/national_resources/no_resources.md
@@ -6,7 +6,7 @@ contributors: [Nazeefa Fatima,Federico Bianchini,Korbinian Bösl,Erin Calhoun]
coordinators: [Korbinian Bösl, Nazeefa Fatima]

related_pages:
tool_assembly: [tsd, nels, marine_assembly]
tool_assembly: [tsd, nels, marine_assembly, fairtracks]

training:
- name: Training in TeSS
@@ -84,7 +84,7 @@ national_resources:
how_to_access: A formal application is required to gain access to the storage services.
related_pages:
your_tasks: [transfer, storage]
tool_assembly: [nels]
tool_assembly: [nels, fairtracks]
url: https://documentation.sigma2.no/files_storage/nird.html
- name: Sigma2 HPC systems
description: The current Norwegian academic HPC infrastructure consists of three systems for different purposes. The Norwegian academic high-performance computing and storage infrastructure is maintained by [Sigma2 NRIS](https://sigma2.no/nris), which is a joint collaboration between UiO, UiB, NTNU, UiT, and [UNINETT Sigma2 (SIKT)](https://www.sigma2.no/).
115 changes: 115 additions & 0 deletions pages/tool_assembly/fairtracks_assembly.md
@@ -0,0 +1,115 @@
---
title: FAIRtracks
contributors: [Federico Bianchini, Sveinung Gundersen]
description: The FAIRtracks ecosystem provides technical solutions for the FAIRification of genome browser track files
page_id: fairtracks
affiliations: ["NO", "ES", "EMBL-EBI"]
related_pages:
your_tasks: [data_publication, data_transfer, metadata]
your_domain: [plants, rare_disease, single_cell_sequencing, human_data]
training:
- name: Training in TeSS
registry: TeSS
url: https://tess.elixir-europe.org/search?q=fairtracks
---

## What is the FAIRtracks tool assembly?

The [FAIRtracks ecosystem](https://fairtracks.net/) is a set of services associated with a minimal
[metadata model](https://fairtracks.net/standards/#standards-01-fairtracks) for
[genomic annotations/tracks](https://fairtracks.net/tracks/#tracks-01-genomic-tracks),
implemented as a [set of JSON Schemas](https://github.com/fairtracks/fairtracks_standard/tree/master/json/schema).
The FAIRtracks model contains metadata fields particularly useful for data discovery,
harmonised through strict adherence to a selection of ontologies available through the {%tool "ontology-lookup-service" %}.
The usability of the model can be expanded through referencing the original records via Compact Uniform Resource Identifiers (CURIEs)
resolvable by {% tool "identifiers-org" %}.
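
As a minimal sketch of how such a CURIE maps to a resolvable link, the snippet below splits a `prefix:local_id` string and builds the corresponding `https://identifiers.org/` URL. The helper names and the example value `geo:GSE12345` are illustrative assumptions and are not taken from the FAIRtracks documentation; the snippet does not check that the prefix is actually registered.

```python
from typing import NamedTuple


class Curie(NamedTuple):
    """A compact identifier split into its namespace prefix and local ID."""
    prefix: str
    local_id: str


def parse_curie(curie: str) -> Curie:
    """Split a CURIE of the form 'prefix:local_id' at the first colon."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"Not a valid CURIE: {curie!r}")
    return Curie(prefix, local_id)


def identifiers_org_url(curie: str) -> str:
    """Build the identifiers.org resolution URL for a CURIE (no network call)."""
    parsed = parse_curie(curie)
    return f"https://identifiers.org/{parsed.prefix}:{parsed.local_id}"


if __name__ == "__main__":
    # Illustrative CURIE only; any registered prefix follows the same pattern.
    print(identifiers_org_url("geo:GSE12345"))
```

Splitting at the first colon only keeps local identifiers that themselves contain a colon intact.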

In the context of the Data Life Cycle and its stages, the FAIRtracks ecosystem covers [Collecting](collecting), [Processing](processing),
[Analysing](analysing), [Sharing](sharing), and [Reusing](reusing). Note, however, that the FAIRtracks ecosystem is structured
around a secondary data life cycle, as illustrated in Figure 1. In this secondary life cycle, the annotation/track data is distributed further
and its discoverability is enhanced through derived metadata. The FAIRtracks ecosystem aims to harmonise this process.
Primary data needs to be handled independently following domain best practices
(see e.g. the pages on [Single cell sequencing](single_cell_sequencing), [Plant sciences](plant_sciences), or [Rare disease data](rare_disease_data)).

The FAIRtracks ecosystem is developed and provided as part of the national Service Delivery Plans by
[ELIXIR Norway](https://elixir.no/) and [ELIXIR Spain](https://elixir-europe.org/about-us/who-we-are/nodes/spain),
and is supported by the [Track Hub Registry group](https://trackhubregistry.org/) at [EMBL-EBI](https://www.ebi.ac.uk/).
FAIRtracks is endorsed by [ELIXIR Europe](https://elixir-europe.org/) as a
[Recommended Interoperability Resource](https://elixir-europe.org/platforms/interoperability/rirs).

{% include image.html file="fairtracks_tool-assembly.png" caption="Figure 1. Illustration of the Data life cycle
for the FAIRtracks tool assembly. As genomic tracks/annotations represent condensed summaries of the raw data,
this ecosystem covers a secondary cycle designed around the FAIRtracks metadata model.
The grey box shows the areas of relevance for the FAIRtracks ecosystem with its integrations,
and only a subset of the icons represents FAIRtracks services per se. Omnipy (dark grey box) is a general Python library
for scalable and reproducible data wrangling which can be used across several data models and research disciplines."
alt="FAIRtracks RDMkit" %}

## Who can use the FAIRtracks tool assembly?

The entire FAIRtracks ecosystem is available to everyone.
The FAIRtracks services that require login do not share a central authentication solution.
Most of the services are accessible through Application Programming Interfaces (APIs). More details are provided in the description below.
Users of the FAIRtracks ecosystem belong to different categories, which could be summarised as:

- Researchers and data analysts
- Data providers and biocurators
- Developers working on tooling for
- Research
- Implementation of the FAIR data principles

Each of these categories benefits specifically from a subset of the global ecosystem.
The core services can be accessed both upstream (for data providers and biocurators) and downstream (for tool developers and analytical end users).

## For what can you use the FAIRtracks tool assembly?

The FAIRtracks tool assembly can be used for many applications. The main ones are summarised below, following the stages of the data life cycle
and focusing on particular tools.

While the assembly does not include a tool for [Data Management Planning](dmp),
the FAIRtracks metadata standard is registered in {%tool "fairsharing" %}
and is thereby formally connected to several other standards and databases.
Through this integration with {%tool "fairsharing" %}, the FAIRtracks standard can be selected in your Data Management Plan
in all instances of {% tool "data-stewardship-wizard" %}.

{%tool "omnipy" %} is a high-level Python library for type-driven data wrangling and scalable data flow orchestration;
it is a self-standing subset of the FAIRtracks ecosystem covering several steps in the data life cycle.
It can be used to extract metadata from specific portals and for [Processing](processing) of metadata entries to harmonise them into a single model.
{%tool "omnipy" %} data flows are defined as transformations from specific input data models to specific output data models.
Input and output data are validated at each iteration through parsing based on {%tool "pydantic" %}.
Offloading of data flows to external compute resources is provided through the integration of {%tool "omnipy" %} with an orchestration engine based on {%tool "prefect" %}.
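
The idea of a data flow as a validated transformation between typed models can be sketched as follows. This is not the Omnipy API: standard-library dataclasses stand in for Pydantic models, and the model names, field names, and the `run_flow` helper are all hypothetical, chosen only to illustrate input and output being checked around each step.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class PortalRecord:
    """Hypothetical input model: a raw record as exported by some portal."""
    sample: str
    assay: str

    def __post_init__(self) -> None:
        # Input validation: reject records with missing values.
        if not self.sample.strip() or not self.assay.strip():
            raise ValueError("sample and assay must be non-empty")


@dataclass(frozen=True)
class HarmonisedRecord:
    """Hypothetical output model with normalised values."""
    sample_id: str
    assay_type: str

    def __post_init__(self) -> None:
        # Output validation: the harmonised vocabulary is lower case here.
        if self.assay_type != self.assay_type.lower():
            raise ValueError("assay_type must be lower case")


def harmonise(record: PortalRecord) -> HarmonisedRecord:
    """One transformation step from the input model to the output model."""
    return HarmonisedRecord(
        sample_id=record.sample.strip(),
        assay_type=record.assay.strip().lower(),
    )


def run_flow(
    raw_rows: Iterable[dict],
    step: Callable[[PortalRecord], HarmonisedRecord],
) -> list:
    """Parse each raw row into the input model, then apply the step.

    Constructing the models triggers validation before and after the step.
    """
    return [step(PortalRecord(**row)) for row in raw_rows]


if __name__ == "__main__":
    print(run_flow([{"sample": " S1 ", "assay": "ChIP-seq"}], harmonise))
```

Because validation lives in the models rather than in the step, a malformed record fails at the boundary where it enters or leaves the flow, which makes errors easier to locate.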

Work is ongoing to add {%tool "prefect" %} as one of the services available in the
[National Infrastructure for Research Data (NIRD) service platform](https://www.sigma2.no/nird-service-platform).
This would enable running {%tool "omnipy" %} on data and metadata stored in the [NIRD data storage](https://www.sigma2.no/data-storage).
Refer also to the [Norwegian national page](no_resources) for more details. Note that, while the usage of NIRD storage and services
is certainly convenient for Norwegian users, this is not a central or mandatory part of the tool assembly, which was conceived as an international
service and aims to remain one.

Data [Sharing](sharing) and preservation are key components of the FAIRtracks ecosystem.
Since genomic annotations/tracks typically consist of secondary data files referring to primary data sources,
they are often deposited together with the primary data. The aim of the minimal metadata model is to
offer a greater level of granularity, providing each track with an identifier and enabling automated analysis
across datasets. A dedicated registry would typically be required to accomplish this. Given that such a registry does not yet exist,
the current recommendation is to deposit FAIRtracks-compliant metadata files to {%tool "zenodo" %},
as this platform supports both Digital Object Identifier (DOI) versioning and DOI reservation before publication.
The identifiers in the FAIRtracks metadata object are then cross-linked with the actual data, which is hosted
e.g. in a [Track Hub](https://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html) and registered in
the {%tool "track-hub-registry" %}.

Data and metadata organised in this fashion can be discovered for [Reusing](reusing) through {%tool "trackfind" %},
a search and curation engine for genomic tracks.
{%tool "trackfind" %} will import FAIRtracks-compliant metadata from e.g. {%tool "zenodo" %}.
This metadata can be accessed through hierarchical browsing or by search queries both through a web-based user interface and as a RESTful API.
TrackFind supports advanced SQL-based queries, which can easily be built in the user interface.

Additional tools that comprise the core of the FAIRtracks ecosystem are the
[metadata validation](https://fairtracks.net/services/?category=Core%20services&tags%5B0%5D=Metadata%20validation) and the
[metadata augmentation](https://fairtracks.net/services/?category=Core%20services&tags%5B0%5D=Metadata%20augmentation) services.
The former is a REST API that extends the standard JSON Schema validation technology to
e.g. validate ontology terms or check CURIEs against the registered entries.
The [FAIRtracks augmentation service](https://fairtracks.net/services/?category=Core%20services&tags%5B0%5D=Metadata%20augmentation)
is implemented as a REST API that expands on the information contained in a minimal FAIRtracks JSON by adding
a set of fields with human-readable values including ontology labels, versions, and summaries.
This service bridges the gap between data providers, who are required to submit only minimal information, and data consumers
who require richer information for data discovery and retrieval.
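
The augmentation idea can be illustrated with a small local sketch. It does not call the FAIRtracks augmentation service and does not reproduce its request or response format: the `term_id` and `term_label` field names, the `LABELS` lookup table, and the `example.org` term are assumptions made for illustration only.

```python
import copy

# Hypothetical lookup table standing in for an ontology service.
LABELS = {"http://example.org/onto/TERM_1": "example tissue"}


def augment(record: dict, labels: dict) -> dict:
    """Return a copy of `record` in which every object that carries a known
    'term_id' also gets a human-readable 'term_label'."""
    result = copy.deepcopy(record)  # leave the minimal record untouched
    stack = [result]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            term_id = node.get("term_id")
            if isinstance(term_id, str) and term_id in labels:
                node["term_label"] = labels[term_id]
            stack.extend(node.values())
        elif isinstance(node, list):
            stack.extend(node)
    return result


if __name__ == "__main__":
    minimal = {"sample": {"tissue": {"term_id": "http://example.org/onto/TERM_1"}}}
    print(augment(minimal, LABELS))
```

Working on a deep copy keeps the minimal record that a provider submitted separate from the richer record served to consumers.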
2 changes: 1 addition & 1 deletion pages/your_domain/human_data.md
@@ -5,7 +5,7 @@ contributors: [Niclas Jareborg, Nirupama Benis, Ana Portugal Melo, Pinar Alper,
page_id: human_data
related_pages:
your_tasks: [sensitive, gdpr_compliance]
tool_assembly: [tsd, covid-19, transmed]
tool_assembly: [tsd, covid-19, transmed, fairtracks]
training:
- name: Training in TeSS
registry: TeSS
2 changes: 1 addition & 1 deletion pages/your_domain/plant_sciences.md
@@ -6,7 +6,7 @@ related_pages:
page_id: plants
related_pages:
your_tasks: [metadata]
tool_assembly: [plant_geno_assembly, plant_pheno_assembly]
tool_assembly: [plant_geno_assembly, plant_pheno_assembly, fairtracks]
training:
- name: Training in TeSS
registry: TeSS
1 change: 1 addition & 0 deletions pages/your_domain/rare_disease_data.md
@@ -5,6 +5,7 @@ contributors: [Philip van Damme, Nirupama Benis, César Bernabé, Shuxin Zhang,
page_id: rare_disease
related_pages:
your_domain: [human_data]
tool_assembly: [fairtracks]
your_tasks: [dmp, data_publication, machine_actionability]
---

2 changes: 1 addition & 1 deletion pages/your_domain/single_cell_sequencing.md
@@ -4,7 +4,7 @@ description: "Managing data generated from single-cell sequencing experiments."
contributors: [Johan Rollin, Pavankumar Videm, Mehmet Tekman]
related_pages:
your_tasks: [dmp, data_organisation, data_publication, metadata, storage]
tool_assembly: [galaxy]
tool_assembly: [galaxy, fairtracks]
training:
- name: Single-cell training on the Galaxy Training Network
url: "https://usegalaxy.eu/training-material/topics/single-cell/"