diff --git a/schema/drafts/atac_schema.md b/schema/drafts/atac_schema.md
index 4aac5f00e..40a7a450b 100644
--- a/schema/drafts/atac_schema.md
+++ b/schema/drafts/atac_schema.md
@@ -4,18 +4,20 @@
 unpaired assay: "EFO:0010891" for scATAC-seq or its descendants that is not a descendant of "EFO:0008913" for single-cell RNA sequencing
 
-## Fragment File Dataset Criteria
+## scATAC-seq Asset Dataset Criteria
 
-A Dataset MUST meet each of the following criteria in order to be eligible for an Fragment File asset:
+A Dataset MUST meet each of the following criteria in order to be eligible for scATAC-seq assets:
 
 * the obs['assay_ontology_term_id'] values MUST all be paired assays or MUST all be unpaired assays
-* the obs['is_primary_data'] values MUST be all `True`
+* the obs['is_primary_data'] values MUST be all True
 * the var['feature_reference'] values MUST include one of "NCBITaxon:9606" for Homo sapiens or "NCBITaxon:10090" for Mus musculus, but not both. The value that is present will determine the appropriate Chromosome Table for standards.
 
-If the obs['assay_ontology_term_id'] values are all paired assays then the Dataset MAY have a fragment file asset.
+If the obs['assay_ontology_term_id'] values are all paired assays then the Dataset MAY have a fragments file asset.
 
-If the obs['assay_ontology_term_id'] values are all unpaired assays then the Dataset MUST have a fragment file asset.
+If the obs['assay_ontology_term_id'] values are all unpaired assays then the Dataset MUST have a fragments file asset.
 
-## Fragment File
+If a Dataset has a fragments file asset, it MAY have genome track assets. Otherwise, it MUST NOT have genome track assets.
+
+## scATAC-seq Asset: Fragments File
 
 This MUST be a gzipped tab-separated values (TSV) file.
 
@@ -75,7 +77,7 @@ The curator MUST annotate the following header-less columns. Additional columns
     <tr>
       <th>Value</th>
         <td>
-          str. This MUST be the cell identifier. The value MUST be found in the obs index of the associated Dataset. Every obs index value of the associated Dataset MUST appear at least once in this column of the fragment file.
+          str. This MUST be the cell identifier. The value MUST be found in the obs index of the associated Dataset. Every obs index value of the associated Dataset MUST appear at least once in this column of the fragments file.
         </td>
     </tr>
 </tbody></table>
@@ -96,9 +98,34 @@ The curator MUST annotate the following header-less columns. Additional columns
 </tbody></table>
 <br>
 
-## Fragment File index
+## scATAC-seq Asset: Fragments File index
+
+For every fragments file asset, CELLxGENE Discover MUST generate a tabix index of the fragment intervals from the fragments file. The file name MUST be the name of the corresponding fragments file appended with `.tbi`.
+
+## `uns` (Dataset Metadata)
+
+<table><tbody>
+    <tr>
+      <th>Key</th>
+      <td>peak_grouping</td>
+    </tr>
+    <tr>
+      <th>Annotation</th>
+      <td>Curator MAY annotate if the Dataset has a fragments file asset; otherwise, this key MUST NOT be present.</td>
+    </tr>
+    <tr>
+      <th>Value</th>
+        <td>
+          str. The value MUST match a key in obs. If annotated, genome track assets MUST be submitted.
+        </td>
+    </tr>
+</tbody></table>
+
+## scATAC-seq Asset: Genome Track
+
+If uns['peak_grouping'] is annotated, exactly one genome track asset MUST be submitted for each unique value in the obs column named by uns['peak_grouping'], as determined by obs[uns['peak_grouping']].unique(). Otherwise, genome track assets MUST NOT be submitted.
 
-CELLxGENE Discover MUST generate a tabix index of the fragment intervals from the fragment file. The file name MUST be the name of the corresponding fragment file appended with `.tbi`.
+Asset file specifications are TBD and will depend on the visualization solution. Accepting the .bigWig format is a requirement.
 
 ## Chromosome Tables
 
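The eligibility criteria, the tabix naming rule, and the genome track rule in this diff are mechanical enough to check in code. Below is a minimal Python sketch under stated assumptions, not CELLxGENE's actual validator: every function name is hypothetical, inputs are plain sequences and mappings rather than an AnnData object, and the sets of paired and unpaired assay terms are assumed to be precomputed from the ontology definitions (the ontology traversal itself is not shown).

```python
from typing import Mapping, Optional, Sequence, Set

# Organism terms named in the dataset criteria.
HOMO_SAPIENS = "NCBITaxon:9606"
MUS_MUSCULUS = "NCBITaxon:10090"


def fragments_file_requirement(
    assay_ids: Sequence[str],
    is_primary_data: Sequence[bool],
    feature_references: Sequence[str],
    paired_terms: Set[str],
    unpaired_terms: Set[str],
) -> Optional[str]:
    """Return "MAY" or "MUST" for a fragments file asset, or None if the
    Dataset is not eligible for scATAC-seq assets.

    Hypothetical helper. Assumes paired_terms and unpaired_terms were
    precomputed from the schema's paired/unpaired assay definitions.
    """
    if not assay_ids:
        return None
    # All assays must be paired, or all must be unpaired.
    all_paired = all(term in paired_terms for term in assay_ids)
    all_unpaired = all(term in unpaired_terms for term in assay_ids)
    if not (all_paired or all_unpaired):
        return None
    # Every cell must be primary data.
    if not all(is_primary_data):
        return None
    # Exactly one of human or mouse must appear among the feature references.
    organisms = {HOMO_SAPIENS, MUS_MUSCULUS} & set(feature_references)
    if len(organisms) != 1:
        return None
    return "MAY" if all_paired else "MUST"


def tabix_index_name(fragments_file_name: str) -> str:
    """Hypothetical helper: the fragments file name with ".tbi" appended."""
    return fragments_file_name + ".tbi"


def expected_genome_track_count(
    uns: Mapping[str, object],
    obs: Mapping[str, Sequence[str]],
    has_fragments_file: bool,
) -> int:
    """Hypothetical helper: number of genome track assets implied by
    uns['peak_grouping']; raises ValueError if the annotation is invalid."""
    grouping = uns.get("peak_grouping")
    if grouping is None:
        # Not annotated: genome track assets must not be submitted.
        return 0
    if not has_fragments_file:
        raise ValueError("peak_grouping MUST NOT be present without a fragments file asset")
    if grouping not in obs:
        raise ValueError(f"peak_grouping {grouping!r} does not match a key in obs")
    # One track per distinct value in the grouping column
    # (missing values are not handled specially in this sketch).
    return len(set(obs[grouping]))
```

Because `expected_genome_track_count` only uses `.get`, `in`, and item access, an AnnData object's `uns` dict and `obs` DataFrame can be passed in place of the plain mappings.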