From d6b90c1c979296b4c7444787f9da7f1af54232f3 Mon Sep 17 00:00:00 2001
From: jahilton <jahilton@stanford.edu>
Date: Mon, 7 Oct 2024 08:43:09 -0700
Subject: [PATCH 1/7] initial draft of fragments file

---
 schema/atac_schema.md | 473 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 473 insertions(+)
 create mode 100644 schema/atac_schema.md
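
Note (illustrative only, not part of the schema): the per-column rules added in this draft could be checked line by line roughly as sketched below. The function name `validate_fragment_row`, the abbreviated `HUMAN_CHROM_LENGTHS` mapping, and the non-negative start check (implied by "0-based" rather than stated in the draft) are assumptions made for this sketch.

```python
from typing import Dict, List, Set

# Hypothetical, abbreviated stand-in for a Chromosome Table (two rows of the human table).
HUMAN_CHROM_LENGTHS: Dict[str, int] = {"chr1": 248956422, "chrM": 16569}


def validate_fragment_row(
    line: str, chrom_lengths: Dict[str, int], obs_index: Set[str]
) -> List[str]:
    """Return the rule violations found in one line of a fragment file."""
    if line.startswith("#"):
        return ["header lines beginning with '#' are not allowed"]
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        return [f"expected 5 columns, found {len(fields)}"]
    chrom, start_s, end_s, barcode, count_s = fields
    try:
        start, end, count = int(start_s), int(end_s), int(count_s)
    except ValueError:
        return ["columns 2, 3, and 5 must be integers"]
    errors: List[str] = []
    if chrom not in chrom_lengths:
        errors.append(f"unknown chromosome {chrom!r}")
    elif end > chrom_lengths[chrom]:
        errors.append("end exceeds chromosome length")
    if start < 0:  # assumption: implied by the 0-based coordinate system
        errors.append("start must be non-negative")
    if end <= start:
        errors.append("end must be greater than start")
    if barcode not in obs_index:
        errors.append(f"cell identifier {barcode!r} not in obs index")
    if count < 1:
        errors.append("read pair count must be 1 or greater")
    return errors
```

The sketch only covers per-line rules. The requirement that every `obs` index value appears at least once in the fourth column needs a pass over the whole file and is not shown here.
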
diff --git a/schema/atac_schema.md b/schema/atac_schema.md
new file mode 100644
index 000000000..fc930c1b2
--- /dev/null
+++ b/schema/atac_schema.md
@@ -0,0 +1,473 @@
+## scATAC-seq assay types
+
+<i>paired assay</i>: any descendant of <a href="https://www.ebi.ac.uk/ols4/ontologies/efo/classes?obo_id=EFO%3A0010891"><code>"EFO:0010891"</code></a> for <i>scATAC-seq</i> that is also a descendant of <a href="https://www.ebi.ac.uk/ols4/ontologies/efo/classes?obo_id=EFO%3A0008913"><code>"EFO:0008913"</code></a> for <i>single-cell RNA sequencing</i>
+
+<i>unpaired assay</i>: <a href="https://www.ebi.ac.uk/ols4/ontologies/efo/classes?obo_id=EFO%3A0010891"><code>"EFO:0010891"</code></a> for <i>scATAC-seq</i> or its descendants that is not a descendant of <a href="https://www.ebi.ac.uk/ols4/ontologies/efo/classes?obo_id=EFO%3A0008913"><code>"EFO:0008913"</code></a> for <i>single-cell RNA sequencing</i>
+
+## Fragment File Dataset Criteria
+
+A Dataset MUST meet each of the following criteria in order to be eligible for an attached Fragment File:
+* the <code>obs['assay_ontology_term_id']</code> values MUST all be <i>paired assays</i> or MUST all be <i>unpaired assays</i>
+* the <code>obs['is_primary_data']</code> values MUST be all `True`
+* the <code>var['feature_reference']</code> values MUST include one of <code>"NCBITaxon:9606"</code> for <i>Homo sapiens</i> or <code>"NCBITaxon:10090"</code> for <i>Mus musculus</i>, but not both. The value that is present will determine the appropriate Chromosome Table for standards.
+
+If the <code>obs['assay_ontology_term_id']</code> values are all <i>paired assays</i> then a fragment file MAY be attached to the Dataset.
+
+If the <code>obs['assay_ontology_term_id']</code> values are all <i>unpaired assays</i> then a fragment file MUST be attached to the Dataset.
+
+## Fragment File
+
+This MUST be a gzipped tab-separated values (TSV) file.
+
+The curator MUST annotate the following header-less columns. Additional columns and header lines beginning with `#` MUST NOT be included.
+
+### first column
+
+<table><tbody>
+    <tr>
+      <th>Annotator</th>
+      <td>Curator MUST annotate.</td>
+    </tr>
+    <tr>
+      <th>Value</th>
+        <td><code>str</code>. This MUST be the reference genome chromosome the fragment is located on. The value MUST be one of the values from the <code>Chromosome</code> column in the appropriate Chromosome Table.
+        </td>
+    </tr>
+</tbody></table>
+<br>
+
+### second column
+
+<table><tbody>
+    <tr>
+      <th>Annotator</th>
+      <td>Curator MUST annotate.</td>
+    </tr>
+    <tr>
+      <th>Value</th>
+        <td><code>int</code>. This MUST be the 0-based start coordinate of the fragment.
+        </td>
+    </tr>
+</tbody></table>
+<br>
+
+### third column
+
+<table><tbody>
+    <tr>
+      <th>Annotator</th>
+      <td>Curator MUST annotate.</td>
+    </tr>
+    <tr>
+      <th>Value</th>
+        <td><code>int</code>. This MUST be the 0-based end coordinate of the fragment. The end position is exclusive, so it represents the position immediately following the fragment interval. The value MUST be greater than the start coordinate specified in the second column and less than or equal to the <code>Length</code> of the <code>Chromosome</code> specified in the first column, as specified in the appropriate Chromosome Table.
+        </td>
+    </tr>
+</tbody></table>
+<br>
+
+### fourth column
+
+<table><tbody>
+    <tr>
+      <th>Annotator</th>
+      <td>Curator MUST annotate.</td>
+    </tr>
+    <tr>
+      <th>Value</th>
+        <td><code>str</code>. This MUST be the cell identifier. The value MUST be found in the <code>obs</code> index of the associated Dataset. Every <code>obs</code> index value of the associated Dataset MUST appear at least once in this column of the fragment file.
+        </td>
+    </tr>
+</tbody></table>
+<br>
+
+### fifth column
+
+<table><tbody>
+    <tr>
+      <th>Annotator</th>
+      <td>Curator MUST annotate.</td>
+    </tr>
+    <tr>
+      <th>Value</th>
+        <td><code>int</code>. This MUST be the total number of read pairs associated with this fragment. The value MUST be <code>1</code> or greater.
+        </td>
+    </tr>
+</tbody></table>
+<br>
+
+## Fragment File index
+
+CELLxGENE Discover MUST generate a <a href="https://www.htslib.org/doc/tabix.html">tabix</a> index of the fragment intervals from the fragment file. The file name MUST be the name of the corresponding fragment file appended with `.tbi`.
+
+## Chromosome Tables
+
+Chromosome names and lengths are determined by the reference assembly used by the gene annotation versions pinned for this version of the schema. Only chromosomes or scaffolds that have at least one gene feature present are included.
+
+### <a href="https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_44/GRCh38.primary_assembly.genome.fa.gz">human (GRCh38.p14)</a>
+
+<table>
+  <thead>
+  <tr>
+  <th>Chromosome</th>
+  <th>Length</th>
+  </tr>
+  </thead>
+  <tbody>
+    <tr>
+        <td>chr1</td>
+        <td>248956422</td>
+    </tr>
+    <tr>
+        <td>chr2</td>
+        <td>242193529</td>
+    </tr>
+    <tr>
+        <td>chr3</td>
+        <td>198295559</td>
+    </tr>
+    <tr>
+        <td>chr4</td>
+        <td>190214555</td>
+    </tr>
+    <tr>
+        <td>chr5</td>
+        <td>181538259</td>
+    </tr>
+    <tr>
+        <td>chr6</td>
+        <td>170805979</td>
+    </tr>
+    <tr>
+        <td>chr7</td>
+        <td>159345973</td>
+    </tr>
+    <tr>
+        <td>chr8</td>
+        <td>145138636</td>
+    </tr>
+    <tr>
+        <td>chr9</td>
+        <td>138394717</td>
+    </tr>
+    <tr>
+        <td>chr10</td>
+        <td>133797422</td>
+    </tr>
+    <tr>
+        <td>chr11</td>
+        <td>135086622</td>
+    </tr>
+    <tr>
+        <td>chr12</td>
+        <td>133275309</td>
+    </tr>
+    <tr>
+        <td>chr13</td>
+        <td>114364328</td>
+    </tr>
+    <tr>
+        <td>chr14</td>
+        <td>107043718</td>
+    </tr>
+    <tr>
+        <td>chr15</td>
+        <td>101991189</td>
+    </tr>
+    <tr>
+        <td>chr16</td>
+        <td>90338345</td>
+    </tr>
+    <tr>
+        <td>chr17</td>
+        <td>83257441</td>
+    </tr>
+    <tr>
+        <td>chr18</td>
+        <td>80373285</td>
+    </tr>
+    <tr>
+        <td>chr19</td>
+        <td>58617616</td>
+    </tr>
+    <tr>
+        <td>chr20</td>
+        <td>64444167</td>
+    </tr>
+    <tr>
+        <td>chr21</td>
+        <td>46709983</td>
+    </tr>
+    <tr>
+        <td>chr22</td>
+        <td>50818468</td>
+    </tr>
+    <tr>
+        <td>chrX</td>
+        <td>156040895</td>
+    </tr>
+    <tr>
+        <td>chrY</td>
+        <td>57227415</td>
+    </tr>
+    <tr>
+        <td>chrM</td>
+        <td>16569</td>
+    </tr>
+    <tr>
+        <td>GL000009.2</td>
+        <td>201709</td>
+    </tr>
+    <tr>
+        <td>GL000194.1</td>
+        <td>191469</td>
+    </tr>
+    <tr>
+        <td>GL000195.1</td>
+        <td>182896</td>
+    </tr>
+    <tr>
+        <td>GL000205.2</td>
+        <td>185591</td>
+    </tr>
+    <tr>
+        <td>GL000213.1</td>
+        <td>164239</td>
+    </tr>
+    <tr>
+        <td>GL000216.2</td>
+        <td>176608</td>
+    </tr>
+    <tr>
+        <td>GL000218.1</td>
+        <td>161147</td>
+    </tr>
+    <tr>
+        <td>GL000219.1</td>
+        <td>179198</td>
+    </tr>
+    <tr>
+        <td>GL000220.1</td>
+        <td>161802</td>
+    </tr>
+    <tr>
+        <td>GL000225.1</td>
+        <td>211173</td>
+    </tr>
+    <tr>
+        <td>KI270442.1</td>
+        <td>392061</td>
+    </tr>
+    <tr>
+        <td>KI270711.1</td>
+        <td>42210</td>
+    </tr>
+    <tr>
+        <td>KI270713.1</td>
+        <td>40745</td>
+    </tr>
+    <tr>
+        <td>KI270721.1</td>
+        <td>100316</td>
+    </tr>
+    <tr>
+        <td>KI270726.1</td>
+        <td>43739</td>
+    </tr>
+    <tr>
+        <td>KI270727.1</td>
+        <td>448248</td>
+    </tr>
+    <tr>
+        <td>KI270728.1</td>
+        <td>1872759</td>
+    </tr>
+    <tr>
+        <td>KI270731.1</td>
+        <td>150754</td>
+    </tr>
+    <tr>
+        <td>KI270733.1</td>
+        <td>179772</td>
+    </tr>
+    <tr>
+        <td>KI270734.1</td>
+        <td>165050</td>
+    </tr>
+    <tr>
+        <td>KI270744.1</td>
+        <td>168472</td>
+    </tr>
+    <tr>
+        <td>KI270750.1</td>
+        <td>148850</td>
+    </tr>
+</tbody></table>
+
+### <a href="https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M33/GRCm39.primary_assembly.genome.fa.gz">mouse (GRCm39)</a>
+
+<table>
+  <thead>
+  <tr>
+  <th>Chromosome</th>
+  <th>Length</th>
+  </tr>
+  </thead>
+  <tbody>
+    <tr>
+        <td>chr1</td>
+        <td>195154279</td>
+    </tr>
+    <tr>
+        <td>chr2</td>
+        <td>181755017</td>
+    </tr>
+    <tr>
+        <td>chr3</td>
+        <td>159745316</td>
+    </tr>
+    <tr>
+        <td>chr4</td>
+        <td>156860686</td>
+    </tr>
+    <tr>
+        <td>chr5</td>
+        <td>151758149</td>
+    </tr>
+    <tr>
+        <td>chr6</td>
+        <td>149588044</td>
+    </tr>
+    <tr>
+        <td>chr7</td>
+        <td>144995196</td>
+    </tr>
+    <tr>
+        <td>chr8</td>
+        <td>130127694</td>
+    </tr>
+    <tr>
+        <td>chr9</td>
+        <td>124359700</td>
+    </tr>
+    <tr>
+        <td>chr10</td>
+        <td>130530862</td>
+    </tr>
+    <tr>
+        <td>chr11</td>
+        <td>121973369</td>
+    </tr>
+    <tr>
+        <td>chr12</td>
+        <td>120092757</td>
+    </tr>
+    <tr>
+        <td>chr13</td>
+        <td>120883175</td>
+    </tr>
+    <tr>
+        <td>chr14</td>
+        <td>125139656</td>
+    </tr>
+    <tr>
+        <td>chr15</td>
+        <td>104073951</td>
+    </tr>
+    <tr>
+        <td>chr16</td>
+        <td>98008968</td>
+    </tr>
+    <tr>
+        <td>chr17</td>
+        <td>95294699</td>
+    </tr>
+    <tr>
+        <td>chr18</td>
+        <td>90720763</td>
+    </tr>
+    <tr>
+        <td>chr19</td>
+        <td>61420004</td>
+    </tr>
+    <tr>
+        <td>chrX</td>
+        <td>169476592</td>
+    </tr>
+    <tr>
+        <td>chrY</td>
+        <td>91455967</td>
+    </tr>
+    <tr>
+        <td>chrM</td>
+        <td>16299</td>
+    </tr>
+    <tr>
+        <td>GL456210.1</td>
+        <td>169725</td>
+    </tr>
+    <tr>
+        <td>GL456211.1</td>
+        <td>241735</td>
+    </tr>
+    <tr>
+        <td>GL456212.1</td>
+        <td>153618</td>
+    </tr>
+    <tr>
+        <td>GL456219.1</td>
+        <td>175968</td>
+    </tr>
+    <tr>
+        <td>GL456221.1</td>
+        <td>206961</td>
+    </tr>
+    <tr>
+        <td>GL456239.1</td>
+        <td>40056</td>
+    </tr>
+    <tr>
+        <td>GL456354.1</td>
+        <td>195993</td>
+    </tr>
+    <tr>
+        <td>GL456372.1</td>
+        <td>28664</td>
+    </tr>
+    <tr>
+        <td>GL456381.1</td>
+        <td>25871</td>
+    </tr>
+    <tr>
+        <td>GL456385.1</td>
+        <td>35240</td>
+    </tr>
+    <tr>
+        <td>JH584295.1</td>
+        <td>1976</td>
+    </tr>
+    <tr>
+        <td>JH584296.1</td>
+        <td>199368</td>
+    </tr>
+    <tr>
+        <td>JH584297.1</td>
+        <td>205776</td>
+    </tr>
+    <tr>
+        <td>JH584298.1</td>
+        <td>184189</td>
+    </tr>
+    <tr>
+        <td>JH584299.1</td>
+        <td>953012</td>
+    </tr>
+    <tr>
+        <td>JH584303.1</td>
+        <td>158099</td>
+    </tr>
+    <tr>
+        <td>JH584304.1</td>
+        <td>114452</td>
+    </tr>
+</tbody></table>
\ No newline at end of file

From fc3a9acca82e90fcd13d515d7303bb8a8bb312e7 Mon Sep 17 00:00:00 2001
From: jahilton <jahilton@stanford.edu>
Date: Wed, 23 Oct 2024 16:53:41 -0700
Subject: [PATCH 2/7] move atac schema into drafts

---
 schema/{ => drafts}/atac_schema.md | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename schema/{ => drafts}/atac_schema.md (100%)

diff --git a/schema/atac_schema.md b/schema/drafts/atac_schema.md
similarity index 100%
rename from schema/atac_schema.md
rename to schema/drafts/atac_schema.md

From 6cb2849e5b50eb391d55980e6a115af870183b17 Mon Sep 17 00:00:00 2001
From: jahilton <jahilton@stanford.edu>
Date: Wed, 23 Oct 2024 16:54:31 -0700
Subject: [PATCH 3/7] use asset language

---
 schema/drafts/atac_schema.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
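
Note (illustrative only, not part of the schema): the eligibility criteria and the MAY/MUST wording touched by this patch can be read as a small decision function. The sketch below assumes every `obs` row is already known to be an scATAC-seq assay and that a per-row `assay_is_paired` flag has been computed elsewhere (resolving EFO descendants is out of scope). The function name and return strings are hypothetical.

```python
from typing import Iterable

HUMAN = "NCBITaxon:9606"
MOUSE = "NCBITaxon:10090"


def fragment_file_requirement(
    assay_is_paired: Iterable[bool],
    is_primary_data: Iterable[bool],
    feature_references: Iterable[str],
) -> str:
    """Return "MUST", "MAY", or "INELIGIBLE" for a Dataset."""
    paired = set(assay_is_paired)
    references = set(feature_references)
    # Exactly one of human or mouse may appear among the feature references.
    one_species = (HUMAN in references) != (MOUSE in references)
    # Assays must be all paired or all unpaired, and all rows must be primary data.
    if len(paired) != 1 or not all(is_primary_data) or not one_species:
        return "INELIGIBLE"
    return "MAY" if paired == {True} else "MUST"
```

For example, a Dataset whose rows are all unpaired, all primary, and mouse-only would yield "MUST" under these assumptions.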

diff --git a/schema/drafts/atac_schema.md b/schema/drafts/atac_schema.md
index fc930c1b2..4aac5f00e 100644
--- a/schema/drafts/atac_schema.md
+++ b/schema/drafts/atac_schema.md
@@ -6,14 +6,14 @@
 
 ## Fragment File Dataset Criteria
 
-A Dataset MUST meet each of the following criteria in order to be eligible for an attached Fragment File:
+A Dataset MUST meet each of the following criteria in order to be eligible for a Fragment File asset:
 * the <code>obs['assay_ontology_term_id']</code> values MUST all be <i>paired assays</i> or MUST all be <i>unpaired assays</i>
 * the <code>obs['is_primary_data']</code> values MUST be all `True`
 * the <code>var['feature_reference']</code> values MUST include one of <code>"NCBITaxon:9606"</code> for <i>Homo sapiens</i> or <code>"NCBITaxon:10090"</code> for <i>Mus musculus</i>, but not both. The value that is present will determine the appropriate Chromosome Table for standards.
 
-If the <code>obs['assay_ontology_term_id']</code> values are all <i>paired assays</i> then a fragment file MAY be attached to the Dataset.
+If the <code>obs['assay_ontology_term_id']</code> values are all <i>paired assays</i> then the Dataset MAY have a fragment file asset.
 
-If the <code>obs['assay_ontology_term_id']</code> values are all <i>unpaired assays</i> then a fragment file MUST be attached to the Dataset.
+If the <code>obs['assay_ontology_term_id']</code> values are all <i>unpaired assays</i> then the Dataset MUST have a fragment file asset.
 
 ## Fragment File
 

From 01bb4c11b16f881fd03f12d6a54df638b0e29df2 Mon Sep 17 00:00:00 2001
From: jahilton <jahilton@stanford.edu>
Date: Thu, 24 Oct 2024 10:25:56 -0700
Subject: [PATCH 4/7] add Genome Tracks notes, consistent fragmentS file term

---
 schema/drafts/atac_schema.md | 45 ++++++++++++++++++++++++++++--------
 1 file changed, 36 insertions(+), 9 deletions(-)

diff --git a/schema/drafts/atac_schema.md b/schema/drafts/atac_schema.md
index 4aac5f00e..40a7a450b 100644
--- a/schema/drafts/atac_schema.md
+++ b/schema/drafts/atac_schema.md
@@ -4,18 +4,20 @@
 
 <i>unpaired assay</i>: <a href="https://www.ebi.ac.uk/ols4/ontologies/efo/classes?obo_id=EFO%3A0010891"><code>"EFO:0010891"</code></a> for <i>scATAC-seq</i> or its descendants that is not a descendant of <a href="https://www.ebi.ac.uk/ols4/ontologies/efo/classes?obo_id=EFO%3A0008913"><code>"EFO:0008913"</code></a> for <i>single-cell RNA sequencing</i>
 
-## Fragment File Dataset Criteria
+## scATAC-seq asset Dataset Criteria
 
-A Dataset MUST meet each of the following criteria in order to be eligible for a Fragment File asset:
+A Dataset MUST meet each of the following criteria in order to be eligible for scATAC-seq assets:
 * the <code>obs['assay_ontology_term_id']</code> values MUST all be <i>paired assays</i> or MUST all be <i>unpaired assays</i>
-* the <code>obs['is_primary_data']</code> values MUST be all `True`
+* the <code>obs['is_primary_data']</code> values MUST be all <code>True</code>
 * the <code>var['feature_reference']</code> values MUST include one of <code>"NCBITaxon:9606"</code> for <i>Homo sapiens</i> or <code>"NCBITaxon:10090"</code> for <i>Mus musculus</i>, but not both. The value that is present will determine the appropriate Chromosome Table for standards.
 
-If the <code>obs['assay_ontology_term_id']</code> values are all <i>paired assays</i> then the Dataset MAY have a fragment file asset.
+If the <code>obs['assay_ontology_term_id']</code> values are all <i>paired assays</i> then the Dataset MAY have a fragments file asset.
 
-If the <code>obs['assay_ontology_term_id']</code> values are all <i>unpaired assays</i> then the Dataset MUST have a fragment file asset.
+If the <code>obs['assay_ontology_term_id']</code> values are all <i>unpaired assays</i> then the Dataset MUST have a fragments file asset.
 
-## Fragment File
+If a Dataset has a fragments file asset, it MAY have genome track assets. Otherwise, it MUST NOT have genome track assets.
+
+## scATAC-seq Asset: Fragments File
 
 This MUST be a gzipped tab-separated values (TSV) file.
 
@@ -75,7 +77,7 @@ The curator MUST annotate the following header-less columns. Additional columns
     </tr>
     <tr>
       <th>Value</th>
-        <td><code>str</code>. This MUST be the cell identifier. The value MUST be found in the <code>obs</code> index of the associated Dataset. Every <code>obs</code> index value of the associated Dataset MUST appear at least once in this column of the fragment file.
+        <td><code>str</code>. This MUST be the cell identifier. The value MUST be found in the <code>obs</code> index of the associated Dataset. Every <code>obs</code> index value of the associated Dataset MUST appear at least once in this column of the fragments file.
         </td>
     </tr>
 </tbody></table>
@@ -96,9 +98,34 @@ The curator MUST annotate the following header-less columns. Additional columns
 </tbody></table>
 <br>
 
-## Fragment File index
+## scATAC-seq Asset: Fragments File index
+
+For every fragments file asset, CELLxGENE Discover MUST generate a <a href="https://www.htslib.org/doc/tabix.html">tabix</a> index of the fragment intervals from the fragments file. The file name MUST be the name of the corresponding fragments file appended with `.tbi`.
+
+## `uns` (Dataset Metadata)
+
+<table><tbody>
+    <tr>
+      <th>Key</th>
+      <td>peak_grouping</td>
+    </tr>
+    <tr>
+      <th>Annotation</th>
+      <td>Curator MAY annotate if the Dataset has a fragments file asset; otherwise, this key MUST NOT be present.</td>
+    </tr>
+    <tr>
+      <th>Value</th>
+        <td>
+          <code>str</code>. The value MUST match a key in <code>obs</code>. If annotated, genome track assets MUST be submitted.
+        </td>
+    </tr>
+</tbody></table>
+
+## scATAC-seq Asset: Genome Track
+
+If <code>uns['peak_grouping']</code> is annotated, there MUST be exactly one genome track asset submitted for each unique value in the obs column specified as determined by <code>anndata.obs.{peak_grouping_column}.unique()</code>. Otherwise, this MUST NOT be submitted.
 
-CELLxGENE Discover MUST generate a <a href="https://www.htslib.org/doc/tabix.html">tabix</a> index of the fragment intervals from the fragment file. The file name MUST be the name of the corresponding fragment file appended with `.tbi`.
+Asset file specifications TBD based on the visualization solution. Accepting <a href="https://genome.ucsc.edu/goldenpath/help/bigWig.html">.bigWig format</a> is a requirement.
 
 ## Chromosome Tables
 

From 2458bad3f2b0af789320e15399834575079c85cb Mon Sep 17 00:00:00 2001
From: jahilton <jahilton@stanford.edu>
Date: Thu, 24 Oct 2024 12:16:21 -0700
Subject: [PATCH 5/7] block ontology id fields from peak_grouping

---
 schema/drafts/atac_schema.md | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)
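
Note (illustrative only, not part of the schema): a minimal sketch of how the `peak_grouping` rules, including the blocked columns added here, might be checked. It assumes `obs` is represented as a plain dict mapping column names to value lists rather than an AnnData object; `expected_track_groups` is a hypothetical name.

```python
from typing import Dict, List, Sequence

BLOCKED_COLUMNS = frozenset({
    "assay_ontology_term_id",
    "cell_type_ontology_term_id",
    "development_stage_ontology_term_id",
    "disease_ontology_term_id",
    "organism_ontology_term_id",
    "self_reported_ethnicity_ontology_term_id",
    "sex_ontology_term_id",
    "tissue_ontology_term_id",
})


def expected_track_groups(peak_grouping: str, obs: Dict[str, Sequence[str]]) -> List[str]:
    """Return the unique values that each need exactly one genome track asset."""
    if peak_grouping in BLOCKED_COLUMNS:
        raise ValueError(f"{peak_grouping!r} MUST NOT be specified; use the label column")
    if peak_grouping not in obs:
        raise ValueError(f"{peak_grouping!r} is not a key in obs")
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(obs[peak_grouping]))
```

The returned list plays the role of `anndata.obs.{peak_grouping_column}.unique()` in the draft text: one genome track asset would be expected per entry.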

diff --git a/schema/drafts/atac_schema.md b/schema/drafts/atac_schema.md
index 40a7a450b..d6adc5e1d 100644
--- a/schema/drafts/atac_schema.md
+++ b/schema/drafts/atac_schema.md
@@ -116,7 +116,18 @@ For every fragments file asset, CELLxGENE Discover MUST generate a <a href="http
     <tr>
       <th>Value</th>
         <td>
-          <code>str</code>. The value MUST match a key in <code>obs</code>. If annotated, genome track assets MUST be submitted.
+          <code>str</code>. The value MUST match a key in <code>obs</code>. If annotated, genome track assets MUST be submitted. The following columns MUST NOT be specified:
+      <ul>
+        <li>assay_ontology_term_id</li>
+        <li>cell_type_ontology_term_id</li>
+        <li>development_stage_ontology_term_id</li>
+        <li>disease_ontology_term_id</li>
+        <li>organism_ontology_term_id</li>
+        <li>self_reported_ethnicity_ontology_term_id</li>
+        <li>sex_ontology_term_id</li>
+        <li>tissue_ontology_term_id</li>
+      </ul>
+      Instead specify the corresponding Discover column such as <code>cell_type</code>.<br><br>
         </td>
     </tr>
 </tbody></table>

From 427951ac463cab117e1efc4682ebfea3486c2f04 Mon Sep 17 00:00:00 2001
From: jahilton <jahilton@stanford.edu>
Date: Thu, 31 Oct 2024 16:29:42 -0700
Subject: [PATCH 6/7] split fragments from tracks

---
 .../{atac_schema.md => fragments_file.md}     | 43 +++----------------
 schema/drafts/genome_track.md                 | 41 ++++++++++++++++++
 2 files changed, 46 insertions(+), 38 deletions(-)
 rename schema/drafts/{atac_schema.md => fragments_file.md} (84%)
 create mode 100644 schema/drafts/genome_track.md
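
Note (illustrative only, not part of the schema): the "position-sorted" requirement for the processed file could be met with a sort like the one sketched below, which orders rows by chromosome name and then by numeric start and end. This is an in-memory sketch with hypothetical function names; a real fragments file would likely need an external or chunked sort, and the bgzip compression and tabix indexing steps are left to the htslib tools (for example `bgzip` followed by `tabix -p bed`).

```python
from typing import Iterable, List, Tuple

Fragment = Tuple[str, int, int, str, int]


def parse_fragment(line: str) -> Fragment:
    """Split one five-column fragment line into typed fields."""
    chrom, start, end, barcode, count = line.rstrip("\n").split("\t")
    return chrom, int(start), int(end), barcode, int(count)


def position_sort(lines: Iterable[str]) -> List[str]:
    """Sort fragment lines by chromosome name, then numeric start and end.

    Ties are broken by cell identifier and read count because whole tuples are compared.
    """
    fragments = sorted(parse_fragment(line) for line in lines)
    return ["\t".join(map(str, fragment)) for fragment in fragments]
```

Chromosome names are compared as plain strings here, so `chr10` sorts before `chr2`; the sketch relies on rows for each chromosome being contiguous rather than on any particular chromosome order.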

diff --git a/schema/drafts/atac_schema.md b/schema/drafts/fragments_file.md
similarity index 84%
rename from schema/drafts/atac_schema.md
rename to schema/drafts/fragments_file.md
index d6adc5e1d..813900432 100644
--- a/schema/drafts/atac_schema.md
+++ b/schema/drafts/fragments_file.md
@@ -15,9 +15,8 @@ If the <code>obs['assay_ontology_term_id']</code> values are all <i>paired assay
 
 If the <code>obs['assay_ontology_term_id']</code> values are all <i>unpaired assays</i> then the Dataset MUST have a fragments file asset.
 
-If a Dataset has a fragments file asset, it MAY have genome track assets. Otherwise, it MUST NOT have genome track assets.
 
-## scATAC-seq Asset: Fragments File
+## scATAC-seq Asset: Fragments File (submitted)
 
 This MUST be a gzipped tab-separated values (TSV) file.
 
@@ -98,45 +97,13 @@ The curator MUST annotate the following header-less columns. Additional columns
 </tbody></table>
 <br>
 
-## scATAC-seq Asset: Fragments File index
-
-For every fragments file asset, CELLxGENE Discover MUST generate a <a href="https://www.htslib.org/doc/tabix.html">tabix</a> index of the fragment intervals from the fragments file. The file name MUST be the name of the corresponding fragments file appended with `.tbi`.
+## scATAC-seq Asset: Fragments File (processed)
 
-## `uns` (Dataset Metadata)
+From every fragments file asset, CELLxGENE Discover MUST generate a tab-separated values (TSV) file position-sorted and compressed by bgzip.
 
-<table><tbody>
-    <tr>
-      <th>Key</th>
-      <td>peak_grouping</td>
-    </tr>
-    <tr>
-      <th>Annotation</th>
-      <td>Curator MAY annotate if the Dataset has a fragments file asset; otherwise, this key MUST NOT be present.</td>
-    </tr>
-    <tr>
-      <th>Value</th>
-        <td>
-          <code>str</code>. The value MUST match a key in <code>obs</code>. If annotated, genome track assets MUST be submitted. The following columns MUST NOT be specified:
-      <ul>
-        <li>assay_ontology_term_id</li>
-        <li>cell_type_ontology_term_id</li>
-        <li>development_stage_ontology_term_id</li>
-        <li>disease_ontology_term_id</li>
-        <li>organism_ontology_term_id</li>
-        <li>self_reported_ethnicity_ontology_term_id</li>
-        <li>sex_ontology_term_id</li>
-        <li>tissue_ontology_term_id</li>
-      </ul>
-      Instead specify the corresponding Discover column such as <code>cell_type</code>.<br><br>
-        </td>
-    </tr>
-</tbody></table>
-
-## scATAC-seq Asset: Genome Track
-
-If <code>uns['peak_grouping']</code> is annotated, there MUST be exactly one genome track asset submitted for each unique value in the obs column specified as determined by <code>anndata.obs.{peak_grouping_column}.unique()</code>. Otherwise, this MUST NOT be submitted.
+## scATAC-seq Asset: Fragments File index
 
-Asset file specifications TBD based on the visualization solution. Accepting <a href="https://genome.ucsc.edu/goldenpath/help/bigWig.html">.bigWig format</a> is a requirement.
+From every fragments file (processed) asset, CELLxGENE Discover MUST generate a <a href="https://www.htslib.org/doc/tabix.html">tabix</a> index of the fragment intervals from the fragments file. The file name MUST be the name of the corresponding fragments file appended with `.tbi`.
 
 ## Chromosome Tables
 
diff --git a/schema/drafts/genome_track.md b/schema/drafts/genome_track.md
new file mode 100644
index 000000000..dc18a6e6d
--- /dev/null
+++ b/schema/drafts/genome_track.md
@@ -0,0 +1,41 @@
+## scATAC-seq asset Dataset Criteria
+
+See [fragments file schema](https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/drafts/fragments_file.md) for criteria a Dataset MUST meet in order to be eligible for scATAC-seq assets.
+
+If a Dataset has a fragments file asset, it MAY have genome track assets. Otherwise, it MUST NOT have genome track assets.
+
+## `uns` (Dataset Metadata)
+
+<table><tbody>
+    <tr>
+      <th>Key</th>
+      <td>peak_grouping</td>
+    </tr>
+    <tr>
+      <th>Annotation</th>
+      <td>Curator MAY annotate if the Dataset has a fragments file asset; otherwise, this key MUST NOT be present.</td>
+    </tr>
+    <tr>
+      <th>Value</th>
+        <td>
+          <code>str</code>. The value MUST match a key in <code>obs</code>. If annotated, genome track assets MUST be submitted. The following columns MUST NOT be specified:
+      <ul>
+        <li>assay_ontology_term_id</li>
+        <li>cell_type_ontology_term_id</li>
+        <li>development_stage_ontology_term_id</li>
+        <li>disease_ontology_term_id</li>
+        <li>organism_ontology_term_id</li>
+        <li>self_reported_ethnicity_ontology_term_id</li>
+        <li>sex_ontology_term_id</li>
+        <li>tissue_ontology_term_id</li>
+      </ul>
+      Instead, specify the corresponding Discover column, such as <code>cell_type</code>.<br><br>
+        </td>
+    </tr>
+</tbody></table>
+
+## scATAC-seq Asset: Genome Track
+
+If <code>uns['peak_grouping']</code> is annotated, there MUST be exactly one genome track asset submitted for each unique value in the specified <code>obs</code> column, as determined by <code>anndata.obs.{peak_grouping_column}.unique()</code>. Otherwise, genome track assets MUST NOT be submitted.
+
+Asset file specifications are to be determined based on the visualization solution. Accepting <a href="https://genome.ucsc.edu/goldenpath/help/bigWig.html">.bigWig format</a> is a requirement.

From 2dfd9061c414395d3d486166f978a09644328388 Mon Sep 17 00:00:00 2001
From: jahilton <jahilton@stanford.edu>
Date: Thu, 31 Oct 2024 16:38:12 -0700
Subject: [PATCH 7/7] specify naming convention for fragments file and index

---
 schema/drafts/fragments_file.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
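
Note (illustrative only, not part of the schema): the naming convention introduced here amounts to two derived file names per dataset version. The helper name below is hypothetical.

```python
from typing import Tuple


def fragment_asset_names(dataset_version_id: str) -> Tuple[str, str]:
    """Return the processed fragments file name and its tabix index name."""
    processed = f"{dataset_version_id}-fragments.tsv.gz"
    # The index name is the processed file name with ".tbi" appended.
    return processed, f"{processed}.tbi"
```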

diff --git a/schema/drafts/fragments_file.md b/schema/drafts/fragments_file.md
index 813900432..db4cc8cc7 100644
--- a/schema/drafts/fragments_file.md
+++ b/schema/drafts/fragments_file.md
@@ -99,11 +99,11 @@ The curator MUST annotate the following header-less columns. Additional columns
 
 ## scATAC-seq Asset: Fragments File (processed)
 
-From every fragments file asset, CELLxGENE Discover MUST generate a tab-separated values (TSV) file position-sorted and compressed by bgzip.
+From every fragments file asset, CELLxGENE Discover MUST generate <code>{dataset_version_id}-fragments.tsv.gz</code>, a tab-separated values (TSV) file position-sorted and compressed by bgzip.
 
 ## scATAC-seq Asset: Fragments File index
 
-From every fragments file (processed) asset, CELLxGENE Discover MUST generate a <a href="https://www.htslib.org/doc/tabix.html">tabix</a> index of the fragment intervals from the fragments file. The file name MUST be the name of the corresponding fragments file appended with `.tbi`.
+From every fragments file (processed) asset, CELLxGENE Discover MUST generate <code>{dataset_version_id}-fragments.tsv.gz.tbi</code>, a <a href="https://www.htslib.org/doc/tabix.html">tabix</a> index of the fragment intervals from the fragments file.
 
 ## Chromosome Tables
 
