Commit

3.4.2
tdayris committed Mar 29, 2024
1 parent 508b572 commit 968c0c2
Showing 16 changed files with 253 additions and 160 deletions.
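
Reviewer note: the changelog entry in this commit describes replacing raw lookups with "human readable functions", and the rule files below swap `dlookup(dpath=..., within=config, default=...)` for `lookup_config(dpath=..., default=...)`. The definition of `lookup_config` is not among the files shown here, so the following is only a minimal sketch of what such a helper might look like, assuming it walks a slash-separated path through the nested `config` dictionary and falls back to `default`. The sample `config` content and the `within` parameter are assumptions made for illustration, not the workflow's actual code.

```python
from typing import Any, Optional

# Hypothetical stand-in for Snakemake's global `config` dictionary.
config: dict = {
    "params": {
        "fair_genome_indexer": {
            "bedtools": {"merge": "-d 10"},
        }
    }
}


def lookup_config(dpath: str, default: Any = None, within: Optional[dict] = None) -> Any:
    """Return the value stored at a slash-separated path, or `default`.

    Illustrative sketch only: the helper used by the workflow may differ.
    """
    node: Any = config if within is None else within
    for key in dpath.strip("/").split("/"):
        if isinstance(node, dict) and key in node:
            node = node[key]  # descend one level
        else:
            return default  # path is absent: fall back
    return node


# A value present in the configuration overrides the rule's default.
merge_extra = lookup_config("params/fair_genome_indexer/bedtools/merge", default="-d 5")

# A missing path returns the default supplied by the rule.
gff2gtf_extra = lookup_config("params/fair_genome_indexer/agat/gff2gtf", default="")
```

Binding `config` inside the helper is what lets every call site drop the repeated `within=config` argument, which accounts for most of the deleted lines in the `params:` sections.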
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,10 @@
# 3.4.2

## Features:

* Use human readable functions to replace raw lookups
* snakemake-wrappers update to 3.7.0

# 3.4.1

## Features:
42 changes: 21 additions & 21 deletions README.md
@@ -25,10 +25,10 @@ The tools used in this pipeline are described [here](https://github.com/tdayris/

| Step | Commands |
| -------------------------------- | ---------------------------------------------------------------------------------------------------------------- |
| Download DNA Fasta from Ensembl | [ensembl-sequence](https://snakemake-wrappers.readthedocs.io/en/v3.5.2/wrappers/reference/ensembl-sequence.html) |
| Download DNA Fasta from Ensembl | [ensembl-sequence](https://snakemake-wrappers.readthedocs.io/en/v3.7.0/wrappers/reference/ensembl-sequence.html) |
| Remove non-canonical chromosomes | [pyfaidx](https://github.com/mdshw5/pyfaidx) |
| Index DNA sequence | [samtools](https://snakemake-wrappers.readthedocs.io/en/v3.5.2/wrappers/samtools/faidx.html) |
| Creatse sequence Dictionary | [picard](https://snakemake-wrappers.readthedocs.io/en/v3.5.2/wrappers/picard/createsequencedictionary.html) |
| Index DNA sequence | [samtools](https://snakemake-wrappers.readthedocs.io/en/v3.7.0/wrappers/samtools/faidx.html) |
| Creatse sequence Dictionary | [picard](https://snakemake-wrappers.readthedocs.io/en/v3.7.0/wrappers/picard/createsequencedictionary.html) |

```
┌────────────────────────────────────────┐
@@ -51,11 +51,11 @@ The tools used in this pipeline are described [here](https://github.com/tdayris/

| Step | Commands |
| ---------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------- |
| Download GTF annotation | [ensembl-annotation](https://snakemake-wrappers.readthedocs.io/en/v3.5.2/wrappers/reference/ensembl-annotation.html) |
| Fix format errors | [Agat](https://agat.readthedocs.io/en/v3.5.2/tools/agat_convert_sp_gff2gtf.html) |
| Remove non-canonical chromosomes, based on above DNA Fasta | [Agat](https://agat.readthedocs.io/en/v3.5.2/tools/agat_sq_filter_feature_from_fasta.html) |
| Remove `<NA>` Transcript support levels | [Agat](https://agat.readthedocs.io/en/v3.5.2/tools/agat_sp_filter_feature_by_attribute_value.html) |
| Convert GTF to GenePred format | [gtf2genepred](https://snakemake-wrappers.readthedocs.io/en/v3.5.2/wrappers/ucsc/gtftogenepred.html) |
| Download GTF annotation | [ensembl-annotation](https://snakemake-wrappers.readthedocs.io/en/v3.7.0/wrappers/reference/ensembl-annotation.html) |
| Fix format errors | [Agat](https://agat.readthedocs.io/en/v3.7.0/tools/agat_convert_sp_gff2gtf.html) |
| Remove non-canonical chromosomes, based on above DNA Fasta | [Agat](https://agat.readthedocs.io/en/v3.7.0/tools/agat_sq_filter_feature_from_fasta.html) |
| Remove `<NA>` Transcript support levels | [Agat](https://agat.readthedocs.io/en/v3.7.0/tools/agat_sp_filter_feature_by_attribute_value.html) |
| Convert GTF to GenePred format | [gtf2genepred](https://snakemake-wrappers.readthedocs.io/en/v3.7.0/wrappers/ucsc/gtftogenepred.html) |


```
@@ -89,9 +89,9 @@ The tools used in this pipeline are described [here](https://github.com/tdayris/

| Step | Commands |
| --------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------- |
| Extract transcript sequences from above DNA Fasta and GTF | [gffread](https://snakemake-wrappers.readthedocs.io/en/v3.5.2/wrappers/gffread.html) |
| Index DNA sequence | [samtools](https://snakemake-wrappers.readthedocs.io/en/v3.5.2/wrappers/samtools/faidx.html) |
| Creatse sequence Dictionary | [picard](https://snakemake-wrappers.readthedocs.io/en/v3.5.2/wrappers/picard/createsequencedictionary.html) |
| Extract transcript sequences from above DNA Fasta and GTF | [gffread](https://snakemake-wrappers.readthedocs.io/en/v3.7.0/wrappers/gffread.html) |
| Index DNA sequence | [samtools](https://snakemake-wrappers.readthedocs.io/en/v3.7.0/wrappers/samtools/faidx.html) |
| Creatse sequence Dictionary | [picard](https://snakemake-wrappers.readthedocs.io/en/v3.7.0/wrappers/picard/createsequencedictionary.html) |


```
@@ -115,10 +115,10 @@ The tools used in this pipeline are described [here](https://github.com/tdayris/

| Step | Commands |
| ----------------------------------------------------- | ----------------------------------------------------------------------------------------------------------- |
| Extract coding transcripts from above GTF | [Agat](https://agat.readthedocs.io/en/v3.5.2/tools/agat_sp_filter_feature_by_attribute_value.html) |
| Extract coding sequences from above DNA Fasta and GTF | [gffread](https://snakemake-wrappers.readthedocs.io/en/v3.5.2/wrappers/gffread.html) |
| Index DNA sequence | [samtools](https://snakemake-wrappers.readthedocs.io/en/v3.5.2/wrappers/samtools/faidx.html) |
| Creatse sequence Dictionary | [picard](https://snakemake-wrappers.readthedocs.io/en/v3.5.2/wrappers/picard/createsequencedictionary.html) |
| Extract coding transcripts from above GTF | [Agat](https://agat.readthedocs.io/en/v3.7.0/tools/agat_sp_filter_feature_by_attribute_value.html) |
| Extract coding sequences from above DNA Fasta and GTF | [gffread](https://snakemake-wrappers.readthedocs.io/en/v3.7.0/wrappers/gffread.html) |
| Index DNA sequence | [samtools](https://snakemake-wrappers.readthedocs.io/en/v3.7.0/wrappers/samtools/faidx.html) |
| Creatse sequence Dictionary | [picard](https://snakemake-wrappers.readthedocs.io/en/v3.7.0/wrappers/picard/createsequencedictionary.html) |


```
@@ -142,9 +142,9 @@ The tools used in this pipeline are described [here](https://github.com/tdayris/

| Step | Commands |
| -------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
| Download dbSNP variants | [ensembl-variation](https://snakemake-wrappers.readthedocs.io/en/v3.5.2/wrappers/reference/ensembl-variation.html) |
| Filter non-canonical chromosomes | [pyfaidx](https://github.com/mdshw5/pyfaidx) + [BCFTools](https://snakemake-wrappers.readthedocs.io/en/v3.5.2/wrappers/bcftools/filter.html) |
| Index variants | [tabix](https://snakemake-wrappers.readthedocs.io/en/v3.5.2/wrappers/tabix/index.html) |
| Download dbSNP variants | [ensembl-variation](https://snakemake-wrappers.readthedocs.io/en/v3.7.0/wrappers/reference/ensembl-variation.html) |
| Filter non-canonical chromosomes | [pyfaidx](https://github.com/mdshw5/pyfaidx) + [BCFTools](https://snakemake-wrappers.readthedocs.io/en/v3.7.0/wrappers/bcftools/filter.html) |
| Index variants | [tabix](https://snakemake-wrappers.readthedocs.io/en/v3.7.0/wrappers/tabix/index.html) |


```
@@ -168,8 +168,8 @@ The tools used in this pipeline are described [here](https://github.com/tdayris/

| Step | Commands |
| ----------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Extract gene_id <-> gene_name correspondancy | [pyroe](https://snakemake-wrappers.readthedocs.io/en/v3.5.2/wrappers/pyroe/idtoname.html) |
| Extract transcript_id <-> gene_id <-> gene_name | [Agat](https://agat.readthedocs.io/en/v3.5.2/tools/agat_convert_sp_gff2tsv.html) + [XSV](https://snakemake-wrappers.readthedocs.io/en/v3.5.2/wrappers/xsv.html) |
| Extract gene_id <-> gene_name correspondancy | [pyroe](https://snakemake-wrappers.readthedocs.io/en/v3.7.0/wrappers/pyroe/idtoname.html) |
| Extract transcript_id <-> gene_id <-> gene_name | [Agat](https://agat.readthedocs.io/en/v3.7.0/tools/agat_convert_sp_gff2tsv.html) + [XSV](https://snakemake-wrappers.readthedocs.io/en/v3.7.0/wrappers/xsv.html) |

```
┌────────────────────────────────┐
@@ -193,7 +193,7 @@ The tools used in this pipeline are described [here](https://github.com/tdayris/
| Step | Commands |
| ---------------------------- | -------------------------------------------------------------------------------------------- |
| Download blacklisted regions | [Github source](https://github.com/Boyle-Lab/Blacklist/tree/master/lists) |
| Merge overlapping intervals | [bedtools](https://snakemake-wrappers.readthedocs.io/en/v3.5.2/wrappers/bedtools/merge.html) |
| Merge overlapping intervals | [bedtools](https://snakemake-wrappers.readthedocs.io/en/v3.7.0/wrappers/bedtools/merge.html) |


```
16 changes: 8 additions & 8 deletions workflow/reports/material_methods.rst
@@ -46,17 +46,17 @@ usage, and resutls can be found on the `Snakemake workflow`_ page.
.. _Snakemake: https://snakemake.readthedocs.io
.. _Github: https://github.com/tdayris/fair_genome_indexer
.. _`Snakemake workflow`: https://snakemake.github.io/snakemake-workflow-catalog?usage=tdayris/fair_genome_indexer
.. _Picard: https://snakemake-wrappers.readthedocs.io/en/v3.5.2/wrappers/picard/createsequencedictionary.html
.. _Samtools: https://snakemake-wrappers.readthedocs.io/en/v3.5.2/wrappers/samtools/faidx.html
.. _Picard: https://snakemake-wrappers.readthedocs.io/en/v3.7.0/wrappers/picard/createsequencedictionary.html
.. _Samtools: https://snakemake-wrappers.readthedocs.io/en/v3.7.0/wrappers/samtools/faidx.html
.. _Agat: https://agat.readthedocs.io/en/latest/index.html
.. _Pyroe: https://snakemake-wrappers.readthedocs.io/en/v3.5.2/wrappers/pyroe/idtoname.html
.. _Pyroe: https://snakemake-wrappers.readthedocs.io/en/v3.7.0/wrappers/pyroe/idtoname.html
.. _Pyfaidx: https://github.com/mdshw5/pyfaidx
.. _GFFRead: https://snakemake-wrappers.readthedocs.io/en/v3.5.2/wrappers/gffread.html
.. _XSV: https://snakemake-wrappers.readthedocs.io/en/v3.5.2/wrappers/xsv.html
.. _BCFTools: https://snakemake-wrappers.readthedocs.io/en/v3.5.2/wrappers/bcftools/filter.html
.. _Tabix: https://snakemake-wrappers.readthedocs.io/en/v3.5.2/wrappers/tabix/index.html
.. _GFFRead: https://snakemake-wrappers.readthedocs.io/en/v3.7.0/wrappers/gffread.html
.. _XSV: https://snakemake-wrappers.readthedocs.io/en/v3.7.0/wrappers/xsv.html
.. _BCFTools: https://snakemake-wrappers.readthedocs.io/en/v3.7.0/wrappers/bcftools/filter.html
.. _Tabix: https://snakemake-wrappers.readthedocs.io/en/v3.7.0/wrappers/tabix/index.html
.. _`Boyle-Lab's Github`: https://github.com/Boyle-Lab/Blacklist
.. _BEDTools: https://snakemake-wrappers.readthedocs.io/en/v3.5.2/wrappers/bedtools/merge.html
.. _BEDTools: https://snakemake-wrappers.readthedocs.io/en/v3.7.0/wrappers/bedtools/merge.html
.. _UCSC: https://genome.ucsc.edu/FAQ/FAQformat.html

:Authors:
40 changes: 9 additions & 31 deletions workflow/rules/agat.smk
@@ -11,9 +11,8 @@ rule fair_genome_indexer_agat_config:
benchmark:
"benchmark/fair_genome_indexer/agat_config.tsv"
params:
config=dlookup(
config=lookup_config(
dpath="params/fair_genome_indexer/agat/config",
within=config,
default={
"output_format": "GTF",
"gff_output_version": 3,
@@ -68,9 +67,7 @@ rule fair_genome_indexer_agat_convert_sp_gff2gtf:
benchmark:
"benchmark/fair_genome_indexer/agat_convert_sp_gff2gtf/{species}.{build}.{release}.tsv"
params:
extra=dlookup(
dpath="params/fair_genome_indexer/agat/gff2gtf", within=config, default=""
),
extra=lookup_config(dpath="params/fair_genome_indexer/agat/gff2gtf", default=""),
conda:
"../envs/agat.yaml"
script:
@@ -103,9 +100,8 @@ rule fair_genome_indexer_agat_sp_filter_feature_by_attribute_value:
benchmark:
"benchmark/fair_genome_indexer/agat_sp_filter_feature_by_attribute_value/{species}.{build}.{release}.tsv"
params:
extra=dlookup(
extra=lookup_config(
dpath="params/fair_genome_indexer/agat/select_feature_by_attribute_value",
within=config,
default="--attribute 'transcript_support_level' --value '\"NA\"' --test '='",
),
conda:
@@ -117,25 +113,14 @@ rule fair_genome_indexer_agat_sp_filter_feature_by_attribute_value:
rule fair_genome_indexer_agat_sq_filter_feature_from_fasta:
input:
gtf=branch(
dlookup(
lookup_config(
dpath="params/fair_genome_indexer/agat/select_feature_by_attribute_value",
within=config,
),
then="tmp/fair_genome_indexer/agat_sp_filter_feature_by_attribute_value/{species}.{build}.{release}.filtered.gtf",
otherwise="tmp/fair_genome_indexer/agat_convert_sp_gff2gtf/{species}.{build}.{release}.format.gtf",
),
fasta=dlookup(
default="reference/sequences/{species}.{build}.{release}.dna.fasta",
query="species == '{species}' & build == '{build} & release == '{release}'",
key="dna_fasta",
within=genomes,
),
fasta_index=dlookup(
query="species == '{species}' & build == '{build} & release == '{release}'",
key="dna_fai",
within=genomes,
default="reference/sequences/{species}.{build}.{release}.dna.fasta.fai",
),
fasta=lambda wildcards: get_dna_fasta(wildcards),
fasta_index=lambda wildcards: get_dna_fai(wildcards),
config="tmp/fair_genome_indexer/agat_config/config.yaml",
output:
gtf="reference/annotation/{species}.{build}.{release}.gtf",
@@ -151,9 +136,8 @@ rule fair_genome_indexer_agat_sq_filter_feature_from_fasta:
benchmark:
"benchmark/fair_genome_indexer/agat_sq_filter_feature_from_fasta/{species}.{build}.{release}.tsv"
params:
extra=dlookup(
extra=lookup_config(
dpath="params/fair_genome_indexer/agat/filter_features",
within=config,
default="",
),
conda:
@@ -164,12 +148,7 @@ rule fair_genome_indexer_agat_sq_filter_feature_from_fasta:

use rule fair_genome_indexer_agat_sp_filter_feature_by_attribute_value as fair_genome_indexer_agat_sp_filter_feature_by_attribute_value_cdna with:
input:
gtf=dlookup(
query="species == '{species} & release == '{release}' & build == '{build}'",
within=genomes,
key="gtf",
default="reference/annotation/{species}.{build}.{release}.gtf",
),
gtf=lambda wildcards: get_gtf(wildcards),
config="tmp/fair_genome_indexer/agat_config/config.yaml",
output:
gtf=temp(
@@ -186,8 +165,7 @@ use rule fair_genome_indexer_agat_sp_filter_feature_by_attribute_value as fair_g
benchmark:
"benchmark/fair_genome_indexer/agat_sp_filter_feature_by_attribute_value_cdna/{species}.{build}.{release}.tsv"
params:
extra=dlookup(
extra=lookup_config(
dpath="params/fair_genome_indexer/agat/filter_feature_by_attribute_value",
within=config,
default="--attribute transcript_biotype --value '\"protein_coding\"' --test '='",
),
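
Reviewer note: the input sections above replace per-rule `dlookup(query=..., key=..., within=genomes, default=...)` calls with named helpers such as `get_dna_fasta`, `get_dna_fai` and `get_gtf`. Their definitions are not part of the hunks shown, so the sketch below is an assumption about their general shape: each helper looks up one column of the `genomes` table for the current `species`, `build` and `release` wildcards and falls back to the default path that the removed `dlookup` calls used. The `genomes` rows, the `_lookup_genomes` helper and the use of a plain list of dictionaries (instead of the workflow's real table) are hypothetical and chosen to keep the example self-contained.

```python
from types import SimpleNamespace
from typing import Any

# Hypothetical stand-in for the workflow's `genomes` table.
genomes: list = [
    {"species": "homo_sapiens", "build": "GRCh38", "release": "110",
     "dna_fasta": "/data/custom/GRCh38.dna.fasta"},
    {"species": "mus_musculus", "build": "GRCm39", "release": "110"},
]


def _lookup_genomes(wildcards: Any, key: str, default: str) -> str:
    """Return `key` for the row matching the wildcards, else the formatted default."""
    for row in genomes:
        if (row.get("species") == str(wildcards.species)
                and row.get("build") == str(wildcards.build)
                and row.get("release") == str(wildcards.release)
                and row.get(key)):
            return row[key]
    # No matching row, or the column is empty: build the default path.
    return default.format(species=wildcards.species,
                          build=wildcards.build,
                          release=wildcards.release)


def get_dna_fasta(wildcards: Any) -> str:
    return _lookup_genomes(
        wildcards, "dna_fasta",
        "reference/sequences/{species}.{build}.{release}.dna.fasta")


def get_dna_fai(wildcards: Any) -> str:
    return _lookup_genomes(
        wildcards, "dna_fai",
        "reference/sequences/{species}.{build}.{release}.dna.fasta.fai")


def get_gtf(wildcards: Any) -> str:
    return _lookup_genomes(
        wildcards, "gtf",
        "reference/annotation/{species}.{build}.{release}.gtf")


# SimpleNamespace mimics the attribute access of Snakemake's wildcards object.
human = SimpleNamespace(species="homo_sapiens", build="GRCh38", release="110")
mouse = SimpleNamespace(species="mus_musculus", build="GRCm39", release="110")
```

Centralising the query in one place also removes the risk of the unbalanced quotes visible in some of the deleted query strings (for example `build == '{build} & release == ...`), since the wildcards are compared directly instead of being interpolated into a string at each call site. Because Snakemake input functions receive `wildcards` as their only positional argument, `fasta=lambda wildcards: get_dna_fasta(wildcards)` could equally be written `fasta=get_dna_fasta`.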
26 changes: 7 additions & 19 deletions workflow/rules/bcftools_filter_dbsnp.smk
@@ -1,34 +1,23 @@
rule fair_genome_indexer_pyfaidx_fasta_dict_to_bed:
input:
fasta=dlookup(
query="species == '{species}' & build == '{build}' & release == '{release}'",
within=genomes,
key="dna_fasta",
default="reference/sequences/{species}.{build}.{release}.dna.fasta",
),
fai=dlookup(
query="species == '{species}' & build == '{build}' & release == '{release}'",
within=genomes,
key="dna_fai",
default="reference/sequences/{species}.{build}.{release}.dna.fasta.fai",
),
fasta=lambda wildcards: select_fasta(wildcards),
fai=lambda wildcards: select_fai(wildcards),
output:
temp(
"tmp/fair_genome_indexer/pyfaidx_fasta_dict_to_bed/{species}.{build}.{release}.dna.bed"
"tmp/fair_genome_indexer/pyfaidx_fasta_dict_to_bed/{species}.{build}.{release}.{datatype}.bed"
),
threads: 1
resources:
mem_mb=lambda wildcards, attempt: 768 * attempt,
runtime=lambda wildcards, attempt: 5 * attempt,
tmpdir=tmp,
log:
"logs/fair_genome_indexer/pyfaidx_fasta_dict_to_bed/{species}.{build}.{release}.dna.log",
"logs/fair_genome_indexer/pyfaidx_fasta_dict_to_bed/{species}.{build}.{release}.{datatype}.log",
benchmark:
"benchmark/fair_genome_indexer/pyfaidx_fasta_dict_to_bed/{species}.{build}.{release}.dna.tsv"
"benchmark/fair_genome_indexer/pyfaidx_fasta_dict_to_bed/{species}.{build}.{release}.{datatype}.tsv"
params:
extra=dlookup(
extra=lookup_config(
dpath="params/fair_genome_indexer/pyfaidx/fasta_dict_to_bed",
within=config,
default="",
),
conda:
@@ -58,9 +47,8 @@ rule fair_genome_indexer_bcftools_filter_non_canonical_chrom:
benchmark:
"benchmark/fair_genome_indexer/bcftools_filter_non_canonical_chrom/{species}.{build}.{release}.all.tsv"
params:
extra=dlookup(
extra=lookup_config(
dpath="params/fair_genome_indexer/bedtools/filter_non_canonical_chrom",
within=config,
default="",
),
wrapper:
3 changes: 1 addition & 2 deletions workflow/rules/bedtools_merge_blacklist.smk
@@ -13,9 +13,8 @@ rule fair_genome_indexer_bedtools_merge_blacklist:
benchmark:
"benchmark/fair_genome_indexer/bedtools_merge_blacklist/{species}.{build}.{release}.tsv"
params:
extra=dlookup(
extra=lookup_config(
dpath="params/fair_genome_indexer/bedtools/merge",
within=config,
default="-d 5",
),
wrapper: