Remove constitutive k-mers #131

Open
wants to merge 45 commits into
base: dev
45 commits
7b147ba
Remove SortMeRNA from requirements
olgabot Mar 9, 2021
b3d415c
Add Luiz's remove-many code
olgabot Mar 9, 2021
1a48555
Remove SortMeRNA
olgabot Mar 9, 2021
edd1dd1
Add parameters for housekeeping gene removal
olgabot Mar 9, 2021
5212de1
Add mini refseq download option for testing
olgabot Mar 10, 2021
f27b037
Add raw quote strings around nf-core lint
olgabot Mar 10, 2021
c1e9c59
Get fasta filtering working
olgabot Mar 10, 2021
19e9801
Update test params for download refseq
olgabot Mar 10, 2021
a861610
Add Rsync, return gxx linux
olgabot Mar 10, 2021
caf4450
Add osx environment yml
olgabot Mar 10, 2021
5a56dde
download refseq and filter fasta is working
olgabot Mar 10, 2021
7a27e40
Merge remote-tracking branch 'origin' into olgabot/remove-ribo-kmers
olgabot Mar 10, 2021
4d49662
Add missing quote
olgabot Mar 10, 2021
ed82230
Move merged sigs to view
olgabot Mar 10, 2021
071b441
Merge remote-tracking branch 'origin' into olgabot/remove-ribo-kmers
olgabot Mar 10, 2021
c5a0aae
Add test_download_refseq to ci.yml
olgabot Mar 10, 2021
7606bd2
Update scrape_software_versions
olgabot Mar 10, 2021
ba243e3
Remove sortmerna from get_software_versions
olgabot Mar 10, 2021
dbf824f
Fix sketch params in test_download_refseq
olgabot Mar 10, 2021
7b5cc3d
Got subtract to work!!
olgabot Mar 10, 2021
71695fc
Use mamba to install packages
olgabot Mar 11, 2021
ed266eb
Move Rust to conda-forge section
olgabot Mar 11, 2021
87082ac
Set sketch_scaled to 10 by default
olgabot Mar 11, 2021
76c32b1
reference_proteome_fasta --> translate_proteome_fasta
olgabot Mar 11, 2021
e4154cf
Use my branch of the rust sourmash remove code
olgabot Mar 11, 2021
94d5f2c
Add cmake to help with gcc building
olgabot Mar 11, 2021
c7d603d
Get housekeeping removal from sig, fasta to work
olgabot Mar 11, 2021
b65ebcd
Update vital gene tests
olgabot Mar 11, 2021
c22d7ee
Soft link conda C libraries
olgabot Mar 11, 2021
253a04d
housekeeping --> constitutive
olgabot Mar 12, 2021
d9cbf42
Add explicit path for conda bin
olgabot Mar 15, 2021
ea742f3
Actually do soft links
olgabot Mar 15, 2021
8d73e30
Update whitespace to make dockerfile more readable
olgabot Mar 15, 2021
b4e91b8
Add separate creation of ch_refseq_moltypes_to_download
olgabot Mar 15, 2021
403ab49
Update tests to all use mini refseq data
olgabot Mar 15, 2021
9a1af8a
Pipeline is running!
olgabot Mar 15, 2021
a565bd3
Get "sourmash compare" to run
olgabot Mar 15, 2021
f0337b8
Update constitutive rna sig for all configs
olgabot Mar 15, 2021
dbf97f9
Add test_bam alone
olgabot Mar 16, 2021
2ed90f8
Update constitutive signatures
olgabot Mar 16, 2021
6a694f0
housekeeping --> constitutive
olgabot Mar 16, 2021
ddaed1c
Reference proteome fasta --> translate_proteome_fasta
olgabot Mar 16, 2021
e7ff62f
Move bam to input section
olgabot Mar 16, 2021
f188f77
reference proteome fasta to translate_proteome_fasta in test_constitu…
olgabot Mar 16, 2021
793e9f1
Don't fail fast for all tests to see which individual ones are failing
olgabot Mar 16, 2021
5 changes: 5 additions & 0 deletions .github/workflows/ci.yml
@@ -61,18 +61,23 @@ jobs:
NXF_VER: '20.07.1'
NXF_ANSI_LOG: false
strategy:
fail-fast: false
matrix:
profile_flags:
- "test --sketch_scaled false --sketch_scaled_log2 2"
- "test --sketch_scaled false --sketch_num_hashes 20"
- "test --sketch_scaled false --sketch_num_hashes_log2 20"
- "test_bam"
- "test_bam --barcodes_file false --rename_10x_barcodes false --save_fastas false --write_barcodes_meta_csv false"
- "test_bam --rename_10x_barcodes false --write_barcodes_meta_csv false"
- "test_bam --skip_sig_merge"
- "test_bam --write_barcodes_meta_csv false"
- "test_bam --barcodes_file false --rename_10x_barcodes false"
- "test_bam --rename_10x_barcodes false"
- "test_fastas"
- "test_constitutive_from_download_refseq"
- "test_constitutive_from_fasta"
- "test_constitutive_from_sig"
- "test_protein_fastas"
- "test_remove_ribo"
- "test_sig_merge"
13 changes: 12 additions & 1 deletion Dockerfile
@@ -4,14 +4,25 @@ LABEL authors="Olga Botvinnik" \

# Install the conda environment
COPY environment.yml /
- RUN conda env create --quiet -f /environment.yml && conda clean -a
+ RUN conda install -c conda-forge mamba
+ RUN mamba env create -f /environment.yml && mamba clean -a

# Add conda installation dir to PATH (instead of doing 'conda activate')
ENV PATH /opt/conda/envs/nf-core-kmermaid-0.1.0dev/bin:$PATH

# Dump the details of the installed packages to a file for posterity
RUN conda env export --name nf-core-kmermaid-0.1.0dev > nf-core-kmermaid-0.1.0dev.yml

+ # Install super fast rust code to remove nuisance hashes (e.g. ribosomal) from signatures
+ RUN git clone -b olgabot/mut-warning https://github.com/olgabot/2021-01-27-olga-remove-protein.git
+ # Soft link all conda C-related libraries to their non-prefixed name
+ # for rust to be able to build the C libraries
+ RUN for f in $(ls /opt/conda/envs/nf-core-kmermaid-0.1.0dev/bin/x86_64-conda_cos6-linux-gnu*); \
+ do g=$(echo $f | sed 's:x86_64-conda_cos6-linux-gnu-::') ; echo $g; ln -s $f $g ; done
+ RUN cd 2021-01-27-olga-remove-protein && cargo build --release
+ # Add "subtract" command to path
+ ENV PATH $HOME/2021-01-27-olga-remove-protein/target/release:$PATH
+
# Instruct R processes to use these empty files instead of clashing with a local version
RUN touch .Rprofile
RUN touch .Renviron
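For reviewers less used to the shell loop in the Dockerfile, here is a minimal Python sketch of the same soft-linking idea: every tool whose file name starts with the conda cross-compiler prefix gets a symlink under its unprefixed name. The function name `link_unprefixed` and the temporary directory are illustrative assumptions, not part of this PR, and unlike `ln -s` the sketch skips names that already exist.

```python
import tempfile
from pathlib import Path

# Prefix used by the conda compiler packages, as it appears in the Dockerfile
PREFIX = "x86_64-conda_cos6-linux-gnu-"


def link_unprefixed(bin_dir, prefix=PREFIX):
    """Symlink e.g. <prefix>gcc to gcc inside bin_dir; return the new names."""
    created = []
    for tool in sorted(bin_dir.glob(prefix + "*")):
        target = bin_dir / tool.name[len(prefix):]
        if target.exists() or target.is_symlink():
            # Assumption: never overwrite an existing unprefixed tool
            continue
        target.symlink_to(tool)
        created.append(target.name)
    return created


# Demonstration on a throwaway directory with two fake compiler tools
with tempfile.TemporaryDirectory() as tmp:
    bin_dir = Path(tmp)
    for name in ("gcc", "ar"):
        (bin_dir / (PREFIX + name)).write_text("#!/bin/sh\n")
    linked = link_unprefixed(bin_dir)
    gcc_is_link = (bin_dir / "gcc").is_symlink()
```

The point of the links is that build scripts invoked by `cargo build` look for plain names such as `gcc` or `ar`, which the conda environment only ships under the prefixed names.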
44 changes: 44 additions & 0 deletions bin/filter_fasta_regex.py
@@ -0,0 +1,44 @@
#!/usr/bin/env python

import argparse
import re


import screed


def write_records_to_fasta(records, fasta):
    with open(fasta, "w") as f:
        for record in records:
            f.write(f'>{record["name"]}\n{record["sequence"]}\n')


def filter_records(fasta, pattern):
    filtered_records = []
    with screed.open(fasta) as records:
        for record in records:
            name = record["name"]
            # Case-insensitive match against the full record name
            if re.findall(pattern, name, flags=re.I):
                filtered_records.append(record)
    return filtered_records


def filter_fasta_with_regex(fasta_to_filter, out_fasta, regex):
    record_subset = filter_records(fasta_to_filter, regex)
    write_records_to_fasta(record_subset, out_fasta)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="""Extract sequences whose names match a pattern"""
    )
    parser.add_argument("--input-fasta", type=str, help="Sequence file to filter")
    parser.add_argument("--output-fasta", type=str, help="File to write")
    parser.add_argument(
        "--regex-pattern",
        type=str,
        help="Regular expression pattern to match for the names of sequences in the file",
    )
    args = parser.parse_args()

    filter_fasta_with_regex(args.input_fasta, args.output_fasta, args.regex_pattern)
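To make the filtering behaviour concrete without needing `screed` installed, the following is a rough standard-library sketch of the same idea: keep only records whose header matches a case-insensitive regular expression. The tiny FASTA parser, the function names, and the example records (including their accession numbers) are hypothetical stand-ins, not code or data from this PR.

```python
import re


def parse_fasta(text):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_by_name(text, pattern):
    """Keep records whose header matches the pattern, ignoring case."""
    regex = re.compile(pattern, flags=re.IGNORECASE)
    return [(name, seq) for name, seq in parse_fasta(text) if regex.search(name)]


# Made-up records for illustration only
EXAMPLE_FASTA = """>NP_000001.1 ribosomal protein L3 [Homo sapiens]
MSHRKFSAPRHG
>NP_000002.1 hemoglobin subunit beta [Homo sapiens]
MVHLTPEEKSAV
>NP_000003.1 Ribosomal Protein S6 [Mus musculus]
MKLNISFPATGC
"""

kept = filter_by_name(EXAMPLE_FASTA, r"ribosomal protein")
kept_names = [name for name, _ in kept]
```

Here the hemoglobin record is dropped and both ribosomal records are kept regardless of capitalisation, which mirrors the `flags=re.I` choice in the script.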
10 changes: 7 additions & 3 deletions bin/scrape_software_versions.py
@@ -14,8 +14,10 @@
"SKA": ["v_ska.txt", r"SKA Version: (\S+)"],
"htslib": ["v_samtools.txt", r"htslib (\S+)"],
"Sourmash": ["v_sourmash.txt", r"sourmash (\S+)"],
- "SortMeRNA": ["v_sortmerna.txt", r"SortMeRNA version (\S+),"],
+ "Rsync": ["v_rsync.txt", r"rsync version (\S+)"],
+ "Rsync (Protocol)": ["v_rsync.txt", r"protocol version (\S+)"],
"orpheum": ["v_orpheum.txt", r"Version: (\S+)"],
+ "Python": ["v_python.txt", r"Python (\S+)"],
}
results = OrderedDict()
results["nf-core/kmermaid"] = '<span style="color:#999999;">N/A</span>'
@@ -25,11 +27,13 @@
results["bam2fasta"] = '<span style="color:#999999;">N/A</span>'
results["fastp"] = '<span style="color:#999999;">N/A</span>'
results["htslib"] = '<span style="color:#999999;">N/A</span>'
+ results["orpheum"] = '<span style="color:#999999;">N/A</span>'
+ results["Python"] = '<span style="color:#999999;">N/A</span>'
+ results["Rsync"] = '<span style="color:#999999;">N/A</span>'
+ results["Rsync (Protocol)"] = '<span style="color:#999999;">N/A</span>'
results["Samtools"] = '<span style="color:#999999;">N/A</span>'
results["SKA"] = '<span style="color:#999999;">N/A</span>'
results["Sourmash"] = '<span style="color:#999999;">N/A</span>'
- results["SortMeRNA"] = '<span style="color:#999999;">N/A</span>'
- results["orpheum"] = '<span style="color:#999999;">N/A</span>'

# Search each file using its regex
for k, v in regexes.items():
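The loop body is collapsed in this view, so as a hedged illustration of how a table like `regexes` is typically consumed, the sketch below searches each version file's text with its pattern and falls back to "N/A" when the file or the match is missing. The `scrape` helper and the file contents are assumptions made for the example; real tool output may be formatted differently.

```python
import re
from collections import OrderedDict

# Same shape as the table in the diff: tool -> [version file, regex with one group]
regexes = {
    "Sourmash": ["v_sourmash.txt", r"sourmash (\S+)"],
    "Rsync": ["v_rsync.txt", r"rsync version (\S+)"],
    "Rsync (Protocol)": ["v_rsync.txt", r"protocol version (\S+)"],
    "Python": ["v_python.txt", r"Python (\S+)"],
    "orpheum": ["v_orpheum.txt", r"Version: (\S+)"],
}

# Hypothetical file contents standing in for the real v_*.txt files
fake_files = {
    "v_sourmash.txt": "sourmash 4.0.0\n",
    "v_rsync.txt": "rsync version 3.1.3 protocol version 31\n",
    "v_python.txt": "Python 3.8.5\n",
    # v_orpheum.txt is deliberately missing
}


def scrape(regex_table, files):
    """Return tool -> captured version string, or 'N/A' if nothing matched."""
    results = OrderedDict()
    for tool, (filename, pattern) in regex_table.items():
        text = files.get(filename)
        match = re.search(pattern, text) if text is not None else None
        results[tool] = match.group(1) if match else "N/A"
    return results


versions = scrape(regexes, fake_files)
```

Note that the two Rsync entries read the same file with different patterns, which is why one `rsync --version` dump can populate both rows.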
2 changes: 1 addition & 1 deletion bin/validate_sketch_value.py
@@ -20,7 +20,7 @@ def get_sketch_value(value, value_log2):
if "," in value:
logger.exception(
f"Can only provide a single number to --sketch_num_hashes or"
- f" --sketch_scaled. Provided '{value}"
+ f" --sketch_scaled. Provided '{value}'"
)
sketch_value = int(value)
else:
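The `else` branch is cut off in this hunk. As a sketch only, and assuming the `_log2` options used in the CI matrix (for example `--sketch_scaled false --sketch_scaled_log2 2`) mean "two to the power of this number", the resolution logic could look like the following. `resolve_sketch_value` and `_is_set` are hypothetical names, and this version raises on a comma-separated value rather than logging it.

```python
def _is_set(x):
    """Treat None, False, '' and the string 'false' as 'not provided'."""
    return not (x is None or x is False or x in ("", "false"))


def resolve_sketch_value(value=None, value_log2=None):
    """Return an integer sketch size from a direct value or a log2 exponent."""
    if _is_set(value):
        text = str(value)
        if "," in text:
            raise ValueError(
                "Can only provide a single number to --sketch_num_hashes or"
                f" --sketch_scaled. Provided '{text}'"
            )
        return int(text)
    if _is_set(value_log2):
        # Assumption: the log2 flavour of the option is an exponent of two
        return 2 ** int(value_log2)
    raise ValueError("Provide either a value or a log2 exponent")


direct = resolve_sketch_value("10")
from_log2 = resolve_sketch_value(False, 2)
from_log2_str = resolve_sketch_value("false", "20")
```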
1 change: 1 addition & 0 deletions conf/base.config
@@ -54,6 +54,7 @@ process {

withName: 'multiqc|get_software_versions' {
memory = { check_max( 2.GB * task.attempt, 'memory' ) }
errorStrategy = "ignore"
cache = false
}
withName: 'sourmash_compute_sketch_fastx_nucleotide|sourmash_compute_sketch_fastx_peptide' {
5 changes: 4 additions & 1 deletion conf/test.config
@@ -17,7 +17,6 @@ params {
// Input data
// samples = 'testing/samples.csv'
// fastas = 'testing/fastas/*.fasta'
sketch_scaled = 2
molecules = 'dna,protein,dayhoff'
// read_pairs = 'testing/fastqs/*{1,2}.fastq.gz'
// sra = "SRP016501"
@@ -29,4 +28,8 @@ params {
['SRR4238351', ['https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/SRR4238351_subsamp.fastq.gz']],
['SRR4238355', ['https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/SRR4238355_subsamp.fastq.gz']],
]
// Remove constitutively expressed genes
test_mini_refseq_download = true
constitutive_protein_sig = "https://github.com/czbiohub/test-datasets/raw/olgabot/kmermaid--housekeeping-fasta/reference/vertebrate_mammalian--205--2021-03-15.protein.fa__only_constitutive_genes.fa__molecule-protein,dayhoff__ksize-21,30,51__scaled-10__track_abundance-true.sig"
constitutive_rna_sig = "https://github.com/czbiohub/test-datasets/raw/olgabot/kmermaid--housekeeping-fasta/reference/vertebrate_mammalian--205--2021-03-15.rna.fa__only_constitutive_genes.fa__molecule-dna__ksize-21,30,51__scaled-10__track_abundance-true.sig"
}
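The constitutive signature file names used across the test configs appear to encode the sketch parameters as `__key-value` fields (molecule, ksize, scaled, track_abundance). Assuming that naming convention holds, which is inferred from the file names here rather than documented in this PR, a small hypothetical helper can recover the parameters from a URL:

```python
import posixpath
from urllib.parse import urlparse

# Field names observed in the signature file names of the test configs
KNOWN_KEYS = {"molecule", "ksize", "scaled", "track_abundance"}


def parse_sig_name(url_or_path):
    """Extract sketch parameters from a '__key-value' style .sig file name."""
    name = posixpath.basename(urlparse(url_or_path).path)
    if name.endswith(".sig"):
        name = name[: -len(".sig")]
    params = {}
    for field in name.split("__"):
        key, sep, raw = field.partition("-")
        if sep and key in KNOWN_KEYS:
            params[key] = raw
    return {
        "molecules": params["molecule"].split(","),
        "ksizes": [int(k) for k in params["ksize"].split(",")],
        "scaled": int(params["scaled"]),
        "track_abundance": params["track_abundance"] == "true",
    }


sig_url = (
    "https://github.com/czbiohub/test-datasets/raw/olgabot/"
    "kmermaid--housekeeping-fasta/reference/"
    "vertebrate_mammalian--205--2021-03-15.protein.fa__only_constitutive_genes.fa"
    "__molecule-protein,dayhoff__ksize-21,30,51__scaled-10__track_abundance-true.sig"
)
sig_params = parse_sig_name(sig_url)
```

A check like this makes it easy to confirm that a prebuilt constitutive signature was sketched with the same scaled value and k-mer sizes as the samples it will be subtracted from.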
3 changes: 2 additions & 1 deletion conf/test_bam.config
@@ -19,7 +19,6 @@ params {
'https://github.com/nf-core/test-datasets/raw/olgabot/kmermaid--bam-unique-names/testdata/mouse_lung.bam',
'https://github.com/nf-core/test-datasets/raw/olgabot/kmermaid--bam-unique-names/testdata/mouse_brown_fat_ptprc_plus_unaligned.bam']
// Sketch Parameters
sketch_scaled = 2
molecules = 'dna,protein,dayhoff'
read_pairs = false
save_fastas = "fastas"
@@ -28,4 +27,6 @@ params {
// For bam, each fasta record represents each barcode and each should have a signature
// they should not be merged, For computation on bam file using sourmash, please set true for the below flag
tenx_min_umi_per_cell = 2
constitutive_protein_sig = "https://github.com/czbiohub/test-datasets/raw/olgabot/kmermaid--housekeeping-fasta/reference/vertebrate_mammalian--205--2021-03-15.protein.fa__only_constitutive_genes.fa__molecule-protein,dayhoff__ksize-21,30,51__scaled-10__track_abundance-true.sig"
constitutive_rna_sig = "https://github.com/czbiohub/test-datasets/raw/olgabot/kmermaid--housekeeping-fasta/reference/vertebrate_mammalian--205--2021-03-15.rna.fa__only_constitutive_genes.fa__molecule-dna__ksize-21,30,51__scaled-10__track_abundance-true.sig"
}
32 changes: 32 additions & 0 deletions conf/test_constitutive_from_download_refseq.config
@@ -0,0 +1,32 @@
/*
* -------------------------------------------------
* Nextflow config file for running tests
* -------------------------------------------------
* Defines bundled input files and everything required
* to run a fast and simple test. Use as follows:
* nextflow run nf-core/kmermaid -profile test
*/

params {
config_profile_name = 'Test profile'
config_profile_description = 'Minimal test dataset to check pipeline function'
// Limit resources so that this can run on Travis
max_cpus = 2
max_memory = 6.GB
max_time = 48.h

// Input data
input_paths = [
['SRR4050379', ['https://github.com/nf-core/test-datasets/raw/kmermaid/testdata/SRR4050379_pass_1.fastq.gz',
'https://github.com/nf-core/test-datasets/raw/kmermaid/testdata/SRR4050379_pass_2.fastq.gz']],
['SRR4050380', ['https://github.com/nf-core/test-datasets/raw/kmermaid/testdata/SRR4050380_pass_1.fastq.gz',
'https://github.com/nf-core/test-datasets/raw/kmermaid/testdata/SRR4050380_pass_2.fastq.gz']],
['SRR4238351', ['https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/SRR4238351_subsamp.fastq.gz']],
['SRR4238355', ['https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/SRR4238355_subsamp.fastq.gz']],
]

// "Other" is the smallest refseq taxonomy subdirectory: ftp://ftp.ncbi.nlm.nih.gov/refseq/release/other/
// Protein fasta is 453 B
refseq_taxonomy = 'vertebrate_mammalian'
test_mini_refseq_download = true
}
32 changes: 32 additions & 0 deletions conf/test_constitutive_from_fasta.config
@@ -0,0 +1,32 @@
/*
* -------------------------------------------------
* Nextflow config file for running tests
* -------------------------------------------------
* Defines bundled input files and everything required
* to run a fast and simple test. Use as follows:
* nextflow run nf-core/kmermaid -profile test
*/

params {
config_profile_name = 'Test profile'
config_profile_description = 'Minimal test dataset to check pipeline function'
// Limit resources so that this can run on Travis
max_cpus = 2
max_memory = 6.GB
max_time = 48.h

// Input data
input_paths = [
['SRR4050379', ['https://github.com/nf-core/test-datasets/raw/kmermaid/testdata/SRR4050379_pass_1.fastq.gz',
'https://github.com/nf-core/test-datasets/raw/kmermaid/testdata/SRR4050379_pass_2.fastq.gz']],
['SRR4050380', ['https://github.com/nf-core/test-datasets/raw/kmermaid/testdata/SRR4050380_pass_1.fastq.gz',
'https://github.com/nf-core/test-datasets/raw/kmermaid/testdata/SRR4050380_pass_2.fastq.gz']],
['SRR4238351', ['https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/SRR4238351_subsamp.fastq.gz']],
['SRR4238355', ['https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/SRR4238355_subsamp.fastq.gz']],
]
constitutive_protein_fasta = "https://github.com/czbiohub/test-datasets/raw/olgabot/kmermaid--housekeeping-fasta/reference/vertebrate_mammalian--205--2021-03-15.protein.fa__only_constitutive_genes.fa.gz"
constitutive_rna_fasta = "https://github.com/czbiohub/test-datasets/raw/olgabot/kmermaid--housekeeping-fasta/reference/vertebrate_mammalian--205--2021-03-15.rna.fa__only_constitutive_genes.fa.gz"

translate_proteome_fasta = 'https://github.com/nf-core/test-datasets/raw/kmermaid/reference/gencode.v32.pc_translations.subsample5.randomseed0.fa'
bloomfilter_tablesize = '1e6'
}
29 changes: 29 additions & 0 deletions conf/test_constitutive_from_sig.config
@@ -0,0 +1,29 @@
/*
* -------------------------------------------------
* Nextflow config file for running tests
* -------------------------------------------------
* Defines bundled input files and everything required
* to run a fast and simple test. Use as follows:
* nextflow run nf-core/kmermaid -profile test
*/

params {
config_profile_name = 'Test profile'
config_profile_description = 'Minimal test dataset to check pipeline function'
// Limit resources so that this can run on Travis
max_cpus = 2
max_memory = 6.GB
max_time = 48.h

// Input data
input_paths = [
['SRR4050379', ['https://github.com/nf-core/test-datasets/raw/kmermaid/testdata/SRR4050379_pass_1.fastq.gz',
'https://github.com/nf-core/test-datasets/raw/kmermaid/testdata/SRR4050379_pass_2.fastq.gz']],
['SRR4050380', ['https://github.com/nf-core/test-datasets/raw/kmermaid/testdata/SRR4050380_pass_1.fastq.gz',
'https://github.com/nf-core/test-datasets/raw/kmermaid/testdata/SRR4050380_pass_2.fastq.gz']],
['SRR4238351', ['https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/SRR4238351_subsamp.fastq.gz']],
['SRR4238355', ['https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/SRR4238355_subsamp.fastq.gz']],
]
constitutive_protein_sig = "https://github.com/czbiohub/test-datasets/raw/olgabot/kmermaid--housekeeping-fasta/reference/vertebrate_mammalian--205--2021-03-15.protein.fa__only_constitutive_genes.fa__molecule-protein,dayhoff__ksize-21,30,51__scaled-10__track_abundance-true.sig"
constitutive_rna_sig = "https://github.com/czbiohub/test-datasets/raw/olgabot/kmermaid--housekeeping-fasta/reference/vertebrate_mammalian--205--2021-03-15.rna.fa__only_constitutive_genes.fa__molecule-dna__ksize-21,30,51__scaled-10__track_abundance-true.sig"
}
3 changes: 2 additions & 1 deletion conf/test_fastas.config
@@ -17,7 +17,6 @@ params {
// Input data
// samples = 'testing/samples.csv'
// fastas = 'testing/fastas/*.fasta'
sketch_scaled = 2
molecules = 'dna,protein,dayhoff'
// read_pairs = 'testing/fastqs/*{1,2}.fastq.gz'
// sra = "SRP016501"
@@ -26,4 +25,6 @@ params {
['SRR4050380', ['https://github.com/nf-core/test-datasets/raw/olgabot/kmermaid--bam-unique-names/testdata/SRR4050380_pass_concatenated.fasta']],

]
constitutive_protein_sig = "https://github.com/czbiohub/test-datasets/raw/olgabot/kmermaid--housekeeping-fasta/reference/vertebrate_mammalian--205--2021-03-15.protein.fa__only_constitutive_genes.fa__molecule-protein,dayhoff__ksize-21,30,51__scaled-10__track_abundance-true.sig"
constitutive_rna_sig = "https://github.com/czbiohub/test-datasets/raw/olgabot/kmermaid--housekeeping-fasta/reference/vertebrate_mammalian--205--2021-03-15.rna.fa__only_constitutive_genes.fa__molecule-dna__ksize-21,30,51__scaled-10__track_abundance-true.sig"
}
4 changes: 3 additions & 1 deletion conf/test_full.config
@@ -12,10 +12,12 @@ params {
config_profile_description = 'Full test dataset to check pipeline function'

// Input data for full size test
sketch_scaled = 2
molecules = 'dna,protein,dayhoff'
input_paths = [
['GM12878', ['ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR319/007/SRR3192657/SRR3192657_1.fastq.gz,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR319/007/SRR3192657/SRR3192657_2.fastq.gz','ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR319/008/SRR3192658/SRR3192658_1.fastq.gz,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR319/008/SRR3192658/SRR3192658_2.fastq.gz']],
['K562', ['ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR319/008/SRR3192408/SRR3192408_1.fastq.gz,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR319/008/SRR3192408/SRR3192408_2.fastq.gz', 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR319/009/SRR3192409/SRR3192409_1.fastq.gz,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR319/009/SRR3192409/SRR3192409_2.fastq.gz']]
]
constitutive_protein_sig = "https://github.com/czbiohub/test-datasets/raw/olgabot/kmermaid--housekeeping-fasta/reference/vertebrate_mammalian--205--2021-03-15.protein.fa__only_constitutive_genes.fa__molecule-protein,dayhoff__ksize-21,30,51__scaled-10__track_abundance-true.sig"
constitutive_rna_sig = "https://github.com/czbiohub/test-datasets/raw/olgabot/kmermaid--housekeeping-fasta/reference/vertebrate_mammalian--205--2021-03-15.rna.fa__only_constitutive_genes.fa__molecule-dna__ksize-21,30,51__scaled-10__track_abundance-true.sig"

}
4 changes: 2 additions & 2 deletions conf/test_protein_fastas.config
@@ -26,8 +26,8 @@ params {
['https://github.com/czbiohub/test-datasets/raw/predictorthologs/testdata/bonobo_liver_ptprc__molecule-dayhoff__coding_reads_peptides.fasta']]]

// Sketch Parameters
sketch_scaled = 2
molecules = 'protein,dayhoff,hp'
read_pairs = false

constitutive_protein_sig = "https://github.com/czbiohub/test-datasets/raw/olgabot/kmermaid--housekeeping-fasta/reference/vertebrate_mammalian--205--2021-03-15.protein.fa__only_constitutive_genes.fa__molecule-protein,dayhoff__ksize-21,30,51__scaled-10__track_abundance-true.sig"
constitutive_rna_sig = "https://github.com/czbiohub/test-datasets/raw/olgabot/kmermaid--housekeeping-fasta/reference/vertebrate_mammalian--205--2021-03-15.rna.fa__only_constitutive_genes.fa__molecule-dna__ksize-21,30,51__scaled-10__track_abundance-true.sig"
}
3 changes: 2 additions & 1 deletion conf/test_remove_ribo.config
@@ -17,7 +17,6 @@ params {
// Input data
// samples = 'testing/samples.csv'
// fastas = 'testing/fastas/*.fasta'
sketch_scaled = 2
molecules = 'dna,protein,dayhoff'
// read_pairs = 'testing/fastqs/*{1,2}.fastq.gz'
// sra = "SRP016501"
@@ -31,4 +30,6 @@ params {
['SRR4238351', ['https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/SRR4238351_subsamp.fastq.gz']],
['SRR4238355', ['https://github.com/nf-core/test-datasets/raw/rnaseq/testdata/SRR4238355_subsamp.fastq.gz']],
]
constitutive_protein_sig = "https://github.com/czbiohub/test-datasets/raw/olgabot/kmermaid--housekeeping-fasta/reference/vertebrate_mammalian--205--2021-03-15.protein.fa__only_constitutive_genes.fa__molecule-protein,dayhoff__ksize-21,30,51__scaled-10__track_abundance-true.sig"
constitutive_rna_sig = "https://github.com/czbiohub/test-datasets/raw/olgabot/kmermaid--housekeeping-fasta/reference/vertebrate_mammalian--205--2021-03-15.rna.fa__only_constitutive_genes.fa__molecule-dna__ksize-21,30,51__scaled-10__track_abundance-true.sig"
}
3 changes: 2 additions & 1 deletion conf/test_sig_merge.config
@@ -18,7 +18,6 @@ params {
bam = ['https://github.com/nf-core/test-datasets/raw/olgabot/kmermaid--bam-unique-names/testdata/mouse_lung.bam',
'https://github.com/nf-core/test-datasets/raw/olgabot/kmermaid--bam-unique-names/testdata/mouse_brown_fat_ptprc_plus_unaligned.bam']
// Sketch Parameters
sketch_scaled = 2
molecules = 'dna,protein,dayhoff'
read_pairs = false
save_fastas = "fastas"
@@ -29,4 +28,6 @@ params {

reference_proteome_fasta = 'https://github.com/nf-core/test-datasets/raw/kmermaid/reference/ptprc_bam_translations.fa'
bloomfilter_tablesize = '1e6'
constitutive_protein_sig = "https://github.com/czbiohub/test-datasets/raw/olgabot/kmermaid--housekeeping-fasta/reference/vertebrate_mammalian--205--2021-03-15.protein.fa__only_constitutive_genes.fa__molecule-protein,dayhoff__ksize-21,30,51__scaled-10__track_abundance-true.sig"
constitutive_rna_sig = "https://github.com/czbiohub/test-datasets/raw/olgabot/kmermaid--housekeeping-fasta/reference/vertebrate_mammalian--205--2021-03-15.rna.fa__only_constitutive_genes.fa__molecule-dna__ksize-21,30,51__scaled-10__track_abundance-true.sig"
}
5 changes: 2 additions & 3 deletions conf/test_tenx_tgz.config
@@ -20,14 +20,13 @@ params {
'https://github.com/nf-core/test-datasets/raw/olgabot/kmermaid-unaligned-tgz-v3/testdata/mouse_brown_fat_ptprc_plus_unaligned.tgz'
]
// Sketch Parameters
sketch_scaled = 2
molecules = 'dna,protein,dayhoff'
read_pairs = false
save_fastas = "fastas"
save_intermediate_files = "/tmp/"
write_barcode_meta_csv = "metadata.csv"
// For bam, each fasta record represents each barcode and each should have a signature
// they should not be merged, For computation on bam file using sourmash, please set true for the below flag
tenx_min_umi_per_cell = 10
shard_size = 350
constitutive_protein_sig = "https://github.com/czbiohub/test-datasets/raw/olgabot/kmermaid--housekeeping-fasta/reference/vertebrate_mammalian--205--2021-03-15.protein.fa__only_constitutive_genes.fa__molecule-protein,dayhoff__ksize-21,30,51__scaled-10__track_abundance-true.sig"
constitutive_rna_sig = "https://github.com/czbiohub/test-datasets/raw/olgabot/kmermaid--housekeeping-fasta/reference/vertebrate_mammalian--205--2021-03-15.rna.fa__only_constitutive_genes.fa__molecule-dna__ksize-21,30,51__scaled-10__track_abundance-true.sig"
}