MACHETE

More Accurate vs Mismatched Alignment CHimEra Tracking Engine

MACHETE is fusion detection software that is currently in development.

PREREQUISITE SOFTWARE:

KNIFE - https://github.com/lindaszabo/KNIFE
R version 3.0.2 or later, with the package "data.table" installed
Bowtie2
python version 2.7.5
SLURM job scheduler

INSTRUCTIONS BEFORE YOU RUN

1. Paired fastq files must be named identically and end in _1.fq and _2.fq.
Example:
ACCEPTABLE -- MySample_1.fq, MySample_2.fq
UNACCEPTABLE -- MySample1_001.fq, MySample2_001.fq
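The naming rule above can be checked before a run. The following is a minimal sketch, not part of MACHETE; the function name check_fastq_pairs is hypothetical, and it only inspects file names, not file contents.

```python
import os

SUFFIXES = ("_1.fq", "_2.fq")

def check_fastq_pairs(filenames):
    """Return (paired_samples, problem_files) for a list of fastq paths."""
    mates = {}
    problems = []
    for name in filenames:
        base = os.path.basename(name)
        if not base.endswith(SUFFIXES):
            # Name does not follow the _1.fq / _2.fq convention.
            problems.append(base)
            continue
        sample = base[:-len("_1.fq")]
        mates.setdefault(sample, set()).add(base[-len("_1.fq"):])
    # A sample is usable only when both mates are present.
    paired = sorted(s for s, found in mates.items() if len(found) == 2)
    problems.extend(sorted(s + next(iter(found))
                           for s, found in mates.items() if len(found) == 1))
    return paired, problems
```

With the two examples above, MySample_1.fq and MySample_2.fq are reported as one paired sample, while MySample1_001.fq and MySample2_001.fq are both reported as problems.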

2. Generate or download a directory of necessary pickles.
Pickles are a method of storing serialized data in Python. We use pickles to store annotated exon information.

A: Generating the pickles directory from a gtf and fasta file
Use makeExonDB.py to create the pickles directory. The OutputDirectory is the pickles directory.
usage: python makeExonDB.py -f <genome.fasta> -a <genome.gtf> -o <OutputDirectory>

This can be adapted to any build of any genome. The downloadable version is the hg19 genome.
Genome.fasta is the name of the fasta file used, e.g. hg19_genome.fasta
Genome.gtf is the name of the gtf file used, e.g. hg19_genome.gtf
OutputDirectory is the path to the pickles directory, which is chosen by the user
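For readers unfamiliar with pickles, the sketch below shows a round trip with Python's standard pickle module. The exon layout (gene name mapped to a list of start/end pairs) and the file name example_exons.pkl are assumptions made for illustration only; the files written by makeExonDB.py may be organized differently.

```python
import os
import pickle
import tempfile

# Hypothetical annotation layout: gene name -> list of (start, end) exons.
exons = {"GENE_A": [(100, 250), (400, 520)]}

def save_pickle(obj, path):
    """Serialize obj to path; protocol 2 is also readable by Python 2.7."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle, protocol=2)

def load_pickle(path):
    """Read back an object previously written with save_pickle."""
    with open(path, "rb") as handle:
        return pickle.load(handle)

# Write to a temporary directory standing in for the pickles directory.
pickle_dir = tempfile.mkdtemp()
path = os.path.join(pickle_dir, "example_exons.pkl")
save_pickle(exons, path)
restored = load_pickle(path)
```

The restored object compares equal to the original dictionary, which is why a pickles directory built once can be reused across runs.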

B: Downloading the pickles directory
At Stanford -- Sherlock users who are using the HG19 genome can copy or point MACHETE at this directory: "/scratch/PI/horence/gillian/HG19exons/"
Outside of Stanford -- please email [email protected]. The pickles directory is too large to fit on GitHub.

3. Generate or download an index of indels made from the KNIFE linear junction index.
Generate the index
The RegIndelsIndices will be created in a subfolder of the OutputDirectory called “IndelIndices”. Run MakeRegIndelsIndex.sh using the following command:

sh MakeRegIndelsIndex.sh <# indels desired> <Resource flag (optional)>

Linear junctions fasta is the path to the fasta file containing all linear junctions that is created for KNIFE.
Output directory is the path to a directory where the user plans to store linear junction indels.

indels desired is the integer number of indels that will be used in searching for alignment artifacts. We have chosen 5.

Genome is the name of the genome that is being used. This is only used to name output files. For example, if “HG19” is entered, output files will be named HG19_reg_indels_1.fa, etc.
Resource flag is an optional field for users of the Stanford SLURM network to specify which queue should be requested, e.g. “-p owners” or “-p horence”. It can be left blank.
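The naming pattern described above can be sketched as follows. This assumes one fasta file per indel count from 1 up to the number of indels desired, which is an inference from the "HG19_reg_indels_1.fa, etc." example rather than something the script documents; the function name is hypothetical.

```python
def expected_indel_fasta_names(genome, n_indels):
    """List the fasta names expected for indel counts 1..n_indels.

    Assumption: one file per indel count, following the
    <Genome>_reg_indels_<n>.fa pattern shown in the example above.
    """
    return ["{}_reg_indels_{}.fa".format(genome, n)
            for n in range(1, n_indels + 1)]

names = expected_indel_fasta_names("HG19", 5)
```

A list like this is handy for confirming that index generation finished before pointing MACHETE at the IndelIndices folder.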

Download the index
At Stanford -- Sherlock users who are using the HG19 genome can copy or point MACHETE at this directory: “/scratch/PI/horence/gillian/HG19_reg_indels/IndelIndices/”
Outside of Stanford -- please email [email protected]. The RegIndels directory is too large to fit on GitHub.

PREPARING THE SHELL SCRIPT

Open createFarJunctions_SLURM.sh

line 40 - change INSTALLDIR to the full path to the MACHETE script.
line 42 - change CIRCREF to the path to the reference libraries generated for KNIFE, e.g. the directory that contains the hg19_genome, hg19_transcriptome, hg19_junctions_reg and hg19_junctions_scrambled bowtie indices.
line 44 - change REGINDELINDICES to the directory from “INSTRUCTIONS BEFORE YOU RUN”, bullet #3. This path should end with “IndelIndices”.
line 59 - change PICKLEDIR to the directory from “INSTRUCTIONS BEFORE YOU RUN”, bullet #2.
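Before editing the shell script, the four paths can be sanity-checked with a small helper such as the one below. This is a sketch under stated assumptions: the function check_machete_paths and the settings dictionary are hypothetical and not part of MACHETE; only the variable names and the rule that REGINDELINDICES should end with “IndelIndices” come from the instructions above.

```python
import os

REQUIRED_KEYS = ("INSTALLDIR", "CIRCREF", "REGINDELINDICES", "PICKLEDIR")

def check_machete_paths(settings):
    """Return a list of problems found in a dict of script settings."""
    problems = []
    for key in REQUIRED_KEYS:
        value = settings.get(key)
        if not value:
            problems.append("{} is not set".format(key))
        elif not os.path.isdir(value):
            problems.append("{} is not an existing directory: {}".format(key, value))
    indel_dir = settings.get("REGINDELINDICES") or ""
    # normpath drops a trailing slash, so ".../IndelIndices/" also passes.
    if indel_dir and os.path.basename(os.path.normpath(indel_dir)) != "IndelIndices":
        problems.append("REGINDELINDICES should end with IndelIndices")
    return problems
```

An empty list means the four settings point at existing directories and the indel index path has the expected final component.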

USING A GENOME OTHER THAN HG19

Change line 44 REGINDELINDICES to the linear junction indices for the chosen genome that were generated in “INSTRUCTIONS BEFORE YOU RUN”, bullet #3.
Either change or duplicate lines 57 to 60. Choose a genome name to replace “HG19”, and change the PICKLEDIR as needed.
Either change or duplicate lines 200-206. Choose a genome name to replace “HG19” and change the paths to the KNIFE generated reference indices to reflect the genome change.

RUNNING MACHETE:

First run the KNIFE script to completion to generate linear and scrambled junction reports and alignments.

sh createFarJunctions_SLURM.sh <1. KNIFE parent directory> <2. output directory> <3. discordant read distance> <4. ref genome> <5. #indels to use> <6. special queue>

1. KNIFE parent directory - contains output from the KNIFE algorithm. This is the path to the directory that contains "circReads", "orig", "logs" and "sampleStats".
2. Output directory - will be created if it does not already exist.
3. discordant read distance - for testing, we have used 100000 base pairs to identify paired end reads that aligned discordantly.
4. ref genome - "HG19" is the only option currently. If you have created HG38 or another organism, the pickles and a linear junction index must be generated as described above, instead of downloaded. Additionally, lines 200-204 can be changed to point at KNIFE reference indices for that organism.
5. #indels to use - currently using the KNIFE convention of "8" for files with read lengths < 70 and "13" for files with read lengths > 70.
6. special queue - "owners" if you want to run in the owners queue, otherwise leave #6 blank.
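To illustrate how the discordant read distance argument might be applied, the sketch below flags a read pair as discordant when its mates align to different chromosomes or more than the given distance apart on the same chromosome. This rule is an assumption made for illustration; the criterion MACHETE actually applies may differ, and the function name is hypothetical.

```python
def is_discordant(chrom1, pos1, chrom2, pos2, max_distance=100000):
    """Return True if two mates look discordant under the assumed rule.

    Assumed rule: mates on different chromosomes, or mates on the same
    chromosome whose alignment positions are more than max_distance
    base pairs apart.
    """
    if chrom1 != chrom2:
        return True
    return abs(pos1 - pos2) > max_distance
```

Under this rule, with the default of 100000, mates at positions 1000 and 51000 on the same chromosome are concordant, while mates at 1000 and 200000 are discordant.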

Example command: sh createFarJunctions_SLURM.sh /scratch/PI/horence/alignments/EWS_FLI_bigmem/ /scratch/PI/horence/alignments/EWS_FLI_bigmem/FarJunc/ 100000 HG19 13 owners
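The example command can also be assembled programmatically, which helps when launching many samples. The sketch below is not part of MACHETE: both function names are hypothetical, and because the instructions do not say which value to use for a read length of exactly 70, the code assumes "13" in that case.

```python
def fifth_argument(read_length):
    """Pick argument 5 from the read length using the KNIFE convention.

    8 for read lengths below 70, 13 for read lengths above 70.
    Assumption: a read length of exactly 70 is treated as 13.
    """
    return 8 if read_length < 70 else 13

def build_machete_command(knife_dir, out_dir, read_length,
                          distance=100000, genome="HG19", queue=None):
    """Return the argument list for createFarJunctions_SLURM.sh."""
    args = ["sh", "createFarJunctions_SLURM.sh", knife_dir, out_dir,
            str(distance), genome, str(fifth_argument(read_length))]
    if queue:
        # Argument 6 is optional and omitted when no queue is requested.
        args.append(queue)
    return args

command = build_machete_command(
    "/scratch/PI/horence/alignments/EWS_FLI_bigmem/",
    "/scratch/PI/horence/alignments/EWS_FLI_bigmem/FarJunc/",
    read_length=100,
    queue="owners",
)
```

Joining the resulting list with spaces reproduces the example command above; the list form can be passed directly to subprocess.call without shell quoting concerns.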

For an explanation of the outputs of MACHETE, please refer to our paper, currently in submission.
