This is a cheat sheet for bioinformatics command line programs.
Bioinformatics, Yay!!
Use HOMER to find enriched motifs in genomic regions given as BED files.
Build the command using Python:
# Search for short motifs of lengths 4, 5, 6 and 7, allowing up to 1 mismatch
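n_processors = 4
homer_flags = '-rna -len 4,5,6,7 -mset vertebrates -mis 1 -p {}'.format(n_processors)
findMotifsGenome = '/home/yeo-lab/software/homer/bin/findMotifsGenome.pl'
# Input peaks, output directory and background regions
bedfile = 'peaks.bed'
out_dir = 'out_dir'
background = 'background.bed'
# Argument order: <peak file> <genome> <output directory> [options]
command = '{} {} hg19 {} -bg {} {}'.format(
    findMotifsGenome, bedfile, out_dir, background, homer_flags)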
The final command looks like this (with the full path to findMotifsGenome.pl shortened):
findMotifsGenome.pl peaks.bed hg19 out_dir -bg background.bed -rna -len 4,5,6,7 -mset vertebrates -mis 1 -p 4
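If you build this command often, it may be easier to keep it as an argument list instead of one formatted string. The following is a minimal sketch, not part of HOMER: build_homer_command is a hypothetical helper, the file names are the example values used above, and it assumes findMotifsGenome.pl is on your PATH and Python 3.8+ (for shlex.join).

```python
import shlex


def build_homer_command(bedfile, out_dir, background, genome="hg19",
                        lengths=(4, 5, 6, 7), mismatches=1, n_processors=4,
                        executable="findMotifsGenome.pl"):
    """Return the findMotifsGenome.pl call as a list of arguments.

    Hypothetical helper for illustration; the flags are the ones used
    in this cheat sheet.
    """
    return [
        executable, bedfile, genome, out_dir,   # positional arguments
        "-bg", background,                      # background regions
        "-rna",                                 # RNA motif search
        "-len", ",".join(str(n) for n in lengths),
        "-mset", "vertebrates",
        "-mis", str(mismatches),
        "-p", str(n_processors),                # number of processors
    ]


args = build_homer_command("peaks.bed", "out_dir", "background.bed")
# Print a copy-pasteable shell command
print(shlex.join(args))
```

To actually run it, pass the list to subprocess.run(args, check=True). A list avoids quoting problems when paths contain spaces.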
Use seqtk (installable via bioconda) to subsample each fastq.gz file in the current directory down to 1000 reads, using a random seed of 0.
mkdir -p subsampled
for F in *.gz; do echo "$F"; seqtk sample -s 0 "$F" 1000 | gzip -c > "subsampled/$F"; done
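The same loop can be driven from Python, which may be convenient if the rest of your pipeline is already a Python script. This is only a sketch under assumptions: seqtk_sample_args and subsample are hypothetical helper names, seqtk must be on your PATH, and the output directory name matches the one used above.

```python
import gzip
import shutil
import subprocess
from pathlib import Path


def seqtk_sample_args(fastq, n_reads=1000, seed=0):
    """Argument list for: seqtk sample -s <seed> <fastq> <n_reads>."""
    return ["seqtk", "sample", "-s", str(seed), str(fastq), str(n_reads)]


def subsample(fastq, out_dir="subsampled", n_reads=1000, seed=0):
    """Subsample one file and write gzip-compressed output to out_dir."""
    out_path = Path(out_dir) / Path(fastq).name
    out_path.parent.mkdir(parents=True, exist_ok=True)
    args = seqtk_sample_args(fastq, n_reads, seed)
    with gzip.open(out_path, "wb") as out:
        # Stream seqtk's stdout straight into the gzip file
        proc = subprocess.Popen(args, stdout=subprocess.PIPE)
        shutil.copyfileobj(proc.stdout, out)
        if proc.wait() != 0:
            raise subprocess.CalledProcessError(proc.returncode, args)
    return out_path


# Usage (requires seqtk to be installed):
# for fastq in sorted(Path(".").glob("*.gz")):
#     print(subsample(fastq))
```

Using the same seed for both files of a paired-end sample keeps the read pairs matched.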