---
title: "Introduction to Data Wrangling"
layout: single
header:
  overlay_color: "444444"
  overlay_image: /assets/images/dna.jpg
---

Calculate Sequence Length (fasta)

Sometimes it is essential to know the length distribution of your sequences. They may be newly assembled scaffolds, the chromosomes of a genome whose sizes you want to know, or any multi-FASTA sequence file.

1. Using Biopython

Save this as a script (for example, seq_length.py), make it executable, and run it on a FASTA file:

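#!/usr/bin/env python3
# Print the ID and length of every sequence in a FASTA file, tab-separated.
import sys

from Bio import SeqIO

# The path to the FASTA file is the first command-line argument.
for seq_record in SeqIO.parse(sys.argv[1], "fasta"):
    # len(seq_record) is the number of residues in the sequence.
    print("%s\t%i" % (seq_record.id, len(seq_record)))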

To run:

chmod +x seq_length.py
./seq_length.py input_file.fasta

This prints the ID and length of every sequence in the file, one record per line, separated by a tab.
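If Biopython is not installed, the same per-record lengths can be approximated with the standard library alone. The following is a minimal sketch, not a replacement for SeqIO: the function name fasta_lengths and the in-memory demo records are hypothetical, and it assumes a plain FASTA layout where each record starts with a ">" header and the ID is the header text up to the first whitespace.

```python
def fasta_lengths(lines):
    """Yield (sequence_id, length) for each record in an iterable of FASTA lines.

    Hypothetical helper: assumes '>' header lines and takes the ID to be the
    header text up to the first whitespace.
    """
    seq_id, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, length  # emit the record that just ended
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            length = 0
        elif seq_id is not None:
            length += len(line)  # sequences may span several lines
    if seq_id is not None:
        yield seq_id, length  # emit the final record


# Made-up records, used only to illustrate the call.
demo = [">scaffold_1 example", "ACGTAC", "GT", ">scaffold_2", "ACG"]
for seq_id, length in fasta_lengths(demo):
    print("%s\t%i" % (seq_id, length))
```

To use it on a real file, pass an open file handle instead of the demo list, since a file handle is also an iterable of lines.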

2. Using bioawk

Bioawk is an extension of awk, written by Heng Li, that adds support for common biological data formats. It is available to download from this link, and installation is straightforward. To get sequence lengths, run:

bioawk -c fastx '{print $name "\t" length($seq)}' input.fasta

The output is similar to that of the script above and can be redirected to a file if needed.
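Once the tab-separated output has been saved, a few summary statistics give a quick picture of the length distribution. The sketch below assumes each line has the form name, tab, length, as produced by either method above. The function name summarize_lengths and the sample lines are hypothetical.

```python
import statistics


def summarize_lengths(lines):
    """Summarize lengths from lines of the form 'name<TAB>length'.

    Hypothetical helper: returns None when there are no records.
    """
    lengths = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        # Split on the last tab so only the final column is read as the length.
        _name, length = line.rsplit("\t", 1)
        lengths.append(int(length))
    if not lengths:
        return None
    total = sum(lengths)
    return {
        "count": len(lengths),
        "total": total,
        "min": min(lengths),
        "max": max(lengths),
        "mean": total / len(lengths),
        "median": statistics.median(lengths),
    }


# Made-up lines, used only to illustrate the call.
sample = ["scaffold_1\t8\n", "scaffold_2\t3\n", "scaffold_3\t4\n"]
print(summarize_lengths(sample))
```

For real data, pass an open handle to the saved output file in place of the sample list.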

More information

Introduction to Bioawk
