Calculate the length of all sequences in a multi-FASTA file

Sometimes you need to know the length distribution of your sequences. They may be newly assembled scaffolds, a genome whose chromosome sizes you want to know, or any other multi-FASTA file. A simple way to get the lengths is with Biopython.

For example, save this script as seq_length.py (Biopython must be installed, e.g. with pip install biopython):

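#!/usr/bin/env python3
# Print the ID and length of every record in a multi-FASTA file
import sys

from Bio import SeqIO

# The FASTA file is passed as the first command-line argument
for seq_record in SeqIO.parse(sys.argv[1], "fasta"):
    # One tab-separated line per record: sequence ID, then length
    print('%s\t%i' % (seq_record.id, len(seq_record)))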
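If Biopython is not available, the lengths can also be computed by reading the file directly. The sketch below is a swapped-in, dependency-free alternative rather than the Biopython approach; it assumes a well-formed FASTA file and, like seq_record.id, takes the first word of each header line as the ID. The function name fasta_lengths is hypothetical.

```python
#!/usr/bin/env python3
# Minimal sketch (no Biopython): per-record lengths from a multi-FASTA file.
# Assumes well-formed FASTA; the ID is the first word of each '>' header,
# mirroring Biopython's seq_record.id. fasta_lengths is a hypothetical name.
import sys


def fasta_lengths(lines):
    """Yield (id, length) for each record in an iterable of FASTA lines."""
    record_id = None
    length = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if record_id is not None:
                yield record_id, length
            header = line[1:].split()
            record_id = header[0] if header else ""
            length = 0
        else:
            # Sequence may be wrapped over several lines, so keep adding
            length += len(line)
    # Emit the last record in the file
    if record_id is not None:
        yield record_id, length


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for record_id, length in fasta_lengths(handle):
            print("%s\t%i" % (record_id, length))
```

It prints the same tab-separated ID and length lines as the Biopython script, but it does no validation of the sequence characters, so Biopython remains the safer choice for messy input.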

To run it, make the script executable and pass the FASTA file as its argument:

chmod +x seq_length.py
./seq_length.py input_file.fasta

This prints the ID and length of every sequence in the file, one tab-separated line per record.
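To go from per-sequence lengths to a summary of the distribution, a small second script can read that tab-separated output. This is a minimal sketch: the script name length_summary.py and the functions read_lengths and summarize are hypothetical, and it assumes one ID<TAB>length line per sequence.

```python
#!/usr/bin/env python3
# Sketch: summarise the length distribution from "ID<TAB>length" lines,
# e.g. the output of seq_length.py saved to a file.
# length_summary.py, read_lengths and summarize are hypothetical names.
import statistics
import sys


def read_lengths(lines):
    """Return the integer lengths from 'ID<TAB>length' lines."""
    lengths = []
    for line in lines:
        line = line.strip()
        if line:
            # The length is the last tab-separated field
            lengths.append(int(line.rsplit("\t", 1)[1]))
    return lengths


def summarize(lengths):
    """Return basic statistics for a non-empty list of sequence lengths."""
    return {
        "count": len(lengths),
        "total": sum(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        stats = summarize(read_lengths(handle))
    for name, value in stats.items():
        print("%s\t%s" % (name, value))
```

Saved as length_summary.py and made executable, it could be used like this:

./seq_length.py input_file.fasta > lengths.tsv
./length_summary.py lengths.tsv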