Split multi-fasta sequence file

Sometimes it is necessary to split a large file containing several sequences (FASTA format) into individual files. I do this with a simple 'awk' command that starts a new output file at every header line (matched by the regular expression ^>) and writes each record to a sequentially numbered file. It is easy and quick!

awk '/^>/{s=++d".fasta"} {print > s}' <inputFile>
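As a minimal sketch of what the one-liner does, the following builds a tiny two-record file in a scratch directory and splits it. The file name sample.fasta and its contents are made up for illustration. Note the plain ASCII quotes around the awk program; the curly quotes that blog software often substitutes are not understood by the shell.

```shell
# Work in a scratch directory so the numbered output files stay out of the way.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical two-record input, for illustration only.
printf '>seq1\nACGT\nTTGA\n>seq2\nGGCC\n' > sample.fasta

# Each header line (starting with ">") increments the counter d and sets the
# output name s; every line, header included, is then written to file s.
awk '/^>/{s=++d".fasta"} {print > s}' sample.fasta

# List the results.
ls
```

With this input, the first record (header plus two sequence lines) should end up in 1.fasta and the second in 2.fasta, next to the untouched sample.fasta.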

9 thoughts on “Split multi-fasta sequence file”

  1. Pingback: Bioinformatics one-liners | GenomicsNX - Next Generation Sequencing Knowledge-Based

  2. Nice! Let me add an improvement for the case where you have a huge multi-FASTA file:
    awk '/^>/{close(s);s=++d".fasta"} {print > s}' multifasta.fasta (or fna, fn, etc)

  3. If I try to use this, I get
    bash: /{s=++d”.fasta”}: Permission denied
    Also, it does not say anywhere how many pieces the multi-FASTA file is broken into.
    Adding close(s) gives an error:
    bash: syntax error near unexpected token `(‘
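The points raised in the comments above can be sketched together. The number of output files equals the number of header lines, so it can be counted up front with grep. Calling close(s) before opening the next file keeps only one output file open at a time, which matters because some awk implementations run out of file descriptors on inputs with many records. Both errors quoted above are what bash reports when the awk program is wrapped in curly quotes instead of plain ASCII single quotes. The input file and its contents below are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical three-record input, for illustration only.
printf '>a\nAC\n>b\nGT\n>c\nTT\n' > multifasta.fasta

# One output file is produced per header line, so count the headers first.
n_records=$(grep -c '^>' multifasta.fasta)
echo "expecting $n_records files"

# close(s) releases the previous output file before the next one is opened.
# The program is quoted with plain ASCII single quotes.
awk '/^>/{close(s);s=++d".fasta"} {print > s}' multifasta.fasta

# Count the numbered files that were actually written.
n_files=$(ls [0-9]*.fasta | wc -l | tr -d ' ')
echo "wrote $n_files files"
```

If the two counts differ, the input probably contains lines before the first header, which this one-liner does not handle.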
