Convert FASTQ to FASTA

Using SED

sed -n '1~4s/^@/>/p;2~4p' INFILE.fastq > OUTFILE.fasta

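NOTE: In my experience this is by far the fastest way to convert FASTQ to FASTA. It relies on GNU sed (the first~step address form) and assumes strict four-line FASTQ records.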
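To see why the command addresses lines by number instead of matching every line that starts with @, here is a small sketch on an invented two-record FASTQ file. The file names and reads are made up for illustration; note that the second record's quality string also begins with @.

```shell
# Throwaway two-record FASTQ file (reads invented for illustration).
tmpdir=$(mktemp -d)
printf '%s\n' '@read1' 'ACGT' '+' 'IIII' '@read2' 'TTGA' '+' '@III' > "$tmpdir/demo.fastq"

# 1~4 selects lines 1, 5, 9, ... (headers): replace the leading @ with > and print.
# 2~4 selects lines 2, 6, 10, ... (sequences): print unchanged.
# The first~step address form is a GNU sed extension.
sed -n '1~4s/^@/>/p;2~4p' "$tmpdir/demo.fastq" > "$tmpdir/demo.fasta"

cat "$tmpdir/demo.fasta"
```

Because only every fourth line is treated as a header, the quality line that starts with @ is never mistaken for one.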

Using PASTE

cat INFILE.fastq | paste - - - - | \
cut -f 1,2 | sed 's/^@/>/' | \
tr -s "\t" "\n" > OUTFILE.fasta
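The paste approach may be easier to follow one step at a time. The sketch below uses an invented two-record file (file names and reads are assumptions for the demo): paste folds each four-line record into one tab-separated row, cut keeps the header and sequence columns, and tr unfolds them again.

```shell
# Throwaway two-record FASTQ file (reads invented for illustration).
tmpdir=$(mktemp -d)
printf '%s\n' '@r1' 'ACGT' '+' 'IIII' '@r2' 'GGCC' '+' 'IIII' > "$tmpdir/demo.fastq"

# Step 1: join every 4 input lines into one tab-separated row.
paste - - - - < "$tmpdir/demo.fastq" > "$tmpdir/rows.tsv"

# Step 2: keep columns 1 (header) and 2 (sequence), turn the leading @
# into >, then put header and sequence back on separate lines.
cut -f 1,2 "$tmpdir/rows.tsv" | sed 's/^@/>/' | tr '\t' '\n' > "$tmpdir/demo.fasta"

cat "$tmpdir/demo.fasta"
```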

EMBOSS:seqret

seqret -sequence reads.fastq -outseq reads.fasta

Using AWK

cat infile.fq | \
awk '{if(NR%4==1) {printf(">%s\n",substr($0,2));} else if(NR%4==2) print;}' > file.fa
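Whichever converter you use, a quick sanity check is to compare record counts before and after. This is only a sketch on an invented three-record file; the file names and the variable names fastq_records and fasta_records are mine, and the check assumes strict four-line FASTQ records.

```shell
# Throwaway three-record FASTQ file (reads invented for illustration).
tmpdir=$(mktemp -d)
printf '%s\n' '@r1' 'ACGT' '+' 'IIII' '@r2' 'GGCC' '+' 'IIII' '@r3' 'TTAA' '+' 'IIII' > "$tmpdir/in.fq"

# Same logic as the awk converter: line 1 of each record becomes the
# FASTA header, line 2 is the sequence, lines 3 and 4 are dropped.
awk 'NR%4==1 {print ">" substr($0,2)} NR%4==2 {print}' "$tmpdir/in.fq" > "$tmpdir/out.fa"

# A four-line FASTQ has (line count / 4) records; FASTA has one > per record.
fastq_records=$(( $(wc -l < "$tmpdir/in.fq") / 4 ))
fasta_records=$(grep -c '^>' "$tmpdir/out.fa")
echo "FASTQ records: $fastq_records, FASTA records: $fasta_records"
```

If the two numbers differ, the input probably has wrapped sequence lines or a truncated final record.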

FASTX-toolkit

fastq_to_fasta -h
usage: fastq_to_fasta [-h] [-r] [-n] [-v] [-z] [-i INFILE] [-o OUTFILE]
# Remember to use -Q33 for Illumina reads with Phred+33 quality scores!
version 0.0.6
       [-h]         = This helpful help screen.
       [-r]         = Rename sequence identifiers to numbers.
       [-n]         = keep sequences with unknown (N) nucleotides.
                    Default is to discard such sequences.
       [-v]         = Verbose - report number of sequences.
                    If [-o] is specified, report will be printed to STDOUT.
                    If [-o] is not specified (and output goes to STDOUT),
                    report will be printed to STDERR.
       [-z]         = Compress output with GZIP.
       [-i INFILE]  = FASTA/Q input file. default is STDIN.
       [-o OUTFILE] = FASTA output file. default is STDOUT.

Split multi-fasta sequence file

Sometimes it is necessary to split a large FASTA file containing several sequences into individual files. I do this with a simple awk command that starts a new, sequentially numbered file every time a line matches the header pattern (^>). It is easy and quick!

awk '/^>/{s=++d".fasta"} {print > s}' INFILE.fasta
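One caveat with the one-liner above: it never closes the files it opens, and some awk implementations cap the number of simultaneously open files, so an input with many sequences may fail partway through. Below is a minimal sketch of a variant that closes each file before starting the next, run on an invented two-record file. The variable names out, n, and dir and the demo sequences are my own, not part of the original command.

```shell
# Throwaway multi-FASTA file (sequences invented for illustration).
workdir=$(mktemp -d)
printf '%s\n' '>seqA' 'ACGT' 'ACGT' '>seqB' 'GGCC' > "$workdir/multi.fasta"

# On each header line: close the previous output file (if any), then
# start the next numbered file. Every line goes to the current file.
awk -v dir="$workdir" '
    /^>/ { if (out) close(out); out = dir "/" (++n) ".fasta" }
    { print > out }
' "$workdir/multi.fasta"

ls "$workdir"
```

This writes 1.fasta and 2.fasta next to the input, one record per file.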

Bash loops for productivity

Working in bash is so much fun! If you spend enough time in the terminal, you might get addicted to it and never want to go back to GUI windows. Some commands (especially loops) can save you a lot of time on routine tasks. My favorite loops are as follows:
Loop through all the files with a .txt extension and run a command on each

for f in *.txt; do yourcommand "$f" > "$f.out"; done
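Two details tend to bite with this kind of loop: file names containing spaces, and the case where no file matches (the loop then runs once with the literal pattern). A small sketch that guards against both, using sort as a stand-in for yourcommand and invented file names:

```shell
# Throwaway input files (names and contents invented for illustration).
tmpdir=$(mktemp -d)
printf 'b\na\n' > "$tmpdir/one.txt"
printf 'd\nc\n' > "$tmpdir/two words.txt"

for f in "$tmpdir"/*.txt; do
    [ -e "$f" ] || continue          # skip the literal pattern if nothing matched
    sort "$f" > "${f%.txt}.sorted"   # ${f%.txt} strips the .txt suffix
done

ls "$tmpdir"
```

Quoting "$f" keeps a name like "two words.txt" as a single argument.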

Read a file line by line and run a command on each line

while read -r line; do yourcommand "$line"; done < FileToRead.txt

Another variation of the above command (extremely useful when you have to read arguments from a file):

while read -r fld1 fld2 fld3; do YourCommand -a "$fld1" -b "$fld2" -c "$fld3" > "$fld1.$fld2.$fld3.txt"; done < FileToRead.txt
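As a concrete illustration of the multi-field form, here is a sketch with an invented whitespace-separated argument file. The field names sample, chrom, and pos are hypothetical, and echo stands in for YourCommand.

```shell
# Throwaway argument file: one whitespace-separated record per line
# (values invented for illustration).
tmpdir=$(mktemp -d)
printf '%s\n' 'sampleA chr1 100' 'sampleB chrX 250' > "$tmpdir/args.txt"

# read splits each line on whitespace into the three named variables;
# -r stops backslashes in the input from being treated as escapes.
while read -r sample chrom pos; do
    echo "sample=$sample chrom=$chrom pos=$pos" > "$tmpdir/$sample.$chrom.$pos.txt"
done < "$tmpdir/args.txt"

ls "$tmpdir"
```

If a line has more than three fields, the last variable receives the rest of the line.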

A simple loop for a set of numbers (you can also use {a..z} etc., or mix them)

for i in {1..10}; do echo $i; done

Another variation of the above command

for i in {1..22} X Y; do echo "human chromosome $i"; done

I hope these will help you too!

 

Serving HTTP from your Linux terminal

There is a very useful Python module called SimpleHTTPServer that lets you create a web server instantly. Say you have a huge file (50-60 GB) and you want to send it to someone across campus: that is a good use for SimpleHTTPServer.

First, find the hostname of your server: type hostname and press Enter, and it will print your hostname.

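Then type the following (SimpleHTTPServer is the Python 2 module name; with Python 3 the equivalent is python3 -m http.server):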

python -m SimpleHTTPServer

That’s it! You can now access all the files in that directory from a browser at this URL:

http://yourhostname:8000/

The server stays up for as long as you let that command run (to stop it, press Ctrl + C).