Calculating a moving average in R

When the raw data obtained from an experiment is too noisy and you need to smooth it to better show the trend, you can calculate a moving average. A moving average is simply the average of the n most recent numbers, recomputed as the window advances through the data with a specific step size. For example, if there are 100 numbers, the moving average is calculated by averaging values 1-15, 2-16, 3-17 and so on. Here, n is 15 and the step size is 1.

 

The R script to do this:

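# Read the input file; the first line is treated as a header
datain <- read.table("input.txt", header = TRUE)
field2 <- datain[, 2]

# Trailing moving average over 15 points: equal weights of 1/15, sides = 1 uses past values only
coef15 <- 1 / 15
mvavg15 <- stats::filter(field2, rep(coef15, 15), sides = 1)

# Save the smoothed curve as a JPEG image
jpeg("input.jpg")
plot(mvavg15, type = "l", main = "Plot Title", xlab = "X label", ylab = "Y label")
dev.off()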

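Here, the data is assumed to have two columns: the first holds the serial number and the second holds the value. Because sides=1 uses only past values, the first 14 entries of mvavg15 are NA.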
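
To make the windowing concrete, here is a minimal sketch of the same trailing average outside R, using awk from a bash function. The function name moving_average and its window-size argument are illustrative assumptions, and unlike filter() it simply skips the first n-1 positions instead of printing NA.

```shell
#!/bin/bash
# moving_average is a hypothetical helper (not part of the R script above).
# Usage: moving_average N < numbers.txt   (one number per line on stdin)
moving_average() {
    local n=$1
    awk -v n="$n" '{
        if (NR > n) sum -= buf[NR % n]   # drop the value that leaves the window
        buf[NR % n] = $1                 # store the newest value in a ring buffer
        sum += $1
        if (NR >= n) printf "%.2f\n", sum / n   # print once the window is full
    }'
}
```

For example, `printf '1\n2\n3\n4\n5\n' | moving_average 3` averages 1-3, 2-4 and 3-5 and prints 2.00, 3.00 and 4.00.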

Making a never-ending ‘history’ file

One useful feature of bash is that you can recall previously used commands with the arrow keys. You can also search backwards through your history by pressing Ctrl + r and typing part of a command, which brings up the most recent matching command. Pressing Ctrl + r repeatedly cycles through the older matches. This is only helpful if your history file is big. By default, $HISTFILE holds only a limited number of entries (often 500 or 1000 commands, depending on the system's defaults). You can easily change this so that it stores an unlimited number of entries.
First, set these variables in your .bashrc file:

export HISTFILESIZE=
export HISTSIZE=
export HISTFILE=~/.bash_eternal_history

Leaving both size variables empty removes the limit, so your history file becomes unlimited! Another useful feature is the $HISTTIMEFORMAT variable. When it is set, time stamps are written to the history file, marked with the history comment character, so they are preserved across shell sessions.

export HISTTIMEFORMAT="[%F %T] "
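
With several terminals open, each shell normally rewrites the history file when it exits, so a long-lived file benefits from appending instead. The lines below are a minimal sketch of two commonly used companion settings; they are an optional assumption and not part of the steps above.

```shell
# Optional additions to ~/.bashrc (assumptions, not required for unlimited history)
shopt -s histappend   # on exit, append to the history file instead of overwriting it
# Write each new command to the history file as soon as the prompt returns,
# keeping any PROMPT_COMMAND that was already set
PROMPT_COMMAND="history -a${PROMPT_COMMAND:+; $PROMPT_COMMAND}"
```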

Now start making a never-ending history!

PBS: How to submit jobs that depend on previously submitted jobs?

To submit jobs one after the other (i.e., run the second job after the first one completes), we can use the depend attribute of qsub's -W option.

First, submit the first job as usual:

qsub first_job.sub

qsub prints the job ID as output:

1234567.hpc

Then submit the second job, passing that job ID to the depend attribute:

qsub -W depend=afterok:1234567 second_job.sub

Both jobs will be queued, but the second job won’t start until the first job has finished successfully (afterok requires the first job to exit without errors).

You can also automate this step using a simple bash script:

#!/bin/bash
FIRST=$(qsub first_job.sub)
SECOND=$(qsub -W depend=afterok:$FIRST second_job.sub)
THIRD=$(qsub -W depend=afterok:$SECOND third_job.sub)
FOURTH=$(qsub -W depend=afterok:$THIRD fourth_job.sub)
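
The same pattern can be written as a loop so that the chain works for any number of job scripts. This is a minimal sketch: submit_chain is a hypothetical helper name, and it assumes qsub prints only the job ID on standard output, as in the example above.

```shell
#!/bin/bash
# submit_chain is a hypothetical helper: it submits each script given as an
# argument and makes every job wait for the previous one to finish successfully.
submit_chain() {
    local prev="" script jobid
    for script in "$@"; do
        if [ -z "$prev" ]; then
            # First job: no dependency
            jobid=$(qsub "$script") || return 1
        else
            # Later jobs: start only after the previous job exits without errors
            jobid=$(qsub -W depend=afterok:"$prev" "$script") || return 1
        fi
        echo "$script -> $jobid"
        prev=$jobid
    done
}
```

It would be called as `submit_chain first_job.sub second_job.sub third_job.sub fourth_job.sub`, and it stops at the first submission that fails.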

 

Simple script to count the number of reads in a FASTQ file

If you want to quickly count the number of reads in a FASTQ file, you can count the total number of lines and divide by 4. However, when you want to generate a tidy table of read counts for a report, doing this by hand is inconvenient. So, here is my simple bash script that does the job:

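#!/bin/bash
# Print the number of reads (lines / 4) for each FASTQ file given on the command line
if [ $# -lt 1 ] ; then
	echo ""
	echo "usage: count_fastq.sh <fastq_file1> [fastq_file2] ... | *.fastq"
	echo "counts the number of reads in a fastq file"
	echo ""
	exit 1
fi

# Loop over "$@" directly so that file names containing spaces stay intact
for i in "$@"
do
	# Redirect the file into wc so it prints only the number, without the file name
	lines=$(wc -l < "$i")
	count=$((lines / 4))
	printf '\t%s : ' "$i"
	# Insert thousands separators: repeat the substitution until no run of 4+ digits remains
	echo "$count" | \
	sed -E '
	  :L
	  s=([0-9]+)([0-9]{3})=\1,\2=
	  t L'
done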
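
The two steps in the script, dividing the line count by four and inserting thousands separators, can be pulled out into small functions and checked separately. This is only a sketch: count_reads and add_commas are hypothetical names, and the count assumes every record occupies exactly four lines (no wrapped sequences) in an uncompressed file.

```shell
#!/bin/bash
# Hypothetical helpers that mirror the two steps of the script above.

# count_reads FILE: number of reads, assuming exactly 4 lines per record
count_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}

# add_commas NUMBER: insert a comma before every group of three trailing digits
add_commas() {
    # Loop: split a run of 4+ digits before its last three digits, then retry
    echo "$1" | sed -E -e ':L' -e 's/([0-9]+)([0-9]{3})/\1,\2/' -e 't L'
}
```

For example, `add_commas 1234567` prints 1,234,567, and `add_commas "$(count_reads sample.fastq)"` formats the count for a file (sample.fastq is a placeholder name).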