I have a new blog/website!


I finally made the move and used GitHub Pages to build my new website. I’ll keep posting my blogs, news, and hacks there. For convenience, I’ve moved all my previous blog posts to the new website as well. Please stop by and say hi!

The new website/blog is:


Thank you!



Shortcuts for SSH hosts

Are you tired of typing full-length hostnames when connecting via SSH? Do you frequently scp files from one server to another and have to look up what the hostnames are? Do you want to rsync between a local and a remote host with a simple command? Then read on.

The hard way:

# connect
ssh username@clustername.hostdomain.dept.edu
# scp
scp yourfile username@clustername.hostdomain.dept.edu:/path/to/destination/
# rsync
rsync -e 'ssh -c aes128-ctr' -rts your_folder username@clustername.hostdomain.dept.edu:/path/to/destination/

As you can see, if you have a bunch of hosts, it gets really hairy to retype them every time you want to do any of these things.

The Solution:

Create a config file in the ~/.ssh directory that maps short names to these hostnames. Then you can connect to a server using its short name instead of the full hostname!

First, edit the file

vi ~/.ssh/config

and add the details:

Host sweet
  Hostname clustername.hostdomain.dept.edu
  User username
  ForwardX11 yes
Host sugary
  Hostname anothercluster.hostdomain.dept.edu
  User username
  ForwardX11 yes

Then make the file readable and writable only by you:

chmod 600 ~/.ssh/config
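If you add hosts often, the edit-and-chmod steps above can be wrapped in a small shell function. This is a minimal sketch: add_ssh_host is a hypothetical helper name, it writes the same four-line block used in this post (including ForwardX11 yes), and it only checks for an exact "Host NAME" line before appending.

```shell
#!/usr/bin/env bash
# add_ssh_host NAME HOSTNAME USER [CONFIG]  (hypothetical helper)
# Appends a Host block like the ones above, unless "Host NAME" is already present.
add_ssh_host() {
  local name="$1" host="$2" user="$3" config="${4:-$HOME/.ssh/config}"
  mkdir -p "$(dirname "$config")"
  # only an exact "Host NAME" line counts as already configured
  if [ -f "$config" ] && grep -qE "^Host[[:space:]]+${name}\$" "$config"; then
    echo "Host $name already in $config" >&2
    return 0
  fi
  # make sure we start on a fresh line if the file lacks a trailing newline
  if [ -s "$config" ] && [ -n "$(tail -c 1 "$config")" ]; then
    echo >> "$config"
  fi
  printf 'Host %s\n  Hostname %s\n  User %s\n  ForwardX11 yes\n' \
    "$name" "$host" "$user" >> "$config"
  chmod 600 "$config"   # same permissions as recommended above
}
```

For example, add_ssh_host sweet clustername.hostdomain.dept.edu username followed by ssh sweet. Running it a second time for the same name leaves the file unchanged.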

Now, have fun! The commands above can now be written as:

# connect
ssh sweet
# scp
scp yourfile sweet:/path/to/destination/
# rsync
rsync -e 'ssh -c aes128-ctr' -rts your_folder sweet:/path/to/destination/

You can read more about the config by opening the man page:

man ssh_config

Hope this trick makes your life a little easier!

Enabling variable expansion on Linux HPC

Remember how you can autocomplete bash commands in the terminal, and how pressing Tab twice lists all the options matching a pattern? It saves a lot of typing and searching for commands in Linux and makes you more efficient. You can do the same with variables, whether environment (preset) variables or custom variables set in your .bashrc file. With direxpand, Tab completion replaces a variable such as $SCRATCH with its value; with cdable_vars, cd accepts a variable name (for example, cd SCRATCH) when no directory by that name exists. To enable both, simply add these two lines to your .bashrc file:

shopt -s direxpand
shopt -s cdable_vars
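To see cdable_vars in action without touching your .bashrc, the short bash script below sets the option, points a variable at a directory, and then passes the variable's name to cd. The variable name proj and the temporary directory are made up for the demo; direxpand is left out because it only affects interactive Tab completion.

```shell
#!/usr/bin/env bash
# Demo of cdable_vars; "proj" is a made-up variable name for illustration.
shopt -s cdable_vars
proj=$(mktemp -d)    # stands in for a path you would export in .bashrc
cd /                 # assumes there is no directory called "proj" under /
cd proj > /dev/null  # "proj" is not a directory here, so bash uses the value of $proj
echo "now in: $PWD"
```

The same works interactively once the shopt line is in your .bashrc, e.g. cd SCRATCH on a cluster that defines $SCRATCH.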

How to move all CPAN installations from one perl version to another?

CPAN has a built-in option for this purpose. First, create a bundle of all existing packages. To do this, load the older perl version, which will be the source for all CPAN modules:

module load perl/5.22.1
perl -MCPAN -eautobundle

This will print all the CPAN modules that it puts in the bundle. Once complete, you’ll see:

Wrote bundle file

Now, unload the older perl and load newer perl:

module purge
module load perl/5.24.0

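Then install the modules from the bundle. The bundle name is based on the date it was created, so replace Snapshot_2017_05_09_00 with the name of the bundle file autobundle wrote for you: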

perl -MCPAN -e 'install Bundle::Snapshot_2017_05_09_00'

It should install all CPAN modules in the bundle!
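Because the bundle name changes with the date, you can look up the newest snapshot instead of typing it. A minimal sketch, assuming autobundle wrote its files to the default ~/.cpan/Bundle directory (check the path printed after "Wrote bundle file" and pass it as an argument if yours differs); latest_snapshot is a hypothetical helper.

```shell
#!/usr/bin/env bash
# latest_snapshot [BUNDLE_DIR]  (hypothetical helper)
# Prints the newest autobundle as Bundle::Snapshot_..., or fails if none exists.
latest_snapshot() {
  local dir="${1:-$HOME/.cpan/Bundle}" newest
  # names look like Snapshot_YYYY_MM_DD_NN.pm, so a plain sort puts the newest last
  newest=$(printf '%s\n' "$dir"/Snapshot_*.pm | sort | tail -n 1)
  [ -e "$newest" ] || return 1          # glob did not match: no bundle found
  newest=${newest##*/}                  # strip the directory
  printf 'Bundle::%s\n' "${newest%.pm}" # strip .pm and add the Bundle:: prefix
}
```

After loading the newer perl, b=$(latest_snapshot) && perl -MCPAN -e "install $b" installs from the most recent snapshot and does nothing if no bundle is found.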

How to prevent SSH from disconnecting if it’s been idle for a while

If you use an HPC cluster, chances are your university or sysadmin has a policy on how long an established SSH connection can stay inactive. If you are frustrated with these automatic disconnections, here is a way to prevent them. You can either pass the option on the command line:

ssh -o "ServerAliveInterval 60" -X username@server.edu

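or add the following lines to ~/.ssh/config (the Host * entry applies the setting to every host; since ssh uses the first value it finds for each option, general defaults like this belong at the end of the file):

Host *
  ServerAliveInterval 60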

This enables SSH client keepalives: if no data has been received from the server for 60 seconds, the client sends a keepalive message through the encrypted channel, which keeps network devices from treating the session as idle.

Source: https://superuser.com/a/699680/173980
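If you set up several machines, appending a catch-all Host * block with ServerAliveInterval can be scripted so that running it twice does not duplicate it. A minimal sketch; ensure_keepalive is a hypothetical helper, the 60-second default simply mirrors the value above, and it skips the file if ServerAliveInterval appears anywhere in it.

```shell
#!/usr/bin/env bash
# ensure_keepalive [CONFIG] [SECONDS]  (hypothetical helper)
# Appends "Host *" with ServerAliveInterval unless the option is already set somewhere.
ensure_keepalive() {
  local config="${1:-$HOME/.ssh/config}" secs="${2:-60}"
  # ssh_config keywords are case-insensitive, hence grep -i
  if [ -f "$config" ] && grep -qi '^[[:space:]]*ServerAliveInterval' "$config"; then
    return 0
  fi
  # make sure we start on a fresh line if the file lacks a trailing newline
  if [ -s "$config" ] && [ -n "$(tail -c 1 "$config")" ]; then
    echo >> "$config"
  fi
  # ssh uses the first value it finds for each option, so defaults go at the end
  printf 'Host *\n  ServerAliveInterval %s\n' "$secs" >> "$config"
  chmod 600 "$config"
}
```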

How to insert Word-like comments in Overleaf/LaTeX documents?

There is a todo package that can be used for this purpose. Simply add these lines to your document’s preamble, and you can then leave comments with the \todo command.


The above package is already loaded in most documents, so there is no need to add it if it’s already there.


By default, the comments appear in the right margin. To put them in the left margin (if you don’t have room because of the margins), use


To add a comment:

\todo{this is an example comment}

Comments will be shown as follows:

[Screenshot: Overleaf output with the \todo comment shown in the margin]

Summary statistics for a fasta file

In my previous post (Calculate length of all sequences in an multi-fasta file), one of the users commented that it would be useful to have summary statistics, or a histogram, for the sequence length distribution. This simple script can be used along with the previous script to get those summary statistics.

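#!/bin/bash
# summary_stats.sh: reads numbers (one per line) on stdin and prints
# total, count, mean, median, min and max.
# You can pipe other scripts into it or redirect input with <
# The %'d format adds thousands separators; this depends on your awk and locale.
# Modified from http://unix.stackexchange.com/a/13779/27194
sort -n | awk '
  # keep only lines whose first field is a non-negative number (skips blank lines)
  $1 ~ /^[0-9]+(\.[0-9]*)?$/ {
    a[c++] = $1;
    sum += $1;
  }
  END {
    if (c == 0) exit 1;   # no numeric input
    ave = sum / c;
    # input is sorted, so the median is the middle value
    # (or the mean of the two middle values when the count is even)
    if( (c % 2) == 1 ) {
      median = a[ int(c/2) ];
    } else {
      median = ( a[c/2] + a[c/2-1] ) / 2;
    }
    printf ("Total:\t""%'"'"'d\n", sum)
    printf ("Count:\t""%'"'"'d\n", c)
    printf ("Mean:\t""%'"'"'d\n", ave)
    printf ("Median:\t""%'"'"'d\n", median)
    printf ("Min:\t""%'"'"'d\n", a[0])
    printf ("Max:\t""%'"'"'d\n", a[c-1])
  }'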

To use this script, save it as summary_stats.sh, make it executable, and pipe the output of the previous script into it (seq_length.py, see post: Calculate length of all sequences in an multi-fasta file):

chmod +x summary_stats.sh
seq_length.py input_file.fasta | cut -f 2 | summary_stats.sh

You will see formatted output like this:

Total:  1,825,408
Count:  1,155
Mean:   1,580
Median: 1,360
Min:    1,001
Max:    12,972

For a histogram, I suggest using a pre-existing package called data_hacks (on GitHub). Installation is straightforward (see its README), and it works with the above scripts. Once you have downloaded and installed the package, run the above script to get the min/max values and plug them into the data_hacks histogram.py script (you can also save the data and plot it later with any other tool).

seq_length.py input_file.fasta | cut -f 2 | histogram.py --percentage --max=12972 --min=1001

The output you will get is:

# NumSamples = 1155; Min = 1001.00; Max = 12972.00
# Mean = 1580.439827; Variance = 624229.340751; SD = 790.081857; Median 1360.000000
# each ∎ represents a count of 13
 1001.0000 -  2198.1000 [  1019]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ (88.23%)
 2198.1000 -  3395.2000 [   108]: ∎∎∎∎∎∎∎∎ (9.35%)
 3395.2000 -  4592.3000 [    17]: ∎ (1.47%)
 4592.3000 -  5789.4000 [     6]:  (0.52%)
 5789.4000 -  6986.5000 [     2]:  (0.17%)
 6986.5000 -  8183.6000 [     0]:  (0.00%)
 8183.6000 -  9380.7000 [     1]:  (0.09%)
 9380.7000 - 10577.8000 [     1]:  (0.09%)
10577.8000 - 11774.9000 [     0]:  (0.00%)
11774.9000 - 12972.0000 [     1]:  (0.09%)
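If installing data_hacks isn’t an option, the equal-width binning behind a histogram like the one above can be approximated in plain awk. A minimal sketch: bin_counts is a hypothetical helper (not part of data_hacks), it expects MIN to be less than MAX, and it only prints the bin ranges and counts, without the bars or percentages.

```shell
#!/usr/bin/env bash
# bin_counts MIN MAX NBINS  (hypothetical helper)
# Reads numbers on stdin and prints one "low - high<TAB>count" line per equal-width bin.
bin_counts() {
  awk -v min="$1" -v max="$2" -v n="$3" '
    # skip non-numeric lines and values outside [min, max]
    $1 !~ /^[0-9]+(\.[0-9]*)?$/ || $1 < min || $1 > max { next }
    {
      i = int(($1 - min) * n / (max - min))   # bin index, 0 .. n-1
      if (i >= n) i = n - 1                   # the maximum goes into the last bin
      cnt[i]++
    }
    END {
      w = (max - min) / n
      for (i = 0; i < n; i++)
        printf "%.4f - %.4f\t%d\n", min + i * w, min + (i + 1) * w, cnt[i] + 0
    }'
}
```

With the same input, seq_length.py input_file.fasta | cut -f 2 | bin_counts 1001 12972 10 prints ten bins spanning the same range as histogram.py above.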

I hope this post helps!

Calculating moving average in R

When the raw data from an experiment is too noisy and you need to smooth it to better show the trend, you can calculate a moving average. A moving average is simply the average of the previous n numbers, recomputed as the window advances by a fixed step size. For example, with 100 numbers, n = 15 and a step size of 1, the moving average is calculated by averaging numbers 1-15, 2-16, 3-17, and so on, up to 86-100.


The R script to do this:

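datain <- read.table("input.txt", header=TRUE)
field2 <- datain[, 2]
coef15 <- 1/15
# stats::filter (not dplyr::filter) with sides=1 averages each value with the 14 before it
mvavg15 <- stats::filter(field2, rep(coef15, 15), sides=1)
plot(mvavg15, type="l", main="Plot Title", xlab="X label", ylab="Y label")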

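Here, the input is assumed to have two columns (with a header line): the first with a serial number and the second with the value. Because sides=1 averages each value with the 14 values before it, the first 14 entries of mvavg15 are NA.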
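To double-check the R output, or to smooth a column when R isn’t available, the same trailing average (sides=1) can be computed in awk. A minimal sketch, assuming the same two-column input with a header line; moving_avg is a hypothetical helper, and unlike filter() it skips the first n-1 rows instead of printing NA.

```shell
#!/usr/bin/env bash
# moving_avg N  (hypothetical helper)
# Reads a two-column table with a header on stdin and prints "row<TAB>mean of the last N values".
moving_avg() {
  local n="$1"
  awk -v n="$n" '
    NR == 1 { next }                 # skip the header line
    {
      k++                            # index of this data row
      slot = k % n                   # ring buffer holds the last n values
      if (k > n) sum -= buf[slot]    # drop the value leaving the window
      buf[slot] = $2
      sum += $2
      if (k >= n) printf "%d\t%g\n", k, sum / n
    }'
}
```

For example, moving_avg 15 < input.txt matches the non-NA part of mvavg15.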

Making a never-ending ‘history’ file

One useful feature in bash is that you can recall previously used commands with the arrow keys. You can also search your history backwards by pressing Ctrl + r and typing part of a command, which brings up the most recent match; pressing Ctrl + r repeatedly cycles through older matches. This is most helpful when your history is long, but by default bash keeps only a limited number of entries (HISTSIZE defaults to 500, and many distributions’ default .bashrc set it to 1000). You can easily change this so that bash keeps an unlimited number of entries. Simply follow these steps:
First, set these variables in your .bashrc file:

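# HISTSIZE limits the in-memory history list, HISTFILESIZE limits the file on disk;
# leaving both empty removes the limits (bash 4.3+ also accepts -1)
export HISTFILESIZE=
export HISTSIZE=
export HISTFILE=~/.bash_eternal_history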

This will make your history file unlimited! Another useful feature is the $HISTTIMEFORMAT variable. When it is set, bash writes time stamps to the history file, marked with the history comment character, so they are preserved across shell sessions; the history command also displays them in this format.

export HISTTIMEFORMAT="[%F %T] "
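With HISTTIMEFORMAT set, each command in the file is preceded by a comment line such as #1500000100 (seconds since the epoch), so the eternal history can also be searched from a script with its timestamps. A minimal sketch; hist_search is a hypothetical helper, and it assumes the ~/.bash_eternal_history path used above.

```shell
#!/usr/bin/env bash
# hist_search PATTERN [FILE]  (hypothetical helper)
# Prints "timestamp<TAB>command" for every history entry matching the awk regex PATTERN.
hist_search() {
  local pattern="$1" file="${2:-$HOME/.bash_eternal_history}"
  awk -v pat="$pattern" '
    /^#[0-9]+$/ { ts = substr($0, 2); next }   # timestamp line written because of HISTTIMEFORMAT
    $0 ~ pat    { print ts "\t" $0 }           # matching command, tagged with its timestamp
  ' "$file"
}
```

For example, hist_search rsync lists every rsync you have ever run; on GNU systems, date -d @1500000100 turns a timestamp back into a readable date.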

Now start making a never-ending history!

Highlight words enclosed within parentheses in MS-Word

Word in Office 2013 offers regular-expression-style wildcard matching in Find and Replace, which is very handy for basic tasks. For example, I was recently given a 10-page document to edit and asked to incorporate citations: they wanted me to find the authors (citations) within the text and add the correct references to the “References” section. One problem: there were way too many citations! It was hard to distinguish the authors from the normal text, so I decided to highlight all the authors using wildcards. This is how I did it:


Here, the first "\" is an escape character: it tells Word to treat the next character, "(", literally, i.e., to find an opening parenthesis in the document. Next, "(*)" matches the text after the opening parenthesis ("*" is Word’s wildcard for any string of characters, and the unescaped parentheses just group it), and finally "\)" ends the match at a closing parenthesis (which also needs an escape character). So basically it matches anything within parentheses.

Note that you need to have "Use wildcards" checked. For the next part (highlighting), click the "Format" button and select "Highlight". When done, click "Replace All". All the text in the document that is within parentheses will be highlighted! You can change "\(" and "\)" to other characters as well, for example "\{" and "\}" for text within curly braces, and so on.
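Outside Word, the same idea works on a plain-text export of the document with grep, which is handy for collecting the list of citations to add to the references section. A minimal sketch; list_parens is a hypothetical helper name, and its [^)]* class is a grep analogue of the Word pattern rather than an exact translation (it stops at the first closing parenthesis and does not match across line breaks).

```shell
#!/usr/bin/env bash
# list_parens [FILE...]  (hypothetical helper)
# Prints every "( ... )" span, one per line; reads stdin when no FILE is given.
list_parens() {
  # in a basic regex, ( and ) are literal characters; [^)]* is everything up to the first )
  grep -o '([^)]*)' "$@"
}
```

For example, list_parens report.txt | sort | uniq -c counts how often each citation appears in a hypothetical report.txt.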