Annotations and Linux

By einar

September 13, 2006

A lot of my bioinformatics work involves performing functional annotation on genes. This means that, given lists of genes, I need to find out their known function, whether they’re part of some metabolic pathway, and so on. Even with the current trend in our laboratory, which is investigating DNA copy number changes using SNP microarrays (it’s a rather new form of analysis, but some relevant papers are out already), in the end we have to go back to the genes affected by such changes (in order to find interesting/marker genes, since we study solid tumors).

For part of my work I use the excellent DAVID 2006 software by the NIAID, which can perform a lot of analyses including functional annotation, clustering, and more. Its output is usually a tab-delimited text file with the fields of interest. But that’s just part of the work I need to do, as I usually have to filter and analyze the lists using different criteria (depending on the analyses I do). And that’s where Linux helps me a lot.

For example, today I had a list of annotated genes, around 300, and I wanted to filter it so that only the genes with an Entrez Gene description remained. I knew that DAVID puts the description in the sixth column. So all I had to do was:

[code] awk 'BEGIN { FS="\t" } { if ($6 != "") { print } }' infile > outfile [/code]

$6 is the 6th field, which corresponds to the Entrez Gene description in the text file I obtained from DAVID. This line of code prints all the lines where $6 is not empty, so the output contains only the genes for which a description was available. However, there were still fields I didn’t need:

[code] awk 'BEGIN { FS="\t" } { print $1"\t"$2"\t"$4"\t"$6 }' infile > outfile [/code]

That stripped two fields that weren’t useful for me. I could probably have merged this with the previous command. Even with this done, the file was hardly readable, whether I viewed it with less or opened it in oocalc. I needed to convert it to an HTML table! But how? After some quick googling, I found the answer in the form of t2t, a Perl script that converts text files to HTML tables. After installing it, I invoked it with:

[code] t2t --header file_to_be_converted [/code]

and I immediately had a usable HTML file.
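Going back to the two awk steps: they can most likely be done in a single pass. What follows is only a sketch, not what I actually ran. The file names sample.tsv and filtered.tsv and the three sample rows are made up for illustration, and it assumes the same layout as above (tab-delimited, description in the sixth column).

```shell
# Hypothetical sample input: six tab-separated columns, description in column 6.
# The second row has an empty description and should be dropped.
printf 'G1\tA\tx\tB\ty\tkinase\n'  > sample.tsv
printf 'G2\tA\tx\tB\ty\t\n'       >> sample.tsv
printf 'G3\tC\tx\tD\ty\tligase\n' >> sample.tsv

# Keep only rows whose 6th field is non-empty, and print just fields 1, 2, 4 and 6.
# Setting OFS makes "print $1, $2, ..." join the fields with tabs.
awk 'BEGIN { FS = "\t"; OFS = "\t" } $6 != "" { print $1, $2, $4, $6 }' sample.tsv > filtered.tsv

cat filtered.tsv
```

The condition before the braces replaces the explicit if, and OFS saves typing "\t" between every field.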

I’m pretty sure there are more tricks I could learn to improve my work.
