Below you will find pages that utilize the taxonomy term “Science”
Partings
Multiscale bootstrap clustering with Python and R
While reading the statistics for my blog, I noticed that a number of searches looked for hierarchical clustering with Python, which [I covered quite a while ago]({{ site.url }}/2007/11/data-clustering-with-python). Today I’d like to present an updated version which uses more robust techniques.
Akademy: my own BoF
DataMatrix 0.8 is finally out
Gene search applet: suggestions and code review needed
For the past few months I’ve wanted to write a small Plasma applet to help me with some boring tasks as a bioinformatician. One example (for the non-scientific crowd out there) is when my analysis work turns up a specific gene that I want to take a look at. I am often lazy, so instead of firing up the browser to look at the online resources, I wanted to write something which could access said resources programmatically.
Moving on
Science and KDE: kile
During the course of my research work, I may obtain results that are worthy of publication in scientific journals. Since my master’s thesis I’ve been using LaTeX as my writing platform, mainly because I can concentrate on content rather than presentation (I also find it useful for writing non-scientific stuff). Also, I can handle bibliography (essential for a scientific publication) very well without using expensive proprietary applications (such as Endnote).
In my early days I used kLyX first, then LyX, but I found the platform to be too limited for my tastes, and also LaTeX errors were difficult to diagnose. I needed a proper editor, and that’s when I heard of kile, a KDE front-end for LaTeX. Kile is currently at version 2.0.2 and is a KDE 3 application. However, in KDE SVN work is ongoing to produce a KDE4 version (2.1) and that’s what I’ll look at in this entry.
Science and KDE: rkward
I try to use FOSS extensively for my scientific work. In fact, when possible, I use only FOSS tools. Among these there is the R programming language. It’s a Free implementation of the S language, and it’s mainly aimed at statistics and mathematics. As the people who read my scientific posts know, I don’t like R much. But sometimes it’s the only alternative.
Well, what does R have to do with KDE? With this post I’d like to start a series (hopefully) of articles that deals with KDE programs used for scientific purposes. In this particular entry, I’ll focus on rkward, a GUI front-end for R.
Published! (and it matters more)
DataMatrix 0.7 has been released
The plague of cross-database annotations
DataMatrix 0.5
data.frames in Python - DataMatrix
For a long time I have tried to handle text files in Python in the same way that R’s data.frame does - that is, direct access to columns and rows of a loaded text file. As I don’t like R at all, I struggled to find a Pythonic equivalent, and since I found none, I decided to eat my own food and write an implementation, which is what you’ll find below.
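To make the idea of "direct access to columns and rows" concrete, here is a minimal sketch in plain Python. It is not the DataMatrix implementation or its API: the `TinyTable` class, its method names, and the sample data are all hypothetical and exist only for illustration.

```python
import csv
import io


class TinyTable:
    """Hypothetical, minimal data.frame-like view of a tab-delimited file."""

    def __init__(self, handle, delimiter="\t"):
        reader = csv.reader(handle, delimiter=delimiter)
        self.header = next(reader)                  # first line holds the column names
        self.rows = [row for row in reader if row]  # skip blank lines

    def column(self, name):
        """Return every value of the named column, top to bottom."""
        index = self.header.index(name)
        return [row[index] for row in self.rows]

    def row(self, number):
        """Return one row as a {column name: value} dictionary."""
        return dict(zip(self.header, self.rows[number]))


# Sample data made up for illustration; a real file would be opened with open().
sample = io.StringIO("gene\tvalue\nTP53\t2.5\nVHL\t-1.2\n")
table = TinyTable(sample)
print(table.column("gene"))
print(table.row(1))
```

Values stay as strings here; converting them to numbers is left out to keep the sketch short.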
Commercial applications, public funding
I wanted to write this earlier, but I couldn’t: I’m now in a hotel in Maastricht, Netherlands, and waiting to get back tomorrow. I’ve been attending the 4th NuGO hands-on advanced microarray data analysis course and I even wanted to blog about it… but the hotel’s connection did not resolve any non-European web page until late today.
FOSS and research
Performance and R
Follow up on meta-analysis
Meta analysis difficulty increasing
Ph.D.!
Brain drain
There is always a lot of talk about “brain drain” (fuga di cervelli in Italian) from my country. I keep on reading disgruntled comments about low pay and poor research, and that going abroad is the only way for an Italian scientist to be successful.
While I believe that research done outside of my country can be handled better (but it’s impossible to know for sure: never tar everyone with the same brush), I think that, also thanks to the way the media and the scientists themselves handle it, in everyone’s view it has almost become like the El Dorado. And that, in my opinion, is incorrect.
Gene identifiers
Data clustering with Python
**Notice:** Just now I realized this has been linked from a Stack Overflow question. I recently wrote a new post that uses a different technique and a combination of R and Python. [Check it out!]({{ site.url }}/2011/05/multiscale-bootstrap-clustering-with-python-and-r)
Following up my recent post, I’ve been looking for alternatives to TMeV. So far I’ve found the R package pvclust and the Pycluster library, part of BioPython. The first one also performs bootstrapping (I’m not sure if it’s similar to what support trees do, but it’s still better than no resampling at all). I’ve found another Python project but it is still too basic to perform what I need.
Buggy bioinformatics software
As people who read my science-related posts know already, I’m not a big fan of {{post id=“software-and-biological-research” text=“software made just to support a publication”}}. Recently I’ve stumbled upon similar software again. Namely, I’m talking about the TIGR Multiexperiment Viewer (TMeV), a Java-based program which is often used for microarray analysis. It’s not exactly “fit for publication”, because it reached version 4 last year, but it shows some of the problems ({{post id=“genbugg” text=“mentioned already”}}) with releasing bioinformatics software.
I use TMeV mostly because I didn’t find any other implementation of the hierarchical clustering algorithm with support trees. However, I’ve stumbled upon a very annoying bug in the most recent version. Normally I use average linkage clustering with Pearson’s correlation as the distance metric, plus gene and sample bootstrapping: with certain files this makes TMeV report errors at random during the iterations.
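For readers unfamiliar with the distance metric mentioned above, the following is a small sketch of a Pearson correlation distance between two expression profiles. It assumes the common convention of distance = 1 - r; whether TMeV uses exactly this definition is not stated here, and the function name `pearson_distance` is hypothetical.

```python
import math


def pearson_distance(x, y):
    """Return 1 - r, where r is Pearson's correlation between x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations of every value from its profile mean.
    dev_x = [value - mean_x for value in x]
    dev_y = [value - mean_y for value in y]
    covariance = sum(a * b for a, b in zip(dev_x, dev_y))
    norm = math.sqrt(sum(a * a for a in dev_x) * sum(b * b for b in dev_y))
    if norm == 0:
        raise ValueError("a constant profile has no defined correlation")
    return 1.0 - covariance / norm


# Made-up profiles: perfectly correlated ones are at distance 0,
# perfectly anti-correlated ones at distance 2.
same_trend = pearson_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
opposite_trend = pearson_distance([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
print(same_trend, opposite_trend)
```

Average linkage would then merge, at each step, the two clusters whose members have the smallest mean pairwise distance under this metric.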
Thesis completed
SOFT file woes
Easy RMA: RMAExpress
Today I was looking for an easy way to do some calculations of raw expression data on Affymetrix arrays, but I didn’t want to use R: I have already mentioned how I don’t like its design and implementation. While looking for some documentation, I stumbled upon this nifty little program called RMAExpress.
The tower of Babel of bioinformatics
The title of this post tries to give some insight into a problem that I’ve stumbled upon a lot when doing microarray data analysis: the plethora of different file formats. In “conventional” (as I call it) bioinformatics this is less problematic, as FASTA or PDB are quite standardized by now.
New office
Science and Microsoft Word
At the time of writing, a lot of people (even in bioinformatics) use Microsoft Word to write their papers. I personally think it’s not a good idea, and not just because of the file formats (like Microsoft lobbying semi-legally to get OOXML approved by ISO), but because the WYSIWYG paradigm is not appropriate for scientific papers.
Full speed ahead
Publish or perish
Data handling
As the people who read my science-related posts already know, [I’m in the middle of doing meta-analysis]({{ site.url }}/2007/05/28/more-meta-analysis-difficulty/). That brought up a problem, so to speak, and it’s related to annotations.
More meta-analysis difficulty
Thesis work and wikis
Databases
As I’ve been working to get some results done for my Ph.D. thesis, I’ve stumbled across the problem of having different data obtained through different software. Even if it’s just a matter of text files, the fields are all different and even if dealing with the same data, trying to infer relationships is a pain.
Python usefulness
Bioinformatics != sequence analysis
GenBUGG
A BED file builder
As I anticipated, I can finally release this small script. Its purpose is to build BED files out of tab-delimited text files. I made this because I had several files to make and moving columns left and right (not to mention writing a heading line for the Genome Browser) was becoming annoying. Its use is fairly straightforward, as the help itself says:
Old code
Deadlines and domains
A lot of work
Two days in Firenze
A day with an Apple Xserve
Yesterday another person and I went to the server room to do the basic configuration of the Apple Xserve we have bought, along with its Xserve RAID unit. Despite the general “idea” that anything Apple does is user-friendly, our experience was plagued by problems.
Working with Genome Browser data
Probably useless, but...
A simple annotator
Random photos
More studying
The joy of meta-analysis
Recently, I needed to retrieve some records regarding renal cell carcinoma referenced in papers by Zhao et al. and Higgins et al. The records of the former were hosted on NCBI’s Gene Expression Omnibus, while the latter records were uploaded to EBI’s ArrayExpress database. Getting data from others and using it for your own analysis is called meta-analysis, and it’s often used to validate methods and algorithms with different data sets.
PhD students
Software and biology
I’ve noticed that the journal Source Code for Biology and Medicine has finally launched. While some said it would be a journal for those desperate for a publication, I think it fills a gap that is keenly felt in bioinformatics: the availability of source code.
Text files with Python
Finally I cleaned up my code enough to post it here. It’s probably still ugly, but not as ugly as when I wrote it down the first time. It’s all about manipulating text files, to be precise tab-delimited files. All the snippets are published under the GNU GPL v2 (not that I think that anyone would use them, but just in case…).
My first real Python program
Annotations and Linux
A lot of my bioinformatics work involves performing functional annotation on genes. This means that given lists of genes I need to resolve their known function, or whether they’re part of some metabolic pathways and so on. Even with the current trend in our laboratory, that is investigating DNA copy number changes using SNP microarrays (it’s a rather new form of analysis, but some relevant papers are out already), in the end we have to go back to the genes affected by such changes (in order to find interesting/marker genes - we study solid tumors).
I'm back
Computational Biology... biology?
Perspectives
The power of the shell
Yesterday I was trying to adjust some files in order to make a program use Affymetrix SNP array data (instead of the arrayCGH data the program was designed for). I had a big (116,000 rows) tab-delimited text file and I needed to use only some of the columns there.
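As an illustration of the task described above, here is a minimal Python sketch that keeps only selected columns of a tab-delimited file, reading it row by row so that a file with 116,000 rows never has to fit in memory at once. The post's title suggests shell tools were used, so this shows the same idea in Python rather than the original approach; the function name `select_columns`, the column names, and the sample data are hypothetical.

```python
import csv
import io


def select_columns(source, destination, wanted):
    """Copy only the named columns from a tab-delimited stream, row by row."""
    reader = csv.reader(source, delimiter="\t")
    writer = csv.writer(destination, delimiter="\t", lineterminator="\n")
    header = next(reader)
    # Look up each wanted column's position once, using the header line.
    indices = [header.index(name) for name in wanted]
    writer.writerow(wanted)
    for row in reader:
        if row:  # ignore blank lines
            writer.writerow([row[i] for i in indices])


# Made-up sample standing in for the real file; with files on disk,
# pass handles from open(path, newline="") instead of StringIO objects.
source = io.StringIO(
    "probe\tchrom\tpos\tsignal\n"
    "SNP_A\t1\t1000\t0.8\n"
    "SNP_B\t2\t2000\t1.1\n"
)
destination = io.StringIO()
select_columns(source, destination, ["probe", "signal"])
print(destination.getvalue())
```

Because the columns are selected by name, the output order follows the `wanted` list, which also makes it easy to reorder columns while filtering.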
Upgrade
Busy busy busy!
Classes and meetings
Biological software and HIG
Stress
Security breach
Continuous analysis
Outage
Windows and Missions
Mixed entry (yet again)
Bertinoro - Day 4
Bertinoro - Day 3
Bertinoro - Arrival & Day 1
Finally I got hold of a stable Internet connection so I can post about my experience in Bertinoro di Romagna so far.