February 25, 2021

Partings

Because changes occur when one least expects them. This post is about one such change.

May 29, 2011

Multiscale bootstrap clustering with Python and R

While reading the statistics for my blog, I noticed that a number of searches looked for hierarchical clustering with Python, which [I covered quite a while ago]({{ site.url }}/2007/11/data-clustering-with-python). Today I’d like to present an updated version which uses more robust techniques.

May 29, 2010

Akademy: my own BoF

![I’m going to Akademy 2010 image]({{ site.url }}/images/2010/05/igta2010.png) My Akademy talk proposal was not accepted, but the organizers were kind enough to offer me the chance to hold a BoF on the same subject. Now I bet you wonder what I’m going to discuss, and I think the title already gives you an idea: “KDE and bioinformatics: the missing link”. Although in the KDE community we have our fair share of scientists (hey there, Stuart!

June 13, 2009

DataMatrix 0.8 is finally out

At last, after months of inactivity, I pushed out a new release of [DataMatrix]({{ site.url }}/projects/datamatrix). Although the version bump is small (0.8), there are a lot of changes since the last release. The most notable include:

- Ability to apply functions to elements of the matrix
- Ability to filter rows by column contents
- Ability to transpose rows with columns
- An option to load text files produced by R (which are, by design, broken)
- Removed the getter for columns, using dictionary-like syntax directly
- A lot of bug fixes

The download links on [the project page]({{ site.

March 31, 2009

Gene search applet: suggestions and code review needed

In the past months I’ve always wanted to write a small Plasma applet to aid me in some boring tasks as a bioinformatician. One example (for the non-scientific crowd out there) is when I find a specific gene out of my analysis work which I want to take a look at. I am often lazy, so instead of firing up the browser to look at the online resources, I wanted to write something which could access said resources programmatically.

February 27, 2009

Moving on

Some say that all good things must come to an end. I’m not entirely sure that this is a universal truth, but I can say that at some point in life there are decisions that need to be taken. In this case I made my own: today was the last day in Dr. Cristina Battaglia’s laboratory, a place where I spent my three-year Ph.D. course and one year as a post-doc research fellow.

February 22, 2009

Science and KDE: kile

During the course of my research work, I may obtain results that are worthy of publication in scientific journals. Since my master’s thesis I’ve been using LaTeX as my writing platform, mainly because I can concentrate on content rather than presentation (I find it useful for writing non-scientific stuff as well). Also, I can handle bibliography (essential for a scientific publication) very well without using expensive proprietary applications (such as EndNote).

In my early days I used kLyX first, then LyX, but I found the platform to be too limited for my tastes, and also LaTeX errors were difficult to diagnose. I needed a proper editor, and that’s when I heard of kile, a KDE front-end for LaTeX. Kile is currently at version 2.0.2 and is a KDE 3 application. However, in KDE SVN work is ongoing to produce a KDE4 version (2.1) and that’s what I’ll look at in this entry.

February 7, 2009

Science and KDE: rkward

I try to use FOSS extensively for my scientific work. In fact, when possible, I use only FOSS tools. Among these there is the R programming language. It’s a Free implementation of the S language, and it’s mainly aimed at statistics and mathematics. As the people who read my scientific posts know, I don’t like R much. But sometimes it’s the only alternative.

Well, what does R have to do with KDE? With this post I’d like to start a series (hopefully) of articles that deals with KDE programs used for scientific purposes. In this particular entry, I’ll focus on rkward, a GUI front-end for R.

January 6, 2009

Published! (and it matters more)

Finally I can lift the curtain of silence and tell the reason why I’ve been very busy before Christmas: it all lies in the publication of a paper, “Using Pathway Signatures as Means of Identifying Similarities among Microarray Experiments”, which is finally out on this week’s issue of PLoS ONE. It’s different from [the previous paper I mentioned]({{ site.url }}/2008/01/phd) (which was not my first publication, either), for two main reasons:

December 27, 2008

DataMatrix 0.7 has been released

Finally a new entry! I’ve been extremely busy with other things, which is why I did not have time to write more. One of the main reasons is related to an important landmark in my professional career, but I’ll write more about it after January 1st (hint: those who follow my Twitter updates may have already understood). As a nice way to break the hiatus, I’m releasing a new version of DataMatrix, my implementation of R’s data.

November 2, 2008

The plague of cross-database annotations

Recently I had to annotate a large (10,000+) number of genes identified by Entrez Gene IDs. My goal was to avoid “annotation files” (basically CSV files) that a part of wet lab group likes, because I wanted to stay up-to-date without having to remember to update them. So the obvious solution was to use a service available on the web, and in an automated way. For reference, I just tried to attach gene symbol, gene name, chromosome and cytoband.

September 19, 2008

DataMatrix 0.5

At last, since it’s been like ages, I decided to put out a new version of DataMatrix. For those who haven’t seen my previous post, DataMatrix is a Pythonic implementation of R’s data.frame. It enables you to manipulate a text file by columns or rows, to your liking, using a dictionary-like syntax. In this new version there have been a few improvements, corrections to a couple of bugs (for example saveMatrix did not really save) and the start (only a stub at the moment) of an append function to add more columns (I’ll also think about a function to add rows).
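
To give an idea of what "dictionary-like syntax" on a text file can look like, here is a minimal sketch. It is not DataMatrix's actual code: the `ColumnTable` class, its `row` method and the sample data are all made up for illustration.

```python
import csv
import io


class ColumnTable:
    """Hypothetical stand-in for illustration; not DataMatrix's real API."""

    def __init__(self, handle, delimiter="\t"):
        reader = csv.reader(handle, delimiter=delimiter)
        self.header = next(reader)      # first line holds the column names
        self.rows = [row for row in reader]

    def __getitem__(self, column):
        # Dictionary-like access: table["symbol"] returns that whole column.
        index = self.header.index(column)
        return [row[index] for row in self.rows]

    def row(self, number):
        # Return one row as a dict keyed by column name.
        return dict(zip(self.header, self.rows[number]))


# Example data: a tiny tab-delimited "file" held in memory.
text = "gene_id\tsymbol\n7157\tTP53\n672\tBRCA1\n"
table = ColumnTable(io.StringIO(text))
print(table["symbol"])
print(table.row(0))
```

Everything is kept as strings here; a real implementation would probably also want type conversion and a way to write the file back.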

June 29, 2008

data.frames in Python - DataMatrix

For a long time I have tried to handle text files in Python in the same way that R’s data.frame does - that is, direct access to columns and rows of a loaded text file. As I don’t like R at all, I struggled to find a Pythonic equivalent, and since I found none, I decided to eat my own food and write an implementation, which is what you’ll find below.

June 27, 2008

Commercial applications, public funding

I wanted to write this earlier, but I couldn’t: I’m now in a hotel in Maastricht, Netherlands, and waiting to get back tomorrow. I’ve been attending the 4th NuGO hands-on advanced microarray data analysis course and I even wanted to blog about it… but the hotel’s connection did not resolve any non-European web page until late today.

May 10, 2008

FOSS and research

I’ve been wondering about why FOSS is often compared to the academic world, but at least in my limited experience, I see few people who grasp its concept in the world of research. On a quick look, developing FOSS in a research environment would be very good: not only would you get publicly available results when you publish, but at the same time you can make sure that in an extreme case your application will be carried on by someone else should you not be able to continue development.

April 5, 2008

Performance and R

I’m often wondering why people only resort to R when working with microarrays. I can understand that Bioconductor offers a plethora of different packages and that R’s statistical functions come in handy for many applications, but still, I think people underestimate the impact of performance. R is not a performing language at all, it doesn’t parallelize well when using HPC (at least from the talks I’ve had with people studying the matter), and in general is a memory and resource hog.

February 28, 2008

Follow up on meta-analysis

Fourteen days since my last post. Quite a while, indeed. Mostly I’ve been swamped with work and some health-related issues. Anyway, I thought I’d follow up on the meta-analysis matter I discussed in my last post. It turns out that it’s a fault of both limma and the data sets, because apparently the raw data found in the Stanford Microarray Database have different lengths, gene-wise (a result of not all spots on the array being good?

February 14, 2008

Meta analysis difficulty increasing

Again in the past days I’ve been banging my head thanks to the fact that doing meta-analysis with microarray data is more difficult than it seems. The problem sometimes lies in the data, sometimes lies in the analysis software and sometimes in a combination of factors. When doing work on a public data set (Zhao et al., 2005), I had to start analysis from raw data. Now, I tried using both the limma and marray Bioconductor packages, but both of them bail out with cryptic error messages.

January 14, 2008

Ph.D.!

The title says it all. After all these years, I was finally able to get my Ph.D. in Molecular Medicine this morning, with my thesis “Identification of disregulated metabolic pathways by transcriptomic analysis in renal carcinoma samples” (yes, that’s a long title). The defense was a success and I admit I was surprised when the committee actually expressed a significant interest in my work. In any case, I’m happy that it’s over, as the past period has been rather hectic.

December 22, 2007

Brain drain

There is always a lot of talk about “brain drain” (fuga di cervelli in Italian) from my country. I keep on reading disgruntled comments about low pay and poor research, and that going abroad is the only solution for an Italian scientist to be successful.

While I believe that research done outside of my country can be handled better (but it’s impossible to know for sure: never tar everyone with the same brush), I think that, also thanks to the way the media and the scientists themselves handle it, in everyone’s view it has almost become like the El Dorado. And that, in my opinion, is incorrect.

November 15, 2007

Gene identifiers

While working today on an annotation class in Python I stumbled on a problem. Normally I work with lists of genes that are consistent, i.e. all Entrez Gene IDs (or RefSeq IDs, or Genome Browser IDs…), but today I had a list of mixed identifiers. The subsequent idea was “let’s implement auto-detection of common identifiers in the class”. The problem is… is there any actual documentation on how identifiers are made?
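
A minimal sketch of what such auto-detection could look like, using regular expressions. The patterns are simplified assumptions for illustration (RefSeq accessions start with a two-letter prefix and an underscore, Entrez Gene IDs are plain integers), the function name is hypothetical, and this is not the code of the annotation class mentioned above.

```python
import re

# Simplified patterns, assumed for illustration only.
# RefSeq: two-letter prefix (e.g. NM, NR, NP, XM), underscore, digits,
# optionally followed by a version suffix such as ".2".
REFSEQ = re.compile(r"^[NX][MRP]_\d+(\.\d+)?$")
# Entrez Gene IDs are plain integers.
ENTREZ = re.compile(r"^\d+$")


def guess_identifier_type(identifier):
    """Return a rough guess of the identifier type as a string."""
    identifier = identifier.strip()
    if REFSEQ.match(identifier):
        return "refseq"
    if ENTREZ.match(identifier):
        return "entrez"
    return "unknown"


# Example values; the last one imitates a Genome Browser style ID.
mixed = ["7157", "NM_000546", "NM_000546.6", "uc002gij.4"]
guesses = [guess_identifier_type(item) for item in mixed]
print(guesses)
```

Anything that matches neither pattern falls through to "unknown", so more identifier families can be added later without changing the callers.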

November 7, 2007

Data clustering with Python

**Notice:** Just now I realized this has been linked to from a Stack Overflow question. I recently wrote a new post that uses a different technique and a combination of R and Python. [Check it out!]({{ site.url }}/2011/05/multiscale-bootstrap-clustering-with-python-and-r)

Following up my recent post, I’ve been looking for alternatives to TMeV. So far I’ve found the R package pvclust and the Pycluster library, part of BioPython. The first one also performs bootstrapping (I’m not sure if it’s similar to what support trees do, but it’s still better than no resampling at all). I’ve found another Python project but it is still too basic to perform what I need.

November 7, 2007

Buggy bioinformatics software

As people who read my science-related posts know already, I’m not a big fan of {{post id=“software-and-biological-research” text=“software made just to support a publication”}}. Recently I’ve stumbled again into similar software. Namely, I’m talking about the TIGR Multiexperiment Viewer (TMeV), a Java-based program which is often used for microarray analysis. It’s not exactly “fit for publication”, because it has reached version 4 last year, but shows some of the problems ({{post id=“genbugg” text=“mentioned already”}}) with releasing bioinformatics software.

I use TMeV mostly because I didn’t find any other implementation of the hierarchical clustering algorithm with support trees. However, I’ve stumbled upon a very annoying bug in the most recent version. Normally I use average linkage clustering and as the distance metric I employ the Pearson’s correlation, and with gene and sample bootstrapping: with certain files this makes TMeV report errors at random during the iterations.
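
For reference, the distance derived from Pearson’s correlation is commonly taken as one minus the correlation coefficient. The sketch below computes it in plain Python; it only illustrates the metric, not TMeV’s implementation, and the function name is made up.

```python
from math import sqrt


def pearson_distance(x, y):
    """Distance defined as 1 - r, where r is Pearson's correlation."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator of r: sum of the products of the centered values.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - cov / sqrt(var_x * var_y)


# Two profiles with the same shape, and two with opposite trends.
d_same = pearson_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
d_opposite = pearson_distance([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
print(d_same, d_opposite)
```

With this definition, profiles that rise and fall together are close regardless of their absolute values, which is why it is popular for expression data. A constant profile has zero variance and would need special handling.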

October 27, 2007

Thesis completed

My supervisor has given me an OK for my thesis (save for a couple of cosmetic changes), therefore now I have just to wait for the verdict of the Ph.D. council then fill in some paperwork: the next step is the defense, sometime in January. After that I’ll probably put my thesis online and post a few articles on the concept of group testing for microarray data.

October 9, 2007

SOFT file woes

Today I started working on a data set published on GEO. As the sample data were somehow inconsistent (they mentioned 23 controls when I found 28), I decided to parse the SOFT file from GEO in order to get the exact sample information. I made a grave mistake. First of all, Biopython’s SOFT parser is horribly broken (doesn’t work at all) and quite undocumented: I could work around the lack of documentation (API docs) but not the fact that it wouldn’t work.

October 4, 2007

Easy RMA: RMAExpress

Today I was looking for an easy way to do some calculations of raw expression data on Affymetrix arrays, but I didn’t want to use R: I have already mentioned how I don’t like its design and implementation. While looking for some documentation, I stumbled upon this nifty little program called RMAExpress.

September 19, 2007

The tower of Babel of bioinformatics

The title of this post tries to give some insight on a problem that I’ve stumbled upon a lot when doing microarray data analysis: the plethora of different file formats. In “conventional” (as I call it) bioinformatics this is less problematic, as FASTA or PDB are quite standardized by now.

September 11, 2007

New office

After little more than a year, I’ve been moved to a new office, because new people needed to be put in the room I was in. The new place is slightly bigger (four desks instead of six) and for now quieter. I spent most of the morning fixing things and setting up network connections. This is how it looks now: [![The desk at distance]({{ site.url }}/images/2007/09/016.thumbnail.JPG)]({{ site.url }}/images/2007/09/016.JPG) [![A closer look]({{ site.

September 1, 2007

Science and Microsoft Word

At the time of writing, a lot of people (even in bioinformatics) use Microsoft Word to write their papers. I personally think it’s not a good idea, and not just for the file formats (like Microsoft lobbying semi-legally to get OOXML approved by ISO), but because for scientific papers the WYSIWYG paradigm is not appropriate.

July 12, 2007

Full speed ahead

After handing in the summary of my thesis today, I also started writing the actual thing. Of course this is not something I will finish in a few days. It’s a long journey that will go on until I deliver it. Aside from that, I’ve been having some problems with R, a language which I really don’t like and I hope to use as little as possible. I was obtaining a list of differentially expressed genes (DEGs) out of some data files with SAM, and of course I had to supply a matrix with expression values for every gene.

June 25, 2007

Publish or perish

The idea to blog about this common phenomenon came after I read a post over at Bioinformatics Zen on the matter of open science. There, Mike describes the situation quite clearly: mostly, you need a middle ground between complete secrecy and absolute openness. That said, I still think science should be more open, at least in the field of life sciences. Publication should be a way to let others know, benefit and also build upon your work, not just a way to obtain funding or improve one’s career.

June 20, 2007

Data handling

As the people who read my science-related posts already know, [I’m in the middle of doing meta-analysis]({{ site.url }}/2007/05/28/more-meta-analysis-difficulty/). That brought up a problem, so to speak, and it’s related to annotations.

May 28, 2007

More meta-analysis difficulty

UPDATE: Today I found out that J Brooks (the corresponding author of Zhao’s paper) has agreed to send the data I needed. Thanks a lot! When you do bioinformatics, you often test your own procedures not only on your data, but also on datasets provided by other people and publicly available. [As I stated previously]({{ site.url }}/2006/11/10/the-joy-of-meta-analysis/), that’s what meta-analysis is. I’m doing a bit of that for my thesis and recently I noticed that some datasets, while being public, are far from complete.

May 18, 2007

Thesis work and wikis

I’ve been able to complete most of the analysis for my Ph.D. thesis work, at last. I need a few runs more but I am arranging to work at a location of one of our research partners, as I don’t have a powerful enough computer to handle the calculations (apparently, 1Gb of RAM isn’t enough). At least the results I have look promising. Now all it’s left (and that’s not an easy task, heh).

April 25, 2007

Databases

As I’ve been working to get some results done for my Ph.D. thesis, I’ve stumbled across the problem of having different data obtained through different software. Even if it’s just a matter of text files, the fields are all different and even if dealing with the same data, trying to infer relationships is a pain.

April 21, 2007

Python usefulness

I’ve again seen how useful and powerful Python can be. The other day I had to prepare an Excel spreadsheet (sadly) which among other things needed to contain links to the GeneCards database for each gene listed. There were more than 2900 genes listed, so adding links by hand would have been suicidal. That’s when Python, through its Windows extensions, comes into play. First of all I created a module for COM objects using the makepy utility.

April 14, 2007

Bioinformatics != sequence analysis

This post sums up my frustration in trying to use Python for my daily work. Like Perl and Ruby, it has its own Bio version to deal with biological data. However, the current implementation leaves a lot to be desired. A lot of stuff that doesn’t deal with sequence analysis, even for simple tasks such as fetching annotations from Entrez Gene, is missing (but present in Bioperl, for example). Also, documentation for some modules is lacking or non-existent (why keep a parser for Affymetrix CEL files when there is no information on how to use it, let alone on which formats it supports?

March 29, 2007

GenBUGG

This entry already shows what I think of a popular open source pathway analysis and visualization program, GenMAPP. It’s widely used to map gene expression data coming from microarrays to metabolic pathways, and in addition to that you can also evaluate enrichment for both pathways and Gene Ontology terms. Last but not least you can create and contribute new metabolic maps. So why the title of the post? Because, despite being actively developed and with conscientious developers that don’t want to break the software with new features, it’s extremely buggy.

March 13, 2007

A BED file builder

As I anticipated, I can finally release this small script. Its purpose is to build BED files out of tab-delimited text files. I made this because I had several files to make and moving columns left and right (not to mention writing a heading line for the Genome Browser) was becoming annoying. Its use is fairly straightforward, as the help itself says:

March 12, 2007

Old code

Today I went over my old Python code, back from when I first started programming. I’d say that I found what I had written to be largely amusing, to say the least. Loads of ugly hacks all over the place, duplicated functions, etc… in short, a real mess. I’m quite happy with the results I achieved in the past month, since I finally learnt how to code properly and started making my own classes (this book was immensely helpful).

March 1, 2007

Deadlines and domains

I just learnt today that the paper I’m writing with my colleague has to be in an acceptable state (almost submission ready) by the 20th. Considering I’m still behind with the analysis, it will be certainly challenging to complete the task, considering I also have other duties to attend to. In primis I need to study more and more papers (I add more and more to my TODO list). Also, probably next week the new infrastructure for authentication (which includes a Primary Domain Controller for Windows clients) will be put in place and I’ll have to make sure that the data handling for our group will be without problems.

February 25, 2007

A lot of work

Work is getting hectic as we’re finally gathering a lot of data for a new publication. This means reading a lot of literature, choosing the figures and planning results and discussion properly. On top of that, I’m starting to study pathway analysis for my thesis, and of course, seeing how I can use that to study our own data. I also have some other non-research duties to attend to, such as being the person responsible for data handling of our group in relation to the [server that we bought recently]({{ site.

February 11, 2007

Two days in Firenze

I’ve just come back from a two-day trip to Firenze. On the first day we had a long (4.30 hours) meeting on some interesting research topics, such as cellular aging and oxidative damage in yeast. On the second day I attended part of a workshop on pathway analysis and related topics. Many speakers were from the NuGO project funded by the European Community, but the talks were still quite general.

February 3, 2007

A day with an Apple Xserve

Yesterday I and another person went to the server room to do the basic configuration of the Apple Xserve we have bought, along with its Xserve RAID unit. Despite the general “idea” that anything Apple does is user-friendly, our experience was plagued by problems.

December 13, 2006

Working with Genome Browser data

In the past two days I’ve been tackling an annotation problem. I’m trying to provide annotations for genes found in regions that are significantly altered, DNA copy-number wise (thanks to the STAC method). The idea would be to annotate those regions (that span one megabase) using UCSC Table Browser. However, the task was impractical, so I decided to automate it a bit. I converted the data into ranges and then used the KnownGene annotation file (downloaded from UCSC) to obtain which genes were in which region.
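
The core of such a lookup is an interval overlap test. Here is a minimal sketch of the idea, assuming genes and regions are already available as tuples with half-open coordinates; the function name, the tuple layout and the sample data are made up, and parsing the actual KnownGene file is left out.

```python
def genes_in_region(region, genes):
    """Return the names of the genes overlapping a (chrom, start, end) region."""
    chrom, start, end = region
    hits = []
    for name, g_chrom, g_start, g_end in genes:
        # Two intervals overlap when each one starts before the other ends.
        if g_chrom == chrom and g_start < end and g_end > start:
            hits.append(name)
    return hits


# Hypothetical gene coordinates: (name, chromosome, start, end).
genes = [
    ("geneA", "chr1", 100_000, 150_000),
    ("geneB", "chr1", 1_900_000, 2_100_000),
    ("geneC", "chr2", 500_000, 600_000),
]
# A one-megabase region on chr1.
region = ("chr1", 1_000_000, 2_000_000)
found = genes_in_region(region, genes)
print(found)
```

A linear scan like this is fine for a handful of regions; with many regions, sorting the genes by chromosome and start position first would avoid rescanning the whole list each time.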

December 1, 2006

Probably useless, but...

A post I wrote ended up on the front page of nodalpoint.org. I liked that, even though it may not seem a lot to many: at least it shows that some of my concerns are shared with other people in the scientific community. Also after viewing some presentations about Web 2.0 at the EMBL, I decided to make a del.icio.us account. I find it quite handy so far to organize bookmarks and such.

November 25, 2006

A simple annotator

In the past two days I’ve written a simple annotator program that, given an input list of RefSeq genes, automatically determines the relevant Entrez Gene IDs and annotates them using the flat files provided by the NCBI. A direct conversion was not possible due to limitations in Biopython’s parsers, but I managed to use the GenBank parser to identify and extract the references to the Gene IDs (and put them in a list).

November 17, 2006

Random photos

Since recently (thanks to my brother) I got a new cellular phone with better capabilities, photo-wise, I decided to take a few photos of my workplace. I changed office a while ago, now I’m located on the first floor of the building, sharing the office with 5 other people: Raoul, Michele, Giorgio, Roberta and Alessandro. This photo shows my current computer setup: 429 Quant was running a screensaver when I took this photo: the mouse pad and that other black thing next to the keyboard are gel pads to help me with my tendinitis.

November 15, 2006

More studying

EURASIP Journal on Bioinformatics and Systems Biology is having a special issue on text mining for biology. My boss wants to publish something there, and asked if I and a colleague could work on that. I’ll probably focus on evaluation methods, since I’m not an expert in text mining or language processing. And that’s where the problem comes in: I know almost nothing on text mining. Which means this will add up to the ever growing pile of papers and books I have to study, including but not limited to Python programming, microarray analysis and XML…

November 10, 2006

The joy of meta-analysis

Recently, I’ve been in need to retrieve some records regarding renal cell carcinoma referenced in papers by Zhao et al. and Higgins et al. The records of the former were hosted on NCBI’s Gene Expression Omnibus, while the latter records were uploaded to EBI’s ArrayExpress database. Getting data from others and using it for your own analysis is called meta-analysis, and it’s often used to validate methods and algorithms with different data sets.

November 7, 2006

PhD students

After discussing my second PhD year today, and after noticing that not even a single professor from the committee was there (save my boss and a person that joined later on), I realized that the situation for students like me isn’t really the best. A PhD program is supposed (at least here) to be a cross between a research job and a student’s position, since you should study but at the same time conduct research and (if you’re lucky) also get paid for it.

November 2, 2006

Software and biology

I’ve noticed that the journal Source Code for Biology and Medicine has finally launched. While some said that would be a journal if someone is desperate for a publication, I think it fills in a gap that’s keenly felt in bioinformatics: the availability of source code.

October 23, 2006

Text files with Python

Finally I cleaned up my code enough to post it here. It’s probably still ugly, but not as ugly as when I wrote it down the first time. It’s all about manipulating text files, to be precise tab-delimited files. All the snippets are published under the GNU GPL v2 (not that I think that anyone would use them, but just in case…).

October 11, 2006

My first real Python program

Although I’m still studying the language, finally I managed to create a program that actually does something I need. The need arose from a gene list I was given, made up by Entrez Gene IDs. I need to annotate it, but in a form that wasn’t able to be produced by the usual functional annotation tools I have. Actually it could have been done, but what I needed was to make something that I could automate.

September 13, 2006

Annotations and Linux

A lot of my bioinformatics work involves performing functional annotation on genes. This means that given lists of genes I need to resolve their known function, or if they’re part of some metabolic pathways and so on. Even with the current trend in our laboratory, that is investigating DNA copy number changes using SNP microarrays (it’s a rather new form of analysis, but some relevant papers are out already), in the end we have to go back to the genes affected by such changes (in order to find interesting/marker genes - we study solid tumors).

August 29, 2006

I'm back

I’m finally back from holidays and since there’s almost no one in the office here I thought I’d write a little mixed entry, while quant is busy compiling a patched 2.6.17 kernel (but quant has an odd overheating problem that I will describe in another entry, so I have to compile it by steps). Work is somewhat slow, as the people I need to contact are still on holiday. In the meantime, I’m reading an interesting paper on some GPLed software for Single Nucleotide Polymorphism analysis, written in Perl and R.

July 29, 2006

Computational Biology... biology?

In the past few months I’ve been wondering about the state of bioinformatics and computational biology in general. When I attended the school in Bertinoro (as [I wrote about previously]({{ site.url }}/2006/03/19/bertinoro-arrival-day-1/)). The development in this field can be regarded as quite astounding: just a look at journals like BMC Bioinformatics, Bioinformatics or others can give an insight into that. However, and talks at Bertinoro seemed to confirm this impression, it looks like many in the computational field forget that we’re talking about biology.

July 28, 2006

Perspectives

Our paper’s completion has probably been delayed till September or so pending some (in my opinion puzzling) organizational issues within a group that works with us. My research work is also pending due to the fact that I’m waiting for someone to prepare a “data matrix” for me, and the person doesn’t seem to know that leaving other people hanging is somewhat impolite. So, I decided that as a person that works in bioinformatics I could learn something new.

July 8, 2006

The power of the shell

Yesterday I was trying to adjust some files in order to make a program use Affymetrix SNP arrays data (instead of arrayCGH data like the program was designed for). I had a big (116,000 rows) tab-delimited text file and I needed to use only part of the columns there.
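
For comparison, the same kind of column selection can be sketched in Python. This is similar in spirit to what `cut -f` does in the shell; the function name, the column indices and the sample data are made up for illustration.

```python
import csv
import io


def select_columns(handle, wanted, out):
    """Copy only the columns listed in `wanted` (0-based) to `out`."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in csv.reader(handle, delimiter="\t"):
        writer.writerow([row[index] for index in wanted])


# A tiny in-memory stand-in for a large tab-delimited file.
source = io.StringIO("snp1\tchr1\t12345\tAA\nsnp2\tchr1\t67890\tAB\n")
result = io.StringIO()
select_columns(source, [0, 3], result)
print(result.getvalue())
```

The file is processed one row at a time, so memory use stays flat even with a hundred thousand rows.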

June 20, 2006

Upgrade

I got to upgrade the memory of hardin (my laptop), bringing it to 1Gb, at last. However I had two 256 Mb modules installed, which means I had to buy two 512 Mb ones. Oh well, I’ll just try to sell the old ones. Everything seems snappier now, and even Firefox seems not to clog the CPU when loading pages (I need to experiment more, since I have no clue why that should have changed).

June 17, 2006

Busy busy busy!

I haven’t been writing a lot here mostly because of the rather hectic work schedule I’m having. We’re nearing the completion phase of a paper that includes quite a bit of my work (and also a lot of other people’s contributions), and I’m also swamped with PhD classes (two weeks to go). Also, one of my bosses has assigned me another project which I’ve yet to tackle, and I got to help my colleague with a poster as well.

June 8, 2006

Classes and meetings

This week has been absolutely exhausting. I’ve been having PhD classes (statistics, effective writing, and more statistics) almost every day (save tomorrow) and also heaps of work piled up. I’ve got new data to work on, and also I’ve been assigned to do something else along with my current bioinformatics work. I also need to get in touch with the Partek software representative to let him know what I think of their genomics software.

May 25, 2006

Biological software and HIG

Today I obtained a trial license of a data analysis program. I plan on using it for the next two weeks to see if it could improve the analysis workflow in our laboratory. I noticed this software uses the Tk widget set to achieve cross-platform capability (in fact, it can run on Linux, which is a big plus for me). However, in my opinion, Tk widgets look rather ugly. I wonder why the company didn’t consider using Trolltech’s Qt widgets.

May 17, 2006

Stress

This has been a rather stressful period work-wise. Things are looking extremely confusing on the (upcoming) paper we need to write and even a small doubt can cause huge time losses (I spent the whole morning checking to see if my analysis was correct or not while in fact it was correct in the first place). I’ve been shown a new software for microarray analysis, well, at least some brochures. I’m going to inquire about pricing (the most important part) and platform availability (I’d love if it could run on Linux).

May 8, 2006

Security breach

Today I found out that a computer running Windows had been “self-writing” words when an Internet Explorer window was open. This obviously led to the conclusion that there was some kind of malware running. I immediately unplugged the network cable but the typing continued - this is a good sign meaning that it was just some random program doing it. It only affected IE, no other programs. I wonder how it got there, I can only suspect the current user, as I never do any network-based activity there, only analysis (and I run a much safer Linux box - no Windows for me).

May 5, 2006

Continuous analysis

This week has been quite busy at work. As I finally got the two lists of differentially expressed genes that I needed, I started my analysis work. Perhaps “analysis” is not quite the appropriate word, as what I do is mostly annotate the genes (i.e. I try to find out everything about them, or certain features I need) and see if the lists are “enriched” for a particular feature. From this I try to work out whether what I see has biological relevance or not.

April 10, 2006

Outage

Today one of our desktops won’t turn on at all. I didn’t even see the POST lights turning on (this is a Dell computer), so I assume that the PSU is dead. I’d have already called for support but since the box has been bought in the USA (part of a bundle of our GeneChip systems) I had to request a warranty transfer (notice that I don’t even know if there’s still a warranty!

April 6, 2006

Windows and Missions

Finally, we have windows in the office. Before that it was just a room with walls, and it looked a little claustrophobic. It took a good deal of time (4 months) to get the work done, but I think the result is worth it. At least the whole room looks a little more human now. I will be spending tomorrow’s work day analyzing data with an extremely bad piece of software, at least design-wise (the algorithm inside it is very solid, though).

March 30, 2006

Mixed entry (yet again)

I haven’t been writing on this blog as much as I wanted, partly because of tiredness, partly because of my always-present tiredness. Today I delivered the receipts from the Bertinoro school and also from Trieste: I hope I get a refund fairly quickly (although usually “fairly quickly” means around 3-4 months if I’m lucky). Also, these days I’m working on the DNAcopy package from Bioconductor (a series of add-on packages for R) to see if I can use the data from Affymetrix’s Chromosome Copy Number Analysis Tool to get a genome-wide view on copy number aberrations (CNAT only permits you to see one chromosome at a time).

March 22, 2006

Bertinoro - Day 4

Talks, talks, talks. That’s all about this day. Some were also extremely difficult to understand, and I’m not sure I took good notes. Talks went on from 8.30 to 18.00 (excluding lunch break) and then a workshop until 19.30. I’m really beat. I retired to my room while the others wanted to play karaoke, but I was too tired. Tomorrow is the last day, thank goodness! I’m tired of waking up at 6.

March 21, 2006

Bertinoro - Day 3

I just came home and I’m terribly tired, so I’ll be rather brief. This morning we had four lectures, including two interesting ones on protein structure prediction, though one speaker was too fast and I almost couldn’t take notes. Then we went in the afternoon to Ravenna. I took some pictures during the sightseeing and I’ll post them when I get home. We visited the “Mausoleo of Teodorico”, the “Basilica di S.

March 19, 2006

Bertinoro - Arrival & Day 1

Finally I got hold of a stable Internet connection so I can post about my experience in Bertinoro di Romagna so far.

March 16, 2006

Random entry

I haven’t blogged in a while, mostly because I’m either doing something else or, when I do want to, I find I’m really tired (this week and the last have been very tiring for me). In any case, I’ll make a big, random entry. I’ll start by saying that I’m amazed at how badly software can be written. In the life sciences field you see very clear examples: mostly horrible UIs and poor documentation (while the algorithms and methods implemented are usually very solid).

March 11, 2006

Trento, March 7th-8th

Finally I have some time to give an account of what happened Tuesday and Friday, when I left for Trento to attend the ITC-irst meeting called “Biobanks for functional genomics”. The trip was mostly uneventful, though we had to wait a lot once we arrived at the ITC-irst because we were way earlier than expected. I spent time talking to some people there; they work in an interesting field: applying computational methods to microarray expression data.

February 28, 2006

Bertinoro di Romagna

After sorting out the bureaucracy, I was finally able to enroll in the 6th Course in Bioinformatics for Molecular Biologists that will take place in Bertinoro di Romagna in three weeks. I’m glad I was able to: the course program looks really interesting and perhaps I’ll finally be able to learn bioinformatics the right way, since so far I’m entirely self-taught, and sadly, it shows. On another topic, the ceiling is going to be fixed by tomorrow, I hope, so I’ll be able to get back to my old office at last.

February 6, 2006

Woes of a scientist

Not everything can be fun in the life of a scientist. For example, my work in the bioinformatics side of our laboratory exposes me to a lot of annoying problems related to files, or to be precise, file formats. A lot of software in the biological sciences uses plain text files to export data to other programs (sometimes binary data, but that’s another matter). The problem comes when you have to use the output in another program.

February 5, 2006

All-in-one entry

First of all, the strangest oddity ever. On Friday, once at the laboratory, I was contacted by an Affymetrix representative regarding [this entry]({{ site.url }}/2006/01/26/badly-written-software-or-policy/). She wanted to know why I wrote it, and remarked that it seemed that I “didn’t contact Affymetrix to see if the problem could be solved”. She wasn’t offended or anything (she will even get me in touch with someone from the software department), but nevertheless this is my blog and I’m entitled to my personal opinions.

January 30, 2006

Clarissa's birthday

That was the only noteworthy event of today’s work. I arrived late (took 1 hour and 30 just to get there) and then during lunch break we moved again to the LITA to celebrate. We got a sort of necklace for her which she liked quite a lot, and we got time to have a toast and eat some cake. It took way more than I expected - I was back at the CISI at around 15.

January 26, 2006

Badly written software, or policy?

Today while working I got stuck on what is probably an intentional flaw of the software we’re using. When we scan Affymetrix’s GeneChips there are a number of files produced by the GeneChip Operating Software (GCOS for short), including raw acquisition images and various analyses. Now, a certain number of people need to use those data to work (including myself), but I don’t want people to fiddle with the scanner workstation unless they know what they’re doing.

January 20, 2006

Office, take 2

Now that the office is more or less in working order (if we exclude the lack of chairs) I thought I’d post something about it. I finally finished setting everything up (including the 250 GB network drive) today, so I can resume my real work next week (that is, bioinformatic analysis). I plan on buying a cheap chair this weekend so I can at least have a place to sit: there are only two chairs for 6 stations.

January 16, 2006

Moved

Today the (long) anticipated move from the LITA to the CISI finally took place. I actually showed up at the CISI first, to do some research work, then after the lunch break a colleague and I went to the LITA to pack things up. We almost looked like robbers (and for sure got some funny looks from people as we moved to put things in the car): we had four computers, four monitors and keyboards, lots of cables, my notebooks, and more.

January 11, 2006

Always needed

I haven’t been able to write because I always end up extremely tired from work. Since I’m moving to the CISI next week, it looks like everyone needs me. I managed to study a bit, then I had to help a couple of people with a protocol, then work on the firewall/router migration (we’re restructuring our Intranet once I move a few computers there), then help with some software, etc. And tomorrow looks the same.

January 9, 2006

Moving? Not yet

Today I thought I was going to move from my current location (LITA building, University of Milano) to the new one, the CISI. However, I hadn’t considered that some things (including the arrival of 3 new people at our laboratory, Sarah, Guya and Alessandra) would hold me back. I won’t relocate till next Monday. In the meantime I spent the day testing out some procedures, drafting a proposal for the new laboratory’s network structure, and getting the equipment to put it in place.