data.frames in Python - DataMatrix

For a long time I have wanted to handle text files in Python the way R’s data.frame does, that is, with direct access to the columns and rows of a loaded text file. As I don’t like R at all, I looked for a Pythonic equivalent, and since I found none, I decided to eat my own dog food and write an implementation, which is what you’ll find below.

The idea is to store the values of the text file as a dictionary of columns, where each column holds a list of (row name, row value) tuples. This way, you can access the columns by name (I still need to see whether access by number is workable), or you can view specific rows, including all or a subset of the columns. It’s decently fast, and it allows for non-sequential access, which you can’t do when reading a file (or a file-like structure) line by line.
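To make that layout concrete, here is a minimal sketch of a dictionary of columns. It is an illustration of the idea only, not the module's actual code: the names `columns` and `get_row` and the sample values are made up for this example, and it is written for Python 3.

```python
# Hypothetical data: each column name maps to a list of (row name, value) tuples.
columns = {
    "Great_Exp1": [("Gene1", 56.34), ("Gene2", 2.55)],
    "Great_Exp2": [("Gene1", 4.56), ("Gene2", 7.1)],
}


def get_row(columns, index, names=None):
    """Return the values at position `index` for all columns, or only those in `names`."""
    if names is None:
        names = sorted(columns)
    return [columns[name][index][1] for name in names]


# Column access by name.
print(columns["Great_Exp1"])
# Non-sequential row access: jump straight to the second row.
print(get_row(columns, 1))
# The same for a subset of the columns.
print(get_row(columns, 0, names=["Great_Exp2"]))
```

Because every column is an ordinary list, any row can be reached by position without walking through the rows before it.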

Requirements

I have tested this on Python 2.5.1; older versions may or may not work. All the modules it imports ship with Python itself.

**Download and installation**

Download the .py file directly. There is currently no installation mechanism, so copy it wherever Python can find it (for example, a directory on sys.path). There’s also some API documentation generated with pydoc.

This module is licensed under the GNU General Public License, version 2.

Usage

First of all, import the module:

[code lang="python"]
import datamatrix
[/code]

Then open a file and instantiate a DataMatrix object:

[code lang="python"]
fh = open("somefile.txt")
data = datamatrix.DataMatrix(fh)
[/code]

By default, no column is treated as row names, so if your file has one, you have to specify it:

[code lang="python"]
data = datamatrix.DataMatrix(fh, row_names=1)
[/code]

More options are in the documentation.
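For readers curious how a row-name column might be handled when loading, here is a rough sketch built on the standard csv module. It does not use datamatrix and is not its implementation: the function `load_columns`, the tab delimiter, and the 0-based meaning of `row_names` are all assumptions of this example (the module's own indexing may differ), and the code is written for Python 3, not the Python 2.5.1 the module was tested on.

```python
import csv
import io


def load_columns(handle, row_names=None, delimiter="\t"):
    """Read delimited text into {column name: [(row name, value), ...]}.

    `row_names` is the 0-based index of the column holding the row names
    (an assumption of this sketch); when it is None, the row number is used.
    Values are kept as strings; no type conversion is attempted.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    columns = {name: [] for i, name in enumerate(header) if i != row_names}
    for number, fields in enumerate(reader):
        label = fields[row_names] if row_names is not None else number
        for i, name in enumerate(header):
            if i != row_names:
                columns[name].append((label, fields[i]))
    return columns


# A small in-memory stand-in for a file opened with open().
sample = io.StringIO("GeneID\tGreat_Exp1\tGreat_Exp2\nGene1\t56.34\t4.56\n")
loaded = load_columns(sample, row_names=0)
print(loaded["Great_Exp1"])
```

Without a row-name column, the sketch falls back to the row number as the label, so every column still has the same (label, value) shape.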

Once the DataMatrix is initialized, you can see which columns it has, access a column by name, and view rows with the getRow method:

[code lang="python"]
data.columns
# ["GeneID", "Great_Exp1", "Great_Exp2"]

data["Great_Exp1"]
# [("Gene1", 56.34), ...]

data.getRow(5)
# ["NOT_EXISTENT", "56.545", "4.56"]
[/code]

Sometimes you want only the values of a column, without the row identifiers, and that’s where getColumn comes in:

[code lang="python"]
data.getColumn("Great_Exp1")
# [56.34, 2.55, ...]
[/code]
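In terms of the (row name, row value) layout described earlier, dropping the row identifiers amounts to keeping the second element of each tuple. The snippet below is a standalone sketch of that step with made-up data, not the body of getColumn:

```python
# Hypothetical column in the (row name, value) layout.
column = [("Gene1", 56.34), ("Gene2", 2.55)]

# Keep only the values, dropping the row identifiers.
values = [value for _, value in column]
print(values)
```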

Should you want to save a DataMatrix instance, you can use the writeMatrix function:

[code lang="python"]
datamatrix.writeMatrix(data, fname="/path/to/somewhere/file.txt")
[/code]
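As a rough picture of what writing such a structure back to text can involve, here is a standalone sketch using the csv module. It is not how writeMatrix works internally: the function `write_columns`, the tab delimiter, the `row` header label, and the sample data are assumptions made for this example, and the code is written for Python 3.

```python
import csv
import io


def write_columns(columns, handle, delimiter="\t"):
    """Write {column name: [(row name, value), ...]} as delimited text."""
    names = list(columns)
    writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["row"] + names)
    # Row names are taken from the first column; all columns are assumed
    # to list the same rows in the same order.
    first = columns[names[0]]
    for i, (label, _) in enumerate(first):
        writer.writerow([label] + [columns[name][i][1] for name in names])


# Hypothetical data in the dictionary-of-columns layout.
columns = {
    "Great_Exp1": [("Gene1", 56.34), ("Gene2", 2.55)],
    "Great_Exp2": [("Gene1", 4.56), ("Gene2", 7.1)],
}
buffer = io.StringIO()  # stands in for a file opened for writing
write_columns(columns, buffer)
print(buffer.getvalue())
```

The columns are stored separately, so the writer has to stitch each output line together from the same position in every column.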

That’s all. Questions and suggestions, especially on coding and improvements, are very welcome.
