Wednesday, December 16, 2009

Modern Information Management in Bioinformatics

Jon Udell talks bioinformatics with Randy Julian of Indigo BioSystems on the topic of flexible data repositories.

… without buying into all the hype around semantic web and so on, you would argue that a flexible schema makes more sense in a knowledge gathering or knowledge generation context than a fixed schema does.

His contention is that fixed schemas don't work for knowledge discovery; the right tools are flexible schemas and linked data. It's also not enough to represent experimental data in standard ways: we need to describe the experimental design that provides the context for that data. To accomplish this, use documents annotated with RDF-style triples or XML plus (not-quite-free-text) descriptions built from a controlled vocabulary. Use virtualization to archive complete data analysis environments for reproducibility.
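
To make the triples idea concrete, here is a minimal sketch in plain Python. It is not how Indigo BioSystems does it (the interview doesn't go into implementation); the vocabulary terms, sample identifiers and helper functions are all made up for illustration. The point is only that a new kind of fact becomes one more triple rather than a schema change, and that the experimental design lives in the same store as the data.

```python
# Hypothetical controlled vocabulary of predicates (invented for this sketch).
CONTROLLED_VOCABULARY = {
    "has_organism",
    "has_treatment",
    "measured_with",
    "part_of_design",
}


def add_triple(store, subject, predicate, obj):
    """Add a (subject, predicate, object) triple to the store.

    Predicates outside the controlled vocabulary are rejected, which is
    what keeps the descriptions "not quite free text".
    """
    if predicate not in CONTROLLED_VOCABULARY:
        raise ValueError(f"unknown predicate: {predicate}")
    store.add((subject, predicate, obj))


def match(store, subject=None, predicate=None, obj=None):
    """Return the sorted triples matching a pattern; None is a wildcard."""
    return sorted(
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Made-up annotations: the data and its experimental context side by side.
store = set()
add_triple(store, "sample_42", "has_organism", "Homo sapiens")
add_triple(store, "sample_42", "has_treatment", "compound_A")
add_triple(store, "sample_42", "part_of_design", "dose_response_study")
add_triple(store, "sample_43", "part_of_design", "dose_response_study")

# Which samples belong to a given experimental design?
samples_in_design = [
    s for s, _, _ in match(
        store, predicate="part_of_design", obj="dose_response_study"
    )
]
```

A real system would more likely use an RDF library and published ontologies than a Python set, but the shape of the data is the same.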

On the IndigoBio blog, there are a couple of posts about interoperable data that make use of R and Cytoscape. Sounds like territory familiar to my current project/nemesis Gaggle.

The conversation then turns to the increasingly distributed nature of the drug industry and the IT challenges of tightly restricted data sharing between highly paranoid competitors. The goal is to produce portable data assets that can be merged into any client's knowledge base by mapping their terms into the client's own.
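
As a rough illustration of "mapping into the other's terms", the sketch below translates a portable set of triples into a client's vocabulary before merging. Everything in it is hypothetical: the term map, the identifiers and the `translate` helper are invented here, and the interview doesn't say how the mapping is actually done.

```python
def translate(triples, term_map):
    """Rewrite every term in every triple through term_map.

    Terms with no entry in the map pass through unchanged.
    """
    return {
        tuple(term_map.get(term, term) for term in triple)
        for triple in triples
    }


# Portable data asset, expressed in the provider's terms (made up).
portable = {
    ("sample_42", "has_treatment", "compound_A"),
    ("sample_42", "has_organism", "Homo sapiens"),
}

# The client's existing knowledge base, in the client's terms (made up).
client_kb = {
    ("cmpd-0017", "inhibits", "kinase_X"),
}

# Hypothetical mapping from the provider's terms to the client's.
term_map = {
    "has_treatment": "treated_with",
    "compound_A": "cmpd-0017",
}

# Merge: translate first, then take the union with the client's triples.
merged = client_kb | translate(portable, term_map)
```

After the merge, the provider's `compound_A` and the client's `cmpd-0017` are the same node, so the client's existing facts about that compound connect to the new sample data.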

