Sunday, June 21, 2009

Semantic data in life science

When I first saw RDF I said, "Blech!" Well, that's also what I said when I first saw HTML. But I'm coming around to it. Sure, it's not pretty, but RDF is a graph and graphs are cool. And we're seeing more and more that the relational model, which works so well for transaction processing, fits some other situations less well, while RDF-related tools are moving from vaporware to something more practical.

With the web, a new data model is growing up that can be generalized as a graph, where nodes and edges have properties, plus indexes to quickly find sets of nodes in the graph. The web and its search engines are an instance of this pattern. As the web becomes a channel for structured data, it gets more natural to model your data like this, too. Biology has a great tradition of open data, and the network is already a workhorse of modern biology. So, why not structure your data that way?
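To make that pattern concrete, here is a minimal sketch in Python of a graph whose nodes and edges carry properties, plus an index for quickly finding sets of nodes. Everything in it is an assumption for illustration: the `PropertyGraph` class, its method names, and the made-up identifiers `geneA`, `geneB`, and `proteinA` are hypothetical, and this is not how any particular RDF tool works.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy graph: nodes and edges have properties, plus a node index."""

    def __init__(self):
        self.nodes = {}                 # node id -> dict of properties
        self.edges = []                 # list of (source, target, properties)
        self.index = defaultdict(set)   # (key, value) -> set of node ids

    def add_node(self, node_id, **props):
        # Store the node and index it under each of its properties.
        # (Re-adding an existing id is not handled in this sketch.)
        self.nodes[node_id] = props
        for key, value in props.items():
            self.index[(key, value)].add(node_id)

    def add_edge(self, source, target, **props):
        self.edges.append((source, target, props))

    def find(self, key, value):
        # Index lookup: the set of nodes having this property value.
        return set(self.index.get((key, value), set()))

    def neighbors(self, node_id, **match):
        # Follow outgoing edges whose properties match all given filters.
        return [
            target
            for source, target, props in self.edges
            if source == node_id
            and all(props.get(k) == v for k, v in match.items())
        ]


# Hypothetical example data; the identifiers are placeholders.
g = PropertyGraph()
g.add_node("geneA", type="gene")
g.add_node("geneB", type="gene")
g.add_node("proteinA", type="protein")
g.add_edge("geneA", "proteinA", relation="encodes")

genes = g.find("type", "gene")                       # index lookup
products = g.neighbors("geneA", relation="encodes")  # edge traversal
```

The index plays the role a search engine plays on the web: it gets you to a starting set of nodes, and from there you follow edges.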

Tim Berners-Lee, in a TED talk on the blooming of Linked Data, points out the huge untapped potential of integrating the separate data silos distributed all over the web. Because biology was an early adopter of open data, some of its key assets are open, but poorly linked and not very programmable. Maybe the Semantic Web of Life Science will change that, particularly in Systems Biology, which demands the integration of diverse types of data.

Clay Shirky criticized the semantic web for its links to AI and deductive reasoning, asking "What is the Semantic Web good for?" Well, maybe data integration, rather than inference, is the answer.