Sunday, December 02, 2007

The web as a channel for structured data

Structured data on the web is taking off. It takes many forms: mash-ups, web services, AJAX, microformats, and the semantic web. The common element in these "web 2.0" technologies is structured data traveling over HTTP.

This is in contrast to HTML, which has structure of course, but is oriented towards page layout. In its emphasis on display, HTML loses the metadata that specifies, for example, that a certain bit of text is a person's last name, or the name of a city in an address, or a list of proteins involved in a specific metabolic process.

My experience with these ideas comes from my work on a Firefox extension called Firegoose (described in a paper in BMC Bioinformatics). Firegoose attempts to give users of biological data repositories on the web an easy way to query those resources using local data and to retrieve data for use in desktop analysis and visualization packages. In a way, Firegoose does on-demand data integration in the browser.

Browser extensions or plugins are a great way to combine the point-and-click ease of the browser with more powerful tools for working with data. Operator, another Firefox extension, is a good example: it reads standard microformats for basic data types like calendar events and contact information. Adding appointments to a calendar or contacts to an address book then becomes a matter of browsing to the appropriate web page and clicking a button.
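To make the microformat idea concrete, here is a minimal sketch in Python of how a tool might pull contact fields out of a page marked up with hCard. It is not how Operator or Firegoose is implemented; it only illustrates that the class names carry the metadata that plain display-oriented HTML would lose. The sample markup, the HCardParser class, and the extract_hcards function are made up for illustration, the sketch handles only a handful of hCard properties, and it assumes well-formed, non-nested cards.

```python
from html.parser import HTMLParser

# A small, illustrative subset of hCard property names.
FIELDS = {"fn", "given-name", "family-name", "locality"}
# Elements that never get a closing tag, so they are kept off the stack.
VOID = {"br", "img", "hr", "input", "meta", "link"}


class HCardParser(HTMLParser):
    """Collects text from elements whose class names are hCard properties."""

    def __init__(self):
        super().__init__()
        self.cards = []   # one dict per class="vcard" element found
        self._open = []   # stack of (fields carried by element, is card root)

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = set((dict(attrs).get("class") or "").split())
        is_root = "vcard" in classes
        if is_root:
            self.cards.append({})
        in_card = is_root or any(root for _, root in self._open)
        fields = classes & FIELDS if in_card else set()
        self._open.append((fields, is_root))

    def handle_endtag(self, tag):
        if tag not in VOID and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text belongs to every enclosing element that names a property.
        for fields, _ in self._open:
            for name in fields:
                card = self.cards[-1]
                card[name] = card.get(name, "") + data


def extract_hcards(html):
    """Return a list of dicts, one per hCard, with whitespace normalized."""
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return [{k: " ".join(v.split()) for k, v in card.items()}
            for card in parser.cards]


# Hypothetical sample page fragment.
SAMPLE = """
<div class="vcard">
  <span class="fn n">
    <span class="given-name">Jane</span>
    <span class="family-name">Doe</span>
  </span>
  <span class="adr"><span class="locality">Seattle</span></span>
</div>
"""

if __name__ == "__main__":
    for card in extract_hcards(SAMPLE):
        print(card)
```

A browser sees the sample as nothing more than a name and a city, but a program reading the class attributes can tell which text is the family name and which is the locality, and that is exactly the information needed to create an address book entry.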

