Tuesday, December 09, 2008

Bioinformatics as a Queryable Knowledge Map: the Pygr Project

Pygr is a hypergraph database in Python with applications in bioinformatics written by Christopher Lee, a faculty member at UCLA. There's a 30 minute video of talk about Pygr and a bunch of other resources on the Lee Lab website and Lee's thinking bioinformatics blog.

Thesis: Hypergraphs are a general model for bioinformatics and Python’s core models are already a good model of Bioinformatics Data
  • Sequence: protein and nucleic acid sequences 
  • Mapping / Graphs: alignment, annotation 
  • Attributes: schema, i.e. relations between data 
  • Namespace (import): the ontology of all bioinformatics data 
Pygr aims to show that these Pythonic patterns are a general and scalable solution for bioinformatics.

The general idea is not entirely different from the data types behind Gaggle, especially in the emphasis on basic data structures without a heavy semantic component.

Dr. Lee is also writing a textbook on probabilistic inference.