In the January 2008 issue of Communications of the ACM, Jeannette Wing of Carnegie Mellon University poses these questions:
- P=NP?
- What is computable?
- What is intelligence?
- What is information?
- (How) can we build complex systems simply?
I've harbored a secret desire to learn Haskell for a few years now. Simon Peyton Jones is one of the key people behind Haskell. His web site at MSR has tons of papers, a tutorial on concurrent programming in Haskell, and a video lecture, A Taste of Haskell. There's also a Simon Peyton Jones podcast at SE-Radio.
What is Haskell?
Haskell is a programming language that is
- purely functional
- lazy
- higher order
- strongly typed
- general purpose
Why should I care?
Functional programming will make you think differently about programming
- Mainstream languages are all about state
- Functional programming is all about values
Whether or not you drink the Haskell Kool-Aid, you'll be a better programmer in whatever language you regularly use
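To make the state-versus-values contrast a bit more concrete, here is a minimal sketch. It is in Python rather than Haskell, so laziness is only approximated with generators, and the function names are made up for illustration.

```python
from functools import reduce
from itertools import count, islice


def sum_of_squares_stateful(numbers):
    """State-centric style: an accumulator variable is updated step by step."""
    total = 0
    for n in numbers:
        total += n * n  # mutate the accumulator in place
    return total


def sum_of_squares_values(numbers):
    """Value-centric style: the result is a composition of expressions."""
    squares = map(lambda n: n * n, numbers)  # higher-order function, no mutation
    return reduce(lambda acc, sq: acc + sq, squares, 0)


def first_squares(k):
    """Rough stand-in for laziness: take k items from an unbounded stream."""
    return list(islice((n * n for n in count(1)), k))
```

Both `sum_of_squares_*` functions compute the same number; the second never reassigns a variable, which is the habit of mind the slides are pointing at.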
I should read a Haskell book or two, and, in related functional goodness, I keep reading how great Practical Common Lisp is. I also need to fulfill my quest to finish SICP. I've read the first three chapters twice, doing the examples once in Scheme and again in OCaml. I've read chapter 4 on interpreters. I need to work through the examples in that chapter and take in the fifth and final chapter.
Let's start off by saying I'm not anti-Ruby. I like Ruby. Ruby is cool. Matz is cool. But, a while back I was wondering, What is a Ruby code block? My feeble curiosity has been revealed for the half-assed dilettantery it is by Paul Cantrell. Mr. Cantrell chases this question down, grabs it by the scruff of the neck, and wrings it out like a bulldog with a new toy. He also rocks on the piano, by the way.
So in fact, there are no less than seven -- count 'em, SEVEN -- different closure-like constructs in Ruby:
- block (implicitly passed, called with yield)
- block (&b => f(&b) => yield)
- block (&b => b.call)
- Proc.new
- proc
- lambda
- method
This is quite a dizzying array of syntactic options, with subtle semantic differences that are not at all obvious, and riddled with minor special cases. It's like a big bear trap for programmers who expect the language to just work. Why are things this way? Because Ruby is:
- designed by implementation, and
- defined by implementation.
Again, neither I nor Mr. P.C. is bashing Ruby. He shows how to pull off some tasty functional goodness like transparent lazy lists later in the article. Thanks to railspikes for the link.
If you're into stats, both of these are highly regarded, but miles over my head.
Papers
Dr. Larry Ruzzo at UW teaches a Computational Biology course. Some of the links above are from his reading list, particularly the Sean Eddy Primer articles from Nature Biotechnology.
In winter of 2008, some UW CS grad students held a seminar course on data management issues in life sciences. In case that link doesn't stay up forever, here's some of the reading list:
Intro to Biology
Overview on biological data integration
Specific tools and techniques
Also on the subject of data: Dynamic Fusion of Web Data.
Books I wanna read
Finally, here are some books that I haven't read, will probably never get the time to read, but I wish I would read.
Slashdot linked to Bjarne Stroustrup on Educating Software Developers, which follows up on an earlier article, The 'Anti-Java' Professor and the Jobless Programmers. The Anti-Java professor is Robert Dewar at NYU, who coauthored a short paper, Computer Science Education: Where Are the Software Engineers of Tomorrow? They contend that computer science curricula have been dumbed down to counter falling enrollment after the dot-com crash, and they partially blame Java, which fosters reliance on libraries and garbage collection. But not all of their critique can be written off as language bigotry. The result?
We are training easily replaceable professionals.
Dewar advocates:
Those sound like solid points to me. One thing the field of medicine really gets right is an emphasis on mentoring. Mentoring is the heart of residency, which depending on specialty can last from 3 to 7 years. By the time a physician graduates from residency, they will have performed hundreds of procedures and seen thousands of patients under the guidance of an attending physician. I've often wished there was more of this in the computing field.
Over the years, I've accumulated a list of topics I wish I'd been exposed to as a CS undergrad.
Of course, then my undergraduate degree would have taken 7 years... On second thought, only my Dad would have complained.
What do you wish you'd learned in college? Post a comment!
I wondered who else might be working on bioinformatics related extensions for Firefox besides Firegoose. One interesting project is iHOPerator, which builds on Greasemonkey. And, there's a hint of something to come here.
It seems like there was a flurry of interest around 2005, in the early days of AJAX and mash-ups, which produced biobar along with two now-dead projects: bioFox and NCBI Search Toolbar. Back in those days, Jon Udell asked, How do you design a remixable Web application? Nifty developments like the REST API in EMBL's STRING 8.0 are starting to provide answers.
Pygr is a hypergraph database in Python with applications in bioinformatics, written by Christopher Lee, a faculty member at UCLA. There's a 30-minute video of a talk about Pygr and a bunch of other resources on the Lee Lab website and Lee's thinking bioinformatics blog.
Thesis: Hypergraphs are a general model for bioinformatics, and Python's core models are already a good model of bioinformatics data. Pygr aims to show that these Pythonic patterns are a general and scalable solution for bioinformatics.
- Sequence: protein and nucleic acid sequences
- Mapping / Graphs: alignment, annotation
- Attributes: schema, i.e. relations between data
- Namespace (import): the ontology of all bioinformatics data
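As a rough illustration of the first two items, the sketch below uses only built-in Python types. It is not Pygr's actual API; the sequence, feature names, and coordinates are invented for the example.

```python
# Sequence: a plain string already supports slicing by coordinate.
genome = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
exon = genome[0:9]  # an interval is just a slice

# Mapping / graph: a dict of dicts relates source intervals to targets,
# with each edge carrying extra information (here, an annotation record).
annotations = {
    (0, 9): {"feature_a": {"type": "exon", "strand": "+"}},
    (12, 24): {"feature_b": {"type": "exon", "strand": "+"}},
}


def features_overlapping(start, stop):
    """Return names of annotated features whose interval overlaps [start, stop)."""
    hits = []
    for (a_start, a_stop), targets in annotations.items():
        if a_start < stop and start < a_stop:  # half-open interval overlap
            hits.extend(targets)  # extending with a dict adds its keys
    return sorted(hits)
```

Calling `features_overlapping(5, 15)` walks the mapping the way one would walk edges in a graph, which is the sense in which ordinary Python containers can stand in for alignment and annotation data.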
The general idea is not entirely different from the data types behind Gaggle, especially in the emphasis on basic data structures without a heavy semantic component.
Dr. Lee is also writing a textbook on probabilistic inference.
I happened across a very cool project on web data integration at the University of Leipzig. Their paper Dynamic Fusion of Web Data is worth a look. They're working towards a theory of on-the-fly data integration for mashup applications that they refer to as dynamic data fusion. Data integration in mashups is dynamic in that it occurs at runtime. This provides for a pay-as-you-go model, rather than a large up-front semantic mapping task that limits the scalability of traditional data integration methods like data warehouses.
They describe mashups as workflow-like. Do they mean mashups are programmatic as opposed to declarative? In place of SQL, this group's iFuice system uses a scripting language with "set operations (e.g., union, intersection, and difference) and data transformation (e.g., fuse, aggregate) which can be used to post-process query results". Other key features are instance-level mapping and accommodation of structured and unstructured data.
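To give a feel for that kind of post-processing, here is a small sketch in Python. It is not iFuice script syntax; the `fuse` helper, the record layout, and the sample data are all hypothetical.

```python
# Results of two hypothetical queries, keyed by instance identifier.
source_a = {
    "P1": {"title": "Paper one", "venue": "Workshop X"},
    "P2": {"title": "Paper two"},
}
source_b = {
    "P1": {"citations": 12},
    "P3": {"title": "Paper three", "citations": 3},
}

# Set operations on the identifiers returned by each query.
ids_a, ids_b = set(source_a), set(source_b)
in_both = ids_a & ids_b      # intersection
in_either = ids_a | ids_b    # union
only_in_a = ids_a - ids_b    # difference


def fuse(*sources):
    """Merge the attributes of records that share an identifier.

    Assumption for this sketch: on conflicting attributes, earlier sources win.
    """
    fused = {}
    for source in sources:
        for key, attrs in source.items():
            record = fused.setdefault(key, {})
            for name, value in attrs.items():
                record.setdefault(name, value)
    return fused


fused = fuse(source_a, source_b)

# Aggregate: total citations across the fused records.
total_citations = sum(r.get("citations", 0) for r in fused.values())
```

Note that the matching happens on instances (the identifiers `P1`, `P2`, `P3`), with no schema mapping step up front.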
This definitely gets at what Firegoose is good for - using the web as a channel for structured data - an approach that does for data integration what loose coupling does for software. Firegoose, part of the Gaggle framework, is a toolbar for Firefox that allows data to be exchanged between desktop software and the web. Firegoose can read microformats, call web services, query databases, or even perform nasty dirty screen scraping. Unlike a mashup, data integration in Firegoose and Gaggle requires user participation, although the user never deals with schemas, only instances of the Gaggle data types - mainly lists of identifiers, matrices of numeric data, networks, and tuples. The identifiers serve in a role somewhat analogous to primary keys.
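The primary-key analogy can be sketched in a few lines of Python. These structures are hypothetical stand-ins for the Gaggle data types, not Gaggle's actual classes, and the identifiers are invented.

```python
# Hypothetical stand-ins for three of the Gaggle data types.
name_list = ["gene1", "gene2", "gene3"]   # list of identifiers

matrix = {                                # numeric matrix, rows keyed by identifier
    "columns": ["cond1", "cond2"],
    "rows": {
        "gene1": [0.5, 1.2],
        "gene2": [-0.3, 0.8],
        "gene4": [2.0, 2.1],
    },
}

network = [("gene1", "gene2"), ("gene2", "gene3")]  # edges between identifiers


def select_rows(matrix, identifiers):
    """Keep only the matrix rows whose identifier is in the given list."""
    wanted = set(identifiers)
    rows = {k: v for k, v in matrix["rows"].items() if k in wanted}
    return {"columns": matrix["columns"], "rows": rows}


def neighbors(network, identifiers):
    """Return identifiers directly connected to any of the given ones."""
    wanted = set(identifiers)
    found = set()
    for a, b in network:
        if a in wanted:
            found.add(b)
        if b in wanted:
            found.add(a)
    return sorted(found - wanted)
```

Because every structure is keyed by the same identifiers, a list selected in one tool can be used to pull rows or neighbors out of another without any shared schema.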
More papers in a similar vein
I may as well come clean and admit that I'm developing a genome browser. What? Another genome browser? Why? You may well ask these questions. Well, it's a long story. But here is a completely non-exhaustive list of existing genome browsers.
Note: updated in Sept. 2009 to reflect the fact that everyone and their uncle built a genome browser in the past couple of years. See Brother, can you spare a genome browser?
Note: updated again in May of 2010 and again in Feb 2011 to add Savant.