Sunday, July 25, 2010

Gaggle Genome Browser

There's a certain windmill I've been tilting at for a couple of years now. It's known as the Gaggle Genome Browser, and we've published a paper on it called Integration and visualization of systems biology data in context of the genome.

The Gaggle Genome Browser is a cross-platform desktop program, based on Java and SQLite, for interactively visualizing high-density genomic data. It joins heterogeneous data by location on the genome to create information-rich visualizations of genome organization, transcription, and its regulation. As always, a key feature is interoperability with other bioinformatics apps through the Gaggle framework.
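To make "joining data by location" concrete, here's a minimal sketch of the kind of interval-overlap query a SQLite-backed design allows. The single-table schema (track, name, sequence, start, end) is a hypothetical illustration, not the browser's actual file format, and the snippet assumes the sqlite-jdbc driver is on the classpath:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class OverlapQuery {
        public static void main(String[] args) throws Exception {
            try (Connection db = DriverManager.getConnection("jdbc:sqlite:genome.db")) {
                // Find features from every track that overlap a query interval.
                // Two intervals overlap when each one starts before the other ends.
                String sql = "SELECT track, name, start, end FROM features "
                           + "WHERE sequence = ? AND start <= ? AND end >= ? "
                           + "ORDER BY start";
                try (PreparedStatement ps = db.prepareStatement(sql)) {
                    ps.setString(1, "chromosome");
                    ps.setInt(2, 20000);  // end of the query interval
                    ps.setInt(3, 10000);  // start of the query interval
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            System.out.printf("%s\t%s\t%d..%d%n",
                                    rs.getString("track"), rs.getString("name"),
                                    rs.getInt("start"), rs.getInt("end"));
                        }
                    }
                }
            }
        }
    }

With an index on (sequence, start), SQLite can prune most of the table for queries like this, which matters at tiling-array densities.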

Here it is displaying some tiling microarray data for Sulfolobus solfataricus. The reference sample is shown as blue circles overlaid with the segmentation in red. Eight time points along a growth curve are plotted as a heat map: red indicates increased transcription relative to the reference; green indicates decreased transcription. We also show Pfam domains, predicted operons, and some previously observed non-coding RNAs, several of which we were able to confirm.

One of the features I'm most proud of is the integration with R, a tactic also being used by MeV. At this point it's only partially complete. There's quite a bit more that could be done with it, and I'm looking for time (or help!) to finish.

The past couple of years have seen a whole crop of new genome browsers. See the entry browsing genomes for a partial list. One reason is a new generation of lab hardware and techniques, including ChIP-chip, tiling arrays, and high-throughput next-generation sequencing. Another is the ever-changing landscape of computing.

It's lacking polish in some places. There's plenty yet to be done. Maybe later, I'll write up some lessons learned and mistakes made, but for now, I'm happy to have it published and out there.

Read more about the biology here:

Check out the screencast by OpenHelix here:

Tuesday, July 20, 2010

How to design good APIs

A long time ago, I asked a bunch of programming gurus how to go about designing an API. Several gave answers that boiled down to the unsettling advice, "Try to get it right the first time," to which a super-guru then added, "...but you'll never get it right the first time." With that zen wisdom in mind, here's a pile of resources that may help get it slightly less wrong.

Joshua Bloch, designer of the Java collection classes and author of Effective Java, gives a Google tech-talk called How to Design a Good API & Why it Matters. Video for another version of the same talk is available on InfoQ. He starts off with the observation that, "Good programming is modular. Module boundaries are APIs."

Characteristics of a Good API

  • Easy to learn
  • Easy to use, even without documentation
  • Hard to misuse
  • Easy to read and maintain code that uses it
  • Sufficiently powerful to satisfy requirements
  • Easy to extend
  • Appropriate to audience

Michi Henning, in API Design Matters (Communications of the ACM, May 2009), observes that, "An API is a user interface. APIs should be designed from the perspective of the caller."

Much of software development is about creating abstractions, and APIs are the visible interfaces to these abstractions. Abstractions reduce complexity because they throw away irrelevant detail and retain only the information that is necessary for a particular job. Abstractions do not exist in isolation; rather, we layer abstractions on top of each other. [...] This hierarchy of abstraction layers is an immensely powerful and useful concept. Without it, software as we know it could not exist because programmers would be completely overwhelmed by complexity.

Because you'll get it wrong the first time, and because requirements change, you'll have to evolve your APIs. Breaking clients is unpleasant, but "Backward compatibility erodes APIs over time."
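As a small, hypothetical Java illustration (mine, not from either of the talks above) of what that evolution looks like in practice: the old method survives, deprecated and delegating to its replacement, so existing clients keep compiling while the class quietly accumulates baggage.

    public class Mailer {
        /**
         * @deprecated kept only for existing callers; use
         * {@link #send(String, int)} so the timeout is stated explicitly.
         */
        @Deprecated
        public void send(String message) {
            send(message, 30000);  // delegate so both paths share one implementation
        }

        /** Sends a message, giving up after timeoutMillis milliseconds. */
        public void send(String message, int timeoutMillis) {
            // ... actual delivery elided ...
        }
    }

Each such shim is harmless on its own; a few dozen of them are the erosion the quote is talking about.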

My own little bit of wisdom is this: Performance characteristics are often part of the API. Unless stated otherwise, the caller will assume that a function will complete quickly. For example, it often seems like a good idea to make remote method calls look just like local method calls. This is a bad idea, because you can't abstract away time.
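Here's a hypothetical sketch of that trap. Nothing in the interface below says whether lookup is a hash-table probe or a network round trip, so callers will naturally write code that only makes sense for the former:

    import java.util.List;

    interface UserDirectory {
        // The signature is silent about cost: this could be a HashMap probe
        // (nanoseconds) or a remote call (milliseconds, and it can fail).
        String lookup(String userId);
    }

    class Report {
        static void print(UserDirectory directory, List<String> userIds) {
            // Reasonable against an in-memory implementation; against a remote
            // one, 10,000 ids become 10,000 sequential round trips.
            for (String id : userIds) {
                System.out.println(directory.lookup(id));
            }
        }
    }

An API that admits its cost, say by offering a batch lookupAll(List<String>) or by returning a Future, puts the time dimension back into the signature where the caller can see it.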

Thursday, July 01, 2010

Science funding and productivity

These are interesting times for the practice and funding of science. The traditional model of fee-for-subscription peer-reviewed academic journals is looking more and more outdated. Scientific funding is increasingly competitive and dependent on salesmanship and networking rather than scientific merit.

We Must Stop the Avalanche of Low-Quality Research argues that scientists are drowning in a sea of mediocre papers that nobody reads.

In economic terms, attention is the scarce resource. Electronic publishing is dirt cheap, so it makes sense to publish even weak or negative results. But human attention is expensive, and the peer review process is time-consuming and unfunded. There needs to be a better mechanism for ranking the quality and importance of papers, so that scarce attention can be allocated efficiently.

Certainly, counting papers is as poor a metric of scientific output as counting lines of code is of programmer productivity.

Scientists should be scientists, not fund raisers. Real Lives and White Lies in the Funding of Scientific Research details the tyranny of grant applications.

One proposed improvement is a track system, in which a researcher would be placed into a funding category, reviewed for productivity every five years, and moved to a higher or lower track accordingly. Emphasis would shift from plans to outcomes.

Stanford bioengineering professor Steven Quake makes a similar point in the New York Times:

As we consider the monumental challenges facing our generation — climate change, energy needs and health care — and look to science for solutions, it would behoove us to remember that it is almost impossible to predict where the next great discoveries will be made — and thus we should invest broadly and let scientists off their leashes.

One has to wonder how well science funding will hold up in the face of the gaping government deficits in most western countries.

Meanwhile, China is becoming a scientific superpower.

Luo Minmin, 37, a neurobiologist, returned to China six years ago after getting his PhD from the University of Pennsylvania and completing a postdoctoral research stint at Duke. Luo said he has a big budget at NIBS and greater research freedom than he would have in the United States. "If I had stayed in America, the chances of making a discovery would have been lower," he said. "Here, people are willing to take risks. They give you money, and essentially you can do whatever you want."
