Friday, April 27, 2012

Sage Bionetworks Synapse

Michael Kellen, Director of Technology at Sage Bionetworks, is trying to build a GitHub for science. It's called Synapse and Kellen described it in a talk at the Sage Bionetworks Commons Congress 2012, this past weekend: 'Synapse' Pilot for Building an 'Information Commons'.

To paraphrase Kellen's intro:

Science works better when people build on each other's work. Every great advance is preceded by a multitude of smaller advances. It's no accident that the invention of the printing press and the emergence of the first scientific journals coincide with the many great scientific discoveries of the Age of Enlightenment. But scientific journals are stuck in a paradigm revolving around the printing press. In other domains, namely open source software, people are more radically reinventing systems for sharing information with each other. GitHub is a collaborative environment for the domain of software. Synapse aims to be a similar environment for medical and genomic research.

The Synapse concept of a project packages together data and the code to process it. I tried to download the R script shown in the contents and couldn't, either because I'm a knucklehead or because Synapse is a work in progress. On the plus side, they give you a helpful cut-n-paste snippet of R code in the lower right corner to access the project through their R API. When this is fully implemented, it could provide a key piece of computing infrastructure for reproducible data-driven science.

Sage intends to explore ways of connecting to traditional scientific journals. Picture figures that link to interactive visualizations or computational methods that link to code. I'm a big fan of the "live document" concept and it would be great to see journal articles evolve in that direction.

An unintended consequence of NGS, Robert Gentleman points out, is that the data is too big for existing pipes. Any concept of a GitHub for science will have to incorporate processing biological data in the cloud. I could imagine a Synapse project containing data sets, code and a recipe for standing up an EC2 instance (or several). At a click, a scripted process would run, bootstrapping the machines, installing software and dependencies, running a processing pipeline, and visualizing the results in a browser. How would that be for reproducible science?
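To make the idea a little more concrete, here is a minimal sketch of what such a recipe might look like: a project bundles its data sets with an ordered list of steps, and one call runs them in sequence. Every name here (`Step`, `Recipe`, `run`, `placeholder`, the file name) is hypothetical and is not part of Synapse or any cloud API; the steps only record what they would do rather than starting real machines.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical names throughout: this is not Synapse's API, only an
# illustration of a project that bundles data, code and a run recipe.
Context = Dict[str, List[str]]


@dataclass
class Step:
    name: str
    action: Callable[[Context], None]  # reads and updates a shared context


@dataclass
class Recipe:
    data_sets: List[str]
    steps: List[Step] = field(default_factory=list)


def placeholder(message: str) -> Callable[[Context], None]:
    """Build a stand-in action that only records what it would do."""
    def action(context: Context) -> None:
        context["log"].append(message)
    return action


def run(recipe: Recipe) -> List[str]:
    """Run each step in order; an exception in any step aborts the run."""
    context: Context = {"data": list(recipe.data_sets), "log": []}
    for step in recipe.steps:
        step.action(context)
    return context["log"]


recipe = Recipe(
    data_sets=["expression_matrix.tsv"],  # made-up file name
    steps=[
        Step("bootstrap", placeholder("start machines")),
        Step("install", placeholder("install software and dependencies")),
        Step("pipeline", placeholder("run processing pipeline")),
        Step("visualize", placeholder("render results for the browser")),
    ],
)

for line in run(recipe):
    print(line)
```

In a real system each placeholder would be replaced by something that actually provisions machines or launches jobs; the point of the sketch is only that the recipe travels with the data and the code.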

Michael Kellen's blog has a bunch of interesting stuff about why building a GitHub for biology is more fun than selling sheets. I bet it is.

4 comments:

  1. Hi Christopher,

    Your problems are most certainly caused by our mad rush to launch a beta and demo at the Sage Congress, and not by you being a knucklehead! The demo I gave had a few things a bit stubbed out, like some of the code objects not working as intended yet. We need to catch up on bug fixes and documentation after our recent mad rush of features. You should be able to get data out through the R client; see https://sagebionetworks.jira.com/wiki/display/SYNR/Home for work-in-progress docs.

  2. Christopher,

    Glad to hear that you gave Synapse a try.

    You are most certainly not a "knucklehead", but even if you are, our goal is to make Synapse so easy that even a caveman could use it!

    We had some compatibility issues with older R versions, but we have everything worked out now. You can get the latest version of the Synapse R client from our servers by doing:

    source("http://depot.sagebase.org/CRAN.R")
    pkgInstall("synapseClient")

    Let me know if you still have issues after updating your client version. If you do, we are here to help. Don't hesitate to contact us at platform@sagebase.org.

    Matt

  3. Not related to Sage, except by similar goals, Syapse is a scientific data management startup in Silicon Valley. They were written up in a Nature news segment:

    Going paperless: The digital lab. Lab-management software and electronic notebooks are here — and this time, it's more than just talk.
