Monday, January 12, 2015

Brave Genius

Brave Genius is an unlikely dual biography of a biologist and a writer who shared a friendship and a common philosophy. Both were active in the French Resistance to the German occupation and both would later receive a Nobel Prize. Sean B. Carroll forges an inspiring story from seemingly incongruous elements: the desperate defiance of a few in an occupied country, the exhilarating pursuit of an open scientific question, and a lonely stand on the moral high ground.

In 1940, Jacques Monod was a newly married father of twins and a researcher at the Sorbonne. Albert Camus, having already published a couple of books of essays, departed his native Algeria for France in March of that year to find work.

On May 10, 1940, German troops crossed into Holland and Belgium. Panzers raced towards the Atlantic coast, severing Allied lines and stranding French and British troops in the Low Countries. French defenses collapsed and the Germans arrived in an undefended Paris on June 14. The armistice signed on June 22 marked the beginning of four years of occupation.

During those years, Camus edited and wrote for the underground newspaper Combat, urging resistance to the occupation. As the tide of the war turned, Monod organized sabotage attacks and armed resistance ahead of the approaching liberators.

“I have always believed that if people who placed their hopes in the human condition were mad, those who despaired of events were cowards. Henceforth, there will be only one honorable choice: to wager everything on the belief that in the end words will prove stronger than bullets.” Camus, Combat (November 30, 1946)

François Jacob, André Lwoff and Jacques Monod were awarded the 1965 Nobel Prize in Physiology or Medicine for their work on the control of gene expression, elucidating the regulation of the lac operon, by which bacteria switch on metabolism of the sugar lactose.

In his writing, Camus confronts the absurdity of the human search for clarity and meaning in a world that offers only indifference. The attempt to derive meaning and morality without resort to mysticism links Camus's philosophy to Monod's scientific work, which provided some of the first direct evidence that life is mechanistic rather than the result of some magical "vital force" and that its workings could be understood.

“The scientific approach reveals to Man that he is an accident, almost a stranger in the universe.” Monod, in On Values in the Age of Science (1969)

“One of the great problems of philosophy, is the relationship between the realm of knowledge and the realm of values. Knowledge is what is; values are what ought to be. I would say that all traditional philosophies up to and including Marxism have tried to derive the 'ought' from the 'is.' My point of view is that this is impossible.” Monod

Carroll, a biologist himself, embeds philosophy and science into the personal lives of his protagonists and the geopolitical events unfolding around them. Both men did brilliant work in the darkest of times, and did so not by retreating but by fully engaging at great risk with the struggles that faced them. The book serves as a warning of what happens when good people overlook the malfeasance of their leaders, but also as confirmation of the resilience of intellect, creativity and humanity.


Sunday, January 04, 2015

The Master Switch

The Master Switch: The Rise and Fall of Information Empires was described as "essential reading" by my boss's boss. If you're at all interested in the interplay of technology, economics and politics, I think you'll agree.

Author Tim Wu is the originator of the term "net neutrality" and a law professor at Columbia. He has written a fast-forward history of the information technology industry focusing on the people and corporations that have, over time, controlled the commanding heights of the information economy. The book examines the cartels that held sway over telephone, radio, film, and television, leading up to the question of whether the Internet will also come to fall under similar domination.

The cycle is the author's term for the progression of any given technology from the wide-open wild-west early days through a process of integration and consolidation to an end state of oligopoly or monopoly. This stasis eventually gets disrupted by newer technology or government intervention, leading to another open phase and a new round of the cycle, empires rising and falling in the process. "The one-time revolutionaries always become the next generation of dictators. That's why we need, in technology, another generation of revolutionaries to upend them."[1]

Open vs. closed systems

The book revolves around the virtues and vices of open and closed systems. Open systems are more adaptable and democratic but have trouble matching the stability, security and efficiency of closed systems. Open systems embrace the advantages of decentralization as espoused in different ways by Friedrich Hayek and Jane Jacobs. But integrated, centralized systems can be reliable and convenient.

Closed systems, of course, appeal to empire builders such as Theodore Vail, who created the AT&T Bell System. Wu's knack for sketch biography is put to good use profiling these power-hungry moguls and the often utopian upstarts that seek to dethrone them. We meet titans like Vail and get a glimpse into the sometimes contradictory character traits it takes to control an information empire: David Sarnoff, who ruled the Radio Corporation of America (RCA) and NBC; John Reith, founder of the BBC; Adolph Zukor, who started Paramount Pictures; and Ted Turner, creator of CNN and former head of Time Warner. We also meet hackers like early radio enthusiast Lee De Forest and Edwin Armstrong, the suppressed inventor of FM radio.

The capture of the Internet?

The American system attempts to carefully balance power within the government, but takes a laissez-faire approach to private power. If Wu is right and we let things take their natural course, the openness that now characterizes the Internet - the "integrity of the Internet itself as a reliable, independent, and open structure"[2] - may be lost to a period of lockdown. Network effects, the power of integration and economies of scale favor the monopolist. Consumers may decide to favor consistency and convenience over openness and choice, only to regret it later. If this is the case, the Internet will not remain open automatically but only with concerted effort.

The remedy Wu proposes is a principle of separation akin to the separation of church and state or the separation of powers within the branches of the American government. The common carrier obligation of all infrastructure providers implies net neutrality and opposes vertical integration across layers of the network stack. Technology leaders would be expected to self-regulate based on a sense of public duty. The FCC should pursue enforcement with an eye to the special role of information technology in a democratic society. Antitrust regulation is the backup, when it's time to bring out the big guns.

Fight on

The Master Switch gives a deeper perspective on the great game playing out in the technology sector. After reading it, you'll recognize the historical themes threading through the open-source movement, the Apple vs Google skirmishes or 2012's battle that defeated the SOPA / PIPA acts. The fight over the future of the Internet is surely not over.

Wednesday, December 24, 2014

What the #@$% is a Monad?

Monads are like fight club. The first rule of monads is don't blog about monads.

Kind of a design pattern for functional programming, monads are already the subject of more than enough well-intentioned but confusing tutorials. We'll not commit the monad tutorial fallacy here. But monads are needed for a couple of the labs from FP101x, an online class in Haskell - labs with a throw-'em-into-the-deep-end quality to them.

Here's a quick list of some of the better resources I found, while struggling to get a handle on these super-abstract objects of mystery.

Starting points

Philip Wadler

It's been said that "Monads are hard because there are so many bad monad tutorials getting in the way of finally finding Wadler’s nice paper." Find it here:

Need more?

Those got me over the first hump, but here are some I may want to come back to later:

To put monads in a more general context, here's a really great guide to Getting started with Haskell.

Wednesday, December 03, 2014

Lee Edlefsen on Big Data in R

Lee Edlefsen, Chief Scientist at Revolution Analytics, spoke about Big Data in R at the FHCRC a week or two back. He introduced the PEMA, or parallel external memory algorithm.

“Parallel external memory algorithms (PEMA's) allow solution of both capacity and speed problems, and can deal with distributed and streaming data.”

When a problem is too big to fit in memory, external memory algorithms come into play. The data is split into chunks and loaded into memory one chunk at a time, and the partial results from each chunk are combined into a final result:

  1. initialize
  2. process chunk
  3. update results
  4. process results

Edlefsen made a couple of nice observations about these steps. Processing an individual chunk can often be done independently of other chunks. In this case, it's possible to parallelize. If updating results can be done as new data arrives, you get streaming.
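The four steps above can be sketched in Haskell, used here only because it's the language of the other posts on this page. This is not RevoPemaR's API: the names (MeanState, processChunk and so on) are made up for illustration, and an in-memory list of chunks stands in for data read from disk. The example computes a mean, which needs only a count and a sum as its running state.

```haskell
import Data.List (foldl')

-- Running state for a mean: how many values were seen, and their sum.
-- (Hypothetical type, not part of RevoPemaR.)
data MeanState = MeanState !Int !Double
  deriving (Show, Eq)

-- 1. initialize
initialize :: MeanState
initialize = MeanState 0 0

-- 2. process chunk: depends only on the chunk itself, so chunks
--    could be handled in parallel.
processChunk :: [Double] -> MeanState
processChunk xs = MeanState (length xs) (sum xs)

-- 3. update results: merge a partial result into the running state.
--    Since this can run as each chunk arrives, it also fits streaming.
updateResults :: MeanState -> MeanState -> MeanState
updateResults (MeanState n s) (MeanState m t) = MeanState (n + m) (s + t)

-- 4. process results: turn the combined state into the final answer.
processResults :: MeanState -> Maybe Double
processResults (MeanState 0 _) = Nothing
processResults (MeanState n s) = Just (s / fromIntegral n)

-- Split a list into chunks of at most k elements; a stand-in for
-- reading a large file one block at a time.
chunksOf :: Int -> [a] -> [[a]]
chunksOf k xs
  | k <= 0 || null xs = []
  | otherwise         = let (h, t) = splitAt k xs in h : chunksOf k t

-- Drive the four steps over every chunk.
chunkedMean :: Int -> [Double] -> Maybe Double
chunkedMean k = processResults
              . foldl' updateResults initialize
              . map processChunk
              . chunksOf k
```

In GHCi, chunkedMean 3 [1..10] splits the input into four chunks and returns Just 5.5, the same answer as a single pass over the whole list. Because updateResults is associative, the partial results could be merged in any grouping, which is what makes the parallel version possible.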

Revolution has developed a framework for writing parallel external memory algorithms in R, RevoPemaR, making use of R reference classes.

I couldn't find Edlefsen's exact slides, but these decks on parallel external memory algorithms and another from UseR 2011 on Scalable data analysis in R seem to cover everything he talked about.

Saturday, November 22, 2014

Haskell class, so far

Well, I'm about 5 weeks into Introduction to Functional Programming, a.k.a. FP101x, an online class taught in Haskell by Erik Meijer. The class itself is a couple of weeks ahead of that; I'm lagging a bit. So, how is it so far, you ask?

The first 4 weeks covered basic functional concepts and how to express them in Haskell, closely following chapters 1-7 of the book, Graham Hutton's Programming in Haskell:

  • Defining and applying functions
  • Haskell's type system
    • parametric types
    • type classes
    • type signatures of curried functions
  • pattern matching
  • list comprehensions
  • recursion
  • higher-order functions

Haskell's hierarchy of type classes is elegant, but some obvious things seem to be missing. For example, you can't show a function, though it would be really helpful to show something like a docstring, or at least the function's type signature. Also, machine-word-sized Ints don't automatically promote, so if n is an Int, n/5 produces a type error.
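To make the Int complaint concrete, here is a small sketch (the function names are mine, for illustration). The / operator belongs to the Fractional class, which Int is not an instance of, so the conversion has to be spelled out with fromIntegral, or integer division used instead.

```haskell
-- (/) has type Fractional a => a -> a -> a, and Int is not Fractional,
-- so `n / 5` with n :: Int is rejected by the type checker.

-- Convert explicitly to get a fractional result.
fifth :: Int -> Double
fifth n = fromIntegral n / 5

-- Or stay in the integers with `div`, which rounds toward negative infinity.
fifthWhole :: Int -> Int
fifthWhole n = n `div` 5
```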

Most of the concepts were familiar already from other functional languages: Scheme via SICP, OCaml via Dan Grossman's programming languages class, and Clojure via The Joy of Clojure. So, this early part was mostly a matter of learning Haskell's syntax.

Some nifty examples

  • a recursive definition of factorial:

    factorial :: Integer -> Integer
    factorial 0 = 1
    factorial n = n * (factorial (n-1))
  • sum of the first 8 powers of 2:

    sum (map (2^) [0..7])
  • a recursive definition of map:

    -- clashes with Prelude's map; in a module, add: import Prelude hiding (map)
    map :: (a -> b) -> [a] -> [b]
    map _ [] = []
    map f (x:xs) = f x : map f xs
  • get all adjacent pairs of elements from a list:

    pairs :: [a] -> [(a,a)]
    pairs xs = zip xs (tail xs)
  • check if a list of elements that can be ordered is sorted by confirming that each pair of elements is ordered:

    sorted :: Ord a => [a] -> Bool
    sorted xs = and [x <= y |(x,y) <- pairs xs]

Haskell attains its sparse beauty by leaving a lot implied. One thing I figured out during my brief time with OCaml also seems to apply to Haskell. Although these languages lack the forest of parentheses you'll encounter in Lispy languages, it's not that the parentheses aren't there; you just can't see them. A key to reading Haskell is understanding the rules of precedence, associativity and fixity that imply the missing parentheses.

Precedence   Left associative                     Non-associative                            Right associative
9            !!                                                                              .
8                                                                                            ^, ^^, **
7            *, /, `div`, `mod`, `rem`, `quot`
6            +, -
5                                                                                            :, ++
4                                                 ==, /=, <, <=, >, >=, `elem`, `notElem`
3                                                                                            &&
2                                                                                            ||
1            >>, >>=
0                                                                                            $, $!, `seq`
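As a sketch of how the precedence table is read, each pair below gives an expression as it would normally be written, followed by the same expression with the implied parentheses made explicit. The names are made up for illustration.

```haskell
-- * (level 7) binds tighter than + (level 6).
terse1, explicit1 :: Int
terse1    = 1 + 2 * 3
explicit1 = 1 + (2 * 3)

-- ^ is right associative, so the rightmost power is grouped first.
terse2, explicit2 :: Integer
terse2    = 2 ^ 3 ^ 2
explicit2 = 2 ^ (3 ^ 2)

-- : and ++ share level 5 and both associate to the right.
terse3, explicit3 :: [Int]
terse3    = 1 : [2] ++ [3]
explicit3 = 1 : ([2] ++ [3])

-- Comparison (4) binds tighter than && (3), which binds tighter than || (2).
terse4, explicit4 :: Bool
terse4    = 1 < 2 && 3 < 4 || False
explicit4 = ((1 < 2) && (3 < 4)) || False

-- $ has the lowest precedence, so everything to its right is grouped first.
terse5, explicit5 :: Int
terse5    = sum $ map (* 2) [1, 2, 3]
explicit5 = sum (map (* 2) [1, 2, 3])
```

Function application isn't in the table because it binds tighter than any operator, which is why map (* 2) [1, 2, 3] needs no parentheses around its arguments.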

Another key is reading type signatures of curried functions, as currying is the default in Haskell and is relied upon extensively in composing functions, particularly in the extra terse "point-free" style.
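A small sketch of what that looks like in practice; the function names are invented for illustration.

```haskell
-- The arrow in a type signature associates to the right, so
--   add :: Int -> Int -> Int
-- really means
--   add :: Int -> (Int -> Int)
-- a function that takes one Int and returns another function.
add :: Int -> Int -> Int
add x y = x + y

-- Partial application: supplying only the first argument leaves an Int -> Int.
addTen :: Int -> Int
addTen = add 10

-- Pointed style names the argument explicitly...
sumOfSquares :: [Int] -> Int
sumOfSquares xs = sum (map (^ 2) xs)

-- ...while point-free style composes partially applied functions with (.)
-- and never mentions the list at all.
sumOfSquares' :: [Int] -> Int
sumOfSquares' = sum . map (^ 2)
```

Both versions of sumOfSquares have the same type and behavior; the point-free one only works because map (^ 2) is itself a curried function waiting for its list argument.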

Currently, I'm trying to choke down Graham Hutton's Addendum on Monads. If I end up understanding that, it'll get me a code-monkey merit badge, for sure.

Tuesday, November 11, 2014

The DREAM / RECOMB Conference 2014

The RECOMB/ISCB Conference on Regulatory and Systems Genomics, with DREAM Challenges and Cytoscape Workshops is running this week in San Diego.

A bunch of us from Sage Bionetworks are here to connect with the DREAM community. In introductory remarks, Stephen Friend framed the challenges as piloting new modes of collaboration and engagement addressing multidimensional problems based on the idea that open innovation will trump closed silos.

Lincoln Stein: The Future of Genomic Databases

I first heard Lincoln Stein speak at an O'Reilly conference in 2002, on building a bioinformatics nation. The same themes of openness and integration reappeared in Stein's talk on The Future of Genomic Databases.

Stein asks, "Open Data + open source = reproducible science?" Not exactly. Stein presents some emerging solutions to the remaining obstacles: big data sets, complex workflows, unportable code and data access restrictions.

Cloud computing, specifically colocation of data and compute, enables handling big data. Containers (e.g., Docker) address the problem of code portability. The Global Alliance is working towards providing APIs both to encapsulate technical complexity and to provide a control point at which to enforce restrictions.

In case we're wondering what to do with all the machine cycles made available by Amazon and Google, bioinformatics workflows are growing in complexity. Workflow managers like SeqWare and Galaxy provide a formalized description of multistep processes and manage tools and their dependencies.

Legal restrictions hinder data integration, but donors want their samples to contribute to research. Licensure for data access combined with uniform consent could reduce the friction, resulting in a streamlined data access process. On the technical side, possible solutions involve homomorphic encryption and agent-based federated queries.

As a parting thought, Stein notes that digital infrastructure enables experiments in incentive structures and economic models, citing micropayments, ratings, and challenges.

Andrea Califano

Andrea Califano spoke on the genotype to phenotype linkage in cancer. Thinking of the cell as an integrator of signals, Califano's group traces from gene or protein expression signatures of cell states (normal, neoplastic, metastatic) back through the network to the master regulators responsible for that signature. One related paper is Identification of Causal Genetic Drivers of Human Disease through Systems-Level Analysis of Regulatory Networks.

Paul Boutros: Somatic Mutation Calling Challenge

Paul Boutros presented the Somatic Mutation Calling Challenge (SMC-DNA). He announced the intention for the SMC challenges to become a living benchmark, an objective standard against which future methods will be tested.

Paul also crowned the Broad Institute's MuTect (single nucleotide) and novoBreak (structural variants) by Ken Chen's lab at MD Anderson the winners of the synthetic tumor phase of SMC-DNA. The plan is to announce winners on real tumor data in February after experimental validation.

The Winners

The SMC challenge is somewhat unusual for DREAM in its level of specialization. In the other challenges, a couple of methods were highlighted: Gaussian process regression and dictionary learning for sparse representation.

But, increasingly, the main differentiator is application of biological domain knowledge, especially with respect to selecting and processing features. Li Liu of Arizona State's Biodesign Institute, for example, won part of the Acute Myeloid Leukemia challenge by weighting proteins based on their evolutionary conservation.

Another theme is that genetic features seem to have poor signal compared to more downstream features, gene expression or clinical variables. Peddinti Gopalacharyulu, a top performer in the Gene Essentiality Challenge, commented that perhaps the way to use genetics is to extract the component of gene expression that is not explained by genetic features.
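One way to read that last suggestion, sketched here with ordinary least squares on a single genetic feature. This is my own illustration, not the method used by any challenge participant, and the function names are hypothetical.

```haskell
-- Arithmetic mean of a non-empty list.
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- Fit expression ~ intercept + slope * genotype by least squares and
-- return the residuals: the part of expression that the genetic
-- feature does not explain. Assumes equal-length, non-empty inputs
-- and a genotype column that is not constant.
residuals :: [Double] -> [Double] -> [Double]
residuals genotype expression =
    zipWith (\g e -> e - (intercept + slope * g)) genotype expression
  where
    gBar      = mean genotype
    eBar      = mean expression
    slope     = sum (zipWith (\g e -> (g - gBar) * (e - eBar)) genotype expression)
              / sum (map (\g -> (g - gBar) ^ 2) genotype)
    intercept = eBar - slope * gBar
```

The residuals could then be used as features in place of raw expression. A real analysis would involve many genetic features and a multivariate fit, which this one-variable sketch doesn't attempt.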

DREAM 9.5

Two of the DREAM 9.5 challenges are follow-ups to the Somatic Mutation Calling challenge from the 8.5 round. The SMC empire expands into RNA and tumor heterogeneity. In the olfaction challenge, the goal is to predict, from molecular features, odor as described by human subjects. The prostate cancer challenge asks participants to classify patients according to survival using data sourced from the comparator arms of clinical trials.

For the DREAM 10 round, there's an imaging challenge in the works and a sequel to the ALS challenge from DREAM 7.

On to RECOMB

That's just the DREAM part of the meeting, or, really, the subset that fit into my brain. As an added bonus, there were several representatives from Cytoscape-related projects and some conversation about the Global Alliance for Genomics and Health.

Tuesday, October 14, 2014

Let's learn us a Haskell

Let's say you've been meaning to learn Haskell for a long time, secretly yearning for purely functional programming, laziness and a type system based on more category theory than you can shake a functor at.

Now's your chance. Erik Meijer is teaching an online class Introduction to Functional Programming on edX, about which he says, "This course will use Haskell as the medium for understanding the basic principles of functional programming."

It starts today, but I've gotten a head start by working through the first few chapters of Learn You a Haskell for Great Good!, which most agree is the best place to get started with Haskell.

Anyone up for a Seattle study group?

Tutorials

Books

Other resources

It must be some pack-rat instinct that makes me compile these lists.