Monday, October 14, 2013

Concurrency and Parallelism - What's the difference?

For a while, I've been coming across references to the difference between concurrency and parallelism. The definitions go something like this: Concurrency concerns "interleaved threads of execution with access to shared state" which is distinct from parallelism because "parallel operations run simultaneously".

I'm quoting from - "Clojure Programming" by Chas Emerick, Brian Carper and Christophe Grand - which is a perfectly good book. I've seen similar definitions elsewhere, so I don't want to pick on these guys in particular. I'm going to disagree a bit, but overall, the book is really well done and I'm enjoying it.

My beef is this: I couldn't see the utility of the distinction they're drawing. I couldn't see why you'd want to design a program differently to run as threads scheduled on a single core versus threads scheduled on several cores. In fact, treating those cases the same seems like a plus.

In contrast, there are some distinctions between types of concurrency that are useful. Knowing your code will be distributed across machines tells you to bring network latency into the picture. Likewise, only certain problems are amenable to the single-instruction-multiple-data (SIMD) model of vector processors such as GPUs. These considerations have a real impact. But, why the pedantry over concurrency versus parallelism?

I was about to write a little rant about why this distinction is useless. But, keeping an open mind, I googled around a bit and up popped a talk by Rob Pike called "Concurrency Is Not Parallelism". Change of plan. Rob Pike is a bad-ass, well known as a Unix pioneer, Bell Labs veteran and Google Distinguished Engineer. New plan: go back to school and find out why I'm wrong.

Pike's talk explains things beautifully, and not just because he's wearing an orange suit jacket and a gopher t-shirt. Here's my understanding of the take-away:

Concurrency is a more general and abstract idea than parallelism. Concurrency is about the decomposition of a problem into subtasks at the design level. If you're creating a concurrent design, you haven't said yet whether your design will be executed in parallel. Parallelism is a detail to be decided at run-time. Which brings us to the good part.

Whenever you can take two things that were previously conjoined and let them vary independently, you're making progress. The two things in question here are the design - the decomposition of a problem into concurrent parts - and the execution of those parts, perhaps in parallel. Making this separation allows programs to be expressed correctly and structured clearly while making good use of available resources whether that's one core or many.

This important point is what's missing from the definitions above. That and they're comparing things at different levels of generality.

Next, Pike relates these ideas to Go. The language provides three concurrency primitives: Go routines, channels and select. Go routines are like threads but much cheaper. During execution, they're mapped onto OS threads by a scheduler. Channels and select statements enable communication. Go is an implementation of concepts have their origin in the classic paper Communicating Sequential Processes by Tony Hoare.

The moral of the story? Learn from the masters ...and from the gophers.

More

Tony Hoare's paper is on several "greats" lists including these: