Monday, May 31, 2010

How many distinct paradigms of programming are there?

I learned a few styles of programming in school. At least, I heard of them somewhere in passing.

  • Procedural
  • Object-oriented
  • Functional
  • Logical

Then I came across a thread on Hacker News, "Ask HN: Do you recognize this approach to programming?". The answer seems to be something called functional reactive programming. So I started wondering, "How many distinct styles or paradigms of programming are there?"

That led me to Peter Van Roy's The principal programming paradigms and Programming Paradigms for Dummies. PVR is a co-author of Concepts, Techniques, and Models of Computer Programming, which I've heard described as, "If you liked SICP, you'll like this."

Wednesday, May 26, 2010

Attention, Intelligence, Creativity and Flow

Most coders are aware of the importance of pure, uninterrupted concentration. Creative work of any kind requires focused attention. The state of flow happens when the spotlight of attention is completely focused on an activity. A piece by Jonah Lehrer, Attention and Intelligence, reminded me of that happy bubble so easily popped by meetings, spouses and pointy-haired bosses. He writes, "Our mind has strict cognitive limitations - selective attention helps us compensate."

Herbert Simon said, "A wealth of information creates a poverty of attention."

William James famously wrote, "Everyone knows what attention is... It implies withdrawal from some things in order to deal effectively with others."

Mihaly Csikszentmihalyi tells us, "Flow describes a state of experience that is engrossing, intrinsically rewarding and outside the parameters of worry and boredom." The components of Flow are:

  • Clear goals.
  • Attention is focused on a limited stimulus field. There is full concentration, complete involvement. Focus of awareness is narrowed down to the activity itself.
  • A loss of self-consciousness; action and awareness merge.
  • Immediate feedback; behavior can be adjusted as needed.
  • Balance between ability and challenge.
  • A sense of control and serenity; freedom from worry about failure.
  • Timelessness; thoroughly focused on the present.
  • Intrinsic motivation; the experience becomes its own reward, resulting in effortlessness of action.

Not entirely unrelated is Daniel Pink's analysis of motivation as:

  • Autonomy: The urge to direct our own lives
  • Mastery: The desire to get better and better at something that matters
  • Purpose: The yearning to do what we do in the service of something larger than ourselves

From Palm Sunday by Kurt Vonnegut:

Most of my adult life has been spent in bringing to some kind of order sheets of paper eight and a half inches wide and eleven inches long. This severely limited activity has allowed me to ignore many a storm. It has also caused many of the worst storms I ignored. My mates have often been angered by how much attention I pay to paper and how little attention I pay to them.
I can only reply that the secret to success in every human endeavor is total concentration. Ask any great athlete.
To put it another way: Sometimes I don't consider myself very good at life, so I hide in my profession.
I know what Delilah really did to Samson to make him as weak as a baby. She didn't have to cut his hair off. All she had to do was break his concentration.

All this is a long way of saying multitasking sucks, or as someone with a fistful of yen might say, "What was that? This is not a charade. We need total concentration."


Friday, May 21, 2010

Using R for Introductory Statistics, 3.1

Pairs of categorical data

The grades data.frame holds two columns of letter grades, giving pairs of categorical data, like so:

    prev grade
1    B+    B+ 
2    A-    A- 
3    B+    A- 
...
122  B     B

This type of data can be summarized by the table function, which counts the occurrences of each possible pair of letter grades. But first: I was never a fan of plus/minus grading, so let's do away with that.

> # grades ships with the UsingR package; strip the +/- modifiers
> strip_mods <- function(x) gsub("[+]|-| ", "", as.character(x))
> grades2 <- data.frame(
+     prev  = factor(strip_mods(grades$prev),  levels=c('A','B','C','D','F')),
+     grade = factor(strip_mods(grades$grade), levels=c('A','B','C','D','F')))
> table(grades2)
    grade
prev  A  B  C  D  F
   A 22  6  3  2  0
   B  4 15  5  1  3
   C  3  2  9  9  7
   D  0  1  4  3  1
   F  1  2  4  4 11

You might want to compute row or column sums using margin.table, passing margin=1 for rows or margin=2 for columns:

> margin.table(table(grades2), 1)
prev
 A  B  C  D  F 
33 28 30  9 22 
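And the column sums (margin=2), which total the grades on the second test:

> margin.table(table(grades2), 2)
grade
 A  B  C  D  F 
30 26 25 19 22 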

Of the students who got an A on the first test, what proportion also got an A on the second test? Those types of questions are answered by prop.table().

> options(digits=1)
> prop.table(table(grades2), 1)
    grade
prev    A    B    C    D    F
   A 0.67 0.18 0.09 0.06 0.00
   B 0.14 0.54 0.18 0.04 0.11
   C 0.10 0.07 0.30 0.30 0.23
   D 0.00 0.11 0.44 0.33 0.11
   F 0.05 0.09 0.18 0.18 0.50
> options(digits=4)

Finally, this type of data can be displayed as a stacked barplot. The example below switches to the florida data.frame, county-by-county results of the 2000 presidential election, which also comes with the UsingR package.

library(UsingR)   # florida: 2000 presidential election results by county

m <- t(as.matrix(florida[,2:3]))    # columns 2:3 hold the Gore and Bush vote counts
m.prop <- prop.table(m, margin=2)   # convert counts to proportions within each county
colnames(m.prop) <- florida$County

# fool around with margins and set style of axis labels
# mar=c(bottom, left, top, right)
# las=2 => always perpendicular to the axis
old <- par(mar=c(6,4,6,2)+0.1, las=2)

# cex.names => "character expansion" of bar labels
# args.legend => position the legend out of the plot area
barplot(m.prop[, order(m.prop[2,])],
        legend.text=TRUE, cex.names=0.40,
        args.legend=list(x=82, y=1.2),
        main="2000 Election results in Florida", sub="county")

# reset old parameters
par(old)

Tuesday, May 11, 2010

Amazon's Dynamo distributed key-value store

Amazon's Dynamo [is] a highly available key-value storage system, [which] sacrifices consistency under certain failure scenarios. It makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.

You typically hear two rationales for NoSQL data stores. The first is to selectively relax ACID properties to gain performance in distributed systems. The second is to support data models that fit poorly into tables with set operations, such as documents or objects. Although I'm more interested in the latter, this paper is all about the former. Even so, it's an impressive piece of software engineering.

Dynamo is a distributed key-value store, otherwise known as a distributed hash table, which is like a normal hash table except the buckets live on different nodes. It stores small objects on lots of servers and looks them up by primary key. Its API is get(key), which returns the object (or a set of conflicting versions of it) along with a context, and put(key, context, object). The context is like a cookie: it's whatever the server wants you to remind it of in your next request. Nodes are assigned blocks of key-space using consistent hashing, which keeps keys and load evenly distributed among the nodes. One big motivation for a system like this is fault tolerance: nodes can come and go and the system keeps working.
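To make the consistent hashing part concrete, here's a toy hash ring sketched in R. The hash function, modulus, and node names are all made up for illustration; Dynamo itself uses MD5 and "virtual nodes" to smooth the distribution, neither of which is modeled here.

# A toy consistent-hash ring (illustrative only, not Dynamo's code).
hash <- function(s, m=2^16) {
  h <- 0
  for (ch in utf8ToInt(s)) h <- (h * 31 + ch) %% m
  h
}

nodes <- c("node-a", "node-b", "node-c")
ring <- sort(sapply(nodes, hash))    # each node owns a point on the ring

# a key belongs to the first node at or clockwise of its hash,
# wrapping around to the start of the ring if necessary
lookup <- function(key) {
  h <- hash(key)
  successors <- names(ring)[ring >= h]
  if (length(successors) == 0) names(ring)[1] else successors[1]
}

lookup("shopping-cart:123")    # returns one of the three node names

The nice property is that adding or removing a node only moves the keys adjacent to it on the ring, rather than rehashing everything.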

Redundant copies of data are kept on several nodes via replication. In RDBMSs, replication is typically done in ways that favor consistency over availability, which is one reason RDBMSs don't scale out easily. Dynamo trades consistency for availability, especially for write operations, using optimistic replication, meaning that replicas are guaranteed to converge only once the system has been quiesced for some period of time. Changes are allowed to propagate to replicas in the background, and concurrent, disconnected work is tolerated. Conflicting changes must be detected and resolved later. You'll hear the terms lazy replication and eventual consistency thrown around with roughly the same meaning.

Dynamo resolves conflicts during reads and hands conflicting updates to the client application to resolve in an application-dependent way. Consistency is application-defined anyway, right?

As a real-world analogy, paper checks form a distributed system that cannot guarantee 'consistency'; conflict resolution happens in the clearing process. Distributed source control systems face similar issues, too. How does Git handle this sort of thing?

Strategies

  • Vector clocks - detect conflicting updates (see the sketch after this list).
  • Sloppy quorum and hinted handoff - handle failure and recovery.
  • Merkle hash trees - trees in which each parent node is a hash of its children, giving a quick way to compute a diff and detect divergence between replicas.
  • Gossip-based protocol - distributes node membership and up/down status information.
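To make the vector-clock idea concrete, here's a minimal sketch in R that represents a clock as a named integer vector. The function names and the two-replica scenario are my own illustration, not Dynamo's implementation.

# A vector clock maps node names to per-node update counters.
vc_tick <- function(vc, node) {
  # increment this node's counter, adding it if absent
  vc[node] <- if (node %in% names(vc)) vc[node] + 1 else 1
  vc
}

# TRUE if clock a descends from clock b (a >= b componentwise)
vc_descends <- function(a, b) {
  all(sapply(names(b), function(n) !is.na(a[n]) && a[n] >= b[n]))
}

# two updates conflict when neither clock descends from the other
vc_conflicts <- function(a, b) !vc_descends(a, b) && !vc_descends(b, a)

# reconciliation takes the componentwise maximum
vc_merge <- function(a, b) {
  nodes <- union(names(a), names(b))
  sapply(nodes, function(n) max(a[n], b[n], na.rm=TRUE))
}

# two replicas accept writes for the same key independently
a <- vc_tick(NULL, "node-a")
b <- vc_tick(NULL, "node-b")
vc_conflicts(a, b)    # TRUE: concurrent updates, client must reconcile
vc_merge(a, b)        # merged clock: node-a=1, node-b=1

Per the paper, the client reads the conflicting versions together with a context summarizing their clocks, reconciles them in an application-specific way, and writes the result back; the coordinating node then advances its own entry in the merged clock.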


Does Google know if you're honest?

We're all getting used to the idea that companies like Google and Facebook know a lot about us. Combined with our social networks and tribal instincts, this data can be exploited to sell us consumer junk. Although I'm sure mercantile concerns will keep the data miners busy for a while, what else could they tell about us given all that data?

How well could they infer your honesty? Respect for authority? Laziness? Intelligence? Whether you're a good person in general? What else about your temperament is revealed in the mouse tracks you leave all over the web? And to what uses could that be put? Political uses are already underway, so that's old news.

How about an uber-credit report with a rundown of numerical scores, like some Dungeons and Dragons character sheet? I wouldn't mind seeing that before I hired someone or went into business with them. Or, for that matter, bought an asset-backed security from them.

Open systems of politics and economics depend heavily on trust, and it's a wonder that we maintain as much trust as we do under current circumstances. How do you feel about being an open book in a global small town where everyone knows everyone else's business?

Updates

The Wall Street Journal has a piece on a company called RapLeaf.

RapLeaf's segments recently included a person's household income range, age range, political leaning, and gender and age of children in the household, as well as interests in topics including religion, the Bible, gambling, tobacco, adult entertainment and "get rich quick" offers. In all, RapLeaf segmented people into more than 400 categories.