Digithead's Lab Notebook: Programming languages

Showing posts with label Programming languages. Show all posts

Tuesday, January 27, 2015

Haskell class wrap-up

[From the old-posts-that-I've-sat-on-for-entirely-too-long-for-no-apparent-reason department...]

Back in December, I finished FP101x, Introduction to Functional Programming. I'm stoked that I finally learned me a (little) Haskell, after wanting to get around to it for so long.

The first part of the course was very straight-forward covering the basics of programming in the functional style. But the difficulty ramped up quickly.

A couple of labs were particularly mind-bending, not just for me judging by the message boards. Both were based on Functional Pearl papers and featured monads prominantly. The first was on monad parser combinators and the second was based on A Poor Man's Concurrency Monad. Combining concurrency (of a simple kind), monads and continuation passing is a lot to throw at people at once.

The abrupt shift to more challenging material is part of a philosophy of "teaching the students to fish for themselves". So is introducing new material in the labs rather than in the lectures. This style of teaching alienated a number of students. It's not my favorite, but I can roll with it.

Just be aware that the course requires some self-directed additional reading and don't flail around trying to solve to homeworks without sufficient information.

More Haskell

Now that the class is over, I'd like to find time to continue learning Haskell:

Finish Learn You a Haskell
Write yourself a Scheme in 48 hours
99 Haskell Problems
Typeclassopedia
Chris Allen, aka bitemyapp has a really great How to learn Haskell guide as well as some interesting thoughts about furthering your functional education
The Haskell Road to Logic, Maths and Programming

One reason I wanted to learn Haskell is to be able to read some of the Haskell-ish parts of the programming languages literature:

Why Functional Programming Matters - John Hughes
Theorems for free! - Philip Wadler
The essence of functional programming - Philip Wadler
A history of Haskell: being lazy with class - Paul Hudak, John Hughes, Simon Peyton Jones, Philip Wadler

Wednesday, December 24, 2014

What the #@$% is a Monad?

Monads are like fight club. The first rule of monads is don't blog about monads.

Kind of a design pattern for functional programming, monads are already the subject of more than enough well intentioned but confusing tutorials. We'll not commit the monad tutorial fallacy here. But, monads are needed for a couple of the labs from FP101x, an online class in Haskell - labs with a throw-'em-into-the-deep-end quality to them.

Here's a quick list of some of the better resources I found, while struggling to get a handle on these super-abstract objects of mystery.

Starting points

Functors, Applicatives, And Monads In Pictures, by Aditya Bhargava, is a friendly place to start.
Brian Beckman: Don't fear the Monad
FPComplete's School of Haskell explains what monads are and why to use them.

Phillip Wadler

It's been said that "Monads are hard because there are so many bad monad tutorials getting in the way of finally finding Wadler’s nice paper." Find it here:

Monads for functional programming

Need more?

Those got me over the first hump, but here are some I may want to come back to later:

You Could Have Invented Monads! (And Maybe You Already Have)
Monads for the Curious Programmer by Bartosz Milewski who used to be in Black Sabbath. See also Milewki's easiest way to understand Monads.
Monads and Gonads a Google Tech-Talk by Douglas Crockford
Is There Anything Left to Say about Monads?
Monads in small bites
Monads for Dummies
What are monads and why are they useful?

To put monads in a more general context, here's a really great guide to Getting started with Haskell.

Saturday, November 22, 2014

Haskell class, so far

Well, I'm about 5 weeks into Introduction to Functional Programming, a.k.a. FP101x, an online class taught in Haskell by Eric Meijer. The class itself is a couple weeks ahead of that; I'm lagging a bit. So, how is it so far, you ask?

The first 4 weeks covered basic functional concepts and how to express them in Haskell, closely following chapters 1-7 of the book, Graham Hutton's Programming in Haskell:

Defining and applying functions
Haskell's type system
- parametric types
- type classes
- type signatures of curried functions
pattern matching
list comprehensions
recursion
higher-order functions

Haskell's hierarchy of type classes is elegant, but some obvious things seem to be missing. For example, you can't show a function. But, it would be really helpful to show something like a docstring, or at least the function's type signature. Also machine-word sized Int's don't automatically promote, so if n is an Int, n/5 produces a type error.

Most of the concepts were familiar already from other functional languages, Scheme via SICP, OCAML via Dan Grossman's programming languages class, and Clojure via The Joy of Clojure. So, this early part was mostly a matter of learning Haskell's syntax.

Some nifty examples

a recursive definition of factorial:

factorial :: Integer -> Integer
factorial 0 = 1
factorial n = n * (factorial (n-1))

sum of the first 8 powers of 2:
```
sum (map (2^) [0..7])
```

a recursive definition of map:

map :: (a -> b) -> [a] -> [b]
map f [] = []
map f (x:xs) = f x : map f xs

get all adjacent pairs of elements from a list:

pairs :: [a] -> [(a,a)]
pairs xs = zip xs (tail xs)

check if a list of elements that can be ordered is sorted by confirming that each pair of elements is ordered:
```
sorted :: Ord a => [a] -> Bool
sorted xs = and [x <= y |(x,y) <- pairs xs]
```

Haskell attains its sparse beauty by leaving a lot implied. One thing I figured out during my brief time with OCAML also seems to apply to Haskell. Although these languages lack the forest of parentheses you'll encounter in Lispy languages, it's not that the parentheses aren't there; you just can't see them. A key to reading Haskell is understanding the rules of precedence, associativity and fixity that imply the missing parentheses.

Pre- cedence	Left associative	Non- associative	Right associative
9	`!!`		`.`
8			`^`,`^^`,`**`
7	`*`, `/`, `div`, `mod`, `rem`, `quot`
6	`+`,`-`
5			`:`,`++`
4		`==`, `/=`, `<`, `<=`, `>`, `>=`, `elem`, `notElem`
3			`&&`
2			`\|\|`
1	`>>`, `>>=`
0			`$`, `$!`, `seq`

Another key is reading type signatures of curried functions, as currying is the default in Haskell and is relied upon extensively in composing functions, particularly in the extra terse "point-free" style.

Currently, I'm trying to choke down Graham Hutton's Addendum on Monads. If I end up understanding that, it'll get me a code-monkey merit badge, for sure.

Friday, April 11, 2014

Clojure Koans

In an attempt to reach a bit higher plane of enlightenment with respect to Clojure, I did the Clojure Koans. What a great way to get familiar with a new language.

It might be worth watching the video solutions: Clojure Koans Walkthrough in Light Table.

Monday, October 14, 2013

Concurrency and Parallelism - What's the difference?

For a while, I've been coming across references to the difference between concurrency and parallelism. The definitions go something like this: Concurrency concerns "interleaved threads of execution with access to shared state" which is distinct from parallelism because "parallel operations run simultaneously".

I'm quoting from - "Clojure Programming" by Chas Emerick, Brian Carper and Christophe Grand - which is a perfectly good book. I've seen similar definitions elsewhere, so I don't want to pick on these guys in particular. I'm going to disagree a bit, but overall, the book is really well done and I'm enjoying it.

My beef is this: I couldn't see the utility of the distinction they're drawing. I couldn't see why you'd want to design a program differently to run as threads scheduled on a single core versus threads scheduled on several cores. In fact, treating those cases the same seems like a plus.

In contrast, there are some distinctions between types of concurrency that are useful. Knowing your code will be distributed across machines tells you to bring network latency into the picture. Likewise, only certain problems are amenable to the single-instruction-multiple-data (SIMD) model of vector processors such as GPUs. These considerations have a real impact. But, why the pedantry over concurrency versus parallelism?

I was about to write a little rant about why this distinction is useless. But, keeping an open mind, I googled around a bit and up popped a talk by Rob Pike called "Concurrency Is Not Parallelism". Change of plan. Rob Pike is a bad-ass, well known as a Unix pioneer, Bell Labs veteran and Google Distinguished Engineer. New plan: go back to school and find out why I'm wrong.

Pike's talk explains things beautifully, and not just because he's wearing an orange suit jacket and a gopher t-shirt. Here's my understanding of the take-away:

Concurrency is a more general and abstract idea than parallelism. Concurrency is about the decomposition of a problem into subtasks at the design level. If you're creating a concurrent design, you haven't said yet whether your design will be executed in parallel. Parallelism is a detail to be decided at run-time. Which brings us to the good part.

Whenever you can take two things that were previously conjoined and let them vary independently, you're making progress. The two things in question here are the design - the decomposition of a problem into concurrent parts - and the execution of those parts, perhaps in parallel. Making this separation allows programs to be expressed correctly and structured clearly while making good use of available resources whether that's one core or many.

This important point is what's missing from the definitions above. That and they're comparing things at different levels of generality.

Next, Pike relates these ideas to Go. The language provides three concurrency primitives: Go routines, channels and select. Go routines are like threads but much cheaper. During execution, they're mapped onto OS threads by a scheduler. Channels and select statements enable communication. Go is an implementation of concepts have their origin in the classic paper Communicating Sequential Processes by Tony Hoare.

The moral of the story? Learn from the masters ...and from the gophers.

Tony Hoare's paper is on several "greats" lists including these:

Great Works in Programming Languages Collected by Benjamin C. Pierce
Classic Papers in Programming Languages and Logic

Wednesday, November 14, 2012

Functional Programming in Scala

Here I go, congratulating myself for finishing Functional Programming Principles in Scala, an online class taught by Martin Odersky, creator of the Scala programming language, professor at EPFL in Lausanne, Switzerland and co-founder of Typesafe.

Scala is a well engineered modern successor to Java. It's a statically-typed functional/OO language with curly-brace syntax, pattern matching and type inference that runs on the JVM. Scala's libraries are well-thought-out and consistent. And, it's type system is some serious rocket-science.

The course took some material and examples from the classic Structure and Interpretation of Computer Programs (SICP), which is a great way to learn functional concepts in a new language. Topics included:

higher-order functions
pattern matching
immutability and immutable collections
laziness
touched on inductive proofs of correctness

Boatload of Scala resources

Class staff and students compiled a great list of resources for learning Scala.

Scala API Docs
Scalacheat
Twitter's Scala school and Effective Scala
About Scala on StackOverflow
Learning Scala in small bites
CheatSheet

While we're on the topic, Twitter has invested big in Scala, open-sourcing some of the results. The presentation Systems Programming at Twitter gives a peek into how Twitter uses Scala for "programming the datacenter". Kestrel is a distributed message queue.

Books

Principles for Good Design

The class was a great introduction to the language and some ideas that will come in handy in any language. I had a lot of fun with it. Odersky wrapped up the course with some principles for good program design.

Name everything you can
Put operations into natural scopes
Keep degrees of freedom for future refinements

Tuesday, August 28, 2012

The Joy of Clojure

Clojure is a modern Lisp dialect that runs on the JVM. Since its release in 2007, it's become quite popular with elite hackers of a certain persuasion. Michael Fogus and Chris Houser's Joy of Clojure is a thoroughly enjoyable bit of summer reading explaining the philosophy behind the language, showing how Clojure promotes simplicity and flexibility, deals with time and state, and generally brings fun back into programming.

Concurrency is one of the motivating factors behind Clojure. Hickey's approach to sane management of state is anchored on the concept of immutability. The language encourages "immutability by default" with persistent collections - efficient implementations of list, vector, map and set that preserve historical versions of their state. When concurrency becomes necessary, Clojure provides some very modern constructs including STM (software transactional memory), agents similar to the actor model of Erlang, and atomic values.

Aside from the parentheses, one big difference between Clojure and Java (as well as fellow JVM resident, Scala) is dynamic typing. Entities in Clojure are typically modeled in terms of the basic data structures already mentioned, especially maps and vectors. In practice, this ends up looking like JSON or Javascript objects, which are just bags of properties. The book includes a basic implementation of prototype based inheritance as an example. I've always thought that was more appropriate to dynamic languages than building class hierarchies (as in Python and Ruby). You might go so far as to say that JSON is to Clojure what lists are to classic Lisp.

Clojure's interop with it's host platform gracefully bridges the conceptual gap between Java and Lisp, enabling Clojure to call into Java code making available the wealth of Java libraries. It's equally possible to expose Clojure APIs to Java code. Dynamically generating Java classes and proxies as well as creating and implementing interfaces on the fly leads to the belief in the Clojure community that "Clojure does Java better than Java." In spite of the differences in semantics, Clojure feels like a natural layer on top of Java, raising the levels of abstraction and dynamism while easing many pain-points that every Java programmer stumbles over.

The Clojure language is also branching out beyond the JVM. Clojurescript is a Clojure to javascript cross-compiler. Another offshoot targets Microsoft dot.net's CLR.

Aside from persistent data structures and Java interop, Clojure comes with a bunch of functional goodness built in. Clojure's multimethods put control of method dispatch into the programmer's hands. Macros let you construct your own flow-of-control structures, as demonstrated by do-until and unless examples in the book. Illustrating how Lisp and it's macros can be used to construct little languages or DSLs, the book implements a mini-SQL interpreter.

One especially nice aspect of the book is the putting it all together sections, which cover examples like: lazy quicksort, A*, and a builder for chess moves. These longer examples are still bite sized, making them more easily digested than the extended case studies found in some books.

The Joy of Clojure is not an introductory book nor is it a language reference. It will appeal to the reader who already has some programming experience. It's a good idea to spend some time with the online tutorials first. Where the book is strongest is answering the why questions, getting you started learning how to think about programming in Clojure and showing you how Clojure changes the way you think.

Companies using Clojure:

Prismatic: a highly addictive social content recommendation engine who's backend pipeline and APIs are written mostly in Clojure. Prismatic's architecture is a great example of how to use machine learning and social graphs in a real application.
Climate corporation: data driven weather insurance
Relevance: a consultancy and home of Clojure/core.
Lonocloud

Popular Clojure projects and libraries:

Leiningen a build tool and dependency management system
Incanter a Clojure-based, R-like platform for statistical computing and graphics
Chris Granger's Light Table IDE
Noir web framework
Datomic: a distributed rethinking of the database
core.logic: Prolog-like logic programming for Clojure

Videos

Luckily for those wanting to learn more without leaving their hammock, there are lots of videos about Clojure. A lot of clear thinking about software, whether you do it in Clojure or not, can be found in Rich Hickey's talks.

More Clojure Stuff

Planet Clojure

Monday, September 26, 2011

Hipster programming languages

If you look at the programming languages that are popular these days, a few patterns emerge. I'm not talking about languages that have the most hits on the job sites. I'm talking about what the cool kids are coding in - the folks that hang out on hacker-news or at Strange Loop. Languages like Clojure, Scala and CoffeeScript. What do these diverse languages have in common other than an aura of geek-chic?

Functional programming is emphasized over object-oriented programming.
Common patterns for manipulating lists: map, filter, reduce.
Modern syntax in which everything is an expression and syntactic noise like semicolons is reduced.
CoffeeScript compiles to JavaScript, while both Clojure and Scala target the JVM. Targeting legacy platforms seems to be getting easier and more popular.
Innovative approaches to concurrency.

This last point deserves some elaboration. It might be a stretch to compare the event-driven nature of node.js with immutable data structures with actor model and software transactional memory, but, at heart, these are all strategies for dealing with concurrency. One place where Java was ahead of its peers was concurrency, so it's cool that Clojure and Scala are taking the next steps in concurrent programming on the JVM.

CoffeeScript

CoffeeScript is javascript, redesigned. It cleans up the syntax adding many of the niceties familiar from Ruby and Python. Curly braces and semicolons are out. String interpolation, list-comprehensions, default arguments, and more tasty sugar are in. Its creator, Jeremy Ashkenas, believes in code as literature and it shows all the way through the project. Take a look at the annotated source code for the CoffeeScript grammar and see if it doesn't make you weep for the ugliness of your own code.

Don't forget, CoffeeScript gets about 10 times hipper when combined with node.js, the event-driven app-server based on google's V8 javascript engine.

Clojure

Clojure is a dialect of lisp targeting the JVM. One of Clojure's key features is a set of immutable functional data structures, with efficiency comparable to Java's mutable collection classes. The beauty of a Lisp dialect is that the syntax of the language is also it's data representation. Code is data, data is code. There are lots of Clojure resources floating around, including a famous talk by Rich Hickey on state, identity, value and time and a project to port SICP to Clojure. Extra points if your client-side code is written in ClojureScript.

Several hip people have recommended reading The Joy of Clojure. Also on Rich Hickey's bookshelf is Chris Okazaki's book Purely Functional Data Structures.

Scala

Scala is a strongly typed object-functional hybrid. It's targets include the JVM and Microsoft's CLR. It's an academic language derived from the ML family, but meant to be a pragmatic replacement for Java. It has a C++ like reputation for being fully understood only by guru level developers. One of it's key features is a type system that is Turing complete in itself. I guess I'm not completely convinced that a rocket-science type system is the answer, but it's there's cool stuff in there - generics done properly, higher-kinded types, which as near as I can tell takes parametric types to a level of meta beyond generics. One nice thing is that Scala has a tighter mapping to Java than Clojure so the interop between the two is a little more reasonable.

The Akka project is a Scala platform for concurrent applications, providing both the actor model and software transactional memory. Those wanting to learn more can track down some interesting talks by language designer Martin Odersky available, plus the Scala Types podcast.

A couple more

Not enough languages for you? I'll throw in a couple more hip languages, R and Haskell. Truly cool kids know Haskell. What can I say, except that I am not yet that cool. I need to go out to a shed in the woods with a couple of books and learn me some Haskell.

R may have a bastardized syntax, but, eventually, it's functional core shines through. R is seeing a surge in popularity based on the highly hip and trendy field of data science, where it's powerful statistical methods and graphing come in handy. Aside from mclapply, R is a bit lacking in support for concurrency. [See correction below in comments!] Rhipe and Revolution Analytics's RHadoop are trying to change that by enabling distributed data analysis with R and Hadoop.

Fashion

You might be tempted to say it's all fashion. What goes around comes around. To some extent that's true, but, in each of these languages, there's something new and worthwhile to be learned. We have a ways to go before code is as expressive as we want it to be. Someone smart said that you'll like a programming language in proportion to what it teaches you. Mostly, I want to remind myself to set aside some time to play with these languages and see what new tricks they have to teach this old dog.

PS: When this post grows up, it wants to be Programming language development: the past 5 years by the very hip Michael Fogus.

Saturday, June 04, 2011

Environments in R

One interesting thing about R is that you can get down into the insides fairly easily. You're allowed to see more of how things are put together than in most languages. One of the ways R does this is by having first-class environments.

At first glance, environments are simple enough. An environment is just a place to store variables - a set of bindings between symbols and objects. If you start up R and make an assignment, you're adding an entry in the global environment.

> a <- 1234
> e <- globalenv()
> ls()
[1] "a" "e"
> ls(e)
[1] "a" "e"
> e$a
[1] 1234
> class(e)
[1] "environment"

Hmmm, the variable e is part of the global environment and it refers to the global environment, too, which is kind-of circular.

> ls(e$e$e$e$e$e$e$e)
[1] "a" "e"

We'd better cut that out, before we're sucked into a cosmic vortex.

> rm(e)

Most functional languages have some concept of environments, which serves as a higher level of abstraction over implementation details like allocating variables on the heap or stack. Saying that environments are first-class means that you can manipulate them from within the language, which is less common. Several advanced language features of R are built out of environments. We'll look at functions, packages and namespaces, and point out several Scheme-like features in R.

But first, the basics. The R Language Definition gives this definition:

Environments can be thought of as consisting of two things: a frame, which is a set of symbol-value pairs, and an enclosure, a pointer to an enclosing environment. When R looks up the value for a symbol the frame is examined and if a matching symbol is found its value will be returned. If not, the enclosing environment is then accessed and the process repeated. Environments form a tree structure in which the enclosures play the role of parents. The tree of environments is rooted in an empty environment, available through emptyenv(), which has no parent.

You can make a new environment with new.env() and assign a couple variables. The assign function works, as does the odd but convenient dollar sign notation. Think of the dollar sign as equivalent to the 'dot' operator that dereferences object members in Java-ish languages.

> my.env <- new.env()
> my.env
<environment: 0x114a9d940>
> ls(my.env)
character(0)
> assign("a", 999, envir=my.env)
> my.env$foo = "This is the variable foo."
> ls(my.env)
[1] "a"   "foo"

Now we have two variables named a, one in the global environment, the other in our new environment. Let's stick another variable b in the global environment, just for kicks.

> a
[1] 1234
> my.env$a
[1] 999
> b <- 4567

Also, note that the parent environment of my.env is the global environment.

> parent.env(my.env)
<environment: R_GlobalEnv>

A variable can be accessed using get or the dollar operator. By default, get continues up the chain of parents until it either finds a binding or reaches the empty environment. The dollar operator looks specifically in the given environment.

> get('a', envir=my.env)
[1] 999
> get('b', envir=my.env)
[1] 4567
> my.env$a
[1] 999
> my.env$b
NULL

Functions and environments

Functions have their own environments. This is the key to implementing closures. If you've never heard of a closure, it's just a function packaged up with some state. In fact, some say, closures are a poor man's object, while other insist it's the other way 'round. The R Language Definition explains the relationship between functions and environments like this:

Functions (or more precisely, function closures) have three basic components: a formal argument list, a body and an environment. [...] A function's environment is the environment that was active at the time that the function was created. [...] When a function is called, a new environment (called the evaluation environment) is created, whose enclosure is the environment from the function closure. This new environment is initially populated with the unevaluated arguments to the function; as evaluation proceeds, local variables are created within it.

When a function is evaluated, R looks in a series of environments for any variables in scope. The evaluation environment is first, then the function's enclosing environment, which will be the global environment for functions defined in the workspace. So, the global variable a, which had the value 1234 last time we looked, can be referenced inside a function.

> f <- function(x) { x + a }
> environment(f)
<environment: R_GlobalEnv>
> f(4321)
[1] 5555

We can change a function's environment if we want to.

> environment(f) <- my.env
> environment(f)
<environment: 0x114a9d940>
> my.env$a
[1] 999
> f(1)
[1] 1000

Suppose we wanted a counter to keep track of progress of some kind. That could be written and applied like so:

> createCounter <- function(value) { function(i) { value <<- value+i} }
> counter <- createCounter(0)
> counter(1)
> a <- counter(0)
> a
[1] 1
> counter(1)
> counter(1)
> a <- counter(1)
> a
[1] 4
> a <- counter(5)
> a
[1] 9

Notice the special <<- assignment operator. If we had used the normal <- assignment operator, we would have created a new variable 'value' in the evaluation environment of the function masking the value in the function closure environment. That environment disappears as soon as the function returns, sending our new value into the ether. What we want to do is change the value in the function closure environment, so that assignments to value will be persistent across invocations of our counter. Mutable state is generally not the default in functional languages, so we have to use the special assignment operator.

Just to look under the covers, where is that mutable state? In the counter function's enclosing environment.

> ls(environment(counter))
[1] "value"
> environment(counter)$value
[1] 9

For those that geek out on this stuff, this is an implementation of Paul Graham's Accumulator Generator from his article Revenge of the Nerds, which, years ago, I struggled to implement in Java.

Inspired by Scheme, lexical scoping is R's major point of departure from the S language. Gentleman and Ihaka's papers R: A Language for Data Analysis and Graphics (pdf) and Lexical Scope and Statistical Computing (pdf) describe some of their language design decisions around this point.

For functions defined in a package, the situation gets a bit more interesting. The various parts of the plot function are visible below, including a parameter list, (x, y, and some other junk), a block of code, elided here, and an environment, which is the namespace for the graphics package. Packages and namespaces are our next topic.

> plot
function (x, y, ...) 
{
  ...blah, blah, blah...
}
<environment: namespace:graphics>

Packages and namespaces

Walking up the chain of environments starting with the global environment, we see the packages loaded into R.

> globalenv()
<environment: R_GlobalEnv>
> g <- globalenv()
> while (environmentName(g) != 'R_EmptyEnv') { g <- parent.env(g); cat(str(g, give.attr=F)) }
<environment: 0x100fdf078>
<environment: package:stats>
<environment: package:graphics>
<environment: package:grDevices>
<environment: package:utils>
<environment: package:datasets>
<environment: package:methods>
<environment: 0x101a19f58>
<environment: base>
<environment: R_EmptyEnv>

Oddly, you can't test environments for equality. If you try, it says, "comparison (1) is possible only for atomic and list types". That's why we test for the end of the chain by name.

This same information can be had in slightly nicer form using search.

> search()
 [1] ".GlobalEnv"        "tools:RGUI"        "package:stats"     "package:graphics" 
 [5] "package:grDevices" "package:utils"     "package:datasets"  "package:methods"  
 [9] "Autoloads"         "package:base"

By now, you can guess how attach works. It creates an environment and slots it into the list right after the global environment, then populates it with the objects we're attaching.

beatles <- list('george'='guitar','ringo'='drums','paul'='bass guitar','john'='guitar')
> attach(beatles)
> search()
 [1] ".GlobalEnv"        "beatles"           "tools:RGUI"        "package:stats"    
 [5] "package:graphics"  "package:grDevices" "package:utils"     "package:datasets" 
 [9] "package:methods"   "Autoloads"         "package:base"     
> john
[1] "guitar"
> paul
[1] "bass guitar"
> george
[1] "guitar"
> ringo
[1] "drums"

Attaching a package using library adds an entry to the chain of environments. A package can optionally have another environment, a namespace, whose purpose is to prevent naming clashes between packages and hide internal implementation details. R Internals explains it like this:

A package pkg with a name space defines two environments namespace:pkg and package:pkg. It is package:pkg that can be attached and form part of the search path.

When a namespaced package is loaded, a new environment is created and all exported items are copied into it. That's package:pkg in the example above and is what you see in the search path. The namespace becomes the environment for the functions in that package. The parent environment of the namespace holds all the imports declared by the package. And the parent of that is a special copy of the base environment whose parent is the global environment.

We can see what namespaces are loaded using loadedNamespaces.

> loadedNamespaces()
[1] "base"      "graphics"  "grDevices" "methods"   "stats"     "tools"    
[7] "utils"

What if the same name is used in multiple environments? In general, R walks up the chain of environments and uses the first binding for a symbol it finds. R is smart enough to distinguish functions from other types. Here we try to mask the mean function, but R can still find it, knowing that we're trying to apply a function.

> z = list(mean='fluffernutter')
> attach(z)
> mean
[1] "fluffernutter"
> mean
[1] "fluffernutter"
> mean(c(1,2,3,4))
[1] 2.5
> detach(z)

We can mask a function with another function. Now, the mean of any list of numbers is "flapdoodle".

> z = list(mean=function(x){ return("flapdoodle") })
> attach(z)
The following object(s) are masked from 'package:base':
    mean
> mean(c(4,5,6,7))
[1] "flapdoodle"

The double-colon operator will let us specify which mean function we want. And, if you like to break the rules, the triple-colon operator lets you reach inside namespaces and touch private non-exported elements.

> base::mean(c(6,7,8,9))
[1] 7.5

So, there you have two fairly advanced language features built on the simple abstraction of environments. Thrown in for free is a nice look at R's functional side.

Is that everything you wanted to know about environments but were afraid to ask? Be warned that I'm just figuring this stuff out myself. If I've gotten anything bass-ackwards, please let me know. There's more information below, in case you can't get enough.

More Information

R Language Definition Environments
Environment Access functions
R Internals on Environments and variable lookup
Scope
Evaluation of functions
Name Space Internals
Loading and Unloading Name Spaces
Double Colon and Triple Colon Operators
Package name spaces from Writing R Extensions
A Simple Implementation of Name Spaces for R, Luke Tierney, 2003
R environments and formula objects teaches you about "reaching back up the call stack with your zombie programmer hand to eat the brains of the code that called you". Who could resist that?
For an education on functional programming and closures, see SICP.

Tuesday, March 30, 2010

Javascript for language geeks

A couple years back, I was surprised to discover some cool things about Javascript. I was really into Scheme at the time having read Structure and Interpretation of Computer Programs (aka SICP). It turns out, Javascript was strongly influenced by Scheme, and underneath a blasé curly-brace syntax was some cool stuff. Specifically, Javascript is a dynamically-typed object/functional scripting language as are Ruby and Python. But it's prototype-based object system is very different from either of those. Anyway, here are some notes from a little presentation I did, slightly edited and updated.

History

Javascript was created by Brendan Eich during the good old days of the browser wars. Netscape shipped LiveScript in Navigator 2.0 beta in September 1995 and re-released the language as JavaScript in March 1996. ECMA post-facto specification, ECMAScript standard: ECMA-262, followed. Douglas Crockford's The State and Future of Javascript tells the gory tale of the fall of the fourth edition and the rise of the fifth.

Edition 1	June 1997
Edition 2	June 1998
Edition 3	December 1999
Edition 4	...dragged on from 2000 through about 2008 and was finally abandoned
Edition 5	December 2009

Closures

A closure is a package containing a function and some state. Here's an example. The purpose of an accumulator is to keep a running total, which requires state. Every now and then we can add something to the total. “Isn't a closure just a poor man's object?” you might well ask... or is it the other way around? Anyway, in Javascript, there's not much difference. Functions are objects and closures just tack on some state.

function createAccumulator(value) {
    return function(i) {
        return value += i;
    }
}

> acc = createAccumulator(100)
> acc(1)
101
> acc(1)
102
> acc(23)
125

Bag 'o properties

A Javascript object is just a bag of properties, like a hash. Since functions are a first class data type, some of those properties are functions. Properties obviously can change at run time. Want to add a method to an object? OK. Delete one? Fine. Change the behavior of a method? Just as easy. This is how objects in a dynamically typed language should be.

// creating a moose object
var moose = new Object();
moose.name = "Bullwinkle";
moose.species = "Alces alces";
moose.toString = function() {
 return this.name + ", " + this.species;
}

> print(moose);
Bullwinkle, Alces alces

> print("moose.name = " + moose.name);
moose.name = Bullwinkle

> print("moose[\"name\"] = " + moose["name"]);
moose["name"] = Bullwinkle

> m.greet = function(who) {  
  print("Hello, " + who + " my name is " + this.name + "!");
}

> m.greet("Chris")
Hello, Chris my name is Bullwinkle!

Constructors

Constructors are a little weird, but kinda cool. By convention, constructors are capitalized. Otherwise, they are just functions. The new keyword creates a new object which becomes this in the body of the fuction, which initializes the object. Because the constructor can build differently formatted objects in different circumstances, it's more like a factory than the poor hobbled Java constructor.

# moose constructor
function Moose(name) {
 this.name = name;
 this.species = "Alces alces";
 this.toString = function() {
  return this.name + ", " + this.species;
 }
}

> m = new Moose("Marty")
Marty, Alces alces

Prototypes

The weirdest part of Javascript is its inheritance mechanism. Prototyping is inheritance without classes. We don't define templates for stamping out sets of similar objects. Instead, we pick one distinguished object and say, "All these objects are like that one". In other words, we delegate to our prototype object anything we don't need to handle in a customized way.

The new spec says it this way:

In a class-based object-oriented language, in general, state is carried by instances, methods are carried by classes, and inheritance is only of structure and behaviour. In ECMAScript, the state and methods are carried by objects, and structure, behaviour, and state are all inherited.

What's a little weird is how Javascript implements the idea syntactically. Have a look at the example below. Let's factor out stuff common to all moose.

moose_proto = new Object();
moose_proto.species = "Alces alces"
moose_proto.toString = function() {
 return this.name + ", " + this.species;
}

function Moose(name) {
    this.name = name;
}
Moose.prototype = moose_proto;

> a = new Moose("Bullwinkle");
> print(a);
Bullwinkle, Alces alces

To me, it's strange that x.prototype = y doesn't mean "The prototype of x is y." It means, "If x is a constructor, the objects created by x will have y as their prototype." Ya gotta admit, that's nutty. Also nutty is trying to implement something like the Java keyword super. Look around and you'll find a number of long and involved ways to get that behavior to work correctly.

One of key aspects of inheritance in Java is polymorphism. But, in duck-typed dynamic languages, every method call is polymorphic already. No class hierarchy needed. And why bother mucking about with user-defined type hierarchies anyway? Aren't dynamic languages supposed to free us from diddling about with that sort of thing?

I'm glad classes didn't make it into Javascript 5. But, I wouldn't be surprised if they come eventually, for a couple of reasons. Some people think prototypes are too weird and scary for the average Joe Coder. I don't, but I do think Javascript obfuscates the issue a bit. ActionScript has them. Prototype has them. I suppose that means there's a demand, so they'll probably come.

Tuesday, November 03, 2009

Elements of Programming

Don't you hate it when someone reads the first chapter of a book and then decides they're qualified to review it? So do I. But based on the first chapter, Elements of Programming by Alexander Stepanov, Paul McJones is something I might want to come back to, so here's my reminder to myself.

Stepanov is the guy behind the C++ Standard Template Library (STL), an advocate of generic programming and a critic of OOP. (papers here)

The authors have done for C++ something that is more commonly done for the functional languages, which is to put the language on a formal mathematical basis. Their central insight is the parallel between algorithms and data structures and algebraic structures. Abstract algebraic structures consist of one or more sets and one or more operations on those sets.

The formalism introduced in the book is based on objects with mutable state "computers with memory constitute the only available realization of a universal computing device". They seem to value that C++ is "faithful to the underlying machine". I wonder how this formalism will hold up to parallel computation, in contrast to the immutability and statelessness that seem popular in these increasingly parallel days. I also wonder about the comparison between this formalism and the lambda calculus.

They define a notion called regularity. A regular function is a pure side-effect-free function (If I'm reading right). Procedures and types can also be regular. A regular type is required to implement operation for equality, assignment, destructor, default constructor, copy constructor, total ordering, and underlying type. A concept is a minimal set of requirements a data type must satisfy for an algorithm to be applicable. Algorithms can then be implemented in terms of concepts rather than concrete data types. (I think?) Concepts can have semantic requirement like linear time complexity for a given operation, bringing efficiency into the mix.

Whether one likes this book will be strongly influenced by how well one likes generic programming in C++. And maybe whether one wants one's theory sullied by implementation details. I like the idea that generic programming can be something more than making a type checker happy -- more than the type-safe collections of Java's puny generics. And who would have thought you could extract a pretty mathematical theory from the warty ugster known as C++?

A reviewer on Amazon suggests a related Google Tech Talk "A Possible Future of Software Development" by Sean Parent.

Saturday, September 05, 2009

Closure

I've heard nice things about Clojure. Stuart Halloway has a book (Programming Clojure) out through the Pragmatic Programmers. The book grew out of Stuart's Java.next series of articles. There's also a podcast.

Clojure, like Scala, is functional programming for the JVM. Scala (which I fooled around with a little) descends from the ML/Caml/O-caml family of languages with an emphasis on pattern matching and static typing with type inference. Clojure is a dynamic Ruby-influenced dialect of Lisp specifically targeted to the JVM (hence the j). Yet another programming language on the digithead's cool-technology-I-want-to-play-with list.

Update

Clojure elevator pitch
Clojure tutorial slides and video
Matt Sears' Clojure links
Mark Volkmann's Clojure tutorial
Incanter is a Clojure-based, R-like platform for statistical computing and graphics
Labrepl is an environment for exploring the Clojure language
Clojure links from a commenter on Hacker News
Thoughts on Clojure
Book: Joy of Clojure

Thursday, March 12, 2009

Split split

Apparently, there's some disagreement about what it means to split a string into substrings. Biological data frequently comes in good old fashioned tab-delimited text files. That's OK 'cause they're easily parsed in the language and platform of your choice. Most languages with any pretention of string processing offer a split function. So, you read the files line-by-line and split each line on the tab character to get an array of fields.

The disagreement comes about when there are empty fields. Since we're talking text files, there's no saying, "NOT NULL", so it's my presumption that empty fields are possible. Consider the following JUnit test.

import org.apache.log4j.Logger;
import static org.junit.Assert.*;
import org.junit.Test;

public class TestSplit {
  private static final Logger log = Logger.getLogger("unit-test");

  @Test
  public void test1() {
    String[] fields = "foo\t\t\t\t\t\t\tbar".split("\t");
    log.info("fields.length = " + fields.length);
    assertEquals(fields.length, 8);
  }

  @Test
  public void test2() {
    // 7 tabs
    String[] fields = "\t\t\t\t\t\t\t".split("\t");
    log.info("fields.length = " + fields.length);
    assertEquals(fields.length, 8);
  }
}

The first test works. You end up with 8 fields, of which the middle 6 are empty. The second test fails. You get an empty array. I expected this to return an array of 8 empty strings. Java's mutant cousin, Javascript get's this right, as does Python.

Rhino 1.6 release 5 2006 11 18
js> a = "\t\t\t\t\t\t\t";
js> fields = a.split("\t")
,,,,,,,
js> fields.length
8

Python 2.5.1 (r251:54863, Jan 17 2008, 19:35:17)
>>> a = "\t\t\t\t\t\t\t"
>>> fields = a.split("\t")
['', '', '', '', '', '', '', '']
>>> len(fields)
8

Oddly enough, Ruby agrees with Java as does Perl.

>> str = "\t\t\t\t\t\t\t"
=> "\t\t\t\t\t\t\t"
>> fields = str.split("\t")
=> []
>> fields.length
=> 0

My perl is way rusty, so sue me. but, I think this is more or less it:

$str = "\t\t\t\t\t\t\t";
@fields = split(/\t/, $str);
print("fields = [@fields]\n");
$len = @fields;
print("length = $len\n");

Which yields:

fields = []
length = 0

How totally annoying!

Saturday, December 27, 2008

Functional Programming

I've harbored a secret desire to learn Haskell for a few years now. Simon Peyton-Jones is one of the key people behind Haskell. His web site at MSR has tons of papers, a tutorial on concurrent programming in Haskell, and a video lecture of A taste of Haskell. There's also a Simon Peyton-Jones podcast at SE-Radio.

What is Haskell

Haskell is a programming language that is

purely functional

lazy

higher order

strongly typed

general purpose

Why should I care?

Functional programming will make you think differently about programming

Mainstream languages are all about state

Functional programming is all about values

Whether or not you drink the Haskell Kool-Aid, you'll be a better programmer in whatever language you regularly use

I should read a Haskell book or two, and, in related functional goodness, I keep reading how great Practical Common Lisp is. I also need to fulfill my quest to finish SICP. I've read the first three chapters twice, doing the examples once in Scheme and again in OCAML. I've read chapter 4 on interpreters. I need to work through the examples in that chapter and take in the final fifth chapter.

Sunday, December 21, 2008

Principle of least surprise my ass

Let's start off by saying I'm not anti-Ruby. I like Ruby. Ruby is cool. Matz is cool. But, a while back I was wondering, What is a Ruby code block? My feeble curiosity has been revealed for the half-assed dilettantery it is by Paul Cantrell. Mr. Cantrell chases this question down, grabs it by the scruff of the neck, and wrings it out like a bulldog with a new toy. He also rocks on the piano, by the way.

So in fact, there are no less than seven -- count 'em, SEVEN -- different closure-like constructs in Ruby:

block (implicitly passed, called with yield)

block (&b => f(&b) => yield)

block (&b => b.call)

Proc.new

proc

lambda

method

This is quite a dizzing array of syntactic options, with subtle semantics differences that are not at all obvious, and riddled with minor special cases. It's like a big bear trap from programmers who expect the language to just work. Why are things this way? Because Ruby is:

designed by implementation, and

defined by implementation.

Again, neither I nor Mr. P.C. are bashing Ruby. He shows how to pull off some tasty functional goodness like transparent lazy lists later in the article. Thanks to railspikes for the link.

Friday, December 12, 2008

Educating Software Developers

Slashdot linked to Bjarne Stroustrup on Educating Software Developers which follows up on an earlier article, The 'Anti-Java' Professor and the Jobless Programmers. The Anti-Java professor is Robert Dewar at NYU, who coauthored a short paper, Computer Science Education: Where Are the Software Engineers of Tomorrow? They contend that computer science curricula have been dumbed down to counter falling enrollment post-dot-com-crash and partially blame Java, which fosters reliance on libraries and garbage collection. But, not all of their critique can be written off as language bigotry. The result?

We are training easily replaceable professionals.

Dewar advocates:

mentoring
reading code
working in groups
learning to reuse code

Those sound like solid points to me. One thing the field of medicine really gets right is an emphasis on mentoring. Mentoring is the heart of residency, which depending on specialty can last from 3 to 7 years. By the time a physician graduates from residency, they will have performed hundreds of procedures and seen thousands of patients under the guidance of an attending physician. I've often wished there was more of this in the computing field.

Over the years, I've accumulated a list of topics I wish I'd been exposed to as a CS undergrad.

source control
command line foo
linking and dependency management
security
software design: interfaces, refactoring, patterns, components, APIs
software architecture: design with large components - app servers, databases, message queues, transactions, etc.
data modeling: schema design and not just relational DBs. Hierarchical (XML) and graph (OO, RDF) representation
models of computation
- imperative
- OO
- functional
- logical
- pipes and filters
- and how they're are related.
scientific/engineering computing: MatLab, R
probability and statistics, statistical computing, analytics
writing

Of course, then my undergraduate degree would have taken 7 years... On second thought, only my Dad would have complained.

What do you wish you'd learned in college? Post a comment!

Monday, November 10, 2008

Clean code, properties, and patterns

Robert C. Martin has a new book out called Clean Code that looks worthwhile, as if I'm not far enough behind in reading. I also came across his Principles and Patterns article. Completely unrelated to that, The Universal Design Pattern by Steve Yegge describes object system of Javascript as an instance of what he calls the Properties Pattern. Java's static types and even classes in Python and Ruby feel constrained after working with Javascript's bag-of-properties approach to objects. There's also some interesting stuff at a blog called Red and Sensual Java, if you can get past the name.

Friday, September 26, 2008

Java.next

Stuart Halloway says in his Java.next series on the Relevance Blog:

Many of our best ideas on how to use a Java Virtual Machine (JVM) are now being baked into more advanced languages for the JVM. These languages tend to provide two significant advantages: essence over ceremony and functional programming.

Thursday, August 14, 2008

Threads are evil

The docs for Sqlite, under the heading "Threads are evil", pointed me to a paper by Edward A. Lee of UC Berkeley called The Problem with Threads.

Concurrency in software is difficult. However, much of this difficulty is a consequence of the abstractions for concurrency that we have chosen to use. The dominant one in use today for general purpose computing is threads. But non-trivial multi-threaded programs are incomprehensible to humans.

Lee argues that threads are wildly nondeterministic by default, while "nondeterminism should be judiciously and carefully introduced where needed, and should be explicit in programs."

Thursday, June 19, 2008

Rhinos and Tigers

From Steve Yegge, (author of the next big language) a transcript of a talk supposedly about Rhino and javascript, but also about the virtual machines, scripting, and languages in general.

So, you know, it's got some flaws. Some of which make you want to go scrape your teeth on the sidewalk because it's less painful. So it's true, but what language doesn't have some flaws?

Low Level Virtual Machine (LLVM)