"Is there a package for obfuscating code in #rstats?", someone asked. "The S4 object system?!" came the snarky reply. If you're smiling right now, you know that it wouldn't be funny if it weren't at least a little bit true.
Options: S3, S4 or R5?
There can be little doubt that object-oriented programming in R is the cause of some confusion. We'll look at S4 classes more closely in a minute, but be warned that S4 classes are just one of at least three object systems available to the R programmer:
- S3: simple and lightweight
- S4: formal classes implemented by the methods package
- R5: Reference classes
It's not super clear when to use which, at least not to me. It seems to depend strongly on style and personal preference. The Bioconductor folks, for example, make heavy use of S4 classes. Google, on the other hand, advises to "avoid S4 objects and methods when possible".
Here's the way it looks to me. S3 classes feel a bit like JavaScript classes - easy, loose and informal. S4 classes are rigid, verbose and harder to understand. But they offer a better separation between interface and implementation, along with some advanced features like multiple dispatch, validation and type coercion. Reference classes (aka R5) encapsulate mutable state and look more like familiar Java-style classes. They're new, and pass-by-reference can violate the expectations of R users.
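To make the contrast a bit more concrete, here is a minimal sketch of the informal S3 style next to a reference class. The names `describe`, `person_s3` and `Account` are made up for illustration and don't come from any package.

```r
library(methods)

# S3: a class is just an attribute on an ordinary R object
ada <- structure(list(name = "Ada"), class = "person_s3")

# an S3 generic dispatches on that class attribute via UseMethod
describe <- function(x, ...) UseMethod("describe")
describe.person_s3 <- function(x, ...) paste("S3 person named", x$name)
describe.default <- function(x, ...) "something else"

describe(ada)   # "S3 person named Ada"
describe(42)    # "something else"

# Reference class: fields are mutable and methods change the object in place
Account <- setRefClass(
  "Account",
  fields = list(balance = "numeric"),
  methods = list(
    deposit = function(amount) {
      balance <<- balance + amount
      invisible(.self)
    }
  )
)

acct <- Account$new(balance = 100)
acct$deposit(50)   # no reassignment needed
acct$balance       # 150
```

Note that nothing stops you from slapping the class "person_s3" on any object at all, which is the loose part. The reference class, by contrast, changes under your feet without any reassignment.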
An S4 class example
Now, let's return to S4 classes with a simple example. First, we define a class to represent people.
# define an S4 class for people
setClass(
"Person",
representation(name="character", age="numeric"),
prototype(name=NA_character_, age=NA_real_)
)
A person has a name and an age, which default to NAs of their respective types - character string and numeric. For the sake of demonstrating polymorphism, let's define a couple of subclasses.
# define subclasses for different types of people
setClass("Musician",
representation(instrument="character"),
contains="Person")
setClass("Programmer",
representation(language="character"),
contains="Person")
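A quick way to check what these definitions give us is to create a few objects and poke at them. The sketch below repeats the Person and Musician definitions so that it runs on its own; the object names `anon` and `bad` are just for illustration.

```r
library(methods)

setClass("Person",
  representation(name = "character", age = "numeric"),
  prototype(name = NA_character_, age = NA_real_))
setClass("Musician",
  representation(instrument = "character"),
  contains = "Person")

# slots that are not supplied fall back to the prototype
anon <- new("Person")
is.na(anon@name)   # TRUE
is.na(anon@age)    # TRUE

# subclasses inherit both the slots and their defaults
miles <- new("Musician", name = "Miles Dewey Davis", instrument = "Trumpet")
is.na(miles@age)       # TRUE
is(miles, "Person")    # TRUE

# slot types are checked when the object is created
bad <- tryCatch(
  new("Person", name = 42),
  error = function(e) "rejected"
)
bad                    # "rejected"
```

The last call fails because 42 is not a character string, which is the rigid side of S4 doing its job.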
There's no reason not to write normal R functions that take S4 classes as arguments. Polymorphism is called for when a method has different implementations for different classes. In that case, we declare a generic method.
# create a generic method called 'talent' that
# dispatches on the type of object it's applied to
setGeneric(
"talent",
function(object) {
standardGeneric("talent")
}
)
The following code implements the talent method for two subtypes of person, each with a talent for something.
setMethod(
"talent",
signature("Programmer"),
function(object) {
paste("Codes in",
paste(object@language, collapse=", "))
}
)
setMethod(
"talent",
signature("Musician"),
function(object) {
paste("Plays the",
paste(object@instrument, collapse=", "))
}
)
Now, let's make some talented people.
# create some talented people
donald <- new("Programmer",
name="Donald Knuth",
age=74,
language=c("MMIX"))
coltrane <- new("Musician",
name="John Coltrane",
age=40,
instrument=c("Tenor Sax", "Alto Sax"))
miles <- new("Musician",
name="Miles Dewey Davis",
instrument=c("Trumpet"))
monk <- new("Musician",
name="Thelonious Sphere Monk",
instrument=c("Piano"))
talent(miles)
[1] "Plays the Trumpet"
talent(donald)
[1] "Codes in MMIX"
talent(coltrane)
[1] "Plays the Tenor Sax, Alto Sax"
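What happens when we call talent on a plain Person? There is no method for that class, so dispatch fails. A hedged sketch of one way to handle it: define a fallback method on the parent class, which is then inherited by any subclass that lacks its own. The fallback text and the object `nobody` are invented for this illustration, and the class definitions are repeated so the block runs on its own.

```r
library(methods)

setClass("Person", representation(name = "character", age = "numeric"))
setClass("Musician", representation(instrument = "character"), contains = "Person")
setClass("Programmer", representation(language = "character"), contains = "Person")

setGeneric("talent", function(object) standardGeneric("talent"))
setMethod("talent", "Musician", function(object) {
  paste("Plays the", paste(object@instrument, collapse = ", "))
})

nobody <- new("Person", name = "Nobody")
miles  <- new("Musician", name = "Miles Dewey Davis", instrument = "Trumpet")
donald <- new("Programmer", name = "Donald Knuth", language = "MMIX")

# no method for Person yet, so dispatch fails with an error
first_try <- tryCatch(talent(nobody), error = function(e) "no method found")
first_try        # "no method found"

# a fallback on the parent class (illustrative, not in the original example)
setMethod("talent", "Person", function(object) "No known talent")

talent(nobody)   # "No known talent"
talent(donald)   # "No known talent": Programmer has no method here, so it inherits
talent(miles)    # "Plays the Trumpet": the most specific method still wins
```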
Mutability
One common stumbling block with S4 classes concerns changes in state. For instance, we might want to give our hard-working employees a raise.
setClass("Employee",
representation(boss="Person", salary="numeric"),
contains = "Person"
)
setGeneric(
"raise",
function(object, percent=0) {
standardGeneric("raise")
}
)
setMethod(
"raise",
signature("Employee"),
function(object, percent=0) {
# percent is a percentage: 15 means a 15% raise
object@salary <- object@salary * (1 + percent/100)
object
}
)
True to its functional heritage, R deals with immutable values. Changes in state happen by making new objects. The trick is to return the new object from the mutator methods and capture it on the way out.
smithers <- new("Employee",
name="Waylon Smithers",
boss=new("Person",name="Mr. Burns"),
salary=100000)
# doesn't work?!?!
raise(smithers, percent=15)
smithers@salary
[1] 100000
Setting a new salary creates a new value. Notice that we return the modified object from the raise function. Don't forget to catch it.
# remember to reassign smithers to the new value
smithers <- raise(smithers, percent=15)
smithers@salary
[1] 115000
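The same return-and-capture pattern is what R's replacement functions do behind the scenes, so a setter can be written to read like an in-place update. This is only a sketch: the accessor names `salary` and `salary<-` are assumptions made for illustration and are not part of the example above. The classes are redefined so the block runs on its own.

```r
library(methods)

setClass("Person", representation(name = "character", age = "numeric"))
setClass("Employee",
  representation(boss = "Person", salary = "numeric"),
  contains = "Person")

# getter and setter generics (hypothetical names)
setGeneric("salary", function(object) standardGeneric("salary"))
setGeneric("salary<-", function(object, value) standardGeneric("salary<-"))

setMethod("salary", "Employee", function(object) object@salary)
setReplaceMethod("salary", "Employee", function(object, value) {
  object@salary <- value
  validObject(object)   # re-check the object after the change
  object                # a replacement method must return the modified copy
})

smithers <- new("Employee",
  name = "Waylon Smithers",
  boss = new("Person", name = "Mr. Burns"),
  salary = 100000)

before <- smithers           # an independent copy, not a reference
salary(smithers) <- 115000   # R evaluates the setter and reassigns smithers for us

salary(smithers)   # 115000
salary(before)     # 100000: the earlier copy is untouched
```

The assignment form hides the reassignment, but nothing is mutated in place: `before` keeps the old salary.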
Multiple Inheritance
Through the magic of multiple inheritance, the lowly Code Monkey is both a programmer and an employee. Just set the contains value to indicate its two parent classes.
setClass("Code Monkey",
contains=c("Programmer","Employee"))
setMethod(
"talent",
signature("Code Monkey"),
function(object) {
paste("Codes in",
paste(object@language, collapse=", "),
"for", object@boss@name)
}
)
chris <- new("Code Monkey",
name="Chris",
age=29,
boss=new("Person", name="The Man"),
salary=2L,
language=c("Java", "R", "Python", "Clojure"))
talent(chris)
[1] "Codes in Java, R, Python, Clojure for The Man"
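It can be reassuring to confirm what the Code Monkey actually inherited. The following sketch repeats the class definitions so it runs on its own, and uses a version of raise that treats percent as a percentage (15 means 15%), which is an assumption of this block.

```r
library(methods)

setClass("Person", representation(name = "character", age = "numeric"))
setClass("Programmer", representation(language = "character"), contains = "Person")
setClass("Employee",
  representation(boss = "Person", salary = "numeric"),
  contains = "Person")
setClass("Code Monkey", contains = c("Programmer", "Employee"))

chris <- new("Code Monkey",
  name = "Chris",
  boss = new("Person", name = "The Man"),
  salary = 2,
  language = c("Java", "R"))

# the object is an instance of both parents and of the shared grandparent
is(chris, "Programmer")   # TRUE
is(chris, "Employee")     # TRUE
is(chris, "Person")       # TRUE

# slots from both branches are available on the child class
slotNames("Code Monkey")

# a method written for a parent class applies to the child as well
setGeneric("raise", function(object, percent = 0) standardGeneric("raise"))
setMethod("raise", "Employee", function(object, percent = 0) {
  object@salary <- object@salary * (1 + percent / 100)
  object
})

chris <- raise(chris, percent = 50)
chris@salary              # 3
is(chris, "Code Monkey")  # TRUE: the raise did not change the class
```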
So, there you have it - encapsulation, polymorphism and inheritance in S4 classes. Complete code for this example is in gist:3670578.
OO in R resources
It's lucky that there are loads of places to go to learn about S4 classes.
- First, look at Hadley Wickham's devtools wiki, which has a boatload of information for R package developers, in addition to info on S3, S4 and reference classes. Also from Hadley is a slide deck on Object Oriented Programming.
- S4 Classes in 15 pages, more or less
- How S4 Methods Work by John Chambers
- The R docs for the methods package are comprehensive. Run help(package="methods") for the index, and see:
  - as
  - setClass
  - setGeneric
  - setMethod
  - showMethods
  - getMethod
  - Class Definitions
  - Tools for Managing Generic Functions
  - validObject
- Dirk Eddelbuettel and Romain Francois gave a Google TechTalk titled Integrating R with C++: Rcpp, RInside, and RProtobuf. It covers integration between R and C++, but also has some good information on OO programming in R, particularly starting around the one-hour mark (1:00). Romain Francois' slide deck Object Oriented Design(s) in R is really good.
- Inside-R's references on the class systems:
- A section of Introduction to R covers Classes, generic functions and object orientation with S3 classes, and the classic bank-account example appears in the section on Scope
- R5 Reference classes
- Slides from Martin Morgan on Reference Classes
- R for Programmers, by Norman Matloff of UCSD, or buy Norm's book: The Art of R Programming
- A (Not So) Short Introduction to S4
Totally OT, but: Trane played tenor and soprano. I'm not aware of any tracks on which he played alto.
@cellocgw, No, not off topic at all. Thank you for the correction. What a total flub on my part. I should be forced to listen to a Justin Bieber album as a punishment.
Heh. http://www.thegearpage.net/board/showthread.php?t=756171