Thursday, February 20, 2014

Regression with multiple predictors

Now that I'm ridiculously behind in the Stanford Online Statistical Learning class, I thought it would be fun to try to reproduce the figure on page 36 of the slides from chapter 3 or page 81 of the book. The result is a curvaceous surface that slices neatly through the data set.

I'm not sure what incurable chemical imbalance explains why such a thing sounds like fun to me. But let's go get us some data, specifically the Advertising data set from the book's website:

Advertising <- read.csv("", 
    row.names = 1)

Previously, we fooled around with linear models with one predictor. We now extend regression to the case with multiple predictors.

\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 +···+ \hat{\beta}_p x_p \]

Or, in matrix notation:

\[ \hat{Y} = X \cdot \hat{\beta} + \hat{\beta_0} \]

Where the dimensions of X are n by p, and \( \beta \) is p by 1 resulting in a Y of size n by 1, a prediction for each of the n samples.

Find a fit for Sales as a function of TV and Radio advertising plus the interaction between the two.

fit2 = lm(Sales ~ TV * Radio, data = Advertising)

Before making the plot, define a helper function to evenly divide the ranges. This probably exists somewhere in R already, but I couldn't find it.

evenly_divide <- function(series, r = range(series), n = 10) {
    c(r[1], 1:n/n * (r[2] - r[1]) + r[1])

Generate a 2D grid over which we'll predict Sales.

x = evenly_divide(Advertising$TV, n = 16)
y = evenly_divide(Advertising$Radio, n = 16)

Using the coefficients of the fit, create a function f that computes the predicted response. It would be nice to use predict for this purpose. From a functional programming point of view, it seems like there should be a function that takes a fit and returns a function equivalent to f. If such a thing exists, I'd like to find it.

beta = coef(fit2)
f <- function(x, y) beta[1] + beta[2] * x + beta[3] * y + beta[4] * x * y
z = outer(x, y, f)

I copied the coloring of the regression surface from the examples for persp. I'm guessing the important part is create a color vector of the right size to cycle through the facets correctly.

nrz <- nrow(z)
ncz <- ncol(z)
nbcol <- 100

# Create color palette
palette <- colorRampPalette(c("blue", "cyan", "green"))
color <- palette(nbcol)

# Compute the z-value at the facet centres
zfacet <- z[-1, -1] + z[-1, -ncz] + z[-nrz, -1] + z[-nrz, -ncz]

# Recode facet z-values into color indices
facetcol <- cut(zfacet, nbcol)

OK, finally we get to the plotting. A call to persp sets up the coordinate system and renders the regression surface.

With the coordinate system we got in the return value, we plot the line segments representing the residuals, using a higher transparency for points under the surface. We do the same for the actual points, so they look a bit like they're underneath the surface.

# Draw the perspective plot
res <- persp(x, y, z, theta = 30, phi = 20, col = color[facetcol], xlab = "TV", 
    ylab = "Radio", zlab = "Sales")

# Draw the residual line segments
xy.true = trans3d(Advertising$TV, Advertising$Radio, Advertising$Sales, pmat = res) = trans3d(Advertising$TV, Advertising$Radio, predict(fit2, Advertising), 
    pmat = res)
colors = rep("#00000080", nrow(Advertising))
colors[residuals(fit2) < 0] = "#00000030"
segments(xy.true$x, xy.true$y,$x,$y, col = colors)

# Draw the original data points
colors = rep("#cc0000", nrow(Advertising))
colors[residuals(fit2) < 0] = "#cc000030"
points(trans3d(Advertising$TV, Advertising$Radio, Advertising$Sales, pmat = res), 
    col = colors, pch = 16)

Fit surface

To be honest, this doesn't look much like the figure in the book and there's reason to doubt the information conveying ability of 3D graphics, but whatever - it looks pretty cool!