Digithead's Lab Notebook: Machine-learning: gradient descent

Wednesday, October 26, 2011

Machine-learning: gradient descent

The first section of Andrew Ng's Machine Learning class is about applying gradient descent to linear regression problems.

Our input data is an m-by-n matrix X, holding m training examples with n features each. For these training examples we know the outputs y, where y is the variable we're trying to predict. We want to find the line, defined by the parameter vector ϴ, that minimizes the squared error between the line and our data points.
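To make that setup concrete, here is a minimal sketch in plain Python. The toy data, the column of ones used as an intercept feature, and the helper names predict and squared_error are all my own assumptions for illustration, not anything from the class materials.

```python
# Hypothetical toy data: m = 4 training examples, n = 2 features.
# The first feature is a constant 1, so theta[0] acts as the intercept.
X = [
    [1.0, 0.0],
    [1.0, 1.0],
    [1.0, 2.0],
    [1.0, 3.0],
]
y = [1.0, 3.0, 5.0, 7.0]  # known outputs, generated from y = 1 + 2x


def predict(X, theta):
    """Return one prediction per example: the dot product of each row with theta."""
    return [sum(x_j * t_j for x_j, t_j in zip(row, theta)) for row in X]


def squared_error(X, y, theta):
    """Sum of squared differences between the predictions and the known outputs."""
    return sum((p - actual) ** 2 for p, actual in zip(predict(X, theta), y))


good_theta = [1.0, 2.0]  # the line that generated the data
poor_theta = [0.0, 1.0]  # an arbitrary, worse guess

error_good = squared_error(X, y, good_theta)
error_poor = squared_error(X, y, poor_theta)
```

The better-fitting ϴ gives the smaller squared error, which is exactly the quantity we want an algorithm to drive down for us.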

Gradient descent needs a cost function to minimize. Here that is J(ϴ), the sum of the squared errors between the predictions and the training data, divided by 2m. I think the 2 in the denominator is there so that it cancels out when we take the derivative, leaving us with a simpler gradient function.
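Here is a rough sketch of how this could look in code, assuming the cost used in the class is J(ϴ) = (1/(2m)) Σ (prediction − y)². Differentiating the square brings down a factor of 2 that cancels the 1/2, so the gradient for parameter j is just (1/m) Σ (prediction − y) · x_j. The update rule ϴ := ϴ − α · gradient, the learning rate alpha, the iteration count, and the toy data are assumptions of mine, chosen only so the example runs.

```python
def cost(X, y, theta):
    """J(theta): sum of squared errors divided by 2m."""
    m = len(y)
    total = 0.0
    for row, actual in zip(X, y):
        prediction = sum(x_j * t_j for x_j, t_j in zip(row, theta))
        total += (prediction - actual) ** 2
    return total / (2 * m)


def gradient(X, y, theta):
    """Partial derivatives of J; the 2 from the square has cancelled the 1/2."""
    m = len(y)
    grad = [0.0] * len(theta)
    for row, actual in zip(X, y):
        residual = sum(x_j * t_j for x_j, t_j in zip(row, theta)) - actual
        for j, x_j in enumerate(row):
            grad[j] += residual * x_j
    return [g / m for g in grad]


def gradient_descent(X, y, alpha=0.1, iterations=2000):
    """Start from all zeros and repeatedly step against the gradient."""
    theta = [0.0] * len(X[0])
    for _ in range(iterations):
        grad = gradient(X, y, theta)
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


# Hypothetical toy data from y = 1 + 2x, with a column of ones for the intercept.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]

theta_fit = gradient_descent(X, y)
print(theta_fit, cost(X, y, theta_fit))
```

On this data the fitted ϴ should land very close to the intercept 1 and slope 2 that generated it. A learning rate that is too large would make the steps overshoot and diverge, so alpha usually needs some tuning on real data.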