Calculus for Machine Learning and Data Science (11)

Optimization in Neural Networks and Newton’s Method

Newton’s Method

Newton’s method is an alternative to gradient descent.

In principle, Newton’s method is used to find the zeros of a function (where f(x) = 0).

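Starting from a guess $x_k$, each iteration moves to the point where the tangent line at $x_k$ crosses zero:

$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}$$

That is, we subtract the function value divided by the slope, $f(x_k)/f'(x_k)$, from the previous point $x_k$.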
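The root-finding update $x_{k+1} = x_k - f(x_k)/f'(x_k)$ can be sketched in a few lines of Python. This is only a minimal illustration: the function name `newton_root`, the stopping tolerance, and the test function $f(x) = x^2 - 2$ are assumptions made for this example, not something taken from the course.

```python
def newton_root(f, f_prime, x0, tol=1e-10, max_iter=50):
    """Look for a zero of f with Newton's method, starting from x0."""
    x = x0
    for _ in range(max_iter):
        slope = f_prime(x)
        if slope == 0:
            # A flat tangent never crosses zero, so the step is undefined.
            raise ZeroDivisionError("derivative is zero; Newton step undefined")
        x_next = x - f(x) / slope  # x_{k+1} = x_k - f(x_k) / f'(x_k)
        if abs(x_next - x) < tol:  # stop once the step is negligible
            return x_next
        x = x_next
    return x


# Illustrative choice (assumption): f(x) = x^2 - 2, whose positive zero is sqrt(2).
root = newton_root(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
print(root)
```

The result depends on the starting point: a different `x0` can lead to a different zero, or to no convergence at all.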

Because Newton’s method finds a zero of a function $f$, we can use it for optimization by choosing $f$ to be the derivative of the function $g$ we want to minimize: the minima of $g$ are among the zeros of $g'$.

Hence the derivative $f'$ that appears in the Newton step is the second derivative $g''$, and the step becomes $x_{k+1} = x_k - g'(x_k)/g''(x_k)$.

The name $g$ is only an example; any twice-differentiable function (call it $h$, etc.) works the same way.

Newton's Method - An example

Here are the optimization steps in computing Newton’s Method.

[Figure: Newton's Method for Optimization]
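As a rough sketch of the optimization steps, the loop below applies the update $x_{k+1} = x_k - g'(x_k)/g''(x_k)$ until the step becomes negligible. The function $g(x) = e^x - \ln x$ (for $x > 0$), the starting point, and the helper name `newton_minimize` are illustrative assumptions; the derivatives are written out by hand.

```python
import math


def newton_minimize(g_prime, g_second, x0, tol=1e-10, max_iter=50):
    """Look for a zero of g' (a candidate optimum of g) with Newton's method."""
    x = x0
    for _ in range(max_iter):
        curvature = g_second(x)
        if curvature == 0:
            raise ZeroDivisionError("second derivative is zero; step undefined")
        step = g_prime(x) / curvature  # g'(x_k) / g''(x_k)
        x = x - step                   # x_{k+1} = x_k - step
        if abs(step) < tol:
            break
    return x


# Illustrative function (assumption): g(x) = e^x - ln(x), defined for x > 0.
def g_prime(x):
    return math.exp(x) - 1 / x


def g_second(x):
    return math.exp(x) + 1 / x**2


x_min = newton_minimize(g_prime, g_second, x0=0.05)
print(x_min)
```

Newton's method only locates a point where $g'$ vanishes. Here $g''(x) > 0$ for every $x > 0$, which is what makes that point a minimum.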

The Second Derivative

The second derivative tells us whether a critical point is a maximum or a minimum.

The zeros of the derivative (found by solving $f'(x) = 0$) are the candidates for a maximum or a minimum, but on their own they do not say which one. The sign of the second derivative at such a point does: positive means a local minimum, negative means a local maximum, and zero is inconclusive.

The first derivative gives us the slope, and the second derivative gives us the concavity: whether the curve bends upward (concave up) or downward (concave down).
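The second derivative test can be illustrated with a small Python sketch. The function $f(x) = x^3 - 3x$ is an assumption chosen for this example: its derivative $f'(x) = 3x^2 - 3$ is zero at $x = -1$ and $x = 1$, and its second derivative is $f''(x) = 6x$.

```python
def classify_critical_point(second_derivative_value):
    """Classify a critical point from the sign of the second derivative there."""
    if second_derivative_value > 0:
        return "local minimum"   # concave up
    if second_derivative_value < 0:
        return "local maximum"   # concave down
    return "inconclusive"        # the test says nothing when f'' = 0


# Illustrative function (assumption): f(x) = x^3 - 3x, so f''(x) = 6x.
def f_second(x):
    return 6 * x


# The critical points of f are the zeros of f'(x) = 3x^2 - 3, i.e. x = -1 and x = 1.
results = {x: classify_critical_point(f_second(x)) for x in (-1.0, 1.0)}
print(results)
```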

The Hessian

The matrix of second partial derivatives is called the Hessian. For a function of two variables $f(x, y)$, its entries are $f_{xx}$ and $f_{yy}$ on the diagonal and the mixed partial derivatives $f_{xy}$ and $f_{yx}$ off the diagonal.
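To make the entries of a Hessian concrete, the sketch below approximates them with central finite differences and compares against values worked out by hand. The numerical approximation, the helper name `hessian_2d`, and the function $f(x, y) = 2x^2 + 3y^2 - xy$ are all assumptions made for illustration. By hand, $f_{xx} = 4$, $f_{yy} = 6$, and $f_{xy} = f_{yx} = -1$.

```python
def hessian_2d(f, x, y, h=1e-3):
    """Approximate the 2x2 Hessian of f at (x, y) with central differences."""
    f_xx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    f_yy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    f_xy = (f(x + h, y + h) - f(x + h, y - h)
            - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    # For smooth functions the mixed partials agree, so the matrix is symmetric.
    return [[f_xx, f_xy], [f_xy, f_yy]]


# Illustrative function (assumption): f(x, y) = 2x^2 + 3y^2 - xy.
def f(x, y):
    return 2 * x**2 + 3 * y**2 - x * y


H = hessian_2d(f, 1.0, 2.0)
print(H)  # close to [[4, -1], [-1, 6]] up to rounding error
```

Because this particular $f$ is quadratic, its Hessian is the same at every point; for a general function the entries depend on $(x, y)$.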

Hessians and concavity

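The signs of the Hessian's eigenvalues play the role that the sign of the second derivative plays in one variable: if all eigenvalues are positive, the function is concave up at that point (a local minimum if it is a critical point), and if all are negative, it is concave down (a local maximum).

When the eigenvalues have mixed signs, the critical point is a saddle point: it looks like a minimum along some directions and like a maximum along others. If some eigenvalue is zero and the rest share one sign, the test is inconclusive.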
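The eigenvalue test can be sketched for the two-variable case, where a symmetric $2 \times 2$ matrix has eigenvalues available in closed form. The helper names and the example matrices below are assumptions chosen for illustration.

```python
import math


def eigenvalues_symmetric_2x2(H):
    """Return (smaller, larger) eigenvalue of a symmetric matrix [[a, b], [b, d]]."""
    a, b, d = H[0][0], H[0][1], H[1][1]
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b**2)
    return mean - radius, mean + radius


def classify_by_hessian(H):
    """Classify a critical point from the signs of the Hessian's eigenvalues."""
    low, high = eigenvalues_symmetric_2x2(H)
    if low > 0:
        return "local minimum"   # all eigenvalues positive: concave up
    if high < 0:
        return "local maximum"   # all eigenvalues negative: concave down
    if low < 0 < high:
        return "saddle point"    # mixed signs
    return "inconclusive"        # a zero eigenvalue, the rest of one sign


# Example Hessians (assumptions), each evaluated at some critical point.
print(classify_by_hessian([[4, -1], [-1, 6]]))  # eigenvalues 5 ± sqrt(2)
print(classify_by_hessian([[-2, 0], [0, -3]]))
print(classify_by_hessian([[2, 0], [0, -2]]))
```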

Newton's Method for two variables

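For functions of more than one variable, Newton’s method keeps the same form: the first derivative is replaced by the gradient, and division by the second derivative is replaced by multiplication by the inverse of the Hessian, $x_{k+1} = x_k - H^{-1}(x_k)\,\nabla f(x_k)$.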

[Figure: An example of Newton's method optimization]
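A two-variable Newton iteration, $x_{k+1} = x_k - H^{-1}(x_k)\,\nabla f(x_k)$, might look like the sketch below. The function $f(x, y) = x^4 + 0.8y^4 + 4x^2 + 2y^2 - xy - 0.2x^2y$, the starting point $(4, 4)$, and the helper names are assumptions chosen for illustration. The gradient and Hessian are derived by hand, and the $2 \times 2$ system is solved with the explicit inverse formula.

```python
def gradient(x, y):
    """Gradient of f(x, y) = x^4 + 0.8y^4 + 4x^2 + 2y^2 - xy - 0.2x^2*y."""
    return (4 * x**3 + 8 * x - y - 0.4 * x * y,
            3.2 * y**3 + 4 * y - x - 0.2 * x**2)


def hessian(x, y):
    """Second partial derivatives (f_xx, f_xy, f_yy) of the same f."""
    return (12 * x**2 + 8 - 0.4 * y,
            -1 - 0.4 * x,
            9.6 * y**2 + 4)


def newton_2d(x, y, tol=1e-10, max_iter=50):
    """Newton's method in two variables: subtract H^{-1} times the gradient."""
    for _ in range(max_iter):
        g_x, g_y = gradient(x, y)
        f_xx, f_xy, f_yy = hessian(x, y)
        det = f_xx * f_yy - f_xy**2
        if det == 0:
            raise ZeroDivisionError("Hessian is singular; Newton step undefined")
        # H^{-1} = (1 / det) * [[f_yy, -f_xy], [-f_xy, f_xx]]
        step_x = (f_yy * g_x - f_xy * g_y) / det
        step_y = (f_xx * g_y - f_xy * g_x) / det
        x, y = x - step_x, y - step_y
        if max(abs(step_x), abs(step_y)) < tol:
            break
    return x, y


x_min, y_min = newton_2d(4.0, 4.0)
print(x_min, y_min)
```

At the point the iteration reaches, the gradient is zero and the Hessian is $\begin{pmatrix} 8 & -1 \\ -1 & 4 \end{pmatrix}$, whose eigenvalues are both positive, so the eigenvalue test identifies it as a local minimum.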

All the information provided is based on the Calculus for Machine Learning and Data Science course on Coursera from DeepLearning.AI.
