Optimization in Neural Networks and Newton’s Method
Newton’s Method
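Newton’s method is an alternative to gradient descent.
In principle, Newton’s method is used to find the zeros of a function $f$, that is, the points where $f(x) = 0$.
Each step follows the tangent line at the current point $x_k$ to where it crosses the x-axis: $x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}$. In other words, we subtract from the current point the function value divided by the slope at that point.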
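As a minimal sketch of the root-finding update $x_{k+1} = x_k - f(x_k)/f'(x_k)$, the Python snippet below may help. The function name `newton_root`, the tolerance, and the test function $f(x) = x^2 - 2$ are assumptions chosen for illustration; they do not come from the course.

```python
def newton_root(f, f_prime, x0, tol=1e-10, max_iter=50):
    """Approximate a zero of f with Newton's method, starting from x0."""
    x = x0
    for _ in range(max_iter):
        slope = f_prime(x)
        if slope == 0:
            # A horizontal tangent never crosses the x-axis.
            raise ZeroDivisionError("derivative is zero; Newton step undefined")
        x_next = x - f(x) / slope  # the Newton update
        if abs(x_next - x) < tol:  # stop once the iterates stop moving
            return x_next
        x = x_next
    return x


# Illustrative function (an assumption): f(x) = x^2 - 2, whose positive zero is sqrt(2).
root = newton_root(lambda x: x**2 - 2, lambda x: 2 * x, x0=1.0)
print(root)  # approximately 1.41421356
```

The starting point matters: from `x0=1.0` the iterates move toward the positive zero, while a negative start would move toward the negative one.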
For optimization, we want to minimize a function $g$, and the candidate minima of $g$ are the zeros of its derivative. So we apply Newton’s method to $f = g'$.
The derivative of $f$ is then the second derivative of $g$, and the update becomes $x_{k+1} = x_k - \frac{g'(x_k)}{g''(x_k)}$.
The name $g$ is only a label; any other function, such as $h$, can play the same role.
Newton's Method - An example
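The optimization steps of Newton’s method are: start from an initial guess $x_0$, update with $x_{k+1} = x_k - \frac{g'(x_k)}{g''(x_k)}$, and repeat until the iterates stop changing.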
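A short sketch of these steps in Python follows. The function $g(x) = e^x - 2x$ is a hypothetical choice made here for illustration, not the course's example. It has $g'(x) = e^x - 2$ and $g''(x) = e^x$, so its minimum sits at $x = \ln 2$. The name `newton_minimize` is likewise an assumption.

```python
import math


def newton_minimize(g_prime, g_double_prime, x0, tol=1e-10, max_iter=50):
    """Find a zero of g' with Newton's method; return the final point and all iterates."""
    x = x0
    history = [x]
    for _ in range(max_iter):
        curvature = g_double_prime(x)
        if curvature == 0:
            raise ZeroDivisionError("second derivative is zero; Newton step undefined")
        x_next = x - g_prime(x) / curvature  # x_{k+1} = x_k - g'(x_k) / g''(x_k)
        history.append(x_next)
        if abs(x_next - x) < tol:
            break
        x = x_next
    return history[-1], history


# Hypothetical example: g(x) = e^x - 2x, minimized at x = ln(2).
x_star, history = newton_minimize(
    g_prime=lambda x: math.exp(x) - 2,
    g_double_prime=lambda x: math.exp(x),
    x0=0.0,
)
print(x_star, math.log(2))  # the two values should agree closely
```

Note that this only finds a zero of $g'$; whether that point is a minimum or a maximum still depends on the sign of $g''$ there.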
The Second Derivative
The second derivative tells us whether a critical point is a maximum or a minimum.
The zeros of the first derivative are the candidates for a maximum or a minimum, but the first derivative alone does not say which one we have. The sign of the second derivative does: positive means a local minimum, negative means a local maximum, and zero leaves the test inconclusive.
The first derivative gives us the slope, and the second derivative gives us the concavity (concave up or concave down).
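The second derivative test can be sketched in a few lines of Python. The function $f(x) = x^3 - 3x$ is an assumption picked for illustration: its derivative $3x^2 - 3$ is zero at $x = -1$ and $x = 1$, and its second derivative is $6x$.

```python
def classify_critical_point(f_double_prime, x):
    """Classify a critical point x by the sign of the second derivative there."""
    curvature = f_double_prime(x)
    if curvature > 0:
        return "local minimum"  # concave up
    if curvature < 0:
        return "local maximum"  # concave down
    return "inconclusive"       # the test says nothing when f''(x) = 0


# Hypothetical example: f(x) = x^3 - 3x, so f''(x) = 6x.
def f_double_prime(x):
    return 6 * x


labels = {x: classify_critical_point(f_double_prime, x) for x in (-1.0, 1.0)}
print(labels)
```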
The Hessian
The matrix of second partial derivatives of a function is called the Hessian. Its entry in row $i$ and column $j$ is $\frac{\partial^2 f}{\partial x_i \partial x_j}$.
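A small worked example may make this concrete. The function below is a hypothetical choice for illustration, not one taken from the course. For $f(x, y) = 2x^2 + 3y^2 - xy$, the first partial derivatives are

$$
\frac{\partial f}{\partial x} = 4x - y, \qquad \frac{\partial f}{\partial y} = 6y - x
$$

and differentiating each of them once more gives the Hessian

$$
H = \begin{bmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2} \end{bmatrix} = \begin{bmatrix} 4 & -1 \\ -1 & 6 \end{bmatrix}
$$

The two off-diagonal entries are equal, so the matrix is symmetric, as it is for any function with continuous second partial derivatives.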
Hessians and concavity
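The eigenvalues of the Hessian play the role that the sign of the second derivative plays in one variable: if they are all positive, the function is concave up at that point (a local minimum), and if they are all negative, it is concave down (a local maximum).
When some eigenvalues are positive and some are negative, the critical point is a saddle point, which is neither a maximum nor a minimum. If an eigenvalue is zero and the rest share one sign, the test is inconclusive.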
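The eigenvalue test can be sketched for the two-variable case, where a symmetric Hessian $\begin{bmatrix} a & b \\ b & d \end{bmatrix}$ has eigenvalues given by a closed-form expression. The helper names and the example matrices below are assumptions for illustration. For larger matrices a numerical routine such as NumPy's `numpy.linalg.eigvalsh` would be the usual tool.

```python
import math


def symmetric_2x2_eigenvalues(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], in ascending order."""
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return mean - radius, mean + radius


def classify_by_hessian(a, b, d):
    """Classify a critical point from the Hessian [[a, b], [b, d]] evaluated there."""
    low, high = symmetric_2x2_eigenvalues(a, b, d)
    if low > 0:
        return "local minimum"  # all eigenvalues positive: concave up
    if high < 0:
        return "local maximum"  # all eigenvalues negative: concave down
    if low < 0 < high:
        return "saddle point"   # mixed signs
    return "inconclusive"       # at least one zero eigenvalue, no sign change


# Hypothetical examples:
print(classify_by_hessian(4, -1, 6))   # Hessian of 2x^2 + 3y^2 - xy
print(classify_by_hessian(2, 0, -2))   # Hessian of x^2 - y^2
print(classify_by_hessian(-2, 0, -2))  # Hessian of -x^2 - y^2
```

Because the comparisons against zero are exact, a Hessian computed numerically may need a small tolerance instead.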
Newton's Method for two variables
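With more than one variable, Newton’s method replaces the first derivative with the gradient $\nabla f$ and the second derivative with the Hessian matrix $H$. Dividing by the second derivative becomes multiplying by the inverse Hessian: $x_{k+1} = x_k - H^{-1}(x_k)\,\nabla f(x_k)$, where $x_k$ is now a vector.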
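Below is a minimal two-variable sketch of the update $x_{k+1} = x_k - H^{-1}\nabla f$ in plain Python. The function $f(x, y) = x^2 + y^2 + e^{x+y}$ is a hypothetical example chosen here because its Hessian is always positive definite. The names `gradient`, `hessian`, and `newton_2d` are assumptions as well. Rather than forming $H^{-1}$ explicitly, the code solves the 2x2 system $H \cdot \text{step} = \nabla f$ with Cramer's rule.

```python
import math


def gradient(x, y):
    """Gradient of the illustrative f(x, y) = x^2 + y^2 + exp(x + y)."""
    e = math.exp(x + y)
    return 2 * x + e, 2 * y + e


def hessian(x, y):
    """Second partials (f_xx, f_xy, f_yy) of the same f; the Hessian is symmetric."""
    e = math.exp(x + y)
    return 2 + e, e, 2 + e


def newton_2d(x, y, tol=1e-10, max_iter=50):
    """Newton's method in two variables, starting from (x, y)."""
    for _ in range(max_iter):
        gx, gy = gradient(x, y)
        fxx, fxy, fyy = hessian(x, y)
        det = fxx * fyy - fxy * fxy
        if det == 0:
            raise ZeroDivisionError("Hessian is singular; Newton step undefined")
        # Solve H * step = gradient with Cramer's rule.
        step_x = (fyy * gx - fxy * gy) / det
        step_y = (fxx * gy - fxy * gx) / det
        x, y = x - step_x, y - step_y
        if max(abs(step_x), abs(step_y)) < tol:
            break
    return x, y


x_min, y_min = newton_2d(0.0, 0.0)
print(x_min, y_min)             # the two coordinates are equal by symmetry
print(gradient(x_min, y_min))   # both components should be close to zero
```

For higher dimensions the same idea applies, with a linear solver taking the place of Cramer's rule.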
All the information provided is based on the Calculus for Machine Learning and Data Science course on Coursera from DeepLearning.AI.