Derivatives and Optimization
Optimization
Introduction to optimization
We use derivatives for optimization. Optimization is the process of finding the maximum or the minimum of a function. In the course's sauna example, this means finding the coolest spot, i.e., the point where the temperature function is lowest.
Local minima: low points with zero slope that are not the lowest point (orange arrows in the 3rd slide)
Global minimum: the lowest point with zero slope (blue arrow in the 3rd slide); see the worked example below
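For example (an illustration not taken from the slides), consider $f(x) = 3x^4 - 4x^3 - 12x^2$. Its derivative $f'(x) = 12x^3 - 12x^2 - 24x = 12x(x-2)(x+1)$ is zero at $x = -1$, $x = 0$, and $x = 2$. Since $f(-1) = -5$ and $f(2) = -32$, the point $x = -1$ is a local minimum, $x = 2$ is the global minimum, and $x = 0$ is a local maximum.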
In machine learning, we optimize the model to minimize the error and best fit the given dataset.
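For instance, here is a minimal gradient-descent sketch (not from the course; the loss, learning rate, and step count are arbitrary choices):

```python
# Minimize the squared loss f(x) = (x - 3)**2 by repeatedly
# stepping opposite the derivative f'(x) = 2 * (x - 3).
def f_prime(x):
    return 2 * (x - 3)

x = 0.0             # arbitrary starting point
learning_rate = 0.1

for _ in range(100):
    x -= learning_rate * f_prime(x)

print(x)  # converges toward the minimizer x = 3
```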
Optimization of squared loss - The one powerline problem
With one power line, we can simply place the house on the blue power line itself, which minimizes the cost (the connection distance is zero).
Optimization of squared loss - The two powerline problem
With two power lines, the cost is the sum of the squared distances between the house and each power line.
We take the square because it makes the cost function differentiable everywhere and eases the derivation.
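To make the step explicit (writing $a$ and $b$ for the positions of the two lines): the cost is $(x-a)^2 + (x-b)^2$, its derivative is $2(x-a) + 2(x-b)$, and setting this to 0 gives $x = \frac{a+b}{2}$, the midpoint between the two power lines.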
Optimization of squared loss - The three powerline problem
The process for three power lines is the same as for two.
$(x-a)^2+(x-b)^2+(x-c)^2$
Correct. The cost function for this problem is the sum of the costs to connect to each power line, i.e., the total sum of squares.
$x = {a+b+c\over 3}$
Correct. The derivative of the cost function is $2(x-a)+2(x-b)+2(x-c)$ and equating this to 0 gives $x = {a+b+c\over 3}$
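More generally, for $n$ power lines at positions $a_1, \dots, a_n$, the same derivation gives $x = \frac{a_1 + \dots + a_n}{n}$, the mean of the positions. A quick numerical sanity check (a sketch with made-up positions, not code from the course):

```python
# Sketch: verify that the mean minimizes the total squared distance.
# The positions below are arbitrary stand-ins for a, b, c.
positions = [1.0, 4.0, 7.0]

def cost(x):
    return sum((x - a) ** 2 for a in positions)

mean = sum(positions) / len(positions)
# Scan a grid of candidates around the mean and keep the cheapest one.
candidates = [mean + step / 100 for step in range(-200, 201)]
best = min(candidates, key=cost)
print(mean, best)  # both 4.0: the mean is the minimizer
```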
Optimization of log-loss - Part 1
Coin 1
Correct. Since you win when you see 7 heads and 3 tails, it makes sense to choose the coin with the greater chance of landing heads.
We use calculus to maximize the function $g$: take its derivative and solve for where it equals zero. Using the logarithm simplifies this, since it turns the product into a sum before differentiating.
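As a sketch of that derivation (assuming $g$ denotes the likelihood of the observed flips): with 7 heads and 3 tails, $g(p) = p^7(1-p)^3$, so $\log g(p) = 7\log p + 3\log(1-p)$. Setting the derivative to zero,

$\frac{7}{p} - \frac{3}{1-p} = 0 \implies p = \frac{7}{10}$

so the best coin is the one whose probability of heads is closest to 0.7.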
We use the negative log loss because, when $p$ is between 0 and 1, the log of the likelihood is a negative number; putting a negative sign in front makes it positive, so we minimize the negative log loss instead of maximizing the log-likelihood, as in the coin-toss example.
Optimization of log-loss - Part 2
Products of numbers between 0 and 1 quickly become very small, and computers may not handle such tiny values well (they can underflow). With logarithms, however, the logarithm of a very small number is just a large negative number, which computers can represent.
So for complicated products, we may want to use logarithms, which simplify the formula, especially when computing derivatives.
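A small sketch (not from the course) of why this matters numerically:

```python
import math

# Multiplying many small probabilities underflows to 0.0,
# while summing their logs stays perfectly representable.
probs = [1e-4] * 100  # 100 hypothetical probabilities of 0.0001 each

product = 1.0
for p in probs:
    product *= p
print(product)  # 0.0 -- 1e-400 underflows below the smallest float

log_sum = sum(math.log(p) for p in probs)
print(log_sum)  # about -921.03 -- large and negative, but finite
```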
All the information provided is based on the Calculus for Machine Learning and Data Science course by DeepLearning.AI on Coursera.