

Calculus for Machine Learning and Data Science (5)


Derivatives and Optimization

Optimization

Introduction to optimization


We use derivatives for optimization.

Using the sauna example, we want to find the coolest place.

Optimization means finding the maximum or the minimum of a function.

Local minima: low points with zero slope that are not the lowest point of the function (orange arrows in the course slides)

Global minimum: the lowest point, which also has zero slope (blue arrow in the course slides)

In machine learning, we optimize the model to minimize the error so that it best fits the given dataset.
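To make the local/global distinction concrete, here is a minimal Python sketch (my own illustration, not from the course): the assumed polynomial $f(x) = x^4 - x^2 + 0.2x$ has two zero-slope low points, and comparing $f$ at the critical points tells the local minimum apart from the global one.

```python
import numpy as np

# Example polynomial (an assumed illustration, not from the course):
# f(x) = x^4 - x^2 + 0.2x has two low points with zero slope.
f = lambda x: x**4 - x**2 + 0.2 * x

# Critical points are the roots of f'(x) = 4x^3 - 2x + 0.2.
crit = np.roots([4.0, 0.0, -2.0, 0.2])
crit = crit[np.isreal(crit)].real  # keep only the real roots

for x in sorted(crit):
    print(f"x = {x:+.3f}  f(x) = {f(x):+.3f}")
# The zero-slope point with the smallest f(x) is the global minimum;
# the other low point is only a local minimum.
```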

Optimization of squared loss - The one powerline problem


With one power line, the cost of placing the house at position $x$ is the squared distance $(x-a)^2$, so the cost is minimized by placing the house directly on the power line.

Optimization of squared loss - The two powerline problem


With two power lines, we take the squared differences between the house position and each power line and add them up.

We square the differences so that the cost is always positive and easy to differentiate.
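Written out, with the two power lines at positions $a$ and $b$ (the same notation the quiz below uses for three lines):

$$E(x) = (x-a)^2 + (x-b)^2, \qquad E'(x) = 2(x-a) + 2(x-b) = 0 \;\Rightarrow\; x = \frac{a+b}{2}$$

So the optimal spot for the house is the midpoint of the two power lines.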

Optimization of squared loss - The three powerline problem


The process for three power lines is the same as for two.

What is the cost function of this problem?

Answer

$(x-a)^2+(x-b)^2+(x-c)^2$

Correct. The cost function of this problem is the sum of the costs of connecting to each power line, i.e. the total sum of squares.

Where should you put the house to minimize the cost?

Answer

$x = {a+b+c\over 3}$

Correct. The derivative of the cost function is $2(x-a)+2(x-b)+2(x-c)$, and setting this equal to 0 gives $x = {a+b+c\over 3}$.
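As a sanity check, here is a small Python sketch (the positions $a$, $b$, $c$ are assumed example values): a dense grid search lands on the same minimizer as the calculus answer, the mean of the three positions.

```python
import numpy as np

# Assumed example positions for the three power lines.
a, b, c = 1.0, 3.0, 8.0

cost = lambda x: (x - a)**2 + (x - b)**2 + (x - c)**2

x_star = (a + b + c) / 3              # the calculus answer: the mean

xs = np.linspace(0.0, 10.0, 100_001)  # dense grid over [0, 10]
x_grid = xs[np.argmin(cost(xs))]      # grid point with the lowest cost

print(x_star, x_grid)                 # both print ~4.0
```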


Optimization of log-loss - Part 1

Which of the three coins would maximize your chances of winning?

Answer

Coin 1

Correct. Since you win when you see 7 heads and 3 tails, choosing the coin with the greater chance of landing heads makes sense.


We use calculus to maximize the likelihood function $g(p)$: take its derivative and set it to zero.

Taking the logarithm of $g$ simplifies the problem, since products become sums.

We use the negative log loss because $\log g(p)$ is negative when $p$ is between 0 and 1; putting a negative sign in front makes the output positive, and we then minimize it instead of maximizing it, as in the coin-toss example.
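A minimal Python sketch of this lesson's idea (the grid search stands in for setting the derivative to zero): for 7 heads and 3 tails, minimizing the negative log loss over $p$ recovers $p = 0.7$.

```python
import numpy as np

# Coin-toss example: 7 heads and 3 tails. The likelihood of this outcome
# for a coin with heads-probability p is g(p) = p^7 * (1 - p)^3.
heads, tails = 7, 3

def neg_log_loss(p):
    # Negative log-likelihood: -(7*log(p) + 3*log(1 - p)).
    return -(heads * np.log(p) + tails * np.log(1 - p))

ps = np.linspace(0.001, 0.999, 999)
p_best = ps[np.argmin(neg_log_loss(ps))]
print(p_best)  # ~0.7, i.e. 7/(7+3), matching the calculus solution
```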

Optimization of log-loss - Part 2


Products of numbers between 0 and 1 get smaller and smaller, and once they get too small, computers may not be able to represent them well.

With logarithms, however, the logarithm of a very small positive number is just a very large negative number, which computers can handle.

So with complicated products, we may want to take the logarithm, which turns products into sums and simplifies the formula, especially when computing derivatives.
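A quick Python demonstration of the underflow issue (the specific probabilities are just assumed example values):

```python
import math

# Multiplying many small probabilities underflows to 0.0 in floating point...
probs = [1e-5] * 100
print(math.prod(probs))                 # 0.0 (the true value, 1e-500, is too small)

# ...but the sum of their logarithms is a perfectly ordinary number.
print(sum(math.log(p) for p in probs))  # about -1151.3
```

This is why, in practice, we compute with log-likelihoods rather than raw likelihoods.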

 

All the information provided is based on the Calculus for Machine Learning and Data Science course on Coursera by DeepLearning.AI.
