

Calculus for Machine Learning and Data Science (5)


Derivatives and Optimization

Optimization

Introduction to optimization


We use derivatives for optimization.

Using the sauna example, we want to find the coolest place.

Optimization means finding the maximum or the minimum of a function.

Local minima: low points with zero slope that are not the lowest point of the function (orange arrows in the course slides)

Global minimum: the lowest point, which also has zero slope (blue arrow in the course slides)

In machine learning, we optimize the model to minimize the error so that it best fits the given dataset.
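To make the local/global distinction concrete, here is a minimal Python sketch (my own illustration, not from the course): the assumed polynomial $f(x) = x^4 - x^2 + 0.2x$ has two zero-slope low points, and comparing $f$ at the critical points tells the local minimum apart from the global one.

```python
import numpy as np

# Example polynomial (an assumed illustration, not from the course):
# f(x) = x^4 - x^2 + 0.2x has two low points with zero slope.
f = lambda x: x**4 - x**2 + 0.2 * x

# Critical points are the roots of f'(x) = 4x^3 - 2x + 0.2.
crit = np.roots([4.0, 0.0, -2.0, 0.2])
crit = crit[np.isreal(crit)].real  # keep only the real roots

for x in sorted(crit):
    print(f"x = {x:+.3f}  f(x) = {f(x):+.3f}")
# The zero-slope point with the smallest f(x) is the global minimum;
# the other low point is only a local minimum.
```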

Optimization of squared loss - The one powerline problem


With one power line, the cost of placing the house at position $x$ is the squared distance $(x-a)^2$, so the cost is minimized by placing the house directly on the power line.

Optimization of squared loss - The two powerline problem


With two power lines, we take the squared differences between the house position and each power line and add them up.

We square the differences so that the cost is always positive and easy to differentiate.
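Written out, with the two power lines at positions $a$ and $b$ (the same notation the quiz below uses for three lines):

$$E(x) = (x-a)^2 + (x-b)^2, \qquad E'(x) = 2(x-a) + 2(x-b) = 0 \;\Rightarrow\; x = \frac{a+b}{2}$$

So the optimal spot for the house is the midpoint of the two power lines.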

Optimization of squared loss - The three powerline problem


The process for three power lines is the same as for two.

What is the cost function of this problem?

Answer

$(x-a)^2+(x-b)^2+(x-c)^2$

Correct. The cost function of this problem is the sum of the costs of connecting to each power line, i.e. the total sum of squares.

Where should you put the house to minimize the cost?

Answer

$x = {a+b+c\over 3}$

Correct. The derivative of the cost function is $2(x-a)+2(x-b)+2(x-c)$, and setting this equal to 0 gives $x = {a+b+c\over 3}$.
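As a sanity check, here is a small Python sketch (the positions $a$, $b$, $c$ are assumed example values): a dense grid search lands on the same minimizer as the calculus answer, the mean of the three positions.

```python
import numpy as np

# Assumed example positions for the three power lines.
a, b, c = 1.0, 3.0, 8.0

cost = lambda x: (x - a)**2 + (x - b)**2 + (x - c)**2

x_star = (a + b + c) / 3              # the calculus answer: the mean

xs = np.linspace(0.0, 10.0, 100_001)  # dense grid over [0, 10]
x_grid = xs[np.argmin(cost(xs))]      # grid point with the lowest cost

print(x_star, x_grid)                 # both print ~4.0
```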


Optimization of log-loss - Part 1

Which of the three coins would maximize your chances of winning?

Answer

Coin 1

Correct. Since you win when you see 7 heads and 3 tails, choosing the coin with the greater chance of landing heads makes sense.


We use calculus to maximize the likelihood function $g(p)$: take its derivative and set it to zero.

Taking the logarithm of $g$ simplifies the problem, since products become sums.

We use the negative log loss because $\log g(p)$ is negative when $p$ is between 0 and 1; putting a negative sign in front makes the output positive, and we then minimize it instead of maximizing it, as in the coin-toss example.
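A minimal Python sketch of this lesson's idea (the grid search stands in for setting the derivative to zero): for 7 heads and 3 tails, minimizing the negative log loss over $p$ recovers $p = 0.7$.

```python
import numpy as np

# Coin-toss example: 7 heads and 3 tails. The likelihood of this outcome
# for a coin with heads-probability p is g(p) = p^7 * (1 - p)^3.
heads, tails = 7, 3

def neg_log_loss(p):
    # Negative log-likelihood: -(7*log(p) + 3*log(1 - p)).
    return -(heads * np.log(p) + tails * np.log(1 - p))

ps = np.linspace(0.001, 0.999, 999)
p_best = ps[np.argmin(neg_log_loss(ps))]
print(p_best)  # ~0.7, i.e. 7/(7+3), matching the calculus solution
```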

Optimization of log-loss - Part 2


Products of numbers between 0 and 1 get smaller and smaller, and once they get too small, computers may not be able to represent them well.

With logarithms, however, the logarithm of a very small positive number is just a very large negative number, which computers can handle.

So with complicated products, we may want to take the logarithm, which turns products into sums and simplifies the formula, especially when computing derivatives.
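A quick Python demonstration of the underflow issue (the specific probabilities are just assumed example values):

```python
import math

# Multiplying many small probabilities underflows to 0.0 in floating point...
probs = [1e-5] * 100
print(math.prod(probs))                 # 0.0 (the true value, 1e-500, is too small)

# ...but the sum of their logarithms is a perfectly ordinary number.
print(sum(math.log(p) for p in probs))  # about -1151.3
```

This is why, in practice, we compute with log-likelihoods rather than raw likelihoods.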

 

All the information provided is based on the Calculus for Machine Learning and Data Science course on Coursera by DeepLearning.AI.
