

Calculus for Machine Learning and Data Science (9)


Optimization in Neural Networks and Newton’s Method

Optimization in Neural Networks

Regression with a perceptron


A perceptron can be seen as linear regression: the inputs are multiplied by weights, we output a prediction using the formula wx + b, and we optimize the weights (w) and the bias (b).

We can think of a perceptron as a single node that performs this computation.
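As a minimal sketch (the helper name `predict` and the NumPy usage are my own, not the course's), the forward pass is just a weighted sum plus a bias:

```python
import numpy as np

# Forward pass of a regression perceptron: a weighted sum of the inputs plus a bias.
def predict(x, w, b):
    return np.dot(w, x) + b

x = np.array([2.0, 3.0])     # two input features
w = np.array([0.5, -1.0])    # one weight per input
b = 0.1                      # bias
y_hat = predict(x, w, b)     # 0.5*2.0 + (-1.0)*3.0 + 0.1 = -1.9
```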

Regression with a perceptron - Loss function


A reason we multiply the squared error by 1/2 is that taking the derivative of the squared error brings down a lingering factor of 2; the 1/2 cancels it out when we compute the derivative.
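Written out (with $y$ the label and $\hat y$ the prediction), the cancellation is:

$$L(y, \hat y) = \frac{1}{2}(y - \hat y)^2 \quad\Rightarrow\quad \frac{\partial L}{\partial \hat y} = \frac{1}{2}\cdot 2\,(y - \hat y)\cdot(-1) = \hat y - y$$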

We minimize the error/loss using the gradient descent method.

Regression with a perceptron - Gradient Descent

Performing gradient descent with a perceptron

To get the partial derivatives, we apply the chain rule and then use them to update the variables, as in the sketch below.
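Here is a minimal NumPy sketch of one such update, assuming the squared loss from the previous section; the name `gradient_step` and the learning rate value are illustrative:

```python
import numpy as np

def gradient_step(x, y, w, b, lr=0.01):
    """One gradient descent update for a regression perceptron with squared loss.

    Chain rule: dL/dw = dL/dy_hat * dy_hat/dw = (y_hat - y) * x
                dL/db = dL/dy_hat * dy_hat/db = (y_hat - y)
    """
    y_hat = np.dot(w, x) + b           # forward pass
    dw = (y_hat - y) * x               # partial derivative w.r.t. the weights
    db = y_hat - y                     # partial derivative w.r.t. the bias
    return w - lr * dw, b - lr * db    # move against the gradient

w, b = np.array([0.5, -1.0]), 0.1
for _ in range(100):
    w, b = gradient_step(np.array([2.0, 3.0]), 1.0, w, b, lr=0.05)
```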

Classification with Perceptron

For classification, we add an activation function on top of the linear regression formula.

Steps of forward pass (input to output)


To transform the continuous number (the output of the linear part), we apply an activation function (here, a sigmoid), which squashes it into the range between 0 and 1 and makes the model non-linear.
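A minimal sketch of that forward pass (the names `sigmoid` and `predict_proba` are illustrative, not from the course):

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, w, b):
    z = np.dot(w, x) + b   # same linear formula as in regression
    return sigmoid(z)      # probability between 0 and 1
```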

Classification with Perceptron - The sigmoid function

The horizontal axis is the input and the vertical axis is the output.


On the last slide, we add and subtract 1 in the numerator to split the fraction, cancel the common terms, and factor the result to simplify it.
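Concretely, with $\sigma(z) = \frac{1}{1+e^{-z}}$, the add-and-subtract-1 trick looks like this:

$$\sigma'(z) = \frac{e^{-z}}{(1+e^{-z})^2} = \frac{(1+e^{-z}) - 1}{(1+e^{-z})^2} = \frac{1}{1+e^{-z}} - \frac{1}{(1+e^{-z})^2} = \sigma(z)\bigl(1 - \sigma(z)\bigr)$$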

Classification with Perceptron - Gradient Descent

We could reuse the squared loss from the regression problem, but for classification we use the log loss function instead.


We use the log loss because the math works out nicely and because it has a probabilistic interpretation: the output of a classification problem is a probability.
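For a label $y \in \{0, 1\}$ and a predicted probability $\hat y$, the log loss is:

$$L(y, \hat y) = -y \ln(\hat y) - (1 - y)\ln(1 - \hat y)$$

When $y = 1$ only the first term remains, penalizing predictions close to 0; when $y = 0$ only the second term remains, penalizing predictions close to 1.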

Classification with Perceptron - Calculating the derivatives


Recall that the derivative of the sigmoid is the sigmoid times one minus the sigmoid; since $\hat y$ is the sigmoid of the linear part, that factor becomes $\hat y(1 - \hat y)$, multiplied by the derivative of the inner part.
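Putting the chain rule pieces together (with $z = wx + b$ and $\hat y = \sigma(z)$), the terms cancel and the derivatives simplify:

$$\frac{\partial L}{\partial w} = \frac{\partial L}{\partial \hat y}\cdot\frac{\partial \hat y}{\partial z}\cdot\frac{\partial z}{\partial w} = \left(-\frac{y}{\hat y} + \frac{1-y}{1-\hat y}\right)\hat y(1-\hat y)\,x = (\hat y - y)\,x$$

$$\frac{\partial L}{\partial b} = \hat y - y$$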

Classification with a Neural Network

Forward pass of the 2-layer neural network

A neural network consists of multiple perceptrons.
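A minimal NumPy sketch of such a forward pass, assuming sigmoid activations in both layers and illustrative layer sizes (2 inputs, 3 hidden units, 1 output):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward pass of a 2-layer network: each layer is a group of perceptrons."""
    a1 = sigmoid(W1 @ x + b1)    # hidden layer activations (layer [1])
    a2 = sigmoid(W2 @ a1 + b2)   # output layer: a single probability (layer [2])
    return a1, a2

# Example: 2 inputs -> 3 hidden units -> 1 output
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)
a1, y_hat = forward(np.array([0.5, -1.2]), W1, b1, W2, b2)
```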

Classification with a Neural Network - Minimizing log-loss

Taking the partial derivatives with respect to the weights and biases

We update the weights and biases to minimize the loss function (the log loss), which is done by performing gradient descent.
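Here is a minimal sketch of one such update for the 2-layer network above, with log loss and sigmoid activations; the names and the learning rate are illustrative, not the course's code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, y, W1, b1, W2, b2, lr=0.1):
    """One backpropagation + gradient descent update with log loss."""
    # Forward pass (same as the sketch above).
    a1 = sigmoid(W1 @ x + b1)
    a2 = sigmoid(W2 @ a1 + b2)

    # Output layer: with sigmoid + log loss, the error term is simply (y_hat - y).
    dz2 = a2 - y
    dW2, db2 = np.outer(dz2, a1), dz2

    # Hidden layer: propagate the error backwards through W2 and the sigmoid.
    dz1 = (W2.T @ dz2) * a1 * (1 - a1)
    dW1, db1 = np.outer(dz1, x), dz1

    # Gradient descent: step each parameter against its partial derivative.
    return W1 - lr * dW1, b1 - lr * db1, W2 - lr * dW2, b2 - lr * db2
```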

Gradient Descent and Backpropagation

Superscript numbers in square brackets denote the layer number.

Backpropagation: the process of updating the weights and biases based on the computed loss, using partial derivatives (the chain rule) and gradient descent.

 

All the information provided is based on the Calculus for Machine Learning and Data Science course on Coursera, from DeepLearning.AI.
