Optimization in Neural Networks and Newton’s Method
Optimization in Neural Networks
Regression with a perceptron
A perceptron can be seen as a linear regression, where each input is multiplied by a weight. We output a prediction using the formula $\hat y = wx + b$ and optimize the weights ($w$) and bias ($b$).
We can think of a perceptron as a single node that performs this computation.
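As a minimal sketch (the variable names and two-feature example are my own illustration, not from the course), the prediction step looks like this:

```python
# Minimal regression perceptron: prediction is a weighted sum of inputs plus a bias.
def predict(x, w, b):
    """Return the perceptron output w . x + b for a list of inputs."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Example with two input features (values chosen arbitrarily for illustration).
x = [1.0, 2.0]   # inputs
w = [0.5, -0.3]  # weights
b = 0.1          # bias
y_hat = predict(x, w, b)  # 0.5*1.0 + (-0.3)*2.0 + 0.1 = 0.0
```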
Regression with a perceptron - Loss function
One reason we multiply the squared error by 1/2 is that taking the derivative of the squared term leaves a lingering factor of 2. By multiplying the squared error by 1/2, we cancel out that 2 when computing the derivative.
We minimize the error/loss using the gradient descent method.
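Concretely, with the one-half factor the loss and its derivative with respect to the prediction are (standard form, writing the error as $\hat y - y$):

$$
L(y, \hat y) = \frac{1}{2}(\hat y - y)^2, \qquad \frac{\partial L}{\partial \hat y} = \hat y - y
$$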
Regression with a perceptron - Gradient Descent
We use the chain rule to compute the partial derivatives of the loss with respect to $w$ and $b$, and then update those variables in the direction that reduces the loss.
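For a single-input perceptron with $\hat y = wx + b$, the chain rule gives the following partial derivatives and update rules (learning rate $\alpha$):

$$
\frac{\partial L}{\partial w} = (\hat y - y)\,x, \qquad
\frac{\partial L}{\partial b} = \hat y - y
$$

$$
w \leftarrow w - \alpha \frac{\partial L}{\partial w}, \qquad
b \leftarrow b - \alpha \frac{\partial L}{\partial b}
$$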
Classification with Perceptron
For classification, we add an activation function on top of the linear regression formula.
To turn the continuous output of the linear part into a value between 0 and 1, we apply an activation function (here, the sigmoid), which also makes the model non-linear.
Classification with Perceptron - The sigmoid function
In the plot of the sigmoid, the horizontal axis is the input and the vertical axis is the output.
To differentiate the sigmoid, we can add and subtract 1 in the numerator to split the fraction, cancel the common terms, and factor the result, which simplifies to $\sigma'(z) = \sigma(z)(1 - \sigma(z))$.
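Step by step, the derivation (standard calculus, matching the result quoted below) is:

$$
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
\sigma'(z) = \frac{e^{-z}}{(1 + e^{-z})^2}
= \frac{(1 + e^{-z}) - 1}{(1 + e^{-z})^2}
= \frac{1}{1 + e^{-z}} - \frac{1}{(1 + e^{-z})^2}
= \sigma(z)\bigl(1 - \sigma(z)\bigr)
$$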
Classification with Perceptron - Gradient Descent
We could reuse the squared loss from the regression problem, but for classification we use the log loss function instead.
We use log loss because the math works out nicely and because it has a probabilistic interpretation: the output of the classification problem is a probability.
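For a label $y \in \{0, 1\}$ and a predicted probability $\hat y$, the log loss is:

$$
L(y, \hat y) = -y \ln \hat y - (1 - y) \ln (1 - \hat y)
$$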
Classification with Perceptron - Calculating the derivatives
Recall that the derivative of the sigmoid is $\sigma(z)(1 - \sigma(z))$; since $\hat y = \sigma(wx + b)$, this term becomes $\hat y(1 - \hat y)$, and the chain rule multiplies it by the derivative of the inner linear part.
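Putting the pieces together with the chain rule, the terms cancel neatly (which is part of why the log loss is convenient), and similarly $\partial L/\partial b = \hat y - y$:

$$
\frac{\partial L}{\partial w}
= \frac{\partial L}{\partial \hat y}\,\frac{\partial \hat y}{\partial z}\,\frac{\partial z}{\partial w}
= \left(-\frac{y}{\hat y} + \frac{1 - y}{1 - \hat y}\right)\hat y(1 - \hat y)\,x
= (\hat y - y)\,x
$$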
Classification with a Neural Network
A neural network consists of multiple perceptrons.
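As a minimal sketch (the layer sizes and parameter values are my own illustration, not from the course), a forward pass through a two-layer network of sigmoid perceptrons might look like this:

```python
import math

def sigmoid(z):
    """Sigmoid activation, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(x, weights, biases):
    """One layer: each perceptron computes sigmoid(w . x + b)."""
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            for w, b in zip(weights, biases)]

# Hypothetical network: 2 inputs -> 2 hidden perceptrons -> 1 output perceptron.
x = [0.5, -1.0]
W1, b1 = [[0.1, 0.4], [-0.2, 0.3]], [0.0, 0.1]   # hidden layer parameters
W2, b2 = [[0.7, -0.5]], [0.2]                    # output layer parameters

hidden = layer(x, W1, b1)
y_hat = layer(hidden, W2, b2)[0]   # predicted probability
```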
Classification with a Neural Network - Minimizing log-loss
We update the weights and biases to minimize the loss function (the log loss), which is done by performing gradient descent.
Gradient Descent and Backpropagation
Backpropagation: the process of updating the weights and biases based on the computed loss, using gradient descent and partial derivatives obtained via the chain rule.
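As a minimal sketch of one gradient descent step for a single sigmoid perceptron, using the gradients derived above (the learning rate and training data are illustrative assumptions):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(x, y, w, b, lr=0.1):
    """One gradient descent step on the log loss for a sigmoid perceptron."""
    # Forward pass: compute the predicted probability.
    y_hat = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    # Backward pass: for log loss + sigmoid, dL/dz simplifies to (y_hat - y).
    error = y_hat - y
    w = [wi - lr * error * xi for wi, xi in zip(w, x)]  # dL/dw_i = (y_hat - y) * x_i
    b = b - lr * error                                   # dL/db   = (y_hat - y)
    return w, b

# Illustrative usage: repeatedly fit a single labelled example.
w, b = [0.0, 0.0], 0.0
for _ in range(100):
    w, b = train_step([1.0, 2.0], 1, w, b)
```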
All the information provided is based on the Calculus for Machine Learning and Data Science course on Coursera from DeepLearning.AI.