
Coursera/Deep Learning Specialization

Neural Networks and Deep Learning (2)


Neural Networks Basics

Set up a machine learning problem with a neural network mindset and use vectorization to speed up your models.

Learning Objectives


  • Build a logistic regression model structured as a shallow neural network
  • Build the general architecture of a learning algorithm, including parameter initialization, cost function, gradient calculation, and optimization implementation (gradient descent)
  • Implement computationally efficient and highly vectorized versions of models
  • Compute derivatives for logistic regression, using a backpropagation mindset
  • Use NumPy functions and NumPy matrix/vector operations
  • Work with IPython Notebooks
  • Implement vectorization across multiple training examples
  • Explain the concept of broadcasting

Logistic Regression as a Neural Network

Binary Classification

Example: cat vs. not-cat image classification.

Our goal in binary classification is to assign each input to one of two classes, usually labeled 1 or 0 (e.g., cat or not cat).

Logistic Regression

Given an input x, we want to find $\hat y$, the probability that the target y equals 1. We start from a linear function of the input, $z = w^\top x + b$ (the same form as the straight-line equation y = ax + b, just written with w and b).

The parameters are w (weights), which multiplies the input, and b (bias), an offset added to it.

We then apply the sigmoid function to z so that the predicted output $\hat y = \sigma(w^\top x + b)$ always falls between 0 and 1.
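
A minimal NumPy sketch of this prediction for a single example (the function and variable names here are my own illustrations, not the course's starter code):

```python
import numpy as np

def sigmoid(z):
    # squashes any real number into the range (0, 1)
    return 1 / (1 + np.exp(-z))

def predict(w, b, x):
    # w: (n_x,) weight vector, b: scalar bias, x: (n_x,) input features
    z = np.dot(w, x) + b      # linear part: z = w^T x + b
    return sigmoid(z)         # y_hat = P(y = 1 | x)
```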

 

What are the parameters of logistic regression?

Answer:

W, an $n_x$ dimensional vector, and b, a real number.

Logistic Regression Cost Function

The squared error might seem like a reasonable choice of loss function, but for logistic regression it makes the optimization problem non-convex, with possibly multiple local optima, so gradient descent does not work well with it.
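
Instead, the loss used for logistic regression is the cross-entropy (log) loss, and the cost averages it over the m training examples:

$$\mathcal{L}(\hat y, y) = -\bigl(y \log \hat y + (1 - y)\log(1 - \hat y)\bigr), \qquad J(w, b) = \frac{1}{m}\sum_{i=1}^{m}\mathcal{L}(\hat y^{(i)}, y^{(i)})$$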

 

What is the difference between the cost function and the loss function for logistic regression?

Answer:

The loss function computes the error for a single training example; the cost function is the average of the loss functions of the entire training set.
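
A small vectorized sketch of that cost computation in NumPy (the (n_x, m) shape convention follows the course slides; the function and variable names are my own):

```python
import numpy as np

def compute_cost(w, b, X, Y):
    # X: (n_x, m) matrix of m training examples, Y: (1, m) labels in {0, 1}
    # w: (n_x, 1) column vector of weights, b: scalar bias
    m = X.shape[1]
    A = 1 / (1 + np.exp(-(np.dot(w.T, X) + b)))   # (1, m) predictions
    # average cross-entropy loss over all m examples
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    return cost
```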

Gradient Descent


We initialize the parameters w and b with some values and then search for the values that minimize the cost function.

For logistic regression, we can initialize w and b to any value (typically zero), because the cost function is convex and any starting point leads to the same global optimum.

For non-convex cost functions, we usually initialize w and b with small random values instead.

Gradient descent repeatedly updates the parameters, stepping downhill until it reaches the global optimum.

Each update subtracts the partial derivative of the cost with respect to each parameter, scaled by the learning rate: $w := w - \alpha \frac{\partial J}{\partial w}$ and $b := b - \alpha \frac{\partial J}{\partial b}$.

The learning rate $\alpha$ controls the step size: large enough that progress is not too slow, but small enough that we don't overshoot as we move in the direction opposite the gradient.
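
A rough NumPy sketch of one such update step for logistic regression (the vectorized gradients dw and db follow the course's derivation; the function and variable names are illustrative):

```python
import numpy as np

def gradient_descent_step(w, b, X, Y, learning_rate):
    # X: (n_x, m), Y: (1, m), w: (n_x, 1), b: scalar
    m = X.shape[1]
    A = 1 / (1 + np.exp(-(np.dot(w.T, X) + b)))   # forward pass, shape (1, m)
    dZ = A - Y                                    # prediction error per example
    dw = np.dot(X, dZ.T) / m                      # (n_x, 1) gradient of J w.r.t. w
    db = np.sum(dZ) / m                           # scalar gradient of J w.r.t. b
    # move opposite the gradient, scaled by the learning rate
    w = w - learning_rate * dw
    b = b - learning_rate * db
    return w, b
```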

 

True or false. A convex function always has multiple local optima.

Answer:

False

Derivatives

A derivative can be thought of as a slope (height over width).

It tells us how much the height changes when we nudge the width.

Here the function value f(a) plays the role of the height, and the variable a plays the role of the width.

 

On a straight line, the function's derivative…

Answer:

doesn’t change

More Derivative Examples


If the function is not a straight line, the slope depends on where you are on the curve instead of staying constant.

With our example $f(a) = a^2$, the derivative is $2a$: at $a = 2$ the slope is 4, but at $a = 5$ it is 10, so the slope changes along the curve rather than being the same everywhere.

So the derivative tells us how much the function value (height) moves when the input (width) is nudged by a tiny amount.

And we can picture the derivative as the slope of the tangent line to the curve at that point.
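
A quick numerical sanity check of the $a^2$ example (this snippet is my own, not from the course): nudging $a$ by a tiny amount and measuring how much $a^2$ moves gives approximately the slope $2a$.

```python
def numerical_slope(f, a, eps=1e-6):
    # approximate derivative: how much f moves when a moves by eps
    return (f(a + eps) - f(a)) / eps

f = lambda a: a ** 2
print(numerical_slope(f, 2))   # ~4.0, matches 2a at a = 2
print(numerical_slope(f, 5))   # ~10.0, matches 2a at a = 5
```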

 

All the information provided is based on the Deep Learning Specialization | Coursera from DeepLearning.AI
