
Neural Networks and Deep Learning (8)

by Fresh Red 2024. 11. 26.

Table of Contents

    Shallow Neural Networks

    Quiz

    Q1

    Which of the following is true? (Check all that apply.)

    1. $a^{[2]}$ denotes the activation vector of the 2nd layer.
    2. $a^{[2](12)}$ denotes the activation vector of the 2nd layer for the 12th training example.
    3. $a^{[2](12)}$ denotes activation vector of the 12th layer on the 2nd training example.
    4. $a^{[2]}_4$ is the activation output of the 2nd layer for the 4th training example
    5. $X$ is a matrix in which each row is one training example.
    6. $a^{[2]}_4$ is the activation output by the 4th neuron of the 2nd layer
    7. $X$ is a matrix in which each column is one training example.

    Answer

    1, 2, 6, 7

    Q2

    Which of the following is true? (Check all that apply.)

    1. $w^{[4]}_3$ is the row vector of parameters of the fourth layer and third neuron.
    2. $w^{[4]}_3$ is the column vector of parameters of the fourth layer and third neuron.
    3. $a^{[3](2)}$ denotes the activation vector of the second layer for the third example.
    4. $w^{[4]}_3$ is the column vector of parameters of the third layer and fourth neuron.
    5. $a^{[2]}$ denotes the activation vector of the second layer.
    6. $a^{[2]}_3$ denotes the activation vector of the second layer for the third example.

    Hint

    The vectors $w^{[j]}_k$ are column vectors

    Answer

    2

    Yes. The vector $w^{[i]}_j$ is the column vector of parameters of the i-th layer and j-th neuron of that layer.

    5

    Yes. In our convention $a^{[j]}$ denotes the activation vector of the j-th layer.

    Q3

    The sigmoid function is only mentioned as an activation function for historical reasons. The tanh is always preferred without exceptions in all the layers of a Neural Network. True/False?

    Answer

    False

    Yes. Although tanh almost always works better than the sigmoid function in hidden layers and is therefore usually preferred as an activation function, the exception is the output layer of a classification problem, where the sigmoid is still used.

    Q4

    The tanh activation is not always better than the sigmoid activation function for hidden units because the mean of its output is closer to zero, and so it centers the data, making learning complex for the next layer. True/False?

    Answer

    False

    Yes. As seen in the lecture, the output of tanh is between -1 and 1; it thus centers the data, which makes learning simpler for the next layer.
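
    As a rough illustration (my own sketch, not part of the quiz), the snippet below compares the mean output of sigmoid and tanh on zero-mean inputs: tanh keeps its outputs centered around zero, while sigmoid shifts them toward 0.5.

    import numpy as np

    np.random.seed(0)
    z = np.random.randn(100_000)      # zero-mean pre-activations

    sigmoid = 1 / (1 + np.exp(-z))    # outputs in (0, 1)
    tanh = np.tanh(z)                 # outputs in (-1, 1)

    print(sigmoid.mean())             # ~0.5 -> not centered
    print(tanh.mean())                # ~0.0 -> centered around zero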

    Q5

    Which of these is a correct vectorized implementation of forward propagation for layer $l$, where $1 \le l \le L$?

    1. $Z^{[l]} = W^{[l]}A^{[l]} + b^{[l]} \\ A^{[l+1]} = g^{[l+1]}(Z^{[l]})$
    2. $Z^{[l]} = W^{[l]}A^{[l]} + b^{[l-1]} \\ A^{[l+1]} = g^{[l]}(Z^{[l]})$
    3. $Z^{[l]} = W^{[l-1]}A^{[l]} + b^{[l-1]} \\ A^{[l]} = g^{[l]}(Z^{[l]})$
    4. $Z^{[l]} = W^{[l]}A^{[l-1]} + b^{[l]} \\ A^{[l]} = g^{[l]}(Z^{[l]})$

    Answer

    4

    Yes. The general forward propagation step for layer $l$ is $Z^{[l]} = W^{[l]}A^{[l-1]} + b^{[l]}$ followed by $A^{[l]} = g^{[l]}(Z^{[l]})$: the layer's weights act on the previous layer's activations, and the layer's own activation function is applied to its own $Z^{[l]}$.
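
    As a minimal sketch (not from the course materials), this is how that vectorized step might look in NumPy, assuming the lecture's shape conventions: $W^{[l]}$ has shape $(n^{[l]}, n^{[l-1]})$ and each column of $A^{[l-1]}$ is one training example. The function and variable names are my own.

    import numpy as np

    def forward_layer(A_prev, W, b, g):
        """One vectorized forward step: Z = W @ A_prev + b, A = g(Z)."""
        Z = W @ A_prev + b            # b has shape (n_l, 1) and broadcasts over the example columns
        A = g(Z)
        return A, Z

    # Toy usage: a layer with 4 units fed by 3 input features, over 5 examples
    A_prev = np.random.randn(3, 5)
    W = np.random.randn(4, 3) * 0.01
    b = np.zeros((4, 1))
    A, Z = forward_layer(A_prev, W, b, np.tanh)
    print(A.shape)                    # (4, 5)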

    Q6

    The use of the ReLU activation function is becoming rare because the ReLU function has no derivative for $c=0$. True/False?

    Answer

    False

    Yes. Although the ReLU function has no derivative at $c=0$ this rarely causes any problems in practice. Moreover, it has become the default activation function in many cases, as explained in the lectures.
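
    For illustration only (a sketch of my own, not course code): in practice implementations simply pick a value for the "derivative" at 0, typically 0 or 1, and this works fine.

    import numpy as np

    def relu(z):
        return np.maximum(0, z)

    def relu_grad(z):
        # Convention: define the gradient at z == 0 to be 0; any value in [0, 1] works in practice
        return (z > 0).astype(float)

    z = np.array([-2.0, 0.0, 3.0])
    print(relu(z))        # [0. 0. 3.]
    print(relu_grad(z))   # [0. 0. 1.]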

    Q7

    Using linear activation functions in the hidden layers of a multilayer neural network is equivalent to using a single layer. True/False?

    Answer

    True

    Yes. When the identity or linear activation function $g(c) = c$ is used, the output of the composition of layers is equivalent to the computation made by a single layer.
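
    A quick numerical check (my own sketch, not part of the quiz): two stacked layers with the identity activation collapse into a single linear layer with weights $W^{[2]}W^{[1]}$ and bias $W^{[2]}b^{[1]} + b^{[2]}$.

    import numpy as np

    np.random.seed(1)
    x = np.random.randn(3, 1)
    W1, b1 = np.random.randn(4, 3), np.random.randn(4, 1)
    W2, b2 = np.random.randn(2, 4), np.random.randn(2, 1)

    # Two "layers" with a linear (identity) activation
    two_layers = W2 @ (W1 @ x + b1) + b2

    # Equivalent single linear layer
    single_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)

    print(np.allclose(two_layers, single_layer))   # True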

    Q8

    When building a binary classifier for recognizing cats (y=1) vs. raccoons (y=0), using the sigmoid function as an activation function for the hidden layers is better. True/False?

    Answer

    False

    Yes. Using tanh almost always works better than the sigmoid function for hidden layers.

    Q9

    Consider the following code:

    x = np.random.rand(3, 2)
    y = np.sum(x, axis=0, keepdims=True)

    What will be y.shape?

    1. (1, 2)
    2. (3, 1)
    3. (3,)
    4. (2,)

    Answer

    1

    Yes. With axis=0 the sum is computed down each column of the array, so the result is a row of 2 entries, one per column. Since keepdims=True is used, the first dimension is kept, giving shape (1, 2).
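
    A quick check of my own (not part of the quiz), contrasting the two keepdims settings:

    import numpy as np

    x = np.random.rand(3, 2)
    print(np.sum(x, axis=0, keepdims=True).shape)   # (1, 2) -- summed axis kept with size 1
    print(np.sum(x, axis=0).shape)                  # (2,)   -- summed axis dropped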

    Q10

    Consider the following code:

    x = np.random.rand(4, 5)
    y = np.sum(x, axis=1)

    What will be y.shape?

    1. (1, 5)
    2. (5,)
    3. (4,)
    4. (4, 1)

    Answer

    3

    Yes. With axis=1 the sum is computed across each row of the array, giving 4 values, one per row. Since keepdims was not used, the summed dimension is dropped and the result is a 1-D array of shape (4,).
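
    And the corresponding check for this case (again my own, for illustration):

    import numpy as np

    x = np.random.rand(4, 5)
    print(np.sum(x, axis=1).shape)                  # (4,)   -- summed axis dropped
    print(np.sum(x, axis=1, keepdims=True).shape)   # (4, 1) -- summed axis kept with size 1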

    Q11

    Suppose you have built a neural network. You decide to initialize the weights and biases to be zero. Which of the following statements is true?

    1. Each neuron in the first hidden layer will compute the same thing, but neurons in different layers will compute different things, thus we have accomplished “symmetry breaking” as described in the lecture.
    2. Each neuron in the first hidden layer will perform the same computation. So even after multiple iterations of gradient descent, each neuron in the layer will be computing the same thing as other neurons.
    3. Each neuron in the first hidden layer will perform the same computation in the first iteration. But after one iteration of gradient descent, they will learn to compute different things because we have “broken symmetry”.
    4. The first hidden layer’s neurons will perform different computations from each other even in the first iteration; their parameters will thus keep evolving in their own way.

    Answer

    2

    Yes. With all weights and biases initialized to zero, every neuron in the first hidden layer sees the same inputs with the same parameters, so it computes the same output and receives the same gradient. Gradient descent therefore updates them identically, and the symmetry is never broken.
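
    A small numerical illustration (my own sketch, assuming a 1-hidden-layer network with tanh hidden units and a sigmoid output): with an all-zero initialization, the rows of $W^{[1]}$ stay identical to one another no matter how many gradient-descent steps are taken.

    import numpy as np

    np.random.seed(2)
    m = 10
    X = np.random.randn(2, m)                       # 2 features, m examples
    Y = (np.random.rand(1, m) > 0.5).astype(float)

    W1, b1 = np.zeros((4, 2)), np.zeros((4, 1))     # zero initialization
    W2, b2 = np.zeros((1, 4)), np.zeros((1, 1))

    for _ in range(100):                            # plain gradient descent
        A1 = np.tanh(W1 @ X + b1)
        A2 = 1 / (1 + np.exp(-(W2 @ A1 + b2)))      # sigmoid output
        dZ2 = A2 - Y
        dW2, db2 = dZ2 @ A1.T / m, dZ2.sum(axis=1, keepdims=True) / m
        dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
        dW1, db1 = dZ1 @ X.T / m, dZ1.sum(axis=1, keepdims=True) / m
        W1 -= 0.5 * dW1; b1 -= 0.5 * db1
        W2 -= 0.5 * dW2; b2 -= 0.5 * db2

    print(W1)   # every row is identical -- the hidden units never learn different features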

    Q12

    Suppose you have built a neural network with one hidden layer and tanh as an activation function for the hidden layer. You decide to initialize the weights to small random numbers and the biases to zero. The first hidden layer’s neurons will perform different computations from each other even in the first iteration. True/False?

    1. False
    2. True

    Answer

    2

    Yes. Since the weights are initialized to small random values, they are most likely different from each other, so each neuron in the first hidden layer performs a different computation already in the first iteration.

    Q13

    A single-output, single-layer neural network that uses the sigmoid function as its activation is equivalent to logistic regression. True/False?

    Answer

    True

    Yes. The logistic regression model can be expressed as $\hat y = \sigma(Wx + b)$. This is the same as $a^{[1]} = \sigma(W^{[1]}X + b^{[1]})$.

    Q14

    You have built a network using the tanh activation for all the hidden units. You initialize the weights to relatively large values, using np.random.randn(..,..)*1000. What will happen?

    1. This will cause the inputs of the tanh to also be very large, thus causing gradients to be close to zero. The optimization algorithm will thus become slow.
    2. So long as you initialize the weights randomly gradient descent is not affected by whether the weights are large or small.
    3. This will cause the inputs of the tanh to also be very large, thus causing gradients to also become large. You therefore have to set $α$ to a very small value to prevent divergence; this will slow down learning.
    4. This will cause the inputs of the tanh to also be very large, causing the units to be “highly activated” and thus speed up learning compared to if the weights had to start from small values.

    Answer

    1

    Yes. tanh becomes flat for large input values, which causes its gradient to be close to zero. This slows down the optimization algorithm.
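
    A quick sanity check (my own sketch): scaling the weights by 1000 makes the pre-activations huge, tanh saturates at ±1, and its derivative $1 - \tanh^2(z)$ becomes essentially zero, so almost no gradient flows back.

    import numpy as np

    np.random.seed(3)
    x = np.random.randn(100, 1)

    W_small = np.random.randn(50, 100) * 0.01
    W_large = np.random.randn(50, 100) * 1000

    for W in (W_small, W_large):
        z = W @ x
        grad = 1 - np.tanh(z) ** 2        # derivative of tanh at the pre-activations
        print(np.abs(grad).mean())        # close to 1 for the small weights, ~0 for the large ones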

    Q15

    Which of the following is true about the ReLU activation functions?

    1. They cause several problems in practice because they have no derivative at 0. That is why Leaky ReLU was invented.
    2. They are only used in the case of regression problems, such as predicting house prices.
    3. They are the go-to option when you don't know what activation function to choose for hidden layers.
    4. They are increasingly being replaced by the tanh in most cases.

    Answer

    3

    Yes. As explained in the lectures, ReLU is the default choice of activation function for hidden layers when you are not sure which one to use.

    All the information provided is based on the Deep Learning Specialization | Coursera from DeepLearning.AI
