
Neural Networks and Deep Learning (8)

by Fresh Red 2024. 11. 26.

Table of Contents

    Shallow Neural Networks

    Quiz

    Q1

    Which of the following is true? (Check all that apply.)

    1. $a^{[2]}$ denotes the activation vector of the 2nd layer.
    2. $a^{[2](12)}$ denotes the activation vector of the 2nd layer for the 12th training example.
    3. $a^{[2](12)}$ denotes activation vector of the 12th layer on the 2nd training example.
    4. $a^{[2]}_4$ is the activation output of the 2nd layer for the 4th training example
    5. $X$ is a matrix in which each row is one training example.
    6. $a^{[2]}_4$ is the activation output by the 4th neuron of the 2nd layer
    7. $X$ is a matrix in which each column is one training example.

    Answer

    1, 2, 6, 7

    Q2

    Which of the following is true? (Check all that apply.)

    1. $w^{[4]}_3$ is the row vector of parameters of the fourth layer and third neuron.
    2. $w^{[4]}_3$ is the column vector of parameters of the fourth layer and third neuron.
    3. $a^{[3](2)}$ denotes the activation vector of the second layer for the third example.
    4. $w^{[4]}_3$ is the column vector of parameters of the third layer and fourth neuron.
    5. $a^{[2]}$ denotes the activation vector of the second layer.
    6. $a^{[2]}_3$ denotes the activation vector of the second layer for the third example.

    Hint

    The vectors $w^{[j]}_k$ are column vectors

    Answer

    2

    Yes. The vector $w^{[i]}_j$ is the column vector of parameters of the i-th layer and j-th neuron of that layer.

    5

    Yes. In our convention $a^{[j]}$ denotes the activation vector of the j-th layer.

    Q3

    The sigmoid function is only mentioned as an activation function for historical reasons. The tanh is always preferred without exceptions in all the layers of a Neural Network. True/False?

    Answer

    False

    Yes. Although tanh almost always works better than the sigmoid function in hidden layers and is therefore usually preferred as an activation function, the exception is the output layer of a classification problem, where the sigmoid is still used.

    Q4

    The tanh activation is not always better than the sigmoid activation function for hidden units because the mean of its output is closer to zero, and so it centers the data, making learning complex for the next layer. True/False?

    Answer

    False

    Yes. As seen in the lecture, the output of tanh is between -1 and 1; it thus centers the data, which makes learning simpler for the next layer.
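
    As a rough illustration (my own sketch, not part of the quiz), the snippet below compares the mean output of sigmoid and tanh on zero-mean inputs: tanh keeps its outputs centered around zero, while sigmoid shifts them toward 0.5.

    import numpy as np

    np.random.seed(0)
    z = np.random.randn(100_000)      # zero-mean pre-activations

    sigmoid = 1 / (1 + np.exp(-z))    # outputs in (0, 1)
    tanh = np.tanh(z)                 # outputs in (-1, 1)

    print(sigmoid.mean())             # ~0.5 -> not centered
    print(tanh.mean())                # ~0.0 -> centered around zero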

    Q5

    Which of these is a correct vectorized implementation of forward propagation for layer $l$, where $1 \le l \le L$?

    1. $Z^{[l]} = W^{[l]}A^{[l]} + b^{[l]} \\ A^{[l+1]} = g^{[l+1]}(Z^{[l]})$
    2. $Z^{[l]} = W^{[l]}A^{[l]} + b^{[l-1]} \\ A^{[l+1]} = g^{[l]}(Z^{[l]})$
    3. $Z^{[l]} = W^{[l-1]}A^{[l]} + b^{[l-1]} \\ A^{[l]} = g^{[l]}(Z^{[l]})$
    4. $Z^{[l]} = W^{[l]}A^{[l-1]} + b^{[l]} \\ A^{[l]} = g^{[l]}(Z^{[l]})$

    Answer

    4

    Yes. The general forward propagation step for layer $l$ is $Z^{[l]} = W^{[l]}A^{[l-1]} + b^{[l]}$ followed by $A^{[l]} = g^{[l]}(Z^{[l]})$: the layer's weights act on the previous layer's activations, and the layer's own activation function is applied to its own $Z^{[l]}$.
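
    As a minimal sketch (not from the course materials), this is how that vectorized step might look in NumPy, assuming the lecture's shape conventions: $W^{[l]}$ has shape $(n^{[l]}, n^{[l-1]})$ and each column of $A^{[l-1]}$ is one training example. The function and variable names are my own.

    import numpy as np

    def forward_layer(A_prev, W, b, g):
        """One vectorized forward step: Z = W @ A_prev + b, A = g(Z)."""
        Z = W @ A_prev + b            # b has shape (n_l, 1) and broadcasts over the example columns
        A = g(Z)
        return A, Z

    # Toy usage: a layer with 4 units fed by 3 input features, over 5 examples
    A_prev = np.random.randn(3, 5)
    W = np.random.randn(4, 3) * 0.01
    b = np.zeros((4, 1))
    A, Z = forward_layer(A_prev, W, b, np.tanh)
    print(A.shape)                    # (4, 5)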

    Q6

    The use of the ReLU activation function is becoming rare because the ReLU function has no derivative for $c=0$. True/False?

    Answer

    False

    Yes. Although the ReLU function has no derivative at $c=0$ this rarely causes any problems in practice. Moreover, it has become the default activation function in many cases, as explained in the lectures.
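
    For illustration only (a sketch of my own, not course code): in practice implementations simply pick a value for the "derivative" at 0, typically 0 or 1, and this works fine.

    import numpy as np

    def relu(z):
        return np.maximum(0, z)

    def relu_grad(z):
        # Convention: define the gradient at z == 0 to be 0; any value in [0, 1] works in practice
        return (z > 0).astype(float)

    z = np.array([-2.0, 0.0, 3.0])
    print(relu(z))        # [0. 0. 3.]
    print(relu_grad(z))   # [0. 0. 1.]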

    Q7

    Using linear activation functions in the hidden layers of a multilayer neural network is equivalent to using a single layer. True/False?

    Answer

    True

    Yes. When the identity or linear activation function $g(c) = c$ is used, the output of the composition of layers is equivalent to the computation made by a single layer.
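
    A quick numerical check (my own sketch, not part of the quiz): two stacked layers with the identity activation collapse into a single linear layer with weights $W^{[2]}W^{[1]}$ and bias $W^{[2]}b^{[1]} + b^{[2]}$.

    import numpy as np

    np.random.seed(1)
    x = np.random.randn(3, 1)
    W1, b1 = np.random.randn(4, 3), np.random.randn(4, 1)
    W2, b2 = np.random.randn(2, 4), np.random.randn(2, 1)

    # Two "layers" with a linear (identity) activation
    two_layers = W2 @ (W1 @ x + b1) + b2

    # Equivalent single linear layer
    single_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)

    print(np.allclose(two_layers, single_layer))   # True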

    Q8

    When building a binary classifier for recognizing cats (y=1) vs. raccoons (y=0), using the sigmoid function as an activation function for the hidden layers is better. True/False?

    Answer

    False

    Yes. Using tanh almost always works better than the sigmoid function for hidden layers.

    Q9

    Consider the following code:

    x = np.random.rand(3, 2)
    y = np.sum(x, axis=0, keepdims=True)

    What will be y.shape?

    1. (1, 2)
    2. (3, 1)
    3. (3,)
    4. (2,)

    Answer

    1

    Yes. With axis=0 the sum is computed down each column of the array, so the result is a row of 2 entries, one per column. Since keepdims=True is used, the first dimension is kept, giving shape (1, 2).
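
    A quick check of my own (not part of the quiz), contrasting the two keepdims settings:

    import numpy as np

    x = np.random.rand(3, 2)
    print(np.sum(x, axis=0, keepdims=True).shape)   # (1, 2) -- summed axis kept with size 1
    print(np.sum(x, axis=0).shape)                  # (2,)   -- summed axis dropped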

    Q10

    Consider the following code:

    x = np.random.rand(4, 5)
    y = np.sum(x, axis=1)

    What will be y.shape?

    1. (1, 5)
    2. (5,)
    3. (4,)
    4. (4, 1)

    Answer

    3

    Yes. With axis=1 the sum is computed across each row of the array, giving 4 values, one per row. Since keepdims was not used, the summed dimension is dropped and the result is a 1-D array of shape (4,).
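
    And the corresponding check for this case (again my own, for illustration):

    import numpy as np

    x = np.random.rand(4, 5)
    print(np.sum(x, axis=1).shape)                  # (4,)   -- summed axis dropped
    print(np.sum(x, axis=1, keepdims=True).shape)   # (4, 1) -- summed axis kept with size 1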

    Q11

    Suppose you have built a neural network. You decide to initialize the weights and biases to be zero. Which of the following statements is true?

    1. Each neuron in the first hidden layer will compute the same thing, but neurons in different layers will compute different things, thus we have accomplished “symmetry breaking” as described in the lecture.
    2. Each neuron in the first hidden layer will perform the same computation. So even after multiple iterations of gradient descent, each neuron in the layer will be computing the same thing as other neurons.
    3. Each neuron in the first hidden layer will perform the same computation in the first iteration. But after one iteration of gradient descent, they will learn to compute different things because we have “broken symmetry”.
    4. The first hidden layer’s neurons will perform different computations from each other even in the first iteration; their parameters will thus keep evolving in their own way.

    Answer

    2

    Yes. With all weights and biases initialized to zero, every neuron in the first hidden layer sees the same inputs with the same parameters, so it computes the same output and receives the same gradient. Gradient descent therefore updates them identically, and the symmetry is never broken.
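
    A small numerical illustration (my own sketch, assuming a 1-hidden-layer network with tanh hidden units and a sigmoid output): with an all-zero initialization, the rows of $W^{[1]}$ stay identical to one another no matter how many gradient-descent steps are taken.

    import numpy as np

    np.random.seed(2)
    m = 10
    X = np.random.randn(2, m)                       # 2 features, m examples
    Y = (np.random.rand(1, m) > 0.5).astype(float)

    W1, b1 = np.zeros((4, 2)), np.zeros((4, 1))     # zero initialization
    W2, b2 = np.zeros((1, 4)), np.zeros((1, 1))

    for _ in range(100):                            # plain gradient descent
        A1 = np.tanh(W1 @ X + b1)
        A2 = 1 / (1 + np.exp(-(W2 @ A1 + b2)))      # sigmoid output
        dZ2 = A2 - Y
        dW2, db2 = dZ2 @ A1.T / m, dZ2.sum(axis=1, keepdims=True) / m
        dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
        dW1, db1 = dZ1 @ X.T / m, dZ1.sum(axis=1, keepdims=True) / m
        W1 -= 0.5 * dW1; b1 -= 0.5 * db1
        W2 -= 0.5 * dW2; b2 -= 0.5 * db2

    print(W1)   # every row is identical -- the hidden units never learn different features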

    Q12

    Suppose you have built a neural network with one hidden layer and tanh as an activation function for the hidden layer. You decide to initialize the weights to small random numbers and the biases to zero. The first hidden layer’s neurons will perform different computations from each other even in the first iteration. True/False?

    1. False
    2. True

    Answer

    2

    Yes. Since the weights are initialized to small random values, they are most likely different from each other, so each neuron in the first hidden layer performs a different computation already in the first iteration.

    Q13

    A single-output, single-layer neural network that uses the sigmoid function as its activation is equivalent to logistic regression. True/False?

    Answer

    True

    Yes. The logistic regression model can be expressed as $\hat y = \sigma(Wx + b)$. This is the same as $a^{[1]} = \sigma(W^{[1]}X + b^{[1]})$.

    Q14

    You have built a network using the tanh activation for all the hidden units. You initialize the weights to relatively large values, using np.random.randn(..,..)*1000. What will happen?

    1. This will cause the inputs of the tanh to also be very large, thus causing gradients to be close to zero. The optimization algorithm will thus become slow.
    2. So long as you initialize the weights randomly gradient descent is not affected by whether the weights are large or small.
    3. This will cause the inputs of the tanh to also be very large, thus causing gradients to also become large. You therefore have to set $α$ to a very small value to prevent divergence; this will slow down learning.
    4. This will cause the inputs of the tanh to also be very large, causing the units to be “highly activated” and thus speed up learning compared to if the weights had to start from small values.

    Answer

    1

    Yes. tanh becomes flat for large input values, which causes its gradient to be close to zero. This slows down the optimization algorithm.
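
    A quick sanity check (my own sketch): scaling the weights by 1000 makes the pre-activations huge, tanh saturates at ±1, and its derivative $1 - \tanh^2(z)$ becomes essentially zero, so almost no gradient flows back.

    import numpy as np

    np.random.seed(3)
    x = np.random.randn(100, 1)

    W_small = np.random.randn(50, 100) * 0.01
    W_large = np.random.randn(50, 100) * 1000

    for W in (W_small, W_large):
        z = W @ x
        grad = 1 - np.tanh(z) ** 2        # derivative of tanh at the pre-activations
        print(np.abs(grad).mean())        # close to 1 for the small weights, ~0 for the large ones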

    Q15

    Which of the following is true about the ReLU activation functions?

    1. They cause several problems in practice because they have no derivative at 0. That is why Leaky ReLU was invented.
    2. They are only used in the case of regression problems, such as predicting house prices.
    3. They are the go-to option when you don't know what activation function to choose for hidden layers.
    4. They are increasingly being replaced by the tanh in most cases.

    Answer

    3

    Yes. As explained in the lectures, ReLU is the default choice of activation function for hidden layers when you are not sure which one to use.

    All the information provided is based on the Deep Learning Specialization | Coursera from DeepLearning.AI
