Deep Neural Networks
Quiz
Q1
What is stored in the 'cache' during forward propagation for later use in backward propagation?
- $W^{[l]}$
- $Z^{[l]}$
- $b^{[l]}$
- $A^{[l]}$
Answer
2
Yes. This value is useful in the calculation of $dW^{[l]}$ in the backward propagation.
Q2
We use the “cache” in implementing forward and backward propagation to pass useful values to the next layer in the forward propagation. True/False?
Answer
False
Correct. The "cache" is used in our implementation to store values computed during forward propagation to be used in backward propagation.
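A minimal sketch of the idea (function and variable names are illustrative, not the exact course API): the forward step for one layer stashes the values it computed, and the backward step for the same layer consumes them.

```python
import numpy as np

def layer_forward(A_prev, W, b):
    # One layer's forward step (ReLU used here for illustration).
    Z = W @ A_prev + b
    A = np.maximum(0, Z)
    cache = (A_prev, W, Z)            # stored now, consumed only during backprop
    return A, cache

def layer_backward(dA, cache):
    # One layer's backward step: the cached Z^[l], A^[l-1] and W^[l]
    # are exactly what the gradient formulas need.
    A_prev, W, Z = cache
    m = A_prev.shape[1]
    dZ = dA * (Z > 0)                 # ReLU derivative needs the cached Z^[l]
    dW = (dZ @ A_prev.T) / m          # dW^[l] needs the cached A^[l-1]
    db = dZ.sum(axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ                # dA^[l-1] needs the cached W^[l]
    return dA_prev, dW, db
```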
Q3
Which ones are "hyperparameters"? (Check all that apply.)
- activation values $a^{[l]}$
- learning rate $\alpha$
- size of the hidden layers $n^{[l]}$
- weight matrices $W^{[l]}$
- bias vectors $b^{[l]}$
- number of iterations
- number of layers $L$ in the neural network
Answer
2, 3, 6, 7
Hyperparameters are the settings we choose before training, such as the learning rate, the layer sizes, the number of iterations, and the number of layers; they control how the parameters are learned but are not themselves learned by gradient descent.
Q4
Which of the following are the “parameters” of a neural network? (Check all that apply.)
- $W^{[l]}$ the weight matrices.
- $g^{[l]}$ the activation functions.
- $L$ the number of layers of the neural network.
- $b^{[l]}$ the bias vector.
Answer
1, 4
Correct. The weight matrices and the bias vectors are the parameters of the network.
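To make the distinction concrete, here is a minimal sketch (the layer sizes and values are made up for illustration): the hyperparameters are fixed by the practitioner, while the parameters $W^{[l]}$, $b^{[l]}$ are initialized and then updated by training.

```python
import numpy as np

# Hyperparameters: chosen before training, not learned by gradient descent.
layer_dims = [784, 20, 7, 5, 1]   # n^[l] for l = 0..L (illustrative sizes)
learning_rate = 0.0075            # alpha
num_iterations = 2500

# Parameters: initialized once, then updated on every training iteration.
parameters = {}
for l in range(1, len(layer_dims)):
    parameters[f"W{l}"] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
    parameters[f"b{l}"] = np.zeros((layer_dims[l], 1))
```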
Q5
Which of the following statements is true?
- The deeper layers of a neural network are typically computing more complex features of the input than the earlier layers.
- The earlier layers of a neural network are typically computing more complex features of the input than the deeper layers.
Answer
1
Q6
Vectorization allows us to compute $a^{[l]}$ for all the examples in a batch at the same time without using a for loop. True/False?
Answer
True
Correct. Vectorization allows us to compute the activation for all the training examples at the same time, avoiding the use of a for loop.
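A small numpy sketch of what this means (the shapes are illustrative): looping over examples and a single matrix product give the same result, but the vectorized form handles all $m$ columns at once.

```python
import numpy as np

n_prev, n_l, m = 4, 3, 5                   # illustrative layer sizes and batch size
W = np.random.randn(n_l, n_prev)
b = np.zeros((n_l, 1))
A_prev = np.random.randn(n_prev, m)        # one column per training example

# Explicit loop over the m examples.
Z_loop = np.zeros((n_l, m))
for i in range(m):
    Z_loop[:, i:i + 1] = W @ A_prev[:, i:i + 1] + b

# Vectorized: all m examples in one matrix product.
Z_vec = W @ A_prev + b
assert np.allclose(Z_loop, Z_vec)
```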
Q7
Vectorization allows you to compute forward propagation in an $L$-layer neural network without an explicit for-loop (or any other explicit iterative loop) over the layers $l=1, 2, …, L$. True/False?
Answer
False
Forward propagation pushes the input through the layers one at a time. For a shallow network we could write out every line explicitly ($z^{[2]} = W^{[2]}a^{[1]} + b^{[2]}$, $a^{[2]} = g^{[2]}(z^{[2]})$, $\dots$), but in a deeper network we cannot avoid a for loop iterating over the layers: $z^{[l]} = W^{[l]}a^{[l-1]} + b^{[l]}$, $a^{[l]} = g^{[l]}(z^{[l]})$ for $l = 1, \dots, L$.
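A minimal sketch of why the layer loop remains (names are illustrative, not the exact course API): each $A^{[l]}$ depends on $A^{[l-1]}$, so the layers must be computed in order.

```python
import numpy as np

def L_model_forward(X, parameters, L):
    # The loop over layers l = 1..L cannot be removed by vectorization:
    # A^[l] depends on A^[l-1], so the layers are computed sequentially.
    A = X
    caches = []
    for l in range(1, L + 1):
        Z = parameters[f"W{l}"] @ A + parameters[f"b{l}"]
        A = np.maximum(0, Z) if l < L else 1 / (1 + np.exp(-Z))  # ReLU hidden layers, sigmoid output
        caches.append(Z)
    return A, caches
```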
Q8
During forward propagation, in the forward function for layer $l$ you need to know what is the activation function in a layer (sigmoid, tanh, ReLU, etc.). During backpropagation, the corresponding backward function also needs to know what the activation function for the layer $l$ is, since the gradient depends on it. True/False?
Answer
True
Yes, as you've seen in week 3, each activation has a different derivative. Thus, during backpropagation, you need to know which activation was used in the forward propagation to be able to compute the correct derivative.
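A sketch of the activation-specific backward steps (the function names mirror the style of the programming assignments, but treat them as illustrative): each one turns $dA^{[l]}$ into $dZ^{[l]} = dA^{[l]} \cdot g^{[l]\prime}(Z^{[l]})$, and $g'$ differs per activation.

```python
import numpy as np

def relu_backward(dA, Z):
    # ReLU: g'(z) = 1 if z > 0, else 0.
    return dA * (Z > 0)

def sigmoid_backward(dA, Z):
    # Sigmoid: g'(z) = g(z) * (1 - g(z)).
    s = 1 / (1 + np.exp(-Z))
    return dA * s * (1 - s)

def tanh_backward(dA, Z):
    # tanh: g'(z) = 1 - tanh(z)^2.
    return dA * (1 - np.tanh(Z) ** 2)
```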
Q9
If $L$ is the number of layers of a neural network then $dZ^{[L]} = A^{[L]} - Y$. True/False?
Answer
True
Yes. The gradient of the output layer depends on the difference between the value computed during the forward propagation process and the target values.
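This identity holds for the usual setup of a sigmoid output unit with the binary cross-entropy cost $\mathcal{L} = -\,Y\log A^{[L]} - (1-Y)\log(1-A^{[L]})$; a one-line derivation:

$$
dA^{[L]} = -\frac{Y}{A^{[L]}} + \frac{1-Y}{1-A^{[L]}}, \qquad
dZ^{[L]} = dA^{[L]} \cdot g^{[L]\prime}(Z^{[L]}) = dA^{[L]} \cdot A^{[L]}\left(1-A^{[L]}\right) = A^{[L]} - Y.
$$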
Q10
For any mathematical function you can compute with an $L$-layered deep neural network with $N$ hidden units there is a shallow neural network that requires only $\log N$ units, but it is very difficult to train. True/False?
Answer
False
Correct. In fact the opposite holds: some mathematical functions can be computed by an $L$-layered neural network with a modest number of hidden units, but a shallow network computing the same function needs a number of hidden units that grows exponentially.
Q11
There are certain functions with the following properties: (i) To compute the function using a shallow network circuit, you will need a large network (where we measure size by the number of logic gates in the network), but (ii) To compute it using a deep network circuit, you need only an exponentially smaller network. True/False?
Answer
True
Q12
A shallow neural network with a single hidden layer and 6 hidden units can compute any function that a neural network with 2 hidden layers and 6 hidden units can compute. True/False?
Hint
As seen in the lectures, there are functions that a "small" $L$-layer deep neural network can compute but that shallower networks need exponentially more hidden units to compute.
Answer
False
Q13
In the general case if we are training with $m$ examples what is the shape of $A^{[l]}$?
- $(n^{[l]}, m)$
- $(m, n^{[l]})$
- $(m, n^{[l+1]})$
- $(n^{[l+1]}, m)$
Hint
The number of rows in $A^{[l]}$ corresponds to the number of units in the $l$-th layer.
Answer
1
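A quick shape check in numpy (the sizes are made up): each column of $A^{[l]}$ is one example and each row is one unit of layer $l$.

```python
import numpy as np

n_prev, n_l, m = 4, 3, 5                  # illustrative: n^[l-1], n^[l], batch size
W_l = np.random.randn(n_l, n_prev)        # W^[l] has shape (n^[l], n^[l-1])
b_l = np.zeros((n_l, 1))                  # b^[l] has shape (n^[l], 1), broadcast over columns
A_prev = np.random.randn(n_prev, m)       # A^[l-1] has shape (n^[l-1], m)

Z_l = W_l @ A_prev + b_l
A_l = np.maximum(0, Z_l)                  # ReLU as an example activation
assert A_l.shape == (n_l, m)              # A^[l] has shape (n^[l], m)
```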
All the information provided is based on the Deep Learning Specialization | Coursera from DeepLearning.AI