Andrew Ng Deep Learning Course: Collection of Commonly-Missed Quiz Questions
2018-02-08 10:42
561 views
1.Logistic regression’s weights w should be initialized randomly rather than to all zeros, because if you initialize to all zeros, then logistic regression will fail to learn a useful decision boundary because it will fail to “break symmetry”, True/False?
False. Logistic regression doesn't have a hidden layer, so there is no symmetry to break. If you initialize the weights to zeros, the first example x fed into the model produces the same output for every weight (σ(0) = 0.5), but the derivatives of logistic regression depend directly on the input x (because there is no hidden layer), and x is not zero. So after the first update, the weight values follow x's distribution and differ from each other, as long as x is not a constant vector.
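The argument above can be checked numerically. This is a minimal sketch with toy data (the shapes and learning rate are my own assumptions, not from the quiz): with w = 0, every output is σ(0) = 0.5, yet the gradient dw = Xᵀ(a − y)/m depends on X, so the weights become distinct after a single step.

```python
# Sketch (assumed toy setup): logistic regression with zero-initialized weights.
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))          # 5 examples, 3 features (toy data)
y = np.array([0., 1., 1., 0., 1.])

w = np.zeros(3)                      # zero initialization
b = 0.0
a = sigmoid(X @ w + b)               # every output is sigmoid(0) = 0.5
dw = X.T @ (a - y) / len(y)          # gradient depends on X, so entries differ

w -= 0.1 * dw                        # after one update the weights are distinct
print(np.allclose(a, 0.5))           # True
```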
2. If you have 10,000,000 examples, how would you split the train/dev/test set?
98% train, 1% dev, 1% test.
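The arithmetic behind this split is worth spelling out: at this scale, even 1% is 100,000 examples, which is plenty for dev and test to distinguish between models.

```python
# Quick check of the 98/1/1 split arithmetic for 10,000,000 examples.
m = 10_000_000
n_train = int(0.98 * m)
n_dev = int(0.01 * m)
n_test = m - n_train - n_dev
print(n_train, n_dev, n_test)  # 9800000 100000 100000
```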
3. On an exponentially weighted average curve, decreasing beta shifts the curve to the left.
My explanation: the higher beta is, the more each point's value is smeared into later points, so the average lags the data. Decreasing beta means averaging over roughly 1/(1 − β) fewer points, so later values "catch up" faster and the curve tracks recent data more closely; visually, it shifts to the left.
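This lag effect is easy to demonstrate. The sketch below (toy step signal, my own choice of beta values) computes v_t = β·v_{t−1} + (1 − β)·x_t for two betas; the low-beta curve rises toward the data much earlier, which is what the "shift left" describes.

```python
# Sketch: exponentially weighted average with two different beta values.
import numpy as np

def ewa(x, beta):
    v, out = 0.0, []
    for xt in x:
        v = beta * v + (1 - beta) * xt   # v_t = beta*v_{t-1} + (1-beta)*x_t
        out.append(v)
    return np.array(out)

x = np.ones(50)              # step signal: jumps from 0 to 1 at t = 0
slow = ewa(x, beta=0.98)     # averages over ~50 points, long lag
fast = ewa(x, beta=0.5)      # averages over ~2 points, short lag
print(fast[5] > slow[5])     # True: the low-beta curve rises earlier
```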
4. After setting up your train/dev/test sets, the City Council comes across another 1,000,000 images, called the "citizens' data". Apparently the citizens of Peacetopia are so scared of birds that they volunteered to take pictures of the sky and label them, thus contributing these additional 1,000,000 images. These images are different from the distribution of images the City Council had originally given you, but you think it could help your algorithm. You should not add the citizens' data to the training set, because this will cause the training and dev/test set distributions to become different, thus hurting dev and test set performance. True/False?
False. Adding data only to the training set leaves the dev/test distributions unchanged; as long as dev and test still reflect the data you care about, a mismatched training distribution is acceptable and the extra data can help.
5. One member of the City Council knows very little about machine learning, and thinks you should add the 1,000,000 citizens' data images to the test set. You object because: (B, C)
A. A bigger test set will slow down the speed of iterating because of the computational expense of evaluating models on the test set.
B. This would cause the dev and test set distributions to become different. This is a bad idea because you're not aiming where you want to hit.
C. The test set no longer reflects the distribution of data (security cameras) you care about most.
D. The 1,000,000 citizens' data images do not have a consistent x → y mapping compared to the rest of the data (similar to the New York City/Detroit housing prices example from the lectures).
6. Because pooling layers do not have parameters, they do not affect the backpropagation (derivatives) calculation.
False. Pooling layers have no parameters to learn, but they still affect the backpropagation calculation: max pooling routes the upstream gradient only to the input position that produced the maximum.
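A one-window sketch makes the gradient routing concrete (the window values and upstream gradient here are arbitrary toy numbers): the forward pass takes the max, and the backward pass sends the entire upstream gradient to the argmax position, zeros elsewhere.

```python
# Sketch: backprop through a single 2x2 max-pooling window.
import numpy as np

window = np.array([[1., 3.],
                   [2., 0.]])
upstream_grad = 5.0

# Forward: the pooled output is the max of the window.
out = window.max()  # 3.0

# Backward: the gradient flows only to the position that held the max.
grad = np.zeros_like(window)
grad[np.unravel_index(window.argmax(), window.shape)] = upstream_grad
print(grad)  # [[0. 5.], [0. 0.]]
```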
7.You have an input volume that is 32x32x16, and apply max pooling with a stride of 2 and a filter size of 2. What is the output volume?
16x16x16
Pooling operates on each channel independently (only over the two spatial dimensions), so the number of channels is unchanged.
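The output shape follows the standard convolution-arithmetic formula: with filter size f, stride s, and no padding, each spatial dimension becomes ⌊(n − f)/s⌋ + 1, while the channel count passes through untouched.

```python
# Shape check for pooling with filter size f and stride s (no padding).
def pool_output_shape(h, w, c, f, s):
    return ((h - f) // s + 1, (w - f) // s + 1, c)

print(pool_output_shape(32, 32, 16, f=2, s=2))  # (16, 16, 16)
```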
8.Which ones of the following statements on Residual Networks are true? (Check all that apply.)
- Using a skip-connection helps the gradient to backpropagate and thus helps you to train deeper networks. (True)
- A ResNet with L layers would have on the order of L² skip connections in total. (False)
- The skip-connections compute a complex non-linear function of the input to pass to a deeper layer in the network. (False)
- The skip-connection makes it easy for the network to learn an identity mapping between the input and the output within the ResNet block. (True)
9. You are working on a factory automation task. Your system will see a can of soft drink coming down a conveyor belt, and you want it to take a picture and decide whether (i) there is a soft-drink can in the image, and if so, (ii) its bounding box. Since the soft-drink can is round, the bounding box is always square, and the soft-drink can always appears the same size in the image. There is at most one soft-drink can in each image. Here are some typical images in your training set (images not shown).
What is the most appropriate set of output units for your neural network? Logistic unit, bx, by.
10. Alice proposes to simplify the GRU by always removing the Γu, i.e., setting Γu = 1. Betty proposes to simplify the GRU by removing the Γr, i.e., setting Γr = 1 always. Which of these models is more likely to work without vanishing gradient problems even when trained on very long input sequences?
Betty’s model (removing Γr), because if Γu≈0 for a timestep, the gradient can propagate back through that timestep without much decay.
Yes. For the signal to backpropagate without vanishing, we need c&lt;t&gt; to be highly dependent on c&lt;t−1&gt;.
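A toy scalar version of Betty's simplified GRU illustrates the point (the weight values and input are my own assumptions, not from the quiz): with Γr fixed to 1 but the update gate kept, setting Γu ≈ 0 gives c&lt;t&gt; ≈ c&lt;t−1&gt;, so the memory, and therefore its gradient, passes through many timesteps almost unchanged.

```python
# Minimal scalar sketch of Betty's simplified GRU (Gamma_r = 1, Gamma_u kept).
import numpy as np

def gru_step(c_prev, x, gamma_u, w=0.5, b=0.0):
    c_tilde = np.tanh(w * c_prev + w * x + b)     # candidate value (Gamma_r = 1)
    return gamma_u * c_tilde + (1 - gamma_u) * c_prev

c = 0.9
for _ in range(100):                  # 100 timesteps with the update gate shut
    c = gru_step(c, x=1.0, gamma_u=0.0)
print(c)  # the cell value survives 100 steps intact
```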
11. Suppose you have a 10,000-word vocabulary and are learning 500-dimensional word embeddings. The GloVe model minimizes this objective:
min Σ_{i=1}^{10,000} Σ_{j=1}^{10,000} f(X_ij) (θ_iᵀ e_j + b_i + b′_j − log X_ij)²
Which of these statements are correct? Check all that apply.
θ_i and e_j should be initialized randomly at the beginning of training.
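The objective above can be evaluated directly for a tiny vocabulary. In this sketch the vocabulary size, embedding dimension, co-occurrence counts, and parameters are all toy values of my own choosing, and f is the standard GloVe weighting f(x) = min((x/x_max)^0.75, 1); note both θ and e are initialized randomly, as the correct answer states.

```python
# Toy evaluation of the GloVe objective on a 3-word vocabulary.
import numpy as np

rng = np.random.default_rng(1)
V, d = 3, 4                                          # vocab size, embedding dim
X = rng.integers(1, 10, size=(V, V)).astype(float)   # co-occurrence counts

theta = rng.normal(size=(V, d))                      # both sets of embeddings
e = rng.normal(size=(V, d))                          # initialized randomly
b = rng.normal(size=V)
b_prime = rng.normal(size=V)

f = np.minimum((X / 100.0) ** 0.75, 1.0)             # weighting, x_max = 100
residual = theta @ e.T + b[:, None] + b_prime[None, :] - np.log(X)
loss = np.sum(f * residual ** 2)
print(loss >= 0)  # True: the objective is a weighted sum of squares
```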