This week's assignment has two parts:
- Building a convolutional neural network step by step
- Training a convolutional neural network with TensorFlow
Part 1: Convolutional Neural Networks: Step by Step
Main contents:
- Convolution functions:
  - Zero Padding
  - Convolve window
  - Convolution forward
  - Convolution backward (optional)
- Pooling functions:
  - Pooling forward
  - Create mask
  - Distribute value
  - Pooling backward (optional)
Convolutional Neural Networks
The main building blocks of a CNN
1. Zero Padding
First write a padding function that takes an input image tensor X and returns the zero-padded version. This is done with np.pad():

```python
a = np.pad(a, ((0,0), (1,1), (0,0), (3,3), (0,0)), 'constant', constant_values = (..,..))
```
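A minimal sketch of such a padding helper; the name zero_pad and the (m, n_H, n_W, n_C) input shape are assumptions following the assignment's conventions, not the graded solution:

```python
import numpy as np

def zero_pad(X, pad):
    # X has shape (m, n_H, n_W, n_C); pad only the height and width dimensions with zeros
    X_pad = np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)),
                   'constant', constant_values=(0, 0))
    return X_pad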
2. Single step of convolution
Implement a single convolution step: take one slice of the input, the same size as the filter, multiply it element-wise with the filter, sum everything up, and finally add the bias term.

```python
# GRADED FUNCTION: conv_single_step
```
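A sketch of how conv_single_step might look, assuming a_slice_prev and W share the shape (f, f, n_C_prev) and b is a (1, 1, 1) array:

```python
import numpy as np

def conv_single_step(a_slice_prev, W, b):
    # Element-wise product between the slice and the filter, then sum and add the bias
    s = a_slice_prev * W
    Z = np.sum(s)
    Z = Z + float(b)  # b is a (1, 1, 1) array; cast it to a scalar
    return Z
```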
3. Convolutional Neural Networks - Forward pass
Implement a full convolution pass by looping over all output positions and applying the single-step convolution to each slice. When slicing, be careful with the boundaries vert_start, vert_end, horiz_start and horiz_end.
Before coding, work out the shapes of A_prev, A, W and b; the hyperparameters are stride and pad:
$$ n_H = \lfloor \frac{n_{H_{prev}} - f + 2 \times pad}{stride} \rfloor +1 $$
$$ n_W = \lfloor \frac{n_{W_{prev}} - f + 2 \times pad}{stride} \rfloor +1 $$
$$ n_C = \text{number of filters used in the convolution}$$
```python
# GRADED FUNCTION: conv_forward
```
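A possible conv_forward implementation following the dimension formulas above; it assumes the zero_pad and conv_single_step helpers sketched earlier and a cache layout of (A_prev, W, b, hparameters):

```python
import numpy as np

def conv_forward(A_prev, W, b, hparameters):
    # A_prev: (m, n_H_prev, n_W_prev, n_C_prev), W: (f, f, n_C_prev, n_C), b: (1, 1, 1, n_C)
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    (f, f, n_C_prev, n_C) = W.shape
    stride = hparameters['stride']
    pad = hparameters['pad']

    # Output dimensions, floored as in the formulas above
    n_H = int((n_H_prev - f + 2 * pad) / stride) + 1
    n_W = int((n_W_prev - f + 2 * pad) / stride) + 1

    Z = np.zeros((m, n_H, n_W, n_C))
    A_prev_pad = zero_pad(A_prev, pad)

    for i in range(m):                      # loop over the batch
        a_prev_pad = A_prev_pad[i]
        for h in range(n_H):                # vertical position in the output
            for w in range(n_W):            # horizontal position in the output
                for c in range(n_C):        # output channel (filter index)
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    a_slice_prev = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]
                    Z[i, h, w, c] = conv_single_step(a_slice_prev, W[:, :, :, c], b[:, :, :, c])

    cache = (A_prev, W, b, hparameters)
    return Z, cache
```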
Pooling layer
Implement the pooling layer. Note that the output dimensions must be floored; use int() to convert the float results:
$$ n_H = \lfloor \frac{n_{H_{prev}} - f}{stride} \rfloor +1 $$
$$ n_W = \lfloor \frac{n_{W_{prev}} - f}{stride} \rfloor +1 $$
$$ n_C = n_{C_{prev}}$$
As before, take slices first; then handle the two modes, max and average, with np.max and np.mean respectively.
```python
def pool_forward(A_prev, hparameters, mode = "max"):
```
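A sketch of pool_forward under the same slicing pattern, assuming hparameters carries "f" and "stride":

```python
import numpy as np

def pool_forward(A_prev, hparameters, mode="max"):
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    f = hparameters["f"]
    stride = hparameters["stride"]

    # int() performs the floor in the formulas above
    n_H = int((n_H_prev - f) / stride) + 1
    n_W = int((n_W_prev - f) / stride) + 1
    n_C = n_C_prev

    A = np.zeros((m, n_H, n_W, n_C))
    for i in range(m):
        for h in range(n_H):
            for w in range(n_W):
                for c in range(n_C):
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    a_prev_slice = A_prev[i, vert_start:vert_end, horiz_start:horiz_end, c]
                    if mode == "max":
                        A[i, h, w, c] = np.max(a_prev_slice)
                    elif mode == "average":
                        A[i, h, w, c] = np.mean(a_prev_slice)

    cache = (A_prev, hparameters)
    return A, cache
```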
Backpropagation in convolutional neural networks
Backpropagation through a convolutional network is the hardest part to understand. Both the convolutional layer and the pooling layer need their own backward pass.
1. Convolutional layer backward pass
Suppose the output of the convolutional layer is $Z = W \times A + b$.
In the backward pass we need $dA$, $dW$ and $db$, where $dA$ is the gradient with respect to the input activation (one entry per input pixel), and the upstream gradient $dZ$ is assumed to be known.
1. Computing dA
From the formula, $dA = W \times dZ$. More precisely, each slice of $dA$ accumulates $W_c$ multiplied by $dZ$ over every pixel of the output image: in matrix terms, each product $W_c \times dZ_{hw}$ maps a single output pixel back onto the input slice (the same size as W) it was computed from. Hence:
$$ dA += \sum _{h=0} ^{n_H} \sum_{w=0} ^{n_W} W_c \times dZ_{hw} $$
```python
da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:,:,:,c] * dZ[i, h, w, c]
```
2. Computing dW
$dW = A \times dZ$. More concretely, because W acts on every pixel of Z, dW equals each input slice multiplied by the gradient of the corresponding output pixel, summed over all output pixels:
$$ dW_c += \sum _{h=0} ^{n_H} \sum_{w=0} ^ {n_W} a_{slice} \times dZ_{hw} $$
```python
dW[:,:,:,c] += a_slice * dZ[i, h, w, c]
```
3. Computing db
$$ db = \sum_h \sum_w dZ_{hw} $$
```python
db[:,:,:,c] += dZ[i, h, w, c]
```
Putting the three updates together:

```python
def conv_backward(dZ, cache):
```
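A sketch of conv_backward combining the three update rules, assuming the cache layout (A_prev, W, b, hparameters) from the forward pass and pad > 0:

```python
import numpy as np

def conv_backward(dZ, cache):
    # Unpack the forward-pass cache
    (A_prev, W, b, hparameters) = cache
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    (f, f, n_C_prev, n_C) = W.shape
    stride = hparameters['stride']
    pad = hparameters['pad']
    (m, n_H, n_W, n_C) = dZ.shape

    dA_prev = np.zeros(A_prev.shape)
    dW = np.zeros(W.shape)
    db = np.zeros(b.shape)

    A_prev_pad = zero_pad(A_prev, pad)
    dA_prev_pad = zero_pad(dA_prev, pad)

    for i in range(m):
        a_prev_pad = A_prev_pad[i]
        da_prev_pad = dA_prev_pad[i]
        for h in range(n_H):
            for w in range(n_W):
                for c in range(n_C):
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    a_slice = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]
                    # The three update rules derived above
                    da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:, :, :, c] * dZ[i, h, w, c]
                    dW[:, :, :, c] += a_slice * dZ[i, h, w, c]
                    db[:, :, :, c] += dZ[i, h, w, c]
        # Strip the padding to recover the gradient w.r.t. the unpadded input
        dA_prev[i, :, :, :] = da_prev_pad[pad:-pad, pad:-pad, :]

    return dA_prev, dW, db
```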
Pooling layer - backward pass
Max pooling and average pooling have to be handled separately here.
1. Max pooling - backward pass
Suppose the pool size is $2 \times 2$: only 1 of the 4 pixels survives and the other 3 have no effect on the output. So in max pooling, the gradient coming from the next layer flows only to the element that was the max, and it is passed on unchanged, while the gradient of the other 3 elements is 0.
Create a mask matrix that is 1 at the position of the maximum and 0 elsewhere; it then acts as the routing matrix for the gradient.
$$ X = \begin{bmatrix} 1 & 3 \\ 4 & 2 \end{bmatrix} \quad \rightarrow \quad M = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} $$
```python
def create_mask_from_window(x):
```
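A sketch of create_mask_from_window; the comparison with np.max produces a boolean mask that behaves like 1/0 when multiplied:

```python
import numpy as np

def create_mask_from_window(x):
    # True (i.e. 1) at the position of the maximum of x, 0 elsewhere
    mask = (x == np.max(x))
    return mask
```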
2. Average pooling - backward pass
Unlike max pooling, average pooling splits the incoming gradient into $n_H \times n_W$ equal shares, so there are many more values to compute and distribute than with max pooling, which is one reason max pooling is used far more often than average pooling.
$$ dZ = 1 \quad \rightarrow \quad dZ = \begin{bmatrix} 1/4 & 1/4 \\ 1/4 & 1/4 \end{bmatrix} $$
```python
def distribute_value(dz, shape):
```
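A sketch of distribute_value, assuming shape is the (n_H, n_W) window size:

```python
import numpy as np

def distribute_value(dz, shape):
    # Spread the scalar gradient dz evenly over a window of the given shape
    (n_H, n_W) = shape
    average = dz / (n_H * n_W)
    a = np.ones(shape) * average
    return a
```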
Combining the two modes:

```python
def pool_backward(dA, cache, mode = "max"):
```
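A sketch of pool_backward that dispatches to the two helpers above, assuming the forward cache is (A_prev, hparameters):

```python
import numpy as np

def pool_backward(dA, cache, mode="max"):
    (A_prev, hparameters) = cache
    stride = hparameters["stride"]
    f = hparameters["f"]
    m, n_H, n_W, n_C = dA.shape

    dA_prev = np.zeros(A_prev.shape)
    for i in range(m):
        a_prev = A_prev[i]
        for h in range(n_H):
            for w in range(n_W):
                for c in range(n_C):
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    if mode == "max":
                        # Route the gradient only to the max position of the window
                        a_prev_slice = a_prev[vert_start:vert_end, horiz_start:horiz_end, c]
                        mask = create_mask_from_window(a_prev_slice)
                        dA_prev[i, vert_start:vert_end, horiz_start:horiz_end, c] += mask * dA[i, h, w, c]
                    elif mode == "average":
                        # Distribute the gradient evenly over the window
                        da = dA[i, h, w, c]
                        dA_prev[i, vert_start:vert_end, horiz_start:horiz_end, c] += distribute_value(da, (f, f))

    return dA_prev
```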
Part 2: Convolutional Neural Networks: Application
Use TensorFlow to build and train the convolutional neural network.
1. Create placeholders
First create the placeholders used to feed X and Y during training.
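A sketch of what these placeholders could look like with the TensorFlow 1.x API; the function name create_placeholders and its argument names are assumptions following the assignment's style:

```python
import tensorflow as tf  # TensorFlow 1.x API, as used in this assignment

def create_placeholders(n_H0, n_W0, n_C0, n_y):
    # None for the batch dimension so the same graph handles any mini-batch size
    X = tf.placeholder(tf.float32, shape=[None, n_H0, n_W0, n_C0])
    Y = tf.placeholder(tf.float32, shape=[None, n_y])
    return X, Y
```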
2. Initialize parameters
Initialize the parameters, mainly the filters W1 and W2; no bias terms are used here.
Use W = tf.get_variable("W", [1,2,3,4], initializer = ...)
with tf.contrib.layers.xavier_initializer as the initializer.
```python
# GRADED FUNCTION: initialize_parameters
```
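A sketch of initialize_parameters along those lines; the filter shapes below ([4,4,3,8] and [2,2,8,16]) are only illustrative, not necessarily the graded values:

```python
import tensorflow as tf  # TensorFlow 1.x

def initialize_parameters():
    # Filter shapes follow (f, f, n_C_prev, n_C); the exact values here are assumptions
    W1 = tf.get_variable("W1", [4, 4, 3, 8],
                         initializer=tf.contrib.layers.xavier_initializer(seed=0))
    W2 = tf.get_variable("W2", [2, 2, 8, 16],
                         initializer=tf.contrib.layers.xavier_initializer(seed=0))
    parameters = {"W1": W1, "W2": W2}
    return parameters
```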
Remember this only builds the graph and does not actually initialize the parameters; during execution you still need:

```python
init = tf.global_variables_initializer()
sess_test.run(init)
```
3. Forward propagation
模型为:CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED
The Conv2D layers use stride 1 and "SAME" padding.
Functions used:
- tf.nn.conv2d(X, W1, strides = [1,s,s,1], padding = 'SAME'): given an input $X$ and a group of filters $W1$, this function convolves $W1$'s filters on X. The third input ([1,s,s,1]) represents the strides for each dimension of the input (m, n_H_prev, n_W_prev, n_C_prev). You can read the full documentation here.
- tf.nn.max_pool(A, ksize = [1,f,f,1], strides = [1,s,s,1], padding = 'SAME'): given an input A, this function uses a window of size (f, f) and strides of size (s, s) to carry out max pooling over each window. You can read the full documentation here.
- tf.nn.relu(Z1): computes the elementwise ReLU of Z1 (which can be any shape). You can read the full documentation here.
- tf.contrib.layers.flatten(P): given an input P, this function flattens each example into a 1D vector while maintaining the batch size. It returns a flattened tensor with shape [batch_size, k]. You can read the full documentation here.
- tf.contrib.layers.fully_connected(F, num_outputs): given the flattened input F, it returns the output computed using a fully connected layer. You can read the full documentation here.
In the last function above (tf.contrib.layers.fully_connected), the fully connected layer automatically initializes weights in the graph and keeps training them as you train the model. Hence, you do not need to initialize those weights when initializing the parameters.
```python
# GRADED FUNCTION: forward_propagation
```
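A sketch of forward_propagation built from these ops; the max-pool window/stride sizes (8 and 4) and the 6 output units are assumptions here, not values confirmed by the notes above:

```python
import tensorflow as tf  # TensorFlow 1.x

def forward_propagation(X, parameters):
    W1 = parameters['W1']
    W2 = parameters['W2']

    # CONV2D -> RELU -> MAXPOOL
    Z1 = tf.nn.conv2d(X, W1, strides=[1, 1, 1, 1], padding='SAME')
    A1 = tf.nn.relu(Z1)
    P1 = tf.nn.max_pool(A1, ksize=[1, 8, 8, 1], strides=[1, 8, 8, 1], padding='SAME')

    # CONV2D -> RELU -> MAXPOOL
    Z2 = tf.nn.conv2d(P1, W2, strides=[1, 1, 1, 1], padding='SAME')
    A2 = tf.nn.relu(Z2)
    P2 = tf.nn.max_pool(A2, ksize=[1, 4, 4, 1], strides=[1, 4, 4, 1], padding='SAME')

    # FLATTEN -> FULLYCONNECTED (no activation: the softmax is applied inside the cost)
    F = tf.contrib.layers.flatten(P2)
    Z3 = tf.contrib.layers.fully_connected(F, 6, activation_fn=None)
    return Z3
```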
4. Compute cost
- tf.nn.softmax_cross_entropy_with_logits(logits = Z3, labels = Y): computes the softmax cross-entropy loss. This function both computes the softmax activation function and the resulting loss, i.e. it does the softmax and the cross-entropy in one call. You can check the full documentation here.
- tf.reduce_mean: computes the mean of elements across dimensions of a tensor. Use this to sum the losses over all the examples to get the overall cost. You can check the full documentation here.
```python
# GRADED FUNCTION: compute_cost
```
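A sketch of compute_cost combining the two functions above:

```python
import tensorflow as tf  # TensorFlow 1.x

def compute_cost(Z3, Y):
    # Softmax cross-entropy per example, then the mean over the mini-batch
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=Z3, labels=Y))
    return cost
```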
5. Model
Combine all of the functions above into a complete model. random_mini_batches() is already provided, and the optimizer is

```python
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
```
```python
# GRADED FUNCTION: model
```
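A condensed sketch of what model could look like; the default hyperparameters and the random_mini_batches signature are assumptions, and per-epoch cost printing is omitted:

```python
import tensorflow as tf  # TensorFlow 1.x

def model(X_train, Y_train, X_test, Y_test, learning_rate=0.009,
          num_epochs=100, minibatch_size=64):
    (m, n_H0, n_W0, n_C0) = X_train.shape
    n_y = Y_train.shape[1]

    # Build the graph from the pieces above
    X, Y = create_placeholders(n_H0, n_W0, n_C0, n_y)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    cost = compute_cost(Z3, Y)
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)

    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        for epoch in range(num_epochs):
            minibatch_cost = 0.
            num_minibatches = int(m / minibatch_size)
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size)
            for (minibatch_X, minibatch_Y) in minibatches:
                # Run the optimizer and the cost on the current mini-batch
                _, temp_cost = sess.run([optimizer, cost],
                                        feed_dict={X: minibatch_X, Y: minibatch_Y})
                minibatch_cost += temp_cost / num_minibatches

        # Evaluate accuracy on the train and test sets
        predict_op = tf.argmax(Z3, 1)
        correct_prediction = tf.equal(predict_op, tf.argmax(Y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
        train_accuracy = accuracy.eval({X: X_train, Y: Y_train})
        test_accuracy = accuracy.eval({X: X_test, Y: Y_test})
        return train_accuracy, test_accuracy, parameters
```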
The training results are as shown in the figure.