@Perfect-Demo 2018-05-01T08:43:04.000000Z

deep_learning_month4_week1_convolution_model_step_by_step

Machine Learning / Deep Learning

The code has been uploaded to GitHub:
https://github.com/PerfectDemoT/my_deeplearning_homework


Note: this post explains how to build the layers of a CNN step by step. It only implements the individual building-block functions and does not assemble them into a usable model; if you are interested, you can certainly integrate them into a fairly powerful model yourself.

It covers the forward pass for convolution and pooling (both max pooling and average pooling), as well as the backward pass: computing the gradients (dA, dW, db) of the convolution layer's variables (A, W, b) and back-propagating through the pooling layer.

1. The forward convolution operation

1. Import the packages and set up plotting

The code is as follows; nothing to elaborate on here.

```python
import numpy as np
import h5py
import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = (5.0, 4.0)  # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

np.random.seed(1)
```

2. Zero-padding

Padding adds a specified number of pixels around the border of an image. Here is the explanation from the assignment:

It allows you to use a CONV layer without necessarily shrinking the height and width of the volumes. This is important for building deeper networks, since otherwise the height/width would shrink as you go to deeper layers. An important special case is the "same" convolution, in which the height/width is exactly preserved after one layer.
It helps us keep more of the information at the border of an image. Without padding, very few values at the next layer would be affected by pixels at the edges of an image.
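
For a "same" convolution with stride 1, the amount of padding follows directly from the output-size formula. A minimal sketch of that relationship (my own illustration, not part of the assignment):

```python
# For stride 1, a "same" convolution needs pad = (f - 1) / 2, so that
# n_H = n_H_prev - f + 2*pad + 1 equals n_H_prev.
def same_pad(f):
    assert f % 2 == 1, "odd filter sizes keep the padding symmetric"
    return (f - 1) // 2

print(same_pad(3))  # 1
print(same_pad(5))  # 2
```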

Now let's implement the zero-padding operation. The code:

```python
# GRADED FUNCTION: zero_pad

def zero_pad(X, pad):
    """
    Pad with zeros all images of the dataset X. The padding is applied to the height and width of an image,
    as illustrated in Figure 1.

    Argument:
    X -- python numpy array of shape (m, n_H, n_W, n_C) representing a batch of m images
    pad -- integer, amount of padding around each image on vertical and horizontal dimensions

    Returns:
    X_pad -- padded image of shape (m, n_H + 2*pad, n_W + 2*pad, n_C)
    """

    ### START CODE HERE ### (≈ 1 line)
    X_pad = np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)), 'constant', constant_values=0)
    # Since X is 4-dimensional, np.pad takes four (before, after) pairs: the (pad, pad) on the
    # height axis pads the top/bottom and the one on the width axis pads the left/right.
    ### END CODE HERE ###

    return X_pad
```

A note on how np.pad() is used here: since X is 4-dimensional, the second argument contains four (before, after) pairs, one per axis. The (pad, pad) pair on the height axis adds pad rows above and below, the one on the width axis adds pad columns on the left and right, and the batch and channel axes get (0, 0) so they are left untouched (see the small example below).
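
A quick sanity check of the per-axis padding widths on a plain 2D array (my own snippet, not from the assignment):

```python
# Pad a 2x2 matrix: one zero row on top/bottom, two zero columns left/right.
mat = np.array([[1, 2],
                [3, 4]])
mat_pad = np.pad(mat, ((1, 1), (2, 2)), 'constant', constant_values=0)
print(mat_pad.shape)  # (4, 6)
print(mat_pad)
# [[0 0 0 0 0 0]
#  [0 0 1 2 0 0]
#  [0 0 3 4 0 0]
#  [0 0 0 0 0 0]]
```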

Testing zero_pad gives the following:

```python
np.random.seed(1)
x = np.random.randn(4, 3, 3, 2)
x_pad = zero_pad(x, 2)
print("x.shape =", x.shape)
print("x_pad.shape =", x_pad.shape)
print("x[1,1] =", x[1,1])
print("x_pad[1,1] =", x_pad[1,1])

fig, axarr = plt.subplots(1, 2)
axarr[0].set_title('x')
axarr[0].imshow(x[0,:,:,0])
axarr[1].set_title('x_pad')
axarr[1].imshow(x_pad[0,:,:,0])
```

The output:

```
x.shape = (4, 3, 3, 2)
x_pad.shape = (4, 7, 7, 2)
x[1,1] = [[ 0.90085595 -0.68372786]
 [-0.12289023 -0.93576943]
 [-0.26788808  0.53035547]]
x_pad[1,1] = [[0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]]
```

3. A single convolution step

(This may be a bit anticlimactic, because this step is just an element-wise product followed by a sum...)
So let me just give the code; there is not much to explain.

```python
def conv_single_step(a_slice_prev, W, b):
    """
    Apply one filter defined by parameters W on a single slice (a_slice_prev) of the output activation
    of the previous layer.

    Arguments:
    a_slice_prev -- slice of input data of shape (f, f, n_C_prev)
    W -- Weight parameters contained in a window - matrix of shape (f, f, n_C_prev)
    b -- Bias parameters contained in a window - matrix of shape (1, 1, 1)

    Returns:
    Z -- a scalar value, result of convolving the sliding window (W, b) on a slice x of the input data
    """

    ### START CODE HERE ### (≈ 2 lines of code)
    # Element-wise product between a_slice and W. Add bias.
    s = np.multiply(a_slice_prev, W) + b
    # Sum over all entries of the volume s
    Z = np.sum(s)
    ### END CODE HERE ###

    return Z
```

Test it:

```python
np.random.seed(1)
a_slice_prev = np.random.randn(4, 4, 3)
W = np.random.randn(4, 4, 3)
b = np.random.randn(1, 1, 1)

Z = conv_single_step(a_slice_prev, W, b)
print("Z =", Z)
```

Which prints:

```
Z = -23.16021220252078
```

4. Forward propagation for the convolutional layer

(This does not include the fully connected layers.)

Note the following formulas for the dimensions of the output volume:

$$n_H = \left\lfloor \frac{n_{H_{prev}} - f + 2 \times pad}{stride} \right\rfloor + 1$$

$$n_W = \left\lfloor \frac{n_{W_{prev}} - f + 2 \times pad}{stride} \right\rfloor + 1$$

$$n_C = \text{number of filters used in the convolution}$$

Also pay attention to how each "slice" (a small patch of the input, which we may as well call a matrix slice) is extracted: the code derives its corner indices explicitly, and a tiny illustration follows.
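
To make the slicing concrete, here is a standalone toy example (my own, not from the assignment) of how the corner indices pick out an f x f patch:

```python
# Toy illustration: extract the (h=1, w=2) patch of a padded 7x7 single-channel image
# with f = 3 and stride = 2.
f, stride = 3, 2
h, w = 1, 2
a_prev_pad = np.arange(7 * 7 * 1).reshape(7, 7, 1)

vert_start = h * stride          # 2
vert_end = vert_start + f        # 5
horiz_start = w * stride         # 4
horiz_end = horiz_start + f      # 7

a_slice_prev = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]
print(a_slice_prev.shape)  # (3, 3, 1)
```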
Now the code:

```python
# GRADED FUNCTION: conv_forward

def conv_forward(A_prev, W, b, hparameters):
    """
    Implements the forward propagation for a convolution function

    Arguments:
    A_prev -- output activations of the previous layer, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    W -- Weights, numpy array of shape (f, f, n_C_prev, n_C)
    b -- Biases, numpy array of shape (1, 1, 1, n_C)
    hparameters -- python dictionary containing "stride" and "pad"

    Returns:
    Z -- conv output, numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache of values needed for the conv_backward() function
    """

    ### START CODE HERE ###
    # Retrieve dimensions from A_prev's shape (≈1 line)
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape

    # Retrieve dimensions from W's shape (≈1 line)
    (f, f, n_C_prev, n_C) = W.shape

    # Retrieve information from "hparameters" (≈2 lines)
    stride = hparameters['stride']
    pad = hparameters['pad']

    # Compute the dimensions of the CONV output volume using the formula given above. Hint: use int() to floor. (≈2 lines)
    n_H = 1 + int((n_H_prev + 2 * pad - f) / stride)
    n_W = 1 + int((n_W_prev + 2 * pad - f) / stride)

    # Initialize the output volume Z with zeros. (≈1 line)
    Z = np.zeros((m, n_H, n_W, n_C))

    # Create A_prev_pad by padding A_prev
    A_prev_pad = zero_pad(A_prev, pad)

    # Now perform the convolution
    for i in range(m):                      # loop over the batch of training examples
        a_prev_pad = A_prev_pad[i]          # select the ith training example's padded activation
        for h in range(n_H):                # loop over vertical axis of the output volume
            for w in range(n_W):            # loop over horizontal axis of the output volume
                for c in range(n_C):        # loop over channels (= #filters) of the output volume
                    # Note: i, h, w, c all start from 0

                    # Find the corners of the current "slice" (≈4 lines)
                    # Locate the start and end positions of the current window
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f

                    # Use the corners to define the (3D) slice of a_prev_pad (See Hint above the cell). (≈1 line)
                    a_slice_prev = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]

                    # Convolve the (3D) slice with the correct filter W and bias b, to get back one output neuron. (≈1 line)
                    Z[i, h, w, c] = np.sum(np.multiply(a_slice_prev, W[:, :, :, c]) + b[:, :, :, c])
    ### END CODE HERE ###

    # Making sure your output shape is correct
    assert(Z.shape == (m, n_H, n_W, n_C))

    # Save information in "cache" for the backprop
    cache = (A_prev, W, b, hparameters)

    return Z, cache
```

The lines that compute vert_start, vert_end, horiz_start and horiz_end are where the matrix slice comes from: they locate the start and end positions of the current window.
(The rest is explained in the comments in the code; read them carefully.)
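
Before running the test below, we can predict the output shape from the formula above (a small check of my own): with n_H_prev = 4, f = 2, pad = 2 and stride = 1 we expect n_H = n_W = 7.

```python
# Predicted output height for the test case below:
n_H = 1 + (4 + 2 * 2 - 2) // 1
print(n_H)  # 7, so Z should have shape (10, 7, 7, 8)
```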

Let's run it:

```python
np.random.seed(1)
A_prev = np.random.randn(10, 4, 4, 3)
W = np.random.randn(2, 2, 3, 8)
b = np.random.randn(1, 1, 1, 8)
hparameters = {"pad": 2,
               "stride": 1}

Z, cache_conv = conv_forward(A_prev, W, b, hparameters)
print("Z's mean =", np.mean(Z))
print("cache_conv[0][1][2][3] =", cache_conv[0][1][2][3])
```

The result:

```
Z's mean = 0.15585932488906465
cache_conv[0][1][2][3] = [-0.20075807  0.18656139  0.41005165]
```

5. Pooling

(Note that there are max pooling and average pooling; according to Andrew Ng, max pooling is the most commonly used.)

As before, here are the formulas that locate the filter and give the output dimensions of the pooling layer:

$$n_H = \left\lfloor \frac{n_{H_{prev}} - f}{stride} \right\rfloor + 1$$

$$n_W = \left\lfloor \frac{n_{W_{prev}} - f}{stride} \right\rfloor + 1$$

$$n_C = n_{C_{prev}}$$

Max pooling and average pooling are described as follows:

Max-pooling layer: slides an (f, f) window over the input and stores the max value of the window in the output.

Average-pooling layer: slides an (f, f) window over the input and stores the average value of the window in the output.

(I assume everyone has watched the course videos, and the explanation is given above, so I won't elaborate further.)

The code:

```python
# GRADED FUNCTION: pool_forward

def pool_forward(A_prev, hparameters, mode = "max"):
    """
    Implements the forward pass of the pooling layer

    Arguments:
    A_prev -- Input data, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    hparameters -- python dictionary containing "f" and "stride"
    mode -- the pooling mode you would like to use, defined as a string ("max" or "average")

    Returns:
    A -- output of the pool layer, a numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache used in the backward pass of the pooling layer, contains the input and hparameters
    """

    # Retrieve dimensions from the input shape
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape

    # Retrieve hyperparameters from "hparameters"
    f = hparameters["f"]
    stride = hparameters["stride"]

    # Define the dimensions of the output
    n_H = int(1 + (n_H_prev - f) / stride)
    n_W = int(1 + (n_W_prev - f) / stride)
    n_C = n_C_prev

    # Initialize output matrix A
    A = np.zeros((m, n_H, n_W, n_C))

    ### START CODE HERE ###
    for i in range(m):                      # loop over the training examples
        for h in range(n_H):                # loop on the vertical axis of the output volume
            for w in range(n_W):            # loop on the horizontal axis of the output volume
                for c in range(n_C):        # loop over the channels of the output volume

                    # Find the corners of the current "slice" (≈4 lines)
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f

                    # Use the corners to define the current slice on the ith training example of A_prev, channel c. (≈1 line)
                    a_prev_slice = A_prev[i, vert_start:vert_end, horiz_start:horiz_end, c]

                    # Compute the pooling operation on the slice. Use an if statement to differentiate the modes. Use np.max/np.mean.
                    if mode == "max":
                        A[i, h, w, c] = np.max(a_prev_slice)
                    elif mode == "average":
                        A[i, h, w, c] = np.mean(a_prev_slice)
    ### END CODE HERE ###

    # Store the input and hparameters in "cache" for pool_backward()
    cache = (A_prev, hparameters)

    # Making sure your output shape is correct
    assert(A.shape == (m, n_H, n_W, n_C))

    return A, cache
```

As you can see, pooling is quite similar to convolution: the filter window is located in the same way.
We also call np.max and np.mean here; if you are not familiar with these two NumPy functions, have a look at them (a tiny illustration follows below).
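
A minimal illustration of the two calls on a toy 2x2 window (my own snippet):

```python
window = np.array([[1.0, 4.0],
                   [2.0, 3.0]])
print(np.max(window))   # 4.0  -> what max pooling outputs for this window
print(np.mean(window))  # 2.5  -> what average pooling outputs
```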

Now let's call the function:

```python
np.random.seed(1)
A_prev = np.random.randn(2, 4, 4, 3)
hparameters = {"stride": 1, "f": 4}

A, cache = pool_forward(A_prev, hparameters)
print("mode = max")
print("A =", A)
print()
A, cache = pool_forward(A_prev, hparameters, mode = "average")
print("mode = average")
print("A =", A)
```

The output:

```
mode = max
A = [[[[ 1.74481176  1.6924546   2.10025514]]]
 [[[ 1.19891788  1.51981682  2.18557541]]]]

mode = average
A = [[[[-0.09498456  0.11180064 -0.14263511]]]
 [[[-0.09525108  0.28325018  0.33035185]]]]
```

2. Backward propagation

1. Computing dA, dW and db

The formulas are as follows:

$$dA \mathrel{+}= \sum_{h} \sum_{w} W_c \times dZ_{hw}$$

$$dW_c \mathrel{+}= \sum_{h} \sum_{w} a_{slice} \times dZ_{hw}$$

$$db = \sum_{h} \sum_{w} dZ_{hw}$$

Each of the outputs is described below:

dA_prev -- gradient of the cost with respect to the input of the conv layer (A_prev), numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)

dW -- gradient of the cost with respect to the weights of the conv layer (W), numpy array of shape (f, f, n_C_prev, n_C)

db -- gradient of the cost with respect to the biases of the conv layer (b), numpy array of shape (1, 1, 1, n_C)

And the code:

```python
def conv_backward(dZ, cache):
    """
    Implement the backward propagation for a convolution function
    """

    ### START CODE HERE ###
    # Retrieve information from "cache"
    (A_prev, W, b, hparameters) = cache

    # Retrieve dimensions from A_prev's shape
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape

    # Retrieve dimensions from W's shape
    (f, f, n_C_prev, n_C) = W.shape

    # Retrieve information from "hparameters"
    stride = hparameters['stride']
    pad = hparameters['pad']

    # Retrieve dimensions from dZ's shape
    (m, n_H, n_W, n_C) = dZ.shape

    # Initialize dA_prev, dW, db with the correct shapes
    dA_prev = np.zeros((m, n_H_prev, n_W_prev, n_C_prev))
    dW = np.zeros((f, f, n_C_prev, n_C))
    db = np.zeros((1, 1, 1, n_C))

    # Pad A_prev and dA_prev
    A_prev_pad = zero_pad(A_prev, pad)
    dA_prev_pad = zero_pad(dA_prev, pad)

    for i in range(m):                      # loop over the training examples

        # select ith training example from A_prev_pad and dA_prev_pad
        a_prev_pad = A_prev_pad[i]
        da_prev_pad = dA_prev_pad[i]

        for h in range(n_H):                # loop over vertical axis of the output volume
            for w in range(n_W):            # loop over horizontal axis of the output volume
                for c in range(n_C):        # loop over the channels of the output volume

                    # Find the corners of the current "slice"
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f

                    # Use the corners to define the slice from a_prev_pad
                    a_slice = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]

                    # Update gradients for the window and the filter's parameters using the code formulas given above
                    da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:, :, :, c] * dZ[i, h, w, c]
                    dW[:, :, :, c] += a_slice * dZ[i, h, w, c]
                    db[:, :, :, c] += dZ[i, h, w, c]

        # Set the ith training example's dA_prev to the unpadded da_prev_pad (Hint: use X[pad:-pad, pad:-pad, :])
        dA_prev[i, :, :, :] = dA_prev_pad[i, pad:-pad, pad:-pad, :]
    ### END CODE HERE ###

    # Making sure your output shape is correct
    assert(dA_prev.shape == (m, n_H_prev, n_W_prev, n_C_prev))

    return dA_prev, dW, db
```

Its parameters dZ and cache:

dZ : gradient of the cost with respect to the output of the conv layer (Z), numpy array of shape (m, n_H, n_W, n_C)
cache : cache of values needed for the conv_backward(), output of conv_forward()

This follows the formulas above; trace through it carefully and you will see it is doing exactly that, though it is admittedly fairly involved, since there are many matrix-slicing operations. A small numerical check is sketched below.
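
To convince yourself that the backward pass matches the formulas, here is a small finite-difference check on one entry of W (a sketch of my own, not part of the assignment, assuming the conv_forward and conv_backward defined above and taking the cost to be the plain sum of Z so that dZ is all ones):

```python
# Hypothetical sanity check: compare conv_backward's dW against a centered
# finite difference for one entry of W, with cost = sum of Z.
np.random.seed(1)
A_chk = np.random.randn(2, 4, 4, 3)
W_chk = np.random.randn(2, 2, 3, 4)
b_chk = np.random.randn(1, 1, 1, 4)
hp_chk = {"pad": 1, "stride": 1}

def cost_chk(W_):
    Z_, _ = conv_forward(A_chk, W_, b_chk, hp_chk)
    return np.sum(Z_)  # cost = sum of Z  =>  dZ = ones

Z_chk, cache_chk = conv_forward(A_chk, W_chk, b_chk, hp_chk)
_, dW_chk, _ = conv_backward(np.ones(Z_chk.shape), cache_chk)

# Finite-difference estimate of d(cost)/dW[0, 0, 0, 0]
eps = 1e-6
W_plus, W_minus = W_chk.copy(), W_chk.copy()
W_plus[0, 0, 0, 0] += eps
W_minus[0, 0, 0, 0] -= eps
approx = (cost_chk(W_plus) - cost_chk(W_minus)) / (2 * eps)

print(dW_chk[0, 0, 0, 0], approx)  # the two values should agree closely
```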

Let's test it:

```python
np.random.seed(1)
dA, dW, db = conv_backward(Z, cache_conv)
print("dA_mean =", np.mean(dA))
print("dW_mean =", np.mean(dW))
print("db_mean =", np.mean(db))
```

The result:

```
dA_mean = 9.60899067587
dW_mean = 10.5817412755
db_mean = 76.3710691956
```

2. Backward propagation for the pooling layer

1. Max pooling

Next we create a function called create_mask_from_window. It takes a matrix slice (really just a small matrix) as input and returns a boolean matrix of the same shape: True at the position holding the maximum value and False everywhere else.

Looking at an example and its output makes this easier to understand:

```python
def create_mask_from_window(x):
    """
    Creates a mask from an input matrix x, to identify the max entry of x.

    Arguments:
    x -- Array of shape (f, f)

    Returns:
    mask -- Array of the same shape as window, contains a True at the position corresponding to the max entry of x.
    """

    ### START CODE HERE ### (≈1 line)
    mask = (x == np.max(x))
    ### END CODE HERE ###

    return mask
```

Test:

```python
np.random.seed(1)
x = np.random.randn(2, 3)
mask = create_mask_from_window(x)
print('x = ', x)
print("mask = ", mask)
```

The result:

```
x =  [[ 1.62434536 -0.61175641 -0.52817175]
 [-1.07296862  0.86540763 -2.3015387 ]]
mask =  [[ True False False]
 [False False False]]
```

That should make it much easier to understand.
By the way, this mask is used later in the backward pass of max pooling: only the position that produced the maximum receives the gradient, as sketched below.
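
A tiny illustration of how the mask routes a gradient value to the max position (my own snippet, assuming the create_mask_from_window above):

```python
window = np.array([[1.0, 4.0],
                   [2.0, 3.0]])
dA_value = 0.7                         # upstream gradient for this pooled output
mask = create_mask_from_window(window)
print(mask * dA_value)
# [[0.  0.7]
#  [0.  0. ]]
```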


2. Average pooling

This one is for average pooling.
We create a distribute_value function whose arguments are dz and the shape of the slice; it returns a matrix of that shape in which every entry equals dz / (n_H * n_W), i.e. the gradient is spread evenly over the window.

Again, an example makes it easier to understand:

```python
def distribute_value(dz, shape):
    """
    Distributes the input value in the matrix of dimension shape

    Arguments:
    dz -- input scalar
    shape -- the shape (n_H, n_W) of the output matrix for which we want to distribute the value of dz

    Returns:
    a -- Array of size (n_H, n_W) for which we distributed the value of dz
    """

    ### START CODE HERE ###
    # Retrieve dimensions from shape (≈1 line)
    (n_H, n_W) = shape

    # Compute the value to distribute on the matrix (≈1 line)
    average = dz / (n_H * n_W)

    # Create a matrix where every entry is the "average" value (≈1 line)
    a = np.ones(shape) * average
    ### END CODE HERE ###

    return a
```

Test:

```python
a = distribute_value(2, (2, 2))
print('distributed value =', a)
```

Result:

```
distributed value = [[ 0.5  0.5]
 [ 0.5  0.5]]
```

That covers the code for the internals of back-propagating through both kinds of pooling. Now we need to put them together; see below.

3. Putting the pooling backward pass together

Let's look at the code first:

```python
def pool_backward(dA, cache, mode = "max"):
    """
    Implements the backward pass of the pooling layer

    Arguments:
    dA -- gradient of cost with respect to the output of the pooling layer, same shape as A
    cache -- cache output from the forward pass of the pooling layer, contains the layer's input and hparameters
    mode -- the pooling mode you would like to use, defined as a string ("max" or "average")

    Returns:
    dA_prev -- gradient of cost with respect to the input of the pooling layer, same shape as A_prev
    """

    ### START CODE HERE ###
    # Retrieve information from cache (≈1 line)
    (A_prev, hparameters) = cache

    # Retrieve hyperparameters from "hparameters" (≈2 lines)
    stride = hparameters['stride']
    f = hparameters['f']

    # Retrieve dimensions from A_prev's shape and dA's shape (≈2 lines)
    m, n_H_prev, n_W_prev, n_C_prev = A_prev.shape
    m, n_H, n_W, n_C = dA.shape

    # Initialize dA_prev with zeros (≈1 line)
    dA_prev = np.zeros_like(A_prev)

    for i in range(m):                      # loop over the training examples

        # select training example from A_prev (≈1 line)
        a_prev = A_prev[i]

        for h in range(n_H):                # loop on the vertical axis
            for w in range(n_W):            # loop on the horizontal axis
                for c in range(n_C):        # loop over the channels (depth)

                    # Find the corners of the current "slice" (≈4 lines)
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f

                    # Compute the backward propagation in both modes.
                    if mode == "max":
                        # Use the corners and "c" to define the current slice from a_prev (≈1 line)
                        a_prev_slice = a_prev[vert_start:vert_end, horiz_start:horiz_end, c]
                        # Create the mask from a_prev_slice (≈1 line)
                        mask = create_mask_from_window(a_prev_slice)
                        # Set dA_prev to be dA_prev + (the mask multiplied by the correct entry of dA) (≈1 line)
                        dA_prev[i, vert_start:vert_end, horiz_start:horiz_end, c] += mask * dA[i, h, w, c]

                    elif mode == "average":
                        # Get the value da from dA (≈1 line)
                        da = dA[i, h, w, c]
                        # Define the shape of the filter as fxf (≈1 line)
                        shape = (f, f)
                        # Distribute it to get the correct slice of dA_prev. i.e. Add the distributed value of da. (≈1 line)
                        dA_prev[i, vert_start:vert_end, horiz_start:horiz_end, c] += distribute_value(da, shape)
    ### END CODE ###

    # Making sure your output shape is correct
    assert(dA_prev.shape == A_prev.shape)

    return dA_prev
```

Personally I think there is quite a lot to be careful about here, and there are many parameters. (I will write up my own understanding once I have fully sorted it out; for now I don't want to mislead anyone. Suggestions are welcome: most of the images in this post are hosted on Jianshu, so you can find me through the image links, or search for PerfectDemoT on Jianshu.)

Then test it:

```python
np.random.seed(1)
A_prev = np.random.randn(5, 5, 3, 2)
hparameters = {"stride": 1, "f": 2}
A, cache = pool_forward(A_prev, hparameters)
dA = np.random.randn(5, 4, 2, 2)

dA_prev = pool_backward(dA, cache, mode = "max")
print("mode = max")
print('mean of dA = ', np.mean(dA))
print('dA_prev[1,1] = ', dA_prev[1,1])
print()
dA_prev = pool_backward(dA, cache, mode = "average")
print("mode = average")
print('mean of dA = ', np.mean(dA))
print('dA_prev[1,1] = ', dA_prev[1,1])
```

The output:

```
mode = max
mean of dA =  0.145713902729
dA_prev[1,1] =  [[ 0.          0.        ]
 [ 5.05844394 -1.68282702]
 [ 0.          0.        ]]
```