@vivounicorn 2020-03-24T07:23:43.000000Z

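Machine Learning and AI Technology Sharing, Chapter 5: Deep Neural Networks

Chapter 5, Machine Learning, CNN, Deep Neural Networks, Model Visualization


5. Deep Neural Networks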

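Deep learning is a framework for automatic representation learning built on multi-layer neural networks; it gradually frees practitioners from hand-crafted feature extraction. One of its foundations is the distributed representation. When reading papers, keep the following two concepts apart:

  • Distributional representation
    A distributional representation is a family of representations built on some distributional hypothesis and on context co-occurrence. For words, for example: words with similar meanings have similar distributions.
    Common distributional representation models:
  • Distributed representation
    A distributed representation is a dense, low-dimensional, real-valued vector for an entity (for example a word, a car-series ID, or a Weibo user ID), i.e. what is usually called an embedding. It needs no distributional assumption; each dimension of the vector stands for a latent feature of the entity in some space.
    Common distributed representation models:
    • Collobert and Weston embeddings
    • HLBL embeddings

For distributional and distributed representations and several related concepts, see the paper "Word representations: A simple and general method for semi-supervised learning".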

5.1 Backpropagation


5.2 The Evolution of Convolutional Network Architectures


5.3 Basic Principles of CNNs


5.3.1 Sigmoid Activation Function


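The logistic function was first proposed by Pierre François Verhulst while studying population growth. Because it is so broadly applicable (for the probabilistic view, see the earlier discussion of Logistic Regression), it is widely used, and in classical machine learning it gave rise to Logistic Regression. In practice, however, it has two important drawbacks as an activation function:

  • Vanishing gradient problem
  • Non-zero-mean activation output
    Suppose training proceeds one sample at a time. When the current layer passes a non-zero-mean signal to a neuron in the next layer and all of those inputs are positive, the gradients with respect to that neuron's weights all take the same sign (the sign of the back-propagated error), and likewise when the inputs are all negative. The weights can then only increase or decrease together, which slows down training of the whole network. Training in batches alleviates the problem, but it still holds training back, so Yann LeCun also gives a trick for it in "Efficient BackProp".

Tanh is another sigmoid-shaped function, and its output is zero-mean. An empirical form recommended by Yann LeCun is $f(x) = 1.7159 \tanh\left(\frac{2}{3}x\right)$.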
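The two drawbacks can be checked numerically. The following is a minimal sketch in plain Python, not code from the original post; the function names are illustrative, and the constants 1.7159 and 2/3 are the ones recommended in "Efficient BackProp". It shows that the sigmoid gradient never exceeds 0.25 and shrinks quickly away from 0, that sigmoid outputs are centred on 0.5 rather than 0, and that the scaled tanh is odd-symmetric and maps 1 to roughly 1.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the logistic function: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def lecun_tanh(x):
    """Scaled tanh, f(x) = 1.7159 * tanh(2x / 3)."""
    return 1.7159 * math.tanh(2.0 * x / 3.0)


xs = [-6.0, -2.0, 0.0, 2.0, 6.0]
sigmoid_mean = sum(sigmoid(x) for x in xs) / len(xs)   # centred on 0.5, not 0
tanh_mean = sum(lecun_tanh(x) for x in xs) / len(xs)   # centred on 0

# Backpropagating through 10 stacked sigmoids multiplies the gradient by
# at most 0.25 per layer (ignoring the weights), i.e. by at most 0.25 ** 10.
max_grad_10_layers = 0.25 ** 10

for x in xs:
    print(x, round(sigmoid(x), 4), round(sigmoid_grad(x), 4), round(lecun_tanh(x), 4))
```

The per-layer bound of 0.25 ignores the weight matrices, so it illustrates the trend rather than giving an exact bound for a real network.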



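5.3.2 Input Layer


5.3.3 Convolutional Layer



where the conjugated term denotes the complex conjugate.

What the convolutional layer does: when the data have local correlation with their neighbourhood, convolution filters, denoises, and extracts features. The result of feature extraction with one kernel is called a feature map; convolving with different kernels yields a series of feature maps. Stacked together, they have size height × width × depth (the depth being the number of kernels) and serve as the input to the next layer.

  • Smoothing
  • Filtering
  • Projection
    Convolution is an inner-product operation. If the template (the kernel) is flattened into a basis vector, every position of the sliding window yields one vector; projecting that vector onto the basis vector gives one entry of the feature map. With several templates, the templates form a set of basis vectors and the projection yields a set of feature maps.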
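The inner-product view can be written out directly. The sketch below is an illustration in plain Python under stated assumptions: stride 1, no padding, and the CNN convention of not flipping the kernel (strictly a cross-correlation). The function name and the toy image are hypothetical. Each output entry is the inner product of a flattened window with the flattened kernel.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding, no kernel flip).

    Each output entry is the inner product of the flattened window
    with the flattened kernel.
    """
    kh, kw = len(kernel), len(kernel[0])
    flat_kernel = [w for row in kernel for w in row]
    out = []
    for i in range(len(image) - kh + 1):
        out_row = []
        for j in range(len(image[0]) - kw + 1):
            window = [image[i + di][j + dj] for di in range(kh) for dj in range(kw)]
            out_row.append(sum(p * w for p, w in zip(window, flat_kernel)))
        out.append(out_row)
    return out


image = [[1, 2, 3, 0],
         [4, 5, 6, 0],
         [7, 8, 9, 0],
         [0, 0, 0, 0]]

mean_kernel = [[0.25, 0.25], [0.25, 0.25]]   # smoothing: local average
edge_kernel = [[1, -1], [1, -1]]             # filtering: responds to vertical edges

smooth_map = conv2d_valid(image, mean_kernel)
edge_map = conv2d_valid(image, edge_kernel)
print(smooth_map)
print(edge_map)
```

Two kernels give two feature maps; the edge kernel responds most strongly where the values drop to 0 in the last column.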


5.3.4 Zero-Padding


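If you use TensorFlow, you will know that its padding parameter takes two values: SAME, which applies zero padding as in the figure above so that (at stride 1) the input and output feature maps have the same size; and VALID, which applies no padding.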
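The output sizes under the two modes follow simple formulas. The sketch below restates, in plain Python, the size rules documented for TensorFlow convolutions (SAME: ceil(n / stride); VALID: ceil((n - k + 1) / stride)); the helper names are illustrative and not part of any library.

```python
import math


def conv_output_size(n, k, stride=1, padding="VALID"):
    """Output length along one dimension for input length n and kernel length k."""
    if padding == "SAME":
        return math.ceil(n / stride)
    if padding == "VALID":
        return math.ceil((n - k + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


def same_padding_total(n, k, stride=1):
    """Total number of zeros added along one dimension under SAME padding."""
    out = math.ceil(n / stride)
    return max((out - 1) * stride + k - n, 0)


print(conv_output_size(28, 5, padding="VALID"))  # 24
print(conv_output_size(28, 5, padding="SAME"))   # 28
print(same_padding_total(28, 5))                 # 4 zeros, 2 on each side
```

With a stride larger than 1, SAME no longer preserves the size; it yields ceil(n / stride).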

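5.3.5 Subsampling (Pooling) Layer



In addition, if a pooling layer follows a convolutional layer, every feature map is pooled. Compared with human behaviour, pooling can be seen as checking whether some region of the image shows a given property without caring where exactly in the region it appears (for example, checking whether a patch of someone's face has a pimple).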
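A small sketch makes the position insensitivity concrete. It assumes non-overlapping max pooling with a 2×2 window; the function name and the toy feature maps are illustrative. The same strong response placed at two different positions inside one pooling window gives the same pooled output.

```python
def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling: the stride equals the window size."""
    rows, cols = len(fmap), len(fmap[0])
    return [[max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
             for j in range(0, cols - size + 1, size)]
            for i in range(0, rows - size + 1, size)]


# The response "9" sits at different positions inside the top-left window.
feature_a = [[0, 0, 0, 0],
             [0, 9, 0, 0],
             [0, 0, 0, 0],
             [0, 0, 0, 0]]
feature_b = [[9, 0, 0, 0],
             [0, 0, 0, 0],
             [0, 0, 0, 0],
             [0, 0, 0, 0]]

pooled_a = max_pool2d(feature_a)
pooled_b = max_pool2d(feature_b)
print(pooled_a, pooled_b)
```

The invariance only holds within a window: moving the response into a neighbouring window changes the pooled map.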

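5.3.6 Fully Connected Layer


5.3.7 Parameter Estimation



  • Fully connected layer


  • Convolutional layer

    Suppose the subsampling (pooling) layer is layer $l$ with a 3×3 feature map, and the next layer, layer $l+1$, is a convolutional layer that produces two feature maps from two 2×2 kernels (the structure inside the blue dashed box).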




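5.3.8 An Example of CNNs in NLP

In NLP, text classification is a common application. The traditional approach is to hand-craft features such as n-grams and their various cross combinations. Text, like images, has natural local correlation, which suggests using a CNN as an end-to-end classifier and leaving feature extraction to the model.
A sentence is one-dimensional and cannot be processed directly like an image. We therefore obtain word vectors through distributed representation learning, or add an embedding layer as the first layer of the model to the same effect; the sentence then becomes two-dimensional:

1. Pre-trained vectors, for example from an already trained word2vec model. Related material: Using pre-trained word embeddings in a Keras model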

    # Written against the standalone Keras Sequential API used in the original post.
    from keras.models import Sequential
    from keras.layers import (Activation, Convolution1D, Dense, Dropout,
                              Embedding, Flatten, MaxPooling1D)


    def build_embedding_cnn(max_caption_len, vocab_size):
        # Binary classification problem
        nb_classes = 2
        # Word-vector dimension
        word_dim = 256
        # Number of convolution kernels
        nb_filters = 64
        # Window size for max pooling
        nb_pool = 2
        # Convolution kernel size
        kernel_size = 5
        # Model definition
        model = Sequential()
        # Layer 1: embedding layer
        model.add(Embedding(output_dim=word_dim, input_dim=vocab_size,
                            input_length=max_caption_len, name='main_input'))
        model.add(Dropout(0.5))
        # Layer 2: convolutional layer with ReLU activation
        model.add(Convolution1D(nb_filters, kernel_size))
        model.add(Activation('relu'))
        # Layer 3: max pooling layer
        model.add(MaxPooling1D(nb_pool))
        model.add(Dropout(0.5))
        model.add(Flatten())
        # Layer 4: fully connected layer
        model.add(Dense(256))
        model.add(Activation('relu'))
        model.add(Dropout(0.3))
        # Layer 5: output layer
        model.add(Dense(nb_classes))
        model.add(Activation('softmax'))
        # Cross-entropy loss, Adadelta optimizer
        model.compile(loss='categorical_crossentropy',
                      optimizer='adadelta',
                      metrics=['accuracy'])
        return model



5.4 LeNet-5

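The original architecture comes from the paper "Gradient-based learning applied to document recognition" (when the paper uses raw, un-normalized data, INPUT is 32×32). I use the following structure for illustration:

LeNet-5 has 8 layers in total: 1 input layer + 3 convolutional layers (C1, C3, C5) + 2 subsampling layers (S2, S4) + 1 fully connected layer (F6) + 1 output layer. Each layer has several feature maps (automatically extracted groups of features).

5.4.1 Input Layer


5.4.2 C1 Convolutional Layer

C1 consists of 6 feature maps, each generated by a 5×5 kernel (every neuron in a feature map connects to a 5×5 pixel region of the input layer); with a 28×28 input, each feature map is 24×24. Counting one bias per kernel, the layer has (5×5+1)×6=156 learnable parameters and 156×24×24=89856 connections.

5.4.3 S2 Subsampling Layer

Each feature map in this layer corresponds one-to-one with a feature map in the previous layer. Because each unit's 2×2 receptive field moves without overlap, the layer produces 6 subsampled feature maps of size 12×12. With max pooling or mean pooling, the layer has 0 learnable parameters. With weighted subsampling, where the sampling kernel carries weights, it has (2×2+1)×6=30 learnable parameters and 30×12×12=4320 connections.

5.4.4 C3 Convolutional Layer

This layer is slightly more complex: S2 and C3 are connected many-to-many. The simplest scheme fully connects all S2 feature maps to all C3 feature maps (alternatively, a subset of S2 feature maps can be sampled and connected to a given C3 feature map). Under full connection, the 6 S2 feature maps are convolved with 6 separate 5×5 kernels to produce 1 C3 feature map (with one bias per output feature map). C3 has 16 feature maps, so the layer has (5×5×6+1)×16=2416 learnable parameters and 2416×8×8=154624 connections.

5.4.5 S4 Subsampling Layer

Same as S2: with max pooling or mean pooling the layer has 0 learnable parameters. Counted as for weighted subsampling, the number of connections is (2×2+1)×16×4×4=1280.

5.4.6 C5 Convolutional Layer

Similar to C3: all S4 feature maps are fully connected to all C5 feature maps. Under full connection, the 16 S4 feature maps use 16 separate 1×1 kernels to produce 1 C5 feature map (with one bias per output feature map). C5 has 120 feature maps, so the layer has (1×1×16+1)×120=2040 learnable parameters and 2040 connections. Note that this connection count treats each C5 feature map as a single unit; if the 1×1 kernels slide over the full 4×4 S4 maps, as in the code below, each C5 map is 4×4 and the count becomes 2040×4×4=32640.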
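The counts above can be reproduced with a few lines of arithmetic. This is a sketch under the assumptions of this section: a 28×28 single-channel input, 'valid' convolutions with stride 1, full connection between consecutive layers, and weighted 2×2 subsampling when counting S2 and S4. The helper names are illustrative.

```python
def conv_layer_stats(in_maps, out_maps, kernel, in_size):
    """Return (output size, learnable parameters, connections) for a 'valid' convolution."""
    out_size = in_size - kernel + 1
    params = (kernel * kernel * in_maps + 1) * out_maps   # +1 bias per output map
    connections = params * out_size * out_size
    return out_size, params, connections


def weighted_pool_stats(maps, window, in_size):
    """Same triple for non-overlapping subsampling whose window carries weights."""
    out_size = in_size // window
    params = (window * window + 1) * maps
    connections = params * out_size * out_size
    return out_size, params, connections


c1 = conv_layer_stats(1, 6, 5, 28)        # (24, 156, 89856)
s2 = weighted_pool_stats(6, 2, c1[0])     # (12, 30, 4320)
c3 = conv_layer_stats(6, 16, 5, s2[0])    # (8, 2416, 154624)
s4 = weighted_pool_stats(16, 2, c3[0])    # (4, 80, 1280)
c5 = conv_layer_stats(16, 120, 1, s4[0])  # 2040 parameters, 4x4 output maps

for name, stats in [("C1", c1), ("S2", s2), ("C3", c3), ("S4", s4), ("C5", c5)]:
    print(name, stats)
```

For C5 the helper reports 2040×16 connections because it slides the 1×1 kernel over all 4×4 positions; the figure of 2040 in the text corresponds to one unit per feature map.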

5.4.7 F6 Fully Connected Layer


5.4.8 Output Layer


Training visualization of LeNet-5 on the MNIST (Modified NIST) dataset


5.4.9 LeNet-5 Code Practice

    # Written against the Keras 1.x API used in the original post
    # (border_mode, nb_epoch, keras.utils.visualize_util).
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt
    from keras.datasets import mnist
    from keras.models import Sequential
    from keras.layers.core import Dense, Activation, Flatten
    from keras.layers.convolutional import Convolution2D, MaxPooling2D
    from keras.utils import np_utils
    import tensorflow as tf
    # Compatibility workaround kept from the original post.
    tf.python.control_flow_ops = tf


    def build_LeNet5():
        model = Sequential()
        # C1 and S2
        model.add(Convolution2D(6, 5, 5, border_mode='valid', input_shape=(28, 28, 1), dim_ordering='tf'))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Activation("relu"))
        # C3 and S4
        model.add(Convolution2D(16, 5, 5, border_mode='valid'))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Activation("relu"))
        # C5
        model.add(Convolution2D(120, 1, 1, border_mode='valid'))
        model.add(Flatten())
        # F6
        model.add(Dense(84))
        model.add(Activation("sigmoid"))
        # Output layer
        model.add(Dense(10))
        model.add(Activation('softmax'))
        return model


    if __name__ == "__main__":
        from keras.utils.visualize_util import plot
        model = build_LeNet5()
        model.summary()
        plot(model, to_file="LeNet-5.png", show_shapes=True)

        # Load MNIST, scale pixels to [0, 1], one-hot encode the labels
        (X_train, y_train), (X_test, y_test) = mnist.load_data()
        X_train = X_train.reshape(X_train.shape[0], 28, 28, 1).astype('float32') / 255
        X_test = X_test.reshape(X_test.shape[0], 28, 28, 1).astype('float32') / 255
        Y_train = np_utils.to_categorical(y_train, 10)
        Y_test = np_utils.to_categorical(y_test, 10)

        # Training
        model.compile(loss='categorical_crossentropy',
                      optimizer='adadelta',
                      metrics=['accuracy'])
        batch_size = 128
        nb_epoch = 1
        model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
                  verbose=1, validation_data=(X_test, Y_test))
        score = model.evaluate(X_test, Y_test, verbose=0)
        print('Test score:', score[0])
        print('Test accuracy:', score[1])

        # Plot up to 100 misclassified test images:
        # true label in blue, predicted label in red
        y_hat = model.predict_classes(X_test)
        test_wrong = [im for im in zip(X_test, y_hat, y_test) if im[1] != im[2]]
        plt.figure(figsize=(10, 10))
        for ind, val in enumerate(test_wrong[:100]):
            plt.subplots_adjust(left=0, right=1, bottom=0, top=1)
            plt.subplot(10, 10, ind + 1)
            im = 1 - val[0].reshape((28, 28))
            plt.axis("off")
            plt.text(0, 0, val[2], fontsize=14, color='blue')
            plt.text(8, 0, val[1], fontsize=14, color='red')
            plt.imshow(im, cmap='gray')
        plt.savefig('error.jpg')

5.5 AlexNet

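AlexNet achieved a breakthrough top-5 error rate of 15.3% in the ILSVRC-2012 competition (the runner-up reached 26.2%). It comes from the 2012 paper "ImageNet Classification with Deep Convolutional Neural Networks" by Alex Krizhevsky et al., a milestone and watershed in the boom of deep learning. With hardware continuing to advance, deep learning will stay hot.

5.5.1 Network Architecture Analysis

Constrained by the hardware of the time, AlexNet was designed down to the level of individual GPUs. The GTX 580 of the day had only 3 GB of memory, so to train the model on a large amount of data the authors ran two GPUs in parallel and split the network structure across them, as follows: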


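5.5.2 ReLU Activation Function

AlexNet adopted the ReLU activation function, which goes back to a more accurate activation model proposed by the neuroscientists Dayan and Abbott in the book "Theoretical Neuroscience":


For details, see Section 2.2, Estimating Firing Rates, of the book. The new activation model has the following properties:

  • Sparse activation (the output is 0 when the input is below 0)
  • One-sided suppression (unlike Sigmoid, which saturates on both sides)
  • A wide excitation boundary and no saturation (the ReLU derivative is always 1 for positive inputs), which greatly alleviates the vanishing gradient problem

1. Original ReLU
Building on this earlier work (see Hinton's paper "Rectified Linear Units Improve Restricted Boltzmann Machines"), a new activation function similar to Eq. 2.9 was introduced: $f(x) = \max(0, x)$


  • Not differentiable at the origin
    This complicates gradient computation in backpropagation, so Charles Dugas et al. proposed Softplus, $f(x) = \log(1 + e^x)$, to approximate the ReLU function above (it can be viewed as a smooth version of it).

  • Excessive sparsity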
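The properties listed above are easy to see numerically. The following sketch, written for illustration in plain Python, defines ReLU and Softplus with their derivatives. Treating the ReLU derivative at 0 as 0 is a common convention assumed here, not something the text prescribes.

```python
import math


def relu(x):
    """f(x) = max(0, x)."""
    return max(0.0, x)


def relu_grad(x):
    """1 for positive inputs, 0 otherwise (the value at x == 0 is a convention)."""
    return 1.0 if x > 0 else 0.0


def softplus(x):
    """f(x) = log(1 + e^x), written in a form that does not overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softplus_grad(x):
    """The derivative of softplus is the logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


inputs = [-3.0, -1.0, -0.5, 0.5, 2.0]
outputs = [relu(x) for x in inputs]
sparsity = sum(1 for y in outputs if y == 0.0) / len(outputs)  # fraction of silent units

print(outputs, sparsity)
print([round(softplus(x), 4) for x in inputs])
```

Softplus is smooth everywhere and approaches ReLU for large |x|, but it is never exactly 0, so it gives up the exact sparsity of ReLU.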

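2. Leaky ReLU

To address the problem above, where excessive sparsity leaves many neurons inactive, Leaky ReLU was proposed: $f(x) = x$ for $x > 0$ and $f(x) = \alpha x$ otherwise, where $\alpha$ is a small fixed constant.


3. Parametric ReLU
The value of $\alpha$ above need not be set by hand; it can be learned. This leads to Parametric ReLU, in which $\alpha$ is updated by backpropagation together with the weights.


For details, see the paper "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification" by Kaiming He et al.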
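The difference between the two variants is only whether the negative slope receives a gradient. The sketch below is illustrative, not code from the paper: the helper names, the learning rate, and the upstream gradient are made up for the example, while 0.25 is the initial slope reported in the PReLU paper.

```python
def leaky_relu(x, alpha=0.01):
    """f(x) = x for x > 0, alpha * x otherwise; alpha is a fixed hyperparameter."""
    return x if x > 0 else alpha * x


def prelu_grads(x, alpha):
    """Return (df/dx, df/dalpha) for f(x) = x if x > 0 else alpha * x."""
    if x > 0:
        return 1.0, 0.0
    return alpha, x


# One illustrative gradient step on the slope (hypothetical numbers).
alpha = 0.25          # initial slope used in the PReLU paper
learning_rate = 0.1   # assumption for this example
x = -2.0
upstream_grad = 1.0   # dLoss/df, assumed
_, dalpha = prelu_grads(x, alpha)
alpha_new = alpha - learning_rate * upstream_grad * dalpha

print(leaky_relu(-2.0), prelu_grads(-2.0, 0.25), alpha_new)
```

For positive inputs the slope gets no gradient, so only negative activations shape the learned value.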

4. Randomized ReLU
Randomized ReLU can be viewed as a randomized version of Leaky ReLU. The idea is: suppose



5.5.3 Local Response Normalization

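LRN uses adjacent feature maps to make salient features stand out; the experiments in the paper show that it lowers the error rate. The formula is:

$$b^{i}_{x,y} = a^{i}_{x,y} \Big/ \left(k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} \left(a^{j}_{x,y}\right)^{2}\right)^{\beta}$$

Here $a^{i}_{x,y}$ is the ReLU output of kernel $i$ at position $(x, y)$, $N$ is the number of kernels in the layer, the sum runs over $n$ adjacent feature maps, and $k$, $n$, $\alpha$, $\beta$ are hyperparameters.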
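As a sketch of how LRN acts across feature maps, the function below normalizes the activations of all feature maps at a single spatial position. The defaults k=2, n=5, α=1e-4, β=0.75 are the values reported in the AlexNet paper; the function name and inputs are illustrative, and a real layer would apply the same computation at every position.

```python
def local_response_norm(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Normalize the activations a[0..N-1] of N feature maps at one position (x, y).

    Each activation is divided by (k + alpha * sum of squares over up to n
    neighbouring feature maps) ** beta.
    """
    num_maps = len(a)
    half = n // 2
    out = []
    for i in range(num_maps):
        lo, hi = max(0, i - half), min(num_maps - 1, i + half)
        sum_sq = sum(a[j] ** 2 for j in range(lo, hi + 1))
        out.append(a[i] / (k + alpha * sum_sq) ** beta)
    return out


activations = [0.0, 0.0, 100.0, 0.0, 0.0, 0.0, 1.0]
normalized = local_response_norm(activations)
print([round(v, 4) for v in normalized])
```

The strong response in map 2 damps itself and its neighbours within the window, while map 6 lies outside that window and is scaled only by its own activity.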




5.5.4 Overlapping Pooling


5.5.5 Dropout






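5.5.6 Data Augmentation

5.5.7 Multi-GPU Training

The authors used GTX 580 GPUs to speed up training. Constrained by the hardware of the day, they had to design the network structure carefully, down to how and when the two GPUs communicate. We are luckier today and mostly need not think about this.

5.5.8 AlexNet Code Practice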



    # -*- coding: utf-8 -*-
    # Written against the Keras 1.x / TensorFlow 1.x / old SciPy APIs used in the original post.
    import os
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt
    from scipy.misc import toimage
    from keras.datasets import cifar10
    from keras.models import Sequential
    from keras.layers.core import Dense, Dropout, Activation, Flatten
    from keras.layers.convolutional import Convolution2D, MaxPooling2D, ZeroPadding2D
    from keras.utils import np_utils
    from keras.preprocessing.image import ImageDataGenerator
    from keras.callbacks import ModelCheckpoint
    from keras import backend as K
    import tensorflow as tf
    # Compatibility workaround kept from the original post.
    tf.python.control_flow_ops = tf


    def data_visualize(x, y, num):
        """Save a num x num grid of sample images with their labels."""
        plt.figure()
        for i in range(0, num * num):
            axes = plt.subplot(num, num, i + 1)
            axes.set_title("label=" + str(y[i]))
            axes.set_xticks([0, 10, 20, 30])
            axes.set_yticks([0, 10, 20, 30])
            plt.imshow(toimage(x[i]))
        plt.tight_layout()
        plt.savefig('sample.jpg')


    # The structure below leaves out the LRN layers throughout
    def build_AlexNet(s):
        model = Sequential()
        # Layer 1: convolution + max pooling
        model.add(Convolution2D(96, 11, 11, border_mode='same', input_shape=s))
        model.add(Activation("relu"))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        # Layer 2: convolution + max pooling
        model.add(Convolution2D(256, 5, 5, border_mode='same', activation='relu'))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        # Layer 3: convolution
        model.add(ZeroPadding2D((1, 1)))
        model.add(Convolution2D(512, 3, 3, border_mode='same', activation='relu'))
        # Layer 4: convolution
        model.add(ZeroPadding2D((1, 1)))
        model.add(Convolution2D(1024, 3, 3, border_mode='same', activation='relu'))
        # Layer 5: convolution + max pooling
        model.add(ZeroPadding2D((1, 1)))
        model.add(Convolution2D(1024, 3, 3, border_mode='same', activation='relu'))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Flatten())
        # Layer 6: fully connected
        model.add(Dense(3072, activation='relu'))
        model.add(Dropout(0.5))
        # Layer 7: fully connected
        model.add(Dense(4096, activation='relu'))
        model.add(Dropout(0.5))
        # Layer 8: output layer
        model.add(Dense(10))
        model.add(Activation('softmax'))
        return model


    if __name__ == "__main__":
        from keras.utils.visualize_util import plot
        # Use the GPU with index 3
        with tf.device('/gpu:3'):
            gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=1, allow_growth=True)
            # Make only card 3 visible so that TensorFlow does not occupy every card
            os.environ["CUDA_VISIBLE_DEVICES"] = "3"
            # Register the session with Keras (the original created it without registering it)
            K.set_session(tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
                                                           log_device_placement=True,
                                                           gpu_options=gpu_options)))
            (X_train, y_train), (X_test, y_test) = cifar10.load_data()
            data_visualize(X_train, y_train, 4)
            s = X_train.shape[1:]
            model = build_AlexNet(s)
            model.summary()
            plot(model, to_file="AlexNet.jpg", show_shapes=True)

            # Define the input data and normalize it
            dim = 32
            channel = 3
            class_num = 10
            X_train = X_train.reshape(X_train.shape[0], dim, dim, channel).astype('float32') / 255
            X_test = X_test.reshape(X_test.shape[0], dim, dim, channel).astype('float32') / 255
            Y_train = np_utils.to_categorical(y_train, class_num)
            Y_test = np_utils.to_categorical(y_test, class_num)

            # Preprocessing and data augmentation.
            # Note: as in the original post, model.fit below trains on the raw arrays,
            # so this generator is fitted but not used for training.
            datagen = ImageDataGenerator(
                featurewise_center=False,
                samplewise_center=False,
                featurewise_std_normalization=False,
                samplewise_std_normalization=False,
                zca_whitening=False,
                rotation_range=25,
                width_shift_range=0.1,
                height_shift_range=0.1,
                horizontal_flip=False,
                vertical_flip=False)
            datagen.fit(X_train)

            model.compile(loss='categorical_crossentropy',
                          optimizer='adadelta',
                          metrics=['accuracy'])
            batch_size = 32
            nb_epoch = 10
            # The original built this callback but never passed it to fit
            checkpoint = ModelCheckpoint("weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5",
                                         monitor='val_loss', verbose=0, save_best_only=True,
                                         save_weights_only=False, mode='auto')
            model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
                      verbose=1, validation_data=(X_test, Y_test), callbacks=[checkpoint])
            score = model.evaluate(X_test, Y_test, verbose=0)
            print('Test score:', score[0])
            print('Test accuracy:', score[1])

            # Plot up to 100 misclassified test images:
            # true label in blue, predicted label in red
            y_hat = model.predict_classes(X_test)
            test_wrong = [im for im in zip(X_test, y_hat, y_test) if im[1] != im[2]]
            plt.figure(figsize=(10, 10))
            for ind, val in enumerate(test_wrong[:100]):
                plt.subplots_adjust(left=0, right=1, bottom=0, top=1)
                plt.subplot(10, 10, ind + 1)
                plt.axis("off")
                plt.text(0, 0, val[2][0], fontsize=14, color='blue')
                plt.text(8, 0, val[1], fontsize=14, color='red')
                plt.imshow(toimage(val[0]))
            plt.savefig('Wrong.jpg')


5.6 VGG

Proposed in the paper "Very Deep Convolutional Networks for Large-Scale Image Recognition", VGG builds deeper networks by shrinking the convolution kernel size.

5.6.1 Network Architecture


5.6.2 VGG Code Practice

    # -*- coding: utf-8 -*-
    # Written against the Keras 1.x / TensorFlow 1.x / old SciPy APIs used in the original post.
    import os
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt
    from scipy.misc import toimage
    from keras.datasets import cifar100
    from keras.models import Sequential
    from keras.layers.core import Dense, Dropout, Flatten
    from keras.layers.convolutional import Convolution2D, MaxPooling2D, ZeroPadding2D
    from keras.utils import np_utils
    from keras.preprocessing.image import ImageDataGenerator
    from keras.callbacks import ModelCheckpoint
    from keras import backend as K
    import tensorflow as tf
    # Compatibility workaround kept from the original post.
    tf.python.control_flow_ops = tf


    def data_visualize(x, y, num):
        """Save a num x num grid of sample images with their labels."""
        plt.figure()
        for i in range(0, num * num):
            axes = plt.subplot(num, num, i + 1)
            axes.set_title("label=" + str(y[i]))
            axes.set_xticks([0, 10, 20, 30])
            axes.set_yticks([0, 10, 20, 30])
            plt.imshow(toimage(x[i]))
        plt.tight_layout()
        plt.savefig('sample.jpg')


    def build_VGG(s, convs_per_block):
        """Build a VGG-style network.

        convs_per_block gives the number of 3x3 convolution layers in each of
        the five blocks; every block ends with a 2x2 max pooling layer.
        """
        model = Sequential()
        fm = 3
        first_layer = True
        for filters, n_convs in zip([64, 128, 256, 512, 512], convs_per_block):
            for _ in range(n_convs):
                if first_layer:
                    model.add(ZeroPadding2D((1, 1), input_shape=s))
                    first_layer = False
                else:
                    model.add(ZeroPadding2D((1, 1)))
                model.add(Convolution2D(filters, fm, fm, activation='relu'))
            model.add(MaxPooling2D((2, 2), strides=(2, 2)))
        model.add(Flatten())
        model.add(Dense(4096, activation='relu'))
        model.add(Dropout(0.5))
        model.add(Dense(4096, activation='relu'))
        model.add(Dropout(0.5))
        model.add(Dense(100, activation='softmax'))
        return model


    def build_VGG_16(s):
        # 13 convolution layers + 3 fully connected layers
        return build_VGG(s, [2, 2, 3, 3, 3])


    def build_VGG_19(s):
        # 16 convolution layers + 3 fully connected layers
        return build_VGG(s, [2, 2, 4, 4, 4])


    if __name__ == "__main__":
        from keras.utils.visualize_util import plot
        with tf.device('/gpu:2'):
            gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=1, allow_growth=True)
            os.environ["CUDA_VISIBLE_DEVICES"] = "2"
            # Register the session with Keras (the original created it without registering it)
            K.set_session(tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
                                                           log_device_placement=True,
                                                           gpu_options=gpu_options)))
            (X_train, y_train), (X_test, y_test) = cifar100.load_data()
            data_visualize(X_train, y_train, 4)
            s = X_train.shape[1:]
            print(s)
            model = build_VGG_16(s)  # or build_VGG_19(s)
            model.summary()
            plot(model, to_file="VGG.jpg", show_shapes=True)

            # Define the input data and normalize it
            dim = 32
            channel = 3
            class_num = 100
            X_train = X_train.reshape(X_train.shape[0], dim, dim, channel).astype('float32') / 255
            X_test = X_test.reshape(X_test.shape[0], dim, dim, channel).astype('float32') / 255
            Y_train = np_utils.to_categorical(y_train, class_num)
            Y_test = np_utils.to_categorical(y_test, class_num)

            # Preprocessing and real-time data augmentation.
            # Note: as in the original post, model.fit below trains on the raw arrays,
            # so this generator is fitted but not used for training.
            datagen = ImageDataGenerator(
                featurewise_center=False,  # set input mean to 0 over the dataset
                samplewise_center=False,  # set each sample mean to 0
                featurewise_std_normalization=False,  # divide inputs by std of the dataset
                samplewise_std_normalization=False,  # divide each input by its std
                zca_whitening=False,  # apply ZCA whitening
                rotation_range=25,  # randomly rotate images in the range (degrees, 0 to 180)
                width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
                height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
                horizontal_flip=False,  # randomly flip images horizontally
                vertical_flip=False)  # randomly flip images vertically
            datagen.fit(X_train)

            # Training
            model.compile(loss='categorical_crossentropy',
                          optimizer='adadelta',
                          metrics=['accuracy'])
            batch_size = 32
            nb_epoch = 10
            # The original built this callback but never passed it to fit
            checkpoint = ModelCheckpoint("weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5",
                                         monitor='val_loss', verbose=0, save_best_only=False,
                                         save_weights_only=False, mode='auto')
            model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
                      verbose=1, validation_data=(X_test, Y_test), callbacks=[checkpoint])
            score = model.evaluate(X_test, Y_test, verbose=0)
            print('Test score:', score[0])
            print('Test accuracy:', score[1])

5.7 MSRANet


5.7.1 PReLU


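Parametric Rectifiers are defined as follows: $f(y_i) = \max(0, y_i) + a_i \min(0, y_i)$, where $a_i$ is a learnable coefficient that controls the slope of the negative part.



For details, see the paper "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification" by Kaiming He et al.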

5.8 Highway Networks

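In my view, Highway Networks is a structure that links what came before with what came after. It comes from the paper "Highway Networks", borrows the idea of gates from LSTM (introduced later), and is very general (an overly general structure is not necessarily a good thing). It offers one way to build deeper networks:

$$y = H(x, W_H) \cdot T(x, W_T) + x \cdot C(x, W_C)$$

Any single layer or group of layers can be built into a block in this way. In the formula, $T$ is called the transform gate and $C$ the carry gate; for simplicity one usually sets $C = 1 - T$. Clearly $x$, $y$, $H(x, W_H)$ and $T(x, W_T)$ must have the same dimensionality (achieved, for example, by zero-padding or by a projection). With this structure a network can be made very deep (for example more than 100 layers) without optimization becoming much harder, which seems to offer a solution to the problem of training "deep" networks (the next section explains the word "seems").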
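A single highway layer with the coupled gates $C = 1 - T$ can be sketched in a few lines. The code below is an illustration in plain Python, not the paper's implementation: it assumes a tanh transform $H$ and a sigmoid transform gate $T$, and the helper names and weights are made up for the example. A strongly negative gate bias makes the layer carry its input through almost unchanged, while a strongly positive one makes it behave like an ordinary layer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, x):
    """Compute W x + b for a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def highway_layer(x, W_H, b_H, W_T, b_T):
    """y = H(x) * T(x) + x * (1 - T(x)), with H = tanh(affine) and T = sigmoid(affine)."""
    h = [math.tanh(v) for v in affine(W_H, b_H, x)]
    t = [sigmoid(v) for v in affine(W_T, b_T, x)]
    return [hi * ti + xi * (1.0 - ti) for hi, ti, xi in zip(h, t, x)]


x = [0.5, -1.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]

carried = highway_layer(x, identity, [0.0, 0.0], zeros, [-50.0, -50.0])     # gate closed
transformed = highway_layer(x, identity, [0.0, 0.0], zeros, [50.0, 50.0])   # gate open
print(carried, transformed)
```

Because $x$ and $H(x)$ are mixed element-wise, both must have the same length, which is the dimensionality constraint stated above.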

5.9 Residual Networks

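Residual networks were first proposed in "Deep Residual Learning for Image Recognition". With them the authors took first place in the ILSVRC 2015 ImageNet classification, detection, and localization tasks and in the COCO 2015 detection and segmentation tasks, which also shows that ResNet is a fairly general framework.

5.9.1 Motivation for ResNet


The figure shows that on CIFAR-10 the 20-layer network clearly outperforms the 56-layer network on both the training set and the test set. This is evidently not caused by overfitting, and it contradicts intuition: a model with one more layer should do at least as well as the model without it, and certainly no worse (the extra layer could simply be an identity mapping). The authors therefore proposed the original residual learning framework (which can also be viewed, up to a constant scale factor, as the special case of Highway Networks with T=0.5):

Compared with Highway Networks:
- HN's transform gate and carry

5.9.2 Identity Mappings

What role does the identity mapping play in deep residual networks? In "Identity Mappings in Deep Residual Networks" the authors analyze this and propose a new residual block structure in which both $h(x_l)$ and $f(y_l)$ are identity mappings. This change gives the signal a "clean" path in both forward and backward propagation (the grey part of the figure); (a) is the original block structure and (b) is the new one.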
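With identity mappings on both the shortcut and the after-addition path, each block computes $x_{l+1} = x_l + F(x_l)$, so the output of a stack is the input plus the sum of all residual terms. The sketch below checks this on scalars; the residual functions are arbitrary toy functions chosen for the example, not layers from the paper.

```python
def residual_stack(x, residual_fns):
    """Apply x_{l+1} = x_l + F_l(x_l) for each F_l and record every residual term."""
    residuals = []
    for f in residual_fns:
        r = f(x)
        residuals.append(r)
        x = x + r
    return x, residuals


toy_fns = [lambda v: 0.5 * v, lambda v: v * v, lambda v: -1.0]

x0 = 2.0
x_last, residuals = residual_stack(x0, toy_fns)

# The input reaches the output directly, plus the sum of the residual branches.
print(x_last, x0 + sum(residuals))
```

The direct term $x_0$ is what gives the gradient a path that does not pass through any weight layer; with all residual functions equal to zero the stack reduces to the identity.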



where BN denotes Batch Normalization.







5.9.3 Residual Networks from a Model-Ensemble Perspective

The paper "Residual Networks Behave Like Ensembles of Relatively Shallow Networks" unrolls a residual network and finds the following relation:
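The unrolled view can be made concrete by counting paths. In each block a signal either takes the shortcut or passes through the residual branch, so $n$ blocks give $2^n$ paths, and the number of paths that pass through exactly $k$ residual branches is the binomial coefficient $C(n, k)$. The sketch below enumerates these choices for a small $n$; the function name is illustrative.

```python
from collections import Counter
from itertools import product


def path_length_distribution(n_blocks):
    """Count paths by the number of residual branches they pass through.

    Each path is a tuple of 0/1 choices, one per block:
    0 = take the shortcut, 1 = go through the residual branch.
    """
    counts = Counter(sum(choice) for choice in product((0, 1), repeat=n_blocks))
    return dict(sorted(counts.items()))


dist = path_length_distribution(3)
print(dist)                # path length -> number of paths
print(sum(dist.values()))  # total number of paths, 2 ** 3
```

Since the binomial distribution concentrates around $n/2$, most paths in a deep residual network are much shorter than the full depth.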





5.9.4 Short Paths in Residual Networks





5.9.5 Code Practice

Below we implement the ResNet-34 described in "Deep Residual Learning for Image Recognition" and demonstrate training on CIFAR-10.

    # -*- coding: utf-8 -*-
    from keras import backend as K
    from keras.layers.merge import add
    from keras.layers import Input, Activation, Dense, Flatten
    from keras.layers.convolutional import Conv2D, MaxPooling2D, AveragePooling2D
    from keras.layers.normalization import BatchNormalization
    from keras.regularizers import l1_l2
    from keras.models import Model


    class ResNet(object):
        '''Basic building blocks of a residual network.'''
        name = 'resnet'

        def __init__(self, n):
            self.name = n

        def bn_relu(self, input):
            '''BN -> ReLU sub-structure of the proposed residual block (TensorFlow channel ordering).'''
            normalize = BatchNormalization(axis=3)(input)
            return Activation("relu")(normalize)

        def bn_relu_weight(self, filters, kernel_size, strides):
            '''BN -> ReLU -> weight sub-structure of the proposed residual block.'''
            def inner_func(input):
                act = self.bn_relu(input)
                conv = Conv2D(filters=filters,
                              kernel_size=kernel_size,
                              strides=strides,
                              padding='same',
                              kernel_initializer='he_normal',
                              kernel_regularizer=l1_l2(0.0001))(act)
                return conv
            return inner_func

        def weight_bn_relu(self, filters, kernel_size, strides):
            '''Weight -> BN -> ReLU sub-structure, used for the first layer of the network.'''
            def inner_func(input):
                return self.bn_relu(Conv2D(filters=filters,
                                           kernel_size=kernel_size,
                                           strides=strides,
                                           padding='same',
                                           kernel_initializer='he_normal',
                                           kernel_regularizer=l1_l2(0.0001))(input))
            return inner_func

        def shortcut(self, left, right):
            '''Shortcut of the proposed residual block; handles equal and unequal input/output shapes.'''
            left_shape = K.int_shape(left)
            right_shape = K.int_shape(right)
            stride_rows = int(round(left_shape[1] / right_shape[1]))
            stride_cols = int(round(left_shape[2] / right_shape[2]))
            x_l = left
            # If the input and output shapes differ, project the input with a 1x1 convolution;
            # otherwise use the identity. The projection occurs between blocks of different
            # dimensions (the dashed shortcuts in the paper).
            if left_shape != right_shape:
                x_l = Conv2D(filters=right_shape[3],
                             kernel_size=(1, 1),
                             strides=(stride_rows, stride_cols),
                             padding="valid",
                             kernel_initializer="he_normal",
                             kernel_regularizer=l1_l2(0.01, 0.0001))(left)
            return add([x_l, right])

        def basic_block(self, filters, strides=(1, 1), is_first_block=False):
            """Block used by residual networks of up to 34 layers; each shortcut spans 2 layers."""
            def inner_func(input):
                # First weight layer of the block
                if not is_first_block:
                    conv1 = self.bn_relu_weight(filters=filters,
                                                kernel_size=(3, 3),
                                                strides=strides)(input)
                else:
                    # The very first block follows BN -> ReLU -> pooling, so it skips BN -> ReLU
                    conv1 = Conv2D(filters=filters, kernel_size=(3, 3),
                                   strides=strides,
                                   padding="same",
                                   kernel_initializer="he_normal",
                                   kernel_regularizer=l1_l2(0.01, 0.0001))(input)
                # Second weight layer: the residual branch
                residual = self.bn_relu_weight(filters=filters,
                                               kernel_size=(3, 3), strides=(1, 1))(conv1)
                # Combine shortcut and residual into a two-layer residual block
                return self.shortcut(input, residual)
            return inner_func

        def residual_block(self, block_func, filters, repeat_times, is_first_block):
            '''Stack repeat_times residual blocks with the same number of filters.'''
            def inner_func(input):
                for i in range(repeat_times):
                    if is_first_block:
                        # First group: its input is the pooling layer, so no downsampling
                        strides = (1, 1)
                    elif i == 0:
                        # First block of every later group downsamples
                        strides = (2, 2)
                    else:
                        strides = (1, 1)
                    flag = i == 0 and is_first_block
                    input = block_func(filters=filters,
                                       strides=strides,
                                       is_first_block=flag)(input)
                return input
            return inner_func

        def residual_builder(self, input_shape, softmax_num, func_type, repeat_times):
            '''Build a residual network from the input shape, number of classes, block type and depth.'''
            input = Input(shape=input_shape)
            # Layer 1: convolution
            conv1 = self.weight_bn_relu(filters=64, kernel_size=(7, 7), strides=(2, 2))(input)
            # Layer 2: max pooling
            pool1 = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding="same")(conv1)
            residual_block = pool1
            filters = 64
            # Followed by the residual blocks (16 for repeat_times = [3, 4, 6, 3])
            for i, r in enumerate(repeat_times):
                residual_block = self.residual_block(func_type,
                                                     filters=filters,
                                                     repeat_times=r,
                                                     is_first_block=(i == 0))(residual_block)
                filters *= 2
            residual_block = self.bn_relu(residual_block)
            shape = K.int_shape(residual_block)
            # Average pooling over the whole feature map
            pool2 = AveragePooling2D(pool_size=(shape[1], shape[2]),
                                     strides=(1, 1))(residual_block)
            flatten1 = Flatten()(pool2)
            # Fully connected output layer
            dense1 = Dense(units=softmax_num,
                           kernel_initializer="he_normal",
                           activation="softmax")(flatten1)
            return Model(inputs=input, outputs=dense1)
  1. # -*- coding: utf-8 -*-
  2. import numpy as np
  3. import matplotlib
  4. import resnet
  5. matplotlib.use("Agg")
  6. import matplotlib.pyplot as plt
  7. import os
  8. from scipy.misc import toimage
  9. from keras.datasets import cifar10
  10. from keras.utils import np_utils
  11. from keras.preprocessing.image import ImageDataGenerator
  12. from keras.callbacks import ModelCheckpoint
  13. from keras import backend as K
  14. import tensorflow as tf
  15. tf.python.control_flow_ops = tf
  16. from keras.callbacks import ReduceLROnPlateau, CSVLogger, EarlyStopping
  17. lr_reducer = ReduceLROnPlateau(monitor='val_loss', factor=np.sqrt(0.5), cooldown=0, patience=3, min_lr=1e-6)
  18. early_stopper = EarlyStopping(monitor='val_acc', min_delta=0.0005, patience=15)
  19. csv_logger = CSVLogger('resnet34_cifar10.csv')
  20. def data_visualize(x, y, num):
  21. plt.figure()
  22. for i in range(0, num * num):
  23. axes = plt.subplot(num, num, i + 1)
  24. axes.set_title("label=" + str(y[i]))
  25. axes.set_xticks([0, 10, 20, 30])
  26. axes.set_yticks([0, 10, 20, 30])
  27. plt.imshow(toimage(x[i]))
  28. plt.tight_layout()
  29. plt.savefig('sample.jpg')
  30. if __name__ == "__main__":
  31. from keras.utils.vis_utils import plot_model
  32. with tf.device('/gpu:3'):
  33. gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=1, allow_growth=True)
  34. os.environ["CUDA_VISIBLE_DEVICES"] = "3"
  35. tf.Session(config=K.tf.ConfigProto(allow_soft_placement=True,
  36. log_device_placement=True,
  37. gpu_options=gpu_options))
  38. (X_train, y_train), (X_test, y_test) = cifar10.load_data()
  39. data_visualize(X_train, y_train, 4)
  40. # 定义输入数据并做归一化
  41. dim = 32
  42. channel = 3
  43. class_num = 10
  44. X_train = X_train.reshape(X_train.shape[0], dim, dim, channel).astype('float32') / 255
  45. X_test = X_test.reshape(X_test.shape[0], dim, dim, channel).astype('float32') / 255
  46. Y_train = np_utils.to_categorical(y_train, class_num)
  47. Y_test = np_utils.to_categorical(y_test, class_num)
  48. # this will do preprocessing and realtime data augmentation
  49. datagen = ImageDataGenerator(
  50. featurewise_center=False, # set input mean to 0 over the dataset
  51. samplewise_center=False, # set each sample mean to 0
  52. featurewise_std_normalization=False, # divide inputs by std of the dataset
  53. samplewise_std_normalization=False, # divide each input by its std
  54. zca_whitening=False, # apply ZCA whitening
  55. rotation_range=25, # randomly rotate images in the range (degrees, 0 to 180)
  56. width_shift_range=0.1, # randomly shift images horizontally (fraction of total width)
  57. height_shift_range=0.1, # randomly shift images vertically (fraction of total height)
  58. horizontal_flip=True, # randomly flip images
  59. vertical_flip=False) # randomly flip images
  60. datagen.fit(X_train)
  61. s = X_train.shape[1:]
  62. print(s)
  63. builder = resnet.ResNet("ResNet-test")
  64. resnet_34 = builder.residual_builder(s, class_num, builder.basic_block, [3, 4, 6, 3])
  65. model = resnet_34
  66. model.summary()
  67. #import pdb
  68. #pdb.set_trace()
  69. plot_model(model, to_file="ResNet.jpg", show_shapes=True)
  70. model.compile(loss='categorical_crossentropy',
  71. optimizer='adadelta',
  72. metrics=['accuracy'])
  73. batch_size = 32
  74. nb_epoch = 100
  75. # import pdb
  76. # pdb.set_trace()
  77. checkpointer = ModelCheckpoint("weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5", monitor='val_loss', verbose=0,
  78. save_best_only=False, save_weights_only=False, mode='auto')
  79. model.fit_generator(datagen.flow(X_train, Y_train, batch_size=batch_size),
  80. steps_per_epoch=X_train.shape[0] // batch_size,
  81. validation_data=(X_test, Y_test),
  82. epochs=nb_epoch,
  83. verbose=1,
  84. max_queue_size=100,
  85. callbacks=[lr_reducer, early_stopper, csv_logger, checkpointer])
  86. score = model.evaluate(X_test, Y_test, verbose=0)
  87. print('Test score:', score[0])
  88. print('Test accuracy:', score[1])




5.10 Maxout Networks

Maxout was proposed by Goodfellow et al. in the paper "Maxout Networks", which is well worth reading.

5.10.1 Maxout Activation Function




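The single Maxout unit shown in the figure above is essentially a piecewise linear function, and any convex function can be approximated arbitrarily well by a piecewise linear function. This is easy to see intuitively. Take a parabola as an example: each node computes a linear function, and the outputs of the nodes in the figure above correspond to the line segments in the figure below:

Viewed globally, ReLU is a special case of Maxout. Maxout lets the network learn the activation function itself (from this angle Maxout can also be seen as a kind of Network-in-Network structure) without constraining its form. A network with only two Maxout units can approximate any continuous function, provided each unit has enough linear pieces; the paper gives a detailed proof, which is not repeated here. In practice Maxout works better when combined with Dropout. It is worth recalling kernel methods at this point: a nonlinear kernel (such as the Gaussian kernel) also models nonlinear behaviour through something like local linear fitting, but traditional kernel methods fix the kernel function in advance (for example a Gaussian) instead of deriving it from the data. There is research on combining kernels, but in my view it ends up in the same place as neural networks, and all of it can be reasoned about within the broader neural network framework (recall the earlier discussion of the relationship between SVMs and neural networks).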
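
The piecewise-linear view can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses plain NumPy rather than the Keras code of this chapter, the `maxout` helper is hypothetical, the input is a scalar, and the linear pieces are hand-picked tangent lines (a real Maxout unit would learn them during training). It approximates the parabola y = x^2 with the maximum of 9 tangent lines and then shows ReLU as the two-piece case max(0, x).

```python
import numpy as np

def maxout(x, w, b):
    """Maxout unit for scalar inputs: maximum over k affine pieces w_j * x + b_j."""
    # x: (n,), w: (k,), b: (k,) -> pieces has shape (n, k)
    pieces = np.outer(x, w) + b
    return pieces.max(axis=1)

# Assumption for illustration: the pieces are tangents of f(x) = x^2
# at anchor points t, i.e. y = 2*t*x - t^2.
anchors = np.linspace(-2.0, 2.0, 9)
w = 2.0 * anchors
b = -anchors ** 2

x = np.linspace(-2.0, 2.0, 401)
approx = maxout(x, w, b)
max_error = np.max(np.abs(x ** 2 - approx))
print("max approximation error:", max_error)

# ReLU written as a two-piece Maxout: max(0 * x + 0, 1 * x + 0)
relu_like = maxout(x, np.array([0.0, 1.0]), np.array([0.0, 0.0]))
print("matches ReLU:", np.allclose(relu_like, np.maximum(x, 0.0)))
```

Because the tangents are spaced 0.5 apart, the gap to the parabola is (x - t)^2 for the nearest anchor t and never exceeds 0.25^2 = 0.0625; adding more pieces shrinks it further.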

5.11 Network in Network

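The idea of NIN comes from the paper "Network In Network". It has two highlights: replacing the traditional convolution layer with a nonlinear convolution layer to improve feature abstraction, and replacing the traditional fully connected layers with a new pooling layer. The later versions of GoogLeNet borrow heavily from this idea.

5.11.1 NIN Convolution Layer (MLP Convolution)


  • An MLP can approximate any function and needs no prior assumptions (such as linear separability or convexity);
  • An MLP is naturally compatible with the convolutional network structure and is easy to train with backpropagation;
  • An MLP can itself be made fairly deep, and its features can be reused;
  • Convolving with an MLP acts as a cascaded, cross-channel weighted combination of feature maps, which improves feature abstraction: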
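
The cross-channel combination in the last point can be made concrete with 1x1 convolutions. The following is a minimal NumPy sketch, not the NIN reference implementation: the helper names `conv1x1` and `mlpconv`, the layer sizes, and the random weights are all assumptions chosen for illustration, and only the 1x1 part of an MLP convolution is shown (in NIN it follows an ordinary spatial convolution). It checks that a stack of 1x1 convolutions is the same small MLP applied independently at every pixel.

```python
import numpy as np

def conv1x1(feature_maps, weights, bias):
    """1x1 convolution on an (H, W, C_in) tensor: a dense layer applied at every pixel."""
    # weights has shape (C_in, C_out); tensordot contracts the channel axis.
    return np.tensordot(feature_maps, weights, axes=([2], [0])) + bias

def mlpconv(feature_maps, layers):
    """Stack of 1x1 convolutions with ReLU, i.e. one small MLP shared by all positions."""
    out = feature_maps
    for weights, bias in layers:
        out = np.maximum(conv1x1(out, weights, bias), 0.0)
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4, 8))  # 4x4 spatial grid, 8 input feature maps
layers = [(rng.normal(size=(8, 16)), np.zeros(16)),
          (rng.normal(size=(16, 6)), np.zeros(6))]
y = mlpconv(x, layers)

# The same MLP evaluated by hand on the single pixel at row 1, column 2.
pixel = x[1, 2]
hidden = np.maximum(pixel @ layers[0][0] + layers[0][1], 0.0)
pixel_out = np.maximum(hidden @ layers[1][0] + layers[1][1], 0.0)

print(y.shape)
print(np.allclose(y[1, 2], pixel_out))
```

Every output channel is a weighted mix of all input feature maps at that position, which is the "cascaded cross-channel" effect described above.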



5.11.2 NIN Pooling Layer (Global Average Pooling)
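
As a rough illustration of how a pooling layer can stand in for the fully connected classifier, the sketch below averages each feature map of the last layer into one score and feeds the scores to a softmax. It assumes channels-last NumPy arrays with one feature map per class; the function names and the random input are hypothetical and only serve the example.

```python
import numpy as np

def global_average_pooling(feature_maps):
    """Average each feature map over its spatial positions: (N, H, W, C) -> (N, C)."""
    return feature_maps.mean(axis=(1, 2))

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
maps = rng.normal(size=(2, 6, 6, 10))  # 2 samples, 6x6 maps, one map per class (10 classes)
scores = global_average_pooling(maps)
probs = softmax(scores)

print(scores.shape)
print(probs.sum(axis=1))
```

The pooling step has no trainable parameters, in contrast to a fully connected layer over the flattened 6x6x10 tensor.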



5.12 GoogLeNet Inception V1

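GoogLeNet was proposed by Christian Szegedy et al. at Google in the 2014 paper "Going Deeper with Convolutions". Its main highlight is a structure called Inception, on which GoogLeNet is built; it took first place in both the classification and detection tasks of that year's ImageNet challenge. P.S. The name GoogLeNet is a tribute to Yann LeCun's LeNet series.

5.12.1 Some Thoughts



Awkwardly, current computer architectures are much better at dense computation and are very inefficient on non-uniformly distributed sparse data; for example, sparsity leads to a very high cache miss rate. A method is therefore needed that keeps the advantages of sparse networks while remaining computationally efficient. Fortunately, a large body of earlier experiments (for example "On Two-Dimensional Sparse Matrix Partitioning: Models, Methods, and a Recipe") showed that clustering a sparse matrix into relatively dense submatrices can greatly improve the performance of sparse matrix multiplication. Borrowing this idea, the authors proposed the Inception structure.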


5.12.2 GoogLeNet Architecture




5.12.3 Code Practice

    # -*- coding: utf-8 -*-
    # File: googlenet_inception_v1.py (module name taken from the import in the training script)
    from keras.layers import Input, Conv2D, Dense, MaxPooling2D, AveragePooling2D
    from keras.layers import Dropout, Flatten, ZeroPadding2D, Activation
    from keras.layers.merge import concatenate
    from keras.models import Model
    from keras.regularizers import l1_l2
    import googlenet_custom_layers


    def conv_relu(name, filters, kernel_size, input_layer, strides=(1, 1)):
        """Conv2D + ReLU with the settings shared by every convolution in this file."""
        return Conv2D(name=name,
                      filters=filters,
                      kernel_size=kernel_size,
                      strides=strides,
                      padding='same',
                      kernel_initializer='he_normal',
                      activation='relu',
                      kernel_regularizer=l1_l2(0.0001))(input_layer)


    def inception_module(name,
                         input_layer,
                         num_c_1x1,
                         num_c_1x1_3x3_reduce,
                         num_c_3x3,
                         num_c_1x1_5x5_reduce,
                         num_p_5x5,
                         num_c_1x1_reduce):
        """Inception block: 1x1, 1x1->3x3, 1x1->5x5 and max-pool->1x1 branches, concatenated on channels."""
        inception_1x1 = conv_relu(name + "/inception_1x1", num_c_1x1, (1, 1), input_layer)
        inception_3x3_reduce = conv_relu(name + "/inception_3x3_reduce", num_c_1x1_3x3_reduce, (1, 1), input_layer)
        inception_3x3 = conv_relu(name + "/inception_3x3", num_c_3x3, (3, 3), inception_3x3_reduce)
        inception_5x5_reduce = conv_relu(name + "/inception_5x5_reduce", num_c_1x1_5x5_reduce, (1, 1), input_layer)
        inception_5x5 = conv_relu(name + "/inception_5x5", num_p_5x5, (5, 5), inception_5x5_reduce)
        inception_max_pool = MaxPooling2D(name=name + "/inception_max_pool",
                                          pool_size=(3, 3),
                                          strides=(1, 1),
                                          padding="same")(input_layer)
        inception_max_pool_proj = conv_relu(name + "/inception_max_pool_project", num_c_1x1_reduce, (1, 1),
                                            inception_max_pool)
        print(inception_1x1.get_shape(), inception_3x3.get_shape(),
              inception_5x5.get_shape(), inception_max_pool_proj.get_shape())
        # Note: TensorFlow changed the argument order of tf.concat between versions, so check
        # your TensorFlow and Keras versions. If necessary, change line 1554 of
        # /usr/lib/python×××/site-packages/keras/backend/tensorflow_backend.py from
        #   return tf.concat([to_dense(x) for x in tensors], axis)
        # to
        #   return tf.concat(axis, [to_dense(x) for x in tensors])
        return concatenate([inception_1x1, inception_3x3, inception_5x5, inception_max_pool_proj])


    def auxiliary_classifier(name, input_layer, output_num):
        """Auxiliary softmax branch (loss1 / loss2)."""
        ave_pool = AveragePooling2D(name=name + "/ave_pool",
                                    pool_size=(5, 5),
                                    strides=(3, 3))(input_layer)
        conv = conv_relu(name + "/conv", 128, (1, 1), ave_pool)
        flat = Flatten()(conv)
        fc = Dense(1024,
                   activation='relu',
                   name=name + "/fc",
                   kernel_regularizer=l1_l2(0.0001))(flat)
        drop_fc = Dropout(0.7)(fc)
        classifier = Dense(output_num,
                           name=name + "/classifier",
                           kernel_regularizer=l1_l2(0.0001))(drop_fc)
        return Activation('softmax')(classifier)


    def googLeNet_inception_v1_building(input_shape, output_num, fine_tune=None):
        # NOTE: the pooling windows follow the 224x224 ImageNet layout. With a 32x32 input the
        # feature map reaching inception_4a is only 2x2, smaller than the 5x5 average-pooling
        # window of the auxiliary branches, so upscale the input or adapt the pooling sizes.
        input_layer = Input(shape=input_shape)
        # conv1: 7x7 convolution, stride 2
        conv1_7x7 = conv_relu("conv1_7x7/2", 64, (7, 7), input_layer, strides=(2, 2))
        conv1_zero_pad = ZeroPadding2D(padding=(1, 1))(conv1_7x7)
        # pool1: 3x3 max pooling, stride 2, followed by LRN
        pool1_3x3 = MaxPooling2D(name="max_pool1_3x3/2",
                                 pool_size=(3, 3),
                                 strides=(2, 2),
                                 padding='valid')(conv1_zero_pad)
        pool1_norm1 = googlenet_custom_layers.LRN2D(name='max_pool1_3x3/norm1')(pool1_3x3)
        # conv2: 1x1 reduction, 3x3 convolution, LRN
        conv2_3x3_reduce = conv_relu("conv2_3x3_reduce/1", 64, (1, 1), pool1_norm1)
        conv2_3x3 = conv_relu("conv2_3x3/1", 192, (3, 3), conv2_3x3_reduce)
        conv2_norm2 = googlenet_custom_layers.LRN2D(name='conv2_3x3/norm2')(conv2_3x3)
        conv2_zero_pad = ZeroPadding2D(padding=(1, 1))(conv2_norm2)
        # pool2: 3x3 max pooling, stride 2
        pool2_3x3 = MaxPooling2D(name="max_pool2_3x3",
                                 pool_size=(3, 3),
                                 strides=(2, 2),
                                 padding='valid')(conv2_zero_pad)
        # inception 3a, 3b
        inception_3a = inception_module("inception_3a", pool2_3x3, 64, 96, 128, 16, 32, 32)
        inception_3b = inception_module("inception_3b", inception_3a, 128, 128, 192, 32, 96, 64)
        inception_3b_zero_pad = ZeroPadding2D(padding=(1, 1))(inception_3b)
        # pool3: 3x3 max pooling, stride 2
        pool3_3x3 = MaxPooling2D(name="max_pool3_3x3/2",
                                 pool_size=(3, 3),
                                 strides=(2, 2),
                                 padding='valid')(inception_3b_zero_pad)
        # inception 4a with auxiliary branch loss1
        inception_4a = inception_module("inception_4a", pool3_3x3, 192, 96, 208, 16, 48, 64)
        loss1_classifier_act = auxiliary_classifier("loss1", inception_4a, output_num)
        # inception 4b, 4c, 4d with auxiliary branch loss2
        inception_4b = inception_module("inception_4b", inception_4a, 160, 112, 224, 24, 64, 64)
        inception_4c = inception_module("inception_4c", inception_4b, 128, 128, 256, 24, 64, 64)
        inception_4d = inception_module("inception_4d", inception_4c, 112, 144, 288, 32, 64, 64)
        loss2_classifier_act = auxiliary_classifier("loss2", inception_4d, output_num)
        # inception 4e
        inception_4e = inception_module("inception_4e", inception_4d, 256, 160, 320, 32, 128, 128)
        inception_4e_zero_pad = ZeroPadding2D(padding=(1, 1))(inception_4e)
        # pool4: 3x3 max pooling, stride 2
        pool4_3x3 = MaxPooling2D(name="max_pool4_3x3",
                                 pool_size=(3, 3),
                                 strides=(2, 2),
                                 padding='valid')(inception_4e_zero_pad)
        # inception 5a, 5b
        inception_5a = inception_module("inception_5a", pool4_3x3, 256, 160, 320, 32, 128, 128)
        inception_5b = inception_module("inception_5b", inception_5a, 384, 192, 384, 48, 128, 128)
        # pool5: 7x7 average pooling, then dropout
        pool5_7x7 = AveragePooling2D(name="ave_pool5_7x7",
                                     pool_size=(7, 7),
                                     strides=(1, 1))(inception_5b)
        loss3_flat = Flatten()(pool5_7x7)
        pool5_drop_7x7 = Dropout(0.4)(loss3_flat)
        # final fully connected classifier (loss3)
        loss3_classifier = Dense(output_num,
                                 name="loss3/classifier",
                                 kernel_regularizer=l1_l2(0.0001))(pool5_drop_7x7)
        loss3_classifier_act = Activation('softmax')(loss3_classifier)
        # Keras 2 uses inputs=/outputs= (input=/output= are the Keras 1 names)
        googlenet_inception_v1 = Model(name="googlenet_inception_v1",
                                       inputs=input_layer,
                                       outputs=[loss1_classifier_act, loss2_classifier_act, loss3_classifier_act])
        if fine_tune:
            googlenet_inception_v1.load_weights(fine_tune)
        return googlenet_inception_v1

    # File: googlenet_custom_layers.py (module name taken from the import in the model file)
    import tensorflow as tf
    from keras.layers import Layer


    class LRN2D(Layer):
        """Local response normalization across channels.

        The original code was adapted from pylearn2.
        License at: https://github.com/lisa-lab/pylearn2/blob/master/LICENSE.txt

        The original implemented get_output()/get_input(), hooks that the Keras 2 API used by
        the model file does not call, so the layer would pass its input through unchanged.
        This version implements call() instead. As an assumption of this rewrite it delegates to
        tf.nn.local_response_normalization, which expects channels-last input and computes
        x / (k + alpha * sum of squares over n neighbouring channels) ** beta.
        """

        def __init__(self, alpha=1e-4, k=2, beta=0.75, n=5, **kwargs):
            if n % 2 == 0:
                raise NotImplementedError("LRN2D only works with odd n. n provided: " + str(n))
            super(LRN2D, self).__init__(**kwargs)
            self.alpha = alpha
            self.k = k
            self.beta = beta
            self.n = n

        def call(self, x, mask=None):
            # depth_radius is the half window: n = 5 covers channels c-2 .. c+2
            return tf.nn.local_response_normalization(x,
                                                      depth_radius=self.n // 2,
                                                      bias=float(self.k),
                                                      alpha=self.alpha,
                                                      beta=self.beta)

        def get_config(self):
            # "name" is already provided by the base config, so it is not overridden here
            config = {"alpha": self.alpha,
                      "k": self.k,
                      "beta": self.beta,
                      "n": self.n}
            base_config = super(LRN2D, self).get_config()
            return dict(list(base_config.items()) + list(config.items()))


    class PoolHelper(Layer):
        """Drops the first row and column. Assumes channels-first input; not used by the model above."""

        def __init__(self, **kwargs):
            super(PoolHelper, self).__init__(**kwargs)

        def call(self, x, mask=None):
            return x[:, :, 1:, 1:]

        def compute_output_shape(self, input_shape):
            b, ch, r, c = input_shape
            return (b, ch, None if r is None else r - 1, None if c is None else c - 1)

        def get_config(self):
            config = {}
            base_config = super(PoolHelper, self).get_config()
            return dict(list(base_config.items()) + list(config.items()))

    # -*- coding: utf-8 -*-
    # Training script for the GoogLeNet Inception V1 model defined above
    import os
    # Select the GPU before TensorFlow initializes CUDA. With a single visible device
    # it is addressed as /gpu:0, so the original tf.device('/gpu:4') scope is dropped.
    os.environ["CUDA_VISIBLE_DEVICES"] = "4"

    import tensorflow as tf
    from keras import backend as K
    from keras.datasets import cifar10
    from keras.preprocessing.image import ImageDataGenerator
    from keras.utils import np_utils
    from keras.utils.vis_utils import plot_model

    import googlenet_inception_v1

    if __name__ == "__main__":
        gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=1, allow_growth=True)
        # Register the session with Keras; a session that is only constructed is never used.
        K.set_session(tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
                                                       log_device_placement=True,
                                                       gpu_options=gpu_options)))
        (X_train, y_train), (X_test, y_test) = cifar10.load_data()
        # Define the input shape and normalize pixel values to [0, 1]
        dim = 32
        channel = 3
        class_num = 10
        X_train = X_train.reshape(X_train.shape[0], dim, dim, channel).astype('float32') / 255
        X_test = X_test.reshape(X_test.shape[0], dim, dim, channel).astype('float32') / 255
        Y_train = np_utils.to_categorical(y_train, class_num)
        Y_test = np_utils.to_categorical(y_test, class_num)
        # this will do preprocessing and realtime data augmentation
        datagen = ImageDataGenerator(
            featurewise_center=False,  # set input mean to 0 over the dataset
            samplewise_center=False,  # set each sample mean to 0
            featurewise_std_normalization=False,  # divide inputs by std of the dataset
            samplewise_std_normalization=False,  # divide each input by its std
            zca_whitening=False,  # apply ZCA whitening
            rotation_range=25,  # randomly rotate images in the range (degrees, 0 to 180)
            width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
            height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
            horizontal_flip=True,  # randomly flip images horizontally
            vertical_flip=False)  # do not flip images vertically
        datagen.fit(X_train)
        s = X_train.shape[1:]
        print(s)
        model = googlenet_inception_v1.googLeNet_inception_v1_building(s, class_num)
        model.summary()
        plot_model(model, to_file="GoogLeNet-Inception-V1.jpg", show_shapes=True)
        model.compile(loss='categorical_crossentropy',
                      optimizer='adadelta',
                      metrics=['accuracy'])
        batch_size = 64
        nb_epoch = 100
        for e in range(nb_epoch):
            batches = 0
            for X_batch, Y_batch in datagen.flow(X_train, Y_train, batch_size=batch_size):
                # The model has three softmax outputs, so the labels are passed three times
                loss = model.train_on_batch(X_batch, [Y_batch, Y_batch, Y_batch])
                print(loss)
                batches += 1
                if batches >= len(X_train) / batch_size:
                    # we need to break the loop by hand because
                    # the generator loops indefinitely
                    break
        # evaluate() also needs one label array per output; it returns the total loss
        # followed by the per-output losses and accuracies, so print them by name
        score = model.evaluate(X_test, [Y_test, Y_test, Y_test], verbose=0)
        for metric_name, value in zip(model.metrics_names, score):
            print(metric_name, value)
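
The training loop above feeds the same labels to three softmax outputs. As a side note, the GoogLeNet paper weights the two auxiliary losses by 0.3 when adding them to the main loss and discards the auxiliary branches at inference time. The NumPy sketch below only illustrates that weighted sum; the function names and the uniform dummy predictions are assumptions made for the example, and it is not part of the script above.

```python
import numpy as np

def cross_entropy(probs, onehot):
    """Mean categorical cross-entropy over a batch of predicted probabilities."""
    return float(-np.mean(np.sum(onehot * np.log(probs + 1e-12), axis=1)))

def googlenet_loss(aux1_probs, aux2_probs, main_probs, onehot, aux_weight=0.3):
    """Main loss plus the two auxiliary losses scaled by aux_weight."""
    return (aux_weight * cross_entropy(aux1_probs, onehot)
            + aux_weight * cross_entropy(aux2_probs, onehot)
            + cross_entropy(main_probs, onehot))

# Dummy batch: 4 samples, 10 classes, every head predicts the uniform distribution.
onehot = np.eye(10)[[0, 3, 5, 9]]
uniform = np.full((4, 10), 0.1)
total = googlenet_loss(uniform, uniform, uniform, onehot)
print(total)
```

In Keras the same weighting can be requested with the `loss_weights` argument of `model.compile`, for example `loss_weights=[0.3, 0.3, 1.0]` for the output order used here; without it, all three outputs contribute equally.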



5.13 GoogLeNet Inception V2

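GoogLeNet Inception V2 appeared in "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". Its main highlight is the Batch Normalization method, which has the following effects:

5.13.1 Some Thoughts

In machine learning we usually assume that training samples are independent and identically distributed (i.i.d.) and that training and test samples follow the same distribution. If real data satisfy this assumption the model may perform well; if they do not, performance tends to suffer. A mismatch between these distributions is known as Covariate Shift, so from the (external) viewpoint of the samples the same reasoning applies to neural networks. From the (internal) viewpoint of the structure, a neural network consists of many layers, and samples propagate forward through them while features are extracted along the way. If the input distribution differs from layer to layer, the model will either perform poorly or learn slowly; this is called Internal Covariate Shift.

5.13.2 How BN Works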









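5.13.3 BN in Convolutional Neural Networks

Because convolutional networks share weights, each feature map has only one pair of parameters (γ, β) to learn.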
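
To make the "one pair per feature map" point concrete, here is a minimal NumPy sketch of the training-mode BN forward pass for convolutional feature maps. It assumes channels-last tensors and omits the running averages used at inference time; the function name `batchnorm_conv_train` and the random input are hypothetical. Statistics are computed per channel over the batch and both spatial axes, so `gamma` and `beta` each have one entry per feature map.

```python
import numpy as np

def batchnorm_conv_train(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization for (N, H, W, C) feature maps.

    The mean and variance are taken over the batch and spatial axes, so every
    feature map (channel) is normalized with a single mean/variance and scaled
    and shifted with a single (gamma, beta) pair.
    """
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 5, 5, 4))  # 8 samples, 5x5 maps, 4 feature maps
gamma = np.ones(4)   # one scale per feature map
beta = np.zeros(4)   # one shift per feature map
y = batchnorm_conv_train(x, gamma, beta)

print(y.mean(axis=(0, 1, 2)))  # per-channel means, close to 0
print(y.std(axis=(0, 1, 2)))   # per-channel standard deviations, close to 1
```

With 4 feature maps the layer learns 8 parameters in total, regardless of the spatial size of the maps.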

5.13.4 Code Practice

    from keras.datasets import cifar10
    from keras.models import Sequential
    from keras.layers import Dense, Activation, Flatten, Conv2D, MaxPooling2D, BatchNormalization
    from keras.utils import np_utils
    from keras.utils.vis_utils import plot_model
    from keras.preprocessing.image import ImageDataGenerator


    def build_LeNet5():
        # "Note 1" to "Note 4" mark optional lines that can be switched on to compare
        # different BatchNormalization placements.
        model = Sequential()
        model.add(Conv2D(96, (11, 11), padding='same', input_shape=(32, 32, 3), data_format='channels_last'))
        # Note 1: model.add(BatchNormalization())
        model.add(MaxPooling2D(pool_size=(2, 2)))
        # Note 2: model.add(BatchNormalization())
        model.add(Activation("tanh"))
        model.add(Conv2D(120, (1, 1), padding='valid'))
        # Note 3: model.add(BatchNormalization())
        model.add(Flatten())
        model.add(Dense(10))
        model.add(BatchNormalization())
        model.add(Activation("relu"))
        # Note 4: model.add(Dense(10))
        model.add(Activation('softmax'))
        return model


    if __name__ == "__main__":
        model = build_LeNet5()
        model.summary()
        plot_model(model, to_file="LeNet-5.png", show_shapes=True)
        (X_train, y_train), (X_test, y_test) = cifar10.load_data()  # or mnist.load_data()
        X_train = X_train.reshape(X_train.shape[0], 32, 32, 3).astype('float32') / 255
        X_test = X_test.reshape(X_test.shape[0], 32, 32, 3).astype('float32') / 255
        Y_train = np_utils.to_categorical(y_train, 10)
        Y_test = np_utils.to_categorical(y_test, 10)
        # Preprocessing and realtime data augmentation.
        # Note: the generator is fitted here, but model.fit below trains on the raw arrays.
        datagen = ImageDataGenerator(
            featurewise_center=False,  # set input mean to 0 over the dataset
            samplewise_center=False,  # set each sample mean to 0
            featurewise_std_normalization=False,  # divide inputs by std of the dataset
            samplewise_std_normalization=False,  # divide each input by its std
            zca_whitening=False,  # apply ZCA whitening
            rotation_range=25,  # randomly rotate images in the range (degrees, 0 to 180)
            width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
            height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
            horizontal_flip=False,  # do not flip images horizontally
            vertical_flip=False)  # do not flip images vertically
        datagen.fit(X_train)
        # training
        model.compile(loss='categorical_crossentropy',
                      optimizer='adadelta',
                      metrics=['accuracy'])
        batch_size = 32
        nb_epoch = 8
        # Keras 2 uses epochs= (nb_epoch= is the Keras 1 name)
        model.fit(X_train, Y_train, batch_size=batch_size, epochs=nb_epoch,
                  verbose=1, validation_data=(X_test, Y_test))
        score = model.evaluate(X_test, Y_test, verbose=0)
        print('Test score:', score[0])
        print('Test accuracy:', score[1])


5.14 GoogLeNet Inception V3

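GoogLeNet Inception V3 was proposed in "Rethinking the Inception Architecture for Computer Vision" (note that in that paper the authors call this architecture v2; here we follow the naming used in the final v4 paper). The highlights of the paper are:

5.14.1 Design Principles for Network Architectures


5.14.2 Label Smoothing

Labels for multi-class samples are usually one-hot, for example [0,0,0,1]. With a loss such as cross-entropy, the model learns to assign an over-confident probability to the ground-truth label, and because the logit of the ground-truth label ends up much larger than those of the other labels, the model overfits and generalizes worse. One remedy is to add a regularization term that mixes the label with a probability distribution, so that the label becomes "soft", for example [0.1,0.2,0.1,0.6]. In the experiments this reduced both the top-1 and the top-5 error rate by about 0.2%.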
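
A minimal NumPy sketch of this smoothing is given below. It assumes the variant with a uniform prior over the K classes, q'(k) = (1 - ε) q(k) + ε / K, with ε = 0.1; the helper name `smooth_labels` is hypothetical. Note that a uniform prior yields equal mass on all non-target classes, unlike the hand-written soft label in the paragraph above, which is only meant to convey the idea.

```python
import numpy as np

def smooth_labels(onehot, epsilon=0.1):
    """Mix one-hot labels with a uniform distribution over the classes."""
    num_classes = onehot.shape[-1]
    return (1.0 - epsilon) * onehot + epsilon / num_classes

y = np.array([[0.0, 0.0, 0.0, 1.0]])
y_smooth = smooth_labels(y, epsilon=0.1)
print(y_smooth)  # [[0.025 0.025 0.025 0.925]]
```

The smoothed labels still sum to 1, so they can be used directly as targets for a categorical cross-entropy loss.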

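5.14.3 Network Architecture

5.14.4 Code Practice

To make the model runnable on a single machine, the number of feature maps was reduced, and the stride of the input layers was adjusted to fit the CIFAR-10 input size. The code is as follows.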

  1. # -*- coding: utf-8 -*-
  2. import numpy as np
  3. from keras.layers import Input, merge, Dropout, Dense, Lambda, Flatten, Activation
  4. from keras.layers.convolutional import MaxPooling2D, Conv2D, AveragePooling2D
  5. from keras.layers.normalization import BatchNormalization
  6. from keras.layers.merge import concatenate, add
  7. from keras.regularizers import l1_l2
  8. from keras.models import Model
  9. from keras.callbacks import CSVLogger, ReduceLROnPlateau, ModelCheckpoint, EarlyStopping
  10. lr_reducer = ReduceLROnPlateau(monitor='val_loss', factor=np.sqrt(0.5), cooldown=0, patience=3, min_lr=1e-6)
  11. early_stopper = EarlyStopping(monitor='val_acc', min_delta=0.0005, patience=15)
  12. csv_logger = CSVLogger('resnet34_cifar10.csv')
  13. from keras.utils.vis_utils import plot_model
  14. import os
  15. from keras.preprocessing.image import ImageDataGenerator
  16. from keras.utils import np_utils
  17. from keras.datasets import cifar10
  18. from keras import backend as K
  19. import tensorflow as tf
  20. tf.python.control_flow_ops = tf
  21. import warnings
  22. warnings.filterwarnings('ignore')
  23. filter_control = 8
  24. def bn_relu(input):
  25. """Helper to build a BN -> relu block
  26. """
  27. norm = BatchNormalization()(input)
  28. return Activation("relu")(norm)
  29. def before_inception(input_shape, small_mode=False):
  30. input_layer = input_shape
  31. if small_mode:
  32. strides = (1, 1)
  33. else:
  34. strides = (2, 2)
  35. before_conv1_3x3 = Conv2D(name="before_conv1_3x3/2",
  36. filters=32 // filter_control,
  37. kernel_size=(3, 3),
  38. strides=strides,
  39. kernel_initializer='he_normal',
  40. activation='relu',
  41. kernel_regularizer=l1_l2(0.00001))(input_layer)
  42. before_conv2_3x3 = Conv2D(name="before_conv2_3x3/1",
  43. filters=32 // filter_control,
  44. kernel_size=(3, 3),
  45. strides=(1, 1),
  46. kernel_initializer='he_normal',
  47. activation='relu',
  48. kernel_regularizer=l1_l2(0.00001))(before_conv1_3x3)
  49. before_conv3_3x3 = Conv2D(name="before_conv3_3x3/1",
  50. filters=64 // filter_control,
  51. kernel_size=(3, 3),
  52. strides=(1, 1),
  53. kernel_initializer='he_normal',
  54. activation='relu',
  55. padding='same',
  56. kernel_regularizer=l1_l2(0.00001))(before_conv2_3x3)
  57. before_pool1_3x3 = MaxPooling2D(name="before_pool1_3x3/2",
  58. pool_size=(3, 3),
  59. strides=strides,
  60. padding='valid')(before_conv3_3x3)
  61. before_conv4_3x3 = Conv2D(name="before_conv4_3x3/1",
  62. filters=80 // filter_control,
  63. kernel_size=(3, 3),
  64. strides=(1, 1),
  65. kernel_initializer='he_normal',
  66. activation='relu',
  67. padding='valid',
  68. kernel_regularizer=l1_l2(0.00001))(before_pool1_3x3)
  69. before_conv5_3x3 = Conv2D(name="before_conv5_3x3/2",
  70. filters=192 // filter_control,
  71. kernel_size=(3, 3),
  72. strides=strides,
  73. kernel_initializer='he_normal',
  74. activation='relu',
  75. padding='valid',
  76. kernel_regularizer=l1_l2(0.00001))(before_conv4_3x3)
  77. before_conv6_3x3 = Conv2D(name="before_conv6_3x3/1",
  78. filters=288 // filter_control,
  79. kernel_size=(3, 3),
  80. strides=(1, 1),
  81. kernel_initializer='he_normal',
  82. activation='relu',
  83. padding='valid',
  84. kernel_regularizer=l1_l2(0.00001))(before_conv5_3x3)
  85. return before_conv6_3x3
  86. def inception_A(i, input_shape):
  87. input_layer = input_shape
  88. # (20,20,288)
  89. inception_A_conv1_1x1 = Conv2D(name="inception_A_conv1_1x1/1" + i,
  90. filters=64 // filter_control,
  91. kernel_size=(1, 1),
  92. strides=(1, 1),
  93. kernel_initializer='he_normal',
  94. activation='relu',
  95. padding='same',
  96. kernel_regularizer=l1_l2(0.00001))(input_layer)
  97. inception_A_conv2_3x3 = Conv2D(name="inception_A_conv2_3x3/1" + i,
  98. filters=96 // filter_control,
  99. kernel_size=(3, 3),
  100. strides=(1, 1),
  101. kernel_initializer='he_normal',
  102. activation='relu',
  103. padding='same',
  104. kernel_regularizer=l1_l2(0.00001))(inception_A_conv1_1x1)
  105. inception_A_conv3_3x3 = Conv2D(name="inception_A_conv3_3x3/1" + i,
  106. filters=96 // filter_control,
  107. kernel_size=(3, 3),
  108. strides=(1, 1),
  109. kernel_initializer='he_normal',
  110. activation='relu',
  111. padding='same',
  112. kernel_regularizer=l1_l2(0.00001))(inception_A_conv2_3x3)
  113. inception_A_conv4_1x1 = Conv2D(name="inception_A_conv4_1x1/1" + i,
  114. filters=48 // filter_control,
  115. kernel_size=(1, 1),
  116. strides=(1, 1),
  117. kernel_initializer='he_normal',
  118. activation='relu',
  119. padding='same',
  120. kernel_regularizer=l1_l2(0.00001))(input_layer)
  121. inception_A_conv5_3x3 = Conv2D(name="inception_A_conv5_3x3/1" + i,
  122. filters=64 // filter_control,
  123. kernel_size=(3, 3),
  124. strides=(1, 1),
  125. kernel_initializer='he_normal',
  126. activation='relu',
  127. padding='same',
  128. kernel_regularizer=l1_l2(0.00001))(inception_A_conv4_1x1)
  129. inception_A_pool1_3x3 = AveragePooling2D(name="inception_A_pool1_3x3/1" + i,
  130. pool_size=(3, 3),
  131. strides=(1, 1),
  132. padding='same')(input_layer)
  133. inception_A_conv6_1x1 = Conv2D(name="inception_A_conv6_1x1/1" + i,
  134. filters=64 // filter_control,
  135. kernel_size=(1, 1),
  136. strides=(1, 1),
  137. kernel_initializer='he_normal',
  138. activation='relu',
  139. padding='same',
  140. kernel_regularizer=l1_l2(0.00001))(inception_A_pool1_3x3)
  141. inception_A_conv7_1x1 = Conv2D(name="inception_A_conv7_1x1/1" + i,
  142. filters=64 // filter_control,
  143. kernel_size=(1, 1),
  144. strides=(1, 1),
  145. kernel_initializer='he_normal',
  146. activation='relu',
  147. padding='same',
  148. kernel_regularizer=l1_l2(0.00001))(input_layer)
  149. inception_A_merge1 = concatenate([inception_A_conv3_3x3, inception_A_conv5_3x3, inception_A_conv6_1x1, inception_A_conv7_1x1])
  150. return bn_relu(inception_A_merge1)
  151. def inception_B(i, input_shape):
  152. input_layer = input_shape
  153. inception_B_conv1_1x1 = Conv2D(name="inception_B_conv1_1x1/1" + i,
  154. filters=128 // filter_control,
  155. kernel_size=(1, 1),
  156. strides=(1, 1),
  157. kernel_initializer='he_normal',
  158. activation='relu',
  159. padding='same',
  160. kernel_regularizer=l1_l2(0.00001))(input_layer)
  161. inception_B_conv2_1x7 = Conv2D(name="inception_B_conv2_1x7/1" + i,
  162. filters=128 // filter_control,
  163. kernel_size=(1, 7),
  164. strides=(1, 1),
  165. kernel_initializer='he_normal',
  166. activation='relu',
  167. padding='same',
  168. kernel_regularizer=l1_l2(0.00001))(inception_B_conv1_1x1)
  169. inception_B_conv3_7x1 = Conv2D(name="inception_B_conv3_7x1/1" + i,
  170. filters=128 // filter_control,
  171. kernel_size=(7, 1),
  172. strides=(1, 1),
  173. kernel_initializer='he_normal',
  174. activation='relu',
  175. padding='same',
  176. kernel_regularizer=l1_l2(0.00001))(inception_B_conv2_1x7)
  177. inception_B_conv4_1x7 = Conv2D(name="inception_B_conv4_1x7/1" + i,
  178. filters=128 // filter_control,
  179. kernel_size=(1, 7),
  180. strides=(1, 1),
  181. kernel_initializer='he_normal',
  182. activation='relu',
  183. padding='same',
  184. kernel_regularizer=l1_l2(0.00001))(inception_B_conv3_7x1)
  185. inception_B_conv5_7x1 = Conv2D(name="inception_B_conv5_7x1/1" + i,
  186. filters=192 // filter_control,
  187. kernel_size=(7, 1),
  188. strides=(1, 1),
  189. kernel_initializer='he_normal',
  190. activation='relu',
  191. padding='same',
  192. kernel_regularizer=l1_l2(0.00001))(inception_B_conv4_1x7)
  193. inception_B_conv6_1x1 = Conv2D(name="inception_B_conv6_1x1/1" + i,
  194. filters=128 // filter_control,
  195. kernel_size=(1, 1),
  196. strides=(1, 1),
  197. kernel_initializer='he_normal',
  198. activation='relu',
  199. padding='same',
  200. kernel_regularizer=l1_l2(0.00001))(input_layer)
  201. inception_B_conv7_1x7 = Conv2D(name="inception_B_conv7_1x7/1" + i,
  202. filters=128 // filter_control,
  203. kernel_size=(1, 7),
  204. strides=(1, 1),
  205. kernel_initializer='he_normal',
  206. activation='relu',
  207. padding='same',
  208. kernel_regularizer=l1_l2(0.00001))(inception_B_conv6_1x1)
  209. inception_B_conv8_7x1 = Conv2D(name="inception_B_conv8_7x1/1" + i,
  210. filters=192 // filter_control,
  211. kernel_size=(7, 1),
  212. strides=(1, 1),
  213. kernel_initializer='he_normal',
  214. activation='relu',
  215. padding='same',
  216. kernel_regularizer=l1_l2(0.00001))(inception_B_conv7_1x7)
  217. inception_B_pool1_3x3 = AveragePooling2D(name="inception_B_pool1_3x3/1" + i,
  218. pool_size=(3, 3),
  219. strides=(1, 1),
  220. padding='same')(input_layer)
  221. inception_B_conv9_1x1 = Conv2D(name="inception_B_conv9_1x1/1" + i,
  222. filters=192 // filter_control,
  223. kernel_size=(1, 1),
  224. strides=(1, 1),
  225. kernel_initializer='he_normal',
  226. activation='relu',
  227. padding='same',
  228. kernel_regularizer=l1_l2(0.00001))(inception_B_pool1_3x3)
  229. inception_B_conv10_1x1 = Conv2D(name="inception_B_conv10_1x1/1" + i,
  230. filters=192 // filter_control,
            kernel_size=(1, 1),
            strides=(1, 1),
            kernel_initializer='he_normal',
            activation='relu',
            padding='same',
            kernel_regularizer=l1_l2(0.00001))(input_layer)
        inception_B_merge1 = concatenate(
            [inception_B_conv5_7x1, inception_B_conv8_7x1, inception_B_conv9_1x1, inception_B_conv10_1x1])
        return bn_relu(inception_B_merge1)


    def inception_C(i, input_shape):
        input_layer = input_shape
        inception_C_conv1_1x1 = Conv2D(name="inception_C_conv1_1x1/1" + i,
            filters=448 // filter_control,
            kernel_size=(1, 1),
            strides=(1, 1),
            kernel_initializer='he_normal',
            activation='relu',
            padding='same',
            kernel_regularizer=l1_l2(0.00001))(input_layer)
        inception_C_conv2_3x3 = Conv2D(name="inception_C_conv2_3x3/1" + i,
            filters=384 // filter_control,
            kernel_size=(3, 3),
            strides=(1, 1),
            kernel_initializer='he_normal',
            activation='relu',
            padding='same',
            kernel_regularizer=l1_l2(0.00001))(inception_C_conv1_1x1)
        inception_C_conv3_1x3 = Conv2D(name="inception_C_conv3_1x3/1" + i,
            filters=384 // filter_control,
            kernel_size=(1, 3),
            strides=(1, 1),
            kernel_initializer='he_normal',
            activation='relu',
            padding='same',
            kernel_regularizer=l1_l2(0.00001))(inception_C_conv2_3x3)
        inception_C_conv4_3x1 = Conv2D(name="inception_C_conv4_3x1/1" + i,
            filters=384 // filter_control,
            kernel_size=(3, 1),
            strides=(1, 1),
            kernel_initializer='he_normal',
            activation='relu',
            padding='same',
            kernel_regularizer=l1_l2(0.00001))(inception_C_conv2_3x3)
        inception_C_merge1 = concatenate([inception_C_conv3_1x3, inception_C_conv4_3x1])
        inception_C_conv5_1x1 = Conv2D(name="inception_C_conv5_1x1/1" + i,
            filters=384 // filter_control,
            kernel_size=(1, 1),
            strides=(1, 1),
            kernel_initializer='he_normal',
            activation='relu',
            padding='same',
            kernel_regularizer=l1_l2(0.00001))(input_layer)
        inception_C_conv6_1x3 = Conv2D(name="inception_C_conv6_1x3/1" + i,
            filters=384 // filter_control,
            kernel_size=(1, 3),
            strides=(1, 1),
            kernel_initializer='he_normal',
            activation='relu',
            padding='same',
            kernel_regularizer=l1_l2(0.00001))(inception_C_conv5_1x1)
        inception_C_conv7_3x1 = Conv2D(name="inception_C_conv7_3x1/1" + i,
            filters=384 // filter_control,
            kernel_size=(3, 1),
            strides=(1, 1),
            kernel_initializer='he_normal',
            activation='relu',
            padding='same',
            kernel_regularizer=l1_l2(0.00001))(inception_C_conv5_1x1)
        inception_C_merge2 = concatenate([inception_C_conv6_1x3, inception_C_conv7_3x1])
        inception_C_pool1_3x3 = AveragePooling2D(name="inception_C_pool1_3x3/1" + i,
            pool_size=(3, 3),
            strides=(1, 1),
            padding='same')(input_layer)
        inception_C_conv8_1x1 = Conv2D(name="inception_C_conv8_1x1/1" + i,
            filters=192 // filter_control,
            kernel_size=(1, 1),
            strides=(1, 1),
            kernel_initializer='he_normal',
            activation='relu',
            padding='same',
            kernel_regularizer=l1_l2(0.00001))(inception_C_pool1_3x3)
        inception_C_conv9_1x1 = Conv2D(name="inception_C_conv9_1x1/1" + i,
            filters=320 // filter_control,
            kernel_size=(1, 1),
            strides=(1, 1),
            kernel_initializer='he_normal',
            activation='relu',
            padding='same',
            kernel_regularizer=l1_l2(0.00001))(input_layer)
        inception_C_merge3 = concatenate(
            [inception_C_merge1, inception_C_merge2, inception_C_conv8_1x1, inception_C_conv9_1x1])
        return bn_relu(inception_C_merge3)


    def create_inception_v3(input_shape, nb_classes=10, small_mode=False):
        input_layer = Input(input_shape)
        x = before_inception(input_layer, small_mode)
        # 3 x Inception A
        for i in range(3):
            x = inception_A(str(i), x)
        # 5 x Inception B
        for i in range(5):
            x = inception_B(str(i), x)
        # 2 x Inception C
        for i in range(2):
            x = inception_C(str(i), x)
        x = AveragePooling2D((8, 8), strides=(1, 1))(x)
        # Dropout. Keras takes the fraction of units to drop, so 0.8 drops 80%;
        # use 0.2 if a keep probability of 0.8 is intended.
        x = Dropout(0.8)(x)
        x = Flatten()(x)
        # Output (Keras 2 argument names: units, inputs, outputs)
        out = Dense(units=nb_classes, activation='softmax')(x)
        model = Model(inputs=input_layer, outputs=out, name='Inception-v3')
        return model


    if __name__ == "__main__":
        # Expose only physical GPU 3; inside this process it is then addressed as '/gpu:0'.
        os.environ["CUDA_VISIBLE_DEVICES"] = "3"
        gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=1, allow_growth=True)
        # Register the session with Keras so that this configuration is actually used.
        K.set_session(tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
                                                       log_device_placement=True,
                                                       gpu_options=gpu_options)))
        with tf.device('/gpu:0'):
            (x_train, y_train), (x_test, y_test) = cifar10.load_data()
            # Scale pixels to [0, 1]; the (0, 1, 2, 3) transpose keeps the channels-last order unchanged
            x_train = np.transpose(x_train.astype('float32') / 255., (0, 1, 2, 3))
            x_test = np.transpose(x_test.astype('float32') / 255., (0, 1, 2, 3))
            print('x_train shape:', x_train.shape)
            print(x_train.shape[0], 'train samples')
            print(x_test.shape[0], 'test samples')
            # convert class vectors to binary class matrices
            y_train = np_utils.to_categorical(y_train)
            y_test = np_utils.to_categorical(y_test)
            s = x_train.shape[1:]
            batch_size = 128
            nb_epoch = 10
            nb_classes = 10
            model = create_inception_v3(s, nb_classes)
            model.summary()
            plot_model(model, to_file="GoogLeNet-Inception-V3.jpg", show_shapes=True)
            model.compile(optimizer='adadelta',
                          loss='categorical_crossentropy',
                          metrics=['accuracy'])
            # First pass: train on the raw images without augmentation
            model.fit(x_train, y_train,
                      batch_size=batch_size, epochs=nb_epoch, verbose=1,
                      validation_data=(x_test, y_test), shuffle=True,
                      callbacks=[])
            # Model saving callback
            checkpointer = ModelCheckpoint("weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5", monitor='val_loss',
                                           verbose=0,
                                           save_best_only=False, save_weights_only=False, mode='auto')
            print('Using real-time data augmentation.')
            datagen_train = ImageDataGenerator(
                featurewise_center=False,
                samplewise_center=False,
                featurewise_std_normalization=False,
                samplewise_std_normalization=False,
                zca_whitening=False,
                rotation_range=0,
                width_shift_range=0.125,
                height_shift_range=0.125,
                horizontal_flip=True,
                vertical_flip=False)
            datagen_train.fit(x_train)
            # Second pass: continue training on augmented batches (Keras 2 counts steps, not samples)
            history = model.fit_generator(datagen_train.flow(x_train, y_train, batch_size=batch_size, shuffle=True),
                                          steps_per_epoch=x_train.shape[0] // batch_size,
                                          epochs=nb_epoch, verbose=1,
                                          validation_data=(x_test, y_test),
                                          callbacks=[lr_reducer, early_stopper, csv_logger, checkpointer])
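
As a rough illustration of what dividing every filter count by `filter_control` buys, the sketch below counts the trainable weights of a single `Conv2D` layer. It assumes `filter_control = 8` (the value set in the Inception-ResNet V2 script in section 5.15.3; the value used for the script above may differ) and one bias per filter, which is the Keras default. The helper name `conv2d_params` is hypothetical and introduced only for this example.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, out_channels, use_bias=True):
    """Number of trainable weights in one 2-D convolution layer."""
    weights = kernel_h * kernel_w * in_channels * out_channels
    return weights + (out_channels if use_bias else 0)


# inception_C_conv2_3x3 takes the 448-filter 1x1 output and applies 384 3x3 filters.
full = conv2d_params(3, 3, 448, 384)

# The same layer with every filter count divided by filter_control (assumed to be 8).
filter_control = 8
slim = conv2d_params(3, 3, 448 // filter_control, 384 // filter_control)

print(full, slim, round(full / slim, 1))
```

Because both the input and the output channel counts shrink, the kernel weights of each layer fall by roughly the square of `filter_control`, which is what makes the reduced network practical to train on CIFAR-10 sized images.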

5.15 GoogLeNet Inception V4/ResNet V1/V2

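These three architectures were proposed in the paper "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning". The paper has two highlights: it proposes the better-performing GoogLeNet Inception v4 architecture, and, by combining Inception with residual networks, it proposes architectures that are no worse than v4 in accuracy but train faster.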
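
The combination with residual networks can be written compactly. In the sketch below, $x$ is the block input, $b_1, \dots, b_k$ are the parallel Inception branches, and $W_{1\times 1}$ is the final linear 1×1 convolution. This notation is introduced here for illustration and is not taken from the paper; it follows the Keras blocks in section 5.15.3, which apply batch normalization and ReLU after the addition.

```latex
\[
y = \mathrm{ReLU}\Big(\mathrm{BN}\big(x + W_{1\times 1} * \mathrm{concat}\big(b_1(x), \dots, b_k(x)\big)\big)\Big)
\]
```

The addition is only defined when the 1×1 convolution outputs as many channels as $x$ has, which is why each residual block ends with a projection back to the width of its input.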

5.15.1 GoogLeNet Inception V4 network architecture

5.15.2 GoogLeNet Inception ResNet network architecture

5.15.3 Code practice

GoogLeNet Inception ResNet V2

    # -*- coding: utf-8 -*-
    import os
    import warnings

    import numpy as np
    import tensorflow as tf
    from keras import backend as K
    from keras.callbacks import CSVLogger, ReduceLROnPlateau, ModelCheckpoint, EarlyStopping
    from keras.datasets import cifar10
    from keras.layers import Input, Dropout, Dense, Flatten, Activation
    from keras.layers.convolutional import MaxPooling2D, Conv2D, AveragePooling2D
    from keras.layers.merge import concatenate, add
    from keras.layers.normalization import BatchNormalization
    from keras.models import Model
    from keras.preprocessing.image import ImageDataGenerator
    from keras.regularizers import l1_l2
    from keras.utils import np_utils
    from keras.utils.vis_utils import plot_model

    warnings.filterwarnings('ignore')

    # Training callbacks: learning-rate decay on plateau, early stopping, CSV logging
    lr_reducer = ReduceLROnPlateau(monitor='val_loss', factor=np.sqrt(0.5), cooldown=0, patience=3, min_lr=1e-6)
    early_stopper = EarlyStopping(monitor='val_acc', min_delta=0.0005, patience=15)
    csv_logger = CSVLogger('resnet34_cifar10.csv')

    # Every filter count below is divided by this factor to reduce the network width
    filter_control = 8


    def bn_relu(input):
        """Helper to build a BN -> relu block
        """
        norm = BatchNormalization()(input)
        return Activation("relu")(norm)


    def inception_resnet_stem(input_shape, small_mode=False):
        # input_shape is the input tensor; small_mode uses stride 1 so that small images survive the stem
        input_layer = input_shape
        if small_mode:
            strides = (1, 1)
        else:
            strides = (2, 2)
        stem_conv1_3x3 = Conv2D(name="stem_conv1_3x3/2",
            filters=32 // filter_control,
            kernel_size=(3, 3),
            strides=strides,
            kernel_initializer='he_normal',
            activation='relu',
            kernel_regularizer=l1_l2(0.0001))(input_layer)
        stem_conv2_3x3 = Conv2D(name="stem_conv2_3x3/1",
            filters=32 // filter_control,
            kernel_size=(3, 3),
            strides=(1, 1),
            kernel_initializer='he_normal',
            activation='relu',
            kernel_regularizer=l1_l2(0.0001))(stem_conv1_3x3)
        stem_conv3_3x3 = Conv2D(name="stem_conv3_3x3/1",
            filters=64 // filter_control,
            kernel_size=(3, 3),
            strides=(1, 1),
            padding='same',
            kernel_initializer='he_normal',
            activation='relu',
            kernel_regularizer=l1_l2(0.0001))(stem_conv2_3x3)
        stem_pool1_3x3 = MaxPooling2D(name="stem_pool1_3x3/2",
            pool_size=(3, 3),
            strides=strides,
            padding='valid')(stem_conv3_3x3)
        stem_conv4_3x3 = Conv2D(name="stem_conv4_3x3/2",
            filters=96 // filter_control,
            kernel_size=(3, 3),
            strides=strides,
            padding='valid',
            kernel_initializer='he_normal',
            activation='relu',
            kernel_regularizer=l1_l2(0.0001))(stem_conv3_3x3)
        stem_merge1 = concatenate([stem_pool1_3x3, stem_conv4_3x3])
        stem_conv5_1x1 = Conv2D(name="stem_conv5_1x1/1",
            filters=64 // filter_control,
            kernel_size=(1, 1),
            strides=(1, 1),
            padding='same',
            kernel_initializer='he_normal',
            activation='relu',
            kernel_regularizer=l1_l2(0.0001))(stem_merge1)
        stem_conv6_3x3 = Conv2D(name="stem_conv6_3x3/1",
            filters=96 // filter_control,
            kernel_size=(3, 3),
            strides=(1, 1),
            kernel_initializer='he_normal',
            activation='relu',
            kernel_regularizer=l1_l2(0.0001))(stem_conv5_1x1)
        stem_conv7_1x1 = Conv2D(name="stem_conv7_1x1/1",
            filters=64 // filter_control,
            kernel_size=(1, 1),
            strides=(1, 1),
            padding='same',
            kernel_initializer='he_normal',
            activation='relu',
            kernel_regularizer=l1_l2(0.0001))(stem_merge1)
        stem_conv8_7x1 = Conv2D(name="stem_conv8_7x1/1",
            filters=64 // filter_control,
            kernel_size=(7, 1),
            strides=(1, 1),
            padding='same',
            kernel_initializer='he_normal',
            activation='relu',
            kernel_regularizer=l1_l2(0.0001))(stem_conv7_1x1)
        # Layer name corrected from "stem_conv8_1x7/1" to match the variable
        stem_conv9_1x7 = Conv2D(name="stem_conv9_1x7/1",
            filters=64 // filter_control,
            kernel_size=(1, 7),
            strides=(1, 1),
            padding='same',
            kernel_initializer='he_normal',
            activation='relu',
            kernel_regularizer=l1_l2(0.0001))(stem_conv8_7x1)
        stem_conv10_3x3 = Conv2D(name="stem_conv10_3x3/1",
            filters=96 // filter_control,
            kernel_size=(3, 3),
            strides=(1, 1),
            padding='valid',
            kernel_initializer='he_normal',
            activation='relu',
            kernel_regularizer=l1_l2(0.0001))(stem_conv9_1x7)
        stem_merge2 = concatenate([stem_conv6_3x3, stem_conv10_3x3])
        stem_pool2_3x3 = MaxPooling2D(name="stem_pool2_3x3/2",
            pool_size=(3, 3),
            strides=strides,
            padding='valid')(stem_merge2)
        stem_conv11_3x3 = Conv2D(name="stem_conv11_3x3/2",
            filters=192 // filter_control,
            kernel_size=(3, 3),
            strides=strides,
            padding='valid',
            kernel_initializer='he_normal',
            activation='relu',
            kernel_regularizer=l1_l2(0.0001))(stem_merge2)
        stem_merge3 = concatenate([stem_pool2_3x3, stem_conv11_3x3])
        return bn_relu(stem_merge3)


    def inception_resnet_v2_A(i, input):
        # The input is a ReLU activation
        inception_A_conv1_1x1 = Conv2D(name="inception_A_conv1_1x1/1" + i,
            filters=32 // filter_control,
            kernel_size=(1, 1),
            strides=(1, 1),
            padding='same',
            kernel_initializer='he_normal',
            activation='relu',
            kernel_regularizer=l1_l2(0.0001))(input)
        inception_A_conv2_1x1 = Conv2D(name="inception_A_conv2_1x1/1" + i,
            filters=32 // filter_control,
            kernel_size=(1, 1),
            strides=(1, 1),
            padding='same',
            kernel_initializer='he_normal',
            activation='relu',
            kernel_regularizer=l1_l2(0.0001))(input)
        inception_A_conv3_3x3 = Conv2D(name="inception_A_conv3_3x3/1" + i,
            filters=32 // filter_control,
            kernel_size=(3, 3),
            strides=(1, 1),
            padding='same',
            kernel_initializer='he_normal',
            activation='relu',
            kernel_regularizer=l1_l2(0.0001))(inception_A_conv2_1x1)
        inception_A_conv4_1x1 = Conv2D(name="inception_A_conv4_1x1/1" + i,
            filters=32 // filter_control,
            kernel_size=(1, 1),
            strides=(1, 1),
            padding='same',
            kernel_initializer='he_normal',
            activation='relu',
            kernel_regularizer=l1_l2(0.0001))(input)
        inception_A_conv5_3x3 = Conv2D(name="inception_A_conv5_3x3/1" + i,
            filters=48 // filter_control,
            kernel_size=(3, 3),
            strides=(1, 1),
            padding='same',
            kernel_initializer='he_normal',
            activation='relu',
            kernel_regularizer=l1_l2(0.0001))(inception_A_conv4_1x1)
        inception_A_conv6_3x3 = Conv2D(name="inception_A_conv6_3x3/1" + i,
            filters=64 // filter_control,
            kernel_size=(3, 3),
            strides=(1, 1),
            padding='same',
            kernel_initializer='he_normal',
            activation='relu',
            kernel_regularizer=l1_l2(0.0001))(inception_A_conv5_3x3)
        inception_merge1 = concatenate([inception_A_conv1_1x1, inception_A_conv3_3x3, inception_A_conv6_3x3])
        # Linear 1x1 projection back to the input width so that the residual add is valid
        inception_A_conv7_1x1 = Conv2D(name="inception_A_conv7_1x1/1" + i,
            filters=384 // filter_control,
            kernel_size=(1, 1),
            strides=(1, 1),
            padding='same',
            activation='linear')(inception_merge1)
        out = add([input, inception_A_conv7_1x1])
        return bn_relu(out)


    def inception_resnet_v2_B(i, input):
        # The input is a ReLU activation
        inception_B_conv1_1x1 = Conv2D(name="inception_B_conv1_1x1/1" + i,
            filters=192 // filter_control,
            kernel_size=(1, 1),
            strides=(1, 1),
            padding='same',
            activation='relu')(input)
        inception_B_conv2_1x1 = Conv2D(name="inception_B_conv2_1x1/1" + i,
            filters=128 // filter_control,
            kernel_size=(1, 1),
            strides=(1, 1),
            padding='same',
            activation='relu')(input)
        inception_B_conv3_1x7 = Conv2D(name="inception_B_conv3_1x7/1" + i,
            filters=160 // filter_control,
            kernel_size=(1, 7),
            strides=(1, 1),
            padding='same',
            activation='relu')(inception_B_conv2_1x1)
        inception_B_conv4_7x1 = Conv2D(name="inception_B_conv4_7x1/1" + i,
            filters=192 // filter_control,
            kernel_size=(7, 1),
            strides=(1, 1),
            padding='same',
            activation='relu')(inception_B_conv3_1x7)
        inception_B_merge = concatenate([inception_B_conv1_1x1, inception_B_conv4_7x1])
        # Linear 1x1 projection back to the input width so that the residual add is valid
        inception_B_conv7_1x1 = Conv2D(name="inception_B_conv7_1x1/1" + i,
            filters=1154 // filter_control,
            kernel_size=(1, 1),
            strides=(1, 1),
            padding='same',
            activation='linear')(inception_B_merge)
        out = add([input, inception_B_conv7_1x1])
        return bn_relu(out)


    def inception_resnet_v2_C(i, input):
        # The input is a ReLU activation
        inception_C_conv1_1x1 = Conv2D(name="inception_C_conv1_1x1/1" + i,
            filters=192 // filter_control,
            kernel_size=(1, 1),
            strides=(1, 1),
            padding='same',
            activation='relu')(input)
        inception_C_conv2_1x1 = Conv2D(name="inception_C_conv2_1x1/1" + i,
            filters=192 // filter_control,
            kernel_size=(1, 1),
            strides=(1, 1),
            padding='same',
            activation='relu')(input)
        inception_C_conv3_1x3 = Conv2D(name="inception_C_conv3_1x3/1" + i,
            filters=224 // filter_control,
            kernel_size=(1, 3),
            strides=(1, 1),
            padding='same',
            activation='relu')(inception_C_conv2_1x1)
        inception_C_conv3_3x1 = Conv2D(name="inception_C_conv3_3x1/1" + i,
            filters=256 // filter_control,
            kernel_size=(3, 1),
            strides=(1, 1),
            padding='same',
            activation='relu')(inception_C_conv3_1x3)
        ir_merge = concatenate([inception_C_conv1_1x1, inception_C_conv3_3x1])
        # Linear 1x1 projection back to the input width so that the residual add is valid
        inception_C_conv4_1x1 = Conv2D(name="inception_C_conv4_1x1/1" + i,
            filters=2048 // filter_control,
            kernel_size=(1, 1),
            strides=(1, 1),
            padding='same',
            activation='linear')(ir_merge)
        out = add([input, inception_C_conv4_1x1])
        return bn_relu(out)


    def reduction_A(input, k=192, l=224, m=256, n=384):
        pool_size = (3, 3)
        strides = (2, 2)
        reduction_A_pool1 = MaxPooling2D(name="reduction_A_pool1/2",
            pool_size=pool_size,
            strides=strides,
            padding='valid')(input)
        reduction_A_conv1_3x3 = Conv2D(name="reduction_A_conv1_3x3/1",
            filters=n // filter_control,
            kernel_size=pool_size,
            strides=strides,
            activation='relu')(input)
        reduction_A_conv2_1x1 = Conv2D(name="reduction_A_conv2_1x1/1",
            filters=k // filter_control,
            kernel_size=(1, 1),
            strides=(1, 1),
            padding='same',
            activation='relu')(input)
        reduction_A_conv2_3x3 = Conv2D(name="reduction_A_conv2_3x3/1",
            filters=l // filter_control,
            kernel_size=(3, 3),
            strides=(1, 1),
            padding='same',
            activation='relu')(reduction_A_conv2_1x1)
        reduction_A_conv3_3x3 = Conv2D(name="reduction_A_conv3_3x3/1",
            filters=m // filter_control,
            kernel_size=pool_size,
            strides=strides,
            activation='relu')(reduction_A_conv2_3x3)
        reduction_A_merge = concatenate([reduction_A_pool1, reduction_A_conv1_3x3, reduction_A_conv3_3x3])
        return reduction_A_merge


    def reduction_B(input):
        pool_size = (3, 3)
        strides = (2, 2)
        reduction_B_pool1 = MaxPooling2D(name="reduction_B_pool1/2",
            pool_size=pool_size,
            strides=strides,
            padding='valid')(input)
        # Layer name corrected from "reduction_B_conv3_3x3/1" to match the variable
        reduction_B_conv1_1x1 = Conv2D(name="reduction_B_conv1_1x1/1",
            filters=256 // filter_control,
            kernel_size=(1, 1),
            strides=(1, 1),
            padding='same',
            activation='relu')(input)
        reduction_B_conv2_3x3 = Conv2D(name="reduction_B_conv2_3x3/1",
            filters=288 // filter_control,
            kernel_size=pool_size,
            strides=strides,
            activation='relu')(reduction_B_conv1_1x1)
        reduction_B_conv3_1x1 = Conv2D(name="reduction_B_conv3_1x1/1",
            filters=256 // filter_control,
            kernel_size=(1, 1),
            strides=(1, 1),
            padding='same',
            activation='relu')(input)
        reduction_B_conv4_3x3 = Conv2D(name="reduction_B_conv4_3x3/1",
            filters=288 // filter_control,
            kernel_size=pool_size,
            strides=strides,
            activation='relu')(reduction_B_conv3_1x1)
        reduction_B_conv5_1x1 = Conv2D(name="reduction_B_conv5_1x1/1",
            filters=256 // filter_control,
            kernel_size=(1, 1),
            strides=(1, 1),
            padding='same',
            activation='relu')(input)
        reduction_B_conv5_3x3 = Conv2D(name="reduction_B_conv5_3x3/1",
            filters=288 // filter_control,
            kernel_size=(3, 3),
            strides=(1, 1),
            padding='same',
            activation='relu')(reduction_B_conv5_1x1)
        reduction_B_conv6_3x3 = Conv2D(name="reduction_B_conv6_3x3/1",
            filters=320 // filter_control,
            kernel_size=pool_size,
            strides=strides,
            activation='relu')(reduction_B_conv5_3x3)
        reduction_B_merge = concatenate(
            [reduction_B_pool1, reduction_B_conv2_3x3, reduction_B_conv4_3x3, reduction_B_conv6_3x3])
        return reduction_B_merge


    def create_inception_resnet_v2(input_shape, nb_classes=10, small_mode=False):
        input_layer = Input(input_shape)
        x = inception_resnet_stem(input_layer, small_mode)
        # 10 x Inception Resnet A
        for i in range(10):
            x = inception_resnet_v2_A(str(i), x)
        # Reduction A
        x = reduction_A(x, k=256, l=256, m=384, n=384)
        # 20 x Inception Resnet B
        for i in range(20):
            x = inception_resnet_v2_B(str(i), x)
        # Auxiliary classifier; the pooling layer can be changed for 32*32*3 data
        aout = AveragePooling2D((5, 5), strides=(3, 3))(x)
        aout = Conv2D(name="conv1_1x1/1",
            filters=128,
            kernel_size=(1, 1),
            strides=(1, 1),
            padding='same',
            activation='relu')(aout)
        aout = Conv2D(name="conv1_5x5/1",
            filters=768,
            kernel_size=(5, 5),
            strides=(1, 1),
            padding='same',
            activation='relu')(aout)
        aout = Flatten()(aout)
        aout = Dense(nb_classes, activation='softmax')(aout)
        # Reduction Resnet B
        x = reduction_B(x)
        # 10 x Inception Resnet C
        for i in range(10):
            x = inception_resnet_v2_C(str(i), x)
        # Adjust the pool size to the feature-map size at this point
        x = AveragePooling2D((4, 4), strides=(1, 1))(x)
        # Dropout. Keras takes the fraction of units to drop, so 0.8 drops 80%;
        # use 0.2 if a keep probability of 0.8 is intended.
        x = Dropout(0.8)(x)
        x = Flatten()(x)
        # Output
        out = Dense(units=nb_classes, activation='softmax')(x)
        # For simplicity the auxiliary objective is left out
        # model = Model(inputs=input_layer, outputs=[out, aout], name='Inception-Resnet-v2')
        model = Model(inputs=input_layer, outputs=out, name='Inception-Resnet-v2')
        return model


    if __name__ == "__main__":
        # Expose only physical GPU 3; inside this process it is then addressed as '/gpu:0'.
        os.environ["CUDA_VISIBLE_DEVICES"] = "3"
        gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=1, allow_growth=True)
        # Register the session with Keras so that this configuration is actually used.
        K.set_session(tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
                                                       log_device_placement=True,
                                                       gpu_options=gpu_options)))
        with tf.device('/gpu:0'):
            (x_train, y_train), (x_test, y_test) = cifar10.load_data()
            # Scale pixels to [0, 1]; the (0, 1, 2, 3) transpose keeps the channels-last order unchanged
            x_train = np.transpose(x_train.astype('float32') / 255., (0, 1, 2, 3))
            x_test = np.transpose(x_test.astype('float32') / 255., (0, 1, 2, 3))
            print('x_train shape:', x_train.shape)
            print(x_train.shape[0], 'train samples')
            print(x_test.shape[0], 'test samples')
            # convert class vectors to binary class matrices
            y_train = np_utils.to_categorical(y_train)
            y_test = np_utils.to_categorical(y_test)
            s = x_train.shape[1:]
            batch_size = 128
            nb_epoch = 10
            nb_classes = 10
            # The function takes three arguments; small_mode=True keeps stride 1 in the stem,
            # which 32x32 inputs need in order to reach the final (4, 4) pooling layer.
            model = create_inception_resnet_v2(s, nb_classes, small_mode=True)
            model.summary()
            plot_model(model, to_file="GoogLeNet-Inception-Resnet-V2.jpg", show_shapes=True)
            model.compile(optimizer='adadelta',
                          loss='categorical_crossentropy',
                          metrics=['accuracy'])
            # Model saving callback
            checkpointer = ModelCheckpoint("weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5", monitor='val_loss',
                                           verbose=0,
                                           save_best_only=False, save_weights_only=False, mode='auto')
            print('Using real-time data augmentation.')
            datagen_train = ImageDataGenerator(
                featurewise_center=False,
                samplewise_center=False,
                featurewise_std_normalization=False,
                samplewise_std_normalization=False,
                zca_whitening=False,
                rotation_range=0,
                width_shift_range=0.125,
                height_shift_range=0.125,
                horizontal_flip=True,
                vertical_flip=False)
            datagen_train.fit(x_train)
            # Keras 2 counts steps (batches) per epoch, not samples
            history = model.fit_generator(datagen_train.flow(x_train, y_train, batch_size=batch_size, shuffle=True),
                                          steps_per_epoch=x_train.shape[0] // batch_size,
                                          epochs=nb_epoch, verbose=1,
                                          validation_data=(x_test, y_test),
                                          callbacks=[lr_reducer, early_stopper, csv_logger, checkpointer])
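
A quick way to check that the residual additions line up, and to see why 32×32 CIFAR-10 images need `small_mode=True` together with the final `(4, 4)` average pool, is to trace spatial sizes and channel counts by hand. The sketch below does this in plain Python using the usual Keras output-size rules for 'valid' and 'same' padding. The helper names `out_size` and `trace` are hypothetical, and the layer list is transcribed from the script above rather than read from a real model.

```python
def out_size(size, kernel, stride, padding):
    """Output length along one spatial axis under Keras padding rules."""
    if padding == 'same':
        return -(-size // stride)            # ceil(size / stride)
    return (size - kernel) // stride + 1     # 'valid'


def trace(size, small_mode, fc=8):
    """Return (stage, spatial size, channels) after the stem and both reductions."""
    s = 1 if small_mode else 2
    # Stem: (kernel, stride, padding) along the path that sets the spatial size.
    stem_layers = [(3, s, 'valid'), (3, 1, 'valid'), (3, 1, 'same'),
                   (3, s, 'valid'), (3, 1, 'valid'), (3, s, 'valid')]
    for kernel, stride, padding in stem_layers:
        size = out_size(size, kernel, stride, padding)
    channels = 2 * (96 // fc) + 192 // fc          # stem_merge3
    stages = [('stem', size, channels)]
    size = out_size(size, 3, 2, 'valid')           # reduction_A
    channels += 384 // fc + 384 // fc              # n and m branches
    stages.append(('reduction_A', size, channels))
    size = out_size(size, 3, 2, 'valid')           # reduction_B
    channels += 288 // fc + 288 // fc + 320 // fc
    stages.append(('reduction_B', size, channels))
    return stages


print(trace(32, small_mode=True))
print(trace(32, small_mode=False))
# Widths of the final 1x1 convolutions in blocks A, B and C, for comparison.
print(384 // 8, 1154 // 8, 2048 // 8)
```

With `small_mode=True` the channel counts after the stem and the two reductions equal the widths of the linear 1×1 convolutions in blocks A, B and C, so every `add` is valid, and the feature map is 4×4 when it reaches the last pooling layer. With `small_mode=False` the traced size becomes non-positive at Reduction-A, which means the network cannot be built for 32×32 inputs.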

5.16 Model visualization

5.16.1 Some notes

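A neural network contains a series of feature extractors. Ideally a feature map is sparse and captures typical local information. Visualizing the model gives an intuitive feel for this and helps with debugging. For example, if a feature map looks very close to the original image, it has learned hardly any features; if it is almost a solid-color image, it is too sparse, possibly because the layer has too many feature maps. There are many kinds of visualization, such as feature-map visualization and weight visualization; feature-map visualization is used as the example here.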
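
The two warning signs above can also be checked numerically. The sketch below is a minimal illustration, assuming a feature map is available as a plain 2-D list of floats and that the input image has been reduced to a single channel of the same size. The helper names (`is_flat`, `correlation`) and the tolerance are hypothetical choices for this example, not values from the experiment.

```python
from statistics import pstdev


def flatten(fmap):
    """Turn a 2-D map (list of rows) into a flat list of values."""
    return [v for row in fmap for v in row]


def is_flat(fmap, tol=1e-3):
    """True when a feature map is almost a single solid value."""
    return pstdev(flatten(fmap)) < tol


def correlation(a, b):
    """Pearson correlation between two equally sized 2-D maps (0.0 if either is constant)."""
    xs, ys = flatten(a), flatten(b)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


image = [[0.0, 1.0], [1.0, 0.0]]        # toy single-channel input
copy_like = [[0.1, 0.9], [0.8, 0.2]]    # feature map that mostly repeats the input
solid = [[0.5, 0.5], [0.5, 0.5]]        # feature map that is a solid color

print(is_flat(solid), is_flat(copy_like), correlation(image, copy_like))
```

A correlation close to 1 flags a feature map that mostly reproduces the input, and a near-zero standard deviation flags a solid-color map; what counts as "close" has to be tuned per model.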

Using Keras, the experiment takes a GoogLeNet Inception v3 model pretrained on the ImageNet 1000-class dataset and feeds it the following two images.

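Reading from left to right, the whole feature-extraction process is visible: some feature maps separate the background, some extract contours, and some extract color differences. The two middle feature maps of layers 10 and 11 are solid-colored, however, which may mean that these layers have somewhat too many feature maps. The influence of the halo in the BAIC Senova D50 photo on the halos in its feature maps is also fairly clear.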

5.16.2 Code practice


    # -*- coding: utf-8 -*-
    import cv2
    import matplotlib.pyplot as plt
    import numpy as np
    from keras.applications import InceptionV3
    from keras.applications.imagenet_utils import decode_predictions
    from keras.applications.inception_v3 import preprocess_input
    from keras.models import Model
    from keras.preprocessing import image


    def test_opencv():
        # Open the camera. 0 is the camera index; with several cameras attached, use 1, 2, ...
        cam = cv2.VideoCapture(0)
        # Capture 5 small snapshots
        for x in range(0, 5):
            s, img = cam.read()
            if s:
                cv2.imwrite("o-" + str(x) + ".jpg", img)
        cam.release()


    def load_original(img_path):
        # Resize the original image to 299*299
        im_original = cv2.resize(cv2.imread(img_path), (299, 299))
        # OpenCV loads images as BGR; convert to RGB for display and for the model
        im_converted = cv2.cvtColor(im_original, cv2.COLOR_BGR2RGB)
        plt.figure(0)
        plt.subplot(211)
        plt.imshow(im_converted)
        return im_converted


    def load_fine_tune_googlenet_v3(img):
        # Load the ImageNet-pretrained GoogLeNet Inception v3 model and run a prediction
        model = InceptionV3(include_top=True, weights='imagenet')
        model.summary()
        x = image.img_to_array(img)
        x = np.expand_dims(x, axis=0)
        x = preprocess_input(x)
        preds = model.predict(x)
        print('Predicted:', decode_predictions(preds))
        plt.subplot(212)
        plt.plot(preds.ravel())
        plt.show()
        return model, x


    def extract_features(ins, layer_id, filters, layer_num):
        '''
        Extract a given number of feature maps from one layer of the model and draw them on one figure.
        :param ins: (model, preprocessed input) tuple
        :param layer_id: layer to extract from (index or name; the subplot position assumes an index)
        :param filters: number of feature maps to extract per layer
        :param layer_num: total number of layers whose feature maps are extracted
        :return: None
        '''
        if len(ins) != 2:
            print('parameter error:(model, instance)')
            return None
        model = ins[0]
        x = ins[1]
        if isinstance(layer_id, int):
            layer = model.get_layer(index=layer_id)
        else:
            layer = model.get_layer(name=layer_id)
        model_extractfeatures = Model(inputs=model.input, outputs=layer.output)
        fc2_features = model_extractfeatures.predict(x)
        n_maps = fc2_features.shape[-1]
        if filters > n_maps:
            print('layer number error.', n_maps, ',', filters)
            return None
        for i in range(filters):
            plt.subplots_adjust(left=0, right=1, bottom=0, top=1)
            plt.subplot(filters, layer_num, layer_id + 1 + i * layer_num)
            plt.axis("off")
            plt.imshow(fc2_features[0, :, :, i])


    # Arguments: number of layers, (model, input) tuple, number of kernels
    def extract_features_batch(layer_num, model, filters):
        '''
        Extract features in batch.
        :param layer_num: number of layers
        :param model: (model, preprocessed input) tuple
        :param filters: number of feature maps
        :return: None
        '''
        plt.figure(figsize=(filters, layer_num))
        plt.subplot(filters, layer_num, 1)
        for i in range(layer_num):
            extract_features(model, i, filters, layer_num)
        plt.savefig('sample.jpg')
        plt.show()


    def extract_features_with_layers(ins, layers_extract):
        '''
        Extract hypercolumns and visualize them.
        :param ins: (model, preprocessed input) tuple
        :param layers_extract: list of layer indexes
        :return: None
        '''
        hc = extract_hypercolumn(ins[0], layers_extract, ins[1])
        ave = np.average(hc.transpose(1, 2, 0), axis=2)
        plt.imshow(ave)
        plt.show()


    def extract_hypercolumn(model, layer_indexes, instance):
        '''
        Extract the hypercolumn vectors of the given layers of the given model.
        :param model: model
        :param layer_indexes: layer ids
        :param instance: preprocessed input batch
        :return: array of shape (total number of feature maps, 299, 299)
        '''
        feature_maps = []
        for i in layer_indexes:
            layer_model = Model(inputs=model.input, outputs=model.get_layer(index=i).output)
            feature_maps.append(layer_model.predict(instance))
        hypercolumns = []
        for convmap in feature_maps:
            # Iterate over channel indexes (the original looped over the channel values)
            for i in range(convmap.shape[-1]):
                # Bilinear upscaling to the input size; replaces scipy.misc.imresize,
                # which is no longer available in recent SciPy releases
                upscaled = cv2.resize(convmap[0, :, :, i], (299, 299), interpolation=cv2.INTER_LINEAR)
                hypercolumns.append(upscaled)
        return np.asarray(hypercolumns)


    if __name__ == '__main__':
        img_path = r'd:\car3.jpg'
        img = load_original(img_path)
        x = load_fine_tune_googlenet_v3(img)
        extract_features_batch(15, x, 3)
        extract_features_with_layers(x, [1, 4, 7])
        extract_features_with_layers(x, [1, 4, 7, 10, 11, 14, 17])
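
The hypercolumn idea used by `extract_hypercolumn` can be seen on toy data without Keras: feature maps from several layers are upsampled to a common size, and the values found at one pixel position across all maps form that pixel's hypercolumn. The sketch below is a simplified illustration in plain Python. It uses nearest-neighbour upsampling instead of the bilinear interpolation in the script, stores the result as (row, column, channel) rather than (channel, row, column), and the helper names are hypothetical.

```python
def upsample_nearest(fmap, out_h, out_w):
    """Resize a 2-D map to out_h x out_w by nearest-neighbour lookup."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [[fmap[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def hypercolumns(feature_maps, out_h, out_w):
    """Stack upsampled maps so that result[r][c] is the hypercolumn of pixel (r, c)."""
    up = [upsample_nearest(m, out_h, out_w) for m in feature_maps]
    return [[[m[r][c] for m in up] for c in range(out_w)] for r in range(out_h)]


coarse = [[1, 2], [3, 4]]          # a 2x2 map standing in for a deep layer
fine = [[0, 1, 2, 3]] * 4          # a 4x4 map standing in for a shallow layer
hc = hypercolumns([coarse, fine], 4, 4)
print(hc[0][0], hc[3][3])
```

Averaging each hypercolumn over its channel axis, as `extract_features_with_layers` does with `np.average`, then gives a single image that mixes coarse and fine information.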


