Machine Learning (Chinese Edition)

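1. Some Basic Concepts

1.1 Generative vs. Discriminative Models

An example to build intuition: suppose I want to know which country's language person A is speaking. What should I do?

  • Generative model
  • Discriminative model

Given input data x and labels y used to assign each input to a class, a generative model learns the joint probability distribution P(x, y) of samples and labels, while a discriminative model learns the conditional probability P(y | x).

  • A generative model can be converted to P(y | x) via Bayes' rule and then used for classification; the joint distribution can also serve other purposes, such as generating sample pairs (x, y).

  • The main task of a discriminative model is to find a hyperplane (or a set of hyperplanes) that separates the given samples into the given classes, which is exactly what the name "discriminative" suggests.


For some theory, see: On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes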

  1. Common generative models

    • Naive Bayes
    • Gaussians
    • Mixtures of Gaussians
    • Mixtures of Experts
    • Mixtures of Multinomials
    • HMM
    • Markov random fields
    • Sigmoidal belief networks
    • Bayesian networks
  2. Common discriminative models

    • Linear regression
    • Logistic regression
    • SVM
    • Perceptron
    • Traditional Neural networks
    • Nearest neighbor
    • Conditional random fields

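1.2 Parametric vs. Nonparametric Learning


1.2.1 Parametric Learning


  1. Choose a functional form and learn a fixed number of parameters that capture some pattern in the data as well as possible;
  2. However much data there is, the number of parameters is fixed: it does not grow with the sample size, so the two are independent of each other;
  3. Such models usually make strong assumptions about the data, for example about its distribution or the space it lives in.
  4. Commonly used parametric models include: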
    • Logistic Regression
    • Linear Regression
    • Polynomial regression
    • Linear Discriminant Analysis
    • Perceptron
    • Naive Bayes
    • Simple Neural Networks
    • SVM with a linear kernel
    • Mixture models
    • K-means
    • Hidden Markov models
    • Factor analysis / pPCA / PMF

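1.2.2 Nonparametric Learning


  1. The data determine the functional form, and the number of parameters is not fixed;
  2. As the amount of data grows, the number of parameters generally grows with it;
  3. Few prior assumptions are made about the data itself.
  4. Some commonly used nonparametric models: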
    • k-Nearest Neighbors
    • Decision Trees like CART and C4.5
    • SVM with a nonlinear kernel
    • Gradient Boosted Decision Trees
    • Gaussian processes for regression
    • Dirichlet process mixtures
    • infinite HMMs
    • infinite latent factor models

For further reading, see: Parametric vs Nonparametric Models

1.3 Supervised, Unsupervised, and Reinforcement Learning

1.3.1 Supervised Learning


1.3.2 Unsupervised Learning


1.3.3 Reinforcement Learning



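2. A Review of Modeling Methods


2.0 Bias and Variance

  • In machine learning, bias is the model error caused by unreasonable prior assumptions. High bias leads to underfitting: the model fails to learn the relationship between features and labels well enough, so it does not capture the patterns in the historical data;
  • Variance measures how sensitive the model error is to changes in the training sample. High variance leads to overfitting: the model also fits the random noise in the training samples, so it performs poorly on unseen samples;
  • One core concern of a machine learning model is its generalization ability, i.e., its performance on unseen samples.



We want to use \hat{f}(x) to estimate f(x). With linear regression under square loss, the error analysis is as follows: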
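For reference, the standard bias-variance decomposition under squared loss can be written as below. The notation is an assumption, since the original symbols did not survive: y = f(x) + \epsilon with E[\epsilon] = 0 and Var(\epsilon) = \sigma^2, and \hat{f} is the model fitted on a random training set.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```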


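2.1 Linear Regression


2.1.1 Model Principles


  • Prediction function
  • Parameter learning: least squares


  • The independent variables are mutually independent, with no multicollinearity
  • The dependent variable is a linear weighted combination of the independent variables:
  • All samples are independent and identically distributed (iid), and the error term follows the distribution below: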





2.1.2 Loss Function

Loss function 1: Least Square Loss

For further reading, see: Least Squares
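As a small illustration of least squares, the sketch below fits y ≈ w·x + b for a single feature in closed form. The function names and the toy data are made up for this example; it is a minimal sketch, not the author's implementation.

```python
def fit_least_squares(xs, ys):
    """Closed-form least squares for y ≈ w * x + b with one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # w = cov(x, y) / var(x); b makes the line pass through the means.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


def squared_loss(xs, ys, w, b):
    """Sum of squared residuals, the quantity least squares minimizes."""
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys))


# Toy data generated from y = 2x + 1 (hypothetical).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = fit_least_squares(xs, ys)
```

Any other choice of (w, b) gives a larger squared loss on the same data, which is what "least squares" means here.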

Q: What is the relationship between the model and the loss?

2.2 Support Vector Machine


2.2.1 Model Principles










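2.2.2 Loss Function

Loss function 2: Hinge Loss

Using hinge loss places the SVM inside the general machine learning framework and makes it easier to understand. The original constrained optimization problem becomes an unconstrained one whose loss function is the hinge loss and whose regularizer is the L2 norm: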
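One common way to write this unconstrained objective is (λ/2)·‖w‖² + (1/n)·Σ max(0, 1 − yᵢ(w·xᵢ + b)) with labels yᵢ ∈ {−1, +1}. The sketch below evaluates it; the weighting parameter `lam` and the averaging over samples are assumptions of this example, since the original formula is not available.

```python
def hinge_loss(margin):
    """Hinge loss of a single sample given its margin y * (w·x + b)."""
    return max(0.0, 1.0 - margin)


def svm_objective(w, b, samples, labels, lam):
    """L2-regularized average hinge loss (soft-margin linear SVM, primal form)."""
    regularizer = 0.5 * lam * sum(wi * wi for wi in w)
    data_term = 0.0
    for x, y in zip(samples, labels):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        data_term += hinge_loss(y * score)
    return regularizer + data_term / len(samples)


# Two points separated with margin 2 by w = (1, 0), b = 0 (toy data).
samples = [(2.0, 0.0), (-2.0, 0.0)]
labels = [1, -1]
```

Samples whose margin is at least 1 contribute nothing to the data term, so only points on or inside the margin influence the solution.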


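Up to this point the SVM looks no different from any other discriminative model, and the notion of a support vector has not appeared. The name SVM comes from its dual form, obtained by applying the method of Lagrange multipliers to the primal problem: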





2.2.3 Kernel Methods


from Kernel Trick

1. Kernel Methods for Pattern Analysis
2. The kernel trick for distances


  • SVM
  • Perceptron
  • PCA
  • Gaussian processes
  • Canonical correlation analysis
  • Ridge regression
  • Spectral clustering


SVM Learning: Coordinate Descent Method
SVM Learning: Sequential Minimal Optimization
SVM Learning: Improvements to Platt's SMO Algorithm

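2.3 Logistic Regression


2.3.1 Model Principles

Logistic regression is a discriminative model. Like linear regression, it makes fairly strong prior assumptions:

  • The dependent variable is assumed to follow a Bernoulli distribution
  • The training samples are assumed to follow a bell-shaped distribution, such as a Gaussian:
  • y is the sample label, a Boolean taking the value 0 or 1;
  • x is the feature vector of the sample.



Parameters are estimated with MLE or MAP: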

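2.3.2 Loss Function

Loss function 3: Cross Entropy Loss

Put simply, from a probabilistic point of view, the cross-entropy loss measures how close two probability distributions are: the more accurately the true distribution is estimated, the smaller the loss. From an information-theoretic point of view, it is the cost of encoding messages generated under coding scheme p with coding scheme q: the closer the two schemes, the smaller the loss. A concept closely related to cross entropy is the Kullback–Leibler divergence, a scalar that measures how close two probability distributions are, defined as follows:

D_{KL}(p \| q) = \sum_x p(x) \log \frac{p(x)}{q(x)}

Its value is 0 when the two distributions are identical. The relationship between cross entropy and the Kullback–Leibler divergence is:

H(p, q) = H(p) + D_{KL}(p \| q)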
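The relationship between cross entropy, entropy, and KL divergence can be checked numerically. The sketch below uses two made-up discrete distributions p and q and assumes every q(x) is strictly positive.

```python
import math


def entropy(p):
    """Shannon entropy H(p) in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """Cross entropy H(p, q): expected code length under q for data from p."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(p || q)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.25, 0.25]   # "true" distribution (toy values)
q = [0.4, 0.4, 0.2]     # estimated distribution (toy values)
gap = cross_entropy(p, q) - (entropy(p) + kl_divergence(p, q))
```

Since H(p) does not depend on q, minimizing the cross entropy over q is the same as minimizing the KL divergence.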

For cross entropy and the ideas around it, one article explains it especially well: Visual Information Theory

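2.4 Bagging and Boosting Frameworks


2.4.1 The Bagging Framework

Bagging (Breiman, 1996) samples the training data and features with replacement, fits a base model on each sample, and makes the final classification decision by voting. Each base classifier (a tree, a neural network, or any other classification model) has low bias and high variance; the (weighted) vote reduces the variance, so the ensemble tends toward low bias and low variance.

Suppose the task is to learn a model. We generate several datasets by sampling and train one base classifier on each of them.


2.4.2 The Boosting Framework

Boosting (Freund & Schapire, 1996) trains a sequence of base classifiers iteratively. Each classifier adjusts the sample weights according to the errors left by the previous round. The classifier in each round must be "simple" enough, with high bias and low variance; the (weighted) vote then reduces the bias, so the ensemble tends toward low bias and low variance.


AnyBoost Algorithm

Q: What is the relationship between boosting and the margin (as the margin is defined in machine learning)?
Q: As with bagging, why can boosting lower the overall bias through reweighting and voting?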

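2.5 Additive Tree Models

Additive tree models (ATMs) are ensemble models whose base learners are trees. They can be used for classification and regression, and many classic models can be viewed as ATMs, such as Random Forest, AdaBoost with trees, and GBDT.

An ATM combines N decision trees by weighted fusion. Its discriminant function is:

2.5.1 Random Forests

Random Forest is a bagging-type model: each tree is trained independently on its own random sample of the data and features.

2.5.2 AdaBoost with trees

AdaBoost with trees combines many weak classifiers into one strong classifier. Each iteration produces a tree-shaped weak classifier with high bias and low variance, and each round pays more attention to the samples misclassified in the previous round by increasing their weights. The training procedure is as follows:

From Bishop(2006)

2.5.3 Gradient Boosting Decision Tree

Gradient boosting is a family of boosting techniques. Unlike AdaBoost, which increases the weights of misclassified samples, each iteration adds an update derived from the gradient of the previous round: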
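A minimal sketch of gradient boosting for squared loss on 1-D inputs follows. It assumes depth-1 regression trees (stumps) as base learners and a shrinkage of 0.1; both choices, and all names, are illustrative rather than taken from the original.

```python
def fit_stump(xs, residuals):
    """Best single-split regression stump on 1-D inputs (squared error)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_trees=50, shrinkage=0.1):
    """Fit an additive model F(x) = base + shrinkage * sum of stumps."""
    base = sum(ys) / len(ys)
    trees = []
    preds = [base] * len(xs)
    for _ in range(n_trees):
        # For the loss 0.5 * (y - F)^2 the negative gradient is the residual y - F.
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + shrinkage * tree(x) for x, p in zip(xs, preds)]
    return lambda x: base + shrinkage * sum(t(x) for t in trees)


# Toy step-shaped data (hypothetical).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 0.1, 0.2, 3.0, 3.1, 3.2]
model = gradient_boost(xs, ys)
mse = sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

Each new stump is fitted to the current residuals, so the ensemble moves a small step along the negative gradient of the loss in function space.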




Regression Tree Ensemble from chentianqi



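  • Early Stopping
  • Shrinkage

    From the iteration point of view, shrinkage can be seen as a learning rate; from the ensemble point of view, it is the weight of each tree. Empirically the shrinkage factor can be set to 0.1. It is a trade-off between generalization and training time;
  • Subsampling
  • An explicit regularization term in the objective function

  • Dropout
    Mimicking the way deep learning randomly drops weight updates, each new tree can be fitted to the residual of a randomly selected subset of the existing trees. See: DART: Dropouts meet Multiple Additive Regression Trees, which compares this method with shrinkage:

The XGBoost source code is at https://github.com/dmlc. Its design and implementation are excellent, and it is well worth studying and contributing to. I will not go into the theory further here; reading one paper is enough: XGBoost: A Scalable Tree Boosting System. Pay particular attention to the weighted quantile sketch algorithm described there, which addresses how to choose split points when the sample weights are not uniformly distributed.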

2.5.4 A Simple Example


    import os
    import urllib.request

    import matplotlib
    matplotlib.use('Agg')  # render to files; no display is needed
    from matplotlib import cm
    from matplotlib import pyplot as plt
    import numpy as np
    from sklearn import svm
    from sklearn.datasets import load_svmlight_file
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression, Ridge
    from keras.models import Sequential
    from keras.layers import Input, Dense, Activation

    DATA_URL = "https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/fourclass_scale"


    def download(outpath):
        # Fetch the dataset once and return the local file path.
        filename = os.path.join(outpath, "fourclass_scale")
        if not os.path.exists(filename):
            urllib.request.urlretrieve(DATA_URL, filename)
        return filename


    def data_building(filename):
        # Load the 2-D features and the +1/-1 labels.
        features, train_l = load_svmlight_file(filename)
        train_d = features.toarray()
        x1 = train_d[:, 0]
        x2 = train_d[:, 1]
        # Shift the two classes apart; the shifted points are used for plotting only.
        pos = train_l == 1
        px1, px2 = x1[pos] - 0.5, x2[pos] + 0.5
        nx1, nx2 = x1[~pos] + 0.8, x2[~pos] - 0.8
        # Grid over the feature range on which every model is evaluated.
        x_axis, y_axis = np.meshgrid(np.linspace(x1.min(), x1.max(), 100),
                                     np.linspace(x2.min(), x2.max(), 100))
        return x_axis, y_axis, px1, px2, nx1, nx2, train_d, train_l


    def paint(name, x_axis, y_axis, px1, px2, nx1, nx2, z):
        # Plot the samples together with the model output z over the grid.
        z = np.asarray(z).reshape(x_axis.shape)
        fig = plt.figure()
        ax = fig.add_subplot(projection='3d')
        ax.scatter(px1, px2, c='r')
        ax.scatter(nx1, nx2, c='g')
        ax.plot_surface(x_axis, y_axis, z, rstride=8, cstride=8, alpha=0.3)
        ax.contourf(x_axis, y_axis, z, zdir='z', offset=-100, cmap=cm.coolwarm)
        ax.contourf(x_axis, y_axis, z, levels=[0, z.max()], cmap=cm.hot)
        ax.set_xlabel('X')
        ax.set_ylabel('Y')
        ax.set_zlabel('Z')
        fig.savefig(name + ".png", format='png')
        plt.close(fig)


    def predict_on_grid(clf, x_axis, y_axis, x, y):
        # Fit a scikit-learn estimator and predict on every grid point.
        clf.fit(x, y)
        return clf.predict(np.c_[x_axis.ravel(), y_axis.ravel()])


    def svc(x_axis, y_axis, x, y):
        return predict_on_grid(svm.SVC(), x_axis, y_axis, x, y)


    def lr(x_axis, y_axis, x, y):
        return predict_on_grid(LogisticRegression(), x_axis, y_axis, x, y)


    def ridge(x_axis, y_axis, x, y):
        return predict_on_grid(Ridge(), x_axis, y_axis, x, y)


    def gbdt(x_axis, y_axis, x, y):
        return predict_on_grid(GradientBoostingClassifier(), x_axis, y_axis, x, y)


    def nn(x_axis, y_axis, x, y):
        # Two hidden ReLU layers; tanh output matches the +1/-1 labels.
        model = Sequential()
        model.add(Input(shape=(2,)))
        model.add(Dense(20))
        model.add(Activation('relu'))
        model.add(Dense(20))
        model.add(Activation('relu'))
        model.add(Dense(1, activation='tanh'))
        model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
        model.fit(x, y, batch_size=20, epochs=50, validation_split=0.2)
        return model.predict(np.c_[x_axis.ravel(), y_axis.ravel()], batch_size=20)


    if __name__ == '__main__':
        filename = download(".")
        x_axis, y_axis, px1, px2, nx1, nx2, train_d, train_l = data_building(filename)
        for name, fit_predict in [("svc", svc), ("lr", lr), ("ridge", ridge),
                                  ("gbdt", gbdt), ("nn", nn)]:
            z = fit_predict(x_axis, y_axis, train_d, train_l)
            paint(name, x_axis, y_axis, px1, px2, nx1, nx2, z)

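2.6 Artificial Neural Networks

Wikipedia defines a neural network as follows: NN is a network inspired by biological neural networks (the central nervous systems of animals, in particular the brain) which are used to estimate or approximate functions that can depend on a large number of inputs that are generally unknown.(from wikipedia)

2.6.1 Neurons




2.6.2 Common Neural Network Architectures



Combining neurons in different ways yields neural network architectures with different properties:






Google DeepMind memory neural network (used in AlphaGo)

2.6.3 A Simple Neural Network Example

Suppose a random variable X ~ N(0, 1). A 3-layer neural network is used to fit its density: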

    import math
    import random

    import matplotlib
    matplotlib.use('Agg')  # render to files; no display is needed
    import matplotlib.pyplot as plt
    import numpy as np
    from keras.models import Sequential
    from keras.layers import Input, Dense, Activation


    def gd(x, m, s):
        # Density of the Gaussian N(m, s^2) at x.
        left = 1 / (math.sqrt(2 * math.pi) * s)
        right = math.exp(-math.pow(x - m, 2) / (2 * math.pow(s, 2)))
        return left * right


    def pt(x, y1, y2):
        # Plot the true density (y1) against the network output (y2).
        if len(x) != len(y1) or len(x) != len(y2):
            print('input error.')
            return
        plt.figure(num=1, figsize=(20, 6))
        plt.title('NN fitting Gaussian distribution', size=14)
        plt.xlabel('x', size=14)
        plt.ylabel('y', size=14)
        plt.plot(x, y1, color='b', linestyle='--', label='Gaussian distribution')
        plt.plot(x, y2, color='r', linestyle='-', label='NN fitting')
        plt.legend(loc='upper left')
        plt.savefig('ann.png', format='png')


    def ann(train_d, train_l, prd_d):
        if len(train_d) == 0 or len(train_d) != len(train_l):
            print('training data error.')
            return None
        model = Sequential()
        model.add(Input(shape=(1,)))
        model.add(Dense(30))
        model.add(Activation('relu'))
        model.add(Dense(30))
        model.add(Activation('relu'))
        model.add(Dense(1, activation='sigmoid'))
        # This is a regression task, so only the MSE loss is tracked.
        model.compile(loss='mse', optimizer='rmsprop')
        model.fit(train_d, train_l, batch_size=250, epochs=50, validation_split=0.2)
        return model.predict(prd_d, batch_size=250)


    if __name__ == '__main__':
        x = np.linspace(-5, 5, 10000)
        # Pick 900 distinct grid positions as training points.
        idx = random.sample(range(len(x)), 900)
        train_d = [x[i] for i in idx]
        train_l = [gd(x[i], 0, 1) for i in idx]
        y1 = [gd(i, 0, 1) for i in x]
        y2 = ann(np.array(train_d).reshape(-1, 1), np.array(train_l), x.reshape(-1, 1))
        pt(x, y1, y2.ravel())

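3. A Unified Framework for Machine Learning


3.1 Objective Function


Many models fit this form, including linear regression, logistic regression, SVM, additive models, k-means, and neural networks. The loss term controls how well the model fits the data and aims to reduce bias; the regularization term improves generalization and aims to reduce variance. The best model is the best trade-off between bias and variance.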

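3.1.1 Loss Function



In practice the 0-1 loss is rarely optimized directly (there are exceptions, such as Direct 0-1 Loss Minimization and Margin Maximization with Boosting and Algorithms for Direct 0–1 Loss Optimization in Binary Classification, but their use is limited overall), for the following reasons:

  • Optimizing the 0-1 loss is a combinatorial optimization problem and NP-hard, so no polynomial-time algorithm is known;
  • The loss function is non-convex and non-smooth, so many optimization methods cannot be applied;
  • A small change in the weights can cause a large change in the loss, i.e., the loss does not vary smoothly;
  • Only L0 regularization can be used; other forms of regularization have no effect;
  • Even with L0 regularization, the problem remains non-convex and non-smooth and is hard to optimize.

Because of these problems with the 0-1 loss, the loss functions above are all approximations to it. For the details, see: Understanding Machine Learning: From Theory to Algorithms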
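To make "approximation" concrete, the sketch below writes several common surrogate losses as functions of the margin m = y·f(x) with y ∈ {−1, +1} and compares them with the 0-1 loss. Which surrogates the original listed is not recoverable, so the selection here (hinge, logistic, exponential) is an assumption.

```python
import math


def zero_one_loss(margin):
    """1 if the sample is misclassified (margin <= 0), else 0."""
    return 1.0 if margin <= 0 else 0.0


def hinge_loss(margin):
    """Used by SVM."""
    return max(0.0, 1.0 - margin)


def logistic_loss(margin):
    """Used by logistic regression; base-2 log so the loss is 1 at margin 0."""
    return math.log2(1.0 + math.exp(-margin))


def exponential_loss(margin):
    """Used by AdaBoost."""
    return math.exp(-margin)


margins = [-2.0, -0.5, 0.0, 0.5, 2.0]
table = [(m, zero_one_loss(m), hinge_loss(m), logistic_loss(m), exponential_loss(m))
         for m in margins]
```

All three surrogates are convex in the margin and lie on or above the 0-1 loss, which is why minimizing them is a tractable stand-in for minimizing classification error.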


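3.1.2 Regularization Term




Suppose the model parameters also follow some probability distribution; the parameters can then be solved for by maximum a posteriori (MAP) estimation.

3.1.3 L2 Regularization


3.1.4 L1 Regularization


3.1.5 Geometric Interpretation of Regularization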

L1 and L2 Regularization

Given a vector w, define the L_p regularizer \|w\|_p = (\sum_i |w_i|^p)^{1/p}:


from wiki

3.1.6 Dropout Regularization and Data Augmentation


3.2 Neural Network Framework


3.2.1 Linear Regression


3.2.2 Logistic Regression


3.2.3 Support Vector Machine


3.2.4 Bootstrap Neural Networks


3.2.5 Boosting Neural Network


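4. Optimization Principles

4.1 Taylor's Theorem


4.1.1 Taylor Expansion


4.1.2 Taylor's Mean Value Theorem


4.2 Gradient Descent

4.2.1 Basic Principles



4.2.2 Iteration Framework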

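4.2.3 Batch Gradient Descent


From michaeljancsy


  • Learning and convergence are usually smooth and stable;
  • There is mature, complete theory on the convergence conditions;
  • Many techniques exist to accelerate its convergence using second-order information, for example conjugate gradient;
  • It is relatively insensitive to noisy samples.


  • Convergence is slow;
  • It is sensitive to the initial point;
  • Changes in the dataset cannot be learned;
  • It is not well suited to large-scale data.

4.2.4 Stochastic Gradient Descent

Pure stochastic gradient descent (Stochastic Gradient Descent; it is worth asking why the word is "stochastic" rather than "random") updates the weights using one sample at a time. This introduces noise but may lead to a better solution. Many problems have a large number of local optima. Traditional batch gradient descent computes each gradient from all samples, so once the initial point is fixed it essentially falls into the nearest basin. The noise in stochastic gradient descent gives it a good chance of jumping out of the current basin, which widens the choices and may lead to a better basin.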

From michaeljancsy


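  • SGD converges faster;
  • SGD is relatively insensitive to the initial point and finds better solutions more easily;
  • SGD is relatively well suited to large-scale training data;
  • SGD can capture changes in the sample data;
  • Noisy samples can make the weights fluctuate and prevent convergence to a local optimum, so the choice of step size is very important.


4.2.5 Mini-batch Gradient Descent

Mini-batch gradient descent is a compromise between SGD and BGD. It learns from a relatively small set of samples whose size is kept fixed or gradually increased during training. This keeps the benefits of randomness while still allowing second-order information to speed up convergence. Almost all mainstream machine learning tools now support mini-batch learning.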

From michaeljancsy


4.2.6 Newton's Method



为方便起见,使用 代替 .

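4.2.7 Momentum

A major drawback of SGD is that each update depends only on the current sample, so noisy samples make the weights fluctuate. A natural idea is to take both the historical gradients and the gradient of the new sample into account:


  • In the initial phase, the historical gradient information greatly speeds up learning (for example, when n = 2);
  • When the iterate is about to cross a valley of the function, a poorly chosen learning rate makes the weights update in the opposite direction and learning falls back; the momentum term may then carry it across the valley;
  • Finally, when the gradient is almost 0, the momentum term may let the iterate escape the current local minimum and possibly find a better optimum.

Nesterov accelerated gradient is an improvement on the momentum method. It first takes a big step in the previous direction (brown vector), then computes the gradient at that point (red vector), and finally adds the two vectors; the resulting vector (green vector) is the final update direction.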

From G. Hinton's lecture 6c
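The two update rules can be sketched side by side. The example minimizes the toy function f(w) = 0.5·w² and uses one common formulation, v ← β·v + η·∇f(·), w ← w − v; the hyperparameter values and the choice of test function are assumptions of this sketch.

```python
def grad(w):
    """Gradient of the toy objective f(w) = 0.5 * w^2."""
    return w


def momentum_descent(w0, lr=0.1, beta=0.9, steps=200, nesterov=False):
    """Gradient descent with classical or Nesterov momentum."""
    w, v = w0, 0.0
    for _ in range(steps):
        # Nesterov evaluates the gradient after the provisional momentum step.
        lookahead = w - beta * v if nesterov else w
        v = beta * v + lr * grad(lookahead)
        w = w - v
    return w


w_classical = momentum_descent(5.0)
w_nesterov = momentum_descent(5.0, nesterov=True)
```

The only difference is where the gradient is evaluated: at the current point for classical momentum, and at the look-ahead point for Nesterov's variant.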

4.2.8 AdaGrad

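AdaGrad is also a gradient-based method that gives each parameter its own learning rate: weights that are updated frequently receive small updates, while weights that are updated rarely receive large ones. This makes it very effective on sparse datasets. The method is often used for large-scale neural networks, and Google's FTRL-Proximal uses a similar approach; see: Ad Click Prediction: a View from the Trenches, and Follow-the-Regularized-Leader and Mirror Descent: Equivalence Theorems and L1 Regularization


  • Early in training, the accumulated squared gradient is small, so the scaling factor (the regularizer) is large and the gradient is amplified;
  • Late in training, the accumulated squared gradient is large, so the scaling factor is small and the gradient is shrunk.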
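A minimal sketch of the AdaGrad update, w_i ← w_i − η / (sqrt(G_i) + ε) · g_i, where G_i is the running sum of squared gradients of parameter i. The test function and the hyperparameters are made up for illustration.

```python
import math


def adagrad(grad_fn, w0, lr=0.5, eps=1e-8, steps=500):
    """AdaGrad: per-parameter step sizes scaled by accumulated squared gradients."""
    w = list(w0)
    accum = [0.0] * len(w)  # running sum of squared gradients, one per parameter
    for _ in range(steps):
        g = grad_fn(w)
        for i in range(len(w)):
            accum[i] += g[i] ** 2
            w[i] -= lr / (math.sqrt(accum[i]) + eps) * g[i]
    return w


def quad_grad(w):
    """Gradient of f(w) = 0.5 * (w0^2 + 100 * w1^2), a badly scaled quadratic."""
    return [w[0], 100.0 * w[1]]


w = adagrad(quad_grad, [1.0, 1.0])
```

Although the second coordinate has gradients 100 times larger, dividing by the root of the accumulated squared gradient makes both coordinates progress at nearly the same rate.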


4.2.9 AdaDelta


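  • After a number of training rounds the regularizer becomes too small;
  • A global learning rate has to be set;
  • In the update, the units on the left and right sides of the equation do not match.

For the first shortcoming, one could set a window and update the regularizer using only the gradients of the most recent rounds, but that is too expensive to compute, so a strategy similar to momentum is used instead:





This comes from Becker and LeCun's Hessian estimation method: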



From Zeiler



From Karpathy

From SGD optimization on loss surface contours

4.2.10 Adam




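4.3 Parallel SGD

SGD is relatively simple and has been shown to have good convergence properties and accuracy, so it is natural to extend it to large-scale datasets. Just as MapReduce is the basic framework behind Hadoop/Spark, parallel machine learning has two common frameworks: AllReduce and Parameter Server (PS).

4.3.1 AllReduce



From MPI Tutorials



From Huasha Zhao & John Canny

Very good open-source implementations include John Langford's Vowpal Wabbit and Tianqi Chen's Rabit (lightweight and fault-tolerant). One key to parallel computation is how to compute the gradient of the objective function over a large dataset, and the AllReduce framework suits this task well. For example, vw organizes the machine nodes into a binary tree. One node acts as the master and the others as slaves; the master manages the slaves and periodically receives their heartbeats. Each node's result is collected by its parent, summed once it reaches the root, and then broadcast back to all nodes. A simple example follows: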
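The reduce-then-broadcast pattern over a binary tree can be simulated in a few lines. This is a single-process sketch of the idea only; it assumes node i has children 2i+1 and 2i+2 and is not how Vowpal Wabbit or Rabit are implemented.

```python
def tree_allreduce(values):
    """Simulate a sum-allreduce over nodes arranged as a binary tree.

    Node 0 is the root; node i has children 2*i + 1 and 2*i + 2.
    """
    n = len(values)
    partial = list(values)
    # Reduce phase: each node adds its partial sum into its parent, leaves first.
    for i in range(n - 1, 0, -1):
        parent = (i - 1) // 2
        partial[parent] += partial[i]
    total = partial[0]
    # Broadcast phase: the root sends the total back down to every node.
    return [total] * n


# Seven workers, each holding a local gradient value (toy numbers).
local_gradients = [1, 2, 3, 4, 5, 6, 7]
result = tree_allreduce(local_gradients)
```

After the call every node holds the same global sum, which is what each worker needs in order to apply an identical gradient step.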



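4.3.2 Parameter Server

A parameter server emphasizes parallel, asynchronous parameter updates during model training. It was first proposed by Jeffrey Dean's team at Google to handle parameter learning in deep learning. The basic idea: split the dataset into several subsets, run a replica of the model on the node that holds each subset, and keep all model weights on separately deployed parameter servers. The basic operations are Fetching (every n iterations, pull the parameter weights from the parameter server) and Pushing (every m iterations, push the local gradient updates to the parameter server, which then updates the corresponding weights). The basic architecture is shown below:


From Jeffrey Dean: Large Scale Distributed Deep Networks

The parameter server is a very good machine learning framework, especially for deep learning. A good article on it is "Parameter Server: The New Killer Tool for Distributed Machine Learning". Among open-source implementations, the better ones are the bosen project and Mu Li's ps-lite (now integrated into the DMLC project).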

    // Data structure of the FTRL solver.
    type FtrlSolver struct {
        Alpha   float64      `json:"Alpha"`
        Beta    float64      `json:"Beta"`
        L1      float64      `json:"L1"`
        L2      float64      `json:"L2"`
        Featnum int          `json:"Featnum"`
        Dropout float64      `json:"Dropout"`
        N       []float64    `json:"N"`
        Z       []float64    `json:"Z"`
        Weights util.Pvector `json:"Weights"`
        Init    bool         `json:"Init"`
    }

    // Data structure of the parameter server.
    type FtrlParamServer struct {
        FtrlSolver
        ParamGroupNum int
        LockSlots     []sync.Mutex
        log           log4go.Logger
    }

    // FetchParamGroup copies the n and z values of one parameter group from the server.
    func (fps *FtrlParamServer) FetchParamGroup(n []float64, z []float64, group int) error {
        if !fps.FtrlSolver.Init {
            fps.log.Error("[FtrlParamServer-FetchParamGroup] Initialize fast ftrl solver error.")
            return errors.New("[FtrlParamServer-FetchParamGroup] Initialize fast ftrl solver error.")
        }
        var start int = group * ParamGroupSize
        var end int = util.MinInt((group+1)*ParamGroupSize, fps.FtrlSolver.Featnum)
        // Each group has its own lock, so different groups can be accessed concurrently.
        fps.LockSlots[group].Lock()
        for i := start; i < end; i++ {
            n[i] = fps.FtrlSolver.N[i]
            z[i] = fps.FtrlSolver.Z[i]
        }
        fps.LockSlots[group].Unlock()
        return nil
    }

    // FetchParam fetches all parameter groups from the server.
    func (fps *FtrlParamServer) FetchParam(n []float64, z []float64) error {
        if !fps.FtrlSolver.Init {
            fps.log.Error("[FtrlParamServer-FetchParam] Initialize fast ftrl solver error.")
            return errors.New("[FtrlParamServer-FetchParam] Initialize fast ftrl solver error.")
        }
        for i := 0; i < fps.ParamGroupNum; i++ {
            err := fps.FetchParamGroup(n, z, i)
            if err != nil {
                msg := fmt.Sprintf("[FtrlParamServer-FetchParam] Initialize fast ftrl solver error: %s", err.Error())
                fps.log.Error(msg)
                return errors.New(msg)
            }
        }
        return nil
    }

    // PushParamGroup adds the local n and z updates of one group to the server, then clears them.
    func (fps *FtrlParamServer) PushParamGroup(n []float64, z []float64, group int) error {
        if !fps.FtrlSolver.Init {
            fps.log.Error("[FtrlParamServer-PushParamGroup] Initialize fast ftrl solver error.")
            return errors.New("[FtrlParamServer-PushParamGroup] Initialize fast ftrl solver error.")
        }
        var start int = group * ParamGroupSize
        var end int = util.MinInt((group+1)*ParamGroupSize, fps.FtrlSolver.Featnum)
        fps.LockSlots[group].Lock()
        for i := start; i < end; i++ {
            fps.FtrlSolver.N[i] += n[i]
            fps.FtrlSolver.Z[i] += z[i]
            n[i] = 0
            z[i] = 0
        }
        fps.LockSlots[group].Unlock()
        return nil
    }

    // PushParam pushes the worker's accumulated updates to the parameter server.
    func (fw *FtrlWorker) PushParam(param_server *FtrlParamServer) error {
        if !fw.FtrlSolver.Init {
            fw.log.Error("[FtrlWorker-PushParam] Initialize fast ftrl solver error.")
            return errors.New("[FtrlWorker-PushParam] Initialize fast ftrl solver error.")
        }
        for i := 0; i < fw.ParamGroupNum; i++ {
            err := param_server.PushParamGroup(fw.NUpdate, fw.ZUpdate, i)
            if err != nil {
                msg := fmt.Sprintf("[FtrlWorker-PushParam] Initialize fast ftrl solver error: %s", err.Error())
                fw.log.Error(msg)
                return errors.New(msg)
            }
        }
        return nil
    }

    // Update runs one FTRL step on sample (x, y) and returns the prediction.
    func (fw *FtrlWorker) Update(
        x util.Pvector,
        y float64,
        param_server *FtrlParamServer) float64 {
        if !fw.FtrlSolver.Init {
            return 0.
        }
        // Start empty (length 0) so that append does not leave zero-valued entries in front.
        var weights util.Pvector = make(util.Pvector, 0, len(x))
        var gradients []float64 = make([]float64, 0, len(x))
        var wTx float64 = 0.
        for i := 0; i < len(x); i++ {
            item := x[i]
            // Randomly drop features with probability Dropout.
            if util.UtilGreater(fw.FtrlSolver.Dropout, 0.0) {
                rand_prob := util.UniformDistribution()
                if rand_prob < fw.FtrlSolver.Dropout {
                    continue
                }
            }
            var idx int = item.Index
            if idx >= fw.FtrlSolver.Featnum {
                continue
            }
            var val float64 = fw.FtrlSolver.GetWeight(idx)
            weights = append(weights, util.Pair{idx, val})
            gradients = append(gradients, item.Value)
            wTx += val * item.Value
        }
        // Logistic loss: the gradient with respect to w_i is (pred - y) * x_i.
        var pred float64 = util.Sigmoid(wTx)
        var grad float64 = pred - y
        util.VectorMultiplies(gradients, grad)
        for k := 0; k < len(weights); k++ {
            var i int = weights[k].Index
            var g int = i / ParamGroupSize
            // Every FetchStep steps, refresh this group from the server.
            if fw.ParamGroupStep[g]%fw.FetchStep == 0 {
                param_server.FetchParamGroup(
                    fw.FtrlSolver.N,
                    fw.FtrlSolver.Z,
                    g)
            }
            var w_i float64 = weights[k].Value
            var grad_i float64 = gradients[k]
            var sigma float64 = (math.Sqrt(fw.FtrlSolver.N[i]+grad_i*grad_i) - math.Sqrt(fw.FtrlSolver.N[i])) / fw.FtrlSolver.Alpha
            fw.FtrlSolver.Z[i] += grad_i - sigma*w_i
            fw.FtrlSolver.N[i] += grad_i * grad_i
            fw.ZUpdate[i] += grad_i - sigma*w_i
            fw.NUpdate[i] += grad_i * grad_i
            // Every PushStep steps, push the accumulated updates of this group.
            if fw.ParamGroupStep[g]%fw.PushStep == 0 {
                param_server.PushParamGroup(fw.NUpdate, fw.ZUpdate, g)
            }
            fw.ParamGroupStep[g] += 1
        }
        return pred
    }

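4.4 Second-Order Optimization Methods

4.4.1 Overview


Here \alpha_k is called the step size and the vector p_k the search direction. The search direction is generally required to be a descent direction of the objective (for a minimization problem), i.e., it must satisfy:

More specifically, p_k takes the following general form:


  • In the Steepest Descent method, B_k is the identity matrix;
  • In Newton's method, B_k is the exact Hessian matrix;
  • In Quasi-Newton methods, B_k is an approximation of the Hessian matrix.

These methods fall broadly into two groups: those that fix the search direction first and then choose the step size (line search), and those that fix the step size first and then choose the direction (trust region).

Take the commonly used line search as an example. How do we find a good step size \alpha_k? A good step size must satisfy the following conditions:

  • Armijo condition
    The sufficient decrease condition: for the step size of an inexact one-dimensional search to guarantee that the objective f decreases, it must satisfy the following inequality:

    The geometric interpretation of the Armijo condition is as follows:


  • Curvature condition

  • Wolfe conditions
    A step size that satisfies both the Armijo condition and the curvature condition is said to satisfy the Wolfe conditions.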
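A backtracking line search is a simple way to find a step size that satisfies the Armijo condition f(x + αp) ≤ f(x) + c₁·α·∇f(x)ᵀp. The sketch below assumes the usual constants c₁ = 1e-4 and a halving factor ρ = 0.5, and uses a made-up quadratic as the objective; it does not check the curvature condition.

```python
def backtracking_line_search(f, grad_f, x, p, alpha0=1.0, rho=0.5, c1=1e-4):
    """Shrink alpha until the Armijo (sufficient decrease) condition holds."""
    fx = f(x)
    slope = sum(g * d for g, d in zip(grad_f(x), p))  # directional derivative
    if slope >= 0:
        raise ValueError("p must be a descent direction")
    alpha = alpha0
    while f([xi + alpha * di for xi, di in zip(x, p)]) > fx + c1 * alpha * slope:
        alpha *= rho
    return alpha


def f(x):
    """Toy objective: an elongated quadratic bowl."""
    return x[0] ** 2 + 10 * x[1] ** 2


def grad_f(x):
    return [2 * x[0], 20 * x[1]]


x = [1.0, 1.0]
p = [-g for g in grad_f(x)]          # steepest descent direction
alpha = backtracking_line_search(f, grad_f, x, p)
x_new = [xi + alpha * di for xi, di in zip(x, p)]
```

Starting from a full step and halving keeps the accepted step as long as possible, which in practice also tends to avoid the very short steps that the curvature condition is meant to rule out.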

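4.4.2 Newton's Method




Overall, however, Newton's method has to compute the Hessian matrix, so its cost is too high and it struggles with large-scale optimization problems.

4.4.3 Quasi-Newton Methods

To avoid the cost of computing the Hessian, the idea arose of estimating it from first-order information. This produced a family of methods, the most representative being DFP and BFGS (L-BFGS). The principle is as follows: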



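5. Deep Neural Networks

Deep learning is a framework, built on multi-layer neural networks, that automatically learns representations of data and gradually frees people from traditional manual feature extraction. One of its foundations is distributed representation. When reading papers, keep the following two concepts apart:

  • Distributional representation
    A distributional representation is based on a distributional hypothesis and on co-occurrence in context. For word representations, for example: words with similar meanings have similar distributions.
    Common distributional representation models:
  • Distributed representation
    A distributed representation is a dense, low-dimensional, real-valued vector for an entity (for example a word, a car series ID, or a Weibo user ID), commonly called an embedding. It requires no distributional assumption, and each dimension of the vector represents a latent feature of the entity in some space.
    Common distributed representation models:
    • Collobert and Weston embeddings
    • HLBL embeddings

For distributional representation, distributed representation, and several related concepts, see the paper Word representations: A simple and general method for semi-supervised learning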

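5.1 Backpropagation







5.2 Evolution of Convolutional Network Architectures


5.3 Basic Principles of CNNs


5.3.1 Sigmoid Activation Function


The logistic function was first proposed by Pierre François Verhulst in his study of population growth. Because it is so widely applicable (for the probabilistic view, see the earlier discussion of Logistic Regression), it has been used extensively (in traditional machine learning it gave rise to Logistic Regression). In practice, however, it has two major drawbacks as an activation function:

  • Vanishing Gradient Problem
  • Non-zero-mean activation output
    Suppose learning proceeds one sample at a time and the current layer passes a signal with non-zero mean to the neurons of the next layer. If the inputs are all greater than 0, the gradients computed for the weights are all greater than 0 as well, and vice versa, which slows the training of the whole network. Batch training alleviates the problem, but it still holds training back, so Yann LeCun also gives a trick for it in "Efficient BackProp".

Tanh is another sigmoid-type function, and its output is zero-mean. An empirical form of the activation function given by Yann LeCun is:

f(x) = 1.7159 \tanh\left(\frac{2}{3} x\right)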



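5.3.2 Input Layer


5.3.3 Convolutional Layer



where: Complex Conjugate

What the convolutional layer does: when a data point is locally correlated with its neighbors, convolution can filter, denoise, and find features. The result of feature extraction with one kernel is called a feature map; convolving with different kernels gives a series of feature maps. Together they have size width × height × depth (the number of kernels) and serve as the input to the next layer.

  • Smoothing
  • Filtering
  • Projection
    Convolution is an inner-product operation. If the template (the convolution kernel) is flattened and viewed as a basis vector, then each position of the sliding window yields a vector, and projecting that vector onto the basis vector gives the feature map. With several templates, they form a set of basis vectors, and the projections give a set of feature maps.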


5.3.4 Zero-Padding


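TensorFlow users will know that its padding parameter takes two values: SAME, which applies zero padding like that in the figure above so that (with stride 1) the input and output feature maps have the same size; and VALID, which applies no padding.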
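The effect of the two modes on the output size can be computed directly. The sketch below follows the size rules documented for TensorFlow convolutions (SAME: ceil(n / s); VALID: ceil((n − k + 1) / s)); the helper names are made up for this example.

```python
import math


def conv_output_size(input_size, kernel_size, stride=1, padding="VALID"):
    """Spatial output size of a convolution along one dimension."""
    if padding == "SAME":
        return math.ceil(input_size / stride)
    if padding == "VALID":
        return math.ceil((input_size - kernel_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


def same_padding_total(input_size, kernel_size, stride=1):
    """Total number of zeros added along one dimension under SAME padding."""
    out = math.ceil(input_size / stride)
    return max((out - 1) * stride + kernel_size - input_size, 0)


valid_size = conv_output_size(28, 5, padding="VALID")   # 5x5 kernel on a 28x28 input
same_size = conv_output_size(28, 5, padding="SAME")
zeros_added = same_padding_total(28, 5)
```

With stride 1, SAME keeps the size by adding kernel_size − 1 zeros in total, split between the two sides; VALID shrinks each dimension by kernel_size − 1.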

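5.3.5 Pooling Layer



In addition, if the layer after a convolutional layer is a pooling layer, every feature map is pooled. Compared with human behavior, pooling can be seen as checking whether some region of the image has a certain property, without caring where exactly in the region the property appears (for example, checking whether some region of a face has a pimple).

5.3.6 Fully Connected Layer


5.3.7 Solving for the Parameters



  • Fully connected layer


  • Convolutional layer












    Suppose the downsampling (pooling) layer is layer l with 3×3 feature maps, and the next layer, layer l+1, is a convolutional layer that produces two feature maps with two 2×2 kernels (the network structure inside the blue dashed box).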




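5.3.8 An Application of CNNs in NLP

In NLP, text classification is a common application. The traditional approach is to hand-craft features such as n-grams and their various cross combinations. Like images, text naturally has local correlation, which suggests using a CNN as an end-to-end classifier and leaving feature extraction to the model.
A sentence is one-dimensional and cannot be processed directly like an image, so word vectors are needed, either from distributed representation learning or from an embedding layer added as the first layer of the model. The sentence then becomes two-dimensional:

1. Pre-trained results, for example an already trained word2vec model. See: Using pre-trained word embeddings in a Keras model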

    from keras.models import Sequential
    from keras.layers import (Activation, Conv1D, Dense, Dropout, Embedding,
                              Flatten, Input, MaxPooling1D)


    def build_embedding_cnn(max_caption_len, vocab_size):
        # Binary classification
        nb_classes = 2
        # Word vector dimension
        word_dim = 256
        # Number of convolution kernels
        nb_filters = 64
        # Window size for max pooling
        nb_pool = 2
        # Convolution kernel size
        kernel_size = 5
        # Model definition
        model = Sequential()
        # Layer 1: embedding layer over fixed-length sequences of word ids
        model.add(Input(shape=(max_caption_len,)))
        model.add(Embedding(output_dim=word_dim, input_dim=vocab_size, name='main_input'))
        model.add(Dropout(0.5))
        # Layer 2: convolutional layer with ReLU activation
        model.add(Conv1D(nb_filters, kernel_size))
        model.add(Activation('relu'))
        # Layer 3: max pooling layer
        model.add(MaxPooling1D(nb_pool))
        model.add(Dropout(0.5))
        model.add(Flatten())
        # Layer 4: fully connected layer
        model.add(Dense(256))
        model.add(Activation('relu'))
        model.add(Dropout(0.3))
        # Layer 5: output layer
        model.add(Dense(nb_classes))
        model.add(Activation('softmax'))
        # Cross-entropy loss, optimized with adadelta
        model.compile(loss='categorical_crossentropy',
                      optimizer='adadelta',
                      metrics=['accuracy'])
        return model



5.4 LeNet-5

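The original network structure comes from the paper "Gradient-based learning applied to document recognition" (with the raw, unnormalized data used in the paper, INPUT is 32×32). I use the following structure for illustration:

LeNet-5 has 8 layers in total: 1 input layer + 3 convolutional layers (C1, C3, C5) + 2 downsampling layers (S2, S4) + 1 fully connected layer (F6) + 1 output layer. Each layer has multiple feature maps (multiple groups of automatically extracted features).

5.4.1 Input Layer


5.4.2 C1 Convolutional Layer

C1 consists of 6 feature maps, each produced by a 5×5 kernel (every neuron in a feature map is connected to a 5×5 pixel region of the input layer). Counting the bias of each kernel, the layer has (5×5+1)×6 = 156 parameters to learn, and the number of connections is 156×24×24 = 89856.

5.4.3 S2 Downsampling Layer

Each feature map in this layer corresponds one-to-one to a feature map in the previous layer. Because the 2×2 receptive field of each unit moves without overlap, the layer produces 6 downsampled feature maps of size 12×12. With Max Pooling/Mean Pooling the layer has 0 parameters to learn (with weighted downsampling, i.e., a pooling kernel that has weights, it has (2×2+1)×6 = 30 parameters), and the number of connections is 30×12×12 = 4320.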

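5.4.4 C3 Convolutional Layer

This layer is slightly more complex: the connections between S2 and C3 are many-to-many. The simplest scheme connects all S2 feature maps to all C3 feature maps (one can also connect only a sampled subset of the S2 feature maps to a given C3 feature map). Under full connection, the 6 S2 feature maps are convolved with 6 separate 5×5 kernels to produce 1 feature map in C3 (with one bias per output feature map). C3 has 16 feature maps, so the layer has (5×5×6+1)×16 = 2416 parameters to learn, and the number of connections is 2416×8×8 = 154624.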
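The counts for C1 and C3 can be reproduced with a few lines. The sketch assumes a 28×28 input (implied by the 24×24 C1 feature maps), valid convolutions, non-overlapping 2×2 pooling, and the text's convention that connections = parameters × output positions; the helper names are made up.

```python
def conv_out(size, kernel):
    """Output side length of a valid (no padding, stride 1) convolution."""
    return size - kernel + 1


def pool_out(size, window):
    """Output side length of non-overlapping pooling."""
    return size // window


def conv_layer_stats(in_maps, out_maps, kernel, out_size):
    """(parameters, connections) of a fully connected convolutional layer."""
    params = (kernel * kernel * in_maps + 1) * out_maps  # +1 for the bias
    connections = params * out_size * out_size
    return params, connections


size_c1 = conv_out(28, 5)        # C1 feature map side
size_s2 = pool_out(size_c1, 2)   # S2 feature map side
size_c3 = conv_out(size_s2, 5)   # C3 feature map side

c1 = conv_layer_stats(in_maps=1, out_maps=6, kernel=5, out_size=size_c1)
c3 = conv_layer_stats(in_maps=6, out_maps=16, kernel=5, out_size=size_c3)
```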

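5.4.5 S4 Downsampling Layer

As with S2, with Max Pooling/Mean Pooling the layer has 0 parameters to learn; the number of connections is (2×2+1)×16×4×4 = 1280.

5.4.6 C5 Convolutional Layer

Like C3, all S4 feature maps are fully connected to all C5 feature maps. Under this scheme, the 16 S4 feature maps are convolved with 16 separate 1×1 kernels to produce 1 feature map in C5 (with one bias per output feature map). C5 has 120 feature maps, so the layer has (1×1×16+1)×120 = 2040 parameters to learn. Because a 1×1 kernel keeps each output feature map at 4×4, the number of connections is 2040×4×4 = 32640.

5.4.7 F6 Fully Connected Layer


5.4.8 Output Layer


Visualization of LeNet-5 training on the MNIST (Modified NIST) dataset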


5.4.9 LeNet-5 in Code

    import matplotlib
    matplotlib.use("Agg")  # render to files; no display is needed
    import matplotlib.pyplot as plt
    import numpy as np
    from keras.datasets import mnist
    from keras.layers import Activation, Conv2D, Dense, Flatten, Input, MaxPooling2D
    from keras.models import Sequential
    from keras.utils import plot_model, to_categorical


    def build_LeNet5():
        model = Sequential()
        model.add(Input(shape=(28, 28, 1)))
        # C1 + S2
        model.add(Conv2D(6, (5, 5), padding='valid'))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Activation("relu"))
        # C3 + S4
        model.add(Conv2D(16, (5, 5), padding='valid'))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Activation("relu"))
        # C5
        model.add(Conv2D(120, (1, 1), padding='valid'))
        model.add(Flatten())
        # F6
        model.add(Dense(84))
        model.add(Activation("sigmoid"))
        # Output layer
        model.add(Dense(10))
        model.add(Activation('softmax'))
        return model


    if __name__ == "__main__":
        model = build_LeNet5()
        model.summary()
        plot_model(model, to_file="LeNet-5.png", show_shapes=True)
        (X_train, y_train), (X_test, y_test) = mnist.load_data()
        # Scale pixels to [0, 1] and add the channel dimension.
        X_train = X_train.reshape(X_train.shape[0], 28, 28, 1).astype('float32') / 255
        X_test = X_test.reshape(X_test.shape[0], 28, 28, 1).astype('float32') / 255
        Y_train = to_categorical(y_train, 10)
        Y_test = to_categorical(y_test, 10)
        # training
        model.compile(loss='categorical_crossentropy',
                      optimizer='adadelta',
                      metrics=['accuracy'])
        batch_size = 128
        nb_epoch = 1
        model.fit(X_train, Y_train, batch_size=batch_size, epochs=nb_epoch,
                  verbose=1, validation_data=(X_test, Y_test))
        score = model.evaluate(X_test, Y_test, verbose=0)
        print('Test score:', score[0])
        print('Test accuracy:', score[1])
        # Predicted class = index of the largest softmax output.
        y_hat = np.argmax(model.predict(X_test), axis=1)
        # Collect misclassified test images as (image, predicted, true).
        test_wrong = [im for im in zip(X_test, y_hat, y_test) if im[1] != im[2]]
        plt.figure(figsize=(10, 10))
        for ind, val in enumerate(test_wrong[:100]):
            plt.subplots_adjust(left=0, right=1, bottom=0, top=1)
            plt.subplot(10, 10, ind + 1)
            im = 1 - val[0].reshape((28, 28))
            plt.axis("off")
            plt.text(0, 0, val[2], fontsize=14, color='blue')  # true label
            plt.text(8, 0, val[1], fontsize=14, color='red')   # predicted label
            plt.imshow(im, cmap='gray')
        plt.savefig('error.jpg')

5.5 AlexNet

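AlexNet achieved a breakthrough top-5 error rate of 15.3% in the ILSVRC-2012 competition (the runner-up had 26.2%). It is described in Alex's 2012 paper "ImageNet Classification with Deep Convolutional Neural Networks", a milestone and watershed in the rapid rise of deep learning. With the continued development of hardware, deep learning will stay hot.

5.5.1 Network Architecture Analysis

Constrained by the hardware of the time, AlexNet was designed down to the GPU level. The GTX 580 had only 3 GB of memory, so to train the model on a large amount of data the authors ran two GPUs in parallel and split the network structure across them, as follows:


5.5.2 ReLU Activation Function

AlexNet introduced the ReLU activation function. It is a more accurate activation model proposed by the neuroscientists Dayan and Abbott in the book "Theoretical Neuroscience":


For details, read Section 2.2, Estimating Firing Rates, of the book. The features of the new activation model are:

  • Sparse activation (the output is 0 when the input is below the threshold)
  • One-sided inhibition (unlike Sigmoid, which is two-sided)
  • A wide excitation boundary and no saturation (the derivative of ReLU is always 1 for positive inputs), which greatly alleviates the vanishing gradient problem

1. The original ReLU
Building on this earlier work (see Hinton's paper "Rectified Linear Units Improve Restricted Boltzmann Machines"), a new activation function similar to Eq. 2.9 was introduced: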


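  • Not differentiable at the origin
    This complicates gradient computation in backpropagation, so Softplus, proposed by Charles Dugas et al., can be used to approximate the ReLU function above (it can be seen as a smooth version):

  • Excessive sparsity

2. Leaky ReLU

To address the large number of neurons left inactive by this excessive sparsity, Leaky ReLU was proposed:


3. Parametric ReLU
The slope above need not be set by hand; it can be learned. This led to Parametric ReLU:


For details, read the paper "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification" by Kaiming He et al.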
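The variants discussed so far differ only in how they treat negative inputs. The sketch below writes them out in their commonly used forms; the default slope 0.01 for Leaky ReLU is an assumption, and Parametric ReLU uses the same formula with the slope treated as a learned parameter.

```python
import math


def relu(x):
    """Original ReLU: max(0, x)."""
    return max(0.0, x)


def softplus(x):
    """Smooth approximation of ReLU: log(1 + e^x)."""
    return math.log1p(math.exp(x))


def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: a small fixed slope alpha for negative inputs."""
    return x if x > 0 else alpha * x


def prelu_grad_alpha(x):
    """Gradient of Parametric ReLU's output with respect to its slope alpha."""
    return 0.0 if x > 0 else x


values = [(x, relu(x), leaky_relu(x), softplus(x)) for x in (-2.0, 0.0, 3.0)]
```

Because the gradient with respect to the slope is non-zero for negative inputs, Parametric ReLU can learn the slope by backpropagation along with the other weights.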

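4. Randomized ReLU
Randomized ReLU can be seen as a randomized version of Leaky ReLU. The idea is as follows: suppose



5.5.3 Local Response Normalization

LRN uses adjacent feature maps to make features more salient. Experiments in the paper show that it lowers the error rate. The formula is:




5.5.4 Overlapping Pooling


5.5.5 Dropout






5.5.6 Data Augmentation

5.5.7 Multi-GPU Training

The authors used GTX 580 GPUs to speed up training, but given the hardware of the time they had to design the network structure carefully, even deciding how and when the two GPUs communicate. We are more fortunate today and rarely need to think about this.

5.5.8 AlexNet in Code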



# -*- coding: utf-8 -*-
# Written against the Keras 1.x API with the TensorFlow backend.
import os

import matplotlib
matplotlib.use("Agg")  # render figures to files; no display needed
import matplotlib.pyplot as plt
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.utils import np_utils
from keras.layers.convolutional import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ModelCheckpoint
from keras.utils.visualize_util import plot
from keras import backend as K
import tensorflow as tf
tf.python.control_flow_ops = tf  # compatibility workaround kept from the original


def data_visualize(x, y, num):
    """Save a num x num grid of sample images with their labels to sample.jpg."""
    plt.figure()
    for i in range(num * num):
        axes = plt.subplot(num, num, i + 1)
        axes.set_title("label=" + str(y[i]))
        axes.set_xticks([0, 10, 20, 30])
        axes.set_yticks([0, 10, 20, 30])
        plt.imshow(x[i])
    plt.tight_layout()
    plt.savefig('sample.jpg')


# The architecture below omits the LRN layers.
def build_AlexNet(s):
    model = Sequential()
    # Layer 1: convolution + max pooling
    model.add(Convolution2D(96, 11, 11, border_mode='same', input_shape=s))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # Layer 2: convolution + max pooling
    model.add(Convolution2D(256, 5, 5, border_mode='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # Layer 3: convolution
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, 3, 3, border_mode='same', activation='relu'))
    # Layer 4: convolution
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(1024, 3, 3, border_mode='same', activation='relu'))
    # Layer 5: convolution + max pooling
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(1024, 3, 3, border_mode='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    # Layer 6: fully connected
    model.add(Dense(3072, activation='relu'))
    model.add(Dropout(0.5))
    # Layer 7: fully connected
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    # Layer 8: output layer
    model.add(Dense(10))
    model.add(Activation('softmax'))
    return model


if __name__ == "__main__":
    # Use the fourth GPU (index 3). Making only that card visible keeps TensorFlow
    # from occupying every card; this must happen before the session is created.
    os.environ["CUDA_VISIBLE_DEVICES"] = "3"
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=1, allow_growth=True)
    session = tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
                                               log_device_placement=True,
                                               gpu_options=gpu_options))
    K.set_session(session)  # make Keras use the session configured above

    (X_train, y_train), (X_test, y_test) = cifar10.load_data()
    data_visualize(X_train, y_train, 4)
    s = X_train.shape[1:]
    model = build_AlexNet(s)
    model.summary()
    plot(model, to_file="AlexNet.jpg", show_shapes=True)

    # Define the input data and normalize it to [0, 1]
    dim = 32
    channel = 3
    class_num = 10
    X_train = X_train.reshape(X_train.shape[0], dim, dim, channel).astype('float32') / 255
    X_test = X_test.reshape(X_test.shape[0], dim, dim, channel).astype('float32') / 255
    Y_train = np_utils.to_categorical(y_train, class_num)
    Y_test = np_utils.to_categorical(y_test, class_num)

    # Preprocessing and data augmentation.
    # Note: model.fit below trains on the raw arrays; the generator only takes
    # effect if training is switched to fit_generator with datagen.flow(...).
    datagen = ImageDataGenerator(
        featurewise_center=False,
        samplewise_center=False,
        featurewise_std_normalization=False,
        samplewise_std_normalization=False,
        zca_whitening=False,
        rotation_range=25,
        width_shift_range=0.1,
        height_shift_range=0.1,
        horizontal_flip=False,
        vertical_flip=False)
    datagen.fit(X_train)

    model.compile(loss='categorical_crossentropy',
                  optimizer='adadelta',
                  metrics=['accuracy'])
    batch_size = 32
    nb_epoch = 10
    # The callback has to be passed to fit(), otherwise no checkpoint is written.
    checkpoint = ModelCheckpoint("weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5",
                                 monitor='val_loss', verbose=0, save_best_only=True,
                                 save_weights_only=False, mode='auto')
    model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
              verbose=1, validation_data=(X_test, Y_test), callbacks=[checkpoint])
    score = model.evaluate(X_test, Y_test, verbose=0)
    print('Test score:', score[0])
    print('Test accuracy:', score[1])

    # Plot up to 100 misclassified test images: true label in blue, prediction in red.
    y_hat = model.predict_classes(X_test)
    test_wrong = [im for im in zip(X_test, y_hat, y_test) if im[1] != im[2]]
    plt.figure(figsize=(10, 10))
    plt.subplots_adjust(left=0, right=1, bottom=0, top=1)
    for ind, val in enumerate(test_wrong[:100]):
        plt.subplot(10, 10, ind + 1)
        plt.axis("off")
        plt.text(0, 0, val[2][0], fontsize=14, color='blue')
        plt.text(8, 0, val[1], fontsize=14, color='red')
        plt.imshow(val[0])
    plt.savefig('Wrong.jpg')


5.6 VGG

VGG was proposed in the paper "Very Deep Convolutional Networks for Large-Scale Image Recognition"; it builds deeper networks by using smaller convolution kernels.
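The arithmetic behind "smaller kernels, deeper network" can be checked directly. The sketch below compares one large kernel with a stack of 3x3 kernels that covers the same receptive field; the helper names and the channel count of 256 are illustrative assumptions, and biases are ignored.

```python
def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of num_layers stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


def conv_weights(kernel_size, channels, num_layers=1):
    """Weight count when every layer maps `channels` maps to `channels` maps."""
    return num_layers * kernel_size * kernel_size * channels * channels


channels = 256
comparison = {}
for big_kernel, depth in ((5, 2), (7, 3)):
    # `depth` stacked 3x3 layers see the same window as one big_kernel layer.
    assert stacked_receptive_field(3, depth) == big_kernel
    comparison[big_kernel] = (conv_weights(big_kernel, channels),
                              conv_weights(3, channels, depth))
    print(big_kernel, comparison[big_kernel])
```

Three 3x3 layers need 27C² weights against 49C² for a single 7x7 layer, and they apply three nonlinearities instead of one.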

5.6.1 Network Structure


5.6.2 VGG Code Practice

# -*- coding: utf-8 -*-
# Written against the Keras 1.x API with the TensorFlow backend.
import os

import matplotlib
matplotlib.use("Agg")  # render figures to files; no display needed
import matplotlib.pyplot as plt
from keras.datasets import cifar100
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Flatten
from keras.utils import np_utils
from keras.layers.convolutional import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ModelCheckpoint
from keras.utils.visualize_util import plot
from keras import backend as K
import tensorflow as tf
tf.python.control_flow_ops = tf  # compatibility workaround kept from the original


def data_visualize(x, y, num):
    """Save a num x num grid of sample images with their labels to sample.jpg."""
    plt.figure()
    for i in range(num * num):
        axes = plt.subplot(num, num, i + 1)
        axes.set_title("label=" + str(y[i]))
        axes.set_xticks([0, 10, 20, 30])
        axes.set_yticks([0, 10, 20, 30])
        plt.imshow(x[i])
    plt.tight_layout()
    plt.savefig('sample.jpg')


def add_conv_block(model, filters, num_convs, input_shape=None):
    """Append num_convs (zero padding + 3x3 convolution + ReLU) layers, then 2x2 max pooling."""
    for i in range(num_convs):
        if i == 0 and input_shape is not None:
            model.add(ZeroPadding2D((1, 1), input_shape=input_shape))
        else:
            model.add(ZeroPadding2D((1, 1)))
        model.add(Convolution2D(filters, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))


def build_VGG(s, convs_per_block, class_num=100):
    """Five convolution blocks followed by two 4096-unit layers and a softmax layer."""
    model = Sequential()
    for i, (filters, num_convs) in enumerate(zip([64, 128, 256, 512, 512], convs_per_block)):
        add_conv_block(model, filters, num_convs, input_shape=s if i == 0 else None)
    model.add(Flatten())
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(class_num, activation='softmax'))
    return model


def build_VGG_16(s):
    # 13 convolution layers + 3 fully connected layers
    return build_VGG(s, [2, 2, 3, 3, 3])


def build_VGG_19(s):
    # 16 convolution layers + 3 fully connected layers
    return build_VGG(s, [2, 2, 4, 4, 4])


if __name__ == "__main__":
    # Make only GPU 2 visible; this must happen before the session is created.
    os.environ["CUDA_VISIBLE_DEVICES"] = "2"
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=1, allow_growth=True)
    session = tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
                                               log_device_placement=True,
                                               gpu_options=gpu_options))
    K.set_session(session)  # make Keras use the session configured above

    (X_train, y_train), (X_test, y_test) = cifar100.load_data()
    data_visualize(X_train, y_train, 4)
    s = X_train.shape[1:]
    print(s)
    model = build_VGG_16(s)  # or build_VGG_19(s)
    model.summary()
    plot(model, to_file="VGG.jpg", show_shapes=True)

    # Define the input data and normalize it to [0, 1]
    dim = 32
    channel = 3
    class_num = 100
    X_train = X_train.reshape(X_train.shape[0], dim, dim, channel).astype('float32') / 255
    X_test = X_test.reshape(X_test.shape[0], dim, dim, channel).astype('float32') / 255
    Y_train = np_utils.to_categorical(y_train, class_num)
    Y_test = np_utils.to_categorical(y_test, class_num)

    # Preprocessing and real-time data augmentation.
    # Note: model.fit below trains on the raw arrays; the generator only takes
    # effect if training is switched to fit_generator with datagen.flow(...).
    datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=25,  # randomly rotate images in the range (degrees, 0 to 180)
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=False,  # randomly flip images horizontally
        vertical_flip=False)  # randomly flip images vertically
    datagen.fit(X_train)

    # Training
    model.compile(loss='categorical_crossentropy',
                  optimizer='adadelta',
                  metrics=['accuracy'])
    batch_size = 32
    nb_epoch = 10
    # The callback has to be passed to fit(), otherwise no checkpoint is written.
    checkpoint = ModelCheckpoint("weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5",
                                 monitor='val_loss', verbose=0, save_best_only=False,
                                 save_weights_only=False, mode='auto')
    model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
              verbose=1, validation_data=(X_test, Y_test), callbacks=[checkpoint])
    score = model.evaluate(X_test, Y_test, verbose=0)
    print('Test score:', score[0])
    print('Test accuracy:', score[1])

5.7 MSRANet


5.7.1 PReLU


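Parametric Rectifiers are defined as follows:

$$f(y_i) = \max(0, y_i) + a_i \min(0, y_i)$$

where $y_i$ is the input of the activation on channel $i$ and $a_i$ is a learned coefficient.

For details, see "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification" by Kaiming He et al.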
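Because the slope is trained by backpropagation, it helps to see its gradient. For a single unit, the derivative of the PReLU output with respect to the slope is 0 for positive inputs and the input itself otherwise. The sketch below checks this against a finite difference; the function names and sample inputs are illustrative.

```python
def prelu(y, a):
    """PReLU output for input y and slope a."""
    return max(0.0, y) + a * min(0.0, y)


def prelu_grad_a(y):
    """Analytic derivative of the PReLU output with respect to the slope a."""
    return 0.0 if y > 0 else y


def numeric_grad_a(y, a, eps=1e-6):
    """Central finite difference, used only to check the analytic formula."""
    return (prelu(y, a + eps) - prelu(y, a - eps)) / (2 * eps)


a = 0.25
checks = [(y, prelu_grad_a(y), numeric_grad_a(y, a)) for y in (-3.0, -0.5, 2.0)]
for y, analytic, numeric in checks:
    print(y, analytic, numeric)
```

Only negative inputs contribute to the update of the slope, which is why the parameter adds almost no training cost.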

5.8 Highway Networks

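In my view, Highway Networks are a transitional architecture linking what came before with what came after. They come from the paper "Highway Networks", which borrows the idea of gates from LSTM-like models (introduced later). The structure is very general (a structure that is too general is not necessarily a good thing) and offers one way to build deeper networks:

$$y = H(x, W_H) \cdot T(x, W_T) + x \cdot C(x, W_C)$$

Any single layer or group of layers can be turned into a block in this way. In the formula, $T$ is called the transform gate and $C$ the carry gate; for simplicity one can set $C = 1 - T$. Clearly $x$, $y$, $H(x, W_H)$ and $T(x, W_T)$ must have the same dimension (which can be arranged, for example, by zero-padding or by a projection). With this structure a network can be made very deep (for example, more than 100 layers) without optimization becoming much harder, so it seems to offer a solution to the problem of training "deep" networks (the next section explains the word "seems").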
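A highway layer with the carry gate tied to 1 - T can be sketched in plain Python as follows. The tanh transform, the helper names and the toy weights are assumptions made for illustration; the gate bias values of -50 and +50 are chosen only to force the gate fully closed or fully open.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(x, weights, bias):
    """Compute W x + b for a matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def highway_layer(x, w_h, b_h, w_t, b_t):
    """y = H(x) * T(x) + x * (1 - T(x)), element-wise."""
    h = [math.tanh(z) for z in affine(x, w_h, b_h)]  # candidate transform H
    t = [sigmoid(z) for z in affine(x, w_t, b_t)]    # transform gate T
    return [hi * ti + xi * (1.0 - ti) for hi, ti, xi in zip(h, t, x)]


x = [0.5, -1.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]

carried = highway_layer(x, identity, [0.0, 0.0], zeros, [-50.0, -50.0])    # gate closed
transformed = highway_layer(x, identity, [0.0, 0.0], zeros, [50.0, 50.0])  # gate open
print(carried, transformed)
```

With the gate closed the layer passes its input through unchanged, and with the gate open it behaves like an ordinary layer; a trained network settles somewhere in between for each unit.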

5.9 Residual Networks

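Residual networks were first proposed in "Deep Residual Learning for Image Recognition". With them the authors took first place in the ImageNet classification, detection and localization tasks of ILSVRC 2015 and in the detection and segmentation tasks of COCO 2015, which also shows that ResNet is a fairly general framework.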

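5.9.1 Motivation for ResNet


The figure shows that on CIFAR-10 the 20-layer network clearly outperforms the 56-layer network on both the training set and the test set, so this is clearly not caused by overfitting. It also contradicts intuition: a model with one more layer should perform at least as well as the model without it, since the extra layer could simply be an identity mapping. The authors therefore proposed the original residual learning framework (which can also be viewed as the special case of Highway Networks with T = 0.5):

$$y = F(x, \{W_i\}) + x$$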
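The "at least do no harm" argument can be made concrete with a toy residual block. In the sketch below the residual function is a plain Python callable and all names are illustrative; when the residual is zero, 56 stacked blocks reproduce their input exactly, so the deeper model contains the shallower one as a special case.

```python
def residual_block(x, residual_fn):
    """y = F(x) + x: the layers only have to learn the residual F."""
    return [f + xi for f, xi in zip(residual_fn(x), x)]


def zero_residual(x):
    """A block whose weights have collapsed to zero."""
    return [0.0 for _ in x]


def small_residual(x):
    """A hypothetical learned correction, here 10% of the input."""
    return [0.1 * xi for xi in x]


x = [1.0, -2.0, 3.0]

deep = x
for _ in range(56):
    deep = residual_block(deep, zero_residual)

corrected = residual_block(x, small_residual)
print(deep, corrected)
```

A plain stack has to learn the identity through its weights, whereas here it is the default behaviour and the weights only model deviations from it.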

Compared with Highway Networks:
- HN's transform gate and carry

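5.9.2 Identity Mappings

What role does the identity mapping actually play in deep residual networks? The authors analyze this in "Identity Mappings in Deep Residual Networks" and propose a new residual block structure in which both $h(x_l)$ and $f(y_l)$ are identity mappings. This change gives the signal a "clean" path in both forward and backward propagation (the gray part of the figure); (a) is the original block structure and (b) is the new one.



Here BN denotes Batch Normalization.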
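When both mappings are the identity, each block computes x_{l+1} = x_l + F(x_l), and unrolling the recursion gives x_L = x_l plus the sum of all residuals in between. The sketch below verifies this on scalars; the three residual functions are arbitrary illustrative choices, not anything from the paper.

```python
def forward(x0, residual_fns):
    """Run x_{l+1} = x_l + F_l(x_l) and record every intermediate state."""
    states = [x0]
    for fn in residual_fns:
        states.append(states[-1] + fn(states[-1]))
    return states


# Hypothetical residual functions standing in for the learned layers.
residual_fns = [
    lambda x: 0.1 * x,
    lambda x: -0.5,
    lambda x: 0.01 * x * x,
]

states = forward(2.0, residual_fns)
residual_sum = sum(fn(x) for fn, x in zip(residual_fns, states))
print(states[-1], states[0] + residual_sum)
```

Because the input appears as a separate additive term, its gradient reaches any shallower layer directly, without passing through the weight layers in between.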







5.9.3 Residual Networks from the Model Ensemble Perspective

"Residual Networks Behave Like Ensembles of Relatively Shallow Networks" unrolls a residual network and arrives at the following relationship:
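The unrolling can be illustrated with a deliberately simplified case. Assuming each residual function is linear, f_i(x) = w_i * x (an assumption made here so that path contributions can be added up exactly), a stack of n blocks equals the sum over all 2^n paths that either skip or enter each block. The weights below are arbitrary.

```python
from itertools import combinations

weights = [0.5, -0.2, 0.3]  # hypothetical linear residual functions f_i(x) = w_i * x
x = 1.5

# Sequential evaluation: y_i = y_{i-1} + f_i(y_{i-1})
sequential = x
for w in weights:
    sequential = sequential + w * sequential

# Unrolled evaluation: one term per subset of blocks that the signal passes through.
paths = []
for length in range(len(weights) + 1):
    for subset in combinations(range(len(weights)), length):
        contribution = x
        for i in subset:
            contribution *= weights[i]
        paths.append((length, contribution))

unrolled = sum(contribution for _, contribution in paths)
length_counts = [sum(1 for length, _ in paths if length == k)
                 for k in range(len(weights) + 1)]
print(sequential, unrolled, length_counts)
```

The path lengths follow a binomial distribution, so most paths pass through only a moderate number of blocks.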





5.9.4 Short Paths in Residual Networks





5.9.5 Code Practice

Below we implement the ResNet-34 described in "Deep Residual Learning for Image Recognition" and demonstrate training on CIFAR-10.

# -*- coding: utf-8 -*-
# resnet.py (Keras 2.x API, channels-last tensors as used by TensorFlow)
from keras import backend as K
from keras.layers.merge import add
from keras.layers import Input, Activation, Dense, Flatten
from keras.layers.convolutional import Conv2D, MaxPooling2D, AveragePooling2D
from keras.layers.normalization import BatchNormalization
from keras.regularizers import l1_l2
from keras.models import Model


class ResNet(object):
    '''Basic building blocks of a residual network.'''
    name = 'resnet'

    def __init__(self, n):
        self.name = n

    def bn_relu(self, x):
        '''BN -> ReLU sub-structure of the proposed residual block (channel axis 3, for TensorFlow).'''
        normalize = BatchNormalization(axis=3)(x)
        return Activation("relu")(normalize)

    def bn_relu_weight(self, filters, kernel_size, strides):
        '''BN -> ReLU -> weight sub-structure of the proposed residual block.'''
        def inner_func(x):
            act = self.bn_relu(x)
            conv = Conv2D(filters=filters,
                          kernel_size=kernel_size,
                          strides=strides,
                          padding='same',
                          kernel_initializer='he_normal',
                          kernel_regularizer=l1_l2(0.0001))(act)
            return conv
        return inner_func

    def weight_bn_relu(self, filters, kernel_size, strides):
        '''Weight -> BN -> ReLU sub-structure, used for the first convolution layer.'''
        def inner_func(x):
            return self.bn_relu(Conv2D(filters=filters,
                                       kernel_size=kernel_size,
                                       strides=strides,
                                       padding='same',
                                       kernel_initializer='he_normal',
                                       kernel_regularizer=l1_l2(0.0001))(x))
        return inner_func

    def shortcut(self, left, right):
        '''Shortcut connection of the proposed residual block.

        Two cases: the input and output shapes match, or they differ.
        '''
        left_shape = K.int_shape(left)
        right_shape = K.int_shape(right)
        stride_h = int(round(left_shape[1] / right_shape[1]))
        stride_w = int(round(left_shape[2] / right_shape[2]))
        x_l = left
        # If the shapes differ, a 1x1 convolution projects the input to the output
        # shape; otherwise the input is passed through unchanged. The projection
        # occurs between two blocks of different dimensions (dashed lines in the paper).
        if left_shape != right_shape:
            x_l = Conv2D(filters=right_shape[3],
                         kernel_size=(1, 1),
                         strides=(stride_h, stride_w),
                         padding="valid",
                         kernel_initializer="he_normal",
                         kernel_regularizer=l1_l2(0.01, 0.0001))(left)
        return add([x_l, right])

    def basic_block(self, filters, strides=(1, 1), is_first_block=False):
        """Block used by residual networks of up to 34 layers; each shortcut spans two layers."""
        def inner_func(x):
            if not is_first_block:
                conv1 = self.bn_relu_weight(filters=filters,
                                            kernel_size=(3, 3),
                                            strides=strides)(x)
            else:
                # The very first block follows conv -> BN -> ReLU -> pooling,
                # so it starts directly with a convolution.
                conv1 = Conv2D(filters=filters, kernel_size=(3, 3),
                               strides=strides,
                               padding="same",
                               kernel_initializer="he_normal",
                               kernel_regularizer=l1_l2(0.01, 0.0001))(x)
            # Residual branch
            residual = self.bn_relu_weight(filters=filters,
                                           kernel_size=(3, 3), strides=(1, 1))(conv1)
            # Combine into a two-layer residual block
            return self.shortcut(x, residual)
        return inner_func

    def residual_block(self, block_func, filters, repeat_times, is_first_block):
        '''Stack repeat_times residual blocks with the same number of filters.'''
        def inner_func(x):
            for i in range(repeat_times):
                # The first stage takes the pooling layer as input and keeps the
                # resolution; every later stage downsamples in its first block.
                if i == 0 and not is_first_block:
                    strides = (2, 2)
                else:
                    strides = (1, 1)
                flag = i == 0 and is_first_block
                x = block_func(filters=filters,
                               strides=strides,
                               is_first_block=flag)(x)
            return x
        return inner_func

    def residual_builder(self, input_shape, softmax_num, func_type, repeat_times):
        '''Build a residual network from the input shape, the number of outputs, the block type and the depth.'''
        inputs = Input(shape=input_shape)
        # Layer 1: convolution
        conv1 = self.weight_bn_relu(filters=64, kernel_size=(7, 7), strides=(2, 2))(inputs)
        # Layer 2: max pooling
        pool1 = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding="same")(conv1)
        residual_block = pool1
        filters = 64
        # Followed by the residual blocks (16 of them for ResNet-34)
        for i, r in enumerate(repeat_times):
            residual_block = self.residual_block(func_type,
                                                 filters=filters,
                                                 repeat_times=r,
                                                 is_first_block=(i == 0))(residual_block)
            filters *= 2
        residual_block = self.bn_relu(residual_block)
        shape = K.int_shape(residual_block)
        # Average pooling layer
        pool2 = AveragePooling2D(pool_size=(shape[1], shape[2]),
                                 strides=(1, 1))(residual_block)
        flatten1 = Flatten()(pool2)
        # Fully connected layer
        dense1 = Dense(units=softmax_num,
                       kernel_initializer="he_normal",
                       activation="softmax")(flatten1)
        return Model(inputs=inputs, outputs=dense1)
# -*- coding: utf-8 -*-
# Training script (Keras 2.x API); expects the class above to be saved as resnet.py.
import os

import numpy as np
import matplotlib
matplotlib.use("Agg")  # render figures to files; no display needed
import matplotlib.pyplot as plt
import resnet
from keras.datasets import cifar10
from keras.utils import np_utils
from keras.utils.vis_utils import plot_model
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, CSVLogger, EarlyStopping
from keras import backend as K
import tensorflow as tf
tf.python.control_flow_ops = tf  # compatibility workaround kept from the original

lr_reducer = ReduceLROnPlateau(monitor='val_loss', factor=np.sqrt(0.5), cooldown=0, patience=3, min_lr=1e-6)
early_stopper = EarlyStopping(monitor='val_acc', min_delta=0.0005, patience=15)
csv_logger = CSVLogger('resnet34_cifar10.csv')


def data_visualize(x, y, num):
    """Save a num x num grid of sample images with their labels to sample.jpg."""
    plt.figure()
    for i in range(num * num):
        axes = plt.subplot(num, num, i + 1)
        axes.set_title("label=" + str(y[i]))
        axes.set_xticks([0, 10, 20, 30])
        axes.set_yticks([0, 10, 20, 30])
        plt.imshow(x[i])
    plt.tight_layout()
    plt.savefig('sample.jpg')


if __name__ == "__main__":
    # Make only GPU 3 visible; this must happen before the session is created.
    os.environ["CUDA_VISIBLE_DEVICES"] = "3"
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=1, allow_growth=True)
    session = tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
                                               log_device_placement=True,
                                               gpu_options=gpu_options))
    K.set_session(session)  # make Keras use the session configured above

    (X_train, y_train), (X_test, y_test) = cifar10.load_data()
    data_visualize(X_train, y_train, 4)

    # Define the input data and normalize it to [0, 1]
    dim = 32
    channel = 3
    class_num = 10
    X_train = X_train.reshape(X_train.shape[0], dim, dim, channel).astype('float32') / 255
    X_test = X_test.reshape(X_test.shape[0], dim, dim, channel).astype('float32') / 255
    Y_train = np_utils.to_categorical(y_train, class_num)
    Y_test = np_utils.to_categorical(y_test, class_num)

    # Preprocessing and real-time data augmentation
    datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=25,  # randomly rotate images in the range (degrees, 0 to 180)
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=True,  # randomly flip images horizontally
        vertical_flip=False)  # randomly flip images vertically
    datagen.fit(X_train)

    s = X_train.shape[1:]
    print(s)
    builder = resnet.ResNet("ResNet-test")
    resnet_34 = builder.residual_builder(s, class_num, builder.basic_block, [3, 4, 6, 3])
    model = resnet_34
    model.summary()
    plot_model(model, to_file="ResNet.jpg", show_shapes=True)
    model.compile(loss='categorical_crossentropy',
                  optimizer='adadelta',
                  metrics=['accuracy'])
    batch_size = 32
    nb_epoch = 100
    # The callback has to be passed to fit_generator(), otherwise no checkpoint is written.
    checkpoint = ModelCheckpoint("weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5",
                                 monitor='val_loss', verbose=0,
                                 save_best_only=False, save_weights_only=False, mode='auto')
    # steps_per_epoch counts batches, not samples.
    model.fit_generator(datagen.flow(X_train, Y_train, batch_size=batch_size),
                        steps_per_epoch=X_train.shape[0] // batch_size,
                        validation_data=(X_test, Y_test),
                        epochs=nb_epoch,
                        verbose=1,
                        max_q_size=100,
                        callbacks=[lr_reducer, early_stopper, csv_logger, checkpoint])
    score = model.evaluate(X_test, Y_test, verbose=0)
    print('Test score:', score[0])
    print('Test accuracy:', score[1])
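The list `[3, 4, 6, 3]` passed to `residual_builder` is what makes this a 34-layer network. A quick way to see it is to count the weighted layers; the helper below is an illustrative sketch that counts the first convolution, two convolutions per basic block, and the final fully connected layer, leaving out the 1x1 shortcut projections.

```python
def resnet_depth(repeat_times, layers_per_block=2):
    """Weighted layers: first convolution + convolutions inside the blocks + final FC layer."""
    return 1 + layers_per_block * sum(repeat_times) + 1


def stage_filters(repeat_times, first=64):
    """Number of filters in each stage; it doubles from one stage to the next."""
    return [first * 2 ** i for i in range(len(repeat_times))]


depth_34 = resnet_depth([3, 4, 6, 3])
depth_18 = resnet_depth([2, 2, 2, 2])
print(depth_34, depth_18, stage_filters([3, 4, 6, 3]))
```

Passing `[2, 2, 2, 2]` to the same builder would give the 18-layer configuration from the paper.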




5.10 Maxout Networks

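Proposed by Goodfellow et al. in the paper "Maxout Networks", which is well worth reading.

5.10.1 The Maxout Activation Function




A single Maxout unit, as shown in the figure above, is essentially a piecewise linear function, and any convex function can be approximated by piecewise linear functions. This is easy to see intuitively with a parabola as the example: each node is a linear function, and the node outputs in the upper figure correspond to the line segments in the lower figure:

Viewed globally, ReLU is a special case of Maxout. Maxout lets the network learn the activation function itself (from this angle Maxout can also be seen as a kind of Network-in-Network structure) and places no restriction on its form; just two Maxout units are enough to approximate any continuous function. The paper gives a more detailed proof, which is not repeated here. In practice Maxout works better together with Dropout. It is worth recalling kernel methods at this point: a kernel method with a nonlinear kernel (such as the Gaussian kernel) also imitates nonlinear behavior through local linear fits, but traditional kernel methods fix the kernel function in advance (for example a Gaussian) rather than deriving it from the data. There is research on combining kernels, but in my view it ends up in the same place as neural networks, and all of it can be thought about within the broader framework of neural networks (recall the earlier discussion of the relationship between SVMs and neural networks).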
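The parabola example can be reproduced numerically. In the sketch below a Maxout unit takes the maximum over several affine pieces; using the tangent lines of x² as pieces is a choice made for illustration (a trained unit would learn its own pieces), and the two-piece ReLU configuration shows the special case mentioned above.

```python
def maxout(x, pieces):
    """Maxout unit: the maximum over affine pieces (w, b) of w * x + b."""
    return max(w * x + b for w, b in pieces)


def tangent_pieces(points):
    """Tangent lines of f(x) = x^2 at the given points: slope 2p, intercept -p^2."""
    return [(2.0 * p, -p * p) for p in points]


relu_pieces = [(0.0, 0.0), (1.0, 0.0)]           # max(0, x)
parabola_pieces = tangent_pieces([-2.0, -1.0, 0.0, 1.0, 2.0])

at_tangent_point = maxout(1.0, parabola_pieces)  # touches the parabola exactly
between_points = maxout(1.5, parabola_pieces)    # slightly below 1.5 ** 2
print(at_tangent_point, between_points, maxout(-3.0, relu_pieces), maxout(2.0, relu_pieces))
```

Adding more pieces narrows the gap between the piecewise linear curve and the parabola.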

5.11 Network in Network

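The idea of NIN comes from "Network In Network". It has two highlights: the traditional convolution layer is replaced with a nonlinear convolution layer to improve feature abstraction, and a new pooling layer replaces the traditional fully connected layers. The later versions of GoogLeNet borrow heavily from this idea.

5.11.1 NIN Convolution Layer (MLP Convolution)


  • An MLP can fit any function and needs no prior assumptions (such as linear separability or convexity);
  • An MLP is naturally compatible with the structure of convolutional neural networks and can be trained conveniently with backpropagation;
  • An MLP can itself be made fairly deep, and its features can be reused;
  • Convolving with an MLP produces a cascaded, cross-channel weighted combination of feature maps, which improves feature abstraction: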
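The cross-channel combination in the last bullet can be sketched as a small MLP applied independently at every pixel, which is what a stack of 1x1 convolutions computes. The sketch covers only that 1x1 part (an mlpconv layer begins with an ordinary spatial convolution), and the function names and toy weights are illustrative assumptions.

```python
def relu(v):
    return max(0.0, v)


def pointwise_layer(pixel, weights, biases):
    """Fully connected layer applied to one pixel's channel vector (a 1x1 convolution)."""
    return [relu(sum(w * c for w, c in zip(row, pixel)) + b)
            for row, b in zip(weights, biases)]


def mlpconv_1x1(image, layers):
    """image[y][x] is a list of channel values; layers is a list of (weights, biases)."""
    output = []
    for row in image:
        out_row = []
        for pixel in row:
            for weights, biases in layers:
                pixel = pointwise_layer(pixel, weights, biases)
            out_row.append(pixel)
        output.append(out_row)
    return output


image = [[[1.0, 2.0], [2.0, 1.0]]]                # 1 x 2 pixels, 2 channels
layers = [
    ([[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0]),      # 2 channels -> 2 channels
    ([[0.5, 0.5]], [0.0]),                        # 2 channels -> 1 channel
]
result = mlpconv_1x1(image, layers)
print(result)
```

The same weights are shared across all positions, so the layer stays convolutional while mixing channels nonlinearly.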



5.11.2 NIN Subsampling Layer (Global Average Pooling)
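In the Network In Network paper, global average pooling replaces the fully connected layers at the top of the network: the last layer emits one feature map per class, and each map is averaged into a single score that is fed to softmax. A minimal sketch with hypothetical 2x2 feature maps for two classes:

```python
def global_average_pooling(feature_maps):
    """feature_maps: one 2-D map (list of rows) per class; returns one score per map."""
    return [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
            for fmap in feature_maps]


def fully_connected_weights(height, width, channels, units):
    """Weights a dense layer would need on the same flattened input (biases ignored)."""
    return height * width * channels * units


maps = [
    [[1.0, 2.0], [3.0, 4.0]],   # feature map for class 0
    [[0.0, 0.0], [0.0, 8.0]],   # feature map for class 1
]
scores = global_average_pooling(maps)
print(scores, fully_connected_weights(2, 2, 2, 2))
```

The pooling step has no parameters at all, which removes a large source of overfitting compared with fully connected layers.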



5.12 GoogLeNet Inception V1

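GoogLeNet was proposed by Christian Szegedy and colleagues at Google in the 2014 paper "Going Deeper with Convolutions". Its main highlight is a structure called Inception, on which GoogLeNet is built; it took first place in that year's ImageNet classification and detection tasks. P.S. The name GoogLeNet is a tribute to Yann LeCun's LeNet series.

5.12.1 Some Thoughts



Awkwardly, today's computer architectures are much better at dense computation and are very inefficient on non-uniformly distributed sparse data; for example, sparsity leads to very high cache miss rates. What is needed is a method that keeps the advantages of sparse networks while preserving computational efficiency. Fortunately, a great deal of earlier experimental work (for example "On Two-Dimensional Sparse Matrix Partitioning: Models, Methods, and a Recipe") found that clustering a sparse matrix into relatively dense submatrices greatly improves the performance of sparse matrix multiplication. Borrowing this idea, the authors proposed the Inception structure.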
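One practical consequence of building the module from dense pieces is that 1x1 convolutions can shrink the channel count before the expensive 3x3 and 5x5 convolutions. The sketch below counts weights for the inception_3a configuration (192 input channels; 64, 96, 128, 16, 32, 32 filters), with and without the 1x1 reductions. The helper name is illustrative and biases are ignored.

```python
def conv_weights(kernel, in_channels, out_channels):
    """Number of weights in a kernel x kernel convolution."""
    return kernel * kernel * in_channels * out_channels


in_channels = 192  # input depth of inception_3a

# Without reductions: 1x1, 3x3 and 5x5 convolutions applied directly to the input.
naive = (conv_weights(1, in_channels, 64)
         + conv_weights(3, in_channels, 128)
         + conv_weights(5, in_channels, 32))

# With 1x1 reductions before the 3x3 and 5x5 branches and after max pooling.
reduced = (conv_weights(1, in_channels, 64)
           + conv_weights(1, in_channels, 96) + conv_weights(3, 96, 128)
           + conv_weights(1, in_channels, 16) + conv_weights(5, 16, 32)
           + conv_weights(1, in_channels, 32))

output_channels = 64 + 128 + 32 + 32
print(naive, reduced, output_channels)
```

The reduced module needs well under half the weights of the direct version while still producing 256 output channels.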


5.12.2 GoogLeNet Structure




5.12.3 Code Practice

# -*- coding: utf-8 -*-
# Keras 2.x API, channels-last tensors as used by TensorFlow.
from keras.layers import Input, Conv2D, Dense, MaxPooling2D, AveragePooling2D
from keras.layers import Dropout, Flatten, ZeroPadding2D, Activation
from keras.layers.merge import concatenate
from keras.models import Model
from keras.regularizers import l1_l2
import googlenet_custom_layers


def conv_relu(name, filters, kernel_size, strides=(1, 1)):
    """Conv2D with the settings shared by every convolution in this network."""
    return Conv2D(name=name,
                  filters=filters,
                  kernel_size=kernel_size,
                  strides=strides,
                  padding='same',
                  kernel_initializer='he_normal',
                  activation='relu',
                  kernel_regularizer=l1_l2(0.0001))


def inception_module(name,
                     input_layer,
                     num_c_1x1,
                     num_c_1x1_3x3_reduce,
                     num_c_3x3,
                     num_c_1x1_5x5_reduce,
                     num_p_5x5,
                     num_c_1x1_reduce):
    """Four parallel branches concatenated along the channel axis.

    num_p_5x5 is the number of 5x5 filters; num_c_1x1_reduce is the number of
    1x1 filters applied after the max pooling branch.
    """
    inception_1x1 = conv_relu(name + "/inception_1x1", num_c_1x1, (1, 1))(input_layer)
    inception_3x3_reduce = conv_relu(name + "/inception_3x3_reduce",
                                     num_c_1x1_3x3_reduce, (1, 1))(input_layer)
    inception_3x3 = conv_relu(name + "/inception_3x3",
                              num_c_3x3, (3, 3))(inception_3x3_reduce)
    inception_5x5_reduce = conv_relu(name + "/inception_5x5_reduce",
                                     num_c_1x1_5x5_reduce, (1, 1))(input_layer)
    inception_5x5 = conv_relu(name + "/inception_5x5",
                              num_p_5x5, (5, 5))(inception_5x5_reduce)
    inception_max_pool = MaxPooling2D(name=name + "/inception_max_pool",
                                      pool_size=(3, 3),
                                      strides=(1, 1),
                                      padding="same")(input_layer)
    inception_max_pool_proj = conv_relu(name + "/inception_max_pool_project",
                                        num_c_1x1_reduce, (1, 1))(inception_max_pool)
    print(inception_1x1.get_shape(), inception_3x3.get_shape(),
          inception_5x5.get_shape(), inception_max_pool_proj.get_shape())
    # Note: TensorFlow changed the argument order of tf.concat between versions,
    # so check that your TensorFlow and Keras versions match. If they do not,
    # line 1554 of /usr/lib/python×××/site-packages/keras/backend/tensorflow_backend.py
    # may need to be changed from
    #   return tf.concat([to_dense(x) for x in tensors], axis)
    # to
    #   return tf.concat(axis, [to_dense(x) for x in tensors])
    return concatenate([inception_1x1, inception_3x3, inception_5x5, inception_max_pool_proj])


def auxiliary_classifier(name, input_layer, output_num):
    """Auxiliary softmax branch (loss1 / loss2) attached to an intermediate Inception output."""
    ave_pool = AveragePooling2D(name=name + "/ave_pool",
                                pool_size=(5, 5),
                                strides=(3, 3))(input_layer)
    conv = conv_relu(name + "/conv", 128, (1, 1))(ave_pool)
    flat = Flatten()(conv)
    fc = Dense(1024,
               activation='relu',
               name=name + "/fc",
               kernel_regularizer=l1_l2(0.0001))(flat)
    drop_fc = Dropout(0.7)(fc)
    classifier = Dense(output_num,
                       name=name + "/classifier",
                       kernel_regularizer=l1_l2(0.0001))(drop_fc)
    return Activation('softmax')(classifier)


def googLeNet_inception_v1_building(input_shape, output_num, fine_tune=None):
    input_layer = Input(shape=input_shape)
    # Layer 1: convolution
    conv1_7x7 = conv_relu("conv1_7x7/2", 64, (7, 7), strides=(2, 2))(input_layer)
    conv1_zero_pad = ZeroPadding2D(padding=(1, 1))(conv1_7x7)
    # Layer 2: max pooling
    pool1_3x3 = MaxPooling2D(name="max_pool1_3x3/2",
                             pool_size=(3, 3),
                             strides=(2, 2),
                             padding='valid')(conv1_zero_pad)
    # Layer 3: LRN normalization
    pool1_norm1 = googlenet_custom_layers.LRN2D(name='max_pool1_3x3/norm1')(pool1_3x3)
    # Layer 4: 1x1 convolution for dimension reduction
    conv2_3x3_reduce = conv_relu("conv2_3x3_reduce/1", 64, (1, 1))(pool1_norm1)
    # Layer 5: convolution
    conv2_3x3 = conv_relu("conv2_3x3/1", 192, (3, 3))(conv2_3x3_reduce)
    # Layer 6: LRN normalization
    conv2_norm2 = googlenet_custom_layers.LRN2D(name='conv2_3x3/norm2')(conv2_3x3)
    conv2_zero_pad = ZeroPadding2D(padding=(1, 1))(conv2_norm2)
    # Layer 7: max pooling
    pool2_3x3 = MaxPooling2D(name="max_pool2_3x3",
                             pool_size=(3, 3),
                             strides=(2, 2),
                             padding='valid')(conv2_zero_pad)
    # Layer 8: inception 3a
    inception_3a = inception_module("inception_3a", pool2_3x3, 64, 96, 128, 16, 32, 32)
    # Layer 9: inception 3b
    inception_3b = inception_module("inception_3b", inception_3a, 128, 128, 192, 32, 96, 64)
    inception_3b_zero_pad = ZeroPadding2D(padding=(1, 1))(inception_3b)
    # Layer 10: max pooling
    pool3_3x3 = MaxPooling2D(name="max_pool3_3x3/2",
                             pool_size=(3, 3),
                             strides=(2, 2),
                             padding='valid')(inception_3b_zero_pad)
    # Layer 11: inception 4a
    inception_4a = inception_module("inception_4a", pool3_3x3, 192, 96, 208, 16, 48, 64)
    # Auxiliary branch loss1
    loss1_classifier_act = auxiliary_classifier("loss1", inception_4a, output_num)
    # Layer 12: inception 4b
    inception_4b = inception_module("inception_4b", inception_4a, 160, 112, 224, 24, 64, 64)
    # Layer 13: inception 4c
    inception_4c = inception_module("inception_4c", inception_4b, 128, 128, 256, 24, 64, 64)
    # Layer 14: inception 4d
    inception_4d = inception_module("inception_4d", inception_4c, 112, 144, 288, 32, 64, 64)
    # Auxiliary branch loss2
    loss2_classifier_act = auxiliary_classifier("loss2", inception_4d, output_num)
    # Layer 15: inception 4e
    inception_4e = inception_module("inception_4e", inception_4d, 256, 160, 320, 32, 128, 128)
    inception_4e_zero_pad = ZeroPadding2D(padding=(1, 1))(inception_4e)
    # Layer 16: max pooling
    pool4_3x3 = MaxPooling2D(name="max_pool4_3x3",
                             pool_size=(3, 3),
                             strides=(2, 2),
                             padding='valid')(inception_4e_zero_pad)
    # Layer 17: inception 5a
    inception_5a = inception_module("inception_5a", pool4_3x3, 256, 160, 320, 32, 128, 128)
    # Layer 18: inception 5b
    inception_5b = inception_module("inception_5b", inception_5a, 384, 192, 384, 48, 128, 128)
    # Layer 19: average pooling
    pool5_7x7 = AveragePooling2D(name="ave_pool5_7x7",
                                 pool_size=(7, 7),
                                 strides=(1, 1))(inception_5b)
    loss3_flat = Flatten()(pool5_7x7)
    pool5_drop_7x7 = Dropout(0.4)(loss3_flat)
    # Layer 20: fully connected layer
    loss3_classifier = Dense(output_num,
                             name="loss3/classifier",
                             kernel_regularizer=l1_l2(0.0001))(pool5_drop_7x7)
    loss3_classifier_act = Activation('softmax')(loss3_classifier)
    googlenet_inception_v1 = Model(name="googlenet_inception_v1",
                                   inputs=input_layer,
                                   outputs=[loss1_classifier_act,
                                            loss2_classifier_act,
                                            loss3_classifier_act])
    if fine_tune:
        googlenet_inception_v1.load_weights(fine_tune)
    return googlenet_inception_v1

from keras.layers.core import Layer
import keras.backend as K


class LRN2D(Layer):
    """
    Local response normalization across channels.

    This code is adapted from pylearn2.
    License at: https://github.com/lisa-lab/pylearn2/blob/master/LICENSE.txt
    """

    def __init__(self, alpha=1e-4, k=2, beta=0.75, n=5, **kwargs):
        if n % 2 == 0:
            raise NotImplementedError("LRN2D only works with odd n. n provided: " + str(n))
        super(LRN2D, self).__init__(**kwargs)
        self.alpha = alpha
        self.k = k
        self.beta = beta
        self.n = n

    # The original used get_output/get_input from the pre-1.0 Keras layer API;
    # call() is the equivalent entry point and matches PoolHelper below.
    def call(self, x, mask=None):
        # Assumes channels-first input (batch, channels, rows, cols) and a
        # backend whose symbolic shape can be unpacked (e.g. Theano); the shape
        # handling may need adapting on other backends.
        b, ch, r, c = K.shape(x)
        half_n = self.n // 2
        input_sqr = K.square(x)
        # Zero-pad half_n channels on each side so every channel has n neighbours.
        extra_channels = K.zeros((b, ch + 2 * half_n, r, c))
        input_sqr = K.concatenate([extra_channels[:, :half_n, :, :],
                                   input_sqr,
                                   extra_channels[:, half_n + ch:, :, :]],
                                  axis=1)
        # Sum the squared activations over a window of n adjacent channels.
        scale = self.k
        for i in range(self.n):
            scale += self.alpha * input_sqr[:, i:i + ch, :, :]
        scale = scale ** self.beta
        return x / scale

    def get_config(self):
        config = {"name": self.__class__.__name__,
                  "alpha": self.alpha,
                  "k": self.k,
                  "beta": self.beta,
                  "n": self.n}
        base_config = super(LRN2D, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))


class PoolHelper(Layer):
    """Drops the first index along axes 2 and 3 (rows and columns for channels-first tensors)."""

    def __init__(self, **kwargs):
        super(PoolHelper, self).__init__(**kwargs)

    def call(self, x, mask=None):
        return x[:, :, 1:, 1:]

    def get_config(self):
        config = {}
        base_config = super(PoolHelper, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

# -*- coding: utf-8 -*-
import os
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from keras.datasets import cifar10
from keras.utils import np_utils
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ModelCheckpoint
from keras import backend as K
import tensorflow as tf
# Legacy workaround for a Keras/TensorFlow version mismatch.
tf.python.control_flow_ops = tf
from keras.callbacks import ReduceLROnPlateau, CSVLogger, EarlyStopping
lr_reducer = ReduceLROnPlateau(monitor='val_loss', factor=np.sqrt(0.5), cooldown=0, patience=3, min_lr=1e-6)
early_stopper = EarlyStopping(monitor='val_acc', min_delta=0.0005, patience=15)
csv_logger = CSVLogger('resnet34_cifar10.csv')
import googlenet_inception_v1

if __name__ == "__main__":
    from keras.utils.vis_utils import plot_model
    with tf.device('/gpu:4'):
        gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=1, allow_growth=True)
        os.environ["CUDA_VISIBLE_DEVICES"] = "4"
        # Register the session with Keras; a bare tf.Session(...) would be discarded.
        K.set_session(tf.Session(config=tf.ConfigProto(
            allow_soft_placement=True,
            log_device_placement=True,
            gpu_options=gpu_options)))
        (X_train, y_train), (X_test, y_test) = cifar10.load_data()
        # Define the input data and normalize it to [0, 1]
        dim = 32
        channel = 3
        class_num = 10
        X_train = X_train.reshape(X_train.shape[0], dim, dim, channel).astype('float32') / 255
        X_test = X_test.reshape(X_test.shape[0], dim, dim, channel).astype('float32') / 255
        Y_train = np_utils.to_categorical(y_train, class_num)
        Y_test = np_utils.to_categorical(y_test, class_num)
        # this will do preprocessing and realtime data augmentation
        datagen = ImageDataGenerator(
            featurewise_center=False,  # set input mean to 0 over the dataset
            samplewise_center=False,  # set each sample mean to 0
            featurewise_std_normalization=False,  # divide inputs by std of the dataset
            samplewise_std_normalization=False,  # divide each input by its std
            zca_whitening=False,  # apply ZCA whitening
            rotation_range=25,  # randomly rotate images in the range (degrees, 0 to 180)
            width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
            height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
            horizontal_flip=True,  # randomly flip images horizontally
            vertical_flip=False)  # randomly flip images vertically
        datagen.fit(X_train)
        s = X_train.shape[1:]
        print(s)
        model = googlenet_inception_v1.googLeNet_inception_v1_building(s, class_num)
        model.summary()
        plot_model(model, to_file="GoogLeNet-Inception-V1.jpg", show_shapes=True)
        model.compile(loss='categorical_crossentropy',
                      optimizer='adadelta',
                      metrics=['accuracy'])
        batch_size = 64
        nb_epoch = 100
        # Kept from the original; the manual train_on_batch loop below does not run callbacks.
        checkpointer = ModelCheckpoint(
            "weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5", monitor='val_loss', verbose=0,
            save_best_only=False, save_weights_only=False, mode='auto')
        for e in range(nb_epoch):
            batches = 0
            for X_batch, Y_batch in datagen.flow(X_train, Y_train, batch_size=batch_size):
                # The model has three outputs, so the labels are passed three times.
                loss = model.train_on_batch(X_batch, [Y_batch, Y_batch, Y_batch])
                print(loss)
                batches += 1
                if batches >= len(X_train) / batch_size:
                    # we need to break the loop by hand because
                    # the generator loops indefinitely
                    break
        # evaluate() also needs one label array per output.
        score = model.evaluate(X_test, [Y_test, Y_test, Y_test], verbose=0)
        for name, value in zip(model.metrics_names, score):
            print('Test %s: %s' % (name, value))



5.13 GoogLeNet Inception V2

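GoogLeNet Inception V2 appeared in "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". Its main highlight is the Batch Normalization (BN) method, which serves the following purposes:

5.13.1 Some Reflections

In machine learning we usually assume that the training samples are independent and identically distributed (i.i.d.) and that the training and test samples follow the same distribution. When real data satisfy this assumption, a model tends to perform well; when they do not, performance tends to degrade. A mismatch between the training and test input distributions is known as Covariate Shift, and from the (external) perspective of the samples it affects neural networks just as it affects any other model. From the (internal) perspective of the network structure, a neural network is made of many layers, and samples are propagated forward while each layer extracts features. If the input distribution differs from layer to layer and keeps shifting, the model either performs poorly or learns slowly. This is known as Internal Covariate Shift.

5.13.2 How BN Works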
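
As an illustration of the BN transform from the paper named above, here is a minimal NumPy sketch of the training-mode forward pass for a fully connected layer. It normalizes each feature with the mini-batch mean and variance, then applies a learned scale `gamma` and shift `beta`. The function name `batch_norm_forward` and the value of `eps` are assumptions made for this sketch, and the running statistics used at inference time are left out.

```python
import numpy as np


def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode BN for a mini-batch x of shape (batch, features)."""
    mu = x.mean(axis=0)                     # per-feature mini-batch mean
    var = x.var(axis=0)                     # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # normalize to zero mean, unit variance
    return gamma * x_hat + beta             # learned scale and shift


rng = np.random.RandomState(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))   # inputs far from zero mean
y = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
y_shifted = batch_norm_forward(x, gamma=2.0 * np.ones(4), beta=3.0 * np.ones(4))
print(y.mean(axis=0), y.std(axis=0))
```

With `gamma = 1` and `beta = 0` every feature of the output has roughly zero mean and unit standard deviation, whatever the scale of the input. Other values of `gamma` and `beta` let the layer restore a different scale and offset if that helps training.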









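5.13.3 BN in Convolutional Neural Networks

Convolutional networks share weights, so each feature map has only one pair of parameters to learn: the scale γ and the shift β.

5.13.4 Code Practice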
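
Before the Keras listing, here is a small NumPy sketch of how BN is shared across a feature map in a convolutional layer. The mean and variance are computed over the batch and both spatial axes, so each channel needs only one scale and one shift. The function name `batch_norm_conv`, the channels-last layout and the value of `eps` are assumptions made for this sketch.

```python
import numpy as np


def batch_norm_conv(x, gamma, beta, eps=1e-5):
    """Training-mode BN for x of shape (batch, height, width, channels).

    gamma and beta have shape (channels,): one pair per feature map.
    """
    mu = x.mean(axis=(0, 1, 2), keepdims=True)    # one mean per channel
    var = x.var(axis=(0, 1, 2), keepdims=True)    # one variance per channel
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta                   # broadcasts over the channel axis


x = np.random.RandomState(0).normal(2.0, 4.0, size=(8, 16, 16, 3))
gamma = np.array([1.0, 2.0, 0.5])
beta = np.array([0.0, 1.0, -1.0])
y = batch_norm_conv(x, gamma, beta)
num_bn_params = gamma.size + beta.size   # 2 parameters per feature map
print(y.shape, num_bn_params)
```

For three feature maps the layer learns six BN parameters in total, regardless of the spatial size of the maps.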

import copy
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from matplotlib.pyplot import plot, savefig
from keras.datasets import mnist, cifar10
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten, Reshape
from keras.optimizers import SGD, RMSprop
from keras.utils import np_utils
from keras.regularizers import l2
from keras.layers.convolutional import Conv2D, MaxPooling2D, ZeroPadding2D, AveragePooling2D
from keras.callbacks import EarlyStopping
from keras.preprocessing.image import ImageDataGenerator
from keras.layers.normalization import BatchNormalization
import tensorflow as tf
# Legacy workaround for a Keras/TensorFlow version mismatch.
tf.python.control_flow_ops = tf
from PIL import Image


def build_LeNet5():
    # The "Note N" lines mark alternative positions for a BN (or Dense) layer;
    # uncomment one to compare its effect.
    model = Sequential()
    model.add(Conv2D(96, (11, 11), padding='same', input_shape=(32, 32, 3),
                     data_format='channels_last'))
    # Note 1: model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # Note 2: model.add(BatchNormalization())
    model.add(Activation("tanh"))
    model.add(Conv2D(120, (1, 1), padding='valid'))
    # Note 3: model.add(BatchNormalization())
    model.add(Flatten())
    model.add(Dense(10))
    model.add(BatchNormalization())
    model.add(Activation("relu"))
    # Note 4: model.add(Dense(10))
    model.add(Activation('softmax'))
    return model


if __name__ == "__main__":
    from keras.utils.vis_utils import plot_model
    model = build_LeNet5()
    model.summary()
    plot_model(model, to_file="LeNet-5.png", show_shapes=True)
    (X_train, y_train), (X_test, y_test) = cifar10.load_data()  # or mnist.load_data()
    X_train = X_train.reshape(X_train.shape[0], 32, 32, 3).astype('float32') / 255
    X_test = X_test.reshape(X_test.shape[0], 32, 32, 3).astype('float32') / 255
    Y_train = np_utils.to_categorical(y_train, 10)
    Y_test = np_utils.to_categorical(y_test, 10)
    # this will do preprocessing and realtime data augmentation
    # (the generator is fitted here, but model.fit below trains on the raw arrays)
    datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=25,  # randomly rotate images in the range (degrees, 0 to 180)
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=False,  # randomly flip images horizontally
        vertical_flip=False)  # randomly flip images vertically
    datagen.fit(X_train)
    # training
    model.compile(loss='categorical_crossentropy',
                  optimizer='adadelta',
                  metrics=['accuracy'])
    batch_size = 32
    nb_epoch = 8
    model.fit(X_train, Y_train, batch_size=batch_size, epochs=nb_epoch,
              verbose=1, validation_data=(X_test, Y_test))
    score = model.evaluate(X_test, Y_test, verbose=0)
    print('Test score:', score[0])
    print('Test accuracy:', score[1])


5.14 GoogLeNet Inception V3

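GoogLeNet Inception V3 was proposed in "Rethinking the Inception Architecture for Computer Vision" (note that the paper itself calls this architecture v2; here we follow the naming used in the final v4 paper). The highlights of the paper are:

5.14.1 Design Principles for Network Architecture


5.14.2 Label Smoothing

For multi-class classification the labels are usually one-hot, e.g. [0,0,0,1]. With a loss such as cross-entropy, the model learns to assign an over-confident probability to the ground-truth label, and the logit of the ground-truth label ends up far larger than those of the other labels. This leads to overfitting and reduces generalization. One remedy is to add a regularization term: the label is adjusted with a probability distribution so that it becomes "soft", e.g. [0.1,0.2,0.1,0.6]. In the paper's experiments this reduced both the top-1 and the top-5 error by about 0.2%.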
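
The soft labels described above can be sketched as follows. This uses the uniform-prior form of label smoothing from the paper, where each one-hot target is mixed with a uniform distribution over the classes. The function name `smooth_labels` and the default `epsilon=0.1` are choices made for this illustration.

```python
import numpy as np


def smooth_labels(one_hot, epsilon=0.1):
    """Mix one-hot targets with a uniform distribution over the classes."""
    num_classes = one_hot.shape[-1]
    return (1.0 - epsilon) * one_hot + epsilon / num_classes


y = np.array([[0.0, 0.0, 0.0, 1.0]])
y_soft = smooth_labels(y, epsilon=0.1)
print(y_soft)   # [[0.025 0.025 0.025 0.925]]
```

Each row still sums to 1, so the smoothed targets can be passed to a categorical cross-entropy loss in place of the one-hot labels.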

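5.14.3 Network Architecture

5.14.4 Code Practice

To make the model runnable on a single machine, the number of feature maps is reduced, and the stride applied to the input is adjusted to fit the CIFAR-10 input size. The code is as follows.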
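
To see why the stride has to change for CIFAR-10, the following sketch traces the spatial size of a 32x32 image through the stem layers of the listing (`before_inception`). It uses the usual output-size rules for 'valid' and 'same' padding. The helper names `out_size` and `stem_sizes` are hypothetical, and the layer list is transcribed by hand from the listing.

```python
def out_size(size, kernel, stride=1, padding="valid"):
    """Spatial output size of a convolution or pooling layer."""
    if padding == "same":
        return -(-size // stride)            # ceil(size / stride)
    return (size - kernel) // stride + 1     # 'valid' padding


def stem_sizes(size, stride):
    """Sizes after each stem layer; `stride` is the listing's `strides` value."""
    # (kernel, stride, padding) for conv1, conv2, conv3, pool1, conv4, conv5, conv6
    layers = [(3, stride, "valid"), (3, 1, "valid"), (3, 1, "same"),
              (3, stride, "valid"), (3, 1, "valid"), (3, stride, "valid"),
              (3, 1, "valid")]
    sizes = []
    for kernel, s, padding in layers:
        size = out_size(size, kernel, s, padding)
        sizes.append(size)
    return sizes


print(stem_sizes(32, stride=1))   # [30, 28, 28, 26, 24, 22, 20]
print(stem_sizes(32, stride=2))   # [15, 13, 13, 6, 4, 1, -1]
```

With stride 1 (`small_mode=True` in the listing) the stem ends at 20x20. With stride 2 the feature map shrinks to 1x1 before the last 3x3 'valid' convolution, which gives an invalid size.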

# -*- coding: utf-8 -*-
import numpy as np
from keras.layers import Input, Dropout, Dense, Lambda, Flatten, Activation
from keras.layers.convolutional import MaxPooling2D, Conv2D, AveragePooling2D
from keras.layers.normalization import BatchNormalization
from keras.layers.merge import concatenate, add
from keras.regularizers import l1_l2
from keras.models import Model
from keras.callbacks import CSVLogger, ReduceLROnPlateau, ModelCheckpoint, EarlyStopping
lr_reducer = ReduceLROnPlateau(monitor='val_loss', factor=np.sqrt(0.5), cooldown=0, patience=3, min_lr=1e-6)
early_stopper = EarlyStopping(monitor='val_acc', min_delta=0.0005, patience=15)
csv_logger = CSVLogger('resnet34_cifar10.csv')
from keras.utils.vis_utils import plot_model
import os
from keras.preprocessing.image import ImageDataGenerator
from keras.utils import np_utils
from keras.datasets import cifar10
from keras import backend as K
import tensorflow as tf
# Legacy workaround for a Keras/TensorFlow version mismatch.
tf.python.control_flow_ops = tf
import warnings
warnings.filterwarnings('ignore')

# Every filter count is divided by this factor so the model fits on one machine.
filter_control = 8


def bn_relu(input):
    """Helper to build a BN -> relu block
    """
    norm = BatchNormalization()(input)
    return Activation("relu")(norm)


def before_inception(input_shape, small_mode=False):
    # Stem of the network; input_shape is the input tensor.
    input_layer = input_shape
    if small_mode:
        strides = (1, 1)
    else:
        strides = (2, 2)
    before_conv1_3x3 = Conv2D(
        name="before_conv1_3x3/2", filters=32 // filter_control,
        kernel_size=(3, 3), strides=strides,
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(input_layer)
    before_conv2_3x3 = Conv2D(
        name="before_conv2_3x3/1", filters=32 // filter_control,
        kernel_size=(3, 3), strides=(1, 1),
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(before_conv1_3x3)
    before_conv3_3x3 = Conv2D(
        name="before_conv3_3x3/1", filters=64 // filter_control,
        kernel_size=(3, 3), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(before_conv2_3x3)
    before_pool1_3x3 = MaxPooling2D(
        name="before_pool1_3x3/2", pool_size=(3, 3), strides=strides,
        padding='valid')(before_conv3_3x3)
    before_conv4_3x3 = Conv2D(
        name="before_conv4_3x3/1", filters=80 // filter_control,
        kernel_size=(3, 3), strides=(1, 1), padding='valid',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(before_pool1_3x3)
    before_conv5_3x3 = Conv2D(
        name="before_conv5_3x3/2", filters=192 // filter_control,
        kernel_size=(3, 3), strides=strides, padding='valid',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(before_conv4_3x3)
    before_conv6_3x3 = Conv2D(
        name="before_conv6_3x3/1", filters=288 // filter_control,
        kernel_size=(3, 3), strides=(1, 1), padding='valid',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(before_conv5_3x3)
    return before_conv6_3x3


def inception_A(i, input_shape):
    input_layer = input_shape
    # With small_mode=True and a 32x32 input, the stem output is
    # (20, 20, 288 // filter_control).
    # Branch 1: 1x1 -> 3x3 -> 3x3
    inception_A_conv1_1x1 = Conv2D(
        name="inception_A_conv1_1x1/1" + i, filters=64 // filter_control,
        kernel_size=(1, 1), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(input_layer)
    inception_A_conv2_3x3 = Conv2D(
        name="inception_A_conv2_3x3/1" + i, filters=96 // filter_control,
        kernel_size=(3, 3), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(inception_A_conv1_1x1)
    inception_A_conv3_3x3 = Conv2D(
        name="inception_A_conv3_3x3/1" + i, filters=96 // filter_control,
        kernel_size=(3, 3), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(inception_A_conv2_3x3)
    # Branch 2: 1x1 -> 3x3
    inception_A_conv4_1x1 = Conv2D(
        name="inception_A_conv4_1x1/1" + i, filters=48 // filter_control,
        kernel_size=(1, 1), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(input_layer)
    inception_A_conv5_3x3 = Conv2D(
        name="inception_A_conv5_3x3/1" + i, filters=64 // filter_control,
        kernel_size=(3, 3), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(inception_A_conv4_1x1)
    # Branch 3: average pooling -> 1x1
    inception_A_pool1_3x3 = AveragePooling2D(
        name="inception_A_pool1_3x3/1" + i,
        pool_size=(3, 3), strides=(1, 1), padding='same')(input_layer)
    inception_A_conv6_1x1 = Conv2D(
        name="inception_A_conv6_1x1/1" + i, filters=64 // filter_control,
        kernel_size=(1, 1), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(inception_A_pool1_3x3)
    # Branch 4: 1x1
    inception_A_conv7_1x1 = Conv2D(
        name="inception_A_conv7_1x1/1" + i, filters=64 // filter_control,
        kernel_size=(1, 1), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(input_layer)
    inception_A_merge1 = concatenate(
        [inception_A_conv3_3x3, inception_A_conv5_3x3, inception_A_conv6_1x1, inception_A_conv7_1x1])
    return bn_relu(inception_A_merge1)


def inception_B(i, input_shape):
    input_layer = input_shape
    # Branch 1: 1x1 -> 1x7 -> 7x1 -> 1x7 -> 7x1
    inception_B_conv1_1x1 = Conv2D(
        name="inception_B_conv1_1x1/1" + i, filters=128 // filter_control,
        kernel_size=(1, 1), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(input_layer)
    # The original reused the name "inception_A_conv2_3x3/1" here, which
    # duplicates a layer name from inception_A.
    inception_B_conv2_1x7 = Conv2D(
        name="inception_B_conv2_1x7/1" + i, filters=128 // filter_control,
        kernel_size=(1, 7), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(inception_B_conv1_1x1)
    inception_B_conv3_7x1 = Conv2D(
        name="inception_B_conv3_7x1/1" + i, filters=128 // filter_control,
        kernel_size=(7, 1), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(inception_B_conv2_1x7)
    inception_B_conv4_1x7 = Conv2D(
        name="inception_B_conv4_1x7/1" + i, filters=128 // filter_control,
        kernel_size=(1, 7), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(inception_B_conv3_7x1)
    inception_B_conv5_7x1 = Conv2D(
        name="inception_B_conv5_7x1/1" + i, filters=192 // filter_control,
        kernel_size=(7, 1), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(inception_B_conv4_1x7)
    # Branch 2: 1x1 -> 1x7 -> 7x1
    inception_B_conv6_1x1 = Conv2D(
        name="inception_B_conv6_1x1/1" + i, filters=128 // filter_control,
        kernel_size=(1, 1), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(input_layer)
    inception_B_conv7_1x7 = Conv2D(
        name="inception_B_conv7_1x7/1" + i, filters=128 // filter_control,
        kernel_size=(1, 7), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(inception_B_conv6_1x1)
    inception_B_conv8_7x1 = Conv2D(
        name="inception_B_conv8_7x1/1" + i, filters=192 // filter_control,
        kernel_size=(7, 1), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(inception_B_conv7_1x7)
    # Branch 3: average pooling -> 1x1
    inception_B_pool1_3x3 = AveragePooling2D(
        name="inception_B_pool1_3x3/1" + i,
        pool_size=(3, 3), strides=(1, 1), padding='same')(input_layer)
    inception_B_conv9_1x1 = Conv2D(
        name="inception_B_conv9_1x1/1" + i, filters=192 // filter_control,
        kernel_size=(1, 1), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(inception_B_pool1_3x3)
    # Branch 4: 1x1
    inception_B_conv10_1x1 = Conv2D(
        name="inception_B_conv10_1x1/1" + i, filters=192 // filter_control,
        kernel_size=(1, 1), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(input_layer)
    inception_B_merge1 = concatenate(
        [inception_B_conv5_7x1, inception_B_conv8_7x1, inception_B_conv9_1x1, inception_B_conv10_1x1])
    return bn_relu(inception_B_merge1)


def inception_C(i, input_shape):
    input_layer = input_shape
    # Branch 1: 1x1 -> 3x3 -> parallel 1x3 and 3x1
    inception_C_conv1_1x1 = Conv2D(
        name="inception_C_conv1_1x1/1" + i, filters=448 // filter_control,
        kernel_size=(1, 1), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(input_layer)
    inception_C_conv2_3x3 = Conv2D(
        name="inception_C_conv2_3x3/1" + i, filters=384 // filter_control,
        kernel_size=(3, 3), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(inception_C_conv1_1x1)
    inception_C_conv3_1x3 = Conv2D(
        name="inception_C_conv3_1x3/1" + i, filters=384 // filter_control,
        kernel_size=(1, 3), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(inception_C_conv2_3x3)
    inception_C_conv4_3x1 = Conv2D(
        name="inception_C_conv4_3x1/1" + i, filters=384 // filter_control,
        kernel_size=(3, 1), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(inception_C_conv2_3x3)
    inception_C_merge1 = concatenate([inception_C_conv3_1x3, inception_C_conv4_3x1])
    # Branch 2: 1x1 -> parallel 1x3 and 3x1
    inception_C_conv5_1x1 = Conv2D(
        name="inception_C_conv5_1x1/1" + i, filters=384 // filter_control,
        kernel_size=(1, 1), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(input_layer)
    inception_C_conv6_1x3 = Conv2D(
        name="inception_C_conv6_1x3/1" + i, filters=384 // filter_control,
        kernel_size=(1, 3), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(inception_C_conv5_1x1)
    inception_C_conv7_3x1 = Conv2D(
        name="inception_C_conv7_3x1/1" + i, filters=384 // filter_control,
        kernel_size=(3, 1), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(inception_C_conv5_1x1)
    inception_C_merge2 = concatenate([inception_C_conv6_1x3, inception_C_conv7_3x1])
    # Branch 3: average pooling -> 1x1
    inception_C_pool1_3x3 = AveragePooling2D(
        name="inception_C_pool1_3x3/1" + i,
        pool_size=(3, 3), strides=(1, 1), padding='same')(input_layer)
    inception_C_conv8_1x1 = Conv2D(
        name="inception_C_conv8_1x1/1" + i, filters=192 // filter_control,
        kernel_size=(1, 1), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(inception_C_pool1_3x3)
    # Branch 4: 1x1
    inception_C_conv9_1x1 = Conv2D(
        name="inception_C_conv9_1x1/1" + i, filters=320 // filter_control,
        kernel_size=(1, 1), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.00001))(input_layer)
    inception_C_merge3 = concatenate(
        [inception_C_merge1, inception_C_merge2, inception_C_conv8_1x1, inception_C_conv9_1x1])
    return bn_relu(inception_C_merge3)


def create_inception_v3(input_shape, nb_classes=10, small_mode=False):
    input_layer = Input(input_shape)
    x = before_inception(input_layer, small_mode)
    # 3 x Inception A
    for i in range(3):
        x = inception_A(str(i), x)
    # 5 x Inception B
    for i in range(5):
        x = inception_B(str(i), x)
    # 2 x Inception C
    for i in range(2):
        x = inception_C(str(i), x)
    x = AveragePooling2D((8, 8), strides=(1, 1))(x)
    # Dropout
    x = Dropout(0.8)(x)
    x = Flatten()(x)
    # Output
    out = Dense(units=nb_classes, activation='softmax')(x)
    model = Model(inputs=input_layer, outputs=out, name='Inception-v3')
    return model


if __name__ == "__main__":
    with tf.device('/gpu:3'):
        gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=1, allow_growth=True)
        os.environ["CUDA_VISIBLE_DEVICES"] = "3"
        # Register the session with Keras; a bare tf.Session(...) would be discarded.
        K.set_session(tf.Session(config=tf.ConfigProto(
            allow_soft_placement=True,
            log_device_placement=True,
            gpu_options=gpu_options)))
        (x_train, y_train), (x_test, y_test) = cifar10.load_data()
        # Scale pixels to [0, 1]. The original applied np.transpose with the
        # identity permutation (0, 1, 2, 3), which is a no-op.
        x_train = x_train.astype('float32') / 255.
        x_test = x_test.astype('float32') / 255.
        print('x_train shape:', x_train.shape)
        print(x_train.shape[0], 'train samples')
        print(x_test.shape[0], 'test samples')
        # convert class vectors to binary class matrices
        y_train = np_utils.to_categorical(y_train)
        y_test = np_utils.to_categorical(y_test)
        s = x_train.shape[1:]
        batch_size = 128
        nb_epoch = 10
        nb_classes = 10
        # small_mode=True keeps stride 1 in the stem. With stride 2, a 32x32
        # CIFAR-10 image shrinks below the 3x3 kernel size before the stem ends.
        model = create_inception_v3(s, nb_classes, small_mode=True)
        model.summary()
        plot_model(model, to_file="GoogLeNet-Inception-V3.jpg", show_shapes=True)
        model.compile(optimizer='adadelta',
                      loss='categorical_crossentropy',
                      metrics=['accuracy'])
        # First pass: train on the raw arrays without augmentation.
        model.fit(x_train, y_train,
                  batch_size=batch_size, epochs=nb_epoch, verbose=1,
                  validation_data=(x_test, y_test), shuffle=True,
                  callbacks=[])
        # Model saving callback
        checkpointer = ModelCheckpoint(
            "weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5", monitor='val_loss',
            verbose=0, save_best_only=False, save_weights_only=False, mode='auto')
        print('Using real-time data augmentation.')
        datagen_train = ImageDataGenerator(
            featurewise_center=False,
            samplewise_center=False,
            featurewise_std_normalization=False,
            samplewise_std_normalization=False,
            zca_whitening=False,
            rotation_range=0,
            width_shift_range=0.125,
            height_shift_range=0.125,
            horizontal_flip=True,
            vertical_flip=False)
        datagen_train.fit(x_train)
        # Second pass: continue training with augmented batches.
        # steps_per_epoch counts batches, not samples.
        history = model.fit_generator(
            datagen_train.flow(x_train, y_train, batch_size=batch_size, shuffle=True),
            steps_per_epoch=x_train.shape[0] // batch_size,
            epochs=nb_epoch, verbose=1,
            validation_data=(x_test, y_test),
            callbacks=[lr_reducer, early_stopper, csv_logger, checkpointer])

5.15 GoogLeNet Inception V4/ResNet V1/V2

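These three architectures were proposed in "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning". The paper's highlights are a better-performing GoogLeNet Inception v4 architecture and, by combining Inception with residual networks, architectures that perform on par with v4 while training faster.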
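
As a rough illustration of how an Inception block is combined with a residual connection, the NumPy sketch below concatenates the branch outputs, projects them back to the input's channel count with a linear 1x1 convolution, and adds the result to the input. This mirrors the linear 1x1 convolution (`inception_A_conv7_1x1`) in the listing. The final add-then-ReLU step is the usual residual form and is assumed here, and the names `residual_inception_unit` and `w_proj` are hypothetical.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def residual_inception_unit(x, branch_outputs, w_proj):
    """x: (h, w, c); branch_outputs: list of (h, w, k_i); w_proj: (sum k_i, c)."""
    merged = np.concatenate(branch_outputs, axis=-1)   # concatenate along channels
    projected = merged @ w_proj                        # 1x1 linear conv = per-pixel matmul
    return relu(x + projected)                         # shortcut addition, then ReLU


rng = np.random.RandomState(0)
x = relu(rng.normal(size=(4, 4, 6)))                   # the input is a ReLU activation
branches = [rng.normal(size=(4, 4, 2)), rng.normal(size=(4, 4, 3))]
w_proj = rng.normal(size=(5, 6))
y = residual_inception_unit(x, branches, w_proj)
print(y.shape)
```

The projection is what lets the addition work: the branches may produce any number of channels, but the shortcut requires the sum to match the input's shape.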

5.15.1 GoogLeNet Inception V4 Network Architecture

5.15.2 GoogLeNet Inception ResNet Network Architecture

5.15.3 Code Practice

GoogLeNet Inception ResNet V2

# -*- coding: utf-8 -*-
import numpy as np
from keras.layers import Input, Dropout, Dense, Lambda, Flatten, Activation
from keras.layers.convolutional import MaxPooling2D, Conv2D, AveragePooling2D
from keras.layers.normalization import BatchNormalization
from keras.layers.merge import concatenate, add
from keras.regularizers import l1_l2
from keras.models import Model
from keras.callbacks import CSVLogger, ReduceLROnPlateau, ModelCheckpoint, EarlyStopping
lr_reducer = ReduceLROnPlateau(monitor='val_loss', factor=np.sqrt(0.5), cooldown=0, patience=3, min_lr=1e-6)
early_stopper = EarlyStopping(monitor='val_acc', min_delta=0.0005, patience=15)
csv_logger = CSVLogger('resnet34_cifar10.csv')
from keras.utils.vis_utils import plot_model
import os
from keras.preprocessing.image import ImageDataGenerator
from keras.utils import np_utils
from keras.datasets import cifar10
from keras import backend as K
import tensorflow as tf
# Legacy workaround for a Keras/TensorFlow version mismatch.
tf.python.control_flow_ops = tf
import warnings
warnings.filterwarnings('ignore')

# Every filter count is divided by this factor so the model fits on one machine.
filter_control = 8


def bn_relu(input):
    """Helper to build a BN -> relu block
    """
    norm = BatchNormalization()(input)
    return Activation("relu")(norm)


def inception_resnet_stem(input_shape, small_mode=False):
    # input_shape is the input tensor.
    input_layer = input_shape
    if small_mode:
        strides = (1, 1)
    else:
        strides = (2, 2)
    stem_conv1_3x3 = Conv2D(
        name="stem_conv1_3x3/2", filters=32 // filter_control,
        kernel_size=(3, 3), strides=strides,
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.0001))(input_layer)
    stem_conv2_3x3 = Conv2D(
        name="stem_conv2_3x3/1", filters=32 // filter_control,
        kernel_size=(3, 3), strides=(1, 1),
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.0001))(stem_conv1_3x3)
    stem_conv3_3x3 = Conv2D(
        name="stem_conv3_3x3/1", filters=64 // filter_control,
        kernel_size=(3, 3), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.0001))(stem_conv2_3x3)
    # First split: max pooling and a strided convolution run in parallel.
    stem_pool1_3x3 = MaxPooling2D(
        name="stem_pool1_3x3/2", pool_size=(3, 3), strides=strides,
        padding='valid')(stem_conv3_3x3)
    stem_conv4_3x3 = Conv2D(
        name="stem_conv4_3x3/2", filters=96 // filter_control,
        kernel_size=(3, 3), strides=strides, padding='valid',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.0001))(stem_conv3_3x3)
    stem_merge1 = concatenate([stem_pool1_3x3, stem_conv4_3x3])
    # Second split, branch 1: 1x1 -> 3x3
    stem_conv5_1x1 = Conv2D(
        name="stem_conv5_1x1/1", filters=64 // filter_control,
        kernel_size=(1, 1), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.0001))(stem_merge1)
    stem_conv6_3x3 = Conv2D(
        name="stem_conv6_3x3/1", filters=96 // filter_control,
        kernel_size=(3, 3), strides=(1, 1),
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.0001))(stem_conv5_1x1)
    # Second split, branch 2: 1x1 -> 7x1 -> 1x7 -> 3x3
    stem_conv7_1x1 = Conv2D(
        name="stem_conv7_1x1/1", filters=64 // filter_control,
        kernel_size=(1, 1), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.0001))(stem_merge1)
    stem_conv8_7x1 = Conv2D(
        name="stem_conv8_7x1/1", filters=64 // filter_control,
        kernel_size=(7, 1), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.0001))(stem_conv7_1x1)
    stem_conv9_1x7 = Conv2D(
        name="stem_conv9_1x7/1", filters=64 // filter_control,
        kernel_size=(1, 7), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.0001))(stem_conv8_7x1)
    stem_conv10_3x3 = Conv2D(
        name="stem_conv10_3x3/1", filters=96 // filter_control,
        kernel_size=(3, 3), strides=(1, 1), padding='valid',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.0001))(stem_conv9_1x7)
    stem_merge2 = concatenate([stem_conv6_3x3, stem_conv10_3x3])
    # Third split: max pooling and a strided convolution run in parallel.
    stem_pool2_3x3 = MaxPooling2D(
        name="stem_pool2_3x3/2", pool_size=(3, 3), strides=strides,
        padding='valid')(stem_merge2)
    stem_conv11_3x3 = Conv2D(
        name="stem_conv11_3x3/2", filters=192 // filter_control,
        kernel_size=(3, 3), strides=strides, padding='valid',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.0001))(stem_merge2)
    stem_merge3 = concatenate([stem_pool2_3x3, stem_conv11_3x3])
    return bn_relu(stem_merge3)


def inception_resnet_v2_A(i, input):
    # The input is a ReLU activation
    init = input
    # Branch 1: 1x1
    inception_A_conv1_1x1 = Conv2D(
        name="inception_A_conv1_1x1/1" + i, filters=32 // filter_control,
        kernel_size=(1, 1), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.0001))(input)
    # Branch 2: 1x1 -> 3x3
    inception_A_conv2_1x1 = Conv2D(
        name="inception_A_conv2_1x1/1" + i, filters=32 // filter_control,
        kernel_size=(1, 1), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.0001))(input)
    inception_A_conv3_3x3 = Conv2D(
        name="inception_A_conv3_3x3/1" + i, filters=32 // filter_control,
        kernel_size=(3, 3), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.0001))(inception_A_conv2_1x1)
    # Branch 3: 1x1 -> 3x3 -> 3x3
    inception_A_conv4_1x1 = Conv2D(
        name="inception_A_conv4_1x1/1" + i, filters=32 // filter_control,
        kernel_size=(1, 1), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.0001))(input)
    inception_A_conv5_3x3 = Conv2D(
        name="inception_A_conv5_3x3/1" + i, filters=48 // filter_control,
        kernel_size=(3, 3), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.0001))(inception_A_conv4_1x1)
    inception_A_conv6_3x3 = Conv2D(
        name="inception_A_conv6_3x3/1" + i, filters=64 // filter_control,
        kernel_size=(3, 3), strides=(1, 1), padding='same',
        kernel_initializer='he_normal', activation='relu',
        kernel_regularizer=l1_l2(0.0001))(inception_A_conv5_3x3)
    inception_merge1 = concatenate([inception_A_conv1_1x1, inception_A_conv3_3x3, inception_A_conv6_3x3])
    # Linear 1x1 convolution applied to the concatenated branches
    inception_A_conv7_1x1 = Conv2D(
        name="inception_A_conv7_1x1/1" + i, filters=384 // filter_control,
        kernel_size=(1, 1), strides=(1, 1), padding='same',
        activation='linear')(inception_merge1)
  190. out = add([input, inception_A_conv7_1x1])
  191. return bn_relu(out)
  192. def inception_resnet_v2_B(i, input):
  193. # the input is a ReLU activation
  194. init = input
  195. inception_B_conv1_1x1 = Conv2D(name="inception_B_conv1_1x1/1" + i,
  196. filters=192 // filter_control,
  197. kernel_size=(1, 1),
  198. strides=(1, 1),
  199. padding='same',
  200. activation='relu')(input)
  201. inception_B_conv2_1x1 = Conv2D(name="inception_B_conv2_1x1/1" + i,
  202. filters=128 // filter_control,
  203. kernel_size=(1, 1),
  204. strides=(1, 1),
  205. padding='same',
  206. activation='relu')(input)
  207. inception_B_conv3_1x7 = Conv2D(name="inception_B_conv3_1x7/1" + i,
  208. filters=160 // filter_control,
  209. kernel_size=(1, 7),
  210. strides=(1, 1),
  211. padding='same',
  212. activation='relu')(inception_B_conv2_1x1)
  213. inception_B_conv4_7x1 = Conv2D(name="inception_B_conv4_7x1/1" + i,
  214. filters=192 // filter_control,
  215. kernel_size=(7, 1),
  216. strides=(1, 1),
  217. padding='same',
  218. activation='relu')(inception_B_conv3_1x7)
  219. inception_B_merge = concatenate([inception_B_conv1_1x1, inception_B_conv4_7x1])
  220. inception_B_conv7_1x1 = Conv2D(name="inception_B_conv7_1x1/1" + i,
  221. filters=1152 // filter_control,  # must equal the channels of `input` for add(); reduction_A emits 3 * 384 = 1152
  222. kernel_size=(1, 1),
  223. strides=(1, 1),
  224. padding='same',
  225. activation='linear')(inception_B_merge)
  226. out = add([input, inception_B_conv7_1x1])
  227. return bn_relu(out)
  228. def inception_resnet_v2_C(i, input):
  229. # the input is a ReLU activation
  230. inception_C_conv1_1x1 = Conv2D(name="inception_C_conv1_1x1/1" + i,
  231. filters=192 // filter_control,
  232. kernel_size=(1, 1),
  233. strides=(1, 1),
  234. padding='same',
  235. activation='relu')(input)
  236. inception_C_conv2_1x1 = Conv2D(name="inception_C_conv2_1x1/1" + i,
  237. filters=192 // filter_control,
  238. kernel_size=(1, 1),
  239. strides=(1, 1),
  240. padding='same',
  241. activation='relu')(input)
  242. inception_C_conv3_1x3 = Conv2D(name="inception_C_conv3_1x3/1" + i,
  243. filters=224 // filter_control,
  244. kernel_size=(1, 3),
  245. strides=(1, 1),
  246. padding='same',
  247. activation='relu')(inception_C_conv2_1x1)
  248. inception_C_conv3_3x1 = Conv2D(name="inception_C_conv3_3x1/1" + i,
  249. filters=256 // filter_control,
  250. kernel_size=(3, 1),
  251. strides=(1, 1),
  252. padding='same',
  253. activation='relu')(inception_C_conv3_1x3)
  254. ir_merge = concatenate([inception_C_conv1_1x1, inception_C_conv3_3x1])
  255. inception_C_conv4_1x1 = Conv2D(name="inception_C_conv4_1x1/1" + i,
  256. filters=2048 // filter_control,
  257. kernel_size=(1, 1),
  258. strides=(1, 1),
  259. padding='same',
  260. activation='linear')(ir_merge)
  261. out = add([input, inception_C_conv4_1x1])
  262. return bn_relu(out)
  263. def reduction_A(input, k=192, l=224, m=256, n=384):
  264. pool_size = (3, 3)
  265. strides = (2, 2)
  266. reduction_A_pool1 = MaxPooling2D(name="reduction_A_pool1/2",
  267. pool_size=pool_size,
  268. strides=strides,
  269. padding='valid')(input)
  270. reduction_A_conv1_3x3 = Conv2D(name="reduction_A_conv1_3x3/1",
  271. filters=n // filter_control,
  272. kernel_size=pool_size,
  273. strides=strides,
  274. activation='relu')(input)
  275. reduction_A_conv2_1x1 = Conv2D(name="reduction_A_conv2_1x1/1",
  276. filters=k // filter_control,
  277. kernel_size=(1, 1),
  278. strides=(1, 1),
  279. padding='same',
  280. activation='relu')(input)
  281. reduction_A_conv2_3x3 = Conv2D(name="reduction_A_conv2_3x3/1",
  282. filters=l // filter_control,
  283. kernel_size=(3, 3),
  284. strides=(1, 1),
  285. padding='same',
  286. activation='relu')(reduction_A_conv2_1x1)
  287. reduction_A_conv3_3x3 = Conv2D(name="reduction_A_conv3_3x3/1",
  288. filters=m // filter_control,
  289. kernel_size=pool_size,
  290. strides=strides,
  291. activation='relu')(reduction_A_conv2_3x3)
  292. reduction_A_merge = concatenate([reduction_A_pool1, reduction_A_conv1_3x3, reduction_A_conv3_3x3])
  293. return reduction_A_merge
  294. def reduction_B(input):
  295. pool_size = (3, 3)
  296. strides = (2, 2)
  297. reduction_B_pool1 = MaxPooling2D(name="reduction_B_pool1/2",
  298. pool_size=pool_size,
  299. strides=strides,
  300. padding='valid')(input)
  301. reduction_B_conv1_1x1 = Conv2D(name="reduction_B_conv3_3x3/1",
  302. filters=256 // filter_control,
  303. kernel_size=(1, 1),
  304. strides=(1, 1),
  305. padding='same',
  306. activation='relu')(input)
  307. reduction_B_conv2_3x3 = Conv2D(name="reduction_B_conv2_3x3/1",
  308. filters=288 // filter_control,
  309. kernel_size=pool_size,
  310. strides=strides,
  311. activation='relu')(reduction_B_conv1_1x1)
  312. reduction_B_conv3_1x1 = Conv2D(name="reduction_B_conv3_1x1/1",
  313. filters=256 // filter_control,
  314. kernel_size=(1, 1),
  315. strides=(1, 1),
  316. padding='same',
  317. activation='relu')(input)
  318. reduction_B_conv4_3x3 = Conv2D(name="reduction_B_conv4_3x3/1",
  319. filters=288 // filter_control,
  320. kernel_size=pool_size,
  321. strides=strides,
  322. activation='relu')(reduction_B_conv3_1x1)
  323. reduction_B_conv5_1x1 = Conv2D(name="reduction_B_conv5_1x1/1",
  324. filters=256 // filter_control,
  325. kernel_size=(1, 1),
  326. strides=(1, 1),
  327. padding='same',
  328. activation='relu')(input)
  329. reduction_B_conv5_3x3 = Conv2D(name="reduction_B_conv5_3x3/1",
  330. filters=288 // filter_control,
  331. kernel_size=(3, 3),
  332. strides=(1, 1),
  333. padding='same',
  334. activation='relu')(reduction_B_conv5_1x1)
  335. reduction_B_conv6_3x3 = Conv2D(name="reduction_B_conv6_3x3/1",
  336. filters=320 // filter_control,
  337. kernel_size=pool_size,
  338. strides=strides,
  339. activation='relu')(reduction_B_conv5_3x3)
  340. reduction_B_merge = concatenate(
  341. [reduction_B_pool1, reduction_B_conv2_3x3, reduction_B_conv4_3x3, reduction_B_conv6_3x3])
  342. return reduction_B_merge
  343. def create_inception_resnet_v2(input_shape, nb_classes=10, small_mode=False):
  344. input_layer = Input(input_shape)
  345. x = inception_resnet_stem(input_layer, small_mode)
  346. # 10 x Inception Resnet A
  347. for i in range(10):
  348. x = inception_resnet_v2_A(str(i), x)
  349. # Reduction A
  350. x = reduction_A(x, k=256, l=256, m=384, n=384)
  351. # 20 x Inception Resnet B
  352. for i in range(20):
  353. x = inception_resnet_v2_B(str(i), x)
  354. # for 32*32*3 inputs the pooling layer can be changed
  355. aout = AveragePooling2D((5, 5), strides=(3, 3))(x)
  356. aout = Conv2D(name="conv1_1x1/1",
  357. filters=128,
  358. kernel_size=(1, 1),
  359. strides=(1, 1),
  360. padding='same',
  361. activation='relu')(aout)
  362. aout = Conv2D(name="conv1_5x5/1",
  363. filters=768,
  364. kernel_size=(5, 5),
  365. strides=(1, 1),
  366. padding='same',
  367. activation='relu')(aout)
  368. aout = Flatten()(aout)
  369. aout = Dense(nb_classes, activation='softmax')(aout)
  370. # Reduction Resnet B
  371. x = reduction_B(x)
  372. # 10 x Inception Resnet C
  373. for i in range(10):
  374. x = inception_resnet_v2_C(str(i), x)
  375. # adjust the pooling size to the input resolution as needed
  376. x = AveragePooling2D((4, 4), strides=(1, 1))(x)
  377. # Dropout: Keras takes the fraction of units to drop, so keeping 0.8 of the units means rate=0.2
  378. x = Dropout(0.2)(x)
  379. x = Flatten()(x)
  380. # Output
  381. out = Dense(units=nb_classes, activation='softmax')(x)
  382. # the auxiliary objective is dropped for simplicity
  383. # model = Model(input_layer, outputs=[out, aout], name='Inception-Resnet-v2')
  384. model = Model(input_layer, outputs=out, name='Inception-Resnet-v2')
  385. return model
  386. if __name__ == "__main__":
  387. with tf.device('/gpu:3'):
  388. gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=1, allow_growth=True)
  389. os.environ["CUDA_VISIBLE_DEVICES"] = "3"
  390. tf.Session(config=K.tf.ConfigProto(allow_soft_placement=True,
  391. log_device_placement=True,
  392. gpu_options=gpu_options))
  393. (x_train, y_train), (x_test, y_test) = cifar10.load_data()
  394. # CIFAR-10 is already channels-last (N, H, W, C); only scale the pixels to [0, 1]
  395. x_train = x_train.astype('float32') / 255.
  396. x_test = x_test.astype('float32') / 255.
  397. print('x_train shape:', x_train.shape)
  398. print(x_train.shape[0], 'train samples')
  399. print(x_test.shape[0], 'test samples')
  400. # convert class vectors to binary class matrices
  401. y_train = np_utils.to_categorical(y_train)
  402. y_test = np_utils.to_categorical(y_test)
  403. s = x_train.shape[1:]
  404. batch_size = 128
  405. nb_epoch = 10
  406. nb_classes = 10
  407. model = create_inception_resnet_v2(s, nb_classes, False)
  408. model.summary()
  409. plot_model(model, to_file="GoogLeNet-Inception-Resnet-V2.jpg", show_shapes=True)
  410. model.compile(optimizer='adadelta',
  411. loss='categorical_crossentropy',
  412. metrics=['accuracy'])
  413. # Model saving callback
  414. checkpointer = ModelCheckpoint("weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5", monitor='val_loss',
  415. verbose=0,
  416. save_best_only=False, save_weights_only=False, mode='auto')
  417. print('Using real-time data augmentation.')
  418. datagen_train = ImageDataGenerator(
  419. featurewise_center=False,
  420. samplewise_center=False,
  421. featurewise_std_normalization=False,
  422. samplewise_std_normalization=False,
  423. zca_whitening=False,
  424. rotation_range=0,
  425. width_shift_range=0.125,
  426. height_shift_range=0.125,
  427. horizontal_flip=True,
  428. vertical_flip=False)
  429. datagen_train.fit(x_train)
  430. history = model.fit_generator(datagen_train.flow(x_train, y_train, batch_size=batch_size, shuffle=True),
  431. steps_per_epoch=x_train.shape[0] // batch_size,
  432. epochs=nb_epoch, verbose=1,
  433. validation_data=(x_test, y_test),
  434. callbacks=[lr_reducer, early_stopper, csv_logger, checkpointer])

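5.16 Model Visualization

5.16.1 Notes

A neural network contains a series of feature extractors. An ideal feature map is sparse and captures characteristic local information. Visualizing the model gives an intuitive picture of what it has learned and helps with debugging. For example, a feature map that looks almost identical to the input image has learned little, while one that is almost a solid color is too sparse, possibly because the layer has too many feature maps. There are many kinds of visualization, such as feature map visualization and weight visualization; feature map visualization is used as the example here.

The experiment uses Keras with GoogLeNet Inception v3 pretrained on the 1000-class ImageNet dataset, and takes the following two images as input.

Reading from left to right shows the whole feature extraction process: some feature maps separate the background, some extract contours, and some extract color differences. The two middle feature maps of layers 10 and 11 are a solid color, which suggests that these layers may have somewhat too many feature maps. The halo in the photo of the BAIC Senova D50 (北汽绅宝D50) also has a clearly visible effect on the halo in the feature maps.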
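The two symptoms mentioned in this section (a feature map that looks like the input, or one that is almost a solid color) can also be checked numerically. The following is only a minimal sketch: it assumes the feature map and a grayscale version of the input are available as 2-D NumPy arrays of the same shape, and the function name `diagnose_feature_map` and both thresholds are hypothetical choices made for illustration, not part of Keras.

```python
import numpy as np


def diagnose_feature_map(fmap, gray_input, flat_std=1e-3, copy_corr=0.95):
    """Label a 2-D feature map as 'flat', 'copy' or 'ok'.

    flat_std and copy_corr are illustrative thresholds (assumptions).
    """
    fmap = np.asarray(fmap, dtype=np.float64)
    gray_input = np.asarray(gray_input, dtype=np.float64)
    # Almost no variation: the map is close to a solid color.
    if fmap.std() < flat_std:
        return "flat"
    # Very high correlation with the input: the map mostly copies the image.
    if gray_input.std() > 0:
        corr = np.corrcoef(fmap.ravel(), gray_input.ravel())[0, 1]
        if abs(corr) > copy_corr:
            return "copy"
    return "ok"


rng = np.random.default_rng(0)
demo_input = rng.random((8, 8))
print(diagnose_feature_map(np.zeros((8, 8)), demo_input))
print(diagnose_feature_map(2.0 * demo_input + 1.0, demo_input))
```

In practice the feature map would first have to be resized to the input resolution (or the input downsampled) before the correlation is computed.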

5.16.2 Code Practice


    # -*- coding: utf-8 -*-
    from keras.applications import InceptionV3
    from keras.applications.inception_v3 import preprocess_input
    from keras.applications.imagenet_utils import decode_predictions
    from keras.preprocessing import image
    from keras.models import Model
    import numpy as np
    import cv2
    import matplotlib.pyplot as plt


    def test_opencv():
        # Open the camera. 0 is the device index; use a higher index to pick another camera.
        cam = cv2.VideoCapture(0)
        # Grab 5 frames and save them to disk.
        for x in range(5):
            ok, img = cam.read()
            if ok:
                cv2.imwrite("o-" + str(x) + ".jpg", img)
        cam.release()


    def load_original(img_path):
        # Resize the original image to 299x299, the input size of Inception v3.
        im_original = cv2.resize(cv2.imread(img_path), (299, 299))
        # OpenCV loads images as BGR; matplotlib and the network expect RGB.
        im_converted = cv2.cvtColor(im_original, cv2.COLOR_BGR2RGB)
        plt.figure(0)
        plt.subplot(211)
        plt.imshow(im_converted)
        return im_converted


    def load_fine_tune_googlenet_v3(img):
        # Load the ImageNet-pretrained GoogLeNet Inception v3 and run a prediction.
        model = InceptionV3(include_top=True, weights='imagenet')
        model.summary()
        x = image.img_to_array(img)
        x = np.expand_dims(x, axis=0)
        x = preprocess_input(x)
        preds = model.predict(x)
        print('Predicted:', decode_predictions(preds))
        plt.subplot(212)
        plt.plot(preds.ravel())
        plt.show()
        return model, x


    def extract_features(ins, layer_id, filters, layer_num):
        '''
        Extract feature maps from one layer of the model and draw them into the current figure.
        :param ins: (model, preprocessed input) tuple
        :param layer_id: index or name of the layer to extract
        :param filters: number of feature maps to draw for this layer
        :param layer_num: total number of layers (columns) in the figure
        :return: None
        '''
        if len(ins) != 2:
            print('parameter error:(model, instance)')
            return None
        model, x = ins
        if isinstance(layer_id, int):
            layer = model.get_layer(index=layer_id)
            column = layer_id
        else:
            layer = model.get_layer(name=layer_id)
            column = 0  # a layer given by name is drawn in the first column
        extractor = Model(inputs=model.input, outputs=layer.output)
        features = extractor.predict(x)
        n_maps = features.shape[-1]
        if filters > n_maps:
            print('layer number error.', n_maps, ',', filters)
            return None
        plt.subplots_adjust(left=0, right=1, bottom=0, top=1)
        for i in range(filters):
            plt.subplot(filters, layer_num, column + 1 + i * layer_num)
            plt.axis("off")
            plt.imshow(features[0, :, :, i])


    def extract_features_batch(layer_num, ins, filters):
        '''
        Extract features from the first layer_num layers.
        :param layer_num: number of layers
        :param ins: (model, preprocessed input) tuple
        :param filters: number of feature maps per layer
        :return: None
        '''
        # one column per layer, one row per feature map
        plt.figure(figsize=(layer_num, filters))
        for i in range(layer_num):
            extract_features(ins, i, filters, layer_num)
        plt.savefig('sample.jpg')
        plt.show()


    def extract_hypercolumn(model, layer_indexes, instance):
        '''
        Extract the hypercolumn of the given layers of the model.
        :param model: model
        :param layer_indexes: indexes of the layers to use
        :param instance: preprocessed input batch
        :return: array of shape (total feature maps, 299, 299)
        '''
        feature_maps = []
        for i in layer_indexes:
            extractor = Model(inputs=model.input, outputs=model.get_layer(index=i).output)
            feature_maps.append(extractor.predict(instance))
        hypercolumns = []
        for convmap in feature_maps:
            # iterate over channel indexes, not over activation values
            for i in range(convmap.shape[-1]):
                # scipy.misc.imresize no longer exists in current SciPy; cv2.resize is used instead
                upscaled = cv2.resize(convmap[0, :, :, i], (299, 299), interpolation=cv2.INTER_LINEAR)
                hypercolumns.append(upscaled)
        return np.asarray(hypercolumns)


    def extract_features_with_layers(ins, layers_extract):
        '''
        Extract the hypercolumn and visualize its channel-wise average.
        :param ins: (model, preprocessed input) tuple
        :param layers_extract: list of layer indexes
        :return: None
        '''
        hc = extract_hypercolumn(ins[0], layers_extract, ins[1])
        ave = np.average(hc.transpose(1, 2, 0), axis=2)
        plt.imshow(ave)
        plt.show()


    if __name__ == '__main__':
        img_path = r'd:\car3.jpg'
        img = load_original(img_path)
        ins = load_fine_tune_googlenet_v3(img)
        extract_features_batch(15, ins, 3)
        extract_features_with_layers(ins, [1, 4, 7])
        extract_features_with_layers(ins, [1, 4, 7, 10, 11, 14, 17])

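6. Recurrent Neural Networks (to be written)

6.1 RNN

6.1.1 Problems Addressed

6.1.2 Basic Structure

6.1.3 BPTT

6.1.4 Model Drawbacks

6.1.5 Code Practice

6.2 LSTM

6.2.1 Problems Addressed

6.2.2 Basic Structure

6.2.3 Model Drawbacks

6.2.4 Code Practice

6.3 Sequence to Sequence Applications

7. Generative Adversarial Networks (to be written)

7.1 GANs

7.2 Wasserstein GAN

7.3 Code Practice

8. Object Detection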


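To address the problems above, one solution is to borrow from heuristic search and make full use of human prior knowledge. In "Selective Search for Object Recognition", J.R.R. Uijlings et al. propose a data-driven, class-independent heuristic method for generating candidate regions that combines multiple strategies. An image carries many kinds of information, such as size, shape, color, texture, and the overlap relationships between objects; relying on only one of them usually fails to handle most cases.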


8.1.1 Design Criteria for Heuristic Generation


The Selective Search algorithm is designed on the basis of the criteria above.


8.1.3 Object Recognition with Selective Search


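Detection and localization quality is evaluated with the Average Best Overlap (ABO) and the Mean Average Best Overlap (MABO):

$$\mathrm{ABO} = \frac{1}{|G^c|} \sum_{g_i^c \in G^c} \max_{l_j \in L} \mathrm{Overlap}(g_i^c, l_j), \qquad \mathrm{Overlap}(g_i^c, l_j) = \frac{\mathrm{area}(g_i^c \cap l_j)}{\mathrm{area}(g_i^c \cup l_j)}$$

Here $c$ is the class label, $G^c$ is the set of ground-truth boxes of class $c$, and $L$ is the set of candidate boxes generated by Selective Search. MABO is the mean of ABO over all classes.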
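As an illustration of the ABO and MABO metrics, the sketch below computes both for a handful of boxes. It assumes boxes are given as `(x1, y1, x2, y2)` corner coordinates; the function names and the toy data are hypothetical and only meant to make the definitions concrete.

```python
import numpy as np


def overlap(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_best_overlap(gt_boxes, proposals):
    """ABO for one class: mean over ground-truth boxes of the best overlap."""
    best = [max(overlap(g, p) for p in proposals) for g in gt_boxes]
    return float(np.mean(best))


def mean_average_best_overlap(gt_by_class, proposals):
    """MABO: mean of the per-class ABO values."""
    abos = [average_best_overlap(g, proposals) for g in gt_by_class.values()]
    return float(np.mean(abos))


# Toy data (assumed): one ground-truth box per class, two proposals.
gt_by_class = {"car": [(0, 0, 10, 10)], "person": [(20, 20, 30, 30)]}
proposals = [(0, 0, 10, 10), (20, 20, 30, 40)]
print(mean_average_best_overlap(gt_by_class, proposals))
```

The metric only looks at the single best proposal for each ground-truth box, so it measures the quality of the candidate set rather than the number of candidates.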

8.1.4 Code Practice


  1. # -*- coding: utf-8 -*-
  2. import skimage.io
  3. import skimage.feature
  4. import skimage.color
  5. import skimage.transform
  6. import skimage.util
  7. import skimage.segmentation
  8. import numpy
  9. # "Selective Search for Object Recognition" by J.R.R. Uijlings et al.
  10. #
  11. # - Modified version with LBP extractor for texture vectorization
  12. def _generate_segments(im_orig, scale, sigma, min_size):
  13. """
  14. segment smallest regions by the algorithm of Felzenszwalb and
  15. Huttenlocher
  16. """
  17. # open the Image
  18. im_mask = skimage.segmentation.felzenszwalb(
  19. skimage.util.img_as_float(im_orig), scale=scale, sigma=sigma,
  20. min_size=min_size)
  21. # merge mask channel to the image as a 4th channel
  22. im_orig = numpy.append(
  23. im_orig, numpy.zeros(im_orig.shape[:2])[:, :, numpy.newaxis], axis=2)
  24. im_orig[:, :, 3] = im_mask
  25. return im_orig
  26. def _sim_colour(r1, r2):
  27. """
  28. calculate the sum of histogram intersection of colour
  29. """
  30. return sum([min(a, b) for a, b in zip(r1["hist_c"], r2["hist_c"])])
  31. def _sim_texture(r1, r2):
  32. """
  33. calculate the sum of histogram intersection of texture
  34. """
  35. return sum([min(a, b) for a, b in zip(r1["hist_t"], r2["hist_t"])])
  36. def _sim_size(r1, r2, imsize):
  37. """
  38. calculate the size similarity over the image
  39. """
  40. return 1.0 - (r1["size"] + r2["size"]) / imsize
  41. def _sim_fill(r1, r2, imsize):
  42. """
  43. calculate the fill similarity over the image
  44. """
  45. bbsize = (
  46. (max(r1["max_x"], r2["max_x"]) - min(r1["min_x"], r2["min_x"]))
  47. * (max(r1["max_y"], r2["max_y"]) - min(r1["min_y"], r2["min_y"]))
  48. )
  49. return 1.0 - (bbsize - r1["size"] - r2["size"]) / imsize
  50. def _calc_sim(r1, r2, imsize):
  51. return (_sim_colour(r1, r2) + _sim_texture(r1, r2)
  52. + _sim_size(r1, r2, imsize) + _sim_fill(r1, r2, imsize))
  53. def _calc_colour_hist(img):
  54. """
  55. calculate colour histogram for each region
  56. the size of output histogram will be BINS * COLOUR_CHANNELS(3)
  57. number of bins is 25 as same as [uijlings_ijcv2013_draft.pdf]
  58. extract HSV
  59. """
  60. BINS = 25
  61. hist = numpy.array([])
  62. for colour_channel in (0, 1, 2):
  63. # extracting one colour channel
  64. c = img[:, colour_channel]
  65. # calculate histogram for each colour and join to the result
  66. hist = numpy.concatenate(
  67. [hist] + [numpy.histogram(c, BINS, (0.0, 255.0))[0]])
  68. # L1 normalize
  69. hist = hist / len(img)
  70. return hist
  71. def _calc_texture_gradient(img):
  72. """
  73. calculate texture gradient for entire image
  74. The original SelectiveSearch algorithm proposed Gaussian derivative
  75. for 8 orientations, but we use LBP instead.
  76. output will be [height(*)][width(*)]
  77. """
  78. ret = numpy.zeros((img.shape[0], img.shape[1], img.shape[2]))
  79. for colour_channel in (0, 1, 2):
  80. ret[:, :, colour_channel] = skimage.feature.local_binary_pattern(
  81. img[:, :, colour_channel], 8, 1.0)
  82. return ret
  83. def _calc_texture_hist(img):
  84. """
  85. calculate texture histogram for each region
  86. calculate the histogram of gradient for each colours
  87. the size of output histogram will be BINS * COLOUR_CHANNELS(3)
  89. """
  90. BINS = 10
  91. hist = numpy.array([])
  92. for colour_channel in (0, 1, 2):
  93. # mask by the colour channel
  94. fd = img[:, colour_channel]
  95. # calculate histogram for each orientation and concatenate them all
  96. # and join to the result
  97. hist = numpy.concatenate(
  98. [hist] + [numpy.histogram(fd, BINS, (0.0, 1.0))[0]])
  99. # L1 Normalize
  100. hist = hist / len(img)
  101. return hist
  102. def _extract_regions(img):
  103. R = {}
  104. # get hsv image
  105. hsv = skimage.color.rgb2hsv(img[:, :, :3])
  106. # pass 1: count pixel positions
  107. for y, i in enumerate(img):
  108. for x, (r, g, b, l) in enumerate(i):
  109. # initialize a new region
  110. if l not in R:
  111. R[l] = {
  112. "min_x": 0xffff, "min_y": 0xffff,
  113. "max_x": 0, "max_y": 0, "labels": [l]}
  114. # bounding box
  115. if R[l]["min_x"] > x:
  116. R[l]["min_x"] = x
  117. if R[l]["min_y"] > y:
  118. R[l]["min_y"] = y
  119. if R[l]["max_x"] < x:
  120. R[l]["max_x"] = x
  121. if R[l]["max_y"] < y:
  122. R[l]["max_y"] = y
  123. # pass 2: calculate texture gradient
  124. tex_grad = _calc_texture_gradient(img)
  125. # pass 3: calculate colour histogram of each region
  126. for k, v in R.items():
  127. # colour histogram
  128. masked_pixels = hsv[:, :, :][img[:, :, 3] == k]
  129. R[k]["size"] = len(masked_pixels)
  130. R[k]["hist_c"] = _calc_colour_hist(masked_pixels)
  131. # texture histogram
  132. R[k]["hist_t"] = _calc_texture_hist(tex_grad[:, :][img[:, :, 3] == k])
  133. return R
  134. def _extract_neighbours(regions):
  135. def intersect(a, b):
  136. if (a["min_x"] < b["min_x"] < a["max_x"]
  137. and a["min_y"] < b["min_y"] < a["max_y"]) or (
  138. a["min_x"] < b["max_x"] < a["max_x"]
  139. and a["min_y"] < b["max_y"] < a["max_y"]) or (
  140. a["min_x"] < b["min_x"] < a["max_x"]
  141. and a["min_y"] < b["max_y"] < a["max_y"]) or (
  142. a["min_x"] < b["max_x"] < a["max_x"]
  143. and a["min_y"] < b["min_y"] < a["max_y"]):
  144. return True
  145. return False
  146. R = list(regions.items())
  147. neighbours = []
  148. for cur, a in enumerate(R[:-1]):
  149. for b in R[cur + 1:]:
  150. if intersect(a[1], b[1]):
  151. neighbours.append((a, b))
  152. return neighbours
  153. def _merge_regions(r1, r2):
  154. new_size = r1["size"] + r2["size"]
  155. rt = {
  156. "min_x": min(r1["min_x"], r2["min_x"]),
  157. "min_y": min(r1["min_y"], r2["min_y"]),
  158. "max_x": max(r1["max_x"], r2["max_x"]),
  159. "max_y": max(r1["max_y"], r2["max_y"]),
  160. "size": new_size,
  161. "hist_c": (
  162. r1["hist_c"] * r1["size"] + r2["hist_c"] * r2["size"]) / new_size,
  163. "hist_t": (
  164. r1["hist_t"] * r1["size"] + r2["hist_t"] * r2["size"]) / new_size,
  165. "labels": r1["labels"] + r2["labels"]
  166. }
  167. return rt
  168. def selective_search(
  169. im_orig, scale=1.0, sigma=0.8, min_size=50):
  170. '''Selective Search
  171. Parameters
  172. ----------
  173. im_orig : ndarray
  174. Input image
  175. scale : int
  176. Free parameter. Higher means larger clusters in felzenszwalb segmentation.
  177. sigma : float
  178. Width of Gaussian kernel for felzenszwalb segmentation.
  179. min_size : int
  180. Minimum component size for felzenszwalb segmentation.
  181. Returns
  182. -------
  183. img : ndarray
  184. image with region label
  185. region label is stored in the 4th value of each pixel [r,g,b,(region)]
  186. regions : array of dict
  187. [
  188. {
  189. 'rect': (left, top, width, height),
  190. 'labels': [...]
  191. },
  192. ...
  193. ]
  194. '''
  195. assert im_orig.shape[2] == 3, "3ch image is expected"
  196. # load image and get smallest regions
  197. # region label is stored in the 4th value of each pixel [r,g,b,(region)]
  198. img = _generate_segments(im_orig, scale, sigma, min_size)
  199. if img is None:
  200. return None, {}
  201. imsize = img.shape[0] * img.shape[1]
  202. R = _extract_regions(img)
  203. # extract neighbouring information
  204. neighbours = _extract_neighbours(R)
  205. # calculate initial similarities
  206. S = {}
  207. for (ai, ar), (bi, br) in neighbours:
  208. S[(ai, bi)] = _calc_sim(ar, br, imsize)
  209. # hierarchical search
  210. while S != {}:
  211. # get highest similarity
  212. i, j = sorted(S.items(), key=lambda item: item[1])[-1][0]
  213. # merge corresponding regions
  214. t = max(R.keys()) + 1.0
  215. R[t] = _merge_regions(R[i], R[j])
  216. # mark similarities for regions to be removed
  217. key_to_delete = []
  218. for k, v in S.items():
  219. if (i in k) or (j in k):
  220. key_to_delete.append(k)
  221. # remove old similarities of related regions
  222. for k in key_to_delete:
  223. del S[k]
  224. # calculate similarity set with the new region
  225. for k in filter(lambda a: a != (i, j), key_to_delete):
  226. n = k[1] if k[0] in (i, j) else k[0]
  227. S[(t, n)] = _calc_sim(R[t], R[n], imsize)
  228. regions = []
  229. for k, r in R.items():
  230. regions.append({
  231. 'rect': (
  232. r['min_x'], r['min_y'],
  233. r['max_x'] - r['min_x'], r['max_y'] - r['min_y']),
  234. 'size': r['size'],
  235. 'labels': r['labels']
  236. })
  237. return img, regions

    # -*- coding: utf-8 -*-
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend: the figure is only saved to disk
    import matplotlib.pyplot as plt
    import matplotlib.patches as mpatches
    import skimage.io
    import selectivesearch


    def main():
        # load the input image as RGB (skimage.data.astronaut() also works as a demo image)
        img = skimage.io.imread('car.jpg')
        # perform selective search
        img_lbl, regions = selectivesearch.selective_search(
            img, scale=500, sigma=0.9, min_size=10)
        candidates = set()
        for r in regions:
            # excluding same rectangle (with different segments)
            if r['rect'] in candidates:
                continue
            # excluding regions smaller than 2000 pixels
            if r['size'] < 2000:
                continue
            # excluding degenerate or distorted rects
            x, y, w, h = r['rect']
            if w == 0 or h == 0 or w / h > 1.2 or h / w > 1.2:
                continue
            candidates.add(r['rect'])
        # draw rectangles on the original image
        fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(6, 6))
        ax.imshow(img)
        for x, y, w, h in candidates:
            print(x, y, w, h)
            rect = mpatches.Rectangle(
                (x, y), w, h, fill=False, edgecolor='red', linewidth=1)
            ax.add_patch(rect)
        fig.savefig('MyFig.jpg')


    if __name__ == "__main__":
        main()



8.2 OverFeat

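Computer vision has three major tasks: classification (recognition), localization, and detection. From left to right each task is a subtask of the next, so the difficulty increases. OverFeat is a convolutional-network-based feature extraction framework proposed in the 2014 paper "OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks". Its main contribution is to solve image classification, localization, and detection within one unified framework. It also points out that a point on a feature map can be mapped back to a region of the original image, so some operations on the original image can instead be performed on the feature map, an idea that strongly influenced later detection algorithms. It won the localization task (task 3) of ImageNet 2013 and also performed well on detection and classification.

8.2.1 OverFeat for Classification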



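The first two layers use a smaller stride, which yields larger feature maps and improves model accuracy.

Panel (a) shows a feature map with 20 neurons along one dimension after the fifth convolutional layer. With non-overlapping pooling and stride=3 there are three ways to pool it, one for each starting offset 0, 1, and 2 (normally only the first is used).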


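In the two-dimensional case, the input image passes through the FCN up to the fifth convolutional layer and yields a number of feature maps. A 3x3 filter is then slid over each feature map (note that this is done on the feature map rather than on the original image, which saves a great deal of computation). Following the principle in the figure above, the sliding window is applied 9 times, starting from the offsets (0,0), (0,1), (0,2), (1,0), (1,1), (1,2), (2,0), (2,1), and (2,2). Each resulting feature map is passed through the following 3 FC layers, giving several groups of features, which are finally concatenated into the feature vector used for classification.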
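The offset pooling described above can be sketched in NumPy. This is only an illustration under simple assumptions: max pooling, a single-channel feature map, and windows that do not fit at the border are dropped; the function names are hypothetical and are not taken from the OverFeat code.

```python
import numpy as np


def offset_max_pool_1d(x, size=3):
    """Return one pooled vector per starting offset 0..size-1."""
    x = np.asarray(x)
    pooled = []
    for offset in range(size):
        n_windows = (len(x) - offset) // size
        # cut the signal into non-overlapping windows that start at `offset`
        windows = x[offset:offset + n_windows * size].reshape(n_windows, size)
        pooled.append(windows.max(axis=1))
    return pooled


def offset_max_pool_2d(fmap, size=3):
    """Return {(dy, dx): pooled map} for all size * size starting offsets."""
    fmap = np.asarray(fmap)
    out = {}
    for dy in range(size):
        for dx in range(size):
            h = (fmap.shape[0] - dy) // size
            w = (fmap.shape[1] - dx) // size
            crop = fmap[dy:dy + h * size, dx:dx + w * size]
            # split rows and columns into (window index, position in window)
            out[(dy, dx)] = crop.reshape(h, size, w, size).max(axis=(1, 3))
    return out


pools = offset_max_pool_1d(np.arange(20))
print([p.tolist() for p in pools])
maps = offset_max_pool_2d(np.arange(400).reshape(20, 20))
print(len(maps), maps[(0, 0)].shape)
```

With 20 inputs and a window of 3, each of the three offsets yields 6 pooled values, and the 2-D version yields 9 pooled maps, one per offset pair.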

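Green denotes the convolution kernel and blue the feature map. When the input is larger than the prescribed size, extra computation takes place in the yellow region, and the final output is a matrix rather than a single value. Various strategies can turn this matrix into the final result; a simple one is to use its mean as the final classification result.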
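The averaging strategy mentioned above fits in a few lines. The sketch assumes the fully convolutional network returns a score map of shape `(H, W, C)`, with one vector of class scores per spatial position; the function name and the toy scores are hypothetical.

```python
import numpy as np


def classify_from_score_map(score_map):
    """Average the class scores over all spatial positions and pick the best class."""
    mean_scores = np.asarray(score_map, dtype=np.float64).mean(axis=(0, 1))
    return int(mean_scores.argmax()), mean_scores


# Toy 2x3 score map with 4 classes (assumed values).
score_map = np.zeros((2, 3, 4))
score_map[..., 2] = 0.7   # class 2 scores well at every position
score_map[0, 0, 1] = 0.9  # class 1 scores well at a single position only
label, mean_scores = classify_from_score_map(score_map)
print(label, mean_scores)
```

Averaging favors classes that score consistently across positions; taking the maximum instead would favor a single strong local response.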

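8.2.2 OverFeat for Localization

The pooled output of layer 5, with 256 channels, is the input. Viewed in FCN terms, it first passes through a fully connected layer with 4096 channels and then one with 1024 channels. As before, offset pooling and a sliding window are used to produce a 4-channel matrix for each class, where the 4 channels hold the coordinates of the four edges of the bounding box (BB).

8.2.3 OverFeat for Detection


8.2.4 Code Practice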


8.3 R-CNN

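For years object detection relied on sliding windows, which are computationally inefficient. Meanwhile CNNs had achieved outstanding results on the ImageNet classification task, and how to leverage those results and the large amount of ImageNet training data is a question worth studying. R-CNN was proposed by Ross Girshick et al. in "Rich feature hierarchies for accurate object detection and semantic segmentation"; OverFeat can to some extent be viewed as a special case of R-CNN. R-CNN has been very influential in object detection. Its highlights are: it uses Selective Search instead of the traditional sliding window to generate candidate boxes and uses a CNN to extract features; it applies both classification and regression to detection; and when training data is scarce, it pretrains on data from a related domain for transfer learning and then fine-tunes on the target dataset.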

8.3.1 IoU

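IoU (intersection over union) measures the localization accuracy of a bounding box (BB). It is defined in the same way as the Jaccard index. If A is the manually annotated BB and B is the predicted BB, then:

$$\mathrm{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|}$$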
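A vectorized IoU computation is often convenient when many predicted boxes have to be compared with many annotated boxes. The sketch below assumes boxes in `(x1, y1, x2, y2)` format with continuous coordinates (no `+ 1` pixel correction); the function name `iou_matrix` is a hypothetical helper, not a library API.

```python
import numpy as np


def iou_matrix(boxes_a, boxes_b):
    """Pairwise IoU, shape (len(boxes_a), len(boxes_b))."""
    a = np.asarray(boxes_a, dtype=np.float64)[:, None, :]  # (Na, 1, 4)
    b = np.asarray(boxes_b, dtype=np.float64)[None, :, :]  # (1, Nb, 4)
    # width and height of the intersection rectangle, clipped at 0
    inter_w = np.clip(np.minimum(a[..., 2], b[..., 2]) - np.maximum(a[..., 0], b[..., 0]), 0, None)
    inter_h = np.clip(np.minimum(a[..., 3], b[..., 3]) - np.maximum(a[..., 1], b[..., 1]), 0, None)
    inter = inter_w * inter_h
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    union = area_a + area_b - inter
    # guard against empty boxes to avoid division by zero
    return np.where(union > 0, inter / np.maximum(union, 1e-12), 0.0)


annotated = [(0, 0, 10, 10)]
predicted = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30)]
print(iou_matrix(annotated, predicted))
```

For the three predictions the values are 1 for the identical box, 25/175 for the partially overlapping box, and 0 for the disjoint one.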

8.3.2 NMS

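NMS (non-maximum suppression) is used in object detection to remove, on the basis of confidence, duplicate candidate boxes that overlap too much, which improves the efficiency of the detection algorithm.



For the code, see: Non-Maximum Suppression for Object Detection in Python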

    # import the necessary packages
    import numpy as np


    # Felzenszwalb et al.
    def non_max_suppression_slow(boxes, overlapThresh):
        # if there are no boxes, return an empty list
        if len(boxes) == 0:
            return []
        # initialize the list of picked indexes
        pick = []
        # grab the coordinates and the confidence score of the bounding boxes
        x1 = boxes[:, 0]
        y1 = boxes[:, 1]
        x2 = boxes[:, 2]
        y2 = boxes[:, 3]
        scores = boxes[:, 4]
        # compute the area of the bounding boxes and sort the box
        # indexes by confidence score in ascending order
        area = (x2 - x1 + 1) * (y2 - y1 + 1)
        idxs = np.argsort(scores)
        # keep looping while some indexes still remain in the indexes list
        while len(idxs) > 0:
            # grab the last index (highest remaining score), add it to
            # the list of picked indexes, then initialize the suppression
            # list (i.e. indexes that will be deleted) with it
            last = len(idxs) - 1
            i = idxs[last]
            pick.append(i)
            suppress = [last]
            # loop over all remaining indexes in the indexes list
            for pos in range(0, last):
                # grab the current index
                j = idxs[pos]
                # find the largest (x, y) coordinates for the start of
                # the intersection and the smallest (x, y) coordinates
                # for its end
                xx1 = max(x1[i], x1[j])
                yy1 = max(y1[i], y1[j])
                xx2 = min(x2[i], x2[j])
                yy2 = min(y2[i], y2[j])
                # compute the width and height of the intersection
                w = max(0, xx2 - xx1 + 1)
                h = max(0, yy2 - yy1 + 1)
                # ratio of the intersection to the area of box j
                # (note: this is not IoU, the union is not used)
                overlap = float(w * h) / area[j]
                # if there is sufficient overlap, suppress box j
                if overlap > overlapThresh:
                    suppress.append(pos)
            # delete all indexes from the index list that are in the
            # suppression list
            idxs = np.delete(idxs, suppress)
        # return only the bounding boxes that were picked
        return boxes[pick]


    # import the necessary packages
    from pyimagesearch.nms import non_max_suppression_slow
    import numpy as np
    import cv2

    # construct a list containing the images that will be examined
    # along with their respective bounding boxes
    # the last value of each box is the classification confidence * 100
    images = [
        ("images/333.jpg", np.array([
            (285, 293, 713, 679, 96),
            (9, 309, 161, 719, 90),
            (703, 259, 959, 659, 93),
            (291, 309, 693, 663, 90),
            (1, 371, 155, 621, 80),
            (511, 347, 681, 637, 89),
            (293, 587, 721, 671, 70),
            (757, 469, 957, 641, 60)]))]

    # loop over the images
    for (imagePath, boundingBoxes) in images:
        # load the image and clone it
        print("[x] %d initial bounding boxes" % (len(boundingBoxes)))
        image = cv2.imread(imagePath)
        orig = image.copy()
        # loop over the bounding boxes for each image and draw them
        for (startX, startY, endX, endY, c) in boundingBoxes:
            cv2.rectangle(orig, (int(startX), int(startY)),
                          (int(endX), int(endY)), (0, 0, 255), 2)
        # perform non-maximum suppression on the bounding boxes
        pick = non_max_suppression_slow(boundingBoxes, 0.3)
        print("[x] after applying non-maximum, %d bounding boxes" % (len(pick)))
        # loop over the picked bounding boxes and draw them
        for (startX, startY, endX, endY, c) in pick:
            cv2.rectangle(image, (int(startX), int(startY)),
                          (int(endX), int(endY)), (0, 255, 0), 2)
        # display the images
        cv2.imshow("Original", orig)
        cv2.imshow("After NMS", image)
        cv2.waitKey(0)
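
The loop-based routine above can also be written in vectorized form. The following is only a sketch under stated assumptions: the function name `nms_iou` is made up for illustration, boxes are assumed to be rows of (x1, y1, x2, y2, score) as in the test data above, and the overlap measure is IoU (intersection over union) rather than the intersection-over-area ratio used by the slow version.

```python
import numpy as np

def nms_iou(boxes, iou_thresh=0.5):
    """Greedy NMS sketch. boxes: (N, 5) rows of x1, y1, x2, y2, score.
    Returns the indices of the boxes that are kept."""
    boxes = np.asarray(boxes, dtype=float)
    if len(boxes) == 0:
        return []
    x1, y1, x2, y2, score = boxes.T
    area = (x2 - x1 + 1) * (y2 - y1 + 1)
    # visit boxes from the highest score to the lowest
    order = np.argsort(-score, kind="stable")
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # intersection of box i with every remaining box
        xx1 = np.maximum(x1[i], x1[rest])
        yy1 = np.maximum(y1[i], y1[rest])
        xx2 = np.minimum(x2[i], x2[rest])
        yy2 = np.minimum(y2[i], y2[rest])
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        iou = inter / (area[i] + area[rest] - inter)
        # drop every remaining box that overlaps box i too much
        order = rest[iou <= iou_thresh]
    return keep

demo = np.array([[0, 0, 10, 10, 0.9],
                 [1, 1, 11, 11, 0.8],
                 [100, 100, 110, 110, 0.7]])
kept = nms_iou(demo, 0.5)
```

In this toy input the second box overlaps the first with an IoU of about 0.70 and is suppressed, while the third box does not overlap anything and is kept.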

8.3.3 mAP

First, what AP is, using the definition adopted from the PASCAL VOC Challenge 2010 onward.

No. Predicted score Ground truth
1 0.88 1
2 0.76 0
3 0.56 0
4 0.92 0
5 0.10 1
6 0.77 1
7 0.23 0
8 0.34 0
9 0.35 0
10 0.66 1
11 0.56 0
12 0.45 1
13 0.93 1
14 0.97 0
15 0.81 1
16 0.78 0
17 0.66 0
18 0.54 0
19 0.43 1
20 0.31 0
21 0.22 0
22 0.12 0
23 0.02 0
24 0.05 1
25 0.15 0
26 0.01 0
27 0.77 1
28 0.37 0
29 0.43 1
30 0.99 1


Sorted by predicted score in descending order:

No. Predicted score Ground truth
30 0.99 1
14 0.97 0
13 0.93 1
4 0.92 0
1 0.88 1
15 0.81 1
16 0.78 0
6 0.77 1
27 0.77 1
2 0.76 0
10 0.66 1
17 0.66 0
3 0.56 0
11 0.56 0
18 0.54 0
12 0.45 1
19 0.43 1
29 0.43 1
28 0.37 0
9 0.35 0
8 0.34 0
20 0.31 0
7 0.23 0
21 0.22 0
25 0.15 0
22 0.12 0
5 0.10 1
24 0.05 1
23 0.02 0
26 0.01 0


No.  Score  Label  Precision    Recall (r)    Max precision over recall r' ≥ r
30   0.99   1      1/1=1        1/12=0.08     1
14   0.97   0      1/2=0.5      1/12=0.08     -
13   0.93   1      2/3=0.67     2/12=0.17     0.67
4    0.92   0      2/4=0.5      2/12=0.17     -
1    0.88   1      3/5=0.6      3/12=0.25     0.67
15   0.81   1      4/6=0.67     4/12=0.33     0.67
16   0.78   0      4/7=0.57     4/12=0.33     -
6    0.77   1      5/8=0.63     5/12=0.42     0.67
27   0.77   1      6/9=0.67     6/12=0.5      0.67
2    0.76   0      6/10=0.6     6/12=0.5      -
10   0.66   1      7/11=0.64    7/12=0.58     0.64
17   0.66   0      7/12=0.58    7/12=0.58     -
3    0.56   0      7/13=0.54    7/12=0.58     -
11   0.56   0      7/14=0.5     7/12=0.58     -
18   0.54   0      7/15=0.47    7/12=0.58     -
12   0.45   1      8/16=0.5     8/12=0.67     0.56
19   0.43   1      9/17=0.53    9/12=0.75     0.56
29   0.43   1      10/18=0.56   10/12=0.83    0.56
28   0.37   0      10/19=0.53   10/12=0.83    -
9    0.35   0      10/20=0.5    10/12=0.83    -
8    0.34   0      10/21=0.48   10/12=0.83    -
20   0.31   0      10/22=0.45   10/12=0.83    -
7    0.23   0      10/23=0.43   10/12=0.83    -
21   0.22   0      10/24=0.42   10/12=0.83    -
25   0.15   0      10/25=0.4    10/12=0.83    -
22   0.12   0      10/26=0.38   10/12=0.83    -
5    0.10   1      11/27=0.41   11/12=0.92    0.43
24   0.05   1      12/28=0.43   12/12=1       0.43
23   0.02   0      12/29=0.41   12/12=1       -
26   0.01   0      12/30=0.4    12/12=1       -

AP is the mean of the last column over the 12 positive samples: AP ≈ 0.624 (computed from the unrounded fractions). Averaging the raw precision at each positive sample, without taking the maximum over higher recall levels, gives about 0.607 instead.
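
The ranking table can be reproduced with a few lines of Python. This is a minimal sketch, not the official VOC evaluation code: the function name `average_precision` is chosen here for illustration, and ties in the score are assumed to keep their original order, as in the sorted table. It returns both the plain average of the precision at each positive sample and the interpolated value, where each precision is replaced by the maximum precision at any recall r' ≥ r.

```python
# Scores and labels of the 30 samples, in the original (unsorted) order.
scores = [0.88, 0.76, 0.56, 0.92, 0.10, 0.77, 0.23, 0.34, 0.35, 0.66,
          0.56, 0.45, 0.93, 0.97, 0.81, 0.78, 0.66, 0.54, 0.43, 0.31,
          0.22, 0.12, 0.02, 0.05, 0.15, 0.01, 0.77, 0.37, 0.43, 0.99]
labels = [1, 0, 0, 0, 1, 1, 0, 0, 0, 1,
          0, 1, 1, 0, 1, 0, 0, 0, 1, 0,
          0, 0, 0, 1, 0, 0, 1, 0, 1, 1]

def average_precision(scores, labels):
    """Return (uninterpolated AP, interpolated AP)."""
    # sorted() is stable, so tied scores keep their original order
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(labels)
    precisions = []  # precision measured at each positive sample
    tp = 0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            tp += 1
            precisions.append(tp / rank)
    # interpolation: max precision over all recall levels r' >= r,
    # computed as a running maximum from the right
    interp = precisions[:]
    for k in range(len(interp) - 2, -1, -1):
        interp[k] = max(interp[k], interp[k + 1])
    return sum(precisions) / n_pos, sum(interp) / n_pos

ap_raw, ap_interp = average_precision(scores, labels)
```

Taking the running maximum only over positive samples is enough, because the precision at a negative sample is always lower than at the preceding positive sample with the same recall. mAP is then the mean of AP over all classes.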


8.3.4 How R-CNN works

Training stage. The whole process has 4 steps:



8.3.5 Code in practice

The author's code is excellent; see R-CNN: Region-based Convolutional Neural Networks.

8.4 SPP-Net

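SPP-Net was proposed by Kaiming He et al. in 《Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition》. Its main contribution is solving two problems:
2. Borrowing from OverFeat, features are extracted from the whole image only once; some operations run on the feature map instead of the original image, and points on the feature map can be mapped back to the original image.

8.4.1 Problem recap



Looking at the structure of a CNN, convolutional and pooling layers place no requirement on the input size; only fully connected layers need a fixed-size input, so the improvement targets the input of the fully connected layers. In addition, feature visualization shows that feature maps carry the spatial information of the image, so the new method must preserve spatial information too. The paper therefore adds an SPP layer, and the pipeline becomes: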

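8.4.2 SPP in detail

The task can be viewed as finding a mapping whose input size is variable, whose output size is fixed, and which preserves spatial information. Three quantities are involved: the size of the feature map, the number of bins (an idea borrowed from BoW, see 《Video Google: A Text Retrieval Approach to Object Matching in Videos》; the number of bins fixes the feature dimension), and the pooling stride. The feature map size varies while the number of bins is fixed, so the only thing left to adapt is the pooling window and its stride.
Suppose the last convolutional layer produces a feature map of size $a \times a$ and we want $n \times n$ bins. The window size is then $\lceil a/n \rceil$ and the stride is $\lfloor a/n \rfloor$. For example, $a = 13$ and $n = 3$ give a window of 5 and a stride of 4.

Each bin can be pooled with max pooling or with another pooling operation.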
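
The window and stride rule can be tried out with NumPy. This is a sketch under stated assumptions: the names `spp_window` and `spp_max_pool` and the pyramid levels (1, 2, 3) are chosen here for illustration, the feature map is assumed square with side a ≥ n, and max pooling is used in every bin.

```python
import math
import numpy as np

def spp_window(a, n):
    """Window size and stride that give n x n bins on an a x a feature map."""
    return math.ceil(a / n), a // n

def spp_max_pool(fmap, levels=(1, 2, 3)):
    """fmap: array of shape (C, a, a). Returns a vector of length C * sum(n * n)."""
    c, a, _ = fmap.shape
    out = []
    for n in levels:
        win, stride = spp_window(a, n)
        for i in range(n):
            for j in range(n):
                patch = fmap[:, i * stride:i * stride + win,
                             j * stride:j * stride + win]
                # one max per channel for this bin
                out.append(patch.max(axis=(1, 2)))
    return np.concatenate(out)

rng = np.random.default_rng(0)
vec13 = spp_max_pool(rng.random((4, 13, 13)))
vec10 = spp_max_pool(rng.random((4, 10, 10)))
```

Both inputs produce a vector of the same length, 4 × (1 + 4 + 9) = 56, which is what lets the fully connected layer that follows accept feature maps of different sizes.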


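8.4.3 Receptive field

The term receptive field comes from biology. Levine and Shefner, in 《Fundamentals of sensation and perception》, define it as the region in which stimulation causes a particular neuron to respond. For example, when a person looks at part of an object, the stimulus is projected onto the retina, passed on to the brain, and activates a certain region (the area inside the orange box).

In a CNN, any point on any feature map produced by any convolutional or pooling layer corresponds to a region of the original image; that region is the point's receptive field. For example, the red, green, and orange points have different receptive fields: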


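8.4.4 Mapping between the feature map and the original image

Because SPP-Net extracts features from the original image only once, a great deal of repeated work is saved. And because every feature point can be mapped back to the image, the SPP mapping for all candidate boxes only has to be done on the feature map of the last convolutional layer (otherwise every feature mapping inside the receptive field would have to be considered, which is very expensive).
See 《R-CNN minus R》 for details.
3. Let $(x, y)$ be any point in the original image and $(x', y')$ its position on a given feature map.
4. Let $S$ be the product of all strides from the original image up to that feature map.
Following the SPP-Net paper, the top-left corner of a candidate box in the original image maps to the feature map as:

$$x' = \lfloor x / S \rfloor + 1, \qquad y' = \lfloor y / S \rfloor + 1$$

The bottom-right corner of a candidate box maps to the feature map as:

$$x' = \lceil x / S \rceil - 1, \qquad y' = \lceil y / S \rceil - 1$$

In the other direction, a feature point $(x_L, y_L)$ on the feature map of layer $L$ maps back to the image as:

$$x_0 = \alpha_L (x_L - 1) + \beta_L, \qquad \alpha_L = \prod_{p=1}^{L} s_p, \qquad \beta_L = 1 + \sum_{p=1}^{L} \Big( \prod_{q=1}^{p-1} s_q \Big) \Big( \frac{k_p - 1}{2} - P_p \Big)$$

where $s_p$, $k_p$, and $P_p$ are the stride, filter size, and padding of layer $p$, and $y_0$ is obtained in the same way.

$(x_0, y_0)$ is the center of the receptive field of the feature point on the feature map;
$L$ is the index of the CNN layer that produced the feature map containing the feature point;
Conversely, the position of any candidate box of the original image on any feature map can be found.

The receptive field size is computed top-down, propagating layer by layer from the current layer toward the input layer:
Assume the feature point lies on the feature map of layer $L$ and $r$ is the size of its receptive field in the original image. Start with $r = 1$ and, for $l = L, L-1, \dots, 1$, update $r \leftarrow (r - 1)\, s_l + k_l$.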


8.4.5 Code in practice

    # -*- coding: utf-8 -*-
    # Each layer is described by a triple: [filter size, stride, padding]


    def forward(conv, layer_in):
        """Output size of one layer given its input size."""
        k, s, p = conv
        return (layer_in - k + 2 * p) // s + 1


    def alexnet():
        convnet = [[], [11, 4, 0], [3, 2, 0], [5, 1, 2], [3, 2, 0], [3, 1, 1],
                   [3, 1, 1], [3, 1, 1], [3, 2, 0], [6, 1, 0], [1, 1, 0]]
        layer_names = [['input'], 'conv1', 'pool1', 'conv2', 'pool2', 'conv3',
                       'conv4', 'conv5', 'pool5', 'fc6-conv', 'fc7-conv']
        return [convnet, layer_names]


    def testnet():
        convnet = [[], [2, 1, 0], [3, 3, 1]]
        layer_names = [['input'], 'conv1', 'conv2']
        return [convnet, layer_names]


    # layerid >= 1
    def receptivefield(net, layerid):
        # net[0][0] is the input placeholder, so valid ids are 1..len(net[0]) - 1
        if layerid >= len(net[0]):
            print('[error] receptivefield: no such layerid!')
            return 0
        rf = 1
        # walk from the current layer back toward the input
        for i in reversed(range(layerid)):
            filtersize, stride, padding = net[0][i + 1]
            rf = (rf - 1) * stride + filtersize
        print('  receptive field size: %d.' % rf)
        return rf


    def anylayerout(net, layerin, layerid):
        if layerid >= len(net[0]):
            print('[error] anylayerout: no such layerid!')
            return 0
        fout = layerin
        for i in range(layerid):
            fout = forward(net[0][i + 1], fout)
        print('layer: %s, output size: %d.' % (net[1][layerid], fout))
        return fout


    # x, y >= 1
    def receptivefieldcenter(net, layerid, x, y):
        if layerid >= len(net[0]):
            print('[error] receptivefieldcenter: no such layerid!')
            return 0
        al = 1
        bl = 1.0
        for i in range(layerid):
            filtersize, stride, padding = net[0][i + 1]
            # here al is still the product of the strides of the layers before this one
            bl += al * ((filtersize - 1) / 2.0 - padding)
            al *= stride
        xi0 = al * (x - 1) + bl
        yi0 = al * (y - 1) + bl
        print('  feature point (%d,%d) on this layer has its receptive field '
              'centered at (%.1f,%.1f) in the original image.' % (x, y, xi0, yi0))
        return (xi0, yi0)


    # net: a CNN description
    # insize: size of the input layer
    # totallayers: number of layers, not counting the input layer
    # x, y: coordinates of a feature point on a layer
    def printlayer(net, insize, totallayers, x, y):
        for i in range(totallayers):
            # output size of each layer
            anylayerout(net, insize, i + 1)
            # receptive field size of each layer
            receptivefield(net, i + 1)
            # center of the receptive field of point (x, y) in the original image
            receptivefieldcenter(net, i + 1, x, y)


    if __name__ == '__main__':
        # net = testnet()
        # printlayer(net, insize=6, totallayers=2, x=1, y=1)
        net = alexnet()
        printlayer(net, insize=227, totallayers=8, x=2, y=3)

8.5 Fast R-CNN

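《Fast R-CNN》 addresses the following problems of R-CNN and SPP-Net:

8.5.1 Algorithm overview


A visual comparison of the forward pipelines of R-CNN and Fast R-CNN

8.5.2 Training stage

The smooth L1 function is insensitive to outliers (for large |x| it uses a linear piece instead of a quadratic one), as shown in the figure: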
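
As a small illustration of that behavior, here is a sketch of the smooth L1 function as defined in the Fast R-CNN paper: 0.5·x² for |x| < 1 and |x| − 0.5 otherwise. The function name `smooth_l1` is chosen here for illustration.

```python
import numpy as np

def smooth_l1(x):
    """Element-wise smooth L1: quadratic near zero, linear for |x| >= 1."""
    x = np.asarray(x, dtype=float)
    absx = np.abs(x)
    return np.where(absx < 1.0, 0.5 * x ** 2, absx - 0.5)

values = smooth_l1([-3.0, -0.5, 0.0, 0.5, 1.0, 3.0])
```

For an error of 3 the loss is 2.5, whereas a pure squared loss 0.5·x² would give 4.5; this is why a single outlier pulls less strongly on the regression.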

8.5.3 Code in practice

For the complete Fast R-CNN code, see rbgirshick/fast-rcnn.

    // ------------------------------------------------------------------
    // Fast R-CNN
    // Copyright (c) 2015 Microsoft
    // Licensed under The MIT License [see fast-rcnn/LICENSE for details]
    // Written by Ross Girshick
    // ------------------------------------------------------------------
    #include <cfloat>
    #include "caffe/fast_rcnn_layers.hpp"

    using std::max;
    using std::min;

    namespace caffe {

    // The parameter notes below use VGG16 as the example: the network in
    // front of RoI pooling is the classic VGG16.
    // In a Layer, input data is called bottom and output data is called top.
    template <typename Dtype>
    __global__ void ROIPoolForward(
        // Number of work items = number of output elements after RoI pooling:
        // RoIs (m) x channels (512 for VGG16 conv5_3) x pooled width (7)
        // x pooled height (7) = 25088 x m
        const int nthreads,
        // Input feature map, obtained from the image by the forward pass through
        // the conv and pooling layers (VGG16 conv5_3 output, size 512 x 14 x 14)
        const Dtype* bottom_data,
        // Reciprocal of the product of all strides before this layer, 1/16 in
        // Fast R-CNN. Mapping from the image to the conv5_3 feature map shrinks
        // coordinates, so multiply by 1/16; the reverse direction multiplies by 16
        const Dtype spatial_scale,
        const int channels,       // channels of the input feature map (512 for conv5_3)
        const int height,         // height of the input feature map (14)
        const int width,          // width of the input feature map (14)
        const int pooled_height,  // output height of RoI pooling, h = 7 in Fast R-CNN
        const int pooled_width,   // output width of RoI pooling, w = 7 in Fast R-CNN
        // RoI data for all RoIs (or one batch of RoIs), each stored as
        // [batch_ind, x1, y1, x2, y2]: image index in the batch, top-left corner,
        // bottom-right corner
        const Dtype* bottom_rois,
        Dtype* top_data,          // output feature map of RoI pooling
        // For each output element, the index of the conv5_3 element selected by
        // max pooling; its length equals nthreads
        int* argmax_data) {
      // index is the work-item index, one per output value, in [0, nthreads - 1]
      CUDA_KERNEL_LOOP(index, nthreads) {
        // W coordinate in the top blob (N, C, H, W): output column, in [0, pooled_width - 1]
        int pw = index % pooled_width;
        // H coordinate in the top blob (N, C, H, W): output row, in [0, pooled_height - 1]
        int ph = (index / pooled_width) % pooled_height;
        // C coordinate in the top blob (N, C, H, W): the channel, bounded by the
        // channel count of the input feature map (512 for VGG16)
        int c = (index / pooled_width / pooled_height) % channels;
        // Which RoI this work item belongs to, out of m
        int n = index / pooled_width / pooled_height / channels;
        // Move to the record of this RoI. The pointer advances in multiples of 5
        // because each record is [batch_ind, x1, y1, x2, y2]
        bottom_rois += n * 5;
        // bottom_rois[0] holds the batch index of the image the RoI belongs to
        int roi_batch_ind = bottom_rois[0];
        // Map the RoI from image coordinates to the feature map (conv5_3 for VGG16)
        // by multiplying by 1/16. This is a coarse mapping, a simplification of
        // the exact mapping described earlier
        int roi_start_w = round(bottom_rois[1] * spatial_scale);
        int roi_start_h = round(bottom_rois[2] * spatial_scale);
        int roi_end_w = round(bottom_rois[3] * spatial_scale);
        int roi_end_h = round(bottom_rois[4] * spatial_scale);
        // Force the RoI to be at least 1x1 so that a mapped RoI never has size 0
        int roi_width = max(roi_end_w - roi_start_w + 1, 1);
        int roi_height = max(roi_end_h - roi_start_h + 1, 1);
        // Bin height adapts to the mapped RoI height and the pooled height (7)
        Dtype bin_size_h = static_cast<Dtype>(roi_height)
                           / static_cast<Dtype>(pooled_height);
        // Bin width adapts to the mapped RoI width and the pooled width (7)
        Dtype bin_size_w = static_cast<Dtype>(roi_width)
                           / static_cast<Dtype>(pooled_width);
        // Extent of bin (ph, pw) relative to the RoI; it determines the region
        // that max pooling runs over
        int hstart = static_cast<int>(floor(static_cast<Dtype>(ph)
                                            * bin_size_h));
        int wstart = static_cast<int>(floor(static_cast<Dtype>(pw)
                                            * bin_size_w));
        int hend = static_cast<int>(ceil(static_cast<Dtype>(ph + 1)
                                         * bin_size_h));
        int wend = static_cast<int>(ceil(static_cast<Dtype>(pw + 1)
                                         * bin_size_w));
        // Final max pooling region. The RoI does not start at (0, 0), so
        // roi_start_h / roi_start_w are added as offsets, and anything outside
        // the feature map is clipped away
        hstart = min(max(hstart + roi_start_h, 0), height);
        hend = min(max(hend + roi_start_h, 0), height);
        wstart = min(max(wstart + roi_start_w, 0), width);
        wend = min(max(wend + roi_start_w, 0), width);
        bool is_empty = (hend <= hstart) || (wend <= wstart);
        // An empty bin outputs 0; otherwise start the search from -FLT_MAX
        Dtype maxval = is_empty ? 0 : -FLT_MAX;
        // If nothing is pooled, argmax = -1 causes nothing to be backprop'd
        int maxidx = -1;
        bottom_data += (roi_batch_ind * channels + c) * height * width;
        // Max pooling over the region of this bin
        for (int h = hstart; h < hend; ++h) {
          for (int w = wstart; w < wend; ++w) {
            int bottom_index = h * width + w;
            if (bottom_data[bottom_index] > maxval) {
              maxval = bottom_data[bottom_index];
              maxidx = bottom_index;
            }
          }
        }
        // Store the pooled value and the index of the conv5_3 (VGG16) element
        // it came from
        top_data[index] = maxval;
        argmax_data[index] = maxidx;
      }
    }

    template <typename Dtype>
    void ROIPoolingLayer<Dtype>::Forward_gpu(
        // With VGG16, bottom[0] is the feature map of the last conv layer conv5_3,
        // shape [1, 512, 14, 14]; bottom[1] holds the RoIs, shape [m, 5]
        const vector<Blob<Dtype>*>& bottom,
        // top is the output blob; top[0]->count() = number of RoIs x channels
        // x output width x output height
        const vector<Blob<Dtype>*>& top) {
      const Dtype* bottom_data = bottom[0]->gpu_data();
      const Dtype* bottom_rois = bottom[1]->gpu_data();
      Dtype* top_data = top[0]->mutable_gpu_data();
      int* argmax_data = max_idx_.mutable_gpu_data();
      int count = top[0]->count();
      /*
      See caffe-fast-rcnn/src/caffe/layers/roi_pooling_layer.cpp:
      template <typename Dtype>
      void ROIPoolingLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
          const vector<Blob<Dtype>*>& top) {
        channels_ = bottom[0]->channels();
        height_ = bottom[0]->height();
        width_ = bottom[0]->width();
        top[0]->Reshape(bottom[1]->num(), channels_, pooled_height_, pooled_width_);
        max_idx_.Reshape(bottom[1]->num(), channels_, pooled_height_, pooled_width_);
      }*/
      /*
      See caffe-fast-rcnn/include/caffe/util/device_alternate.hpp:

      #define CUDA_KERNEL_LOOP(i, n) \
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; \
             i < (n); \
             i += blockDim.x * gridDim.x)

      // CUDA: number of blocks for threads.
      inline int CAFFE_GET_BLOCKS(const int N) {
        return (N + CAFFE_CUDA_NUM_THREADS - 1) / CAFFE_CUDA_NUM_THREADS;
      }

      // CUDA: thread number configuration.
      // Use 1024 threads per block, which requires cuda sm_2x or above,
      // or fall back to attempt compatibility (best of luck to you).
      #if __CUDA_ARCH__ >= 200
        const int CAFFE_CUDA_NUM_THREADS = 1024;
      #else
        const int CAFFE_CUDA_NUM_THREADS = 512;
      #endif
      */
      ROIPoolForward<Dtype><<<CAFFE_GET_BLOCKS(count), CAFFE_CUDA_NUM_THREADS>>>(
          count, bottom_data, spatial_scale_, channels_, height_, width_,
          pooled_height_, pooled_width_, bottom_rois, top_data, argmax_data);
      CUDA_POST_KERNEL_CHECK;
    }

    // Back-propagation follows the formula in the paper's section
    // "Back-propagation through RoI pooling layers"
    template <typename Dtype>
    __global__ void ROIPoolBackward(
        const int nthreads,        // number of elements in the input feature map (512 x 14 x 14 per image for VGG16)
        const Dtype* top_diff,     // gradient on the RoI pooling output, dL/dy(r, j)
        const int* argmax_data,    // same as in the forward pass
        const int num_rois,        // number of RoIs
        const Dtype spatial_scale, // same as in the forward pass
        const int channels,        // same as in the forward pass
        const int height,          // same as in the forward pass
        const int width,           // same as in the forward pass
        const int pooled_height,   // same as in the forward pass
        const int pooled_width,    // same as in the forward pass
        Dtype* bottom_diff,        // receives the back-propagated gradient of every input element
        const Dtype* bottom_rois) {  // same as in the forward pass
      // Same loop as in the forward pass, but here each work item is one element
      // of the input feature map
      CUDA_KERNEL_LOOP(index, nthreads) {
        // (n, c, h, w) is an entry in the bottom offset
        int w = index % width;
        int h = (index / width) % height;
        int c = (index / width / height) % channels;
        int n = index / width / height / channels;
        Dtype gradient = 0;
        // As in the paper: the gradient of an input element is the sum of the
        // gradients of all RoI pooling outputs whose bin contains the element
        // and for which the element was the selected maximum.
        // Loop over all RoIs to check that condition
        for (int roi_n = 0; roi_n < num_rois; ++roi_n) {
          const Dtype* offset_bottom_rois = bottom_rois + roi_n * 5;
          int roi_batch_ind = offset_bottom_rois[0];
          // Skip RoIs that belong to a different image of the batch
          if (n != roi_batch_ind) {
            continue;
          }
          // Map the RoI onto the feature map, as in the forward pass
          int roi_start_w = round(offset_bottom_rois[1] * spatial_scale);
          int roi_start_h = round(offset_bottom_rois[2] * spatial_scale);
          int roi_end_w = round(offset_bottom_rois[3] * spatial_scale);
          int roi_end_h = round(offset_bottom_rois[4] * spatial_scale);
          // Skip if (h, w) lies outside the RoI
          const bool in_roi = (w >= roi_start_w && w <= roi_end_w &&
                               h >= roi_start_h && h <= roi_end_h);
          if (!in_roi) {
            continue;
          }
          int offset = (roi_n * channels + c) * pooled_height * pooled_width;
          const Dtype* offset_top_diff = top_diff + offset;
          const int* offset_argmax_data = argmax_data + offset;
          // Same as in the forward pass
          int roi_width = max(roi_end_w - roi_start_w + 1, 1);
          int roi_height = max(roi_end_h - roi_start_h + 1, 1);
          // Same as in the forward pass
          Dtype bin_size_h = static_cast<Dtype>(roi_height)
                             / static_cast<Dtype>(pooled_height);
          Dtype bin_size_w = static_cast<Dtype>(roi_width)
                             / static_cast<Dtype>(pooled_width);
          // Inverse of the forward computation: the range of bins that may
          // contain (h, w)
          int phstart = floor(static_cast<Dtype>(h - roi_start_h) / bin_size_h);
          int phend = ceil(static_cast<Dtype>(h - roi_start_h + 1) / bin_size_h);
          int pwstart = floor(static_cast<Dtype>(w - roi_start_w) / bin_size_w);
          int pwend = ceil(static_cast<Dtype>(w - roi_start_w + 1) / bin_size_w);
          phstart = min(max(phstart, 0), pooled_height);
          phend = min(max(phend, 0), pooled_height);
          pwstart = min(max(pwstart, 0), pooled_width);
          pwend = min(max(pwend, 0), pooled_width);
          // Accumulate the gradients of all RoI pooling outputs that selected
          // this input element
          for (int ph = phstart; ph < phend; ++ph) {
            for (int pw = pwstart; pw < pwend; ++pw) {
              if (offset_argmax_data[ph * pooled_width + pw] == (h * width + w)) {
                gradient += offset_top_diff[ph * pooled_width + pw];
              }
            }
          }
        }
        // Store the back-propagated gradient of this input element
        bottom_diff[index] = gradient;
      }
    }

    template <typename Dtype>
    void ROIPoolingLayer<Dtype>::Backward_gpu(
        const vector<Blob<Dtype>*>& top,      // RoI pooling output feature map
        const vector<bool>& propagate_down,   // whether to back-propagate to each bottom blob
        const vector<Blob<Dtype>*>& bottom) { // RoI pooling input (conv5_3 feature map for VGG16)
      if (!propagate_down[0]) {
        return;
      }
      const Dtype* bottom_rois = bottom[1]->gpu_data();    // original RoI data
      const Dtype* top_diff = top[0]->gpu_diff();          // gradient of the RoI pooling output
      Dtype* bottom_diff = bottom[0]->mutable_gpu_diff();  // gradient of the input feature map, to be written
      const int count = bottom[0]->count();                // number of elements in the input feature map
      caffe_gpu_set(count, Dtype(0.), bottom_diff);
      const int* argmax_data = max_idx_.gpu_data();
      // NOLINT_NEXT_LINE(whitespace/operators)
      ROIPoolBackward<Dtype><<<CAFFE_GET_BLOCKS(count), CAFFE_CUDA_NUM_THREADS>>>(
          count, top_diff, argmax_data, top[0]->num(), spatial_scale_, channels_,
          height_, width_, pooled_height_, pooled_width_, bottom_diff, bottom_rois);
      CUDA_POST_KERNEL_CHECK;
    }

    INSTANTIATE_LAYER_GPU_FUNCS(ROIPoolingLayer);

    }  // namespace caffe



    layer {
      name: "conv5_3"
      type: "Convolution"
      bottom: "conv5_2"
      top: "conv5_3"
      param {
        lr_mult: 1
      }
      param {
        lr_mult: 2
      }
      convolution_param {
        num_output: 512
        pad: 1
        kernel_size: 3
      }
    }
    layer {
      name: "relu5_3"
      type: "ReLU"
      bottom: "conv5_3"
      top: "conv5_3"
    }
    layer {
      name: "roi_pool5"
      type: "ROIPooling"
      bottom: "conv5_3"
      bottom: "rois"
      top: "pool5"
      roi_pooling_param {
        pooled_w: 7
        pooled_h: 7
        spatial_scale: 0.0625 # 1/16
      }
    }
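
To make the bin arithmetic of the RoI pooling forward pass easier to follow, here is a NumPy sketch for a single RoI on a single image. It is an illustration under stated assumptions, not the Caffe implementation: the function name `roi_max_pool` is made up, coordinates are assumed non-negative (so `floor(v + 0.5)` stands in for the C `round`), and the argmax bookkeeping needed for back-propagation is left out.

```python
import math
import numpy as np

def roi_max_pool(fmap, roi, pooled_h=7, pooled_w=7, spatial_scale=1.0 / 16):
    """fmap: (C, H, W) feature map; roi: (x1, y1, x2, y2) in image coordinates."""
    C, H, W = fmap.shape
    # coarse mapping from image coordinates to feature-map coordinates
    x1, y1, x2, y2 = [int(math.floor(v * spatial_scale + 0.5)) for v in roi]
    roi_w = max(x2 - x1 + 1, 1)  # at least 1x1
    roi_h = max(y2 - y1 + 1, 1)
    bin_h = roi_h / pooled_h
    bin_w = roi_w / pooled_w
    out = np.zeros((C, pooled_h, pooled_w))
    for ph in range(pooled_h):
        for pw in range(pooled_w):
            # bin extent, shifted by the RoI origin and clipped to the map
            hs = min(max(math.floor(ph * bin_h) + y1, 0), H)
            he = min(max(math.ceil((ph + 1) * bin_h) + y1, 0), H)
            ws = min(max(math.floor(pw * bin_w) + x1, 0), W)
            we = min(max(math.ceil((pw + 1) * bin_w) + x1, 0), W)
            if he > hs and we > ws:  # empty bins stay 0
                out[:, ph, pw] = fmap[:, hs:he, ws:we].max(axis=(1, 2))
    return out

fmap = np.arange(16, dtype=float).reshape(1, 4, 4)
pooled = roi_max_pool(fmap, roi=(0, 0, 48, 48), pooled_h=2, pooled_w=2)
```

With this toy 4×4 map the RoI covers the whole map, each bin is a 2×2 block, and the pooled output is [[5, 7], [13, 15]].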

8.6 Faster R-CNN

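《Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks》 introduces the Region Proposal Network (RPN). It removes the need for region-based detectors to generate candidate boxes beforehand with Selective Search: proposal generation, classification, and bounding-box regression all share one feature-extraction network, which makes this family of detectors end to end in a real sense.

8.6.1 Algorithm overview

As noted above, Faster R-CNN uses the RPN so that proposal generation shares the feature-extraction network. The pipeline is as follows:

The RPN generates the proposals; the rest of the pipeline resembles Fast R-CNN. As before, the scan that generates proposals runs over the feature map of the last convolutional layer rather than over the original image, and the coordinate conversion described earlier maps any point on the feature map back to the image.

8.6.2 RPN


1. The input to the RPN is the feature map produced by the last convolutional (or pooling) layer of the feature extractor. For VGG16 this is the 512-channel feature map from conv5_3 (the example in the figure uses 256 channels);
2. An m×m sliding window then scans the feature map. If the feature map is h×w, the window is applied h×w times (once centered on every position). The paper uses m = 3; the choice depends on the network architecture, because different receptive fields lead to different initial proposal sizes;

8.6.3 Anchor

An important concept in the RPN is the anchor. Think of anchors as templates for generating candidate boxes; they are generated only once in the RPN. Anchors are defined relative to the original image: starting from the 4-tuple (0, 0, given width, given height), applying different aspect ratios and scales yields a set of box templates, and the center (x, y) of the sliding window is then combined with the anchors to produce candidate boxes. Anchors can also be understood as an inverse of SPP: SPP turns one feature map into a pyramid of feature maps through multi-scale pooling, and conversely any feature map can be expanded into several scales. The benefit is that the original image never has to be rescaled; everything happens on the feature map. The scheme is also translation invariant (Translation-Invariant Anchors): both the anchors and the functions that predict proposals relative to them are the same at every position, so an object that shifts in the image is handled by the same predictor. By contrast, 800 anchors obtained by k-means clustering are not translation invariant, and repeating the clustering may not even return the same 800 anchors. This property reduces the model size and the risk of overfitting.

Taking a base size of 16×16, i.e. the base anchor [0, 0, 15, 15], as an example: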




    # --------------------------------------------------------
    # Faster R-CNN
    # Copyright (c) 2015 Microsoft
    # Licensed under The MIT License [see LICENSE for details]
    # Written by Ross Girshick and Sean Bell
    # --------------------------------------------------------
    import numpy as np

    # Verify that we compute the same anchors as Shaoqing's matlab implementation:
    #
    #    >> load output/rpn_cachedir/faster_rcnn_VOC2007_ZF_stage1_rpn/anchors.mat
    #    >> anchors
    #
    #    anchors =
    #
    #       -83   -39   100    56
    #      -175   -87   192   104
    #      -359  -183   376   200
    #       -55   -55    72    72
    #      -119  -119   136   136
    #      -247  -247   264   264
    #       -35   -79    52    96
    #       -79  -167    96   184
    #      -167  -343   184   360

    # array([[ -83.,  -39.,  100.,   56.],
    #        [-175.,  -87.,  192.,  104.],
    #        [-359., -183.,  376.,  200.],
    #        [ -55.,  -55.,   72.,   72.],
    #        [-119., -119.,  136.,  136.],
    #        [-247., -247.,  264.,  264.],
    #        [ -35.,  -79.,   52.,   96.],
    #        [ -79., -167.,   96.,  184.],
    #        [-167., -343.,  184.,  360.]])


    # Generate multi-scale anchors. By default the base size is 16, the base
    # anchor is (0, 0, 15, 15) (top-left and bottom-right corners), the aspect
    # ratios are 1/2, 1, 2 and the scale factors are 8, 16, 32.
    def generate_anchors(base_size=16, ratios=[0.5, 1, 2],
                         scales=2**np.arange(3, 6)):
        """
        Generate anchor (reference) windows by enumerating aspect ratios X
        scales wrt a reference (0, 0, 15, 15) window.
        """
        # base anchor (0, 0, 15, 15)
        base_anchor = np.array([1, 1, base_size, base_size]) - 1
        # enumerate the three aspect ratios 1/2, 1, 2
        ratio_anchors = _ratio_enum(base_anchor, ratios)
        # apply the three scales 8, 16, 32 to each ratio, 9 anchors in total
        anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
                             for i in range(ratio_anchors.shape[0])])
        return anchors


    # Return width, height and center of an anchor (an anchor stores its
    # top-left and bottom-right corners)
    def _whctrs(anchor):
        """
        Return width, height, x center, and y center for an anchor (window).
        """
        w = anchor[2] - anchor[0] + 1
        h = anchor[3] - anchor[1] + 1
        x_ctr = anchor[0] + 0.5 * (w - 1)
        y_ctr = anchor[1] + 0.5 * (h - 1)
        return w, h, x_ctr, y_ctr


    # Given widths, heights and a center, output the top-left and bottom-right
    # corners of the anchors
    def _mkanchors(ws, hs, x_ctr, y_ctr):
        """
        Given a vector of widths (ws) and heights (hs) around a center
        (x_ctr, y_ctr), output a set of anchors (windows).
        """
        ws = ws[:, np.newaxis]
        hs = hs[:, np.newaxis]
        anchors = np.hstack((x_ctr - 0.5 * (ws - 1),
                             y_ctr - 0.5 * (hs - 1),
                             x_ctr + 0.5 * (ws - 1),
                             y_ctr + 0.5 * (hs - 1)))
        return anchors


    # Enumerate the three height:width ratios 1:2, 1:1, 2:1 of an anchor
    def _ratio_enum(anchor, ratios):
        """
        Enumerate a set of anchors for each aspect ratio wrt an anchor.
        """
        w, h, x_ctr, y_ctr = _whctrs(anchor)
        size = w * h
        size_ratios = size / ratios
        ws = np.round(np.sqrt(size_ratios))
        hs = np.round(ws * ratios)
        anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
        return anchors


    # Enumerate the scales of an anchor, e.g. anchor [0 0 15 15] with scales [8 16 32]
    def _scale_enum(anchor, scales):
        """
        Enumerate a set of anchors for each scale wrt an anchor.
        """
        w, h, x_ctr, y_ctr = _whctrs(anchor)
        ws = w * scales
        hs = h * scales
        anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
        return anchors


    if __name__ == '__main__':
        import time
        t = time.time()
        a = generate_anchors()
        print(time.time() - t)
        print(a)
        from IPython import embed; embed()

8.6.4 Code in practice


    layer {
      name: "rpn_conv/3x3"
      type: "Convolution"
      bottom: "conv5_3"
      top: "rpn/output"
      param { lr_mult: 1.0 }
      param { lr_mult: 2.0 }
      convolution_param {
        num_output: 512
        kernel_size: 3 pad: 1 stride: 1
        weight_filler { type: "gaussian" std: 0.01 }
        bias_filler { type: "constant" value: 0 }
      }
    }
    layer {
      name: "rpn_relu/3x3"
      type: "ReLU"
      bottom: "rpn/output"
      top: "rpn/output"
    }
    layer {
      name: "rpn_cls_score"
      type: "Convolution"
      bottom: "rpn/output"
      top: "rpn_cls_score"
      param { lr_mult: 1.0 }
      param { lr_mult: 2.0 }
      convolution_param {
        num_output: 18   # 2(bg/fg) * 9(anchors)
        kernel_size: 1 pad: 0 stride: 1
        weight_filler { type: "gaussian" std: 0.01 }
        bias_filler { type: "constant" value: 0 }
      }
    }
    layer {
      name: "rpn_bbox_pred"
      type: "Convolution"
      bottom: "rpn/output"
      top: "rpn_bbox_pred"
      param { lr_mult: 1.0 }
      param { lr_mult: 2.0 }
      convolution_param {
        num_output: 36   # 4 * 9(anchors)
        kernel_size: 1 pad: 0 stride: 1
        weight_filler { type: "gaussian" std: 0.01 }
        bias_filler { type: "constant" value: 0 }
      }
    }
    layer {
      bottom: "rpn_cls_score"
      top: "rpn_cls_score_reshape"
      name: "rpn_cls_score_reshape"
      type: "Reshape"
      reshape_param { shape { dim: 0 dim: 2 dim: -1 dim: 0 } }
    }
    layer {
      name: 'rpn-data'
      type: 'Python'
      bottom: 'rpn_cls_score'
      bottom: 'gt_boxes'
      bottom: 'im_info'
      bottom: 'data'
      top: 'rpn_labels'
      top: 'rpn_bbox_targets'
      top: 'rpn_bbox_inside_weights'
      top: 'rpn_bbox_outside_weights'
      python_param {
        module: 'rpn.anchor_target_layer'
        layer: 'AnchorTargetLayer'
        param_str: "'feat_stride': 16"
      }
    }
    layer {
      name: "rpn_loss_cls"
      type: "SoftmaxWithLoss"
      bottom: "rpn_cls_score_reshape"
      bottom: "rpn_labels"
      propagate_down: 1
      propagate_down: 0
      top: "rpn_cls_loss"
      loss_weight: 1
      loss_param {
        ignore_label: -1
        normalize: true
      }
    }
    layer {
      name: "rpn_loss_bbox"
      type: "SmoothL1Loss"
      bottom: "rpn_bbox_pred"
      bottom: "rpn_bbox_targets"
      bottom: 'rpn_bbox_inside_weights'
      bottom: 'rpn_bbox_outside_weights'
      top: "rpn_loss_bbox"
      loss_weight: 1
      smooth_l1_loss_param { sigma: 3.0 }
    }

    def setup(self, bottom, top):
        # parse the layer parameter string, which must be valid YAML
        layer_params = yaml.safe_load(self.param_str_)
        # product of the strides of all feature-extraction layers (16 for VGG)
        self._feat_stride = layer_params['feat_stride']
        # initial anchor scales: 8, 16, 32
        anchor_scales = layer_params.get('scales', (8, 16, 32))
        # build the anchor templates with the method described above
        self._anchors = generate_anchors(scales=np.array(anchor_scales))
        # number of anchors (9 here)
        self._num_anchors = self._anchors.shape[0]
        if DEBUG:
            print('feat_stride: {}'.format(self._feat_stride))
            print('anchors:')
            print(self._anchors)
        # rois blob: holds R regions of interest, each is a 5-tuple
        # (n, x1, y1, x2, y2) specifying an image batch index n and a
        # rectangle (x1, y1, x2, y2)
        top[0].reshape(1, 5)
        # scores blob: holds scores for R regions of interest
        if len(top) > 1:
            top[1].reshape(1, 1, 1, 1)



    A = self._num_anchors
    K = shifts.shape[0]
    anchors = self._anchors.reshape((1, A, 4)) + \
              shifts.reshape((1, K, 4)).transpose((1, 0, 2))
    anchors = anchors.reshape((K * A, 4))
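
The snippet above relies on a `shifts` array that is not shown. The following self-contained sketch shows one way to build it and how the broadcast places every anchor template at every feature-map position. The function name `shift_anchors` is made up for illustration, the feature map size (2×3) is arbitrary, and only two of the nine base anchors listed earlier are used to keep the example small.

```python
import numpy as np

def shift_anchors(base_anchors, feat_h, feat_w, feat_stride=16):
    """Place every base anchor at every feature-map position.
    Returns an array of shape (feat_h * feat_w * A, 4) in image coordinates."""
    # offsets of each feature-map cell in the original image
    shift_x = np.arange(feat_w) * feat_stride
    shift_y = np.arange(feat_h) * feat_stride
    shift_x, shift_y = np.meshgrid(shift_x, shift_y)
    # one row (dx, dy, dx, dy) per cell, so both corners move together
    shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                        shift_x.ravel(), shift_y.ravel())).transpose()
    A = base_anchors.shape[0]
    K = shifts.shape[0]
    # (1, A, 4) + (K, 1, 4) broadcasts to (K, A, 4)
    anchors = base_anchors.reshape((1, A, 4)) + \
        shifts.reshape((1, K, 4)).transpose((1, 0, 2))
    return anchors.reshape((K * A, 4))

base = np.array([[-83., -39., 100., 56.],
                 [-55., -55., 72., 72.]])
all_anchors = shift_anchors(base, feat_h=2, feat_w=3)
```

The result has K × A = 6 × 2 = 12 rows, ordered so that all anchors of one position are adjacent; moving one cell to the right adds 16 to both x coordinates.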

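8.6.5 Faster R-CNN training procedure

Training uses 4-Step Alternating Training.
2. Initialize from ImageNet pre-trained weights and train a separate Fast R-CNN detection model, using the proposals produced in the previous step as input (at this point the convolutional layers are not yet shared);
Faster R-CNN is one of the best-performing models for object detection and classification. Using it for real-time detection or deploying it on the client, however, takes a great deal of model pruning, compression, and optimization work, which I will cover later. Our work so far is fairly preliminary: the model is compressed to about 10 MB with an accuracy loss below 1.5%, and online inference takes about 20 ms per request for an image of about 500 KB on a single machine with a single K80 GPU (under high concurrency, batching and other techniques are used to raise throughput).


8.6.6 Faster R-CNN with Caffe

Source code: Faster R-CNN (rbgirshick's version). Be careful: caffe has a problem here (I consider it an architectural design flaw, one that tensorflow does not have). Because custom layers and similar needs have to be supported, everyone's caffe version may differ, so take care when compiling. For example, the caffe used here must be revision 0dcd397, otherwise the build fails, because this version contains the custom proposal layer and its parameters.

Compiling and running caffe and Faster R-CNN on CentOS 7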


git clone https://github.com/gflags/gflags
cd gflags
mkdir build && cd build
export CXXFLAGS="-fPIC" && cmake ..
make VERBOSE=1 -j
sudo make install


git clone https://github.com/google/glog
cd glog
./autogen.sh && ./configure && make && make install

10. Install lmdb

git clone https://github.com/LMDB/lmdb
cd lmdb/libraries/liblmdb
make -j
sudo make install

11. Install hdf5

wget https://support.hdfgroup.org/ftp/HDF5/current18/src/hdf5-1.8.19.tar.gz
tar -xvf hdf5-1.8.19.tar.gz
cd hdf5-1.8.19
./configure --prefix=/usr/local
make -j
sudo make install

12. Install leveldb

git clone https://github.com/google/leveldb
cd leveldb
make -j
sudo cp out-shared/libleveldb.so* /usr/local/lib
sudo cp out-static/.a /usr/local/lib
sudo cp -r include/


cd py-faster-rcnn
git clone https://github.com/rbgirshick/caffe-fast-rcnn.git



cd caffe-fast-rcnn
cp Makefile.config.example Makefile.config
vim Makefile.config

1) Set CUDA_DIR, for example: CUDA_DIR := /usr/local/cuda
2) BLAS := open


make clean
make all -j
make test -j
make runtest -j
make pycaffe -j


cd py-faster-rcnn/lib/

vim ~/.bashrc

export PYTHONPATH=/data/liyiran/py-R-FCN/tools/python:$PYTHONPATH
source ~/.bashrc


cd py-faster-rcnn/data
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
tar -xvf VOCtrainval_06-Nov-2007.tar
mv VOCdevkit VOCdevkit2007


cd py-faster-rcnn/model
wget https://dl.dropboxusercontent.com/s/gstw7122padlf0l/imagenet_models.tgz?dl=0

3. Train with VGG16 on the pascal_voc 2007 dataset

sh experiments/scripts/faster_rcnn_end2end.sh 1 VGG16 pascal_voc

8.7 R-FCN

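All the region-based detectors so far have one thing in common: the network is split into two parts, a fully convolutional subnetwork that is shared and independent of the regions, and, after RoI Pooling, a region-specific subnetwork whose computation is not shared (for example the per-RoI classification and bounding-box regression heads). Now recall the classification networks, especially the residual and GoogLeNet families: they can all be regarded as fully convolutional, and they do very well on classification. Used directly for detection, however, they often perform poorly, sometimes worse than VGG-16. The reason is clear. Classification largely ignores position; it only has to decide whether an object is present, so the extracted features should be translation invariant, robust to scaling and shifting of the image content. Convolution and pooling preserve this property fairly well, and the deeper the network, the less sensitive it is to position. Detection, in contrast, also needs features that respond sharply to position, that is, translation variance, which puts the two goals in conflict. This is why layers such as RoI Pooling were inserted: they allow images of any size as input and, more importantly, they partly make up for the missing position information, so detection accuracy rises quickly. The side effect is that after RoI Pooling every region has to run through the subsequent subnetwork; because that computation is not shared, training and inference are slow. To address this, Jifeng Dai, Kaiming He, and colleagues proposed the detection framework 《R-FCN: Object Detection via Region-based Fully Convolutional Networks》, which replaces RoI Pooling with Position-Sensitive RoI Pooling, shares all computation, strikes a good trade-off between translation invariance and translation variance, and, being fully convolutional, trains and runs inference faster.

8.7.1 Algorithm overview

As described above, the core of the algorithm is the addition of position-sensitive RoI pooling. The main idea is as follows:

Here the feature map is the output of the fully convolutional feature-extraction subnetwork that used to precede RoI Pooling. What follows it (the colored cube) is the position-sensitive feature map, abbreviated as ps feature map below. It is really an ordinary convolutional layer whose weights are updated by back-propagation through the position-sensitive RoI Pooling layer. Suppose the position-sensitive grid is k×k and there are C+1 detection classes (the extra 1 is the background class); the ps feature map then has k×k×(C+1) channels. With k=3, each class has k×k=9 ps feature maps, each encoding one relative position (top-left, top-center, top-right, ..., bottom-right, shown in different colors in the figure). After ps RoI Pooling, every RoI gets a k×k grid for each of the C+1 classes; each cell gives a score for the class and all cells vote together. The result is a (C+1)-dimensional vector, followed by a softmax for classification.


Bounding-box regression is handled in a similar way: the position of each candidate box is treated as one class (top-left corner coordinates, width, and height), and the ps feature map has k×k×4 channels.

3. position-sensitive feature map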

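To encode position information explicitly, suppose the ps feature map grid is k×k and the RoI has size $w \times h$. Each bin then has size about $\frac{w}{k} \times \frac{h}{k}$, and ps RoI Pooling for the $(i,j)$-th bin ($0 \le i, j \le k-1$) is:

$$r_c(i,j \mid \Theta) = \sum_{(x,y) \in \mathrm{bin}(i,j)} z_{i,j,c}(x + x_0,\; y + y_0 \mid \Theta) \,/\, n$$

  • $r_c(i,j)$ is the pooled response of class c in the (i,j)-th bin;
  • $z_{i,j,c}$ is one of the k×k×(C+1) feature maps;
  • $(x_0, y_0)$ is the top-left corner of the RoI;
  • $n$ is the number of pixels in the current bin;
  • $\Theta$ denotes all learnable parameters of the network;
  • x and y range over: $\lfloor i \frac{w}{k} \rfloor \le x < \lceil (i+1) \frac{w}{k} \rceil$, $\lfloor j \frac{h}{k} \rfloor \le y < \lceil (j+1) \frac{h}{k} \rceil$
  • pooling can use average, max, or even another custom operation.

The k×k bins then vote, $r_c(\Theta) = \sum_{i,j} r_c(i,j \mid \Theta)$, and a softmax gives the class scores, $s_c(\Theta) = e^{r_c(\Theta)} / \sum_{c'=0}^{C} e^{r_{c'}(\Theta)}$. The loss for each RoI is:

$$L(s, t_{x,y,w,h}) = L_{cls}(s_{c^*}) + \lambda\, [c^* > 0]\, L_{reg}(t, t^*)$$

  • $c^*$ is the ground-truth label of the RoI; $c^* = 0$ denotes the background class;
  • $L_{cls}(s_{c^*}) = -\log(s_{c^*})$ is the cross-entropy loss;
  • $L_{reg}$ is the bounding-box regression loss, defined as in Fast R-CNN;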
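
The pooling and voting steps can be sketched in NumPy for a single RoI. This is an illustration under stated assumptions rather than the R-FCN implementation: the function name `psroi_pool` is made up, the RoI is given directly in feature-map coordinates and is assumed to lie inside the map with w, h ≥ k, average pooling is used in each bin, and the channel for class c and bin (row j, column i) is assumed to be stored at index (c·k + j)·k + i.

```python
import numpy as np

def psroi_pool(score_maps, roi, k, num_classes):
    """score_maps: (k*k*num_classes, H, W); roi: (x0, y0, w, h) on the feature map.
    Returns (per-bin responses of shape (num_classes, k, k), softmax scores)."""
    x0, y0, w, h = roi
    resp = np.zeros((num_classes, k, k))
    for c in range(num_classes):
        for j in range(k):          # bin row
            for i in range(k):      # bin column
                xs = x0 + int(np.floor(i * w / k))
                xe = x0 + int(np.ceil((i + 1) * w / k))
                ys = y0 + int(np.floor(j * h / k))
                ye = y0 + int(np.ceil((j + 1) * h / k))
                # each bin reads only its own dedicated score map (assumed layout)
                ch = (c * k + j) * k + i
                resp[c, j, i] = score_maps[ch, ys:ye, xs:xe].mean()
    votes = resp.sum(axis=(1, 2))   # the k x k bins vote for each class
    e = np.exp(votes - votes.max()) # numerically stable softmax
    return resp, e / e.sum()

k, num_classes = 2, 2
# toy score maps: channel ch is filled with the constant value ch
maps = np.stack([np.full((4, 4), float(ch)) for ch in range(k * k * num_classes)])
resp, probs = psroi_pool(maps, roi=(0, 0, 4, 4), k=k, num_classes=num_classes)
```

The point to notice is that the bin at position (i, j) never looks at the score maps of other positions, which is how position information enters an otherwise fully convolutional network.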



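8.7.2 position-sensitive RoI pooling

8.7.3 Model training

1. Training uses Online Hard Example Mining


8.7.4 Code in practice

The source code is available from py-R-FCN. You also need to download the R-FCN version of caffe; it is compiled in much the same way as for Faster R-CNN, and the directory layout is similar: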

    // ------------------------------------------------------------------
    // R-FCN
    // Copyright (c) 2016 Microsoft
    // Licensed under The MIT License [see r-fcn/LICENSE for details]
    // Written by Yi Li
    // ------------------------------------------------------------------
    #include <cfloat>
    #include "caffe/rfcn_layers.hpp"
    #include "caffe/util/gpu_util.cuh"

    using std::max;
    using std::min;

    namespace caffe {

    template <typename Dtype>
    __global__ void PSROIPoolingForward(
        // Number of tasks = number of elements of the psroi-pooled output:
        // RoIs (m) x channels (21 classes) x pooled width (7) x pooled height (7) = 1029 x m
        const int nthreads,
        // Input feature map produced by the preceding convolutions and poolings
        // (for ResNet-50: the position-sensitive feature map of conv layer rfcn_cls, size 1029x14x14)
        const Dtype* bottom_data,
        // Derived from the product of the strides of all preceding layers; 1/16 in R-FCN.
        // Mapping from the image to the rfcn_cls feature map shrinks, hence 1/16; the reverse needs x16
        const Dtype spatial_scale,
        // Number of channels of the input layer (for ResNet-50 the rfcn_cls conv layer):
        // k x k x (C+1) = 7 x 7 x 21 = 1029
        const int channels,
        const int height,          // height of the feature map (14)
        const int width,           // width of the feature map (14)
        const int pooled_height,   // height of the psroi-pooled output, configured as 7
        const int pooled_width,    // width of the psroi-pooled output, configured as 7
        // RoI data for all RoIs (or one batch of RoIs), laid out as [batch_ind, x1, y1, x2, y2]:
        // batch index, top-left corner and bottom-right corner
        const Dtype* bottom_rois,
        // Output dimension: 21 for psroipooled_cls_rois (21 classes), 8 for psroipooled_loc_rois
        const int output_dim,
        const int group_size,      // k = 7
        Dtype* top_data,           // receives the psroi-pooled feature map
        int* mapping_channel) {
      // index is the thread index; it runs over all values of the psroi-pooled
      // feature map, i.e. over [0, nthreads-1]
      CUDA_KERNEL_LOOP(index, nthreads) {
        // W coordinate in the top blob (N,C,H,W): column i in [0, k-1] of the pooled map
        int pw = index % pooled_width;
        // H coordinate in the top blob (N,C,H,W): row j in [0, k-1] of the pooled map
        int ph = (index / pooled_width) % pooled_height;
        // C coordinate in the top blob (N,C,H,W): the class channel,
        // at most 21 (number of classes including background)
        int ctop = (index / pooled_width / pooled_height) % output_dim;
        // Which of the m RoIs this thread belongs to
        int n = index / pooled_width / pooled_height / output_dim;

        // [start, end): each RoI record has 5 fields [batch_ind, x1, y1, x2, y2],
        // so the pointer advances in multiples of 5
        bottom_rois += n * 5;
        // bottom_rois[0] stores the batch index of the RoI
        int roi_batch_ind = bottom_rois[0];
        // Map the RoI from image coordinates to the feature map by multiplying by 1/16.
        // This is the coarse mapping rather than the exact mapping discussed earlier
        Dtype roi_start_w = static_cast<Dtype>(round(bottom_rois[1])) * spatial_scale;
        Dtype roi_start_h = static_cast<Dtype>(round(bottom_rois[2])) * spatial_scale;
        Dtype roi_end_w = static_cast<Dtype>(round(bottom_rois[3]) + 1.) * spatial_scale;
        Dtype roi_end_h = static_cast<Dtype>(round(bottom_rois[4]) + 1.) * spatial_scale;

        // Force the RoI width and height to be at least 0.1 so that a mapped RoI never has size 0
        Dtype roi_width = max(roi_end_w - roi_start_w, 0.1);
        Dtype roi_height = max(roi_end_h - roi_start_h, 0.1);

        // Bin height, adapted to the mapped RoI height and the configured pooled height (7)
        Dtype bin_size_h = roi_height / static_cast<Dtype>(pooled_height);
        // Bin width, adapted to the mapped RoI width and the configured pooled width (7)
        Dtype bin_size_w = roi_width / static_cast<Dtype>(pooled_width);

        // Coordinate range of bin (i,j) on the feature map; this is the pooling window.
        // The RoI does not start at (0,0), so roi_start_h / roi_start_w are added as offsets
        int hstart = floor(static_cast<Dtype>(ph) * bin_size_h
                            + roi_start_h);
        int wstart = floor(static_cast<Dtype>(pw)* bin_size_w
                            + roi_start_w);
        int hend = ceil(static_cast<Dtype>(ph + 1) * bin_size_h
                          + roi_start_h);
        int wend = ceil(static_cast<Dtype>(pw + 1) * bin_size_w
                          + roi_start_w);
        // Clip the window: anything outside the feature map is discarded
        hstart = min(max(hstart, 0), height);
        hend = min(max(hend, 0), height);
        wstart = min(max(wstart, 0),width);
        wend = min(max(wend, 0), width);
        bool is_empty = (hend <= hstart) || (wend <= wstart);

        int gw = pw;
        int gh = ph;
        // Channel of class ctop at position (ph,pw) = ctop*group_size*group_size + gh*group_size + gw
        // e.g. class 1 at position (1,1) on the ps feature map: c = 1*7*7 + 1*7 + 1 = 57
        int c = (ctop*group_size + gh)*group_size + gw;

        // Average pooling over that single channel
        bottom_data += (roi_batch_ind * channels + c) * height * width;
        Dtype out_sum = 0;
        for (int h = hstart; h < hend; ++h){
          for (int w = wstart; w < wend; ++w){
            int bottom_index = h*width + w;
            out_sum += bottom_data[bottom_index];
          }
        }

        // Area of bin (i,j) on the feature map
        Dtype bin_area = (hend - hstart)*(wend - wstart);
        // 0 if the bin is empty, otherwise the average
        top_data[index] = is_empty? 0. : out_sum/bin_area;
        // Remember which ps feature map channel this output was read from
        mapping_channel[index] = c;
      }
    }

    template <typename Dtype>
    void PSROIPoolingLayer<Dtype>::Forward_gpu(
        // For ResNet-50, bottom[0] is the feature map of the last conv layer rfcn_cls,
        // shape [1, 1029, 14, 14]; bottom[1] holds the RoIs, shape [m, 5]
        const vector<Blob<Dtype>*>& bottom,
        // top is the output; top->count() = RoIs x channels x output width x output height
        const vector<Blob<Dtype>*>& top) {
      const Dtype* bottom_data = bottom[0]->gpu_data();
      const Dtype* bottom_rois = bottom[1]->gpu_data();
      Dtype* top_data = top[0]->mutable_gpu_data();
      int* mapping_channel_ptr = mapping_channel_.mutable_gpu_data();
      int count = top[0]->count();
      caffe_gpu_set(count, Dtype(0), top_data);
      caffe_gpu_set(count, -1, mapping_channel_ptr);
      // NOLINT_NEXT_LINE(whitespace/operators)
      PSROIPoolingForward<Dtype> << <CAFFE_GET_BLOCKS(count), CAFFE_CUDA_NUM_THREADS >> >(
        count, bottom_data, spatial_scale_, channels_, height_, width_, pooled_height_,
        pooled_width_, bottom_rois, output_dim_, group_size_, top_data, mapping_channel_ptr);
    }

    template <typename Dtype>
    __global__ void PSROIPoolingBackwardAtomic(
        const int nthreads,          // number of tasks = elements of the psroi-pooled output, as in forward
        const Dtype* top_diff,       // gradient dL/dy(r,j) of the psroi-pooled output
        const int* mapping_channel,  // same as forward
        const int num_rois,          // number of RoIs
        const Dtype spatial_scale,   // same as forward
        const int channels,          // same as forward
        const int height,            // same as forward
        const int width,             // same as forward
        const int pooled_height,     // same as forward
        const int pooled_width,      // same as forward
        const int output_dim,        // same as forward
        Dtype* bottom_diff,          // receives the gradient of every input feature map element
        const Dtype* bottom_rois) {  // same as forward
      CUDA_KERNEL_LOOP(index, nthreads) {
        // Same index decomposition as in forward
        int pw = index % pooled_width;
        int ph = (index / pooled_width) % pooled_height;
        int n = index / pooled_width / pooled_height / output_dim;

        // Map the RoI from the image onto the feature map, as in forward
        bottom_rois += n * 5;
        int roi_batch_ind = bottom_rois[0];
        Dtype roi_start_w = static_cast<Dtype>(round(bottom_rois[1])) * spatial_scale;
        Dtype roi_start_h = static_cast<Dtype>(round(bottom_rois[2])) * spatial_scale;
        Dtype roi_end_w = static_cast<Dtype>(round(bottom_rois[3]) + 1.) * spatial_scale;
        Dtype roi_end_h = static_cast<Dtype>(round(bottom_rois[4]) + 1.) * spatial_scale;

        // Same as forward
        Dtype roi_width = max(roi_end_w - roi_start_w, 0.1);  // avoid 0
        Dtype roi_height = max(roi_end_h - roi_start_h, 0.1);

        // Same as forward
        Dtype bin_size_h = roi_height / static_cast<Dtype>(pooled_height);
        Dtype bin_size_w = roi_width / static_cast<Dtype>(pooled_width);

        int hstart = floor(static_cast<Dtype>(ph)* bin_size_h
          + roi_start_h);
        int wstart = floor(static_cast<Dtype>(pw)* bin_size_w
          + roi_start_w);
        int hend = ceil(static_cast<Dtype>(ph + 1) * bin_size_h
          + roi_start_h);
        int wend = ceil(static_cast<Dtype>(pw + 1) * bin_size_w
          + roi_start_w);
        // Same as forward
        hstart = min(max(hstart, 0), height);
        hend = min(max(hend, 0), height);
        wstart = min(max(wstart, 0), width);
        wend = min(max(wend, 0), width);
        bool is_empty = (hend <= hstart) || (wend <= wstart);

        // Look up the ps feature map channel used in forward;
        // the gradient is distributed evenly over the pixels of the bin
        int c = mapping_channel[index];
        Dtype* offset_bottom_diff = bottom_diff + (roi_batch_ind * channels + c) * height * width;
        Dtype bin_area = (hend - hstart)*(wend - wstart);
        Dtype diff_val = is_empty ? 0. : top_diff[index] / bin_area;
        for (int h = hstart; h < hend; ++h){
          for (int w = wstart; w < wend; ++w){
            int bottom_index = h*width + w;
            caffe_gpu_atomic_add(diff_val, offset_bottom_diff + bottom_index);
          }
        }
      }
    }

    template <typename Dtype>
    void PSROIPoolingLayer<Dtype>::Backward_gpu(
        const vector<Blob<Dtype>*>& top,      // psroi-pooled output feature map
        const vector<bool>& propagate_down,   // whether to back-propagate to each bottom blob
        const vector<Blob<Dtype>*>& bottom) { // psroi pooling input (the rfcn_cls feature map in ResNet)
      if (!propagate_down[0]) {
        return;
      }

      const Dtype* bottom_rois = bottom[1]->gpu_data();    // original RoI data
      const Dtype* top_diff = top[0]->gpu_diff();          // gradient of the psroi-pooled feature map
      Dtype* bottom_diff = bottom[0]->mutable_gpu_diff();  // gradient of the input feature map, to be written
      const int bottom_count = bottom[0]->count();         // number of input feature map elements
      const int* mapping_channel_ptr = mapping_channel_.gpu_data();
      caffe_gpu_set(bottom[1]->count(), Dtype(0), bottom[1]->mutable_gpu_diff());
      caffe_gpu_set(bottom_count, Dtype(0), bottom_diff);
      const int count = top[0]->count();
      // NOLINT_NEXT_LINE(whitespace/operators)
      PSROIPoolingBackwardAtomic<Dtype> << <CAFFE_GET_BLOCKS(count), CAFFE_CUDA_NUM_THREADS >> >(
        count, top_diff, mapping_channel_ptr, top[0]->num(), spatial_scale_,
        channels_, height_, width_, pooled_height_, pooled_width_, output_dim_,
        bottom_diff, bottom_rois);
    }

    }  // namespace caffe

The following demo script runs detection on a sample image and visualizes the position-sensitive feature maps:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    """
    Demo script showing detections in sample images.
    See README.md for installation instructions before running.
    """
    from __future__ import print_function

    import matplotlib
    matplotlib.use('Agg')
    import matplotlib.pyplot as plt
    import _init_paths
    from fast_rcnn.config import cfg
    from fast_rcnn.test import im_detect
    from fast_rcnn.nms_wrapper import nms
    from utils.timer import Timer
    import numpy as np
    import scipy.io as sio
    import caffe, os, sys, cv2
    import argparse

    CLASSES = ('__background__',
               'aeroplane', 'bicycle', 'bird', 'boat',
               'bottle', 'bus', 'car', 'cat', 'chair',
               'cow', 'diningtable', 'dog', 'horse',
               'motorbike', 'person', 'pottedplant',
               'sheep', 'sofa', 'train', 'tvmonitor')

    NETS = {'ResNet-101': ('ResNet-101',
                           'resnet101_rfcn_final.caffemodel'),
            'ResNet-50': ('ResNet-50',
                          'resnet50_rfcn_final.caffemodel')}


    def parse_args():
        """Parse input arguments."""
        parser = argparse.ArgumentParser(description='R-FCN demo')
        parser.add_argument('--gpu', dest='gpu_id', help='GPU device id to use [0]',
                            default=0, type=int)
        parser.add_argument('--cpu', dest='cpu_mode',
                            help='Use CPU mode (overrides --gpu)',
                            action='store_true')
        parser.add_argument('--net', dest='demo_net', help='Network to use [ResNet-101]',
                            choices=NETS.keys(), default='ResNet-101')
        args = parser.parse_args()
        return args


    def vis_square(data, i):
        """Take an array of shape (n, height, width) or (n, height, width, 3)
        and visualize each (height, width) thing in a grid of size approx. sqrt(n) by sqrt(n)"""
        # normalize data for display
        data = (data - data.min()) / (data.max() - data.min())
        # force the number of filters to be square
        n = int(np.ceil(np.sqrt(data.shape[0])))
        padding = (((0, n ** 2 - data.shape[0]),
                    (0, 1), (0, 1))  # add some space between filters
                   + ((0, 0),) * (data.ndim - 3))  # don't pad the last dimension (if there is one)
        data = np.pad(data, padding, mode='constant', constant_values=1)  # pad with ones (white)
        # tile the filters into an image
        data = data.reshape((n, n) + data.shape[1:]).transpose((0, 2, 1, 3) + tuple(range(4, data.ndim + 1)))
        data = data.reshape((n * data.shape[1], n * data.shape[3]) + data.shape[4:])
        plt.imshow(data); plt.axis('off')
        plt.savefig('feature-' + str(i) + '.jpg')


    def vis_demo(net, image_name):
        """Visualize the position-sensitive feature maps."""
        # Load the demo image
        im_file = os.path.join(cfg.DATA_DIR, 'demo', image_name)
        im = cv2.imread(im_file)
        # Detect all object classes and regress object bounds
        timer = Timer()
        timer.tic()
        scores, boxes = im_detect(net, im)
        timer.toc()
        print('Detection took {:.3f}s for '
              '{:d} object proposals'.format(timer.total_time, boxes.shape[0]))
        conv = net.blobs['data'].data[0]
        ave = np.average(conv.transpose(1, 2, 0), axis=2)
        plt.imshow(ave); plt.axis('off')
        plt.savefig('featurex.jpg')
        # Visualize detections for each class
        CONF_THRESH = 0.8
        NMS_THRESH = 0.3
        for cls_ind, cls in enumerate(CLASSES[1:]):
            cls_ind += 1  # because we skipped background
            cls_boxes = boxes[:, 4:8]
            cls_scores = scores[:, cls_ind]
            dets = np.hstack((cls_boxes,
                              cls_scores[:, np.newaxis])).astype(np.float32)
            keep = nms(dets, NMS_THRESH)
            dets = dets[keep, :]
            print(cls_ind, ' ', cls)
            # rfcn_cls[0, 0:49] holds the 7x7 maps of class 0,
            # rfcn_cls[0, 49:98] those of class 1, and so on.
            feat = net.blobs['rfcn_cls'].data[0, cls_ind*49:(cls_ind+1)*49]
            vis_square(feat, cls)


    if __name__ == '__main__':
        cfg.TEST.HAS_RPN = True  # Use RPN for proposals
        args = parse_args()
        prototxt = os.path.join(cfg.MODELS_DIR, NETS[args.demo_net][0],
                                'rfcn_end2end', 'test_agnostic.prototxt')
        # index 1 of the NETS entry is the caffemodel file name
        caffemodel = os.path.join(cfg.DATA_DIR, 'rfcn_models',
                                  NETS[args.demo_net][1])
        if not os.path.isfile(caffemodel):
            raise IOError(('{:s} not found.\n').format(caffemodel))
        if args.cpu_mode:
            caffe.set_mode_cpu()
        else:
            caffe.set_mode_gpu()
            caffe.set_device(args.gpu_id)
            cfg.GPU_ID = args.gpu_id
        net = caffe.Net(prototxt, caffemodel, caffe.TEST)
        for layer_name, blob in net.blobs.items():
            print(layer_name + '\t' + str(blob.data.shape))
        print('\n\nLoaded network {:s}'.format(caffemodel))
        # Warmup on a dummy image
        im = 128 * np.ones((300, 500, 3), dtype=np.uint8)
        for i in range(2):
            _, _ = im_detect(net, im)
        im_names = ['car.jpg']
        for im_name in im_names:
            print('~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~')
            print('Demo for data/demo/{}'.format(im_name))
            vis_demo(net, im_name)
        # obtain the output probabilities
        output_prob = net.blobs['cls_prob'].data[0]
        print('probabilities:')
        print(output_prob)

8.8 DenseNet

8.9 Mask-R-CNN

8.10 YOLO

8.11 SSD

8.12 YOLO 9000

9. Semantic Segmentation

9.1 FCN

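FCN was first proposed in "Fully Convolutional Networks for Semantic Segmentation". I consider it the pioneering work on end-to-end semantic segmentation of images: it was the first to make low-cost pixel-level classification possible (end-to-end, pixels-to-pixels). The approach can also be applied to object detection and recognition, where it compares favorably with methods such as Faster R-CNN.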

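9.1.1 Algorithm overview

CNNs are undoubtedly powerful feature extractors, especially for images. Recall the usual recipe: a CNN extracts features, fully connected layers combine them, and a classifier or regressor sits on top. To improve predictive power, several fully connected layers (a Cartesian-product style combination of features) are stacked. This is where most of the parameters live, and it becomes the main bottleneck of training and especially of inference, which is why model compression and pruning algorithms usually start with the fully connected layers. The fully connected layers also force the network input to have a fixed size. Convolution and pooling do not care about the input size at all, but with inputs of different sizes the number of input nodes reaching the fully connected layer would differ from image to image, while the network structure has to be defined in advance. SPP and RoI pooling, described earlier, were introduced to solve this problem; FCN is another way to solve it.
Compared with a traditional CNN, FCN replaces all fully connected layers with convolutional layers and upsamples a feature map (any of them can be used) back to the original image size. This preserves the spatial information of every pixel and gives every pixel its own class prediction. In the pixelwise prediction layer of the figure below, for example, cat, dog, TV and background are all predicted at pixel level: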

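9.1.2 1×1 convolution revisited

1. Each 1×1 convolution kernel has one weight per input channel, so it can fuse features across channels, i.e. form a linear combination of the feature maps of several channels;
3. Followed by an activation function, it increases the non-linear representational power of the model without losing feature map information, which makes it a cheap way to deepen the network.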
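The cross-channel fusion described above can be checked with a few lines of NumPy. This is a minimal sketch; the function name `conv1x1` and the (channels, height, width) layout are assumptions made for the example.

```python
import numpy as np


def conv1x1(x, weights, bias=None):
    """1x1 convolution as a per-pixel linear combination of channels.

    x:       (C_in, H, W) feature maps
    weights: (C_out, C_in), one weight per input channel for every output channel
    returns: (C_out, H, W)
    """
    y = np.einsum("oc,chw->ohw", weights, x)
    if bias is not None:
        y = y + bias[:, None, None]
    return y
```

The spatial size is unchanged and only the channel dimension is mixed, so with fewer output than input channels the same operation also acts as a cheap channel reduction.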

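9.1.3 Fully convolutional network

1. To take context into account a sliding window is needed: each pixel is classified from the feature map inside the window, and both classification quality and storage cost grow with the window size;

The upper figure shows the workflow of a traditional CNN and the lower figure that of FCN, which finally produces a heat map of the target. Besides semantic segmentation, detection and recognition, this transformation is also used in feature map visualization to help analyze features.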
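One way to see why replacing fully connected layers with convolutions yields a heat map is to reuse the weights of a fully connected layer as a convolution kernel and slide it over a larger feature map. The sketch below is a simplified NumPy illustration under assumptions of my own (no padding, stride 1, hypothetical function name `fc_as_conv`); it is not code from the FCN paper.

```python
import numpy as np


def fc_as_conv(feature_map, fc_weights, fc_bias, kernel_hw):
    """Slide fully connected weights over a (C, H, W) feature map.

    fc_weights: (n_out, C * kh * kw), trained on inputs of size (C, kh, kw)
    returns:    (n_out, H - kh + 1, W - kw + 1), one score map per output unit
    """
    channels, height, width = feature_map.shape
    kh, kw = kernel_hw
    n_out = fc_weights.shape[0]
    # the same numbers, viewed as n_out convolution kernels of shape (C, kh, kw)
    kernels = fc_weights.reshape(n_out, channels, kh, kw)
    out = np.empty((n_out, height - kh + 1, width - kw + 1))
    for y in range(height - kh + 1):
        for x in range(width - kw + 1):
            patch = feature_map[:, y:y + kh, x:x + kw]
            out[:, y, x] = np.tensordot(kernels, patch, axes=3) + fc_bias
    return out
```

On an input of exactly the training size the output is 1×1 and equals the fully connected result; on a larger input every output location is the classifier applied to one window, which is the heat map.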


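9.1.4 Nyquist–Shannon sampling theorem

A band-limited signal s(x), sampled with period T at a rate above twice its highest frequency, can be reconstructed from its samples:

$$s(x) = \sum_{n=-\infty}^{\infty} s(nT)\,\mathrm{sinc}\!\left(\frac{x - nT}{T}\right)$$

where the sampling rate is 1/T, s(nT) are the samples of s(x), and sinc(x) is the resampling kernel.
In general, such a signal reconstructor has the following properties:
3. the resampling kernel:
4. the resampling kernel is symmetric;
5. the resampling kernel is differentiable everywhere.

Other resampling kernels exist as well, for example the bilinear resampling kernel, which satisfies properties 2, 3 and 4 above:

$$h(x) = \max(0,\; 1 - |x|)$$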
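To see what a resampling kernel does in practice, here is a small 1-D NumPy sketch that rebuilds a signal from a finite number of samples with either the sinc kernel or the triangle (bilinear) kernel. The function names `reconstruct` and `triangle` are made up for this example, and truncating the infinite sum to the available samples is a simplification.

```python
import numpy as np


def triangle(u):
    """Bilinear (triangle) kernel: 1 - |u| for |u| < 1, else 0."""
    return np.maximum(0.0, 1.0 - np.abs(u))


def reconstruct(samples, T, x, kernel):
    """Evaluate sum_n s(nT) * kernel((x - nT) / T) at the points x.

    samples: values s(0), s(T), s(2T), ...
    kernel:  a resampling kernel, e.g. np.sinc or triangle
    """
    samples = np.asarray(samples, dtype=float)
    x = np.asarray(x, dtype=float)
    n = np.arange(len(samples))
    weights = kernel((x[:, None] - n[None, :] * T) / T)
    return (weights * samples[None, :]).sum(axis=1)
```

Both kernels return the original samples at the sample positions. Between samples the triangle kernel interpolates linearly from the two nearest samples, while the sinc kernel uses all of them.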

Here is a simple bilinear resampling example using the scikit-image library:

    import skimage
    import skimage.io
    import skimage.transform


    def upsample_with_skimage(img, factor):
        # order=1 selects bilinear resampling, see
        # http://scikit-image.org/docs/dev/api/skimage.transform.html
        # Meaning of order:
        #   0: Nearest-neighbor
        #   1: Bi-linear (default)
        #   2: Bi-quadratic
        #   3: Bi-cubic
        #   4: Bi-quartic
        #   5: Bi-quintic
        # The target height and width are passed explicitly so that the
        # channel axis of a color image is left untouched.
        out_shape = (img.shape[0] * factor, img.shape[1] * factor) + img.shape[2:]
        return skimage.transform.resize(img,
                                        out_shape,
                                        mode='constant',
                                        cval=0,
                                        order=1)


    if __name__ == '__main__':
        target = upsample_with_skimage(img=skimage.io.imread("feature_map.jpg"), factor=5)
        # resize returns floats in [0, 1]; convert back to 8-bit before saving
        skimage.io.imsave("upsampling.png", skimage.img_as_ubyte(target))

9.1.5 转置卷积(Transposed Convolution)







The whole process is smooth. For detailed explanations of many different cases, see "Convolution arithmetic tutorial".
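As a rough 1-D illustration of the idea, a transposed convolution can be written as a scatter-add: every input value stamps a scaled copy of the kernel into the output, and consecutive stamps are `stride` positions apart. The sketch assumes no padding, and the function name `conv_transpose_1d` is made up for this example.

```python
import numpy as np


def conv_transpose_1d(x, kernel, stride):
    """1-D transposed convolution without padding.

    Output length is (len(x) - 1) * stride + len(kernel).
    """
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    out = np.zeros((len(x) - 1) * stride + len(kernel))
    for i, value in enumerate(x):
        # input i contributes value * kernel starting at position i * stride
        out[i * stride:i * stride + len(kernel)] += value * kernel
    return out
```

With the bilinear kernel [0.25, 0.75, 0.75, 0.25] and stride 2, a constant input stays constant in the interior of the output, which is why bilinear weights are a natural initialization for a learnable upsampling layer.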

Transposed convolution in Keras is shown below; the input feature map and the final result are the same as in 8.7.4.

    # -*- coding: utf-8 -*-
    # Note: this listing uses the TensorFlow 1.x session API.
    from __future__ import division
    import os
    import numpy as np
    import tensorflow as tf
    import skimage.io
    import keras.backend as K


    def get_kernel_size(factor):
        """
        Return the kernel size for a given upsampling factor;
        the upsampling factor equals the stride of the transposed convolution.
        """
        return 2 * factor - factor % 2


    def upsample_filt(size):
        """
        Return the bilinear upsampling kernel matrix.
        """
        factor = (size + 1) // 2
        if size % 2 == 1:
            center = factor - 1
        else:
            center = factor - 0.5
        og = np.ogrid[:size, :size]
        return (1 - abs(og[0] - center) / factor) * \
               (1 - abs(og[1] - center) / factor)


    def bilinear_upsample_weights(factor, channel):
        """
        Initialize the transposed convolution weights with a bilinear filter.
        """
        filter_size = get_kernel_size(factor)
        weights = np.zeros((filter_size,
                            filter_size,
                            channel,
                            channel), dtype=np.float32)
        upsample_kernel = upsample_filt(filter_size)
        # each channel is upsampled independently of the others
        for i in range(channel):
            weights[:, :, i, i] = upsample_kernel
        return weights


    def upsample_keras(factor, input_img):
        SCALE = 256
        channel = input_img.shape[2]
        scale_height = input_img.shape[0] * factor
        scale_width = input_img.shape[1] * factor
        expanded_img = np.expand_dims(input_img, axis=0)
        with tf.device("/gpu:1"):
            gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=1, allow_growth=True)
            os.environ["CUDA_VISIBLE_DEVICES"] = "1"
            sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
                                                    log_device_placement=True,
                                                    gpu_options=gpu_options))
            input_value = tf.placeholder(tf.float32)
            trans_filter = tf.placeholder(tf.float32)
            upsample_filter_np = bilinear_upsample_weights(factor, channel)
            res = K.conv2d_transpose(input_value, trans_filter,
                                     output_shape=[1, scale_height, scale_width, channel],
                                     padding='same',
                                     strides=(factor, factor))
            final_result = sess.run(res,
                                    feed_dict={trans_filter: upsample_filter_np,
                                               input_value: expanded_img})
        if channel != 1:
            return final_result.squeeze() / SCALE
        return final_result.squeeze()


    if __name__ == '__main__':
        upsampled_img_keras = upsample_keras(factor=5, input_img=skimage.io.imread("feature_map.jpg"))
        skimage.io.imsave("bilinear_feature_map.jpg", upsampled_img_keras)

9.1.6 Code walkthrough


    # Note: this listing uses the TensorFlow 1.x session API.
    import os
    import sys

    import numpy as np
    import tensorflow as tf
    from PIL import Image
    from keras.preprocessing.image import img_to_array
    import keras.backend as K
    from keras.applications.imagenet_utils import preprocess_input
    from models import *  # project-local module providing the model constructors


    def inference(model_name, weight_file, image_size, image_list, data_dir, label_dir,
                  return_results=True, save_dir=None,
                  label_suffix='.png',
                  data_suffix='.jpg'):
        current_dir = os.path.dirname(os.path.realpath(__file__))
        batch_shape = (1, ) + image_size + (3, )
        save_path = os.path.join(current_dir, 'Models/' + model_name)
        checkpoint_path = os.path.join(save_path, weight_file)

        # look up the model constructor by name among the names imported from models
        model = globals()[model_name](batch_shape=batch_shape, input_shape=(512, 512, 3))
        model.load_weights(checkpoint_path, by_name=True)
        model.summary()

        results = []
        total = 0
        for img_num in image_list:
            img_num = img_num.strip('\n')
            total += 1
            print('#%d: %s' % (total, img_num))
            image = Image.open('%s/%s%s' % (data_dir, img_num, data_suffix))
            image = img_to_array(image)
            label = Image.open('%s/%s%s' % (label_dir, img_num, label_suffix))
            img_h, img_w = image.shape[0:2]

            # pad the image symmetrically up to the network input size
            pad_w = max(image_size[1] - img_w, 0)
            pad_h = max(image_size[0] - img_h, 0)
            image = np.pad(image,
                           ((pad_h // 2, pad_h - pad_h // 2),
                            (pad_w // 2, pad_w - pad_w // 2),
                            (0, 0)),
                           'constant', constant_values=0.)
            image = np.expand_dims(image, axis=0)
            image = preprocess_input(image)

            # per-pixel class prediction
            result = model.predict(image, batch_size=1)
            result = np.argmax(np.squeeze(result), axis=-1).astype(np.uint8)

            result_img = Image.fromarray(result, mode='P')
            result_img.palette = label.palette
            # crop the padding away again
            result_img = result_img.crop((pad_w // 2, pad_h // 2,
                                          pad_w // 2 + img_w, pad_h // 2 + img_h))
            if return_results:
                results.append(result_img)
            if save_dir:
                result_img.save(os.path.join(save_dir, img_num + '.png'))
        return results


    if __name__ == '__main__':
        with tf.device('/gpu:1'):
            gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=1, allow_growth=True)
            os.environ["CUDA_VISIBLE_DEVICES"] = "1"
            # register the session with Keras so that the options take effect
            K.set_session(tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
                                                           log_device_placement=True,
                                                           gpu_options=gpu_options)))
            model_name = 'AtrousFCN_Resnet50_16s'
            weight_file = 'checkpoint_weights.hdf5'
            image_size = (512, 512)
            data_dir = os.path.expanduser('~/.keras/datasets/VOC2012/VOCdevkit/VOC2012/JPEGImages')
            label_dir = os.path.expanduser('~/.keras/datasets/VOC2012/VOCdevkit/VOC2012/SegmentationClass')
            image_list = sys.argv[1:]  # e.g. '2007_000491'
            results = inference(model_name, weight_file, image_size, image_list, data_dir, label_dir,
                                save_dir="result")
            for result in results:
                result.show(title='result')


9.3 SegNet

9.4 UberNet

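10. Object Tracking

10.1 Kalman filter

10.2 CamShift

10.3 DLT

10.4 SO-DLT

10.5 FCNT

10.6 MDNet

10.7 RTT

10.8 DeepTracking

11. Reinforcement Learning

12. BOT

12.1 BOT architecture

12.2 DSL

13. OCR

13.1 Character-segmentation-based methods

13.2 Line-segmentation-based methods

13.3 CTC

14. Machine Learning Tools

14.1 Machine learning architecture design

14.2 Keras

14.2.1 Keras design philosophy


14.2.2 Handling large datasets in Keras

14.2.3 Keras on a single machine with multiple GPUs

14.2.4 Keras on multiple machines with multiple GPUs

14.2.5 Mixing Keras and TensorFlow code

14.3 TensorFlow

14.3.1 TF architecture

14.3.2 TF in Docker

14.4 Kaldi

15. Autonomous Driving

15.1 Openpilot

15.1.1 Project overview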

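Comma.ai is a self-driving company founded by the hacker George Hotz (the first person to jailbreak the iPhone and the PS3; see https://www.bloomberg.com/features/2015-george-hotz-self-driving-car/). Its goal is to deliver self-driving for 1,000 US dollars. Because the company came under strict scrutiny from the US National Highway Traffic Safety Administration (NHTSA), it open-sourced the whole system "in a fit of anger" under the name openpilot. Functionally it offers what Tesla's Autopilot currently offers, mainly ACC and LKAS.

So far all self-driving cars are Level 2, including those of Waymo, Cruise, comma.ai, Ford and Tesla: a driver must sit in the driver's seat, keep monitoring the driving state and be ready to take over at any time. In my view laboratory vehicles are at best Level 2+. At Level 3 the driver would not need to monitor the car at all on specific road sections, and no manufacturer has achieved L3 yet.

At present openpilot can control the acceleration, braking and steering of certain Honda and Acura models for up to 6 minutes without human intervention (although a human still has to keep watching). Judging by its results, it is the open-source project I personally find most promising, and it matches my own earlier idea: no modification of the car, no expensive hardware, plug-and-play driver assistance. In addition, consumers do not all buy the same brand of car, yet their data can still be shared, which lowers the probability of accidents caused by self-driving.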

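15.1.2 Basic concepts

CAN bus: the Controller Area Network (CAN) was developed by the German company BOSCH, known for developing and producing automotive electronics, and eventually became an international standard (ISO 11898). It is one of the most widely used field buses in the world, not only in cars but also in industrial and commercial applications.
In vehicles, CAN is a multi-master serial bus standard (a communication bus) for connecting electronic control units (ECUs). A CAN network needs two or more nodes to communicate. A node can be anything from a simple I/O device to an embedded computer with a CAN interface and sophisticated software. A node can also be a gateway that lets a standard computer talk to the devices on the CAN network through a USB or Ethernet port. All nodes are connected to each other by a two-wire bus, a twisted pair with a nominal impedance of 120 Ω.

LIN bus: the Local Interconnect Network (LIN) is a low-cost serial communication network for distributed electronic control systems in cars. LIN is meant to complement existing automotive networks such as the CAN bus, so it is an auxiliary bus. Where the bandwidth and versatility of CAN are not needed, for example for communication between smart sensors and actuators, using LIN saves considerable cost.
An open-source robotics software development platform. Currently the only smartphone compatible with Neo is the OnePlus 3, made by the Chinese manufacturer OnePlus: it is the only phone that is open enough, its camera and chip (Qualcomm Snapdragon 820) meet the requirements, and its GPS is used as well. The hardware costs about 700 US dollars.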

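15.1.3 System architecture

15.1.4 Software architecture


15.1.5 Basic vehicle components

A CAN message consists of an identifier, in either the 11-bit standard format or the 29-bit extended format, and a data payload of up to 8 bytes.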

     SG_ STEER_TORQUE : 7|16@0- (1,0) [-3840|3840] "" EPS
     SG_ STEER_TORQUE_REQUEST : 23|1@0+ (1,0) [0|1] "" EPS
     SG_ CHECKSUM : 39|4@0+ (1,0) [0|15] "" EPS
     SG_ COUNTER : 33|2@0+ (1,0) [0|3] "" EPS
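To show how such signal definitions are read, here is a small pure-Python sketch that decodes them from a raw CAN payload. It relies on my understanding of the DBC notation `start|length@byteorder sign (scale,offset)`, where `@0` means big-endian (Motorola) with the start bit pointing at the most significant bit, `-` means signed and `+` unsigned. The function name `decode_be_signal` and the payload bytes are invented for the example; this is not openpilot's parser.

```python
def decode_be_signal(data, start_bit, length, signed, scale=1, offset=0):
    """Decode one big-endian (Motorola) DBC signal from a CAN payload.

    Bits are numbered byte * 8 + bit_in_byte, with bit 7 the MSB of a byte.
    """
    value = 0
    bit = start_bit
    for _ in range(length):
        byte_idx, bit_idx = divmod(bit, 8)
        value = (value << 1) | ((data[byte_idx] >> bit_idx) & 1)
        # step to the next less significant bit; after bit 0 of a byte,
        # continue at bit 7 of the following byte
        bit = bit - 1 if bit_idx > 0 else bit + 15
    if signed and value >= 1 << (length - 1):
        value -= 1 << length  # two's complement
    return value * scale + offset


# hypothetical 5-byte payload, for illustration only
payload = bytes([0xFF, 0x38, 0x80, 0x00, 0x5A])
steer_torque = decode_be_signal(payload, 7, 16, signed=True)     # 7|16@0-
torque_request = decode_be_signal(payload, 23, 1, signed=False)  # 23|1@0+
checksum = decode_be_signal(payload, 39, 4, signed=False)        # 39|4@0+
counter = decode_be_signal(payload, 33, 2, signed=False)         # 33|2@0+
```

With this layout STEER_TORQUE occupies bytes 0 and 1, the request flag is the top bit of byte 2, and CHECKSUM and COUNTER share byte 4.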



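15.1.6 Common components


15.1.7 Phone components

The smartphone is the main piece of hardware in openpilot: all communication, data collection, computation and display run on the phone. The whole of openpilot uses Cap'n Proto for message serialization and ZMQ for message passing, which is very efficient; the overall architecture already does what ROS 2.0 set out to do.
The efficiency of Cap'n Proto (https://capnproto.org/) makes it well suited to this kind of embedded scenario.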


15.1.8 Self-driving components


    struct ModelData {
      frameId @0 :UInt32;
      path @1 :<path data>;
      leftLane @2 :<left lane line>;
      rightLane @3 :<right lane line>;
      lead @4 :<lead vehicle ahead>;
      ...

15.1.9 Summary

Open-source software can currently get us to Level 2, but reaching higher levels requires solving the four problems above.



16. CUDA Programming and High-Performance Computing

17. References

1. "Understanding the Bias-Variance Tradeoff"
2. "Boosting Algorithms as Gradient Descent in Function Space"
3. "Optimal Action Extraction for Random Forests and Boosted Trees"
4. "Applying Neural Network Ensemble Concepts for Modelling Project Success"
5. "Introduction to Boosted Trees"
6. "Machine Learning: Perceptrons"
7. "An overview of gradient descent optimization algorithms"
8. "Ad Click Prediction: a View from the Trenches"
9. "Improving the Convergence of Back-Propagation Learning with Second Order Methods"
10. "Adaptive Subgradient Methods for Online Learning and Stochastic Optimization"
11. "Sparse Allreduce: Efficient Scalable Communication for Power-Law Data"
12. "Asynchronous Parallel Stochastic Gradient Descent"
13. "Large Scale Distributed Deep Networks"
14. "Introduction to Optimization: Second Order Optimization Methods"
15. "On the complexity of steepest descent, Newton's and regularized Newton's methods for nonconvex unconstrained optimization"
16. "On Discriminative vs. Generative classifiers: A comparison of logistic regression and naive Bayes"
17. "Parametric vs Nonparametric Models"
18. "XGBoost: A Scalable Tree Boosting System"
20. "Computer vision: LeNet-5, AlexNet, VGG-19, GoogLeNet"
21. François Chollet's Q&A session on Quora
23. "Upsampling and Image Segmentation with Tensorflow and TF-Slim"
