@songying 2018-08-03T09:21:10.000000Z

Highway Networks



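As neural networks get deeper, they become increasingly difficult to train. In this paper, we introduce a new architecture designed to ease gradient-based training of deep neural networks. The architecture uses a gating mechanism that lets information flow through some layers without attenuation, and the resulting networks can be trained with SGD.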


In this extended abstract, we present a novel architecture that enables the optimization of networks with virtually arbitrary depth. This is accomplished through the use of a learned gating mechanism for regulating information flow which is inspired by Long Short Term Memory recurrent neural networks. Due to this gating mechanism, a neural network can have paths along which information can flow across several layers without attenuation.

2. Highway Networks


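A plain feedforward layer applies a transform $H$, parameterized by $W_H$, to its input $x$ to produce its output $y$:

$$
y = H(x, W_H)
$$

Here $H$ is usually an affine transform followed by a nonlinear activation function.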
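In a highway network, we additionally define two nonlinear transforms, $T(x, W_T)$ and $C(x, W_C)$, such that:

$$
y = H(x, W_H) \cdot T(x, W_T) + x \cdot C(x, W_C)
$$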

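Here $T$ is called the transform gate and $C$ the carry gate. For simplicity, we set $C = 1 - T$, which gives:

$$
y = H(x, W_H) \cdot T(x, W_T) + x \cdot \bigl(1 - T(x, W_T)\bigr)
$$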
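The gated layer with $C = 1 - T$ can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes $H$ is an affine map followed by tanh and $T$ is an affine map followed by a sigmoid, and the names `highway_layer`, `b_H`, `b_T` and the bias value `-2.0` are hypothetical choices made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(x, W_H, b_H, W_T, b_T):
    """Compute y = H(x) * T(x) + x * (1 - T(x)) with element-wise products."""
    h = np.tanh(W_H @ x + b_H)    # H(x, W_H): assumed affine + tanh
    t = sigmoid(W_T @ x + b_T)    # T(x, W_T): assumed affine + sigmoid, in (0, 1)
    return h * t + x * (1.0 - t)  # carry gate C = 1 - T

rng = np.random.default_rng(0)
d = 4                             # x, y, H and T all share this dimensionality
x = rng.standard_normal(d)
W_H = rng.standard_normal((d, d))
W_T = rng.standard_normal((d, d))
b_H = np.zeros(d)
b_T = np.full(d, -2.0)            # hypothetical negative bias: gates mostly closed

y = highway_layer(x, W_H, b_H, W_T, b_T)
print(y.shape)
```

Because each gate value lies strictly between 0 and 1, every component of `y` is a convex combination of the corresponding components of `x` and `H(x)`.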

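Note that $x$, $y$, $H(x, W_H)$, and $T(x, W_T)$ must all have the same dimensionality; if they do not, the input can be zero-padded. For the two extreme values of the transform gate, the layer reduces to:

$$
y = \begin{cases} x, & \text{if } T(x, W_T) = 0 \\ H(x, W_H), & \text{if } T(x, W_T) = 1 \end{cases}
$$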

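Correspondingly, the Jacobian of the layer transform is:

$$
\frac{dy}{dx} = \begin{cases} I, & \text{if } T(x, W_T) = 0 \\ H'(x, W_H), & \text{if } T(x, W_T) = 1 \end{cases}
$$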
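The two limiting cases of the Jacobian can be checked numerically. The sketch below approximates the Jacobian by central finite differences and pushes the gate toward 0 or 1 with a large bias. The tanh/sigmoid forms of $H$ and $T$, the helper `numerical_jacobian`, and the bias values of ±30 are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(x, W_H, W_T, b_T):
    h = np.tanh(W_H @ x)          # assumed form of H
    t = sigmoid(W_T @ x + b_T)    # assumed form of T
    return h * t + x * (1.0 - t)

def numerical_jacobian(f, x, eps=1e-6):
    """Central finite differences; column j holds dy/dx_j."""
    d = x.size
    J = np.zeros((d, d))
    for j in range(d):
        step = np.zeros(d)
        step[j] = eps
        J[:, j] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return J

rng = np.random.default_rng(0)
d = 3
x = rng.standard_normal(d)
W_H = rng.standard_normal((d, d))
W_T = rng.standard_normal((d, d))

# Gate pushed shut (T close to 0): the Jacobian should be close to the identity.
J_closed = numerical_jacobian(lambda v: highway_layer(v, W_H, W_T, -30.0), x)
# Gate pushed open (T close to 1): the Jacobian should match that of H alone.
J_open = numerical_jacobian(lambda v: highway_layer(v, W_H, W_T, 30.0), x)
J_plain = numerical_jacobian(lambda v: np.tanh(W_H @ v), x)

closed_is_identity = np.allclose(J_closed, np.eye(d), atol=1e-6)
open_matches_plain = np.allclose(J_open, J_plain, atol=1e-6)
print(closed_is_identity, open_matches_plain)
```

The identity Jacobian in the closed-gate case is what allows gradients to pass through a layer unattenuated.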

Thus, depending on the output of the transform gates, a highway layer can smoothly vary its behavior between that of a plain layer and that of a layer which simply passes its inputs through.
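To make the smooth transition concrete, the following sketch sweeps a gate bias and measures how far the layer output is from the raw input and from the plain-layer output. As a simplifying assumption the gate here is input-independent, $T = \sigma(b_T)$, and $H$ is taken to be tanh of a linear map; the bias values are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
d = 5
x = rng.standard_normal(d)
W_H = rng.standard_normal((d, d))
h = np.tanh(W_H @ x)                      # plain-layer output H(x), assumed form

rows = []
for b_T in (-8.0, -2.0, 0.0, 2.0, 8.0):
    t = sigmoid(b_T)                      # simplified, input-independent gate
    y = h * t + x * (1.0 - t)
    dist_to_input = np.linalg.norm(y - x)  # shrinks to 0 as the gate closes
    dist_to_plain = np.linalg.norm(y - h)  # shrinks to 0 as the gate opens
    rows.append((b_T, dist_to_input, dist_to_plain))
    print(f"b_T={b_T:+.0f}  T={t:.4f}  "
          f"|y-x|={dist_to_input:.4f}  |y-H(x)|={dist_to_plain:.4f}")
```

As the bias grows, the distance to the input rises while the distance to the plain-layer output falls, so the layer moves continuously from pass-through to plain behavior.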

2.1 Constructing Highway Networks
