@songying
2018-08-03
word-embedding
As neural networks grow deeper, they become increasingly difficult to train. In this paper, we introduce a new architecture that eases gradient-based training of deep neural networks. The architecture uses a gating mechanism under which some information can flow through many layers without attenuation, and the resulting networks can be trained directly with SGD.
In this extended abstract, we present a novel architecture that enables the optimization of networks with virtually arbitrary depth. This is accomplished through the use of a learned gating mechanism for regulating information flow which is inspired by Long Short Term Memory recurrent neural networks. Due to this gating mechanism, a neural network can have paths along which information can flow across several layers without attenuation.
For a plain feedforward network, each layer computes:

$$y = H(x, W_H)$$

A highway layer additionally introduces a transform gate $T$ and a carry gate $C$:

$$y = H(x, W_H) \cdot T(x, W_T) + x \cdot C(x, W_C)$$

Setting $C = 1 - T$ for simplicity gives:

$$y = H(x, W_H) \cdot T(x, W_T) + x \cdot (1 - T(x, W_T))$$
Note that the dimensionality of $x$, $y$, $H(x, W_H)$ and $T(x, W_T)$ must be the same (zero-padding can be used when it is not), and we finally obtain:

$$y = \begin{cases} x, & \text{if } T(x, W_T) = 0 \\ H(x, W_H), & \text{if } T(x, W_T) = 1 \end{cases}$$
We thus obtain its Jacobian:

$$\frac{dy}{dx} = \begin{cases} I, & \text{if } T(x, W_T) = 0 \\ H'(x, W_H), & \text{if } T(x, W_T) = 1 \end{cases}$$
Thus, depending on the output of the transform gates, a highway layer can smoothly vary its behavior between that of a plain layer and that of a layer which simply passes its inputs through.
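To make this concrete, here is a minimal NumPy sketch of a single highway layer. This is not the paper's reference code: the parameter names `W_H`, `b_H`, `W_T`, `b_T` and the `tanh`/`sigmoid` choices are assumptions following the usual setup, and the negative transform-gate bias follows the paper's suggestion of biasing the network toward carry behavior at initialization.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(x, W_H, b_H, W_T, b_T):
    """One highway layer: y = H(x) * T(x) + x * (1 - T(x))."""
    H = np.tanh(x @ W_H + b_H)     # transformation H(x, W_H)
    T = sigmoid(x @ W_T + b_T)     # transform gate T(x, W_T), values in (0, 1)
    return H * T + x * (1.0 - T)   # carry gate fixed to C = 1 - T

# The dimensionality of x, H and T must match, as noted above.
d = 8
rng = np.random.default_rng(0)
x = rng.normal(size=(1, d))
W_H, b_H = 0.1 * rng.normal(size=(d, d)), np.zeros(d)
W_T, b_T = 0.1 * rng.normal(size=(d, d)), np.full(d, -2.0)  # negative bias: T starts near 0
y = highway_layer(x, W_H, b_H, W_T, b_T)
print(y.shape)  # (1, 8)
```

With $b_T$ strongly negative, $T \approx 0$ and the layer starts out close to the identity ($y \approx x$); as $T \to 1$ it behaves like the plain layer $y = H(x, W_H)$, matching the two limiting cases above.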