@songying 2018-07-19T13:59:58.000000Z

Recurrent Neural Network Regularization

Paper notes

This is the paper behind TensorFlow's official RNN implementation.
Code: https://github.com/tensorflow/models/blob/master/tutorials/rnn/ptb/ptb_word_lm.py

Abstract

This paper applies dropout to LSTMs. Experiments show that it does reduce overfitting on a variety of tasks, including language modeling, speech recognition, image caption generation, and machine translation.

Introduction

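RNNs achieve state-of-the-art results on many tasks, including language modeling, speech recognition, and machine translation.
In practice, large RNNs tend to overfit. In this paper, we apply dropout regularization to LSTMs to reduce overfitting, and evaluate it on three different problems.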

In this paper, we consider the following tasks: language modeling, speech recognition, and machine translation.

Regularizing RNNs with LSTM CELLS

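$h_t^l \in \mathbb{R}^n$ is the hidden state of layer $l$ at timestep $t$; subscripts denote timesteps and superscripts denote layers.
$T_{n,m}$ is an affine map $\mathbb{R}^n \to \mathbb{R}^m$ ($Wx + b$).
$\odot$ denotes element-wise multiplication.
$h_t^0$ is the input word vector at timestep $t$.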

3.1 LSTM (Long Short-Term Memory Units)

Plain RNN:

$h^{l-1}_t, h^l_{t-1} \to h_t^l$

$h_t^l = f(T_{n,n} h^{l-1}_t + T_{n,n} h^l_{t-1})$, where $f \in \{\mathrm{sigm}, \tanh\}$
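As a rough illustration of the plain RNN update, here is a minimal NumPy sketch. The names `rnn_step`, `W_in`, `W_rec`, and `b` are hypothetical and do not come from the paper or the TensorFlow code. The two affine maps $T_{n,n}$ are assumed to share a single merged bias, and the weights are random placeholders.

```python
import numpy as np

def rnn_step(h_below, h_prev, W_in, W_rec, b, f=np.tanh):
    """One plain RNN step: h_t^l = f(W_in h_t^{l-1} + W_rec h_{t-1}^l + b).

    The two affine maps T_{n,n} are written as two weight matrices with
    one merged bias vector (an assumption made for brevity).
    """
    return f(W_in @ h_below + W_rec @ h_prev + b)

n = 4                                   # hidden size (illustrative)
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(n, n))
W_rec = rng.normal(scale=0.1, size=(n, n))
b = np.zeros(n)

h = np.zeros(n)                         # initial hidden state
for x_t in rng.normal(size=(3, n)):     # three random "word vectors"
    h = rnn_step(x_t, h, W_in, W_rec, b)
```

The same hidden state `h` is fed back at every step, which is the recurrent connection the later sections take care not to corrupt.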
LSTM

$c_t^l \in \mathbb{R}^n$ stores the long-term memory.
The LSTM used in this paper is described by the following quantities:

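i: the input gate value; how much of the current input from layer $l-1$ is written into the cell state $c_t$.

f: the forget gate value; how much of the previous cell state $c_{t-1}$ is retained in the current cell state $c_t$.

o: the output gate value; how much of $c_t$ is passed on to the current output $h_t$.

g: the current input state; you can think of it as the long-term state $\tilde{c}_t$ that we would produce at this timestep if the previous long-term state $c_{t-1}$ were ignored.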
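The four quantities above enter the LSTM update as follows. This is a reconstruction in the notation defined earlier, to be checked against the paper rather than taken as authoritative:

```latex
\mathrm{LSTM}: h_t^{l-1},\ h_{t-1}^{l},\ c_{t-1}^{l} \to h_t^{l},\ c_t^{l}

\begin{pmatrix} i \\ f \\ o \\ g \end{pmatrix}
=
\begin{pmatrix} \mathrm{sigm} \\ \mathrm{sigm} \\ \mathrm{sigm} \\ \tanh \end{pmatrix}
T_{2n,4n}
\begin{pmatrix} h_t^{l-1} \\ h_{t-1}^{l} \end{pmatrix}

c_t^{l} = f \odot c_{t-1}^{l} + i \odot g

h_t^{l} = o \odot \tanh(c_t^{l})
```

Read this way, $g$ is the candidate $\tilde{c}_t$, $i$ scales how much of it is written, $f$ scales how much of $c_{t-1}^l$ survives, and $o$ scales what is exposed as $h_t^l$.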

Regularization with Dropout

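This is the core of the paper: dropout regularization is successfully applied to LSTMs and reduces overfitting.
The main idea is to apply the dropout operator only to the non-recurrent connections (see Figure 2 of the paper).
The paper's figure and equations describe this regularization process well; in them, $D$ denotes the dropout operator.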
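To make "dropout only on the non-recurrent connections" concrete, here is a minimal NumPy sketch. It is not the paper's or TensorFlow's code. The names `sigm`, `dropout`, `lstm_step`, and `stacked_step` are hypothetical, the weights are random placeholders, and inverted dropout (rescaling by `1 / keep_prob`) is an assumption. The operator $D$ is applied to $h_t^{l-1}$, the input from the layer below, and never to $h_{t-1}^l$ or $c_{t-1}^l$.

```python
import numpy as np

def sigm(x):
    return 1.0 / (1.0 + np.exp(-x))

def dropout(x, keep_prob, rng):
    """Inverted dropout (assumed): zero units with prob 1 - keep_prob, rescale the rest."""
    if keep_prob >= 1.0:
        return x
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob

def lstm_step(h_below, h_prev, c_prev, W, b, keep_prob, rng):
    """One LSTM step; W has shape (4n, 2n) and plays the role of T_{2n,4n}."""
    n = h_prev.shape[0]
    # D(h_t^{l-1}): only the input from the layer below is dropped.
    # The recurrent inputs h_{t-1}^l and c_{t-1}^l pass through untouched.
    z = W @ np.concatenate([dropout(h_below, keep_prob, rng), h_prev]) + b
    i, f, o = sigm(z[:n]), sigm(z[n:2 * n]), sigm(z[2 * n:3 * n])
    g = np.tanh(z[3 * n:])
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c

def stacked_step(x_t, states, params, keep_prob, rng):
    """One timestep through L stacked layers, then dropout on the top output."""
    h_below, new_states = x_t, []
    for (h_prev, c_prev), (W, b) in zip(states, params):
        h, c = lstm_step(h_below, h_prev, c_prev, W, b, keep_prob, rng)
        new_states.append((h, c))       # stored states are never dropped
        h_below = h
    # One more dropout before the prediction layer: L + 1 applications in total.
    return dropout(h_below, keep_prob, rng), new_states

n, L = 4, 2                             # hidden size and depth (illustrative)
rng = np.random.default_rng(0)
params = [(rng.normal(scale=0.1, size=(4 * n, 2 * n)), np.zeros(4 * n)) for _ in range(L)]
states = [(np.zeros(n), np.zeros(n)) for _ in range(L)]
for x_t in rng.normal(size=(3, n)):     # three random "word vectors"
    y_t, states = stacked_step(x_t, states, params, keep_prob=0.5, rng=rng)
```

Because the states carried from one timestep to the next are stored before any mask is applied, information moving along the time axis is not corrupted; only the vertical path from input to prediction passes through dropout.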

Figure 3 of the paper shows how information could flow from an event that occurred at timestep $t-2$ to the prediction at timestep $t+2$ in the authors' implementation of dropout.

Experiments

We evaluate on three domains: language modeling, speech recognition, and machine translation.

4.1 Language Modeling

PTB dataset
