@songying 2019-01-07T03:54:04.000000Z

Neural Responding Machine for Short-Text Conversation

Attention

A paper worth reading.

Abstract

Task: STC (Short-Text Conversation)
Model: NRM (Neural Responding Machine)

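The paper proposes NRM, a model built on the general encoder-decoder framework: it formalizes response generation as a decoding process based on a representation of the input text, with both encoding and decoding implemented by RNNs.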

3. Neural Responding Machines for STC

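The basic idea of NRM: first build a representation of the post, then generate the response from that representation. Concretely, the encoder converts the input sequence $x = (x_1, \cdots, x_T)$ into a set of high-dimensional hidden representations $h = (h_1, \cdots, h_T)$; an attention mechanism then produces a context vector $c_t$ for each time step $t$. $c_t$ is linearly transformed by the matrix $L$ (part of the decoder) and fed into the decoder RNN, whose hidden state $s_t$ is used to generate the $t$-th word $y_t$.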

In neural machine translation (NMT), $L$ converts the source-language representation into the target-language one. In NRM, $L$ plays a more important role:

it needs to transform the representation of post (or some part of it) to the rich representation of many plausible responses.

3.1 The computation in Decoder

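The figure above shows the decoder of the model. It is essentially a standard RNN language model, except that it is additionally conditioned on the context input $c_t$.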

The probability of generating the $t$-th word is:

$p(y_t | y_{t-1}, \cdots, y_1, x) = g(y_{t-1}, s_t, c_t)$

$y_t$: a one-hot word representation.

$g(\cdot)$: a softmax activation function.

$s_t$: the decoder hidden state at time $t$, given by:

$s_t = f(y_{t-1}, s_{t-1}, c_t)$

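Here $f()$ is a non-linear function, and $L$ is one of its parameters. $f()$ can be a logistic function, an LSTM unit, or a GRU. The paper uses a GRU, whose performance is close to that of an LSTM while having fewer parameters and being easier to train.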

With a GRU, $s_t$ is computed as follows:

$\text{update gate: } z_t = \sigma(W_z \, e(y_{t-1}) + U_z s_{t-1} + L_z c_t) \\ \text{reset gate: } r_t = \sigma(W_r \, e(y_{t-1}) + U_r s_{t-1} + L_r c_t) \\ \hat{s}_t = \tanh(W \, e(y_{t-1}) + U(r_t \circ s_{t-1}) + L c_t) \\ s_t = (1-z_t) \circ s_{t-1} + z_t \circ \hat{s}_t$

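In the equations above, $e(y_{t-1})$ is the word embedding of $y_{t-1}$, and $L = \{L, L_z, L_r\}$ collects the matrices that map the context $c_t$ into the decoder.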
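To make the gate equations concrete, here is a minimal NumPy sketch of a single decoder step. It follows the four equations term by term, but the parameter dictionary, the helper `init_decoder_params`, the random initialization, and all dimensions are my own assumptions for illustration, not something taken from the paper.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init_decoder_params(emb_dim, hid_dim, ctx_dim, seed=0):
    """Hypothetical helper: small random weights for the nine matrices
    W, W_z, W_r (embedding), U, U_z, U_r (state), L, L_z, L_r (context)."""
    rng = np.random.default_rng(seed)
    in_dims = {"W": emb_dim, "U": hid_dim, "L": ctx_dim}
    params = {}
    for base, in_dim in in_dims.items():
        for suffix in ("", "_z", "_r"):
            params[base + suffix] = rng.normal(scale=0.1, size=(hid_dim, in_dim))
    return params

def gru_decoder_step(e_prev, s_prev, c_t, p):
    """One step s_t = f(y_{t-1}, s_{t-1}, c_t) with f a GRU.

    e_prev: embedding e(y_{t-1}); s_prev: s_{t-1}; c_t: context vector.
    """
    # Update gate z_t and reset gate r_t, both conditioned on the context.
    z_t = sigmoid(p["W_z"] @ e_prev + p["U_z"] @ s_prev + p["L_z"] @ c_t)
    r_t = sigmoid(p["W_r"] @ e_prev + p["U_r"] @ s_prev + p["L_r"] @ c_t)
    # Candidate state: the reset gate scales s_{t-1} elementwise.
    s_hat = np.tanh(p["W"] @ e_prev + p["U"] @ (r_t * s_prev) + p["L"] @ c_t)
    # Interpolate between the old state and the candidate.
    return (1.0 - z_t) * s_prev + z_t * s_hat

params = init_decoder_params(emb_dim=4, hid_dim=3, ctx_dim=5)
s_t = gru_decoder_step(np.ones(4), np.zeros(3), np.ones(5), params)
```

Compared with a plain GRU, the only difference is the extra $L c_t$, $L_z c_t$, $L_r c_t$ terms, which is how the post representation enters every decoding step.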

3.2 The Computation in Encoder

The paper considers three encoding schemes:

the global scheme

the local scheme

the hybrid scheme which combines 1 and 2

3.2.1 Global Scheme

$h_t = f(x_t, h_{t-1})$

The last hidden state of the RNN, $c_t = h_T$, is used as the global representation of the sentence. This approach has the following drawback:

a vectorial summarization of the entire post is often hard to obtain and may lose important details for response generation, especially when the dimension of the hidden state is not big enough.
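A small sketch of the global scheme, to show how little information the decoder actually receives. The encoder cell here is a plain tanh RNN, which is my simplification (the notes only say $f$ is an RNN cell); `W_x`, `W_h`, and the dimensions are hypothetical.

```python
import numpy as np

def encode(xs, W_x, W_h):
    """Run h_t = f(x_t, h_{t-1}) over the post.

    f is a plain tanh cell here (an assumption, chosen for brevity).
    xs has shape (T, in_dim); the result has shape (T, hid_dim).
    """
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in xs:
        h = np.tanh(W_x @ x_t + W_h @ h)
        states.append(h)
    return np.stack(states)

def global_context(states):
    """Global scheme: every decoding step gets the same context c_t = h_T."""
    return states[-1]

rng = np.random.default_rng(0)
xs = rng.normal(size=(6, 4))                 # a post of T = 6 word vectors
W_x = rng.normal(scale=0.1, size=(3, 4))
W_h = rng.normal(scale=0.1, size=(3, 3))
H = encode(xs, W_x, W_h)
c = global_context(H)                        # one vector of size hid_dim
```

Whatever the length of the post, the decoder only sees the `hid_dim` numbers in `c`, which is the bottleneck the quoted drawback refers to.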

3.2.2 Local Scheme

$\alpha_{tj} = q(h_j, s_{t-1}) \\ c_t = \sum_{j=1}^T \alpha_{tj} h_j$
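The two formulas can be sketched as below. The notes do not say what $q$ looks like, so the additive score $v_a^\top \tanh(W_a h_j + U_a s_{t-1})$ followed by a softmax over $j$ is an assumption on my part (a common choice for attention), and `W_a`, `U_a`, `v_a` are hypothetical parameter names.

```python
import numpy as np

def local_context(H, s_prev, W_a, U_a, v_a):
    """Local scheme: c_t = sum_j alpha_tj * h_j.

    H: encoder states, shape (T, hid_dim); s_prev: decoder state s_{t-1}.
    Assumed form of q: additive score, then softmax over positions j.
    """
    # One unnormalized score per encoder position j.
    scores = np.tanh(H @ W_a.T + s_prev @ U_a.T) @ v_a     # shape (T,)
    # Softmax, shifted by the max for numerical stability.
    scores = scores - scores.max()
    alpha = np.exp(scores) / np.exp(scores).sum()
    # Weighted sum of the encoder states.
    c_t = alpha @ H
    return c_t, alpha

rng = np.random.default_rng(0)
H = rng.normal(size=(6, 3))          # T = 6 encoder states
s_prev = rng.normal(size=3)          # previous decoder state
W_a = rng.normal(size=(8, 3))
U_a = rng.normal(size=(8, 3))
v_a = rng.normal(size=8)
c_t, alpha = local_context(H, s_prev, W_a, U_a, v_a)
```

Unlike the global scheme, `c_t` changes at every decoding step because the weights depend on `s_prev`.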