@songying 2019-03-19T03:55:15.000000Z

Gated-Attention Readers for Text Comprehension

Reading comprehension

Abstract

This paper proposes a new model: the Gated-Attention Reader.

Datasets: CNN & Daily Mail, Who Did What

Introduction

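The success of reading comprehension models comes down to two main factors:
1. Multi-hop architectures, which let the model scan the document and the question iteratively for multiple passes.
2. Attention mechanisms, which let the model focus on appropriate subparts of the context document.

This model combines the two into a new form of attention.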

3. Gated-Attention Reader

3.1 Multi-Hop Architecture

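As usual, the document and the query are first converted to word embeddings, and then a bidirectional RNN (GRU) produces the contextual representation matrices of the document and the query: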

$Document : X^{(0)} = [x_1^{(0)}, x_2^{(0)}, \cdots , x_{|D|}^{(0)}] \\ Query: Y = [y_1, y_2, \cdots , y_{|Q|}] \\ Document: D^{(1)} = \overleftrightarrow{GRU}_D^{(1)}(X^{(0)}) \\ Query: Q^{(k)} = \overleftrightarrow{GRU}_Q^{(k)}(Y)$
Gated-Attention Module: In the subsequent computation, $D$ and $X$ are updated iteratively, layer by layer:

$D^{(k)} = \overleftrightarrow{GRU}_D^{(k)}(X^{(k-1)}) \\ X^{(k)} = GA(D^{(k)}, Q^{(k)})$
Here the gated attention $GA$ is computed for each document token $d_i$ as follows:

$\alpha_i = \mathrm{softmax}(Q^T d_i) \\ \tilde{q}_i = Q \alpha_i \\ x_i = d_i \odot \tilde{q}_i \quad \text{or} \quad x_i = d_i + \tilde{q}_i \quad \text{or} \quad x_i = d_i || \tilde{q}_i$
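The three formulas above can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: vectors are stored as lists of floats, `query` holds one vector per query token (the columns of $Q$), and the names `gated_attention`, `softmax`, and `gate` are made up for this note, not taken from the paper's code.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gated_attention(doc, query, gate="mul"):
    """Apply gated attention to every document token.

    doc:   list of |D| token vectors (each a list of h floats)
    query: list of |Q| token vectors (same dimension h)
    gate:  "mul" (element-wise product), "add", or "concat"
    """
    out = []
    for d in doc:
        # alpha_i = softmax(Q^T d_i): one weight per query token
        alpha = softmax([sum(qj * dj for qj, dj in zip(q, d)) for q in query])
        # q_tilde_i = Q alpha_i: attention-weighted sum of the query tokens
        q_tilde = [sum(a * q[t] for a, q in zip(alpha, query))
                   for t in range(len(d))]
        if gate == "mul":
            out.append([dj * qj for dj, qj in zip(d, q_tilde)])
        elif gate == "add":
            out.append([dj + qj for dj, qj in zip(d, q_tilde)])
        elif gate == "concat":
            out.append(d + q_tilde)
        else:
            raise ValueError(f"unknown gate: {gate}")
    return out

# Toy inputs (made-up numbers, h = 2)
doc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [[2.0, 0.0], [0.0, 2.0]]
X = gated_attention(doc, query)
```

Note that every document token gets its own query summary $\tilde{q}_i$, so the gate differs from position to position.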
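Intuitively, the model keeps injecting query information in order to find the entity words in the document that are most relevant to the query. Compared with the previously discussed models, this one is multi-layered and therefore better at capturing this kind of semantic relevance.
This process is iterated $K$ times, finally yielding $D^{(K)}$.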
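The control flow of the multi-hop iteration can be sketched as below. The per-layer encoders are passed in as plain callables standing in for the bidirectional GRUs; they are hypothetical placeholders, not real GRUs. The sketch also assumes that gated attention is applied after every layer except the last, which is how I read the description (the final layer only produces $D^{(K)}$).

```python
def multi_hop(X0, Y, doc_encoders, query_encoders, ga):
    """Run K layers and return D^(K).

    doc_encoders / query_encoders: lists of K callables standing in for
        the per-layer bidirectional GRUs (placeholders in this sketch).
    ga: callable (D, Q) -> X implementing the gated-attention module.
    """
    K = len(doc_encoders)
    X = X0
    for k in range(K - 1):
        D = doc_encoders[k](X)      # D^(k) = GRU_D^(k)(X^(k-1))
        Q = query_encoders[k](Y)    # Q^(k) = GRU_Q^(k)(Y)
        X = ga(D, Q)                # X^(k) = GA(D^(k), Q^(k))
    return doc_encoders[K - 1](X)   # D^(K), used for answer prediction

# Toy stand-ins so the loop can be run end to end
def identity(seq):
    return seq

def toy_ga(D, Q):
    # Not real gated attention: just doubles every document vector
    return [[2.0 * v for v in d] for d in D]

D_K = multi_hop([[1.0], [2.0]], [[3.0]], [identity] * 3, [identity] * 3, toy_ga)
```

With $K = 3$ the toy gate is applied twice, so each input value ends up multiplied by 4.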
Answer Prediction
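In the answer prediction stage, we first take the representation of the word at the blank position in the query, then take its inner product with $D^{(K)}$, and finally apply a softmax: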

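$q_\ell^{(K)} = q_\ell^f || q_{T-\ell+1}^b \\ s = \mathrm{softmax}((q_\ell^{(K)})^T D^{(K)})$
Here $\ell$ is the position of the blank in the query and $T$ is the query length. $q_\ell^f$ is the forward GRU state at that position, and $q_{T-\ell+1}^b$ is the backward GRU state at the same position: the backward GRU reads the query in reverse, so position $\ell$ is its step $T-\ell+1$. The two states are concatenated.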
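A minimal sketch of the scoring step, assuming the blank's representation and the final document matrix are already available. `q_cloze` and `doc_final` are illustrative names with made-up numbers, standing in for the blank-position query vector and $D^{(K)}$.

```python
import math

def answer_scores(q_cloze, doc_final):
    """Softmax over the inner products of q_cloze with each document token."""
    logits = [sum(a * b for a, b in zip(q_cloze, d)) for d in doc_final]
    m = max(logits)                          # subtract max for stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

q_cloze = [1.0, 0.0]                                # blank-position query vector
doc_final = [[0.0, 1.0], [2.0, 0.0], [0.0, 3.0]]    # one row per document token
s = answer_scores(q_cloze, doc_final)
```

The result `s` has one probability per document position, and the position whose vector aligns best with the blank's representation gets the largest share.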
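Finally, the probabilities of all positions holding the same token are summed, where $I(c,d)$ is the set of positions at which candidate $c$ appears in document $d$: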

$Pr(c | d,q ) = \sum_{i \in I(c,d)} s_i$
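The summation over $I(c,d)$ amounts to grouping positions by token. A small sketch with made-up tokens and scores follows; the `candidates` set and the final `max` step are my assumptions about how the answer would be picked.

```python
from collections import defaultdict

def aggregate(tokens, s, candidates):
    """Sum s_i over all positions i whose token is a candidate c."""
    prob = defaultdict(float)
    for tok, p in zip(tokens, s):
        if tok in candidates:
            prob[tok] += p
    return dict(prob)

tokens = ["@entity1", "said", "@entity2", "met", "@entity1"]
s = [0.3, 0.1, 0.25, 0.05, 0.3]      # per-position scores (made up)
probs = aggregate(tokens, s, candidates={"@entity1", "@entity2"})
answer = max(probs, key=probs.get)   # candidate with the highest total
```

A candidate that appears several times in the document accumulates probability from each occurrence, which is why `@entity1` wins here even though no single position dominates.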