@songying
2018-06-21T20:36:19.000000Z
Word count: 1462
Reads: 1139
word-embedding
Code: https://github.com/kimiyoung/fg-gating
We present a fine-grained gating mechanism to dynamically combine word-level and character-level representations based on properties of the words.
Word-level representations are good at memorizing the semantics of the tokens while character-level representations are more suitable for modeling sub-word morphologies (Ling et al., 2015; Yang et al., 2016a).
For example, consider “cat” and “cats”: word-level representations can only learn the similarity between the two tokens by training on a large amount of data, while character-level representations, by design, capture the similarity easily. Character-level representations are also used to alleviate the difficulties of modeling out-of-vocabulary (OOV) tokens (Luong & Manning, 2016).
document: a sequence of M tokens
question: a sequence of N tokens

Each token is represented as a pair (w, C), where w is the one-hot encoding of the token and C is a matrix whose rows are the embedding vectors of the characters in the token.

- D: the vector representations of the tokens in the document
- Q: the vector representations of the tokens in the query

Suppose at layer k we run RNNs over D and Q to obtain hidden states H_d and H_q, where H_d is an M × d matrix, H_q is an N × d matrix, and d is the number of hidden units.
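As a shape check only (not the paper's implementation), a single vanilla RNN layer turning M token vectors into the M × d hidden-state matrix described above can be sketched as follows; all weight names are illustrative placeholders:

```python
import numpy as np

def rnn_states(X, d, seed=0):
    """X: (M, e) token vectors -> (M, d) hidden states of a vanilla RNN."""
    M, e = X.shape
    rng = np.random.default_rng(seed)
    W_x = rng.standard_normal((e, d)) * 0.1  # input-to-hidden weights (assumed)
    W_h = rng.standard_normal((d, d)) * 0.1  # hidden-to-hidden weights (assumed)
    h = np.zeros(d)
    states = []
    for t in range(M):
        h = np.tanh(X[t] @ W_x + h @ W_h)  # one recurrent step
        states.append(h)
    return np.stack(states)  # shape (M, d)

# Document with M = 5 tokens, embedding size 8, hidden size d = 4.
H_d = rnn_states(np.random.default_rng(1).standard_normal((5, 8)), d=4)
print(H_d.shape)  # (5, 4)
```

The same function applied to the N query tokens yields the N × d matrix H_q.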
The representation of each token:
We first run an RNN over C and take the hidden state of the final step, c, as the character-level representation. Let E denote the token embedding lookup table; we use Ew to obtain the token's word-level representation. We assume c and Ew have the same dimensionality.
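A minimal numpy sketch of these two representations, under assumed shapes (character matrix C, embedding table E, and all weights are random placeholders): the final RNN state over C gives c, and a row lookup E[w] is equivalent to Ew for a one-hot w.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6  # shared dimensionality of c and Ew

def char_level_rep(C, d):
    """Run a vanilla RNN over character embeddings; return the last state c."""
    n, e = C.shape
    W_x = rng.standard_normal((e, d)) * 0.1  # assumed weights
    W_h = rng.standard_normal((d, d)) * 0.1
    h = np.zeros(d)
    for t in range(n):
        h = np.tanh(C[t] @ W_x + h @ W_h)
    return h  # c, shape (d,)

vocab_size, char_dim = 10, 3
E = rng.standard_normal((vocab_size, d))  # token embedding lookup table
C = rng.standard_normal((4, char_dim))    # 4 character embeddings for one token
c = char_level_rep(C, d)                  # character-level representation
Ew = E[7]                                 # word-level representation (row lookup)
assert c.shape == Ew.shape == (d,)        # same length, as the text assumes
```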
We propose a gate to dynamically choose between the word-level representation Ew and the character-level representation c.
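In the paper, this fine-grained gate is computed from a vector of token features v (such as POS/NER tags and document frequency) as g = σ(W_g v + b_g), and the combined representation mixes the two sources elementwise: h = g ⊙ c + (1 − g) ⊙ Ew. A minimal sketch, with random placeholder features and weights:

```python
import numpy as np

rng = np.random.default_rng(0)
d, feat_dim = 6, 10

c  = rng.standard_normal(d)         # character-level representation
Ew = rng.standard_normal(d)         # word-level representation
v  = rng.standard_normal(feat_dim)  # token feature vector (assumed contents)

W_g = rng.standard_normal((d, feat_dim)) * 0.1  # gate parameters (assumed)
b_g = np.zeros(d)

g = 1.0 / (1.0 + np.exp(-(W_g @ v + b_g)))  # sigmoid gate, one value per dim
h = g * c + (1.0 - g) * Ew                  # fine-grained elementwise mixture

assert h.shape == (d,)
```

Because g is computed per dimension rather than per token, each coordinate of the final representation can lean toward whichever source is more informative for that token.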