@songying
2018-06-22T15:47:55.000000Z
word-embedding
off-the-shelf (OTS): ready-made, already pre-trained
out-of-vocabulary (OOV) tokens: words not seen before (outside the training vocabulary)
We systematically explore several options for these choices, and provide recommendations to researchers working in this area.
Currently, many RC (reading comprehension) models use the following techniques (a sketch follows this list):
- Tokens in the document and question are represented using word vectors obtained from a lookup table (either initialized randomly, or from a pre-trained source such as GloVe (Pennington et al., 2014)).
- A sequence model such as LSTM (Hochreiter and Schmidhuber, 1997), augmented with an attention mechanism (Bahdanau et al., 2014), updates these vectors to produce contextual representations.
- An output layer uses these contextual representations to locate the answer in the document.
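A minimal sketch (in PyTorch) of this three-step pipeline, loosely in the style of the Stanford AR's bilinear attention; the module names, dimensions, and the softmax output layer below are illustrative assumptions, not the paper's implementation:

```python
# Sketch of the three-step RC pipeline described above:
# (1) word-vector lookup, (2) contextual encoding with a GRU + attention,
# (3) an output layer scoring document positions against the question.
import torch
import torch.nn as nn


class MiniAttentiveReader(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128):
        super().__init__()
        # (1) word-vector lookup table (could be initialized from GloVe)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # (2) sequence encoders for document and question
        self.doc_rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.q_rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        # bilinear attention between the question summary and document states
        self.bilinear = nn.Linear(2 * hidden_dim, 2 * hidden_dim, bias=False)

    def forward(self, doc_ids, q_ids):
        d = self.embed(doc_ids)                              # (B, Ld, E)
        q = self.embed(q_ids)                                # (B, Lq, E)
        d_states, _ = self.doc_rnn(d)                        # (B, Ld, 2H)
        _, q_last = self.q_rnn(q)                            # (2, B, H)
        q_vec = torch.cat([q_last[0], q_last[1]], dim=-1)    # (B, 2H)
        # (3) attention scores over document positions; the answer is located
        # at the positions (or candidate entities) with the highest scores
        scores = torch.bmm(d_states, self.bilinear(q_vec).unsqueeze(-1)).squeeze(-1)
        return torch.softmax(scores, dim=-1)                 # (B, Ld)
```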
This paper explores the impact of these choices on final performance.
In this paper, two models are compared: the Stanford Attentive Reader (AR) (Chen et al., 2016) and the Gated Attention (GA) Reader (Dhingra et al., 2016), using the Who-Did-What dataset (Onishi et al., 2016).
Pre-trained word embeddings
Based on our findings, we recommend the use of certain pre-trained GloVe vectors for initialization. These consistently outperform other off-the-shelf embeddings such as word2vec (Mikolov et al., 2013), as well as those pre-trained on the target corpus itself and, perhaps surprisingly, those trained on a large corpus from the same domain as the target dataset.
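A minimal sketch of initializing such a lookup table from pre-trained GloVe vectors; the file name `glove.6B.100d.txt`, the dimensionality, and the random range for missing words are assumptions for illustration:

```python
# Initialize an embedding matrix from pre-trained GloVe vectors.
import numpy as np

def build_embedding_matrix(vocab, glove_path="glove.6B.100d.txt", dim=100):
    # Each GloVe line is "<word> <v1> ... <vdim>"
    glove = {}
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            glove[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    # Rows for words covered by GloVe come from GloVe; the rest stay random.
    matrix = np.random.uniform(-0.1, 0.1, (len(vocab), dim)).astype(np.float32)
    for word, idx in vocab.items():
        if word in glove:
            matrix[idx] = glove[word]
    return matrix
```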
How should out-of-vocabulary (OOV) tokens be handled? (A sketch of both strategies follows the list below.)
- A common approach (e.g. (Chen et al., 2016; Shen et al., 2016)) is to replace infrequent words during training with a special token UNK, and use this token to model the OOV words at the test phase.
- A superior strategy is to assign each OOV token either a pre-trained, if available, or a random but unique vector at test time.
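A minimal sketch of both OOV strategies; the function names and the `glove` dictionary (word → vector, e.g. loaded as in the sketch above) are illustrative assumptions:

```python
# (a) map rare training words and all unseen test words to a single UNK vector;
# (b) at test time, give each OOV token its own vector -- pre-trained if
#     available, otherwise a random but unique vector.
import numpy as np

def lookup_unk(word, vocab, matrix, unk_idx):
    # Strategy (a): every OOV word shares the UNK embedding
    return matrix[vocab.get(word, unk_idx)]

def lookup_unique(word, vocab, matrix, glove, oov_cache, dim=100):
    # Strategy (b): pre-trained vector if available, else a cached random one
    if word in vocab:
        return matrix[vocab[word]]
    if word in glove:
        return glove[word]
    if word not in oov_cache:
        oov_cache[word] = np.random.uniform(-0.1, 0.1, dim).astype(np.float32)
    return oov_cache[word]
```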
Datasets used in this paper:
- Who-Did-What (WDW) (Onishi et al., 2016)
- the Children’s Book Test (CBT) (Hill et al., 2015)
Models used in this paper:
- Stanford AR
- the high-performing GA Reader.
Stanford AR
GA Reader
The most popular choices are GloVe and word2vec.
One key difference: both GloVe and word2vec ship with pre-trained word vectors, but while the GloVe package provides embeddings of varying sizes (50-300), word2vec only provides embeddings of size 300.
We also trained several other word embeddings ourselves (a training sketch follows the lists below):
In total, for the WDW dataset we used two corpora:
- one large (OTS)
- one small (WDW)
For the CBT dataset, we used:
- one large (BT)
- one small (CBT).
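A sketch of what training embeddings on the target corpus itself could look like, using gensim's word2vec; the corpus file name and all hyperparameters below are assumptions rather than the paper's settings:

```python
# Train word2vec-style vectors on the target corpus (e.g. the WDW or CBT text).
from gensim.models import Word2Vec

def train_corpus_vectors(corpus_path="wdw_train.txt", dim=100):
    # One tokenized sentence per line
    sentences = [line.split() for line in open(corpus_path, encoding="utf-8")]
    # gensim >= 4.0 uses `vector_size` (older versions used `size`)
    model = Word2Vec(sentences, vector_size=dim, window=5, min_count=5, workers=4)
    return model.wv  # keyed vectors: model.wv["word"] -> np.ndarray
```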
- For the WDW dataset: hidden units d = 128, RNN cell: GRU, dropout p = 0.3.
- For the CBT-NE dataset: hidden units d = 128, RNN cell: GRU, dropout p = 0.4.
The Stanford AR has only one layer, while the GA Reader has 3 layers (a configuration sketch follows at the end of these notes).
For all experiments, the word-embedding size was:
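A small configuration sketch collecting these hyperparameters; it flattens both models into a single stacked GRU purely for illustration, and the embedding size is a placeholder since the notes leave that value blank:

```python
import torch.nn as nn

# Values copied from the notes above, except embed_dim (the notes do not
# record the embedding size, so 128 is only a placeholder).
CONFIG = {
    "WDW":    {"hidden_dim": 128, "dropout": 0.3},
    "CBT-NE": {"hidden_dim": 128, "dropout": 0.4},
}
LAYERS = {"Stanford AR": 1, "GA Reader": 3}

def make_encoder(dataset="WDW", model="GA Reader", embed_dim=128):
    cfg = CONFIG[dataset]
    num_layers = LAYERS[model]
    # PyTorch only applies inter-layer dropout when num_layers > 1
    return nn.GRU(embed_dim, cfg["hidden_dim"], num_layers=num_layers,
                  dropout=cfg["dropout"] if num_layers > 1 else 0.0,
                  batch_first=True, bidirectional=True)
```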