The paper also provides two baseline algorithms: a feature-based classifier and a state-of-the-art neural network.
Introduction
The difficulty of reading comprehension (RC) lies in:
the questions can be complex, e.g. have highly compositional semantics
finding the correct answer can require complex reasoning, e.g. combining facts from multiple sentences or background knowledge
individual facts can be difficult to recover from text
Example:
Question
The Dodecanese Campaign of WWII that was an attempt by the Allied forces to capture islands in the Aegean Sea was the inspiration for which acclaimed 1961 commando film?
Answer
The Guns of Navarone
Excerpt
The Dodecanese Campaign of World War II was an attempt by Allied forces to capture the Italian-held Dodecanese islands in the Aegean Sea following the surrender of Italy in September 1943, and use them as bases against the German-controlled Balkans. The failed campaign, and in particular the Battle of Leros, inspired the 1957 novel The Guns of Navarone and the successful 1961 movie of the same name.
Our contributions
We collect over 650K question-answer-evidence triples, with questions originating from trivia enthusiasts independent of the evidence documents. A high percentage of the questions are challenging, with substantial syntactic and lexical variability and often requiring multi-sentence reasoning. The dataset and code are available at http://nlp.cs.washington.edu/triviaqa/, offering resources for training new reading-comprehension models.
We present a manual analysis quantifying the quality of the dataset and the challenges involved in solving the task.
We present experiments with two baseline methods, demonstrating that the TriviaQA tasks are not easily solved and are worthy of future study.
In addition to the automatically gathered large-scale (but noisy) dataset, we present a clean, human-annotated subset of 1975 question-document-answer triples whose documents are certified to contain all facts required to answer the questions.
2. Overview
Problem Formulation
q: the question
a: the answer
D: the set of relevant evidence documents
Here we assume that a appears as a substring of some document in D, and that D is a set of documents rather than a single document.
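To make the formulation concrete, here is a minimal sketch of one example and of the substring assumption; the class name, field names, and the case-insensitive matching are my own illustrative choices, not the paper's data format:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TriviaQAExample:
    question: str              # q: the trivia question
    answer: str                # a: the answer string
    evidence_docs: List[str]   # D: a set of evidence documents (not a single document)

def answer_is_supported(example: TriviaQAExample) -> bool:
    """Distant-supervision assumption: the answer a occurs as a substring
    of at least one document in the evidence set D (case-insensitive here)."""
    answer = example.answer.lower()
    return any(answer in doc.lower() for doc in example.evidence_docs)
```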
Data and Distant Supervision
Our evidence documents are collected from Wikipedia articles and Web search results.
Dataset Collection
First we gathered question-answer pairs from 14 trivia and quiz-league websites. We removed questions with fewer than four tokens, since these were generally either too simple or too vague.
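The question-filtering step amounts to a simple length check; a minimal sketch follows (whitespace tokenization is my assumption, the paper only states the four-token threshold):

```python
def keep_question(question: str, min_tokens: int = 4) -> bool:
    """Keep a question only if it has at least four tokens; shorter ones
    tend to be too simple or too vague (whitespace tokenization assumed)."""
    return len(question.split()) >= min_tokens
```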
We then collected textual evidence to answer questions using two sources: documents from Web search results and Wikipedia articles for entities in the question.
Finally, to support learning from distant supervision, we further filtered the evidence documents to exclude those missing the correct answer string (a sketch of this step follows the list below) and formed evidence document sets as described in Section 2. This left us with 95K question-answer pairs organized into
(1) 650K training examples for the Web search results, each containing a single (combined) evidence document, and
(2) 78K examples for the Wikipedia reading comprehension domain, containing on average 1.8 evidence documents per example.
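A minimal sketch of the distant-supervision filtering described above; the dict-based input/output structures and field names (qid, question, answer, evidence_docs) are illustrative assumptions, not the paper's actual data format:

```python
from typing import Dict, List

def filter_evidence(
    question_answer_pairs: List[Dict],
    evidence_docs: Dict[str, List[str]],
) -> List[Dict]:
    """Keep, for each question, only evidence documents that contain the
    answer string, and drop questions left with no supporting document.

    `question_answer_pairs` is a list of {"qid", "question", "answer"} dicts,
    and `evidence_docs` maps qid -> retrieved documents (Web search results
    or Wikipedia articles for entities in the question).
    """
    kept = []
    for qa in question_answer_pairs:
        answer = qa["answer"].lower()
        docs = [d for d in evidence_docs.get(qa["qid"], []) if answer in d.lower()]
        if docs:
            # Attach the surviving evidence document set to the example.
            kept.append({**qa, "evidence_docs": docs})
    return kept
```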