MS MARCO: A Human Generated MAchine Reading COmprehension Dataset
Dataset
Abstract
In MS MARCO, all questions are sampled from real anonymized user queries.
- The context passages are extracted from real web documents.
- The answers to the queries are human generated.
Introduction
Compared to previous publicly available datasets, this dataset is unique in that:
- all questions are real user queries,
- the context passages, from which the answers are derived, are extracted from real web documents,
- all the answers to the queries are human generated,
- a subset of these queries has multiple answers,
- all queries are tagged with segment information.
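
The properties listed above suggest a natural shape for a single example: one query, a segment tag, several context passages, and one or more human-written answers. The following is a minimal sketch of that shape. The class names, field names, and sample values are all hypothetical, chosen for illustration only. They are not the official schema of the released files.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Passage:
    """A context passage extracted from a web document (hypothetical fields)."""
    text: str
    url: str


@dataclass
class Record:
    """One dataset example; field names are illustrative, not the official schema."""
    query: str                      # a real, anonymized user query
    segment: str                    # segment tag attached to the query
    passages: List[Passage] = field(default_factory=list)
    answers: List[str] = field(default_factory=list)  # human-generated answers

    def has_multiple_answers(self) -> bool:
        # Only a subset of queries has more than one answer.
        return len(self.answers) > 1


# Invented placeholder values, not taken from the dataset.
example = Record(
    query="placeholder query text",
    segment="placeholder-segment",
    passages=[Passage(text="placeholder passage", url="https://example.com/doc")],
    answers=["first placeholder answer", "second placeholder answer"],
)

print(example.has_multiple_answers())
```

Storing answers as a list rather than a single string keeps the common single-answer case and the multi-answer subset in the same structure.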