Data: v1 (zip), v2 (zip)
The formal semantics literature has long been concerned with the complex array of inferences that different open-class lexical items trigger. For example, why does (1a) give rise to the inference (2a), while the structurally identical (1b) triggers the inference (2b)?
(1) a. Jo doesn’t believe that Bo left. b. Jo doesn’t know that Bo left.
(2) a. Jo believes that Bo didn’t leave. b. Bo left. c. Bo didn’t leave.
A major finding of this literature is that lexically triggered inferences are conditioned by surprising aspects of the syntactic context that a word occurs in. For example, while (3a), (3b), and (4a) trigger the inference (2b), (4b) triggers the inference (2c).
(3) a. Jo remembered that Bo left. b. Jo didn’t remember that Bo left.
(4) a. Bo remembered to leave. b. Bo didn’t remember to leave.
The MegaVeridicality dataset consists of ordinal veridicality judgments as well as ordinal acceptability judgments for 773 clause-embedding verbs of English with a variety of subordinate clause structures. For a detailed description of the dataset, the item construction and collection methods, and discussion of how to use a dataset on this scale to address questions in linguistic theory, please see the following papers:
White, A. S., R. Rudinger, K. Rawlins, & B. Van Durme. 2018. Lexicosyntactic Inference in Neural Models. To appear in the Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31-November 4, 2018.
White, A. S. & K. Rawlins. 2018. The role of veridicality and factivity in clause selection. To appear in the Proceedings of the 48th Meeting of the North East Linguistic Society.
If you make use of this dataset in a presentation or publication, we ask that you please cite these papers.
Aaron Steven White
Benjamin Van Durme