The MegaVeridicality dataset
Authors: Aaron Steven White and Kyle Rawlins
Contact: aaron.white@rochester.edu, kgr@jhu.edu
Version: 2.1
Release date: July 29, 2020
Overview
This dataset consists of ordinal veridicality judgments as well as ordinal acceptability judgments for 773 clause-embedding verbs of English. The data were collected on Amazon’s Mechanical Turk using Turktools.
For a detailed description of the dataset, the item construction and collection methods, and discussion of how to use a dataset on this scale to address questions in linguistic theory, please see the following papers:
White, A. S., R. Rudinger, K. Rawlins, & B. Van Durme. 2018. Lexicosyntactic Inference in Neural Models. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31-November 4, 2018.
White, A. S. & K. Rawlins. 2018. The role of veridicality and factivity in clause selection. In Proceedings of the 48th Meeting of the North East Linguistic Society.
If you make use of this dataset in a presentation or publication, we ask that you please cite these papers.
Version history
1.0: first public release (May 11, 2018) 2.0: alpha release (May 23, 2018) 2.1: official release (July 29, 2020)
Manifest
mega-veridicality-v2.1.tsvmega-veridicality-v2.1-normalized.tsvREADME.mdLICENSE
Description
mega-veridicality-v2.1.tsv contains the raw data collected on
Mechanical Turk.
| Column | Description | Values |
|---|---|---|
| participant | anonymous integer identifier for participant that provided the response | 0…634 |
| list | integer identifier for list participant was responding to | 0…81 |
| presentationorder | relative position of item in list | 1…68 |
| verb | clause-embedding verb found in the item | see paper |
| frame | clausal complement found in the item | see paper |
| polarity | polarity found in the item | positive, negative |
| conditional | whether the item was embedded in the antecedent of a conditional (see paper) | True, False |
| sentence | the sentence judged | see paper |
| veridicality | ordinal scale veridicality response | no, maybe, yes |
| acceptability | ordinal scale acceptability response | 1…7 |
| nativeenglish | whether the participant reported speaking American English natively | True, False |
| exclude | whether the participant should be excluded based on native language | True, False |
mega-veridicality-v2.1-normalized.csv contains normalized veridicality
and acceptability ratings, one for each verb-frame-voice-polarity
tuple. These normalized ratings are constructed using an ordinal
model-based normalization procedure and can be thought of as mean
ratings that control for participants’ differing uses of the relevant
response scale. The mean log-likelihood for the 10 ratings given for
each verb-frame-voice-polarity tuple under the ordinal model used to
normalize the data is also given. This can be thought of as analogous
to an estimate of rating variance.
| Column | Description | Values |
|---|---|---|
| verb | clause-embedding verb found in the item | see paper |
| frame | clausal complement found in the item | see paper |
| voice | voice found in the item | active, passive |
| polarity | polarity found in the item | positive, negative |
| conditional | whether the item was embedded in the antecedent of a conditional (see paper) | True, False |
| sentence | the sentence judged | see paper |
| veridicalitylike | the likelihood of the ordinal response under the normalization model | [-1.60, -0.02] |
| veridicalitynorm | the normalized veridicality rating | [-3.98, 4.00] |
Notes
- A javascript error produced 3 NA values for
veridicality, none of which affect the same verb-frame pair.