The MegaAcceptability dataset

Authors: Aaron Steven White and Kyle Rawlins


Version: 1.2

Release date: 7 Apr 2020


This dataset consists of ordinal acceptability judgments for 1,000 clause-embedding verbs of English in 50 surface-syntactic frames. The data were collected on Amazon’s Mechanical Turk using Turktools.

For a detailed description of the dataset, the item construction and collection methods, and discussion of how to use a dataset on this scale to address questions in linguistic theory, please see the following papers:

White, A. S. & K. Rawlins. 2020. Frequency, acceptability, and selection: A case study of clause-embedding. Accepted to Glossa.

White, A. S. & K. Rawlins. 2016. A computational model of S-selection. In M. Moroney, C-R. Little, J. Collard & D. Burgdorf (eds.), Semantics and Linguistic Theory 26, 641-663. Ithaca, NY: CLC Publications.

If you make use of this dataset in a presentation or publication, we ask that you please cite these papers.

Version history

1.0: first public release, 30 Oct 2016
1.1: formatting update, 14 Aug 2019
1.2: adds normalized data, 7 Apr 2020


mega-attitude-v1.tsv contains the raw data.

Column             Description                                                              Values
participant        anonymous integer identifier for participant that provided the response  0…728
list               integer identifier for list participant was responding to                0…999
presentationorder  relative position of item in list                                        1…50
verb               lemma of clause-embedding verb found in the item                         see paper
frame              clausal complement found in the item                                     see paper
response           ordinal scale acceptability response                                     1…7
nativeenglish      whether the participant reported speaking American English natively      True, False
sentence           sentence that was judged                                                 see paper
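
As a rough illustration of working with the raw file, the following Python sketch reads tab-separated rows with the columns listed above and averages the responses for each verb-frame pair. It assumes the file starts with a header row using exactly those column names and that missing responses appear as the literal string "NA"; neither detail is confirmed here. The frame labels and sentences in the sample are invented placeholders, and a plain mean over ordinal ratings is only a crude summary, not the normalization procedure from the paper.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def read_raw(handle):
    """Yield one dict per judgment, converting the typed columns."""
    for row in csv.DictReader(handle, delimiter="\t"):
        raw = row["response"]
        # Assumption: missing responses are written as "NA" (or left empty).
        row["response"] = None if raw in ("NA", "") else int(float(raw))
        row["nativeenglish"] = row["nativeenglish"] == "True"
        yield row


def mean_response(rows):
    """Average the 1-7 responses per (verb, frame) pair, skipping NA rows."""
    by_pair = defaultdict(list)
    for row in rows:
        if row["response"] is not None:
            by_pair[(row["verb"], row["frame"])].append(row["response"])
    return {pair: mean(values) for pair, values in by_pair.items()}


# Invented sample standing in for mega-attitude-v1.tsv; frame names and
# sentences are hypothetical placeholders, not values from the dataset.
SAMPLE = (
    "participant\tlist\tpresentationorder\tverb\tframe\tresponse\tnativeenglish\tsentence\n"
    "0\t3\t1\tknow\tframeA\t7\tTrue\tplaceholder sentence one\n"
    "1\t3\t1\tknow\tframeA\t5\tTrue\tplaceholder sentence one\n"
    "2\t3\t1\tknow\tframeA\tNA\tFalse\tplaceholder sentence one\n"
    "0\t3\t2\tthink\tframeB\t2\tTrue\tplaceholder sentence two\n"
)

rows = list(read_raw(io.StringIO(SAMPLE)))
means = mean_response(rows)
for (verb, frame), value in sorted(means.items()):
    print(verb, frame, value)
```

To run this against the real file, replace the io.StringIO(SAMPLE) argument with a handle from open("mega-attitude-v1.tsv", newline="", encoding="utf-8"); the encoding is likewise an assumption.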

mega-attitude-v1-normalized.tsv contains data normalized using the procedure described in White & Rawlins 2020.

Column        Description                                           Values
verb          lemma of clause-embedding verb found in the item      see paper
verbform      form of clause-embedding verb found in the item       see paper
frame         clausal complement found in the item                  see paper
responsenorm  normalized ordinal scale acceptability response       [-3.84, 4.94]
responsevar   variability of ordinal scale acceptability responses  [-3.64, -0.19]
sentence      sentence that was judged, white-space tokenized       see paper
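
One possible way to query the normalized file is to index it by verb and frame and then rank the frames for a given verb by responsenorm. The Python sketch below assumes a header row with the column names listed above and at most one row per verb-frame pair (if a pair repeats, the last row wins). The helper names read_normalized and top_frames are hypothetical, and the sample rows are invented placeholders rather than values from the dataset.

```python
import csv
import io


def read_normalized(handle):
    """Build a nested dict: verb -> frame -> normalized response (float)."""
    table = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        # Assumption: one row per verb-frame pair; a repeat overwrites.
        table.setdefault(row["verb"], {})[row["frame"]] = float(row["responsenorm"])
    return table


def top_frames(table, verb, n=3):
    """Return the n frames with the highest responsenorm for a verb."""
    frames = table[verb]
    return sorted(frames, key=frames.get, reverse=True)[:n]


# Invented sample standing in for mega-attitude-v1-normalized.tsv;
# frame names, scores, and sentences are hypothetical placeholders.
SAMPLE = (
    "verb\tverbform\tframe\tresponsenorm\tresponsevar\tsentence\n"
    "know\tknew\tframeA\t2.5\t-1.0\tplaceholder sentence one\n"
    "know\tknew\tframeB\t-0.5\t-2.0\tplaceholder sentence two\n"
    "know\tknown\tframeC\t1.0\t-1.5\tplaceholder sentence three\n"
)

table = read_normalized(io.StringIO(SAMPLE))
print(top_frames(table, "know", n=2))
```

The nested dict keeps lookups simple for a file of this size; for heavier analysis a dataframe library would be a reasonable alternative.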


  • A JavaScript error produced 10 NA values for response; no two of them affect the same verb-frame pair.