The MegaAcceptability dataset

Authors: Aaron Steven White, Hannah YoungEun An, and Kyle Rawlins


Version: 2.0

Release date: 14 Aug 2019


This MegaAcceptability dataset consists of ordinal acceptability judgments for 1,007 clause-embedding verbs of English in 50 surface-syntactic frames and three matrix tenses. This dataset combines the MegaAcceptability version 1.0 and data collected for 25,000 additional verb-frame pairs on Amazon’s Mechanical Turk using Ibex on Mechanical Turk.

For a detailed description of the dataset, the item construction and collection methods, and discussion of how to use a dataset on this scale to address questions in linguistic theory, please see the following papers:

White, A.S. & K. Rawlins. 2020. Frequency, acceptability, and selection: A case study of clause-embedding. Accepted to Glossa.

An, H.Y. & A.S. White. 2020. The lexical and grammatical sources of neg-raising inferences. Proceedings of the Society for Computation in Linguistics 3:23, 220-233.

White, A. S. & K. Rawlins. 2016. A computational model of S-selection. In M. Moroney, C-R. Little, J. Collard & D. Burgdorf (eds.), Semantics and Linguistic Theory 26, 641-663. Ithaca, NY: CLC Publications.

If you make use of this dataset in a presentation or publication, we ask that you please cite these papers.

Version history

1.0: first public release, 30 Oct 2016 1.1: formatting update, 14 Aug 2019 2.0: first public release, 14 Aug 2019


Column Description Values
participant anonymous integer identifier for participant that provided the response 0…1293
list integer identifier for list participant was responding to 0…1499
presentationorder relative position of item in list 1…50
verb clause-embedding verb found in the item see paper
frame clausal complement found in the item see paper
tense matrix tense found in the item present, past, past_progressive
response ordinal scale acceptability response 1…7
nativeenglish whether the participant reported speaking American English natively True, False
sentence sentence that was judged see paper
version MegaAcceptability dataset version where the judgment first appeared 1, 2


  • A javascript error produced 10 NA values for response, none of which affect the same verb-frame pair. All such values are inherited from version 1.x