Downloads: v1 (zip), v2 (zip), linking (zip)


It has long been assumed that a verb’s syntactic distribution is determined by at least two kinds of lexical information: (i) the verb’s semantic type signatures and (ii) its morphosyntactic features. The first of these is often termed S(emantic)-selection; the second goes under various names, though perhaps the most neutral term is subcategorization. Standard distributional analyses in the theoretical literature have been tremendously successful at uncovering the nature of S-selection and its relationship to the syntax (i.e., projection rules). But as theories scale to the entire lexicon, these approaches hit a limit imposed by the sheer size of lexica and by bounds on human analysts’ memory and processing power. This challenge suggests the need for lexicon-scale datasets.

The MegaAcceptability dataset consists of ordinal acceptability judgments for 1,000 clause-embedding verbs of English in 50 surface-syntactic frames and with three different matrix tense-aspects. For a detailed description of the dataset, the item construction and collection methods, and discussion of how to use a dataset on this scale to address questions in linguistic theory, please see the following papers:
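To make the dataset's shape concrete, here is a minimal sketch of how judgments structured this way (one ordinal rating per verb-frame item) might be aggregated into per-verb acceptability profiles. The column names (`verb`, `frame`, `response`), the frame labels, and the 1-7 rating scale below are illustrative assumptions, not the dataset's actual schema.

```python
import pandas as pd

# Synthetic stand-in for the real data: two verbs rated in two frames.
# Column names and the 1-7 ordinal scale are assumptions for illustration.
judgments = pd.DataFrame({
    "verb":     ["know", "know", "know", "know",
                 "hope", "hope", "hope", "hope"],
    "frame":    ["NP _ that S", "NP _ that S", "NP _ NP", "NP _ NP"] * 2,
    "response": [7, 6, 2, 3, 6, 7, 1, 2],  # ordinal acceptability ratings
})

# Mean rating per verb-frame pair: one row per verb, one column per frame.
profile = judgments.pivot_table(index="verb", columns="frame",
                                values="response", aggfunc="mean")
print(profile)
```

Each row of `profile` is then a verb's acceptability signature across frames, which is the kind of object that lexicon-scale analyses of S-selection and subcategorization operate over.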

An, H. Y. & A. S. White. 2019. The lexical and grammatical sources of neg-raising inferences. arXiv:1908.05253 [cs.CL].

White, A. S. & K. Rawlins. 2016. A computational model of S-selection. In M. Moroney, C-R. Little, J. Collard & D. Burgdorf (eds.), Semantics and Linguistic Theory 26, 641-663. Ithaca, NY: CLC Publications.

If you make use of this dataset in a presentation or publication, we ask that you please cite these papers.


Aaron Steven White
Kyle Rawlins
Hannah Youngeun An