It has long been assumed that a verb’s syntactic distribution is determined by at least two kinds of lexical information: (i) the verb’s semantic type signatures and (ii) its morphosyntactic features. The first of these is often termed S(emantic)-selection; the second goes under various names, though perhaps the most neutral term is subcategorization. Standard distributional analyses in the theoretical literature have had tremendous success in uncovering the nature of S-selection and its relationship to the syntax—i.e. projection rules. But as theories scale to the entire lexicon, these approaches hit a limit, imposed by the sheer size of lexica and by bounds on human analysts’ memory and processing power. This challenge suggests the need for lexicon-scale datasets.

For a detailed description of the datasets associated with this project, the item construction and collection methods, and discussion of how to use a dataset on this scale to address questions in linguistic theory, please see the references below.


| Sentences | Predicates | Frames | Download | Citation |
|---|---|---|---|---|
| 50000 | 1000 | 50 | v1 (zip) | White & Rawlins 2016, 2020 |
| 74830 | 1007 | 150 | v2 (zip) | White & Rawlins 2016, 2020; An & White 2020 |
| 50 | 50 | 50 | linking (zip) | White & Rawlins 2020 |
| 1850 | 37 | 50 | single verb (zip) | White & Rawlins 2020 |
| 1380 | 30 | 46 | replication (zip) | White & Rawlins 2020 |
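
As a quick sanity check on the counts listed above, the sketch below tests whether each dataset's sentence count equals predicates times frames, which would be the case if every predicate were paired with every frame exactly once. That one-sentence-per-pair reading is an assumption made for illustration, not something stated on this page, and the names `DATASETS`, `is_fully_crossed`, and `coverage` are hypothetical.

```python
# Counts copied from the dataset listing: (sentences, predicates, frames).
DATASETS = {
    "v1": (50000, 1000, 50),
    "v2": (74830, 1007, 150),
    "linking": (50, 50, 50),
    "single verb": (1850, 37, 50),
    "replication": (1380, 30, 46),
}


def is_fully_crossed(sentences, predicates, frames):
    """Return True when the sentence count equals predicates * frames."""
    return sentences == predicates * frames


def coverage(sentences, predicates, frames):
    """Share of the full predicate-by-frame grid that the sentences could fill.

    Assumes at most one sentence per predicate-frame pair (an assumption).
    """
    return sentences / (predicates * frames)


if __name__ == "__main__":
    for name, counts in DATASETS.items():
        crossed = is_fully_crossed(*counts)
        print(f"{name}: fully crossed = {crossed}, coverage = {coverage(*counts):.3f}")
```

Under that assumption, a dataset that is not fully crossed (coverage below 1) samples only part of the predicate-by-frame grid, so any analysis that pivots sentences into a predicate-by-frame matrix should expect missing cells.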


Kim, Gene Louis, and Aaron Steven White. to appear. Montague Grammar Induction. Semantics and Linguistic Theory 30.
White, Aaron Steven, and Kyle Rawlins. 2020. Frequency, Acceptability, and Selection: A Case Study of Clause-Embedding. Glossa 5(1): 105. 1–41.
An, Hannah Youngeun, and Aaron Steven White. 2020. The Lexical and Grammatical Sources of Neg-Raising Inferences. In Proceedings of the Society for Computation in Linguistics 3: 220–233.
White, Aaron Steven, and Kyle Rawlins. 2016. A Computational Model of S-Selection. Edited by Mary Moroney, Carol-Rose Little, Jacob Collard, and Dan Burgdorf. Semantics and Linguistic Theory 26: 641–663.


Aaron Steven White
Kyle Rawlins
Hannah Youngeun An