It has long been assumed that a verb’s syntactic distribution is determined by at least two kinds of lexical information: (i) the verb’s semantic type signatures and (ii) its morphosyntactic features. The first of these is often termed S(emantic)-selection; the second goes under various names, though perhaps the most neutral term is subcategorization. Standard distributional analyses in the theoretical literature have had tremendous success in uncovering the nature of S-selection and its relationship to the syntax—i.e. projection rules. But as theories scale to the entire lexicon, these approaches hit a limit, imposed by the sheer size of lexica and by bounds on human analysts’ memory and processing power. This challenge suggests the need for lexicon-scale datasets.
For a detailed description of the datasets associated with this project, the item construction and collection methods, and discussion of how to use a dataset on this scale to address questions in linguistic theory, please see the references below.
|50000||1000||50||v1 (zip)||White & Rawlins 2016, 2020|
|74830||1007||150||v2 (zip)||White & Rawlins 2016, 2020; An & White 2020|
|50||50||50||linking (zip)||White & Rawlins 2020|
|1850||37||50||single verb (zip)||White & Rawlins 2020|
|1380||30||46||replication (zip)||White & Rawlins 2020|
Aaron Steven White
Hannah Youngeun An