Friday, April 11, 2014

Frequency, structure and the POS

There are never enough good papers illustrating the poverty of the stimulus. Here’s a recent one that I read by Jennifer Culbertson and David Adger (yes, that David Adger!) (C&A) that uses artificial language learning tasks as probes into the kinds of generalizations that learners naturally (i.e. uninstructed) make. Remember that generalization is the name of the game. Everyone agrees that without generalizing beyond the input, there is no learning. The debate is not about whether this exists, but about what the relevant dimensions are that guide the generalization process. One standard view is that it’s just frequency of some kind, often bigram and trigram frequencies. Another is that the dimension along which a learner generalizes is more abstract, e.g. some dimension of linguistic structure. C&A provide an interesting example of the latter in the context of artificial language learning, a technique, I believe, that is still new to most linguists.[1]

Let me say a word about this technique. Typological investigation provides a standard method for finding UG universals: survey diverse grammars (or, more often and more superficially, languages) and see what properties they all share. Greenberg was a past master of this methodology, though from the current perspective his methods look rather “shallow” (the same cannot be said of modern cartographers like Cinque). And looking for common features of diverse grammars seems a plausible way to search for invariances. The current typological literature is well developed in this regard, and C&A note that Greenberg’s U20, which their experiment explores, is based on an analysis of 341 languages (p.2/6). So these kinds of typological investigations are clearly suggestive. Nonetheless, I think C&A are correct that supplementing this kind of typological evidence with experimental evidence is a very good idea, for it allows one to investigate directly what typological surveys can only probe indirectly: to what degree the gaps in the record are principled. We know for a fact that the extant languages/grammars are not the only possible ones. Moreover, we know (or at least I believe) that the sample of grammars at our disposal is a small subset of the possible ones. As artificial language learning experiments promise to let us directly probe what typological comparison only allows us to infer indirectly, better to use the direct method if it is workable. C&A’s paper offers a nice paradigm for how to do this, and those interested in exploring UG should look at the method with interest.

So what do C&A do? They expose learners to an artificial version of English in which the pre-nominal order of determiner, numeral and adjective is flipped from the English case. In “real” English (RE), the order and structure is [Dem [ num [ adj [ N ] ] ] ] (as in: these three yellow mice). C&A expose learners to nominal bits of artificial English (AE) where the dem, num, and adj are postnominal. In particular, they present learners with data like mice these, mice three, mice yellow etc. and see how they generalize to examples with more than one postnominal element, e.g. do learners prefer phrases in AE like mice yellow these or mice these yellow? If learners treat AE as just like RE but for the postnominal order, then they might be expected to carry over postnominally in AE the word order they invariably see pre-nominally in RE (and thus to prefer mice these yellow). However, if they prefer to retain the scope structure of the expressions in RE and port that over to AE, then they will preserve the bracketing noted above but flip the word order, i.e. [ [ [ [ N ] adj ] num ] dem ]. On the first hypothesis, learners prefer the orders they’ve encountered repeatedly in RE; on the second, they preserve RE’s more abstract scope relations when projecting to the new structures in AE.
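To make the contrast between the two hypotheses concrete, here is a toy sketch (my own construction, not C&A’s materials or analysis; the function names and word lists are purely illustrative) of what each generalization strategy predicts for multi-modifier AE phrases:

```python
# Real-English pre-nominal surface order: Dem < Num < Adj (these three yellow mice)
RE_PRENOMINAL = ["these", "three", "yellow"]

def surface_preserving(noun, modifiers):
    """Hypothesis 1: keep the familiar Dem-Num-Adj surface sequence,
    simply placed after the noun."""
    ordered = [m for m in RE_PRENOMINAL if m in modifiers]
    return [noun] + ordered

def scope_preserving(noun, modifiers):
    """Hypothesis 2: keep the RE bracketing [Dem [Num [Adj N]]];
    realizing each modifier post-nominally mirrors the linear order,
    yielding N-Adj-Num-Dem."""
    ordered = [m for m in RE_PRENOMINAL if m in modifiers]
    return [noun] + ordered[::-1]

# The two hypotheses diverge as soon as two modifiers co-occur:
print(surface_preserving("mice", {"these", "yellow"}))  # ['mice', 'these', 'yellow']
print(scope_preserving("mice", {"these", "yellow"}))    # ['mice', 'yellow', 'these']
```

With all three modifiers, the scope-preserving strategy yields the mirror order mice yellow three these, which is the pattern C&A’s participants reliably produced.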

So what happens? Well, you already know, right? Learners go for door number 2 and preserve the scope order of RE, reliably generalizing to the order ‘N-adj-num-dem.’ C&A conclude, reasonably enough, that “learners overwhelmingly favor structural similarity over preservation of superficial order” (abstract, p.1/6) and that this means that “when they are pitted against one another, structural rather than distributional knowledge is brought to bear most strongly in learning a new language” (p.5/6). The relevant cognitive constraint, C&A conclude, is one “enforcing an isomorphism in the mapping between semantics and surface word order via hierarchical syntax.”[2]

This actually coincides with similar biases young kids exhibit in acquiring their first language. Lidz and Musolino (2006) (L&M) show a similar kind of preference in relating quantificational scope and surface word order. Together, C&A and L&M reveal a strong preference in learners for preserving a direct mapping between overt linear order and hierarchical structure, at least in “early” learning, and, as C&A’s results show, this preference is not a simple left-to-right preference but a genuinely structural one.

One further point struck me. We must couple the observed preference for scope-preserving order with a dispreference for treating surface forms as derived structures, i.e. as products of movement. C&A note that the ‘N-dem-num-adj’ order is typologically rare. However, this order is easy enough to derive from a structure like (1) via head movement, given some plausible functional structure. Given (1), N-to-F0 movement suffices.

(1)  F0 [Dem [ num [ adj [ N ] ] ] ]  →  [ N+F0 [Dem [ num [ adj [ N ] ] ] ] ]

We know that there are languages where N moves above determiners (so one gets the order N-det rather than Det-N), and though the N-dem-num-adj order is “rare,” it is, apparently, not unattested. So there must be more going on. This, it goes without saying I hope, does not detract from C&A’s conclusions, but it raises other interesting questions that we might be able to use this technique to explore.

So, C&A have written a fun paper with an interesting conclusion that deploys a useful method that those interested in FL might find productive to incorporate into their bag of investigative tricks. Enjoy!

[1] Though not to psychologists and some psycholinguists. Lidz and his student Eri Takahashi (see here) have used this technique to also argue against standard statistical approaches to language acquisition.
[2] Takahashi comes to a similar conclusion in her thesis.


  1. How does this relate to FL, exactly? I can see how you could make a case that it says something about UG, though, even here, you need to be careful, since these are adult native speakers of a language generalizing in a highly constrained space. It could be that it's just telling us something about a specific G and how, having acquired that G, people generalize in this particular space.

  2. I think what Noah is trying to say is: English speakers have learned that English has hierarchical structure (e.g. is better described as a context free grammar than as a Markov chain), and that this hierarchical structure typically corresponds to semantic scope. So they form an "overhypothesis" about languages in general and will be biased to assume that any new language they learn should have the same properties. So while it's neat that C&A's participants applied these pretty abstract overhypotheses to an artificial language, it's unclear whether the results of the experiment tell us anything about innate biases. I guess it would be interesting to replicate the experiment with speakers of one of the languages where word order doesn't correspond to semantic scope relations and see if the generalization that participants extract is similar to the one in their native language or to Greenberg's universal.

  3. Hi Tal. We're designing ongoing work to do exactly that - running similar experiments on speakers of Thai and, we hope, Kikuyu or Kitharaka (PNAS made us take out the footnote referring to it). And you're absolutely right. We're very careful in the paper to make clear that this is about how linguistic knowledge is represented and applied, but the larger project connects this to biases in real acquisition that display themselves in terms of typological frequencies. See also Jenny's work with Smolensky and Legendre on Universal 18.

  4. Thanks for your reply, David. Just to be clear, I was commenting on the summary of the paper in the blog post, and not on the paper itself. As you're saying the paper is pretty cautious about the potential interpretations of the results. Regardless of the source of the learning biases though (innate / transfer from native language), the fact that they even exist is very interesting IMO, and potentially rules out certain classes of strictly linear order based models. Great to hear you're running the experiment with Thai / Kikuyu speakers -- looking forward to reading about the results.

    1. Sorry for the long delay in responding. I think that the original work bears on FL, for it is based on two kinds of evidence: typological evidence concerning these patterns and experimental evidence concerning what happens in artificial languages. It seems that these two streams of evidence converge. Now, one might argue that the experimental evidence just reflects the properties of a specific G and how that influences generalization to a G-like variant. However, this does not address the typological patterns, or at least I don't see that it does (I might be obtuse here). So, if we take these two together, we want the same factor to explain both the experimental evidence that C&A produce and the typological patterns. I took the common factor to be a bias for generalizing along the "scope structure" dimension, rather than the "linear order" dimension. Were there such a bias, it would seem to explain BOTH sets of facts.

      At any rate, that was why I took the C&A stuff to bear on FL. Of course, it would be good to do yet more experiments looking at typologically different languages, as David indicates he is planning to do.

  5. Like Norbert, I think this work is really interesting, and I had an opportunity to talk to C&A about this last year, but it raises some important questions. In particular, whether adult AGL experiments target the LAD or not. Many people think that the LAD is only active in the critical period -- roughly before puberty. So does using these experiments commit you to the idea that the LAD is still active in adults?

    The second interesting question is what role these biases play in explaining language acquisition. The naive view is that since children can learn languages that violate these biases (since there are attested languages where they are violated), any acquisition that is happening is happening despite these biases rather than because of them. In other words, if you have a bias towards languages that have property P, but you can learn languages that are not P, then it is hard to see how the bias towards P can be part of the explanation of how you acquire languages in general.

    I feel the pull of that argument, but I think it may be missing a few steps; I guess this is the question that Norbert is raising at the end of the post.

  6. I'm not sure either how a bias like this one could explain language acquisition. It's not even clear that there's a poverty of the stimulus issue involved in the acquisition of noun phrase modifier order (in a natural language, as opposed to the artificial language from C&A's experiment). Biases can still explain language universals and statistical typological generalizations diachronically, though: language learners might extract the wrong generalizations from the data, such that after a few generations languages that conform to the bias will be more common than languages that don't (see e.g. this paper by Joe Pater and Elliott Moreton), though I don't know how plausible this story is for noun phrase modifier order.