Tuesday, April 28, 2015

A minimalist evolang story

MIT News (here) mentions a paper that recently appeared in Frontiers in Psychology (here) by Vitor Nóbrega and Shigeru Miyagawa (N&M). The paper is an Evolang effort that argues for a rapid (rather than a gradual) emergence of FL. The blessed event was “triggered” by the emergence of Merge which allowed for the “integration” of two “pre-adapted systems,” one relating to outward expression (think AP) and one related to referential meaning (think CI). N&M calls the first the E-system and the second the L-system. The main point of the paper is that the L-system does not correspond to anything like a word. Why? Because words found in Gs are themselves hierarchically structured objects, with structures very like the kind we find in phrases (a DMish perspective). The paper is interesting and worth looking at, though I have more than a few quibbles with some of the central claims. Here are some comments.

N&M has two aims: first to rebut gradualist claims concerning the evolution of FL. The second is to provide a story for the rapid emergence of the faculty. I personally found the criticisms more compelling than the positive proposal. Here’s why.

The idea that FL emerged gradually generally rests on the idea that FL builds on more primitive systems that went from 1-word to 2-word to arbitrarily large n-word sequences. My problem with these kinds of stories has always been how we get from 2 to arbitrarily large n. As Chomsky has noted, “go on indefinitely” does not obviously arise from “go to some fixed n.” The recursive trick that Merge embodies does not conceptually require priming by finite instances to get it going. Why? Because there is no valid inference from “I can do X once, twice” to “I can do X indefinitely many times.” True, to get to ‘indefinitely many X’ might causally (if not conceptually) require seguing via finite instances of X, but if it does, nobody has explained how it does.[1] Brute facts causing other brute facts does not an explanation make.

Let me put this another way: Perhaps as a matter of historical fact our ancestors did go through a protolanguage to get to FL. However, it has never been explained how going through such a stage was/is required to get to the recursive FL of the kind we have. The gradualist idea seems to be that first we tried 1-word sequences then 2-word and that this prompted the idea to go to 3, 4, n-word sequences for arbitrary n. How exactly this is supposed to have happened absent already having the idea that “going on indefinitely” was ok has never been explained (at least to me). As this is taken to be a defining characteristic of FL, failing to show the link between the finite stages and the unbounded one (a link that I believe is conceptually impossible to show, btw) leaves the causal relevance of the earlier finite stages (should they even exist) entirely opaque (if not worse).  So, the argument that recursion “gradually” emerged is not merely wrong, IMO, it is barely coherent, at least if one’s interest is in explaining how unbounded hierarchical recursion arose in the species.[2]

N&M hints at a second account that, IMO, is not as conceptually handicapped as the one above. Here it is: One might imagine a system in place in our ancestors capable of generating arbitrarily big “flat” structures. Such structures would be different from our FL in not being hierarchical, and the same in being unbounded. These procedures, then, could generate arbitrarily “long” structures (i.e. the flat structures could be indefinitely long (think beads on a string) but have 0-depth).  Now we can ask a question: how can one get from the generative procedures that deliver arbitrarily long strings to our generative procedures which deliver structures that are both long and deep? I confess to having been very attracted to this conception of Darwin’s Problem (DP). DP so understood asks for the secret sauce required to go from “flat” n-membered sets (or sequences for arbitrary n) to the kind of arbitrarily deeply hierarchically structured sets (or graphs or whatever) we find in Gs produced by FL. I have a dog in this fight (see here), though I am not that wedded to the answer I gave (in terms of labeling being the novelty that precipitated change). This version of the problem finesses the question of where recursion came from (after all, it assumes that we have a procedure to generate arbitrarily long flat structures) and substitutes the question where did hierarchical recursion come from. At any rate, the two strike me as different, the second not suffering from the conceptual hurdle besetting the first.
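A toy sketch in Python may make the contrast between "long" and "deep" concrete. Nothing in it comes from N&M: the names `flat`, `merge`, `nest` and `depth` are illustrative assumptions, Merge is modeled as bare binary set formation, and "depth" simply counts levels of set embedding.

```python
def flat(items):
    """Beads on a string: an arbitrarily long sequence with no embedding."""
    return tuple(items)


def merge(a, b):
    """Toy Merge as binary set formation: Merge(a, b) = {a, b}."""
    return frozenset([a, b])


def nest(items):
    """Build a right-branching hierarchy by applying merge repeatedly."""
    obj = items[-1]
    for item in reversed(items[:-1]):
        obj = merge(item, obj)
    return obj


def depth(obj):
    """Levels of set embedding; atoms and flat sequences of atoms have depth 0."""
    if isinstance(obj, frozenset):
        return 1 + max(depth(x) for x in obj)
    if isinstance(obj, tuple):
        return max((depth(x) for x in obj), default=0)
    return 0


words = ["the", "dog", "chased", "a", "cat"]

flat_obj = flat(words)      # unbounded in length, but never deep
nested = nest(words)        # both length and depth grow with the input
clause = merge(merge("the", "dog"), merge("chased", merge("a", "cat")))

print(depth(flat_obj))  # 0
print(depth(nested))    # 4
print(depth(clause))    # 3
```

Both `flat` and `nest` accept inputs of any length, so unboundedness alone does not separate them. What separates them is that only the second feeds its own output back in as an argument, which is the step this version of DP asks about.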

N&M provides more detailed arguments against several current proposals for a gradualist conception for the evolution of FL. Many of these seem to take words as fossils of the earlier evolutionary stages. N&M argues that words cannot be the missing link that gradualists have hoped for. The discussion is squarely based on Distributed Morphology reasoning and observations. I found the points N&M makes very much to the point. However, given the technical requirements needed to follow the details, I fear that tyros (i.e. the natural readership of Frontiers) will remain unconvinced. This said, the points seem dead on target.

This brings us to the second aim of the paper, and here I confess to having a hard time following the logic. The idea seems to be that Merge when added to the E systems we find in bird song and the L system we find in vervets gets us the kinds of generative systems we find in G products of FL. This is sort of a version of the classical Minimalist answer to DP favored by Chomsky. I say “sort of” as Chomsky, at least lately, has been making a big deal of the claim that the mapping to E systems is a late accretion and the real action is in the mapping to thought. I am not sure that N&M disagrees with this (the paper doesn’t really discuss this point) as I am not sure how the L-system and Chomsky’s CI interface relate to one another. The L-system seems closer to concepts than full-blown propositional representations, but I could be wrong here. At any rate, this seems to be the N&M view.

Here’s my problem; in fact a few. First, this seems to ignore the various observations that whatever our L-atoms are they seem different in kind from what we find in animal communication systems. The fact seems to be that vervet calls are far more “referential” than human “words” are. Ours are pretty loosely tied to whatever humans may use words to refer to. Chomsky has discussed these differences at length (see here for a recent critique of “referentialism”) and if he is in any way correct it suggests that vervet calls are not a very good proxy for what our linguistic atoms do as the two have very different properties. N&M might agree with this, distinguishing roots from words and saying that our words have the Chomsky properties but our concepts are vervetish. But how turning roots into words manages this remains, so far as I can see, a mystery. Chomsky notes that the question of where the properties of our lexical items come from is at present completely mysterious. But the bottom line, as Chomsky sees it (and I agree with him here), is that “[t]he minimal meaning-bearing elements of human languages – word-like, but not words -- are radically different from anything known in animal communication systems.” And if this is right, then it is not clear to me that Merge alone is sufficient to explain what our language manages to do, at least on the lexical side. There is something “special” about lexicalization that we really don’t yet understand and it does not seem to be reducible to Merge and it does not seem to really resemble the kinds of animal calls that N&M invokes. In sum, if Merge is the secret sauce, then it did more than link to a pre-existing L-system of the kind we find in vervet calls. It radically changed their basic character. How Merge might have done this is a mystery (at least to me (and, I believe, Chomsky)).

Again, N&M might agree, for the story it tells does not rely exclusively on Merge to bridge the gap. The other ingredient involves checking grammatical features. By “grammatical” I mean that these features are not reducible to the features of the E or L systems. Merge’s main grammatical contribution is to allow these grammatical features to talk to one another (to allow valuation to apply). As roots don’t have such features, merging roots would not deliver the kinds of structures that our Gs do as roots do not have the wherewithal to deliver “combinatorial systems.” So it seems that in addition to Merge, we need grammatical features to deliver what we have.

The obvious question is: where do these syntactic features come from? More pointedly, Merge for N&M seems to be combinatorically idle absent these features. So Merge as such is not sufficient to explain Gish generative procedures. Thus, the real secret sauce is not Merge but these features and the valuation procedures that they underwrite. If this is correct, the deep Evolang question concerns the genesis of these features, not the operation instructing how to put grammatical objects together given their feature structures. Or, put another way: once you have the features how to put them together seems pretty straightforward: put them together as the features instruct (think combinatorial grammar here or type theory). Darwin’s Problem on this conception reduces to explaining how these syntactic features got a mental toehold. Merge plays a secondary role, or so it seems to me.

To be honest, the above problem is a problem for every Minimalist story addressing DP. The Gs we are playing with in most contemporary work have two separate interacting components: (i) Merge serves to build hierarchy, (ii) AGREE in Probe-Goal configurations checks/values features. AGREE operations, to my knowledge, are not generally reducible to Merge (in particular I-merge). Indeed trying to unify them, as in Chomsky’s early minimalist musings, has (IMO, sadly) fallen out of fashion.[3] But if they are not unified and most/many non-local dependencies are the province of AGREE rather than I-merge, then Merge alone is not sufficient to explain the emergence of Gs with the characteristic dependencies ours embody. We also need a story about the etiology of the long distance AGREE operation and a story about the genesis of the syntactic features they truck in.[4] To date, I know of no story addressing this, not even very speculative ones. We could really use some good ideas here (or, as in note 3, begin to rethink the centrality of Probe/Goal Agree).
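The division of labor between the two components can be caricatured in a few lines of Python. This is only a sketch under loud assumptions: `Head`, `Phrase`, `merge` and `agree` are hypothetical names, features are a plain dictionary in which `None` marks an unvalued feature, and "closest goal" is approximated by a breadth-first search of the probe's sister. It is not a rendering of any particular formal definition of AGREE.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Head:
    name: str
    # Feature values; None marks an unvalued feature (a probe).
    features: dict = field(default_factory=dict)


@dataclass
class Phrase:
    left: object
    right: object


def merge(a, b):
    """Hierarchy building only: combines two objects and consults no features."""
    return Phrase(a, b)


def agree(probe, domain, feature):
    """Toy probe-goal valuation: find the closest head in `domain` with a
    valued `feature` and copy that value onto the probe."""
    queue = deque([domain])
    while queue:
        node = queue.popleft()
        if isinstance(node, Head):
            value = node.features.get(feature)
            if value is not None:
                probe.features[feature] = value
                return node
        else:
            queue.extend([node.left, node.right])
    return None  # no goal found; the probe stays unvalued


T = Head("T", {"phi": None})
subj = Head("she", {"phi": "3sg"})
vP = merge(subj, merge(Head("v"), Head("sing")))
TP = merge(T, vP)

unvalued_after_merge = T.features["phi"] is None  # Merge alone values nothing
goal = agree(T, vP, "phi")                        # a separate operation does
```

In the sketch `merge` never inspects a feature and `agree` never builds structure, so the dependency between T and the subject comes entirely from the second operation and from the features it reads. That is the sense in which a Merge-only origin story leaves something out.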

I don’t want to come off sounding overly negative. N&M, unlike many evolangers, know a lot about FL. Their critique of gradualist stories seems to be very well aimed. However, precisely because the authors know so much about FL while trying to give a responsible positive outline of an answer to DP, the paper makes clear the outstanding problems that any adequate explanation sketch faces. For this alone, N&M is worth reading.

So what’s the takeaway message here? I think we know what a solution to DP in the domain of language should involve. It should provide an account of how the generative procedures responsible for the G properties we have discovered over the last 60 years arose in the species. The standard Minimalist answer has been to focus on Merge and argue that adding it to the capacities of our non-linguistic ancestors suffices to give them our kinds of grammatical powers. Now, there is no doubting that Merge does work wonders. However, if current theoretical thinking is on the right track, then Merge alone is insufficient to account for the various non-local dependencies that we find in Gs. Thus, Merge alone does not deliver what we need to fully explain the origins of our FL (i.e. it leaves out a large variety of agreement phenomena).[5] In this sense, either we need some ideas about where AGREE comes from, or we need some work showing how to accommodate the phenomena that AGREE handles via I-merge. Either way, the story that ties the evolutionary origins of our FL to the emergence of a single novel Merge operation is, at best, incomplete.

[1] Here from Edward St Aubyn in At Last: the final Patrick Melrose Novel:
 “Ok, so who created infinite regress.” That’s the right question.
[2] No less a figure than Wittgenstein had a field day with this observation.  “And so on” is not a concept that finite sequences of anything embody.
[3] I may be one of the last thinking that moving to AGREE systems was a bad idea if one’s interest is in DP. I argue this here. I don't think I’ve convinced many of the virtues either of disagreement in general or dis-AGREE-ment in particular. So it goes.
[4] It is tempting to see Chomsky’s latest discussions of labeling as an attempt to resolve this problem. Agreement on this view is what is required to get interpretable objects at the CI interface. It is not the product of AGREE but of the labeling algorithm. Chomsky does not say this. But this is where he might be heading. It is an attempt to reduce “morphology” to Bare Output Conditions. I personally am not convinced by his detailed arguments, but if this is the intent, I am very sympathetic to the project.
[5] I am currently co-teaching intro to contemporary minimalism with Omer Preminger. He has inundated me with arguments (good ones) that something like AGREE does excellent work in accounting for huge swaths of intricate data. Thus, at the very least, it seems that the current consensus among minimalist syntacticians is that Merge is not the only basic syntactic operation and so an account that ties all of our grammatical prowess to Merge is either insufficient or the current consensus is wrong. If I were a betting person, I would put my money on the first disjunct. But…


  1. "Merge alone is not sufficient to explain the emergence of Gs [...] We also need a story about the etiology of the long distance AGREE operation and a story about the genesis of the syntactic features they truck in"

    Taking the SMT as a heuristic, shouldn't the story be that they come from interface conditions? To phonologically realize a syntactic object constructed by Merge (and especially under a DM framework with late insertion and syntax all the way down), it is required that the leaves of the binary tree be distinguishable and this is going to be pretty rare if they don't exhibit features (one particularly obvious case being the bottom of the tree problem but it's pervasive throughout).

    So under the requirement of a linearly ordered output alone, one should expect grammatical features to exist and to play a role at least at spell-out (and the parallel story at the CI interface is what I understand to be Chomsky's argument). Note by the way that under this story, the head triggering spell-out should be able to manipulate features (and so the dual nature of phase heads is to be expected). Note also that this story is not incompatible with the idea that "unexternalized" narrow syntactic objects could be the province of Merge alone, so that maybe "mentalese" is really only Merge.

    Two things by way of conclusion. First, if you do the math (that is to say if you compute the average number of leaves that have to exhibit grammatical features for a binary tree to be suitable for phonological insertion; or the average distinction number of a binary tree in math parlance), you find out that roughly half of the heads should be functional, and that is not incompatible with what is actually observed. Second, as always the proof is in the empirical pudding or should be, so the game should be to devise an empirical conclusion from this story. Personally, my money is on the licensing condition on ellipsis.

    1. "if they don't exhibit features"

      Not merely features, but SYNTACTIC features. Take Bare Phrase Structure as given. LIs have phono and semantic features of necessity. These are motivated by the interfaces, as you said. But why do we also need another set of features, syntactic ones, that force all sorts of G operations to apply? Where do these non-interface-required features come from and what would go wrong with a grammar if they did not exist? It is conceivable, at least to me, that one could have grammars that constructed hierarchical objects without any syntactic features at all. So where did these come from?

  2. I have been too long already, but as for the genesis of said features, one formal way to reduce it to Merge is to assume featural geometry (so binary trees, or Merge, all the way down) with the most ramified leaves just being abstract bi-valuation; the bare minimum required to distinguish undistinguishable nodes. Under that story, Agree has formally disappeared again (maybe this makes you happy) and has been replaced by deep sensitivity to the geometry of the tree at the leaves or, metaphorically, every time we think Merge has built an ambiguous structure, it is because we have neglected the deeply ramified structures at ambiguous leaves. This fits well with cyclic agreement (and Omer's work) but note that for the moment, it is presumably purely a formal move.

    One non-trivial condition that this move would seem to entail though, is the ultimate identification of phi, focus and wh-feature. To me, this screams for a study of superiority effect along these lines.

    1. "the bare minimum required to distinguish undistinguishable nodes"

      Why wouldn't the contents of the "nodes" distinguish them? After all, we are combining LIs aren't we? Or if we are not, why COULDN'T we?

    2. Your two questions are closely related so I venture a common answer here. Assume that spell-out requires as input a narrow syntactic structure which is unambiguous (minimally in the precise sense that the structure has no non-trivial automorphism but I believe that stronger requirements are plausible; or equivalently I believe that there are good reasons to think that the heads triggering spell-out do not typically have access to the full geometry of their complement), then you need syntactic features because you are still within narrow syntax. So if we assume that indeed the content of the nodes does not distinguish them, then features have to be syntactic and the answers to your first set of questions follow from the answer to your second set of questions.

      "Why wouldn't the contents of the "nodes" distinguish them?" Perhaps because the content of the nodes is radically impoverished in narrow syntax (just + and - in the model I outlined above; and notice not + or -3rd person or wh, just + or -).

      "After all, we are combining LIs aren't we?" I don't know. It seems to me we are combining way more abstract things (v, v*, Asp, vn and the like do not strike me as especially lexical in nature). What do you think more generally about late insertion?

      "Or if we are not, why COULDN'T we?" Suppose narrow syntax is indeed impoverished and all lexical properties and phonological and semantic features are indeed inserted late. Then the answer to your question is "because we are humans." But note that this model is very consistent with Darwin's problem, Chomsky style: one single engine for the FLN, but the luxury of gradual evolution of all the post-narrow syntactic features (including lexical items and their denoting properties) and interactions between FLN and the rest of the cognitive capabilities.

    3. I just have a question for clarification. Norbert writes:
      "The idea that FL emerged gradually generally rests on the idea that FL builds on more primitive systems that went from 1-word to 2-word to arbitrarily large n-word sequences. My problem with these kinds of stories has always been how we get from 2 to arbitrarily large n."

      I agree that this gradualist account would be silly. But I am not aware of anyone defending such an account. So maybe someone could provide a reference or two? Thanks.

    4. @Olivier

      I guess another way of making my point is that the scenario you outline, though conceivable, goes somewhat beyond simply assuming that an operation like Merge arose. It requires a specification of why there is a segregated syntax that operates over exclusively syntactic material. After all, there is nothing incoherent of Merge manipulating phone or sem features without the intermediation of syntactic ones. We need to explain why this system is NOT lexicalist, what a root is such that it is semantically ambiguous between all of its various N,V,A meanings, why narrow syntax is impoverished etc. All of this seems quite plausible. But it is hardly virtually conceptually necessary (or at least it isn't to me). This is all that I meant by saying that Merge alone, given what we take Gs to be, is not sufficient to explain why we have the Gs we have and not others. Thx for the discussion.

    5. "It requires a specification of why there is a segregated syntax that operates over exclusively syntactic material. After all, there is nothing incoherent of Merge manipulating phone or sem features without the intermediation of syntactic ones. We need to explain why this system is NOT lexicalist, what a root is such that it is semantically ambiguous between all of its various N,V,A meanings, why narrow syntax is impoverished etc"

      Well, before we explain why this is the case, we should show that this is indeed the case (you didn't really say if you agreed). All these propositions are certainly not a matter of internal logic and conceptual necessity: many alternatives are perfectly valid (indeed merge manipulating subtle sem features would not be too far from Generative semantics, would it?), they just don't seem (to me) to be better (or less bad) models for the FLN of human beings. Likewise, the existence of omnivorous agreement is better explained by a system in which heads have coarse probing faculties than by a system in which they have refined capabilities (so that if you in fact want the content of the nodes to be there before spell-out, no problem, just assume that spell-out cannot respond to it).

      That said, there is something I don't understand in your position. Consider the following two basic theories: Merge operates in an impoverished narrow syntax (so the ability to build hierarchical structures, nothing more) and Merge operates on a rich array of lexical items endowed with their own complex phonological and denotational properties. Surely, all else being equal, it is permissible to believe that evolutionary constraints would make the former more likely. As it is what (I believe) empirical findings indicate as well, where is the puzzle? In fact, as I hinted above, shouldn't the null hypothesis for someone convinced by Chomsky's take on the evolution of language be: 1) Merge appears; then, possibly gradually and much later, 2) the structures it builds are externalized? If you believe that null hypothesis, then the logically most likely outcome (not the only coherent one, just the most likely) is that narrow syntax exists and is impoverished. Just what we independently need for a variety of reasons (I believe).

    6. I guess I thought that evolution is opportunistic. It adds as little as possible. To bound computation to syntactic features requires the "invention" of a novel set of features. Moreover, it also requires conventions mapping one set of features (syntactic ones) to other kinds (sem and phon). Thus we require quite a large organization of the mental space which must proceed in addition to the addition of the novel operation. So, why? Here's one answer: going syntactic is in some sense better and so incurring the cost of inventing the other things is worth it. Maybe. I guess I am looking for a reason to believe this.

      One way of taking your last suggestion is that the surface stuff we see is not really part of the core system. It is just there for externalization. So, for example, case, agreement, and other such effects are really part of the PF system. OK, then these should be factored out in DP discussions. I am sympathetic to this. But this means that lots of what linguists are working on is strictly speaking outside the domain of DP interest.

    7. I'm really bad at this, obviously, because it is the second time I completely fail to adequately make my position understood (the first was in a conversation with Omer).

      I think that we agree that for evolutionary reasons, we should expect the introduction of the simplest possible paraphernalia, ideally just Merge. So we don't want a novel set of features, we want them to be just Merge, so in fact highly ramified end branches of trees built by Merge (so where we put DP, there is in fact a much more intricate geometry). If this picture holds, then 1) you need just one novel set of features, + and -, and that is all, the rest is just Merge (and 2) all features should be unified but let us leave that for the moment). Moreover, we don't want conventions mapping our abstract geometrical features to other kinds, we want this to be quite arbitrary, subject to cross-linguistic variation (and possibly to have evolved gradually). It is instructive to read N. Richards's Beyond Strength and Weakness in this light for instance. Or take the so-called "wh-feature": in Germanic and Romance, some featural geometry is interpreted as carrying wh-force, but for other languages, it is simply interpreted as an existential opening operator and for others yet, it could be a simple focus feature etc.

      In particular, I don't want to move Case, agreement etc. to PF (and don't believe it holds empirically). Quite the contrary in fact, I want them to be firmly narrow syntactic phenomena rooted in geometry and stemming from the data the head triggering spell-out has access to in terms of geometry. So they are caused by externalization in the narrow sense that they are computed at spell-out at the latest and that without them spell-out would fail, but they are not strictly PF- nor CI-phenomena: they are the by-product of the interaction of the geometry of narrow syntax with these 2.

    8. Thanks. So, there is one big miracle, Merge, and one very little one, the intro of one novel feature with 2 values. The rest is due to spell out at externalization. We expect and find variation here, though there is a common core reflecting geometrical properties of the +/- values in the "tree." Is this right?

    9. Yes, exactly right, that's precisely what I meant. Thanks for having kept on reading my long-winded comments.
