Friday, January 20, 2017

Tragedy, farce, pathos

Dan Everett (DE) has written once again on his views about Piraha, recursion, and the implications for Universal Grammar (here). I was strongly tempted to avoid posting on it, for it adds nothing new of substance to the discussion (and will almost certainly serve to keep the silliness alive) beyond a healthy dose of self-pity and self-aggrandizement. It makes the same mistakes, in almost the same way, and adds a few more irrelevancies to the mix. If history surfaces first as tragedy and the second time as farce (see here), then pseudo debates in their moth-eaten n-th iteration are just pathetic. The Piraha “debate” has long since passed its sell-by date. As I’ve said all that I am about to say before, I would urge you not to expend time or energy reading this. But if you are the kind of person who slows down to rubberneck a wreck on the road and can’t help but find the ghoulish fascinating, this post is for you.

The DE piece makes several points.

First, that there is a debate. As you all know this is wrong. There can be no debate if the controversy hinges on an equivocation. And it does, for what the DE piece claims about the G of Piraha, even if completely accurate (which I doubt, but the facts are beyond my expertise), has no bearing on Chomsky’s proposal, viz. that recursion is the only distinctively linguistic feature of FL. This is a logical point, not an empirical one. More exactly, the controversy rests on an equivocation concerning the notion “universal.” The equivocation has been a consistent feature of DE’s discussions and this piece is no different. Let me once again explain the logic.

Chomsky’s proposal rests on a few observations. First, that humans display linguistic creativity. Second, that humans are only accidentally native speakers of their native languages.

The first observation is manifest in the fact that, for example, a native speaker of English can effortlessly use and understand an unbounded number of linguistic expressions never before encountered. The second is manifest in the observation that a child deposited in any linguistic community will grow up to be a linguistically competent native speaker of that language with linguistic capacities indistinguishable from any of the other native speakers (e.g. wrt his/her linguistic creativity).

These two observations prompt some questions.

First, what underlying mental architecture is required to allow for the linguistic creativity we find in humans? Answer 1: a mind that has recursive rules able to generate ever more sophisticated expressions from simple building blocks (aka, a G). Question 2: what kind of mental architecture must such a G-competent being have? Answer 2: a mind that can acquire recursive rules (i.e. a G) from products of those rules (i.e. generated examples of the G). Why recursive rules? Because linguistic productivity just names the fact that human speakers are competent with respect to an unbounded number of different linguistic expressions.

Second, why assume that the capacity to acquire recursive Gs is a feature of human minds in general rather than simply a feature of those human minds that have actually acquired recursive Gs? Answer: Because any human can acquire any G that generates any language. So the capacity to acquire language in general requires the meta-capacity to acquire recursive rule systems (aka, Gs). As this meta-capacity seems to be restricted to humans (i.e. so far as we know only humans display the kind of recursive capacity manifested in linguistic creativity) and as this capacity is most clearly manifest in language, Chomsky’s conjecture is that if there is anything linguistically specific about the human capacity to acquire language, the linguistic specificity resides in this recursive meta-capacity.[1] Or to put this another way: there may be more to the human capacity to acquire language than the recursive meta-capacity, but at least this meta-capacity is part of the story.[2] Or, to put this yet another way, absent the humanly given (i.e. innate) meta-capacity to acquire (certain specifiable kinds of) recursive Gs, humans would not be able to acquire the kinds of Gs that we know that they in fact do acquire (e.g. Gs like those English, French, Spanish, Tagalog, Arabic, Inuit, Chinese … speakers have in fact acquired). Hence, humans must come equipped with this recursive meta-capacity as part of FL.

Ok, some observations: recursion in this story is principally a predicate of FL, the meta-capacity. The meta-capacity is to acquire recursive Gs (with specific properties that GG has been in the business of identifying for the last 50 years or so). The conjecture is that humans have this meta-capacity (aka FL) because they do in fact display linguistic creativity (and, as the DE paper concedes, native speakers of non-Piraha do regularly display linguistic creativity implicating the internalization of recursive language specific Gs) and because the linguistic creativity a native speaker of (e.g.) English displays could have been displayed by any person raised in an English linguistic milieu. In sum, FL is recursive in the sense that it has the capacity to acquire recursive Gs and speakers of any language have such FLs.

Observe that FL must have the capacity to acquire recursive Gs even if not all human Gs are recursive. FL must have this capacity because all agree that many/most (e.g.) non-Piraha Gs are recursive in the sense that Piraha is claimed not to be. So, the following two claims are consistent: (1) some languages have non-recursive Gs but (2) native speakers of those languages have recursive FLs. This DE piece (like all the other DE papers on this topic) fails, once again, to recognize this. A discontinuous quote (4):

 If there were a language that chose not to use recursion, it would at the very least be curious and at most would mean that Chomsky’s entire conception of language/grammar is wrong….

Chomsky made a clear claim –recursion is fundamental to having a language. And my paper did in fact present a counterexample. Recursion cannot be fundamental to language if there are languages without it, even just one language.

First an aside: I tend to agree that it would indeed be curious if we found a language with a non-recursive G given that virtually all of the Gs that have been studied are recursive. Thus finding one that is not would be odd for the same reason that finding a single counterexample to any generalization is always curious (and which is why I tend not to believe DE’s claims and tend to find the critique by Nevins, Pesetsky and Rodrigues compelling).[3] But, and this is the main take home message, whether curious or not, it is at right angles to Chomsky’s claim concerning FL for the reasons outlined above. The capacity to acquire recursive Gs is not falsified by the acquisition of a non-recursive one. Thus, logically speaking, the observation that Piraha does not have embedded clauses (i.e. does not display one of the standard diagnostics of a recursive G) does not imply that Piraha speakers do not have recursive FLs. Thus, DE’s claims are completely irrelevant to Chomsky’s even if correct. That point has been made repeatedly and, sadly, it has still not sunk in. I doubt that for some it ever will.

Ok, let’s now consider some other questions. Here’s one: is this linguistic meta-capacity permanent or evanescent? In other words, one can imagine that FL has the capacity to acquire recursive Gs but that once it has acquired a non-recursive G it can no longer acquire a recursive one. DE’s article suggests that this is so for Piraha speakers (p. 7). Again, I have no idea if this is indeed the case (if true, it would constitute evidence for a strong version of the Sapir-Whorf hypothesis), but this claim, even if correct, is at right angles to Chomsky’s claim about FL. Species specific dedicated capacities need not remain intact after use. It could be true that FL is only available for first language acquisition, and this would mean that second languages are acquired in different ways (maybe by piggy-backing on the first G acquired).[4] However, so far as I know, neither Chomsky nor GG has ever committed hostages to this issue. Again, I am personally skeptical that having a Piraha G precludes you from the recursive parts of a Portuguese G, but I have nothing but prejudicial hunches to sustain the skepticism. At any rate, it doesn’t bear on Chomsky’s thesis concerning FL. The upshot: DE’s remarks once again are at right angles to Chomsky’s claims, so, interesting as the possibility it raises might be for issues relating to second language acquisition, it is not relevant to Chomsky’s claims about the recursive nature of FL.

A third question: is the meta-capacity culturally relative? DE’s piece suggests that it is, because the actual acquisition of recursive Gs might be subject to cultural influences. The point seems to be that if culture influences whether an acquired G is recursive, then the meta-capacity must be culturally variable as well. But this does not follow. Let me explain.

All agree that the details of an actual G are influenced by all sorts of factors, including culture.[5] This must be so and has been insisted upon since the earliest days of GG. After all, the G one acquires is a function of FL and the PLD used to construct that G. But the PLD is itself a function of what the learner actually gets, and there is no doubt that which utterances are performed is influenced by the culture of the utterers.[6] So, that culture has an effect on the shape of specific Gs is (or should be) uncontroversial. However, none of this implies that the meta-capacity to build recursive Gs is itself culturally dependent, nor does DE’s piece explain how it could be. In fact, it has always been unclear how external factors could affect this meta-capacity. You either have a recursive meta-capacity or you don’t. As Dawkins put it (see here for discussion and references):

… Just as you can’t have half a segment, there are no intermediates between a recursive and a non-recursive subroutine. Computer languages either allow recursion or they don’t. There’s no such thing as half-recursion. It’s an all or nothing software trick… (383)

Given this “all or nothing” quality, what would it mean to say that the capacity (i.e. the innately provided “computer language” of FL) was dependent on “culture”? Of course, if what you mean is that the exercise of the capacity is culture dependent, and what you mean by this is that it depends on the nature of the PLD (and other factors) that might themselves be influenced by “culture,” then duh! But, if this is what DE’s piece intends, then once again it fails to make contact with Chomsky’s claim concerning the recursive nature of FL. The capacity is what it is, though of course the exercise of the capacity to produce a G will be influenced by all sorts of factors, some of which we can call “culture.”[7]
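Dawkins’ all-or-nothing point can be made concrete with a toy sketch (the rule and the vocabulary here are invented purely for illustration, not drawn from any actual grammar): once a rule is permitted to apply to its own output, the depth of embedding it licenses is unbounded; there is no way to grant it “half” of that property.

```python
# A toy self-embedding rule, NP -> N | NP 's N, written as a Python
# subroutine that calls itself. Lexical items invented for illustration.

def possessive(depth: int) -> str:
    """Build a possessive NP with `depth` levels of embedding."""
    if depth == 0:
        return "John"  # base case: a simple building block
    # the rule re-applies to its own output, so depth is unbounded
    return possessive(depth - 1) + "'s friend"

print(possessive(0))  # John
print(possessive(3))  # John's friend's friend's friend
```

The point of the sketch is structural: deleting the self-call changes the system categorically, not by degree.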

Two more points and we are done.

First, there is a source for the confusion in DE’s papers (and it is the same one I have pointed to before). DE’s discussion treats all universals as if Greenbergian. Here’s a quote from the current piece that shows this (I leave it as an exercise to the reader to uncover the Greenbergian premise):

The real lesson is that if recursion is the narrow faculty of language, but doesn’t actually have to be manifested in a given language, then likely more languages than Piraha…could lack recursion. And by this reasoning we derive the astonishing claim that, although recursion would be the characteristic that makes human language possible, it need not actually be found in any given language. (8)

Note the premise: unless every G is recursive then recursion cannot be “that which makes human languages possible.” But this only makes sense if you understand things as Greenberg does. If you understand the claim as being about the capacity to acquire recursive Gs then none of this follows.

Nor are we led to absurdity. Let me froth here. Of course, nobody would think that we had a capacity for constructing recursive Gs unless we had reason to think that some Gs were so. But we have endless evidence that this is the case. So, given that there is at least one such G (indeed endlessly many), humans clearly must have the capacity to construct such Gs. So, though we might have had such a capacity and never exercised it (this is logically possible), we are not really in that part of the counterfactual space. All we need to get the argument going for a recursive meta-capacity is mastery of at least one recursive G, and there is no dispute that there exists such a G and that humans have acquired it. Given this, the only coherent reason for thinking a counterexample (like Piraha) could be a problem is if one understood the claim to universality as implying that a universal property of FL (i.e. a feature of FL) must manifest itself in every G. And this is to understand ‘universal’ à la Greenberg and not as Chomsky does. Thus we are back to the original sin in DE’s oeuvre: the insistence on a Greenbergian conception of universal.

Second, the piece makes another point. It suggests that DE’s dispute with Chomsky is actually over whether recursion is part of FL or part of cognition more generally. Here’s the quote (10):

…the question is not whether humans can think recursively. The question is whether this ability is linked specifically to language or instead to human cognitive accomplishments more generally…

If I understand this correctly, it is agreed that recursion is an innate part of human mental machinery. What’s at issue is whether there is anything linguistically proprietary about it. Thus, Chomsky could be right to think that human linguistic capacity manifests recursion but that this is not a specifically linguistic fact about us as we manifest recursion in our mental life quite generally.[8]

Maybe. But frankly it is hard to see how DE’s writings bear on these very recondite issues. Here’s what I mean: Human Gs are not merely recursive but exhibit a particular kind of recursion. Work in GG over the last 60 years has been in service of trying to specify what kind of recursive Gs humans entertain. Now, the claim here is that we find the kind of structure we find in human Gs in cognition more generally. This is empirically possible. Show me! Show me that other kinds of cognition have the same structures as those GGers have found in Gs. Nothing in DE’s arguments about Piraha has any obvious bearing on this claim, for there is no demonstration that other parts of cognition have anything like the recursive structure we find in human Gs.

But let’s say that we establish such a parallelism. There is still more to do. Here is a second question: is FL recursive because our mental life in general is, or is our mental life in general recursive because we have FL?[9] This is the old species specificity question all over again. Chomsky’s claim is that if there is anything species special about human linguistic facility it rests in the kind of recursion we find in language. To rebut this species specificity requires showing that this kind of recursion is not the exclusive preserve of linguistically capable beings. But, once again, nothing in DE’s work addresses this question. No evidence is presented trying to establish the parallel between the kind of recursion we find in human Gs and any animal cognitive structures.

Suffice it to say that the kind of recursion we find in language is not cognitively ubiquitous (so far as we can tell) and that if it occurs in other parts of cognition it does not appear to be rampant in non-human animal cognition. And, for me at least, that is linguistically specific enough. Moreover, and this is the important point as regards DE’s claims, it is quite unclear how anything about Piraha will bear on this question. Whether or not Piraha has a recursive G will tell us nothing about whether other animals have recursive minds like ours.

Conclusion? The same as before. There is no there there. We find arguments based on equivocation and assertions without support. The whole discussion is irrelevant to Chomsky’s claims about the recursive structure of FL and whether that is the sole UGish feature of FL.[10]

That’s it. As you can see, I got carried away. I didn’t mean to write so much. Sorry. Last time? Let’s all hope so.

[1] Here you can whistle some appropriate Minimalist tune if you would like. I personally think that there is something linguistically specific about FL given that we are the only animals that appear to manifest anything like the recursive structures we find in language. But, this is an empirical question. See here for discussion.
[2] Chomsky’s minimalist conjecture is that this is the sole linguistically special capacity required.
[3] Indeed such odd counterexamples place a very strong burden of proof on the individual arguing for it. Sometimes this burden of proof can be met. But singular counterexamples that float in a sea of regularity are indeed curious and worthy of considerable skepticism. However, that’s not my point here. It is a different one: the Piraha facts, whatever they turn out to be, are irrelevant to the claim that FL has the capacity to acquire recursive Gs, as this is what Chomsky has been proposing. Thus, the facts regarding Piraha, whatever they turn out to be, are logically irrelevant to Chomsky’s proposal.
[4] This seems to be the way that Sakel conceives of the process (see here). Sakel is the person the DE piece cites regarding how Piraha speakers behave with Portuguese as a second language. That speakers build their second G on the scaffolding provided by a first G is quite plausible a priori (though whether it is true is another matter entirely). And if this is so, then features of one’s first G should have significant impact on properties of one’s second G. Sakel, btw, is far less categorical in her views than DE’s piece suggests. Last point: a nice “experiment,” if this interests you, is to see what happens if a speaker acquires Portuguese and Piraha simultaneously, both as first Gs. What should we expect? I dunno, but my hunch is that both would be acquired swimmingly.
[5] So, for example, dialects of English differ wrt the acceptability of Topicalization. My community used it freely and I find them great. My students at UMD were not that comfortable with this kind of displacement. I am willing to bet that Topicalization’s alias (i.e. Yiddish Movement) betrays a certain cultural influence.
[6] Again, see note 4 and Sakel’s useful discussion of the complexity of Portuguese input to the Piraha second language acquirer.
[7] BTW, so far as I can tell, invoking “culture” is nothing but a rhetorical flourish most of the time. It usually means nothing more than “not biology.” However, how culture affects matters and which bits do what is often (always?) left unsettled. It often seems to me that the word is brandished a bit like garlic against vampires, mainly there to ward off evil biological spirits.
[8] On this view, DE agrees that there is FLB but no FLN, i.e. a UGish part of FL.
[9] In Minimalist terms, is recursion a UGish part of FL or is there no UG at all in FL.
[10] There is also some truly silly stuff in which DE speculates as to why the push back against his views has been so vigorous. Curiously, DE does not countenance the possibility that it is because his arguments though severely wanting have been very widely covered. There is some dumb stuff on Chomsky’s politics, Wolfe junk, and general BS about how to do science. This is garbage and not worth your time, except for psycho-sociological speculation.


  1. I think this entire debate might have gone better and in a more constructive direction if people had taken DE's article as a stimulus to find out more about how much and what kind of data is required to unlock the classic 'XP within XP' recursion skill ... we knew from German prenominal possessives that it could be inhibited by lack of data, and we even knew this from English prenominal adjectives, which can't take any following complements (*a proud of her children mother). But afaik we have no idea about how much is enough, say, to establish such things in a genre of a language.

    1. I don't fully understand what you have in mind. Are you suggesting that German doesn't have prenominal possessive recursion because at some point there wasn't enough input for children to infer that rule? That seems a little backwards, no? If German never allowed that, then of course there's no input. In that respect self-embedding doesn't strike me as much different from, say, topic drop or parasitic gaps: some languages have it, others don't, but how this comes about diachronically --- albeit interesting --- isn't the purview of syntactic theory.

    2. Something like 'meines Vaters Auto' ought, according to simplistic X-bar theory, to indicate that there is a prenominal possessor NP, so we expect 'my father's first wife's best friend's car' etc, but we don't get stuff like that in German (at least, not in generally accepted German), whereas in English we do; so how much and what kind of more complex NP structures need to be in the English input for English learners not to come out thinking that English is like German?

      So it's not a question of diachronic theory, and not really existing syntactic theory either, but I think such questions ought to be. For one thing, if we had decent answers, we'd have some idea of how much Piraha we'd have to look at and fail to find recursive NPs etc in order to be reasonably confident that they were impossible.

    3. I agree Avery. On top of the simplex/complex distinction, the simple fact that languages don't stop at, say, 2 or 3 possessors (though they do stop at one simplex one), when input to kids doesn't go beyond 3, basically tells you that you need minimal input to go from one to unbounded (that is, you don't go through two, then three, then unbounded). So if Dan is right about Piraha, then it may just be that Piraha is like German in this respect. What would be a real problem for Merge is if you had a language that allowed a maximum of two possessors (so John's car; John's car's door, but *John's car's door's lock mechanism / Sue's sister; Sue's sister's friend, but *Sue's sister's friend's mother). That wouldn't be a problem for a CxG approach though, as you'd just learn that that was the construction. This is basically the induction problem, and the reason CxG just can't work.

    4. @Avery: There isn't much of a problem with a learner that assumes non-recursivity by default and unlocks recursion only if a fixed number of embedding levels have been encountered (presumably 1 or more). Which is exactly how one would design a learner that is limited to positive evidence and cannot use probabilistic reasoning to indirectly get negative evidence. That X'-theory immediately gives rise to recursion holds only if one assumes that the learner immediately posits the same category for the possessive phrase and the possessor, and that is anything but inevitable unless one posits a very rich UG.

      @David: I don't see how MG is better off in this respect than CxG. Merge can easily be restricted to any finite number of embeddings. You can do that in an ugly fashion by refining the category labels for each level of embedding, or you can split that out into a dedicated "levels of embedding" constraint that can be encoded in a myriad of ways. Of course you can set it all up so that recursion arises naturally with the first level of embedding, and you can even make succinctness arguments for that. But you can also generalize over constructions in a manner that gives rise to the same behavior --- fluid CxG, for example, treats constructions very similar to classes in object-oriented programming, so it's easy to use analogues of class inheritance and meta-classes to generalize over constructions. And then you have a succinctness argument there, too.

    5. Oops, MG should have been Minimalism. Not a Freudian slip, I swear ;)

    6. @Thomas: that's why I said 'naive'. But my real issue here is not what kind of learners will or will not work, but what kind, and especially amount, of evidence is necessary for various kinds of G's to get learned. So in the CHILDES corpus, non-pronominally possessed possessives (Erica's brother's name) occur at a rate of about 6 per million words (in the whole thing; in the UK portion it's only a bit over 3); is that enough, or are the adjectivally modified and coordinated possessors (the little boy's dog, grandma and grandpa's birthday) also needed to deliver a possessor=NP result? Note btw that in the German example, the fact that the possessor consists of a determiner + a noun would indicate that it is a DP, under reasonable assumptions.

      Doubly possessed possessors (John's mommy's friend's dog) otoh don't occur at all, in the whole 13.5 mw, so I think we can take this as evidence that these are 'projected'; but that a frequency of < 1 in 10 million words is not enough for something to be learned as a basic fact about a language (a 'parameter', in a somewhat modified use of the term) is just a preliminary guess on my part; I don't know, and don't get the impression that anybody else does either.

      But since GG claims to be based on the idea of a poverty of the stimulus problem, things might go better if we had a clearer notion of what is in the stimulus vs what is in the Gs, since the stimulus is now far more investigable than it used to be.

    7. @Thomas but the point is that you need an extraneous restriction on Merge to do this. CxG (at least the usage-based variants) has to add in something extra to allow it (something analogous to recursion, in fact). Fluid CxG is basically generative anyway, as is Sign-based CxG, so the issue is the usage-based vs generative nature of the system. The core issue is not about formalism, it's about the architecture of the two approaches, I think. Do you build that architecture around a recursive mechanism, or do you build it around non-monotonic generalization? If you do the latter, then it's trivial to learn a language with a restriction to 2 possessors, which is what you don't want empirically. If you do the former, then you need to add something to the system to stop it recursing beyond 2. The former approach is clearly preferable, no?

    8. This comment has been removed by the author.

    9. But Merge on its own is completely vacuous -- everything is an extraneous restriction. Even if you only allow one (or zero) levels of embedding this needs an extraneous restriction.

      I don't think there is any particular learnability difficulty here.
      Say we have three languages
      L0 = {a}
      L1 = {a, ba}
      L2 = {a, ba, bba}
      L* = {a, ba, bba, bbba, ...}

      None of these are particularly hard to learn, nor to distinguish from each other.
      If the question is why L2 never happens, then it is hard to see why this should be something that should be accounted for by a limitation on the learner.
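For concreteness, the four toy languages can be written down and probed directly; a minimal sketch in Python (the encoding as string sets and a regex is mine, purely for illustration):

```python
import re

# The four toy languages over the alphabet {a, b}; L* is the regular
# language b*a (any number of b's followed by a single a).
L0 = {"a"}
L1 = {"a", "ba"}
L2 = {"a", "ba", "bba"}

def in_Lstar(word: str) -> bool:
    return re.fullmatch(r"b*a", word) is not None

# Short strings suffice to tell the languages apart:
print("bba" in L2, "bba" in L1)        # True False: separates L2 from L1
print(in_Lstar("bbba"), "bbba" in L2)  # True False: separates L* from L2
```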

    10. @Alex: Assume indeed that (i) it's exceedingly easy to construct an artificial learner that would learn L2; (ii) we never see a human language that is L2-like; and (iii) with respect to this particular question, we have a sufficiently large sample of unrelated human languages that the absence of L2-like languages is indeed significant.

      One very reasonable explanation for this state of affairs would be that the human learner differs in meaningful ways from the artificial learner that you envision for (i). You can reject this, but then you're on the hook for providing an alternative explanation for the absence of L2-like languages.

    11. @Omer: You're right of course, that there are lots of ways to try and account for typological generalizations. Some might be due to learning biases, some to parsing biases, some to cultural biases, some to language change trajectories, some to grammar, and some to accident.

      If there were an alternative successful learning strategy on the market, then perhaps Alex would be responsible for justifying why he doesn't want to derive a particular typological generalization from learning considerations. As there isn't, I think Alex's comment should be more charitably interpreted in terms of an expert suggesting that there doesn't seem to be an obvious and not completely ad hoc way to cash out David's hand wavy suggestion about the difference between minimalism and CxG.

    12. One obvious partial explanation is just simplicity. If you write down the minimal automaton for the four languages, L0 and L* have 2 states, L1 has 3 states, and L2 has 4 states (much the same applies for a CFG). In the absence of some compelling reason to use L2, one would expect the other options to be more easily learned and used etc., and that this would show up typologically. Presumably this is why we also don't see L17 or {a, bbba, bbbbbbba}, even though those present no particular difficulty for learning.
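The state counts can be checked with a small sketch (my own encoding: each language as a partial DFA, i.e. a transition dictionary with no dead state, and 'F' as the single accepting state):

```python
# Partial DFAs for the four toy languages, written as transition
# dictionaries mapping (state, symbol) -> state. 'F' is the accepting
# state; a missing transition means the word is rejected.

Lstar = {("q0", "b"): "q0", ("q0", "a"): "F"}                  # b*a
L0 = {("q0", "a"): "F"}                                        # {a}
L1 = {("q0", "a"): "F", ("q0", "b"): "q1", ("q1", "a"): "F"}   # {a, ba}
L2 = {("q0", "a"): "F", ("q0", "b"): "q1",                     # {a, ba, bba}
      ("q1", "a"): "F", ("q1", "b"): "q2", ("q2", "a"): "F"}

def accepts(dfa, word, start="q0", final="F"):
    state = start
    for ch in word:
        if (state, ch) not in dfa:
            return False
        state = dfa[(state, ch)]
    return state == final

def state_count(dfa):
    return len({s for s, _ in dfa} | set(dfa.values()))

print([state_count(d) for d in (L0, L1, L2, Lstar)])  # [2, 3, 4, 2]
```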

      But even if there wasn't such an explanation, I don't think that the default assumption should be that every typological universal should be explained by UG, not least because we don't have a good handle on many of the different types of non-UG based explanations.

    13. @Greg, @Alex: I wasn't saying anything about minimalism or CxG, nor even about UG. I was talking about ways in which the human learner might differ from the artificial learner in my (i), above (the one that would have no problem learning L2). @Alex: your sketch involving pressure to minimize the number of states in the constructed automaton is one such example of what the relevant difference could be. We can then have a separate discussion about whether that pressure belongs in UG or not, but I did not take a stand on that. @Greg: I confess I am not familiar enough with CxG to evaluate David's claims regarding minimalism vs. CxG; I was only addressing Alex's comments about whether putting the relative properties "in the learner" is a reasonable move.

    14. Thanks to Alex and Greg for saying pretty much what I would've said. There is one very different point, though, that has not been made yet:

      David explicitly puts aside the generative variants of CxG (not sure I agree with the classification, but mostly because generative has become a very broad catch-all term over the years). But if we only consider usage-based CxG then recursion is a moot point because that variant of CxG rejects the competence-performance distinction afaik (if I'm wrong on that, the rest of the post can be ignored). In usage, you do not have unbounded recursion. What you have is a collection of configurations with a certain amount of self-embedding that varies between those configurations and even individuals. And you also have a big puzzle in that the rules for those configurations do not seem to change past the first level of self-embedding. So you do not get something like "in the first embedded CP, the verb is in the first position, in the second one in the second position, and so on, until we reach our finite cut-off point".

      Minimalists (and many other stripes of linguists) explain this by factorizing the model into two components: a formalism with uniform, unbounded recursion past the first level (if you squint and ignore all the edge cases the formalism is actually capable of), and a constraint that limits this unbounded mechanism to a finite number of levels --- the familiar performance limitations. Usage-based CxG instead chooses to directly describe the intersection of the two. Whether that is a smart thing to do or not depends on your goals. Factorization is usually a good thing, but not every factorization works well for every problem.

    15. Indonesian might constitute an L2-like situation, because two attributive adjectives are sort-of possible (but rare, I'm told), while three are rejected. But I don't know the frequency of two attributive adjectives in texts.

    16. @Thomas I don't think it's quite true that CxG in its usage-based variant doesn't generalize - if you look at Goldberg's work, at least, she explicitly discusses abstractions from surface patterns as a way to store the relevant knowledge. But what is stored is a series of constructions linked by default unification. Which makes Alex's L2 easy to learn (not that anyone thinks that the surface string language is actually what is learned). And your point about why the rules don't change as you get more embedded is absolutely spot on: to capture that with constructions requires you to obligatorily create a cycle in your default unification matrix. Indeed that, plus the typological absence of L2, or Ln, where n is a particular finite number, makes the CxG proposals very unattractive, I'd say. And factorization wins the day.

    17. I am very sympathetic to the hypothesis that the way people generalize from language data (language learning) is going to directly shape linguistic typology. Frankly, to me it seems like the kind of thing that has to be true just as it is that language must be compositional in order to account for its capacity to construct longer and longer sound-meaning pairs without limit.

      Alex's example of L0, L1, L2, and L* is a simple case, so let's examine it a bit. To clarify some earlier discussion, we are not interested in the fact that for each language there is some algorithm that can learn it; rather, we are interested in a single algorithm that can learn each language L in a set of languages from positive examples of L. In this particular case, we are interested in an algorithm that could learn L0, L1 and L* but not L2.

      It is interesting to observe that L0, L1, and L* are all Strictly 2-Local but L2 is not. One informal definition of SL2 languages is this: a language L is SL2 if there is a finite set S of 2-long substrings such that L contains exactly those words all of whose 2-long substrings occur in S.

      There are well-understood algorithms which learn the Strictly 2-Local languages. Essentially, these learners work by scanning the words they observe for their 2-long substrings. Since for any given SL2 language the set S is finite, with enough input data the learner eventually sees them all. It is interesting to me that such an algorithm, given the word forms in L2, will immediately generalize to the language L*.

      So imagine we discovered another planet whose inhabitants speak L0, L1 and L* but not L2, L3, and so on. Is this fact significant and if so how would we account for it? I personally find the hypothesis that speakers generalize in the way SL2 learners do to be compelling. One reason is that SL2 languages are not just some obscure class of formal languages. Their computational nature reflects a particularly simple kind of memory (essentially Markovian). (BTW, this simple kind of memory is NOT reflected in the size of the automata, but in other kinds of language-theoretic and algebraic properties.) But another reason it would be compelling is because I wouldn't know of any competing hypothesis!

      So to sum up this exercise, the observed language typology on this imaginary planet can lead us to hypothesize a specific explanatory learning algorithm that has clear psychological implications for the nature of memory insofar as it relates to language.

      Of course none of this addresses Alex's question, which is why "should" language learning explain typological gaps, given that so many other factors may do so as well. To me it comes down to the standard scientific exercise of examining concrete hypotheses. If there are competing concrete hypotheses for a typological gap, we should investigate areas where they make different predictions. And as Greg points out, most of the time we don't even have concrete hypotheses for the gap. So researchers wave a learning IOU, or a minimum description length IOU, or something else.

      In my opinion, the most concrete learning algorithms which come closest to being able to explain the prenominal possessive type gap are ones developed by Alex and his colleagues for learning subclasses of context-free and context-sensitive languages. These are based on the various notions of "substitutability". One under-appreciated aspect of this line of research, and of grammatical inference more generally, is that the analytical techniques clearly identify finite sets of strings, contexts, or whatever from which the algorithm makes the inductive leap from finitely many strings to infinitely many. The real-world analogs of those finite sets---and of some infinite sets which contain them---are predicted to be gaps if these learning algorithms are hypothesized to underlie the way humans generalize from their linguistic experience.
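      The SL2 learner described above can be sketched in a few lines. The particular languages are an assumption for illustration: take Ln = { a^k b : k <= n } and L* = { a^k b : k >= 0 }, with "<" and ">" as word-boundary markers:

```python
def factors2(word):
    """All 2-long substrings of a word, with boundary markers added."""
    w = "<" + word + ">"
    return {w[i:i + 2] for i in range(len(w) - 1)}

def learn(sample):
    """SL2 learner: the grammar is simply the set of observed 2-factors."""
    grammar = set()
    for word in sample:
        grammar |= factors2(word)
    return grammar

def generates(grammar, word):
    """A word is generated iff every one of its 2-factors was observed."""
    return factors2(word) <= grammar

# Fed the word forms of L2 = {b, ab, aab}, the learner immediately
# generalizes to all of L* = a*b ...
g = learn(["b", "ab", "aab"])
assert generates(g, "aaaaab")
# ... while still rejecting strings outside a*b:
assert not generates(g, "ba")
```

      Note that no such learner can lock onto L2 itself: counting the a's is exactly what a finite set of 2-factors cannot do, which is the sense in which L2 is not SL2.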

  2. Well, Chomsky and his cohort *always* respond this way to criticisms; they're always obviously wrong or misguided, and even if they weren't, they'd be irrelevant. It's a fine rhetorical technique, but after decades of this, one begins to wonder. Seems like there are two possibilities: Either (a) somehow critics of Chomsky are always bumblers who cannot understand what he is doing, or (b) something very different is going on.

    But back to the issue at hand. Hauser, Chomsky, and Fitch (HCF) argued the following:

    "FLN is the abstract linguistic computational system alone, independent of the other systems with which it interacts and interfaces....At a minimum, then, FLN includes the capacity of recursion. FLN only includes recursion and is the only uniquely human component of the faculty of language."

    If Everett's observations on Piraha are correct--I'm not in a position to know--then there are a number of possible conclusions:

    1) FLN includes the capacity of recursion, but that capacity is unnecessary for producing language. It's like writing or even speech--sure, they're common and near-universal, but you don't need them. Therefore, if we want to know about the essentials of language, we need to turn away from HCF and look elsewhere.

    2) Thus, maybe FLN is something else--if we accept that there is an "abstract linguistic computational system," then we need to find some other feature than recursion. Again, HCF won't be much help here.

    3) Or, perhaps the concept of FLN is actually vacuous--there is no reason to chop up FL this way. Perhaps there is no uniquely human component, and the difference is one of degree, rather than that of kind.

    Part of the problem is that Chomsky's definition of recursion is vague and ever-changing, which is how you get poor Tecumseh Fitch standing in the Amazon desperately trying to remember what definition of recursion he's supposed to be using. It also seems to have little to do with "recursion" as that term is normally understood in computer science.

    The other part of the problem is that Chomsky has always had a tough time dealing with empirical data. It's a pretty significant weakness, and one gets the impression that he'd rather nobody do any empirical work at all. It is an odd stance from someone who is in the belly of the scientific beast.

    1. So exactly what kind of data are you complaining about Chomsky not being able to deal with? There are many kinds of data, so more specificity in the criticism would be helpful.

    2. Chomsky has been consistent about the recursive device, viz., Merge, which allows for embedding, but also produces strings (to turn to a weak generative perspective) that do not feature embedding in the relevant sense. If DE is right, the empirical question remaining for the generativist is why Merge in this case does not issue in embedding structures. I fail to see any confusion, vagueness, or equivocation here.

    3. Apologies, this is John Collins using someone else's account:)

    4. Well, let me preface this by saying that I'm approaching this issue from the sciences, specifically neuroscience; perhaps the meaning of "data" is different in a field like linguistics. That said, a few points:

      If you read, say, Knowledge of Language, the absence of empirical data is quite striking; it's very different from what a scientific book would do. Certainly there are examples, typically grounded in fairly arcane sentences, but these are more like anecdotal evidence than data; they can provide existence proofs and the like, but there's nothing systematic about their collection, selection, or use.

      If you look instead at the works of Tomasello, Gleitman, Clarke, Bates, etc., you'll see a lot more empirical evidence. Couple that with Chomsky's attitude towards Everett, not to mention his treatment of statistical learning, and you have a field, or at least a professor's perspective, that's very different from the sciences as traditionally understood.

      Heck, you can even listen to Chomsky himself. In the widely panned book Science of Language, he writes "Behavioural science is...keeping to the data; so you just know that there's something wrong with it." When discussing Mendel in the Atlantic, he notes that Mendel "thr[ew] out a lot of the data that didn't work," and "he did the right thing. He let the theory guide the data." To be blunt, there is no point in collecting data if you are going to let the theory dictate what data you are going to accept.

      Or at least that's the view in the sciences; again, linguistics might be different. As Feynman put it, if your idea disagrees with data, it is wrong, and if it can't be tested by data or experiment, it isn't science. Linguistics might have a different view from the sciences--that's a question worth exploring.

      As for recursion--well, it's uncontroversial that Chomsky doesn't mean what "recursion" means anywhere else. For more on the confusion, etc., see this thread which helps explain why Tecumseh Fitch didn't seem to know what he was looking for. Thus "[t]he empirical question remaining for the generativist" is not "why Merge...does not issue in embedding structures," but whether something like Merge exists at all. String-production is pretty feeble on its own, and probably isn't specific to language as normally understood.

    5. Steven P. wrote: Certainly there are examples, typically grounded in fairly arcane sentences, but these are more like anecdotal evidence than data; they can provide existence proofs and the like, but there's nothing systematic about their collection, selection, or use.

      Could you elaborate a bit on what you mean by "but there's nothing systematic about their collection, selection, or use"?

      The way in which Chomsky selected the sentences discussed in KoL (or anywhere else) is, I would think, roughly the same way that any scientist selects the stimuli for their experiments. It is certainly not systematic in the sense that there is no recipe for designing the next "killer experiment"; there is a systematicity required when it gets down to the specifics of designing the appropriate manipulations and controls and so on, but that's there in KoL in the form of minimal pairs (and "minimal 2x2 quadruplets", etc.).

      I think it's misleading to speak of the sentences being collected. What was collected were acceptability judgements, reported by an asterisk or absence-of-asterisk in front of each of the stimuli sentences.

    6. But one can imagine an alternate universe in which generative grammarians actually wrote generative grammars and systematically evaluated the predictions of the models by sampling from the grammars and testing the accuracy of the predictions by acceptability judgments or ambiguity, or by looking at the coverage of the grammar over some set of examples. This would allow a more systematic and quantitative comparison of different proposed grammars, and grammatical theories.

      (Without going down the Penn treebank route).

      We can start to see glimmers of how this might develop now, with various example sets, with collected acceptability judgments.
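      A minimal sketch of the sampling step (the grammar here is a hypothetical toy, standing in for an actual proposed grammar of some language fragment):

```python
import random

# Hypothetical toy grammar, for illustration only.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "N"], ["the", "N", "RC"]],
    "RC": [["that", "VP"]],
    "VP": [["V", "NP"], ["slept"]],
    "N":  [["dog"], ["cat"]],
    "V":  [["chased"], ["saw"]],
}

def sample(symbol="S", rng=random):
    """Sample a word string from the grammar, expanding left to right."""
    if symbol not in GRAMMAR:  # terminal symbol
        return [symbol]
    expansion = rng.choice(GRAMMAR[symbol])
    return [word for sym in expansion for word in sample(sym, rng)]
```

      Strings sampled this way could then be put to informants as acceptability-judgment stimuli, and the grammar's coverage checked against attested examples.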

    7. KoL is intended for a general audience, as is your other source, Science of Language. If you look at Chomsky's technical publications - to pick one at random, his On WH-movement - they are replete with crucial examples, systematically contrasted and elucidated. Literally hundreds of them, in a 50-something page paper.

      That said, the quality of a research paper isn't necessarily related to the amount of data it adduces, as demonstrated by countless, pointless Big Data publications. A mass of unanalysed data is scientifically worthless.

      Your claim that scientific truth is traditionally adjudicated on the basis of individual data is simply false. When Chomsky says that Mendel and Galileo had the right idea because they abstracted away from the noise of the unanalysed mass of data, he is accurately describing the methodology of some of history's greatest scientists. In contrast, if what you are claiming were true, the Standard Model would have been abandoned in around 1980, when it started to become clear that 90-99% of the universe's predicted mass is unaccounted for, undetectable. There is hardly a more stark conflict between theory and data in any of the sciences, yet theory prevails.

      Finally, I don't think linguists need to be taking any methodological cues from neuroscience, which is a bastion of inflated claims and false promises. Very recent work suggests that the field's approach to data itself is misguided and, yes, unscientific.

    8. @doonyakka: I also like how linguistics is said to rely on "arcane" data – and this is supposed to contrast with the "sciences as traditionally understood." Because, you know, everyone can recreate the conditions in the Large Hadron Collider in their own backyard. Nothing "arcane" about that.

      This also misses (by a mile) the fact that the more arcane the linguistic data, the cleaner the experiment. That's because, if there are still robust judgments on this arcane data, there is little to no chance that those judgments come from rote learning or prior exposure. Hence those judgments are much more likely to be clean probes into the nature of linguistic knowledge. Insert your own analogies about creating artificial (near-)vacuum conditions for physics experiments, etc. etc.

      I suspect the previous commenter is equating "sciences as traditionally understood" with large quantities of data, statistical-significance metrics, and p-values – a false equivalence if I've ever seen one.

    9. Another dimension of this is that unlike neuroscientists, many linguists are completely immersed in data (when they are working on their native language), or can very easily get access to considerable amounts of it (by opening a book or looking at a webpage in a language other than their native one that they are working on); it's not until you're doing fieldwork on remote or marginalized languages or varieties without literatures that getting data becomes a problem. So the arcane examples that figure in the literature are like peaks in a landscape that linguists know pretty well.

      The lower areas of this landscape have for centuries (actually millennia) been the subject of traditional descriptive work as found in pedagogical grammars and philological handbooks, which many linguists spend significant parts of their adolescence studying (Icelandic, Egyptian Hieroglyphs, Hebrew, etc etc).

      That's the history, but now, the advent of computing technology, large corpora etc makes something like Alex's vision above much more amenable to realization than it used to be, so I expect that this will start happening to an increasing degree, and indeed already goes on to some extent in LFG & HPSG.

    10. @StevenP: I think your reading of generative linguistics and your characterisation of the standard view in science more generally are both way off. Open up any issue of NLLT, Syntax, LI, etc and you'll find systematically collected data where the data collection methodology is appropriate to the data (usually judgment tasks, sometimes formal experimental work), and use of these data for the development of theories. But the theories, in generative linguistics as in science more generally, are not theories of the data. They are theories of the principles and laws of some aspect of the natural world (galaxies, ecosystems, vision, language,...). The data, together with an interpretation emerging from an analysis of that data (whether statistical or not), provide evidence for one theory or other of the relevant aspect of the world. That Chomsky quote you used simply states that providing summaries of data isn't engaging in the whole of the scientific enterprise and makes the further claim that that is true of `behavioural science'. That latter claim is probably too strong, depending on what one characterizes as `behavioural science' - there's clearly good theoretical work in various aspects of psychology - but the point is fairly straightforward.

      On recursion, I'd recommend Lobina's 2014 `what linguists are talking about when talking about ...'. I don't dispute that the field as a whole has been confused about this concept until recently (indeed, I think that one good effect of Everett's work is to force us to bring some clarity to this issue), but I think Chomsky himself has been pretty consistent (I spent a long time reading all of this stuff while preparing that response to Vyvyan Evans a while back and, on this matter, Chomsky has been remarkably consistent, with hardly any terminological slips over half a century of writing on the topic).

    11. @Tim Hunter: Well, when my colleagues in psychology give people (say) passages to remember, they have to be remarkably specific about how they selected those passages, etc., and they typically have to argue that the results they've obtained are not specific to those particular passages. Now, something similar does happen in corpus linguistics and the like, but it's generally pretty clear that judgments of competence/acceptability are not as universal as they seem. (I do agree that acceptability judgments are what matter.)

      @doonyakka: I'm not sure what you mean by "individual" data--I didn't use the word. If you mean "one single experiment" then you're right, but that's not what anyone means by "adjudicated by data." If your data consistently disagree with theory, then the theory needs to be modified, constrained, or thrown out entirely. Period.

      As for "abstract[ing] away from the noise of the unanalysed mass of data," it depends on what you mean. Data are always "cleaned" using a variety of procedures--outlier trimming, standardization, manipulation checks, etc.--but these are done *without the goal of obtaining a particular result.* If you are throwing out data *because* they don't agree with a theory, then it's not clear to me why you bothered collecting the data in the first place; it's something like cargo-cult science. (As for your comments about the Standard Model, it was pretty clear by at least the late 90s that the model needed to be revised; by the 2000s, the need was so well-accepted that it was showing up in Scientific American and Wikipedia.) I never said that "big data" was necessarily useful, though obviously it can be.

      @Omer: There's a lot more to neuroscience than p-values (which I acknowledge are at best highly dubious and at worst actively misleading and destructive).

      @davidadger Thanks; I'll read the Lobina paper.

    12. @StevenP: I'm sick and swamped with work atm, so I’ll try to keep this short :)

      1) I'm not sure what you mean by "individual" data

      I mean a datum, cf “individual cars”, “individual sheep”. So, I’m saying that a single datum (or the result of an experiment, if you will) is not enough to prove or doom a scientific theory, at least if standard scientific norms and practice are anything to go by.

      2) If your data consistently disagree with theory, then the theory needs to be modified, constrained, or thrown out entirely. Period.

      Sure, but are you suggesting that there is anything in linguistics that comes close to the discrepancy between theory and data that has existed in physics since the early 80s? (Btw, I’m taking Rubin 1980 as a starting point, though the data were already recognised as problematic in the 60s). Physicists continue to assume the Standard Model is largely correct, and postulate a kludge (“dark matter”) for which the “evidence” is entirely theory-internal, akin to the post-Newtonian postulation of the aether. They may recognise that the Standard Model needs to be revised but, if they haven’t "thrown out the theory entirely" after 50+ years of failing to find a solution, then I can’t help but feel that you are holding linguistics to a higher standard than the hard sciences.

      Which data can you point to that so decisively and “consistently disagree with" anything in GG?

      3) Data are always "cleaned" using a variety of procedures--outlier trimming, standardization, manipulation checks, etc.--but these are done without the goal of obtaining a particular result.

      Again, an example of this happening in linguistics would help. In physics, vacuums and frictionless planes go far beyond “outlier trimming, standardization, manipulation checks” and, indeed, far beyond anything that I can think of in linguistics, since the “arcane” examples you disparage have actually been uttered in the real world.

      4) Can we agree that your characterisation of Chomsky as always having had "a tough time dealing with data" was inaccurate?

      OK, so much for keeping it short...

  3. All of these criticisms are rather odd - the first one being fallacious, at best - but the last one is particularly baffling, given that Chomsky's insights, from the very beginning, have been motivated by a wealth of interesting facts about data that no one had even noticed, let alone tried to explain, before. I don't think it's too much of an exaggeration to say that, were it not for those data (coupled with Chomsky's original and incisive reflection on them), linguistics would be a very different beast than it is today. This may be for better or worse, depending on your theoretical/sociopolitical biases and commitments, but it's not a result of any supposed aversion to data from Chomsky, because there's no such thing.

  4. I didn't suggest that Chomsky or anyone else defines recursion as Merge, but only that Merge is recursive. Recursion simply describes a function that effectively enumerates a set. Such functions that allow for embedding are an interesting subset, but also produce non-embedding structures. I couldn't follow the rest.