Wednesday, September 28, 2016

Vox cognoscenti

The thoroughly modern, well-informed member of the professional classes reads Vox (I do). Not surprisingly, then, Vox has reviewed Wolfe’s book to provide discussion fodder for those who frequently eat out in groups. Unlike the NYT or The Chronicle, Vox found a linguist, John McWhorter (at Columbia), to do the explaining. The review (henceforth JMR) (here) makes three points: (i) that Wolfe really has no idea what he is talking about, (ii) that GGers are arrogant and dismissive and their work (overly) baroque, and (iii) that Everett and other critics might really have a point, though (and here JM is being fair and balanced) so do Chomsky and Everett’s critics.

The biggest problem with JMR lies with its attempts to split the difference between Chomsky and his “critics.” As I’ve noted before (and the reason I put critics in scare quotes), Everett (and Vyvyan Evans who also gets a mention) have no idea what Chomsky has been claiming and so their “critiques” have little to do with what he has actually said. They thus cannot be criticisms of Chomsky’s work and are thus of little value in assessing Chomsky’s claims.

Furthermore, it is clear that these critiques are of interest to the general public and are covered by the high-brow media precisely to the extent that they show that Chomsky is wrong about the nature of language, as he understands it. Indeed, this is why part of every title to every piece covering these critiques declares that Chomsky is wrong about Universal Grammar (UG). Nobody outside a small number of linguists really cares about whether Piraha embeds sentences! All the fireworks in the high-brow press are due to what Everett’s findings mean for the Chomsky program, which is precisely nothing, for it rests on a simple misunderstanding of Chomsky’s claims (see here and here for recent discussion).

The sociological significance of the expansive coverage of these “critiques” given their shoddiness is another matter. It says a lot about how much our thought leaders want to discredit Chomsky’s non-linguistic views. I would go further: I doubt that our thought leaders care much about the fine structure of the Faculty of Language. But they hope that discrediting Chomsky’s scientific/linguistic project might also serve to discredit his non-linguistic ones. However, given FoL’s remit, I won’t here develop this (pretty obvious) line of speculation. Instead I will lightly review some JMR highlights and make the (by now, I hope) obvious points. 

A. JMR describes the position that Wolfe is attacking (via Everett’s work) as follows:

Wolfe’s topic is Noam Chomsky’s proposal that all humans are born with a sentence structure blueprint programmed in their brains, invariant across the species, and that each language is but a variation upon this "universal grammar" generated by an as-yet unidentified "language organ." In other words, we are born already knowing language. (2-3)

This is a very misleading way of putting Chomsky’s claims about UG. A better analogy is that there is a biologically given recipe for constructing Gs on the basis of PLD. These Gs can (and do) differ significantly. The “blueprint” analogy suggests that all Gs have the same structures, with a tweak here or there (JMR: “few “switches” that flip in the toddler’s brain”). And this suggests that finding a language with a very different G “blueprint” (Piraha say, which JMR (reporting on Everett) writes does not allow for “the ability to nest ideas inside one another” (with an example of multiple sentential embedding as illustration)) would constitute a problem for Chomsky’s FL conception as it would fail to have a key feature of UG (“the absence proves that no universal grammar could exist”). But, as you all know, this is incorrect. It is consistent with Chomsky’s views that the “blueprints” differ. What matters is that the capacity to draw them (i.e. acquire Gs) remains the same. Put more directly, JMR suggests a Greenbergian understanding of Chomsky Universals. And that is a big no-no![1]

Now, to be honest, I sometimes had trouble distinguishing what JMR is reporting from what JMR is endorsing. However, as the whole point of the non-debate is that what Everett criticizes is at right angles to what Chomsky claims, leaving this fuzzy severely misreports what is going on. Especially when JMR reports that Chomsky “didn’t like this,” thereby suggesting that it was the content of Everett’s claim that Chomsky objected to, rather than the logic behind it. Chomsky’s primary objection was, and still is, that even if Everett is right about Piraha, it has nothing to do with GG claims about whether recursion is built into the structure of FL/UG.

This is the important point about Everett’s research, and it must be highlighted in any informative review. Once this point is firmly and clearly made one can raise secondary issues (very secondary IMO): whether Everett’s specific claims about Piraha are empirically accurate (IMO, likely not). However, this is decidedly a secondary concern if one’s interest is the relevance of Everett’s claims to Chomsky’s claims concerning the structure of FL/UG. JMR fails to make this simple logical point. Hence, whatever its other virtues, it serves to obscure the relevant issues and so to misinform.

B. JMR writes that the “meat of the debate” revolves around “Chomskyans belief that adaptations have arisen in the brain that serve exclusively to allow speech.” This is contrasted with views that believe that “speech merely piggybacks on equipment that already evolved to allow advanced thought.”

There are some small bones to pick with this description. Thus, the issue is not speech but linguistic knowledge more generally. But let’s put this aside. Is there really a disagreement between Chomsky and his critics about how linguistically specific FoL is? I doubt it. Or, more accurately, if there is such a disagreement Chomsky’s critics have had nothing whatsoever to say about it. Why?

The question is an interesting one and, as you all know, it is the central question animating the Minimalist Program (MP). MP takes as a research question whether FoL is entirely reducible to operations and primitives of general cognition and computation or whether there is a linguistically specific residue despite FoL’s computational operations largely overlapping with general principles of computation and cognition. 

Before addressing how one would go about resolving this debate, let me observe (again) how modest the Chomskyan claim is. It does not say that every part of FoL is linguistically specific. It does not deny that language interacts with other areas of cognition, emotion or culture. It does not assert that every detail of linguistic competence or behavior is insulated from everything else we know and do. Nope. It makes the very modest claim that there is something special about language, something that humans have qua being human and that this is interesting and investigatable.

Of course, over the years linguists have made specific proposals concerning what this something special might be and have identified properties of FoL that don’t look to be easily reducible to other cognitive, computational, emotional or cultural factors. But this is what you would expect if you took the question seriously. And you would expect those that took the opposite view (i.e. there is nothing linguistically special about human linguistic facility) to show how to reduce these apparent linguistically sui generis facts to more general facts about cognition, computation or whatever. But you would be wrong. The critics almost never do this. Which suggests that there really is no serious debate here. Debate would require both sides to address the question. So far as I can tell, critics interpret Chomsky as claiming that culture, general cognition etc. have no impact on any part of language knowledge or use. They then go on to point to cases where this appears to be false. But as Chomsky never denied this, as his claim is far more modest, these observations, like those of Everett’s concerning recursion, are beside the point. To have a debate, there must be some proposition being debated. So far as I can tell, once again there is no such proposition in this particular case. Hence no debate.

JMR notes that dealing with the substantive question of the linguistic specificity of FoL requires getting empirically and theoretically quite technical.[2]

…without a drive-by of this rather occult framework, one can’t begin to understand the contours, tone, and current state of the debate Wolfe covers. (8)

Of course, JMR is correct. How could it be otherwise? After all, if one is arguing that the computations are linguistically sui generis then one needs to specify what these are. And, not surprisingly, these investigations can get quite technical. And JMR understands this. However, it also seems to find this offensive. Note the “occult.” Later on JMR says:

…from one academic generation to the next, this method [standard GG analyses:NH] of parsing language has mission-crept into a strangely complicated business, increasingly unrelated to what either laypeople or intellectuals outside of linguistics would think of as human language. It is truly one of the oddest schools of thought I am familiar with in any discipline; it intrigues me from afar, like giant squid and 12-tone classical music. (10)

Hmm. JMR clearly suggests that things have gotten too technical. A little is ok, but GGers have gone overboard. JMR seems to believe that dealing with the question about FoL’s specific fine structure should be answerable without getting too complex, without leaving the lay person behind, without technical intricacies. Imagine the reaction to a similar kind of remark if applied to any other scientific domain of inquiry. Reminds me of the Emperor’s quip to Mozart in Amadeus: Sorry Mr Mozart, too many notes!

JMR concedes that all of this extra complexity would be fine if only there was evidence for it.

The question is whether there is independent evidence that justifies assuming that speech entails these peculiar mechanisms for which there is no indication in, well, how people talk and think.

And the problem is that this independent evidence does not seem to exist; anyway, outsiders would find it peculiar how very little interest practitioners have in demonstrating such evidence. Rather, they stipulate that syntax should be this way if it is to be "interesting," if it is to be, as the literature has termed it, "robust" or "rich." Yet where does the idea that how we construct sentences must be "robust" or "rich" in the way this school approves of come from? It’s an assumption, not a finding. (13)

This is calumny. If there is one thing that linguists love to do, it is to find empirical consequences of some piece of formal machinery. But, with this summary judgment, JMR joins the Everett/Evans camp and simply asserts that it is too much. There really are too many notes: “Split IP, Merge, phases and something called “little v”” (14). That these proposals come backed by endless empirical justification is hardly mentioned, let alone discussed. Look, I love hatchet jobs, but as JMR notes about Wolfe, even a drive-by heading to this conclusion requires more than assertion.

I suspect that JMR includes this to be able to play both sides of the fence: sure Wolfe knows nothing, but really he is somewhat right. No. He isn’t. Nor is JMR’s suggestion that there is something to Wolfe’s suspicions justified or, IMO, justifiable.

C. Then there are the linguists and their “bile” against anyone “questioning universal grammar” (16). More specifically, against Everett.

On a personal note, I did not take any interest in Everett’s findings until I read the New Yorker piece, and then only because of how badly it misrepresented matters. Nor do I believe that anyone else would have noticed it much, but for the public brouhaha. Even then, had the high-brow press not used Everett’s work to denigrate my own, I would have given it a free pass. But this is not what occurred. The claim was repeatedly made that Everett’s work demonstrates that GG is wrong. Efforts to show that this is incorrect have not been greeted nearly as enthusiastically. JMR mischaracterizes the state of play. And in doing so, once again, obscures the issues at hand.

The “GGers (Chomskyans) are vitriolic” trope has become a staple of the “GG/Chomskyan linguistics is dead” meme. Why? There is one obvious reason. It allows Chomsky’s critics to shift debate from the intellectual issues and refocus it on the personal ones (i.e. to move from content to gossip). To argue against the claims GG has made requires understanding them. It also takes a lot of work because there is a lot of this kind of work out there. Saying that GGers are meanies allows one to stake the high ground without doing any of the intellectual or empirical hard work. This is not unlike political coverage one finds in the press: lots of personality gossip, very little policy analysis.

Why is JMR so sensitive? Apparently some students at some conference found what they were hearing uninteresting (18). Though, JMR notes that “most Chomskyans” are not as dismissive (19). Yet he mentions that there does exist “a current of opinion within the Chomskyan syntax orbit that considers most kinds of linguistic inquiry as beside the point” (19). So, some students are bored by anything outside their immediate interests and some linguists are dismissive. And this means what exactly? It means that GGers are vitriolic and dismissive (though most aren’t), or they could be because such dismissiveness is in the air. This is really dumb stuff, not even People Magazine level titter.

D. Towards the end, JMR notes that Everett has not really made his case concerning Piraha (25-6). Indeed, he finds the Nevins, Pesetsky and Rodrigues rebuttal “largely convincing” and believes that it is “quite plausible that Piraha is not as quirky a language as Everett proposed” (26). Here my views and JMR’s coincide. However, to repeat, JMR’s reasonable discussion mainly serves to obscure the main issue. Let me end with this (again).

JMR, like many of the other discussions in the high-brow press, frames the relevant debate in terms of whether Everett is right about Piraha embedding. This presupposes that what Everett found out (or not) about Piraha is relevant to Chomsky’s claims. Because the coverage is framed as an empirical debate, when GGers dismiss Everett’s claims they can be described as having acted inappropriately. They have failed to dispassionately consider the empirical evidence, evidence which the coverage regularly reports would undermine central tenets of Chomsky’s theory of FL/UG if accurate. But this framing is wrong. There is no empirical debate because the presupposition is incorrect. What has gotten (some) GGers hot under the collar is this mis-framing. It is one thing to be shown to be wrong. It is quite another to have people debunk views that you have never held and then accuse you of being snippy because you refuse to hold those views. I don’t dismiss Everett’s views because I fear they might be right. I dismiss these views because they are logically irrelevant to the claims I am interested in (specifically whether (some kind of) recursion is a linguistically specific feature of FL) and because every time this is pointed out the critic’s feelings get hurt. That really does boil the blood.

To end: As reviews go, JMR is not the worst. But it is bad enough. And part of what makes it bad is its apparent judiciousness. It follows the standard tropes and frames the issues in the now familiar ways, albeit with a nod here and there to Chomsky and GG. But, as noted, that is the problem. It really seems to be hard for many to accept that so much contention in the press can be based on a pun (GUs vs CUs) and that the whole “debate” is intellectually vapid. But that’s the way it is. Let’s hope this is the last round for the time being.

[1] If memory serves, Stephen Jay Gould discussed a similar problem in biology with the notion of a blueprint. He noted that for a long time genetic inheritance was conceived in blueprint terms. This, Gould argued, led to homuncular theories of genetic information transmission (we each had a smaller version of ourselves deep down that contained the relevant genetic pattern). This made sense, he argued, if we thought in terms of blueprints. Once we shifted to thinking in terms of codes, the homunculus theory disappeared. I have no idea where Gould made this point, but it has interesting parallels in current conceptions of UG as blueprints.
[2] I’ve discussed this issue before and noted how one might go about trying to adjudicate it rationally (see here).

Friday, September 23, 2016

Chomsky was wrong

An article rolled across my inbox a few weeks ago entitled Chomsky was Wrong. Here's a sequence of six short bits on this article.

Yes, Chomsky said this

When I read the first sentence that new research disproved Chomsky's claim that English is easy - that it turns out English is a hard language - I thought it was a parody. To start with, Chomsky said no such thing. But, indeed, it's about a real paper - even if that paper is about English orthography. In SPE (Sound Pattern of English), Chomsky and Halle claimed that English orthography was "near-optimal": that it reflected pretty closely the lexical representations of English words, except where the pronunciation was not predictable.

That's not what you expect to be reading about when you read about Chomsky. For one thing, there's a reason that there's a rumour Chomsky didn't even write any of SPE. The rumour is pretty clearly false, but Chomsky never worked in phonology again, and he certainly didn't write anything close to all of SPE. For another thing, after all the attacks on "Chomsky's Universal Grammar," it's jarring to read a rebuttal of a specific claim.

But here it is. Let's give both credit and blame where credit is due. Chomsky's name's on the book, so he's responsible for what's in it. If it's wrong, then fine. Chomsky was wrong.

The paper in question is sound

The paper in question is by Garrett Nicolai and Greg Kondrak of the University of Alberta, and it's from the 2015 NAACL (North American Chapter of the Association for Computational Linguistics) conference, linked here.

Nicolai and Kondrak have a simple argument. Any spelling system that's isomorphic to the lexical representations of words should also be "morphologically consistent." That is, the spelling of any given morpheme should be the same across different words. Of course: because multi-morphemic words, at least according to SPE, are built by combining the lexical representations of their component morphemes. Regardless of what those lexical representations are, any spelling that reflects them perfectly will have perfect morphological consistency. English spelling, it turns out, doesn't have this property.
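To make the notion concrete, here is a toy sketch of a consistency score. It is not the metric from the paper, whose exact definition I don't reproduce here; it just assumes that every word comes pre-segmented into (morpheme, spelling) pairs and counts how often a morpheme shows up in its most common spelling. The function name, the morpheme labels and the segmentation of deception as decept + ion are all made up for illustration.

```python
from collections import Counter, defaultdict


def morphological_consistency(lexicon):
    """Fraction of morpheme tokens spelled with their morpheme's most common spelling.

    `lexicon` is a list of words; each word is a list of
    (morpheme_label, spelling_in_this_word) pairs. This is a toy
    measure for illustration, not Nicolai and Kondrak's metric.
    """
    spellings = defaultdict(Counter)
    for word in lexicon:
        for morpheme, spelling in word:
            spellings[morpheme][spelling] += 1

    total = sum(sum(c.values()) for c in spellings.values())
    consistent = sum(max(c.values()) for c in spellings.values())
    return consistent / total


# Hypothetical segmentations, chosen only to illustrate the idea.
toy_lexicon = [
    [("DECEIVE", "deceive")],
    [("DECEIVE", "decept"), ("ION", "ion")],  # deception
    [("ACT", "act")],
    [("ACT", "act"), ("ION", "ion")],         # action
]
toy_score = morphological_consistency(toy_lexicon)

# A lexicon in which no morpheme ever changes its spelling.
uniform_lexicon = [
    [("ACT", "act")],
    [("ACT", "act"), ("ION", "ion")],
]
uniform_score = morphological_consistency(uniform_lexicon)
```

In the toy lexicon five of the six morpheme tokens appear in their majority spelling, so the score falls short of 1, while the second lexicon scores exactly 1. Any spelling that mirrors the lexical representations perfectly would behave like the second case.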

As Chomsky and Halle observed, though, there's a reason that this perfect transparency might not hold for a real spelling system: the spelling system might also want to tell you something about the way the word is actually pronounced. In words like deception, pronounced with [p] but morphologically related to deceive, which is pronounced with [v], you can have a morphologically consistent spelling in which the dece- morpheme has a p in deception, or in which it has a v in deceive (or neither), but you can't have both and still be morphologically consistent. And yet, morphological consistency can make the pronunciation pretty opaque. And so Nicolai and Kondrak have a way to evaluate whether what's driving English spelling's lack of morphological consistency is perhaps a dose of reader-saving surface-pronunciation transparency. It's not.

The paper is nice because it gets around the obvious difficulty in responding to arguments from linguists, which is that they are usually tightly bound to one specific analysis. Here the authors have found a way to legitimately skip this step (reflecting the underlying forms - thus at least being morphologically consistent - except when really necessary to recover the surface pronunciation). It's a nice approach, too. The argument rests on the constructibility of a pseudo-orthography for English which maximizes morphological consistency except when it obscures the pronunciation - exactly what you would expect from a "near-optimal" spelling system - a system that turns out to have much higher levels of both morphological consistency and surface-pronunciation transparency than traditional English orthography. I review some of the details of the paper - which isn't my main quarry - at the bottom for the interested.

Hanlon's razor

Sometimes you see scientific work obviously distorted in the press and you say, I wonder how that happened - I wonder what happened between the interview and the final article that got this piece so off base. No need to wonder here.

A piece about this research (a piece which was actually fine and informative) appeared on the University of Alberta news site (Google cached version) about a year after the paper was published. Presumably, the university PR department came knocking on doors looking for interesting research. The university news piece was then noticed by CBC Edmonton, who did an interview with Greg Kondrak on the morning show and wrote up a digested version online. The author of this digested version evidently decided to spice it up with some flippant jokes and put "Chomsky was wrong" in the headline with a big picture of Chomsky, because people have heard of Chomsky.

The journalist didn't know too much about the topic, clearly. In an earlier version Noam Chomsky was "Norm Chomsky," Morris Halle was "Morris Hale," and the photo caption under Chomsky was missing the important context - about how the original claim was, in the end, an insignificant one - and so one could read, below Chomsky's face, the highschool-newspaper-worthy "This is the face of a man who was wrong." And, predictably, "English spelling" is conflated with "English", leading to the absurd claim that "English is 40 times harder than Spanish."

The awkward qualification that now appears in the figure caption ("This is the face of a man who - in a small segment from a book published in 1968 - was wrong") bears the mark of some angry linguists with pitchforks complaining to the CBC. Personally, I don't know about the pitchforks. Once I realized that the paper was legit, I wasn't able to muster raising a hackle about the CBC article. It doesn't appear to be grinding an axe, just a bad piece of writing. If it weren't for the fact that, in many other quarters, the walls echo with popular press misinformation about generative linguistics which is both damaging and wrong, this article wouldn't even be a remote cause for concern.

It is possible to talk about Chomsky being wrong without trying to sell me the Brooklyn Bridge

The Nicolai and Kondrak paper, and the comments they gave to the university news site, show that you can write something that refutes Chomsky clearly, and in an accurate, informed, and mature way. The content of the paper demonstrates that they know what they're doing, and have thought carefully about what Chomsky and Halle were actually saying. In the discussion, nothing is exaggerated, and no one is claiming to be the winner or exaggerating their position as the great unlocker of things.

Contrast this with Ibbotson and Tomasello's Scientific American article. Discussed by Jeff in a three-part series recently on the blog, it purports to disprove Chomsky. It should be possible to write a popular piece summarizing your research program without fleecing the reader, but they don't. When the topic is whether Chomsky is right or wrong, fleece abounds.

Let me just take three egregious instances from the Scientific American article that fail simple fact checking:
  • "Recently ... cognitive scientists and linguists have abandoned Chomsky’s 'universal grammar' theory in droves"
    • (i) abandoned - as in, previously believed it but now don't - (ii) in droves - as in, there are many who abandoned all together - and (iii) recently. I agree that there are many people who reject Chomsky, and (i) is certainly attested over the last 60 years, but (ii) and (iii), or any conjunction of them, are totally unfounded. It feels like disdain for the idea that one should even have to be saying factually correct things - a Trump-level falsehood.
  • "The new version of the theory, called principles and parameters, replaced a single universal grammar for all the world’s languages with ..."
    • There was never any claim to a single grammar for all languages.
  • "The main response of universal grammarians to such findings [about children getting inversion right in questions with some wh-words but not others] is that children have the competence with grammar but that other factors can impede their performance and thus both hide the true nature of their grammar"
    • I know it's commonplace in describing science wars to make assertions about what your opponent said that are pulled out of nowhere, but that doesn't make it right. I so doubt that the record, if there is one, would show this to be the "main response" to the claim they're referring to, that I'm willing to call this out as just false. Because this response sounds like a possible response to some other claim. It just doesn't fit here. This statement sounds made up.

This article was definitely written by scientists. It contains some perfectly accurate heady thoughts about desirable properties of a scientific theory, a difficult little intellectual maze on the significance of the sentence Him a presidential candidate!?, and it takes its examples not out of noodling a-priori reasoning but actually out of concrete research papers. In principle, scientists are careful and stick to saying things that are justified. And yet, when trying to make the sale, the scientist feels no compunction about just making up convenient facts.

Chomsky and Halle's claim sucks

The statement was overblown to begin with. C&H are really asking for this to be torn down. Morphological-consistency-except-where-predictable is violated in the very examples C&H use to demonstrate the supposed near optimality, such as divine (divinE), related in the same breath to divinity (divin- NO e -ity), which is laxed under the predictable trisyllabic laxing rule.

But the claim can, I think, further, be said to "suck" in a deeper way in the sense that it

  1. is stated, in not all but many instances as it's raised throughout the book, as if the lexical forms given in SPE were known to be correct, not as if they were a hypothesis being put forward
  2. is backed up by spurious and easily defeasible claims, convenient for C&H if they were true - but not true.

Some examples of (2) are the claim on page 49 that "the fundamental principle of orthography is that phonetic variation is not indicated where it is predictable by general rule" - says who? - and the whopper in the footnote on page 184 (which also contains examples of (1)):

Notice, incidentally, how well the problem of representing the sound pattern of English is solved in this case by conventional orthography [NB: by putting silent e at the end of words where the last syllable is, according to the SPE analysis, [+tense], but leaving it off when it's [-tense], while, consistent with the SPE proposal, leaving the vowel symbol the same, in spite of the radical surface differences induced by the vowel shift that applies to [+tense] vowels]. Corresponding to our device of capitalization of a graphic symbol [to mark +tense vowels], conventional orthography places the symbol e after the single consonant following this symbol ([e] being the only vowel which does not appear in final position phonetically ...). In this case, as in other cases, English orthography turns out to be [!] rather close to an optimal system for spelling English. In other words, it turns out to be rather close to the true [!] phonological representation, given the nonlinguistic constraints that must be met by a spelling system, namely, that it utilize a unidimensional linear representation instead of the linguistically appropriate feature representation and that it limit itself essentially to the letters of the Latin alphabet.

Take a minute to think about the last statement. There are many writing systems that use two dimensions, including any writing system that uses the Latin alphabet with diacritics. In most cases, diacritics are used to signify something phonetically similar to a given sound, and, often, the same diacritic is used consistently to mark the same property across multiple segments, much like a phonological feature. Outside the realm of diacritics, Korean writing uses its additional dimension to mark several pretty uncontroversial phonological features. As far as being limited to letters of the Latin alphabet goes - let's assume this means for English, and not really for "spelling systems" - just as with diacritics, new letters have been invented as variants of old ones, throughout history. My guess is that this has happened fairly often. And, after all - if you really felt you had to insert an existing letter to featurally modify an old one, presumably, you would stick the extra letter next to the one you were modifying, not as a silent letter at the end of the syllable.

As for making it sound like the theory was proven fact, maybe it's not so surprising. Chomsky, in my reading, seems to hold across time pretty consistently to the implicit rhetorical/epistemological line that it's rarely worth introducing the qualification "if assumption X is correct." Presumably, all perceived "fact" is ultimately provisional anyway - so who could possibly be so foolish as to take any statement claiming to be fact at face value? I don't really know if Chomsky was even the one responsible for leaving out all the instances of "if we're correct, that is" throughout SPE. But it wouldn't surprise me. But Chomsky isn't alone in this - Halle indulges in the same, and, perhaps partly as a result, the simplified logic of 1960s generative phonology - supposing, on the basis of patterns observed in a dictionary, that one has all the evidence one needs about the generalizations of native speakers - is still standard, drawing far too little criticism in phonology. Hedging too much in scientific writing is an ugly disease, but there is such a thing as hedging too little.

In the current environment, where we're being bombarded with high profile bluster about how wrong generative linguistics is, it's worth taking a lesson from SPE in what not to do. Chomsky likes to argue and he's good at it. Which means you never really lose too much when you see him valuing rhetoric over accuracy. He's fun and interesting to read, and the logic of whatever he's saying is worth pursuing further, even if what he's saying is wrong. And you know he knows. But if an experimental paper rolled across my desk to review and it talked about its conclusions in the same way SPE does, only the blood, of the blood, sweat and tears that would go into the writing of my review, would be metaphorical.

Make no mistake - if it comes to a vote between a guy who's playing complicated intellectual games with me and a simple huckster, I won't vote for the huckster. But I won't be very happy. Every cognitive scientist, I was once cautioned, is one part scientist and one part snake oil salesman.

Nicolai and Kondrak is a good paper, and, notably, despite being a pretty good refutation of a claim of Chomsky's, it's a perfectly normal paper, in which the Chomsky and Halle claim is treated as a normal claim - no need for bluster. And the CBC piece about it is a lesson. If you really desire your scientific contribution to be coloured by falsehood and overstatement, you're perfectly safe. You have no need to worry, and there's no need to do it yourself. All you have to do is send it to a journalist.

Some more details of this paper

Here is a graph from Nicolai and Kondrak's paper of the aforementioned measures of morphological consistency ("morphological optimality") and surface-pronunciation transparency ("orthographic perplexity") - closer to the origin is better on both axes:

The blue x sitting on the y axis, which has 1 for orthographic perplexity (but a relatively paltry 93.9 for morphemic optimality), is simply the IPA transcription of the pronunciation. The blue + sitting on the x axis, which has 100 for morphemic optimality (but a poor 2.51 for orthographic perplexity), is what you would obtain if you simply picked one spelling for each morpheme, and concatenated them as the spelling of morphologically complex words. (The measure of surface-pronunciation transparency is obviously sensitive to how you decide to spell each morpheme, but for the moment that's unimportant.)

Importantly, the blue diamond is standard English orthography ("traditional orthography", or T.O.), sitting at a sub-optimal 96.1 morphemic optimality and 2.32 orthographic perplexity (for comparison, SR and SS, two proposed spelling reforms, are given). On the other hand, Alg, the orange square, is a constructed pseudo-orthography that keeps one spelling for each morpheme except where the pronunciation isn't predictable, in which case as few surface details as possible are inserted, which leads to a much better orthographic perplexity of 1.33, while maintaining a morphemic optimality of 98.9. This shows that there's no obvious excuse for the lack of morphological consistency.

What keeps English orthography from being optimal? If you apply some well-known English spelling rules it's easy to see. If in your calculation of morphological consistency you ignore the removal of the final silent e in voice etc. (which disappears in voicing), the spelling of panic (and other words that can be followed by -ing to violate the consistency of the pronunciation of -ci- as [s]) as panick instead, and the replacement of the y in industry and other similar words with i (in industrial), along with a few other obvious changes, then English orthography pops up to 98.9 percent morphemic optimality, the same level as Alg.
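
To make the idea of morphological consistency concrete, here is a toy sketch in Python. It is not Nicolai and Kondrak's actual measure (for that, see the paper). It simply assumes that each word comes pre-segmented into (morpheme, spelling) pairs and scores the share of morpheme tokens that are spelled like that morpheme's most common spelling. The function name, the morpheme labels and the four-word lexicon are all made up for illustration.

```python
from collections import Counter, defaultdict


def morphemic_consistency(words):
    """Percentage of morpheme tokens spelled like the morpheme's most common spelling.

    `words` is a list of words, each a list of (morpheme_id, spelling) pairs.
    This is a simplified stand-in, not the measure defined in the paper.
    """
    spellings = defaultdict(Counter)
    for word in words:
        for morpheme, spelling in word:
            spellings[morpheme][spelling] += 1
    total = sum(sum(counts.values()) for counts in spellings.values())
    consistent = sum(counts.most_common(1)[0][1] for counts in spellings.values())
    return 100.0 * consistent / total


# Hypothetical mini-lexicon: "voice" loses its silent e in "voicing",
# and "panic" gains a k in "panicking".
traditional = [
    [("VOICE", "voice")],
    [("VOICE", "voic"), ("ING", "ing")],
    [("PANIC", "panic")],
    [("PANIC", "panick"), ("ING", "ing")],
]

# The same words with one fixed spelling per morpheme.
regularized = [
    [("VOICE", "voice")],
    [("VOICE", "voice"), ("ING", "ing")],
    [("PANIC", "panic")],
    [("PANIC", "panic"), ("ING", "ing")],
]

print(morphemic_consistency(traditional))
print(morphemic_consistency(regularized))
```

On this toy lexicon the regularized spellings score a perfect 100 while the traditional ones lose points for the silent e and the inserted k, which is the kind of gap the spelling rules above account for.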

Those spelling rules should attract the attention of anyone who's read SPE, as they're tangentially related to the vowel shift rule, the velar softening rule, and the final yod, but the fact is in all these cases the spelling, with respect to the SPE analysis of these words, reflects the lexical representation in the base form and the surface pronunciation in the derived form. Well, sure, which means that these alternations are presumably at least throwing the orthography a bone as far as its pronunciation transparency goes, but you can do way, way, way better. That's the point. English orthography may be better than it could be if it were maximally morphologically consistent, but it doesn't seem to be optimal.

For details of the measures you can have a look at the paper.

Wednesday, September 21, 2016

Two readables

The "science is dying" meme is big nowadays. Stories about the replicability crisis are now a staple of everyday journalism and everything from bad incentives to rampant scientific dishonesty is cited as a cause of the decline of science. I have been (and remain) quite unmoved by this for several reasons.

First, I have no idea if this is much worse than before. Did science before the modern era faithfully replicate its findings, and have things gotten worse since? Maybe a replication rate of 25% (the usual horror story number) is better than it used to be. How do we know what a good rate ought to be? Maybe replicating 25% of experiments is amazingly good. We need a base rate, and, so far, I have not seen one provided. And until I do see one, I cannot know whether we are in crisis mode or not. But I am wary, especially of decline-from-a-golden-age stories. I know we no longer live in an age of giants (nobody ever lives in a golden age of giants). The question is whether 50 years from now we will discover that we actually had lived in such a golden age. You know, when the dust has settled and we can see things more clearly.

Second, I think that part of the frustration with our current science comes from having treated anything with numbers and "experiments" as science. The idea seems to be that one can do idea free investigations. Experiments are good or not on their own regardless of the (proto)theory they are tacitly or explicitly based on. IMO, what makes the "real" sciences experimentally stable is not only their superior techniques, but the excellent theory that they bring to the investigative table. This body of knowledge serves as a prophylactic against misinterpretation. Remember, never trust a fact until it has been verified by a decent theory! And, yes, the converse also holds. But the converse is taken as definitional of science while the role theory plays in regulating experimental inquiry is, IMO, regularly under-appreciated.

So, I am skeptical. This said, there is one very big source of misinformation out there, especially in domains where knowledge translates into big money (and power). We see this in the global warming debates. We saw it on research into tobacco and cancer. Indeed, there are whole public relations outfits whose main activity is to spread doubt and misinformation dressed up as science. And recently we have been treated to a remarkable example of this. Here are two interesting pieces (here, here) on how the sugar industry shaped nutrition science quite explicitly and directly for their own benefit. These cases leave little to the imagination as regards science disrupting mechanisms. And they occurred a while ago, one might be tempted to say in the golden age.

As funding for research becomes more and more privatized this kind of baleful influence on inquiry is sure to increase. People want to get what they are paying for, and research that impinges on corporate income is going to be in the firing line. If what the articles say is correct, the agnotology industry is very very powerful.

A second interesting piece for those interested in the Sapir-Whorf hypothesis. I have been persuaded by people like Lila that there is no real basis for the hypothesis, i.e. that one's particular language has, at best, a mild influence on the way that one perceives the world. Economists however are unconvinced. Here is a recent piece arguing that the gender structure of a language's pronoun system has effects on how women succeed sociopolitically. Here is the conclusion:

First, linguistic differences can be used to uncover new evidence such as that concerning the formation and persistence of gender norms. Second, as the observed association between gender in language and gender inequality has been remarkably constant over the course of the 20th century, language can play a critical role as a cultural marker, teaching us about the origins and persistence of gender roles. Finally, the epidemiological approach also offers the possibility to disentangle the impact of language from the impact of country of origin factors. Our preliminary evidence suggests that while the lion’s share of gender norms can be attributed to other cultural and environmental influences, yet a direct role language should not be ignored.
Evaluating this is beyond my pay grade, but it is interesting and directly relevant to the Sapir-Whorf hypothesis. True? Dunno. But not uninteresting.

Monday, September 19, 2016

Brain mechanisms and minimalism

I just read a very interesting shortish paper by Dehaene and associates (Dehaene, Meyniel, Wacongne, Wang and Pallier (DMWWP)) that appeared in Neuron. I did not find an open source link, but you can use this one if you are university affiliated. I recommend it highly, not the least reason being that Neuron is a very fancy journal and GG gets very good press there. There is a rumor running around that Cog Neuro types have dismissed the findings of GG as of little interest or consequence to brain research. DMWWP puts paid to this and notes, quite rightly, that the problem lies less with GG than with the current state of brain science. This is a decidedly Gallistel inspired theme (i.e. the cog part of cog-neuro is in many domains (e.g. language) healthier and more compelling than the neuro part and it is time for the neuro types to pay attention and try to find mechanisms adequate for dealing with the well grounded cog stuff that has been discovered rather than think it must be false because the inadequate and primitive neuro models (i.e. neural net/connectionist) don’t have ways of dealing with it) and the more places it gets said the greater the likelihood that CN types will pay attention. So, this is a very good piece for the likes of us (or at least me).

The goal of the paper is to get Cog-Neuro Science (CNS) people to start taking the integration of behavioral, computational and neural as CNS’s central concern. Here is the abstract:

A sequence of images, sounds, or words can be stored at several levels of detail, from specific items and their timing to abstract structure. We propose a taxonomy of five distinct cerebral mechanisms for sequence coding: transitions and timing knowledge, chunking, ordinal knowledge, algebraic patterns, and nested tree structures. In each case, we review the available experimental paradigms and list the behavioral and neural signatures of the systems involved. Tree structures require a specific recursive neural code, as yet unidentified by electrophysiology, possibly unique to humans, and which may explain the singularity of human language and cognition.

I found the paper interesting in at least three ways.

First, it focuses on mechanisms, not phenomena. So, the paper identifies five kinds of basic operations that reasonably underlie a variety of mental phenomena and takes the aim of CNS to (i) find where in the brain these operations are executed, (ii) provide descriptions of circuits/computational operations that could execute such operations and (iii) investigate how these circuits might be/are neurally realized.

Second, it shows how phenomena can be and have been used to probe the structure of these mechanisms. This is very well done for the first three kinds of mechanisms: (i) approximate timing of one item relative to the preceding one, (ii) chunking items into larger units, and (iii) the ordinal ranking of items. Things get more speculative (in a good way, I might add) for the more “abstract” operations: the coding of “algebraic” patterns and nested generated structures.

Third, it gives you a good sense of the kinds of things that CNS types want from linguistics and why minimalism is such a good fit for these desires.

Let me say a word about each.

The review of the literature on coding time relations is a useful pedagogical case. DMWWP reviews the kind of evidence used to show that organisms “maintain internal representations of elapsed time” (3). It then looks for “a characteristic signature” of this representation and the “killer” data that supports the representational claim. It then reviews the various brain locations that respond to these signature properties and reviews the kind of circuit that could code this kind of representation, arguing that “predictive coding” (i.e. ones that “form an internal model of input sequences”) is the right one in that it alone accommodates the basic behavioral facts (4) (basically mismatch negativity effects without an overt mismatch). Next, it discusses a specific “spiking neuron model” of predictive coding (4) that “requires a neurophysiological mechanism of “time stamp” neurons that are tuned to specific temporal intervals,” which have, in fact, been found in various parts of the brain. So, in this case we get the full Monty: a task that implicates signature properties of the mechanism, that demands certain kinds of computational circuits, realized by specific neuronal models, realized in neurons of a particular kind, found in different parts of the brain. It is not quite the Barn Owl (see here), but it is very very good.

DMWWP do this more or less again for chunking, though in this case “the precise neural mechanisms of chunk formulation remain unknown” (6). And then again for ordinal representations. Here there are models for how this kind of information might be neurally coded in terms of “conjunctive cells jointly sensitive to ordinal information and stimulus identity” (8). These kinds of conjunctive neurons seem to be all over the place, with potential application, DMWWP suggests, as neuronal mechanisms for thematic saturation.

The last two kinds of mechanisms, those that would be required to represent algebraic patterns and hierarchical tree-like structures, are behaviorally very well-established but currently pose very serious challenges on the neuro side. DMWWP observes that humans, even very young ones, demonstrate amazing facility in tracking such patterns. Monkeys also appear able to exploit similar abstract structures, though DMWWP suggests that their algebraic representations are not quite like ours (9). DMWWP further correctly notes that these sorts of patterns and the neural mechanisms underlying them are of “great interest” as “language, music and mathematics” are replete with such. So, it is clear that humans can deploy algebraic patterns which “abstract away from the specific identity and timing of the sequence patterns and to grasp their underlying pattern,” and maybe other animals can too. However, to date there is “no accepted neural network mechanism to accomplish this” and it looks like “all current neural network models seem too limited to account for abstract rule-extraction abilities” (9). So, the problem for CNS is that it is absolutely clear that human (and maybe monkey) brains have algebraic competence though it is completely unclear how to model this in wetware. Now, that is the right way to put matters!

This last reiterates conclusions that Gallistel and Marcus have made in great detail elsewhere. Algebraic knowledge requires the capacity to distinguish variables from values of variables. This is easy to do in standard computer architectures but is not at all trivial in connectionist/neural net frameworks (as Gallistel has argued at length (e.g. see here)). Indeed, one of Gallistel’s main arguments with such neural architectures is their inability to distinguish variables from their values, and to store them separately and call them as needed. Neural nets don’t do this well (e.g. they cannot store a value and later retrieve it), and that is the problem because we do and we do it a lot and easily. DMWWP basically endorses this position.
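
To see how cheap the variable/value distinction is in a standard computer architecture, here is a small Python sketch. It abstracts away from the identity of the items in a sequence and keeps only the pattern of repetition, so that two sequences with entirely different items can share the same "algebraic" pattern. The function name and the nonsense syllables are my own illustrative choices, not anything taken from DMWWP.

```python
def abstract_pattern(sequence):
    """Replace each distinct item with a variable name (A, B, C, ...)
    in order of first appearance, keeping only the repetition pattern.

    Toy sketch: assumes at most 26 distinct items in a sequence.
    """
    variables = {}  # maps a concrete value to the variable that stands for it
    pattern = []
    for item in sequence:
        if item not in variables:
            variables[item] = chr(ord("A") + len(variables))
        pattern.append(variables[item])
    return "".join(pattern)


print(abstract_pattern(["ga", "ti", "ga"]))  # ABA
print(abstract_pattern(["wo", "fe", "wo"]))  # ABA
print(abstract_pattern(["ga", "ti", "ti"]))  # ABB
```

The dictionary does exactly what Gallistel says neural nets find hard: it stores a value, binds it to a variable, and retrieves the binding later when the same value comes around again.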

The last mechanism required is one sufficient to code the dependencies in a nested tree.[1] One of the nice things about DMWWP is that it recognizes that linguistics has demonstrated that the brain codes for these kinds of data structures. This is obvious to us, but the position is not common in the CNS community and the fact that DMWWP is making this case in Neuron is a big deal. As in the case of algebraic patterns, there are no good models of how these kinds of (unbounded) hierarchical dependencies might be neurally coded. The DMWWP conclusion? The CNS community should start working on the problem. To repeat, this is very different from the standard CNS reaction to these facts, which is to dismiss the linguistic data because there are no known mechanisms for dealing with it.

Before ending I want to make a couple of observations.

First, this kind of approach, looking for basic computational mechanisms that are implicated in a variety of behaviors, fits well with the aims of the minimalist program (MP). How so? Well, IMO, MP has two immediate theoretical goals: to show that the standard kinds of dependencies characteristic of linguistic competence are all different manifestations of the same underlying mechanism (e.g. are all instances of Merge). Were it possible to unify the various modules (binding, movement, control, selection, case, theta, etc) as different faces of the same Merge relation and were we able to find the neural “merge” circuit then we would have found the neural basis for linguistic competence. So if all grammatical relations are really just ones built out of merges, then CNSers of language could look for these and thereby discover the neural basis for syntax. In this sense, MP is the kind of theory that CNSers of language should hope is correct. Find one circuit and you’ve solved the basic problem. DMWWP clearly has bought into this hope.
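
For readers who like to see such things spelled out, here is a minimal Python sketch of Merge, assuming the common set-theoretic rendering in which Merge takes two syntactic objects and returns the unordered set containing them. The helper `depth` and the example sentence are mine and purely illustrative. The point is only that one operation, applied to its own output, yields nested structure of any depth.

```python
def merge(a, b):
    """Combine two syntactic objects into a new, unordered one."""
    return frozenset([a, b])


def depth(obj):
    """Number of Merge layers above the most deeply embedded lexical item."""
    if isinstance(obj, frozenset):
        return 1 + max(depth(part) for part in obj)
    return 0  # a lexical item (here just a string)


dp = merge("the", "book")
vp = merge("bought", dp)
clause = merge("John", vp)

print(depth(dp), depth(vp), depth(clause))    # 1 2 3
print(merge("the", "book") == merge("book", "the"))  # True: no linear order
```

Nothing in `merge` bounds the depth of what it can build, which is the unboundedness that DMWWP flags as the hard part for neural coding.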

Second, it suggests what GGers with cognitive ambitions should be looking for theoretically. We should be trying to extract basic operations from our grammatical analyses as these will be what CNSers will be interested in trying to find. In other words, the interesting result from a CNS perspective is not a specification of how a complicated set of interactions work, but isolating the core mechanisms that are doing the interacting. And this implies, I believe, trying to unify the various kinds of operations and modules and entities we find (e.g. in a theory like GB) to a very small number of core operations (in the best case just one). DMWWP’s program aims at this level of grain, as does MP and that is why they look like a good fit.

Third, as any MPer knows, FL is not just Merge. There are other operations. It is useful to consider how we might analyze linguistic phenomena that are Merge recalcitrant in these terms. Feature checking and algebraic structures seem made for each other. Maybe memory limitations could undergird something like phases (see DMWWP discussion of a Marcus suggestion on p. 11 that something like phases chunk large trees into “overlapping but incompletely bound subtrees”). At any rate, getting comfortable with the kinds of mental mechanisms extant in other parts of cognition and perception might help linguists focus on the central MP question: what basic operations are linguistically proprietary? One answer is: those operations required in addition to those that other animals have (e.g. time interval determination, ordinal sequencing, chunking, etc.).

This is a good paper, especially so because of where it appears (a leading brain journal) and because it treats linguistic work as obviously relevant to the CNS of language. The project is basically Marr’s, and unlike so much CNS work, it does not try to shoehorn cognition (including language) into some predetermined conception of neural mechanism which effectively pretends that what we have discovered over the last 60 years does not exist.

[1] DMWWP notes that the real problem is dependencies in an unbounded nested tree. It is not merely the hierarchy, but the unboundedness (i.e. recursion) as well.

Tuesday, September 13, 2016

The Generative Death March, part 3. Whose death is it anyway?

I almost had a brain hemorrhage when I read this paragraph in the Scientific American piece that announced the death of generative linguistics:

“As with the retreat from the cross-linguistic data and the tool-kit argument, the idea of performance masking competence is also pretty much unfalsifiable. Retreats to this type of claim are common in declining scientific paradigms that lack a strong empirical base—consider, for instance, Freudian psychology and Marxist interpretations of history.”

Pretty strong stuff. Fortunately, I was able to stave off my stroke when I realized that this claim, i.e., that performance can’t mask competence, is possibly the most baseless of all of IT’s assertions about generative grammar.

Consider the phenomenon of agreement attraction:

(1) The key to the cabinets is/#are on the table

The phenomenon is that people occasionally produce “are” and not “is” in sentences like these (around 8% of the time in experimental production tasks, according to Kay Bock) and they even fail to notice the oddness of “are” in speeded acceptability judgment tasks. Why does this happen? Well, Matt Wagers, Ellen Lau and Colin Phillips have argued that (at least in comprehension) this has something to do with the way parts of sentences are stored and reaccessed in working memory during sentence comprehension. That is, using an independently understood model of working memory and applying it to sentence comprehension these authors explained the kinds of agreement errors that English speakers do and do not notice. So, performance masks competence in some cases. 

Is it possible to falsify claims like this one? Well, sure. You would do so by showing that the independently understood performance system didn’t impact whatever aspect of the grammar you were investigating. Let’s consider, for example, the case of island-violations. Some authors (e.g., Kluender, Sag, etc) have argued that sentences like those in (2) are unacceptable not because of grammatical features but because of properties of working memory.

(2) a.  * What do you wonder whether John bought __?
b.  * Who did the reporter that interviewed __ win the Pulitzer Prize?

So, to falsify this claim about performance masking competence Sprouse, Wagers and Phillips (2012) conducted an experiment to ask whether various measures of variability in working memory predicted the degree of perceived ungrammaticality in such cases. They found no relation between working memory and perceived ungrammaticality, contrary to the predictions of this performance theory. They therefore concluded that performance did not mask competence in this case. Pretty simple falsification, right?

Now, in all fairness to IT, when they said that claims of performance masking competence were unfalsifiable, they were talking about children. That is, they claim that it is impossible for performance factors to be responsible for the errors that children make during grammatical development, or at least that claims that such factors are responsible for errors are unfalsifiable. Why children should be subject to different methodological standards than adults is a complete mystery to me, but let’s see if there is any merit to their claims.

Let’s get some facts about children’s performance systems on the ground. First, children are like adults in that they build syntactic representations incrementally. This is true in children ranging from 2- to 10-years old (Altmann and Kamide 1999, Lew-Williams & Fernald 2007, Mani & Huettig 2012, Swingley, Pinto & Fernald 1999; Fernald, Thorpe & Marchman 2010). Second, along with this incrementality children display a kind of syntactic persistence, what John Trueswell dubbed “kindergarten path effects”. Children show difficulty in revising their initial parse on the basis of information arriving after a parsing decision has been made. This syntactic persistence has been shown by many different research groups (Felser, Marinis & Clahsen 2003, Kidd & Bavin 2005, Snedeker & Trueswell 2004, Choi & Trueswell 2010, Rabagliati, Pylkkanen & Marcus 2013).  

These facts allow us to make predictions about the kinds of errors children will make. For example, Omaki et al (2014) examined English- and Japanese-learning 4-year-olds’ interpretations of sentences like (3).

(3) Where did Lizzie tell someone that she was going to catch butterflies?

These sentences have a global ambiguity in that the wh-phrase could be associated with the matrix or embedded verb. Now, if children are incremental parsers and if they have difficulty revising their initial parsing decisions, then we predict that English children should show a very strong bias for the matrix interpretation, since that interpretation would be the first one an incremental parser would access. And, we predict that Japanese children would show a very strong bias for the embedded interpretation, since the order of verbs would be reversed in that language. Indeed, that is precisely what Omaki et al found, suggesting that independently understood properties of the performance systems could explain children’s behavior. Clearly this hypothesis is falsifiable because the data could have come out differently.

A similar argument for incremental interpretation plus revision difficulties has also been deployed to explain children’s performance with scopally ambiguous sentences. Musolino, Crain and Thornton (2000) observed that children, unlike adults, are very strongly biased to interpret ambiguous sentences like (4) with surface scope:

(4) Every horse didn’t jump over the fence
a. All of the horses failed to jump (= surface scope)
b. Not every horse jumped (= inverse scope)
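
The two readings in (4) can be checked mechanically against a scenario. The Python sketch below is only an illustration: the scenarios and horse names are invented, and the two functions simply encode the paraphrases in (4a) and (4b). It shows that the readings come apart exactly when some but not all of the horses jump.

```python
def surface_scope(jumped):
    """every > not: for every horse, that horse did not jump (4a)."""
    return all(not did_jump for did_jump in jumped.values())


def inverse_scope(jumped):
    """not > every: it is not the case that every horse jumped (4b)."""
    return not all(jumped.values())


# Hypothetical scenarios: which horses made it over the fence.
some_jumped = {"horse1": True, "horse2": True, "horse3": False}
none_jumped = {"horse1": False, "horse2": False, "horse3": False}

print(surface_scope(some_jumped), inverse_scope(some_jumped))  # False True
print(surface_scope(none_jumped), inverse_scope(none_jumped))  # True True
```

When no horse jumps, both readings are true, so only a scenario like the first one can tell us which reading a child or an adult has accessed.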

Musolino & Lidz (2006), Gualmini (2008) and Viau, Lidz & Musolino (2010) argued that this bias was not a reflection of children’s grammars being more restricted than adults’ but that other factors interfered in accessing the inverse scope interpretation. And they showed how manipulating those extragrammatical factors could move children’s interpretations around. Moreover, Conroy (2008) argued that a major contributor to children’s scope rigidity came from the facts that (a) the surface scope is constructed first, incrementally, and (b) children have difficulty revising initial interpretations. Support for this view comes from several adult on-line parsing studies demonstrating that children’s only interpretation corresponds to adults’ initial interpretation.  

Again, these ideas are easily falsifiable. It could have been that children were entirely unable to access the inverse scope interpretation and it could have been that other performance factors explained children’s pattern of interpretations. Indeed, the more we understand about performance systems, the better we are able to apportion explanatory force between the developing grammar and the developing parser (see Omaki & Lidz 2015 for review).

So, what IT must have meant was that imprecise hypotheses about performance systems are unfalsifiable. But this is not a complaint about the competence-performance distinction. It is a complaint about using poorly defined explanatory predicates and underdeveloped theories in place of precise theories of grammar, processing and learning. Indeed, we might turn the question of imprecision and unfalsifiability back on IT. What are the precise mechanisms by which intuition and analogy lead to specific grammatical features and why don’t these mechanisms lead to other grammatical features that happen not to be the correct ones? I’m not holding my breath waiting for an answer to that one.

Summing up our three-day march, we can now evaluate IT’s central claims.

1) Intuition and analogy making can replace computational theories of grammar and learning. 
Diagnosis: False. We have seen no explicit theory that links these “general purpose” cognitive skills to the kind of grammatical knowledge that has been uncovered by generative linguistics. Claims to the contrary are wild exaggerations at best.

2) Generative linguists have given up on confronting the linking problem.
Diagnosis: False. This problem remains at the center of an active community of generative acquisitionists. Claims to the contrary reflect more about IT’s ability to keep up with the literature than with the actual state of the field.

3) Explanations of children’s errors in terms of performance factors are unfalsifiable and reflect the last gasps of a dying paradigm.
Diagnosis: False. The theory of performance in children has undergone an explosion of activity in the past 15 years and this theory allows us to better partition children’s errors into those caused by grammar and those caused by other interacting systems.

IT has scored a trifecta in the domain of baseless assertions. 

Who’s leading the death march of declining scientific paradigms, again?