Saturday, February 7, 2015

The future of linguistics; two views

Addendum: I would like to apologize for systematically mis-spelling Peter Hagoort's name. Before I revised the post below, I called him 'Hargoort.' This was not due to malice (someone called 'Norbert' is not really into the name mocking business) but because I cannot spell. Sorry.

Peter Hagoort and I are both worrying about the future of linguistics (see here). [1] We both lament the fact that linguistics once played “a central role in cognitive science” but that “studies in language relevant topics are no longer strongly influenced by the developments in linguistics.” This is unfortunate, according to Hagoort, because “linguists could help cognitive (neuro)scientists to be more advanced in their thinking about the representational structures in the human mind.” He is right, of course, but why he thinks this is very unclear given what he actually proposes, as I show below.

I too lament these sad happenings. Moreover, I am saddened because I have seen that consistent collaboration between GGers and psychologists, computationalists and neuroscientists is not only possible but also very fruitful. Nor is it that hard, really. I know this because I live in a department that does this every day and all the time. Of course doing good work is always difficult, but doing good work that combines a decent knowledge of what linguists have discovered with techniques and insights from other near-by disciplines (psych, CS, neuroscience) is quite doable. It is even fun.

Unfortunately, as Hagoort’s piece makes clear, he has no idea that this is the case. He appears to believe that linguists have very little to offer him. I suspect that this might indeed be true for the kinds of questions he seems interested in. But I would conclude from this that he is missing out on some really interesting questions. Or, to be more charitable: what is sad about cognitive neuroscience of the Hagoort variety is that it has stopped investigating the kinds of questions that knowing something about linguistics would help answer. Why is this? Hagoort’s diagnosis of linguistics’ fall from cogneuro grace offers an explanation. He identifies three main problems with linguistics of the generative variety. I will review them and comment seriatim.

First, Hagoort believes that the sorts of representations that GGers truck in are just not right for the brain. In other words, he believes that cogneuro has shown that brains cannot support the kinds of mental representations that linguists have argued for. As Hagoort puts it: “language-like structures do not embody the basic machinery of cognition” (2). How does he know? His authority is Paul Churchland, who believes that “human neuronal machinery differs from that of other animals in various small degrees but not in fundamental kind.” The conclusion is that the language-like representations that GGers typically use to explain linguistic data are not brain-structure compatible.

Unfortunately, Hagoort does little more than baldly state this conclusion in this short piece.[2] However, the argument he points to is really quite bad. Let’s assume that Hagoort is right and that non-linguistic cognition (he illustrates with the imagery debate between Kosslyn and Pylyshyn) does not use language-like structures in executing its functions (I personally find this unlikely, but let’s assume it for the sake of argument). Does Hagoort really believe that linguistic behavior does not exploit “language-like structures” (what Hagoort calls “linguaform”)? Does Hagoort really believe that not even sentences have sentential structure? If he does believe this, then I await the dropping of the second shoe. Which one? The one containing the linguaformless reanalyses of the myriad linguistic phenomena that have been described using linguaforms. To my knowledge this has never been seriously attempted. Paul Churchland has never suggested how this might be done, nor has Hagoort so far as I know. The reason is that sentences have structure, as 60 years of linguistic research has established, and so far as we can tell, the structure that phrases and sentences have (and linguistic sounds and words and meanings) is unlike the structures that scenes and non-linguistic sounds and smells have. And as linguistics has shown over the last 60 years of research, these structural features are important in describing and explaining a large array of linguistic phenomena. So, if linguists have been wrong about the assumption that “linguaforms” are implicated in the description and explanation of these patterns, then there is a big empirical problem waiting to be tackled: to reanalyze (viz. re-describe and re-explain) these very well studied and attested linguistic data in non-“linguaform” terms. Hagoort does not mention this project in his short speech. However, if he is serious in his claims, this is what he must show us how to do.
I very much doubt that he will be able to do it. In fact, I know he won’t.

Let me go further still. As I never tire of mentioning, GG has discovered a lot about natural language structure, both its universal properties and its range of variation. GGers don’t understand everything, but there is wide consensus in the profession that sentences have proposition-like structure and that the rules of grammar exploit this structure. This is not controversial. And if it is not, then however much our brains resemble those of other animals, the fact that humans do manipulate “linguaforms” implies that humans have some mental/neural capacities for doing so, even if other animals do not.[3] Moreover, if this is right, then Hagoort’s finger is pointing in the wrong direction. The problem is not with linguistics, but with the cogneuro of language. It has decided to stop looking at the facts, something that we can all agree is not a good sign of scientific health within the cogneuro of language.

So Hagoort is ready to ignore what GG has discovered without feeling any obligation to account for this “body of doctrine.” How come? He actually provides two reasons for this neglect (though he doesn’t put it this way).

The first reason he provides is that linguists are a contentious lot who not only (i) don’t agree with one another (there is “no agreed upon taxonomy of the central linguistic phenomena”) but (ii) have also “turned their backs to the developments in cognitive (neuro) science and alienated themselves from what is going on in adjacent fields of research” (2). I somewhat sympathize with these two points. A bit, not a lot. Let me say why.

Let’s address (i): Contrary to the accepted wisdom, linguistics has been a fairly conservative discipline with later work largely preserving the insights of earlier research and then building on these. This may be hard to see if you are an outsider. Linguists, like all scientists, are proud and fractious and argumentative. There exists a bad habit of pronouncing revolutions every decade or so. However, despite some changes in theory, GG revolutions have preserved most of the structures of the ancien regimes. This is typical for a domain of inquiry that has gained some scientific traction, and it is what has taken place in GG as well. However, independently of this, there is something more relevant to Hagoort’s concerns. For the purposes of most of what goes on in cogneuro, it really doesn’t matter what vintage theory one adopts.

Let me be blunter. I love Minimalist investigations, but for most of what is studied in language acquisition, language processing and production, and neurolinguistics it really doesn’t matter whether you adopt the latest technology or the newest fangled concepts. You can do good work in all of these areas using GB models, LFG models, HPSG and GPSG models, Aspects models, and RG models. For many (most?) of the types of questions being posed in these domains all these models describe things in effectively the same way, make essentially the same distinctions and adopt more or less the same technology.

I’m not making this up. I really do know this to be true for I have seen this at work in my own department. There may be questions for which the differences between these various approaches matter (though I am pretty skeptical about this as I consider many of these as notational variants rather than differing theories), but for most everything I have personally witnessed, this has not been the case. This indicates that contrary to what Hagoort reports, there is a huge overlapping consensus in GG about the basic structure of natural language. That he has failed to note this, IMO, suggests that he has not really taken a serious look at the matter (or asked anyone). Of course, life would be nicer were there less pushing and pulling within linguistics (well maybe, I like the contention myself), but that’s what intro texts are for and by now there are endless numbers of these in linguistics that Hagoort could easily consult. What he will find is that they contain more or less the same things. And they are more than sufficient for many of the things he might want to investigate, or that’s my guess.

Hagoort’s claim (ii) is that linguists ignore what is going on in cogneuro. Is this correct? Some do, some don’t. As I noted, my own department is very intellectually promiscuous, with syntacticians, phonologists and semanticists mixing freely and gaily with psycho, computational and neuro types on all sorts of projects. However, let’s again assume that Hagoort is right. The real question is: intellectually speaking, who needs who more? I would contend that though checking in with your intellectual neighbors is always a good thing to do, it is currently possible (note the italicized adjective please) to do fine work in syntax while ignoring what is happening in the cogneuro of language. The opposite, I would contend, is not the case. Why? Because to study the cogneuro of X you need to know something about X. Nobody doing the cogneuro of vision would think that ignoring what we know about visual perception is a good idea. So why does Hagoort think that not knowing anything about linguistic structure is ok for the study of the cogneuro of language? All agree that the cogneuro of language aims to study those parts of the brain that allow for the use and acquisition of language. Wouldn’t knowing something about the thing being used/acquired be useful? I would think so. Does Hagoort?

This is not apparent from his remarks, and that is a problem. Imagine you were working on the cogneuro of vision, and say that the people who work on visual perception were rude and obstreperous. Would it be scientifically rational to conclude that it’s ok to ignore their work when working on the cogneuro of vision? I would guess not. Their results are important for your work. So even if it might be hard to get what you need from a bunch of uncivilized heathens, it doesn’t make getting what you need any less critical. So even if Hagoort is right about the lack of interest among linguists for cogneuro, that’s not a very good reason to not claw your way to their results (is it Peter?).

This said, let me admonish my fellow linguists: If a cogneuro person comes and asks you for some linguistic instruction BE NICE!! In fact be VERY, VERY NICE!!! (psst: it appears that they bruise easily).

So, IMO, the first two reasons that Hagoort provides are very weak. Let’s turn to his third for, if accurate, it could explain why Hagoort thinks that work in linguistics can be safely ignored. The third problem he identifies concerns the methodological standards for evidence evaluation in linguistics. He believes that current linguistic methods for data collection and evaluation are seriously sub-par. More specifically, our common practice is filled with “weak quantitative standards” and consists of nothing more than “running sentences in your head and consulting a colleague.” I assume that Hagoort further believes that such sloppiness invalidates the empirical bases of much GG research.[4]

Sadly, this is just wrong. There has been a lot of back and forth on these issues over the last five years and it is pretty clear that Gibson and Fedorenko’s worries are (at best) misplaced. In fact, Jon Sprouse and Diogo Almeida have eviscerated these claims (see here, here, here and here). They have shown that the data that GGers use in everyday practice is very robust and that there is nothing lacking in the informal methods deployed. How do we know this? It can be gleaned from the fact that using more conventional statistical methods beloved of all psychologists and neuroscientists yields effectively the same results. Thus, the linguistic data that GG linguists have collected in their informal way (consulting their intuitions and asking a few friends) are extremely robust, indeed more robust than those typically found in psych and cogneuro (something that Sprouse and Almeida also demonstrate). Hagoort does not appear to know about this literature.[5] Too bad.[6]

In light of his (as we have seen, quite faulty) diagnosis, Hagoort offers some remedies. They range from the irrelevant to the anodyne to the misinformed to the misguided. The irrelevant is to do “proper experimental research” (i.e. do what Sprouse and Almeida show linguists already do). The anodyne is to talk more to neuroscientists. This is fine advice, the kind of thing that deans say when they want to look like they are saying something stimulating but really have nothing to say. The misinformed is to work more on language phenomena and less on “top heavy theory.” The misguided is to embed linguistic theory in “a broader framework of human communication.” Let me address each in turn and then stop.

The irrelevant should speak for itself. If Sprouse and Almeida are right (which I assure you they are; read the papers) then there is nothing wrong with the data that GGers use. That said, there is nothing inherently wrong (though it is more time consuming and expensive, and no more accurate) with using more obsessive methods when greater care is called for. My colleagues in acquisition, processing and production use these all the time. Sprouse has also used them when the data as conventionally gathered has failed to be clear cut. As linguistics develops and the questions it asks become more and more refined it would surprise me were we forced to the anal retentive more careful methods that Hagoort prizes. There is nothing wrong with using these methods where useful, but there is nothing good about using them when they are not (and to repeat they are far less efficient). It all depends, like most things.

The anodyne should also be self-evident. Indeed, in some areas (e.g. phonology and morphology) cogneuro techniques promise to enrich linguistic methods of investigation. But even in those areas where this is less currently obvious (e.g. syntax, semantics) I think that having linguists talk to neuroscientists will help focus the latter’s attention onto more interesting issues and may help sharpen GGers’ explanatory skills. So, in addition to it just being good to be catholic in one’s interests, it might even be mutually beneficial.

The third suggestion above is actually quite funny. Clearly Hagoort doesn't talk to many linguists or read what they write or go to their talks. Most current work is language based and very descriptive. I’ve discussed this before, lamenting the fact that theoretical work is so rarely prized or pursued (see here). Hagoort has already gotten his wish. All he has to do is talk to some linguists to realize that this desire is easily met.

The last point is the one to worry about, and it is the one that perhaps shows why Hagoort is really unhappy with current linguistics. He understands language basically from a communication perspective. He wants linguists to investigate language use, rather than language capacity. Here we should resist his advice. Or we should resist the implication that the communicative use of language is the important problem while limning the contours of human linguistic capacity is at best secondary. From where I sit, Hagoort has it exactly backwards. Language use presupposes knowledge of language. Thus, the former is a far more complicated topic than the latter. And all things being equal, studying simple things is a better route to scientific success than studying more complicated ones. At any rate, the best thing linguists who are interested in how language is used to communicate can do is keep describing how Gs are built and what they can do.

That’s what I think. I believe that Hagoort believes the exact opposite. He is of the opinion that communication is primary while grammatical competence is secondary, and here, I believe that he is wrong. He gives no arguments or reasons for this view and until he does, this very bad advice should be ignored. Work on communication if you want to, but understanding how it works will require competence theories of the GG variety. It won’t replace them.

Ok, this has been far too long a post. Hagoort is right. Linguistics has gone into the shadows. It is no longer the Queen of the Cognitive Neuro-Sciences. But this demotion is less for intellectual than socio-political reasons, as I’ve argued extensively on FoL. I am told that among cogneuro types, Hagoort is relatively friendly to linguistics. He thinks it worth his time to advise us. Others just ignore us. If Hagoort is indeed our friend, then it will be a long time before linguistics makes it back to the center of the cogneuro stage. This does not mean that good work combining neuro and GG cannot be pursued. But it does mean that for the nonce this will not be received enthusiastically by the cogneuro community. That is too bad: sociologically (and economically) for linguistics, intellectually for the cogneuro of language.





[1] Actually entered my e-mail. Thx Tal and William.
[2] This is not to fault him, for this was an address, I believe, and as with all good addresses, brevity is the soul of wit.
[3] Some readers may have captured a whiff of the methodological dualism discussed here and here.
[4] He refers to Gibson and Fedorenko’s 2010 TICS paper and this is what it argues.
[5] Those interested in a good review of the issues can look at Colin Phillips’ slides here.
[6] There is a certain kind of cargo-cult quality to the obsession with the careful statistical vetting of data. I suspect that Hagoort insists on this because it really looks scientific. You know: the lab coats, the button boxes, the RSVP presentation. It all really looks professional. Maybe we should add ‘sciency’ to Colbert’s ‘truthy’ to describe what is at issue sociologically.

74 comments:

  1. Is there an example of a case where the results of a psycholinguistic experiment were used to decide between two theories of syntactic representations in a way that became generally accepted among syntacticians?

    1. I can think of one along these lines off the top of my head: Sprouse, Wagers & Phillips (2012): A test of the relation between working-memory capacity and syntactic island effects. They attempted to rule out processing based accounts of islands in favor of grammatical accounts due to the lack of a correlation between working memory capacity and acceptability.

      But this isn't exactly what you asked; I am unaware of any psycholinguistic or neuroimaging experiment that impinged on disputes between linguistic theories.

      http://ling.umd.edu/~colin/wordpress/wp-content/uploads/2014/08/sprousewagersphillips2012.pdf

    2. There are various examples of studies that have been presented as evidence for one representational theory over another, but in my experience: (i) few of them really stick, and (ii) the first point is irrelevant, as folks often pay attention to the experimental result that confirms their preference, and then tune out. Case in point: it's surprising how many linguists of a certain vintage continue to think that "trace reactivation" findings from 25 years ago provide evidence for movement/trace-based accounts of unbounded dependencies. I'm mildly sympathetic to such accounts, but it has been clear for 20+ years that such evidence is indecisive. For discussion of attempts to resolve representational disputes using experimental evidence we have a couple of overviews: Phillips & Wagers 2007 (on wh-movement); Phillips & Parker (on ellipsis).
      One case that I think was fairly influential -- I know that it influenced me -- was the argument by Chien & Wexler (1990) that preschoolers show a dissociation between coreference and bound variable anaphora in their application of Principle B. The so-called "Quantificational Asymmetry." Alas, when we tried to defend this finding against concerns raised by Paul Elbourne (2005), we found that he was right (Conroy et al., 2009, Ling. Inq.).

      http://ling.umd.edu/~colin/wordpress/wp-content/uploads/2014/08/phillipswagers2007.pdf
      http://ling.umd.edu/~colin/wordpress/wp-content/uploads/2014/08/phillips-parker2014.pdf
      http://ling.umd.edu/~colin/wordpress/wp-content/uploads/2014/08/ctlp2009.pdf

    3. The best example I have for where psycho tests have altered one's views of representational properties of the grammar is the work on the meaning of 'most' carried out by Hunter, Pietroski, Lidz and Halberda. I discussed this here:
      http://facultyoflanguage.blogspot.com/search?q=Halberda
      Similar results in syntax proper are hard to come by. Colin's remark seems apt to me. In retrospect, the DTC arguments deployed against certain transformational derivations now look pretty accurate (e.g. deriving short from long passive). At the time they were not decisive. However, logically, it looks like the arguments were on the right track. So, as a matter of fact, psycho results have not been particularly influential, though I believe that as the field matures, this could change. There are limits to the resolution of our informal methods.

    4. It doesn't meet the criterion of "generally accepted among syntacticians", but one piece of work that makes the kind of argument that we might hope for is this paper by John Hale (2006):
      http://dx.doi.org/10.1207/s15516709cog0000_64
      He compares the classical adjunction analysis of relative clauses with the promotion analysis. Given a particular linking hypothesis (the Entropy Reduction Hypothesis), these two analyses of relative clauses make different predictions about the difficulty of comprehending relative clauses where the gap is in subject position (SRCs) versus in object position (ORCs) versus in indirect object position, etc. It turned out that the predictions of the promotion analysis fit better with the empirical comprehension-difficulty facts than those of the adjunction analysis.

      Now I'm not suggesting that this paper should convince syntacticians to throw the adjunction analysis of relative clauses out the window. In fact I'm happy to admit that, in the debate between those two analyses of relative clauses, John's finding should probably carry much less weight than the traditional kinds of evidence that syntacticians use. (Indeed its weight is probably negligible.) The reason is that the link from a syntactic analysis to a comprehension-difficulty prediction is only as good as the linking hypothesis (i.e. the Entropy Reduction Hypothesis), which, although I happen to like it, certainly can't be taken as a given; and even if one does take the ERH as a given, then there are further nitty-gritty choices to be made about exactly how one supplements a grammar with the probabilistic information that the ERH uses to derive its predictions. So although the choice of grammar (e.g. adjunction vs. promotion) is one factor that contributes to the overall comprehension-difficulty predictions, it is only one of a handful of such factors, so it's hard to know where to assign blame or credit. Having said all that, I think the important point is that, well, the choice of grammar is one factor that contributes to the overall comprehension-difficulty predictions. So one has to shore up all the other contributing factors before drawing strong conclusions about the grammar, but I think this provides the right kind of model for work of the sort that Tal is asking about.

      As for the debate between grammatical and "processing-based" accounts of island effects, I think Colin's 2006 paper using (potential) parasitic gaps is one of the more convincing arguments. Extremely convincing, actually.
      http://ling.umd.edu/~colin/wordpress/wp-content/uploads/2014/08/phillips2006-islands.pdf

    5. Another instance is the Kobele et al paper on the psycholinguistic predictions of head movement versus remnant movement. Me and some students of mine are doing similar work right now on relative clause analyses. The problem with all of them is that they require a linking hypothesis between syntactic analyses and observed processing effects, and there isn't enough evidence at this point to decisively favor one over the other. But at least you can get results like "syntactic analysis S works with linking hypothesis L, and S' works with L', but the other two combinations fail", which is better than nothing.

  2. This comment has been removed by the author.

  3. In my opinion, adverting to Sprouse and Almeida in response to Hagoort's worry about methodological standards is a bad idea, almost as bad as his (ii) suggestion to "do proper experimental research".

    Before continuing, let me flag the fact that I haven't actually read the Sprouse and Almeida papers myself. I disagree with the way that their work has often been presented to me by other linguists, but I honestly have no idea what Sprouse and Almeida actually have to say for themselves.

    Anyway, the reason that I think both Hagoort's (ii) suggestion and adverting to Sprouse and Almeida in response are bad ideas is because doing so legitimizes the worry that we haven't been doing proper experimental research. And I think this worry is indicative of a fundamental misunderstanding of the I-language hypothesis about the nature of language. If the I-language hypothesis is correct—and, as far as I can tell, there is absolutely no reason to doubt this—then judgments from any one speaker are useful for investigating both properties of grammars (Gs, as Norbert calls them) and properties of UG/FoL. Attempting to legitimize our field, then, by either doing 'proper experimental research' or by dismissing the methodological concern by adverting to Sprouse and Almeida validates the concern that judgments from a single speaker cannot tell us anything interesting about language.

    When such methodological concerns are raised I think we should instead be pushing back against the concern itself, not saying that it's okay because everybody has these same judgments and go look at Sprouse and Almeida. Instead, we should be explaining the I-language hypothesis and the evidence for it as well as explaining the untenability of an E-language hypothesis about the nature of language.

    Two last clarifying points:

    First, this is not at all to say that 'proper experimental research' isn't interesting or useful. It's certainly interesting and useful. But, as a strategy for increasing the impact and visibility of linguistics within cognitive science, I think it is completely counterproductive. Again, doing so validates a concern that comes from not understanding what linguists take the nature of language to be, which I think will make it even harder to get cognitive scientists to understand and appreciate linguistics if we are validating their misunderstanding of our central claim about the very nature of language.

    Second, I am also not saying that the Sprouse and Almeida work is uninteresting or not useful. I think it is, but I think it is interesting and useful for a very different reason. What it tells us is that all speakers more or less converge on the same grammar despite having completely different primary linguistic data. And this is something that quite easily might not have been true. It certainly seems to be the case from basic inspection, but it might not have been the case for the 'weird' example sentences that linguists invent. So I think it is interesting and useful to see proof of this hunch that speakers really do converge on the same grammar, more or less, even for the very obscure constructions invented by linguists. I just don't think it's an interesting or useful response to the methodological worries that have been raised by non-linguists, which is how their work has always been presented to me. (Again, I have no idea what they actually say for themselves.)

    1. Hmm, I think it really would be worrying if certain types of judgment proved not to be consistent between speakers. I'm thinking particularly of judgments on structures which would not be in PLD. For example, the fact that no English speaker is ok with *[How_i did you ask [who fixed the car_i]]? is important, because we know that this regularity can't derive from any regularity in the PLD. If, on the other hand, it turned out that English speakers aren't at all consistent in their judgments on these structures, then that would undermine the case for the relevant universal constraints.

    2. @Alex: I agree, and that is more or less what I was trying to get at with my second clarifying point. I am also not suggesting that one should not mention Sprouse and Almeida's work at all. What I was trying to suggest is that mentioning Sprouse and Almeida's work to simply assuage the worries of poor experimental methodology might not be all that helpful—counterproductive even, at least in my opinion—because it could validate the concern that the current informal data collection methodology is bad methodology.

      It's not bad methodology precisely because the I-language hypothesis about the nature of language is true. Language is internal to an individual and arises as the product of UG and PLD, so judgments from a single speaker are useful for investigating the properties of grammars and the properties of UG. This is a fundamental aspect of linguistics, and my suspicion is that the concern from non-linguists about our methodology comes from not fully appreciating this core aspect of the nature of language. Their concern really just seems to be that it is bad methodology, whereas the concern you point out—which I agree would be concerning were it true—seems to be that the hypothesis informing the methodology could have been wrong.

      So anyway, in my opinion what would be more useful (whether it comes with a mention of Sprouse and Almeida or not) is trying to explain why this really isn't bad methodology given the nature of the object of inquiry.

    3. @Adam: I think I agree with Alex D here. My objection to Hagoort's view is not that getting reliable data is a bad idea or that I-language hypotheses by their very nature are exempt from standard practices of data collection but that many methods can lead to reliability. Linguists have been pretty lucky in that much of what we have done can be done using very quick and dirty methods. These are fast, cheap, and, if Sprouse and Almeida are right, quite good. At least by the standards in psychology and neuroscience they seem to be very very good. This provides more than sufficient reason for continuing to use them, understanding that they might not suffice in some cases and that more careful methods might be called for.

      Last point: the problem is not only judgments across speakers, but even from a single one. I assume that judgments that change from trial to trial, even if these are of one individual, would be treated gingerly. So the problem is not one person vs many so far as I can see. Even if we are all investigating a single speaker/hearer the reliability of the acceptability data is an issue.

      Really the last point: I agree with your last statement, but I think that this is what S&A have done. They have shown that the way that linguists gather data, at least over a very large range (see Colin below), is fine. It works and we should keep doing this. Or, put another way, data so collected is prima facie accurate. This is defeasible, as all data are, but a good way to proceed.

    4. @Norbert: I suspect we all ultimately agree, but I think there is something still floating under the surface. Let me try to be a little bit more careful by talking specifically about the data that I have in mind when making these points.

      For the purposes of this discussion, we can roughly separate linguistic data into three types: (i) the judgments are clear and speakers of the "same language" (note the scare quotes) agree on the judgments, (ii) the judgments are clear and some speakers of the "same language" disagree on the judgments, and (iii) the judgments are not clear.

      You have already said what I would have said about the first type of data in your response to Ted. You wrote "The need for more careful looks at the data used to be less frequent. Why? Because the data was pretty clear. Extraction effects from islands vs non-islands are not that hard to judge."

      Regarding data of the third type, I think more rigorous experimental methodology can be useful here. If there is some interfering factor that is making the judgment difficult then rigorous methodology can probably help triangulate what the 'correct' judgment is.

      But with data of the second type we have something that I think gets glossed over when simply rejoindering with Sprouse and Almeida. If judgments are clear and speakers of the "same language" don't agree, it's not the case that doing experimental work will lead to us determining what the 'correct' judgment is; both judgments are correct. And we have an interesting fact to explain: how did speakers of the same "language" end up with slightly different grammars. Variation work that takes this issue seriously seems to be on the rise, and I think that's great.

      However, if we took Hagoort's suggestion seriously to just do more rigorous experimental work, we might not be aware of the need to make this distinction between data of the second and third types. We could, depending on the sample, lose sight of the interesting fact that variation does happen and speakers of the same "language" sometimes have slightly different grammars; instead, our experimental methodology might have us converge on one of the sentences being 'correct', discounting the judgments of speakers who said otherwise.

      And furthermore, I think simply appealing to Sprouse and Almeida as a response to methodological concerns doesn't do enough to highlight this difference either. Though again, I haven't read the papers. Maybe this is just how they have been presented to me. Or maybe I'm at fault: it very well could be that I've misunderstood how other linguists have tried to present Sprouse and Almeida's work to me. (It's time I read the papers myself.)

      So again, I think it's counterproductive to simply rejoinder with Sprouse and Almeida to these concerns as a strategy for increasing the impact and visibility of linguistics within cognitive science. Instead, we should be trying to defuse the concern and motivate the concepts that show that the concern is unwarranted. And it seems that motivating other conceptual distinctions might be necessary, too. Ted's comments show the need to be clear about the competence/performance distinction and the related grammaticality/acceptability distinction in addition to the I-language/E-language distinction.

      Lastly: it's interesting that Ted notes that "one needs about 7 people to agree unanimously on the judgment [... to] be reasonably confident that the contrast is real". This seems to be methodologically rigorous justification of the purportedly "bad methodology" we have been using. Ask 7 people and everybody agrees: we have data of the first type. Ask 7 people and the judgments are clear but there is disagreement: maybe we have data of type 2; maybe you want to run an experiment. Ask 7 people and the judgments are fuzzy: you should probably run an experiment.
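
      The arithmetic behind a "7 unanimous informants" rule of thumb can be sketched as a simple sign test. This is only one way to motivate such a number, and it rests on assumptions not stated in the thread: each informant judges independently, and under the null hypothesis each is equally likely to prefer either sentence. The function name `sign_test_p` is hypothetical.

```python
from math import comb

def sign_test_p(n: int, k: int) -> float:
    """Two-sided sign-test p-value: the chance that at least k of n
    independent informants agree (in either direction) if each one is
    really choosing between the two sentences at random (p = 0.5)."""
    # Probability that k or more informants pick one particular sentence.
    one_direction = sum(comb(n, i) for i in range(k, n + 1)) * 0.5 ** n
    # Doubling covers agreement in either direction; cap at 1.
    return min(1.0, 2 * one_direction)

# Seven out of seven agreeing is unlikely under chance ...
print(sign_test_p(7, 7))  # 0.015625
# ... but a single dissenter already takes the result above 0.05.
print(sign_test_p(7, 6))  # 0.125
```

      On these assumptions, unanimity among 7 informants falls below the conventional 0.05 threshold, while 6 out of 7 does not, which is consistent with treating a split as a reason to look more closely.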

    5. One can use the more careful methods to look for dialect variation. One will simply get bimodal distributions. So, I am not sure that the second point isn't also covered.

      Re "So again, I think it's counterproductive to simply rejoinder with Sprouse and Almeida to these concerns as a strategy for increasing the impact and visibility of linguistics within cognitive science. Instead, we should be trying to defuse the concern and motivate the concepts that show that the concern is unwarranted"

      I guess I see S&A making just this point. If they are right, then the data is indeed good enough for our purposes. How do we know? Well because doing it the more careful way leads to the same conclusions. This seems to me to disarm any useful criticism.

      The things that Ted pointed to we already do. Everyone asks more than 7 people and the judgments are generally ok. Always ok? No. We do have disagreements, but they are LOCAL not global. What S&A showed, IMO, is that GLOBAL skepticism concerning this method of data collection is unwarranted. Local skepticism does not undermine the enterprise. Global skepticism is intended to. What S&A demonstrate is that there is no ground for such global skepticism and so the general results linguistics has obtained are not questionable on data grounds. Of course, this or that analysis may be. But not the whole shebang. And that's what Hagoort is aiming for.

      That said, I agree that we need to keep making distinctions and setting them right about what is being done (and what is worth doing). But, S&A really does help.

  4. Norbert: I agree with some of this, but not all. You are reliably informed that Peter Hagoort is more receptive to linguistic findings than others of his ilk. He is more inclined than others in his area to pay attention to explicit (psycho)linguistic models.

    I take his first observation not to be an assertion that Churchland is right, but a descriptive comment that the language of thought is no longer in the ascendancy. That seems descriptively accurate, regardless of whether it is desirable.

    Hagoort's second observation [and you might want to do a global search/replace to spell his name right] is about PR. I agree with you, of course, that there's lots and lots that has been stable over time. But I'm sure that you also agree that we could do a better job of making that apparent on the outside.

    I of course think that the gripes about data quality are overstated, and are an excuse for simply dismissing a field. I don't believe for a minute that the adoption of quantitative methods would make the broader world pay closer attention -- I know because we use those methods in our work all the time, and it doesn't change the fact that many would rather not worry about the phenomena that we care about. The notion that they'd pay attention if only we used careful methods is a fraud. They don't pay attention because they think that the key to understanding language lies elsewhere. Simple as that.

    However, I would urge a bit of caution in characterizing the Sprouse & Almeida findings (recently more-or-less replicated by Gibson & colleagues). There is a sampling bias in the phenomena discussed in those studies. For practical reasons, they focus on cases where acceptability judgments involve just string acceptability, and do not rely on interpretations, prosody, focus structure, etc. Some of the most contentious cases involve phenomena that are excluded from the large sample studies because they're hard to test. Testing those additional phenomena either (i) yields a mess when done using the same methods as Sprouse & Almeida (Gibson et al. see this; we've seen this ourselves), or (ii) requires more laborious methods.

    1. I should clarify: I'm not dismissing the Sprouse & Almeida findings. Merely cautioning that one needs to be aware of what they did and did not test. For their study based on sampling from Linguistic Inquiry, only 48% of the English example contrasts met the eligibility criteria for inclusion in their study, i.e., most examples couldn't be tested. I suspect that their conclusions would be generally correct for the cases that they excluded, but that their approach wouldn't work so well for testing them (as they recognize themselves).

    2. I do believe that we can do more to advertise our virtues. And given that these are doable and would be beneficial, we should do them. But I also think that Hagoort's (note the spelling) views explain why this will be a very hard slog and we should not expect much success (at least not quickly). The debate is an intellectual one. The mainstream GG tradition sees the central issues differently. What happened in the early days is that with the collapse of Behaviorism a window opened for a more coherent view. But behaviorism was a pretty crude version of Empiricism (complete discarding of mental states, rather than crude associationist views of mental states) and the empire struck back. There is a resurgent Empiricism that Hagoort has bought into and this is not going away because of some good ling PR. So, I agree with your second to last paragraph entirely.

      So what to do? I think that the only option is to pursue our program and show that it works. In the process, we should pick fights that clarify our goals and views and highlight the differences we have with others. This sharpening of the differences is important, I believe. We need to do all that we can to make Hagoort and those even less friendly defend what they do. They should not get a free pass.

    3. Yes, but apropos the other recent thread about dualism and unification, "pursuing the program and showing that it works" should involve taking the unification challenge seriously. Linguists need to show that they can rise to that challenge, and do it better than the alternative. If they instead issue an indefinite IOU on that challenge, then they should not be surprised to elicit the complaint that "this doesn't fit with what we know about X".

      I don't think that rehashing old complaints about empiricism etc. takes us very far. That's not what is at issue here. Mainstream GG is not typically engaging with cognitive/algorithmic levels of analysis, and that has nothing to do with behaviorism/empiricism. Folks in Hagoort's neck of the woods are legitimate in raising that complaint. And GGers are legitimate to claim that they think that it's premature or unhelpful to venture into that domain at present. As usual, we place our bets on what will yield the most insights. Taking refuge in Chomsky-Skinner encourages an unfortunate complacency.

    4. Taking the challenge seriously entails respecting the "bodies of doctrine" that each term of the desired unification has unearthed. One of the problems with work in cogneuro that I know about is that it does not take what we have done seriously. As part of the rationalization for ignoring this work, they tend to adopt views that would discredit it in principle. After all, it's hard work debunking the details. So, you assume that the whole approach cannot be right. This is where the Empiricism comes in. It serves to justify ignoring the results. So, like it or not Colin, these issues must be engaged for they stand in the way. Better to have them out there up front than in the background tacitly taking methodological potshots under the rubric of "good science" (viz. empiricism with a small 'e').

      As for who needs to rise to the challenge, not surprisingly, I think that I am not fully on board with you here. It's EVERYBODY's challenge, not just linguistics'. To make it linguistics' SPECIAL challenge is to suggest that what we have done up till now has no value unless this missing piece is added. I don't buy this, and putting things this way does channel the suspicion that linguistics has produced nothing of interest, the position that Hagoort more or less endorses.

      What about going forward? Here is how I see things: Linguistics has provided a pretty good description of a large array of G phenomena, and a not bad (albeit hazy at times) outline of the kinds of knowledge that Gs can contain (i.e. UG). How this knowledge is put to use algorithmically is currently a hot area, as you know. We have some results in parsing and acquisition, but research is in the early stages. We don't yet have many PRINCIPLES of parsing and/or acquisition comparable to the PRINCIPLES of G that we have in linguistics. This is not to disparage this work, but to recognize that it is currently less developed. Nonetheless, this is an ongoing enterprise with increasing payoffs to investment.

      The weakest area, IMO, is the embodiment problem: how does all (any?) of this get grounded in brains. Here the work is at a pretty primitive stage. We have some indices that correlate with some kinds of language effects but it's all pretty meager stuff (IMO, and I suspect yours). There is some exciting stuff out there that may get further (especially in phonology and morphology and some very recent stuff on phrase structure), but right now the neuro part of cog-neuro of language looks to me pretty weak. The problems are hard, of course. But the results are pretty meager to date.

      So, what to do when people like Hagoort (see, I can be taught) say what they do? Well, first off to challenge them and not concede that linguistics has done nothing of value and has discovered nothing worth knowing. Too much concession encourages them to ignore us. Second, to outline areas where collaboration has been fruitful. Third, to be as open to their concerns as seems reasonable given what we know. This is where I really found Hagoort's piece a failure: given what he wrote there is little of value that I see coming out of any collaboration with him. And the saddest part is that he is linguistics' closest friend (or so I am told). That means that we are on our own. If there is complacency, it's not in linguistics. I know of places that take his serious concerns seriously. I know of almost none that take ours equally so. That's the problem.

  5. I have some comments on Hagoort's piece here:

    http://vasishth-statistics.blogspot.de/2015/02/quantitative-methods-in-linguistics.html

  6. Two more comments:

    1. Was Norbert deliberately mis-spelling Hagoort's name, perhaps to mock him?
    2. "If Sprouse and Almeida are right (which I assure you they are; read the papers) then there is nothing wrong with the data that GGers use."

    One should never be 100% sure of anything. There is always uncertainty and we should openly discuss the range of possibilities whenever we present a conclusion, not just argue for one position. That has been a problem in psychology, with overly strong conclusions, and that is a problem in linguistics, experimentally driven or not. But this is especially relevant for statistical inference. We can never be sure of anything.

    1. Thank you for sharing the link to your blog. Interesting perspective. Not being terribly familiar with statistics myself, I take your word regarding the need to get expertise before using it. Still, I think the most severe issue on your list is:

      9. Only look for evidence in favor of your theory; never publish against your own theoretical position

      If one is not willing to consider one's theory [or program] could be wrong, the most intimate knowledge of statistical methods is of little help ...

    2. @Shravan: No Norbert was not. Note that I put up a mea culpa as addendum on the post.

      But I think that I disagree with your second point about being sure. One way of taking your point is that one should always be ready to admit that one is wrong. As a theoretical option, this is correct. BUT, I doubt very much anyone actually works in this way. Do you really leave open the option that, for example, thinking takes place in the kidneys and not the brain? Is it a live option for you that you see through the ears and hear through the eyes? Is it a live option for you that gravitational attraction is stronger than electromagnetic forces over distances of 2 inches? We may be wrong about everything we have learned, but this is a theoretical possibility, not what in the 17th century was called a moral possibility. Moreover, there is a real downside to keeping too open a mind, which is what genuflecting to this theoretical option can engender. I find refuting flat earthers and climate science denialists a waste of intellectual time and effort. Is it logically possible that they are right? Sure. Is it morally possible? No. Need we open our minds to their possibilities? No. Should we? No. Same IMO with what GGers have found out about language. There are many details I am willing to discuss, but I believe that it is time to stop acting as if the last 60 years of results might one day go up in smoke. That's not being open minded, or if this is what being open minded requires, then so much the worse for being open minded.

      Let me say this another way: there are lots of things I expect to change over the course of the next 25 years of work in linguistics. However, there are many findings that I believe are settled effects. We will not wake up tomorrow and discover that reflexives resist binding or that all unbounded dependencies are created equal. These are now established facts, though there may be some discussion of the limits of their relevance. But they won't all go away. But this is precisely what Hagoort thinks we should do, and on one reading you are suggesting as well. Maybe we are completely wrong! Nope, we aren't. Being open minded to this kind of global skepticism about the state of play is both wrong and debilitating.

      Last point: you are of course aware that your last sentence is a kind of paradox. Is the only thing we can be sure of that we can never be sure of anything? Hmm. As you know better than I do, this is NOT what actually happens in statistical practice. There are all sorts of things that are held to be impossible. In any given model the hypothesis space defines the limits of the probable. What's outside has 0 probability. The real fight, always, is what is possible and what not. Only then does probably mean anything.

    3. Norbert, your blog's comments section does not allow more than 4096 characters, so I posted my response here:

      http://vasishth-statistics.blogspot.de/2015/02/another-comment-on-hornsteins-comments.html

    4. @Shravan
      Ahh, the price of hyperbole is literal confusion. Of course I agree with you that your claim taken literally is absurd. Good to know we agree. As you point out, the question then is what is "obvious"? And here is my point. I am more than willing to be LOCALLY skeptical about many linguistic results. As some might say "mistakes have been made." So sure, local skepticism about this or that conclusion is fine with me. However, there is a reading of Ted's stuff (a reading that Hagoort I believe endorses) that moves from local skepticism to GLOBAL skepticism: all of linguistics is untrustworthy because of the data collection problem. In other words, for Hagoort, there is no there there. This I reject as absurd. And S&A's work speaks to this: it shows that this kind of global skepticism is silly even on Hagoort's own terms.

      Now, once we agree to this, the next question is whether doing data collection more carefully is worth the extra effort. It is an entirely empirical question. And for this my answer is that it depends on the question being asked. For many matters the answer is no. For some yes. Ted suggests that we can get more information out of the data if we are more careful. My question is what new questions this data addresses so that it is worth being more careful. I am willing to do this IF there is an obvious payoff and I don't consider assuaging psychologists' methodological fears to be a very good reason. Take that back: for propaganda reasons maybe we need to be more careful. But for scientific reasons it's less than clear to me that this is so. Our practice has been fine and the results discovered solid.

      Last point: Nothing I said implied that linguists are perfect or that their analyses are right. There is lots of room for data disputes. Interestingly, if I read you right, you were skeptical about Mahajan's data BEFORE you did a rating experiment. Great. So you did more careful work to show you were right. Yippee. And it seems that you even change your mind from time to time. Good for you. So do I, so do other linguists, so does everyone reasonable. But this is not what Hagoort was advocating. His opposition to the field was total, not piecemeal. It is the former that I find ridiculous and that S&A put to bed.

      Let me end with one more observation: you are no doubt aware of the misuse of stats discussion arguing that the results in psych/biomedicine/neuroscience etc are largely bad stats. Imagine someone concludes from this that results based on stats cannot be trusted so that no paper that uses them is scientifically kosher. You (and I) would argue that this global conclusion is nuts. It really depends on the cases. When done well, the stat methods are fine. True, it can be done badly, but who cares. There is nothing wrong with the METHOD, just individual applications thereof. My point exactly. And that was why and how I used the S&A results.

    5. This comment has been removed by the author.

    6. Norbert, I don't think the distance between your ideas and mine is too great. I am with you (and Colin) that every single intuition-based judgement we make to build syntactic theories does not necessarily need an experiment. I mean, even Ted (or Peter) would not say that we need to do an experiment to decide that "boy the came" is unacceptable/ungrammatical in English. (Although I *am* guessing here.)

      In my own case, I started out wanting to be a syntactician, and what happened to me was that the theoretically interesting moments (when one position could be distinguished from another) invariably were about some very delicate judgements. What led me to quit was that whoever had more tenure got to decide what was OK and what was not. I think that Ted and you and Colin would all agree that in such cases, we are better off getting some objective data, although I admit that even that is hard (when we design an experiment, we often unconsciously--or not--bias it to come out in our favor). But it's better than nothing and better than brandishing around one's personal sensitivity to the construction at hand to deliver a judgement. That's how I read Peter's comments.

      Anyway, the reason I jumped into this fray was actually something different. My original point can be stated quite provocatively. Peter wrote: "Do proper experimental research (including the use of inferential statistics) according to the quality standards in the rest of cognitive science." It is not my impression that the quality standards in psycholinguistics and cognitive science are particularly high. I often ask myself why we even bother to do experiments; the results are always going to prove the experimenter right, right? It's a waste of taxpayer money. How often does an experimenter write a paper that says, folks, I got this wrong in my previous paper? So I started out turning away from intuition-based linguistics, and now here I am, ready to turn away from psycholinguistics (not really).

      I think Peter's point (not taken literally) is laudable: we should do more empirically grounded work in linguistics proper, empirical work of the sort I'm talking about above (i.e., where things get tricky and the potential for bias is high).

      What I really wanted to get across is that we can do better than the standard empirically grounded disciplines have done, by making statistics a serious part of the core curriculum of linguistics proper. The reason that many experimental researchers in (psycho)linguistics dismiss statistics as beside the point (by their actions or their words) is that they don't have much contact with what's at stake when you engage in statistical inference. It's very similar to the way that linguists' work is dissed by non-linguists. The less you know, the more contemptuous you are of the unknown. Just like non-linguists just don't understand why syntacticians think they have finely tuned judgements; it takes serious training to come up with accurate judgements. It's the same with statistics (mutatis mutandis).

      Anyway, I just wanted to put this out there because I feel that people just don't acknowledge the problems in psycholx. But I'm not worried; nobody is going to die if one rating comes out one way or another. The same problems are unfolding in medicine, where the consequences are much greater.

    7. @Shravan
      I think we do agree, more or less. We likely have differences of detail rather than differences of principle. However, I believe that you generalize from your rational beliefs to others whose ambitions are far less generous. There really are people who think that Generative Grammar has been a tremendous failure and that there is no reason to consider what has been found as relevant to anything in cogneuro. And one of the expressed reasons for this view is that GG is built on suspect data. This view cannot be allowed to stand, nor, IMO, can even suggestions of this view be allowed to stand. That's why I came out hammer and tongs against Hagoort's (I'm not close enough to call him 'Peter') views.

      I sympathize with your experience wrt subtle judgments. This does occur. However, what often also happens is that over time the data clears up. How? By looking at similar phenomena in other languages where the judgments are much clearer (often due to morphosyntactic indicators of the relevant phenomenon). So, just as in other sciences where the first data is massaged into clarity by later refinements of the experiments, so too in linguistics, much of the time. Also, much of the time, at least where theory has been most successful, the data are pretty clear from the outset and have continued as such. Think Islands, ECP effects, binding, Crossover, control. Here the data has not been particularly fuzzy, though, of course, there are controversies at the margins. IMO, the problem with linguistics has not been a data problem but a theory problem. Much of what we have called theory is pretty low level. Indeed, I think that this is the main problem in cogneuro as a whole. We are data rich and theory poor. And all of this toing and froing about the quality of the data misses the real problem: we have a rather shallow understanding of the relevant cogneuro mechanisms at play. And this, to repeat, is a problem of ideas, not a problem of data.

      Stats for linguists? Sure, why not. Many of our students at UMD already do this. Stats is the kind of thing any educated person today should understand a little of, but largely not to be misled. From the little I see, it is incredibly hard to apply this stuff well. A good amount of the time it is misapplied. And when it is dressed up in fancy notation (something stats is very good at) it becomes harder rather than easier to see how misleading the data claims might be. That said it is a fine technology and sure we should learn how to use it WHEN AND WHERE USEFUL. Recall your example above that you are sure (sort of) that Hagoort would deem unacceptable without stats back up.

      So, I think we are on the same page. I don't think we are ALL on this page. Let's hope that everyone comes to join us very soon. Then we can put this heated yet not particularly deep discussion to rest.

  7. This comment has been removed by the author.

  8. Hi Norbert:

    You raise many points that are worth discussing. One point that I will respond to:

    "Sadly, this is just wrong. There has been a lot of back and forth on these issues over the last five years and it is pretty clear that Gibson and Federenko’s worries are (at best) misplaced. In fact, Jon Sprouse and Diogo Almeida have eviscerated these claims (see here, here, here and here)."

    I am not sure if you have read Gibson & Fedorenko (2013) or Gibson, Piantadosi & Fedorenko (2013):

    http://tedlab.mit.edu/tedlab_website/researchpapers/Gibson_&_Fedorenko_2013_LCP.pdf
    http://tedlab.mit.edu/tedlab_website/researchpapers/Gibson_et_al_2013InPress_LCP.pdf

    (The second one summarizes the first and responds to many of Sprouse & Almeida’s claims directly.)

    Sprouse, Schutze & Almeida (2013) show that in a random sample of 146 LI judgments approximately 90-95% of them are right. Mahowald et al. (submitted; link below) replicate this number on 100 different LI contrasts from the same set of years: about 90-95% right, with 5-10% errors, depending on how conservative one is with deciding on significance levels. Perhaps you think that this shows that not doing quantitative work is ok. On the contrary, we argue that there is a real problem, for the following reasons:

    (a) If you don't do the experiment, you never know which 90-95% are ok. This is a serious problem. This means that some fraction of the critical contrasts for your theories are wrong, but you don’t know which ones. This problem is even more severe if you don't speak the language: then you don’t even have your own intuitions about which judgements are probably right, and which ones might be questionable (or wrong).

    (b) Effect sizes: you get no effect size information without a quantitative experiment. Sprouse et al. and Mahowald et al. show that the effect sizes are clearly in a continuum, from very tiny (and non-existent) to huge. The notion of grammaticality presupposes some threshold (between “grammatical” and “ungrammatical”) that probably isn't there. In real language, the effects are probabilistic. It's impossible to find this out from an armchair experiment, if one presupposes a threshold between grammatical and ungrammatical.

    (c) Relative judgements across many sentence types: without quantitative methods, you can’t compare judgments across experiments. So even if sentence a is better than sentence b, you won’t have judgment data for the comparisons of a and b relative to many other structures, without a quantitative experiment.

    (d) Interactions: there is no way to measure an interaction among multiple factors using a non-quantitative experiment.
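
    To make point (d) concrete, here is a minimal sketch of how an interaction is usually read off a 2x2 design: as a difference of differences between condition means. The ratings below are invented purely for illustration, and the factor labels are hypothetical; nothing here comes from the studies cited above.

```python
def interaction(means: dict) -> float:
    """Difference-of-differences for a 2x2 design.

    `means` maps (factor_a, factor_b) level pairs, each 0 or 1,
    to the mean acceptability rating for that condition."""
    effect_of_b_when_a1 = means[(1, 1)] - means[(1, 0)]
    effect_of_b_when_a0 = means[(0, 1)] - means[(0, 0)]
    return effect_of_b_when_a1 - effect_of_b_when_a0

# Invented mean ratings on a 1-7 scale (hypothetical data).
ratings = {
    (0, 0): 6.0,  # neither factor present
    (0, 1): 5.5,  # factor B only
    (1, 0): 5.0,  # factor A only
    (1, 1): 2.0,  # both factors present
}

# A non-zero value means the two factors do not simply add up.
print(interaction(ratings))  # -2.5
```

    A pairwise "a is better than b" judgment yields only an ordering, so a quantity like this cannot be computed from it; it needs a numeric rating for each of the four conditions.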

    So, I would say that the state of the field is very different from your description (“it is pretty clear that Gibson and Federenko’s worries are (at best) misplaced”). Rather, it seems clear to me (and to many others) that quantitative experiments are clearly useful, and can move the field forward in a productive direction. Indeed, I was at a meeting not long ago attended by Jon Sprouse, Diogo Almeida, Carson Schutze and Colin Phillips (among others), and I think that they all agreed with the points above.

    cont’d in next comment: I seem to be limited to 4096 characters

    Ted Gibson

    1. @Ted
      First off, thx for the comments. I will read the papers and maybe will have something to say down the road. But even without doing so, I have a couple of comments.

      First, the question is not whether these are worth doing, but when and whether not doing them ipso facto invalidates the findings of GG. They are worth doing (I've even done them myself) especially when judgments were dicey. And they have indeed been used productively, IMO (see Sprouse's work on islands and Jurka's on DOCs). The need for more careful looks at the data used to be less frequent. Why? Because the data was pretty clear. Extraction effects from islands vs non-islands are not that hard to judge. The effect sizes are pretty big. Even ranking effect sizes among the islands (e.g. wh-islands more acceptable than Relative clauses) is often pretty easy. That said, there are cases where this is not so clear and then using these methods is a good idea. So, I have nothing against these methods, I just don't see that failure to use them invalidates a claim. And I certainly don't see them not being used heretofore as invalidating GG's basic results. This is often the take away message (one that Hagoort, for example, seemed attracted to), and this I believe is an invalid conclusion.

      There is now a second question: going forward should we all adopt these methods? Here I am again unconvinced. Again it depends on the claims being made and the "subtlety" (a technical weasel word in linguistics) of the data being deployed. I'm very pragmatic here; you measure what needs measuring. Would I reject a paper that didn't measure the "right" way? No. Would I hold it against one that did? No. It really depends.

      That said, I completely agree that such methods have been useful and can be used to "move the field forward in a productive direction." Where I am less convinced is that they are necessary for productive work, and whether they should become the standard in linguistics. Personally, I don't think we have yet squeezed all the juice from the messier techniques.

      Last point: You say:

      "The notion of grammaticality presupposes some threshold (between “grammatical” and “ungrammatical”) that probably isn't there. In real language, the effects are probabilistic. It's impossible to find this out from an armchair experiment, if one presupposes a threshold between grammatical and ungrammatical."

      So far as I know, none of these experiments can measure grammaticality. They at best measure acceptability. Grammaticality is a function of many things beyond acceptability. So, grammaticality can be categorical without acceptability being so. Whatever one does with the new or the old techniques will still require factoring out grammaticality and this will necessarily be an inferential process as grammaticality is an unobservable. We infer facts about grammaticality from judgments of acceptability, but they are different things. So, the fact that there is a continuum in judgments is well known and has been so for a long time. From what I understand, the newer methods won't change this as they are more careful measures of the very same quantity: acceptability.

      Again, thx for the feedback. I will look at the papers you sent me.

  9. …cont’d from previous comment

    (Colin makes it clear in his comments here and elsewhere that he thinks that there are more important issues to be addressed in the field of linguistics. This may be so. But that seems orthogonal to the point that we are making. If we know how to solve this problem, then we should solve it. I agree with Colin that *just* adopting quantitative data analysis standards may not make a broader field more interested in the particular questions that one addresses (such as the questions that Colin feels he and his colleagues are addressing). But adopting more quantitative standards at least gives us all a common language to discuss whatever results we want, and it therefore makes it possible for people in other fields to pay attention to these questions. It is my hope that adopting more quantitative standards will make it clearer what the more important effects are, which more people should care about.)

    One further note on this issue:

    Colin and others have said that one reason that people may not want to use quantitative methods is that it may be too expensive: perhaps it’s hard to get a lot of native speakers for the language in question. As a result, Kyle Mahowald et al. worked out the math to see how many people one needs to be pretty sure that a difference between two constructions is a real difference. Assuming the same distribution of judgments across the 100 judgments that we sampled from Linguistic Inquiry, then the answer is about 7: one needs about 7 people to agree unanimously on the judgment (with one different version of the contrast given to each person). If you do that “mini-experiment” (which we call a “SNAP judgment”), then you can be reasonably confident that the contrast is real. If the judgments aren’t unanimous, then you probably need to do a full-blown experiment to test your claim.
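
    A rough way to see why a small unanimous sample can carry weight is sketched below. This is only a back-of-the-envelope binomial illustration, not the calculation in the SNAP paper, which also draws on the distribution of judgments sampled from Linguistic Inquiry. It assumes, hypothetically, that if a contrast were not real then each participant's forced choice would be an independent coin flip; the function name is made up for this sketch.

```python
from fractions import Fraction


def unanimous_probability_under_null(n_participants: int) -> Fraction:
    """Chance that n independent coin-flip choices all favor the predicted sentence.

    Assumption (not from the paper): with no real contrast, each participant
    picks either sentence with probability 1/2, independently of the others.
    """
    return Fraction(1, 2) ** n_participants


if __name__ == "__main__":
    for n in (3, 5, 7):
        p = unanimous_probability_under_null(n)
        print(f"{n} unanimous judgments by chance: {p} (about {float(p):.4f})")
```

    Under that coin-flip assumption, seven unanimous choices would arise by chance less than 1% of the time (1/128), which gives some intuition for why a number in that range could be enough.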

    Here’s a draft of this paper:
    http://web.mit.edu/kylemaho/www/SNAP.pdf

    Note that doing these mini-experiments solves problem 1 above but not problems 2, 3 or 4: you still don’t get effect sizes, relative judgments across sentence types or interactions. So this is not ideal. But it’s a step forward.

    Ted Gibson (egibson@mit.edu)

    1. @Ted:
      "But adopting more quantitative standards at least gives us all a common language to discuss whatever results we want, and it therefore makes it possible for people in other fields to pay attention to these questions. It is my hope that adopting more quantitative standards will make it clearer what the more important effects are, which more people should care about."

      Color me very skeptical. What people in other fields should be paying attention to is not a presentation of the acceptability judgments but the generalizations that have been offered and the principles that have been unearthed. Like I've often said, I see no reason to believe that these are not more or less right as described. There are island effects, there are binding effects, there are ECP effects, phrase structure exists, fixed subject constraints hold, etc. I have seen nothing that invalidates these broad conclusions. Indeed, I would go further (though many others might not): were we to find that careful experimental techniques showed us that there are no e.g. ECP effects, then this would indicate that these methods are untrustworthy. It would be a little like finding out that careful experimental procedures showed that the Müller-Lyer illusion is bogus. If that were to happen, we would go back and examine the techniques, not the effect. I believe that there are many results in linguistics like this.

      Moreover, these results are regularly reconfirmed experimentally. Colin and Jeff have to satisfy journals that prize such techniques. They thus run experiments to show that these judgments hold. To my knowledge, they almost always succeed. So, in addition to Jon's stuff, I get daily confirmation that the informal techniques largely pan out. My conclusion: it's not the lack of rigor in the data that is the root cause of people in other fields ignoring this work.

      I do have a question: I assume by effect sizes you mean how reliable the judgment is. I was led to believe that the methods being used are all ordinal, not cardinal. As I recall, Jon had a discussion of this in the thesis (or someplace). So, to know how A compares with B we need to compare A and B directly. Even comparing A and B both to a common C does not necessarily tell us how A and B will compare to one another. This is why 7 point scales are about as good as magnitude estimation studies (or so I've been led to believe). Have I been misled?

    2. Hi Norbert:

      1. My experience with checking the judgments of hypotheses in the field is not as positive as Colin's, but probably that is because we have explored different phenomena. I have always learned something from doing the experiment more rigorously, such as how big the effect is, and how it compares with other related effects. I agree that "informal techniques largely pan out". But as I commented above, if the judgment is one of the 5-10% that doesn’t, that’s very informative. (And without doing the experiment, one can’t know this.)

      2. “it's not the lack of rigor in the data that is the root cause of people in other fields ignoring this work.” I don’t think I have made the claim that lack of rigor in the data is the root cause of other fields ignoring work. What Ev and Steve and I had been saying in our work was that this is one area that is pretty easily fixable, and would help (a little) to connect linguistics work back to cognitive science more generally.

      If everyone did more quantitative work, we would get the following benefits:

      (a) We would find out whether a proposed effect is real or not. This is the issue that you and Sprouse and Phillips et al have been discussing, and that you doubt is of much value to the field. But there are many other benefits to doing quantitative work. See below.

      (b) We would find out how big the effects are. The field as a whole will probably want to explain the bigger effects before we explain the smaller ones, so this is valuable information.

      (c) We would find out how some effects relate to other effects.

      Part 1: maybe there is an acceptability difference between sentence type 1 and sentence type 2, but maybe both are much less acceptable than sentence types 3 and 4. Our theories need to explain this kind of result.

      Part 2: perhaps an acceptability difference between sentence types 1 and 2 is proposed to correlate with an acceptability difference between sentence types 3 and 4. E.g., long-distance-dependency formation is supposed to underlie both wh-questions and relative clauses. How do we measure this proposed correlation without a quantitative experiment? I don’t think you can.

      3. “I do have a question: I assume by effect sizes you mean how reliable the judgment is.”

      No: you can have very reliable effects that are tiny. You can be really sure that they are there by getting more data (more participants and items). And you can have a huge effect which you are not confident in, because of lots of variance in the measure.

      For our purposes here, effect size is roughly how far apart two means are (ignoring the variance): http://en.wikipedia.org/wiki/Effect_size
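
      The distinction between effect size and reliability can be made concrete with a small sketch. All the ratings below are made-up numbers chosen for illustration, and the helper names are hypothetical. The raw difference in means stands in for effect size in the rough sense used above, and a Welch t statistic stands in for how confident one can be that the difference is there.

```python
from math import sqrt
from statistics import mean, variance


def mean_difference(a, b):
    """Effect size in the rough sense used here: how far apart the two means are."""
    return mean(b) - mean(a)


def welch_t(a, b):
    """Welch t statistic: the mean difference scaled by its standard error."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return mean_difference(a, b) / standard_error


# Hypothetical ratings: a tiny difference measured with many consistent data points.
tiny_a = [4.0, 4.1] * 50
tiny_b = [4.1, 4.2] * 50

# Hypothetical ratings: a large difference measured with few, highly variable data points.
big_a = [1.0, 7.0, 2.0, 6.0]
big_b = [3.0, 7.0, 7.0, 5.0]

if __name__ == "__main__":
    for label, a, b in (("tiny but reliable", tiny_a, tiny_b),
                        ("big but noisy", big_a, big_b)):
        print(f"{label}: difference = {mean_difference(a, b):.2f}, "
              f"t = {welch_t(a, b):.2f}")
```

      In this toy data the second contrast has the larger difference in means but the smaller t statistic, which is the situation described above: a big effect that one cannot yet be confident in.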

      4. Regarding different methods for gathering data, I don’t think that any one method is better or worse overall. It depends on the question you are asking. For acceptability judgment data, I think that Likert scales (e.g., 1-5, 1-7) or magnitude estimation or forced choice are all about the same. (See e.g., some of Jon’s work, or Weskott & Fanselow (2009), referenced in Gibson & Fedorenko, 2013.) It’s hard to say one is “better” than the others. Any of these is good to get somewhere.

    3. @Ted. I'm not sure that you directly say "root cause", but I think that it's remarks like the following that people have taken to amount to that claim, since you do talk about cause and consequence:
      "allowing papers to be published [...] that don’t adhere to high methodological standards has the undesirable effect that the results presented in these papers will be ignored by researchers with stronger methodological standards because these results are based on an invalid methodology. If the standards are strengthened, then researchers with higher methodological standards are more likely to pay attention, and the field will progress faster. Relatedly, researchers from neighbouring fields such as psychology, cognitive neuroscience, and computer science are more likely to pay attention to the claims and the results if they are based on data gathered in a rigorous way. [...] Indeed, we would argue that the use of the traditional nonquantitative methodology has in fact created a crisis in the field of syntax. Evidence of this crisis is that researchers in other fields of cognitive science no longer follow many developments in syntactic research." [Gibson & Fedorenko 2013]

    4. ... but setting that aside, the part of Ted's comment that bears emphasizing is the first one. It's true that he and I have drawn different conclusions from our hands-on experience, and getting to the bottom of why that is would be valuable. Ted is right that our experiences are based mostly on distinct samples of phenomena. Our rating studies almost always focus on phenomena that we're preparing to test for other reasons, typically reading-time or ERP studies. And we wouldn't bother designing those studies if we weren't already confident of the phenomena. (Sometimes they're phenomena that linguists themselves have reservations about, e.g., parasitic gaps; but I wouldn't embark on the study if I wasn't confident.) Some of Ted's studies that he cites most often come from a starting point where he was skeptical of a claimed acceptability fact, and his intuitions were corroborated. Sometimes these involve cases where prosody, focus, or interpretation might matter. Meanwhile, in the Sprouse & Almeida studies, sentences whose status depends on meaning or prosody are systematically excluded, so those look at another set of phenomena again.

      Another dimension that might be relevant is how and why the materials are created. For our studies, we're generally preparing for an online study and so we go to great lengths to ensure the naturalness/plausibility of the sentences, to minimize the risk of irrelevant disruptions. That takes a lot of time, and it is undertaken by the lead researcher (typically a PhD student), sometimes with a single well-trained RA, and generally with ad nauseam critiquing of the materials by me. Perhaps overkill for an acceptability study, but valuable for our purposes. Materials used in other studies are prepared in different ways with different goals in mind. E.g., in the new study that Ted links to somewhere in this thread, the materials creation was largely farmed out to 16 undergraduate students as a course assignment. They were asked to take a base sentence and then change lexical items to create new versions. Ted comments that he always learns something from doing the experiments. I would *partly* agree. My experience is that I almost always learn a good deal from the process of materials creation, but then I typically learn little more from the data collection, because the lessons were generally learned beforehand. It sounds like Ted is saying that he learns more at the data collection stage. At that stage, I feel like I mostly learn about ways that the task can be misunderstood, but I can see how we could have different experiences at that stage.

  10. Norbert, you said a lot in your blog. I feel it’s worth responding to at least one other comment of yours (beyond the stuff about quantitative methods, which I clearly disagree with you on).

    You say:
    “However, despite some changes in theory, GG revolutions have preserved most of the structures of the ancient regimes. … For the purposes of most of what goes on in cogneuro, … it really doesn’t matter whether you adopt the latest technology or the newest fangled concepts. You can do good work in all of these areas using GB models, LFG models, HPSG and GPSG models, Aspects models, and RG models. For many (most?) of the types of questions being posed in these domains all these models describe things in effectively the same way, make essentially the same distinctions and adopt more or less the same technology.”

    I think that the field is more complex than you describe. Whereas there may be a lot of similarities among the work done by people in different syntactic frameworks, this is not obvious to non-linguists. Indeed, it seems like there are intense debates among camps, such that it is difficult to see one unified theme coming from all of them.  Perhaps the themes are there, but they are not well described in any recent summary review articles that I am aware of.  In contrast, I understand the field as having a lot of disagreements about what the basic building blocks for grammatical structure are.

    For example, suppose I were to take introductory linguistics at MIT. Then I would learn something like minimalist syntax, using a textbook like Adger or Haegeman. But suppose I were to take a similar course at Stanford or Berkeley. I would learn some very different syntactic framework (HPSG), using Sag, Wasow & Bender’s text. If I learn syntax in Princeton, then maybe I would learn Construction Grammar, using Goldberg’s books.

    Maybe what you mean is that there are lots of surface-level phenomena that have been revealed by linguists of all persuasions. E.g., many of the English observations can be found in Huddleston & Pullum (2002):

    http://www.amazon.com/The-Cambridge-Grammar-English-Language/dp/0521431468

    But beyond this, there may not be much agreement among linguists. One big difference among frameworks is whether there exists “underlying” syntactic structure. Among constructionist approaches that do not appeal to underlying structures, there may be a lot of agreement, but I don’t think that that’s what you’re referring to.

    One other commenter suggested that this is a “public relations” (PR) problem that linguistics has. It is at least such a problem. If / when anyone writes some summary articles describing the basics, then maybe we can see if it’s just a PR problem, or something deeper.

    1. @Ted: the agreement goes beyond the things that one finds in a descriptive grammar. Pretty much everybody agrees that there is (i) an encoding of thematic relations, roughly who did what to whom; (ii) an encoding of grammatical relations, e.g., subject/object, topic/comment; (iii) an encoding of scope relations for operators such as interrogatives or quantifiers. And pretty much everybody agrees that these different encodings mutually constrain in some fashion, i.e., there are constraints on the relations between these encodings. The 1970s/1980s fights were over how to capture those different encodings and how to capture the constraints on the relations between them. Beyond that, there are a bunch of notational differences, differences in taste about the level of explicitness, and differences in emphasis on what types of problem folks choose to focus on. These are often dressed up as if they are fundamentally different accounts of the same phenomena, but I think that's rarely the case. But I entirely sympathize with the worry that this isn't obvious to observers.

      Re: attempts to lay out some points of general consensus. Here is a recent attempt in that vein by David Adger. (It's the most downloaded article on LingBuzz of the past 6 months, incidentally.)
      http://ling.auf.net/lingbuzz/002243
      And around 10 years ago I tried something similar in the context of a cogsci handbook.
      http://ling.umd.edu/~colin/wordpress/wp-content/uploads/2014/08/phillips2003-syntax.pdf

    2. Yes, I do believe that there are many commonalities among the various "theories." When you plunge into the details of most of the different approaches they not only recognize the same "effects" (i.e. G generalizations) but even analyze these in more or less the same ways. In fact, they tend to be quite intertranslatable with the principles of one carrying over smoothly into those of the other. A famous case of this was RG and GB, with Burzio and Baker leading raids into RG world and bringing the generalizations stated in RG terms back into GB. I have always found the discussions of long distance dependencies in the GB and H/GPSG debates to be effectively notational variants of one another. What I do with successive cyclic movement they do with slash categories. Where I have an island, they have an illicit slash. This does not mean that these theories are in all respects identical, but they exploit very similar techniques of analysis in many of the interesting cases. I should add that Ivan Sag and I agreed about this (perhaps surprisingly), despite our working in very different "frameworks." More exactly, we agreed on a whole bunch of common properties that we thought Gs had, though we would code them in what appear to be superficially distinct ways.

      The exception here may be construction grammar. There are some versions of this that are fully compatible with standard theory. But there have been some claims, e.g. by Goldberg, that this is a whole new understanding of things. I think she is wrong about this. But this is not the place to argue that. So, maybe I should put construction grammar to one side. Maybe this is a reason not to learn your syntax at Princeton.

      One last point: the agreement is on surface patterns. No, not really. The agreements go deeper. Many of the analyses even agree on the kinds of principles involved, the relevant locality conditions, the relevant hierarchy conditions etc. There is often some reshuffling between various components of the grammar, but when one looks at the details, this is far from clear. A witness to this was the debates about the psychological reality of traces that Colin mentioned above. The indisputable fact is that there is a dependency that one finds between antecedents and the predicates that thematically mark them/they are arguments of. Are traces required to make this happen? No. Why? Because the very same relation can be coded in other frameworks without traces. In fact, I know of virtually no big effect that can ONLY be coded in one of the known frameworks. And believe me, we would have heard about this were one to exist. The descriptive apparatus of the various "theories" is actually too rich, each able to code what the others have discovered. This speaks to their being less different theories than different ways of coding the same basic concepts.

    3. Wow, thanks for the articles. I will read these. They look very informative.

  11. I have to say, Hagoort's essay is one frustrating read. Not in a vicious, ranting way --- it is indeed very polite and free of underhanded accusations, and I'm sure it is well-intended. But the points he makes have been discussed many times and shown to be misguided; you would think that at some point that would have finally sunk in.

    Here are the points I found particularly flummoxing:

    1) The human brain doesn't differ much from animal brains on an architectural level, so it cannot support representations
    Let's apply this line of reasoning to the one computational device we understand really well: computers. An Intel Haswell does not differ much from a Pentium Pro, they're both i686 architectures, so according to Hagoort the fact that the latter does not support complex features like hardware virtualization or C6 states should imply that the former does not, either. Except that it does.

    And even if the hardware is exactly the same, why would that matter? Computation isn't about hardware, it's about software. That's why true progress in computer science is tied to the study of algorithms and data structures. If you want to work with really large lists, you don't need faster hardware, you need to move from something like linear search to binary search, and you'd better store that list as a prefix tree if the items in your list have significant structural overlap. That's what makes it possible to deal with harder problems, not throwing in a few more transistors.
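
    To make the point about algorithms and data structures concrete, here is a minimal sketch of the three ideas mentioned: linear search, binary search, and a prefix tree. The function names and the word list are invented for illustration, and the prefix tree is a bare-bones nested-dictionary version that assumes no word contains the `$` marker character.

```python
from bisect import bisect_left


def linear_search(items, target):
    """Check every item in turn; the work grows in proportion to the list length."""
    for index, item in enumerate(items):
        if item == target:
            return index
    return -1


def binary_search(sorted_items, target):
    """Repeatedly halve a sorted list; the work grows with the logarithm of its length."""
    index = bisect_left(sorted_items, target)
    if index < len(sorted_items) and sorted_items[index] == target:
        return index
    return -1


def build_prefix_tree(words):
    """Store words as nested dicts so that a shared prefix is stored only once."""
    root = {}
    for word in words:
        node = root
        for char in word:
            node = node.setdefault(char, {})
        node["$"] = True  # end-of-word marker (assumes "$" never occurs in a word)
    return root


def in_prefix_tree(root, word):
    """Follow one branch per character; the work depends on the word length only."""
    node = root
    for char in word:
        if char not in node:
            return False
        node = node[char]
    return "$" in node


words = sorted(["walk", "walked", "walker", "walking", "talk"])
tree = build_prefix_tree(words)

if __name__ == "__main__":
    print(linear_search(words, "walker"), binary_search(words, "walker"))
    print(in_prefix_tree(tree, "walked"), in_prefix_tree(tree, "wal"))
```

    All three run on the same hardware; what changes is how much work each lookup needs as the list grows.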

    2) A general unawareness of Marr's levels
    This is basically just a follow-up to the previous point, but you would think that neuroscientists are aware of Marr's three levels of description and what they entail on an epistemic level. There is no need for representations to have a direct analog on the algorithmic or hardware level. In general, the mapping is one-to-many: a set can be instantiated as a list, a hash table, or a tree, for instance, and how that is translated into hardware also differs significantly between computers depending on their architecture (x86, ARM, RISC, ia64, and so on).
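
    The one-to-many mapping can be illustrated with a small sketch in which the same abstract object, a set with `add` and membership testing, is realized by three different data structures. The class names are hypothetical and the implementations are deliberately minimal (the tree is an unbalanced binary search tree).

```python
class ListSet:
    """A set stored as a plain list; membership scans the whole list."""

    def __init__(self):
        self._items = []

    def add(self, value):
        if value not in self._items:
            self._items.append(value)

    def __contains__(self, value):
        return value in self._items


class HashSet:
    """A set stored as the keys of a hash table (a Python dict)."""

    def __init__(self):
        self._table = {}

    def add(self, value):
        self._table[value] = True

    def __contains__(self, value):
        return value in self._table


class TreeSet:
    """A set stored as an unbalanced binary search tree of [value, left, right] nodes."""

    def __init__(self):
        self._root = None

    def add(self, value):
        if self._root is None:
            self._root = [value, None, None]
            return
        node = self._root
        while value != node[0]:
            side = 1 if value < node[0] else 2
            if node[side] is None:
                node[side] = [value, None, None]
                return
            node = node[side]

    def __contains__(self, value):
        node = self._root
        while node is not None:
            if value == node[0]:
                return True
            node = node[1] if value < node[0] else node[2]
        return False


def membership_profile(set_class):
    """Run the same abstract operations against one concrete implementation."""
    s = set_class()
    for value in (3, 1, 2, 3):
        s.add(value)
    return [value in s for value in (1, 2, 3, 5)]


if __name__ == "__main__":
    for set_class in (ListSet, HashSet, TreeSet):
        print(set_class.__name__, membership_profile(set_class))
```

    Seen only through `add` and `in`, the three are indistinguishable, so a description at that level holds for all of them even though nothing in their internals lines up.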

    Computer scientists completely abstract away from the actual hardware, and for good reasons:

    a) the hardware level is unnecessarily complex for the questions being studied,
    b) since hardware can differ a lot, important generalizations can only be formulated at higher levels of abstraction
    c) the hardware level is very uninformative --- you won't figure out how a program works by measuring the electric potentials in your computer (even tasks that are very close to the hardware like reverse-engineering a driver don't work that way).

    1. [cont]
      3) Usage of corpora
      Useful corpora of a decent size exist only for a handful of languages. And you would need a giant corpus to study infrequent phenomena like the Person Case Constraint, where you can only use sentences with two clitics in a specific order and you have 512 possible combinations to check. I'm also sure that current corpora offer little insight on split ergativity, or resolved agreement in Bantu languages. So should linguists stop working on these phenomena because there are no corpora? If not, why is it okay to use methods that aren't corpus-based for, say, Bantu languages, but not the much better understood Indo-European languages like English, French, and German?

      4) The "me first" attitude
      The entire post is written like [facetious]a guide to how linguists can better serve their neuroscience masters[/facetious]. But what is the incentive for linguists to work on these things, beyond the institutional advantages of money and influence? What research problem that a single generative linguist might be working on can profit from collaboration with neuroscientists? The majority of linguists work on specific languages or phenomena, they are not pooling all their resources into ambitious big picture projects. Neuroscience is important for the latter, but it has much less to say about the former at its current state of development. And depending on how strictly you interpret Marr's levels, it may never have much to say about these specific issues.

      5) Linguists should provide language-specific information rather than top-heavy theory
      In other words, linguists should be cataloging facts like butterfly collections, rather than think about grammar. The one thing I tell a layman when they ask me about linguistics is that it is about language, not languages. I am sure that is the official position across all generative traditions, and it has been like this since the 50s. But it seems that 60 years later this position is still considered useless at best in other fields. Good to know where we stand.

  12. I think it's worth emphasizing the point that we shouldn't always assume that a divergence between the results obtained via (i) formal experiments with undergraduate subjects and (ii) informal experiments with linguist subjects indicates that the former are more reliable than the latter. Some entirely incontrovertible judgments are not obtainable using method (i). For example, no-one seriously doubts that there could be no verb KILL such that "John KILLed Mary" means "John and Mary both got killed", but you'd have a hard time getting that judgment out of linguistically naive subjects.

    1. @Alex: "For example, no-one seriously doubts that there could be no verb KILL such that "John KILLed Mary" means "John and Mary both got killed", but you'd have a hard time getting that judgment out of linguistically naive subjects."

      This is not an acceptability judgment comparing one sentence type to another. This is a theoretical inference that one might make, given knowledge of how verbs work within and across languages. Proponents of quantitative research are not asking experimental participants to interpret evidence, only to provide the evidence.

    3. @Ted. It is a judgment, not a theoretical inference. Whether or not we classify it an “acceptability judgment”, it is in any case the kind of judgment that linguists often make use of and collect informally.

      What you are pointing out is that there might be other (much more laborious) means of reaching the same conclusion using other data. But certainly English speakers have the judgment that there could be no such verb in English. (I personally have no knowledge of how verbs work within or across languages, but I still have the judgment.)

    4. @Alex. You may call it a judgment, as there is no formal theory behind it, but it is not the kind of thing that we are building our theories to try to describe, just like we are not interested in trying to describe the intuitions people used to have about which structure was the right one for a particular sentence.
      Note, furthermore, that it is a funny kind of intuition, because you will certainly agree that it can be wrong. (What if, heaven forbid, we find such a verb?) In this, it differs from acceptability judgments (and the other judgments we are trying to explain with our theories).
      While there may be some interesting theory governing judgments of this sort, I see no reason to believe that it has anything to do with the theories governing acceptability judgments.

    5. @Greg. It seems to me that it is the kind of thing that we are building our theories to describe. Surely we would not want a theory of argument structure to predict that such a verb should exist. In fact, as you probably know, there was quite a significant literature on "impossible verbs" in the 70s in relation to lexical decomposition hypotheses. The judgment at issue is really no different in kind to, e.g. the judgment that [sklplflt] is not a possible word in English.

      I agree of course that the intuition could turn out to be wrong, but that is just a reason to treat it with some appropriate degree of caution, not to ignore it altogether. (And in fact these sorts of intuitions regarding possible and impossible verbs seem to be fairly reliable.)

      >I see no reason to believe that it has anything to do with the theories governing acceptability judgments.

      I find this a very odd statement. Surely we would expect the judgment to be accounted for by some theory of argument structure (or the like) which would also account for lots of instances of what we would uncontroversially term acceptability judgments.

    6. @Alex. Your judgment that there is this impossible verb is not something any theory should account for. (I claim.) We agree however that a theory maybe should account for the apparent fact that such a verb is impossible.

    7. Yes, I see that, but you haven't really given any reason for dismissing judgments of that sort except that they might be wrong. From my point of view that concern applies to any acceptability judgment, since acceptability is never a 100% reliable indicator of grammaticality. Is the claim that we somehow know in advance that impossible verb judgments ought to be ignored? Or that we've discovered by trial and error that paying attention to them is not theoretically fruitful?

    8. I'm thinking I must be misunderstanding you somehow.

      We are trying to develop a theory which explains people's linguistic behaviour (ceteris paribus). Surprisingly, working with acceptability judgments seems to help us do this, despite the fact that we have a poor linking theory between competence theory and acceptability. (Clark and Lappin are doing interesting things in this regard.) A less surprising sort of judgment is semantic in nature, as part of the behaviour which seems at the core of linguistic behaviour is that expressions mean things (whatever that means).

      I am claiming that there is no reason to think that trying to model people's intuitions about whether languages exist which have lexicalized particular meanings is going to be helpful in this regard.

      Now, maybe this is not what you meant. Looking back, I see that you put this differently; you said that you have a judgement about whether X is a possible word of English, and likened this to judgments of phonotactic well-formedness. If it turned out that such judgments were robust like phonotactic judgments, it would be a reasonable thing to do to try and model them. And if it turned out that the same model that we used to model acceptability judgments were useful here, that would be great.

    9. Aha! Yes, I was only making the non-insane claim that the intuitions of an English speaker regarding possible and impossible English verbs might be worth paying attention to. (It seems we do not fundamentally disagree on that point.) I didn't make that explicit in my first post, however, so sorry for the confusion.

    10. This comment has been removed by the author.

  13. Norbert, I can see where you're coming from in this post, and I'm a regular user of both generative grammars and intuition data myself. But a brief comment on the rhetoric: I don't know if your intention is to galvanize linguists who already adopt the theoretical concepts and methods you're defending, or to persuade others who are skeptical of their value. If it's the former, ok. If the latter, then describing controlled experiments and statistical analysis as "obsessive" and "anal retentive" is extremely counter-productive. Even if you're right that Sprouse & Almeida showed once and for all that this is rarely necessary – a claim that could certainly be debated by thoughtful people on both sides – talking this way is a guaranteed way to ensure that your opinion will be dismissed out-of-hand by people who come from disciplines where the design, conduct, and statistical analysis of controlled experiments is considered one of the most basic aspects of the scientific method.

    Replies
    1. @Dan

      " Even if you're right that Sprouse & Almeida showed once and for all that this is rarely necessary – a claim that could certainly be debated by thoughtful people on both sides – talking this way is a guaranteed way to ensure that your opinion will be dismissed out-of-hand by people who come from disciplines where the design, conduct, and statistical analysis of controlled experiments is considered one of the most basic aspects of the scientific method."

      I doubt that it is the harsh rhetoric that will put them off, but you may be right. S&A's value lies in showing that a certain kind of global skepticism about linguistic data is misplaced and ill-founded. The stats-inclined will reject this on hygienic grounds. Psychologists are trained to understand data as stat-massaged experiments. They will reflexively be drawn to the Hagoort point without thinking. The only way to get someone like this into thinking differently is to shock their sensibilities. Or that's what I think. So, the way I put things may make people angry, but it will likely get them to pay attention. Politeness and concessiveness are guaranteed to lead nowhere. So, who am I addressing? Well, given that I doubt many psycho types read FoL, my main audience is linguists and some psycho-ling types. But if some errant psychologist or neuro type does read this, then I hope that what I say will shock their sensibilities. Not because what I say is wrong, but because the way I say it will grab their attention and make them want to fight back. Once engaged, we have a chance of airing the issues. Until engaged, the predominant culture prevails, even if there is no intellectual justification for it. Hope this helps.

    2. "The only way to get someone like this into thinking differently is to shock their sensibilities."
      If it were me I'd ask the Dr. Phil question: How has this been working for you so far? [e.g. how many psychologists or neuro types have you shocked into paying attention to you?] If you do not like the status quo [and judging by your massive effort creating and maintaining this blog you don't] then maybe, just maybe, it might be worth your while trying some less off-putting approach. But, hey, no need to listen to a type-writing monkey :)

    3. @Norbert I thought that might be your strategy. Like Cristina, I fear that it may be having the opposite effect of what you intend: reinforcing the perception that linguists are methodologically backward. Even supposing that you're right about the 'correct' methodology (and supposing that that description makes sense), I think you're losing the rhetorical battle when you approach it this way.

    4. @Dan
      You may be right. However, it's not like the other concessive strategy has made much headway. One way of thinking of my strategy is that I define a pole that more reasonable arguers can distance themselves from. So in my own immodest way I help the reasonable gain a hearing and make a case. Look, I'm no Norbert, he goes too far, but….

      Let me be a bit more serious for a moment. There is no reasonable opposing position on the question that S&A address: to wit, whether wholesale skepticism about linguistics is reasonable due to flawed methods of data collection. This GLOBAL skepticism is dumb and cannot be argued against, only ridiculed. The reasonable position, moreover, is not contentious. Sure, there are more and less careful methods of data collection, and you use the more careful ones when they are useful to use. Ok, with that pablum out of the way, the argument proceeds case by case. This is a truism, not an insight.

      However, there is a push by certain quarters of the cogneuro world to think that their standards of investigation are the gold standard and anything not conforming to them is crap. I believe that this has roots in two related conceptions. First, it is based on the idea that the reason current work in cog-neuro is not so insightful is that we just don't have the right data. IMO, this is exactly backwards: what we don't have are the right questions, because we have such poor theories. Cog-neuro is data rich and conceptually poor. However, what these guys have learned how to do is run experiments. Many of these are pointless, IMO, but they are always elegantly crafted. This is what they want to export to linguistics. We can become just as intellectually barren as they are. CAVEAT: this does not apply to everything in cog-neuro, but it also does not apply to nothing; indeed, it applies to quite a bit. So, I think that the obsession with data collection (and yes I think it is an obsession) is badly misplaced.

      Furthermore, the roots of the obsession are in a very bad idea of how science works (there is no method, no standards, no inviolable methodological principles…). I happen to believe that this vision stems from a certain philosophical conception (Empiricism, as if I had to tell you) and that this is a very bad way to think of things. It is also mainly there for bullying purposes. I don't know about you, but often when I am told about the right way to collect data there is a kind of condescending sneer to the tone of the instruction. And I don't like it. But more importantly, it is just wrong-headed. However, the pious tone arises from what I consider to be a deep misunderstanding. And here is what I fear: being concessive and reasonable means buying into this general picture, and doing this already gives the game away.

      So, yes I am shrill and yes I am pugnacious, at least about certain issues. However, given that the other approach has not, from what I can see, been particularly successful, it is not counter-productive. Moreover, it is based on a set of different tastes as regards the scientific enterprise. Disputes about taste are the only ones really worth having, IMO. But they are seldom gentle. That's at least how I see things. But thx for the comment.

  14. It is unfortunate that most of the discussion here has focused on the reliability of acceptability judgments. We can't know for sure, but my hunch is that cognitive neuroscientists wouldn't suddenly express deep interest in minimalist syntax if all of the judgments in syntax papers had "p < 0.05" next to them. In my opinion the biggest gulf between theoretical linguists and cognitive scientists (or computer scientists) is the evaluation metric that's used to decide between competing representations. Some decisions seem to be based on aesthetic principles (e.g. dislike of traces, dislike of functional heads, preference for binary branching), rather than empirical arguments; it's often unclear whether two theories even make any different testable predictions for the kind of data the cognitive scientists or computer scientists care about. People would get interested if grammars with Larsonian VP-shells predicted reading times better than grammars without them, or improved parser accuracy, or accounted for some commonly accepted and quantifiable set of syntactic judgments. Even when some predictions can be wrested out of the theories and tested, it's unclear whether the results of those tests ever feed back into syntactic theory. John Hale's work and the papers that Colin mentioned are great examples of attempts to derive empirical predictions from representational theories, but they're the exception rather than the norm. As Tim pointed out, there are questions about the linking function between the linguists' representations and the empirical data; given the complexity of contemporary syntactic theories, the only realistic way to get scientists outside the field interested is for linguists to do the work of trying to solve these problems and show them that the representations they care about are useful.

    Replies
    1. @Dan:
      I largely agree with your conclusions. Let me elaborate a touch.

      " it's often unclear whether two theories even make any different testable predictions for the kind of data the cognitive scientists or computer scientists care about."

      Correct. I've said this repeatedly. Take ANY of the standard GG models and over a very large range (in fact, over almost all phenomena of interest to cog-neuro types), any model will serve.

      "People would get interested if grammars with Larsonian VP-shells predicted reading times better than grammars without them, or improved parser accuracy, or accounted for some commonly accepted and quantifiable set of syntactic judgments."

      Right. The problem with getting linguistic ideas to give you use-of-grammar data is that the linking hypotheses are so weak. We used to have the DTC, but it was abandoned (perhaps very hastily: Colin and Alec Marantz argue that the arguments against the DTC were pretty bad and that this is really the central assumption still). But after that was disposed of, the conclusion was that linking grammatical knowledge to manifestations of use of that knowledge is impossible. This strikes me as entirely too pessimistic. At least in child acquisition and processing there are some models exploiting grammatical knowledge of pretty much the GG variety that are making interesting headway (Colin and Jeff do this all the time in my own dept). Things are rougher in neuro, but even here some recent work is intriguing. I've discussed Dehaene's experiments on Merge, but there is also some new stuff by Poeppel that tries to find the right linking theories. What is confusing to me is why people think that it is linguists' responsibility to provide the linking theory. Not that I wouldn't like to provide this were I able, but I have lots of evidence for ling theory independent of being able to provide a linking theory to OTHER kinds of data. I thought that this is what psycho types should be working on: how do memory, attention, etc. interact with linguistic knowledge to produce the behavioral outputs we see in real time?

      "Even when some predictions can be wrested out of the theories and tested, it's unclear whether the results of those tests ever feed back into syntactic theory"

      Yes: the reason is the weak linking theories. They are always, IMO, less well motivated than the syntax they embed. This noted, I think that the stuff by Hunter, Pietroski, Lidz, and Halberda has had an impact on some thinking in semantics, and it may have more in the future.

      As a political matter, I agree that the best thing to do is to solve their problems, or show that there are problems that are interesting to solve that use Gs of the GG type. Again, I am sanguine here as this is already being done in dribs and drabs. The problem is that psycho types don't join in because they really don't understand what a linking theory is, nor do they know much about language. Those that do, do good work. But this is a small number of people. Why don't they use GG insights? Partially because it takes work to know this stuff, partially because, IMO, they are in the thrall of a perverse phil of science and mind. How to change this? Well, I hope that doing good work that combines them will make a difference, but it is hard to appreciate this work if you know nothing and have nutty general views. And the latter I don't know how to change, though speaking nicely to them doesn't seem to work.

    2. I am less convinced than you are that people are avoiding certain representations for deep philosophical reasons; I don't think connectionism is getting as much traction as it did 25 years ago, if that's what you're referring to.

      I don't know if it's the responsibility of linguists or psychologists to come up with better linking functions between representations and behavioral data. There's a continuum between those two disciplines; experimental psychologists who work on language and four other higher cognition domains are probably not going to be in a position to do it, and the same goes for linguists who spend most of their time conducting fieldwork in the Amazon. People in adjacent points on this continuum need to be talking to each other, and the linking hypotheses will gradually emerge. There has been some progress in this area in the last decade, in particular with respect to surprisal in sentence processing. Most of the evaluation has been on reading times, though there are some neuro examples as well, e.g. from Asaf Bachrach's dissertation (2008) or Jon Brennan's work (http://www.ncbi.nlm.nih.gov/pubmed/20472279) in fMRI, or this recent ERP paper by Stefan Frank, who I believe works in Hagoort's group: http://www.stefanfrank.info/pubs/BL2015.pdf

      For this work to be possible, there need to be computational implementations of the grammars that are being evaluated. Perhaps theoretical syntax papers should include a computational implementation of a grammar that includes the paper's contribution; at any rate, I don't know if it's realistic to expect a psychologist to delve into the details of a syntactic theory, and potentially make decisions that were left unspecified in the theoretical paper, to be able to use its results. Sociologically, we will never reach this point unless this kind of work is valued, emphasized in linguists' training and used to inform the development of syntactic theory.
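
To make the surprisal linking hypothesis mentioned in this comment concrete, here is a minimal sketch. Surprisal of a word is the negative log probability of that word given the words before it. Everything specific below is an assumption for illustration: the toy PCFG and its probabilities are invented, and the grammar is deliberately non-recursive so that prefix probabilities can be computed by brute-force enumeration. Real studies use incremental parsers over much larger grammars, not this shortcut.

```python
import math
from itertools import product

# Hypothetical toy PCFG (invented for illustration; non-recursive, so the
# language is finite). Each nonterminal maps to (right-hand side, probability).
GRAMMAR = {
    "S": [(("NP", "VP"), 1.0)],
    "NP": [(("the", "N"), 1.0)],
    "N": [(("dog",), 0.5), (("cat",), 0.5)],
    "VP": [(("barked",), 0.7), (("chased", "NP"), 0.3)],
}


def expand(symbol):
    """Return every (word tuple, probability) that `symbol` can derive."""
    if symbol not in GRAMMAR:  # terminal: derives itself with probability 1
        return [((symbol,), 1.0)]
    results = []
    for rhs, rule_p in GRAMMAR[symbol]:
        child_options = [expand(child) for child in rhs]
        for combo in product(*child_options):
            words = tuple(w for part, _ in combo for w in part)
            p = rule_p
            for _, part_p in combo:
                p *= part_p
            results.append((words, p))
    return results


# All derivations of S with their probabilities (only feasible for toy grammars).
SENTENCES = expand("S")


def prefix_probability(prefix):
    """Total probability of all sentences that start with `prefix`."""
    prefix = tuple(prefix)
    return sum(p for words, p in SENTENCES if words[: len(prefix)] == prefix)


def surprisals(sentence):
    """Per-word surprisal in bits: -log2 P(word | preceding words)."""
    out = []
    prev = 1.0  # probability of the empty prefix
    for i in range(1, len(sentence) + 1):
        cur = prefix_probability(sentence[:i])
        if cur == 0.0:  # the grammar cannot continue this way
            out.append(math.inf)
            break
        out.append(-math.log2(cur / prev))
        prev = cur
    return out


print(surprisals(["the", "dog", "chased", "the", "cat"]))
```

In this toy grammar the verb "chased" is the least expected word, so it gets the highest surprisal. Swapping in a grammar with different structural assumptions changes the prefix probabilities, which is what would let reading-time data discriminate between grammars under this linking hypothesis.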

    3. Norbert wrote: What is confusing to me is why people think that it is linguist's responsibility to provide the [linking] theory. Not that I wouldn't like to provide this were I able, but I have lots of evidence for ling theory independent of being able to provide a linking theory to OTHER kinds of data. I thought that this is what psycho types should be working on: how does memory, attention etc interact with linguistic knowledge to produce the behavioral outputs we see in real time.

      I certainly agree that syntacticians do not have a "responsibility" to provide a linking theory that links their favourite theories to non-acceptability-judgement behaviour in ways that make accurate predictions. That is, those favourite theories are perfectly well-supported scientific theories purely by virtue of matching the acceptability-judgement behaviour. But I think it would be helpful if there were more discussion of the kinds of linking theories that could be combined with syntactic theories in order to make non-acceptability-judgement predictions. Note: I'm not saying empirically accurate predictions, just predictions at all. In caricature, I guess we could say roughly that it remains the job of the "psycho types" to work out which syntax+linking theory's predictions are empirically borne out, but a precursor to that is having some options on the table, and getting some options on the table seems like a very natural thing for syntacticians to be working on (possibly in collaboration with others). (It's fun and speculative and abstract and pre-empirical, in all the ways that people like me enjoy.)

      This task of getting some options on the table seems to generally run into a sort of a brick wall because there's a tendency for outsiders to latch onto linking theories along the lines of "derivational time equals real time". For better or for worse, I think the only way we're going to get around that is for some syntax-friendly people to take the lead in getting some alternatives on the table. This is a practical matter, and acknowledging this practical matter does not entail retreating from the position that syntactic theories are perfectly well-supported scientific theories purely by virtue of matching the acceptability-judgement behaviour. And accordingly, pairing your favourite theory with a linking hypothesis, and finding that the predictions of this conjunction are not confirmed by a certain kind of non-acceptability-judgement behaviour, does not invalidate the support that your favourite theory had from acceptability-judgement behaviour. (Perhaps part of the problem is that this last point is not widely appreciated -- there's a tendency to think of these tests with linking theories as "the only real tests" -- which makes syntacticians feel like working with linking theories is going out on an extremely shaky limb?)

    4. Just to clarify, in most of the many papers that used grammar-based surprisal as a predictor of reading times, the grammar was a PCFG that wouldn't correspond to any linguist's idea of natural language syntax - again, not because of any deep theoretical commitment, just because most of them have used a piece of software that didn't implement any of the fancier syntactic devices that modern theories have. It's not that clear who's responsible, but my guess is that if a piece of software existed that was as easy to plug in as the PCFG parser they used and implemented VP-shells etc., some of those papers would have used it.

      And another small comment - the Brennan paper I mentioned did not use surprisal as the linking hypothesis but node count, which I believe is less well-supported than surprisal.
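
For comparison with surprisal, a node-count linking hypothesis can be sketched as follows. This is only one simple way to operationalize the idea, and it is an assumption of this sketch rather than the definition used in the Brennan paper: each word is scored by the number of tree nodes whose last word it is, i.e. the nodes that can be closed off once that word has been read. The tree and its labels are invented for illustration.

```python
# Trees as nested tuples: (label, child, ...); leaves are word strings.
# Hypothetical example tree, invented for illustration.
TREE = (
    "S",
    ("NP", ("D", "the"), ("N", "dog")),
    ("VP", ("V", "chased"), ("NP", ("D", "the"), ("N", "cat"))),
)


def node_counts(tree):
    """Return [(word, n)], where n is the number of nodes completed at that word.

    Assumes every node dominates at least one word (no empty categories).
    """
    counts = []

    def visit(node):
        if isinstance(node, str):  # a word: start its counter at zero
            counts.append([node, 0])
            return
        for child in node[1:]:
            visit(child)
        # All children are done, so this node ends at the most recent word.
        counts[-1][1] += 1

    visit(tree)
    return [(word, n) for word, n in counts]


print(node_counts(TREE))
```

Unlike surprisal, this measure depends directly on the shape of the tree, so two analyses of the same sentence (say, with and without extra functional structure) yield different per-word predictions even with no probabilities attached.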

    5. I agree with Tim that the question here is not which kind of data is more important to account for - acceptability judgments are interesting to linguists and are worth explaining, if you're a linguist. The question we're debating is how to interface better with people in other fields who are interested in other types of data, such as reading times, speech recognition word error rates, etc.

    6. Surely we want to define ourselves based on our questions rather than based on our favorite kind of data? If we have "acceptability judgment people" and "eye movement folks" and "brain wave types", then we're in bad shape. We want to understand the representations and computations, and surely should just pursue whatever it takes to understand them (and at whatever grain of analysis proves fruitful). There's a widespread notion, peculiar to syntax, I think, that once you move beyond a specific level of analysis and a specific type of data you stop being a syntactician and become a "psycho type". That's unfortunate.

    7. I do think there are computations that rely on the output of the grammar but are not themselves part of what we consider to be the grammar -- for example, eye movement control in reading. These computations are not normally of interest to syntacticians (and possibly shouldn't be). People who study eye movements may still want to use the result of the syntacticians' work to predict where regressions will occur, though they don't work on the same representations and computations as the syntacticians.

      Even if you're interested in linguistic representations ("competence"), there's always the question of whether acceptability judgments and online measures tap into the same representations. I don't know if we understand the linking function between the grammar and people's acceptability judgments better or worse than we understand the linking function for reading times, but sociologically there definitely seem to be people who privilege one type of data over the other (I guess in Colin's terms it means that we're in bad shape). Showing that grammars derived from acceptability judgments do a good job of predicting reading times might help.

    8. Colin wrote: Surely we want to define ourselves based on our questions rather than based on our favorite kind of data?

      Yes I agree with that. My phrasing above was probably misleading; by "the job of the psycho types" all I meant was something like "the thing we hope to gain by deploying psycholinguistic methods". (I was carrying over Norbert's phrase, and I would suspect that he probably meant it in roughly the same way, but he should speak for himself.) My main point was that I suspect it would be good if people who are in the habit of constructing theories on the basis of acceptability-judgement data played more of a part in formulating candidate linking theories that could be used to connect things up to other kinds of data. So I certainly don't want to suggest that people should pick only one kind of data and focus only on that.

      Put differently: whether or not one is involved in collecting many different kinds of data (for whatever reason), one can still think about predictions concerning many different kinds of data.

    9. Just to add to the discussion of linking hypotheses: there's the contention sometimes made within type logical grammar that parsing difficulty is correlated with the number of unresolved axiom links in a proof net (after each word). See for example here: http://tinyurl.com/nahzo3n

    10. @Colin:
      Tim took (put) the right words into my mouth. I think we are all looking at the same neuro-cog object. What syntacticians and phonologists and semanticists aim to do is provide a recursive characterization of the range of possible structures. I believe that doing this will help in explaining how we navigate this domain of the possible in getting to our actual Gs and the structure of the actual sentence being uttered/produced. I even believe that as a good first hypothesis we should assume a large measure of transparency between the recursive definition the formalists supply and the algorithms deployed in real time. I personally think that the Gs used in real time are effectively the same as the Gs used to characterize the I-language. However, as you know, this is an extra assumption. It may well be that the G used is a covering grammar of G, or that G plus a bunch of non-G processes are used (think Bever), or… These issues require ancillary assumptions beyond those required to triangulate on adequate recursive specifications, and as of now the evidence for these ancillary assumptions is, IMO, much weaker than the evidence for the syntactic/phono/sem hypotheses. I very much hope that this will one day change. But right now, that's where we are.

      As you also know, I think that real progress has been made in some "conjunctive" areas (i.e. Ling + ?), e.g. processing and acquisition. But even here, the progress is at the level of specific proposals for specific phenomena. We have few overarching principles to guide the work. More like Aspects style work than GB or later research. In other areas (e.g. neuro, production) things are far less well developed.

      What's this imply? That right now we should be tenacious in holding onto the linguistics we have, for these accounts are well grounded empirically, are theory rich, and there is a plausible route from here to what we want in the conjunctive areas. As I see it, the main problem is that people outside the linguistics "core" seem ready to throw all this out without seriously having tried to combine it with linking principles. When this has been tried (e.g. by you, Jeff, Lina, Alec, and your students) it has generated interesting and plausible stories. We need more of that and less dumping what we have good evidence for in favor of future riches we can't even begin to describe.

  15. This comment has been removed by the author.

  16. I wanted to raise a related but slightly different issue. Hagoort says that representations in the rest of cognitive science involve 'high dimensional geometric manifolds' rather than propositional representations. Norbert responded that the 'linguaform' nature of the syntactic representations is crucial. But I think these two views are compatible. What is a syntactic representation? As an output of the computational system it is, in fact, something that is not a million miles from a 'high dimensional geometric manifold'. Take a structure for a wh-adjunct question like 'how did he dance?'. A Merge style analysis of this has the initial (pair)-Merge position of 'how' in a different dimension from the VP, with reentrant structures (essentially curves in the structure) linking the Merge position of the adjunct with C, C with T, TP with vP, etc. There's possibly extra dimensionality, perhaps temporal, given by the cyclic nature of the object. What the representation is, as a configuration of basic units, is distinct from how it is interpreted, which is where, I think, Hagoort goes wrong. Clearly the structure is interpreted as a propositional like thing, but that's because it is interpreted as (input to) instructions to a language external system used for thinking, memory, planning, etc. (and everyone, I believe, thinks we need propositionality for those). Similarly, it can be interpreted as (an input to) instructions to wave articulatory organs around. Hagoort is confusing the propositional meaning of syntactic structures with their form. Not that it'll probably help our case to say 'hey, we have high dimensional geometric manifolds computable too!'

    Replies
    1. @David, in mainstream linguistics the syntactic representations are discrete combinatorial objects. I think these are radically different from the sorts of "vector space" representations that Hagoort is referring to. The term "dimension" has a specific technical meaning that can't be applied to trees. It is possible to map trees into vector spaces using various techniques but they have a different topology -- we can't take the average of two trees, whereas the high dimensional manifolds are smooth.

      One can consider trees where the labels aren't discrete categories but are elements of a vector space, as many people have considered over the years, most recently Socher/Manning/Ng, but that is something quite different, and I would have thought heretical to Chomskyans...
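
The "can't take the average of two trees" point can be illustrated with a small sketch. The two trees and the label-count embedding below are invented for illustration and are not taken from any of the work mentioned; the embedding is just one of many possible ways of mapping trees into a vector space.

```python
from collections import Counter

# Trees as nested tuples: (label, child, ...); leaves are word strings.
# Both trees are hypothetical examples.
T1 = ("S", ("NP", "he"), ("VP", "danced"))
T2 = ("S", ("NP", "he"), ("VP", ("V", "danced"), ("AdvP", "well")))


def label_counts(tree):
    """Map a tree to a count vector over node labels (a lossy embedding)."""
    if isinstance(tree, str):
        return Counter()
    counts = Counter([tree[0]])
    for child in tree[1:]:
        counts += label_counts(child)
    return counts


def average(u, v):
    """Coordinate-wise mean of two count vectors (missing labels count as 0)."""
    keys = sorted(set(u) | set(v))
    return {k: (u[k] + v[k]) / 2 for k in keys}


avg = average(label_counts(T1), label_counts(T2))
print(avg)
```

The averaged vector is a perfectly good point in the vector space, but it asks for half a V node and half an AdvP node, so no discrete tree maps onto it. That is the sense in which the two kinds of representation have different topologies.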

  17. @Alex, I guess the point I was trying to make is that the propositional nature of these objects isn't an inherent part of the object, it's rather an interpretation of the object. Hagoort's objection really seems to be that 'linguaform' objects are propositional while cogsci should be using more maplike objects. But, as I pointed out, syntactic representations are not propositional qua configurations, only (possibly) qua interpretations. You're right about the 'averaging trees vs spaces' issue, though - there doesn't seem to be any reason to take syntactic representations to be non-discrete.

  18. So there certainly are people out there who reject the idea that we should have discrete combinatorial representations of syntactic structure at all. The idea is that one can make do with, in essence, a point in a high dimensional space that corresponds to the activations of an array of abstract "neural" units. So I don't buy that story (yet), because I don't see how you can do what you need syntactic structure to do using such a representation. But models based on these techniques can do quite interesting things (speech processing, machine translation, etc.), they can be learned automatically, and in the non-Gallistel part of neuroscience they are seen as more compatible with what we know of computational neuroscience. I didn't read the target article closely enough to know whether Hagoort is that radical, but with the increased success of deep learning in the last 10 years, these views are certainly gaining traction.
