Tuesday, May 12, 2015

Darwin's problem; some reasonable

Karthik Durvasula has pointed me to a thoughtful blog post by the Confused Academic (CA) critical of using Darwin’s Problem (DP) kind of considerations as a criterion in the evaluation of linguistic proposals (see here).[1] The main thrust of the (to repeat, very reasonable) points made is that we know very little about how evolutionary considerations apply to cognitive phenomena in general and linguistic phenomena in particular and, as such, we should not expect too much from DP kinds of considerations. Indeed, as I noted here, there are several problems with giving a detailed account of how FL evolved. Let me remind you of some of the more serious issues, as CA adverts to similar ones.

The most obvious is the remove between cognitive powers and genetic ones. In particular, for an account of the evolution of FL we need a story of how minds are incarnated in brains and how brains are coded in genes. Why? Because evolution rejiggers genomes, which in turn grow brains, which in turn secrete cognition. Sadly, every link of this chain is weak, most particularly in the domain of language (though the rest of cognition is not in much better shape as Lewontin’s famous piece (noted by CA) emphasizes). We really don’t know much about the genetic bases of brain development, nor do we know much about how brains realize FL. So, though we do know a fair bit about the cognitive structure of FL, we don’t have any really good linking hypotheses taking us from this to the brain and the genome, which are the structures that evolution manipulates to work its magic. In other words, to really explain how FL evolved (at least in detail) we need to account for how the brain structures that embody FL arose via alterations in our ancestors’ genomes (broadly construed to include epigenetic factors), and right now, though we have decent cognitive descriptions of FL, we have no good way of linking these up to brains and genes.

Second, if Lewontin is right (and I for one found his discussion entirely persuasive) then the prospects of giving standard evo accounts of how FL evolved will be largely nugatory due to the virtual impossibility of finding the relevant evidence, e.g. there really exist no “fossil” records to exploit and there is really nothing like our linguistic facility evident in any of our primate “cousins.” This makes constructing a standard evolutionary account empirically very dicey. 

In sum, things do not look good, so there arises the very reasonable question of what good DP thinking is for the practicing linguist. That’s the question CA asks. Here’s what I think (acknowledging that the problems CA notes are serious): Despite these problems, I still think that DP thinking is useful. Let me say why.

As I’ve noted (I suspect too many times) before (e.g. here) there is a tension between Plato’s Problem (PP) and DP. The former encourages packing as much as possible into UG to make the acquisition problem easier while the latter eschews this strategy to make the evolvability problem more tractable. Put another way: the more linguistic knowledge is given rather than acquired[2], the easier it is to account for why acquisition is as easy and effortless as it seems to be. However, the more there is packed into FL/UG the more challenging it is to explain how this knowledge could have evolved. This is the tension, and here is why I like DP: this tension is a creative one. These two problems together generate a very interesting theoretical problem: how to have one’s DP cake and PP’s too (i.e. how to allow for a solution to both PP and DP). I have suggested several strategies about how this might be accomplished that lead, IMO, to an interesting research program (e.g. here). My claim is that if you find this program attractive, then you need to take DP semi-seriously. What do I mean by “semi” here? Well, nobody expects to explain how FL actually evolved given how little we know about the relevant bridging assumptions (see above), but by thinking of the problem in tandem with PP we know the kinds of things we need to do (e.g. effectively eliminate the G internal modularity characteristic of GB style theories and show that all the apparently different linguistic dependencies outlined in the separate GB modules are effectively one and the same). That’s the first necessary step needed to reconcile DP and PP.[3]

The second, also motivated by DP, is yet more interesting (and challenging): to try and factor out those G operations, principles and primitives that are not linguistically specific. Thus, given DP we want not only a simpler (more elegant, more beautiful yadda, yadda, yadda) theory, but a particular kind of simpler etc. theory. We want one with as little linguistic specificity as possible. Maybe an example would help here.

The cyclic nature of derivations has long been a staple of GG theory. Ok, how to explain this? Earlier GG theory simply stipulated it: rules apply cyclically. The Minimalist Program (MP) has tried to offer slightly deeper motivation. There are two prominent accounts in the literature: the cycle as expression of the Extension Condition (EC) and the cycle as the expression of feature checking (Featural Cyclicity (FC)). FC is the idea that “bad” (unvalued or un-interpretable) features must be discharged quickly (e.g. in the phase or phrase that introduces them). EC says that derivations are monotonic (i.e. constituents that are inputs to a G operation must be constituents in the output of the operation).
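To make the EC idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a piece of any actual theory of Gs: syntactic objects are modeled as nested tuples, and the names `constituents`, `merge`, `tuck_in` and `obeys_extension` are hypothetical helpers invented for the example. `tuck_in` stands in for a countercyclic operation that inserts material below the root.

```python
def constituents(so):
    """Return the set of all subobjects of a syntactic object, itself included.

    Assumption: a syntactic object is a lexical item (str) or a pair (tuple).
    """
    result = {so}
    if isinstance(so, tuple):
        for part in so:
            result |= constituents(part)
    return result


def merge(a, b):
    """Root-level merge: build a new object whose daughters are a and b."""
    return (a, b)


def tuck_in(host, item):
    """Hypothetical countercyclic operation: attach item below the root of host.

    Assumes host is a pair; the original host does not survive as a unit.
    """
    left, right = host
    return (left, (item, right))


def obeys_extension(inputs, output):
    """EC as monotonicity: every input constituent is a constituent of the output."""
    return all(inp in constituents(output) for inp in inputs)


vp = merge("saw", "Mary")
tp = merge("T", vp)

cyclic = merge("John", tp)           # extends the root
countercyclic = tuck_in(tp, "John")  # rebuilds tp from the inside

print(obeys_extension(["John", tp], cyclic))         # True
print(obeys_extension(["John", tp], countercyclic))  # False
```

On this toy encoding the root-extending step keeps `tp` intact inside the output, while the tucking-in step destroys it, which is the sense in which EC makes derivations information preserving.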

There are various empirical linguistic-theory internal reasons that have been offered for preferring one or the other of these ideas. Both apply over a pretty general domain of cases and, IMO, it is hard to argue that either is in any relevant sense “simpler” than the other. However, IMO, the world would be a better place minimalistically were the EC the right way to conceptually ground the cycle. Why? Because it has the right generic feel to it because it looks like a very generic property of computations. In other words, IMO, it would be natural to find that cognitive rules systems in general are monotonic (information preserving) so that to find this holding of Gs would not be a surprise. FC, on the other hand, strikes me as relying on quite linguistically special assumptions about the toxicity of certain linguistic features and how quickly they need to be neutralized (some examples of very linguistically special properties: only probes contain toxic features, only phase heads have toxic features, toxic features must be very quickly eliminated, Gs contain toxic features at all). Personally, I find it hard to see FC and its special conception of features generalizing to other cognitive domains or third factor considerations. Of course I could be wrong (indeed given my track record, I am likely wrong!). What’s nice about a DP perspective is that it encourages us to try and make this kind of evaluation (i.e. ask how generic/specific a proposed operation/principle is) and, if my reasoning is on the right track, it suggests that we should try to maintain EC as a core feature precisely because it is plausibly not linguistic specific (i.e. not tied to special properties of human Gs).[4]

You could rightly object that such considerations are hardly dispositive, and I would agree. But if the above form of reasoning is even moderately useful, it suggests that DP considerations can have some theoretical utility. So, DP points to certain kinds of accounts and encourages one to develop theories with a certain look. It encourages the unification of GB modules and the elimination of linguistically specific features of Gs. It does this by highlighting the tension between DP and PP. One might construe all of this in simplicity terms, but from where I sit, DP encourages a very specific kind of simple, elegant theory (i.e. it favors some simple theories over others) and for this alone it earns its theoretical keep.[5]

CA has one other objection to DP that I would like to briefly touch on: DP encourages reductionism and reductionism is not a great methodological stance. I have two comments.

First, I am not personally against reduction if you can get it. My problem with it is that it is very hard to come by, and I do not expect it to occur any day soon in my neck of the scientific woods. However, I do like unification and think that we should be encouraged to seek it out. As my good and great friend Elan Dresher once sagely noted: there are really only two kinds of linguistic papers. The first shows that two things that appear completely different are roughly the same. The second shows that two things that are roughly the same are in fact identical. I don’t know about linguistic papers in general, but this is a pretty good description of what a good many theory papers look like. Unification is the name of the game. So if by reduction we mean unification, then I am for it. Nothing wrong with it and a great deal that is right. In fact, there is nothing wrong with real reduction either, if you can pull it off. Sadly, it is very very hard to do.

Second, I think that for MP to succeed we should expect lots of reduction/unification. We will need to unify the modules (as noted above) and unify certain cognitive and computational operations. If we cannot manage this, then I would conclude that the main ideas/intuitions behind MP are untenable/unworkable/sterile. So, maybe unlike CA I welcome the reductionist/unificationist challenge. Not only is this good science, DP is betting that it is the next important step for GG.

This said, there is something bad about reduction, and maybe this is what CA was worried about. Some reductionism takes it as obvious that the reducing science is more epistemologically privileged than the reduced one. But I see no reason for thinking that the metaphysics of the case (e.g. if A reduces to B) implies that the reducing theory is more solidly grounded epistemologically than the reduced one. So the fact that A might reduce to B does not mean that B is the better theory and that A’s results must bow to B’s theoretical diktats. Like all science, reduction/unification is a complicated affair. It is best attempted when there is a reasonable body of doctrine in the theories to be related. I suspect that CA (and I am pretty sure Karthik) thinks that this is actually the main problem with DP at this time. It is premature (and if Lewontin is right, it will remain premature for the foreseeable future) precisely because of the problems I noted above concerning the assumptions required to bridge genes and cognition. My only response to this is that I am (slightly) more optimistic. I think DP considerations, even if inchoate, have purchase, though I agree that we should proceed carefully given how little we know of the details. So, we should make haste very very slowly.

Let me end. I think that DP has raised important questions for theoretical linguistics. Like most questions, they are yet somewhat undefined and fuzzy. Our job is to try and make them clearer and find ways of making them empirically viable. I believe that MP has succeeded in this to a degree. This said, CA (and Karthik) are right to point out the problems. Needless to say (or as I have said), I remain convinced that DPish considerations can and should play a role in how we develop theory moving forward, for if nothing else (and IMO this is enough) it helps imbue the all too vague notions of elegance and simplicity with some local linguistic content. In other words, DP clarifies what kinds of elegant and simple theories of FL and UG we should be aiming for.

[1] Importantly, CA is not anti minimalist and believes that general criteria like elegance and simplicity (suitably contextualized for linguistics) can play a useful role. It’s DP that bothers CA not theoretical desiderata on syntactic theories.
[2] Indeed the more linguistic specific this knowledge is, the easier it is to explain the facility of acquisition despite the limitations in the PLD for G construction.
[3] I should be careful here: it is conceptually possible that one small genetic change creates a brain that looks GBish. Recall, we really don’t know how a fold here and there in a brain unlocks cognition. But, if we assume that the small thing that happened genetically to change our brains resulted in a simple new cognitive operation being added to our previous inventory, we are home free. This seems like a reasonable assumption, though it might be wrong. Right now, it is the simplest least convoluted assumption. So it is a good place to start.
[4] To say what need not be said: none of this implies that EC is right and FC wrong. It means that EC has more MPish value than FC does if you share my views. So, if we want to pursue an MPish line of inquiry, then EC is a very good way to go. Or, you should demand lots of good empirical reasons for rejecting it. DP, like PP, when working well, conceptually orders hypotheses making some more desirable than others all things being equal.
[5] These considerations were prompted by a very fruitful e-mail exchange with Karthik. He pointed out that notions like simplicity etc., in order to be useful, need to be crafted to apply to a domain of inquiry. The above amounts to suggesting that DP helps in isolating the right kind of simplicity considerations.


  1. It's Durvasula, not Darvusala.

    1. thx, repaired. And sorry for the misspelling.

    2. @Norbert: Thanks for talking about this topic some more. I pasted below my last reply to Norbert in our email discussion:

      >>>My own thoughts over the last few months/years have also sort of tended towards avoiding DP-esque argumentation. In fact, the post strongly resonated with me for the reasons I mention below. I think there are at least two separate aspects mixed up in the post, but the charge of “reductionism” might be at the center of both (provided, one is allowed a more liberal view of what it means to be reductionist):

      1) The first issue is, what does DP provide that a simplicity criterion does not? To wade through this question, let’s separate different hypothetical cases. Case 1: DP considerations suggest minimal baggage, but even the simplest theory we can envision needs more than what DP considerations can support. Case 2: DP suggests a lot of baggage, but the simplest theory we can envision accounts for the facts with far less. In both cases, my own scientific intuitions would force me to align to the simplicity criterion (applied to the domain of enquiry) and basically say that our understanding of DP-criteria is incomplete and insufficient. So, what exactly does DP give us? It is only in a third case, where DP considerations line up with simplicity criteria that at least I am willing to explore the ideas personally. In fact, The Minimalist Programme itself as you have pointed out many times (at least on my reading) is an extension of the older project to get the simplest theory to account for the facts. Chomsky uses the phrase “virtual conceptual necessity”, which seems more like a simplicity argument than DP-esque thinking. And so, I simply don’t see what kind of useful constraint DP can provide. On the other hand, I can see how a theory developed based on simplicity criteria (intuitionist as they are) can help our understanding of the evolution of language. (note: the anti-reductionist point here).

      (to be continued...)

    3. Part 2...

      2) Which brings me to the second point: It is not clear to me that using DP in linguistic theorising is completely reductionist. However, I think we can make progress (by being charitable to the blogger), and instead ask the question Why is reductionism bad in science? To me what’s bad about reductionism is that when one tries to reduce one’s understanding of one slice of “reality” (Domain2) to the understanding in another (Domain1), there is the obvious problem that what we know about either domain is itself tenuous, and subject to change. Therefore, any over-eager attempts to reduce Domain2 to Domain1 might reasonably put us on the wrong path. This, as I see it, is the real reason why reductionism is unhealthy. Surely, the fact that we can’t explain even C. elegans with our current understanding of neurophysiology is the primary empirical reason to resist reduction of linguistics to current neurophysiology. But, if this is the real reason why reductionism is bad, then it applies equally to DP considerations in linguistic theorizing. We know precious little about the evolution of cognition of C. elegans too. And on top of that, if Lewontin is taken seriously (“It might be interesting to know how cognition (whatever that is) arose and spread and changed, but we cannot know.”), then one can’t know in any meaningful sense how fast/slow or how extensive/limited the evolution of cognition is, for the reasons he mentions in the article. In which case, whatever we use as DP-reasoning is just intuition about how the evolution of cognition works, and is not based even on a superficial understanding of the topic. This hardly seems like a genuine/meaningful constraint from DP-esque thinking, and whatever remains of it boils down at some level to simplicity criteria again.

      As if I haven’t rambled enough, I will say one more thing: I think I am beginning to form the opinion that reductionism is innate (to steal Gleitman’s wonderful statement), and even people who are able to see that it is bad in other places, seem to be using it for their own purposes (forgive me for the indirect accusation here); we seem to be constantly seduced by its lure, unwittingly. And a part of the maturation of the field is to learn as a community to resist such instincts. I used to have Physics envy for the longest time, but over the last few years, I have developed more of a Chemistry envy than anything else, because the field of Chemistry needed much more resistance to the seductive allure of reductionism than Physics, which of course had its own fight earlier.

      Hope at least some of the verbal diarrhoea made sense :), and thanks for engaging in the madness! Your blog (may your ears burn) has been a daily highlight since day 1.

  2. I don't think reductionism is really relevant to this issue. I agree that it is vastly premature to try to reduce language to neuropsychology, let alone genetics, but I don't think evocations of DP attempt that. The point of it to me is just a nod to say, "let our theories be *congruent* with the many possible shapes that an evolutionary explanation could take."

    Dissenters here seem to be saying that because we know next to nothing about the mechanisms of cognitive evolution, we cannot rule out any speculations about the speed or products of cognitive evolution. I think this is a non-sequitur. Despite us not understanding HOW cognition evolves, we have substantial examples of WHAT has evolved and WHEN, and that precedence ought to set limits on our hypotheses.

    We know, for example, that there is no evidence in the 3.5bn years of life on this planet for an organism that evolved in the space of, say, 10,000 years, a cognitive ability for intuitively understanding the fundamental laws of physics. This is an extreme example but the point is this: are you going to say that we can't rule out the possibility that that COULD actually happen just because we don't know how cognition evolves in general, or does it not seem sensible to say that the precedence of life on earth makes it quite likely that there are design constraints making such an evolution impossible even though we don't fully understand what they are?

    Obviously, we don't have the luxury of linguistic cognition being so extreme, even though it is unique, so we don't find ourselves at a clear boundary. We can't say with certainty that a large amount of domain-specific knowledge couldn't evolve to be innate in a short time. However, I think it's reasonable to look at the time-frame we have and the communicative abilities of our closest relatives, and then appeal to the general trends in cognitive development throughout the planet's history to propose that we should be looking for a small, non-gradual change, concluding that we need to reduce the innate content we are theorising. It may be drastically wrong but it's a basis for inquiry.

    Besides, I think it's important that this gives us cause beyond Occam's Razor to avoid stuffing UG full of domain-specific knowledge because, without DP, why should we automatically evoke simplicity particularly in syntactic operations? Why mightn't we instead desire simplicity in language acquisition by throwing a little more into our bag of innate tricks? There is a complex, multi-layered give-and-take with simplicity and I think it is a falsehood to believe that there will eventually appear only ONE theory that is THE simplest theory and it just so happens that the minimalist program will take us there. There will be competing simple theories in different frameworks and DP could help us choose between them.

    1. @Callum: This seems to be the crux of your argument “we have substantial examples of WHAT has evolved and WHEN, and that precedence ought to set limits on our hypotheses.”
      My reading of Lewontin is exactly that we don’t know the WHAT or the WHEN since cognition doesn’t leave any “physical” evidence behind. This is not to say comparative biology might not help in theory, but it isn’t clear to me that this is a particularly useful constraint, since we are vastly under-informed about the cognitive abilities of other species, and are only slowly unraveling the mysteries. I respond to Norbert’s particular use of DP below in more detail.

    2. @CA:

      I think that we somewhat know two things: that prior to us there was no other species that had an FL like ours. And second that this FLish capacity arose very recently in the species. These conclusions seem relatively secure, the first especially, but even the second is quite firm. What we don't know is exactly what it was that gave rise to FL. The MP bet is that a lot of FL's innards were there "already" and a small addition kicked our specific FL into existence. Were something like this true, then our job is to find both the stuff that was there already (i.e. that is NOT linguistic specific) and the addition that is linguistically specific. Now, all of this can proceed, I think, without having to deal with Lewontin's justified skepticism. Or at least it can be approached with some useful idealizations (e.g. the assumption that any cognitive facility any of our animal friends have that is nonlinguistic our immediate predecessor had too). So there is a project here and it can usefully guide investigation, IMO.

      Again, let me give an illustration. I assume that we can assume that humans have memory systems like other mammals. This looks to be content addressable memory with costs for keeping things in memory. This is a cognitive boundary condition that we can assume our ancestors had too. Does this tell us anything about how FL works? Well, I think it might. It looks like the kind of cognitive fact that fits well, for example, with the kind of relativized minimality restrictions we find on dependencies within Gs. So, one might argue (I might) that Relativized Minimality is the kind of restriction of dependencies that we might expect to find were the linguistic data structures optimized for a use system with the kinds of memories we have. I find this line of thinking productive. If you do too, then we have gotten something out of DP styles of thinking. Is it true? Dunno, but interesting.

    3. The MP bet is that a lot of FL's innards were there "already" and a small addition kicked our specific FL into existence. [emphasis mine]

      @Norbert: Isn't this precisely what the charge of reductionism is being leveled at, though? Let's take for granted that FoL arose recently and quickly within the species. As pointed out in your post, we don't know how genetic changes lead to changes in the brain, much less how changes in the brain lead to changes in cognition.

      So, again taking for granted that FoL arose recently and quickly, it seems fine to assume that the genetic change was small. However, the charge of reductionism seems to be that since we don't know much about the links between genes and the brain and between the brain and cognition, it doesn't follow from the genetic change being a small one that the cognitive change was a small one.

    4. Yes, I agree. It doesn't follow. It is possible that a small genetic change had far ranging cognitive repercussions. Indeed, it is possible that the apparent intricacy of say a GB theory is the result of one allele changing its stripes (do they have stripes?). But, absent any good information to the contrary I am happy to assume that what looks to be a small change at the cognitive level (e.g. the addition of a novel operation) will correspond to a small change at the brain level (i.e. the emergence of a new circuit). This does not follow, but it seems reasonable. I take it we will decide just how reasonable by the kind of work that it supports. If we can unify the modules and can reduce linguistic dependencies to a few (one?) basic kind don't you think we will have found out something intriguing? Moreover, won't it make looking for the source of this in the brain easier (Dehaene certainly thinks so)? So, I agree the reasoning is not apodictic, but it is very suggestive.

    5. @Norbert: in the end, I agree that PP is a far more useful constraint than DP. DP just seems wishy-washy. As far as I am concerned, the grunt work typically ascribed to DP considerations can equally well be ascribed to more standard simplicity criteria. I prefer this option, as it doesn't open any reductionist can of worms.

  3. @Norbert: First of all, thanks for discussing the content of my post. I too think the issue is relevant to how the field progresses. I think you have represented my position very fairly, and I want to address (deconstruct?) the one particular argument for the utility of DP constraints that you discuss in the post: The extension condition (EC) vs. feature cyclicity (FC).

    The fundamental argument for why you prefer EC over FC is this:
    (a) “Because it has the right generic feel to it because it looks like a very generic property of computations.”

    You also mention:
    (b) “it would be natural to find that cognitive rules systems in general are monotonic”.

    In regards to (a), we know a fair bit about the general theory of computation, and perhaps Alex Clark, Greg Kobele, Jeff Heinz, or Thomas Graf (and other regular contributors to the blog) could add to the discussion to see if EC is indeed a more reasonable extension of our current understanding of “computation”. Given we know something reasonable about the topic, of course it can serve as a decent constraint on linguistic theorizing, and is in a sense more unificationist. Crucially, this is not a constraint from DP, but from our general theory of computation.

    But, (b) has no evidentiary force at all, since we know precious little about other cognitive systems to apply this argument. It boils down to an argument from ignorance. Add to this, that we know nothing about the evolution of cognition, so the expectation of monotonicity is nothing but intuitionist hope at this point (a point that Karthik raised too). Therefore, it seems like pure and unhealthy reductionism, based on intuition! IF this is allowed, we might as well allow associationist constraints, since they are also strongly in sync with many people’s intuitions. (b) to me is simply just-so-y theorizing - something we should all beware.

    Ultimately, it is (a) that seems to drive your argument; the argument does not depend on how computation is carried out in natural systems, but actually on the EC being more naturally situated in some view of computation that you have. Therefore, the actual substantive force of the argument comes not from DP considerations, but from your view of what is more natural in a general theory of computation as envisioned by you. Furthermore, to the extent that the minimal (“simplest”) extension is what one means when they say “X” is more naturally situated within a view, your argument (as I pointed out in my blog post) has again appealed at some level to simplicity criteria. This much seems inevitable (again, as touched upon by KD above).

    To repeat the basic argument of my own blog post, since we know nothing about the evolution of cognition and are only now slowly discovering the cognitive abilities of other species, how could it possibly put any DP constraint on our theorizing of the language faculty? The arguments I have seen inevitably boil down to simplicity criteria at some level, and don’t actually need the invocation of DP at all. This, as I see it, is a good result, since the invocation of DP in theorizing about the language faculty is reductionism in the bad sense, and not unificationism (given that, to repeat myself, we know nothing of the evolution of cognition).

    Note: I initially created this id to point out the silliness of CB’s argumentation. But, more recently I have realized anonymity is a great way to focus on the argument, away from names and associated baggage, as it enforces a blind review of the content/argument at hand. Though I don’t comment much (if at all), I am indeed a regular reader of your blog, and I thank you sincerely for creating a forum where linguists of all stripes can grapple with issues that are core to the enterprise. And thanks to KD for passing on the blog post to you.

    1. Regarding the Extension Condition and whether it is indeed a more reasonable extension of our current understanding of “computation”: that's a tough one. The EC is usually stated as a property of phrase structure trees. For the sake of clarity, I take it to comprise the following two components:

      1) All operations add nodes/branches (or, slightly weaker, no operation removes nodes; and I leave it up to you how this is refined to accommodate feature checking).
      2) All operations manipulate the root of the tree (and we tacitly assume that these manipulations exceed some kind of triviality threshold so that it isn't possible to make operations EC-obeying by, say, adding a useless feature to the root)

      As I just said, these two properties are evaluated with respect to phrase structure trees. But those are actually redundant from a formal perspective as all necessary work can be done directly over derivation trees. Therefore the standard EC is formally vacuous under its strict interpretation since it can be deprived of its application domain. Let's see, then, whether derivation trees conform to the EC.

      Under the most natural interpretation of derivation, yes, because a derivation is a record of the operations that are carried out, and every step you take necessarily adds something to the tree, and clearly you can't do something before you do it, so each step is added to the root of the current tree. But i) the derivational EC does not entail cyclicity because countercyclic operations can be implemented while obeying the derivational EC, and ii) derivations are simply trees under a specific interpretation (interior nodes correspond to operations), and depending on how loose your interpretation is, things can get pretty mindbending very quickly.

      For example, the way I implement lowering movement in MGs means that movement operations take place at points in the derivation where the moving constituent has not even been introduced yet. That makes perfect sense if you have a "representational" view of derivations as an encoding of the dependencies of the feature calculus and how that is translated into output objects (strings, phrase structure trees, formulas, whatever you need). If you don't like that abstraction and want to stick with the common temporal intuition of derivations as the partially ordered sequence in which operations must be run to yield the desired output, it seems non-sensical at first, but it can be salvaged by a more general interpretation of how your operations work: i) introduce derivational time travel (hey, it works for physics ;) ) or ii) generalize Move such that material can be inserted under the assumption that a suitable Merge site will be found at a later point (Move as hypothetical reasoning), or iii) lowering is covert feature movement. There isn't anything about the derivational EC that clearly blocks these widenings of the interpretation.

      That said, it is the case that the mappings from derivation trees to output objects can be squeezed into a weaker class if the derivational EC is satisfied and movement is not allowed to be countercyclic. So it is possible to construct a formal argument for a particular parallelism between derivations and outputs that roughly lines up with the linguistic notion of EC, but the connection is tacit enough that I consider the former a fairly promising restriction while remaining ambivalent about the EC.

    2. I find these discussions confusing, because the computations that actually take place are those related to production and comprehension (parsing and generation) which are only distantly related to the abstract computational level description that is used in generative grammar. Generating a derivation tree (or other structured expression) nondeterministically bottom up just doesn't map onto the actual computations that take place.

      Also from a general computational perspective, there are different ways of realizing computations, some of which naturally satisfy these computational principles like NTC for example, and others of which don't. I don't want to reopen the whole recursion can of worms but for example, Turing Machines modify variables (on the tape) and Gödel-style recursive functions don't.
      So it seems hard to say that monotonicity (is this the same as immutability?) is a natural computational property without a better grasp of the high level computational infrastructure of the brain.

      Back when I taught computer science to undergraduates I would explain the advantages and disadvantages of mutable and immutable data structures (i.e. those that can be changed and those that can't be changed), but that seems a long way from these concerns.
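
To make the mutable/immutable contrast concrete, here is a minimal Python sketch. It is only an illustration: the names `Node`, `merge`, and `MutableNode` are hypothetical, and nothing here is meant as a claim about how any grammar formalism (or the brain) actually builds structure. The immutable version builds a new root over its inputs and leaves them untouched; the mutable version lets existing structure be rewritten in place.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


# Immutable: a frozen dataclass cannot be modified after construction.
@dataclass(frozen=True)
class Node:
    label: str
    children: Tuple["Node", ...] = ()


def merge(a: Node, b: Node, label: str) -> Node:
    """Build a new root over a and b; neither input is altered."""
    return Node(label, (a, b))


# Mutable: children can be replaced in place after construction.
@dataclass
class MutableNode:
    label: str
    children: List["MutableNode"] = field(default_factory=list)


# Immutable construction: each step only adds a new root.
dp = merge(Node("the"), Node("dog"), "DP")
vp = merge(Node("saw"), dp, "VP")
# The earlier object survives intact as a part of the later one.
assert vp.children[1] is dp

# Mutable construction: an in-place change alters structure built earlier.
m_dp = MutableNode("DP", [MutableNode("the"), MutableNode("dog")])
m_vp = MutableNode("VP", [MutableNode("saw"), m_dp])
m_dp.children[1] = MutableNode("cat")  # m_vp now contains "cat" as well
```

Whether either style is the "natural" one is exactly what is at issue in the comment; the sketch only shows that both are easy to write down.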

    3. @Alex: I too find the discussion hard to make sense of, from the perspective of the grammar as merely a high-level description of the kinds of regularities present in the parsing and generation algorithms.
      I think that this perspective is not shared by the people invested in this discussion. There seems to be an idea that the grammar is a description of something that exists, independently of the parser and generator, and that these interact with this thing by querying it from time to time; the grammar is basically an oracle.
      (Whether the discussion makes sense even in this context is unclear to me.)

    4. @CA:

      I guess I was trying to make a more modest claim: that comparing the two kinds of accounts we have for grounding the cycle, EC or FC, EC has a far less domain specific feel to it. For FC to work we need lots of assumptions about features and their toxicity. We need many assumptions of what heads they sit on and how fast their toxicity needs to be discharged. The problem with such theories is that features need to be fine tuned to get these results. There is nothing general about this. EC is in this regard far less domain specific in conception: I understand it to say that computations preserve information: the inputs to a derivation are preserved in the outputs of the derivation. If I understand Alex C's comment above, this corresponds to the difference between mutable and immutable data structures. The fact that EC corresponds to an identifiable general computational feature speaks in its favor. And this looks to me different from FC.

      Now is EC unique to linguistic cognition? I doubt it, but that would be nice to know. How information preserving are other domains of computation? If visual illusions are like garden paths and if the latter are the result of deterministic parsing which is itself related to not being able to rescind parsing statements then… So, I agree that we should investigate these matters in other domains. The nice thing about DP is that it gives us possible questions to investigate, e.g. are there similar kinds of info processing in other cognitive domains? Well, let's see.

      In the end, I agree with you about one thing: unlike PP, DP will be harder to integrate because DPish considerations are less well developed than PPish ones. That's fine with me. I am happy to take what I can get. You seem to think we can get nothing at all. I believe that we can get next to nothing at all, but every little bit helps.

    5. @Greg and Alex: What would be the computation that takes place, then, when you ask somebody for an acceptability judgment? Also, Greg, I'm not sure that I entirely understand what distinction you're making between the 'grammar' and the 'generator'. Could you clarify?

    6. @Adam: By `generator' I meant language production (not anything having
      to do with a `generative' grammar; sorry for the confusion).

      It is a good question how acceptability judgments work. Alex and
      Shalom Lappin have been working on this question; theirs is the most
      worked out linking theory between grammar and acceptability behaviour
      I have seen. Naively, there seems to be some relation too between
      acceptability and parsability and interpretability. There's
      psycholinguistic work which suggests that binary acceptability
      judgments should be thought of as derivative of gradient acceptability
      judgments (basically, that people `have' gradient judgments, and to
      form binary ones they simply cut their gradient scale in half).

      It's worth thinking about how on earth the
      grammar-as-ontologically-separate-entity perspective would deal with
      (gradient) acceptability judgments.
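
The parenthetical idea above, that binary judgments are obtained by cutting a gradient scale in half, can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the 1-7 scale, the function name `binarize`, and the ratings, which are made up and are not data from any study.

```python
def binarize(rating: float, lo: float = 1.0, hi: float = 7.0) -> bool:
    """Map a gradient rating to a binary judgment by splitting the scale at its midpoint."""
    midpoint = (lo + hi) / 2
    return rating > midpoint


# Hypothetical ratings on a 1-7 scale (invented for illustration).
ratings = {"sentence_a": 6.5, "sentence_b": 4.2, "sentence_c": 1.8}
binary = {name: binarize(r) for name, r in ratings.items()}
```

On this picture the binary judgment carries strictly less information than the gradient one, which is why it is described as derivative.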

    7. @Greg: Hmm, I'm not familiar with their work. Thanks for the reference. I'll take a look at it.

      As just a cursory remark, I'm not sure I see what the immediate problem would be with accounting for gradient acceptability judgments given 'the grammar-as-ontologically-separate-entity perspective', whereas your 'how on earth' comment seems to suggest that this would be very hard. Why wouldn't we also assume that processing and memory would interact with 'the grammar-as-ontologically-separate-entity perspective' in such a way so as to produce gradient acceptability judgments? Granted, that's a longshot from a formalized model of said interaction, which I'm sure is something you'd like to see. But surely such a model could be constructed.

    8. @Adam: I meant the `on earth' as an exhortation to think about the mechanisms involved; while I have no doubts that a model could be constructed, I think that it interacts worryingly with the motivation for the grammar. What is a grammar a theory of, under that view? At least the results of the queries the parser makes (but we don't have a theory of that). (I take something like this to be David Adger's expressed view in his most recent post.) Sometimes you hear that the grammar is somehow directly implicated in acceptability judgments, in a way that isn't mediated (solely) via queries. It's not clear how this should work, whence my question.

      Here's what it seems you would need to say. When the parser constructs a structure for a sentence (there's no reason to think that the parser constructs grammatical structures, as the grammar plays only an (heretofore unspecified) advisory role), maybe the grammar looks at that structure, and assigns it some score, based on how far from ideal it is? But of course, the grammar can't really look at this structure; it is just a knowledge base. You need some performance system (let's call it the acceptability judgment system) to look at the structure the parser came up with, and query the grammar about it.
      This is kind of a strange system to have, at least as I've described it. Certainly, instead of acceptability judgments being a `direct line' to the grammar, they are now even farther away from the grammar, as they involve the mysterious acceptability judgment system inspecting the (unknown) output of the parser system. Something more natural might be an exemplar based `I've seen something like this before' system, but then the grammar wouldn't be directly implicated in acceptability judgments anymore. The most natural thing to do (it seems to me) would be to ignore the grammar, and just base the acceptability score off of the behaviour of the parser (or some, I think compatible, alternative; cf. Clark & Lappin). (But of course, the parser is an algorithm implementing some computation; so there is a grammar in the Marr sense anyway underlying the behaviour of the parser, and this is what is relevant for acceptability...)

      Your question I originally responded to (how to account for acceptability in the Grammar as Parser model) I interpreted as proceeding from the assumption that there was a reasonable way to think about how this would work in the Grammar-as-ontologically-separate-entity model. As you can see, I am not convinced of this assumption.

    9. @Greg @Alex: A quick question. To what extent do we know grammaticality, even acceptability, is gradient? This surely can't be because we get gradient responses when we give the subjects gradient tasks (e.g., rating): gradient responses are guaranteed, as the Gleitmans found out with odd and prime numbers long ago. I'm not very familiar with the syntax literature on this business, but in morphology there are loads of claims about morphological productivity being gradient, almost always based on some inherently gradient task, but the actual language data, especially from children's acquisition of morphology, shows a near categorical distinction between productive processes, which are often over-used, and lexicalized/unproductive ones, which are almost never over-used. If syntactic grammaticality/acceptability is truly gradient, what would account for the fact that in conversational speech to children, 99.93% of adult sentences are perfectly grammatical after editing out ums and uhs and prescriptive "errors" (Newport, Gleitman & Gleitman 1977)?

    10. @Charles: Following standard terminology, grammaticality is a theoretical construct and acceptability is what we get when we ask people whether a sentence is ok.
      The standard view, simplified, is that acceptability is gradient and grammaticality is categorical. But of course for a long time (since Aspects?) people have accepted that the grammar might need to do a bit more than just classify structures or sound/meaning pairs into +/- wff.

      Of course one needs to be a bit careful as there is a tendency for subjects to "use the full width of the road" even for intuitively categorical distinctions.

      Some papers on this are available here.

    11. @Charles: (Addressing a different point than the quite correct one of Alex) You seem to be assuming some sort of linking theory between the grammar and production of sentences that I don't quite understand. Note that under the Fodorian/Adgerian interpretation of grammar (as a knowledge base which is consulted by the parser/producer), there is no reason to expect that only (or predominantly) grammatical sentences are produced. Under this interpretation of grammar it plays only an advisory role, and the real theory of language is in the theory of the parser/producer, which are completely unconstrained (architecturally) by the grammar.

    12. @Alex: thanks! More to talk about in London.

      @Greg: I certainly had no Jerry or David in mind but only asked an empirical question about the basis on which we say grammaticality/acceptability is gradient, which was not clear to me given the methodological complications. I don't think the question of linking theory can even be raised without knowing what the facts are.

      But the last bit of your point confuses me: Why could Fodor/Adger, or anyone, say language production is completely unconstrained by the grammar? The grammar determines what could be said, and the parser/producer, in consultation with other factors, determines what is actually said. Maybe you have something specific in mind when using the term "architecturally"?

    13. @Charles: I read "If syntactic grammaticality/acceptability is truly gradient, what would account for the fact that in conversational speech to children, 99.93% of adult sentences are perfectly grammatical after editing out ums and uhs and prescriptive 'errors'" as suggesting that non-gradient grammaticality would be better able to explain this, whence my puzzlement about what that explanation would be.

      "The grammar determines what could be said, and the parser/producer, in consultation with other factors, determines what is actually said." On my understanding of the Fodorian view (where the grammar is a knowledge base that the parser/producer queries while parsing/producing), the grammar does nothing at all like `determining what could be said', unless further strong assumptions are made about the parser/producer. What determines what could be said is the computational theory of the parser/producer, which may (or may not) make reference to the results of grammar queries.

    14. @Greg: thanks for the clarification. I took the 99.93% finding as suggesting grammaticality is so categorical such that only ums and uhs can disrupt it. This is not the only interpretation but a possible one.

      On the grammar/production linking: I meant to say the grammar specifies what possible *forms* could be said. what one wants to speak is dependent on many including non-linguistic factors (e.g., "good morning"), the parser/producer constrains the actual forms (e.g., no five level self embedding etc.) but presumably has nothing to do with, say, structure independent strings which would not have been generated by the grammar. I don't know if this is what Fodor had in mind or not, but it seems to be something that he could have meant.

    15. @Charles: "On the grammar/production linking: I meant to say the grammar specifies what possible *forms* could be said. what one wants to speak is dependent on many including non-linguistic factors"
      Yes, this is my view as well. However, on the view I attributed to Fodor/Adger this does not seem to follow. On that view, the grammar plays no causal role in anything. It is simply a knowledge base which may be queried (but there is no requirement about what to do with the answer). One would imagine that the parser should somehow assign a structure to a string which is related in some way to a structure that the grammar qua knowledge base licenses for it, but there is no need for this to obtain. (Cf. ideas in psycholinguistics about shallow parsing; Bever's NVN heuristic, F. Ferreira's Good Enough, etc.)

    16. @Greg

      I must be missing something: it is a standard assumption that parsing uses the grammar to assign structure to incoming "strings." There is "no need" but everyone assumes that something like this is true. Some think that this is not all that parsing does, but most everyone believes that this is at least what it does. If so, I am not sure what your reply to Charles is getting at. Is it that what one knows about language doesn't IMPLY how one uses what one knows in parsing and production? If that is your claim, sure. But why is this a problem? There are many theories tying in linguistic knowledge with use. Are these in some sense problematic? I must be missing something.

    17. @Norbert: Yes, everyone assumes that the parser assigns structures to incoming strings which are fairly closely related to the structures licensed for them by the grammar. One idea is that this is because the grammar simply is a computational theory of the parser. The other view I am familiar with is that the grammar is something like a database, and serves as an oracle for the parser.
      In this round of comments I have been trying to explain why I think that this latter view, despite being thought by me to represent the mainstream, is problematic in various ways.
      In the comment I think you're asking about, I am responding to Charles' incredulity that this mainstream view doesn't in fact entail that the grammar specifies what forms *could* be said. (It does in the grammar-as-parser view.)
      It is not at all clear to me what kinds of assumptions about how the parser uses the answers to its queries would allow a proponent of the mainstream view to say that the grammar specifies what forms could be said. I have never seen anything resembling a worked out view of this mainstream type, and would be very interested in seeing one. For example, the Marcus parser had nothing like a way to interact with an external knowledge base (grammar); it was just a parser, which more or less did what we thought it should do, given our theories of grammar, but the actual grammar qua computational theory underlying it was quite different from our theories of grammar.

    18. @Greg: Isn't the mainstream view just the standard factorization into grammar and parsing strategy (which we also find in CS) + an ontological commitment that these components are distinct, rather than the former being an abstraction of the latter?

      So in general we specify a parsing schema, say, recursive descent, shift reduce, or left corner via inference rules that take rules of the grammar as parameters (and we would also have to throw in a control structure and basic data structures, but let's put that aside). The actual parser is the combination of the parsing schema and a given grammar. I think what most linguists mean by the parser is not so much the combined output product as the function from grammars to specific parsers for individual languages.

      You're right that arbitrary functions could in principle map grammars to flying spaghetti monsters, but that's not the kind of functions represented by parsing schemata. So if we take parser to mean something like parameterized parsing schema (and Norbert and Charles can chime in as to whether that is a fair equivocation), there doesn't seem to be much of a problem.

      Or is your worry that even under this perspective the compilation process is done only once and then the parser never has to ask the grammar anything again because the fully specified output parser has been produced?
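
The idea of a parsing schema as a function from grammars to parsers can be sketched in Python. This is a toy under stated assumptions, not anyone's actual proposal: `recursive_descent` is a hypothetical name, the grammars `svo` and `sov` are invented, the output is a recognizer rather than a full parser, and the naive backtracking strategy would loop on left-recursive rules. The point is only that the schema is written once and takes the grammar as a parameter.

```python
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

# A grammar maps each nonterminal to its possible right-hand sides.
Grammar = Dict[str, List[Tuple[str, ...]]]


def recursive_descent(grammar: Grammar, start: str) -> Callable[[Sequence[str]], bool]:
    """Parsing schema: takes a grammar as a parameter, returns a recognizer for it."""

    def expand(symbol: str, tokens: Sequence[str], pos: int) -> Iterator[int]:
        # Yield every position reachable after deriving `symbol` starting at `pos`.
        if symbol not in grammar:  # anything without a rule is treated as a terminal
            if pos < len(tokens) and tokens[pos] == symbol:
                yield pos + 1
            return
        for rhs in grammar[symbol]:
            yield from expand_seq(rhs, tokens, pos)

    def expand_seq(rhs: Tuple[str, ...], tokens: Sequence[str], pos: int) -> Iterator[int]:
        # Derive the symbols of `rhs` left to right, backtracking over alternatives.
        if not rhs:
            yield pos
            return
        for mid in expand(rhs[0], tokens, pos):
            yield from expand_seq(rhs[1:], tokens, mid)

    def recognize(tokens: Sequence[str]) -> bool:
        return any(end == len(tokens) for end in expand(start, tokens, 0))

    return recognize


# Two invented toy grammars that differ only in verb-object order.
svo: Grammar = {"S": [("NP", "VP")], "VP": [("V", "NP")],
                "NP": [("kim",), ("sandy",)], "V": [("saw",)]}
sov: Grammar = {"S": [("NP", "VP")], "VP": [("NP", "V")],
                "NP": [("kim",), ("sandy",)], "V": [("saw",)]}

# Same schema, different grammars, different "actual parsers".
parse_svo = recursive_descent(svo, "S")
parse_sov = recursive_descent(sov, "S")
```

Here `parse_svo("kim saw sandy".split())` succeeds while `parse_sov` rejects the same string. Note that in this sketch the returned closure still consults `grammar` at every step, so it does not by itself settle the "compiled once" worry raised in the last paragraph.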

    19. @Thomas: I guess I don't really see this as different enough from the Marr perspective. Or in other words, this is just the grammar being the theory of the parser, with an alternative, compact, representation of the latter. (I'm not quite sure what it means that we're talking about representations of the parser, but ...) If this were a satisfactory depiction of the Fodor/Adger position, I would be happy enough with it.

    20. It might be helpful in this discussion to distinguish a couple of things that often get conflated when we move from talking about grammars to parsers.

      One is a shift in levels of analysis, e.g., describing a structure building system that specifies order of operations, possibly also with more specific commitment to a memory and encoding architecture. That can be done without making any commitments to how this system relates to external input/output in any practical task (such as parsing, or acceptability judgment).

      It's something else to then specify how this system is used to generate suitable representations when presented with input strings. That's what people generally have in mind when they're talking about parsing strategies.

      I love me some Marr, though I find the attempt to squeeze linguistic models into Marr levels to be counterproductive, especially the notion that it's useful to draw clear lines between 3 levels.

    21. I'm with Colin on the Marr issue. Where Marr is very relevant to linguistics is the specification of the computational level (what I called `why this pairing of sounds and meanings' in my post) as distinct from issues of brain organisation (the physical level). I've always struggled to see how Marr's intermediate level fits in to the specification of what is really just knowledge. It's a bit like arithmetic: we can specify an arithmetical system proof- or model-theoretically, and maybe that's even a system that is human-biology specific like language and has a physical species-specific neural-level realisation (perhaps subitizing plus the successor function is what humans do, while Martians multiply quantum vectors). But where would the algorithmic level come in? Sure, when I used to work behind a bar and added up the costs of rounds, I used that knowledge to carry out computations that used addition and whole numbers, but I think that's very distinct from Marr's algorithmic level, which makes sense for input systems like vision, but not obviously for knowledge systems like language (or arithmetic). So I think of the parser/parsing strategies/whatever as using linguistic knowledge in a way that is a bit like counting behind a bar: it's just the application of the knowledge specified in the grammar to solve a particular task (say, making an utterance). I'm assuming that the cognitive system(s) involved in that task are fairly well adapted to it in the case of language, through evolution, development, and daily use. I actually think that much of it may end up being effectively compiled into reusable procedures (in fact, one might think that that is actually what a word is - a phonologically indexed reusable compiled heuristic parsing procedure) for actual use, and that's where you can capture the kind of sociolinguistic or frequency effects that excite many outside of GG.

      I think this is also in agreement with what Thomas was saying, inasmuch as under this view a parser can be somewhat specific to a grammar. So the actual strategy for parsing Japanese would be a bit different from that for parsing English (which I think there's some evidence for).

      So when we turn to acceptability judgments, these are just the same kind of thing: a task that involves the use of knowledge, perhaps precompiled into (re)usable procedures of various sorts.

    22. Actually, I'm not sure that David and I are so close on this one as he suggests (sorry!). I'm closer to what Greg is saying. In fact, for any FoL readers who happen to be in Seoul right now, I'll be giving a talk on this topic tomorrow.

      My reservations about Marr levels as applied to linguistics are: (i) identifying standard GG with Marr's computational level. Standard GG does way more than just characterize inputs/outputs. (ii) the neat division into 3 levels of analysis.

      Unlike David, I'm entirely happy with intermediate levels of analysis for language. In fact, I'd be happy to specify many intermediate levels. As far as I'm concerned, we have a (single) capacity for encoding and constructing structured linguistic representations, and we can describe it at many different grains of detail, based on whatever yields the most insight for what we're currently trying to explain.

      At one end of the scale, we can give a true Marr-style computational description, in which all we care about is the possible output representations, with little regard for what generates them. We can do standard GG, where the components of the theory that yield the correct inputs/outputs are taken more seriously (e.g., they could be the locus of variation, they might be learned). We could add in considerations of sequential order of operations (roughly left-to-right). We could add in considerations of time. We could add in considerations of the memory architecture used. And so on through to the neural level. Marr levels are merely major bus stops on the journey from more abstract to less abstract characterizations of the system.

      Actual parsing and production tasks are simply uses of this system under special task conditions, where the goal is to use the structure generation system to match to an externally specified sound or meaning. One can characterize how the structure generation system carries out those tasks at multiple levels of abstraction also. The same is true for acceptability judgments. Distinctions between linguistic tasks (speaking, understanding, judging) are orthogonal to distinctions between levels of analysis.

    23. Well, I think we maybe agree, a bit, at least, but not too much. Standard GG doesn't just characterise input/outputs, definitely, so I agree with you there, but that's because syntax isn't an input/output system. So you have to adapt Marr's ideas a bit, but adapting them and applying the computational level to syntax, what you get, I think, is a computational theory of the possible pairings between sound and meaning at species level and a theory of particular pairings at I-language level (where we have to take `sound' here to mean whatever configuration of information is relevant to the kinds of mental processes that eventuate in making/perceiving sounds and signs; ditto for the meaning side). That's not input/output at all, which is why I think Marr's algorithmic level doesn't really play a role, as it is really about procedures for transforming information. Syntax is a generative system, producing configurations that are computed with by other systems and it doesn't transform information, it creates information (configurations really). The information produced by the language faculty is transformed at the interfaces, to be used by other systems. Extra-syntactic information (like order, for example) is an interpretation (really a transformation) of syntactic information by those other systems. You can also add things like memory, etc into the general theory of use and processing as general constraints on how that information is transformed. So these are aspects of the more general system that one might look at through the lens of Marr's lower levels. But I guess where we don't agree is that you see much of this as an issue of abstraction from performance, while I see it as the interaction of multiple cognitive systems performing different tasks (generating structure, interpreting it, transforming it, integrating it with other systems that generate other kinds of structure, etc). Does that seem about right?

    24. @David: I get where you're coming from (I think), but I don't see why on earth you would want to characterize grammar as a knowledge system instead of as (what you called) an input system. Thomas has given me a way of making sense of this claim (as a funnily formulated version of my own), but it doesn't seem faithful to your words.
      @Colin: I completely agree with your points. Peacocke has suggested that GG is characterizing the I-O behaviour as well as gross data structures, which he dubs level 1.5. I think that something like this is closer to what people are actually doing. But anyways, I agree that the division into Computational/Algorithmic/Implementational should not be treated as gospel, but rather as a simple reminder that there are multiple complementary ways of understanding a complex system, and that we need to understand systems at multiple levels to really get a handle on them.

    25. @David: Oops, we overlapped a bit, and you answered preemptively the question I asked.
      It seems like a big part of language is an information processing task; we transform sounds into meanings, and back. The computational theory of these information processing systems is a specification of sound-meaning pairs. I guess the big question is: why should there be anything else?

  4. @Greg. I think the reason I take there to be a specification of structure that is not just input/output is that I'm impressed by cases like those discussed by Chomsky in Aspects where knowledge goes beyond use (all the standard cases of memory issues in parsing etc) and a theory that generates configurations and interprets them seems to get that right(ish) without saying much extra. So it probably boils down to that debate and hunches as to how it's best resolved. Also, it's a way of thinking that allows me to get a grip on those sociovariation questions I've been worrying about for a while. And the cyclicity of syntactic computation doesn't seem to track memory issues as usually understood. And ... well, I guess I think all of the answers to these issues fall in one direction for me, which is a generate and interface architecture, rather than an input/output with the grammar as an abstraction architecture.

    1. (Hope this is linking to the right part of the thread.)

      Yes, I wouldn't want to equate a grammar with a perceptual device. As you say, it's a representational system that links sounds and meanings, but not a device for transforming information. One can certainly use that device in a task that involves mapping external percepts onto meanings, but that's different than saying that it is a device that is 'for' transforming information.

      Rephrasing something that I've said above, I don't find it helpful to talk about grammar as an "abstraction over processing", because that tends to conflate issues of (i) level of description and (ii) task. Aspects conflates these, and also uses a terminological sleight of hand to further conflate these with the distinction between internal representations/computations vs. external behaviors. I generally avoid talking about terms like 'performance' (and to some degree also 'use') because it invites that conflation.

      I'm taking "memory" to be just another term for "encoding". It's whatever allows information to be carried forward in time. So "abstracting away from memory" just amounts to being less specific about the nature of the encoding. And once we specify the memory encoding in more detail, we can further distinguish (i) the nature of the memory encoding, and (ii) whatever places bounds on our ability to accurately target different encodings in memory (decay, interference, focal attention, etc.).

    2. @David: So, the competence-performance distinction? Yes, that's very important. From my perspective, it's similar to noting that one's pocket calculator doesn't `actually' compute addition, because of memory limitations and the like. Maybe a change in grammar formalism would be useful here? Mark Steedman, using CCG, has claimed that `syntactic structure is nothing more than a trace of the algorithm for computing meaning.' Certainly using CCG doesn't limit one in the ways you seem to be worried about.
      Your other point, that you find it easier to think about things from your perspective, is of course perfectly reasonable.

      @Colin: I'm not sure what you're getting at with identifying memory and the `encoding'. (Or even what exactly this latter means.) Bounding the size of the stack of a push-down automaton makes the language recognized regular, despite using context-free structure. If our language processing mechanism were (abstracting away from everything else) a PDA with a bounded stack, I would want to describe the CFG underlying the PDA without stack bound, which I would characterize as abstracting away from memory limitations. I take this to be roughly the situation in linguistics, and basically (one aspect of) the competence-performance distinction. I'm not sure how to think of this in terms of being more or less specific about `encodings'.
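
The bounded-stack point can be illustrated with a small Python sketch. It is a toy under stated assumptions: the function name `accepts`, the parameter `max_stack`, and the choice of the language a^n b^n are all made up for illustration. Without a bound the device recognizes the context-free language a^n b^n (n >= 1); with a bound k it accepts only the finitely many strings with n <= k, which is a regular language, even though the mechanism is still organized around a stack.

```python
from typing import List, Optional


def accepts(s: str, max_stack: Optional[int] = None) -> bool:
    """Recognize a^n b^n (n >= 1) with a stack; optionally cap the stack depth."""
    stack: List[str] = []
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:
                return False  # an 'a' after a 'b' is never allowed
            if max_stack is not None and len(stack) >= max_stack:
                return False  # memory limit reached: the bounded device gives up
            stack.append("a")
        elif ch == "b":
            if not stack:
                return False  # more b's than a's
            stack.pop()
            seen_b = True
        else:
            return False  # symbol outside the alphabet
    return seen_b and not stack
```

For instance, `accepts("aaabbb")` holds, while `accepts("aaabbb", max_stack=2)` fails only because of the bound. Describing the unbounded version is what the comment calls abstracting away from memory limitations.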

    3. A way of ducking the contentious issue of grammar-parser relationships is to say that the grammar is a partial (and probably somewhat confused) description of whatever it is that's responsible for grammaticality phenomena; whether it's a data structure consulted by the parser or (certain aspects of) the structure of the parser, or something completely different does not have to be regarded as determined.

      This is essentially the Peacocke-Davies 'Level 1.5' idea. Whether or not grammaticality is a real phenomenon distinct from other aspects of acceptability is something that can be argued about, but there is clearly a domain of linguistically significant generalizations that grammars are useful for describing, phenomena of impaired acceptability that they are not so clearly great for (for example reduced acceptability of overly complex prenominal possessors), and others where it probably makes no real difference.

  5. @greg. I'm a fan of CCG (Mark was my first ever syntax teacher, when I was 18!) and that comment of Mark's is very compatible with the pretty common view that minimalist syntactic derivations are computations of semantics with spell out as a sideshow. I guess there might be different ways within CCG of building processing difficulty into the formalism itself (e.g. in particular derivations penalising particular interactions of composition and type-raising, or whatever) while saying that if there's a well-formed derivation overall, the structure is grammatical, but I'm not sure that that would actually be helpful.

    Maybe it's just about different ways of configuring your abstractions, but I do think there's something to the very simple idea that our knowledge of language goes beyond our parsing capacity (centre embedding being the obvious case), and we want an architecture which that falls out of. A classical way to think of the model of use is as a number of finite transducers operating on the configurations delivered by the generative component, and those transducers are where memory limitations, etc, apply. That allows us to capture the fact that the knowledge of language stored in the generative system is greater than what is usable by the other systems, which are time and memory bounded. If we take the generative system to be an input system in the Marrian sense, it becomes a bit of a mystery as to why we see a disparity between knowledge of language and capacity to parse.

    I'm not keen on the position that Avery sketched out (although I'm embarrassingly ignorant of the Peacocke-Davies proposal, which I'll go find out about), because I don't think we should duck the issue. It's probably better to be sharp about what the proposal is, so then we can at least be sure if it turns out to be wrong.

    1. @David. Tasks ≠ levels of analysis ≠ architectural components.

      I agree that it's clearly the case that there are sentences that we struggle to comprehend but that are compatible with what our grammars sanction, such as center embeddings. But there are many ways to capture that fact. And the classic Monty Python approach ("Now for something completely different") is likely overkill.

      The simple observation that some sentences are hard has been around forever, but we now know a whole lot more about the relation between the generalizations that linguists capture in grammatical theories and the things that people bring to bear in real-time computation. Basically: the relation is very tight. Real-time linguistic processes are highly grammatically sophisticated. Even when people mess up, they seem to do so in a way that closely follows their grammar. In fact, the current state of the literature is such that it's pretty boring to find cases of close alignment, as they're so common. It's news when people find cases of apparent mis-alignment. (Shevaun Lewis and I review these in a 2015 paper in J. Psycholing. Research.)

      I agree with you that we should not regard the human structure building capacity specifically as a perceptual system (which would be a way of taking literally the notion of "input system"). But I don't think there's much mystery at all about the relation between knowledge of language and what people do in real time. It's an active area of research, a great deal is known about the close alignments, and a lot is also known about the scope and possible accounts of the misalignments.

    2. Davies 1987, Tacit Knowledge and Semantic Theory: Can a Five Percent Difference Matter? Mind 96:441-462 is the one to look at. I think it is a good way to go unless we really can tell the difference between for example a brain-implemented counterpart of a parser written as prolog clauses for a specific language, and a general purpose one that looks at a database of rules for the particular language. Linguists obviously can't do that at all; I doubt anybody can, therefore we should not strike postures about it: the message of syntax to the outside world should consist to the greatest extent possible of things we can reasonably claim to know, rather than arbitrary guesses.

      One possible way to put this is that if we have two theories of UG, T1 and T2, such that the grammar+parser combinations provided by T1 can be mapped one-to-one onto those of T2, computably in both directions, preserving both the sound-meaning relation and the 'succinctness' relation between the grammars in the theory (and maybe other relations such as how much revision is required to convert one grammar into another), then they should be regarded as the same theory. [Hopefully no vitiating errors in that attempted formulation]

      Succinctness of grammars (aka evaluation metric score or Bayesian prior) and revision distance between grammars are ideas that need more thought, but note that for example any historical linguist who thinks that certain changes can be meaningfully described as 'Simplifications' is implicitly accepting the evaluation metric essentially as proposed by Chomsky, even if they are not aware of this fact.

  6. I also don't think it's a mystery! I said IF we think of knowledge of language as an input system, then it'd be a mystery. On the sketch I gave, you'd expect quite a tight linkup, since the systems of use are using linguistically generated configurations. The only thing I think I'm trying to say here is that the generative system that creates the structures that are interpreted is not an input system, and is best specified at something more like Marr's computational level.