Monday, September 23, 2013

Why Formalize- II?

Ewan (here) provides me the necessary slap upside the head, thereby preventing a personality shift from stiff-necked defender of the faith to agreeable, reasonable interlocutor. Thanks, I needed that. My recognition that Alex C (and Thomas G) had reasonable points to make in the context of our discussion had stopped me from thinking through what I take the responsibilities of formalizations to consist in. I will try to remedy this a bit now.

Here’s the question: what makes for a good formalization? My answer: a good formalization renders perspicuous the intended interpretation of the theory that it is formalizing. In other words, a good formalization (among other things) clarifies vagaries that, though not (necessarily) particularly relevant in theoretical practice, constitute areas where understanding is incomplete. A good formalization, therefore, consults the theoretical practice of interest and aims to rationalize it through formal exposition. Thus formalizing theoretical practice can have several kinds of consequences. Here are three (I’m sure there are others): it might reveal that a practice faces serious problems of one kind or another due to implicit features of its practice (see Berwick’s note here), or even possible inconsistency (think Russell on Frege’s program); it might lay deeper foundations (and so set new questions) for a practice that is vibrant and healthy (think Hilbert on Geometry); or it may attempt to clarify the conceptual bases of a widespread practice (think Frege and Russell/Whitehead on the foundations of arithmetic). At any rate, on this conception, it is always legit to ask if the formalization has in fact captured the practice accurately. Formalizations are judged against the accuracy of their depictions of the theory of interest, not vice versa.

Now rendering the intended interpretation of a practice is not at all easy.  The reason is that most practices (at least in the sciences) consist of a pretty well articulated body of doctrine (a relatively explicit theoretical tradition) and an oral tradition. This is as true for the Minimalist Program (MP) as for any other empirical practice. The explicit tradition involves the rules (e.g. Merge) and restrictions on them (e.g. Shortest Attract/Move, Subjacency). The oral tradition includes (partially inchoate) assumptions about what the class of admissible features is, what a lexical item is, how to draw the functional/lexical distinction, how to understand thematic notions like ‘agent,’ ‘theme,’ etc.  The written tradition relies on the oral one to launch explanations: e.g. thematic roles are used by UTAH to project vP structure, which in turn feeds into a specification of the class of licit dependencies as described by the rules and the conditions on them. Now in general, the inchoate assumptions of the oral tradition are good enough to serve various explanatory ends, for there is wide agreement on how they are to be applied in practice. So for example, in the thread to A Cute Example (here) what I (and, I believe, David A) found hard to understand in Alex C’s remarks revolved around how he was conceptualizing the class of possible features. Thomas G came to the rescue and explained what kinds of features Alex C likely had in mind:

"Is the idea that to get round 'No Complex Values', you add an extra feature each time you want to encode a non-local selectional relation? (so you'd encode a verb that selects an N which selects a P with [V, +F] and a verb that selects an N which selects a C with [V, +G], etc)?"

Yes, that's pretty much it. Usually one just splits V into two categories V_F and V_G, but that's just a notational variant of what you have in mind.

Now, this really does clarify things. How? Well, for people like me, these kinds of features fall outside the pale of our oral tradition, i.e. nobody would suggest using such contrived items/features to drive a derivation. They are deeply unlike the garden variety features we standardly invoke (e.g. +Wh, case, phi, etc.) and, so far as I can tell, restricted to these kinds of features, the problem Alex C notes does not seem to arise.
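
The refinement trick in the exchange quoted above can be made concrete with a toy sketch. Everything in it is an illustrative assumption rather than anyone's actual grammar: the `LexItem` class, the `project` and `derive` helpers, and the sample words are all hypothetical, and selection is reduced to matching atomic category labels. With plain categories a verb that selects N cannot tell which kind of complement that N took; once the categories are split, the same strictly local check enforces the non-local dependency.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LexItem:
    phon: str
    selects: tuple   # categories of the arguments this head must merge with
    category: str    # atomic category of the phrase the head projects

def project(head, *arg_categories):
    """Return the category projected by `head`, or None if selection fails."""
    if tuple(arg_categories) != head.selects:
        return None
    return head.category

def derive(verb, noun, comp):
    """Build [V [N [P/C]]] bottom-up using only local head-argument selection."""
    return project(verb, project(noun, project(comp)))

# Hypothetical complement heads.
P = LexItem("on", (), "P")
C = LexItem("that", (), "C")

# Unrefined lexicon: both nouns project plain N, so V cannot see their complement.
N_takes_P = LexItem("reliance", ("P",), "N")
N_takes_C = LexItem("claim", ("C",), "N")
V_plain = LexItem("discuss", ("N",), "V")

# Refined lexicon: the noun's category records what it selected,
# and the verb is split into V_F / V_G accordingly.
N_P = LexItem("reliance", ("P",), "N_P")
N_C = LexItem("claim", ("C",), "N_C")
V_F = LexItem("verb-F", ("N_P",), "V_F")
V_G = LexItem("verb-G", ("N_C",), "V_G")

print(derive(V_plain, N_takes_P, P), derive(V_plain, N_takes_C, C))
print(derive(V_F, N_P, P), derive(V_F, N_C, C))
```

Note that to the checking routine `N_P` and `N_C` are just two unrelated strings; the subscript is bookkeeping for the reader, which is exactly why such features look contrived from the standpoint of the oral tradition.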

Does this mean that all is well in the MP world?  Yes and No.

Yes, in the sense that Alex C’s worries, though logically on point, are weak in a standard MP (or GB) context for nobody supposes the kinds of features he is using to generate the problem exist. This is why I still find the adverb fronting argument convincing and dispositive with regard to the learnability concerns it was deployed to address. Though I may not know how to completely identify the feature malefactors that Thomas G describes, I am pretty sure that nothing like them is part of a standard MPish account of anything.[1] For the problem Alex C identifies to be actually worrisome (rather than just possibly so), one would have to show that the run-of-the-mill, everyday, garden-variety features that MPers use daily could generate trouble, not merely that features practitioners would reject as “nutty” could.[2]

No, because it would be nice to know how to define these possible “nutty” features out of existence and not merely rely on inchoate aspects of the oral tradition to block them. If we could provide an explicit definition for what counts as a legit feature (and what not) then we will have learned something of theoretical interest even if it fails to have much of an impact on the practice as a result of the latter’s phobia for such kinds of features to begin with. Let me be clear here: this is definitely a worthwhile thing to do and I hope that someone figures out a way to do this. However, I doubt that it will (or should) significantly alter conclusions concerning FL/UG like those animated by Chomsky’s “cute example.”[3]

[1] Alex D makes this same point (here), and I agree:
I completely agree with you that if we reject P&P, the evaluation measure ought to receive a lot more attention. However, in the case of trivial POS argument such as subject/aux inversion, I think the argument can be profitably run without having a precise theory of the evaluation measure.
[2] Bob Berwick’s comment (here) shows how to do this in the context of GPSG and HPSG style grammars. These systems rely on complex features to do what transformations do in GB style theories. What Bob notes is that in the context of features endowed with these capacities, serious problems arise. The take home message: don’t allow such features.
[3] This seconds David Adger’s conclusion in the comment section (here). Let me quote:
I am convinced by what I think is a slightly different argument, which is that you can use another technology to get the dependency to work (passing up features), but that just means our theory should be ruling out that technology in these cases. as the core explanandum (i.e. the generalization) still needs explained. I think that makes sense…


  1. I don't think restricting the set of features is a good solution, for several reasons. In increasing order of importance:

    1) It obscures the inner workings of the formalism. Restricting the set of category features in MGs is not like Ristad's restrictions of the GPSG feature calculus. Ristad really prunes down the complexity of feature expressions. But category features in MGs are already atomic, flat units without any structure to them. Writing V_F and V_G is just a convenient way for constructing the proof, to the grammar those features are just as different as C and T. So all you can do is restrict the size of the set of category features, and that's like studying only numbers between -17 and +17, you deprive yourself of the opportunity to say anything interesting about integers vs rationals vs reals.

    2) Just like formal universals are preferable to substantive universals, we should first try to patch up the feature checking mechanism. The feature coding strategy only works because subcategorization is in a sense symmetric: A needs to be selected to enter the derivation, and B needs an argument of category A.
    But this kind of pattern does not hold for adjuncts, for example. B does not need to be adjoined to, and an adjunct does not need to be selected. The standard MG implementation of adjunction captures this via a variant of Merge in which feature checking is asymmetric --- the adjunct has an adjunction feature that must match the category feature of the phrase it adjoins to, but only the adjunction feature is deleted. Move can no longer be reliably emulated via Merge in this case. I have argued recently that this asymmetry even derives the Adjunct Island Constraint.

    3) You are presupposing that features are tangible objects that can easily be distinguished from other parts of the grammar. But the feature coding result not only tells us that constraints can be represented as features, it also tells us that features represent constraints, the two are interchangeable from a technical perspective. So whatever set of features I start out with, I can reduce it to a fixed one by adding constraints to the grammar.

    4) The previous argument raises an even bigger issue. Mainstream minimalism does not just consist of Merge and Move and some features, there are tons of additional constraints such as the Specifier Island Constraint that started this whole discussion.
    So let's assume that the SPIC holds for all MGs, and now let's look at it from the perspective of feature coding. We know that how many new features you need to encode a constraint through Merge varies between grammars even if they started out with the same set of features. So a universal SPIC actually entails that not all grammars have the same set of features if viewed from the perspective of feature refinement. So if you want to have a fixed set of features, it must include every feature that is included in some grammar. But since not every feature occurs in every MG, some MGs will have "unused" features that we can exploit to encode new constraints and we are back to square 1.

    Imho the issue is not really with the set of features, it is with the way we currently allow information to flow through the tree via Merge. You cannot stop that in a conceptually coherent way by assuming a fixed set of features because the whole notion of feature has become ephemeral in MGs. Constraints are features, features are constraints. So ultimately, we are dealing with a constraint problem, not a feature problem, and I think that's how we should try to solve it.
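
    The contrast in point 2) between symmetric checking under Merge and asymmetric checking under adjunction can be sketched as follows. This is a deliberately stripped-down illustration, not the standard MG definitions: items are bare feature lists, no trees are built, the function names are hypothetical, and the `~x` spelling of an adjunction feature is a notational assumption made only for this example.

```python
def merge(head, arg):
    """Symmetric checking: the selector feature '=x' on the head and the
    category feature 'x' on the argument are both deleted."""
    if head and arg and head[0] == "=" + arg[0]:
        return head[1:], arg[1:]
    return None  # features do not match, Merge is not licensed

def adjoin(adjunct, host):
    """Asymmetric checking: the adjunction feature '~x' must match the host's
    category 'x', but only the adjunct's feature is deleted."""
    if adjunct and host and adjunct[0] == "~" + host[0]:
        return adjunct[1:], host  # host features are left untouched
    return None

# A verb selecting a noun: both features are used up.
print(merge(["=n", "v"], ["n"]))

# Two adjuncts stacking on the same vP: the host's category survives each time.
_, host = adjoin(["~v"], ["v"])
print(adjoin(["~v"], host))
```

    Because the host keeps its category feature, it neither requires the adjunct nor records that one was attached, which is the asymmetry the argument in 2) relies on.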

    1. @Thomas. I just wanted to ask you if you think that your analysis of adjunct islands could somehow be extended to the SPIC. Intuitively it seems that the same sort of general principle applies. Very roughly, making changes to material within a specifier should not affect whether or not the containing structure is licit. Do you foresee any problems here, or would it be a straightforward extension? I can't immediately see any other way to tackle the problem within the grammatical formalism itself. We know that verbs and nouns can select for CPs with heads that have a particular feature. We know that C-T-V and D-N enter into selectional dependencies, and that T enters into dependencies with its specifier. So it seems that any grammatical formalism which permits those dependencies is going to have a hard time preventing dependencies percolating up from within [Spec,TP] (except maybe by ad-hoc constraints like limiting the number of features). This is why I have the hunch that we might want to offload the problem onto a theory of the evaluation measure. But it's obviously worth exploring the possibility of tightening up the formalism if that's feasible.

    2. It might be, but at this point I don't see how. The AIC simply follows from the fact that we provably cannot extract from X if the following two properties hold:
      1) Movement involves checking a feature at the target site.
      2) X is optional (in a specific technical sense).

      Specifiers do not satisfy property 2, so we cannot derive the SPIC this way. However, the notion of optionality is still very coarse and needs to be refined for empirical reasons anyway (some optional adjuncts allow extraction, some apparently non-optional ones do not). It might be the case that once we have a better grip of how to understand optionality the SPIC will be derivable, too.

      I have to admit, though, that I cannot think of any deep properties of specifiers that are similar to the optionality of adjuncts. In particular, how do XPs in specifier position differ from XPs in complement position?

    3. There is one difference, but it relies on the species of features being checked. Johannes Jurka, following Wexler and Culicover, notes that it is not specifiers that are islands but those in their criterial position, e.g. Case. This is an old idea. I believe that Diesing makes a similar point. At any rate, if one makes reference to these things, distinctions are possible. But this means getting into the featureless gutter.

  2. There is a useful difference between "being precise" and "being formalized".
    The examples you give are all cases where a perfectly precise theory (naive set theory, plane geometry, arithmetic) was formalized and some problems appeared or didn't appear. It's notable also that none of these are *scientific* examples.

    But there is no vagueness or imprecision in the pretheoretic notion of an integer or of addition.

    So take a linguistic example -- your bog standard tree. So there are several different ways of formalizing the notion of a tree -- terms over ranked alphabet or Gorn style tree domains, and no doubt some others -- but they don't matter. What matters is that the idea of a tree is perfectly precise, and maybe you need to specify what the labels are and so on, but basically it just doesn't matter which formalization you use (footnote: probably there are some areas where it might matter but let's leave them to one side).

    The problem we have been discussing here is nothing to do with formalization and everything to do with precision. The "theory" you are talking about is just not specified precisely enough for it to have any empirical content.

    Whether the set of features is universal and innate (and if so what they are, and if not where they come from) is not some theoretically uninteresting detail of the formalization (like the tree formalization issue): this is a centrally important point of the theory.

    So sure, you can start adding a whole lot of extra stuff to your theory, but that makes it a different theory -- more complex, maybe less evolutionarily plausible etc etc.
    What irritates me is the shifting between a clean simple and elegant theory that just doesn't explain what you claim it does, and some hugely complex ad hoc cobbled together theory that perhaps might (though I have my doubts). These are just two different theories, and you can't cherry-pick the positive features of both and pretend it is a consistent theory.

    1. Alex, I'm not so sure about the ``any empirical content'' part seeing how pretty much no syntactic proposal in the literature pushes MGs beyond the PMCFL boundary. Now which subclass of the PMCFLs is generated depends on assumptions such as the set of features, types of movement and so on. So yes, every linguist makes slightly different assumptions and those probably result in slightly different subclasses being generated, but at least the upper bound seems very stable.

      I also noticed that throughout the discussion you never fully explained to the non-technical part of the crowd why it makes sense to put at least some universals into the learner: the whole class of MG string languages is not learnable under any reasonable learning paradigm. Some grammars will never be entertained by the learner, so we do not have to worry about blocking them from the set of possible grammars. The class of string languages generated by MGs with the SPIC is exactly the class of monadic branching MCFLs. Now if the class of learnable MCFLs is a subclass of that, every learnable MG has the SPIC. Right, no? Or have I completely lost track of what we're actually arguing about?

    2. Frege might not have agreed that his favorite examples were not scientific, or that there was no imprecision in 19th century uses--in mathematics, or elsewhere--of 'number' and 'function'. But I agree that "being precise" and "being formalized" are very very different. And I have a LOT of sympathy for Alex's last paragraph here. My own interests make me think about practice in semantics, where formalization has been common (perhaps even excessive), but there has arguably been less clarity--much less precision--about the empirical content of proposals.

      (Side point, and pace Norbert: I'm suspicious of intended interpretations. My intentions are usually very imprecise and not to be trusted. I think of formalization as partly an invitation to be explicit about how one wants to interpret the metalanguage for purposes of theorizing...even if that means consigning prior intentions to flames.)

      If one is inclined to "cry foul" in reply to some of the machinery that gets invoked to save certain accounts of meaning from apparent counterexamples, or to make their derivational trains run on time, it can be useful to ask what that "inner referee" would say when one starts adding "extra stuff" to a simple model of syntax (to accommodate annoying facts and/or get its derivational trains to run on time).

    3. What you're saying, Thomas, is really quite subtle. As you of course know, most of the restrictiveness results on Minimalist Grammars rely on Stabler's SMC, or, more generally, on a finite upper bound on the number of possible movers. This restriction is not enforced by any of the constraints on the linguistic market, as far as I can see. The reduction by Salvati of membership to proof search in (multiplicative) linear logic (decidability: unknown) goes through under (what I take to be) the standard notion of `attract closest' or `shortest move'. And even with the SMC Michaelis and I have shown that late operations can do surprising things. Some obvious ways of implementing feature percolation also result in unrestrictiveness. So, I think it is important to distinguish between the well-behaved formalism (or family of formalisms if you'd like) of MGs, and the wealth of proposals that could very well result in unrestrictiveness which seem to fall under the `intended interpretation' of minimalist practice.

    4. I think Greg makes an important point here about the (mis)match between the SMC and "constraints on the linguistic market" or the "intended interpretation" of the theory. This is (I think; correct me if I'm wrong) the same point as I was getting at in a related discussion a while back. I added this comment to that earlier discussion long after the thread had effectively died, but I'd be very curious what you guys think of it.

    5. [Regarding Tim's linked comment] I guess topicalization might be a more straightforward example than QR because it actually affects the string language. As far as I can tell there's no limit in principle on the number of adverbials which can be topicalized, so you could make the same point but without the complicating issue of covert movement.

    6. Yes, I should have stated my point above more carefully. This is unfamiliar territory for a significant part of the audience, so it's easy to accidentally spread misinformation. Thanks for the damage control.

      However, I do remain unconvinced that there are any real challenges to my claim. I think that's mostly due to us having different perspectives as to how we should conduct our "literature exegesis", but let's look at the technical details first.

      If I remember correctly, the feature percolation proof hinges on percolation being unbounded and LIs being allowed to carry several occurrences of the same LI --- I can't think of any feature percolation analysis in the literature that uses both assumptions. Late Adjunction is unproblematic if we only allow it right above Move nodes, which seems to suffice in most cases (I am not sufficiently familiar with the Wholesale Late Merge idea to tell what is going on there).

      The stronger argument against my claim, I think, is QR, topicalization, and also wh-movement and scrambling. Not because they are empirical challenges, but because linguists might indeed be tempted to analyze them in a way such that the formalism would break the PMCFL boundary. But in this case we should ask ourselves whether they are truly committed to all the technical minutiae of their account. That is to say, if I presented an alternative version that mirrors the spirit of their analysis but differs slightly on a technical level, would they reject my modification as a misrepresentation of their ideas?

      Take transderivational constraints. If we implemented them the way they are stated in the literature, then they would be an extremely powerful device. But if we actually look at how they are used, it turns out that we can use a formalization that's a lot simpler and still gets the work done in exactly the same fashion. A similar example is SPE, which by the letter is completely unrestricted but is actually finite-state the way it's being used by phonologists.

      tldr: Syntacticians sometimes introduce very powerful devices even if they only use a small portion of that power. In such a case, we should look at how the device is being used, not how it is defined.

    7. Am I wrong or are there two related but quite different issues here ...

      First, as Thomas says, there are mechanisms in the literature that have "hidden power", in the sense that their power goes beyond the ways those mechanisms are generally used. In this situation, it's nice to try to rein things in by identifying a mechanism that is only just as powerful as is necessary (maybe we could even say "just as powerful as was *intended*"). Thomas's stuff on transderivational constraints is this kind of situation. This is a mismatch between the intended use of a device and the formal power of the device.

      Second, what I understand Greg to be saying is that there's a mismatch between (a) constraints that are assumed in nearly all of the mathematical work on MGs (some bound on the number of movers, e.g. the SMC), and (b) constraints that are generally adopted out there in the wild. And I don't think this is because the constraints in the wild are "accidentally over-powered" in the way that Thomas mentions.

      Is this the way people see the lay of the land?

    8. I think that's right, Tim. I believe that, methodologically, we do well to attempt to describe phenomena with the fewest possible resources (measured formally in terms of computational or language theoretic complexity). I think that the mechanisms and generalizations postulated by linguists are sometimes overpowered in this respect, and that the `same' effect can be achieved with less. I think Thomas agrees with all of this. (Actually, I don't think we disagree on any points brought up in this discussion.)
      I think that your terminology ("just as powerful as was *intended*", "accidentally over-powered") is presupposing an answer to the dispute that Norbert and Alex were having previously. Clearly no one is thinking, when postulating a generalization or mechanism, about these crazy `tricks'; for example, how to simulate Turing machines if you have the SPIC-move but no finite upper bound on the number of movers (-SMC). Is it reasonable to say, though, that they didn't `intend' to postulate that mechanism, but some other? I think rather that linguists in the post-P&P tradition are really implicitly making a lot of non-formal (i.e. substantive) assumptions about the class of possible languages, which are not reflected in the theory (which has been formalized as MGs). In particular, I mean the idea that not only are the basic categories identical across languages, but also that their hierarchy (fseqs in Starke's terminology) and the gross featural make-up of lexical items is identical. These sorts of assumptions are necessary to draw the universal inferences (`voice and v are different in all languages' because they are different in Yaqui according to analysis X) that people are used to making. In the context of these assumptions, it is unimportant that mechanism Y makes the theory über-powerful; we only really care about those (finitely many) languages that are licensed by these implicit assumptions. I think that the implicitness of these assumptions was what made it at first difficult in `Why Formalize (I)' for Norbert, AlexD, and David to understand what on earth AlexC was talking about; he was not making these assumptions, and was not announcing his not making them.
      I think one way of understanding what AlexC is after is a way to derive appropriate substantive universals from the learning algorithm.

    9. This comment has been removed by the author.

    10. [Tidied up and toned down deleted comment.]

      I'm still having some difficulty seeing what Tim and Greg are getting at because we haven't yet seen many concrete examples of overpowered mechanisms. I'm going to try to give an example myself (building on Tim's) so that Greg and Tim can tell me if it bears any resemblance to the sort of examples they have in mind.

      Topicalization out of embedded clauses is possible, and multiple argument PPs can topicalize. Assume that topicalization is movement. In principle, it should therefore be possible to have a sentence of the form T1 T2 T3, ... Tn [ ... t1 ... [ ... t2 ... [ ... t3 ... ... tn ... ] ] ] for arbitrary n. Since finite state automata can't count, it's clearly impossible to check that the number of topics matches the number of traces if the derivation tree language is regular and the derivation trees are structured in roughly the same way as the derived trees (so that you can't just pair off topics and traces one-by-one). Moreover, given that all of the topics could in principle be obligatory argument PPs, this may even have implications for the classification of the string language. I.e., it's not immediately obvious that there is an MG "covering grammar" which generates the right strings (while failing to encode the correct structural relations between topics and traces).

      Now, if that is the sort of case under consideration, I remain unsure what exactly the concern is. Natural languages do seem to permit patterns of dependencies of this sort. If there is in fact a way to associate each topic with its original clause while keeping to a regular derivation tree language (or while satisfying whatever other formal desiderata might be relevant) then well and good. If there isn't, then it seems that natural languages are just not formally very well behaved in this respect and the "over-powered" theories may not in fact be over-powered at all.
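
      The remark that finite-state devices can't count can be illustrated with a toy string-level analogy. This is only an analogy under loud assumptions: it says nothing directly about MG derivation tree languages, the token names `"T"` (topic) and `"t"` (trace) and both function names are hypothetical, and the bound `k` stands in for any device with finitely many states. A checker whose counter saturates at `k` agrees with an unbounded checker up to `k` topics and goes wrong beyond that.

```python
def balanced_unbounded(tokens):
    """Accept iff every trace 't' is preceded by an unmatched topic 'T' and
    the totals match. Needs a counter with no upper bound."""
    count = 0
    for tok in tokens:
        if tok == "T":
            count += 1
        elif tok == "t":
            if count == 0:
                return False
            count -= 1
    return count == 0

def balanced_bounded(tokens, k):
    """Same check, but the counter saturates at k, so only k + 1 distinct
    states are ever used. This is what a finite-state device can do."""
    count = 0
    for tok in tokens:
        if tok == "T":
            count = min(count + 1, k)  # topics beyond k are forgotten
        elif tok == "t":
            if count == 0:
                return False
            count -= 1
    return count == 0

ok = ["T"] * 3 + ["t"] * 3    # three topics, three traces
bad = ["T"] * 5 + ["t"] * 3   # five topics, only three traces
print(balanced_unbounded(ok), balanced_bounded(ok, k=3))
print(balanced_unbounded(bad), balanced_bounded(bad, k=3))
```

      With `k=3` the bounded checker handles the three-topic case correctly but wrongly accepts five topics paired with three traces, since it has lost track of the surplus.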

    11. Greg and Tim aren't worried that the SMC might be empirically wrong (at least not in the discussion here). Their point is that I replied to Alex C's claim that the hidden assumptions of Minimalism deprive it of any empirical content by pointing out that MGs stay within the class of PMCFLs even with those hidden assumptions --- which is problematic because the relation between Minimalism and MGs isn't that straightforward.

      There's two issues:
      1) Even with the SMC, some devices blow up the power of MGs (feature percolation, Late Adjunction). I'm not convinced by those cases because they are not in line with linguistic usage.
      2) Without the SMC the impact of adding constraints and devices is rather unpredictable, and the SMC is an odd constraint that isn't entertained by any linguist. Now this doesn't worry me because the SMC is perfectly compatible with 99% of all analyses I have seen in the wild. The only potential counterexamples are analyses of multiple wh-movement, scrambling, QR and topicalization, like the one you are sketching above.

      Now if some syntactician took a firm stance that these operations must involve movement of the individual DPs and that every other analysis is fundamentally flawed and that the process is truly unbounded, and if this was actually the majority opinion in the field, then my reply to Alex would be a lot harder to defend. But in my experience, if you tell people that there are conceptual problems with this analysis and present some plausible alternative, they are i) surprised that anybody would seriously worry about cases with more than 3 or 4 DPs, and ii) don't really care which one of the two is right unless it's their very own analysis you are arguing against. So the great majority of the field would not run into serious issues if they were suddenly teleported into a parallel universe where they are obliged by law to frame their analyses in terms of MGs.

    12. @Thomas. That all makes sense to me. The only thing is that I don't quite see what would be the big deal about a syntactician saying that all the topics move independently and the process is unbounded. Ok, if that's the case, then UG taken in abstraction from all the other constraints which apply in practice would permit a very unrestricted set of grammars. That might not be a pretty result, formally speaking, but is there a good reason to think that's not the way things actually are?

    13. I think this is just standard scientific procedure: If removing property X from a theory results in something that is not as well-behaved, harder to study, and makes fewer interesting claims, then X should be dispensed with only if absolutely necessary.

      At which point this becomes necessary is a murky issue, of course. But in the case of the SMC, nobody has given a demonstration that it is empirically untenable. There's several phenomena that are problematic under a specific interpretation of the data in combination with a specific analysis, but they have not been studied in sufficient detail.

      Take topicalization. Unbounded topicalization of adjuncts is unproblematic because adjuncts have no in-situ dependency (it doesn't matter syntactically whether they leave a trace because nothing depends on this trace). So who's to say that adjunct topicalization involves Move?
      Moreover, multiple topicalized adjuncts could be given a covert coordination analysis, so that we could use a clustering analysis:

      (1) Yesterday (and) full of glee I killed my wife.

      And mandatory adjuncts apparently cannot be topicalized.

      (2) * Well this book reads.

      So this leaves us with topicalization of arguments, which does not seem to be unbounded:
      (3a) Bill John told Peter that Mary likes
      (3b) Peter John told that Mary likes Bill (that's grammatical, right?)
      (3c) *Peter Bill John told that Mary likes.

      Similar things happen with scrambling where it is very hard to have long-distance scrambling of multiple DPs that have the same case and animacy. QR is very difficult because nobody really knows how many readings are available once you deal with more than 3 quantifiers; the data judgements are just too hard. This leaves us with multiple wh-movement, which has an alternative MG account in terms of clustering (Gärtner & Michaelis 2010).

    14. @Thomas. I think it's absolutely worthwhile to try to come up with alternative analyses of these cases and see how they work out. I was more asking the hypothetical question of whether it would really be so bad if it turned out that there were no good alternative analyses.

      Unbounded topicalization of adjuncts is unproblematic because adjuncts have no in-situ dependency (it doesn't matter syntactically whether they leave a trace because nothing depends on this trace)

      Actually it does appear to matter because topicalized adjuncts show Condition C reconstruction effects. In (1), the pronoun can refer to John only if the PP has moved from the matrix clause:

      (1) In front of John's mother, Mary said that he talks frequently.

      Maybe the binding constraints themselves aren't part of the grammar. But if the derivation tree for the sentence doesn't somehow link the adjunct to its clause of origin, it's hard to see how the Cond C effect in (1) can be explained.

      Moreover, multiple topicalized adjuncts could be given a covert coordination analysis,

      Only if they all topicalize from the same clause, but in principle there can be another layer of clausal embedding for each adjunct. Maybe it's not actually possible to have two topicalized adjuncts in the matrix where one has moved from the matrix and one has moved from the embedded clause. The judgments are kind of iffy.

      So this leaves us with topicalization of arguments, which does not seem to be unbounded.

      This is certainly true for DP arguments. I think it's probably true for PP and CP arguments too but it's a bit harder to tell.

    15. [W]ould [it] really be so bad if it turned out that there were no good alternative analyses
      Well my poor little heart would be broken ;)
      More seriously, no, it would not be the end of the world, linguistics, or even my very own research, but if I had a choice as to which world I want to live in, I'd pick one where all natural languages are underlyingly regular (i.e. compatible with the SMC). In particular because this has turned out to be an assumption that is shared across a variety of formalisms.

      A quick remark on adjuncts: In order to block certain readings, we only have to know possible origins of the adjunct, which is rather easy. Semantics can then restrict the set of possible origins even further.

    16. Alex D wrote: "The only thing is that I don't quite see what would be the big deal about a syntactician saying that all the topics move independently and the process is unbounded. Ok, if that's the case, then UG taken in abstraction from all the other constraints which apply in practice would permit a very unrestricted set of grammars. That might not be a pretty result, formally speaking, but is there a good reason to think that's not the way things actually are?"

      I agree: there's no particularly good reason to think that's not the way things actually are, and there's no reason to think that the world would end if they were indeed that way. My point was just a rather mundane one: to the extent that Minimalism-at-large assumes that there is no such bound on movers, we should be careful about assuming that properties of MGs (efficient parsability, re-encoding of constraints in the feature system, availability of nice probability models, whatever) carry over to Minimalism-at-large.

      (A slightly different point, of course, is the following: to the extent that natural language grammars really do not have a bound on movers, so much the worse for MGs as an empirical hypothesis. Arguably that's a more important point, because it's a "scientific" point rather than a "merely sociological" one, but personally I think we'd do well to keep the sociological one in mind a bit more.)

    17. @Thomas. I'm actually surprised that you think that the SMC is compatible with 99% of linguistic analyses (aside from A-bar movement cases). It may very well be that my gut feel is guiding me astray. However, to take a simple case, the usual VP-internal subject hypothesis is incompatible with the SMC. (This is where both S and O originate below the object case position.) Now, there is an alternative analysis (Koizumi/Koopman&Sportiche) which fares better, but it (or so it seems to me) is not canonical. Maybe by `compatible with the SMC' you mean something like `one can come up with another analysis which seems to preserve the main ideas of the original one but which doesn't run afoul of the SMC'? I confess to finding it mildly astounding that being fairly straightforwardly in accord with the SMC and being an A-movement analysis seem to overlap to the extent they do.

      @AlexD. You are of course right. There is no a priori reason even to think that language has a type I description in Marr's sense. It is, however, a remarkable fact that the linguistic data we actually have access to, the sequences of words making up well-formed sentences (I know, even this is a stretch), all have a very restrictive property; they are in the complexity class LOGCFL ⊆ PTIME. This is not controversial. It is still up for grabs whether they share even more restrictive properties (monadic branching MCFL, well-nested MCFL, MCFL, etc). I will call this restrictive property they share `P'. Why might this be? It could be a grammatical fact; the learner simply cannot hypothesize outside this class. It could however be a grammatical accident; maybe the other possible hypotheses of the learner are not evolutionarily stable - they do not serve the needs of efficient communication well (for whatever reason). (I think of Simon Kirby's group as having explored this direction a bit.)

      Now, this is just looking at sets of strings. Of course, strings convey meanings in contexts. (Now we already have a foundational problem because no one knows what meanings are, much less how they should be represented (mentally or formally).) This is one justification for a particular structural representation. Now, it sometimes turns out (e.g. my work with Jens Michaelis on CoTAGs and late adjunction) that the mechanisms proposed to associate structures with strings would also allow the learner to hypothesize (string) languages which no longer have property P. There are two responses. We could say, so much the worse for the grammatical explanation of P-ness (as I read you as asking about). Or we could instead try to hold on to the grammatical explanation of P-ness and re-evaluate our structures. I don't know of a principled way to say which is better. Note, of course, that it is (or seems to be) a brute fact that all known languages have P, and as such it demands an explanation. Taking the grammar-based route gives us not only an explanation thereof, but also helps us design learning algorithms. Taking the non-grammar-based route amounts to writing a big explanatory IOU.

    18. @Greg. Those are all excellent points (and also thanks for the reminder below that we shouldn't lump the different types of SPICs together). My definition of what it means to be compatible with the SMC is more relaxed than yours.

      First, let's make sure that I'm not misconstruing your example: The VP-internal subject hypothesis is incompatible with the SMC under a strict interpretation because we have an object with a case feature that is checked after the subject is introduced, but the subject carries a case feature, too, so the derivation is aborted due to the SMC before the object even has an opportunity to get its case feature checked.

      I don't consider this scenario incompatible with the SMC because it can easily be fixed in a principled manner. Suppose we had an MG toolkit that allowed linguists to design syntactic analyses by assigning features to lexical items and specifying the order in which they need to be checked. It also allows them to activate a number of constraints. Once they're done, the toolkit automatically converts this grammar into an MG that is then used to test the predictions for a given data corpus.

      How would our toolkit deal with the VP-internal subject hypothesis? It would design a grammar with two case features case_1 and case_2, instantiate all locality conditions on Move in such a way that case_1 and case_2 are treated as the same kind of feature, and then enforce the locality conditions through Merge. None of this would be visible to the linguists using the toolkit, and the strategy will work for 99% of all analyses. So if we ran a linguistic version of the Turing test with our toolkit, the majority of linguists would not be able to tell that the underlying grammar relies on the SMC. Only the topicalization/wh-movement/QR/scrambling cases we have discussed would cause the toolkit to bomb out.

      So that's what I mean by SMC-compatible: that we can design "high-level MGs" that accommodate almost every idea a syntactician might throw at them and still maintain backwards-compatibility with the low-level SMC-MGs.
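The renaming step in the toolkit thought experiment can be illustrated with a small sketch. It assumes a deliberately simplified picture in which each mover is just a list of licensee features and the SMC forbids two movers from having the same active (first) feature. The names `smc_violations` and `rename_case` are hypothetical, and the sketch does not model the second half of the proposal (re-imposing locality conditions through Merge so that case_1 and case_2 still behave like one feature).

```python
from collections import Counter

def smc_violations(movers):
    """Return the licensee features that are active (first) on more than one mover.

    movers: dict mapping a mover's name to its list of remaining licensee features.
    An empty result means this toy version of the SMC is respected.
    """
    active = Counter(features[0] for features in movers.values() if features)
    return sorted(f for f, n in active.items() if n > 1)

def rename_case(movers, feature="case"):
    """Hypothetical toolkit step: index each mover's leading `feature`
    (case_1, case_2, ...) so that no two movers compete for the same feature."""
    renamed, k = {}, 0
    for name, features in movers.items():
        if features and features[0] == feature:
            k += 1
            renamed[name] = [f"{feature}_{k}"] + list(features[1:])
        else:
            renamed[name] = list(features)
    return renamed

# VP-internal subject hypothesis: subject and object both still need case.
vp_internal = {"subject": ["case"], "object": ["case"]}
print(smc_violations(vp_internal))               # ['case']
print(smc_violations(rename_case(vp_internal)))  # []
```

The point of the sketch is only that the clash disappears once the two case features are distinguished; whether the resulting grammar still captures the intended locality facts depends on the constraints that are not modelled here.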

  3. I am not sure about the technical part of that as the learning theory doesn't necessarily guarantee that the languages would be represented by monadic branching MCFGs, even if the languages are all monadic branching MCFLs.
    Indeed I think there are even finite languages where one would learn a non-monadic grammar.

    But you are right, no empirical content is overreaching -- I got carried away by my rant.

    So what *are* we arguing about? The origin of structure dependence of syntax (SDS), and in particular whether it is
    learned or innate.
    The claim was that the MP had an explanation of SDS because it had a hypothesis class that excluded the structure independent
    hypothesis -- and this is the standard Chomsky line going all the way back to when it was framed in terms of transformations of surface sentences -- front the first auxiliary, etc.
    And I think it is clear now that the current theories of minimalist syntax do not exclude the structure independent (SI) rules in any meaningful way.
    So one answer is to say that the hypothesis class includes both SI and SD options, and the child learns which ones are which.
    And there are quite precise models of how this might happen.
    Then you need to explain why natural languages only use the SD option, and there are a number of options there; probably it arises out of some Herb Simon style functional considerations interacting with some biases of the learner. But that is a different problem, and one I am not very interested in, at least at the moment.

    And the other is to say that we need to make the hypothesis class smaller in some way, which is an approach that in my view is unlikely to succeed, because it is I think mathematically impossible to express that sort of language theoretic property through a grammar theoretic restriction -- I think I mentioned this earlier to Benjamin B -- for example, you can't define a restriction that gives you the class of CFGs that generate non-regular languages. So I think you can't define the class of MGs that only give you SD rules, even if you could make the notion of a SD rule precise. ("Define" here means having a decidable property on the grammars.)

    So that is the real reason I think you want to put it in the learner rather than in UG.

    1. Okay, so I had indeed lost track of part of the discussion. I was mostly focused on the validity of the SPIC as a language universal. So the thought experiment I was trying to describe above actually goes along the following lines:

      1) Suppose the class of learnable MG languages is a subclass of the monadic branching MCFLs.
      2) Then from the perspective of a linguist who thinks in terms of tree structures and constituency, i.e. SD regularities, it will look like the SPIC is a universal restriction on MGs, when in fact it is an epiphenomenon of certain learnability requirements.

      This does not address the bigger issue of structure (in)dependence, of course, but it shows how structural generalizations can be pushed into the learner rather than UG.

    2. Speaking just for myself, I would love to discover that SPIC phenomena are epiphenomena of "certain learnability requirements." I am just as happy finding out that SPIC is what one gets because rules/operations that violate it are not acquirable because of the way the learner generalizes as I would be finding out that it is a hard constraint on shapes of grammars. What I want, however, is not merely that it hasn't been acquired because the relevant data was absent but that it is impossible (or very unlikely) to be acquired because of fundamental properties of how learning takes place. One of these kinds of explanations would suit me just fine. Do you have one? If so, let's discuss it.

    3. I don't think we have a good answer at the moment, but Alex C might have suggestions.

      There are many different learnability paradigms (Gold, PAC, MAT) that differ on what kind of information is available to the learner and to which degree of accuracy the target language has to be learned. The full class of all (P)MCFLs is not learnable under any of these paradigms, so learnability considerations do filter out some grammars. Alex C and Ryo Yoshinaka have provided a learner for a subclass of MCFLs, and I believe Ryo is also working with Ed Stabler on giving an MG characterization of this subclass. Until that is in place, though, we are left with speculations and thought experiments.

  4. And I think it is clear now that the current theories of minimalist syntax do not exclude the structure independent (SI) rules in any meaningful way.

    This is where I don't quite follow. I can see two ways of encoding structure-independent rules in MGs. One would be to have a grammar generating a uniformly right-branching derivation tree language, so that there was no real distinction between hierarchy and linear order. The other would be to sneak in linear relations via selection. (I haven't worked out the details, but I guess you could probably use selectional features to run a finite state machine over the linear sequence of heads, and this would presumably be sufficient to correlate the presence of a +Q head with inversion of the linearly first auxiliary.) The first method depends on the learner adopting a structural hypothesis which is completely inadequate (the size of the grammar will balloon as soon as they encounter e.g. complex subjects, and in any case, uniformly-branching hypotheses are plausibly ruled out by UG). The second method requires the addition of enormous numbers of spurious features which would be punished by any conceivable evaluation mechanism. So, not surprisingly, as we get into the technical details, claims need to be stated in a more nuanced and sophisticated way. "UG does not permit rules of this type" turns out to mean "UG does not permit rules of this type unless you also have (i) uniformly-branching tree structures (which are probably blocked by UG too and which in any case will eventually result in a huge blow up in the size of the grammar) or (ii) a huge lexicon".

    In other words, yes, you can have linear rules, but what you can't have is linear rules and a structure that isn't completely wrong and a reasonably small lexicon. So, in effect you can't have them at all. And you can't have them because UG makes it a real PITA to encode what is on the face of it a perfectly reasonable hypothesis.
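The second method (running a finite state machine over the heads via selection) can be made concrete with a toy product construction. This is a sketch under strong assumptions, not an MG implementation: lexical items are reduced to triples of word, selected category and own category, there is no movement, and every head is pronounced before its complement. The function name `refine_with_dfa` and the example lexicon are hypothetical.

```python
from itertools import product

def refine_with_dfa(lexicon, states, delta, accepting):
    """Product construction: every category feature becomes a (category, state) pair.

    lexicon:   list of (word, selected_category_or_None, category)
    delta:     dict mapping (state, word) -> next state, reading left to right
    accepting: set of states in which the string may end

    Because a head precedes its complement, a head read in state q forces its
    complement to start in state delta[(q, word)].
    """
    refined = []
    for (word, selects, cat), q in product(lexicon, states):
        q_next = delta.get((q, word))
        if q_next is None:
            continue  # the automaton has no transition here
        if selects is None:
            # Bottom of the chain: the automaton must end in an accepting state.
            if q_next in accepting:
                refined.append((word, None, (cat, q)))
        else:
            refined.append((word, (selects, q_next), (cat, q)))
    return refined

lexicon = [
    ("aux", "V", "T"),    # aux selects a V complement and has category T
    ("verb", "N", "V"),
    ("noun", None, "N"),  # selects nothing
]
states = ["no_aux_yet", "aux_seen"]
# Two-state automaton that remembers whether an auxiliary has been read.
delta = {(q, w): ("aux_seen" if w == "aux" else q)
         for q in states for w in ("aux", "verb", "noun")}

refined = refine_with_dfa(lexicon, states, delta, accepting=set(states))
print(len(lexicon), "->", len(refined))  # 3 -> 6
```

Even in this toy setting the lexicon is multiplied by the number of automaton states, which is the kind of feature proliferation that an evaluation measure based on grammar size would penalize.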

    1. Yes, that's on the right track I think. So sure you can have an MG that forms questions by inserting a particle at some fixed linear position in the surface string, but that is going to be a big grammar if it inserts it at position 4 and quite small if it is initial or final, and that is probably why we see particles added at the end (e.g. Japanese) and never (AFAIK) at position 4.
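The size contrast can be illustrated with a stand-in formalism. The sketch below uses right-linear grammars rather than MGs (an assumption made purely for simplicity, so the absolute numbers say nothing about MG lexicon sizes) and counts the rules needed to insert a particle `Q` after exactly k words versus sentence-finally. The helper names `particle_grammar` and `accepts` are hypothetical.

```python
def particle_grammar(vocab, position=None):
    """Right-linear rules inserting the particle 'Q' after exactly `position` words.

    position=None builds the sentence-final variant. Rules are (lhs, rhs) pairs
    where rhs is (), (terminal,) or (terminal, nonterminal).
    Start symbol: 'S' for the final variant, 'N0' otherwise.
    """
    rules = []
    if position is None:
        rules += [("S", (w, "S")) for w in vocab]  # read any number of words
        rules.append(("S", ("Q",)))                # then end with the particle
        return rules
    for i in range(position):                      # count words one by one
        rules += [(f"N{i}", (w, f"N{i + 1}")) for w in vocab]
    rules.append((f"N{position}", ("Q", "T")))     # insert the particle
    rules += [("T", (w, "T")) for w in vocab]      # read the rest freely
    rules.append(("T", ()))
    return rules

def accepts(rules, start, tokens):
    """Recognise a token list with a right-linear grammar of the shape above."""
    agenda = [(start, 0)]
    while agenda:
        nonterminal, i = agenda.pop()
        for lhs, rhs in rules:
            if lhs != nonterminal:
                continue
            if not rhs:
                if i == len(tokens):
                    return True
            elif i < len(tokens) and tokens[i] == rhs[0]:
                if len(rhs) == 1:
                    if i + 1 == len(tokens):
                        return True
                else:
                    agenda.append((rhs[1], i + 1))
    return False

vocab = ["a", "b", "c"]
for label, k in [("initial", 0), ("position 4", 4), ("final", None)]:
    print(label, len(particle_grammar(vocab, k)))
```

In this encoding every extra position before the particle costs one more nonterminal and one more rule per vocabulary item, while the initial and final variants stay at a fixed small size regardless of sentence length.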

      (You probably can't have a uniform right branching tree structure for various language theoretic reasons, depending on the details)

      And so if you have your UG being MGs and you have some reasonable learning mechanism, then that would explain the observed universal. But you need to have that reasonable learning mechanism.

      Then the question becomes: is UG just MGs or is UG MGs with additional principles that limit it further? Or is it just say some much larger class of grammars that can represent say all PTIME languages? And is the learner a general Bayesian learner?

    2. At this level of abstraction, I'm not sure that you would get lots of push back, as Alex D noted a while ago. I am pretty sure that if one had a learning theory that had as a consequence that learning rules that extracted expressions from SPICs was impossible or very costly then this would be warmly received in my part of the world. Do you have one such? If so, what is it?

    3. Sorry for delay. So the question is about whether one could predict SPIC from learnability considerations.
      So here is an apparently unhelpful answer that may shed some light on this issue.
      So is there any difference between MGs with SPIC and MGs without SPIC?
      Well yes, depending on the ancillary assumptions (i.e. the SMC), Kobele and Michaelis (2005) show there is a huge difference. Without SPIC, MGs are Turing complete, so there is a huge computational difference. So one bad argument would be that there is therefore a strong computational argument in favour of SPIC. But I don't see it. It is an argument in favour of some restriction but not specifically in favour of SPIC.
      So that is the unhelpful answer.

      Maybe to get a helpful answer we need to spell out SPIC behaviorally? Because the standard version of the SPIC doesn't actually block everything that you want it to block. It blocks one particular structural configuration, but we don't observe the structural configurations, and it may not block other structural configurations that generate things that might look like SPIC violations. So if we could spell it out more precisely, then we could get a handle both on whether it is predictable from a learning model, and also more fundamentally, whether it actually has empirical content, or whether it is something like "Merge is always binary branching".

    4. Hi Alex, minor typo: the landscape is +SMC = +SMC,+SPIC-mv = MCFL, +SMC,+SPIC-mrg = Monadic Branching MCFL, -SMC,+SPIC-mv = -SMC,+SPIC = Type-0, -SMC,-SPIC = maybe type-0, maybe not (Salvati).

    5. The relevance of the above is:
      1) with usual assumptions about argument structure, the SPIC on merger is untenable in MGs
      2) the SPIC on movement (a la Norbert's no movement out from inside a DP in a case position) does not change the WGC of MGs in the context of the SMC.

    6. It seems like the SPIC is not doing what it is meant to, namely to express some universal about subject islands.

      It reminds me of some of the remarks by Quine in his 1970 paper, "Methodological Reflections on Current Linguistic Theory" where he says

      "The trouble is that there are extensionally equivalent grammars. Timely reflection on method and evidence should tend to stifle much of the talk of linguistic universals. "

    7. @Alex C. I suspect that trying to reconstruct the subject island constraint as a purely formal constraint is misguided. Roughly speaking it does appear to be subjects which are islands, and ‘subject’ is at least partly a substantive rather than a formal notion.

      On the whole it seems that restricting MGs to generating MCFLs requires the addition of constraints which are not linguistically well-motivated, and which sometimes appear to lead to undergeneration (e.g. with regard to the QR, topicalization etc. examples discussed).

      Notice that if there is a problem here it is one that arises directly from the facts on the syntactic ground, and it is therefore a problem for everyone. If natural languages are not MCS then that is also too bad for anyone who has a learning theory which can deal only with some subset of the MCS languages.

      There is, however, an asymmetry in terms of who is best placed to solve the problem. If acquirable natural languages do not fall within some fairly constrained formal class, then empiricists are probably screwed. But ‘nativists’ still have substantive universals to fall back on.

    8. I think that's right about the SPIC. The Quine paper I mentioned above has some discussion about a related issue -- actually subject-predicate structure as a universal, so in modern terms the claim that every language has a subject (is that currently considered to be a universal?)
      In order for that to have content, we need to define what a subject is, and the MG definition of subject is far too weak.

      And actually I agree that empiricists are doomed if the facts turn out that natural languages are all over the place. Not the Bayesians of course, since their theory doesn't rely on any specific facts about syntax.

      But I would ask -- is the fact that minimalist syntax will be absolutely fine no matter what the empirical facts turn out to be a good thing or a bad thing?

    9. "in modern terms the claim that every language has a subject (is that currently considered to be a universal?)" This is no longer a straightforward question due to the issues of how to define subject. Chris Manning's 1996 book of his 1993 PhD thesis would be a good thing to read about it. According to me, the basic insight about subjects was attained by Paul Schachter in the 1970s, on the basis of his work on Tagalog, to the effect that in Tagalog there are effectively two different grammatical relations with subject-like properties, one the 'Actor' (a-subject for Manning), the other the traditional Philippine 'Focus' (marked with ang, g-subject for Manning). This idea has taken different forms in different grammatical theories (first in Foley & Van Valin's (1984) Role and Reference Grammar, where the g-subject is the `pragmatic peak'), such as in Guilfoyle et al's (1992) 'two subject' analysis of Austronesian (one spec of VP=a-subject, one spec of IP=subject).

      So, there are languages where g-subject arguably doesn't exist, such as Warlpiri, where there is no voice system, but Warlpiri does have evidence for a GR shared by the sole argument of one-participant verbs and most active argument of two/three participant verbs (a-subjects). Arguments have also been made to the effect that certain languages don't have a-subjects (I discuss some of them in the 2007 version of my Major Functions of the NP chapter in the Shopen (ed) volumes on typology & syntactic description), but the evidence from the languages where this has been claimed is very scanty.