Thursday, April 16, 2015

A (shortish) whig history of generative grammar (part 4, the end)

3.     Minimalism: The third epoch

Where are we?  We reviewed how the first period of syntactic research examined how grammars that had to generate an unbounded number of hierarchically organized objects might be structured. It did this by postulating rules whose interactions yielded interesting empirical coverage, generating a fair number of acceptable sentences while failing to generate an interesting number of unacceptable ones. In the process, this early work discovered an impressive number of effects that served as higher-level targets of explanation for subsequent theory. To say the same thing a little pompously, early GG discovered a bunch of “effects” which catalogued deep-seated generalizations characteristic of the products of human Gs.  These effects sometimes fell together as “laws of grammar” and were taken, reasonably, as consequences of the built-in structural properties of FL.

This work set the stage for the second stage of research: a more direct theoretical investigation of the properties of FL. The relevant entrée to this line of investigation was Plato’s Problem: the observation that what native speakers know about their languages far exceeds what they could have learned about it by examining the PLD available to them in the course of language acquisition. Conceptually, addressing Plato’s Problem suggested a two-prong attack: first, radical simplification of the rules that Gs contain, and second, enrichment of what FL brings to the task of acquisition. Factoring the complexity built into previous rules out into simple operations like Move α made the language-particular rules that were acquired easier to acquire. This simplification, however, threatened generative chaos. The theoretical task was to prevent this. This was accomplished by enriching the innate structure of FL in principled ways. The key theoretical innovation was trace theory.  Traces simplified derivations by making them structure preserving, and they allowed for the unification of movement and binding. These theoretical moves addressed the over-generation problem.[1] They also set the stage for contemporary minimalist investigations. We turn to this now.

The main problem with the GB theory of FL from a minimalist perspective is its linguistic specificity.  Here’s what we mean.

Within GB, FL is very complex, and the proposed innate principles and operations are very linguistically specific. The complexity is evident in the modular architecture of the basic GB theory as well as in the specific principles and operations within each module. (26) and (27) reiterate the basic structure of the theory.

  (26)   a. X’ theory of phrase structure
         b. Case
         c. Theta
         d. Movement
            i. Subjacency
            ii. ECP
         e. Construal
            i. Binding
         f. Control
  (27)   DS: X’-rules, Theta Theory, input to T-rules
          |   Move α (T-rules)/trace theory, output SS
         SS: Case Theory, Subjacency, gamma-marking, BT
        /  \
       /    \   Move α (covert movement)
     PF      LF: BT, *[+gamma]

Though some critical relations crosscut (many of) the various modules (e.g. government), the modules each have their own special features. For example, X’ theory traffics in notions like specifier, complement, head, maximal projection, adjunct and bar level. Case theory also singles out heads but distinguishes between those that are case assigning and those that require case. There is also a case filter, case features and case assigning configurations (government). Theta theory also uses government, but for the assignment of θ-roles, which are assigned at D-structure by heads and are regulated by the theta criterion, a condition that requires every argument to get one and at most one theta role. Movement exploits another set of concepts and primitives: bounding node/barrier, escape hatch, subjacency principle, antecedent government, head government, gamma-marking, among others. Last, the construal rules come in four different types: one for PRO, one for local anaphors like reflexives and reciprocals, one for pronouns and one for all the other kinds of DPs, dubbed R-expressions. There are also specific licensing domains for anaphors and pronouns, indexing procedures for the specification of syntactic antecedence relations, and hierarchical requirements (c-command) between an antecedent and its anaphoric dependent. Furthermore, all of these conditions are extrinsically ordered to apply at various derivational levels specified in the T-model.[2]

If the information outlined in (26) and (27) is on the right track, then FL is richly structured with very domain specific (viz. linguistically tuned) information. And though such linguistic specificity is a positive with regard to Plato’s Problem, it raises difficulties when trying to address Darwin’s Problem (i.e. how FL could have arisen from a pre-linguistic cognitive system). Indeed, the logic of the two problems seems to have them pulling in largely opposite directions.  A rich linguistically specific FL plausibly eases the child’s task by restricting what the child needs to use the PLD to acquire.  However, the more cognitively sui generis FL is, the more complicated the evolutionary path to FL.  Thus, from the perspective of Darwin’s Problem, we want the operations and principles of FL to be cognitively (or computationally) general and very simple.  It is this tension that modern GG aims to address.

The tension is exacerbated when the evolutionary timeline is considered. The consensus opinion is that humans became linguistically facile about 100,000 years ago and that the capacity that evolved has remained effectively unchanged ever since.  Thus, whatever the addition, it must have been relatively minor (the addition of at most one or two operations/principles). Or, putting this another way, our FL is what you get when you wed (at most) one (or two) linguistically specific features with a cognitively generic brain. 

Threading the Plato’s/Darwin’s problem needle suggests a twofold strategy: (i) simplify GB by unifying the various FL-internal modules and (ii) show that this simplified FL can be distilled into largely general cognitive and/or computational parts plus (at most) one linguistically specific one.[3]

Before illustrating how this might be managed, note that GB is the target of explanation. In other words, the Minimalist Program (MP) takes GB to be a good model of what FL looks like. It has largely correctly described (in extension) the innate structure of FL. However, the GB description is not fundamental. If MP is realizable, then FL is less linguistically parochial than GB supposes. If MP is realizable, then FL exploits many generic operations and principles (i.e. operations and principles not domain restricted to language) in its linguistic computations. Yet, MP takes GB’s answer to Plato’s Problem to be largely correct though it disagrees with GB about how domain specific the innate architecture of FL is. More concretely, MP agrees with GB that Binding Theory provides a concise description of certain grammatical laws that accurately reflect the structure of FL. But, though GB’s BT accurately describes these laws/effects and correctly distinguishes the data that reflects the innate features of FL from what is acquired, the actual fundamental principles of binding are different from those identified by GB (though these principles are roughly derivable from the less domain specific ones that characterize the structure of FL). (Grandiosely (perhaps)) borrowing terminology common in physics, MP takes GB to be a good effective theory of FL but denies that it is the fundamental theory of FL. A useful practical consequence of this is to take the principles of GB to be targets for derivation by the more fundamental principles that minimalist theories will discover.

That’s the basic idea.  Let’s illustrate with some examples. First let’s do some GB clean-up. If MP is correct, then GB must be much simpler than it appears to be. One way to simplify the model is to ask which features of the T-model are trivially true and which are making substantive claims. Chomsky (1993) argues that whereas PF and LF are obviously part of any theory of grammar, DS and SS are not (see below for why). The former two levels are unexciting features of any conceivable theory, while the latter two are empirically substantive. To make this point another way: PF and LF are conceptually motivated while DS and SS must be empirically motivated. Or, because DS and SS complicate the structure of FL, we should attribute them to FL only if the facts require it.  Note that this implies that if we can reanalyze the facts that motivate DS and SS in ways that do not require adverting to these two levels, we can simplify the T-model to its bare conceptual minimum.

So what is this minimum? PF and LF simply state that FL interfaces with the conceptual system and the sound system (recall the earlier <m,s> pairs in section 0). This must be true: after all, linguistic products obviously pair meanings and sounds (or motor articulations of some sort). So FL’s products must interact with the thought system and the sound system.  Bluntly put, no theory of grammar that failed to interact with these two cognitive systems would be worth looking at.

This is not so with DS and SS. These are FL-internal levels with FL-internal properties. DS is where θ-structure and syntax meet. Lexical items are put into X’-formatted phrase markers in a way that directly reflects their thematic contributions. In particular, all and only θ-positions are occupied at DS, and the positions they occupy reflect the thematic contributions the expressions make. Thus, DPs understood as the logical objects of predicates are in syntactic object positions, logical subjects in syntactic subject positions, etc.

Furthermore, DS structure building operations (X’-operations, Lexical Insertion) are different in kind from the transformations that follow (e.g. Move α), and the T-model stipulates that all DS operations apply before any Move α operation does (viz. DS operations and Move α never interleave). Thus, DS implicitly defines the positions Move α can target, and it also helps distinguish the various kinds of phonetically empty categories Gs can contain (e.g. PRO versus traces left by Move α).

So, DS is hardly innocent: it has distinctive rules, which apply in a certain manner, and it produces structures meeting specific structural and thematic conditions.  All of this is non-trivial and coded in a very linguistically proprietary vocabulary.  One reasonable minimalist ambition is to eliminate DS by showing that its rules can be unified with those in the rest of the grammar, that its rules freely mix with other kinds of processes (e.g. movement), and that θ-relations can be defined without the benefit of a special pre-transformational level.

This is what early minimalist work did. It showed that it was possible to unify phrase structure building rules with movement operations, that both these kinds of processes could interleave and (this is more controversial) that both structure building rules and movement operations could discharge Q-obligations. In other words, that the properties that DS distinguished were not proprietary to DS when looked at in the right way and so DS did not really exist. 

Let’s consider the unification of phrase structure and movement.  X’ theory took a big step toward eliminating phrase structure rules by abandoning the distinction between phrase structure rules and lexical insertion operations.  The argument for doing so is that PS rules and lexical insertion processes are highly redundant. In particular, a set of lexical items with specified thematic relations determines which PS rules must apply to generate the right structures to house them. In effect, the content of the lexical items determines the relevant PS rules. By reconceiving phrase structure as the projection of lexical information, the distinction between lexical insertion and PS rules can be eliminated. X’ theory specifies how to project a given lexical item into a syntactic schema based on its lexical content, instead of first generating the phrase structure and then filtering out the inappropriate options via lexical insertion.

Minimalist theories carry this one step further: they unify the operations that build phrases with the operations that underlie movement.  The unified operation has been dubbed Merge. What is it? Merge takes two linguistic items X and Y and puts them together in the simplest imaginable way. In particular, it just puts them together: it specifies no order between them, it does not change them in any way when putting them together, and it puts them together in all the ways that two things can be put together.  Concretely, it takes two linguistic items and forms them into a set. Let’s see how.

Take the items eat and bagels. ‘Merge(eat, bagels)’ is the operation that forms the set {eat, bagels}. This object, the set, can itself be merged with Noam (i.e. Merge(Noam, {eat, bagels})) to form {Noam, {eat, bagels}}. And we can apply Merge to this too (i.e. Merge(bagels, {Noam, {eat, bagels}})) to get {bagels, {Noam, {eat, bagels}}}.  This illustrates the two possible applications of Merge(X, Y): the first two instances apply to linguistic elements X and Y where neither contains the other. The last applies to X and Y where one does contain the other (e.g. Y contains X). The first models PS operations like complementation, the second movement operations like Topicalization.  Merge is a very simple rule, arguably (and Chomsky has argued this) the simplest possible rule that derives an unbounded number of structured objects. It is recursive and it is information preserving (e.g. like the Projection Principle was). It unifies phrase building and movement. It also models movement without the benefit of traces: two occurrences of the same element (i.e. the two instances of bagels above) express the movement dependency. If sufficient, then, Merge is just what we need to simplify the grammar. It simplifies it by uniting phrase building and movement, and it models movement without resorting to very “grammatiky” elements like traces.[4]
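The set-theoretic picture just given is easy to render as code. What follows is a minimal sketch of our own (nothing from the minimalist literature): representing lexical items as bare strings and syntactic objects as frozensets is purely an expository assumption.

```python
def merge(x, y):
    """Merge(X, Y) = {X, Y}: combine two syntactic objects into an
    unordered set, specifying no order and altering neither input."""
    return frozenset({x, y})

def contains(so, x):
    """True if syntactic object `so` contains x at any depth."""
    if so == x:
        return True
    return isinstance(so, frozenset) and any(contains(p, x) for p in so)

# External Merge: neither input contains the other (phrase building).
vp = merge("eat", "bagels")        # {eat, bagels}
tp = merge("Noam", vp)             # {Noam, {eat, bagels}}

# Internal Merge: one input contains the other (movement).
# The result {bagels, {Noam, {eat, bagels}}} has two occurrences of
# "bagels"; that pair of copies expresses the movement dependency.
assert contains(tp, "bagels")      # precondition for Internal Merge
top = merge("bagels", tp)
```

Because sets are unordered and the inputs are never altered, the sketch builds in both the absence of linear order and the conservation of structure that the text goes on to discuss.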

Merge has other interesting properties when combined with plausible generic computational constraints.  For example, as noted, the simplest possible combination operation would leave the combining elements unchanged. Call this the No Tampering Condition (NTC). The NTC clearly echoes the GB Projection Principle in being a conservation principle: objects once created must be conserved. Thus, structure once created cannot be destroyed. Interestingly, the NTC entails some important grammatical generalizations that traces and their licensing conditions had been used to explain before. For example, it is well known that human Gs do not have lowering rules, like (27′) (we use traces to mark whence the movement began): [5]

            (27′) [ t1 … [ … [β α1 … ] … ] … ]

The structure in (27′) depicts the output of an operation that takes α and moves it down, leaving behind an unlicensed trace t1. In GB, the ECP and Principle A filtered such derivations out. However, the NTC suffices to achieve the same end without invoking traces (which, recall, MP aims to eliminate as being too “linguistiky”). How so? Lowering rules violate the NTC. In (27′), if α lowers into the structure labeled β, the constituent that is input to the operation (viz. β without α in it) will not be preserved in the output.  This eliminates this class of movement operations without resorting to traces and their licensing conditions (a good thing given Minimalist ambitions with regard to Darwin’s Problem).

Similarly, we can derive the fact that when movement occurs the moved expression moves to a position that c-commands the original movement site (another effect derived via the ECP and Principle A in GB). This is illustrated in (28). Recall, movement is just merging a subpart of a phrase marker with another part. So say we want to move α and combine it with the phrase marker β. Then, unless α merges with the root, the movement will violate the NTC. Thus, if Merge obeys the NTC, movement will always be to a c-commanding position. Again a nice result, achieved without the benefit of traces.

(28)  [β … [ … α … ] … ] → [α [β … [ … α … ] … ]]

In fact, we can go one step further: the NTC plausibly requires the elimination of traces and their replacement with “copies.” In other words, not only can we replace traces with copies, given the NTC we must do so. The reason is that defining movement as an operation that leaves behind a “trace” violates the NTC if strictly interpreted. In (28), for example, were we to replace the lower α on the right of the derivational arrow with a trace (i.e. [e]1), we would not be conserving the input to the derivation in the output. This violates the NTC. Thus, strengthening the Projection Principle to the NTC eliminates the possibility of traces and requires that we derive earlier trace-theoretic results in other ways. Interestingly, as we have shown, the NTC itself already prohibits lowering and requires that movement be to a c-commanding position; two important consequences of the GB theory of trace licensing. Thus, we derive the same results in a more principled fashion. In case you were wondering, this is a very nice result.[6]
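The two NTC consequences just derived (no lowering, movement only to c-commanding positions) can be checked mechanically on the set-based picture of Merge. Again, this is a self-contained sketch of our own; `merge`, `contains`, and `c_commands` are illustrative helpers, not anything in the literature.

```python
def merge(x, y):
    """Merge as set formation; inputs are left untouched (the NTC)."""
    return frozenset({x, y})

def contains(so, x):
    """True if syntactic object `so` contains x at any depth."""
    if so == x:
        return True
    return isinstance(so, frozenset) and any(contains(p, x) for p in so)

def c_commands(x, y, so):
    """x c-commands y in `so` iff x is an immediate member of some set
    whose other member (x's sister) contains y."""
    if not isinstance(so, frozenset):
        return False
    parts = list(so)
    for i, a in enumerate(parts):
        for b in parts[:i] + parts[i + 1:]:
            if a == x and contains(b, y):
                return True
    return any(c_commands(x, y, p) for p in parts)

# beta = [John [like Mary]]; re-merge "Mary" at the root (Internal Merge).
beta = merge("John", merge("like", "Mary"))
moved = merge("Mary", beta)

# (i) The input beta survives intact inside the output: no structure is
# destroyed, so a lowering operation (which would rebuild beta) is unstatable.
assert contains(moved, beta)

# (ii) The re-merged copy of "Mary" c-commands its launch site inside beta.
assert c_commands("Mary", "Mary", moved)
```

Merging anywhere other than the root would require replacing a subpart of beta, i.e. building a new object rather than conserving the old one, which is exactly what the NTC forbids.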

In sum, not only can we unify movement and phrase building given a very simple rule like Merge, but arguably the computationally simplest version of it (viz. one that obeys the NTC, a very generic (non language specific) cognitive principle regarding computations) will also derive some of the basic features of movement that traces and their specific licensing conditions accounted for in GB.  In other words, we get many of the benefits of GB movement theory without their language specific apparatus (viz. traces).

We can go further still. In the best of all possible worlds, Merge should be the sole linguistically specific operation in FL.  That means that the GB relations that the various different modules exploited should all resolve to a single Merge style dependency. In practice, this means that all non-local dependencies should resolve to movement dependencies. For example, binding, control, and case assignment should all be movement dependencies rather than licensed under the more parochial conditions GB assumed.  Once again, we want the GB data to fall out without the linguistiky GB apparatus.

So, can the modules be unified as expressions of various kinds of movement dependencies? The outlook is promising. Let’s consider a couple of illustrative examples to help fix ideas. Recall that the idea is that dependencies that are not movement dependencies in GB are now treated as products of movement (which, recall, can be unified with phrase structure under a common operation, Merge).  So, as a baseline, let’s consider a standard case of subject-to-subject raising (A-movement). The contrast in (29) illustrates the well-known fact that raising of subjects from non-finite clauses is possible, while raising from finite clauses is not.
(29)     a. John1 was believed t1 to be tall
b. *John1 was believed t1 is tall
Now observe that case marking patterns identically in the same contexts (cf. (30)).  These are ECM structures, wherein the embedded subject him in (30a) is case licensed by the higher verb believe.[7] On the assumption that believe can case license him iff they form a constituent, we can explain the data in (30) on the model of (29) by assuming that the relevant structure for case licensing is that in (31). Note that where t1 is acceptable in (29) it is also acceptable in (31), and where not, not. In other words, we can unify the two cases as instances of movement.

(30)     a. John believes him to be tall
b. *John believes him is tall
(31)     a. John [him1 believes [t1 to be tall]]
b. *John [him1 believes [t1 is tall]]

The same approach will serve to unify Control and Reflexivization with movement. (32) illustrates the parallels between Raising and Control. If we assume that PRO is actually the residue of movement (i.e. the grammatical structure of (32c,d) is actually (32e,f)), we can unify the two cases.[8] Note the structural parallels between (29b), (31b), (32b) and (32f).
(32)     a. John1 seems t1 to like Mary (Raising)
         b. *John1 seems t1 will like Mary
         c. John1 expects PRO1 to like Mary (Control)
         d. *John1 expects PRO1 will like Mary
         e. John1 expects [t1 to like Mary]
         f. *John expects [t will like Mary]

The same analysis extends to explain the Reflexivization data in (33) on the assumption that reflexives are the morphological residues of movement.

(33)     a. John1 expects himself1 to win
         b. John1 expects t1 to win
         c. *John expects (t1=)himself will win
         d. *John expects t1 will win

These are just illustrations, not full analyses. However, we hope that they make plausible a project that aims to unify phenomena that GB treated as different.

There are various other benefits of unification. Here’s one more: an explanation of the c-command condition on antecedent-anaphor licensing that is part of the GB BT. The c-command condition on Reflexive licensing follows trivially once Reflexivization is unified with movement, for, recall, that a moved expression must c-command its launch site is a simple consequence of the NTC in a Merge-based theory.  Thus, if Reflexivization is an instance of movement, the c-command condition packed into BT comes for free. There are other nice consequences as well, but here is not the place to go into them. Remember, this is a shortish Whig History!

Before summing things up, note two features of this Minimalist line of inquiry. First, it takes the classical effects and laws very seriously. MP approaches build on prior GG results. Second, it extends the theoretical impulses that drove GB research. NTC bears more than a passing family resemblance to the Projection Principle. The radical simplification of Merge continues the process started with Move α. The unification of movement, case, reflexivization and control echoes the unification of movement and binding in GB. The replacement of traces with copies continues the process of eliminating the cognitive parochialism of grammatical processes that the elimination of constructions by GB began, as does the simplification of the T-model by the removal of D-structure and S-structure as “special” grammatical levels (which, incidentally, is a necessary step in the unification of the four phenomena above in terms of movement). So the simplification of rules and derivations, and the unification of the various dependencies, is a well-established theme within GG research, one that Minimalism is simply further extending. Modern syntax sits squarely on the empirical and theoretical results of earlier GG research. There is no radical discontinuity, though there is, one would hope, empirical and theoretical progress.

4. Conclusion.

As we noted at the outset, one mark of a successful science is that it is both empirically and theoretically cumulative. Even “revolutions” in the sciences tend to be conservative in the sense that new theories are (in part) evaluated by how they explain results from prior theory.  Einstein did not discredit Newton. He showed that Newton’s results were a special case of a more general understanding of gravity. Quantum mechanics did not overturn classical mechanics but showed that the latter is a special case of the former (when lots of stuff interacts).  This is the mark of a mature discipline: its earlier discoveries serve as boundary conditions for developing novelties.

Moreover, this is generally true in several ways.  First, a successful field generally has a budget of “effects” that serve as targets of theoretical explication. Effects are robust generalizations over (often) manufactured data. By “manufactured” we mean not generally found in the wild but the result of careful and deliberate construction. In physics there are many, many of these. Generative Grammar has a nice number of these as well (as we reviewed in section 2).  A nice feature of effects is that they are relatively immune to shifts in theory. SCO effects, Complex NP effects, CED effects, Fixed Subject Condition effects, Weak and Strong Crossover effects, etc. are robust phenomena even were no good theory to explain why they have the properties they do.[9] This is why effects are good tests for theories.

Groups of effects, aka “laws,” are more theory dependent than effects but still useful targets of theoretical explanation. Examples of these in GG are Island conditions (which unify a whole variety of distinct island effects), Binding conditions, Minimality effects, Locality effects, etc.  As noted, these are more general versions of the simpler effects noted above and their existence relies on theoretical unification.  Unlike the simpler effects that compose them, laws are more liable to reinterpretation as theory progresses, for they rely on more theory for their articulation. However, and this is important, a sign of scientific progress is that these laws are also generally conserved in later theoretical developments.  There may be some tidying up at the edges, but by and large treating Binding as a unified phenomenon applying to anaphoric dependencies in general has survived the theoretical shifts from the Standard Theory to GB to MP.  So too with the general observations concerning how movement operations function (viz. cyclically, no lowering, to c-commanding positions).  Good theories, then, conserve prior effects and tend to conserve prior laws.  Indeed, successful novel theories tend to treat prior theoretical results and laws as limit cases in the new schema.  As our WH above hopefully illustrates, this is also a characteristic of GG research over the last 60 years.

Last but not least, novel theories tend to conserve the themes that motivated earlier inquiry. GB is a very nice theory, which explains a lot.  The shift to Minimalist accounts, we have argued, extends the style of explanation that GB initiated. The focus on simplification of rules and derivations and the ambition to unify what appear to be disparate phenomena is not a MP novelty. What is novel (perhaps) is the scope of the ambition and the readiness to reanalyze grammar specific constructs (traces, DS, SS, the distinction between phrase structure and movement, etc.) in more general terms.  But, as we have shown, this impulse is not novel.  And, more important still, the ambition has been made possible by the empirical and theoretical advances that GB consolidated.  This is what happens in successful inquiry: the results of prior work provide a new set of problems that novel theory aims to explain without losing the insights of prior theory.

As we’ve argued, GG research has been both very successful and appropriately conservative.  Looked at in the right way (our WH!), we are where we are because we have been able to extend and build on earlier results.  We are making haste slowly and deliberately, just as a fruitful scientific program should. Three cheers for Generative Grammar!!!

[1] The prettiest possible theory, one that Chomsky advanced in early GB, failed to hold empirically. The first idea was, effectively, to treat all traces as anaphoric. Though this worked very well for A-traces, it proved inadequate for A’-traces, which seemed to function more like R-expressions than anaphors (or at least the “argument” A’-traces did). A virtue of assimilating A’-traces to R-expressions is that it led to an explanation of Strong Cross Over effects in terms of Principle C.  Unfortunately, it failed to explain a range of subject-object and argument-adjunct asymmetries that crystallized as the ECP. These ECP effects led to a whole new set of binding-like conditions (so-called “antecedent government”) that did not fit particularly comfortably with other parts of the theory. Indeed, the bulk of GB theory in the last part of the second epoch consisted in investigations of the ECP and various ways of trying to explain the subject/object and argument/adjunct effects.  Three important ideas came from this work: first, that the domains relevant for ECP effects are largely identical to those relevant for subjacency effects; second, that ECP effects really do come in two flavors, with the subject-object cases being quite different from the argument-adjunct cases; third, Relativized Minimality, an important idea due to Rizzi, and one that fit very well with later minimalist conceptions. This said, ECP effects, especially the argument/adjunct asymmetries, have proven theoretically refractory and still remain puzzling, especially in the context of Minimalist theory.
[2] By ‘extrinsically’ we mean that the exact point in the derivation at which the conditions apply is stipulated.
[3] We distinguish cognitively general from computationally general for there are two possible sources of relief from GB specificity. Either the operations/principles are borrowed from other pre-linguistic cognitive domains or they arise as a general feature of complex computational systems as such. Chomsky has urged the possibility of the second in various places, suggesting that these general computational principles might be traced to as yet unknown physical principles.  However, for research purposes, the important issue lies with the non-linguistic specificity of the relevant operations and principles, not whether they arise as a result of general cognition or natural physical law.
[4] Movement so understood also raises a question that trace theory effectively answered by stipulation: why are some “copies” phonetically silent?  As traces were defined as phonetically empty, this is not a question that arose within GB. However, given a Merge-based conception it becomes important to give a non-stipulative answer, and lots of interesting theoretical work has tried to provide one. This is a good example of how pursuit of deeper theory can reveal explanatory gaps that earlier accounts stipulated away rather than answered. As should be obvious, this is a very good thing.
[5] Traces are now being used for purely expository purposes. Minimalist theories eschew traces, replacing them with copies.
[6] Before we get too delighted with ourselves, we should note that there are other trace licensing effects that GB accommodated that are not currently explained in the more conceptually svelte minimalism.  So, for example, there is currently no account of the argument/adjunct asymmetries that GB spent so much time and effort cataloguing and explaining.
[7] This is not quite correct, but it will serve for a Whig History.
[8] To repeat, we are using trace notation here as a simple convenience. As indicated above, traces do not exist; the positions marked with traces are actually occupied by copies of the moved expression.
[9] Which is not to imply that there are none. As of writing, there are interesting attempts to account for these effects in minimalist terms, some more successful than others.



  2. Nicely put. I think that a different perception is widespread outside GG, as in Clark & Lappin's (2010 Wiley book p. 8) characterization of the Minimalist Program as "a drastic retreat from the richly articulated, domain specific mechanisms specified in Chomsky's previous theories." The MP is an advance, not a retreat. If it had been a retreat, then some competing theory's solutions would be looking more attractive.

    Regarding one of your details: If reflexivization is movement, why doesn't it abide by the CSC? John expects himself and Mary to get along, vs. *Who do you expect Mary and to get along?

    1. Good question. I can think of two lines of attack.
      1. Take the reflexives within conjunctions to be hidden pronouns. What I mean is: if you take the complementary distribution with pronouns to be dispositive of reflexives, then one might expect that reflexives within conjuncts are not "real" reflexives. This is not entirely nuts. Consider 'I talked to Sally about Frank and me/myself' and 'Sue told Bill about Sheila and her/herself'. These seem quite a bit better than 'I talked to Sally about me' or 'Sue told Bill about her.' If this is right, then…
      2. Treat islandhood as a PF effect, even the CSC. If this is right, then plausibly gaps are necessary for island effects. This, as you know, is not a novel idea. Maybe the fact that reflexives are A-movement with the copy pronounced shields them from island effects.

      That's the best I can do right now. Good question. Research topic?

    2. Another possibility would be to suppose that A-movement isn't subject to the CSC. (We touched on this once before.) It's hard to find very strong arguments one way or the other, but there are some examples (from the link above) where A-movement out of conjuncts doesn't seem particularly bad:

      (1) It seems to rain and that Norbert is handsome.
      (2) John expected Bill to win the Preakness and that Frank would win the Kentucky Derby.

      For the second example let's assume a raising-to-object type analysis.

      Neither is perfect, but perhaps this is because of the conjunction of finite and non-finite clauses.

  3. I often wonder how seriously we are supposed to take the NTC. Feature Inheritance (Richards 2007, Chomsky 2013) and Agree with feature valuation/checking with deletion require that 'tampering' be permitted. Unless of course we say that No Tampering is a condition on Merge, and that Agree, being an entirely separate operation, is not constrained by it. We still need to say something about what can and cannot be tampered with, and what kind of consequences that would have, because presumably having quite an articulated feature system with tampering permitted could end up allowing us to derive the kinds of things that we wanted to rule out with the NTC in the first place. On the other hand, some kind of tampering seems to be entirely necessary to get feature bundles to interact in any kind of meaningful way.

    Any thoughts on this?

    1. I think I've expressed similar qualms on FoL about feature valuation and the NTC. Others tried to convince me not to worry. I still do. So, the real problem you point to is that within current minimalist theory there are actually two operations with different kinds of properties. The first is Merge, which subsumes structure building and movement. The second is Agree. Merge is relatively well behaved given Chomsky's plausible criteria, so that's the poster child when talking about minimalist successes. The second, IMO, is a bit of a dog's breakfast of assumptions (note the IMO here). I really don't like Agree or feature checking or valuation or… Not only does it introduce huge redundancies into the theory if treated as a Probe-Goal relation, but, as you note, it sits ill with other nice properties. Now, one can simply distinguish these agree phenomena from the other parts of the syntax. But that's not a great idea IMO. We still want to know why and where this stuff came from if Darwin's Problem is one you are interested in. A more radical thought is to try to reconceptualize feature checking along the lines of theta theory. Recall, the standard view is that there are no theta roles; the latter are just interpretive reflexes of grammatical relations. Why not extend the same idea to case and agreement? Why invidiously distinguish such features from theta features? This would shunt much of these phenomena to the PF side of the ledger, but is it clear that this is a bad place for them? At any rate, let me admit that I feel as you do about these matters and think that this requires us to think harder about features and the role they play in the grammar. Are they central? Are they mere morpho-phonological titivation? Dunno. Again: research topic?

    2. There are good reasons, I think, why (the set of grammatical processes/relations covered by) Agree cannot be relocated wholesale to PF. One that I am partial to is the following: if a language has morphologically expressed finite verb agreement at all, then either (i) the only DPs that can move to subject position are those that have been agreed with, or (ii) any DP, bearing any case, can move to subject position. What we don't find are languages where some proper subset of the set of all DPs (e.g. nominatives and accusatives, but not datives) can move to subject position, but that subset does not align with the subset of noun phrases that can control agreement. I've taken this to indicate that, in "type (i)" languages, agreement feeds movement to subject position. Since the latter has LF-consequences (scope), agreement cannot occur "at PF" and still stand in the relevant feeding relation.

      Now, there are several nuances to consider. One is the separation between what Arregi & Nevins call Agree-Link (the probe ascertaining which DP it is going to enter into a feature relationship with) and what they call Agree-Copy (the actual copying of morphological features from the DP to the probe). I think there are really good reasons to think Agree-Copy is "at PF"; so the paragraph above, recast through the prism of the Agree-Link/Agree-Copy division, is strictly about Agree-Link.

      Second, it is worth pondering how the generalization in the first paragraph would shake out in a system where there was no (long-distance) Agree, only valuation under (re)merge (with PF and LF each free to privilege the lower copy). If what I've been calling "moves to subject position" in the preceding paragraphs is just "the subset of chains stopping in [Spec,TP] in which PF pronounces the higher copy", then a PF-only conception of Agree could still be responsible for the aforementioned generalization: imagine everything (i.e., every DP in the clause/phase/domain/whatever) moves to [Spec,TP], but the only chains that receive higher-copy pronunciation are those where, at PF, agreement also obtains. But here's the problem: of all of these everything-moves-to-[Spec,TP] chains, only the one that is an agreement chain (and receives higher-copy pronunciation) behaves, scopally, as a subject. On this view, that is a mystery: the question of which of these chains will get to be the agreement-controlling (and hence, higher-copy pronounced) chain is determined after the PF-LF split.

    3. Nice problem. This looks like the old question of how to coordinate case at LF with overt morphological differences, which was discussed in the very first minimalist paper and led to case CHECKING. But I'm sure there is more to it. Thx for the puzzle.

  4. Thanks, I've enjoyed reading this series, it's helped me see the logic behind how GG has developed in this tradition.

    But I wonder about some of the claims of ‘simplicity’ made within the MP. Just one example: while I agree that the elimination of traces is welcome from the point of view of the NTC (and Inclusiveness and other Good Things), it surely makes things much less simple at the interfaces. Given the copy theory of movement, at both the PF and LF interfaces you need some mechanism that can tell, for any constituent, whether or not that constituent has a copy elsewhere in the structure and, if so, whether it is the lower or the higher copy.

    1. I'm not sure that it does make things less simple at the interfaces. Recall that trace theory needed a theory of reconstruction as well, so traces were not bare even in GB. As Chomsky has noted, one of the attractive features of the Copy Theory is that it makes reconstruction less mysterious. What is required on any theory is a mapping between parts of the chain and their CI roles. Neither traces nor copies ARE variables, though they can map to them.

      Two last points: yes, one needs to make chains. But I think that this is so in all theories. It is not hard to "tell" whether a copy alone is kosher: it needs a case and a theta role. So chains must be reconstructed. It is easier with traces, as these are indexed. One could, of course, index copies, and so the problem equalizes, I think.

      Last: what makes the elimination of traces nice is not really the NTC, though it plausibly follows from it. Rather, it is that traces are theory-internal objects with special properties that seem very linguisticky. Theory-internal "special" formatives are conceptually undesirable seen from a DP perspective. Thus, replacing them with something less special is a good idea. The NTC suggests a way of doing this: just let an expression assume multiple roles in a derivation (i.e. allow it to have more than a single property). That seems conceptually anodyne enough.

    2. What is the justification for the NTC on this whig analysis?
      It used to be, I thought, a computational simplicity argument, but I guess the alternative is an evolutionary simplicity argument; i.e. systems that satisfy the NTC are more likely to evolve.

    3. My justification or Chomsky's? For the latter, it defines the "simplest" imaginable combination operation; one that fails to preserve this is more "complex." You rightly ask in what the simplicity resides, and you enumerate the two options.

      What I find interesting is that there is a more historical "justification," if that is the right term. It is simply a generalization of the projection principle, which gave us trace theory. This version delivers what trace theory did (and so shares its conceptual and empirical virtues) and also lays the groundwork for a theory of reconstruction (again, to the degree that connectedness effects hold, this is empirical justification for the Copy Theory, and hence for the NTC, from which the copy theory follows).

      It would be nice to find some other virtues by unpacking one of the two routes you mention. Monotonicity is not unknown as a feature of computational systems and NTC delivers that. But what's monotonicity good for? I don't know. Do you? And as for evolvability, we know so little about this that it is hard to take this seriously. But that's the current state of the art, IMO.

    4. It's not just that chains need to be made, but also that the interface mapping has to be able to tell whether it's dealing with the head or foot of a chain. In the GB setup this can be kept a completely local matter: if you're looking at a trace, then you're not at the head of a chain! But given copy theory, if the interface mapping sees e.g. John_1, it's going to have to look at some larger chunk of structure to tell whether this John_1 is the head John_1 or the foot John_1. That seems less simple.

      [Interlude: fair point about the elimination of traces being conceptually (if not always practically) separable from the elimination of indices.]

      Now the point about reconstruction, I take it, is that just knowing that it's looking at a trace isn't sufficient for the mapping to know what to actually do—so maybe the mapping has to look at a larger chunk of structure either way. But I think the force of this point depends on what your favoured theory of reconstruction phenomena is. There have been theories according to which traces always map to variables, and the needed flexibility comes from elsewhere.

      None of this is meant to argue that the copy theory isn't better *overall*. But I think the gain in simplicity might be overstated.

    5. These are fair points. But several observations. First, traces do what they need to do by stipulation. They are designed to have these nice properties, and so, all things being equal, we should see if these problems can be overcome in a more principled way; by assuming the copy theory we are motivated to look for more principled answers. Second, not all traces/copies get converted into variables, or at least not apparently. Think of ones in A' positions. Worse, some of these are reconstruction sites, so it is not only the head and foot of the chain that are at issue. Third, some of the problems you note re copies might be finessed if copies are distinguished wrt their feature structures. Thus, most copies will not have a full complement of theta role, case feature, and A'-feature. This may give one a handle on whether some copy is the head or foot of a chain: those without case are at the foot, for example. Last, the non-syntactic theories of reconstruction might be correct, but they follow from very little. The nice feature of the copy theory is that it invites reconstruction effects: given that an expression is "in" several positions at once, it might be expected to exercise powers of the positions it occupies. We can mimic this with various non-syntactic theories (and in trace theory), but it's not clear why these effects should hold on those views. I take this to be the main advantage of the copy theory empirically: not that it can do reconstruction, but that reconstruction makes sense on such a theory.

      That said, we should not oversell the CTM. It's just nice to see how much mileage one can get from a move that plausibly simplifies UG by cleaning out linguistically specific machinery.

      Thx for your comments. Made me think.
