3. The second epoch: Categorizing, Simplifying and Unifying the Rules
So, the first stage of GG research ends with a bunch of rules describing a bunch of effects, in addition to providing early models of how the kinds of rules Gs contain might interact to generate the unbounded number of <m,s>s within a given NL. This prepared the way for research focusing on question (5b): what must FL look like given that it can derive Gs with these properties? Not surprisingly, given the logic of the enterprise, and given candidate Gs, the next steps moved along two tracks: (i) cross-linguistic investigation of Gs different from English, to see to what degree the Gs proposed for English carry over to Gs of other NLs, and (ii) simplification and unification of NL Gs so as to make them more natural “fits” for FL. The second epoch stretches from roughly the mid-1970s to the early 1990s. Within the Chomsky version of GG, the classical example of this kind of work is Lectures on Government and Binding (LGB). What did LGB accomplish and how did it do this?
LGB was a mature statement of the work that began with Conditions on transformations. This work aimed to simplify the rules that Gs contain by distilling out those features of particular G rules that could be attributed to FL more generally. Indeed, the basic GB strategy was to simplify particular Gs by articulating UG. Part of this involved categorizing the possible rules a G could contain. Part involved postulating novel theoretical entities (e.g. traces) which served two functions: (i) They allowed the T-rules to be greatly simplified and (ii) They allowed a partial unification of distinct parts of the grammar, viz. binding and movement.
Articulating UG in this way also had a natural acquisition interpretation: in learning a particular G, only simple rules need be “abstracted” from the data; the more recondite kinds of knowledge attained by the child (knowledge that had earlier been coded as part of a rule’s structural description) are traced to structural features built into UG. As these UG principles were argued to be innate, they did not have to be acquired on the basis of PLD. In sum, language specific rules are simplified by offloading many of their most intricate features to FL (i.e. the principles of UG). As language specific rules can vary, they must be learned. Thus, simplifying them makes them easier to acquire, while enriching UG allows this simplification to occur without (it is hoped) undue empirical loss. That was the logic. Here are some illustrations.
LGB postulated a modular structure for NL Gs:
1. Base rules
a. X’ theory
b. Theta Theory
2. Movement rules (A and A’)
3. Case Rules
4. Binding Rules
5. Control Rules
The general organization of the grammar, the ‘T’-model, specifies how these various rules/conditions apply. The Base Rules generate X’-structured objects that syntactically reflect “pure GF-theta” (viz. DP expressions output by the base rules occupy all and only theta positions, creating phrase markers analogous to, but not exactly the same as, Deep Structures in the Standard (i.e. Aspects) theory). Targets of movement operations are positions generated by the X’-rules in the base which lexical insertion (LI) rules have not lexically filled. The output of the base component (the combination of X’-rules and LI-rules) is input to the T-component, the part of the grammar that includes movement operations.
Movement rules are entirely re-conceptualized within LGB, in two important ways. First, they are radically simplified, reducing essentially to the operation ‘Move α’ (move anything anywhere). Second, all G rules, including Move α, are structure preserving in the sense that all the constituency present in the input to a rule is preserved in its output. In practical terms, this assumption motivates trace theory. Trace theory has two important theoretical consequences: (i) it serves to unify movement and binding theory and (ii) it is a necessary step in the simplification of movement rules to Move α. Let’s consider how GB does this.
First, the process of simplification: LGB replaces complex rules like Passive, which in the Standard Theory look something like (15), with the simple rule of ‘Move NP,’ this being an instance of Move α with α = NP.
(15) X-NP1-Y-V-NP2-Z --> X-NP2-Y-be+en-V-by NP1
(Where NP1 and NP2 are clausemates)
‘Move NP’ is simpler in three ways. First, (15) involves the movement of two NPs, raising NP2 and lowering NP1. Passivization, when analyzed in Move NP terms, involves two applications of the simpler rule rather than one application of the compound one. Second, (15) not only moves NPs around, but it also inserts passive morphology as well as a by-phrase. Third, in contrast to (15), Move NP allows any NP to move anywhere. Thus, the ‘Move α’ analysis of Passive factors out the NP movements from the other features of the passive rule. This effectively eliminates the construction-based conception of rules characteristic of the earlier Standard Theory.
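The factoring can be made concrete with a toy sketch (ours, not from the GB literature; the flat-list encoding and all function names are invented for illustration). The compound rule does everything at once; the generic movement operation knows nothing about morphology and relocates one constituent per application.

```python
# Toy sketch (our illustration): the compound Standard Theory Passive (15)
# vs. a GB-style factoring into two applications of a generic movement.
# Structures are flat lists of schematic terms.

def standard_passive(struct):
    """Compound rule: permute NP1 and NP2 AND insert passive morphology
    and 'by' in a single step (as in (15))."""
    x, np1, y, v, np2, z = struct
    return [x, np2, y, "be+en", v, "by " + np1, z]

def move(struct, item, target_index):
    """Generic 'Move NP': relocate one constituent. No morphology is
    inserted and only one item moves per application."""
    s = [c for c in struct if c != item]
    s.insert(target_index, item)
    return s

base = ["X", "NP1", "Y", "V", "NP2", "Z"]

# GB-style derivation: two independent, simple movements
step1 = move(base, "NP1", 4)   # displace NP1 toward the 'by'-phrase position
step2 = move(step1, "NP2", 1)  # raise NP2 into the vacated subject position
# step2 yields the passive word order, with morphology insertion left
# to an independent component of the grammar
```

The point of the sketch is only the division of labor: the word-order effect of (15) falls out of two applications of `move`, while the morphological residue of Passive must be handled elsewhere.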
These simplifications, though theoretically desirable, create empirical problems. How? Rules like ‘Move NP’ left to themselves wildly over-generate, deriving all sorts of ungrammatical structures (as we illustrate below). GB addresses this problem in a theoretically novel way. It eliminates the empirically undesirable consequences by enriching UG. Thus, GB theory targets two related dimensions: it simplifies the rules of G while enriching the structure of UG. Most interestingly, the proposed enrichment relies on a computational simplification of the G rules. Let’s consider this in more detail.
‘Move α’ is the simplest possible kind of movement rule. It says something like ‘move anything anywhere.’ Languages differ in what values they allow α to assume. So, for example, English moves WH words to the front of the clause to form interrogatives. Chinese doesn’t. In English α can be WH, in Chinese it cannot be. Or Romance languages move verbs to tense, while English doesn’t. Thus in Romance α can be V, while in English it can’t. And so on. Again, while so simplifying the rules has some appeal, the trick is to simplify without incurring the empirical costs of over-generation. GB achieves this via Trace Theory, which is itself a consequence of the Projection Principle, a requirement that bars derivations from losing syntactic information. Here’s the story.
In the GB framework, trace theory implements the general computational principle that derivations be monotonic. For example, if a verb has a transitive syntax in the base, then it must remain transitive throughout the derivation. Or, put another way, if some NP is an object of a V at some level of representation, the information that it was must be preserved at every subsequent level of representation. In a word, information can be created but not destroyed, i.e. G rules are structurally monotonic, with the structure that is input to a rule preserved in the structure that is output by that rule. Within GB, the name of this general computational principle is the Projection Principle, and the way it is formally implemented is via Trace Theory (TT), as we shall see anon.
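The monotonicity idea can be sketched in a few lines of toy code (our own construction; the nested-list encoding and names are invented). The operation below mimics Raising: because the vacated subject slot is refilled by a co-indexed trace, the embedded clause keeps its subject and no constituency is destroyed.

```python
# A minimal sketch (ours, not from the text) of a trace-leaving Raising
# operation. The input constituency survives in the output because the
# moved-from position is filled by a co-indexed trace.

# Cf. a structure like: [TP __ [T pres] [VP seem [TP John [T to] [VP like Mary]]]]
before = ["TP", None, "pres",
          ["VP", "seem", ["TP", "John", "to", ["VP", "like", "Mary"]]]]

def raise_to_subject(tp):
    """Move the embedded subject to the empty matrix subject position,
    leaving a co-indexed trace 't1' in the moved-from position."""
    label, _, tense, (vlabel, verb, emb) = tp
    elabel, esubj, etense, evp = emb
    new_emb = [elabel, "t1", etense, evp]        # trace keeps the slot filled
    return [label, esubj + "1", tense, [vlabel, verb, new_emb]]

after = raise_to_subject(before)
# The embedded TP in 'after' still has a subject position (the trace),
# so the derivation is monotonic in the Projection Principle's sense.
```

A Standard Theory version of the rule would instead return an embedded TP with the subject slot simply deleted, which is exactly the non-monotonic loss of information the Projection Principle forbids.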
This monotonicity condition is a novelty. Operations within the prior Standard model are not similarly monotonic. To illustrate, take the simple case of Raising to subject, which can be schematized along the lines of (16):
(16) X-T1-Y-NP-T2-Z --> X-NP-T1-Y-T2-Z
This rule can apply in a configuration like (17a) to derive a structure like (17b):
(17) a. [TP [T present] [VP seem [TP John [T to] [VP like Mary]]]]
b. [TP John [T present] [VP seem [TP [T to] [VP like Mary]]]]
Note that the information that John had been the subject of the embedded clause prior to the application of (16) is lost, as the embedded TP in (17b) no longer has a subject like it does in (17a).
As noted, TT is a way of implementing the Projection Principle. How exactly? Movement rules in GB are defined as operations that leave traces in positions from which movement occurs. Given trace theory, the representation of (17a) after Raising has applied is (18):
(18) [TP John1 [T present] [VP seem [TP t1 [T to] [VP like Mary]]]]
Here t1 is a trace of the moved John, the co-indexing coding the fact that John was once in the position occupied by its trace. As should be clear, via traces, movement now preserves prior syntactic structure. This information preserving principle (i.e. that grammatical operations cannot destroy structure) becomes a staple of all later theory.
TT is GB’s first step towards simplifying G rules. The second, bolder step is to propose that traces require licensing, and the boldest step is to execute this by using TT to unify Binding and Movement. Specifically, Binding Theory expands to include the relation between a moved α and its trace. Executing this unification, given other standard assumptions (particularly that D-structure represents pure GF-theta), requires rethinking Binding and replacing rules like Reflexivization in favor of simpler, more abstract principles. Again, let’s illustrate.
Say we treat Raising as just an instance of ‘Move NP.’ Then we need a way of preventing the derivation of unacceptable sentences like (19a) from sentences with the underlying structure in (19b).
(19) a. *John seems likes Mary
b. [TP [T present] [VP seem [TP John [T present] [VP like Mary]]]]
Now, given a rule like (16), this derivation is impossible. Note that the embedded T is not ‘to’ but ‘present’. Thus, (16) cannot apply to (19b) as its structural description is not met (i.e. the structural description of (16) codes its inapplicability to (19b), thus preventing the derivation of (19a)). But, if we radically simplify movement rules to “move anything anywhere,” the restriction coded in (16) is not admissible and over-generation problems (e.g. examples like (19a)) emerge.
To recap, given a rule that simply says ‘Move NP’ there is nothing preventing the rule from applying to (19b) and moving John to the higher subject position. The unification of movement and binding via TT serves to prevent such over-generation. How? By treating the relation between a trace and its antecedent as identical to that between an anaphor (e.g. a reflexive) and its antecedent. Specifically, if the trace in (20a) is a kind of “reflexive,” then the derived structure is illicit as the trace left by movement is not bound in its minimal domain. In effect, (20a) is blocked in basically the same way that (20b) is.
(20) a. [TP John1 [T present] [VP seem [TP t1 [T present] [VP like Mary]]]]
b. [TP John1 [T present] [VP believe [TP himself1 [T present] [VP like Mary]]]]
Let’s pause and revel in the logic on display here: if derivations are monotonic (i.e. obey the Projection Principle), then when α moves it leaves a trace in the moved-from position. Further, if the relation between a moved α and its trace is the same as that of an anaphor to its antecedent, then the licensing principles that regulate the latter must regulate the former. So, simplifying derivations by making them monotonic and unifying Movement and Binding allows for the radical simplification of Movement rules without any empirical costs. In other words, simplifying derivations and unifying the modules of the grammar (always a theoretical virtue if possible) serve to advance the simplification of its rules. It is worth noting that the GB virtues of simplification and unification are retained as regulative ideals in subsequent Minimalist thinking.
That’s the basic idea. However, we need to consider a few more details as reducing (20a) to a binding violation requires reframing the binding theory. More specifically, it requires that we abstract away from the specifics of the construction and concentrate on the nature of the relation. Here’s what we mean.
The LK rule of Reflexivization contrasts with rules like Raising in that the former turns the lower “dependent” into a reflexive while the latter deletes it. Moreover, whereas Reflexivization is a rule that applies exclusively to clause-mates, Raising only applies between clauses. Lastly, whereas Reflexivization is an operation that applies between two lexical items (viz. two items introduced by lexical insertion in Deep Structure), Raising is not (in contrast to Equi, for example). From the perspective of the Standard Theory, then, Raising and Reflexivization could not look more different, and unifying them would appear unreasonable. The GB binding theory, in contrast, by applying quite generally to all nominal expressions, highlights the relevant dependencies and does not get distracted by the other (irrelevant) features that differentiate the constructions (like their differing morphology). Let’s consider how.
GB binding theory (BT) divides all (overt) nominal expressions into three categories and associates each with a licensing condition. The three are (i) anaphors (e.g. reflexives, reciprocals, PRO), (ii) pronominals (e.g. pronouns, PRO, pro) and (iii) R-expressions (everything else). BT regulates the interpretation and distribution of these expressions. It includes three conditions, Principles A, B and C, and a specification of the relevant domains and licit dependencies:
(21) GB Binding Principles:
A. An anaphor must be bound in its minimal domain
B. A pronoun must be free in its minimal domain
C. An R-expression must be A-free
(22) α is the minimal domain for β iff α is the smallest clause (TP) containing β with a subject distinct from β
(23) An expression α is bound by β iff β c-commands α and β and α are co-indexed
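For concreteness, the binding relation in (23) can be turned into a toy checker (this is entirely our own sketch; the tree encoding, paths, and function names are invented, and the minimal-domain computation of (22) is omitted):

```python
# Toy implementation (ours) of definition (23): an expression is bound iff
# some co-indexed expression c-commands it. Trees are nested lists
# [label, child, ...]; indexed words are strings ending in a digit ("John1").

def nodes(tree, path=()):
    """Yield (path, subtree) for every node; paths are tuples of child indices."""
    yield path, tree
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from nodes(child, path + (i,))

def c_commands(pa, pb):
    """pa c-commands pb iff neither dominates the other and pa's parent
    node dominates pb (a simplified, branching-node-free version)."""
    if pa == () or pa == pb:
        return False
    if pb[:len(pa)] == pa or pa[:len(pb)] == pb:   # domination either way
        return False
    parent = pa[:-1]
    return pb[:len(parent)] == parent

def index_of(node):
    """Return the referential index of a word, if any ("John1" -> "1")."""
    return node[-1] if isinstance(node, str) and node[-1].isdigit() else None

def is_bound(tree, pb):
    """True iff some co-indexed expression c-commands the node at path pb."""
    all_nodes = list(nodes(tree))
    idx = index_of(dict(all_nodes)[pb])
    if idx is None:
        return False
    return any(pa != pb and c_commands(pa, pb) and index_of(n) == idx
               for pa, n in all_nodes)

# The raised structure of (18): John1 c-commands and is co-indexed with t1
raised = ["TP", "John1", "pres",
          ["VP", "seem", ["TP", "t1", "to", ["VP", "like", "Mary"]]]]
```

Here `is_bound(raised, (3, 2, 1))`, the path of t1, comes out true: the trace is bound by its antecedent. Principles A–C themselves would add the domain computation of (22) on top of this relation.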
These three principles together capture all the data we noted above in (8)-(14). Let’s see how. The relevant examples are recapitulated in (24). (24a,b,e,f) illustrate that bound reflexives and pronouns are in complementary distribution. (24c,d) illustrate that R-expressions cannot be bound at all.
(24) a. John1 likes himself/*him1
b. John1 believes Mary likes *himself/him1
c. *I expect himself1 to like John1
d. *He1 expects me to like John1
e. John1 believes *himself/he1 is intelligent
f. John1 believes himself/*he1 to be intelligent
How does BT account for these data? Reflexives are categorized as anaphors and so subject to Principle A. Thus, reflexives must be bound in their minimal domains. Pronouns are pronominals subject to Principle B. Thus, a pronoun cannot be bound in its minimal domain. Given BT, then, bound pronouns and reflexives must be in complementary distribution. This accounts for the data in (24a,b). It also accounts for the data in (24f). The structure is provided in (25):
(25) [TP1 John Present [VP believe [TP2 himself/he to be intelligent]]]
The minimal domain for himself/he is the matrix TP1. Why? Because of (22), which requires that the minimal domain for α have a subject distinct from α. But himself/he is the subject of TP2. The first TP with a distinct subject is the matrix TP1, and this becomes its binding domain. In TP1 the anaphor must be bound and the pronoun must be free. This accounts for the data in (24f).
(24e) requires some complications. Note that we once again witness the complementary distribution of the bound reflexives and pronouns. The minimal domain should then be the embedded clause if BT is to explain these data. Unfortunately, (22) does not yield this. This problem received various analyses within GB, none of which proved entirely satisfactory. The first proposal was to complicate the notion ‘subject’ by extending it to include the finite marker (which has nominal phi (i.e. person, number, gender) features). This allows the finite T to be a subject for himself/he and their complementary distribution follows given the contrary requirements that A and B impose on anaphors and pronominals.
Principle C excludes (24c,d), as in both cases he/himself binds John. (24c) also violates Principle A.
In sum, BT accounts for the same binding effects LK does, though in a very different way. It divides the class of nominal expressions into three separate groups, abstracts out the notion of a binding domain, and provides universal licensing conditions relevant to each. As with the GB movement theory, most of BT is plausibly part of the native structure of UG and hence need not be acquired on the basis of PLD. What needs to be acquired is which group a particular nominal expression falls into. Is ‘each other’ an anaphor, a pronominal, or an R-expression? Once this is determined, where it can appear and what its antecedents can be follows from the innate architecture of FL. Thus, GB radically simplifies the binding rules by distinguishing what binding applies to from what binding is, and this has a natural interpretation in terms of acquisition: knowledge of what belongs in which category must be acquired; knowledge of how something in a given category behaves is innate.
With this as background, let’s return to how GB BT allows for the unification of binding and movement via trace theory. Recall that BT divides all nominal expressions into three groups. Traces are nominal expressions (viz. [NP e ]1) and so it is reasonable to suppose that they too are subject to BT. Moreover, as traces determine the theta-roles of their antecedents, they must be related to them for semantic reasons. This would be guaranteed were traces treated like anaphors falling under Principle A. This suffices to assimilate (20a) to (20b) and so it closes the explanatory circle.
So, by generalizing binding considerations to all nominal expressions, and by showcasing binding domains, GB makes it natural to unify movement and binding by categorizing traces as anaphoric nominal expressions (a categorization that would be part of UG and so need not be acquired). So, simplifying derivations with the Projection Principle leads to trace theory, which in turn allows for the unification of movement and binding, which in turn leads to a radical simplification of movement transformations, all without any diminution of empirical coverage.
Let us add one more point: recall that one of the big questions concerning language concerns its acquisition by kids on the basis of relatively simple input. The GB story laid the foundations for an answer: the rules were easy to learn because where languages vary the rules are simple (e.g. is α = NP or V or WH or…). GB factors out the intricacies of the Standard Theory rules (e.g. ordering statements and clausemate conditions) and makes them intrinsic features of FL; hence a child can know them without having to acquire them via PLD. Thus, not only does GB radically simplify and unify the operations in the Standard Theory, a major theoretical accomplishment by itself, it also provides a model for successfully addressing Plato’s Problem: how can kids acquire Gs despite the impoverished nature of the PLD?
To end this section: we have illustrated how GB, building on earlier research (and conserving its discovered empirical “laws”), constructed a more principled theory of FL. Though we looked carefully at binding and movement, the logic outlined above was applied much more broadly. Thus, phrase structure was simplified in terms of X’-theory (pointing towards the elimination of PS rules altogether in contemporary theory) and Island Effects were unified under the theory of Subjacency. The latter echoed the discussion above in that it consolidated the view that T-rules are very simple and not construction centered. Rather, constructions are complexes of interacting simple basic operations. The upshot is a rich and articulated theory that describes the fixed structure of FL in terms of innate principles of UG. In addition, the very success of GB theory opens further important questions for investigation. So just as research in the Standard Theory paved the way for a fruitful consideration of linguistic universals and what has been called “Plato’s Problem” (i.e. how does knowledge of Gs arise in native speakers despite the relative paucity of the available data?), the success of GB allows for a consideration of “Darwin’s Problem” (how could something like FL have arisen in the species so rapidly?). We turn to this next.
 There are other versions of GG that appear formally different. Interestingly, virtually all took the effects described in 2 as boundary conditions on further research. Thus, though I will restrict my attention to the Chomsky version of the GG enterprise, the kinds of work discussed have analogues within other “frameworks.” Indeed, truth be told, despite some substantive disagreements, many of the differing approaches look to us like largely notational variants.
 One mark of an ungrammatical structure is that the sentences that coincide with these structures are judged unacceptable. However, ‘(un)grammatical’ is a predicate of syntactic structures (and therefore carries theoretical content) while ‘(un)acceptable’ is a descriptive predicate applied to data. These two terms are often used interchangeably, which can result in quite a bit of unnecessary confusion. Speakers have no judgments concerning grammaticality, though they are expert concerning acceptability. What Chomsky discovered is that acceptability judgments by native speakers can be used to investigate the grammaticality of linguistic structures. But, and this is important, there are various sources of unacceptability besides ungrammaticality and the two notions need not swing together, though they often do (which is why querying for acceptability is an excellent source of data concerning grammaticality).
 By the way, this information preserving principle has been retained as a critical part of syntactic theory to the present day, though in modern theory it does not rely on traces for its implementation. We return to this later on.
 Ditto with Passive. The Passive rule (15) syntactically detransitivizes a transitive verb, viz. the V in the structural output of the rule no longer has a direct object, though it did have one in deep structure. We leave the details as an exercise for those inclined to build up their syntactic muscles.
 Actually, the structure underlying (19a), but you get the point.
 Indeed, the construction specificity of the rules made most rules look different. Thus unifying Reflexivization and Equi or Equi and Movement did not seem particularly plausible either. Only Trace theory and the abstractions it introduced made the potential similarities between these various dependencies easily visible.
 Things can be (and are) more complex than this. What counts as a clause might differ (TP or CP) and one can extend BT to nominal domains as well with an extension of the notion “subject.” We will put these complications aside here and assume that clause means TP.
 Recall: this fact was also accounted for by the Lees-Klima theory surveyed in section 1, albeit in a very different way. Effects constant, theories different. Just what we want.
 This effectively analogizes agreement markers to pronominals. Pronouns are also just bundles of person, number and gender features. Later approaches to binding were able to eliminate this assumption. See note 31.
 A later analysis (in Knowledge of Language) made a more radical proposal. Chomsky, following a proposal by Lebeaux, assumed that to be bound, reflexives must (covertly) adjoin to a position proximate to the antecedent (in this case matrix T, akin to what Romance reflexives do overtly). Such a movement is (plausibly) an ECP violation.
Neither the fix in the text nor the one above is particularly compelling (at least to one of the authors (NH)), but both serve to derive the contrast in (24e).
 It is worth noting that the GB version of the binding theory requires no extrinsic ordering (i.e. stipulative ordering) of the operations that license reflexives and those that license pronouns. One can “apply” A and B in any order with the same results, in contrast to (12) and (13) above. As stipulations are never theoretically welcome, its elimination in the repackaging of the BT is a theoretical step forward.