Faculty of Language: Comments on lecture 3: the finale

This is the final part of my comments on lecture 3. The first three parts are (here, here and here). I depart from explication mode in these last comments and turn instead to a critical evaluation of what I take to be Chomsky’s main line of argument (and it is NOT empirical). His approach to labels emerges directly from his conception of the basic operation Merge. How so? Well, there are only two “places” that MPish approaches can look to in order to ground linguistic processes, the computational system (CS) or the interface conditions (Bare Output Conditions (BOC)). Given Chomsky’s conceptually spare understanding of Merge, it is not surprising that labeling must be understood as a BOC. I here endorse this logic and conclude that Chomsky’s modus ponens is my modus tollens. If correct, this requires us to rethink the basic operation. Here’s what I believe we should be aiming for: a conception that traces the kind of recursion we find in FL to labeling. In other words, labeling is not a BOC but intrinsic to CS; indeed the very operation that allows for the construction of SLOs. Thus, just as Merge now (though not in MP’s early days) includes both phrase building and movement, the basic operation, when properly conceptualized, should also include labeling.

To motivate you to aim for such a conception, it’s worth recalling that in early MP it was considered conceptually obvious that Move and Merge were different kinds of things and that the latter was more basic and that the former was an “imperfection.” Some (including me and Chris Collins) did not buy this dichotomy, suggesting that whatever process produced phrase structure (now E-Merge) should also suffice to give one move (now I-merge). In other words, that FL when it arose came fully equipped with both merge and move, neither being more basic than the other. On this view, Move is not an “imperfection” at all. Chomsky’s later work endorsed this conception. He derived the same conclusion from other (arguably (though I’m not sure I would so argue) simpler) premises. I derive one moral from this little history: what looks to be conceptually obvious is a lot clearer after the fact than ex ante. Here’s a good place to mention the owl of Minerva, but I will refrain. Thus, here’s a project: rethink the basic operation in CS so that labels are intrinsic consequences. I will suggest one way of doing this below, but it is only a suggestion. What I think there are arguments for is that Chomsky’s way of including labels in FL is very problematic (this is as close as I come to saying that it’s wrong!) and misdiagnoses the relevant issues. The logic is terrific, it just starts from the wrong place. Here goes.

1. The logic revisited and another perspective on “merge”

There are probably other issues to address if one wants to pursue Chomsky’s proposal. IMO, right now his basic idea, though suggestive, is not that well articulated. There are many technical and empirical issues that need to be ironed out. However, I doubt that this will deter those convinced by Chomsky’s conceptual arguments. So before ending I want to discuss them. And I want to make two points: first that I think that there is something right about his argument. What I mean is that if you buy Chomsky’s conception of Merge, then adding something like a labeling algorithm in CS is conceptually inelegant, if not worse. In other words, Chomsky is right in thinking that adding labeling to his conception of Merge is not a good theoretical move conceptually. And second, I want to suggest that Chomsky’s idea that projection is effectively an interface requirement, a BOC in older terminology, has things backwards. The interfaces do not require labeled structures to do what they do. At least CI doesn’t, so far as I can tell. The syntax needs them. The interfaces do not. The two points together point to the conclusion that we need to re-think Merge. I will very briefly suggest how we might do this.

Let’s start. First, Chomsky is making exactly the right kind of argument. As noted at the outset, Chomsky is right to question labeling as part of CS given his view that Merge is the minimal syntactic operation. His version of Merge provides unboundedly many SLOs (plus movement) all by itself. One can add projection (i.e. labeling) considerations to the rule but this addition will necessarily go beyond the conceptual minimum. Thus, Merge cannot have a labeling sub-part (as earlier versions of Merge did). In fact, the only theoretical place for labels is the interface as the only place for anything in an MP-style account is as an interface BOC or the CS. But as labels cannot be part of CS, they must be traced to properties of the CI/SM interface. And given Chomsky’s view that the CI interface is really where all the action is, this means that labeling is primarily required for CI interpretation. That’s the logic and it strikes me as a very very nice argument.

Let me also add, before I pick at some of the premises of Chomsky’s argument, that lecture 3 once again illustrates what minimalist theorizing should aim for: the derivation of deep properties of FL from simple assumptions. Lecture 3 continues the agenda from lecture 2 by aiming to explain three prominent effects: successive cyclicity, FSCs and EPP effects. As I have stressed before in other places, these discovered effects are the glory of GG and we should evaluate any theoretical proposal by how well and how many it can explain. Indeed, that’s what theory does in any scientific discipline. In linguistics theory should explain the myriad effects we have discovered over the last 60 years of GG research. In sum, not surprisingly, and even though I am going to disagree with Chomsky’s proposal, I think that lecture 3 offers an excellent model of what theorists should be doing.

So which premise don’t I like? I am very unconvinced that labels reflect BOCs. I do not see why CI, for example, needs labeled structures to interpret SLOs. What it needs are structured objects (to provide compositional structure) but I don’t see that it needs labeled SLOs. The primitives in standard accounts of semantic interpretation are things like arguments, predicates, events, propositions, operators, variables, scope, etc. Not agreeing phrases, VPs or vPs or Question Ps etc. Thus, for example, though we need to identify the Q operator in questions to give the structure a question “meaning” and we need to determine the scope of this operator (something like its CC domain), it is not clear to me that we also need to identify a question phrase or an agreement phrase. At least in the standard semantic accounts I am familiar with, be it Heim and Kratzer or Neo-Davidsonian, we don’t really need to know anything about the labels to interpret SLOs at CI. It’s the branching that matters, not what labels sit on the nodes.[1]

I know little about SM (I grew up in a philo dept and have never taken a phonology course (though some of my best friends are phonologists)), but from what I can gather the same seems true on the SM side. There are differences between stress in some Ns and Vs but at the higher levels, the relevant units are XPs not DPs vs VPs vs TPs vs CPs etc. Indeed the general procedure in getting to phrasal phonology involves erasing headedness information. In other words, the phonology does not seem to care about labels beyond the N vs V level (i.e. the level of phonological atoms).

If this impression is accurate (and Chomsky asserts but does not illustrate why he thinks that the interfaces should care about labeled SLOs) then we can treat Chomsky’s proposal as a reductio: He is right about how the pieces must fit together given his starting assumptions, but they imply something clearly false (that labels are necessary for interface legibility) therefore there must be something wrong with Chomsky’s starting point, viz. that Merge as he understands it is the right basic operation.

I would go further. If labeling is largely irrelevant for interface interpretation (and so cannot be traced to BOCs) then labeling must be part of CS and this means that Chomsky’s conception of Merge needs reconsideration.[2] So let’s do that.[3]

What follows relies on some work I did (here). I apologize for the self-referential nature of what follows, but hey it’s the end of a very long post.

Here’s the idea: the basic CS operation consists of two parts, only one of which is language specific. The “unbounded” part is the product of a capacity for producing unboundedly big flat structures that is not peculiarly linguistic or unique to humans. Call this operation Iteration. Birds (and mice and bats and whales) do it with songs. Ants do it with path integration. Iteration allows for the production of “beads on a string” kinds of structures and there is no limit in principle to how long/big these structures can be.

The distinctive feature of Iteration is that it couldn’t care less about bracketing. Consider an example: Addition can iterate. (((a+b)+c)+d) is the same as (a+(b+c+d)) which is the same as (a+b+c+d) etc. Brackets in iterative structures make no difference. The same is true in path integration. What the ant does is add up all the information but the adding up needs no particular bracketing to succeed. So if the ant goes 2 ft N and then 3 ft W and then 6 ft S and then 4 ft E, it makes no difference to the calculation how these bits of directional information are added together. However you do this provides the same result. Bracketing does not matter. The same is true for simple conjunction: (((a&b)&c)&d) is equivalent to (a & (b & (c&d))) which is the same as (a&b&c&d). Again brackets don’t matter. Let’s assume then that iterative procedures do not bracket. So there are two basic features of Iteration: (i) there is no upper bound to the objects it can produce (i.e. there is no upper bound on the length of the beaded string), and (ii) bracketing is irrelevant, viz. Iteration does not bracket. It’s just like beads on a string.
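For those who like to see such things run, here is a minimal Python sketch of the ant example. The names (steps, add) and the coordinate convention (east and north count as positive) are just illustrative assumptions, not a model of how ants actually compute; the only point is that the grouping of the steps has no effect on the total.

```python
from functools import reduce

# Displacements as (east, north) pairs, in feet:
# 2 ft N, then 3 ft W, then 6 ft S, then 4 ft E.
steps = [(0, 2), (-3, 0), (0, -6), (4, 0)]

def add(p, q):
    """Add two displacement vectors componentwise."""
    return (p[0] + q[0], p[1] + q[1])

a, b, c, d = steps

left_nested = add(add(add(a, b), c), d)    # (((a+b)+c)+d)
right_nested = add(a, add(b, add(c, d)))   # (a+(b+(c+d)))
flat = reduce(add, steps)                  # a running total over the whole string

print(left_nested, right_nested, flat)     # all three are (1, -4)
```

However the additions are grouped, the ant ends up 1 ft east and 4 ft south of where it started.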

Here’s a little model. Assume that we treat the basic Iterative operation as the set union operation. And assume that the capacity to iterate involves being able to map atoms (but only atoms) to their unit sets (e.g. a --> {a}). Let’s call this Select. Select is an operation whose domain is the lexical atoms and whose range is the unit sets of those atoms. Then given a lexicon we can get arbitrarily big sets using U and Select.[4] For example: If ‘a’, ‘b’ and ‘c’ are atoms, then we can form {a} U {b} U {c} to give us {a,b,c}. And so forth. Arbitrarily big unstructured sets.
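The little model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: lexical atoms are modeled as strings, sets as frozensets, and the function names select and union are hypothetical stand-ins for Select and U.

```python
from functools import reduce

def select(atom):
    """Select: map a lexical atom to its unit set, e.g. a --> {a}.
    Assumption: atoms are modeled as Python strings."""
    if not isinstance(atom, str):
        raise TypeError("Select is defined only for lexical atoms")
    return frozenset([atom])

def union(*sets):
    """U: plain set union over any number of sets; no bracketing survives."""
    return reduce(frozenset.union, sets, frozenset())

flat = union(select("a"), select("b"), select("c"))              # {a, b, c}
regrouped = union(union(select("a"), select("b")), select("c"))  # still {a, b, c}

print(flat == regrouped)  # True
```

Because Select here refuses anything that is not an atom, the only objects this system can build are flat sets, however large.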

Clearly, what we have in FL cannot just be Iteration (i.e. U plus Select). After all we get SLOs. Question: what, if added to Iteration, would yield SLOs? I suggest the capacity to Select the outputs of Iteration. More particularly, let’s assume the little model above. How might we get structured sets? By allowing the output of Iteration to be the input to Select. So, if {a,b} has been formed (viz. {a} U {b} -> {a,b}) and Select applies to {a,b}, yielding {{a,b}}, then Union with {c} gives the structured SLO {{a,b},c} (viz. {{a,b}} U {c} -> {{a,b},c}). One can also get an analogue of I-merge: select {{a,b},c} (i.e. {{{a,b},c}}), select c (i.e. {c}), Union the sets (i.e. {{{a,b},c}} U {c}) and out comes {c, {{a,b},c}}. So if we can extend the domain of Select to include outputs of the union operation then we can use Iteration to deliver unboundedly many SLOs.
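Here is the same derivation as a Python sketch, again with frozensets standing in for sets and strings for atoms (both assumptions made for illustration). Select is simply allowed to apply to anything here; what licenses that extension is a separate question that the sketch leaves open.

```python
def select(x):
    """Extended Select: wrap an atom OR an already-built set in a unit set."""
    return frozenset([x])

def union(s, t):
    """U: ordinary set union."""
    return s | t

ab = union(select("a"), select("b"))       # {a} U {b}         -> {a, b}
ab_c = union(select(ab), select("c"))      # {{a, b}} U {c}    -> {{a, b}, c}
moved = union(select(ab_c), select("c"))   # {{{a, b}, c}} U {c} -> {c, {{a, b}, c}}

print(ab in ab_c)                    # True: {a, b} is embedded as a unit
print("c" in moved and "c" in ab_c)  # True: c occurs at the top and inside
```

Nothing in the input is altered along the way: ab_c sits unchanged inside moved, and the two occurrences of c are the "copies" of the I-merge analogue.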

The important question then is what licenses extending the domain of Select to the outputs of Union? Labeling. Labeling is just the name we give for closing Iteration in the domain of the lexical atoms.[5] In effect, labels are how we create equivalence classes of expressions based on the basic atomic inventory. Another way of saying this is that Labeling maps a “complex” set {a,b} to either a or b, thereby putting it in the equivalence class of ‘a’ or ‘b’. If Labels allow Select to apply to anything in the equivalence class of ‘a’ (and not just to ‘a’ alone), we can derive SLOs via Iteration.[6]
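One way to picture this is the following Python sketch. Everything in it is a hypothetical illustration: the three-atom LEXICON, the labels table, and the helper names are assumptions, and the choice of which member supplies the label is simply stipulated by the caller, since nothing above says how that choice is made.

```python
LEXICON = {"a", "b", "c"}   # hypothetical lexical atoms
labels = {}                 # complex set -> the atom whose class it belongs to

def label_of(x):
    """Return the atom heading x, or None if x is an unlabeled complex."""
    return x if x in LEXICON else labels.get(x)

def label(s, head):
    """Label: put the complex set s in the equivalence class of head.
    head must be an atom that is the label of some member of s."""
    if head not in LEXICON or head not in {label_of(m) for m in s}:
        raise ValueError("head must be the label of a member of the set")
    labels[s] = head
    return s

def select(x):
    """Select applies to anything in the equivalence class of an atom."""
    if label_of(x) is None:
        raise ValueError("Select is undefined for unlabeled objects")
    return frozenset([x])

ab = select("a") | select("b")     # {a, b}: flat, built by Iteration alone
# At this point select(ab) would raise: {a, b} is in no atom's class yet.
label(ab, "a")                     # now {a, b} counts as an 'a'
ab_c = select(ab) | select("c")    # {{a, b}, c}: hierarchy
```

Without the call to label, Select stays confined to atoms and only flat sets are derivable; with it, the same union operation yields embedding.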

Ok, on this view, what’s the “miracle”? For Chomsky, the miracle is Merge. On the view above, the miracle is Label, the operation that closes Iteration in the domain of the lexical atoms. Label effectively maps any complex set into the equivalence class of one of its members (creating a modular structure) and then treats these as syntactically indistinguishable from the elements that head them (as happens in modular arithmetic, where e.g. ‘1’ and ‘13’, or ‘12’ and ‘24’, are computationally identical in clock arithmetic). Effectively the lexicon serves as the modulus with labels mapping complexes of atoms to single atoms bringing them within the purview of Select.[7]
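The clock-arithmetic side of the analogy is easy to check directly. In this small Python sketch the function name clock is an assumption made for illustration; it maps any positive hour count onto the 12-hour dial.

```python
def clock(hours):
    """Map a positive hour count onto the 12-hour dial (1..12)."""
    return (hours - 1) % 12 + 1

print(clock(7 + 8))     # 3: eight hours after 7:00
print(clock(12 + 21))   # 9: twenty-one hours after noon
print(clock(1) == clock(13), clock(12) == clock(24))  # True True
print(clock(1) == clock(12))                          # False
```

So 1 and 13 fall in one equivalence class and 12 and 24 in another, and within a class the members are interchangeable for purposes of computation. That is the role labels play for complexes of atoms on the view sketched here.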

Note that this “story” presupposes that Iteration pre-exists the capacity to generate SLOs. The U(nion) operation is cognitively general as is Select which allows U to form arbitrarily large unstructured objects. Thus, Iteration is not species specific (which is why birds, ants and whales can do it). What is species specific is Label, the operation that closes U in the domain of the lexical atoms and this is what leads to a modular combinatoric system (viz. allows an operation defined over lexical atoms to also operate over non-atomic structures). Note that if this is right, then labels are intrinsic to CS; without them there are no SLOs, for without them U, the sole combination operation, cannot derive sets embedded within sets (i.e. hierarchy).

The toy account above has other pleasant features. For example, the operation that combines things is the very general U operation. There are few conceivably simpler operations. The products of U produce objects that necessarily obey NTC, Inclusiveness and produce copies under “I-merge.” Indeed, this proposal treats U as the main combinatoric operation (the operation that constructs sets containing more than one member). And if combination is effectively U, then phrases must be sets (i.e. U is a set theoretic operation so the objects it applies to must be sets). And that’s why the products of this combinatoric operation respect the NTC, Inclusiveness and produce “copies.”[8]

Let’s now get back to the main point: on this reconstruction, hierarchical recursion is the product of Iterate plus Label. To be “mergeable” you need a label for only then are you in the range of Select and U. So, labels are a big deal and intrinsic to CS. Moreover, this makes labeling facts CS facts, not BOC facts.

This is not the place to argue that this conception is superior to Chomsky’s. My only point is that if my reservations above about treating Labels as BOCs are correct, then we need to find a way of understanding labels as intrinsic to the syntax, which in turn requires reanalyzing the minimal basic operation (i.e. rethinking the “miracle”).

IMO, the situation regarding projection is not unlike what took place when minimalists rethought the Merge/Move distinction central to early Minimalism (see the “Black Book”). Movement was taken to be an “imperfection.” Rethinking the basic operation allowed for the unification of E and I-merge (i.e. Gs with SLOs would also have displacement). I think we need to do the same thing for Labeling. We need to find a way to make labels intrinsic features of SLOs, labels being necessary for building structure and displacing them. Chomsky’s views on projection don’t do this. They start from the assumption that Labels are BOCs. If this strikes you as unconvincing, as it does me, then we need to rethink the basic minimal operation.

That’s it. These comments are way too long. But that’s what happens when you try and think about what Chomsky is up to. Agree or not, it’s endlessly fascinating.

[1] Edwin Williams once noted that syntactic categories cross cut semantic ones. Predicative nominals have the same syntactic structure as argument nominals, though they differ a lot semantically. I think Edwin’s point is more generally correct. And if it is, then syntactic labels contribute very little (if anything) to CI interpretation.

[2] Though I won’t go into this here, there is plenty of apparent evidence that Gs care about labeled SLOs. So languages target different categories for movement and deletion. Moreover there are structure preservation principles that need explaining: XPs move to Max P positions, X’s don’t move and heads target heads. In a non-labeling theory, it is still unclear why phrases move at all. And the Pied Piping mantra is getting a bit thin after 20 years. So, not only is there little evidence that the interfaces care about labels, there is non-negligible evidence that CS does. If correct, this strengthens the argument against Chomsky’s approach to projection.

[3] One more aside: I am always wary of explanations that concentrate on interface requirements. We know next to nothing about the interfaces, especially CI, so stories that build on these requirements always seem to me to have a “just so” character. So, though the logic Chomsky deploys is fine, the premise he needs about BOCs will have little independent motivation. This does not make the claims wrong, but it does make the arguments weak.

[4] If we distinguish selections from the “lexicon” so that two selections of a are distinguished (a vs a’), we can get unboundedly big sets. Bags can be substituted for sets if you don’t like distinguishing different selections of atoms.

[5] Chomsky flirted with this idea in his earlier discussion of “edge features” (EF). Ask yourself where EFs came from. They were taken as endemic to lexical atoms. It is natural to assume that complexes of such atoms inherited EFs from their atomic parts. Sound familiar? EFs, Labels? Hmm. The cognoscenti know that Chomsky abandoned this way of looking at things. This is an attempt to revive this idea by putting it on what might be a more principled basis.

[6] For those who care, this is a grammatical analogue of “clock/modular arithmetic” (see here).

[7] Here’s how Wikipedia describes the process:

In mathematics, modular arithmetic is a system of arithmetic for integers, where numbers "wrap around" upon reaching a certain value—the modulus. The modern approach to modular arithmetic was developed by Carl Friedrich Gauss in his book Disquisitiones Arithmeticae, published in 1801.

A familiar use of modular arithmetic is in the 12-hour clock, in which the day is divided into two 12-hour periods. If the time is 7:00 now, then 8 hours later it will be 3:00. Usual addition would suggest that the later time should be 7 + 8 = 15, but this is not the answer because clock time "wraps around" every 12 hours; in 12-hour time, there is no "15 o'clock". Likewise, if the clock starts at 12:00 (noon) and 21 hours elapse, then the time will be 9:00 the next day, rather than 33:00. Since the hour number starts over after it reaches 12, this is arithmetic modulo 12. 12 is congruent not only to 12 itself, but also to 0, so the time called "12:00" could also be called "0:00", since 12 is congruent to 0 modulo 12.

[8] Note that Labels allow one to dispense with Probe/Goal architectures as heads are now visible in “Spec-head” configurations. Not that there is anything “special” about Specs (as opposed to complements or anything else). It’s just that given labels, XPs can combine with YPs even after “first” merge and still allow their heads to “see” each other. This, in fact, is what endocentricity was made to do: put expressions that are not simple heads “next to” each other. And they will be adjacent whether the elements combined are complements or specifiers. Chomsky is right that there is nothing “special” about specifiers. But that’s just as true of complements.

1 comment:

Marc van Oostendorp, July 14, 2014 at 10:01 AM
I find this very interesting. I agree with you that it is problematic to say that labels like N and V are needed for the interfaces. They seem so clearly syntactic.

On the phonological side, I suppose most phonologists would agree that the primary candidates for syntactic features that are visible are N and V. Jennifer Smith (at University of North Carolina) has many publications about phonological differences between nouns and verbs in many languages.

However, personally I am not convinced that even those features are really visible. I would think that the difference is merely one in structure. Take the English stress example. It seems to me that we could say that the verb record always has some phonetically empty syllabic ending which is lacking from the noun record. In other words, the relevant structures are /re-cord-0/ vs. /re-cord/. Stress is always on the penultimate syllable, and phonology does not have to see nouns or verbs. It only has to see that verbs typically have more (morphosyntactic) structure.

The question obviously still remains what ARE N and V? Why does syntax deal with these things rather than something completely different?

Friday, July 11, 2014
