Monday, July 14, 2014

A question about feature valuation

I've been working on a "whig history" (WH) of generative grammar. A WH is a kind of rational reconstruction which, if doable, serves to reconstruct the logical development of a field of inquiry. WHs, then, are not “real” histories. Rather, they present the past as “an inevitable progression towards ever greater…enlightenment.” Real history is filled with dead ends, lucky breaks, misunderstandings, confusions, petty rivalries, and more. WHs are not. They focus on the “successful chain of theories and experiments that led to the present-day science, while ignoring failed theories and dead ends” (see here). The value of WHs is that they expose the cumulative nature of a given trajectory of inquiry. One sign of a real science is that it has a cumulative structure, and since many think that the history of Generative Grammar (GG) lacks one, they conclude that this tells against the GG enterprise. However, the "many" are wrong: GG has a perfectly respectable WH, and both empirically and theoretically the development has been cumulative. In a word, we've made loads of progress. But this is not the topic for this post. What is?

As I went about reconstructing the relation between current minimalist theory and earlier GB theory, I came to appreciate just how powerful the No Tampering Condition (NTC) really is (I know, I know, I should have understood this before, but dim bulb that I am, I didn't). I understand the NTC as follows: the inputs to a given grammatical operation must be preserved in the outputs of that operation. In effect, the NTC is a conservation principle that says that structure can be created but not destroyed. Replacing an expression with a trace of that expression destroys (i.e. fails to preserve) the input structure in the output, and so the GB conception of traces is theoretically inadmissible in a minimalist theory that assumes the NTC (which, let me remind you, is a very nice computational principle and part of most (all?) current minimalist proposals).
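For concreteness, here is a toy sketch of the NTC as a conservation check. It assumes syntactic objects can be modeled as nested immutable sets and reads the NTC strictly as "every input to an operation occurs unchanged somewhere in its output." All the names (`Lex`, `merge`, `replace`, `respects_ntc`, the trace label `t_1`) are hypothetical and chosen only for illustration; none of this is drawn from any particular formalization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lex:
    """A lexical item; frozen, so it cannot be altered after creation."""
    label: str


def merge(a, b):
    """Set Merge (toy version): build {a, b} without touching a or b."""
    return frozenset({a, b})


def contains(so, target):
    """True if `target` is `so` itself or occurs somewhere inside it."""
    if so == target:
        return True
    if isinstance(so, frozenset):
        return any(contains(part, target) for part in so)
    return False


def replace(so, old, new):
    """Rebuild `so` with each occurrence of `old` swapped for `new`.

    This models GB-style trace insertion (an assumption of this sketch).
    """
    if so == old:
        return new
    if isinstance(so, frozenset):
        return frozenset(replace(part, old, new) for part in so)
    return so


def respects_ntc(inputs, output):
    """Strict NTC: every input survives intact somewhere in the output."""
    return all(contains(output, item) for item in inputs)


what = Lex("what")
vp = merge(Lex("eat"), what)
tp = merge(Lex("T"), vp)

# Copy Theory: re-merge `what` with tp; the old tp is left as it was.
copy_output = merge(what, tp)

# Trace theory: the lower occurrence is overwritten by a trace.
trace_output = merge(what, replace(tp, what, Lex("t_1")))

print(respects_ntc([what, tp], copy_output))
print(respects_ntc([what, tp], trace_output))
```

On this toy reading the copy derivation passes the check and the trace derivation fails it, because the original `tp` no longer occurs anywhere in the trace output.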

The NTC has many other virtues as well. For example, it derives the fact that movement rules cannot "lower" and that movement (at least within a single rooted sub-"tree") is always to a "c-commanding" position. Those of you who have listened to any of the Chomsky lectures I posted earlier will understand why I have used scare quotes above. If you don't know why and don't want to listen to the lectures, ask David Pesetsky. He can tell you.

At any rate, the NTC also suffices to derive the GB Projection Principle and the MP Extension Condition. In addition, it suffices to eliminate trace theory as a theoretical option (viz. co-indexed empty categories that are residues of movement: [e]1). Why? Because traces cannot exist in the input to the derivation and so they cannot exist in the output given the NTC. Thus, given the NTC, the only way to implement the Projection Principle is via the Copy Theory. This is all very satisfying theoretically for the usual minimalist reasons. However, it also raises a question in my mind, which I would like to ask here.

Why doesn't the NTC rule out feature valuation?  One of the current grammatical operations within MP grammars is AGREE. What it does is relate two expressions (heads actually) in a Probe/Goal configuration and the goal "values" the features of the probe.  Now, the way I've understood this is that the Probe is akin to a property, something like P(x) (maybe with a lambdaish binder, but who really cares) and the goal serves to turn that 'x' into some value, so turns P(x) into P(phi) for example (if you want, via something like lambda conversion, but again who really cares). At any rate, and here's my question: doesn't this violate the NTC? After all, the input to AGREE is P(x) and the output is, e.g. P(phi). Doesn't this violate a strict version of the NTC?

Note, interestingly, feature checking per se is consistent with the NTC, as no feature changing/valuing need go on to "check" if sets of features are "compatible."  However, if I understand the valuation idea, then it is thought to go beyond mere bookkeeping. It is intended to change the feature composition of a probe based on the feature composition of the goal.  Indeed, it is precisely for this reason that phases are required to strip off the valued yet uninterpretable features before Transfer. But if AGREE changes feature matrices then it seems incompatible with the NTC.
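The contrast between checking and valuation can be put in the same toy terms. The sketch below assumes a head is an immutable record with a single phi slot, where `None` stands in for the unvalued P(x). The names `Head`, `check`, and `value` are hypothetical, and this is only one way of modeling the distinction.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Head:
    label: str
    phi: Optional[str]  # None models an unvalued feature, i.e. P(x)


def check(probe, goal):
    """Feature checking: compare the two feature sets; rewrite nothing."""
    return probe.phi == goal.phi


def value(probe, goal):
    """Feature valuation: return the probe with the goal's phi filled in."""
    return replace(probe, phi=goal.phi)


dp = Head("D", "3sg")

# Checking: T enters the derivation already bearing phi, and is only compared.
t_prevalued = Head("T", "3sg")
print(check(t_prevalued, dp))

# Valuation: T enters as P(x) and leaves as P(phi).
t_unvalued = Head("T", None)
t_valued = value(t_unvalued, dp)

# The input probe is not the object that appears in the output.
print(t_valued == t_unvalued)
```

Checking returns a verdict and leaves both heads as they were. Valuation hands back a head that is not equal to its input, which is the sense in which the output fails to preserve P(x) under a strict NTC.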

The same line of reasoning suggests that feature lowering is also incompatible with the NTC. To wit: if features really transfer from C to T or from v to V (either by being copied from the former to the latter or actually copied from the higher to the lower and deleted from the higher) then again the NTC in its strongest form seems to be violated. 

So, my question: are theories that adopt feature valuation and feature lowering inconsistent with the NTC or not? Note, we can massage the NTC so that it does not apply to such feature "checking" operations. But then we could massage the NTC so that it does not prohibit traces. We can, after all, do anything we wish. For example, current theory stipulates that pair merge, unlike set merge, is not subject to Extension, viz. the NTC (though I think that Chomsky is not happy with this given some oblique remarks he made in lecture 3). However, if the NTC is strictly speaking incompatible with these two operations, then it is worth knowing, as it would seem to be theoretically very consequential. For example, a good chunk of phase theory, as currently understood, depends on these operations, and were we to discover that they are incompatible with the NTC then this might (IMO, likely does) have consequences for Darwin's Problem.

So, all you thoroughly modern minimalists out there: what say you?


  1. I might be a little misguided here, but I wanted to take a stab at answering this.

    I think adhering to the NTC qua a formal property of the grammar might be misguided. I take it to be something of an empirical matter that structure is preserved in the output of operations, not because there is some condition restricting such operations, but because it would be computationally costly to do otherwise. Therefore, we can recast the question: is feature valuation the ideal circumstance under which uninterpretable elements are ruled out of the derivation at the interfaces? My intuition is "probably so" given that it exploits minimal search in much the same way Chomsky describes displacement as being a natural consequence of I-Merge in the recent lectures. We don't want valuation to come from outside the structure as that seems to fly in the face of computational economy.

    Further, I think that even if we did take a stricter NTC, then why would we assume that valuation fundamentally changes the structure (in the way that trace elements and NP deletion/movement do)? If we suppose that Agree takes P(x), finds G(phi) and replaces P(x) with P(phi) from some inventory of functional elements, then we have a problem, but if we assume that P is the same structural element and Agree is just rewriting properties of that element, then I don't think we run into an issue of the same magnitude. So in effect, Agree is not tampering with the structure, just individual properties of elements in that structure.

    1. Thx, this is interesting. I have few intuitions concerning computational cost so I cannot say if it is the best one. Recall, the arguments that Chomsky gives relate to conceptual complexity, not other kinds. In fact, wrt memory load there is an argument to be made that a diacritic here or there can alleviate memory costs. I would not mention this but for the fact that Chomsky, in the 4th lecture (which I just listened to), talks about phases as ways of unburdening memory. Given this, I am not sure that this version of feature valuation is computationally best. Imagine, for example, that copying of phi features simply placed a diacritic (say a - sign) on the valued object indicating that these are copied features. Then there would be virtually no memory load at the phase level wrt features, as one could tell on inspection whether they were "interpretable" or not. This, of course, would screw many of the explanations that Chomsky has been attributing to phase based computation as these rely on having to retain in memory what's a copy of features and what are real features.

      Last point: I don't get the idea of "just rewriting." This seems like saying that it is a form of rewriting that by definition is not a violation of NTC.

      Thx. Interesting ideas. I will think on it.

    2. This comment has been removed by the author.

    3. Good points.

      By "just rewriting" I was in fact drawing a distinction in terms of what is and isn't a violation of NTC in the sense that changing out whole heads/features doesn't seem to be of the same order as modifying the value of a feature, although that approaches providing an arbitrary escape-hatch to NTC that I don't think I would want to strongly argue.

      That said, Erich makes some good points, and I think he's probably on the right track that the SMT strongly suggests a non-valuation instantiation of features. I have been mulling over the problem over the past day, and I think I may have a tentative solution that addresses the NTC concern and also dovetails nicely with some of the architectural concerns raised by Chomsky ("we shouldn't be doing trees") that Adger's mirror theoretic model more or less sidesteps rather elegantly (at least in my view).

      I'll have to think on it a bit more, too.

  2. My first post here!

    Yes, I have always believed that feature-valuation violates the NTC. And I fully agree that there’s no way of tweaking it (perhaps by defining equivalence classes of elements) that does not open the door to arbitrary escape routes out of the NTC.
    Something like feature-checking is needed, but again, we don’t want to introduce any diacritics marking features, potentially violating FI in the same way that indexation does.

    Feature-checking entails, of course, that the features are already there on both elements undergoing Agree. As a die-hard “All is Merge guy” (AIMG), that’s fine with me: I assume all feature-structures are the result of Merge.

    It seems to me that the simplest way that features can Agree is that they be literally the same feature, i.e. we abandon entirely the notion of distinct tokens of features in Narrow Syntax. All occurrences of a feature F in a phase are of a single token, which by FI must receive a semantic interpretation somewhere (pace Adger and Ramchand, or Pesetsky and Torrego’s work), though phonologically it might appear nowhere or in more than one place. If two SOs fail to Agree for some FF and have distinct values, this will mean the features they bear are actually distinct tokens, each of which needs to be interpreted separately: presumably, this will result in a crash, since the feature will be uninterpretable on one of the elements (phi on T, for example). This approach has the consequence that FFs must receive an interpretation at SEM at least on some element; agreement involving purely formal features of no semantic import anywhere (such as, arguably, certain cases of gender or class features) would have to be parasitic on interpretable FFs.

    Another possibility is that what we call “uninterpretable” FFs are the reflex of a purely PF association between a feature and a syntactic object, one which is not composed until PHON, i.e. in determining the morphological form of the SO. Thus, T would not bear phi-features in NS, but must be associated with them by PF via morphological structure-building. At Transfer, the value of phi for T must be “read off” by PHON. But since T itself doesn’t bear phi, the phase is searched, and the phi values of some local DP are used.

    In any case, I think the SMT strongly suggests an approach that does not involve feature-valuation, and we should expect the architecture of the computational system to dwell within those limits.

    1. Welcome. Nice comment. Hope it's not your last. I too am partial to AIM and so count me as a fellow G.

      I have been thinking a lot about just how tightly tied Chomsky's current technology is to a very strong version of NTC/Inclusiveness (which, I think, may actually be very closely related). I will try to write something up soon on this as he discusses related issues in the last lecture.

    2. Let me raise a couple of issues that each of the two approaches just sketched here faces (not to "shoot them down" or anything, just as food for thought) —

      Regarding the single-token/interpretability view, I don't see how this can be reconciled with minimality. If I understand the suggestion, what ensures featural identity between probe and goal is to not have a "stranded" (interpretation-wise) instance of a feature value, that only exists on a probe, where it can't be interpreted. But now, consider a scenario where a probe P is in the same phase as two viable goals, G1 and G2, such that: G1 is structurally closer to P than G2 is, and G1 and G2 have different values for some phi feature F (call those values F1 and F2, respectively). I don't see how this system would rule out "agreement" (I'm using this in the descriptive sense) between P and G2, since - in this scenario - neither F1 nor F2 is "stranded" on the probe. I have in mind, of course, omnivorous agreement patterns, but really any domain where syntactic intervention is at work would do just as well.

      Regarding the PF-association view, I don't see how it can be reconciled with the typological evidence suggesting that in some languages, movement to subject position is parasitic on phi-agreement. Since movement to subject position has LF consequences (binding, scope), it cannot be dependent on something that happens only on the PF branch.

      I have my own agenda, here, obviously - that is, I think we already know that the architecture of the computational system does not dwell within the limits set by the SMT (less polite version: the SMT is demonstrably false). [And when I say "SMT" here, I'm referring to the Chomskyan version, not Norbert's systems-of-use/transparency version.] But I'd of course be interested to see if someone comes up with a way around that.

  3. @Norbert: thanks! I look forward to your NTC/Inclusiveness thread. I too think these matters are tightly linked.

    @Jonathan: Yes, I think the notion of feature-valuation appears more "innocent" than it really is. It divides the object in question into two parts: the feature and its value, thus providing a pleasant assurance that the feature may remain the same while its value is changed or established. But in a strictly merge-based system, the only association between the two is their combination by Merge. So you really have to say the syntax doesn't "see" any internal properties of the value at all (sort of like a root in the Borer/Chomsky/Marantz approach, which is computationally truly inert in NS) to claim that altering/fixing it doesn't violate the NTC. But if NS can't see the feature-values, I don't know how it can operate on them. . .

    But then maybe it doesn't, sort of as I proposed in my second approach. There really is only a blank there in NS, but PHON sees it, and picks out a value for it from elsewhere in the phase.

    @Omer: I think you're right concerning minimality under the first approach. Allow me a tangent: this might be stretching things a bit, but I'm not sure I quite believe in minimality of the relativized type, at least not in classic terms involving intervention. There seems to be a serious overlap with phases, to the extent that phases remove elements from computation, and tend to remove lower elements first. If it can be shown that Probes do not reach lower elements not due to an intervening element, but due to an intervening phase that removes the lower element from computation, then we don't need relativized minimality. Admittedly, this is a bit of a project to carry out, but it is suspicious that we have both phasal and minimality domains operating at the same time.

    The second approach might be more appetizing. The approach is just a way of describing how uF features get read onto SOs at Transfer. Languages in which agreement is tied to raising, then, simply show that the domain searched for phi-features will include elements that have moved to Spec position and not those that have not. For example, if DP is sitting in Spec T, and T needs to see some phi-features within its phase when it is transferred, well, there they are, on DP, whereas if DP hasn't raised to Spec TP, they are not seen at Transfer. Thus in principle there's no problem handling such movement/phi-agreement relations: agreement will interact with movement as a function of whether phi-features can be found.

    1. @Erich:

      Regarding minimality and phases: I think the whole draw of Rizzi’s Relativized Minimality was that it unified things like dative intervention with wh-superiority effects and with the Head Movement Constraint. And certainly, phases won't get you the HMC (unless every head is a phase, à la Müller). So trying to replace minimality with phases undoes what I view as a fundamental result in syntactic theory.

      Regarding movement and agreement: your second approach presupposes that in languages where subjecthood and agreement are tied to one another, it is movement that facilitates agreement. I have argued at length (e.g. here; see pp. 99-133) that it's the other way around. I will mention two relevant arguments here.

      The first comes from "case-discrimination" (my name for Bobaljik 2008's observation). Agreement is case discriminating in every language where this is testable: it can only target noun phrases bearing some proper subset A of the entire set of case markings B that the language makes available. NP movement, on the other hand, is not always case-discriminating (for example, it is not in Icelandic and, arguably, not in Basque, either). Now the argument goes like this: in every language where NP movement is case-discriminating, the set of case-markings it can target is the same proper subset A singled out by agreement, rather than some other subset of B. This, I argued, is evidence that NP movement (in languages where it is case-discriminating) is parasitic on agreement, thus inheriting the case-discrimination of the latter.

      The other argument is more complicated, and concerns the typology of when dative intervention does and doesn't give rise to ungrammaticality (as opposed to some morphological "default"). In a nutshell, dative intervention seems to cause ungrammaticality exactly and only in instances where NP movement is parasitic on agreement and agreement has failed (as opposed to the failure of agreement alone, which is systematically tolerated, as is the point of that entire work).

    2. @Omer: Thanks for your response, and apologies for the delay; I've drifted up into a cabin in the woods where the deer are thick but the internet is sparse. . . Just a few comments.

      I completely agree with you concerning the importance of Rizzi's result. To be honest it was reading his MIT monograph that convinced me Generative Grammar was making real advances in explanation, and I'm not suggesting we throw it out! I'm pointing out, however, that there is overlap between the way phases dole out locality and the way RM does. It makes me wonder whether RM, which I think is quite real, should be understood in terms of intervention in the manner to which we have become accustomed. Note incidentally that the HMC, which made sense for RM in its original formulation in terms of position-types (A, A-bar, X-zero), makes less sense if expressed in terms of features, since zero-level categories do not necessarily bear any particular common feature, so RM does not really get at the HMC either (to the extent that it holds at all).

      I don't think that what I have suggested concerning agreement is necessarily inconsistent with what you proposed in your thesis; I'm going to make a project of reading it carefully to find out; I have the feeling it could be decisive.

    3. To echo a point that Erich is making: there is a lot of redundancy between Phases and RM, though neither seems able to cover ALL the effects of the other. But, to take one example at random: RM only comes into play when there are two exemplars of a given category, say two DPs. Then these DPs may interfere with one another in the right structural configurations. Verbal phases also induce locality (PIC) effects when there are two DPs, i.e. v*s are phases while v-unaccusatives are not. This seems like a curious overlap. Rizzi discusses this in his contribution to the Basque Minimalist volume in interesting ways. So I think that Erich is onto something and that how exactly the two notions overlap is worth serious investigation. Indeed, whether we want both notions is an interesting question.

    4. @Erich: These are good points. I see what you're saying when it comes to the problematic status of the HMC from a feature-centric perspective. If you're planning on reading my stuff (which I was in no way suggesting that you have to), shoot me a message offline. I'll share with you something that supersedes the thesis I linked to.

      @Erich, @Norbert: Assuming for now that there is indeed considerable overlap between phases and minimality, the million dollar question becomes whether all intervention configurations -- let's restrict ourselves to the phrasal scenario, to temporarily side-skirt the aforementioned HMC issue -- are "heterophasal" (i.e., the two putative goals G1 and G2 of a given probe P are always in separate phases).

      This is a methodologically slippery slope, of course; it's not always clear what kind of evidence would count for/against declaring something to be a phase, especially given the availability of "interface-vacuous" movement (where both PF and LF privilege the lower copy) + multiple specifiers (meaning a phase has as many escape-hatches as we need it to have). So let's focus here on the "consensus"(?) view of phases -- roughly, one at the CP level, one at verb-phrase level (roughly where the external argument is introduced), and one in some/all DPs.

      One prominent case to look at is then intervention in "applicative unaccusatives" (Rezac 2008). These are two-place unaccusatives where the ABS argument can be agreed with by a higher probe if the hierarchical arrangement of the arguments is ABS > DAT, but not if it is DAT > ABS. On the more conservative allotment of phase boundaries, it seems to me like both of these (internal) arguments will always be tautophasal, and so there is a kernel of not-reducible-to-phases minimality at work here. But again: I'm not sure, at this point, what would count as evidence against someone declaring, "No, ApplP is a phase too" in these structures.

    5. I agree that the issue to look at is whether all "interventions" also cross phases, and I agree that the answer is probably not. But this said, the rather large overlap is worrying/interesting. The real question may be conceptual: RM only applies when there are multiple targets and so suggests that locality arises when competing alternatives are at issue. Phases suggest a more absolute conception of locality. The problem is that one can always add features to make RM operative "absolutely" and can add phases to ensure PIC effects. Nonetheless, the intuitions motivating each approach are different.

  4. I agree that Agree/feature-valuation violates NTC; to me it suggests that it should be relegated to PF (where we know NTC is violated anyway, plus there's lots of variation in agreement). It seems to me that Chomsky hasn't made this move yet primarily because he stubbornly holds on to traditional Case Theory, which for him is tied to Agree.

    But feature-checking as typically conceived also violates the NTC, at least if "checked" features are marked in some way as "checked off." Even if we use strike-through to indicate deletion of a feature that's still nothing other than an extraneous symbol we're introducing.

    For these and other reasons (e.g. Bobaljik's) it seems to me that the ideal picture would have all Agree-type featural interaction relegated to PF, but as always there are empirical arguments (by Omer and others) against this move.

  5. Norbert, can you elaborate on why Pair-Merge doesn't obey NTC? I'm probably missing something very simple here...

    1. The operation itself does; the problem is that it is generally allowed to be counter-cyclic. So head raising and even adjunction are often taken to be exempt from Extension. If we require that adjunction be cyclic, like all other operations, then pair-merge is fine, though how exactly to distinguish pair merge from set merge without labels eludes me.