Monday, July 27, 2015

Relativized Minimality and similarity based interference

The goal of the Minimalist Program is to reduce the language specificity of FL (i.e. UG) to the absolute minimum, so as to reduce the Logical Problem of Language Evolution (LPLE) to manageable size. The basic premise is that LPLE is more challenging the more linguisticky FL’s operations and principles are. This almost immediately implies that solving the LPLE requires radically de-linguistifying the structure of FL.

There are several reasonable concerns regarding this version of LPLE, but I will put these aside here and assume, with Chomsky, that the question, though vague, is well posed and worth investigating (see here and here for some discussion). One obvious strategy for advancing this project is to try and reduce/unify well grounded linguistic principles of UG with those operative in other domains of cognition. Of the extant UG principles ripe for such reconceptualization, the most tempting, IMO and that of many others as we shall see, is Relativized Minimality (RM). What is RM to be unified with? Human/biological memory. In particular, it is tempting to see RM effects as what you get when you shove linguistic objects through the human memory system.[1] That’s the half-baked idea. In what follows I want to discuss whether it can be more fully baked.

First off, why think that this is doable at all? The main reason is the ubiquity of Similarity Based Interference (SBI) effects in the memory literature. Here is a very good, accessible review of the relevant state of play by Van Dyke and Johns (VD&J).[2] It seems that human (in fact all biological) memory is content addressable (CA) (i.e. you call up a memory in terms of its contents (rather than, say, an index)). Further, the more the contents of specific memories overlap, the more difficult it is to successfully get at them. More particularly, if one accesses a memory via certain content cues, then the more these cues overlap with those of other stored items, the more they “overload” the retrieval protocol, making it harder to retrieve the right one. On the (trivial) assumption that memory will be required to deal with the ubiquitous non-(linearly)-adjacent dependencies found in language, we should expect to find restrictions on linguistic dependencies that reflect this memory architecture.[3] VD&J review various experiments showing the effects that distracters can have on retrieving the right target when these distracters “resemble” the interacting expressions.

Given that memory will create SBIs, it is natural to think that some kinds of dependencies will be favored over others by this kind of memory architecture. Which? Well, ones in which the cues/features relating the dependents are dissimilar from those of the intervening elements. Graphically, (1) represents the relevant configuration. In (1), if non-adjacent X and Y need to be related (say there is a movement or antecedence dependency between the two), then this will be easiest if the cues/features relating them are not also shared by intervening Z-ish elements.

(1)  …X…Z…Y…
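The cue-overload idea behind (1) can be put in toy computational terms. The sketch below is my own minimal encoding, not VD&J’s actual model: items are bags of features, a cue’s support for an item is the fraction of cue features the item matches, and the target’s share of retrieval activation shrinks as interveners match more of the cues.

```python
# Toy sketch of similarity-based interference in cue-based retrieval.
# Assumptions (mine, not VD&J's): items are feature sets; retrieval
# activation is split among all items in proportion to cue match.

def retrieval_odds(cues, target, distractors):
    """Share of cue-driven activation that goes to the target."""
    def match(item):
        return sum(1 for f in cues if f in item) / len(cues)
    total = match(target) + sum(match(d) for d in distractors)
    return match(target) / total if total else 0.0

# ...X...Z...Y...: retrieving X at Y, with cues that X fully matches.
cues = {"wh", "animate"}
X = {"wh", "animate"}

# A dissimilar intervener barely competes...
print(retrieval_odds(cues, X, [{"D", "inanimate"}]))   # 1.0
# ...while a cue-matching intervener splits the activation.
print(retrieval_odds(cues, X, [{"wh", "animate"}]))    # 0.5
```

On this crude picture, “RM effects” are just the low-odds cells: the more features Z shares with X, the worse the retrieval of X at Y.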

This should look very familiar to any syntactician who has ever heard the name ‘Luigi Rizzi’ (and if you haven’t, think of either changing fields or getting into a better grad program). Since Rizzi’s 1990 work, (1), in the guise of RM, has standardly been used to explain why WHs cannot move across other WHs (e.g. superiority and Wh-island effects) or heads over heads (the head movement constraint).

IMO, RM is one of the prettiest (because simplest) empirically useful ideas to have ever been proposed in syntax.[4] Moreover, its family resemblance to the kinds of configurations that induce SBI effects is hard to miss.[5] And the lure of relating the two is very tempting, so tempting that resistance is really perverse. So the question becomes, can we understand RM effects as species of SBI effects and thus reflections of facts about memory architecture?

Psychologists have been pursuing a similar (though, as we shall see, not identical) hunch for quite a while. There is now good evidence, which VD&J review, that encumbering (working) memory with word lists while sentences are being processed differentially affects processing times of non-local dependencies, and that the difficulty is proportional to how similar the words in memory are to the words in the sentence that need to be related. Thus, for example, if you are asked to keep in memory the triad TABLE-SINK-TRUCK while processing It was the boat that the guy who lived by the sea sailed after two days, then you do better at establishing the dependency between boat and sail than if you are asked to parse the same sentence with fix in place of sail. Why? Because all three of the memory-list words are fixable, while none are sailable. This fact makes boat harder to retrieve in the fix sentence than in the sail sentence (p. 198-9).

Syntactic form can also induce interference effects. Thus, the subject advantage inside relative clauses (RC) (i.e. it is easier to parse subject relatives than object relatives, see here) is affected by the kinds of DPs present in the RC. In particular, take (2) and (3). The Subject Advantage is the fact that (2) is easier to parse.

(2)  The banker that praised the barber climbed the mountain 
(3)  The banker that the barber praised climbed the mountain

VD&J note that SBI effects are detectable in such cases, as the Subject Advantage can be reduced or eliminated if in place of D-NP nominals like the barber one puts in pronouns, quantified DPs like everyone, and/or proper names. The reasoning is that the definite descriptions interfere with one another, while the names, pronouns and quantifiers interfere with D-NPs far less.[6]

VD&J offer many more examples making effectively the same point: the contents of memory can affect sentential parsing of non-local dependencies, and they do so by making retrieval harder.

So, features matter, both syntactic features and “semantic” ones (and, I would bet other kinds as well).

What are the relevant dimensions of similarity? Well, it appears that many things can disrupt retrieval, including grammatical and semantic properties. Thus, both the “semantic” suitability of a word on the memorized word list and the syntactic features that differentiate one kind of nominal from another can serve to interfere with establishing the relation of interest.[7]

Friedmann, Belletti and Rizzi (FBR) (here) report similar results, but this time for acquisition. It appears, for example, that subject relatives are more easily mastered than object relatives, as are subject vs object questions. FBR discuss data from Hebrew. Similar results are reported for Greek by Varlokosta, Nerantzini and Papadopoulou (VNP) here. Moreover, just as in the processing literature, it appears that DPs interfere with one another the more similar they are. Thus, replacing D-NP nominals with relative pronouns and bare whs (i.e. what vs what book) eases/eliminates the problem. As FBR and VNP note, the subject advantage is selective and, in their work, is correlated with the syntactic shapes of the intervening nominals.[8] The more similar they are, the more problems they cause.

So, at first blush, the idea that RM effects and SBI effects are really the same thing looks very promising. Both treat the shared features of the interveners and dependents as the relevant source of “trouble.” However (and you knew that ‘however’ was coming, right?), things are likely more complicated. What’s clear is that features do make a difference, including syntactic ones. However, what’s also clear is that it is not only syntactic shape/features that matter. So do many other kinds.

Moreover, it is not clear which similarities cause problems and which don’t. For example, the standard RM model (and the one outlined in FBR) distinguishes cases where the features are identical vs where they overlap vs where they are entirely disjoint. The problem with relative clauses like (3), for example, is that the head of the relative clause and the intervening subject have the exact same syntactic D-NP shape, and the reason subbing in a pronoun or name or quantifier might be expected to mitigate the difficulty is that the subject intervener then shares only some of its features with the head, thereby reducing the minimality effect. So in the case of RCs the story works as we expect.

The problem is that there are other data to suggest that this version of RM delivers the wrong answers in other kinds of cases. For example, a recent paper by Atkinson, Apple, Rawlins and Omaki (here) (AARO) shows that “the distribution of D-linking amelioration effect [sic] is not consistent with Featural Relativized Minimality’s predictions…” (1). AARO argues that carefully controlled rating methods of the experimental syntax variety show that moving a which-NP over a which-NP in Spec C is better than moving it over a who (i.e. (4) is reliably better than (5)). This is not what is expected given the featural identity in the first case and mere overlap in the second.[9]

(4)  Which athlete did she wonder which coach would recruit
(5)  Which athlete did she wonder who would recruit

IMO, superiority shows much the same thing. So (6) is quite a bit better than (7) to my ear.

(6)  I wonder which book which boy read
(7)  I wonder which book who read

Once again, the simple syntactic version of RM suggests that the opposite should be the case. If this is so, then there is more than just structural similarity involved in RM effects.

This, however, might be a good thing if one’s aim is to treat RM effects as instances of more general SBI effects. We expect many different factors to interact to provide a gradation of effects, with syntactic shape being one factor among many. The AARO data suggest that this might indeed be correct, as do the parallels between the VD&J parsing data and the FBR/VNP acquisition data. So, if AARO is on the right track, it argues in favor of construing RM effects as kinds of SBI effects, and this is what we would expect were RM not a grammatically primitive feature of FL/UG, but the reflection of general memory architecture when applied to linguistic objects. In other words, this need not be a problem, for this is what one would expect if RM were just a species of SBI (and hence traceable to human memory being content addressable).[10]

What is more problematic, perhaps, is settling what “intervention” means. In the memory literature, intervention is entirely a matter of temporal order (i.e. if Z is active when X and Y are being considered it counts as an “intervener”; roughly speaking, if Z is temporally between X and Y then Z intervenes). For RM, the general idea is that intervention is stated in terms of c-command (i.e. Z intervenes between X and Y if X c-commands Z and Z c-commands Y), and this has no simple temporal implications. Thus, the memory literature mainly explores a “linear” notion of intervention while RM relies on a structural one, and so it is not clear that RM effects should be assimilated to memory effects.
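The two notions of intervention can be made concrete in a few lines. The encoding below is a toy of my own devising (nested tuples standing in for phrase markers, textbook definitions of c-command and linear betweenness), just to show that the two notions come apart:

```python
# Toy contrast between linear and hierarchical (RM-style) intervention.
# Trees are nested tuples of leaf labels (a crude stand-in for a PM).

def subtrees(t):
    yield t
    if isinstance(t, tuple):
        for child in t:
            yield from subtrees(child)

def leaves(t):
    return [t] if not isinstance(t, tuple) else [l for c in t for l in leaves(c)]

def c_commands(tree, a, b):
    """Leaf a c-commands b iff a sister of a dominates b."""
    for node in subtrees(tree):
        if isinstance(node, tuple):
            for i, child in enumerate(node):
                if leaves(child) == [a]:
                    sisters = [node[j] for j in range(len(node)) if j != i]
                    if any(b in leaves(s) for s in sisters):
                        return True
    return False

def intervenes_linear(order, x, z, y):
    return order.index(x) < order.index(z) < order.index(y)

def intervenes_rm(tree, x, z, y):
    return c_commands(tree, x, z) and c_commands(tree, z, y)

# X high, Z buried inside a left-branch constituent, Y low:
tree = ("X", (("Z", "N"), ("V", "Y")))
print(intervenes_linear(leaves(tree), "X", "Z", "Y"))  # True: Z is string-medial
print(intervenes_rm(tree, "X", "Z", "Y"))              # False: Z fails to c-command Y
```

A Z buried in a left branch counts as a linear intervener but not an RM intervener, which is exactly the wedge between the two literatures.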

However, I am not currently sure how big a problem this might be. Here’s what I mean.

First, much of the literature reviewed in VD&J involves English data, where linear and hierarchical intervention coincide. We know that when these two are pulled apart, in many cases it is hierarchy that matters (see the discussion of the Yun et al. paper here; it shows that the Subject Advantage persists in languages where linear order and hierarchical order go in different directions).

Similarly, Brian Dillon (here) and Dave Kush (in his unavailable thesis; ask him) show that hierarchical, not linear, intervention is what’s critical in computing binding relations.

Of course, there are also cases where hierarchy does not rule, and linear intervention seems to matter (e.g. agreement attraction errors and certain NPI licensing illusions are less sensitive to structural information, at least in some online tasks, than hierarchical restrictions suggest they should be).[11]

So both notions of proximity seem to play a role in language processing. I don’t know whether both have an effect in acquisition (but see note 11). So, does this mean that we cannot unify RM effects with SBI effects because they apply in different kinds of configurations? Maybe not. Here’s why.

Memory effects arise from two sources: the structure of memory (e.g. whether it is content addressable, rates of decay, number of buffers, RAM, etc.) and the data structures that memory works on. It is thus possible that when memory manipulates syntactic structures it will measure intervention hierarchically, because linguistic objects are hierarchically structured. In other words, if phrase markers are bereft of linear order information (as, say, a set-theoretic understanding of phrase markers entails), then when memory deals with them it will not be able to use linear notions to manipulate them, because such objects have no linear structure. In these cases, when task demands use memory to calculate the properties of phrase markers, RM effects are what we expect to see: SBIs with c-command determining intervention. Of course, sentences when used have more than hierarchical structure, and it is reasonable to suppose that this too will affect how linguistic items are used.[12] However, this does not prevent thinking of RM as a memory effect defined over PM-like data structures. And there is every reason to hope that this is in fact correct, for if it is, then we can treat RM as a special case of a more general cognitive fact about us: that we have content addressable memories that are subject to SBI effects. In other words, we can reduce the linguistic specificity of FL.

Maybe an analogy will help here. In a wonderful book, What the hands reveal about the brain (here), Poizner, Bellugi and Klima (PBK) describe the two ways that ASL speakers with brain damage describe spatial layouts. As you all know, space has both a grammatical and a physical sense for an ASL speaker. What PBK note is that when this space is used grammatically it functions differently than when it is used physically. When using it physically, stroke patients with spatial deficits show all the characteristic problems that regular right hemisphere stroke patients show (e.g. they use only half the space). However, when signing in the space (i.e. using the space linguistically), this particular spatial deficit goes away and patients no longer neglect half the signing space. In other words, how the space is being used, linguistically or physically, determines what deficits are observed.

Say something analogous is true of memory: when it is used in computing grammatical properties, intervention is hierarchical; when used otherwise, linear/temporal structure may arise. Thus what counts as an intervener will depend on what properties memory is being used to determine. If something like this makes sense (or can be made to make sense), then unifying RM effects and SBI effects, with both related to how human memory works, looks possible.[13]

Enough. This post is both too long and too rambling. Let me end with a provocative statement. Many at Athens felt that Darwin’s Problem (the LPLE) is too vague to be useful. Some conceded that it might have poetic charm, that it was a kind of inspirational ditty. Few (none?) thought that it could support a research program. As I’ve said many times before, I think that this is wrong, or at least too hasty. The obvious program that LPLE (aka Darwin’s Problem) supports is a reductive/unificational one. To solve LPLE requires showing that most of the principles operative in FL are not linguistically specific. This means showing how they could be reflections of something more cognitively (or computationally or physically) general. RM seems ripe for such a reanalysis in more general terms. However, successfully showing that RM is a special case of SBI, grounded in how human memory operates, will take a lot of work, and it might fail. However, the papers I’ve cited above outline how redeeming this hunch might proceed. Can it work? Who knows. Should it work? Yup, the LPLE/DP hangs on it.[14]

[1] The first person I heard making this connection explicitly is Ivan Ortega Santos. He did this in his 896 paper at UMD in about 2007 (a version published here). It appears that others were barking up a similar tree somewhat earlier, as reviewed in the Friedmann, Belletti and Rizzi paper discussed in what follows. The interested reader should go there for references.
[2] Julie Van Dyke and Brian McElree (VD&M) wrote another paper that I found helpful (here). It tries to zero in on a more exact specification of the core properties of content addressable memory systems. The feature that they identify as key is the following:
The defining property of a content addressable retrieval mechanism is that information (cues) in the retrieval context enables direct access to relevant memory representations, without the need to search through extraneous representations (164).
In effect, there is no cost to “search.” Curiously, I believe that VD&M get this slightly wrong. CA specifies that information is called in virtue of substantive properties of its contents. This could be combined with a serial search. However, it is typically combined with a RAM architecture in which all retrieval is in constant time. So general CA theories combine a theory of addressability with RAM architecture, the latter obviating costs to search. That said, I will assume that both features are critical to human memory and that the description they offer above of CA systems correctly describes biological memory.
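The distinction drawn here can be shown in a few lines. The sketch below is deliberately crude and entirely my own (the word/feature pairs are invented): both retrievals are content-addressed, in that contents do the calling, but only the second has the direct-access (RAM-like) property, because the content itself serves as the key.

```python
# Sketch: content addressability (call by content) is logically separable
# from direct, constant-time (RAM-style) access. Invented toy data.

memories = [("boat", {"sailable"}), ("table", {"fixable"}), ("truck", {"fixable"})]

# Content-addressable + serial search: scan every trace, testing contents.
def retrieve_serial(cue):
    for word, feats in memories:          # cost grows with memory size
        if cue in feats:
            return word

# Content-addressable + direct access: the content IS the address.
index = {}
for word, feats in memories:
    for f in feats:
        index.setdefault(f, word)         # first match wins, for the sketch

def retrieve_direct(cue):
    return index.get(cue)                 # hash lookup, no scan

print(retrieve_serial("sailable"), retrieve_direct("sailable"))  # boat boat
```

Same answers, different cost profiles, which is the point of the note: obviating search is a fact about the access architecture, not about content addressing per se.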
[3] In other words, unifying RM with CA systems would effectively treat RM effects as akin to what one finds in self-embedding structures. These are well known to be very unacceptable despite their grammaticality (e.g. *that that that they left upset me frightened him concerned her).
[4] My other favorite is the A-over-A principle. Happily, these two can be unified as two sides of the same RM coin if constituents are labeled. See here for one version of this unification.
[5] In fact, you might enjoy comparing the pix in VD&J (p. 197) with (1) above to see how close the conceptions are.
[6] VD&J do not report whether replacing both D-NPs with pronouns or quantifiers reintroduces the Subject Advantage that replacing barber eliminates. The prediction, on a straightforward reading of RM, would seem to be that it should. Thus, someone who you/he saw should contrast with someone who saw you/him in roughly the same way that (2) and (3) do. VNP (see below) report some data suggesting that quantifiers might pose separate problems. VD&M, reporting on the original Gordon & Co. studies, note that “their data indicate that similarity-based interference occurs when the second noun phrase is from the same referential class as the first noun phrase, but it is reduced or eliminated when the noun phrases are from different classes” (158). This suggests that the SBI effects are symmetric.
[7] The scare quotes are here because the relevant examples exploit a “what makes sense” metric, not a type measure. All the listed expressions in the boat-sail and boat-fix examples are of the same semantic type, though only boats are “sailable.” Thus it is really the semantic content that matters here, not some more abstract features. VD&M review other data pointing to the conclusion that there are myriad dimensions of “similarity” that can induce SBIs.
[8] VD&J cite the work of Gordon and collaborators. They do not link the abatement of SBIs to syntactic shape but to semantic function, the nominals’ “differing referential status.” This could be tested. If Gordon is right, then languages with names that come with overt determiners (e.g. Der Hans in German) should, on the assumption that these function semantically the same as names in English do, obviate SBIs when a D-NP is head of the relative. If RM is responsible, then these should function like any other D-NP nominal and show a Subject Advantage.
[9] This is the main point. Of course there are many details and controls to worry about, which is why AARO is 50 pages rather than a paragraph.
[10] This might create trouble for the strong RM effects, like moving adjuncts out of WH islands: *How did you wonder whether Bill sang. This is a really bad RM effect, and the question arises: why so bad? Dunno. But then again, we currently do not have a great theory of these ECP effects anyhow. One could of course concoct a series of features that led to the right result, but, unfortunately, one could also find features that would predict the opposite. So, right now, these hard effects are not well understood, so far as I can tell, by anyone.
[11] Note the very tentative nature of this remark. Are there any results from language processing/production/acquisition that implicate purely linear relations? I don’t know (a little help from those of you in the know would be nice). The NPI stuff and the agreement attraction errors are not entirely insensitive to hierarchical structure. Maybe this: VNP cite work by Friedmann and Costa which shows that children have problems with crossing dependencies in coordinate structures (e.g. The grandma1 drew the girl and t1 smiled). The “crossing” seems to be linear, not hierarchical. At any rate, it is not clear that the psycho data cannot be reinterpreted in large part in hierarchical terms.
[12] However, from what I can tell, pure linear effects (e.g. due to decay) are pretty hard to find and where they are found seem to be of secondary importance. See VD&J and VD&M for discussion. VD&J sum things up as follows:
…interference is the primary factor contributing to the difficulty of integrating associated constituents…with a more specialized role arising for decay…
[13] One other concern might be the following: aren’t grammatical restrictions categorical while performance effects are not? Perhaps, though even RM effects lead to graded acceptability, with some violations being much worse than others. Moreover, it is possible that RM effects are SBI effects that have been “grammaticized.” Maybe. On this view, RM is a grammatical design feature of G objects, such that these objects mitigate the problems that CA memory necessarily imposes. I have been tempted by this view in the past. But now I am unsure. My problem lies with the notion “grammaticization.” I have no idea what this process is, what objects it operates over, or how it takes gradient effects and makes them categorical. At any rate, this is another avenue to explore.
[14] There are some syntactic implications for the unification of RM and SBI effects in terms of the structure of CA memory. For example, if RM effects are due to CA architecture then issues of minimal search (a staple of current theories) are likely moot. Why? Well, because as observed (see note 2), CA “enables direct access to relevant memory representations, without the need to search through extraneous representations.”
In other words, CA eschews serial search, and so the relevance of minimal search is moot if RM effects are just CA effects. All targets are available “at once,” with none more accessible than any others. In other words, no target is further away/less accessible than any other. Thus, if RM is a special case of CA, then it is not search that drives it. This does not mean that distance does not matter, just that it does not matter for search. Search turns out to be the wrong notion.

Here’s another possible implication: if decay is a secondary effect, then distance per se should not matter much. What will matter is the amount of intervening “similar” material. This insight is actually coded into most theories of locality: RM is irrelevant if there is only one DP looking for its base position. Interestingly, the same is true of phase based theories, for structures without two DPs are “weak” phases and these do not impose locality restrictions. Thus, the problems arise when there are structures supporting two relevant potential antecedents of a gap, just as a theory of minimality based on SBI/CA would lead one to suppose.


  1. I'm confused about note 14: we have plenty of evidence by now – don't we? – that in a configuration where X is looking for a goal, and {Y, Z} are both possible goals, the (hierarchical) order of Y & Z matters. (My favorite is what I like to call the Albizu-Rezac effect; references below.) When Z c-commands Y, X can target Z; when Y c-commands Z, it cannot.

    This is anathema to the idea that it only matters "whether some similar distractors are hanging around." Of course one could rig the relevant locality domains so that in the Y-c-commands-Z scenario, there just happened to be a locality boundary somewhere between X and Z, and in the Z-c-commands-Y scenario, there wasn't one. But that suggests to me a fairly massive cross-linguistic and cross-construction conspiracy in the data (otherwise, RM (as formulated by syntacticians) might not have gotten off the ground, in the first place).

    Perhaps the solution is what you hint at in note 2: memory (at least linguistic memory) still involves a hierarchical search, but one where the search criteria are informational and/or structural cues. How well such a picture would mesh with the broader cogsci findings is something I can't evaluate (don't know enough).



    Albizu, Pablo. 1997. Generalized Person-Case Constraint: a case for a syntax-driven inflectional morphology. Anuario del Seminario de Filología Vasca Julio de Urquijo (ASJU, International Journal of Basque Linguistics and Philology) XL:1–33.

    Rezac, Milan. 2008. The syntax of eccentric agreement: the Person Case Constraint and absolutive displacement in Basque. Natural Language & Linguistic Theory 26:61–106, DOI: 10.1007/s11049-008-9032-6.

    1. I was probably even more obscure than usual. The facts that motivate minimal search are correct. However, if RM is SBI plus hierarchy, then it is not search that is minimal. Why? Because in CA systems there is no cost to search. So how should we understand the "search" effects? In terms of interference at retrieval. I hope that these cover the same set of cases, but that search is not relevant. A more empirically objective way of stating the facts is that short dependencies pre-empt longer ones. Search is one explanation for this fact. Interference is another. Does that make sense?

    2. I'm not sure. If both Y and Z are in the domain of X, why should the relative hierarchical ordering (of Y and Z) make a lick of difference for "interference"? That seems to me to strongly implicate "search" (modulo the kind of locality-domains conspiracy alluded to in my previous comment).

    3. Why? Well, why should search care about hierarchy? Because we define "closeness" in terms of something like c-command. It is completely possible to define search so that hierarchy is irrelevant, but we don't. Ok, let's say that A interferes with B iff A is close to B, and define close in terms of c-command. Could we do this? Sure, why not? Should we do this? Sure, it seems that the facts say so. Why is this the right notion of "closeness"? Because the data structures being examined have no linear properties (see the 4th and 3rd to last paragraphs and the analogy to linguistic vs physical space). So, it is not part of an interference system that everything interferes, only that certain things do. Which? The nearby ones. Of course, this will look a lot like the search interpretation because both limit things via c-command. But that's not a problem; it's a reflection of the apparent facts to be explained. Does this help?

    4. Let me make another attempt at explaining what I'm not getting. If we want to reduce RM to SBI, and the SBI findings tell us that human memory is CA (content-addressed) and RAM (random access), then it should be equally easy – or difficult – for Infl to access a DP that is in the same phase as a dative argument whether or not the dative is closer to Infl than the DP is. Even if the addressing scheme is hierarchical rather than linear, the RAM part of the equation tells us that the addressing scheme simply shouldn't matter.

      So what I'm saying is that the facts, as far as I can tell, militate against the reduction of RM to a version of SBI that is composed of CA+RAM.

      Does that make sense?

    5. This assumes that accessibility is entirely a feature of being in the same phase, right? So A and B are equally accessible if they are in the same phase. What you are noting is that this is wrong. To get around this you add that within a phase there is search subject to CC or something like it, so that both phase-matedness and minimality are relevant to determining accessibility. I agree with this. What I don't see is why this implies that there is SEARCH within the phase. Why not say that being CCed by a content-similar target adds an extra amount of feature interference? In other words, close is measured on two dimensions: in the same phase, and c-commanded by. Thus, there is no search, just more and more trouble separating out similar things.

      Does this make sense?

    6. It does. I think that, to the extent that we're still arguing, we might be arguing mostly about terminology at this point. C-command – that's what you meant by CC, yes? – is a (partial) ordering over the constituents in a given phase. One can say that the way this ordering comes to affect retrieval is via "search", or some other way (decay?). But in any event, if there were true Content-Addressable retrieval (i.e., "find me a DP!", à la "find me a red dot!"), I don't see how c-command could be made to matter.

      I think maybe you're envisioning a richer data structure than I am, such that when you say "find me a DP!", and try to retrieve one via Content-Addressing, the retrieval itself is more difficult if the DP is c-commanded by a dative intervener than if the DP c-commands the dative, independently of any kind of incremental search. I confess I don't see what that kind of data structure would look like.

    7. I am thinking that the relevant data structures are phrase markers of the MP sort. The way I am thinking of this, the problem is not finding the relevant DP but not confounding it with other ones. CC things (yes, c-command) interfere with one another more strongly than non-CC ones do. Why? They are "closer" to each other. So confusion is a function of feature identity and proximity. That's the idea. I agree, however, that it comes very close to the search idea. I want this, because what is true about search is that it embodies the idea that shorter dependencies trump longer ones, and that needs to be retained in an empirically adequate story. I just don't want the mechanism that enforces this short-over-long preference to be search, because I don't think that searching is very hard or relevant.

    8. I have (I think) the same question as Omer, but let me ask it yet another way. What interference gets us is the idea that, very roughly speaking, difficulties arise when there are two goals competing to interact with the same probe. This is a good thing in the sense that the well-known syntactic facts match with this, i.e. certain interesting stuff happens when there are two wh-phrases competing to interact with a questioning complementizer. But this is not the extent of the well-known syntactic facts: the well-known facts tell us that the particular interesting stuff that happens is that the hierarchically closer candidate is able to interact with the probe and the other one is not. I don't see how the interference idea provides any kind of explanation for this additional detail. One could have imagined, for example, that when two goals are competing, interference results in a situation where neither is able to check the relevant features; but this isn't what happens.

      So at the very least, interference seems to leave some of the syntactic facts unexplained. And then Omer's further point, I think, is that the still-unexplained facts seem to be a particularly poor fit for the kind of explanation that interference can figure in, because it's hard to see how such an explanation could end up treating the two candidate goals differently.

      I think the fact that they are distinguished by their hierarchical position may be a bit of a red herring: if things are fully content-addressable, then (to the extent that I understand this term) it should be difficult to distinguish them by any kind of "positioning", whether linear or hierarchical or whatever. I can see that distinguishing them based on linear (i.e. temporal) order might make sense because of decay, but this seems like an orthogonal issue: it's not that content-addressable systems are able to operate in ways that are sensitive to linear order any better than hierarchical order, it's simply that decay is sensitive to time.

      (Of course all of this comes with the caveat, I think, that the term "content-addressable" is only meaningful once you specify what "content" is, and in principle you can probably encode anything you want, even structural position, as content -- but I'm just trying to run with roughly the usual kind of assumptions.)

    9. Ok, I must be using the terms incorrectly given that two people who usually understand me don't. But let me try again. The idea is that the content of the relata matters. What we find is that when there are two things that need to match, the one that is closer (in the sense of hierarchy) interferes with the one that is further away, but not vice versa. That appears to be the fact we are all interested in. Now, is this surprising? Well, one might say that in parsing the structure, the closer element makes it harder for the more distant one to get a clear shot at the "goal" that both can satisfy, whereas the more distant element does not interfere with the closer one. Thus we might expect to see what VD&J call proactive interference (p. 197). So say that RM is proactive interference defined over distances measured in PMish terms. A cc element can proactively interfere with a lower element but not with a non-cc element. This makes RM a case of proactive overload, which they note is one of the two options in a CA memory. Now, we can ask: can we explain why proactive overload matters in Gs but retroactive does not? Sure. Because the higher element meets the requirement of the probe, and so there is no longer a pair of relations to consider. In other words, if the higher expression meets the requirements of the probe then nothing more needs doing. What if it doesn't? Then the higher one interferes. Would we ever expect the lower one to interfere with the higher? I don't see why it should.

      So, the proposal: RM is proactive interference. Is there retroactive interference in Gs? I am hoping that the answer is no for some principled reason.
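      For concreteness, here is one toy way to render the "closest c-commander wins, lower matchers are proactively interfered with" idea. The tuple encoding of the phrase marker and the feature-matching convention are my own hypothetical simplifications, not anyone's actual parser or grammar:

```python
# Minimal sketch of RM-as-proactive-interference over a phrase marker:
# the probe relates to the hierarchically closest matching goal, a
# c-commanding matcher renders any lower matcher inaccessible
# (proactive), and a lower matcher never blocks a higher one (no
# retroactive interference). Encoding is illustrative only.

def goals(tree):
    """Yield leaves of a binary-branching tree, higher (c-commanding)
    elements before the elements they c-command."""
    if isinstance(tree, tuple):
        for subtree in tree:
            yield from goals(subtree)
    else:
        yield tree

def agree(probe_feature, tree):
    """Return (goal, blocked): the closest goal bearing the probe's
    feature, plus the lower matchers it proactively interferes with."""
    matches = [g for g in goals(tree) if probe_feature in g]
    if not matches:
        return None, []
    return matches[0], matches[1:]

# [C[wh] [ DP-wh1 [ V DP-wh2 ]]]: wh1 c-commands wh2, so wh1 satisfies
# the probe and wh2 is the proactively blocked element.
tree = ("DP-wh1", ("V", "DP-wh2"))
goal, blocked = agree("wh", tree)
print(goal, blocked)
```

      Note that if the higher element lacks the relevant feature, the lower one is retrieved with nothing blocked, which is the asymmetry the proposal needs: there is simply no configuration in which a lower matcher interferes upward.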

    10. Aoun & Li (2003) gives data from Lebanese Arabic, where a non-c-commanding phrase intervenes:

      1. *[CP wh1 . . . [IP . . . [island . . . wh2 . . .] . . . x1 . . . ]]

      In (1) wh2 does not c-command the A-position of wh1 but it intervenes and blocks movement of wh1. What I understand from (1) is that c-command is not a necessary condition for intervention, at least for their data.

    11. There are English examples that are worth comparing as well:
      1. I wonder who who impressed
      2. I wonder who pictures of who impressed
      3. I wonder who John asked who left
      4. I wonder who asked John who left
      The first two ask about the impact that CC has on locality. To my ear, the second sounds better than the first. If this is so, then we see an effect of CC. The second two are cases of potential proactive interference. Again, these seem to me degraded when compared with cases without a WH in the matrix:
      5. I wonder if John asked Mary who left
      6. It was Mary that John asked who left
      If there is a contrast here, then we get some proactive interference.
      I'm not all that sure what this points to, except to say that there is a lot of intermixing of linear and hierarchical effects involved in even simple RM effects and it would be nice to disentangle these if possible.

    12. @Norbert: There is a potential confound in your first example.

      As Richards (2010) shows, Distinctness (understood as a linearization constraint) tends to militate against multiple wh-phrases of the same syntactic category in "close proximity" (examples are Richards'):

      a. I know everyone danced with someone, but I don't know [who] [with whom].
      b. I know every man danced with a woman, but I don't know [which man] [with which woman].

      a. * I know everyone insulted someone, but I don't know [who] [whom].
      b. * I know every man insulted a woman, but I don't know [which man] [which woman].

      These examples are not, of course, structurally parallel to your first example (in particular, these would probably involve multiple wh-phrases in a single CP area, whereas your example involves a wh-phrase in CP and another one in [Spec,TP]); but depending on the finer details of how Distinctness is implemented, that difference may not matter.

    13. Yes, interesting. If it is indeed a PF constraint, which is what R suggests as I recall, then the fact that there is no adjacency in my cases seems to suggest that they ought not to be collapsed. But as you say, the details matter here. Curiously things can be very near so long as they are not adjacent, as your cases show. That said, yes, there may be a confound, but maybe not.

    14. I don't know where you fix the boundary of a PF constraint, but your remark about linear adjacency suggests a rather literal view. If I understood you correctly, then I don't think Richards suggests that Distinctness is a PF constraint, and there certainly exist examples with phonetically very distinct elements. It is really the identity of syntactic category that matters in that case. Here is an extreme Japanese example of his.

      *Sensei no koto-ga suki-na gakusei-ga koko-ni oozei iru kedo dono gakusei-ga dono sensei no koto-ga ka oboeteinai.

      Teacher of side-NOM liked student-NOM here-P crowd is but which student-NOM which teacher of side-NOM remember not.

      (Intended) There are here many students who show appreciation towards their teachers but I don't remember which towards which.

      Even though the preposition semantically disambiguates the sentence, the fact that "which student" and "which teacher of side" are both nominative makes the sentence unacceptable. By contrast,

      Sensei-o hihanshita gakusei-ga koko-ni oozei iru kedo dare-ga dare-o oboeteinai.
      Teacher-ACC criticized student-NOM here-P crowd is but who-NOM who-ACC remember not.
      Many students here criticized many teachers but I don't remember who who.

      is fine.

    15. I am not sure Richards suggests that Distinctness is a PF constraint:

      If a linearization statement is generated, the derivation crashes.
      (Richards, 2010: 5)

      For Richards (2010), Distinctness bars two nodes of the same functional type from occurring in an asymmetric c-command relation in the same Spell-Out domain (SOD), with sameness being open to crosslinguistic variation.
      He also gives numerous examples where Distinctness is violated even though the elements are not linearly adjacent but are SOD-mates.

      Even if Distinctness were a PF constraint, I do not understand how PF is sensitive to category labels. Do we have any PF constraint that tells us to pronounce /p/ only when it is part of a DP?
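      As a way of fixing ideas about the linearization-crash logic, here is a toy sketch. The flat label-list encoding of a Spell-Out domain is my own simplification for illustration, not Richards' implementation:

```python
# Toy sketch of Distinctness as a linearization constraint: within one
# Spell-Out domain, generating an ordering statement <X, X> over two
# nodes of the same functional label crashes the derivation. The
# label-list encoding is an illustrative simplification.

def linearize(spellout_domain):
    """Return ordering statements <higher, lower> for a Spell-Out
    domain given as labels in c-command order; crash on a duplicate."""
    statements = []
    for i, higher in enumerate(spellout_domain):
        for lower in spellout_domain[i + 1:]:
            if higher == lower:
                raise ValueError(f"Distinctness violation: <{higher}, {lower}>")
            statements.append((higher, lower))
    return statements

# [who] [with whom]: D and P are distinct labels, so linearization
# succeeds; *[who] [whom] would be ["D", "D"] and crash.
print(linearize(["D", "P"]))
```

      The interesting question in the thread then becomes whether the domains relevant here (a wh in CP and another in [Spec,TP]) ever count as one Spell-Out domain for this check.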

  2. Similarity-based interference effects are fairly subtle, aren't they? In the following pair, (1) may be harder to parse than (2), but is still perfectly grammatical:

    (1) The banker that the barber praised climbed the mountain
    (2) The banker that John praised climbed the mountain

    So I'm not sure I understand how relativized minimality effects can be reduced to similarity-based interference. Except perhaps through a historical process that makes difficult-to-process constructions ungrammatical.

    1. Right. I guess what I am thinking is that RM effects are special cases of SBI effects, the latter encompassing a wider domain. So RM effects are SBI effects, but not all SBI effects are RM effects. Which SBI effects are RM? The ones induced by set-like data structures (i.e. phrase markers). In particular, you find RM effects when "parsing" for structural info. In these cases the relevant data structures are hierarchically ordered sets. These will bring with them their own special interference profiles. This is what the discussion of Poizner et al was intended to motivate or illustrate.

      In sum, RM effects may be special instances of SBIs and hence reducible to them without their being co-extensive. Or that's the suggestion.