Thursday, April 3, 2014

Chomsky's two hunches

I have been thinking again about the relationship between Plato’s Problem and Darwin’s. The crux of the issue, as I’ve noted before (see e.g. here), is the tension between the two. Having a rich linguo-centric FL makes explaining the acquisition of certain features of particular Gs easy (why? Because they don’t have to be learned; they are given/innate). Examples include the stubborn propensity for movement rules to obey island conditions, for reflexives to resist non-local binding, etc. However, having an FL with rich language-specific architecture makes it more difficult to explain how FL came to be biologically fixed in humans. The problem gets harder still if one buys the claim that human linguistic facility arose in the species in (roughly) only the last 50-100,000 years. If this is true, then the architecture of FL must be more or less continuous with what we find in other domains of cognition, with the addition of a possible tweak or two (language is more or less an app in Jan Koster’s sense). In other words, FL can’t be that linguo-centric! This is the essential tension. The principal project of contemporary linguistics (in particular that of the Minimalist Program (MP)), I believe, should be to resolve this tension. In other words, to show how you can eat your Platonic cake and have Darwin’s too.

How to do this? Well, here’s an unkosher way of resolving the tension. It is not an admissible move in this game to deny Plato’s Problem is a real problem. That does not “resolve” the tension. It denies that there is/was one to begin with. Denying Plato’s Problem in our current setting includes ignoring all the POS arguments that have been deployed to argue in favor of linguo-centric structure for FL. Examples abound and I have been talking about these again in recent posts (here, here). Indeed, most of the structure GB postulates, if an accurate description of FL, is innate or stems from innate mental architecture.  GB’s cousins (H/GPSG, LFG, RG) have their corresponding versions of the GB modules and hence their corresponding linguo-centric innate structures. The interesting MP question is how to combine the fact that FL has the properties GB describes with a plausible story of how these GBish features of FL could have arisen. To repeat: denying that Plato’s Problem is real or denying that FL arose in the species at some time in the relatively recent past does not solve the MP problem, it denies that there is any problem to solve.[1]

There is one (and so far as I can tell, only one) way of squaring this apparent circle: to derive the properties of GB from simpler assumptions.  In other words, to treat GB in roughly the way the theory of Subjacency treats islands: to show that the independent principles and modules are all special cases of a simpler more plausible unified theory. 

This project involves two separate steps.

First, we need to show how to unify the disparate modules. A good chunk of my research over the last 15 years has aimed for this (with varying degrees of success). I have argued (though I have persuaded few) that we should try and reduce all non-local dependencies to “movement” relations. Combine this with Chomsky’s proposal that movement and phrase building devolve to the same operation ((E/I)-Merge) and one gets the result that all grammatical dependencies are products of a single operation, viz. Merge.[2] Or to put this now in Chomsky’s terms, once Merge becomes cognitively available (Merge being the evolutionary miracle, aka, random mutation), the rest of GB does as well for GB is nothing other than a catalogue of the various kinds of Merge dependencies available in a computationally well-behaved system.  

Second, we need to show that once Merge arises, the limitations on the Merge dependencies that GB catalogues (island effects, binding effects, control effects, etc.) follow from general (maybe ‘generic’ is a better term) principles of cognitive computation. If we can assimilate locality principles like the PIC and Minimality and Binding Domain to (plausibly) more cognitively generic principles like Extension (conservativity) or Inclusiveness then it is possible to understand that GB dependencies are what one gets if (i) all operations “live on” Merge and (ii) these operations are subject to non-linguocentric principles of cognitive computation. 

Note that if this can be accomplished, then the tension noted at the outset is resolved. Chomsky’s hunch, the basic minimalist conjecture, is that this is doable; that it is possible to reduce grammatical dependencies to (at most) one or two specifically linguistic operations which, when combined with other cognitive operations plus generic constraints on cognitive computation (ones not particularly linguo-centric), yield the effects of GB.

There is a second, independent conjecture that Chomsky advances that bears on the program: the Strong Minimalist Thesis (SMT). IMO, it has not been very clear how we are to understand the SMT. The slogan is that FL is the “optimal solution to minimal design specifications.” However, I have never found the intent of this slogan to be particularly clear. Lately, I have proposed (e.g. here) that we understand the SMT in the context of one of the four classic questions in Generative Grammar: How are Gs put to use? In particular, the SMT tells us that grammars are well designed for use by the interfaces.

I want to stress that SMT is an extra hunch about the structure of FL. Moreover, I believe that this reconstruction of the problematic (thanks Hegel) might not (most likely, does not) coincide with how Chomsky understands MP. The paragraphs above argue that reconciling Darwin and Plato requires showing that most of the principles operative in FL are cognitively generic (viz. that they are operative in other non-linguistic cognitive domains). This licenses the assumption that they pre-exist the emergence of FL and so we need not explain why FL recruits them. All that is required is that they “be there” for the taking. The conjecture that FL is optimal computationally (i.e. that it is well-designed wrt use by the interfaces) goes beyond the evolutionary assumption required to solve the Plato/Darwin tension. The SMT postulates that these evolutionarily available principles are also well designed. This second conjecture, if true, is very interesting precisely because the first Darwinian one can be true without the second optimal design assumption being true. Moreover, if the SMT is true, this might require explanation. In particular, why should the evolutionarily available mechanisms that FL embodies be well designed for use (especially given that FL is of recent vintage)?[3]

That said, what’s “well designed” mean? Well, here’s a proposal: that the competence constraints that linguists find suffice for efficient parsing and easy learnability. There is actually a lost literature on this conjecture that precedes MP. For example, the work by Marcus and Berwick & Weinberg on parsing, and Culicover & Wexler and Berwick on learnability investigate how the constraints on linguistic representations, when transparently embedded in use systems, can allow for efficient parsing and easy learnability.[4]  It is natural to say that grammatical principles that allow for efficient parsing and easy learning are themselves computationally optimal in a biologically/psychologically relevant sense. The SMT can be (and IMO, should be) understood as conjecturing that FL produces grammars that are computationally optimal in this sense.

Two thoughts to end:

First, this way of conceiving of MP treats it as a very conservative extension of the general generative program. One of the misconceptions “out there” (CSers and Psychologists are particularly prone to this meme) is that Generativists change their minds and theories every 2 months and that this theoretical Brownian motion is an indication that linguists know squat about FL or UG. This is false. The outline of MP as necessarily incorporating GB results (with the aim of making them “theorems” in a more general theoretical framework) emphasizes that MP does not abandon GB results but tries to explain them. This is what typically takes place in advancing sciences, and it is no different in linguistics. Indeed, a good Whig history of Generative Grammar would demonstrate that this conservatism has been characteristic of most of the results from LSLT to MP. This is not the place to show this, but I am planning to demonstrate it anon.

Second, MP rests on two different but related Chomskyan hunches (‘conjectures’ would sound more serious, so I suggest you use this term when talking to the sciency types on the prestigious parts of campus): first, that it is possible to resolve the tension between Plato and Darwin without doing damage to the former; and second, that the results will be embeddable in use systems that are computationally efficient. We currently have schematic outlines for how this might be done (though there are many holes to be filled). Chomsky’s hunch is that this project can be completed.

IMO, we have made some progress towards showing that this is not a vain hope; in fact, things are better than one might have initially thought (especially if one is a pessimist like me).[5] However, realizing this ambitious program requires a conservative attitude towards past results. In particular, MP does not imply that GB is passé. Going beyond explanatory adequacy does not imply forgetting about explanatory adequacy. Only cheap minimalism forgets what we have found, and as my mother wisely and repeatedly warned me, “cheap is expensive in the long run.” So, a bit of advice: think babies and bathwaters next time you are tempted to dump earlier GB results for purportedly minimalist ends.

[1] It is important to note that this is logically possible. Maybe the MP project rests on a misdescription of the conceptual lay of the land. As you might imagine, I doubt that this is so. However, it is a logical possibility. This is why POS phenomena are so critical to the MP enterprise. One cannot go beyond explanatory adequacy without some candidate theories that (purport to) have it.
[2] For the record, I am not yet convinced of Chomsky’s way of unifying things via Merge. However, for current purposes, the disagreement is not worth pursuing.
[3] Let me reiterate that I am not interpreting Chomsky here. I am pretty sure that he would not endorse this reconstruction of the Minimalist Problematic. Minimalists be warned!
[4] In his book on learning, Berwick notes that it is a truism in AI that “having the right restrictions on a given representation can make learning simple.” Ditto for parsing. Note that this does not imply that features of use cause features of representations, i.e. this does not imply that demands for efficient parsability cause grammars to have subjacency like locality constraints. Rather, for example, grammars that have subjacency like constraints will allow for simple transparent embeddings into parsers that will compute efficiently and support learning algorithms that have properties that support “easy” learning (See Berwick’s book for lots of details).
[5] Actually, if pressed, I would say that we have made remarkable progress in cashing in Chomsky’s two bets. We have managed to outline plausible theories of FL that unify large chunks of the GB modules, and we have begun to find concrete evidence that parsing, production, and language acquisition transparently use the kinds of representations that competence theories have discovered. The project is hardly complete. But, given the ambitious scope of Chomsky’s hunches, IMO we have every reason to be sanguine that something like MP is realizable. This, however, is also fodder for another post at another time.


  1. I have a question regarding this proposal, maybe someone can help me out?

    If I understand the above correctly, the innovative MP proposal seems to be that "movement and phrase building devolve to the same operation ((E/I)-Merge) and one gets the result that all grammatical dependencies are products of a single operation, viz. Merge." So in a way Darwin's problem reduces to accounting for Merge, "the evolutionary miracle, aka, random mutation" [let's not get hung up on the miracle invocation here].

    Now for something to jump into existence in one mutation, not to mess up what is already present in the organism, and to be passed on precisely as it was to every next generation including us, it has to be fairly simple. Like the Merge Chomsky describes in SoL: "You got an operation that enables you to take mental objects [or concepts of some sort], already constructed, and make bigger mental objects out of them. That's Merge. As soon as you have that, you have an infinite variety of hierarchically structured expressions [and thoughts] available to you." [Chomsky, 2012:11].

    Now here is my problem: If Merge is THAT simple then it does not explain much. To use an analogy: imagine a 3-year-old kid [K1] with a supply of Lego blocks. Let's abstract away from pesky physical limitations like time and space and imagine K1 has an unlimited amount of Legos and infinite time and patience. Now if you instruct K1 to take a Lego object already constructed and make a bigger object, and to keep doing so, K1 could build an infinite tower. That would be an analog to counting.

    But if we want language we'd need something more. Imagine another kid [K2] who uses her Legos [and Merge: take an object already constructed and make a new bigger object] to build castles, and people living in the castles, and animals, and trees, and ships, and ... Now that would be a lot more like language. Of course, K2 needs not only the unlimited supply of Legos and Merge but also a bunch of instructions that tell her which piece to use first, second, third... and exactly where to add the new pieces. So, to get from counting to language, not just this one miraculous operation Merge but all the instructions would need to be innate. [When one looks at the LingBuzz Collins&Stabler paper, one quickly notices that there is a lot more to Merge than just putting objects together, and as Norbert reminds us, so far minimalists have only accounted for "chunks of language", so presumably there are still some things missing from the C&S account - or any other currently available.]

    Now exactly where in the MP story is the account of how "the instructions" evolved? Please don't answer with any of this "third factor" [laws of physics] or "optimal solution to minimal design specifications" handwaving: K1 is doing just fine building his tower under those constraints, so they cannot explain the castles, and people, and... K2 builds.

    1. I believe a technical approximation of your question is: why aren't there any natural languages that are regular string sets? The set of all natural numbers is a regular string set (a very simple one, in fact). Phonology is probably also regular on this level (albeit more complicated than counting), but every language seems to have some syntactic construction that gives rise to more complex string patterns that are not regular.

      So why the difference between phonology and syntax? Merge (in the MG sense) makes it possible to generate context-free languages, and in order to restrict it to regular string languages it has to be limited in various ways. So if you just apply Merge randomly, odds are you'll end up with a string language that is not regular.
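      Thomas's contrast can be made concrete with a toy sketch (the two languages below are standard textbook illustrations chosen for this comment, not claims about any particular natural language): counting corresponds to the unary language a*, which a finite-state device handles, while even the simplest nested pattern aⁿbⁿ is context-free but not regular, since recognizing it requires unbounded memory.

```python
import re

def is_counting(s):
    """Membership in a* -- decidable by a finite-state machine (here, a regex)."""
    return re.fullmatch(r"a*", s) is not None

def is_nested(s):
    """Membership in a^n b^n -- matched nesting needs a counter/stack,
    so no finite-state machine (and no plain regex) can decide it."""
    half, rem = divmod(len(s), 2)
    return rem == 0 and s == "a" * half + "b" * half

print(is_counting("aaaa"))   # True: pure iteration, like K1's tower
print(is_nested("aaabbb"))   # True: matched nesting
print(is_nested("aabbb"))    # False: the counts don't match
```

The point of the sketch is the asymmetry: the first checker never needs to remember how many a's it has seen, while the second must.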

      Now the main difference between MG Merge and Merge as invoked in the language evolution discussions is that the former is also controlled by features, whereas the latter is not. So what I said above only applies to the latter if the feature calculus can be lifted from somewhere else --- the third factors. I don't think anybody has made a serious attempt at this, but Alex Clark's learnability work can be interpreted as a method for inferring linguistic features via domain-general means.

      Even if this doesn't work I don't think it's much of a problem. The standard conception of Merge is "combining items", the MG conception is "combining items that are alike"; if somebody is seriously committed to the claim that Merge is the product of a random mutation, then either version should be plausible, they are both very simple. Or the other way round, given what we know about evolution so far, nothing rules out one but not the other.
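      To make the featureless vs. feature-driven contrast a bit more tangible, here is a minimal sketch of feature-checking Merge in the MG spirit, where "combining items that are alike" is cashed out as a selector feature matching a category feature. The feature names and lexical items are invented for illustration; this is not an implementation of any published MG fragment.

```python
def merge(selector, selectee):
    """Apply Merge only if the selector's first feature =x matches the
    selectee's category feature x; otherwise the derivation crashes."""
    sel, *rest = selector["features"]
    cat = selectee["features"][0]
    if sel == "=" + cat:
        # the selector's =x feature is checked and deleted
        return {"yield": selector["yield"] + " " + selectee["yield"],
                "features": rest}
    raise ValueError("feature mismatch: %s vs %s" % (sel, cat))

the = {"yield": "the", "features": ["=n", "d"]}   # a determiner selecting a noun
dog = {"yield": "dog", "features": ["n"]}
print(merge(the, dog))  # {'yield': 'the dog', 'features': ['d']}
```

Unrestricted Merge would be the same function with the feature check deleted: any two objects combine, and the feature calculus is exactly the extra machinery that would have to be lifted from the third factors.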

    2. This comment has been removed by the author.

    3. Thanks for this, Thomas. From an evolutionary point of view I am not concerned about why there are no natural languages that are just regular string sets. If a system has evolved that can handle context free grammars that system can also handle regular grammars - there would be no need to have two systems. I am concerned about what you call the MG conception of Merge: "combining items that are alike". Once you have such a requirement you need to specify alike in what regard.

      To get back to the Lego example: if you tell a third kid [K3] he is supposed to combine all Legos that are alike, he will ask: alike in what regard? Should I combine all the blue ones, all the red ones, ... or do you mean all the cubes, all the rectangular prisms, all... [even then K3 will be building just blue or red or cube-based or ... towers]
      Regarding language: if MG Merge instructs “combine items that are alike”, one could end up with a system that combines only verbs, or only nouns, or maybe only monosyllabic words, only bisyllabic words, etc. etc. - that is certainly not the kind of language we're interested in.

      I have dabbled in programming only once, decades ago, but still remember my frustration about how "stupid" these systems are: unless you tell them everything [= write a program that does it] they won't do the simplest things. Yes, once you have written a program that works, the machine suddenly 'behaves like magic' and can handle any problem it was programmed to. But it can only do that BECAUSE you have put all the necessary information in the program. So if Chomsky's Martian scientist shows up and asks for an account of the machine, you can't just talk about computational efficiency and other niceties [because that applies to ALL machines] but need to reveal the program this machine implements.

      If you believe in a Chomskyan language faculty you need to account for the 'program' that specifies which items are combined into the gazillion meaningful sentences we produce all the time. According to Chomsky, 3 factors are involved in FL design: 1. genetic factors, 2. experience, and 3. third factors [laws of nature]. According to anyone who defends poverty of the stimulus, 2. cannot supply the information we're interested in [or kids could just learn from the input]. This leaves 1. and 3. Here is a Merge quote:

      "How are these two kinds of Merge employed? And here you look at the semantic interface; that's the natural one. There are huge differences. External Merge is used, basically, to give you argument structure. Internal Merge is basically used to give you discourse-related information, like focus, topic, new information, all that kind of stuff that relates to the discourse situation." [Chomsky, 2012: 17].

      I agree with what you say below about the equivocation of Merge; and based on this quote Chomsky might too [and the quote is barely scratching the surface of what Merge REALLY needs to do].

      The problem for the evolutionist is: how could third factors explain that IM "is basically used to give you discourse-related information, like focus, topic, new information, all that kind of stuff"? There seems to be nothing in the laws of physics that differentiates between language parts that are relevant to topic, new or old information, etc. In order to get a meaningful account one has to include not just Merge but the entire GB machinery Norbert reminds us is still there in the background, doing all the hard work while Merge reaps all the glory...

    4. So there is an interesting point here: Thomas says the MG version of MERGE is "combining items that are alike", and Christina asks the right question which is "alike in what regard? ", which is, I have to say, the solidly Chomskyan question.

      So I think this is the right dialectic and the answer is "alike in the way that they combine."
      Of course this answer requires three ingredients to be something more than mere handwaving. First, a precise specification of the modes of combination (e.g. linear regular functions that are non-permuting, non-deleting, monadic branching, well-nested, etc.); second, a precise specification of the appropriate notion of similarity (equality, maximal clique, etc.); and third, a model of how you might learn this (probabilistic learning, some semantic bootstrapping, ...).
      But if you could answer these, (and that is a big if), then you would have an answer.

    5. Thanks for this, Alex. We agree [roughly] that something like your 1-3 ought to be in place. But keep in mind that Norbert was talking about language evolution [= evolution of UG]. So if Merge arrived as late on the scene as Chomsky tells us, then all the other nice "ingredients" you mention above would have to be already in place, patiently waiting for Merge to arrive. Now that sounds so massively unlikely that the 'miracle Merge mutation' pales in comparison. So the task for anyone explaining the evolution of UG would be to account for the giant miracles that resulted in 1-3, not just for the comparably tiny Merge miracle 'at the end'...

    6. Good point. Wouldn't recursive planning (with minimal guidance by stimuli in the immediate environment) perhaps be a way for a Merge-like operation to arise and be useful without a preexisting linguistic system? E.g. to make a fire we need to a) make an ax, b) find some wood, c) cut the wood into appropriate-sized pieces, and d) do whatever it is we do to start the actual fire, each of which requires sub-plans. Crows can do some impressive goal-chaining, but in the attested examples, such as the recent famous video, all the bits and pieces required are present in the bird's environment, so it's plausibly rather stimulus-bound. There's a classic 1960 book 'Plans and the Structure of Behavior', now available for free on the internet, it seems, that might be relevant.

      The planning-based theory of Merge has the advantage that for plans, sometimes the relative order of steps is fixed other times it isn't, and optimal planning requires that we vary the order when possible to suit the circumstances, so that the mysterious tension between fixed and variable order in language is built in from the beginning.

    7. @Christina: I think there are many problems with the idea that MERGE is the key ingredient; I don't even think that thinking in terms of "key ingredients" is the right way to think about it, but if you do, then Norbert's suggestion that LABEL is the important one is perhaps the better direction to take. Which, translating into my idiom, would be the ability to categorise sets of strings of constituents on the basis of their combinatorial properties.

      @Avery -- yes, I think there must be some continuity with hierarchically structured representations in other cognitive domains, and planning is I think an obvious candidate.

    8. There was some literature about this in the old days, I think I recall reading something in the late 60s by either a Liberman or a Lieberman about recursive syntax & the kind of planning abilities needed to make the tool to do something, then do it, or make the tool to make the tool, etc. I don't know whether this stuff has just been forgotten, or subsumed in some useful way into current literature.

      One interesting similarity I note between planning and MGs is that planning could be taken to involve something like 'features', in the form of perceptual specifications of the preconditions for doing something. So lighting a fire for cooking something might have as its 'demand features' (corresponding to positive polarity in MGs) perceptual specifications of 'enough wood of the right kind, shape etc, to hand' and 'a fire-starting kit'. The negative polarity version of the features being produced by the actual perception of these items in the location where the fire is to be lit. When the positives are matched with their negatives, the activity begins.

      Syntax would differ from planning in that the processing is much faster, not involving cues lying around in the external environment, and with a highly routinized and stereotyped feature system.

    9. & (I wish it was 01/04/XXXX so I'd have an escape path handy), there is even something like movement here, since if you're coming back with some wood, and pass by Fred's hut, who has some flint that you need for the firestarter kit, you can 'move' part of the kit-gathering activity into the wood-gathering activity.

  2. This isn't all that pertinent to your point, but since Christina brought up a related issue I feel the need to finally come out of the closet and announce to the world that Chomsky’s proposal that movement and phrase building devolve to the same operation ((E/I)-Merge) has always struck me as conceptually flawed.

    The equivocation of Merge and Move ignores the fact that a system with only E-Merge is a lot simpler than one with E/I-Merge. Take a look at Collins&Stabler: every major complication in the system is brought about by I-Merge. Syntactic objects, occurrences, Transfer: all those things would be a lot simpler if Merge were only E-Merge and there was nothing like I-Merge.

    In a sense, reducing Move to I-Merge is akin to reducing subtraction to addition. It is true that you do not need it since 5-3 = 5+(-3), but this is possible only if you operate with integers instead of natural numbers. In many respects, the set of natural numbers with addition and subtraction is a simpler structure than the integers with addition (the former is well-founded, for instance).
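    The arithmetic analogy can be spelled out in a few lines (purely illustrative): on the naturals, subtraction is only partial, and making it total by the trick of adding negatives is exactly what forces the move to the richer structure of the integers.

```python
def monus(a, b):
    """'Truncated' subtraction on the naturals: partial, since e.g. 3 - 5
    has no natural-number value (here we raise instead of answering)."""
    if b > a:
        raise ValueError("undefined on the naturals")
    return a - b

def int_sub(a, b):
    """On the integers, subtraction reduces to addition of an inverse."""
    return a + (-b)

print(monus(5, 3))    # 2: fine, stays within the naturals
print(int_sub(3, 5))  # -2: possible only once negatives exist
```

So "you don't need subtraction" is true only after the domain has been enriched, which mirrors the worry that "you don't need Move" is true only after Merge's domain has been enriched with occurrences and the rest.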

    1. Ahh the need to confess. Feels good, doesn't it? So that you don't feel alone in this, let me add my doubts to yours. I personally like the idea of treating both move and merge as arising from the same source. However, I believe (and have proposed) that there are different ways of getting this. My own hunch is that Move is the composite of Copy and Merge, that merge is just E-merge and that Copy is a generic cognitive operation (i.e. one that we have quite independent of FL). The "miracle" is not Merge, but Label, this being the operation that closes Merge in the domain of the lexical primitives. At any rate, for the topic of the post, it really doesn't matter much whether this way, Chomsky's way or yet some third way is adopted. The aim will be to find SOME way to do this. Look for the single miracle which when added to the available cognitive landscape gets you roughly the laws of GB. Do this, and we are a good way towards redeeming the first of Chomsky's hunches. Show that GBish principles allow for easy learning and efficient parsing and you get a good way towards redeeming the SMT. Is this doable? Dunno. Is it worth trying? You bet.

    2. I always thought that the extra thing required is search. Without a means of searching inside an object, internal Merge can't identify its argument so though the operation Merge doesn't need to change to be applied internally, you need to add to it a means of identifying an argument, which connects with the similarity/differentiation issue mentioned above.

    3. As you no doubt know, "search" can do the work, by stipulation. Chomsky has recently observed that EM plausibly involves more search than IM (it is bounded after all). At any rate, it involves some unless one assumes that the numeration never involves any search. Moreover, it involves a further assumption about what does the searching. If it is the head that searches, the search is potentially unbounded. If it is the target that searches, then all that is required is a peek at the top of the tree (stack). So there are many ways of elaborating the search idea, only some of which clearly get what you want. Given this, it is possible to make the right distinction, but just as easy to make any other, as Chomsky's recent off the cuff comments regarding the trading relations between E and I merge indicate.

      Last point: this is not JUST search, it is search in a linguistic object with linguo-centric features. This bodes ill for the kind of reduction I think one wants.

    4. Indeed. Was just saying in reply to Thomas that Merge can give you the possibility of movement, but without search, it might not actually give you movement. As you say, the issue is whether search can be reduced. I think it probably can. Transmission of information through structure seems ubiquitous, but that's again just a hunch.

    5. I agree that the trick hiding under the surface in attempts to unify merge and move (or at least one trick) is related to this idea of where they get their operands from. It seems plausible and reasonable to say that they "do the same thing", but what they operate on, and therefore how they affect the overall state of the derivation, is quite different. Keeping things simple, if we consider a movement operation that satisfies a Case requirement, this is generally going to have the effect of fulfilling an obligation that the derivation saddled itself with in the past, namely when it chose to merge that particular DP into its theta position; this movement step is the second of two steps that you in effect promised to take when you decided to take the first one, and if you decide not to move that thing now, you'll have to move that thing at some later point eventually. (And maybe putting it off would violate minimality of something else, but I'm just thinking about the basic combinatorics here.) With an external merge step, on the other hand, you're deciding what to draw from the lexicon; you might decide to grab X or you might decide to grab Y, and there's no obligation of the form "you must use X sooner or later"; you can instead just grab Y. Of course at the point of any particular merge step there will be categorial restrictions on what's an eligible mergee, but that's different from the kind of obligation/requirement that you're satisfying when you do a movement step.

      There are probably other ways to think about things, but this is one perspective (the one provided by Stabler's MGs) that for me puts the issue in these terms.
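      The asymmetry described above can be caricatured in a few lines (a toy bookkeeping sketch with invented feature names, not an implementation of Stabler's MGs): an external merge step is a free choice from the lexicon that may saddle the derivation with obligations, while an internal merge step discharges an obligation incurred earlier, and the derivation cannot converge while any obligation is outstanding.

```python
class Derivation:
    def __init__(self):
        self.pending = []          # movement obligations incurred so far

    def external_merge(self, item, licensees=()):
        # free choice: grab any lexical item; doing so may saddle the
        # derivation with future movement obligations
        self.pending.extend(licensees)

    def internal_merge(self, licensee):
        # discharges a promise made at an earlier merge step
        # (raises ValueError if no such promise was ever made)
        self.pending.remove(licensee)

    def converged(self):
        return not self.pending    # no unfulfilled obligations remain

d = Derivation()
d.external_merge("DP", licensees=["-case"])  # merged into its theta position
print(d.converged())   # False: the DP still owes a case-driven movement
d.internal_merge("-case")
print(d.converged())   # True
```

The sketch only tracks the bookkeeping, but it makes the contrast visible: nothing obliges you to ever call external_merge with any particular item, whereas a pending licensee is a debt the derivation must pay.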

    6. Though I am, I guess, on record as (app)lauding Chomsky's unification of EM & IM as a Big Deal, I've also had reservations that seem along the lines of this discussion (in email w/Norbert, as it happens):

      "Or perhaps Chomsky is wrong about IM—maybe it doesn’t just follow from Merge and lack of stipulations. After all, if it is not stipulatively defined that the 'members of the members' of a syntactic object are available for Merging, then IM won’t be able to find the relevant stuff, will it. . . . Can there be some way to get 'IM' without an arbitrary 'members of the members' kind of statement (it’s still there buried somewhere, right)?"

      For whatever it's worth, this bit of otherwise idle musing was part of a larger (idle) musing about Islands in MP. If IM isn't just the most general, nonstipulative form/extension of "Merge", as Chomsky suggested, then maybe figuring out how and why it is what it is could shed light on Islands. If there's nothing to think about wrt IM, then presumably there's nothing that thinking about it will get you. But if there is something to think about . . .


    7. RC wrote: "maybe it doesn’t just follow from Merge and lack of stipulations. After all, if it is not stipulatively defined that the 'members of the members' of a syntactic object are available for Merging, then IM won’t be able to find the relevant stuff"

      I agree with this. There are ways to define merge which make it insensitive to the internal/external distinction, from which the possibility of movement would follow automatically, but there are also many ways to define merge that aren't like that. I am not aware of any arguments for taking one or the other of these kinds of definition to be a priori simpler or better than the other. So I don't think it follows from "we must have merge (in the virtually conceptually necessary, but not yet precisely defined, sense)", that there will be movement.

      Of course one reasonable direction to go from here is to say, "Well, it sure looks like we have movement, so it makes sense to pick one of the definitions of merge that is insensitive to the internal/external distinction, in order to kill the two empirical birds with one theoretical stone". But that's an argument from the empirical fact of movement to the choice of definition of merge, which is the reverse direction from what seems to sometimes be suggested.