Friday, October 18, 2013

What's Next in Minimalist Grammars?

After a series of technical posts, I really feel like kicking back and waxing all poetic about some lofty idea without the constant risk of saying something that is provably nonsense. Well, what could be safer and loftier and fluffier than to follow Norbert's example and speculate on the future of my field? Alas, how narrowly should "my field" be construed? Minimalist grammars? Computational linguistics? Linguistics? Tell you what, let's tackle this inside-out, starting with Minimalist grammars. Warning: Lots of magic 8-ball shaking after the jump.

The Technical Toil

Most MG research in the last 15 years has been concerned with the computational properties of the formalism, and that aspect will remain an important focus in the near future. Not because we're age-challenged canines and thus incapable of learning new tricks, but because there are still some important questions that need to be resolved. I think the next few years will see a lot of progress in two areas: adjunction and movement.


MG implementations of adjunction have been around for over a decade,1 but within the last two years various people with very different research agendas have suddenly found themselves working on adjunction. Greg Kobele and Jens Michaelis have been studying the effects of completely unrestricted Late Adjunction on MGs,2 and while it is clear that this increases the power of the formalism, we do not quite know yet what is going on. Meanwhile Meaghan Fowlie is trying to reconcile the optionality and iterability of prototypical adjuncts with the ordering facts that Cinque established in his work on syntactic cartography.3 And Tim Hunter and I have our own research projects on how the properties of adjuncts give rise to island effects.4, 5

The common thread connecting all these questions is the intuition that adjuncts are less tightly integrated than arguments. This grants them various perks such as optionality and the possibility of entering the derivation countercyclically, but it also limits how they may interact with the rest of the tree and turns them into islands, among other things. Right now each of us is formalizing a different facet of this intuition, but once we figure out how to combine them into a cohesive whole, we will have captured the essence of what makes adjuncts adjuncts and what separates them from arguments with respect to Merge and Move.


Speaking of Move: it has long been known that movement is the locus of power of the MG formalism; it is indispensable for generating languages that aren't context-free. Hence it is not at all surprising that lots of work has been done on it, and it is now an extremely well-understood operation. And not just phrasal movement, nay, we also know quite a lot about head movement, affix hopping, and sidewards movement.
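To see why movement buys non-context-freeness, recall that MGs can be translated into MCFGs, where a phrase that still has to move is carried along as a separate string component until it lands. The following is a minimal sketch of that idea, not an MG: a hypothetical toy 2-MCFG (the names `pair`, `sentence`, and `in_language` are made up for illustration) that builds two strings in parallel and only glues them together at the very end, which already suffices for the non-context-free language a^n b^n c^n d^n.

```python
def pair(n: int) -> tuple[str, str]:
    """Nonterminal A of a toy 2-MCFG:
    A -> (a x1 b, c x2 d)  built from a smaller A = (x1, x2)
    A -> (eps, eps)        the base case
    Both components grow in lockstep, like a mover and its host."""
    if n == 0:
        return ("", "")
    x1, x2 = pair(n - 1)
    return ("a" + x1 + "b", "c" + x2 + "d")


def sentence(n: int) -> str:
    """Start rule S -> x1 x2: the two components are concatenated only here."""
    x1, x2 = pair(n)
    return x1 + x2


def in_language(s: str) -> bool:
    """Membership in a^n b^n c^n d^n (n >= 0), checked against the grammar."""
    return len(s) % 4 == 0 and s == sentence(len(s) // 4)


print(sentence(2))  # aabbccdd
```

A context-free grammar cannot keep the a/b block and the c/d block synchronized once they are separated in the string; the second tuple component is what makes this possible.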

So how could there be anything left to study about movement? Well, about a year ago I felt a rather peculiar itch that was in dire need of scratching, and the scratching took the form of a paper.6 In this paper I presented a system for defining new movement types without increasing the expressivity of MGs over strings. The system allows for many kinds of movement besides vanilla upward movement of a phrase to a c-commanding position, for instance head movement, affix hopping, sidewards movement, and lowering, i.e. downward movement of a phrase to a c-commanded position. What's really interesting is that every type of phrasal movement definable this way can be replaced by standard upward movement followed by downward movement. From this perspective sidewards movement, for example, is just a particular way of tying upward and downward movement together into one tidy package. Other combinations even yield movement types that allow MGs to emulate completely different formalisms such as Tree Adjoining Grammar.

This raises a long-forgotten question from the dead: do we need downward movement in syntax? The received view is that all instances of downward movement can be handled by upward movement, but we can now see that this is not true if the two types of movement are allowed to interact: no matter how much you fiddle with your features and the lexicon, if you only have Merge and upward movement, then certain tree structures cannot be generated by MGs even though they can be generated by Tree Adjoining Grammars, and consequently by MGs with both upward and downward movement. So the question about the status of downward movement now becomes whether any of those tree structures are of linguistic interest. In the long run, this should allow us to evaluate formalisms in terms of the empirical adequacy of the movement operations that are needed to emulate them via MGs. And that is exactly what we need to do: once we have a good understanding of our own formalism, we absolutely have to get a better understanding of how alternative proposals relate to it, how they carve up language along different joints, and why.

Oh, just in case you're wondering: yes, feature coding will of course get some attention (e.g. by yours truly), though not as a topic of its own but rather in relation to other concerns such as adjunction. And I am also willing to bet all my precious Fabergé eggs that we will see a couple of papers on MGs without the Shortest Move Constraint (SMC), a condition that keeps the power of MGs in check but is also considered too restrictive by some people (maybe even the majority of the MG community; I'm increasingly getting the impression that I'm the only one who's willing to stand up for the poor little fellow).
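For readers who have not seen the SMC spelled out: in the common chain-based presentation of MGs, an expression is a head plus a list of phrases that are still waiting to move, and the SMC forbids two of those pending movers from starting with the same licensee feature, so that Move is always deterministic. The sketch below is a hypothetical toy encoding (the class `Expr` and the functions `smc_ok` and `move` are illustrative names, not Stabler's code or any published implementation); it tracks only the feature bookkeeping and ignores strings and Merge.

```python
from dataclasses import dataclass, field


@dataclass
class Expr:
    # Remaining features of the head, e.g. ["+wh", "C"] (licensor first).
    head: list[str]
    # Remaining features of each phrase still waiting to move, e.g. [["-wh"]].
    movers: list[list[str]] = field(default_factory=list)


def smc_ok(expr: Expr) -> bool:
    """SMC: no two pending movers may share their first licensee feature."""
    firsts = [m[0] for m in expr.movers]
    return len(firsts) == len(set(firsts))


def move(expr: Expr) -> Expr:
    """Check the head's first licensor +f against the unique mover whose
    next feature is -f. A mover with further licensees stays in the list."""
    f = expr.head[0]
    if not f.startswith("+"):
        raise ValueError(f"head must start with a licensor, got {f}")
    target = "-" + f[1:]
    matches = [m for m in expr.movers if m[0] == target]
    if len(matches) != 1:
        # Zero matches: nothing to move. Two or more: already an SMC violation.
        raise ValueError(f"no unique mover for {f}")
    mover = matches[0]
    rest = [m for m in expr.movers if m is not mover]
    if len(mover) > 1:  # e.g. ["-case", "-wh"]: it will move again later
        rest.append(mover[1:])
    result = Expr(expr.head[1:], rest)
    if not smc_ok(result):
        raise ValueError("SMC violation")
    return result


# A wh-object that first checks case and then moves to SpecCP.
step1 = move(Expr(["+case", "+wh", "C"], [["-case", "-wh"]]))
step2 = move(step1)
print(step2)  # Expr(head=['C'], movers=[])
```

Under this encoding, two wh-phrases both marked `-wh` can never be pending at the same time, which is exactly the restriction that proposals without the SMC want to lift.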

Capacities Beyond Generative Capacity

All the things above sound appreciably neato, but strictly speaking they're just variations of the same old generative capacity shtick that's been the bread and butter of computational linguistics since the fifties. Now admittedly I will never say no to a thick slice of bread with a thin layer of butter (there's no fighting my Central European upbringing), but it's nice to throw bacon and sausage into the mix once in a while. In the case of MGs, that would be learnability and parsing, and both have been frying in the pan for quite some time now.

Fellow FoL commenter Alex Clark and his collaborators are pushing the learnability boundary further and further beyond the context-free languages. An MG characterization of the learnable string languages is closer than ever; heck, it might even be done already and just need to be written up. If I weren't a hopelessly jaded 90s kid I would get all giddy with excitement just thinking about the possibilities that will open up for us. For the first time we can seriously study how much of a role learning may play in explaining language universals and how the workload should be distributed between UG and the acquisition algorithm. Sure, the first few hypotheses will come wrapped in seven layers of ifs and buts with some maybes sprinkled on top, but with time we will get better at interpreting the formal learnability claims from a linguistic perspective.

Should you get bored of all the learnability literature, there'll be brand new papers on parsing MGs for you to devour. The top-down parser developed by Ed Stabler has been implemented in five different programming languages, and everyone can download the code, play around with it, and modify it to their own liking.7 The paper hasn't been around for long and we already have a follow-up that shows why the parser correctly predicts nested dependencies to be harder to parse than crossing dependencies, even though the latter are structurally more complex.8 Once again it will take a while to figure out how to interpret the behavior of the parser, and various modifications and extensions will be grafted onto it to account for oddball phenomena like merely local syntactic coherence effects.9 But the all-important first step has been taken: everything is now in place, waiting for us to play around with it. And play with it we shall.

Empirical Enterprises Emerge!

The biggest shift I anticipate in the MG community is the increasing focus on modeling empirical phenomena. I vividly remember a conversation I had with Greg Kobele at MOL 2009 in Bielefeld where he said that we finally had a good understanding of MGs and it was time to focus on empirical applications. Back then I disagreed, mostly because I was still wrestling with some specific formal issues that needed to be solved, e.g. the status of constraints. But once we've got adjunction and movement all figured out, using MGs as a framework for the analysis of empirical phenomena will likely turn out to be the most productive line of research.

Don't get me wrong, the formal work has never been limited to the Care Bear cotton kingdom of pure math devoid of any linguistic impact. There have always been empirical aspects to it. However, producing formal results requires a rich background in computer science, a good dose of general mathematical maturity, the ability to work at a very abstract level, and the Zen mindset that will allow you to fully accept the fact that most of your colleagues will have a hard time understanding why you're doing what you're doing. Needless to say, formal work is not exactly a mainstream activity among linguists, and as a consequence few people have tried working with MGs themselves.

But linguists are not a bunch of buffoons who cannot deal with technical machinery. It's just that their strengths lie in a different area. Rather than reasoning about language by way of formalisms, they're more interested in using formalisms to analyze language in the most direct way possible; in other words, traditional empirical work. So now that we have a really good grasp of how MGs work, plus several tools such as the Stabler parser that can readily be used for empirical inquiries, there is no reason why your average linguist shouldn't take MGs for a spin. Students in particular are very curious about how computational work can help them in their research, and we finally have a nice wholesome package to offer them that does not require two years of study just to get started.

Of course I do not expect everybody to suddenly adopt MGs as their favored syntactic framework. I think this is one of the precious few moments where an uncomfortably nerdy analogy is in order: Linux has always been a minority operating system, albeit an important one. A world without Linux would clearly be worse, irrespective of how many people actually use it on their computers. But Linux has reached a point of maturity where it can be used by anybody with basic computer skills, and the advantages of doing so are readily apparent. I became a Linux user at pretty much the same time that I got interested in MGs, about seven years ago. Linux has come a long way since then, and so have MGs. For Linux it has already started to pay off. The thought that the same might soon be true for MGs gives me a warm fuzzy feeling in my tummy. Until the 90s cynicism kicks in. So if you think that my prophecies are utter nonsense, now's your chance to cure me of my delusions in the comment section. I, for one, blame the magic 8-ball.

  1. Frey, Werner and Hans-Martin Gärtner (2002): On the Treatment of Scrambling and Adjunction in Minimalist Grammars. Proceedings of the Conference on Formal Grammar (FG Trento), 41--52.
  2. Kobele, Gregory M. and Jens Michaelis (2011): Disentangling Notions of Specifier Impenetrability: Late Adjunction, Islands, and Expressive Power. The Mathematics of Language, 126--142.
  3. Fowlie, Meaghan (2013): Order and Optionality: Minimalist Grammars with Adjunction. Proceedings of MOL 2013, to appear.
  4. Hunter, Tim (2012): Deconstructing Merge and Move to Make Room for Adjunction. Syntax, to appear.
  5. I have a paper that will be made available on my website within the next few days; in the meantime you can check out some slides.
  6. Graf, Thomas (2012): Movement Generalized Minimalist Grammars. Proceedings of LACL 2012, 58--73.
  7. Stabler, Edward (2013): Two Models of Minimalist, Incremental Syntactic Analysis. Topics in Cognitive Science 5, 611--633. All the code is on GitHub.
  8. Kobele, Gregory M., Sabrina Gerth and John Hale (2013): Memory Resource Allocation in Top-Down Minimalist Parsing. Formal Grammar 2012/2013, 32--51.
  9. Tabor, Whitney, Bruno Galantucci and Daniel Richardson (2004): Effects of Merely Local Syntactic Coherence on Sentence Processing. Journal of Memory and Language 50, 355--370.


  1. I wonder how many of these developments could be carried over into LFG, which has trees, features, and structure sharing, and which, with glue semantics, can also be endowed with feature interpretation (although, as far as I am aware, nobody but me is trying to develop this in any way).

    1. Computationally, LFG (and HPSG) have been shown to be much more powerful than MGs due to the underlying feature unification mechanism. But these proofs just formalize the system as it is defined rather than how it is used in practice, so I'm not sure what to make of them. For example, there is a translation procedure from HPSG to TAG that seems to work fairly well for most grammars. So if HPSG and LFG could actually be understood as special types of TAGs, then they could also be studied from the MG perspective in a rather straightforward manner.

  2. I am on the optimistic side too -- but there are some incompatibilities between MGs and the learnability research. In particular the sorts of universals that we get out of learnability seem of a completely different type to the sorts of universals that linguists are interested in. I don't know if that is necessarily a road block to the further integration.

    The other problem I worry about is the MG = MCFG equivalence, because this is a very, very large class. The arguments for non-context-freeness only justify a very small move outside of CFGs. And other, non-minimalist grammars tend to use a much smaller (more minimal?) class, namely the TAG = CCG = LIG class. So I know that the TAG guys have moved a bit further up the hierarchy over the years, but it would be nice if you MG guys moved down a bit too. MCFGs are a really big class.

    1. But isn't the point of bringing learnability and parsing into the picture that the formalism can be allowed to overgenerate in at least some respects? So moving MGs down a few notches should be a last resort strategy.

    2. Well, now that I think about it, I never considered the case where something can be ruled out in an elegant way either via the formalism or via the learner, at the expense of making one of the two slightly more complicated. I'm not sure which route I would prefer in that case.

    3. One extreme is to say that the formalism is very large, say all PTIME languages using something like RCGs, which just involves some very innocuous functional argument from parsing, and have every interesting constraint come from the learning algorithm. Or just from functional constraints on parsing (since parsing MCFGs gets harder the higher the dimension is).

      But why are nearly all languages weakly context-free?
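The remark that parsing MCFGs gets harder as the dimension grows can be quantified with the standard worst-case bound for tabular (CKY-style) parsing: a grammar of dimension d and rank r (at most r nonterminals on any right-hand side) can be parsed in O(n^(d(r+1))) time. A minimal sketch, assuming that textbook bound; the function name is hypothetical, and real parsers can do better for particular rule shapes, since the exponent really depends on the fan-outs in each rule.

```python
def mcfg_parse_exponent(dimension: int, rank: int = 2) -> int:
    """Exponent e of the O(n**e) worst-case bound for tabular MCFG parsing,
    assuming the standard bound e = dimension * (rank + 1)."""
    return dimension * (rank + 1)


# Binarized grammars (rank 2): dimension 1 is CFG (n^3), dimension 2 matches TAG (n^6).
for d in range(1, 5):
    e = mcfg_parse_exponent(d)
    print(f"dimension {d}: O(n^{e}), about {20 ** e:.1e} steps for n = 20")
```

Each extra dimension multiplies the worst case by another factor of n^3, which is one way of cashing out a functional pressure against high-dimensional analyses.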

    4. Is that actually true? Ever since Shieber's proof that Swiss German is at least a TAL, there has been little interest in showing that other languages, too, are at least TALs. People are usually hunting for something bigger, i.e. MCFLs and PMCFLs. It's also the case that disproving weak context-freeness is extremely difficult in languages with impoverished morphology such as English, simply because the underlying dependencies aren't reflected in the string. But I would wager that if you look closely enough in languages with rich morphology you will find non-CF dependencies. It's just that nobody is interested in doing that.

    5. So maybe I am not thinking clearly here.

      I think there are non-CF dependencies in *English* (what the parsing guys call non-projective dependencies) but there aren't an unbounded number of them. So extraposition (is that the right term?) examples like
      "I saw a great exhibition on Saturday by Sarah Lucas"
      have crossing dependencies, no? The exhibition-by dependency crosses the saw-on Saturday dependency.

      I guess I think of that as a non-CF dependency.

      What are the syntactic arguments for having a dimension greater than 2 though in MCFG terms?
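The crossing in the exhibition example can be checked mechanically. The following is a minimal sketch that assumes a hypothetical, partial analysis containing just the two attachment arcs under discussion (saw to on, exhibition to by), with words indexed by position; the function names are illustrative. As Avery points out below, a crossing in a single sentence does not by itself establish anything about weak generative capacity.

```python
from itertools import combinations


def crosses(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """Two arcs cross iff one starts strictly inside the other and ends
    strictly outside it. Arcs sharing an endpoint do not cross."""
    (i, j), (k, l) = sorted(a), sorted(b)
    return i < k < j < l or k < i < l < j


def crossing_pairs(arcs):
    """All pairs of arcs that cross each other."""
    return [(a, b) for a, b in combinations(arcs, 2) if crosses(a, b)]


words = "I saw a great exhibition on Saturday by Sarah Lucas".split()
# Hypothetical, partial analysis: only the two arcs under discussion.
arcs = [(1, 5),  # saw -> on (Saturday)
        (4, 7)]  # exhibition -> by (Sarah Lucas)

for a, b in crossing_pairs(arcs):
    print(f"{words[a[0]]}-{words[a[1]]} crosses {words[b[0]]}-{words[b[1]]}")
# saw-on crosses exhibition-by
```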

    6. Do you want a weak generative capacity argument that some natural language is a (P)MCFL but not a 2-(P)MCFL? I have to admit that I can't think of any established results right now. Even Chinese number names and Groenink's examples of cross-serial dependencies across coordinations in Dutch can be done with 2-PMCFGs. But for non-copying MCFGs they would require dimension greater than 2.

    7. A convincing strong generative capacity argument would be fine, as long as it isn't too theory-internal; I think the weak/strong line isn't in quite the right place anyway.
      Multiple wh-movement maybe?

    8. I find the idea of a strong generative capacity argument against 2-MCFLs somewhat paradoxical. Can we rephrase it as "Are there MG analyses that use more than one licensee feature (and would thus be translated into an n-MCFG for some n>2)"?

      Such MGs are unavoidable if you adopt even the most basic assumptions, such as that subjects move from a lower position to SpecTP, that auxiliaries in English undergo affix hopping, or that all languages are underlyingly SVO and Det-Num-A-Noun with other orders derived by movement. The last point is particularly interesting because it makes the right typological predictions about DP-internal word order.

      But I guess those ideas are too theory-driven for what you have in mind, so let's add the restriction that we only consider clear-cut cases where something occurs in a position different from the one it usually occupies in the "surface structure" (so we ignore all instances of move that are posited for cross-linguistic reasons).

      Those cases do exist, too. You already mentioned multiple wh-movement, and there's also scrambling and possibly multiple topicalization (I have my doubts about the latter). Any case where at least two co-arguments undergo movement, e.g. topicalization and heavy NP shift, will also need at least two features. Note that here it doesn't matter if the process is unbounded, if you want to treat both via movement you need at least two distinct licensee features to avoid SMC violations.

      I would also think that any of the examples that are used to motivate the switch from TAG to set-local MCTAG should qualify (possibly even the ones for tree-local MCTAG if you care mostly about strong generative capacity).
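The link between licensee features and MCFG dimension mentioned above can be made concrete. Under the SMC, each licensee feature type can have at most one pending mover at any point, so an MG expression corresponds to a tuple of at most (number of licensee types + 1) strings: one for the head's phrase and one per mover. A minimal sketch of that upper bound, assuming a hypothetical lexicon format (word mapped to a feature list, licensees written `-f`); the feature names `-top` and `-shift` are made up for the topicalization plus heavy NP shift case.

```python
def licensee_types(lexicon: dict[str, list[str]]) -> set[str]:
    """Distinct licensee features (written -f) used anywhere in the lexicon."""
    return {f for feats in lexicon.values() for f in feats if f.startswith("-")}


def mcfg_dimension_bound(lexicon: dict[str, list[str]]) -> int:
    """Upper bound on the dimension of the translated MCFG: with the SMC,
    at most one pending mover per licensee type, plus the main component."""
    return len(licensee_types(lexicon)) + 1


# Hypothetical lexicon in which two co-arguments move for different reasons.
lex = {
    "Mary": ["d", "-top"],        # topicalized argument
    "the_book": ["d", "-shift"],  # heavy-NP-shifted argument (hypothetical feature)
    "gave": ["=d", "=d", "v"],
    "C": ["=t", "+top", "c"],
}
print(mcfg_dimension_bound(lex))  # 3
```

With a single licensee type the bound is 2, which is the n > 2 threshold discussed above; using two distinct types is what keeps the two co-arguments from violating the SMC.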

    9. A big picture remark: When you said that MGs should go down a few levels, I expected you to bring up well-nested and/or monadic branching MCFGs, which really affect the quality of Move in MGs. The restriction to a specific dimension, on the other hand, is just an upper bound on the total number of licensee features and wouldn't really change all that much about the formalism. So if all natural languages are indeed 2-PMCFLs, that wouldn't be too much of a curve ball for MGs.

      Oh, and I completely forgot to ask: you say that the sorts of universals that we get out of learnability seem to be of a completely different type from the sorts of universals that linguists are interested in. Do you have some examples? And to what extent are those universals independent of the targeted language class (i.e. do all, say, MAT-learnable classes share some interesting universals)?

    10. @Alex: 'I think there are non-CF dependencies in *English* ' .. but in the example given, "I saw a great exhibition on Saturday by Sarah Lucas", the final PP is optional, and PPs not associated with the subject can appear freely in this position "by Sarah Lucas" -> "in Boston", so I don't see how any generative capacity argument can be made unless connecting form to meaning is introduced into the picture, presumably some sort of strong generative capacity, but exactly how?

      With respect to this, it might be appropriate to point out that languages tend to be relatively relaxed about adding extra information about participants in a discontinuous way as long as the semantics stays intersective, so that in various Slavic languages and Greek it's OK to split intersective adjectives off from their nouns in certain constructions, but not various other kinds such as modal ones (Ntelitheos 2004 and various other sources):

      to prasino forese i Maria fustani
      the green wore the Mary dress "Mary wore the green dress"

      *ton ipotithemeno idha dholofono
      the alleged I-saw murderer "I saw the alleged murderer" (pp. 58--59)

      I suspect that generative capacity arguments will have to be connected to semantics in some way that makes sense to descriptively oriented linguists (as power to update DRS's or some roughly similar form of knowledge representation, perhaps?) in order to make much of an impression on them.

    11. @Avery -- yes, I am thinking exactly of arguments based on the fact that we have to derive the right semantics for a sentence like "I saw a great exhibition on Saturday by Sarah Lucas" from its syntactic structure, and getting the right interpretation looks like it will require some non-projective dependency structure, regardless of whether it is an adjunct or not.

      @Thomas: the constraints I am thinking of are things like the Finite Context Property (FCP), which says roughly that every syntactic category can be defined using a finite number of contextual features; for the mathematical details see e.g. my recent paper in Machine Learning with Ryo (yay, another crossing dependency), doi 10.1007/s10994-013-5403-2.
      I hesitate to say that these will apply to all learning algorithms because there may be whole other families of learning algorithms that we don't know about yet, but they seem to affect all of the ones we know about so far.
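To give a rough feel for what a contextual feature is in distributional learning: a context is a pair (left, right) of strings, and a string u has that feature if left + u + right is in the language. The sketch below approximates this against a finite sample; the sample and the function names are hypothetical, and it illustrates only the notion of a context, not the Finite Context Property itself, which is a property of the full (typically infinite) language (see the paper cited above for the actual definitions).

```python
Sentence = tuple[str, ...]


def contexts(u: Sentence, sample: set[Sentence]) -> set[tuple[Sentence, Sentence]]:
    """All contexts (left, right) such that left + u + right is in the sample."""
    out = set()
    n = len(u)
    for s in sample:
        for i in range(len(s) - n + 1):
            if s[i:i + n] == u:
                out.add((s[:i], s[i + n:]))
    return out


def has_features(u: Sentence, features, sample: set[Sentence]) -> bool:
    """True if u occurs in every given context, relative to the sample."""
    return all(left + u + right in sample for left, right in features)


sample = {("the", "dog", "barks"), ("the", "cat", "barks"), ("a", "dog", "sleeps")}
dog = contexts(("dog",), sample)
print(sorted(dog))
# [(('a',), ('sleeps',)), (('the',), ('barks',))]
```

Here "cat" shares one of the two contexts of "dog" but not the other, so with respect to this tiny sample the two strings are not (yet) distributionally equivalent.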

    12. One more thing: I think well-nestedness will turn out to be really very important; for parsing, sure, but also for things like binarizing derivation structures and so on. It doesn't show up as an interesting property when we think of weak learning, but I think it will be crucial for strong learning.
      So I guess I need to look more closely at the arguments for going from TAG to MCTAG.
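For reference, well-nestedness is usually defined on dependency trees: whenever two subtrees are disjoint, their yields must not interleave, i.e. there must be no positions i1 < j1 < i2 < j2 with i1, i2 in one yield and j1, j2 in the other. The following is a minimal sketch of that definition, assuming a tree is given as a list of head indices (-1 for the root); the function names and example trees are hypothetical. The second example tree is non-projective (one yield has a gap) but still well-nested.

```python
from itertools import groupby


def interleave(y1: set[int], y2: set[int]) -> bool:
    """True if the disjoint yields y1 and y2 interleave (pattern x y x y)."""
    if y1 & y2:
        raise ValueError("yields must be disjoint")
    labels = [1 if p in y1 else 2 for p in sorted(y1 | y2)]
    runs = [k for k, _ in groupby(labels)]  # collapse consecutive equal labels
    return len(runs) >= 4  # x y x is mere nesting; x y x y is interleaving


def yields(heads: list[int]) -> list[set[int]]:
    """heads[i] is the parent of word i (or -1 for the root); returns each node's yield."""
    children: dict[int, list[int]] = {i: [] for i in range(len(heads))}
    for i, h in enumerate(heads):
        if h >= 0:
            children[h].append(i)

    def collect(i: int) -> set[int]:
        out = {i}
        for c in children[i]:
            out |= collect(c)
        return out

    return [collect(i) for i in range(len(heads))]


def well_nested(heads: list[int]) -> bool:
    """No two disjoint subtrees (neither dominates the other) have interleaving yields."""
    ys = yields(heads)
    n = len(heads)
    return not any(interleave(ys[a], ys[b])
                   for a in range(n) for b in range(a + 1, n)
                   if not ys[a] & ys[b])


print(well_nested([2, 3, 4, 4, -1]))  # False: yields {0, 2} and {1, 3} interleave
print(well_nested([2, 4, 4, 2, -1]))  # True: yield {0, 2, 3} has a gap, but no interleaving
```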