Tuesday, February 27, 2018

Universals; structural and substantive

Linguistic theory has a curious asymmetry, at least in syntax.  Let me explain.

Aspects distinguished two kinds of universals, structural vs substantive.  Examples of the former are commonplace: the Subjacency Principle, Principles of Binding, Cross Over effects, X’ theory with its heads, complements and specifiers; these are all structural notions that describe (and delimit) how Gs function. We have discovered a whole bunch of structural universals (and their attendant “effects”) over the last 60 years, and they form part of the very rich legacy of the GG research program. 

In contrast to all that we have learned about the structural requirements of G dependencies, we have, IMO, learned a lot less about the syntactic substances: What is a possible feature? What is a possible category? In the early days of GG it was taken for granted that syntax, like phonology, would choose its primitives (atomic elements) from a finite set of options. Binary feature theories based on the V/N distinction allowed for the familiar four basic substantive primitive categories A, N, V, and P. Functional categories were more recalcitrant to systematization, but if asked, I think it is fair to say that many a GGer could be found assuming that functional categories form a compact set from which different languages choose different options. Moreover, if one buys into the Borer-Chomsky thesis (viz. that variation lives in differences in the (functional) lexicon) and one adds a dash of GB thinking (where it is assumed that there is only a finite range of possible variation) one arrives at the conclusion that there are a finite number of functional categories that Gs choose from and that determine the (finite) range of possible variation witnessed across Gs. This, if I understand things (which I probably don’t; recall I got into syntax from philosophy not linguistics and so never took a phonology or morphology course), is a pretty standard assumption within phonology tracing back (at least) to The Sound Pattern of English. And it is also a pretty conventional assumption within syntax, though the number of substantive universals we find pales in comparison to the structural universals we have discovered. Indeed, were I inclined to be provocative (not something I am inclined to be as you all know), I would say that we have very few echt substantive universals (theories of possible/impossible categories/features) when compared to the many many plausible structural universals we have discovered.
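To make the "finite set of options" idea concrete, here is a minimal sketch of how two binary features carve out exactly four lexical categories. The particular assignment of [±N, ±V] values to A, N, V and P is the textbook one and is an assumption here, since the post does not spell it out; the function name `possible_categories` is purely illustrative.

```python
from itertools import product

# Textbook [±N, ±V] assignment (an assumption; not stated in the post).
CATEGORY_OF = {
    (True, False): "N",   # [+N, -V]
    (False, True): "V",   # [-N, +V]
    (True, True): "A",    # [+N, +V]
    (False, False): "P",  # [-N, -V]
}

def possible_categories(n_features):
    """Enumerate every valuation of n binary features (2**n of them)."""
    return list(product((True, False), repeat=n_features))

# Two binary features give four valuations, one per lexical category.
lexical = [CATEGORY_OF[v] for v in possible_categories(2)]
print(sorted(lexical))  # ['A', 'N', 'P', 'V']
```

The point of the sketch is only that a fixed stock of binary features bounds the category inventory: n features allow at most 2**n categories, whatever labels one attaches to them.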

Actually one could go further, so I will. One of the major ambitions (IMO, achievements) of theoretical syntax has been the elimination of constructions as fundamental primitives. This, not surprisingly, has devalued the UG relevance of particular features (e.g. A’ features like topic, WH, or focus), the idea being that dependencies have the properties they do not in virtue of the expressions that head the constructions but because of the dependencies that they instantiate. Criterial agreement is useful descriptively but pretty idle in explanatory terms. Structure rather than substance is grammatically key. In other words, the general picture that emerged from GB and more recent minimalist theory is that G dependencies have the properties they have because of the dependencies they realize rather than the elements that enter into these dependencies.[1]

Why do I mention this? Because of a recent blog post by Martin Haspelmath (here, henceforth MH) that Terje Lohndal sent me. The post argues that to date linguists have failed to provide a convincing set of atomic “building blocks” on the basis of which Gs work their magic. MH disputes the following claim: “categories and features are natural kinds, i.e. aspects of the innate language faculty” and they form “a “toolbox” of categories that languages may use” (2-3). MH claims that there are few substantive proposals in syntax (as opposed to phonology) for such a comprehensive inventory of primitives. Moreover, MH suggests that this is not the main problem with the idea. What is? Here is MH (3-4):

To my mind, a more serious problem than the lack of comprehensive proposals is that linguistics has no clear criteria for assessing whether a feature should be assumed to be a natural kind (=part of the innate language faculty).

The typical linguistics paper considers a narrow range of phenomena from a small number of languages (often just a single language) and provides an elegant account of the phenomena, making use of some previously proposed general architectures, mechanisms and categories. It could be hoped that this method will eventually lead to convergent results…but I do not see much evidence for this over the last 50 years. 

And this failure is principled, MH argues, relying as it does on claims “that cannot be falsified.”

Despite the invocation of that bugbear “falsification,”[2] I found the whole discussion to be disconcertingly convincing and believe me when I tell you that I did not expect this. MH and I do not share a common vision of what linguistics is all about. I am a big fan of the idea that FL is richly structured and contains at least some linguistically proprietary information. MH leans towards the idea that there is no FL and that whatever generalizations there might be across Gs are of the Greenberg variety.

Need I also add that whereas I love and prize Chomsky Universals, MH has little time for them and considers the cataloguing and explanation of Greenberg Universals to be the major problem on the linguist’s research agenda, universals that are best seen as tendencies and contrasts explicable “through functional adaptation.” For MH these can be traced to cognitively general biases of the Greenberg/Zipf variety. In sum, MH denies that natural languages have joints that a theory is supposed to cut or that there are “innate “natural kinds”” that give us “language-particular categories” (8-9).

So you can see my dilemma. Or maybe you don’t so let me elaborate.

I think that MH is entirely incorrect in his view of universals, but the arguments that I would present would rely on examples that are best bundled under the heading “structural universals.” The arguments that I generally present for something like a domain specific UG involve structural conditions on well-formedness like those found in the theories of Subjacency, the ECP, Binding theory, etc. The arguments I favor (which I think are strongest) involve PoS reasoning and insist that the only way to bridge the gap between PLD and the competence attained by speakers of a given G that examples in these domains illustrate requires domain specific knowledge of a certain kind.[3]
And all of these forms of argument lose traction when the issue involves features, categories and their innate status. How so?

First, unlike with the standard structural universals, I find it hard to identify the gap between impoverished input and expansive competence that is characteristic of arguments illustrated by standard structural universals. PLD is not chock full of “corrected” subjacency violations (aka, island effects) to guide the LAD in distinguishing long kosher movements from trayf ones. Thus the fact that native speakers respect islands cannot be traced to the informative nature of the PLD but rather to the structure of FL. As noted in the previous post (here), this kind of gap is where PoS reasoning lives and it is what licenses (IMO, the strongest) claims to innate knowledge. However, so far as I can tell, this gap does not obviously exist (or is not as easy to demonstrate) when it comes to supposing that such and such a feature or category is part of the basic atomic inventory of a G. Features are (often) too specific and variable, combining various properties under a common logo that seem to have little to do with one another. This is most obvious for phi-features like gender and number, but it even extends to categories like V and A and N where what belongs where is often both squishy within a G and especially so across them. This is not to suggest that within a given G the categories might not make useful distinctions. However, it is not clear how well these distinctions travel among Gs. What makes for a V or N in one G might not be very useful in identifying these categories in another. Like I said at the outset, I am not expert in these matters, but the impression I have come away with after hearing these matters discussed is that the criteria for identifying features within and across languages are not particularly sharp and there is quite a bit of cross G variation. If this is so, then the particular properties that coagulate around a given feature within a given G must be acquired via experience with that particular feature in that particular G.
And if this is so, then these features differ quite a bit in their epistemological status from the structural universals that PoS arguments most effectively deploy. Thus, not only does the learner have to learn which features his G exploits, but s/he even has to learn which particular properties these features make reference to, and this makes them poor fodder for the PoS mill.

Second, our theoretical understanding of features and categories is much poorer than our understanding of structural universals. So for example, islands are no longer basic “things” in modern theory. They are the visible byproducts of deeper principles (e.g. Subjacency). From the little I can tell, this is less so for features/categories. I mentioned the feature theory underlying the substantive N,V,A,P categories (though I believe that this theory is not that well regarded anymore). However, this theory, even if correct, is very marginal nowadays within syntax. The atoms that do the syntactic heavy lifting are the functional ones, and for this we have no good theoretical unification (at least so far as I am aware). Currently, we have the functional features we have, and there is no obvious theoretical restraint to postulating more whenever the urge arises.  Indeed, so far as I can tell, there is no theoretical (and often, practical) upper bound on the number of possible primitive features and from where I sit many are postulated in an ad hoc fashion to grab a recalcitrant data point. In other words, unlike what we find with the standard bevy of structural universals, there is no obvious explanatory cost to expanding the descriptive range of the primitives, and this is too bad for it bleaches featural accounts of their potential explanatory oomph.

This, I take it, is largely what MH is criticizing, and if it is, I think I am in agreement (or more precisely, his survey of things matches my own). Where we part company is what this means. For me this means that these issues will tell us relatively little about FL and so fall outside the main object of linguistic study. For MH, this means that linguistics will shed little light on FL as there is nothing FLish about what linguistics studies. Given what I said above, we can, of course, both be right given that we are largely agreeing: if MH’s description of the study of substantive universals is correct, then the best we might be able to do is Greenberg, and Greenberg will tell us relatively little about the structure of FL. If that is the argument, I can tag along quite a long way towards MH’s conclusion. Of course, this leaves me secure in my conclusion that what we know about structural universals argues the opposite (viz. a need for linguistically specific innate structures able to bridge the easily detectable PoS gaps).

That said, let me add three caveats.

First, there is at least one apparent substantive universal that I think creates serious PoS problems; the Universal Base Hypothesis (UBH). Cinque’s work falls under this rubric as well, but the one I am thinking about is the following. All Gs are organized into three onion-like layers, what Kleanthes Grohmann has elegantly dubbed “prolific domains” (see his thesis). Thus we find a thematic layer embedded into an agreement/case layer embedded into an A’/left periphery layer. I know of no decent argument against this kind of G organization. And if this is true, it raises the question of why it is true. I do not see that the class of dependencies that we find would significantly change if the onion were inversely layered (see here for some discussion). So why is it layered as it is? Note that this is more abstract than your typical Greenberg universal as it is not a fact about the surface form of the string but the underlying hierarchical structure of the “base” phrase marker. In modern parlance, it is a fact about the selection features of the relevant functional heads (i.e. about the features (aka substance) of the primitive atoms). It does not correspond to any fact about surface order, yet it seems to be true. If it is, and I have described it correctly, then we have an interesting PoS puzzle on our hands, one that deals with the organization of Gs which likely traces back to the structure of FL/UG. I mention this because unlike many of the Greenberg universals, there is no obvious way of establishing this fact about Gs from their surface properties and hence explaining why this onion-like structure exists is likely to tell us a lot about FL.

Second, it is quite possible that many Greenberg universals rest on innate foundations. This is the message I take away from the work by Culbertson & Adger (see here for some discussion). They show how some orders within nominals relating Demonstratives, Adjectives, Numerals and head Nouns are very hard to acquire within an artificial G setting. They use this to argue that their absence as Greenberg options has a basis in how such structures are learned. It is not entirely clear that this learning bias is FL internal (it regards relating linear and hierarchical order) but it might be. At any rate, I don’t want anything I said above to preclude the possibility that some surface universals might reflect features of FL (i.e. be based on Chomsky Universals), and if they do it suggests that explaining (some) Greenberg universals might shed some light on the structure of FL.

Third, though we don’t have many good theories of features or functional heads, a lazy perusal of the facts suggests that not just anything can be a G feature or a G head. We find phi features all over the place. Among the phi features we find that person, number and gender are ubiquitous. But if anything goes, why don’t we find more obviously communicatively and biologically useful features (e.g. the +/- edible feature, or the +/- predator feature, or the +/- ready for sex feature or…)? We could imagine all sorts of biologically or communicatively useful features that it would be nice for language to express structurally that we just do not find. And the ones that we do find seem, from a communicative or biological point of view, to often be idle (gender (and, IMO, case) being the poster child for this). This suggests that whatever underlies the selection of features we tend to see (again and again) and those that we never see is more principled than anything goes. And if that is correct, then what basis could there be for this other than some linguistically innate proclivity to press these features as opposed to those into linguistic service? Confession: I do not take this argument to be very strong, but it seems obvious that the range of features we find in Gs that do grammatical service is pretty small, and it is fair to ask why this is so and why many other conceivable features that we could imagine would be useful are nonetheless absent.

Let me reiterate a point about my shortcomings I made at the outset. I really don’t know much about features/categories and their uniform and variable properties. It is entirely possible that I have underestimated what GG currently knows about these matters. If so, I trust the comments section will set things straight. Until that happens, however, from where I sit I think that MH has a point concerning how features and categories operate theoretically and that this is worrisome. That we draw opposite conclusions from these observations is of less moment than that we evaluate the current state of play in roughly the same way.

[1] This is the main theme of On Wh Movement and I believe what drives the unification behind Merge based accounts of FL.
[2] Falsification is not a particularly good criterion of scientific adequacy, as I’ve argued many times before. It is usually used to cudgel positions one dislikes rather than push understanding forward. That said, in MH, invoking the F word does not really play much more than an ornamental role. There are serious criticisms that come into play.
[3] I abstract here from minimalist considerations which try to delimit the domain specificity of the requisite assumptions. As you all know, I tend to think that we can reduce much of GB to minimalist principles. The degree to which this hope is not in vain, to that degree the domain specificity can be circumscribed to whatever it is that minimalism needs to unify the apparently very different principles of GB and the generalizations that follow from them.


  1. This is a topic that's very close to my own research interests because

    1) the choice of feature system can have a huge effect on computational complexity (which level of the subregular hierarchy does a given dependency belong to?)

    2) I showed several years ago that every syntactic theory where subcategorization requires exact matches ("I select a DP, only a DP, and nothing but a DP!") overgenerates on a massive scale because we have no theory of categories, so all kinds of information can be indirectly encoded in category distinctions.

    Imho the main problem is that research has focused on arguing for or against specific feature systems/category hierarchies instead of identifying basic properties such a system must have.

    For example, is a category system a flat unordered set or are there necessarily implicational relations such that if X selects Z it can also select Y (e.g. everything that can select a numeral can select a noun)? Or more radically, could a natural language have a system where every lexical item is its own category, but only this one category? If not, why not?

    The same goes for feature systems. Rather than argue about the specifics of certain feature decompositions or feature geometries, we should focus on general properties a feature system must have. A recent example is the work by Bobaljik & Sauerland on deriving the *ABA generalization from content-agnostic feature combinatorics.

    A more radical approach is to completely give up on describing anything in terms of features and categories. That's the route I've been taking in my approach to the *ABA generalization and morphosyntax in general. I've also looked a bit at what happens if you remove category features from syntax, and it has some nice effects:

    For example, you predict that all categorial ambiguity is resolvable within a local context. Suppose that a head can no longer say "I want a noun" but instead has to say "I can take cat, or dog, or water, or ...". The problem here is that 'water' may also be a verb. But you can tell immediately that it isn't if it has already selected 'the'. In some cases you may have to look a little deeper, but you shouldn't have to look arbitrarily deep to find a disambiguating lexical item. As far as I can tell, this holds for pretty much all natural languages. If category features are real, it's much less clear why this property should hold since the grammar never faces any ambiguity to begin with.

    The generalization that heads don't care about the arguments of their arguments can also be explained in this fashion, but is mysterious in a system with category features since we have no criterion to rule out "Possessive determiner selecting an animate DP and an inanimate mass noun" as a possible category.

    1. @Thomas: what you say here about lexical categories seems to me to be pretty much mainstream – at least within DM, or any other approach that endorses category-neutral roots. What is the ontological status of this linguistic object "water" you speak of, such that it can be a noun or a verb? Well, the DM (et al.) take on it is that it is a syntactic terminal whose category is determined contextually. (In DM, this is done by the closest little-x head dominating it; but, formally speaking, this is contextual, rather than featural, specification of category.) Am I missing something?

    2. I had a longer post here that got eaten (argh). But long story short, decomposing LIs into roots + category-assigning heads doesn't get you around any of the problems. Without a theory of little x-heads, you can still proliferate them as much as you want and use them as buffers to encode all kinds of information. And you do not have categorial underspecification; within a derivation the category of each LI is explicitly encoded. As a result you can design very unnatural systems:

      "1-Local skipping selection": the head does not care about the category of its argument, but about the category of the argument of its argument

      "n-local skipping": the same, but now it's the argument of the argument of the argument... of the argument.

      "non-local skipping": the head only selects an argument if a phrase of category X has been selected anywhere inside of the argument, no matter how much earlier

      "Boolean selection": any random Boolean combination of skipping selection criteria

      "counting selection": selection is licit only if at least n categories of type X are contained in the selected phrase

      And so on, and so forth. Most of those systems are impossible in a system without categories/x-heads where selection is local. Once there's some encoding mechanism for LI categories, all these things become available and one needs a good theory of categories to block them.
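The encoding trick behind these systems can be sketched in a few lines. The following is a toy illustration under stated assumptions, not an implementation of any particular formalism: the lexicon, the label `V_D`, and the helper `can_merge` are all hypothetical. Selection is strictly local and requires an exact label match, yet a refined label lets a head track what its argument selected.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Item:
    form: str               # phonetic exponent
    cat: str                # category label, opaque to the grammar
    selects: Optional[str]  # label of the single complement, or None

def can_merge(head: Item, arg: Item) -> bool:
    """Local, exact-match selection: only the argument's own label is inspected."""
    return head.selects == arg.cat

# Hypothetical toy lexicon. "V_D" is just another atomic label, but the
# lexicon writer reserves it for verbs that themselves select a D.
LEXICON = [
    Item("the", "D", "N"),
    Item("cat", "N", None),
    Item("sleep", "V", None),
    Item("like", "V_D", "D"),
    Item("Z", "Z", "V_D"),  # in effect: "select a verb whose argument is a D"
]

by_form = {item.form: item for item in LEXICON}
z, like, sleep = by_form["Z"], by_form["like"], by_form["sleep"]

print(can_merge(z, like))   # True
print(can_merge(z, sleep))  # False
```

Nothing in `can_merge` looks past the immediate argument, so the skipping behaviour comes entirely from how labels were handed out in the lexicon. The same refinement step can be iterated to push information up from arbitrarily deep inside the argument.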

    3. @Thomas: I must be missing something. If you don't have categories, how do you generalize? You say that a head (for example the verb "like") says “I can take cat, or dog, ...” -- a list of nouns (never mind that I think transitive verbs combine with D, not N, and never mind that I am not 100% sure that transitive verbs actually subcategorize for any syntactic category). How does the learner know that the next transitive verb (say, "hate") takes the same list? Isn’t the list the category? Learners don't have to establish a new list of possible objects for each new verb, they just assume that the new verb combines with the things on the noun list.

      If you learn a new noun, you add it to the list, not just to the subcategorization frame for the verb you heard it with, and if you learn a new transitive head then the default assumption is that if it combines with one thing on the list, it combines with anything on the list.

      If it turns out that membership is predictable from some property that all of the things on the list share, then that’s insight as opposed to listing, but either way you have a category.

      As for the locality problem, I agree that we can’t have an unconstrained algebra of categories that allows non-local selection. So what you must be saying is that modeling categories as lists somehow prevents the sort of category abuse that would allow 1-Local skipping and so on. I guess I have to read your paper.

    4. Having categories in the learner is not the same thing as having syntactically represented categories. A learner might say "alright, this X is new and occurs in a place where Y occurs, so I'm gonna put X in the same bin as Y". So then you can build an entire system of bins and define a mapping from lexical items to bins. Alex Clark has something along those lines, I think it's in his paper The syntactic concept lattice: another algebraic theory of the context-free languages.

      But that's not the same thing as putting a category feature on a lexical item, and the difference arises when there are multiple lexical items that have the same phonetic exponent but belong to different bins. If your syntactic representation tells you "this node is X, with category C", there is no categorial ambiguity, and you get all the problems that I described above. When it only tells you "this is X", things are trickier because X might be a C or a D. Now suppose that the grammar must be able to resolve this ambiguity within a local structural context (and keep in mind that without category features, all the DM-style little x-heads will look pretty much the same, so they don't help). Then many of the unwanted selectional systems become impossible.

      There's also a more technical way of saying this that you might find conceptually more appealing. We can say that category features do exist, but a category system must satisfy the property that it is impossible for all three of the following to hold simultaneously:

      1) X and Y have the same base form (~ phonetic exponent),
      2) X and Y belong to different categories,
      3) ignoring category features on lexical items, there is some fixed n such that at least one syntactic configuration S of size n can contain X but not Y.

      I think that's pretty much on the right track empirically, with the important open issue being how small n is.

    5. Sorry, messed up statement 3):

      3) ignoring category features on lexical items, there is some sufficiently large n such that every syntactic configuration S of size n that contains X may also contain Y instead.

      So for example, you cannot refine category X into X-him and X-nohim to distinguish between heads whose argument is a subtree (not) containing *him*. That's forbidden because *him* may be arbitrarily far away from the X-head. So X-him and X-nohim would look the same, have different categories, but cannot be distinguished within any (category-free) syntactic context of size n.

    6. @Thomas: I don’t understand the relevance of phonological exponence of lexical items, so I’m still missing something, but setting that aside, here’s the simplest theory of categories, features, and selection I can imagine.

      The inventory of heads in a language is partitioned into categories (the learner assigns a category to each head; these are like your “bins,” I guess). Each head has one and only one category (for those of you who like roots, let "root" be a category for purposes of this exercise; alternatively, roots aren’t heads). A head may also bear one or more features, either bundled (no structure) or in a stack (ordered), but without further structure (features don’t have specifiers).

      A feature is a combination of a category (or possibly some other atomic label for a class of elements, such as [wh]) with a second-order property. A second-order property is an instruction to perform a syntactic operation (I’m combining elements from such sources as Stabler 1996, 'Derivational minimalism,' Rizzi 2010, 'On the elements of syntactic variation,' Adger and Svenonius 2011, 'Features in minimalist syntax,' and most recently, my forthcoming paper 'Syntactic features').

      A subcategorization feature would be a combination of a label for the class of whatever is subcategorized for -- by hypothesis a category -- with a second-order property requiring the head bearing that feature to merge as soon as possible with something of that category. The number of second-order properties is restricted by the number of distinct kinds of syntactic operations, as in the Minimalist grammars (up to twice as many, in the sense that a probe-goal pair might involve one operation but two second-order properties, e.g., [uwh] probes for an [iwh] goal).

      There are no other symbols and no feature embedding, hence no more complex categories (Adger 2010 'A minimalist theory of feature structure'); a head has a stack or bundle of features, which don’t themselves bear features -- only second-order properties determining parametric options having to do with move, agree, licensing, and spell-out.

      Some versions of Minimalist Grammar could be characterized as having a third-order rank of properties, like merge-left vs. merge-right, or head vs. phrase movement, or overt move vs. covert move, as additional specification of a merge instruction. But these are not tantamount to feature embedding, because the third order properties are distinct from the categories and features and second order properties (which could then be simplified, as when Norbert reduces Agree to Merge, I think). The combinations are still finite and restricted by the total number of kinds of syntactic operations.

      Your 1-local skipping selection would seem to require two additions to this simple theory of features and categories: it would require a head to bear a feature consisting not of a category plus a second-order property, but a feature consisting of a *variable* over categories, further bearing a subcategorization feature. So it involves a category variable, which I don’t need as far as I know, and also feature embedding, which has been independently argued against in Adger 2010 and Adger and Svenonius 2011 already mentioned.
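A rough way to picture the flat feature system described above is as a data structure in which a feature pairs an atomic label with a second-order property and cannot itself contain a feature. This is only a sketch under assumptions: the `Op` inventory, the class names, and the runtime check are illustrative choices, not part of any of the cited proposals.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple

class Op(Enum):
    # Hypothetical inventory of second-order properties.
    SELECT = "select"
    PROBE = "probe"
    GOAL = "goal"

@dataclass(frozen=True)
class Feature:
    label: str  # a category, or another atomic label such as "wh"
    op: Op      # instruction to perform a syntactic operation

    def __post_init__(self):
        # No category variables and no embedding: labels must be atomic strings.
        if not isinstance(self.label, str):
            raise TypeError("feature labels are atomic; features cannot embed features")

@dataclass(frozen=True)
class Head:
    form: str
    cat: str
    features: Tuple[Feature, ...] = ()  # a stack or bundle, with no further structure

like = Head("like", "V", (Feature("D", Op.SELECT),))

def try_embedding() -> bool:
    """Attempt something like Z[subcat: X[subcat: D]]; True if it is rejected."""
    try:
        Feature(Feature("D", Op.SELECT), Op.SELECT)
    except TypeError:
        return True
    return False

print(try_embedding())  # True
```

The check rules out a feature whose label is itself a feature, which is the recursion that a skipping-style selector would need. It places no restriction on which atomic labels may be declared in the first place.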

      I would furthermore assume that what string of phonological segments is associated with a head is independent of the syntactic identity of the head, so homophony is irrelevant to the grammar, though it might play a role in learning, for example in a violable principle of homophony avoidance.

    7. I meant to put in a link to the unpublished paper I mentioned above, an encyclopedia entry on syntactic features:

    8. From my perspective that's a lot of stipulations to put in place. I don't want to stipulate a specific category or feature system. I want to identify general properties that hold of categories and subcategorization in natural language independently of how subcategorization is encoded in the grammar. That is to say, I want certain base requirements that any proposed category system has to meet in order to prevent non-local skipping, Boolean selection, counting selection, and so on.

      More importantly, though, I don't understand how your system rules out any of those unwanted options (unfortunately I haven't had time yet to read the paper; thx for the link though). As far as I can tell, the choice of head and category is still unrestricted.

      Suppose I posit a grammar that distinguishes two heads X and Y; both are spelled-out as "like", X has category V, Y has category V_refl, and X takes any DP as complement, Y only reflexive DPs. What aspect of your system would that violate?

      Since you assume that categories partition the set of lexical items, there must be duplicates to accommodate cases of categorial ambiguity such as "water". A partition maps each lexical item to exactly one category, so there must be two different lexical items spelled-out as "water", one a V, one a N. So the fact that X and Y are pronounced the same isn't a problem.

      Furthermore, the choice of subcategorization feature must be allowed to be at least partially determined by the category of a lexical item, you say so yourself. So that part of X and Y works out too.

      And by the same logic one may posit a head Z that only selects V_refl. And now we have a head selecting for verbs selecting reflexives --- which we do not want.

      But perhaps V_refl violates your ban against complex categories (a useful term, I'll run with it)? If so, I don't see how. You do not define what makes a category complex, and that's exactly what makes subcategorization so hard to constrain. As far as the grammar is concerned, V_refl is not a complex category. It is just a category. V and V_refl could just as well be called "X" and "Y", or "17" and "Seth Green".

      That is the major loophole in all theories of subcategorization, and so far it's only been plugged by stipulating a specific list of non-complex categories. I find that unsatisfying. I want to know what encoding-independent aspect separates complex from non-complex categories, so that we can state an effective, formal universal on category systems that will make it impossible to smuggle in complex categories through the backdoor.

    9. It seems to me that the situation you describe here isn't what you called 1-local skipping, it's just selection for a category that has limited selectional properties. I don't see any reason to rule that out.

      I thought that what you called 1-local skipping was when a subcategorizing head doesn't care about the category it merges with, but does care about the subcategorization features on that category. I agree we don't want that.

      I believe it is ruled out by Adger's (2010) no complex categories generalization, which bans feature recursion (also promoted in our 2011 joint paper). For your head Z to select all and only things that subcategorize for your category reflexive DP, Z would have to have a subcategorization feature for a variable over categories which are specified as subcategorizing for reflexive DP. So their subcategorization feature (for the variable) would have to contain another subcategorization feature (for the reflexive). That's feature recursion, common in HPSG but eschewed --- and in my opinion appropriately banned --- in Minimalist Grammar. It would be something like Z[subcat:X[subcat:Reflexive DP]], where Z is a head but X is a variable over categories of head.

      I’ve never heard of a language in which a head, say some kind of aspect or a desiderative or something, selects only for reflexive verbs. So it seems to me that learners don’t tend to posit a category V-refl, distinct from V, for verbs that happen to exclusively require reflexives. I believe that's because category (as opposed to subcategory features) is determined by external distribution, i.e. what the maximal projection of a head is embedded under. Verbs with and without reflexives tend to appear under the same set of tenses and aspects and embedding predicates, and the learner's expectation is that if that's mostly true, it's true -- so they assume that reflexive and nonreflexive V are the same category, possibly with different featural specifications such as subcategorization requirements.
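
      The ban on feature recursion mentioned above can be pictured as a depth check on subcategorization features. The sketch below is only an illustration under stated assumptions: the Cat class and both helper functions are hypothetical, and reading "no complex categories" as "subcat nesting depth of at most one" is a simplification, not Adger's (2010) own formulation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cat:
    label: str                      # category label, or a variable such as "X"
    subcat: Optional["Cat"] = None  # what this category selects, if anything


def subcat_depth(cat):
    """Count how many subcat features are nested inside one another."""
    depth = 0
    while cat.subcat is not None:
        depth += 1
        cat = cat.subcat
    return depth


def has_feature_recursion(cat):
    """True if a subcat feature itself contains a subcat feature."""
    return subcat_depth(cat) > 1


# Z[subcat:X[subcat:Reflexive DP]], with X a variable over categories of head
Z_RECURSIVE = Cat("Z", Cat("X", Cat("Reflexive DP")))
# An ordinary verb that subcategorizes for a reflexive DP
V_PLAIN = Cat("V", Cat("Reflexive DP"))
# A head selecting an atomic label, whatever that label is called
Z_ATOMIC = Cat("Z", Cat("V_refl"))

print(has_feature_recursion(Z_RECURSIVE), has_feature_recursion(V_PLAIN))
```

      Note that a check of this kind only sees the encoding: Z_ATOMIC has depth one and passes, because nothing in the label V_refl is visible to the check.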

    10. 1) Once you allow something like Z, you get massive overgeneration. That's a corollary of the equivalence of category-refinement and MSO constraints, which I described in my Glossa paper.

      2) Z produces the behavior of 1-local skipping, even though it is not implemented as 1-local skipping. That's an instance of the power of subcategorization in a free category system.

      3) No subcategorization variable is needed. I just posit a head Z that selects for V_refl, another Z that selects for X_refl, and so on. That's exactly parallel to saying that *the* may select mass nouns or count nouns (two options), whereas *a* may only select count nouns (one option).

      4) External distribution of maximal projections is exactly what would lead to a V and V_refl split because VPs containing a reflexive don't have the same distribution (they need a phi-matching antecedent in a c-commanding position).

      But I feel like I'm doing a horrible job at explaining why this is a problem that's so hard to avoid at a formal level. Time to write a paper, I guess ;)

    11. This reminds me somewhat of the pre GPSG arguments that CFGs were inadequate because they couldn't model subj verb agreement. The reasonable intuition is that nonterminals in the grammar should correspond to syntactic categories, rather than to bundles of information that flow up and down the tree. Unless we have an unbounded amount of such information we can stick it in a single atomic category at the price of having an excessively large and redundant grammar.

      If you have an unbounded number of categories then there isn't really a way to rule out this sort of thing, though it is cheating in a sense.
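
      The trade-off described here, bounded feature information folded into atomic nonterminals at the cost of a larger grammar, can be sketched as follows. The rule schemas, the feature inventory, and the function expand are all made up for illustration; symbols that share a variable must agree, and every feature bundle becomes part of an unanalyzable category name.

```python
from itertools import product

# Hypothetical agreement features with bounded value sets.
FEATURES = {"num": ["sg", "pl"], "per": ["1", "2", "3"]}

# Each symbol is (name, variable); symbols sharing a variable must agree.
SCHEMAS = [
    (("S", None), [("NP", "a"), ("VP", "a")]),   # subject-verb agreement
    (("NP", "a"), [("D", "a"), ("N", "a")]),
    (("VP", "a"), [("V", "a"), ("NP", "b")]),    # object NP varies freely
]


def expand(schemas, features):
    """Compile feature-annotated schemas into rules over atomic nonterminals."""
    bundles = ["_".join(values) for values in product(*features.values())]
    rules = set()
    for lhs, rhs in schemas:
        variables = sorted({var for _, var in [lhs, *rhs] if var is not None})
        for choice in product(bundles, repeat=len(variables)):
            env = dict(zip(variables, choice))

            def atom(symbol):
                name, var = symbol
                return name if var is None else f"{name}_{env[var]}"

            rules.add((atom(lhs), tuple(atom(s) for s in rhs)))
    return rules


RULES = expand(SCHEMAS, FEATURES)
print(len(SCHEMAS), "schemas expand to", len(RULES), "atomic rules")
```

      With two number values and three person values, the three schemas already expand to 48 rules over names like NP_sg_3, and as far as the resulting grammar is concerned every one of those names is a simplex category.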

    12. @Alex: Yes, that's exactly it. GPSG's AVMs are what Peter would call complex categories, and from the linguist's perspective they are indeed complex. But in the grammar itself they can be treated as unanalyzable atomic objects so that every category is simplex.

      I believe there are ways of avoiding this that revolve around requiring closure properties and certain entailment relations for category systems. But it's tricky, and there isn't that much empirical data to build on because selection hasn't been studied nearly as much as displacement.

    13. @Thomas, re your 4 on March 3rd: Subjects that bind reflexives are not different from subjects that don't; so from where I'm sitting the external environments of V with and without reflexive objects look identical. I think category can be determined on the basis of very local information.

  2. I agree with Norbert that there are good reasons to treat structure and feature content differently, and in fact I would argue further for separating feature content (like past, plural, distal) from the kinds of syntactic instructions features can carry (like move or agree, or whatever distinguishes those two).

    But I think there are good candidates for PoS-type arguments even among the contentful features. Take phi features of person, number, and gender, which have some special relationship to DP modification and to argument agreement. One of the phi-features is number, where languages very commonly make sg-pl or sg-du-pl distinctions and occasionally a handful of other distinctions (trial, paucal, and a few others, all of them, according to Daniel Harbour, analyzable in terms of primitives like atomic, minimal, and augmented).

    As a phi feature, number marking commonly shows up in nominal morphology, including on determiners and adjectives, agreement morphology on the predicate, and the pronominal system.

    A question is, what is it that makes the singular-plural distinction linguistically special in this particular way, as opposed to, say, a three-way distinction between (i) singulars, (ii) collections which are connected or otherwise move together (bunches of grapes, fingers on a hand, wolves in a pack, a swarm of bees, floats or weights on a net, fringe on a garment, ripples on a pond), and (iii) pluralities that don't move together (groups of stones, rabbits, sunflowers, arrows not quivered, etc.)? Or items and collections of items which are manageable enough to carry versus ones that aren’t? Or any number of other ways of categorizing ways of grouping things.

    The distal-proximal distinction is so common that it is a good candidate for a universal feature of demonstratives, but it's not a phi-feature. In principle it could be a phi feature and get copied onto the noun and any attributive adjectives, and get coded in the argument agreement on the verb, the way plural commonly does -- but it doesn’t.

    So you commonly find "these-du good-du hunters-du caught-du.subj-pl.obj some-pl brown-pl kangaroos-pl" but not "these-proximal good-proximal hunters-proximal caught-proximal.subj-distal-obj that-distal brown-distal kangaroo-distal." There are some cases that look a little like proximal/distal agreement, but they're rare and could possibly be analyzed another way, whereas plural agreement is common and often mechanical.

    1. A possible angle on Peter's point might be that deictic features help to specify the referent of the NP by means of its relationship to the discourse situation, resembling in this way Person, which is also very reluctant to spread, as various people have noticed (I think Halldor Sigurðsson has written about this, but can't remember what or where).

      Gender and Number otoh help to specify the referent in terms of its intrinsic features, while Case specifies its relationships to other things, events and situations in the situation being talked about, rather than the situation being talked in.

      Why this should affect the tendency to spread is mysterious, but it is at least a correlation.

    2. @Peter: one example of which you are no doubt aware, but perhaps other readers of the blog are not, which looks exactly like proximal/distal features "spreading" (i.e., being marked on the predicate) is Ritter & Wiltschko's work – specifically, regarding Halkomelem (Salish). See here:

    3. Thanks but I think the Halkomelem case is very different from what I was suggesting isn't found -- in Halkomelem, you don't copy a distal/proximal feature which is inherent on an argument onto an argument slot in the predicate, instead you mark the predicate according to whether the event is distal or proximal. So distal/proximal is still behaving differently from number (and, another point in my comment, number in this sense is pretty much always singular/(dual)/plural, not "connected quantities" or "quantities that fit in one's hand" or any other of an infinite variety of conceivable ways to classify referents).

    4. Right, I agree on both counts. I was just pointing out something that comes close – or at least, closer than I thought possible until reading that paper.

  3. Thus we find a thematic layer embedded into an agreement/case layer embedded into an A’/left periphery layer. I know of no decent argument arguing against this kind of G organization. And if this is true, it raises the question of why it is true. I do not see that the class of dependencies that we find would significantly change if the onion were inversely layered (see here for some discussion). So why is it layered as it is?

    At least part of this seems amenable to an account in functional/processing terms. As a first shot, the periphery (left and right) is a key site for discourse-relevant stuff; left and right periphery positions have long been known to be key sites for things like topicalisation, question word fronting, and discourse markers & particles (cf. Lambrecht on information structure, basically all work in interactional linguistics, or Martina Wiltschko's work for a more formal take on same). Given this, it is actually highly unlikely that a reversely layered onion would be able to work in the same way: given how we use language in interaction, you want to have the interactionally relevant stuff in the periphery where it can do things like draw attention, manage discourse expectations, and so on.

    I'm not sure what the account would be for the relative ordering of thematic vs. case/agreement features but it is probably a useful idea to look to functional/processing considerations first — things like salience, reference tracking, given/new etc. are likely to play an important role in explaining the layered structure of the clause (indeed, in using that phrase, I realise that this is one of the slogans of Role and Reference Grammar, which offers another functionally motivated take on this).

    1. @mark: if years of work on case & agreement have taught us anything, it's that reducing them to things like salience, reference tracking, given/new etc. is hopeless. They might be historically rooted in such things (I have no way to judge that assertion); they might be statistically correlated with such things (though I can say with confidence that the correlations in question would not be crosslinguistically stable); but they do not deterministically reflect any such extra-linguistic categories.

    2. @Omer Preminger: Well, given the hybrid status of languages as constrained by brains yet shaped by and for interaction, historical motivations and statistical correlations are about the best one can hope for, and perfectly admissible as empirical evidence; indeed few if any functionalist/processing theories would expect or predict fully deterministic accounts of structural relations in language. I'm inclined to say that's not a bug, but a feature.

    3. @mark: Looks to me like you and I are using the terms "theory / account of" very differently... Agreement and case are about as deterministic as anything you'll find (yes, there are errors and "agreement attraction"; but as Gleitman et al. taught us, you can also bend people's judgments about "how even/odd is this number?"). So I have no qualms saying that agreement systems _developed_ for this or that functionalist/processing reason – like I said, I simply don't know how to judge such claims, so I'm happy to leave it to the folks who do – but that's not the thing I am trying to give a theory/account of.

  4. With respect to the "anything does not go" point, I think that examining sign languages can be illuminating. I am not a formal expert on American Sign Language, but space definitely seems to be a phi feature, i.e. you get subject- or object-verb agreement depending on the assigned location of the referent in space. So it seems exactly like the availability of space affords sign languages the option of using space as a phi-feature while preventing its use in spoken languages.

    1. Schlenker has a good paper on the use of spatial "indices" in ASL in NLLT in 2016. The feature involved is not the proximal-distal distinction seen in the world's demonstratives.

  5. "The atoms that do the syntactic heavy lifting are the functional ones, and for this we have no good theoretical unification (at least so far as I am aware). Currently, we have the functional features we have, and there is no obvious theoretical restraint to postulating more whenever the urge arises. Indeed, so far as I can tell, there is no theoretical (and often, practical) upper bound on the number of possible primitive features"

    Hm. I wonder if there are two intermingled issues here: how many features do we have (and could we have) and what are they?

    The first question seems relatively tractable to me. Suppose features are all abstract and ±. How about "we have as many as needed to make binary structures suitable (for instance for the LCA or for lexical insertion) but not more?" for an answer? Note that this would ultimately collapse generalizations about features and categories into structural ones (because features exist to deal with structures).

    The second (that is the why person but not edible? or why number but not proximal?) is more mysterious to me. In fact, I find it especially mysterious for those supposedly linked to the left-periphery (why focus? hell, if FL evolved independently from communication, why ±wh?).

    If I had to offer a speculation, I would say (gun to my head) that all commonly found features ultimately reflect one specific human cognitive ability which is clearly linked with language acquisition (yet is rarely discussed explicitly as such, at least in syntax), namely pointing. But the road is long between that and even the semblance of a theory.

    1. @Olivier: Just a minor point: there is, I think, pretty good evidence that features in syntax are not +/−, but privative. That is, there are just features vs. the absence of those features.

      (Of course, any privative system can be recast as a +/− system in which the '−' values are systematically inert to syntactic operations, but that just seems like an obfuscation to me.)

      If you're interested in the relevant arguments, here are some slides from an invited talk at BLS from last year.
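
      The parenthetical point about recasting a privative system as a +/− system with inert '−' values can be illustrated with a small sketch. The two-feature inventory and all function names below are hypothetical, chosen only for the example; the slides referred to above are not the source of this code.

```python
from itertools import combinations

INVENTORY = ("participant", "plural")  # hypothetical feature inventory

# Every privative bundle: a feature is either present or simply absent.
ALL_BUNDLES = [
    frozenset(c) for r in range(len(INVENTORY) + 1) for c in combinations(INVENTORY, r)
]


def to_binary(privative):
    """Recast a privative bundle as +/- values over the whole inventory."""
    return {f: (f in privative) for f in INVENTORY}


def to_privative(binary):
    """Keep only the '+' values; '-' values carry no extra information."""
    return frozenset(f for f, value in binary.items() if value)


def probe_privative(probe, item):
    """A probe matches if every feature it names is present on the item."""
    return probe <= item


def probe_binary_inert_minus(probe, item):
    """Same probe over +/- bundles; '-' values are never consulted."""
    return all(item[f] for f in probe)


for bundle in ALL_BUNDLES:
    print(sorted(bundle), to_binary(bundle))
```

      Under these assumptions the two encodings are interchangeable: converting back and forth loses nothing, and a probe gives the same answer in both. The binary version only differs from the privative one if some operation is allowed to refer to a '−' value, which is exactly what the inertness assumption rules out.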

    2. Thanks Omer. Aside from the empirical validation (important, of course), I think the privative approach you advocate is also more natural. I also believe that such an approach and especially the specific cases you discuss in K'ichean are evidence in favor of ultimately reducing features to structural conditions as a way to solve Norbert's conundrum.

  6. Thanks for discussing my blog post! As I have said repeatedly elsewhere (e.g. here), I don't deny the language instinct, and I find the PoS for it quite convincing. But being more of a linguist than a philosopher, I'd like to see worked out proposals for structural universals that actually work or that can at least be tested. So I don't see that claimed structural universals (ECP, X' theory, Binding theory) fare much better than substantive universals. I know that many of my empirically minded colleagues are more optimistic for the future, but what seems to be clear is that even after five decades of generative grammar, we have many more new ideas than stable results.

  7. I am not surprised that you don’t buy the claims, though I am glad you buy the logic. I would note, however, that contrary to something your last sentence implied, there has been remarkable stability in the basic findings in GG since its inception. So, we have no good evidence that there exist lowering rules, or that “adjuncts” can extract out of islands, or that movement can target non CC positions, or that control into embedded object positions exists or that a head can select the complement of its complement or...I could go on. These principles are firmly established and I take your skepticism about them to be more a reflection of a (no doubt admirable) general skeptical attitude than a conviction based in an appreciation of the material. But that is just my view.

  8. There may be no good enough evidence for lowering rules, but actually, there is no good enough evidence for most of the tree structures that we see routinely in syntax papers – so the claim that no lowering rules exist is almost impossible to evaluate. And in my perspective, the notions "head", "adjunct", and "embedded object" are substantive categories, just like "verb" and "noun". Note that I'm not generally skeptical – in fact, in my Stuttgart talk a few days ago, I made a large number of empirical claims about universals (of the Greenbergian type). I would say that I am merely asking for more rigour in stating the universal claims.

  9. I don't think there is a large consensus anymore that it would be possible to find very deep or interesting substantive universals in phonology either. The main piece of evidence for features is that certain sounds behave as a natural class for phonological processes and there is quite a catalogue of sounds behaving in the same way for which there is no obvious 'natural' phonological feature. And even rather superficial looking universals such as 'every language has a [t] sound' require quite some level of abstraction if we want them to be true (for example: is it enough if there is some sound which has [t] as an allophone?) There is even no consensus on what the substance should be (whether it is articulatory or acoustic, for instance).
    Although there is still a relatively large group of phonologists who work under the assumption that there is some universal list of phonological features, there is also a rather large group of scholars who believe that there may be features but these are purely abstract labels, which the learner can then link to some set of phonetic events. All 'substantive' universals in this respect are then supposed to follow from mechanics, acoustics, etc., alone; e.g. the fact that all or most languages have a [t] follows from the fact that this sound is easy to make and easy to distinguish. But what makes this into language is that we force it into the frame of features.

    1. Martin is being somewhat disingenuous here. I was at his talk at Stuttgart where he presented some of the material he alludes to and there were, ahem, acknowledged large gaps in his proposals. Some of the universals were not all that universal and the functional explanations for these universals have more the flavor of just-so stories than actual implications of the assumptions. So, black pots and kettles seems an appropriate point to make as regards the skepticism regarding the claim that "there is no good enough evidence for most of the tree structures that we see routinely in syntax papers." I would differ, and have, but I will let readers decide for themselves.

      Last point: Martin also misspeaks when he points to the problem being one of "rigors." The principles are stated rigorously enough. Martin's point, if I get him, is that there is little data to support the claims (see quote above). But this is NOT an issue of rigor but one of empirics. I understand why Martin finds this wanting: he is besotted with surface forms (hence his attention to Greenberg Universals). The generalizations I mentioned all have to do with the structure and operations of Gs, not their realized surface forms. This means examining Gs, not their outputs. I know that Martin finds this sort of exercise wanting. So be it. My aim is not to convince him. But people should appreciate that the problem Martin has with GG has to do with the idea that Gs have design features restricted by FL/UG. This is already an abstraction too far for him. Given this, his skepticism is more than a tad tendentious.

  10. No, the non-convergence of proposals in GG is not primarily an empirical issue – it's not only that cross-linguistic evidence is not taken into account sufficiently when making general claims. The biggest problem is the point that Marc makes: "The main piece of evidence for features is that certain phenomena behave as a natural class" – but there are many different possible classifications, and most claims in GG papers rely on one of many possibilities that happens to be currently widely adopted (e.g. strict binary branching of trees). This means that the proposals are not really testable, even if one had a lot of cross-linguistic data.

    1. Though there is something charming about the oracular style, it does not lend itself well to debate. I have provided some examples of what look to be pretty well grounded universals: no lowering rules, no extraction of adjuncts out of islands (ECP-adjunct effects), head to head selection and NO selection of the properties of the complement of one's complement. You are skeptical, but this seems to me currently an entirely rhetorical stance. Wholesale skepticism is cheap. Detailed skepticism is not. So, you are skeptical that Gs work this way. Fine, let's see some detailed arguments. I hereby invite you to choose a budget of Chomsky Universals and show that they fail. Take some hard ones (I would love to see the evidence AGAINST binary branching (though, IMO, if true this principle is not that solidly grounded theoretically)). Show us that Gs systematically violate them. Then we will have something on the table to work with. So, to repeat, I invite you to post a detailed version of your skeptical position for all to see and evaluate.

  11. Thanks for continuing the discussion, but the problem is not that any claims are wrong - the problem is in the presuppositions, e.g. that all of syntax works in terms of binary trees. Linguists who grew up in the GG paradigm may simply not think about this, which is why I keep emphasizing the point: binary branching trees are not a discovery, but a notational decision. And transformational rules themselves are notations for which there are alternatives. So I cannot evaluate the claim that no lowering rules exist - I see this as a claim about notational preferences, not one about languages. It is true, however, that wh-fronting is frequent and wh-rightward shifting is hardly attested, and this is indeed a discovery of linguists inspired by GG (though one that involves the substantive notion wh-).

    1. So, I take it that you are not accepting my offer to make your case in detail. Is that right? Look, binary branching has implications for c-command and principles A, B and C, as well as binding. Transformations have implications for scope, binding etc. These theories have empirical backing and theoretical motivation. To argue that you cannot argue that they are wrong suggests, to me, that there is something right about them. But, to repeat, facile skepticism is, well, facile. I invite you again to make your case in detail so that it can be evaluated. If you cannot do this, then I would draw the conclusion that perhaps it is not makable because it is wrong, not because the position you are skeptical of has little content. The latter is an interesting and very strong claim, one that requires a lot more than simple assertion to be taken seriously, or so I would hope. I make the offer: make the case. I'd love to hear it.

    2. I am somewhat sympathetic to Martin's point. Many of the generalizations in GG are stated in terms that are more specific than can ever be supported empirically because of mathematical equivalences. Binary branching, for instance, only has implications for c-command given a specific definition of c-command. Change the latter, and you can also relax the former without losing any empirical coverage.

      Here's a concrete example: define c-command not in terms of dominance over phrase structure trees but derivationally in terms of feature checking (assuming features for selection). Then it does not matter at all what the output structure looks like, it could even be a string and you'll still have c-command. All the examples listed by Avery below would come out exactly the same.

      But my take home message from that isn't that the idea of binary branching and c-command is insufficiently supported. Rather, these GG claims home in on more abstract properties that we do not have the conceptual vocabulary yet to express in suitably general terms. Imho that's exactly what makes mathematical approaches so insightful, they make it much easier to tease apart substance and notation.

    3. I don't think it's ever been a particularly controversial point within GG that it's difficult to find empirical evidence for specific mechanisms. The fact that two theories are mathematically equivalent in some respect does not rule out the possibility that they can be distinguished empirically. It does, however, often mean that the usual methods (acceptability judgments, etc.) won't suffice to do so. Chomsky has a lot of good discussion of these issues in the intro to LGB. I particularly like the following two paragraphs (which he quotes from Chomsky 1977):

      The pure study of language, based solely on evidence of the sort reviewed here, can carry us only to the understanding of abstract conditions on grammatical systems. No particular realization of these conditions has any privileged status. From a more abstract point of view, if it can be attained, we may see in retrospect that we moved towards the understanding of the abstract general conditions on linguistic structures by the detailed investigation of one or another 'concrete' realisation: for example, transformational grammar, a particular instance of a system with these general properties. The abstract conditions may relate to transformational grammar rather in the way that modern algebra relates to the number system.

      We should be concerned to abstract from successful grammars and successful theories those more general properties that account for their success, and to develop [universal grammar (Chomsky's elision)] as a theory of these abstract properties, which might be realised in a variety of different ways. To choose among such realizations, it will be necessary to move to a much broader domain of evidence. What linguistics should try to provide is an abstract characterization of particular and universal grammar that will serve as a guide and framework for this more general inquiry. This is not to say that the study of highly specific mechanisms (e.g. phonological rules, conditions on transformations, etc.) should be abandoned. On the contrary, it is only through the detailed investigation of these particular systems that we have any hope of advancing towards a grasp of the abstract structures, conditions and properties that should, some day, constitute the subject matter of general linguistic theory.

    4. "the problem is in the presuppositions, e.g. that all of syntax works in terms of binary trees. [...] binary branching trees are not a discovery, but a notational decision. And transformational rules themselves are notations for which there are alternatives. So I cannot evaluate the claim that no lowering rules exist - I see this as a claim about notational preferences, not one about languages."

      As an outsider from linguistics, I have always been quite mystified by this point, which apparently proves way too much or is tautologous. Let's start with a paraphrase: So I cannot evaluate the claim that there exists a central force in 1/r^2 between massive bodies - I see this as a claim about notational preferences, not one about physical objects. Why, yes, that would be true, but rather uninteresting as an epistemological observation. The point, easily understood in that case, is that this notational preference leads to precise, new empirical predictions, and these can be evaluated. That's just normal science.

      So notations are notations, agreed, but so what? The crucial criterion is what one can learn from a general statement like "no lowering rules" and in particular 1) are the notations clear and well-defined enough so that they can be applied in other contexts? and 2) are the claims clear and general enough so that they can be evaluated independently (especially, in new situations where they could be false)?

      I claim that binary branching, as typically used in minimalist syntax, passes these elementary tests, and the proof is in the pudding: I can read an account of wh-movement in English and Chinese and deduce, all by myself, an account of wh-movement in Japanese and Spoken French, complete with empirical predictions (which may or may not be correct, but which exist).

      Or to be concrete, I can read the account of wh-in situ in Japanese in Miyagawa and deduce (all by myself) the following paradigm in Spoken French and English.

      Iwen, il a apporté quoi? (Iwen, he has brought what?)
      *Personne, il a apporté quoi? (No one, he has brought what?)
      What did no one bring?

      Assuming the general framework and binary branching and movement, I just made this prediction, about a language, on the basis of a related but quite different property of a completely different language. No reference to binary branching, transformation or whatever in the prediction. It's a plain empirical prediction (which turns out to be correct, but in some sense that is not even the main point). How is that different from Newton making the notational choice of positing a central force in 1/r^2 and predicting that planets orbited the Sun in ellipses?

      So of course one does not evaluate *in the same way* the claims "no lowering rules" or "there are phases" and the claim ""Je suis je" is ungrammatical" but that doesn't prevent *in principle* an evaluation of theoretical claims in linguistics, at least not more than in physics.

    5. Unfortunately, in practice it doesn't often work like this... "Movement" is not directly observable, and neither is c-command (or "higher"/"lower" positions). Until 1988, it was normal to assume that a clause like "Kim gave Lee money" has a ternary-branching VP, with very different c-command relations. Jackendoff argued forcefully for this older view, and he didn't lose the argument – but the mainstream ignored his objections and adopted the general assumption of binary branching and what it entails for binding etc. So the current mainstream view is not a discovery, but a convention of a certain (influential) group of scholars.

    6. The fact that gravitation is in 1/r^2 is also not directly observable. Who cares? Are the empirical predictions that follow from c-command (or "higher"/"lower" positions or whatever) directly observable? Yes? So where is the problem?

      "but the mainstream ignored his objections and adopted the general assumption of binary branching and what it entails for binding etc. So the current mainstream view is not a discovery, but a convention of a certain (influential) group of scholars."

      Personally, I really couldn't care less about who is or is not supposed to be influential. But you say *the mainstream ignored his objections* OK, assuming you're right, this entails that they were objections! That is to say, it entails that the binary branching model made directly observable predictions (wrong one, according to you, or according to Jackendoff). So where is the problem? That's just completely normal.

      Say the predictions are wrong. Show us the better alternative predictive model. But why suggest that everything is unintelligible because there are abstract notions, or that everything is a fad? (I admit the latter gambit often feels slightly directly insulting to me, because I'm a complete outsider who just happens to read papers in linguistics, so I resent the implication that somehow I am deluded for finding a meaning in them or that I believe there is one just because I want to hang with the cool kids or something.)

    7. but the mainstream ignored [Jackendoff's] objections

      The objections weren't really ignored. Larson wrote a response to Jackendoff's paper (which was partly directed at Larson 88), and Jackendoff wrote a response to that in turn, IIRC. Strict binary branching is an odd example to choose, as few if any analyses in generative grammar depend on strict binary branching. Some analyses depend on particular structures not being flat (most obviously ditransitive VPs), but one can argue that certain structures aren't flat without committing to the hypothesis that all structures are binary branching.

      In practice, simple 'flat' trees come at the price of more complex structural relations, and simple structural relations come at the price of more complex trees. The mainstream has chosen the latter trade-off. In the end nothing very much seems to hinge on this.

    8. I don't think that 'Movement' is any more obscure than central functionalist terms such as 'Grammaticalization', and possibly less obscure, although perhaps that is because I have spent a lot more time thinking about Movement. In both cases, there are clear examples (fronting rules in Icelandic, the changes from free demonstratives to articles and clitic pronouns on verbs in Romance languages), and murky areas + not so clear theoretical verbiage.

    9. Grammaticalization is obscure indeed, like many concepts in diachronic linguistics. Movement is often a useful notational device, but I'm not sure how one could tell whether it's somehow real. GG often tries to use notations for explanation, but I think that this should only be the last resort.

  12. One point here is that there is no need to *assume* that *all* syntax is binary branching in order to collect and contemplate arguments that at least some of it is. So one point that I find interesting is that if we assume binary branching, the standard spec-before, complement-after order, and that quirky case requires a kind of tree-adjacency, then we get explanations for the fact that in Icelandic, you get to have at most two quirky case-marked NPs (because a third at the top would have the one in spec in the way), and of various differences in properties between the first and second quirky NP.

    If you also suppose that the positions of the two objects in double object constructions are not strongly fixed, various other things also seem to get explained, as detailed in work by Anagnostopoulou, Georgala (Efthymia), and others.

    Sometimes it is relatively easy to restate the results in nonbinary terms, and sometimes it's harder, but binary branching does *not* have to be taken as an assumption.

    I'll add that although binarism is sometimes seen as a weird linguistic thing, perhaps instead it is a more general thing having to do with efficient memory organization. Consider that if the 'now or never' principle of the _Creating Language_ book is accepted, we need to bang events into some kind of structural format extremely rapidly in order to remember anything about them, and binary trees are optimal for storage and search under many circumstances.

  13. "Principle, A, B, C... These theories have empirical backing and theoretical motivation" – I've long wondered about the empirical backing of Chomsky's binding theory, which works fairly well as a description of English, but what does it say about human language? The notions "anaphor", "pronoun" and "r-expression" are categories, and if these cannot be identified across languages (as Norbert seems to tend to agree), then the binding theory is not a theory of human language. I have a 2008 paper where I discuss a range of universals involving "anaphors", based on a rigorous cross-linguistic definition of this notion. I make no use of the devices of GG.

    1. Martin, by 'language' do you mean E-languages (languages understood as outputs)? I-language, I think, might imply that a human language needs to be nothing more than Merge plus a few concepts. On this approach, a computer programming toy language might count as human language but lack many of the features we find in instituted E-languages (like English). I'm not a linguist but that's how I understand the I/E-language distinction these days.

  14. I mean languages as conventional (or normative) systems, as the term is used in everyday language. I don't think many people mean the output when they say language, and we don't know much about our internal systems for conforming to the conventions.