Monday, November 30, 2015

In case you missed it: a remark on Gallistel's conjecture.

Patrick Trettenbrein posted a comment on the recent little post on Gallistel's conjecture (here). He included a little paper of his that reviews some interesting new work on Aplysia that provides further evidence for the Gallistel conjecture. Here are the concluding paragraphs.

All in all, it seems that there indeed are two different processes at work in learning and memory, as Chen et al. (2014) also point out. While the exact details about both remain obscure, there appears to be a dissociation between the way in which learning occurs and how memory works. We do not know how the brain implements a read/write memory, but there is good evidence that it does. Similarly, there is ample and convincing evidence, also in Chen et al. (2014), that synaptic conductivity and connectivity play a role in regulating behavior. Consequently, it appears that synaptic plasticity might not so much be a precondition for learning as it is a consequence of it, so that the observed rewiring of synaptic connections might constitute the brain's way of ensuring an “efficient,” or possibly even close to “optimal” (Cherniak et al., 2004; Sporns, 2012), connectivity and therefrom resulting activity pattern that is appropriate to environmental (and presumably also “internal”) conditions. Synaptic plasticity thus might be reinterpreted as a way of regulating behavior (i.e., activity and connectivity patterns) only after learning has already occurred (i.e., after relevant information has been extracted from the environment and stored in memory).

Extrapolating Chen et al.'s (2014) findings stemming from work on Aplysia to claims about much more complex nervous systems is, of course, speculative in nature, to say the least. However, it seems to be no more speculative than the almost universally accepted idea of the synapse being the locus of memory. Similarly to Johansson et al. (2014), the work of Chen et al. (2014) shows that (1) there is plenty of “room” for the implementation of symbols other than synapses, and (2) substantiates the understanding that the network approach of connectionism might indeed best be seen as an implementational theory (Fodor and Pylyshyn, 1988) that still requires representation, computation, and a Turing architecture (i.e., a read/write memory). Gallistel and Balsam (2014) proclaimed that it was about time to rethink the neural mechanisms of learning and memory; Chen et al.'s experimental results add to the urgency of this claim.

I particularly like the speculation that the wiring is there not to code the relevant information but for efficient use, and that it is a consequence of learning rather than a pre-condition for it. I also like the observation, which Gallistel and Matzel also emphasize (see here), that there is, at best, paltry evidence for the standard assumption that the "synapse [is] the locus of memory." The Gallistel conjecture is generally assumed to be some daring, edge-of-thought speculation for which there is little evidence, in contrast to the well-established "fact" that memory lives in inter-neural connections. Vast academic enterprises are based on this assumption. It may be more truthy than true, however.

At any rate, take a look at Patrick's note and the Chen paper he links to. It seems that the Gallistel conjecture is daily becoming less "exciting." About time.

A typological trap

Are the critics right? Is there scant evidence for universals? The right answer is that it all depends on what you mean by ‘universal.’ If by this you intend a Greenberg Universal (GU), then they might be right (in fact, as you will see below, I think they should be right). If by this you mean a Chomsky Universal (CU), then they are not likely right. There is a big difference between these two, and the empirical success of GG rests on keeping them firmly distinguished. Why? Because a priori there is little reason to think that there are many GUs out there. There may be a few, but the standard GG universals, if understood as candidate GUs, are not likely among them. When critics of GG argue that it’s hard to find universals expressed in the world’s many languages, they understand ‘universals’ as GUs. And here they may well be right! However, as GG commits itself to CUs and not GUs, it takes quite some ancillary argument to get from the absence of GUs to the non-existence of CUs.

Indeed, from a perfectly normal “scientific” point of view, we should not expect to find many GUs. What do I mean? Well, think of the analogues of GUs in a real science, like Newtonian mechanics. We observe that bodies fall. We ask what makes bodies fall. We propose that bodies “fall” because of gravitational attraction. In particular, there is a force G that causes bodies (i.e. masses) to attract. Larger masses exert a stronger pull than smaller ones. Thus, a body much smaller than the earth, say a ball, will “fall” in the sense that the mass of the earth will more strongly attract the mass of the ball than vice versa. This will make it appear that the ball “falls” to earth when it is released (rather than the earth “falls” to the ball). That’s the story, and a good one it is, though we now know that it needs amending, especially if the ball is travelling near the speed of light. At any rate, what does this have to do with GUs?

Well, is it indeed phenomenologically accurate that when we observe falling bodies in the wild we observe them acting in accord with Newton’s law of gravitation? Nope. Not even close. A leaf drops from a tree. Does it appear to fall in accordance with the law of falling bodies? Not on your life. Drop a ball into a lake and see how long it takes to hit bottom (or if it hits bottom at all). Does it appear to drop in accordance with the law of falling bodies? Nahh! Or take a body that is electrically charged, drop it in an electrical field, and see if Newton’s law suffices to describe its trajectory. It doesn’t. What’s the right conclusion: that gravity is not a cause of falling bodies, that things don’t universally attract (i.e. fall)? Not on your life. Why?

Here’s the conventional wisdom. We understand that the law of falling bodies is not intended as a description of what we see outside our window. It describes one relevant force in causing what we see. And this force, in complex interaction with many other factors, causes observed physical behavior. Thus, we know that shape matters (not just mass) if the object is not dropped in a vacuum (and vacuums are pretty rare out there in the real world). We know that the consistency of the space into which an object drops also matters (less frictional resistance in air than in water, and less in water than in mercury). We know that electrical charges exert forces on electrically charged objects, and so this, as well as mass, can affect a falling object’s trajectory. If the object is small enough, then other factors may intercede as well. Drop an object of the right shape into water and it will float rather than fall. So many factors stand between dropping and falling, and nonetheless gravity does explain why bodies fall.

What’s this mean? It means that whatever the law of falling bodies is, it is not a description of what we see when we look at the world outside our window. In other words, it is not GUish.[1] It is not a description of the immediately phenomenal world, but a proposal about one of the fundamental forces that act on bodies, and this force is (often) an important causal factor in determining how the bodies that we see fall actually do fall. Pure expressions of the law take careful experimental setup. Indeed, we must control for many other factors before we can “see” gravity’s effects.[2]

There is an excellent discussion of just how complex this is in Cartwright (here, chapter 4). I have discussed her main points in previous posts (see here). For current purposes, Cartwright makes two very important observations. First, that it takes a lot of work to hook a law up to observation. This is what a good experiment does. It establishes a way of connecting abstract, non-observable features to visible effects. It creates a “nomological machine,” a way of hooking up the underlying capacities to surface regularities. And, this is the important part:

There is no fact of the matter what a system can do just in virtue of having a given capacity. What it does depends on its setting, and the kinds of settings necessary for it to produce systematic and predictable results are very exceptional (73).

So, gravity can be seen in action, but only if we arrange things very carefully! And if this is true of gravity, why should it be less true of a principle of UG?  Of course it might be different in the mental sciences, but it might not be, and assuming that linguistic “laws” (aka principles of UG) must be apparent to inspection in the wild is little more than methodological dualism (a real no-no).

Returning to the main topic, GUs are typological generalizations. They describe (and are intended to describe) generalizations thought to be observable across languages, surface generalizations. Why are we surprised that not many can be found? Why are we surprised that the UG principles proposed are not “surface true”? Why should we expect the visible surface properties of language to express the underlying grammatical forces at work any more than we expect the phenomenological observables of real world events to distinctly manifest their underlying causes (e.g. the law of gravity in bodies observed falling around us)? We don’t in the latter, and shouldn’t in the former. Which brings us to CUs.

Chomsky understood universals to be properties of FL, FL being the specifically linguistic contribution that minds exploit to build language particular Gs. From the get-go, these were understood to be quite abstract, and not to be inducible from the simple inspection of the surface properties of sentences. Thus, CUs were not intended to be surface true, any more than gravity is. Thus, the absence of GUs does not imply the non-existence of CUs, any more than the phenomenological inadequacy of the laws of gravity to describe what happens when any object falls any time anywhere invalidates Newton’s theory of gravity and its explanation for the law of falling bodies.

IMO, none of this is or should be controversial. I mention it because it seems easily forgotten. Linguists (or many of them) are currently quite skeptical that we have discovered any universals. But this is because many forget the distinction between GUs and CUs. Doing so leads to skepticism precisely because there is every reason to believe that universals understood as Greenbergian objects are not (and should not be) thick on the linguistic ground. Thus, when critics point out that such GUs are not pervasive, we should agree and say that nobody thought (or should have thought) they would be. And then loudly repeat that GUs are not CUs, and that CUs are what we are looking for.

Why the warning? Because, it seems to me that typological work invites the inference that linguists are on the hunt for GUs and that GGers agree with critics of the Chomsky program that universals ought to be understood as GUs. But this is a mistake, one that misunderstands what GG is about. To repeat a venerable theme: GG takes the object of study to be the structure of FL/UG, not the properties of languages. These latter are interesting to the degree that they illuminate the former. And there is no reason to think that linguistic principles, any more than any other scientific principles, will be visible in the data used to investigate them.

Let me make this point another way. IMO, there is no way that something like FL/UG does not exist (see here and here for a defense). That FL/UG exists is a virtual truism. What’s in FL/UG is not. Thus, what’s up for grabs is the fine structure of FL/UG, not whether it exists. Here’s another triviality: language exhibits the properties of FL/UG only in interaction with many other adventitious linguistic factors, many non-linguistic cognitive factors and probably much else (like the weather, time of day, and who knows what else). This means that we expect the fine structure of FL/UG to be hard to discern and we do not expect it to sit out there waiting to be spotted by (even careful) observation. 

In fact, I would go further (as you knew I would). I suspect that the only really good way to argue for a CU is via something like a POS argument. Looking at lots of languages and Gs might be helpful (see here), but if you want to zero in on potential candidate universals, there is nothing like a POS argument. Why? Because POSs limn the borders of the grammatically possible. That’s what’s so nice about them. Inductive surveys of many Gs cannot do this. POSs are the linguistic analogue of Cartwright’s nomological machines. They afford the most direct access to CUs, and for those interested in FL/UG, CUs are the principal objects of interest.

So be careful out there. Languages and their fabulous intricacies can be confusing. It’s not that hard to mistake Greenberg Universals for Chomsky Universals, and it’s a slippery slope from there to the dreaded vice of Empiricism (and its concomitant horrors, e.g. connectionism). So watch your step when you go into the field.

[1] As I’ve noted before, there is a tendency to understand universals as patterns in the data waiting to be revealed. Finding universals is then roughly a problem in signal processing in which the judicious use of statistical techniques will find the signal in the often very noisy noise. This conception understands universals as GUs. It is not the right model of a CU. For discussion see here. Incidentally, mistaking GUs for CUs will eventually lay low Deep Learning/Big Data approaches to language. The latter count on the fact that all universals will be GUs. If this is false, and it is, then such approaches cannot succeed, and so they won’t. Of course it will take time for this to become evident and by then another fad will sweep the Empiricist world.
[2] There is an excellent discussion of just how complex this is in Cartwright (here, chapter 4). I have discussed her main points in previous posts (see here).

Wednesday, November 25, 2015

Ok, tell me that this shouldn't be part of every Ling PhD defense?

Talk about outreach! Here is a screening of this year's social science winner of the "Dance your PhD competition." I believe that our talented Grads could do a whole lot better.

Tuesday, November 24, 2015

Minsky on Gallistel

I once heard of a class taught in the great days of literary theory entitled something like "The influence of Philip Roth on Charles Dickens." My memory tingles with the suggestion that I have the names wrong here, but I am pretty sure that I got the gist right. A linguistic version of this might be "The influence of Chomsky on von Humboldt." The idea is that we see the past more clearly when we see the present concepts more clearly. The inimitable intellectual archivist Bob Berwick sent me this great quote from Marvin Minsky:

“Unfortunately, there is still very little definite knowledge about, and not even any generally accepted theory of, how information is stored in nervous systems, i.e., how they learn. … One form of theory would propose that short-term memory is ‘dynamic’—stored in the form of pulses reverberating around closed chains of neurons. … Recently, there have been a number of publications proposing that memory is stored, like genetic information, in the form of nucleic-acid chains, but I have not seen any of these theories worked out to include plausible read-in and read-out mechanisms.” (Minsky, Computation: Finite and Infinite Machines, 1967, 66)
So, it seems that Randy's conjecture has a distinguished pedigree, and that cog-neuro has "investigated" the theory of genetic information storage largely by ignoring it. Let's hope that this time around this alternative hypothesis, one which really would challenge long held views in cog-neuro, is carefully vetted. Conceptually, the Gallistel view seems to me very strong. This does not mean that it is right, but it does mean that a perfectly reasonable alternative view has not even been pursued.

Monday, November 23, 2015

The concise Gallistel on how brains compute

Jeff Lidz sent me this great little piece by Randy Gallistel on his favorite theme: how most neuroscientists have misunderstood how brains compute. I’ve discussed Randy’s stuff in various FoL posts (here, here, and here). Here in just four lucid pages, Randy makes his main point again. If he is right (and the form of his argument seems impeccable to me), then much of what goes on in neuroscience is just plain wrong. Indeed, if Randy is right, then current neo-connectionist/neural net assumptions about the brain are about as accurate as 1950s-60s behaviorist conceptions were about the mind. In other words, at best of tertiary interest and, more likely, deserving to be completely forgotten.[1] At any rate, Randy here makes four main points.

First, that there is recent evidence (discussed here) strongly pointing to the conclusion that information can be stored inside a single neuron (rather than in connections of many neurons).

Second, that there are scads of behavioral evidence showing that brains store number values, and that there is no way of storing such numbers in connection weights, thus implying that any theory of the brain that limits itself to this kind of hardware must be at best incomplete and at worst wrong.

Third, that there is a close connection between neural net “plasticity” conceptions of the brain and traditional empiricist conceptions of the mind (especially learning). In fact, Randy argues that these are largely flip sides of the same coin.

Fourth, that brains already contain all the hardware that is required to function like classical computers, the latter being the perfect complements for the computational cognitive theories that replaced behaviorism.

And all in four pages.

There is one argument that Randy hints at but doesn’t stress that I would like to add to his four. It is a conceptual argument. Here it is.

Whatever one thinks of cognition, it is clear that animals use large molecules like DNA and RNA for information processing. Indeed, this is now standard biological dogma. As Gallistel and King (here) illustrate, this system has all the capacities of a classical computer (addresses, read-write memory, variables, binding etc.). So here’s the conceptual argument: imagine that you had an animal with the wherewithal to classically compute hereditary information but instead of repurposing (exapting) this system for cognitive ends it developed an entirely different additional system for this purpose. In other words, it had all it needed sitting there but ignored these resources and embodied cognition in a completely different way. Does this seem plausible? Is this the way evolution typically works? Isn’t opportunism the main mover in the evolution game? And if it is, doesn’t this suggest that Randy’s conjecture must be right? In fact, wouldn’t it be weird if large chunks of cognition did not exploit the computational machinery already sitting there in DNA/RNA and other large molecules? In fact, wouldn’t the contrary assumption bear a huge burden of proof? Well, you know what I think!
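To make the read/write contrast concrete, here is a toy sketch. All names and numbers below are invented for illustration; nothing here is drawn from Gallistel and King's text. The point is only that an addressable read/write memory lets a stored value (say, a learned interval) be bound to a symbol, fetched later, and combined with other stored values in arbitrary computation, which a fixed mapping from inputs to outputs (the connection-weights picture) does not straightforwardly provide.

```python
# Toy sketch of a Turing-style addressable read/write memory.
# Names ("food_interval_s", etc.) are hypothetical illustrations.

class ReadWriteMemory:
    """Minimal store: symbols are written to addresses and read back
    exactly, so a value can be bound to a variable and reused later."""
    def __init__(self):
        self._cells = {}

    def write(self, address, value):
        self._cells[address] = value

    def read(self, address):
        return self._cells[address]

# A learned quantity (e.g. 20 s between food deliveries) is stored once...
memory = ReadWriteMemory()
memory.write("food_interval_s", 20)

# ...and can later enter into arbitrary computation with other stored
# quantities, here subtracting a remembered travel time.
memory.write("travel_time_s", 8)
wait_time = memory.read("food_interval_s") - memory.read("travel_time_s")
print(wait_time)  # 12
```

The design point is the read/write cycle itself: the value comes back symbol-for-symbol and can participate in later computation, which is exactly the capacity Gallistel argues behavioral evidence demands of brains.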

Why is this not the common perception? Why is Randy’s position considered exotic? Here’s the one word answer: Empiricism! In the cog-neuro world this is the default view. There is little to empirically support this conception (see here for a review of the pas de deux between unsupported empiricism in psychology and tendentious reasoning in neural net neuroscience). Indeed, it largely flourishes when we know next to nothing about some domain of inquiry. However, it is the default conception of the mind. What Randy is pointing out (and has repeatedly pointed out and is right to point out) is that it is fatally flawed, not only as a theory of mind but also as a theory of the brain. And its flaws are conceptual as well as empirical. I can’t wait for the day that this becomes the conventional wisdom, though given the methodological dualism characteristic of the cog-neuro-sciences, I suspect that this day is not just around the corner. Too bad.

[1] Note that I say “deserving” of amnesia. This concedes the sad fact that neo-behaviorism is making a vigorous comeback within cognition. Yet another indication of the collapse of civilization.

Wednesday, November 18, 2015

My feelings exactly

And if you think that linguistics is just part of the larger bio picture, it's even worse than this. Still, it's fun and keeps you off the streets at night.

Tuesday, November 17, 2015

Never thought I would say this

Never thought I would say this, but I found that I resonated positively to a recent small comment by Chris Manning on Deep Learning (DL) that Aaron White sent my way (here). It seems that DL has computational linguistics (CL) of the Manning variety in its sights. Some DLers apparently believe that CL is just nano-moments away from extinction. Here’s a great quote from one of the DL doyens:

NLP is kind of like a rabbit in the headlights of the Deep Learning machine, waiting to be flattened.

DL wise men like Geoff Hinton have already announced that they expect that machines will soon be able to watch videos and “tell a story about what happened” and be downsized onto an in-your-ear chip that can translate into English on the fly. Great things are clearly expected. Personally, I am skeptical as I’ve heard such hyperbole before. We have been five years away from this sort of stuff for a very long time.

Moreover, I am not alone. If I read Manning correctly, he is skeptical (though very politely so) as well.[1] But, like me, he sees an opportunity here, one I noted before (here and here). Of course we likely disagree about what kind of linguistics will be most useful for advancing these technological ends,[2] but when it comes to engineering projects I am very catholic in my tastes.

What does the opportunity consist in? It relies on a bet: that generic machine learning (even of the DL variety) will not be able to solve the “domain problem.” The latter names the belief that how a domain of knowledge is structured matters a lot, even if one’s aim is to solve an engineering problem.

An aside: shouldn’t those who think that the domain problem is a serious engineering hurdle also think that modularity is a good biological design feature? And shouldn’t these people therefore think that the domain specificity of FoL is a no-brainer? In other words, shouldn’t the idea that humans have domain specific knowledge that allows them to “solve” language problems (and that supports their facile acquisition and use of language) be the default position? Chris? What think you? Dump general learning approaches and embrace domain specificity?

Back to the main point: The bet. So, if you think that using word contexts can only get you so far (and not interestingly far either), then you are ready to bet that knowing something about language will be useful in solving these engineering problems. And that provides linguists with an opportunity to ply their trade. In fact, Manning points to a couple of projects aimed at developing “a common syntactic dependency representation and POS (‘part of speech,’ NH) and feature label sets which can be used with reasonable linguistic fidelity and human usability across all human languages” (3).[3] He also advocates developing analogous representations for “Abstract Meaning.” This looks like the kind of thing that GGers could usefully contribute to. In other words, what we do directly fits into the Manning project.

Another aside: do not confuse this with investigating the structure of FL.  What matters for this project is a reasonable set of Greenberg “Universals.” Indeed, being too abstract might not be that useful practically, and being truly universal is not that important (what is important is finding those categories that best fit the particular languages of interest). This is not a bad thing. Engineering is not to be disparaged. It’s just not the same project as the one that GG has scientifically set for itself. Of course, should the Chomsky version of GG succeed, it is possible that it will contribute to the engineering problem. But then again, it might not. As I understand it, General Relativity has yet to make a big impact on land surveying. It really all depends (to fix ideas think birds and planes or fish and submarines. Last time I looked plane wings don’t flap and sub bodies don’t undulate).

Manning makes lots of useful comments about DL, many of which I didn’t understand. He makes some, however, that I did. For example, his observation that DL has mainly proved useful in signal processing contexts (2) (i.e. where the problem is to extract the generalization that is in the data, the pattern from (noisy) patternings). The language problem, as I’ve argued, is different from this (see here), so the limits of brute force DL will, I predict, become evident when the new wise men turn their attention to it. In fact, I make a more refined prediction: to “solve” this problem DLers will either (i) ignore it, (ii) restrict the domain of interest to finesse it, or (iii) promise repeatedly that the solution is but 5 years away. This has happened before and will happen again unless the intricate structural constraints that characterize language are recognized and incorporated.

Manning also makes several points that I would take issue with. For example, IMO he (like many others) mistakes squishy data for squishy underlying categories. See, in particular, Manning’s discussion of gerunds on p. 4. That the data does not exhibit sharp boundaries does not imply that the underlying structures are not sharp. In fact, at some level they must be, for under every probabilistic theory there is a categorical algebra. I leave it to you out there to come up with an alternative analysis of Manning’s observed data set. I give you a 30 second time limit to make it challenging.
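The point that gradient data can sit on top of sharp categories can be illustrated with a toy sketch. The mini-grammar and probabilities below are invented, not Manning's analysis or any real fragment of English: a handful of all-or-nothing rules, each carrying a probability, yields graded preferences in the output while the rule inventory itself stays perfectly discrete.

```python
# Toy illustration: categorical rules plus probabilities yield gradient data.
# The grammar and the probability values are invented for illustration only.

# Each rule either licenses a structure or it doesn't (the categorical
# algebra); a probability is attached on top of the discrete rule.
# Probabilities chosen as exact binary fractions so products are exact.
rules = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.75,
    ("NP", ("Gerund",)): 0.25,   # e.g. a nominal gerund subject
    ("VP", ("V", "NP")): 1.0,
}

def derivation_prob(derivation):
    """Probability of a derivation = product of its rule probabilities.
    A derivation using an unlicensed rule gets probability 0: the
    grammatical/ungrammatical boundary remains sharp."""
    p = 1.0
    for rule in derivation:
        if rule not in rules:
            return 0.0           # categorical cutoff, not a gradient
        p *= rules[rule]
    return p

full_np   = [("S", ("NP", "VP")), ("NP", ("Det", "N")),
             ("VP", ("V", "NP")), ("NP", ("Det", "N"))]
gerund_np = [("S", ("NP", "VP")), ("NP", ("Gerund",)),
             ("VP", ("V", "NP")), ("NP", ("Det", "N"))]

# The "data" look gradient: one structure is simply preferred to another...
print(derivation_prob(full_np))    # 0.5625
print(derivation_prob(gerund_np))  # 0.1875
# ...but every category and rule underneath is all-or-nothing.
```

The squishiness lives entirely in the numbers attached to the rules; nothing about the categories Det, N, or Gerund is itself graded, which is the sense in which fuzzy boundaries in data do not entail fuzzy underlying structure.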

At any rate, you will not be surprised to find out that I disagree with many of Manning’s comments. What might surprise you is that I think he is right in his reaction to DL hubris, and right that there is an opportunity for what GGers know to be of practical value. There is no reason for DL (or Bayes or stats) to be inimical to GG. It’s just technology. What makes its practice often anathema is the hard-core empiricism gratuitously adopted by its practitioners. But this is not inherent to the technology. It is only a bias of the technologists. And there are some, like Jordan and Manning and Reisinger, who seem to get this. It looks like an opportunity for GGers to make a contribution, one, incidentally, that can have positive repercussions for the standing of GG. Scientific success does not require technological application. But having technological relevance does not hurt either.

[1] I confess to a touch of schadenfreude given that this is the kind of thing that Manning and Co like to say about my kind of linguistics wrt their CL approaches.
[2] Though I am not confident about this. I am pretty confident about what kind of linguistics one needs to advance the cognitive project. I am far less sure about what one needs to advance the engineering one. In fact, I suspect that a more “surfacy” syntax will fit the latter’s design requirements better than a more abstract one given its NLPish practical aims. See below for a little more discussion.
[3] I have it from a reliable source that this project is being funded by Google to the tune of millions. I have no idea how many millions, but given that billions are rounding errors to these guys, I suspect that there is real gold in them thar hills.

Monday, November 16, 2015

What does typology teach us about FL?

I have been thinking lately about the following question: What does comparative/typological (C/T) study contribute to our understanding of FL/UG? Observe that I am taking it as obvious that GG takes the structure of FL/UG to be the proper object of study and, as a result, that any linguistic research project must ultimately be justified by the light it can shed on the fine structure of this mental organ. So, the question: what does studying C/T bring to the FL/UG table?

Interestingly, the question will sound silly to many.  After all, the general consensus is that one cannot reasonably study Universal Grammar without studying the specific Gs of lots of different languages, the more the better. Many vocal critics of GG complain that GG fails precisely because it has investigated too narrow a range of languages and has, thereby, been taken in by many false universals.

Most GGers agree with the spirit of this criticism. How so? Well, the critics accuse GG of being English or Euro centric, and GGers tend to reflexively drop into a defensive crouch by disputing the accuracy of the accusation. The GG response is that GG has as a matter of fact studied a very wide variety of languages from different families and eras. In other words, the counterargument is that critics are wrong because GG is already doing what they demand.

The GG reply is absolutely accurate. However, it obscures a debatable assumption, one that indicates agreement with the spirit of the criticism: that only or primarily the study of a wide variety of typologically diverse languages can ground GG conclusions that aspire to universal relevance. In other words, both GG and its critics take the intensive study of typology and variation to be a conceptually necessary part of an empirically successful UG project.

I want to pick at this assumption in what follows. I have nothing against C/T inquiry.[1] Some good friends engage in it. I enjoy reading it. However, I want to put my narrow prejudices aside here in order to try and understand exactly what C/T work teaches us about FL/UG. Is the tacit (apparently widely accepted) assumption that C/T work is essential for (or at least, practically indispensable for or very conducive to) uncovering the structure of FL/UG correct?

Let me not be coy. I actually don’t think it is necessary, though I am ready to believe that C/T inquiry has been a practical and useful way of proceeding to investigate FL/UG. To grease the skids of this argument, let me remind you that most of biology is built on the study of a rather small number of organisms (E. coli, C. elegans, fruit flies, mice). I have rarely heard the argument made that one can’t make general claims about the basic mechanisms of biology because only a very few organisms have been intensively studied. If this is so for biology, why should the study of FL/UG be any different? Why should bears be barely (sorry I couldn’t help it) relevant for biologists but Belarusian be indispensable for linguistics? Is there more to this than just Greenbergian sentiments (which, we can all agree, should be generally resisted)?

So is C/T work necessary? I don’t think it is. In fact, I personally believe that POS investigations (and acquisition studies more generally (though these are often very hard to do right)) are more directly revealing of FL/UG structure. A POS argument if correctly deployed (i.e. well grounded empirically) tells us more about what structure FL/UG must have than surveys (even wide ones) of different Gs do. Logically, this seems obvious. Why? Because POS arguments are impossibility arguments (see here) whereas surveys, even ones that cast a wide linguistic net, are empirically contingent on the samples surveyed. The problem with POS reasoning is not the potential payoff or the logic but the difficulty of doing it well. In particular, it is harder than I would like to always specify the nature of the relevant PLD (e.g. is only child directed speech relevant? Is PLD degree 0+?). However, when carefully done (i.e. when we can fix the relevant PLD sufficiently well), the conclusions of a POS are close to definitive. Not so for cross-linguistic surveys.[2]

Assume I am right (I know you don’t, but humor me). Nothing I’ve said gainsays the possibility that C/T inquiry is a very effective way of studying FL/UG, even if it is not necessary. So, assuming it is an effective way of studying FL/UG, what exactly does C/T inquiry bring to the FL/UG table?

I can think of three ways that C/T work could illuminate the structure of FL/UG.

First, C/T inquiry can suggest candidate universals. Second, C/T investigations can help sharpen our understanding of the extant universals. Third, it can adumbrate the range of Gish variation, which will constrain the reach of possible universal principles. Let me discuss each point in turn.

First, C/T work as a source of candidate universals. Though this is logically possible, as a matter of fact it’s my impression that this has not been where plausible candidates have come from. From where I sit (though I concede that this might be a skewed perspective), most (virtually all?) of the candidates have come from the intensive study of a pretty small number of languages. If the list I provided here is roughly comprehensive, then many, if not most, of these were “discovered” using a pretty small range of the possible Gs out there. This is indeed often mooted as a problem for these purported universals. However, as I’ve mentioned tiresomely before, this critique often rests on a confusion of Chomsky universals with their eponymous Greenbergian doubles.

Relevantly, many of these candidate universals predate the age of intensive C/T study (say, dating from the late 70s and early 80s). Not all of them, but quite a few. Indeed, let me (as usual) go a little further: there have been relatively few new candidate universals proposed over the last 20 years, despite the continually expanding investigation of more and more different Gs. That suggests to me that although many of our universals could in principle have been inductively discovered by rummaging through myriad different Gs, this is not what actually took place.[3] Rather, as in biology, we learned a lot by intensively studying a small number of Gs and, via (sometimes inchoate) POS reasoning, plausibly concluded that what we found in English is effectively a universal feature of FL/UG. This brings us to the second way that C/T inquiry is useful.

The second way that C/T inquiry has contributed to the understanding of FL/UG is that it has allowed us (i) to further empirically ground the universals discovered on the basis of a narrow range of studied languages and, (ii) much more importantly, to refine these universals. So, for example, Ross discovers island phenomena in languages like English and proposes that they are due to the inherent structure of FL/UG. Chomsky comes along and develops a theory of islands proposing that FL/UG computations are bounded (i.e. must take place in bounded domains) and that apparent long distance dependencies are in fact the products of smaller successive cyclic dependencies that respect these bounds. C/T work then comes along and refines this basic idea further. So Rizzi notes that (i) wh-islands are variable (and multiple-wh languages like Romanian show that there is more than one way to apparently violate wh-islands); (ii) Huang suggests that islands need to include adjuncts and subjects; (iii) work on the East Asian languages suggests that we need to distinguish island effects from ECP effects despite their structural similarity; (iv) studies of in-situ wh languages allow us to investigate the bounding requirements on overt and covert movement; and (v) C/T data from Irish and Chamorro and French and Spanish provide direct evidence for successive cyclic movement even absent islands.

There are many other examples of C/T thinking purifying candidate universals. Another favorite example of mine is how the anaphor agreement effect (investigated by Rizzi and Woolford) shows that Principle A cannot be the last word on anaphor binding (see Omer’s discussion here). This effect strongly argues that anaphor licensing is not just a matter of binding domain size, as the classical GB binding theory proposes.[4] So, finding that nominative anaphors cannot be bound in Icelandic changes the way we should think about the basic form of the binding theory. In other words, considering how binding operates in a language with different case and agreement profiles from English has proven to be very informative about our basic understanding of binding principles.[5]

However, though I think this work has been great (and a great resource at parties to impress friends and family), it is worth noting that the range of languages needed for the refinements has been relatively small (what would we do without Icelandic!). This said, C/T work has made apparent the wide range of apparently different surface phenomena that fall into the same general underlying patterns (this is especially true of the rich investigations of case/agreement phenomena). It has also helped refine our understanding by investigating the properties of languages whose Gs make morpho-syntactically explicit what is less surface-evident in other languages. So, for example, the properties of inverse agreement (and hence defective intervention effects) are easier to study in languages like Icelandic, where one finds overt post-verbal nominatives, than in English, where there is relatively little useful morphology to track.[6] The analogue of this work in (other) areas of biology is the use of big, fat, easily manipulated squid axons (rather than dainty, small, smooshy mouse axons) to study neuronal conduction.

Another instance of the same thing comes from the great benefits of C/T work in identifying languages where UG principles of interest leave deeper overt footprints than in others (sometimes very, very deep (e.g. inverse control, IMO)). There is no question that the effects of some principles are hard to find in some languages (e.g. island effects in languages which don’t tend to move things around much, or binding effects in Malay-2 (see here)). And there is no doubt that languages sometimes give us extremely good evidence for what is largely theoretical inference in others. Thus, as mentioned, the morphological reflexes of successive cyclic movement in Irish and Chamorro, or verb inversion in French and Spanish, make evident at the surface the successive cyclic movement that FL/UG infers from, among other things, island effects. So, there is no question that C/T research has helped ground many FL/UG universals, and has even provided striking evidence for their truth. However (and maybe this is the theorist in me talking), it is surprising how much of this refinement and evidence builds on proposals with a still very narrow C/T basis. What made the C-agreement data interesting, for example, is that it provided remarkably clear evidence for something that we already had pretty good indirect evidence for (e.g. islands are already pretty good evidence for successive cyclic movement in a subjacency account). However, I don’t want to downplay the contributions of C/T work here. It has been instrumental in grounding lots of conclusions motivated on pretty indirect theoretical grounds, and direct evidence is always a plus. What I want to emphasize is that, more often than not, this additional evidence has buttressed conclusions reached on theoretical (rather than inductive) grounds, rather than challenging them.

This leaves the third way that C/T work can be useful: it may not propose but it can dispose. It can help identify the limits of universalist ambitions. I actually think that this is much harder to do than is often assumed. I have recently discussed an (IMO unsuccessful) attempt to do this for Binding Theory (here and here), and I have elsewhere discussed the C/T work on islands and their implications for a UG theory of bounding (here). Here too I have argued that standard attempts to discredit universal claims regarding islands have fallen short and that the (more “suspect”) POS reasoning has proven far more reliable. So, I don't believe that C/T work has, by and large, been successful at clearly debunking most of the standard universals.

However, it has been important in identifying the considerable distance that can lie between a universal underlying principle and its surface expressions. Individual Gs must map underlying principles to surface forms, and theories of FL/UG must allow for this possible variation. Consequently, finding relevant examples thereof sets up interesting acquisition problems (both real-time and logical) to be solved. Or, to say this another way, one potential value of C/T work lies in identifying something to explain given FL/UG. C/T work can provide the empirical groundwork for studying how FL/UG is used to build Gs, and this can have the effect of forcing us to revise our theories of FL/UG.[7] Let me explain.

The working GG conceit is that the LAD uses FL and its UG principles to acquire Gs on the basis of PLD. To be empirically adequate, an FL/UG must allow for the derivation of different Gs (ones that respect the observed surface properties). So, one way to study FL/UG is to investigate differing languages and ask how their Gs (i.e. ones with different surface properties) could be fixed on the basis of available PLD. On this view, the variation C/T discovers is not interesting in itself but is interesting because it empirically identifies an acquisition problem: how is this variation acquired? And this problem has direct bearing on the structure of FL/UG. Of course, this does not mean that any variation implies a difference in FL/UG. There is more to actual acquisition than FL/UG. However, the problem of understanding how variation arises given FL/UG clearly bears on what we take to be in FL/UG.[8]

And this is not merely a possibility. Lots of work on historical change from the mid 1980s onwards can be, and was, seen in this light (e.g. Lightfoot, Roberts, Berwick and Niyogi). Looking for concomitant changes in Gs was used to shed light on the structure of the FL/UG parameter space. The variation, in other words, was understood to tell us something about the internal structure of FL/UG. It is unclear to me how many GGers still believe in this view of parameters (see here and here). However, the logic of using G change to probe the structure of FL/UG is impeccable. And there is no reason to limit the logic to historical variation. It can apply just as well to C/T work on synchronically different Gs, closely related but different dialects, and more.

This said, it is my impression that this is not what most C/T work actually aspires to anymore, and this is because most C/T research is not understood in the larger context of Plato’s Problem or how Gs are acquired by LADs in real time. In other words, C/T work is not understood as a first step towards the study of FL/UG. This is unfortunate, for this is an obvious way of using C/T results to study the structure of FL/UG. Why then is this not being done? In fact, why does it not even seem to be on the C/T research radar?

I have a hunch that will likely displease you. I believe that many C/T researchers either don’t actually care to study FL/UG and/or they understand universals in Greenbergian terms. Both are products of the same conception: the idea that linguistics studies languages, not FL. Given this view, C/T work is what linguists should do for the simple reason that C/T work investigates languages and that’s what linguistics studies. We should recognize that this is contrary to the founding conception of modern linguistics. Chomsky’s big idea was to shift the focus of study from languages to the underlying capacity for language (i.e. FL/UG). Languages, on this conception, are not the objects of inquiry. FL is. Nor are Greenberg universals what we are looking for. We are looking for Chomsky universals (i.e. the basic structural properties of FL). Of course, C/T work might advance this investigation. But the supposition that it obviously does so needs argumentation. So let’s have some, and to start the ball rolling let me ask you: how does C/T work illuminate the structure of FL/UG? What are its greatest successes? Should we expect further illumination? Given the prevalence of the activity, it should be easy to find convincing answers to these questions.

[1] I will treat the study of variation and typological study as effectively the same thing. I also think that historical change falls into the same group. Why study any of these?
[2] Aside from the fact that induction over small Ns can be hazardous (and right now the actual number of Gs surveyed is pretty small given the class of possible Gs), most languages differ from English in having only a small number of investigators. Curiously, this was also a problem in early modern biology. Max Delbrück decreed that everyone would work on E. coli in order to make sure that the biology research talent did not spread itself too thin. This is also a problem within a small field like linguistics. It would be nice if as many people worked on any other language as work on English. But this is impossible. This is one reason why English appears to be so grammatically exotic: the more people work on a language, the more idiosyncratic it appears to be. This is not to disparage C/T research, but only to observe the obvious, viz. that person-power matters.
[3] Why has the discovery of new universals slowed down (if it has, recall this is my impression)? One hopeful possibility is that we’ve found more or less all of them. This has important implications for theoretical work if it is true, something that I hope to discuss at some future point.

[4] Though, as everyone knows, the GB binding theory as revised in Knowledge of Language treats the unacceptability of *John thinks himself/herself is tall not as a binding effect but as an ECP effect. The anaphor-agreement effect suggests that this too is incorrect, as does the acceptability of quirky anaphoric subjects in Icelandic.
[5] I proposed one possible reinterpretation of binding theory based in part on such data here.  I cannot claim that the proposal has met with wide acceptance and so I only mention it for the delectation of the morbidly curious.
[6] One great feature of overt morphology is that it often allows for crisp speaker acceptability judgments. As this has been syntax’s basic empirical fodder, crisp judgments rock.
[7] My colleague Jeff Lidz is a master of this. Take a look at some of his papers. Omer Preminger’s recent NELS invited address does something similar from a more analytical perspective. I have other favorite practitioners of this art including Bob Berwick, Charles Yang, Ken Wexler, Elan Dresher, Janet Fodor, Stephen Crain, Steve Pinker, and this does not exhaust the list. Though it does exhaust my powers of immediate short term recall.
[8] Things are, of course, more complex. FL/UG cannot explain acquisition all by its lonesome; we also need (at least) a learning theory. Charles Yang and Jeff Lidz provide good paradigms of how to combine FL/UG and learning theory to investigate each. I urge you to take a look.