Friday, May 25, 2018

Three quintessentially minimalist projects

I was in Barcelona last week giving some lectures on the triumphant march forward of the Minimalist Program (MP). As readers may know, I believe that MP has been a success in its own terms in that it has gone a fair way towards answering the questions it first posed for itself, viz. Why do we have the FL we actually have and not some other? Others are more skeptical, but I believe that this is mainly because critics demand that MP address questions not central to its research mission. Of course, answering the MP question might leave many others untouched, but that hardly seems like a reason to disparage MP so much as a reason for pursuing other programs simultaneously. At any rate, this was what the lectures were about and I thank the gracious audience at the Autonomous University of Barcelona for letting me outline these views for their delectation. 

In getting these lectures into shape I started thinking about a question prompted by recent comments from Peter Svenonius (thx, see here). Peter thinks, if I understand him correctly, that the MP obsession (ok, my obsession) with Darwin’s Problem (DP) really adds nothing to the MP enterprise. Things could proceed more or less as we see them without this biological segue. This got me thinking about the following question: Which MP projects are largely motivated by DP concerns? I can think of three. They may well be motivated on other grounds as well. But they seem to me a direct consequence of taking the DP perspective on the emergence of FL seriously. This post is a first stab at enumerating these and explaining why I think they are intimately tied to DP (in much the same way that the P&P project was intimately tied to Plato’s Problem (PP)). So what are the three lines of inquiry? A warning: The order of discussion does not imply anything about their relative salience or importance to MP. I should note that many of the points I make below I have made elsewhere before. So do not expect to be enlightened. This is probably more for my benefit than for yours.

First, Unification of the Modules (UoM). MP is based on the success of the P&P program, in particular the perceived success of GBish conceptions of FL/UG. Another way of saying this is that if you don’t think that GB managed to limn a fairly decent picture of the fine structure (i.e. the universals) of FL/UG then the MP project will seem to you to be, at best, premature and, at worst, hubristic. 

I believe that a lot of the problems that linguists have with MP have less to do with its failure to make progress in answering the MP question above than with the belief that the whole project presupposes accepting as roughly right a deeply flawed conception of FL/UG (viz. roughly the GB conception). So, for example, if you don’t like classical case theory (and many syntacticians today do not) then you won’t like a project that takes it to be more or less accurate and tries to derive its properties from deeper principles. If you don’t think that classical binding theory is more or less correct then you won’t like a project that tries to reduce it to something else. The problem for many is that what MP presupposes (namely that GB was roughly correct, but not fundamental) is precisely what they believe ought to be very much up for grabs. 

I personally have a lot of sympathy for this attitude. However, I also think that it misses a useful feature of the presupposition. MP motivated unification/reduction doesn’t require that the GB description be correct so much as that it be plausible enough (viz. that it be of roughly the right order of complexity) to make deriving its properties a useful exercise (i.e. an exercise that, if successful, will provide a useful model for future projects armed with a more accurate conception of FL/UG). To put this another way: the GB conception of FL/UG has identified design features of FL/UG (aka, universals) that are theoretically plausible and empirically justifiable and so it is worth asking why principles such as these should govern the workings of our linguistic capacities. In Poeppel’s immortal phrase, these GB principles are the right grain size for analysis and have non-trivial empirical backing, and so the exercise of showing why they should be part of FL would be doing something useful even should they fail to reflect the exact structure of FL.[1]

So, a central project animated by DP is unification of the disparate GB modules. And this is a very non-trivial project. As many of you know, GB attributes a rather high degree of internal modularity to FL. There are diverse principles regulating binding vs control vs movement vs selection/subcategorization vs theta role assignment vs case assignment vs phrase structure. From the perspective of Plato’s Problem, the diversity of the modules does not much matter as their operations and principles are presumed to be innate (and hence not learned). In fact, the main impetus behind P&P architectures was to isolate the plausibly invariant features of Gs, and explain them by attributing them to the internal workings of FL thereby constraining the Gs FL produces to invariably respect these features. Thus the reason that Gs are always structure dependent is that FL has the property of only being able to construct Gs that are structure dependent. The reason that movement and binding require c-command is that FL imposes c-command as a condition on these diverse modular operations. The aim of GB was to identify and factor out the invariant properties of specific Gs and treat them as fixed features of FL/UG so that they did not have to be acquired on the basis of PLD (and a good thing too as there is not sufficient data in the PLD to fix them (aka PoS considerations apply to these)). The problem of G acquisition could then focus on the variable parts (where Gs differ) using the invariant parts as Archimedean fixed points for leveraging the PLD into specific Gs. That was the picture. And for this P&P project to succeed, it did not much matter how “complex” FL/UG was so long as it was innate. 

All of this changes, and changes dramatically, once one asks how this system could have arisen. Then the internal complexity matters, and matters a lot. Indeed, once one asks this question there is a great premium on simple FL architectures, with fewer modules and fewer disparate principles, for the simpler the structure of FL, the easier it is to imagine how it might have arisen from the cognitive architecture of predecessors that did not have one. 

If this is correct, then one central MP project is to show that the diversity of the GB modules is only apparent and that they are only different reflections of the same underlying operations and principles. In other words, the project of unifying the modules is central to MP and it is central because of DP. A solution to DP requires that what appears to be a very complex FL system (i.e. what GB depicts) is actually quite simple and what appear to be very different modules with different operations and regulative principles are really all reflections of the same underlying generative procedures. Why? Because short of this it will be impossible to explain how the system that GB describes could have arisen from a mind without it. 

This is entirely analogous, in its logic, to Plato’s Problem. How can kids acquire the Gs they do with the properties they have despite a poverty of the linguistic stimulus? Because much of what they know they do not have to learn. How could humans have evolved an FL from non-FL cognitive minds? Because FL minds are only a very small very simple step away from the minds that they emerged from and this requires that the modular complexity GB attributes to FL is only apparent. It’s what you get when you add to the contents of non-linguistic minds the small simple addition MP hypothesizes bridged the ling/non-ling gap.

Are there other plausible motives for such a project, the project of unifying the modules? Well perhaps. One might argue that an FL with unified modules is in some methodological sense better than one with non-unified ones. Something like a principle that says fewer modules are better than more. Again, I think that this is probably correct, but let’s face it, this kind of methodological Ockhamist accounting is very weak (or at least perceived to be so). When push comes to shove data coverage (almost?) always trumps such niceties (remember the ceteris paribus clause that always accompanies such dicta). So it is worth having a big empirical fact of interest driving the agenda as well. And there are few facts bigger and heftier than the fact that FL arose from non-FL capable minds. It is easier to explain how this could have happened if FL capable minds are only mildly different from non-FL capable minds, and this means that the complex modularity that GB attributes to FL capable minds is almost certainly incorrect. That’s the line of argument. It rests on DPish assumptions and, to my mind, provides a powerful empirical motivation for module unification, which is what makes unification a central MP project.

It suggests a second related project: not only must the modules be unified, but the unification should make use of the fewest possible linguistically proprietary operations and principles. In other words, linguistically capable minds, ones with FLs, should be as minimally linguistically special as possible. Why? Because evolution proceeds most smoothly when there is minimal qualitative difference between the evolved states. If the aim is to explain how language ready minds appeared from non language ready minds then the fewer the differences between the two, the easier it will be to account for the emergence of the former from the latter. If one assumes that what makes an FL mind language ready are linguistically special operations and principles then the fewer of these the better. In fact, in the best case there will be exactly a single relatively simple difference between the two, language ready minds just being non-language ready ones plus (at most) one linguistically special simple addition (the desideratum that it be simple motivated by the assumption that simple additions are more likely to become evolutionarily available than complex ones).[2]

So let’s assess: there are two closely related MP projects: unify the GB modules and unify them using largely non-linguistically proprietary operations and principles. How far has this project gotten? Well, IMO, quite far. Others are sure to disagree. But the projects, though somewhat open textured, have proven to be manageable and the first, in particular, has generated useful hypotheses (e.g. the Merge Hypothesis and extensions thereof, like the Movement Theory of Control and Construal), which even if wrong have the right flavor (I know, I know, this is self serving!). Indeed, IMO, trying to specify exactly where and how these theories go wrong (if they do, color me skeptical but I have dogs in these fights) and why they go wrong as they do, is a reasonable extension of the basic MP projects. It is a tribute to how little MP concerns drive contemporary syntax that such questions are, IMO, rarely broached. Let me rant a bit.

Darwin’s Problem (DP) currently enjoys as little interest among linguists today as Plato’s Problem (PP) does (and did, in earlier times). Indeed, from where I sit, even PP barely animates linguistic investigations. So, for example, people who study variation rarely ask how it might be fixed (though there are notable exceptions). Similarly, people who propose novel principles and operations rarely ask whether and how they might be integrated/unified with the rest of the features of FL. Indeed, most syntacticians take the basic apparatus as given and rarely critically examine it (e.g. how many people worry about the deep overlap between Agree and I-merge?). These are just not standard research concerns. IMO, sadly, most linguists could care less about the cognitive aspects of GG, let alone its possible bio-linguistic features. The object of study is language, not FL, and the technical apparatus is considered interesting to the degree that it provides a potentially powerful philological tool kit. 

Ok, so MP motivates two projects. There is one more, and it concerns variation. GB took variation to be bounded. It did this by conceiving UG as providing a finite set of parameter values and conceived of language acquisition as fixing those parameters. So, even if the space of possible Gs is very large, for GB, it is finite. Now, given the linguistic specificity of the parameters, and given that GB treats them as internal to FL, the idea that variation is a matter of parameter setting proves to be a deep MP challenge. Indeed, I would go so far as to say that if MP is on the right track, then FL does not contain a finite list of possible binary parameters and G acquisition cannot be a matter of parameter setting. It must be something else, something that is not specific to G acquisition. And this idea has caught on, big time. Let me explain.

I have many times mentioned the work by Berwick, Lidz and Yang on G acquisition. Each contains what is effectively a learning theory that constructs Gs from PLD using FL principles. It appears that this general idea is quite widely accepted now, with former parameter setting types (e.g. David Lightfoot) now arguing that “UG is open” and that there is “no evaluation of I-languages and no binary parameters” (1).[3] This view is much more congenial to MP as it removes the very specific parametric options from FL and treats variation as entirely a “learning” problem. G learning is no different than other kinds, it is just aimed at Gs.[4]

Of course, making this work will require specifying what kids come to the learning problem with, what kinds of data they exploit, and what the details of the G learning theory are. And this is hard. It requires more than pointing to differences in the PLD and attributing differences in Gs to these differences. Such pointing is a long way from an actual learning theory which specifies how PLD and properties of FL combine to give you a G. Not the least important fact is that there are many ways to generalize from PLD to Gs and kids only exploit some of these.[5] That said, if there is an MP “theory” of variation it will consist of adumbrating the innate assumptions the LAD uses to fix a particular G on the basis of PLD. To date, we have some interesting proposals (in particular from Lidz and Yang and their colleagues in syntax) but no overarching theory.

Interestingly, if this project can be made to fly, then it will also be the front end of an MP theory of variation. To date, the main focus of research has been on unifying and simplifying FL and trying to determine how much of FL is linguistically proprietary. However, there is no reason that the considerable current typological work on G variation shouldn’t feed into developing theories of learning aimed at explaining why we find the variation we do. It is just that this project is going to be very hard to execute well, as it will demand that linguists develop skills that are not currently part of standard PhD training, at least not in syntax (e.g. courses in stats, machine learning, and computation). But isn’t this as it should be? 

So, does taking MP seriously make a difference? Yes! It spawns three concrete projects, all animated by the MP problematic. These projects make sense in the context of trying to specify the internal structure of an FL that could have evolved from earlier minds. So the programmatic aspects of MP are quite fecund, which is all that we can ask of a program.

And results? Well, here too I believe that we have made substantial progress as regards the first project, some as regards the third (though it is very very hard) and a little concerning the second.  IMO, this is not bad for 25 years and suggests that the DPish way of framing the MP issues has more than paid for itself.


[1]It’s worth adding that this sort of exercise is quite common in the real sciences. Ideal gases are not actual gases, planets are not point masses, and our universe may not be the only possible one but figuring out how they work has been very useful. 
[2]There is a lot of hand waving going on here. Thus, what evolves are genomes and what we are talking about here are phenotypic expressions thereof. We are assuming that simple phenotypic differences reflect simple genetic differences. Who knows if this is right. However, it is the standard assumption for this kind of biological speculation so it would be a form of methodological dualism to treat it as suspect only in the linguistic case. See here for discussion of this “phenotypic gambit” and its role in evolutionary thinking.
[3]See “Discovering New Variable Properties without Parameters,” in Massimo Piattelli-Palmarini and Simin Karimi, eds., “Parameters: What are they? Where are they?” Linguistic Analysis 41, special edition (2017).
A very terse version of this view is advanced in Hornstein (2009) on entirely MP grounds. The main conceptual difference between approaches like Lightfoot’s and the one I advanced is that the former relies on the idea that “children DISCOVER variable properties of their language through parsing” (1), whereas I waved my hands and mumbled something about curve fitting given an enhanced representation provided by FL (see here for slightly more elaboration).
[4]This folds together various important issues, the most important being that there is no overall evaluation metric for parameter setting. Chomsky argued that the shift from evaluation metrics to parameter setting models increased the latter’s feasibility because applying global evaluation metrics to Gs is computationally intractable. I think Chomsky might have thought that parameter setting is more localized than G evaluation and so will not require fancy learning theories. It turns out, as Dresher and Kaye long ago noted, that parameter setting models have their own tractability issues unless the parameters can be set independently of one another. If they are not independent, problems quickly arise (e.g. it is hard to fix parameters once and for all). 
Furthermore, it is not clear to me that something like global measures of G fitness can be entirely avoided, though Lightfoot insists that they should be. The main reason for my skepticism is empirical and revolves around the question of whether the space of G options is scattered or not. At least in syntax, it seems that different Gs are kept relatively separate (e.g. bilinguals might code switch between French and English but they don’t syntactically blend them to get an “average” of the two in Frenglish. Why not?). This suggests that Gs enjoy an integrity and this is what keeps them cognitively apart. Bill Idsardi tells me that this might be less true on the sound side of things. But as regards the syntax, this looks more or less correct. If it is, then some global measure distinguishing different Gs might be required. 
I should add that more recently, if I recall correctly, Fodor and Sakas have argued that the evaluation metric cannot be completely dispensed with even on their “parsing” account.
[5]So, for example, invoking “parsing” as the driver behind acquisition does not do much unless one specifies how parsing works. Recall that standard parsers (e.g. the Marcus Parser) embody Gs that guide how it is that input data is analyzed. No G, no parsing. But if the aim is to explain how Gs are acquired then one cannot presuppose that the relevant G already exists as part of the parser. So what does a parse consist in, in detail? This is a hard problem and it turns out that there are many factors that the child uses to analyze a string so as to recover a meaning. The MP project is to figure out what this is, not to name it.

Tuesday, May 22, 2018

David Poeppel in Quanta

David P has been a strong critic of cog-neuro practice. He has not minced words about how the field has badly misfired by not appreciating the Marrian complexity of research. Today this view gets some great publicity (here). Quanta (which is kinda like the New Yorker for science stuff) has published a 4 page discussion of his work and his critical views. Take a look and enjoy. It's nice when the right views get some decent press. Who knows, maybe next Bernie will enjoy some decent innings.

Wednesday, May 16, 2018

Talk about confirmation!!

As Peter notes in the comments section to the previous post, there has been dramatic new evidence for the Gallistel-King Conjecture (GKC) coming from David Glanzman's lab at UCLA (here), from an experiment on Aplysia. Here is the abstract of the paper:

The precise nature of the engram, the physical substrate of memory, remains uncertain. Here, it is reported that RNA extracted from the central nervous system of Aplysia  given long-term sensitization training induced sensitization when injected into untrained animals; furthermore, the RNA-induced sensitization, like training-induced sensitization, required DNA methylation. In cellular experiments, treatment with RNA extracted from trained animals was found to increase excitability in sensory neurons, but not in motor neurons, dissociated from naïve animals. Thus, the behavioral, and a subset of the cellular, modifications characteristic of a form of nonassociative long-term memory in Aplysia  can be transferred by RNA. These results indicate that RNA is sufficient to generate an engram for long-term sensitization in Aplysia  and are consistent with the hypothesis that RNA-induced epigenetic changes underlie memory storage in Aplysia.
Here is a discussion of the paper in SciAm.

The results pretty much speak for themselves and they clearly comport very well with the GKC, even the version that garnered the greatest number of superciliously raised eyebrows when mooted (viz. that the chemical locus of memory is in our nucleic acids (RNA/DNA)). The Glanzman et al. paper proposes just this.

A major advantage of our study over earlier studies of memory transfer is that we used a type of learning, sensitization of the defensive withdrawal reflex in Aplysia, the cellular and molecular basis of which is exceptionally well characterized (Byrne and Hawkins, 2015; Kandel, 2001; Kandel, 2012). The extensive knowledge base regarding sensitization in Aplysia enabled us to show that the RNA from sensitized donors not only produced sensitization-like behavioral change in the naïve recipients, but also caused specific electrophysiological alterations of cultured neurons that mimic those observed in sensitized animals. The cellular changes observed after exposure of cultured neurons to RNA from trained animals significantly strengthens the case for positive memory transfer in our study. Another difference between our study and earlier attempts at memory transfer via RNA is that there is now at hand a mechanism, unknown 40 years ago, whereby RNA can powerfully influence the function of neurons: epigenetic modifications (Qureshi and Mehler, 2012). In fact, the role of ncRNA-mediated epigenetic changes in neural function, particularly in learning and memory, is currently the subject of vigorous investigation (Fischer, 2014; Landry et al., 2013; Marshall and Bredy, 2016; Nestler, 2014; Smalheiser, 2014; Sweatt, 2013). Our demonstration that inhibition of DNA methylation blocks the memory transfer effect (Fig. 2) supports the hypothesis that the behavioral and cellular effects of RNA from sensitized Aplysia in our study are mediated, in part, by DNA methylation (see also Pearce et al., 2017; Rajasethupathy et al., 2012). The discovery that RNA from trained animals can transfer the engram for long-term sensitization in Aplysia offers dramatic support for the idea that memory can be stored nonsynaptically (Gallistel and Balsam, 2014; Holliday, 1999; Queenan et al., 2017), and indicates the limitations of the synaptic plasticity model of long-term memory storage (Mayford et al., 2012; Takeuchi et al., 2014).


Two remarks: First, as the SciAm discussion makes clear, selling this idea will not be easy. Scientists are, rightfully in my opinion, a conservative lot and it takes lots of work to dislodge a well entrenched hypothesis. This is so even for views that seem to have little going for them. Gallistel (& Balsam) argued extensively that there is little good reason to buy the connectionist/associationist story that lies behind the standard cog-neuro commitment to net based cognition. Nonetheless, the idea is the guiding regulative ideal within cog-neuro and it is unlikely that it will go quietly. Or as Glanzman put it in the SciAm piece:
“I expect a lot of astonishment and skepticism,” he said. “I don’t expect people are going to have a parade for me at the next Society for Neuroscience meeting.”
The reason is simple actually: if Glanzman is right, then those working in this area will need substantial retraining, as well as a big time cognitive rethink. In other words, if the GKC is on the right track, then what we think of as cog-neuro will look very different in the future than it does today. And nobody trained in earlier methods of investigation and basic concepts suffers a revolution gladly. This is why we generally measure progress in the sciences in PFTs (i.e. Planck Funereal Time).

Second, it is amazing to see just how specific the questions concerning the bio basis of memory become once one makes the shift over to the GKC. Here are two questions that the Glanzman et al. paper ends with. Note the detailed specificity of the chemical speculation:
Our data indicate that essential components of the engram for LTM in Aplysia  can be transferred to untrained animals, or to neurons in culture, via RNA. This finding raises two questions: (1) Which specific RNA(s) mediate(s) the memory transfer?, and (2) How does the naked RNA get from the hemolymph/cell culture medium into Aplysia  neurons? Regarding the first question, although we do not know the identity of the memory-bearing molecules at present, we believe it is likely that they are non-coding RNAs (ncRNAs). Note that previous results have implicated ncRNAs, notably microRNAs (miRNAs) and Piwi-interacting RNAs (piRNAs) (Fiumara et al., 2015; Rajasethupathy et al., 2012; Rajasethupathy et al., 2009), in LTM in Aplysia . Long non-coding RNAs (lncRNAs) represent other potential candidate memory transfer molecules (Mercer et al., 2008). Regarding the second question, recent evidence has revealed potential pathways for the passage of cell-free, extracellular RNA from body fluids into neurons. Thus, miRNAs, for example, have been detected in many different types of body fluids, including blood plasma; and cell-free extracellular miRNAs can become encapsulated within exosomes or attached to proteins of the Argonaut (AGO) family, thereby rendering the miRNAs resistant to degradation by extracellular nucleases (Turchinovich et al., 2013; Turchinovich et al., 2012). Moreover, miRNA-containing exosomes have been reported to pass freely through the blood-brain barrier (Ridder et al., 2014; Xu et al., 2017). And it is now appreciated that RNAs can be exchanged between cells of the body, including between neurons, via extracellular vesicles (Ashley et al., 2018; Pastuzyn et al., 2018; Smalheiser, 2007; Tkach and Théry, 2016; Valadi et al., 2007). 
If, as we believe, ncRNAs in the RNA extracted from sensitized animals were transferred to Aplysia  neurons, perhaps via extracellular vesicles, they likely caused one or more epigenetic effects that contributed to the induction and maintenance of LTM (Fig. 2 ).
Which RNAs are doing the coding? How are they transferred? Note the interest in blood flow (not just electrical conductance) as "cognitively" important. At any rate, the specificity of the questions being mooted is a good indication of how radically the field of play will alter if the GKC gets traction. No wonder the built-in skepticism. It really does overturn settled assumptions if correct. As SciAm puts it:
This view challenges the widely held notion that memories are stored by enhancing synaptic connections between neurons. Rather, Glanzman sees synaptic changes that occur during memory formation as flowing from the information that the RNA is carrying.
So, is GKC right? I bet it is. How right is it? Well, it seems that we may find out very very soon.

Oh yes, before I sign off (gloating and happy I should add), let me thank Peter and Bill and Johan and Patrick for sending me the relevant papers. Thx.

Addendum: Here's a prediction. The Glanzman paper will be taken as arguing that synaptic connections play no role in memory. Now, my completely uneducated hunch is that this strong version may well be right. However, it is not really what the Glanzman paper claims. It makes the more modest claim that the engram is at least partly located in RNA structures. It leaves open the possibility that nets and connections still play a role (though an earlier paper by him argues that it is quite unclear how they do as massive reorganization of the net seems to leave prior memories intact). So the fall back position will be that the GKC might be right in part but that a lot (most) of the heavy cog-neuro lifting will be done by neural nets. Here is a taste of that criticism from the SciAm report:
“This idea is radical and definitely challenges the field,” said Li-Huei Tsai, a neuroscientist who directs the Picower Institute for Learning and Memory at the Massachusetts Institute of Technology. Tsai, who recently co-authored a major review on memory formation, called Glanzman’s study “impressive and interesting” and said a number of studies support the notion that epigenetic mechanisms play some role in memory formation, which is likely a complex and multifaceted process. But she said she strongly disagreed with Glanzman’s notion that synaptic connections do not play a key role in memory storage.
Here is where the Gallistel arguments will really come into play. I believe that the urgency of answering Randy's question (how do you store a retrievable number in a connectionist net?) will increase for precisely the reasons he noted. The urgency will increase because we know how a standard computing device can do this and, now that we have identified the components of a chemical computer, we know how this could be done without nets. So those who think that connections are the central device will have to finally face the behavioral/computational music. There is another game in town. Let the fun begin!!

Friday, May 11, 2018

Ideas that break the mold

I am currently re-reading a terrific book on the history of modern molecular biology called The Eighth Day of Creation (here, henceforth 8-day). The book reviews some of the seminal scientific events in modern biology, starting with Watson and Crick’s discovery of the double helix structure for DNA. The book is really fun to read given that it intersperses serious science with lots of titillating gossip about the relevant personalities.

The fun aside, the book (confession: I’ve read the first 200 pages so far and this deals exclusively with DNA) raises two interesting questions for someone like me. 

First, it seems to point to two kinds of “revolutions” in the sciences. The first kind is one that everyone is waiting to happen and that had the work that fomented it not been done, analogous work would soon have been produced making an analogous intellectual contribution. The second kind of work is the opposite: had the people who did it not been around, then nobody else would have done it (or at least not soon). Rather, the idea’s birth would have been long (maybe perpetually) delayed. Both kinds of work are groundbreaking and deserving of the kudos and prizes heaped upon them. The difference is that the discoverers of the first kind are distinguished by breaking the tape a bit ahead of others, while those of the second are distinguished by being the only ones running the race at all.

The second question, of course, is whether any work currently being pursued in my extended neck of the woods smells like either one of these. What is the next big idea? Needless to say, the first kind will be easier to sniff out than the second given that the second seems to come out of nowhere. But, I suspect that nothing really comes completely out of nowhere and I will suggest that one idea that we have been tracking in FoL that has been treated as scientifically dubious until now is gaining traction so that it is beginning to look like an idea whose time has come. In other words, if we take the progression from Ridiculous! to Obvious! via Sorta/Maybe! as an early indicator of an intellectual revolution, then I think the Gallistel-King conjecture (GKC) is about to enjoy some quality time in the intellectual sun.

It should go without saying (but I will say it nonetheless) that everything I say in what follows is entirely half-assed and speculative. This partially comes with the subject matter. But as I like these sorts of issues, and cannot resist, and have nothing better to talk about at the moment, I will indulge myself. You need not follow.

Let’s start with the two kinds of revolutions. 8-day makes the case (not deliberately, I should add) that the helix was waiting to be discovered and that, though Watson and Crick got there first, someone else would have grokked the structure very soon if they had not. Likely candidates include Wilkins, Franklin and, almost certainly, Pauling. There were probably others around who could have figured out the basic ideas as well (or so 8-day leads me to believe). In fact, Crick seems to agree with this assessment (see 8-day:155). I do not intend this observation to denigrate the achievement (more exactly: who the hell am I to be able to denigrate it?), just to note that it seems to be an idea whose time had arrived. Many researchers thought that DNA was the important big molecule to understand chemically. They thought this because they knew that it was the repository of hereditary information. Many thought that it was some sort of helix, and many thought that X-ray pictures were the right kind of evidence for probing its structure. There were several mathematical accounts available (albeit imperfect) for arguing from pictures to structure, and it seems from the story 8-day tells that sooner or later the structure would have been cracked (maybe in dribs and drabs, as Crick notes that Medawar suggested in the quoted note on 8-day p. 155).

One could say something similar for other great discoveries. Einstein’s theory of special relativity was very similar to other theories that cropped up at the time (Lorentz, Poincaré); Darwin’s theory of natural selection was simultaneously discovered by Wallace. The same appears to be true of the work on QED in more modern times. Again, all of this stuff is great, but it seems to have been “in the air” and was something that someone would have discovered pretty soon after whoever is credited with the work did it.[1]

This contrasts with other kinds of discoveries. I am told that Einstein’s General Theory is something that really arrived unexpectedly and that nobody was working along the same lines. Ditto with Mendelian genetics, which was so far ahead of its time that it lay neglected for about 35 years until it was rediscovered by others around 1900 (de Vries, Correns and Tschermak). McClintock’s theory of jumping genes might fit in here too, from what I know of it, as would Marshall and Warren’s theory of the bacterial origins of ulcers (which the rest of the scientific community scoffed at until it received the Nobel).

To this list, I would add Chomsky’s discovery that humans have an FL built to acquire and use recursive Gs with distinctive computational properties. This is an idea which, though obviously correct, is still resisted in many quarters. From my read of the history, it seems clear that had Chomsky not made the case for Generative Grammar, nobody would have made it for many years to come (if ever, if current resistance is any indication).

There is one more idea that is coming into its own that I would add to the list, and that brings us to the second question (i.e. anything like this on the horizon now?): Gallistel’s conjecture that human cognitive computation is intra-neuronal and a species of chemical computation rather than inter-neuronal and “connectionist.” This idea has been roundly resisted (and dismissed) by most of the cog-neuro community. The idea that brain computations are not “like” classical computing at all (no registers, variables, write-to and read-from memory etc.) is a virtual dogma in the neurosciences (and has been for well over 30 years). Neo-connectionism is the name of the cog-neuro game, and Gallistel’s critiques have been largely ignored and his more positive proposals barely attended to. Until recently.

I have noted several recentish studies that have argued that there are (at least) some intra-neuronal calculations that cells do (type “Gallistel-King conjecture” into the find box on the top left corner for posts on the topic). Another one has just appeared in Science (here). The authors are Tagkopoulos, Liu, and Tavazoie (TLT). They show how E. coli are capable of “forming internal representations that allow prediction of environmental change.” They do this using “intracellular networks” of “biochemical reactions.” Using these networks, these single-cell microbes “form internal representations of their dynamic environments that enable predictive behavior.” Further, consistent with the Gallistel-King conjecture (GKC), it appears that these biochemical representations consist of “genome wide transcriptional responses” based on the DNA-RNA-Protein system characteristic of modern cellular biochemistry.

I am no expert in these matters, but it sure looks like what TLT is finding comports quite nicely with the most straightforward version of the GKC, in which cognitive computation is based on the same kinds of processes and networks used to convey hereditary information. First, both take place within single cells. Second, the information processing has a pretty classical look and embodies a computational architecture (as discussed in detail in the Gallistel & King book) exploiting DNA/RNA/Proteins in the way GKC initially proposed. Not bad for armchair theorizing. Not bad at all.

As I mentioned, there is more and more stuff coming out that provides empirical support for this big idea. And as I have also mentioned elsewhere, this is roughly what we should expect. The GKC is the conservative hypothesis concerning cognitive computation, despite its also being iconoclastic. It claims that cognition supervenes on an information processing network that we know that cells have and that is used for another purpose (passing traits on to future generations). This system is computationally very rich (it embodies a classical (Turing/von Neumann) computational architecture), as Gallistel and King show. GKC makes the intellectually conservative proposal that an in-place information processing network (an extant system that passes genetic information across generational time) is also used (or repurposed) for other kinds of info processing tasks (i.e. cognitive information processing). This is standard Darwinian thinking.

In contrast, connectionism is quite radical, as it proposes a novel computational apparatus to do the heavy cognitive lifting, bypassing a perfectly respectable system that is already in place and up and running. Of course, this might be what happened, but it is still a very radical proposal and should only be accepted if there is very significant evidence in its favor. And as Gallistel has argued, there is really no good evidence to support it and lots of problems with it. I will not rehearse these here (but see here), except to say that it is quite amazing how a bad idea gains staying power if it leverages another really bad idea. The marriage of connectionism and associationism is one such stable couple, as Gallistel has shown, and the fact that neither is convincing on its own seems not to have convinced the neuro-cognoscenti to dump the pair.

It is fun to speculate just how game-changing a world that accepts GKC would be. Cog-neuro could really start stealing liberally from our biological friends. Learning would be to cognition what development is to biology (the building of forms based on genetic information plus environmental inputs). All that inter-neuronal chatter might be re-analyzed as sharing computational results rather than executing actual computations. One could imagine a kind of neuronal wisdom-of-crowds system where individual neurons compute and then “vote,” with the popular favorite output carrying the day. But all of this is real fanciful speculation, completely unmoored from any knowledge (my specialty!). The important point is that it’s looking more and more like GKC is onto something, and if it turns out to be even roughly correct, the consequences for what we do in the cog-neuro sciences will be profound. Why do I think this? Because it’s what happened in biology. Indeed, it’s the big moral from 8-day. Let me explain.

As I said, 8-day is a terrific read and spurs endless fun speculation. It also carries a moral for linguists (and psychologists) with a cognitive bent. To wit: the intellectual challenge facing people in the mid-50s as regards finding the structure of DNA is quite analogous to the central problems in cog-neuro today. The problem then was to find a way of physically grounding the gene. The problem was usefully bounded by the fact that it had to be a structure that comported with the insights of Mendelian genetics (in particular the fact that reproduction leads to half of the genetic traits of the parents being passed on to the offspring). The intellectual challenge was to find a physical structure that would make clear how this was possible. The Watson-Crick structure for DNA did this in a beautiful way. It showed how Mendel’s genetics could be incarnated. Thus, Mendel’s insights formed a boundary condition on the structure of whatever it was that served to transmit hereditary information. The helical structure of DNA met this condition almost perfectly, and Watson and Crick noted as much in their original paper. Here’s the money quote (8-day:154):

It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material. 

We find ourselves in a similar situation today. We know a great deal about parts of cognition. We know a lot about some of the computational properties of cognition. We know that this requires representations with complex properties that demand something very like a classical computational architecture. We know that something like innate cognitive knowledge exists that allows for the kinds of cognitive computations biological systems perform. The goal of neuroscience should be to figure out how this is incarnated in biological material: e.g. What’s an address? How do you read from and write to the system? What’s a variable? What’s a pointer to an address? How do you store a number in memory? How is “innate” knowledge genetically coded? These are all things we understand how to execute in silicon. The cognitive theories we have tell us that our embodied computational system must have these kinds of structures and operations as well. The neuro question is how this is embodied in biological material (as opposed to silicon). The GKC builds on the fact that we know how this could be done using intra-neuronal chemistry but have no idea how it could be done using inter-neuronal connections. The obvious conclusion is that cognitive computation is physically grounded in intra-neuronal chemistry. Amazingly, we are starting to get some details about how this might be done. And nobody would have thunk it when the GKC was first mooted.


[1] That said, the various “versions” had different virtues. So Einstein’s theory of special relativity was substantially different from Lorentz’s, and the ways it was different mattered. So too the various versions of QED. Feynman’s formulation spread quickly through the community because of its intuitive appeal. Schwinger’s (I am told) was no less adequate, but it was far more technically challenging and harder to conceptualize. These are not small differences, but the general point stands: the basic analyses were very similar, and had Einstein not come up with his theory or Feynman with his, someone would have come up with a working version that the community would have embraced.

Wednesday, May 9, 2018

Cross-modal facilitation of phonological acquisition

Bill Idsardi

A few weeks ago Viola Kozak successfully defended her doctoral dissertation at Gallaudet University. (Congratulations Viola!) It was a great experience to be on the committee, and some of Viola's findings are relevant to some of the discussions that we have been having on the blog about the nature of phonological representations. Here's a quote from the defense announcement:
The purpose of Ms. Kozak’s study was to analyze the English and American Sign Language phonological processing skills of two groups of bimodal bilingual children: hearing children of Deaf adults (Kodas) and Deaf children with cochlear implants from Deaf signing families (DDCI). Additionally, this study compared the performance of these bimodal bilingual children to that recorded in previous studies of deaf children from hearing families with cochlear implants (DHCI), who have, as a group, been found to perform less accurately on English phonological tests than their hearing counterparts. The study investigated whether or not this was the case with Bimodal Bilinguals. Overall, the two groups of bimodal bilingual children scored comparatively on all tests, and the findings suggest that early exposure to ASL from birth may serve to bolster a cochlear implant user’s spoken language acquisition following implantation.
There's a bit to unpack here, so let's take the hearing children first. There were two groups of these, hearing children of Deaf adults (Kodas, HD, n=17) and hearing children of hearing adults (HH, n=4). Phonologically-oriented tests for the participants included phonemic awareness, phonemic discrimination and pseudoword repetition. These two groups scored comparably on these tests, showing that they are both learning their phonologies "on schedule". 

Now let's consider the children with cochlear implants. The group in this study were Deaf children of Deaf adults (DDCI, n=3). Previous studies have looked at Deaf children of hearing adults who had received cochlear implants (DHCI) and found that their phonological development was relatively delayed. But in Viola's study, the DDCI group performed on par with the age-matched set of Kodas (who were not different from the HH group). Here is one plot of the non-word repetition test (probably the hardest task).



The blue dots are vertically scattered in the middle among the red squares, with the lower two blue dots performing similarly to hearing children of hearing adults (HH) who were either slightly younger or slightly older. Now, this is admittedly a small number of participants, but there is not that large a population of Deaf children of Deaf adults who have received cochlear implants to draw on. Previous studies (for example) have shown poorer performance for CI users on phonological tasks, but these were CI users who were Deaf children of hearing adults (DHCI). So it's possible that one source of the delayed development for DHCI children is the lack of early language input, as they are receiving neither spoken language input nor sign language input (because their parents are not signers). If this is on the right track, then the good performance of the DDCI group studied by Viola (the blue dots) might be due to the fact that they did receive ASL input prior to receiving their cochlear implants (and they also continued to use ASL after implantation). This difference would imply that prior experience with ASL phonology aids in the subsequent acquisition of spoken (English) phonology, across different modalities. And this facilitation effect would in turn support the idea that some of the representational and computational apparatus is common to both ASL and spoken phonology. All of this doesn't make phonology wholly substance-free, but it does argue for the existence of abstract, non-substantive (amodal) components of phonology, something that the SFP program strongly endorses, and that I strongly agree with.