Tuesday, September 9, 2014

Rationalism, Empiricism and Nativism -2

In an earlier post (here), I reviewed Fodor’s and Chomsky’s argument concluding that anyone that believes in induction must be a nativist. Why? Because all extant inductive theories of belief fixation (BF) are selection theories and all selection theories presuppose a given hypothesis space that characterizes all the possible fixable beliefs.  Thus, anything that “learns” (fixes beliefs) must have a representation of what is learned (a given hypothesis space) which is used to evaluate the input/experience in fixing whatever beliefs are fixed.  Absent this, it is impossible to define an inductive procedure.[1] Thus, trivially (or almost tautologically (see note 1)), whatever one’s theory of induction, be it Rationalist or Empiricist, everyone is a nativist.  The question is not whether nativism but what’s native. And here is where Rationalists and Empiricists actually differ.

Before going on, let me remind you that both Fodor and Chomsky (and, it seems to me, all the participants at Royaumont) took this to be a trivial, nay, almost tautological consequence of what induction is.  However, this does not mean that it is not worth remembering and repeating. It is still the case that intelligent people confuse Rationalism with Nativism and assume that Empiricists have no nativist commitments. This suggests that Rationalists, in contrast with Empiricists, make fancy assumptions about minds and hence bear the burden of proof in any argument about mental structures.  However, once it is recognized that all psychological theory is necessarily nativist, this burden-shifting maneuver loses much of its punch. The question becomes not whether the mind is pre-stocked with all sorts of stuff, but what kind of stuff it is stuffed with and how this stuff is organized.  Amy Perfors (here) says this exactly right (135)[2]:

…because all models implicitly define a hypothesis space, it does not make sense to compare models according to whether they build hypothesis spaces in. More interesting questions are: What is the size of the latent hypothesis space defined by the model? How strong or inflexible is the prior?...

So, given that everyone is a nativist, how are we to decide between Rationalist (R) and Empiricist (E) approaches to the mind? First of all, note that given that everyone is a trivial nativist, the debate between Rs and Es necessarily revolves around how beliefs are fixed and what this implies for the mind’s native structure. Interestingly, probing this question ends up focusing on what kind of experience is required to fix a given belief.
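A minimal sketch of the trivial point, in code (the toy concepts here are purely illustrative, my own invention): induction is selection over a pre-specified hypothesis space, so data can only ever choose among candidates the learner already represents.

```python
def induce(space, data):
    """Return the names of the hypotheses consistent with every datum."""
    return {name for name, h in space.items() if all(h(d) for d in data)}

# A pre-specified space of candidate concepts -- the "given" that any
# selection theory of learning presupposes (toy predicates over integers):
space = {
    "small":    lambda x: x < 10,
    "even":     lambda x: x % 2 == 0,
    "positive": lambda x: x > 0,
}

print(sorted(induce(space, [2, 4, 8])))      # all three concepts survive
print(sorted(induce(space, [2, 4, 8, 12])))  # "small" is now eliminated
```

With an empty `space`, no amount of data fixes any belief at all: the data evaluates given alternatives, it does not conjure them.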

Es have traditionally taken the position that beliefs are fixed by positive exposures to extensions of the relevant concepts. So, for example, one fixes the belief that ‘red’ means RED by exposure to red, and that ‘dog’ means DOG by exposure to dogs. Thus, there is no belief fixation without exposure to tokens in the relevant extensions of a concept. It is in this sense that Es see the environment as shaping mental structure. Minds track environmental input and are structured by this input. The main contribution that minds make to the structure of their contents is by being receptive to the information that the environment makes available. On an E view, the trick is to figure out how to extract information in the signal. As should be obvious, this sort of view champions the idea that minds are very good statistical machines able to find valuable informational needles in potentially very large input haystacks. Rs have no problem with this assumption, but they argue that it is insufficient to account for our attested cognitive capacities.

More particularly, Rs argue that there is more to the fixation of belief than environmental input. Or, to make the same point another way: the beliefs that get fixed via exposure to input data far outrun the information available from that input. Thus, though the environment can trigger the emergence of beliefs, it does not shape them, for we have ideas/concepts that are not themselves tokened in the input.  If this is correct, then Rs reason that hypothesis spaces are highly structured and what you come to “know” is strongly affected by this given structure. Note that the disagreement between Rs and Es hinges on what it is possible to glean from available input.

So how to approach this disagreement in a semi-rational manner?  This is where the Logical Problem of Acquisition (LPA) comes in.  What is the LPA?  It’s an attempt to specify the nature of the input data that an Acquisition Device (AD) has access to and to then compare this to the properties of the attained competence. Chomsky discusses the general form of this approach in chapter 1 of Reflections on Language (here).

In the study of language, the famous diagram in (1) concisely describes the relevant issues:

(1)  PLDL -> FL -> GL

PLDL is the name we give to the linguistic data from L that a child (actually) uses in building its grammar. FL is, well, you know, and GL is the resultant grammar that a native speaker attains.  One can easily generalize this schema to other domains of inquiry by subbing other relevant domains for “L.” A generalized version of the schema is (2) (‘X’ being a variable ranging over cognitive domains of interest) and a version of it as applied to vision is (3). So, if one’s interest is in visual object recognition (as for example in Marr’s program), we can consider the schema in (3) as outlining the logic to be explored (PVD = primary visual data, FV = faculty of vision, GV = grammar (i.e. rules) of vision).[3]

(2)  PXD -> FX -> GX
(3)  PVD -> FV -> GV

This schematic rendition of the LPA focuses the R vs E debate on the information available in PXD. An Eish conception is committed to the view that PXD is quite rich and that it provides a lot of information concerning GX.  To the degree that information about GX can be garnered from PXD, to that degree we need not populate FX with principles to bridge the gap. Rish conceptions rest on the view that PXD is a rather poor source of information relevant to GX.  As a result, Rs assume that FX is generally quite rich.

Note that both Rs and Es assume that FX has a native structure. This, recall, is common to both views. The question at issue is how much belief fixation (or more exactly, the fixation of a particular belief) owes to the nature of the data and how much to the structure of the hypothesis space. As a first approximation one can say that Rs believe that given hypothesis spaces are pretty highly structured, so that the data required to “search” that space can be quite sparse. Conversely, the richer the set of available alternatives, the more one needs to rely on the data to fix a given belief. Thus for Rs all the explanatory action lies in specifying the narrow range of available alternatives, while for Es most of the explanatory action lies in specifying the (nowadays, most often statistical) procedures that determine how one moves across a rather expansive set of possibilities.
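This trade-off can be made concrete with a toy Bayesian sketch (the hypothesis spaces, the "size principle" likelihood, and all the numbers are my own illustrative assumptions, not anything from the acquisition literature): hypotheses are intervals [1..k], the learner repeatedly observes the number 10, and we count how many observations it takes before the posterior settles on the true hypothesis.

```python
def examples_needed(space, true_h=10, datum=10, threshold=0.95):
    """Feed the same datum repeatedly; return how many examples it takes
    for the posterior on the true hypothesis to pass the threshold."""
    for n in range(1, 1000):
        # Size-principle likelihood: P(datum | k) = 1/k if datum <= k, else 0
        likes = {k: (1 / k) ** n if datum <= k else 0.0 for k in space}
        total = sum(likes.values())
        if likes[true_h] / total >= threshold:
            return n
    return None

structured = [10, 50, 100]          # R-ish: few, sharply distinct options
flat = list(range(1, 101))          # E-ish: every interval [1..k] allowed

print(examples_needed(structured))  # very few examples suffice
print(examples_needed(flat))        # many more are needed
```

Same data, same update rule; the difference in data demanded is entirely a function of how structured the given space is.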

The schemas above suggest ways of investigating this disagreement. Let’s consider some.

E invites the view that, ceteris paribus, variations in PXD should lead to variations in GX as the latter closely tracks properties of the former (it is in this sense that Es think of PXD as shaping a person’s mental states).  Thus, if some kinds of inputs are systematically absent from an individual’s PXD, we should expect that individual’s cognitive development and attained competence to differ from that of an individual with more “normal” inputs. Hume (our first systematic associationist psychologist) gives a useful version of this view:[4]

…wherever by any accident the faculties which give rise to any impressions are obstructed in their operations, as when one is born blind or deaf, not only the impressions are lost, but also their corresponding ideas; so that there never appear in the mind the least traces of either of them.

There’s been lots of research over the last 50 years exploring Hume’s contention in the domain of language acquisition.  Lila Gleitman and Barbara Landau (G&L) provide a good brief overview of some of the child language research investigating these matters.[5] They note that the evidence does not support this prediction (at least in the domain of language). Rather, it seems that “humans reconstruct linguistic form …[despite] the blatantly inadequate information offered in their usable environment (91).” In other words, it seems that the course of language acquisition can proceed smoothly (in fact, no differently than what happens in the “normal” case) even when the input to the system is perceptually very limited and degraded. G&L interpret this Rishly to mean that language acquisition is relatively independent of the quality of the input, which makes sense if it is guided by a rich system of innate knowledge.

G&L illustrate the logic using two kinds of examples: blind people can and do learn the meanings of words like ‘see’ and ‘look’ without being able to see or look, and people can acquire full native competence (and can make very subtle “perceptual” distinctions in their vocabulary) despite being blind and deaf. Indeed, it seems that even extreme degradation of the sensory channels leaves the process of language acquisition unaffected.

It is worth noting just how degraded the input can be when compared to the “normal” case. Here are G&L, reporting Carol Chomsky’s original research on learning via the Tadoma method (92):[6]

To perceive speech at all, the deaf-blind must place their fingers strategically at the mouth and throat of the speaker, picking up the dynamic movements of the mouth and jaw, the timing and intensity of the vocal-cord vibration, and the release of air…From this information, differing radically in kind and quality from the continuously varying speech wave, the blind-deaf recover the same ornate system of structured facts as do hearing learners…

In short, there is plenty of evidence that language acquisition can (and does) take place in the face of extremely degraded input, at least when compared with the PLD available in the standard case.[7]

The Poverty of Stimulus (PoS) argument also reflects the logic of the schemas in (1-3). As the schema suggests, a PoS argument has two major struts: a description of the available PLD and a description of the grammatical operations of interest (i.e. the relevant rules). The next step compares what information can be gleaned about the operation from the data; the slack is then used to probe the structure of FL. The standard PoS question is then: what must we assume about FL so that given the witnessed PLD, the LAD can derive the relevant rules?  As the schema indicates, the inference is from instances of rules (used outputs of a grammatical system) to the rules that generate the observed sentences. Put another way, whatever else is going on, the LPA requires that FL at least contain some ways of generalizing beyond the PLD. This is not controversial. What is controversial is how fancy these methods for generalizing beyond the data have to be. For Es, the generalizing procedures are quite anodyne. For Rs, they are often quite rich.

Well-designed PoS arguments focus on grammatical phenomena for which there is no likely relevant information available in the PLD. If Es are right (see Hume above), all relevant grammatical operations and principles should find (robust?) expression in the PLD. If Rs are right, we should find lots of cases where speakers develop grammatical competence even in the absence of relevant PLD (e.g. all agree that “John expects Mary to hug himself” is out and that “John expects himself to hug Mary” is good, where ‘John’ is the antecedent of ‘himself’).

It goes without saying that, given this logic, debate between Es and Rs will revolve around how to specify the PLD in relevant cases (see here for a sophisticated discussion). So, for example, all accept the idea that PLD consists of good examples of the relevant operation (e.g. all take “John hugged himself” to be a typical data point bearing on principle A (A)). What of negative data, data indicating that some example is unacceptable with the indicated interpretation (e.g. that “John expects Mary to hug himself” is out)?  There is every reason to think that overt correction of LAD “mistakes” barely occurs. So, in this sense, the PLD does not contain negative data. However, perhaps for the LAD absence of evidence is evidence of absence. In other words, perhaps the LAD’s failing to witness an example like “John expects Mary to hug himself” leads to the conclusion that the dependency between ‘John’ and ‘himself’ in these configurations is illicit. This is entirely possible. So too with other *-cases.[8]

Note that this reasoning requires a fancier FL than one that simply assumes that all decisions are made on the basis of positive data.  So the logic of the LPA is respected here: we compensate for the absence of certain information in the PLD (i.e. direct negative evidence) by allowing FL to evaluate expectations of what should be seen in the PLD were a given construction good.[9] The question an R would ask an E is whether the capacity to compute such expectations doesn’t itself require a pretty hefty native capacity. After all, many things are absent from the data, but only some of these absences tell us anything (e.g. I would bet that for most cases in the PLD the anaphor is within 5 words of the antecedent; nonetheless “John confidently for a man of his age and temperament believes himself to be ready to run the marathon” seems fine).
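The indirect-negative-evidence reasoning can be given a toy quantitative form (the per-utterance rate p and the corpus sizes are made-up numbers, purely illustrative): a learner who expects a construction, were it grammatical, to occur at some rate can treat prolonged silence as evidence against it.

```python
def prob_silent_if_grammatical(p, n):
    """Chance of observing zero tokens in n utterances, assuming the
    construction is grammatical and occurs at rate p per utterance."""
    return (1 - p) ** n

# Suppose the learner expects a grammatical construction of this type
# roughly once per 1000 utterances (an invented, illustrative rate):
for n in (100, 1_000, 10_000):
    print(n, prob_silent_if_grammatical(0.001, n))
```

After 10,000 silent utterances the "it's grammatical" hypothesis becomes very improbable; but notice that the silence is informative only relative to the expected rate p that the learner brings to the data, which is exactly the R's point about the native capacity such expectations presuppose.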

One assumption I commonly make in considering PoS arguments is that PLD effectively consists of simple acceptable sentences (e.g. “John likes himself”).  This is the so-called Degree 0 hypothesis (D0H).[10]  If the PLD is so restricted, then FL must be very rich indeed, for many robust linguistic phenomena are simply unattested in simple clauses (and recall, induction is impossible in the absence of any data to drive it); e.g. island effects, ECP effects, many binding effects, minimality effects, among others. The D0H may be too strong, but there are two (maybe one, as they are related) reasons for thinking that it is on the right track.

The first involves Penthouse Principle (PP) effects.  Ross noted long ago that there are many operations restricted to main clauses but virtually none that apply exclusively to embedded clauses. Subject-Aux Inversion and Tag Question formation are two examples from English.  If we assume that something like the D0H is right(ish), we expect all idiosyncratic processes to be restricted to main clauses, where substantial evidence for them will be forthcoming.  Embedded clauses, on the other hand, should be very regular. At the very least, we expect no operations to apply exclusively to embedded domains (the converse of the PP), since given the D0H there can be no evidence to fix them.

The second reason relates to this. It’s a diachronic argument David Lightfoot gave based on the history of English (here).  It is based on a very nice observation: main clause properties can affect embedded clause properties, but not vice versa. Lightfoot illustrates this by considering the shift from OV to VO in English.  He notes that in the period in which the change occurred, embedded clauses always displayed OV order. Despite this, English changed from OV to VO.  Lightfoot reasons as follows: were embedded clause information robustly available, there would have been very good evidence that, despite appearances to the contrary in unembedded clauses, English was OV not VO; the attested change to VO (which ended up migrating to embedded clauses) would never have occurred. Thus, the fact that English changed in this way (and that influences in the other direction are unattested) follows if something like the D0H holds (viz. an LAD does not use embedded clause information in the acquisition of its grammar). Lisa Pearl subsequently elaborated a sophisticated quantitative version of this argument here and here. The upshot: the D0H holds. Of course, if it does, then strong versions of PoS arguments for many linguistic phenomena readily spring to mind.  No data, no induction. No induction without highly structured, natively given hypothesis spaces guiding the AD.

OK, this post has gotten out of control and is far too long. Let me end by reiterating the take-home message.  Rs and Es differ not on whether nativism but on what is native. And exploring the latter effectively revolves around considerations of how much information the data contains (and the child can use) in fixing its beliefs. This is where the action is. Research like what G&L review is interesting in that it shows that achieved competence seems quite insensitive to large variations in the relevant usable data. Classical PoS arguments are interesting in that they provide cases where it is arguable that there is no data at all in the input relevant to fixing a given belief. If this is so, then the mechanisms of belief fixation must lean very heavily on the highly structured (and hence restricted) nature of the hypothesis space that ADs natively bring to the belief fixation process. In R/E debates everyone believes that input matters and everyone believes that minds have native structure.  The argument is about how much each factor contributes to the process. And this is something that can only be adjudicated empirically. As things stand now, IMO, the fertility of the Rish position in the domain of language (most of cognition, actually) has been repeatedly demonstrated. Score one (indeed many) for Descartes and Kant.

[1] In effect, induction serves to locate a member/members from a given set of alternatives. No pre-specified alternatives, no induction. Thus Fodor’s point: for learning (i.e. belief fixation) to be possible requires a given set of concepts that mediate the process.

Fodor emphasizes that this view, though trivial, is not purely tautological. There does exist a tautological claim that some have confused with Fodor’s. This misreading interprets Fodor as saying that any acquired concept must be acquirable (i.e. a principle of modal logic along the lines of: if I do have the concept, then I could have had the concept). Alex Clark, for example, so reads Fodor (here):  “There is a tautological claims which is that I have an innate intellectual endowment that allows me to acquire the concept SMARTPHONE in some way, on the basis of reading, using them, talking to people etc. Obviously any concept I have, I must have the innate ability to have it…”

Fodor notes this possible interpretation of his views at Royaumont (p. 151-2), but argues that this is not what he is claiming. He says the following:  “The banal thesis is just that you have the innate potential of learning any concept you can in fact learn; which reduces, in turn, to the non-insight that whatever is learnable is learnable. …What I intended to argue is something very much stronger; the intended argument depends on what learning is like, that is the view that everybody has always accepted, that it is based on hypothesis formation and confirmation. According to that view, it must be the case that the concepts that figure in the hypothesis you come to accept are not only potentially accessible to you, but are actually exploited to mediate the learning…The point about confirming a hypothesis like "X is miv iff it is red and square" is that it is required that not only red and square be potentially available to the organism, but that these notions be effectively used to mediate between the organism's experiences and its consequent beliefs about the extension of miv…”

In other words, if inductive logics require given hypothesis spaces to get off the ground, and if we attribute an inductive logic to a learner, then we must also be attributing to them the given hypothesis space AND we must be assuming that it is in virtue of exploiting the properties of that space that beliefs get fixed. So far as I can tell, this is what every inductivist is in fact committed to.
[2] Despite the terminological misstep of identifying Rationalism with Nativism on p 127.
[3] In Marr’s program, the grammar includes the rules and derivations that get us from the grey scale sketch to the 2.5D sketch.
[4] This is quoted in Gleitman and Landau, see note 5. The quote is from Hume’s Treatise, p 49.
[5] See “Every child an isolate: nature’s experiments in language learning,” Chapter 6 of this. See here for a free copy.
[6] Carol Chomsky’s original papers on this topic are appended to the book. They are well worth reading. On the basis of the reported speech, the Tadoma learners seem indistinguishable from “normal” native speakers.
[7] G&L also note the excess of data problem towards the end of their paper. This is something that Gleitman has explored in more recent work (discussed here and in links cited there). Lila once noted that a picture is worth a thousand words, and that is precisely the problem. In the early period of word learning the child is flooded with logical possibilities when word learning is studied in naturalistic settings.  Here induction becomes a serious challenge not because there is no information but because there is too much, and narrowing it down to the relevant stuff is very hard. Lila and colleagues have argued that in such cases what the child does bears relatively little resemblance to the careful statistical sampling that one might expect if acquisition were via “learning.” This suggests that there must be a certain sweet spot where data is available but not too available for learning (induction) to be a viable form of acquisition. Where this is not possible other acquisition procedures appear to be at play, e.g. guess and guess again! Note that this amounts to saying that resource constraints are key factors in making “learning” an option. In many cases, learning (i.e. reviewing the alternatives systematically) is simply too costly, and other less seemingly rational procedures kick in. Interestingly, from an R perspective, it is precisely when the field of options is narrowed (when syntax kicks in) that something akin to classical learning appears to become viable.
[8] For reasons I have never quite understood, many (see here) have assumed that GGers are hostile to the idea that LADs can use “negative” data productively.  This is simply false. See Howard Lasnik (here) for a good review.  As Lasnik notes, the possibility that negative data could be relevant goes back at least to Chomsky’s LGB (if not earlier).  What is relevant, is not whether negative data might be useful but what kinds of minds can productively use it.  The absence of a barking is useful when one is listening for dogs.  Thus, the more constrained the space of options under consideration the easier it is to use absence of evidence as evidence of absence. If you have no idea what you are looking for, not finding it is of little informational value.
[9] For example, Chater and Vitanyi (C&V) (here) order the available hypotheses according to “simplicity” measured in MDL terms, not unlike what Chomsky proposed in Aspects. Not surprisingly, given such an ordering indirect negative evidence can be usefully exploited (something that would not surprise a GGer).  What C&V do not consider are the possibility of cases where there is virtually no relevant positive or negative data in the PLD. This is what is taken to be the strongest kind of PoS argument and is the central case discussed in at least one of the references C&V cite (see here).
[10] Most who think that this is more or less on the right track actually take “simple” to mean un-embedded binding domains (e.g. Lightfoot).  This is sometimes called Degree 0+.  Thus, ‘Bill’ is in the PLD in (i) but not in (ii):
(i)             John believes Bill to be intelligent
(ii)           John believes (that) Bill is intelligent


  1. great exposition.

    somewhat orthogonal but:

    I'd like to ask though what you think of Fodor's recent sermons to the effect that this computational story just about only holds for modular domains of cognition, and doesn't scale up. that is, that the computational view of acquisition and thinking which entails all of the details you just described only holds in parts of the mind that work like Turing machines (language, perceptual systems): things in which the content is innate, atomistic, compositional, systematic, semantically evaluable in virtue of their syntax etc etc.

    for instance here's a quote from a book review (unfortunately there's no way for me to pull a good quote from this review without turning this comment into a monster, but have a look at the whole thing; it's about 6 or 7 paragraphs in):

    "I think it’s likely, for example, that a lot of rational belief formation turns on what philosophers call ‘inferences to the best explanation’. You’ve got what perception presents to you as currently the fact and you’ve got what memory presents to you as the beliefs that you’ve formed till now, and your cognitive problem is to find and adopt whatever new beliefs are best confirmed on balance. ‘Best confirmed on balance’ means something like: the strongest and simplest relevant beliefs that are consistent with as many of one’s prior epistemic commitments as possible. But, as far as anyone knows, relevance, strength, simplicity, centrality and the like are properties, not of single sentences, but of whole belief systems; and there’s no reason at all to suppose that such global properties of belief systems are syntactic."

    this is obviously not strictly relevant to the linguist but I supposed it would be relevant to you.

    1. @Max:
      I tend to think that Fodor is onto something here. His views are consonant with Lila's recent work, in which she argues that in open-ended contexts like early word learning, induction is replaced, effectively, by guessing until you hit on a winner. He notes that there is another way to do things: inference to the best explanation. The problem with this is that it is very open-textured, contextual and global. And we really don't know how this sort of thing gets done. What counts as best, relevant, simplest etc. has proven very hard to "mechanize." It seems that thinking and judgment really currently elude our understanding. I agree with Fodor here.

      The standard reply is that our minds/brains are massively modular. I tend to think that Fodor is right in thinking that this is not really correct. Sadly I have no idea what to suggest as a replacement, which is Fodor's point.

      I've always thought that one of the reasons that we can study syntax and phonology and parts of semantics is that these are relatively informationally encapsulated systems. Were they not, we'd get nowhere. However, not everything is like this, and where it isn't we are stuck.

  2. I've got the same intuition.

    So then let me ask you, in light of this, what do you make of the minimalist inverted Y model of the architecture of the language faculty.

    how does information flow in and out in your view? Is the CI interface feeding syntax from some central processor? when something is transferred over to the CI for interpretation does that interpretation become available to the central processor?

    Presumably, a mental representation of a thought needs to be available to the syntactic component so that it knows what structure to build?

    1. @Max:
      Ahh, deep questions. I don't think that syntax right now has much to say about these deep questions. We all assume that FL interfaces with a sound and a meaning system. What this comprises, at least on the meaning side, is very unclear. Frankly, this is one methodological reason why I am not wild about linguistic explanations based on interface properties. We have some idea about what a derivation might look like and how it might be computationally well-designed. I think we have virtually no understanding about CI aside from a few cute asides like Full Interpretation (which is actually not that well defined either).

      The question you point to and the obscurities it invokes become clear when one actually looks at the difference between parsing and production studies (though this is not to say that generation is production!). We actually have non-trivial theories of parsing because we can say quite a bit before we need to say anything about the thoughts a linguistic object makes contact with. It's a little like vision work here; we are good until we need to say anything about object recognition. In production we have a hard time even getting off the ground, as we don't know what the input to the production system is. Why? Because we assume it is a thought, and we don't know what thoughts are or how they are represented. Hence we can say a few things, but not very much.

      So, how does FL relate to the interfaces? Don't know. I don't especially know much about CI: how many "modules" might there be? What's the representational format interfaced with? Is there one? Many? DO NOT KNOW.

      BTW, Jackendoff discusses this somewhere in "Consciousness and the Computational Mind" (I think). He addresses the question of how we talk about what we see, and does this by showing how to link up linguistic and visual representations. I doubt it's right in detail, but it gives you the flavor of what a reasonable approach might look like.

    2. Once you allow indirect negative evidence, I think you're somewhere in the intermediate zone between E and R, but where? And how to find out?

      And then the problem arises that we appear not to have any truly convincing demonstrations that anything in particular is acquired but not learnable from evidence under fairly broad, close-to-E assumptions.

      There is a minor and presumably parametric difference between US and UK English in that in the UK, you can say 'let's go to mine' meaning 'let's go to my house', comparable to 'let's go to Bill's', without 'house' in the immediate discourse environment (Cindy hears this kind of thing on the BBC crime shows she watches).

      But, in the approx 6.7 million words of 'child directed speech' on the 5 biggest files of UK English on Childes, this construction does not even occur once, even tho people do say things like "at my house", or "I'm gonna eat yours then".

      This is a simple construction, surely learned from evidence of some kind, and not hard to observe in entertainment made for grownups, but absent from a decent sized corpus of what is supposed to be child directed speech (2-6 years worth of input, depending on SES, according to the Risley and Hart figures that everybody cites). So there's work to be done in figuring what the input is, even as straight text.

      The agreement behavior of quirky-case marked PRO in Icelandic is probably about as good as it gets for something that is arguably not learned, but even there, there is no *scientific* case, due to lack of any definite body of data, argued plausibly to be effectively very similar to PLD (in both amount and nature), from which the overt evidence is lacking.

  3. Norbert, you don't mention linguistic nativism at all -- in your view then the POS is not an argument for linguistically specific hypothesis spaces, but just for (highly) structured ones?

    That seems a big change from earlier versions of the argument.

    1. I never understood any earlier versions to be anything different. To the extent that a well-reasoned argument about this topic suggested domain-specificity, I read it as secondary and speculative. It might be worth finding a clear example. If it's from Chomsky there is the usual wall of rhetorical flourish to partial out.

    2. Conceptually, PoS can surely only argue for structure in the hypothesis space; but some (most?) people, such as Pearl and Sprouse in various publications, have taken UG to mean structure specific to language-learning.

    3. From Pullum and Scholz 2002:
      "It is widely believed, outside of linguistics as well as within it, that linguists have developed an argument for linguistic nativism called the "argument from poverty of the stimulus". Linguistic nativism is the view (putatively reminiscent of Cartesian rationalism and opposed to empiricism) that human infants have at least some linguistically specific innate knowledge."
      "The one thing that is clear about the argument from poverty of the stimulus is what its conclusion is supposed to be: it is supposed to show that human infants are equipped with innate mental mechanisms with specific linguistic content that assist the language acquisition process - in short, that the facts about human language acquisition support linguistic nativism."

    4. Do they name any specific works that say exactly that? It is something that certainly was floating around in the air, perhaps with nobody writing it down in a completely explicit way.

      Back when I believed in rather than being basically agnostic about the task-specific nature of the child's cheat sheet for language acquisition, it was because the structure seemed to be too weird to be useful for anything else, but at some point that stopped looking like a real argument to me (mostly because of our massive ignorance about everything else). And, back then, nobody had thought of Darwin's Problem.

    5. Ewan and Avery are correct: the argument for domain specificity has ALWAYS been empirical. One looks at the principles taken to characterize FL and sees if they generalize to other domains. Until recently, the principles seemed sui generis. The primitives were language specific, the constraints were language specific, and the operations were language specific. MP has raised the issue of whether this need be so, or whether the apparent language specificity is real, or whether there is some way to unify linguistic operations and principles with those in other domains of cognition. Right now, the accomplishments, and I believe that there have been some, have been modest. So right now, there is lots of domain specificity within FL.

      I understand the desire to unify FL with other cognitive faculties. However, such unification is an accomplishment not a foregone conclusion. To date we have barely managed to describe the different cognitive domains in any depth. Hence unification is somewhat premature. And given what we have, domain specificity looks like a good bet.

      Last point: modularity and domain specificity are closely linked. I will talk about this in my next post. We have every reason to believe that FL is modular. Language is very important and so it would not be surprising if we had a specially designed, dedicated faculty whose concerns were linguistic. After all, we have a visual system distinct from audition and olfaction. If these unify at all, it is not at the "organ" level but at the more primitive circuit level. If this sounds cryptic, read my forthcoming post.

    6. Yes, I always took Pullum and Scholz to be wrong on that point. They didn't cite anyone and I wasn't convinced by their mere assertion of "widely."

    7. Trusting Pullum on almost anything to do with PoS is akin to taking G.W. Bush's word about WMDs.

    8. "Trusting Pullum on almost anything to do with PoS is akin to taking G.W. Bush's word about WMDs."

      Would you be so kind as to provide evidence for this analogy, Norbert? I am sure I am not the only one in your audience who is amazed by it - so don't see it as talking to me but as informing them.

    9. The bet on task-specificity is not one I'm interested in making because (a) I can't see any payoff and (b) it adds a speculation to a collection of actual results of various levels of abstraction, such as the almost or complete mild context sensitivity of NLs and the absence of relative pronouns when the RC precedes the head, together with questions we don't really know the answer to but do have at least a somewhat coherent approach to trying to answer. People are easily confused, and I think we'd be better off leaving the speculations out of the main presentation, at least.

    10. We agree, Avery, that people are easily confused - as evidenced by Norbert's [and others'] hostility towards the 2002 Scholz & Pullum and Pullum & Scholz papers - probably 2 of the most badly misunderstood papers on PoS. We also agree that we'd be better off leaving speculations out of the main presentation. This brings me to the complaints voiced above by some about the lack of evidence in P&S; S&P. If true, these seem fair complaints, and one would of course expect that anyone making such complaints would lead by example.

      So how are 'the good guys' doing in the 'make only claims supported by credible empirical evidence' department? I'll take the "Defense of Nativism" [ML] paper Alex C. provided a link to, because I assume that it is Norbert-approved and most have read it. So how do these fine nativists support their pretty far-reaching claims with empirical evidence? Here is one example:

      “English-speaking children sometimes go through a peculiar stage as they learn how to form certain types of questions. They insert an extra wh-word, saying things like
      (1) What do you think what Cookie Monster eats?
      (2) Who did he say who is in the box?
      These sentences are, of course, ungrammatical in adult English. But it’s not as if these children randomly insert extra wh-words any which way...
      ... though adult English speakers don’t say any of (1)–(5), the pattern found in their children’s speech does appear in other natural languages, including German, Irish, and Chamorro. For example, in German an extra wh-word is used, but not a wh-phrase, making (6) grammatical but (7) ungrammatical:
      (6) Weri glaubst du weri nach Hause geht?
      Who do you think who goes home?
      (7) *Wessen Buchi, glaubst du wessen Buchi Hans liest?
      Whose book do you think whose book Hans is reading?

      There is a reference to 2 papers by Crain + co-author but NOT A SINGLE reference to data confirming that many/most/all kids indeed go through a well-defined phase in which they use [1] and [2]. Next, even though ML claim the pattern appears in at least 3 other languages, only 1 example is given [again, no source is cited] and this example is patently false. As a native speaker of German I find [6] and [7] equally bad [if forced to make a ranking I'd say [6] is worse than [7]; an intuition shared by several German linguists I asked]. Why then would ML rest a very important claim ["that 3-year English-speakers talk like Germans when they ask certain sorts of questions"] on such questionable evidence? One has to assume it was too difficult to find an informant for a language that has >90,000,000 native speakers and hundreds, maybe thousands, of professional linguists. Or maybe this is again the Galilean style at work, where all that counts is that the conclusions are acceptable, not how they were arrived at?

      In any event one has to wonder how anyone who sets the evidential bar so low for work he agrees with can be so outraged about alleged lack of evidence in the work of those he disagrees with.

    11. That was probably an oversight (given my proclivities, I won't throw rocks at other people who commit them on occasion); here's a bunch of references I pulled out of a 2003 paper by Claudia Felser that was the first thing I came up with from googling on 'wh copy child language'.

      De Villiers et al. 1990; McDaniel et al. 1995; Thornton 1990

      But more generally, I think the time has passed for arguing on the basis of striking but isolated phenomena; what is wanted is models that do or don't learn things that are sort of like decent grammars from data that is sort of like PLD. It is now possible for 95 lines of Python whipped up in a few days using nltk to convert CHILDES corpora into a treebank, so the excuses for not making use of these kinds of capabilities have gotten too thin.
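      The conversion Avery alludes to (the original reportedly used nltk) can be sketched in a few lines of dependency-free Python: pair the words of an utterance tier with the POS codes on its %mor tier and emit a flat, treebank-style bracketing. The tier contents below are toy examples, and a real converter would of course handle clitics, compounds, and retracings that this sketch ignores.

```python
def mor_to_bracketing(utterance_tier, mor_tier):
    """Pair words from a speaker tier with POS codes from the %mor tier
    and emit a flat Penn-style bracketed string: (ROOT (pos word) ...)."""
    words = utterance_tier.split("\t", 1)[1].split()
    tags = mor_tier.split("\t", 1)[1].split()
    leaves = []
    for word, tag in zip(words, tags):
        if word in {".", "?", "!"}:   # CHAT utterance terminators: no useful %mor entry
            continue
        pos = tag.split("|", 1)[0]    # %mor entries have the shape pos|lemma-affixes
        leaves.append(f"({pos} {word})")
    return "(ROOT " + " ".join(leaves) + ")"

print(mor_to_bracketing(
    "*MOT:\tyou eat the cookie .",
    "%mor:\tpro|you v|eat det|the n|cookie ."))
# → (ROOT (pro you) (v eat) (det the) (n cookie))
```

      The output is flat rather than hierarchical, but even this much gives a corpus one can feed to learning models of the kind being asked for.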

    12. So I don't quite follow Norbert's answer, but like P and S I have certainly ended up with the view that the conclusion of the POS is that there is domain specific knowledge. The other conclusion is trivial; is there anyone who would reject it?

      So e.g. Berwick, Pietroski, Yankama and Chomsky (2011) Poverty of the Stimulus Revisited are pretty explicit:

      Such differences in outcome arise from four typically interacting factors:
      (1) Innate, domain-specific factors;
      (2) Innate, domain-general factors;
      [then later]
      The point of a POS argument is not to replace appeals to ‘‘learning’’ with appeals to ‘‘innate principles’’ of Universal Grammar (UG). The goal is to identify phenomena that reveal Factor (1) contributions to linguistic knowledge, in a way that helps characterize those contributions.

    13. Could someone come up with a clear quotation that shows the domain-general version of the POS? So I can cite it the next time I write on this.

      I agree with Avery and Ewan (and I guess Norbert?) that the domain specific argument is quite weak; but the conclusion of the domain general version is so weak that it doesn't need much arguing.

    14. Alex C. I won't post any quotes of the general PoS version but encourage you to address this gem of logic the next time you write on PoS. In 2001 [The poverty of the stimulus] L&M argue that in addition to the many hypotheses compatible with the PLD there are also many promising dead ends [PDE], and that this fact would make it increasingly unlikely that the child would arrive at any plausible hypothesis at all [pp. 232-33]. As good Galilean scientists they provide zero empirical evidence for even one PDE from which children could not recover. But let's be charitable and grant them the conclusion that because of such PDEs innate constraints [ICs] are needed.

      Now looking at the 2012 paper 'In defence of Nativism' we find the example of children going through a phase in which they are emulating a pattern that is not present in the language they hear while growing up. That seems to qualify as a PDE. M&L argue "Since nativists generally maintain that language-specific principles are part of the Nativist Acquisition Base, they can postulate that children are working with a highly constrained hypothesis space and that some children temporarily adopt a set of rules that are laid out in that space even if these rules aren’t attested to in the data." [p.9]

      So on the one hand we need ICs to prevent children getting stuck on PDEs. On the other hand ICs themselves introduce PDEs. Maybe you [or someone else] can explain to me HOW kids still end up with the English pattern if ICs force them to go through a German phase? Would it not be more parsimonious to keep the German pattern? And HOW can they recover from the PDE if ICs forced them into this PDE in the first place? On the nativist story it had better not be because of the PLD.

      It seems to me if ICs are used to [1] explain how children avoid PDEs and [2] why children occasionally end up in PDEs, then maybe we're better off by eliminating ICs ...

    15. @ Alex:
      The argument for domain specificity has always been specific. I mean by that the PoS works by isolating a grammatical phenomenon (e.g. island restrictions on movement, locality for binding etc.) and then seeing if THAT FACT can be accounted for without postulating domain specific information. The argument is weighted against doing so, as the assumption commonly made is that one postulates such information as a last resort. If there is info in the PLD sufficient to get the requisite generalization/fact then one traces the causal source of the grammatical effect to that. So, though the form of the PoS is general, the arguments are all very local. Domain specificity is a conclusion from particular cases. If you don't like the conclusion, reanalyze the cases, either by showing that the PLD is richer than assumed OR that the same phenomenon can be explained without domain specific assumptions. Promissory notes are not valid currency.

      We have run down this path before Alex. Indeed, many times. Each time we come to this point. I ask you again: show me the money. Derive islands, ECP effects, binding effects, case effects, WHATEVER without any domain specific assumptions. Do this three times and then we can talk seriously. Till then, well... not worth the time and effort to debate.

    16. The Northern Hypocrite has spoken again:

      "Promissory notes are not valid currency.

      We have run down this path before Alex. Indeed, many times. Each time we come to this point. I ask you again: show me the money. Derive islands, ECP effects, binding effects, case effects, WHATEVER without any domain specific assumptions. Do this three times and then we can talk seriously. Till then, well..not worth the time and effort to debate."

      Indeed, promissory notes are not acceptable from those questioning NH's orthodoxy. Defenders of the NH orthodoxy, on the other hand, are under no obligation to 'show the money' and provide even the vaguest suggestion re HOW innate constraints facilitating the acquisition of islands, ECP, binding effects, case effects, WHATEVER are implemented in the postulated domain specific biological language faculty.

      There is indeed no point in engaging in a serious debate...

    17. I think part of the UG/PoS mess may stem from the fact that when the idea of decomposing grammar into modules was invented, the general rhetoric surrounding generative grammar didn't fully change to accommodate it. So in Chomsky's Knowledge of Language (1986:4) we read "Consider, for example, the idea that there is a language faculty, a component of the mind/brain, ..."

      Given modules, the idea that a faculty corresponds to a component ceases to be sound; the faculty of language might involve 10 modules, the faculty of learning to tie your shoes 6, 4 of them common to both faculties. And, because modules are by nature more abstract than faculties, the force of the gee-whiz 'argument' that any particular one must be task-specific is greatly weakened.

      That there is no helpful quote for Alex to refer to is annoying, but it is perhaps explained by the fact that this kind of observation is both too simple-minded and commonsensical to make it into the academic literature, and furthermore has no real bearing on the content of substantive work at the moment. For example, I had hoped to find something in my Guessing Rule One paper (unpublished, draft on lingbuzz), but didn't, presumably for the reason that nothing of an immediately useful nature would have been added by pointing out that although the Guessing Rule looks like something specific to the language faculty, it might be just a consequence of things that aren't.

    18. @Norbert, we are not discussing (well I'm not anyway) whether there is or is not innate domain specific knowledge. We are discussing whether or not the POS argument is standardly an argument for linguistic nativism or not. I think it is. You seem to think it is too -- so we are agreeing here not disagreeing?
      So yes, there is no point in debating if we are both on the same side...

    19. @ Avery: Thank you for drawing attention to the module/component/faculty problem. You are absolutely right: a lot of the confusion in the literature arises because people do not use these terms consistently. But then you cannot really blame individual authors as long as generativists don't put their cards on the table and define the biological language faculty. As long as they get away with massive piles of promissory notes we'll remain stuck with this conceptual mess. Alex C. may find a paper that offers the definition he asked for but he is as sure to find at least 10 generativists claiming that the author of this paper badly misunderstood the issues [there ARE and should be debates among people who are on the same side!].

      Norbert certainly deserves some kind of perseverance award for generating this blog and being the by far most prolific contributor. Whatever is out there defending his side of the story will be promptly brought to our attention. And if he has answers to our questions he'll provide them. That you won't find a clear definition of the biological LF on this blog should be taken as a very strong indicator that there is none. This is a rather sobering fact after more than 50 years of intense research on I-language [Chomsky tells us this is what generativists ALWAYS were doing and he ought to know].

      Add to that the work on the processing sources of island effects by Kluender, Hofmeister, Casasanto and many others, which offers a valid alternative to the orthodoxy, and the work on intervention effects, which has shown that the showpiece examples of the ECP, the whole basis for the idea, were spurious, because material which does not change the critical structure can nullify the effect completely (as discovered by Bresnan in 1976! and rediscovered by Culicover in 1993). At some point you have to ask yourself on what basis Norbert can still claim that the orthodoxy is the most promising approach at the moment. There are no results in biology, and there are valid alternatives in linguistics proper, developmental psychology, computational modelling ...

    20. @Alex
      Good to be agreeing. The PoS argument has been used to argue for domain specific nativistic conclusions. But this is not intrinsic to the PoS. The PoS is an argument form. Given the premises you get different kinds of conclusions. The conclusions when addressing linguistic issues have been domain specific. GB conclusions, for example, have been that subjacency is an innate feature of FL, that CP and DP as bounding nodes is FL internal, that binding domains are what they are is domain specific, that principle C holds is part of FL etc. These conclusions rest on the grammatical details one is interested in explaining and the PLD one surmises is relevant. So yes, it has been used to argue for domain specificity, but no, it is not inherent to the argument form that the conclusions be domain specific. Of course, IMO, there is every reason to think that there will be quite a bit of domain specificity, and right now, despite some hopeful minimalist results, the conclusion that there is no domain specificity seems to me heroic (if not worse). But, hey, the field is young, we are making progress, and one should never underestimate human theoretical ingenuity.

    21. I would be happier if Norbert had said above "conclusions when addressing linguistic issues have been formulated in a domain specific manner, due to insufficient knowledge to formulate them more generally in a useful way".

      The reason is that the postulated Merge Mutation alone does not get us out of the way of the collision between DP and PP, but multifunctional modules do, because most or even all of them might have evolved to do all sorts of useful things that don't require your conspecifics to also have them in order to be useful, and therefore we have hundreds of millions of years for them to develop rather than a few tens of thousands.

      But, due to ignorance, we can't formulate them in a useful way other than apparently task-specifically. Perhaps that will change when we learn more about animal cognition ... animals are far more cognitively advanced than anybody thought when GG took shape in the 1950s and 1960s.

      My suggested place to look would be the capacity to learn by imitation, which is clearly found in mammals and birds, and iirc at least one lizard species (David Attenborough has a piece on a population of lizards on a Mediterranean island who learned that they could remain active over the winter by ripping open a seed pod of a certain kind of plant; it's hard to see how this could have gotten started without some sort of 'lizard see, lizard do' faculty). This gives us about 300 million years for the infrastructure for language to develop.

    22. @Avery
      I would agree to this in part. It would always be nice to be able to formulate hypotheses in more general ways. Often this leads to better stories. However, I am not sure that the incapacity is peculiar to linguistics. We always want better accounts. Right now the best ones we have seem to imply lots of domain specificity. We have reasons to want something more general. I also want a flying pony. IMO, the best way to get the better stories is to build on the ones that we have, especially as these are neither trivial nor without insight. This is how it's done everywhere else, so why not linguistics?

      As for the imitation capacity. I will post something on this soon that goes into some of this. I like the idea of tracing some of our capacities to very early ancestors. I'm just unsure right now how to do this, as you rightly observe, in a fruitful way.

  4. The quote you cited does then go on to say that starting by ascribing things to Factor 1 is a research strategy and that good research will then go on to try and reduce Factor 1 to Factor 2. But that doesn't mean I disagree about what the first part says. I agree that that sounds like a good example of something that's logically wrong. I agree that the thrust of their (qualified) statement is "POS proves there is something domain specific and innate, except that it doesn't really of course." So I take it there's a bit of bravado in this.

    "One hopes for subsequent revision and reduction of the initial characterization, so that 50 years later, the posited UG seems better grounded. If successful, this approach suggests a perspective familiar from biology and (in our view) important in cognitive science: focus on the internal system and domain-specific constraints; data analysis of external events has its place within those constraints. At least since the 1950s, a prime goal for theoretical linguists has been to uncover POS issues and then accommodate the discovered facts while reducing the domain-specific component (1)."

    A second's reflection on this is enough to reveal that this obviously contradicts "The goal is to identify phenomena that reveal Factor (1) [and not Factor (2)] etc" - if one can take what's been learned through a POS argument about Factor 1 and then swap most of it out for Factor 2 then surely it was not all about Factor 1.

    Reflections on Language (1975) had it, I think, the right way around. However, I can't find my copy at the moment. As I recall, the first chapter first challenges the assumption that learning is necessarily domain general and trivial, presents a simple aux inversion sentence as an example of an obvious non-trivial induction problem, and then says "it seems really unlikely to me that anything like structure-dependence could be domain general and innate". In some rhetorically overblown way no doubt but, if I remember right, still leaving it fairly clear that this is an option.

    1. Ewan, I think you refer to the part where Chomsky compares the simple [structure-independent] Hypothesis 1 and the more complex Hypothesis 2 [that contains the structure-dependent rule] and concludes:

      "The only reasonable conclusion is that UG contains the principle that all such rules must be structure-dependent. That is, the child's mind (specifically, its component LT (H,L)) contains the instruction: Construct a structure dependent rule, ignoring all structure-independent rules. The principle of structure-dependence is not learned but forms part of the conditions for learning" [32-3]

      I would interpret this as a domain specific claim [because he talks about the 'reasonably well delimited cognitive domain L' [=language]]. But you are right that his formulation does not rule out domain general as an option.