Friday, April 19, 2013

One Final Time into the Breach (I hope): More on the SMT

The discussion of the SMT posts has gotten more abstract than I hoped. The aim of the first post discussing the results by Pietroski, Lidz, Halberda and Hunter was to bring the SMT down to earth a little and concretize its interpretation in the context of particular linguistic investigations. PLHH investigate the following: there are many ways to represent the meaning of most, all of which are truth-functionally equivalent. Given this, are the representations empirically equivalent, or are there grounds for choosing one representation over the others? PLHH propose to get a handle on this by investigating how these representations are used by the ANS (the Approximate Number System) together with the visual system in evaluating dot scenes with respect to statements like ‘most of the dots are blue’. They discover that the ANS+visual system always uses one of three possible representations to evaluate these scenes even when use of the others would be both doable and very effective in that context. When one further queries the core computational predilections of the ANS+visual system, it turns out that the predicates it computes easily coincide with those that the “correct” representation makes available. The conclusion is that one of the three representations is actually superior to the others qua linguistic representation of the meaning of most, i.e. it is the linguistic meaning of most. This all fits rather well with the SMT. Why? Because the SMT postulates that one way of empirically evaluating candidate representations is with regard to their fit with the interfaces (ANS+visual) that use them. In other words, the SMT bids us look at how grammars fit with interfaces, and, as PLHH show, if one understands ‘fit’ to mean ‘be transparent with’, then one meaning trumps the others when we consider how the candidates interact with the ANS+visual system.

It is important to note that things need not have turned out this way empirically. It could have been the case that despite core capacities of the ANS+visual system the evaluation procedure the interface used when evaluating most sentences was highly context dependent, i.e. in some cases it used the one-to-one strategy, in others the ‘|dots ∩ blue| - |dots ∩ not-blue|’ strategy, and sometimes the ‘|dots ∩ blue| - [|dots| - |dots ∩ blue|]’ strategy. But, and this is important, this did not happen. In all cases the interface exclusively used the third option, the one that fits very snugly with the basic operations of the ANS+visual system. In other words, the representation used is the one that the SMT (interpreted as the Interface Transparency Thesis, ITT) implicates. Score one for the SMT.
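To make the truth-functional equivalence concrete, here is a minimal Python sketch of the three verification strategies, reading each as a test of whether the blue count exceeds the relevant comparison quantity. It is an illustration under simplifying assumptions, not PLHH's model: a scene is just a tuple of color names, counting is exact (the ANS itself is approximate), and the function names are hypothetical. Exhaustively checking every scene of up to six dots in three colors shows the strategies never disagree on truth value, which is why truth conditions alone cannot decide among them.

```python
from itertools import product

COLORS = ("blue", "red", "yellow")

def most_one_to_one(scene):
    """Pair blue dots with non-blue dots one at a time, without counting;
    'most of the dots are blue' holds if some blue dot is left unpaired."""
    blue = [c for c in scene if c == "blue"]
    other = [c for c in scene if c != "blue"]
    while blue and other:
        blue.pop()
        other.pop()
    return len(blue) > 0

def most_selective(scene):
    """Compare |dots ∩ blue| with |dots ∩ not-blue|."""
    n_blue = sum(1 for c in scene if c == "blue")
    n_not_blue = sum(1 for c in scene if c != "blue")
    return n_blue > n_not_blue

def most_subtractive(scene):
    """Compare |dots ∩ blue| with |dots| - |dots ∩ blue|."""
    n_blue = sum(1 for c in scene if c == "blue")
    return n_blue > len(scene) - n_blue

def all_scenes(max_dots):
    """Every color assignment to 0..max_dots dots."""
    for n in range(max_dots + 1):
        yield from product(COLORS, repeat=n)

if __name__ == "__main__":
    scenes = list(all_scenes(6))
    agree = all(
        most_one_to_one(s) == most_selective(s) == most_subtractive(s)
        for s in scenes
    )
    print(len(scenes), "scenes checked; all three agree:", agree)
```

Since the three functions agree everywhere, any evidence for one over the others has to come from somewhere other than truth conditions, e.g. from how an interface uses them.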

Note that the argument puts together various strands. It relies on specific knowledge of how the ANS+visual system functions. It relies on specific proposals for the meaning of most, and, given these, it investigates what happens when we put them together. The kicker is that if we assume that the relation between the linguistic representation and what the ANS+visual system uses to evaluate dot scenes is “transparent”, then we are able to predict[1] which of the three candidate representations will in fact be used in a linguistic+ANS+visual task (i.e. the task of evaluating a dot scene for a given most sentence[2]).[3]
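One way to picture the “transparency” reasoning is as a filter: list the sets each cardinality-based representation needs the visual system to attend, and keep only the representations whose sets it can actually deliver. The sketch below assumes, as a hypothetical simplification, that the system can attend the whole display or any single-color set, but not a heterogeneous set such as several non-blue colors at once; the function names and the `ALL_DOTS` sentinel are illustrative.

```python
ALL_DOTS = "all dots"  # sentinel standing for the whole display

def required_sets(strategy, scene_colors):
    """Sets a strategy must attend to, given the colors present in the scene."""
    non_blue = frozenset(scene_colors) - {"blue"}
    if strategy == "selective":    # |dots ∩ blue| vs |dots ∩ not-blue|
        return [frozenset({"blue"}), non_blue]
    if strategy == "subtractive":  # |dots ∩ blue| vs |dots| - |dots ∩ blue|
        return [frozenset({"blue"}), ALL_DOTS]
    raise ValueError(f"unknown strategy: {strategy}")

def attendable(target):
    """Hypothetical constraint: the whole display or a single-color set can be
    attended directly; a set spanning several colors cannot."""
    return target == ALL_DOTS or len(target) <= 1

def usable(strategy, scene_colors):
    """A strategy is usable only if every set it needs is attendable."""
    return all(attendable(t) for t in required_sets(strategy, scene_colors))

if __name__ == "__main__":
    for colors in (["blue", "yellow"], ["blue", "yellow", "red", "green"]):
        print(colors, {s: usable(s, colors) for s in ("selective", "subtractive")})
```

In the two-color scene the selective route comes out usable, matching the point that the unused strategies were sometimes perfectly doable; the filter favors the subtractive representation only as the one that works in every scene.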

The upshot: we are able to use information from how the interface behaves to determine a property of a linguistic representation.  Read that again slowly: PLHH argue that understanding how these tasks are accomplished provides evidence for what the linguistic meanings are (viz. what the correct representations of the meanings are). In other words, experiments like this bear on the nature of linguistic representations and a crucial assumption in tying the whole beautiful package together is the SMT interpreted along the lines of the ITT. 

As I mentioned in the first post on the SMT and Minimalism (here), this is not the only exemplar of the SMT/ITT in action. Consider one more, this time concentrating on work by Colin Phillips (here). As previously noted (here), there are methods for tracking the online activities of parsers. So, for example, the Filled Gap Effect (FGE) tracks the time course of mapping a string of words into structured representations. Question: what rules do parsers use in doing this? The SMT/ITT answer is that parsers use the “competence” grammars that linguists with their methods investigate. Colin tests this by considering a very complex instance: gaps within complex subjects. Let’s review the argument.

First some background. Crain and Fodor (1985) and Stowe (1986) discovered that the online process of relating a “filler” to its “gap” (e.g. in trying to assign a Wh a theta role by linking it to its theta-assigning predicate) is very eager. Parsers try to shove wayward Whs into positions even if these are filled by another DP. This eagerness shows up behaviorally as slowdowns in reading times when the parser discovers a DP already homesteading in the thematic position it wants to shove the un-theta-marked DP into. Thus in (1a) (in contrast to (1b)) there is a clear and measurable slowdown in reading times at Bill, because that is a position in which who could have received a theta role.

(1)  a. Who did you tell Bill about?
     b. Who did you tell about Bill?

Thus, given the parser’s eagerness, the FGE becomes a probe for detecting linguistic structure built online. A natural question is where FGEs appear. In other words, do they “respect” conditions that “competence” grammars code? BTW, all I mean by ‘competence grammars’ are those things that linguists have proposed using their typical methods (ones that some Platonists seem to consider the only valid windows into grammatical structure!). The answer appears to be that they do. Colin reviews the literature and I refer you to his discussion.[4] How do FGEs show that parsers respect grammatical structure? Well, they seem not to apply within islands! In other words, parsers do not attempt to relate Whs to gaps within islands. Why? Well, given the SMT/ITT, it is because Whs could not have moved from positions within islands, and so these positions are not potential theta-marking sites for the Whs that the parser is eagerly trying to theta-mark. In other words, given the SMT/ITT we expect parser eagerness (viz. the FGE) to be sensitive to the structure of grammatical representations, and it seems that it is.
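The logic of the FGE probe can be sketched as a toy “active filler” procedure. Everything here is a hypothetical simplification rather than a parsing model from the literature: the input is hand-annotated (which positions could host the filler's gap, which are already filled by an overt DP, which lie inside an island), the sentence is assumed to open with a fronted Wh, and a predicted slowdown is simply any filled position where the parser tried to posit the gap. Flipping the `island_sensitive` flag shows the contrast at issue: a grammar-blind parser would be predicted to stumble inside the relative clause, while an island-respecting one would not.

```python
from dataclasses import dataclass

@dataclass
class Token:
    word: str
    gap_site: bool = False   # a position where the filler could receive a theta role
    filled: bool = False     # an overt DP already occupies that position
    in_island: bool = False  # the position lies inside an island

def predicted_fge(tokens, island_sensitive=True):
    """Words at which an eager parser is predicted to show a filled-gap slowdown."""
    filler_pending = True  # assume the sentence opens with a fronted Wh
    slowdowns = []
    for tok in tokens:
        if not (filler_pending and tok.gap_site):
            continue
        if island_sensitive and tok.in_island:
            continue  # no gap is possible here, so the parser does not try one
        if tok.filled:
            slowdowns.append(tok.word)  # eager gap posit is disconfirmed
        else:
            filler_pending = False  # gap found: the filler gets its theta role
    return slowdowns

# (1a) Who did you tell Bill about __   vs   (1b) Who did you tell __ about Bill
s1a = [Token("who"), Token("did"), Token("you"), Token("tell"),
       Token("Bill", gap_site=True, filled=True), Token("about"),
       Token("__", gap_site=True)]
s1b = [Token("who"), Token("did"), Token("you"), Token("tell"),
       Token("__", gap_site=True), Token("about"),
       Token("Bill", gap_site=True, filled=True)]
# A relative clause inside the subject is an island for the Wh
rel = [Token("what"), Token("did"), Token("the"), Token("reporter"), Token("that"),
       Token("criticized"), Token("the war", gap_site=True, filled=True, in_island=True),
       Token("eventually")]

if __name__ == "__main__":
    print(predicted_fge(s1a))                          # ['Bill']
    print(predicted_fge(s1b))                          # []
    print(predicted_fge(rel))                          # []
    print(predicted_fge(rel, island_sensitive=False))  # ['the war']
```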

Observe, again, that this is not a logical necessity. There is no a priori reason why the grammars that parsers use should have the properties that linguists have postulated, unless one adopts the SMT/ITT, that is. But let’s go on discussing Colin’s paper, for it gets a whole lot more subtle than this. It’s not just gross properties of grammars that parsers are sensitive to, as we shall presently see.

Colin considers gaps within two kinds of complex subjects. Both prevent direct extraction of a Wh (2a/3a); however, sentences like (2b) license parasitic gaps while those like (3b) do not:

(2)  a. *What1 did the attempt to repair t1 ultimately damage the car?
     b. What1 did the attempt to repair t1 ultimately damage t1?
(3)  a. *What1 did the reporter that criticized t1 eventually praise the war?
     b. *What1 did the reporter that criticized t1 eventually praise t1?

So the grammar allows a gap related to the extracted Wh inside the complex subject in (2b) but not in (3b), and even in (2b) only if it is a parasitic gap. This is a very subtle set of grammatical facts. What is amazing (in my view nothing short of unbelievable) is that the parser respects these parasitic gap licensing conditions. Thus, what Colin shows is that we find FGEs at ‘the car’ in (4a) but not at ‘the war’ in (4b):

(4)  a. What1 did the attempt to repair the car ultimately …
     b. What1 did the reporter that criticized the war eventually …

This is a case where the parser is really tightly cleaving to distinctions that the grammar makes. It seems that the parser codes for the possibility of a parasitic gap while processing the sentence in real time.  Again, this argues for a very transparent relation between the “competence” grammar and the parsing grammar, just as the SMT/ITT would require.
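The parasitic-gap refinement can be added to the same kind of toy. Again this is a hypothetical sketch, not Colin's analysis: tokens are hand-annotated, the island labels `infinitival_subject` and `relative_clause` are invented names for the two kinds of complex subject in (2) and (3), and the set of PG-licensing islands is simply stipulated from the contrast between (2b) and (3b). A gap posited inside a licensing island is treated as a possible parasitic gap, so it can trigger an FGE but does not discharge the filler, which still needs its real gap later.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Token:
    word: str
    gap_site: bool = False        # a position where the filler could receive a theta role
    filled: bool = False          # an overt DP already occupies that position
    island: Optional[str] = None  # hypothetical label for the enclosing island, if any

# Stipulated from (2b) vs (3b): only this kind of complex subject hosts a parasitic gap.
PG_LICENSING_ISLANDS = {"infinitival_subject"}

def gap_possible(tok):
    """A gap is grammatical outside islands, or inside an island that licenses a PG."""
    return tok.island is None or tok.island in PG_LICENSING_ISLANDS

def predicted_fge(tokens):
    """Words at which an eager, PG-aware parser is predicted to slow down."""
    filler_pending = True  # assume the sentence opens with a fronted Wh
    slowdowns = []
    for tok in tokens:
        if not (filler_pending and tok.gap_site and gap_possible(tok)):
            continue
        if tok.filled:
            slowdowns.append(tok.word)  # eager gap posit is disconfirmed
        elif tok.island is None:
            filler_pending = False  # only a non-parasitic gap discharges the filler
    return slowdowns

# (4a) What did the attempt to repair the car ultimately ...
s4a = [Token("what"), Token("did"), Token("the"), Token("attempt"), Token("to"),
       Token("repair"),
       Token("the car", gap_site=True, filled=True, island="infinitival_subject"),
       Token("ultimately")]
# (4b) What did the reporter that criticized the war eventually ...
s4b = [Token("what"), Token("did"), Token("the"), Token("reporter"), Token("that"),
       Token("criticized"),
       Token("the war", gap_site=True, filled=True, island="relative_clause"),
       Token("eventually")]

if __name__ == "__main__":
    print(predicted_fge(s4a))  # ['the car']
    print(predicted_fge(s4b))  # []
```

With `PG_LICENSING_ISLANDS` empty, the toy reduces to a plain island-sensitive parser and predicts no FGE in either fragment; the striking result is that the human parser patterns with the PG-aware version.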

I urge the interested to read Colin’s article in full. What I want to stress here is that this is another concrete illustration of the SMT.  If grammatical representations are optimal realizations of interface conditions then the parser should respect the distinctions that grammatical representations make. Colin presents evidence that it does, and does so very subtly. If linguistic representations are used by interfaces, then we expect to find this kind of correlation. Again, it is not clear to me why this should be true given certain widely bruited Platonic conceptions. Unless it is precisely these representations that are used by the parser, why should the parser respect its dicta?  There is no problem understanding how this could be true given a standard mentalist conception of grammars. And given the SMT/ITT we expect it to be true. That we find evidence in its favor strengthens this package of assumptions.

There are other possible illustrations of the SMT/ITT. We should develop a sense of delight at finding these kinds of data. As Colin’s stuff shows, the data is very complex and, in my view, quite surprising, just like PLHH’s stuff. In addition, they can act as concrete illustrations of how to understand the SMT in terms of Interface Transparency. An added bonus is that they stand as a challenge to certain kinds of Platonist conceptions, I believe. Bluntly: either these representations are cognitively available or we cannot explain why the ANS+visual system and the parser act as if they were. If Platonic representations are cognitively (and neurally, see note 3) available, then they are not different from what mentalists have taken to be the objects of study all along. If from a Platonist perspective they are not cognitively (and neurally) available, then Platonists and mentalists are studying different things and, if so, they are engaged in parallel rather than competing investigations. In either case, mentalists need to take heed of Platonist results exactly to the degree that they can be reinterpreted mentalistically. Fortunately, many (all?) of their results can be so interpreted. However, where this is not possible, they would be of absolutely no interest to the project of describing linguistic competence. Just metaphysical curiosities for the ontologically besotted.

[1] Recall, as discussed here, ‘predict’ does not mean ‘explain.’
[2] Remember, absent the sentence and in specialized circumstances the visual system has no problem using strategies that call on powers underlying the other two non-exploited strategies. It’s only when the visual system is combined with the ANS and with the linguistic most sentence probe that we get the observed results.
[3] Actually, I overstate things here: we are able to predict some of the properties of the right representation, e.g. that it doesn’t exploit negatively specified predicates or disjunctions of predicates.
[4] Actually, there are several kinds of studies reviewed, only some of which involve FGEs. Colin also notes EEG studies that show P600 effects when one has a theta-undischarged Wh and one crosses into an island. I won’t make a big deal out of this, but there is not exactly a dearth of neuro evidence available for tracking grammatical distinctions.  They are all over the place. What we don’t have are good accounts of how brains implement grammars. We have tons of evidence that brain responses track grammatical distinctions, i.e. that brains respond to grammatical structures. This is not very surprising if you are not a dualist. After all we have endless amounts of behavioral evidence (viz. acceptability judgments, FGEs, eye movement studies, etc.) and on the assumption that human behavior supervenes on brain properties it would be surprising if brains did not distinguish what human subjects distinguish behaviorally. I mention this only to state the obvious: some kinds of Platonism should find these kinds of correlations challenging. Why should brains track grammatical structure if these live in Platonic heavens rather than brains?  Just asking.


  1. I assume there will be a discussion of this soon:

  2. I'm not sure whether someone has already taken this up with you, but I can't find it elsewhere in the comments. This seems not to be a good case of the SMT, in fact. After discussion with Jeff, it sounds like the reverse: the visual system *can* demonstrably compute all the alternatives, and it therefore must be FL imposing the constraint. This contradictory interpretation is evident even in your post. You first write:

    "... They discover that the ANS+visual system always uses one of three possible representations to evaluate these scenes even when use of the others would be both doable and very effective in that context. ...."
    [i.e. by the ANS+visual system]

    but then write

    "... things need not have turned out this way empirically. It could have been the case that despite core capacities of the ANS+visual system the evaluation procedure the interface used when evaluating most sentences was [something different] ..."
    [implying that this evaluation procedure is somehow privileged by ANS+V]

    - which is it?

  3. The SMT says that optimal ling reps will be perfect matches for the interfaces. If we assume that the predicates and relations optimal in the former are also optimal in the latter, we have an instance of the SMT. OK, what's the optimal representation of the meaning of 'most'? Well, there are 3 options PLHH consider. The first is knocked out if it is optimal, either in the interface OR the ling system, to use cardinalities. PLHH argue that the ANS+visual system can use either. However, linguistically, one can argue that cardinalities rule. Why? Because other determiners (e.g. 'exactly 3', 'four more Xs than Ys') seem to require cardinal measures, on the assumption that determiners are generalized quantifiers (as Frege suggested). So given that we need cardinalities anyhow, and that they suffice to code the meanings of all known determiners, assuming that all determiners use cardinalities is the "optimal" assumption.

    This leaves the selective versus subtractive representations. Here, PLHH argue that the subtractive representation fits with the ANS+visual systems most generally, i.e. they argue that the selective system of predicate representations only works in a subset of cases while the subtractive method is fully general. Plus they note that the ANS+visual always tracks the "full" set of dots, and must always track the blue dots and that this is always enough for the general case, i.e. regardless of how many non-blue dots there are. Thus, in the general case, the subtractive representation is optimal given restrictions on the ANS+visual system. Conclusion: the actual representation is the optimal one when considered BOTH from the perspective of the linguistic system and the perspective of the interface.

    That's how I see the full argument going. The papers I cited dealt mainly with step 2 (subtractive versus selective representations). The interfaces per se don't choose between the one-to-one rep and the cardinality reps. But, I think we can argue that here the ling system, wanting to avoid redundancy, will prefer a system that suffices for all linguistically possible determiners, and it seems that using cardinalities does indeed suffice. So, Ockham urges cardinalities.

    The SMT is about "fit" between FL and interfaces. The fit goes in both directions. You cannot have predicates and operations in either that are not fine in both. I understand PLHH as presenting an argument in which a perfect (or superior) fit holds with a representation in terms of cardinalities subtractively related.

    Hope this makes things clear.

  4. okay, I can see that, it looks like the crucial reasoning is

    "the subtractive representation fits with the ANS+visual systems most generally, i.e. they argue that the selective system of predicate representations only works in a subset of cases while the subtractive method is fully general"

    - from LHPH (PLHH is the older one):

    "[suppose you put different colors of nonblue dots and you are asked if 'most of the dots are blue.'] Because the nonblue dots are a heterogenous set, they cannot be attended directly. Moreover, building up the nonblue dots by constructing a disjunctive combination of all nonblue sets is also not a straightforward visual computation. Listeners simply would not be able to directly attend the heterogeneous set of nonblue dots."

    -- yet nevertheless you COULD do something like this, but it would make a particular behavioral prediction. we don't see that, thus it looks like you DON'T do this. yes, okay, I see the extra layer, the SMT layer, which is an explanation of WHY, to wit [what LHPH just said]. that part of the reasoning is optional, and they sound less committed to it than you, but I would rather be committed.

    however, now I wonder about the nature of the explanation: I'm imagining some sort of optimization procedure, over all possible types of data, over all possible interfaces: what would be the best fitting meaning, given the interfaces and the problems they will have to solve. this seems like a nasty optimization problem, and, whether it's solved in the phylogeny or the ontogeny, it seems likely that there are going to be some harder cases than this that we need heuristics for. type 1: different interfaces I'm going to have to talk to express different preferences about the optimal meaning; type 2: different stimuli at the same interface express different preferences about the optimal meaning; and, of course, type 3=type 1 + type 2. can we come up with cases like this?

  5. Yes, the last problem you note is a very nasty one, which is why I take the SMT to be more methodological than metaphysical (I tried to say something about this in another post). However, right now, before we wonder about how or whether we have multiple optimization problems, it would be nice to have a couple of SMT examples on the table. This was why I flagged the PLHH stuff and Colin's stuff and Berwick and Weinberg's stuff. These are examples of how the reasoning might be rooted in some real results. At this moment in time, they don't seem to pull in opposite directions. Last point: I think that you read me as saying that the SMT causes the transparency. I doubt it. Rather, the SMT suggests we look for these as a way of probing the structure of mental representations. If such exist, they will tell us something interesting. There is a whole other question of WHY such nice mappings exist. My hunch (and it is even vaguer than a hunch, really) is that there has not been enough time for the ling reps to be fitted to the interfaces should they not fit well. Thus, the only ones we see are the ones that fit very well, for the others have no contact with FL at all. If this is coherent, then only the well-fitting ones will be visible at all, as the ones that don't fit simply won't have any visible interface effect.

    Hope this helps.