"An increasingly popular theory holds that the mind should be viewed as a *near-optimal* or *rational* engine of probabilistic inference, in domains as diverse as word learning, pragmatics, naive physics and predictions of the future. We argue that this view, often identified with Bayesian models of inference, is markedly less promising than widely believed, and is undermined by post hoc practices that merit wholesale reevaluation. We also show that the common equation between *probabilistic* and *rational* or *optimal* is not justified."
The paper reviews most of the widely cited banner Bayesian papers produced by Griffiths, Tenenbaum, Chater, etc. and, IMO, shows that they are seriously defective. The problems include:

- Models do not generalize to "slightly different configurations" (3).
- The "probabilistic-cognition literature as a whole may disproportionately report successes...which would lead to a distorted perception of the applicability of the approach" (3).
- Priors and decision procedures appear to be chosen on an *ad hoc* basis, and these "seemingly innocuous design choices can yield models with arbitrarily different predictions" (5).

M&D identify a methodological source for these problems: "...the tendency of advocates of probabilistic theories of cognition (like researchers using many computational frameworks) to follow a breadth-first search strategy in which the formalism is extended to an ever broader range of domains...rather than a depth-first strategy, in which some challenging domain is explored in great detail with respect to a wide range of tasks" (3-4).

This breadth vs depth notion resonates in the linguistic domain in particular. Research in Generative Grammar over the last 60 years has uncovered a bunch of pretty good generalizations about how grammars function (e.g. Binding Theory, Bounding Theory, ECP etc.). I have often asked those who do not like Generative accounts to show me how to derive *these* generalizations using their favored assumptions. So far, no one has. But a good alternative theory should aim to explain the properties of these rich, detailed domains, at least if the intent is to convince people like me (most likely not a top priority).
At any rate, back to M&D. This is the third in a series of papers that have come out very critical of the new Bayesian turn (e.g. see here). They all make similar points, citing different data/arguments each time: Bayesianism per se is too loose to explain much, and the results that have been touted have been massively oversold. Or to put this in M&D's words: "...probabilistic models have not yielded a robust account of cognition. They have not converged on a uniform architecture that is applied across tasks; rather there is a family of different models, each depending on highly idiosyncratic assumptions tailored to an individual task...the approach is well on its way to becoming a Procrustean bed into which all problems are fit..." (8). I don't know about you, but to me this does not sound like a rave review.

M&D, like the other critics before them, note the undeniable: Bayes is "a useful tool," nothing to sneeze at, but not quite the magic lever that will break open the hard problems of human cognition. It strikes me that now may be the time to short your Bayes stock.


The article doesn't incline me to dump my position, which is in any event not the one attacked. My line of thought goes as follows:

1. Bayesian reasoning is optimal if you have the correct numbers to plug in for your circumstances as you know them (as medical statistics can provide for the basic 'do I have cancer?' example). Evolution can be counted on to provide these numbers for organisms that learn how to recognize predators, food sources, etc., so we can expect that smart animals will contain implementations of Bayesian learners (perhaps multiple ones for different tasks).

2. The faculty of language appears to have emerged/been thrown together hastily from a pile of stuff that was lying around, and clearly involves learning, which by 1. can be assumed to be a computable-by-nervous-system approximation to Bayesian; but it remains to ascertain what the priors are, and exactly how the likelihoods are calculated. I'll add that, as far as I can see, you don't really need Bayes' theorem to do this. If the evidence e (the PLD) is seen stuff and the hypothesis h (the grammar) is unseen stuff, then the grammar selection task is just to maximize P(h)P(e|h)=P(e,h), with no P(e) or P(h|e) required, since we don't care how probable it is that the grammar that looks the best is actually the correct one.
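The selection scheme in point 2 can be sketched in a few lines. To pick the best grammar h given the data e, it suffices to compare P(h)·P(e|h) across candidates; the normalizer P(e) is identical for every candidate, so the posterior P(h|e) itself never needs to be computed. The "grammars" and their probabilities below are entirely invented for illustration:

```python
# Toy candidate grammars with hypothetical prior and likelihood values.
priors = {"G1": 0.5, "G2": 0.3, "G3": 0.2}            # P(h)
likelihoods = {"G1": 0.001, "G2": 0.01, "G3": 0.002}  # P(e|h): prob of the PLD under h

def best_grammar(priors, likelihoods):
    # argmax over h of the joint P(e, h) = P(h) * P(e|h).
    # No P(e) needed: it is a constant shared by all candidates.
    return max(priors, key=lambda h: priors[h] * likelihoods[h])

print(best_grammar(priors, likelihoods))
```

Here G2 wins despite its smaller prior, because its likelihood advantage outweighs it, which is the 'pre-Bayesian' point: ranking joints is enough for selection.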

So maybe my position isn't really Bayesian at all but 'pre-Bayesian'; that, however, would apply to most of the linguistically relevant stuff I'm aware of, such as Lisa Pearl's and Amy Perfors' work. But the specific (pre-?)Bayesian contribution is to clarify how indirect negative evidence can in principle be used, if it turns out to be needed (which I think is almost certainly the case, but that's a different discussion).

One more thing to add to the above: so we don't have to, and almost certainly shouldn't, assume that the priors for grammar acquisition are 'optimal' in any general sense, but rather use Bayesian modelling to help ascertain what they are. Which I think Pearl is especially clear about, but Perfors is also.

"don't have to, and almost certainly shouldn't, assume that the priors for grammar acquisition are 'optimal' in any general sense, but rather use Bayesian modelling to help ascertain what they are."

I'm not sure one can do that. (Not that I don't do it all the time, but that doesn't make it possible.) The M&D paper, as I read it, is not committed to the implication that "Bayesian cognition" (BC) necessarily implies "priors which are optimal in a general sense" (OP). The issues they raise are about cases where, in fact, the prior assumptions the model needs to make about how the world works don't bear a very close resemblance to how the world actually works at all. It sounds like that's the kind of "optimal" prior you're alluding to. And you're saying that BC could still be true, even if OP is not.

That's right. BC means only "optimal inference" (OI), and maybe "optimal decisions" (OD), not necessarily OP. In some places they recognize that. But in those cases, they point out, as has been pointed out before, that behavior that fails to conform to [+OI,+OP] won't be able to tell you whether you're dealing with (putting aside decisions) [+OI,-OP], [-OI,+OP], or [-OI,-OP]. And once you're in that mess, you can (almost) always pick a prior which is -OP to save the crucial BC hypothesis of +OI. You can use Bayesian modeling to determine what the shape of the -OP prior is once you know that +OI is true. But not if you don't. The Pearl and Goldwater stuff claims that, actually, it's not.

What needs to happen, in my view, is that we need independent evidence about the priors, optimal or not. For example, very different tasks in very different domains might tap the same biases. Strong BC should say that if we assume OI (I think OD is hopeless), we will come to the same conclusion about the nature of the prior, regardless of the task.

I think I accept most of this, especially the last paragraph, but which Pearl and Goldwater stuff are you referring to? "How Ideal are We?" (2009)? I think I'm willing to buy OI as an initial, provisional hypothesis from which departures need to be justified, on the basis that evolution will tend to tune learning mechanisms to the problems they're trying to solve.

(The standard justification, it seems to me.)