Faculty of Language: What are theta roles for?

Wednesday, September 30, 2015

What are theta roles for?

So here’s my question: What’s the point of theta theory? What does a theta role do? Here’s my impression: we want theta theory to do two different kinds of things and it is not clear to me that any theory can (or should) do both. What are these two things? They are an integral part of the semantic interpretation of a sentence and they are the means by which arguments are linked to syntactic positions in “D-structure.” I should point out that noting these dual desiderata is not original to me, but arises from what I recall were earlier important discussions of these matters by Dowty, Grimshaw, and others. Nonetheless, I feel that these issues have become more obscure over time and I would like to engage in a rambling re-think. This is all in the way of excusing the shambolic nature of what follows. Hard as it is for me to present clear arguments in general, in this case I am not even going to try. I just want to sorta kinda survey the options and try to clear up my own confusion. Needless to say, I am relying on the kindness of others to clear up the mess. Here goes.

The literature seems to have two different (though possibly related, as we shall see) desiderata for theta roles:

(1) Theta roles are required for semantic interpretation

(2) Theta roles are required to get the LAD from primary linguistic data to a G.

Let’s discuss each of these a little bit. The first view of theta roles treats them as essential semantic notions. Without theta roles, arguments would not have a semantic interpretation. Given that Gs map meanings and (“with,” if you are a thoroughly modern minimalist (TMM)) sounds, we need some conception of meaning that is the target of the mapping, and theta roles are taken to be one component of a well-formed meaning.

The second view treats theta roles as levers for getting a language acquisition device (aka: a child or LAD) from primary linguistic data (PLD) to a G, most particularly, from PLD to a “D”-structure. The ‘D’ is in scare quotes here for, as any TMM knows, we have dispensed with D-structure in the GB sense; yet, so far as I know, every theory of GG has some analogue thereof, including current minimalist accounts. By ‘D-structure’ I just mean the G structure in which arguments are grammatically linked up to their predicates (or vice versa).[1] A G establishes thematic links before establishing any further dependencies that expressions grammatically enter into (e.g. agreement, case, binding, and especially, movement). Fixing this first relation, the one where arguments join predicates, is very important because it is very hard to study all the other dependencies, especially movement, if you have no idea where expressions begin their grammatical lives.

Is there a necessary relation between these two desiderata? Perhaps, perhaps not (though I suspect not). Here’s what I mean. It might be that the conception of theta role required for semantic interpretation is identical to the one that is used to get LADs from PLD to Gs. However, there is no obvious reason why this need be the case. In particular, the conception of theta role required for semantic interpretation seems to be at a different grain than the one useful for priming the G pump. Let me explain.

One conception of theta role is simply as a place-holder for the notion “argument.” For example, all we mean when we say that some DP has the “agent” theta role is that it is the external argument of some predicate. The designation “agent” does not mean much, save indicating which of the ordered arguments of a predicate some DP is related to.[2]

The problem with this conception is that it is not clear how it helps with (2). In particular, it is quite unlikely that LADs know the meanings of the predicates they are being exposed to and so it is not clear how they could use this thin sense of theta role to acquire their G. Rather, what we would like is some good coarse rule of thumb that the LAD can use to vault into the G given some PLD. This is where notions like ‘agent’ and ‘patient’ gain their value. Being an agent or patient (a doer or done-to) is plausibly an observational feature of an event participant. In other words, the substantive interpretation of notions like agent and patient plausibly have what Chomsky called “epistemological priority” (EP). They are observable non-linguistic predicates that can be used to map PLD to (non-observable) grammatical dependencies. An example of such a useful mapping rule would be “Agents are always external arguments, patients always internal arguments.” If every D-structure corresponded to a set of (observable) theta roles with the right linking rules, then we could solve the problem of how an LAD gets from PLD to abstract Gish structures.
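
To make the shape of such a mapping rule concrete, here is a minimal Python sketch. It is only an illustration: the names LINKING_RULE and link and the toy scene are made up for this example, and nothing here is a claim about how an LAD actually computes anything. All it shows is that a rule of this kind needs nothing more than coarse observable labels, and that it can stay silent about participants whose role is not evident.

```python
# Hypothetical coarse linking rule: observable role -> underlying position.
# The role labels and position names are assumptions made for illustration.
LINKING_RULE = {"agent": "external", "patient": "internal"}


def link(scene):
    """Map participants with an observable coarse role to argument positions.

    `scene` maps each participant to its observed role, or to None when no
    role is evident. Participants without a usable role are left unlinked.
    """
    linked = {}
    for participant, role in scene.items():
        position = LINKING_RULE.get(role)
        if position is not None:
            linked[participant] = position
    return linked


# A toy scene: a dog doing something to a ball while an onlooker stands by.
toy_scene = {"dog": "agent", "ball": "patient", "onlooker": None}
print(link(toy_scene))
```

Note that the rule never consults the meaning of the predicate. It needs only the doer and done-to labels, which is what makes it a candidate for getting from PLD into the G.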

Now the problem is that it turns out to be hard to come up with such substantive thematic notions that are also plausibly semantically general. Another way of saying this is that though there are plausibly some clear cases of “agenthood,” it is not clear that the same notion extends usefully to all (or even most) verbal subjects. Thus, though kickers may be prototypical agents, lovers may not be. At any rate, one of the well-known problems is that such substantive theta roles have a problem extending to all predicates.

One important and influential solution to this is Dowty’s work (here). It defines super categories of theta roles, collapsing them into two “proto”-flavors: Proto-Agent (P-A) and Proto-Patient (P-P). Proto roles are defined over the full semantics of a predicate indexed to a particular argument position. Thus, P-As are defined by “verbal entailments about the argument in question” (i.e. they are those DPs that have more of the “agent” properties than any other DP in that argument structure).[3] On this conception, an argument is P-A if it has a preponderance of the following properties in a given proposition: it is volitional, sentient, a causer of events or changes of state in another participant, a mover, and exists independently of the event named by the verb ((27): 572). Indices of P-P are undergoing a change of state, being an incremental theme, being causally affected by another participant, being stationary relative to the movement of another participant, and not existing independently of the named event ((28): 572). These are among the contributing factors that Dowty suggests for classifying arguments into one of the proto categories (he is quite clear that these may not exhaust the relevant entailments). Note that on this conception, proto roles are defined in terms of the more articulated semantics of the sentence. In other words, given the meaning of a sentence we can compute a coarser grained classification of arguments into super categories that “average” over the differences. On this conception, proto-roles “lose” information that the actual meaning of the sentence contains.
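
For readers who like to see the bookkeeping, here is a rough Python sketch of the “preponderance” idea. Treat it as a cartoon under stated assumptions: the property labels are my shorthand for the two lists above, the entailment sets given for “kick” are invented for illustration, and returning None on a tie is my choice rather than anything Dowty proposes. The point is only that the proto classification is computed from the entailments, and so presupposes them.

```python
# Shorthand labels for the Proto-Agent and Proto-Patient properties listed
# above. The labels themselves are assumptions made for this sketch.
PROTO_AGENT = {"volitional", "sentient", "causer", "mover", "independent_existence"}
PROTO_PATIENT = {
    "change_of_state",
    "incremental_theme",
    "causally_affected",
    "stationary",
    "dependent_existence",
}


def pick(arguments, properties):
    """Return the argument with strictly the most of `properties`, else None.

    `arguments` maps an argument name to the set of entailments the predicate
    imposes on it. A tie, or a best count of zero, yields None.
    """
    counts = {arg: len(set(ents) & properties) for arg, ents in arguments.items()}
    best = max(counts.values(), default=0)
    winners = [arg for arg, n in counts.items() if n == best]
    return winners[0] if len(winners) == 1 and best > 0 else None


# Invented entailment sets for the two arguments of "kick".
kick = {
    "kicker": {"volitional", "sentient", "causer", "mover", "independent_existence"},
    "kicked": {"causally_affected", "change_of_state", "independent_existence"},
}

print(pick(kick, PROTO_AGENT))    # the argument classified as P-A
print(pick(kick, PROTO_PATIENT))  # the argument classified as P-P
```

Notice what the function takes as input: the full set of entailments for each argument. That is exactly the information a learner would need to have in hand before the proto-roles could be computed.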

Not surprisingly, for Dowty proto-roles do not determine semantic interpretations for they presuppose them (i.e. proto-roles are defined in terms of the entailments of the argument in question in the specific proposition). Thus, on this view, proto-roles are not important for (1) above. Their special function (if they are important at all, which is something that Dowty often questions) is to provide an account of how arguments map to syntactic positions given that we know the verbal implications of that argument (i.e. what the proposition means).

IMO, the most interesting version of proto-role theory is Baker’s UTAH version (see here).[4] UTAH directly addresses the problem of how to get from pre-linguistic information into the syntax. The idea is that proto-roles mediate the mapping from PLD to “D-structure,” (e.g. P-As map to underlying subjects and P-Ps to underlying objects). Thus, proto-roles are understood to enjoy epistemological priority and are thus able to mediate a mapping to the linguistic system. What is less clear is that Baker’s understanding of proto-roles is really the same as Dowty’s. Why?

Well first, it seems unlikely, at least to me, that LADs compute proto-roles for a given predicate to see how they map onto the syntax. This presupposes that LADs have a rather rich understanding of the meaning of each predicate prior to having any linguistic analysis of the sentence. Some features of the “scene” may be evident (e.g. on hearing “Fido is biting the ball” it is evident that Fido is an “agent” and the ball a “patient”) but it seems to me unlikely that this is a consequence of a computation over the meaning of “bite” indexed to the subject and object positions. Rather, here the two notions are simple primitives applying more or less (im)perfectly to the scene at hand. To get from PLD to G, this kind of sloppy information may suffice (at least for a sufficient number of verbs) but it is unlikely to be based on a prior full understanding of the predicates involved. Rather the opposite. Of course, once the G is engaged, then there is more than theta theory available to guide the LAD. So, for the linking problem, all that UTAH must do is get the LAD into the G, then the G can offer other kinds of linguistic information useful for acquiring the G of interest.

Second, Baker also assumes that the theta roles that solve the linking problem are also inputs to the semantic interpretation of the sentence. Note that this is very different from Dowty. For Dowty, proto-roles are too coarse to provide a semantic interpretation. Baker’s suggestion that theta roles are critical to meaning (rather than notions derived from the meaning) assumes a different conception of linguistic meaning than Dowty’s conception. It is unclear to me whether this conception has been fully articulated.

There is a second influential view of theta roles, one that aims to tie it more tightly to a natural semantics. This eschews proto-roles and develops a more articulated inventory of thematic functions. So, here we get not just two or three roles but a myriad of these. Agents, causers, experiencers, instruments, goals, sources, beneficiaries, targets of emotion, etc. This richer conception allows theta structure to explicate argument structure. Theta roles don’t just reflect meaning. They determine it. Here, theta roles are cut thinly enough so that they can support intuitive differences in the meanings of different predicates. Not all agents/causers/experiencers are the same. We need hyphenated versions of these to get the full range of mappings that all the different predicates in a language manifest.

There are two main problems with this conception, I believe. First, as Dowty argues quite persuasively, we really don’t have an even approximately decent theory of what these richer roles are or how to specify them. In particular, there are many, many verbs where it is quite unclear what the theta roles of the relevant arguments are, let alone how they differ. The most obvious cases involve symmetrical predicates like ‘face’ (e.g. “Carnegie Hall faces the Carnegie Deli”) or ‘resemble’ (“Bill resembles Sam”). In such cases it is quite difficult to see what thematic difference might distinguish one argument from the other. And this problem generalizes. Why? Because there are many different ways of being an agent and it is not at all clear that a hugger is an agent in the exact same way that a lover is. But if these differences are semantically relevant, then it appears that we will need about as many theta roles as we have predicates. This is effectively Dowty’s point, but in the other direction. You can’t get from agents directly to huggers as the concept is intended to abstract away from what makes huggers different from lovers. But if you want to get all the way to the actual semantic role that subjects of these particular predicates play, then you will need a lot of hyphenated theta roles.

Second, it is not clear whether this conception will get you any purchase on (2). Again, as Dowty notes, for this end we want a coarser notion, one that will allow us to map arguments to syntactic structure in some general way. Cutting roles too finely will not yield a simple mapping from roles to structure.

It is worth considering for a minute how these two conceptions interact with the theta criterion. As Grimshaw, among others, noted a long time ago, the theta criterion can make do with a very thin conception of theta role. All it requires is that, whatever a theta role is, a DP must get one and no more than one of them. It does not matter how we distinguish roles, only that we have some way of tying roles to syntactic positions. The requirement amounts to the claim that an argument must saturate some position and cannot saturate more than one. So far as the theta criterion goes, we don’t really need a general conception of theta role, only of something like “argument position.” The theta criterion restricts arguments to one and only one of these.
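
Here is how thin the requirement is, as a small Python sketch. The function name, the position labels (“arg1”, “arg2”), and the toy data are all hypothetical, and the check covers only the DP side of the criterion as stated above (each DP gets exactly one position). Nothing in it knows what an agent or a patient is.

```python
from collections import Counter


def satisfies_theta_criterion(dps, assignments):
    """Check that every DP saturates exactly one argument position.

    `dps` is a list of DP names. `assignments` is a list of
    (dp, position) pairs. The positions are opaque labels; their
    thematic "flavor" plays no part in the check.
    """
    counts = Counter(dp for dp, _position in assignments)
    return all(counts[dp] == 1 for dp in dps)


dps = ["Bill", "Sam"]

# Each DP saturates one position: fine.
print(satisfies_theta_criterion(dps, [("Bill", "arg1"), ("Sam", "arg2")]))

# "Bill" saturates two positions and "Sam" none: ruled out.
print(satisfies_theta_criterion(dps, [("Bill", "arg1"), ("Bill", "arg2")]))
```

Swap “arg1” and “arg2” for any other pair of distinct labels and the check behaves identically, which is the sense in which the criterion needs only “argument position” and not a substantive theory of roles.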

A substantive theory of theta roles, one where the kinds of theta roles we have matter, then only really arises with the linking problem. Here we need theta roles that enjoy EP because grammatical notions are not observables, and so to prime FL, to get us to Gs, we need some notions that can bridge the G/non-G divide (i.e. some observables that are (at least weakly) correlated to Gish concepts).

Let me be a little clearer. Subject-hood and object-hood are not observable except via an FL lens. Agent-hood and patient-hood likely are. My (ex) dog Sampson could parse many scenes into agents and patients (doers and done-tos), at least some of the time. If this is so (and I am certain that it is) then these sorts of notions have EP status (they are not parasitic on FL for their viability), and these notions can be used to prime FL via something like UTAH (i.e. agents are subjects, patients are objects). UTAH uses the EP thematic notions to access FL given some PLD. But, if this is what one needs thematic notions for, then it is not at all clear that every argument in every sentence need have a theta role. All that is required is that enough PLD can be parsed in this way to get the G system off the ground. Once the LAD has accessed FL and started developing a G then these Gish notions can take over/supplement the analysis of the PLD. In other words, theta roles as EPs need not be very general (i.e. cover every conceivable predicate and argument), they just need to be general enough to cover enough PLD predicates to prime FL and get it going. Once FL is engaged then its resources are available for further linguistic analysis. And for this purpose, these notions can be (actually should be) quite coarse as their aim is not to provide an interpretation for the sentence but to just crack open the FL module and make it usable by the LAD, which, when on-line, is then able to provide (more) grammatical ways of analyzing the incoming PLD (e.g. this agrees with that so this is a subject, this is adjacent to the verb so this is the object, etc.).

One might go a step further here, I think. To solve the linking problem you want coarse roles that are not determined by calculating the verbal inferences of an argument. Why? Because this is just too fancy a procedure. You want very coarse indicators, those that Sampson could (and did) use. The problem with proto-roles as understood by Dowty is that they don’t seem to be EPish. They are not so much observables as inferables. What I mean is that to get proto-roles the LAD would need to compute inferences off of pretty sophisticated semantic representations. And these need not be very accessible. Better to have limited coarse-grained properties that fit a small number of available predicates than to have a sophisticated system that generalizes across all predicates. You just don’t need the latter if what you want to do is solve the linking problem.

I’ve rambled on long enough and repeated myself way too much (as if repetition and clarity go hand in hand!). Here is what appears to be the main conclusion: we seem to have been asking theta roles to do two things that don’t obviously pull in the same direction. We want them to provide an interpretation for the sentence and to solve the linking problem. However, the kind of roles we want for the first appear to be different from the kinds of roles we need for the second. IMO, the linking problem is the important one for GG. But if this is right, then having a theory of roles that applies to every DP in every sentence is unnecessary (or at least not obviously required). We need a few gross observational roles that apply to enough PLD predicates to get a G up and running. Once engaged, an LAD gets immediate access to a whole slew of linguistic features that the LAD can effectively use to continue acquiring its G. On this conception, we just don’t need a general theory of theta roles (i.e. one that assigns each argument an interpretive role). Which seems like a good thing given that one does not appear to be currently available or likely to be forthcoming.

[1] Everyone (including advocates of the movement theory of control, e.g. me) assumes that at least the following is accurate: every (contentful (i.e. non pleonastic)) DP enters the derivation through a thematic door. Thus the first relation that any such DP grammatically enters into is a thematic relation. This is also true of every version of minimalism that I am aware of.

[2] Even for neo-Davidsonians like Schein and Pietroski where theta roles serve an important type-lifting role (they are relational predicates that tie a DP to an event variable), all that is generally required is a distinction between internal vs external argument. What flavor these are (whether they are agents or experiencers or causes or…) does not really matter much. The same is even truer for standard conceptions where arguments are effectively related to their predicates by saturating a variable position of the predicate via lambda conversion.

[3] Arguments actually, for the notion is not defined syntactically but over the propositional structure. We can say that a DP has the proto-role in virtue of representing the relevant argument. I will leave such niceties aside here.

[4] This is an online version of the paper that appeared in Haegeman’s edited volume Elements of Grammar. It is a great paper. One of those that I wish that I had written.

15 comments:

kgr, October 1, 2015 at 8:59 AM
"In other words, the substantive interpretation of notions like agent and patient plausibly have what Chomsky called “epistemological priority” (EP). They are observable non-linguistic predicates that can be used to map PLD to (non-observable) grammatical dependencies."

Could you say why you believe this? I don't think it is plausible at all that coarse-grained thematic role concepts are prior in any respect. If anything is plausibly "prior", it is non-linguistic correlates of the finer-grained properties Dowty looked at like causation, volition, animacy, change, movement, etc.

Unknown, October 1, 2015 at 9:32 AM
I have the same question as Kyle. I don't see why "proto-roles as understood by Dowty...don’t seem to be EPish." If anything, they seem much more EPish. Is AGENT really more of a natural class than the sorts of properties often associated with AGENTs---e.g. VOLITIONAL, SENTIENT, etc.?

(I'm actually unclear here whether you mean to have more specific roles here or not---e.g. AGENT-OF-KICKING---presumably for Fodor&Leporian reasons. In that case, do we really want UG---or at least the learner's initial linking algorithm---to come prebuilt with all the linking roles you'd need to make that work---e.g. AGENT-OF-KICKING -> SUBJECT or a rule that respects the hierarchy AGENT-OF-KICKING > PATIENT-OF-KICKING?)

So I think your response is, give me whatever list of EP roles/properties you want, Dowty-style linking rules are just "too fancy a procedure" for a learner to employ. I could imagine this response coming from two sources: (i) Dowty's linking rules are fundamentally relational (on arguments), while you might be able to have a simpler set of nonrelational linking rules like AGENT -> SUBJECT, PATIENT -> OBJECT, at least to get the learner started; or (ii) Dowty's linking rules are globally relational (you have to take into account all of the relevant entailments), while the simple rules are relational in a much simpler way (they make reference to a thematic hierarchy like AGENT > PATIENT).

That characterization seems accurate (though correct me if you wouldn't give such a characterization), but I doubt it tells us what kinds of learners are more or less plausible. The stupid learner in (i) is going to make a lot of stupid mistakes on really simple verbs---e.g. "NP break" v. "NP break NP"---that could well show up downstream. The non-Dowty relational learner in (ii) will do better, but at the cost of relationality.

But why should I believe that comparing a few more plausibly EPish properties a la Dowty is going to be harder than comparing single roles (especially as the number of EPish roles expands)? The big jump seems to be granting relationality in the linking rules. And once I've gone the relational route, it's straightforward to state the two alternatives in the same learning model by slightly altering the representational constraints (as is true of all problems that contrast a categorial system with the analogous featural system). And at that point, it's an empirical question.

Dennis O., October 23, 2015 at 7:29 AM
It seems to me that, at least on a broad conceptual level, the tension between Dowty's and Baker's views is much less severe, perhaps non-existent, if theta-roles and the Theta Criterion are construed as belonging to the external interpretive ("C-I") system(s) -- which seems to me to be the only meaningful interpretation of these notions within the current standard model (and the only sense in which they can be said to have epistemological priority). UTAH, or whatever variant of it we want to adopt, is then a principle regulating how those systems read the structures generated by G (more specifically, the "D-structure", now called "vP phase"); this seems to be the view adopted by Baker towards the end of the paper you cited. Then G need not know anything about theta properties, and ideally just merges away blindly.

Unknown, October 24, 2015 at 4:55 PM
I think it quite plausible that a set of limited linking rules can guide the child to certain rudimentary syntax-concept mappings. And if this requires (for one thing) that prelinguistic infants parse the world in terms of causal events, and perceive the participants in those events as doers and done-tos, it seems we have some evidence that they do.

For instance, Leslie and Keeble 1987 showed that 6-month-olds strongly distinguished spatio-temporal reversals of Michotte-type stimuli (one ball's movement is perceived to have caused the movement of another ball). By merely flipping the direction of action, these reversals also reversed who caused whom to move. Infants didn't so strongly distinguish spatially-similar reversals of stimuli that failed to support a perceptual causal relation (i.e. one ball moves, there's a substantial temporal delay, and then the other ball moves). In the latter case, the lack of increased attention to the reversals can be attributed to the fact that, since perception supported no causal relation between the objects' movements, the balls played no particular roles, and there was consequently no (interesting!) perception that the roles were reversed.

Other research (here) shows that 12-month-olds inferred the existence and location of a causal agent that they had no perceptual evidence for.

Still other work (here and here) suggests that infants as young as 6 months old evaluate characters positively/negatively depending on the roles they play in helping and hindering events.

(h/t Brent Strickland's paper on how language reflects core cognition)

Unknown, November 28, 2018 at 6:54 AM
Need help.
I have a research topic.
Comparison of theme between Sindhi and English language!