Developmental Science 9:2 (2006), pp 125–157
TARGET ARTICLE WITH COMMENTARIES AND RESPONSE Blackwell Publishing Ltd
Gaze following: why (not) learn it? The emergence of gaze following
Jochen Triesch,1,2 Christof Teuscher,3 Gedeon O. Deák1 and Eric Carlson1 1. Department of Cognitive Science, University of California, San Diego, USA 2. Frankfurt Institute for Advanced Studies, Johann Wolfgang Goethe University, Germany 3. Los Alamos National Laboratory, Los Alamos, USA
For commentaries on this article see Csibra (2006), Moore (2006) and Richardson and Thomas (2006).
Abstract

We propose a computational model of the emergence of gaze following skills in infant–caregiver interactions. The model is based on the idea that infants learn that monitoring their caregiver’s direction of gaze allows them to predict the locations of interesting objects or events in their environment (Moore & Corkum, 1994). Elaborating on this theory, we demonstrate that a specific Basic Set of structures and mechanisms is sufficient for gaze following to emerge. This Basic Set includes the infant’s perceptual skills and preferences, habituation and reward-driven learning, and a structured social environment featuring a caregiver who tends to look at things the infant will find interesting. We review evidence that all elements of the Basic Set are established well before the relevant gaze following skills emerge. We evaluate the model in a series of simulations and show that it can account for typical development. We also demonstrate that plausible alterations of model parameters, motivated by findings on two different developmental disorders – autism and Williams syndrome – produce delays or deficits in the emergence of gaze following. The model makes a number of testable predictions. In addition, it opens a new perspective for theorizing about cross-species differences in gaze following.
Introduction

The capacity for shared attention is a cornerstone of social intelligence. It plays a crucial role in the communication between infant and caregiver (Brazelton, Koslowski & Main, 1974; Kaye, 1982; Adamson & Bakeman, 1991; Adamson, 1995; Moore & Dunham, 1995). By 9–12 months most infants can follow adults’ gaze and pointing gestures, and monitor a caregiver’s affect and use it to modulate their own response to an ambiguous stimulus. These behaviors emerge and coalesce on a predictable schedule (e.g. Butterworth & Itakura, 2000; Deák, Flom & Pick, 2000), although specific milestones show considerable individual differences in age of attainment (Mundy & Gomes, 1998; Markus, Mundy, Morales, Delgado & Yale, 2000). Shared attention skills allow the young of our species to learn what is important in the environment, based on the patterns of attention in older, more expert individuals. In conjunction with a shared language, these skills allow children to communicate what
they perceive and think about, and to construct mental representations of what others perceive and think about. Consequently, shared attention is crucial for language and communication (Bruner, 1983; Baldwin, 1993; Tomasello, 1999). The term shared attention is typically used to denote a set of different skills comprising gaze following, pointing and requesting behaviors. While some authors use the terms joint and shared attention interchangeably to refer to the matching of one’s focus of attention with that of another person, other authors make a subtle distinction between the two. ‘Shared’ attention is sometimes reserved for the more complex form of communication, wherein two individuals attend to the same object, and each has knowledge of the other’s attention to this object (Tomasello, 1995; Emery, 2000). In this paper, we will be concerned with joint attention more broadly, which we view as an important precursor to the emergence of true shared attention. Our particular focus is on gaze following, which may be defined as looking where another person is looking.
A rather difficult question is what gaze following skills imply about how infants at various ages conceptualize their caregivers’ looking behavior. Although early accounts interpreted gaze following skills as indicating considerable social understanding or even a theory of mind, it has been argued that young infants may learn to follow gaze without such an understanding (Moore & Corkum, 1994; Corkum & Moore, 1995). More recently, Woodward (2003) demonstrated that infants need not have an understanding of the relation between a person who looks and the object of his or her gaze. In addition, early gaze following skills may not even require a representational strategy involving the identification of the caregiver as an intentional, perceiving individual (Leekam, Hunnisett & Moore, 1998). Certainly, such representations will emerge over time in older infants, but they might not be necessary to explain the emergence of gaze following behaviors.

Gaze following in other species

Humans are not the only species that exhibits gaze following. Gaze following has been demonstrated in a number of other species, including some (but not all) non-human primates (e.g. Itakura, 1996, 2004; Emery, Lorincz, Perrett, Oram & Baker, 1997; Tomasello, Call & Hare, 1997). Chimpanzees even seem to exhibit the more advanced level of gaze following that requires ignoring a distractor object along the scan path – Butterworth’s geometric stage of gaze following (see above) (Tomasello, Hare & Agnetta, 1999). In addition, Hare, Call, Agnetta and Tomasello (2000) demonstrated that chimpanzees know what conspecifics can and cannot see. There has also been some work with non-primates. Domestic dogs, for example, are capable of following the gaze of humans at about the level of 6- to 9-month-old human infants (but are not capable of shared attention) (Hare & Tomasello, 1999; Agnetta, Hare & Tomasello, 2000).
In contrast, wolves do not seem to follow the gaze of humans (Hare, Brown, Williamson & Tomasello, 2002). Why some species are able to follow gaze while others are not is currently unclear. Behavioral research has been cataloging cross-species differences, but little is known about their underlying causes.

The role of learning

Early attempts to explain gaze following postulated the existence of innate modules. Examples of strongly nativist theories have been articulated by Leslie and Baron-Cohen (Leslie, 1987; Baron-Cohen, 1995). Such approaches have marginalized the role of learning in the development
of cognitive skills. One line of critique against modular accounts is that they tend to have little predictive power, because it is typically not made explicit how the modules work internally and exactly what information is passed between them (see Deák & Triesch, in press, for detailed analysis). In principle, however, this criticism can be overcome, and recent computational and robotic modeling work has started to address this question (Scassellati, 2002). An alternative view explains the emergence of gaze following by postulating that infants gradually discover that monitoring their caregiver’s direction of gaze allows them to predict where interesting visual events will be. This idea was first articulated by Moore and Corkum (1994; Corkum & Moore, 1995). Note that while this view highlights the role of learning processes, it does not preclude an evolved propensity to follow gaze in certain situations, which depends only minimally or not at all on early social experiences. Such mechanisms may be important in jump-starting the learning process. There is substantial evidence consistent with a learning account. In particular, Corkum and Moore (1998) (C&M) demonstrated that 8-month-old infants can be trained to follow their caregiver’s gaze in a contingent reinforcement paradigm, where an interesting visual stimulus was shown if the infant followed the adult’s gaze to the stimulus location. C&M concluded that ‘learning could be involved in the acquisition of gaze following’ (p. 37). A second experiment by C&M, however, seems somewhat inconsistent with a pure learning account. Specifically, they found it more difficult to train infants to look to the location opposite of where the adult turned. This prompts C&M to claim that ‘simple learning is not sufficient as the mechanism through which joint attention cues acquire their signal value’ (p. 28). 
In our view, however, C&M’s second experiment is quite difficult to interpret and the results appear still consistent with a learning account.1 The importance of learning is also supported by some evidence, albeit preliminary, that gaze following skills emerge gradually through social experience. Deák et al. (2000) found that 12- and 18-month-old infants’ gaze following diminished less across trials if targets were novel and distinctive than if targets were repetitive and identical. This suggests that even in a single interaction with as few as 12 trials, infants adjust their expectations about the validity of adults’ social cues for predicting visual reward. Also, Deák et al. (Deák, Wakabayashi, Sepeta & Triesch, 2004) reported preliminary observational data showing that gaze and gesture following skills emerge somewhat gradually between 5 and 10 months of age, which is consistent with an ongoing learning process. In sum, then, there is intriguing evidence to suggest that learning models might explain how gaze following and other joint attention skills emerge in the first 18 months. However, existing models are too vague to specify the kinds of data that would help us sharpen a powerful, predictive account of how these skills emerge.

1 There are at least two questions about the proper interpretation of Experiment 2 in Corkum and Moore (1998). First, it is unclear to what extent the participants could already follow gaze, because the exclusion measure was not very powerful. Corkum and Moore’s interpretation rests on the assumption that the tested infants were incapable of any gaze following. Second, motion cues may have facilitated gaze shifts in the direction of the caregiver’s head turn, but Corkum and Moore’s interpretation rests on the assumption that turns in the opposite direction are equally likely a priori. This does not consider that motion cueing facilitates gaze shifts in the same direction, which is supported by current evidence (e.g. Farroni et al., 2000).

The need for computational models

Our ultimate goal is to explain how and why gaze following (in its different forms) emerges at a level that reveals the underlying mechanisms of change in the brain and their relation to changes in overt social behavior. A theory of the emergence of gaze following should account for the experimental findings obtained in behavioral experiments, be consistent with known neuroscience data, and make specific predictions that can be used to falsify it. It should offer plausible explanations for differences in populations with developmental disorders and in other species. All else being equal, it should be as simple and parsimonious as possible. In this paper we propose an account of the emergence of gaze following and evaluate its plausibility through computational modeling. Like many others, we believe that computational models can be a great aid in theorizing about developmental phenomena. The benefits of such an approach have been adequately discussed in several places (e.g.
Elman, Bates, Johnson, Karmiloff-Smith, Parisi & Plunkett, 1996; O’Reilly & Munakata, 2002). For instance, computational models can be very helpful in bridging the explanatory gap between biological mechanisms and observed behaviors. Importantly, computational approaches can be useful in analyzing the causal structure of developmental processes, that is, which changes may be necessary or sufficient for developmental events like the emergence of a new cognitive skill. These questions cannot easily be studied experimentally because (1) changes to individual neural processes are not readily observable or manipulable, and (2) there are typically many processes changing at the same time, making it very difficult to answer questions about cause and effect relations. Computational modeling may be particularly helpful in studying such relations because
one can easily monitor all changes in the model, and systematically prohibit or promote certain changes in order to study how this alters the developmental trajectory. The specific approach described in the following is comparable to other modeling work in the area of cognitive development. To some extent our approach is inspired by connectionist models (Elman et al., 1996) and dynamical systems approaches to development (Thelen & Smith, 1994). We share with connectionist modelers the desire to explain behavior in terms of underlying neural structures. In contrast to classical connectionist models of development, however, our approach emphasizes aspects of the embodied nature of cognitive development (Clark, 1997; Wilson, 2002). In particular, we consider the role of the learner’s situated real-time interaction with its environment. A good understanding and careful modeling of this interaction is a central goal of our approach (see Schlesinger & Parisi, 2001, for another example of this approach). These issues have also been addressed to some extent within the dynamic systems approach (Thelen, Schöner, Scheier & Smith, 2000), but our approach emphasizes the role of biologically plausible reward-driven learning processes. It is surprising to us that reward-driven learning mechanisms such as Temporal Difference learning (see below) are rarely used in computational models of infant development. For example, connectionist-style models typically utilize supervised learning (often using the backpropagation learning mechanism), which is not applicable to many developmental learning contexts. Similarly, dynamical systems approaches frequently do not address goal-directed learning. Instead, the transition from one (younger and less capable) developmental state to the next (older and more capable) state is often modeled by changing a control parameter of the dynamical system in order to account for different performance levels.
What is not addressed is what forces may drive these changing control parameters in developing infants. We feel that computational models that aim to carefully capture the affect-driven learning during situated, real-time interactions with the environment hold much promise for advancing our understanding of early cognitive development. The account that follows is an attempt to evaluate the promise of such models in the context of gaze following.
this idea, we propose that gaze following (and other attention-sharing skills) emerge through the interplay of a Basic Set of structures and mechanisms. This set includes perceptual skills and preferences, reward-driven learning, habituation and a structured social environment (Fasel, Deák, Triesch & Movellan, 2002). In the following, we will briefly discuss each component of this Basic Set, and review evidence that each of these is functioning in normally developing infants before the time that the first solid gaze following skills emerge. This is crucial for establishing the viability of this set as a causal precursor for the emergence of gaze following skills. We will then describe how these components may interact to allow for the learning of gaze following.

Perceptual skills and preferences

Several perceptual skills and preferences that are in place by 3 months of age or earlier might be important for shared attention skills to develop. Even the youngest infants prefer human stimuli, especially their caregivers’ faces and voices (Brazelton et al., 1974; DeCasper & Fifer, 1980; Pascalis, de Schonen, Morton, Deruelle & Fabre-Grenet, 1995). One interpretation is that social stimuli have a higher salience than competing inanimate stimuli (Bates, 1979). Infants also generally enjoy social interaction. Around 2–3 months, infants begin responding in a more consistent and focused way to caregivers. At the same time most infants produce their first social smiles, and parents report greater engagement and ‘presence’ during interactions (Cole & Cole, 1996). Infants as young as 3 months prefer looking at the eyes of an approaching person, rather than the mouth (Haith, Bergman & Moore, 1979). Attention-shifting skills (critical for following gaze or pointing cues) begin to mature around 3–4 months (e.g.
Butcher, Kalverboer & Geuze, 2000; Farroni, Johnson, Brockbank & Simion, 2000; Johnson, Posner & Rothbart, 1994), but other, more complex perceptual skills will continue to undergo significant changes. A skill that is highly relevant to the development of gaze following and other attention-sharing skills is face processing, or more specifically, head pose and eye direction perception (i.e. discriminating the rotational angles of the face, and estimating the line of gaze). One study found that 1-month-olds prefer a photograph of their caregiver’s face in a frontal pose over a profile pose, suggesting that even young infants can discriminate extreme differences in caregivers’ head poses (Sai & Bushnell, 1998). But this finding has not been extended, so we do not know how well infants of different ages can discriminate different head poses. It appears that 8–10-month-olds use head pose, not eye direction, to estimate adults’ gaze direction
(Moore et al., 1997). Robust use of the eyes seems to emerge later, with significant improvement between 12 and 14 months (Caron et al., 2002). Thus, by this age, face processing skills must be sufficiently well developed to allow for robust gaze following even in somewhat ambiguous circumstances. However, for gaze following to be successful, the ability to accurately encode the caregiver’s head pose needs to be mapped to the proper motor behaviors, which requires additional learning processes.

Reward-driven learning

Reward-driven learning, we claim, is important for learning attention-sharing. Reward-driven or reinforcement learning occurs in 2- and 3-month-olds (Kaye, 1982) and may even be present at birth (Floccia, 1997).2 Two-month-olds can, for example, learn within minutes to predict the location of the next interesting event in a simple repeated sequence (Haith, Hazan & Goodman, 1988). We propose that the principal learning mechanisms used for acquiring attention-sharing behaviors are neurally plausible processes of Reinforcement Learning called Temporal Difference or TD learning (Sutton, 1988; Sutton & Barto, 1998). These processes are not merely Skinnerian, nor are they anti-mentalistic: they formalize the relation between an agent’s affect-laden experienced outcomes (positive or negative) and the agent’s means of adapting behavior to increase positive outcomes and decrease negative ones. TD learning in particular has been tied to specific neuromodulatory systems (Schultz, Dayan & Montague, 1997), and recent models are neurally plausible (Montague, Hyman & Cohen, 2004). In particular, the firing of dopaminergic neurons in parts of the basal ganglia has been associated with the temporal difference signal from which TD learning methods derive their name.
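The TD principle described above can be made concrete with a small sketch. This is a generic tabular TD(0) update on a toy two-state world; the parameter values and the toy task are invented for illustration and are not the paper's implementation, only the learning rule it builds on.

```python
# Minimal tabular TD(0) sketch (illustrative values, not the paper's).
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) step: move V[s] toward r + gamma * V[s_next]."""
    td_error = r + gamma * V[s_next] - V[s]  # reward-prediction error
    V[s] += alpha * td_error
    return td_error

# Toy world with two alternating states; a reward of 1.0 is
# delivered in state 1, none in state 0.
V = [0.0, 0.0]
for _ in range(200):
    td0_update(V, 1, 1.0, 0)  # experience reward in state 1
    td0_update(V, 0, 0.0, 1)  # state 0 comes to predict the reward
```

After learning, state 0 has acquired value even though it is never directly rewarded, because it reliably predicts the upcoming reward. The TD error driving the updates is the quantity associated with dopaminergic firing in the text.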
Although TD learning has previously played almost no role in developmental models, it holds promise for understanding the development of behaviors in all contexts that involve affectively valued outcomes. Reward-driven learning, however, may not be the only learning mechanism that is important for the emergence of gaze following.

2 Sometimes the term contingency learning is used in the developmental literature. We use reinforcement learning because it is more common in neuroscience, cognitive science and machine learning, and because it makes explicit an assumption that is implicit in the idea of contingency learning – specifically, that the learner is motivated or affectively driven to predict, and experience, certain outcomes.

Habituation

Habituation also plays an important role in our theory as a fundamental learning process. Habituation processes have complex dynamics that are in themselves challenging to understand and to model (Sirois & Mareschal, 2002). In most previous modeling attempts, habituation was related directly to the behavioral responses of the organism, e.g. the strength or probability of a motor response to a certain stimulus. Our view is somewhat different in that we relate habituation processes to changes in the internal evaluation or reward of a stimulus. Together, habituation and reward-driven learning (see above) will produce certain behavioral sequences and modify them adaptively. For example, when an infant looks at a caregiver’s face, or at a toy held by the caregiver, habituation will systematically occur, which we interpret as a systematically declining reward value over time for looking at this object. Dishabituation, conversely, amounts to a recovery of this reward. Because TD learning predicts future rewards, habituation will facilitate attention shifts away from the current target so that a new, more rewarding target can be fixated. Dishabituation leads to a relative recovery of the reward value of an object when a different stimulus is attended. These processes, in conjunction with reward-driven learning of behavioral policies, will produce cycles of attention-shifting between interesting social objects in the visual environment, such as the caregiver, and various other objects with properties that infants find interesting. The utility of these cycles for learning to follow gaze will depend on predictable behavior patterns provided by the caregiver.

Structured social environment

We posit that the most relevant situations for learning shared attention skills include interactions such as face-to-face play, feeding, diaper changing and bathing, which make up a high proportion of infants’ waking time. What is important about such interactions, we hypothesize, is their predictable event-contingency structure.
This structure is learnable, by means of reinforcement learning and habituation, and infants can learn to maximize their positive engagement in such interactions. Studies on the statistical structure of infant–parent interactions generally show that each participant synchronizes his or her actions with the other, and selects actions based partly on the other’s recent actions, emotions and messages (Watson & Ramey, 1985). We hypothesize that infants soon start to predict where interesting objects and events will be, based on their caregivers’ gaze patterns. The caregiver’s gaze is predictive of interesting sights because caregivers will tend to look at other people or at objects they are manipulating (Land, Mennie & Rusted, 1999), and infants are interested in such stimuli.
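The habituation/dishabituation dynamics described above, a reward value that declines under continued fixation and recovers when attention moves elsewhere, can be sketched as follows. The decay and recovery rates here are invented for this sketch; the paper's actual equations are given in its Appendix.

```python
# Illustrative habituation/dishabituation dynamics. The rate
# constants below are assumptions for this sketch, not the
# paper's parameter values.

def habituate(interest, fixated, decay=0.2, recovery=0.05, max_interest=1.0):
    """One time step of the reward value ("interest") of one object."""
    if fixated:
        return interest - decay * interest  # reward declines under fixation
    return interest + recovery * (max_interest - interest)  # dishabituation

toy_interest = 1.0
trace = []
for t in range(30):
    fixating_toy = t < 15  # fixate the toy for a while, then look elsewhere
    toy_interest = habituate(toy_interest, fixating_toy)
    trace.append(toy_interest)
# Interest falls while the toy is fixated and recovers once attention
# moves away, pushing the model to shift gaze to fresher targets.
```

Coupled with a TD learner that predicts future rewards, such dynamics produce exactly the attention-shifting cycles the text describes.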
its own visual field. Ultimately, the test of this theory will be whether it can be extended to explain many of the interesting subtleties such as the ordered sequence of the development of gaze following skills, or the value of different caregiver cues (eyes, face, body posture) for joint attention, or the later development of theory-of-mind-like representations. We are optimistic that our framework provides a good starting point for this endeavor, and that we will eventually be able to account for a large range of empirical phenomena, including ‘higher’ shared attention skills. We will return to this point in the discussion.
Computational model

We now present a simple computational model to test whether the mechanisms of the Basic Set can lead to the emergence of gaze following and to explore how alterations of model parameters can simulate some developmental disorders that are characterized by delays in the emergence of gaze following.3 The goal of this inquiry is to determine under what conditions the Basic Set is sufficient for the emergence of gaze following. We do not suggest, however, that all of the Basic Set elements are strictly necessary – some might be replaceable by alternative mechanisms. Also, we do not claim that this set is sufficient for a comprehensive account of all human attention-sharing behaviors. It merely attempts to explain the basic gaze following behaviors that progressively emerge during the first year in typically developing infants, and, hopefully, disruptions of this progression that occur in certain developmental disorders. Future work will establish whether the model can also explain, for example, point-following behaviors. The model was implemented in Matlab. The source code is available at http://mesa.ucsd.edu

3 An initial account of the model was given in Carlson and Triesch (2003).

Environment and caregiver model

The simulation comprises a model of the infant (referred to simply as ‘infant’, merely for expositional fluency), a model of the caregiver (the ‘caregiver’) and a model of the environment in which they interact. An overview of the model is given in Figure 1. As a simplification in the model, we assume that infant and caregiver are facing each other and remain in the same position. The space surrounding infant and caregiver is discretized into N distinct regions. The caregiver can look at any of these regions or at the infant. The infant can look at any of these regions or at the caregiver.
Figure 1 Overview of the model showing infant, caregiver and interesting object. Corresponding model parameters are given in brackets. Note that while we draw the spatial locations as arranged in a hexagonal fashion, the model does not assume or use any specific topological relations between these locations.
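As a rough illustration of the environment and caregiver model, the following sketch implements one possible reading of the description: a single interesting target among N regions, a target that may jump after being stationary for at least Tmin steps, and a caregiver head pose that points at the target with probability pvalid. The parameter names mirror the text; the exact dynamics and the numeric values are our assumptions, not the released Matlab code.

```python
import random

# Hypothetical sketch of the environment/caregiver dynamics
# (one reading of the text; values are illustrative).
def caregiver_step(state, N=10, p_shift=0.1, p_valid=0.8, T_min=3):
    target, age = state
    age += 1
    if age >= T_min and random.random() < p_shift:
        target = random.randrange(N)     # target jumps to a new region
        age = 0
    if random.random() < p_valid:
        head_pose = target               # valid cue: pose points at target
    else:
        head_pose = random.randrange(N)  # invalid cue: random pose
    return (target, age), head_pose

random.seed(0)
state, hits, steps = (0, 0), 0, 2000
for _ in range(steps):
    state, pose = caregiver_step(state)
    hits += (pose == state[0])
# The head pose matches the target location roughly
# p_valid + (1 - p_valid)/N of the time, so monitoring the
# caregiver's head pose is informative about the target.
```

This also shows why a single parameter pvalid suffices: whether the caregiver is imperfectly predictive or the infant imperfectly estimates the pose, the net head-pose-to-target statistics are the same.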
giver’s head pose and, as a consequence, the estimated head pose will not be very predictive of rewarding sights. Thus, a not-so-predictive caregiver whose head pose can be estimated accurately and a highly predictive caregiver whose head pose we can only infer correctly some fraction of the time will produce the same net effect, and we can model both situations with the same parameter pvalid. Note that this environment and caregiver model is extremely simple. In particular, the caregiver is not responding to the infant in any way. This is obviously a gross simplification of the complex, reciprocal dynamics of infant–caregiver interactions (e.g. Kaye, 1982), but as we will demonstrate below, even this kind of social environment can be sufficient for gaze following to emerge. More complex, interactive caregiver models have also recently been investigated, and these show that the caregiver’s behavior plays an important role (Teuscher & Triesch, 2004). In particular, the caregiver’s behavior has to be properly matched to the parameters of the infant model for optimal learning speed, although gaze following will emerge under a wide range of caregiver behaviors.

Infant model

Our infant model is essentially that of a pleasure-driven agent. There are many ways of formalizing this idea but a particularly appropriate formal framework is reinforcement learning (Sutton & Barto, 1998). Besides being the basis for modern theories of learning under rewards and punishments, reinforcement learning is also an important subfield of machine learning with some impressive application successes (Sutton & Barto, 1998). In particular, our model uses temporal difference learning (TD learning) algorithms, which have been proposed as models for certain basal ganglia functions (Schultz et al., 1997). A detailed description of the equations of the model is given in the Appendix. We conceive the infant as a reinforcement learning system that learns to make two kinds of decisions.
First, at any given time it decides whether to shift gaze or keep fixating the same location. Second, it decides where to look next, once the decision to shift the direction of gaze has been made. The information available to the infant includes the identity of its current object of fixation, its associated reward value, and the length of time the infant has been fixating this object. If and only if the fixated object is the caregiver, the infant will know the caregiver’s current head pose.

Looking, reward and habituation

The infant model receives rewards for looking at interesting things. The amount of reward received depends
If the when-agent makes the decision to shift gaze, the where-agent determines the target of the gaze shift. The state space of this agent has only a single dimension: the caregiver’s head pose. Importantly, unless the infant is looking at the caregiver, the caregiver’s head pose will be unknown to the infant. Concretely, this agent distinguishes N + 2 different states: N for the N different head poses observed when the caregiver looks at the N regions of space, plus one for the caregiver’s head pose when the caregiver is facing the infant, plus one state to represent that the head pose of the caregiver is unknown to the infant. The where-agent learns to map these states onto N + 1 different actions: one action for looking at each of the N regions of space and one action for looking at the caregiver. Note that we assume a one-to-one correspondence between a caregiver head pose and the region of space the caregiver looks at. In reality, this mapping is ambiguous and the ambiguity can produce characteristic errors in gaze following (Butterworth & Jarrett, 1991). Modeling this ambiguity and how the infant learns to resolve it is the subject of a separate paper (Lau & Triesch, 2004). Learning in both agents occurs through the SARSA algorithm (see Appendix), which was chosen because of its simplicity. Both agents balance exploration vs. exploitation by selecting actions with a softmax action selection mechanism (see Appendix). It should be noted that separating the infant model into two separate learning agents is not strictly necessary. We would expect similar results for a simpler model that uses a single reinforcement learning agent to model the infant, whose state space was the product space of the state spaces of the when and where agents, and whose possible actions are to shift gaze to any of the N + 1 locations. However, the learning time would be expected to increase because of the higher dimensionality of the resulting state space.
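The two ingredients named above, the SARSA update rule and softmax action selection, can be sketched in isolation. The tabular toy task and the parameter values below are our assumptions for illustration, not the paper's released implementation.

```python
import math
import random

# Sketch of SARSA with softmax action selection (illustrative values).
def softmax_choice(q_values, temperature=0.5):
    """Select an action with probability proportional to exp(Q/T)."""
    prefs = [math.exp(q / temperature) for q in q_values]
    r = random.random() * sum(prefs)
    cum = 0.0
    for action, p in enumerate(prefs):
        cum += p
        if r <= cum:
            return action
    return len(prefs) - 1

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy TD update: bootstraps from the action actually chosen next."""
    Q[s][a] += alpha * (r + gamma * Q[s_next][a_next] - Q[s][a])

# Toy "where" decision: one observed head pose (state 0) and three
# possible gaze targets; only action 1 leads to the rewarded location.
random.seed(1)
Q = [[0.0, 0.0, 0.0]]
a = softmax_choice(Q[0])
for _ in range(500):
    r = 1.0 if a == 1 else 0.0
    a_next = softmax_choice(Q[0])
    sarsa_update(Q, 0, a, r, 0, a_next)
    a = a_next
# Action 1 ends up with the highest learned value.
```

The temperature parameter controls the exploration/exploitation balance mentioned in the text: high temperatures make choices nearly random, low temperatures nearly greedy.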
Experiments

Normal emergence of gaze following

In this section we describe a first analysis of the model and the effects of some model parameters on its learning behavior. For easy reference, all parameters, their default values, and their allowed ranges are listed in Table 1. In the following, default parameter values are used unless otherwise indicated. The effect of changing several parameters is discussed below. Generally speaking, the model is robust to changes in the parameters over wide ranges. The parameters Tmin, pshift and pvalid were set ad hoc but could eventually be set in accordance with data from an observational study of naturalistic
Table 1 Overview of model parameters, their allowed ranges and default values:
- number of spatial regions (N)
- duration of one simulation step
- learning rate (α)
- habituation rate
- discount factor for future rewards (γ)
- temperature, i.e. randomness of action selection (T)
- reward for looking at frontal view of caregiver (Rfrontal)
- reward for looking at profile view of caregiver (Rprofile)
- reward for looking at target
- reward for looking at other region
- minimum target stationary time in steps (Tmin)
- probability of target shift per time step (pshift)
- predictiveness of caregiver gaze (pvalid)
infant–caregiver interactions that is currently under way (Deák et al., 2004). To quantify the emergence of gaze following in the model and its dependence on model parameters we use the following approach. At specific points during the learning process we temporarily ‘freeze’ the model and evaluate its behavior for 1000 time steps (which corresponds to slightly more than 4 minutes of simulated interaction), after which the learning process resumes. The model behavior at these stages of the learning process is analyzed by observing the infant model interacting with the environment and computing two statistics. The caregiver index CGI is defined as the frequency of the infant’s gaze shifts towards the caregiver:

CGI = (# gaze shifts to caregiver) / (# gaze shifts).    (1)
The gaze following index GFI is the frequency of gaze shifts that lead from the location of the caregiver to where the caregiver is looking:

GFI = (# gaze shifts from caregiver to correct location) / (# gaze shifts).    (2)

Figure 2 Emergence of gaze following in simple environment with just one interesting target present at any time. The solid curve plots the caregiver index (CGI), the solid curve with circles plots the gaze following index (GFI) and the dotted curve plots average reward per time step, as functions of the number of learning iterations. Error bars indicate standard deviations across 15 simulations.
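Both indices can be computed directly from a recorded fixation sequence. A minimal sketch, under our own assumptions about the encoding (the 'CG' marker, the function name, and the per-step caregiver-target list are illustrative, not from the paper):

```python
def gaze_indices(fixations, caregiver_target):
    """Compute CGI and GFI from a sequence of fixation locations.

    fixations: list of fixated locations per time step ('CG' for the
    caregiver, or a region index).
    caregiver_target: list (same length) of the region the caregiver is
    looking at on each step (hypothetical encoding).
    A gaze shift occurs whenever the fixated location changes.
    """
    shifts = to_cg = follows = 0
    for t in range(1, len(fixations)):
        prev, cur = fixations[t - 1], fixations[t]
        if cur == prev:
            continue  # no gaze shift on this step
        shifts += 1
        if cur == 'CG':
            to_cg += 1  # shift towards the caregiver (counts for CGI)
        elif prev == 'CG' and cur == caregiver_target[t]:
            follows += 1  # shift from caregiver to where she looks (GFI)
    cgi = to_cg / shifts if shifts else 0.0
    gfi = follows / shifts if shifts else 0.0
    return cgi, gfi
```

For example, the sequence ['CG', 'CG', 3, 'CG', 2] with caregiver targets [3, 3, 3, 2, 2] contains three gaze shifts, one towards the caregiver and two that follow her gaze.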
pose. It learns to correctly map the caregiver’s head pose to gaze shifts to the locations that the caregiver looks at. The increasing average reward the model obtains per time step during this phase confirms that gaze following is in fact beneficial for the model under these parameters. Note that for a model without habituating rewards it would be optimal to continually stare at the caregiver. A microscopic view of the behavior of the infant model is shown in Figure 3 (top). It shows the fixation behavior of the infant during various stages of the learning process. Fixations on the caregiver are indicated by white pixels, target fixations by black pixels, and fixations on other regions of space by grey pixels. The quick
Figure 4 Gaze following in the presence of multiple targets for various values of pvalid. The gaze following performance averaged over 100 000 steps (y-axis) is plotted as a function of the number of targets that are present simultaneously (x-axis). Error bars indicate standard error across 15 simulations. Gaze following is diminished if significant ambiguities due to multiple targets exist. Also, a reduced predictiveness of the caregiver pvalid has a negative impact on gaze following performance. The dashed horizontal line marks the ‘chance level’ of gaze following expected for an infant who first looks to the caregiver and then shifts gaze randomly to any of the N locations.
Figure 3 Microscopic analysis of model behavior for normally developing (top), autism-like (center) and Williams-like (bottom) model. Each row of pixels shows the target of the infant’s gaze as a function of time (for 50 steps). The gaze target is color coded, with white corresponding to the caregiver, black corresponding to the target, and grey corresponding to other regions of space. In particular, an instance of gaze following is represented by a black pixel lying to the right of a white pixel. Different rows show the behavior at different times during the learning process (every 4000 steps).
development of a preference for looking at the caregiver is visible as the increase in the number of white pixels (caregiver fixations) during the first few rows. The subsequent increase in target fixations (black pixels) is the effect of the emergence of gaze following. Gaze following episodes are shown by black pixels to the right of white pixels.4 The increase in the number of such episodes during learning directly reflects the increasing GFI (compare Figure 2). Figure 4 shows that gaze following will still be learned in more complex environments, where multiple interesting events occur simultaneously. In this case, the learning is somewhat slower because the infant may temporarily learn incorrect associations between a particular caregiver head pose and a gaze shift to a location not looked at by the caregiver but that nevertheless contains an interesting event.

4 Note that there can be instances of black pixels to the right of white pixels that do not correspond to gaze following. This occurs when the infant looks away from the caregiver to a location not looked at by the caregiver that happens by chance to hold the interesting object. These instances are comparatively rare, however. More precisely, the probability of the infant finding the target this way is only (1 − pvalid)/(N − 1), where N is the number of locations in the environment.
Figure 5 Top: Effect of learning rate on emergence of gaze following. A higher learning rate α leads to accelerated initial learning as measured by the gaze following index (GFI). However, a high learning rate can lead to problems in the long run. The infant may never acquire a high level of gaze following. Error bars indicate standard errors across 15 runs. Bottom: Effect of habituation rate on learning of gaze following. Faster habituation leads to accelerated learning as measured by the gaze following index (GFI). Even without any habituation gaze following is still learned – albeit very slowly. Error bars indicate standard error across 15 simulations.
exploratory actions. The infant will spend most time looking at the caregiver, which is the optimal thing to do. Due to the random softmax action selection mechanism, however, which sometimes explores the consequences of seemingly suboptimal actions, the infant will look away from the caregiver, which creates an opportunity to discover the benefit of following gaze. We conclude that although habituation is not strictly necessary if there are
developmental disorders (Baron-Cohen, 1995), our account prompts us to look for potential differences in the components of the Basic Set that may lead to different developmental trajectories. The goal here is not to provide a comprehensive model of these developmental disorders, but to show how specific aspects of these disorders may contribute to deficits in gaze following.

Changes in the reward structure

In the last section we have already seen how differences in learning rate or habituation rate can slow down or even prevent the emergence of gaze following. For autism spectrum disorders and Williams syndrome, however, a particularly interesting candidate is the reward structure of the model, because in both kinds of disorders the affective value of faces may be altered. An intriguing attribute of autism is disinterest in faces. In general, the interest in or appeal of social stimuli is diminished in autism (Adrien, Lenoir, Martineau, Perrot, Hameury, Larmande & Sauvage, 1993; Chawarska, Klin & Volkmar, 2003; Maestro, Muratori, Cavallaro, Pei, Stern, Golse & Palacio-Espasa, 2002; Tantam, Holmes & Cordess, 1993; Klin, Jones, Schultz & Volkmar, 2003; Dawson, Meltzoff, Osterling, Rinaldi & Brown, 1998). For some (but not all) individuals with autism, direct eye contact even seems to be aversive, a phenomenon known as gaze avoidance (Hutt & Ounsted, 1966; Richer & Coss, 1976; Langdell, 1978). It has been proposed many times that a disruption in face processing may be an underlying cause for social deficits in autism (e.g. Trepagnier, 1996; Howard, Cowell, Boucher, Broks, Mayes, Farrant & Roberts, 2000; Klin et al., 2003). Why faces are in some ways less salient or rewarding to individuals with autism is not clear.
It may be that faces are too unpredictable for autistics, an idea consistent with the hypothesis that autistics prefer highly predictable stimuli (Gergely & Watson, 1999); it may also be that anatomical differences in the amygdala (which participates in processing facial affect displays) play a role (e.g. Howard et al., 2000; Baumann & Kemper, 2005). Regardless of the cause, this symptom, and its long-term effect on social learning, bears more precise (ideally quantitative) specification. In contrast to the disinterest in faces in autism, children with Williams syndrome show a high preference for looking at faces over looking at other objects (Bertrand et al., 1993; Bellugi, Lichtenberger, Jones, Lai & St George, 2000; Mervis et al., 2003). In addition, altered as well as delayed emergence of face processing skills has been reported (Karmiloff-Smith, Thomas, Annaz, Humphreys, Ewing, Brace, Van Duuren, Pike, Grice & Campbell, 2004).
Figure 6 Learning performance as a function of caregiver and target reward. For the caregiver reward we use Rfrontal = Rprofile ≡ Rcaregiver. The z-axis corresponds to the GFI after 10^5 time steps of learning, averaged over 10 repetitions of the experiment.
chance without utilizing the caregiver’s gaze). As a consequence, the learning process is slowed down or even prevented, and the GFI stays close to zero. The microscopic behavior of such a model is shown in Figure 3 (middle). Thus, a reduced reward for looking at the caregiver’s face or aversiveness of the caregiver is sufficient to explain delays or complete failure in the emergence of gaze following. It is interesting to note that an analysis of the model shows that even for negative caregiver rewards, the model will nevertheless slowly learn how to follow gaze, even if it does not exhibit the behavior on a regular basis. By analyzing the infant’s action selection probabilities we found that the probability for following the caregiver’s gaze once the infant is looking at the caregiver slowly but clearly rises above those for other actions. However, the model rarely executes a complete gaze following sequence because it is unrewarding to do so, due to first having to look at the aversive caregiver. This behavior of the model might explain a puzzling finding by Leekam, Baron-Cohen, Perret, Milders and Brown (1997) that autistic children can follow gaze if explicitly told to do so, though they may rarely do it spontaneously. This finding is very problematic for previous accounts of the emergence of gaze following. We know of no theory that offers a satisfactory explanation for it. Subsequent studies by Leekam and colleagues (Leekam et al., 1998; Leekam, López & Moore, 2000) suggest that autistic children can be trained to follow gaze through contingent presentation of rewarding visual stimuli (Whalen & Schreibman, 2003), but that a lack of motivation to engage with the experimenter may impede learning. These findings are also consistent with our account. 
The association from caregiver head pose to regions in space is learned (although slowly) due to the constant low level of random exploration, but gaze following is simply not rewarding enough to be produced on a regular basis. If, however, an additional incentive for following gaze is present (e.g. being asked to look where another person is looking, or being trained via operant conditioning), the behavior can be elicited. Also, it is in line with the finding that gaze following in response to static pictures may be ‘easier’, if we make the additional assumption that static pictures of faces are not as aversive as dynamic displays (Klin et al., 2003). It should be noted that an infant who looks less at faces due to a diminished reward for faces can be expected to develop deficits in face processing skills such as fine discrimination of head poses or estimation of the direction of gaze. This will likely corroborate delays in the emergence of gaze following. The model could capture this by making the parameter pvalid a function of the total amount of time the infant has been looking at the caregiver.
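One way to realize this suggestion is a saturating dependence of pvalid on accumulated face-looking time. This is a sketch under our own assumptions: the exponential form and every numeric value below are illustrative, not taken from the paper:

```python
import math

def p_valid(face_time, p_min=0.3, p_max=0.9, tau=500.0):
    """Hypothetical predictiveness of the caregiver's head-pose cue as a
    function of the total time the infant has spent looking at her face.
    Grows from p_min towards p_max with time constant tau (all values
    illustrative), so that little face experience translates into a less
    predictive cue and, in turn, weaker gaze following."""
    return p_min + (p_max - p_min) * (1.0 - math.exp(-face_time / tau))
```

Under such a coupling, an infant who avoids faces would keep pvalid low, reproducing the mutually reinforcing deficit described in the text.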
Figure 7 Learning performance for infant models with attention shifting deficits of varying degree. Top: for normal, positive caregiver reward. Bottom: for zero caregiver reward. Note the different scales on the axes. Error bars indicate standard error across 15 simulations.
or poor attention-shifting, or both, can explain gaze following deficits in autism within the proposed model. Regarding Williams syndrome, a noteworthy recent report on the perception of faces in adults with Williams syndrome finds less accuracy in determining the direction of gaze, and significantly longer response latencies during face perception (Mobbs, Garrett, Menon, Rose, Bellugi & Reiss, 2004). Given our results above, we can conclude that both of these symptoms, if present in infants, would corroborate problems in the emergence of gaze following. Less accuracy in determining the direction of gaze will lower the predictiveness of the caregiver (smaller pvalid), while longer response latencies can be thought of as increasing Tlat. In a similar vein, recently observed inaccuracies of saccade targeting and a higher
model shows that if the reward values associated with the objects/events that caregivers tend to look at are not higher than those for random locations, gaze following will not emerge. By the same token, infants whose caregivers produce few predictive gaze cues (e.g. due to visual deficits) should also learn gaze following more slowly.

Infants who find faces too attractive should have deficits in gaze following. Using a caregiver reward much higher than the target reward leads to deficits in gaze following in the model.

Infants who find faces uninteresting or aversive should have deficits in gaze following. Using small positive or negative rewards for looking at the caregiver leads to gradual deficits in the emergence of gaze following. This problem may be corroborated by a poor development of face processing skills caused by aversiveness (or even neutrality) of faces.

Infants with deficits in attention-shifting should exhibit delays in learning gaze following. The model shows that slow attention-shifting (Tlat > 0) leads to a sluggish emergence of gaze following behavior.

Amount of caregiver contact should influence emergence of gaze following. An infant who experiences few face-to-face interactions with caregivers may be slower to acquire gaze following because of a shortage of relevant learning experiences.

Differences in caregiver behavior can aid or hinder the emergence of gaze following. Varying the model parameters related to the caregiver behavior (pshift, Tmin) while keeping the parameters of the infant identical leads to differences in learning speed. It is likely that ‘optimal’ caregiver behavior depends on particular infant parameters. Thus, the optimal caregiver behavior will generally be different for each infant – especially in the case of abnormally developing infants. More work is needed to understand these issues and their potential ramifications for therapeutic interventions (Teuscher & Triesch, 2004).
Lesioning certain neural pathways should impair gaze following behavior. We assume that information about the caregiver’s direction of gaze is extracted from face processing areas including (but not necessarily limited to) the Fusiform Face Area (Kanwisher, McDermott & Chun, 1997). Control of gaze shifts is assumed to be mediated through areas such as the Frontal Eye Fields (Tehovnik, Sommer, Chou, Slocum & Schiller, 2000). Our temporal difference learning model assumes that pathways between these sites (direct or indirect) are modified during learning and lesioning these pathways may impair gaze following.
emphasis on learning, especially for the emergence of more advanced gaze following skills. It has been noted that infants will follow not only the line of regard of humans, but also that of non-human objects with face-like features, or objects that behave contingently to them (Johnson, Slaughter & Carey, 1998). This suggests that infants’ capacity for joint attention is a generalizable skill that is not tightly tied to specific situations with specific caregivers. Rather, it is a robust skill that extends flexibly to various social interactions. Our model readily accounts for these findings, if the additional assumption is made that such non-human objects may be able to activate some of the same head pose and gaze direction sensitive neurons in the infant’s face processing areas that are utilized for following the gaze of humans.

Related work

A few related models have recently been proposed in the literature. The idea of using temporal difference learning to model the acquisition of gaze following was first mentioned by Matsuda and Omori (2001). They model a learning situation as used by Corkum and Moore (1998), where an experimenter monitors the infant’s behavior and gives visual rewards to the infant when it follows the caregiver’s gaze. Their paper lacks details, however, and does not explicitly model how the caregiver’s direction of gaze becomes associated with certain gaze shifts. We consider explaining this process to be the central problem of learning gaze following. A recent model by Nagai, Hosoda, Morita and Asada (2003) has been implemented in a robot. Their model, which was developed concurrently with ours, shares a number of aspects of our model (Fasel et al., 2002; Carlson & Triesch, 2003). In Nagai et al.’s model the infant also learns to associate head poses of the caregiver with appropriate gaze shifts based on the success or failure of finding a visually appealing stimulus.
To this end, a neural network is trained to map the robot’s current gaze direction and an image of the caregiver’s face onto the desired gaze shift. Their model, however, does not utilize temporal difference learning, but rather an ad hoc learning mechanism. Also, no attempts are made to explain failures of the emergence of gaze following in either developmental disorders or in other species. On the positive side, the authors do not make the simplifying assumption that caregiver head poses have a one-to-one correspondence with regions in space, which we have used here. Nagai et al. also attempt to explain the progressive development of gaze following skills as described by Butterworth and Jarrett (1991). However, a closer look at their model reveals that the most sophisticated
appears ill-advised to use deficits in gaze following to define a disorder. This is still the case in autism, where deficits in social interaction skills such as gaze following are used to define the syndrome. Our hope is that computational modeling efforts like ours will help in understanding complex developmental disorders by helping to better differentiate symptoms and narrow down their primary causes. This, in turn, will suggest promising avenues for treatment and early diagnosis.

Cross-species differences

A good account of the emergence of gaze following should also explain differences in the emergence of gaze following behavior, or the complete absence of it, in other species. Since a simple Basic Set of structures and mechanisms is sufficient for gaze following to emerge, any species with the Basic Set should be able to acquire gaze following to some degree. Deficits or differences in the Basic Set may limit the emergence of gaze following, as seen in our discussion of developmental disorders. Across vertebrate species some Basic Set elements such as habituation and reward-driven learning are essentially ubiquitous, suggesting that these are likely not the missing factors. This inference demands some caution, however, because the presence of, say, reward-driven learning does not mean that just any contingencies can be learned. Nevertheless, we feel that differences in other Basic Set elements are more relevant. Regarding perceptual skills and preferences, the basic questions are how infants of other species might prefer to look at conspecifics, and how well they might distinguish different head or eye orientations. The first question can be studied with controlled preferential looking paradigms to evaluate visual preferences for looking at conspecifics (or humans) (e.g. Bard, Platzman, Lester & Suomi, 1992).
Our model predicts that a (not too big) preference for looking at conspecifics’ faces is beneficial (although not strictly necessary) for gaze following to emerge. In terms of the ability to distinguish different head or eye poses of conspecifics, there is evidence that, for example, many primate species can do so to some extent (Itakura, 2004). Interestingly, eye direction may be particularly easy to discern for humans because of the white sclera (Kobayashi & Kohshima, 1997; Emery, 2000). We assume that gaze direction (orientation of the eyes) is more informative than just head pose, but it is also harder to perceptually discriminate, because the eyes are small. A first attempt to relate such differences to our model is as follows. If an animal with a weaker perceptual system can only inaccurately estimate a conspecific’s head position, then this cue will be less predictive of
pointing gestures performed by the caregiver. However, there are certain differences to consider. First, while the caregiver frequently shifts gaze, pointing gestures during naturalistic exchanges are rare by comparison (Deák et al., 2004). Second, pointing gestures are likely to be more salient for infants because of the large amount of movement involved. Third, infants may be better at discriminating pointing direction than head direction because the extended arm provides a better directional cue (Deák et al., 2000). Fourth, pointing gestures are likely to be more predictive of interesting events, because caregivers will tend to engage in this ‘effort’ only when a particularly relevant environmental stimulus is present. All but the first of these four points suggest that it might be easier for infants to learn point following. In fact, human infants by 9 months follow gaze much more reliably when it is accompanied by a point (Flom, Deák, Phill & Pick, 2003), and a quasi-naturalistic observational study shows that infants from 5 to 10 months are far more likely to follow a parent’s point than a parent’s gaze shift (Deák et al., 2004).

Future work

Of course, our model and the ones discussed above must be seen as only first steps towards a full computational account of the emergence of gaze following. In many respects, these models are still overly simplistic. Examples of simplifications in our model are the restriction to a small set of discrete spatial regions, the absence of peripheral vision and the stereotypic, non-interactive behavior of the caregiver model, just to name a few. Recent work has started to address some of these issues (Lau & Triesch, 2004; Teuscher & Triesch, 2004; Jasso & Triesch, 2004). Another limitation is that the model currently does not address how higher attention sharing skills may emerge.
Future work needs to demonstrate that models such as the present one can be scaled up to explain the emergence of more advanced attention sharing skills. Despite these shortcomings and limitations, we think our model is a useful step in theorizing about the emergence of gaze following and shared attention in general. In some respects, the simplicity of the model is a strength, since it brings the computational essence of the underlying learning mechanisms into focus.
Appendix

Model equations

We follow the notation of Sutton and Barto (1998). Time progresses in discrete steps (t = 0, 1, 2, . . .). At any
time t the when- and where-agents of the model are each in a particular state s_t. In the following we will only consider a single agent (when or where). Upon observing the current state s_t, the agent decides to take an action a_t and potentially receives a reward r_t as a consequence. The probabilistic mapping from states to actions is the agent’s policy (denoted π), which is adapted during learning. The goal of the agent is to learn a policy that maximizes the future discounted reward R_t defined as:

R_t = ∑_{k=0}^{∞} γ^k r_{t+k+1},
where r_{t+k+1} is the reward received at time t + k + 1, and 0 ≤ γ ≤ 1 is the so-called discount factor. In order to improve its policy, the agent learns a so-called state-action value function Q^π(s, a). These are estimates of the future discounted reward the agent will receive when choosing action a in state s and following the current policy π thereafter. Formally, the unknown state-action values are defined as:

Q^π(s, a) = E_π[R_t | s_t = s, a_t = a],
where E_π[·] denotes the expected value with respect to the current policy π. We will denote the estimate of a state-action value at time t as Q_t(s, a). Our agent estimates the Q_t(s, a) with a temporal difference learning (TD learning) method, the SARSA algorithm (Sutton & Barto, 1998): on taking an action and receiving a reward, the temporal difference error is computed as

δ_t = r_t + γ Q_t(s_{t+1}, a_{t+1}) − Q_t(s_t, a_t),
where Q_t(s_t, a_t) is the state-action value assigned to the state-action pair (s_t, a_t) at time t. The temporal difference error is used to adjust the state-action value estimate with the learning step:

Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + α δ_t,
where α > 0 is a learning rate parameter. The agent balances exploration and exploitation using a softmax or Boltzmann action selection rule. The probability of choosing action a in state s at time t is given by:

p_t(a | s) = exp(Q_t(s, a)/T) / ∑_b exp(Q_t(s, b)/T),

where T is the temperature parameter (Table 1). This rule has the advantage that the amount of exploration is stabilized in the presence of changes to other parameters. In a neural implementation, the estimated Q-values can be thought of as the strength of synaptic connections between units coding for different environmental states (presynaptically) and possible actions (postsynaptically), such that increasing the estimate of a Q-value corresponds to strengthening a connection from the corresponding state to the corresponding action. In the context of gaze following these connections may be along a pathway from face processing areas such as the Fusiform Face Area to gaze control structures such as the Frontal Eye Fields.
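The SARSA update and softmax action selection described above can be sketched in a few lines of tabular code. This is a generic implementation under our own naming, not the authors’ code:

```python
import numpy as np

def softmax_policy(Q, s, T):
    """Boltzmann action selection: p(a|s) proportional to exp(Q(s,a)/T)."""
    prefs = Q[s] / T
    prefs = prefs - prefs.max()  # subtract max for numerical stability
    p = np.exp(prefs)
    return p / p.sum()

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """One SARSA step: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a)).

    Modifies Q in place and returns the temporal difference error."""
    delta = r + gamma * Q[s_next, a_next] - Q[s, a]
    Q[s, a] += alpha * delta
    return delta
```

With a high temperature T the policy approaches uniform random exploration; as T decreases it concentrates on the highest-valued action, realizing the exploration/exploitation trade-off discussed in the text.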
Acknowledgements This work would not have been possible without the support of the UC Davis MIND Institute and the National Alliance for Autism Research. The work described here is part of the MESA project at UC San Diego (Modelling for the Emergence of Shared Attention; http:// mesa.ucsd.edu), a larger effort to understand the emergence of shared attention in normal and abnormal development through closely integrating observational studies and systematic experiments with computational modelling approaches. We thank all members of the MESA project for their continuing collaboration: Ian Fasel, Hector Jasso, Boris Lau, Javier Movellan, Leigh Sepeta and Yuri You. We also thank Shoji Itakura, Christine Johnson and Laura Schreibman for comments on earlier drafts.
References Adamson, L.B. (1995). Communication development during infancy. Boulder, CO: Westview. Adamson, L.B., & Bakeman, R. (1991). The development of shared attention during infancy. Annals of Child Development, 8, 1–41. Adrien, J.L., Lenoir, P., Martineau, J., Perrot, A., Hameury, L., Larmande, C., & Sauvage, D. (1993). Blind ratings of early symptoms of autism based upon family home movies. Journal of the American Academy of Child and Adolescent Psychiatry, 32, 617–626. Agnetta, B., Hare, B., & Tomasello, M. (2000). Cues to food locations that domestic dogs (canis familiaris) of different ages do and do not use. Animal Cognition, 3, 107–112. Argyle, M., & Cook, M. (1976). Gaze and mutual gaze. Cambridge: Cambridge University Press. Baldwin, D. (1993). Infants’ ability to consult the speaker for clues to word reference. Journal of Child Language, 20, 395–419. Bard, K.A., Platzman, K.A., Lester, B.M., & Suomi, S.J. (1992). Orientation to social and nonsocial stimuli in neonatal
Clark, A. (1997). Being there: Putting brain, body, and world together again. Cambridge, MA: MIT Press. Cole, M., & Cole, S. (1996). The development of children (3rd edn.). New York: Freeman. Corkum, V., & Moore, C. (1995). Development of joint visual attention in infants. In C. Moore & P.J. Dunham (Eds.), Joint attention: Its origins and role in development (pp. 61–83). Hillsdale, NJ: Erlbaum. Corkum, V., & Moore, C. (1998). The origins of joint visual attention in infants. Developmental Psychology, 34 (1), 28–38. Coss, R.G. (1978). Perceptual determinants of gaze aversion by the lesser mouse lemur (microcerbus murinus): the role of two facing eyes. Behaviour, 64, 248–267. Csibra, G. (2006). Blind infants in random environments: further predictions. Developmental Science, 9 (2), 148–149. Dawson, G., Meltzoff, A.N., Osterling, J., Rinaldi, J., & Brown, E. (1998). Children with autism fail to orient to naturally occurring social stimuli. Journal of Autism and Developmental Disorders, 28, 479–485. Dawson, G., Toth, K., Abbott, R., Osterling, J., Munson, J., & Estes, A. (2004). Early social attention impairments in autism: social orienting, joint attention, and attention to distress. Developmental Psychology, 40 (2), 271–283. Deák, G.O., Flom, R., & Pick, A.D. (2000). Perceptual and motivational factors affecting joint visual attention in 12- and 18-month-olds. Developmental Psychology, 36, 511–523. Deák, G.O., & Triesch, J. (in press). The emergence of attention-sharing skills in human infants. In K. Fujita & S. Itakura (Eds.), Diversity of cognition. University of Kyoto Press. Deák, G.O., Wakabayashi, Y., Sepeta, L., & Triesch, J. (2004). Development of attention-sharing from 5 to 10 months of age in naturalistic interactions. Paper presented at the International Conference on Infancy Studies, Chicago, IL. DeCasper, A.J., & Fifer, W.P. (1980). Of human bonding: newborns prefer their mothers’ voices. Science, 208, 1174–1176. D’Entremont, B., Hains, S., & Muir, D.
(1997). A demonstration of gaze following in 3- to 6-month-olds. Infant Behavior and Development, 20 (4), 569–572. Elman, J.L., Bates, E.A., Johnson, M.H., Karmiloff-Smith, A., Parisi, D., & Plunkett, K. (1996). Rethinking innateness. Cambridge, MA: A Bradford Book/The MIT Press. Emery, N.J. (2000). The eyes have it: the neuroethology, function and evolution of social gaze. Neuroscience and Biobehavioral Reviews, 24, 581–604. Emery, N.J., Lorincz, E.N., Perrett, D.I., Oram, M.W., & Baker, C.I. (1997). Gaze following and joint attention in rhesus monkeys (macaca mulatta). Journal of Comparative Psychology, 111 (3), 286–293. Farroni, T., Johnson, M.H., Brockbank, M., & Simion, F. (2000). Infants’ use of gaze direction to cue attention: the importance of perceived motion. Visual Cognition, 7, 705– 718. Farroni, T., Massaccesi, S., Pividori, D., & Johnson, M.H. (2004). Gaze following in newborns. Infancy, 5 (1), 39–60. Fasel, I., Deák, G.O., Triesch, J., & Movellan, J. (2002). Combining embodied models and empirical research for
Conference on Development and Learning, San Diego, CA (pp. 229–236). The Salk Institute for Biological Studies. Jasso, H., Triesch, J., & Teuscher, C. (2005). A reinforcement learning model explains the stage-wise development of gaze following. Proceedings of the 12th Joint Symposium on Neural Computation (JSNC 2005), Los Angeles, CA, 14 May. Johnson, M.H., Posner, M.I., & Rothbart, M.K. (1994). Facilitation of saccades toward a covertly attended location in early infancy. Psychological Science, 5, 90–93. Johnson, S., Slaughter, V., & Carey, S. (1998). Whose gaze will infants follow? The elicitation of gaze following in 12-month-olds. Developmental Science, 1 (2), 223–238. Kanwisher, N., McDermott, J., & Chun, M.M. (1997). The fusiform face area: a module in human extrastriate cortex specialized for face perception. The Journal of Neuroscience, 17 (11), 4302–4311. Karmiloff-Smith, A., Thomas, M., Annaz, D., Humphreys, K., Ewing, S., Brace, N., Van Duuren, M., Pike, G., Grice, S., & Campbell, R. (2004). Exploring the Williams syndrome face-processing debate: the importance of building developmental trajectories. Journal of Child Psychology and Psychiatry, 45 (7), 1258–1274. Kaye, K. (1982). The mental and social life of babies. Chicago, IL: University of Chicago Press. Klin, A., Jones, W., Schultz, R., & Volkmar, F. (2003). The enactive mind, or from actions to cognition: lessons from autism. Philosophical Transactions of the Royal Society London, B, 358, 345–360. Kobayashi, H., & Kohshima, S. (1997). Morphological uniqueness of human eyes and its adaptive meaning. Nature, 387, 767–768. Laing, E., Butterworth, G., Ansari, D., Gsödl, M., Longhi, E., Panagiotaki, G., Paterson, S., & Karmiloff-Smith, A. (2002). Atypical development of language and social communication in toddlers with Williams syndrome. Developmental Science, 5 (2), 233–246. Land, M.F., Mennie, N., & Rusted, J. (1999).
Eye movements and the roles of vision in activities of daily living: making a cup of tea. Perception, 28, 1311–1328. Landry, R., & Bryson, S. (2004). Impaired disengagement of attention in young children with autism. Journal of Child Psychology and Psychiatry, 45 (6), 1115–1122. Langdell, T. (1978). Recognition of faces: an approach to the study of autism. Journal of Child Psychology and Psychiatry, 19 (3), 255–268. Lau, B., & Triesch, J. (2004). Learning gaze following in space: a computational model. In J. Triesch & T. Jebara (Eds.), Proceedings of the ICDL’04 – Third International Conference on Development and Learning, San Diego, CA (pp. 57–64). The Salk Institute for Biological Studies. Leekam, S., Baron-Cohen, S., Perret, D., Milders, M., & Brown, S. (1997). Eye-direction detection: a dissociation between geometric and joint attention skills in autism. British Journal of Developmental Psychology, 15, 77–95. Leekam, S., Hunnisett, E., & Moore, C. (1998). Targets and cues: gaze-following in children with autism. Journal of Child Psychology and Psychiatry, 39 (7), 951–962.
Osterling, J., & Dawson, G. (1994). Early recognition of children with autism: a study of first birthday home videotapes. Journal of Autism and Developmental Disorders, 24, 247–257.
Pascalis, O., de Schonen, S., Morton, J., Deruelle, C., & Fabre-Grenet, M. (1995). Mother's face recognition by neonates: a replication and an extension. Infant Behavior and Development, 18, 79–85.
Richardson, F.M., & Thomas, M.S.C. (2006). The benefits of computational modelling for the study of developmental disorders: extending the Triesch et al. model to ADHD. Developmental Science, 9 (2), 151–155.
Richer, J.M., & Coss, R.G. (1976). Gaze aversion in autistic and normal children. Acta Psychiatrica Scandinavica, 53, 193–210.
Sai, F., & Bushnell, W.R. (1988). The perception of faces in different poses by 1-month-olds. British Journal of Developmental Psychology, 6, 35–41.
Scaife, M., & Bruner, J.S. (1975). The capacity for joint visual attention in the infant. Nature, 253, 265–266.
Scassellati, B. (2002). Theory of mind for a humanoid robot. Autonomous Robots, 12, 13–24.
Schlesinger, M., & Parisi, D. (2001). The agent-based approach: a new direction for computational models of development. Developmental Review, 21, 121–146.
Schultz, W., Dayan, P., & Montague, P.R. (1997). A neural substrate of prediction and reward. Science, 275, 1593–1599.
Sirois, S., & Mareschal, D. (2002). Models of habituation in infancy. Trends in Cognitive Sciences, 6 (7), 293–298.
Sutton, R.S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9–44.
Sutton, R.S., & Barto, A.G. (1998). Reinforcement learning: An introduction. Cambridge, MA: A Bradford Book/The MIT Press.
Tamis-LeMonda, C., & Bornstein, M. (1989). Habituation and maternal encouragement of attention in infancy as predictors of infant language, play, and representational competence. Child Development, 60, 738–751.
Tantam, D., Holmes, D., & Cordess, C. (1993). Nonverbal expression in autism of Asperger type. Journal of Autism and Developmental Disorders, 23, 111–133.
Tehovnik, E.J., Sommer, M.A., Chou, I.-H., Slocum, W.M., & Schiller, P.H. (2000). Eye fields in the frontal lobes of primates. Brain Research Reviews, 32, 413–448.
Teuscher, C., & Triesch, J. (2004). To care or not to care: analyzing the caregiver in a computational gaze following framework. In J. Triesch & T. Jebara (Eds.), Proceedings of the ICDL'04 – Third International Conference on Development and Learning, San Diego, CA (pp. 9–16). The Salk Institute for Biological Studies.
Thelen, E., Schöner, G., Scheier, C., & Smith, L.B. (2001). The dynamics of embodiment: a field theory of infant perseverative reaching. Behavioral and Brain Sciences, 24 (1), 1–86.
Thelen, E., & Smith, L.B. (1994). A dynamic systems approach to the development of cognition and action. Cambridge, MA: A Bradford Book/The MIT Press.
Tomasello, M. (1995). Joint attention as social cognition. In C. Moore & P.J. Dunham (Eds.), Joint attention: Its origins and role in development (pp. 103–130). Hillsdale, NJ: Erlbaum.
Tomasello, M. (1999). The cultural origins of human cognition. Cambridge, MA: Harvard University Press.
Tomasello, M., Call, J., & Hare, B. (1998). Five primate species follow the visual gaze of conspecifics. Animal Behaviour, 55, 1063–1069.
Tomasello, M., Hare, B., & Agnetta, B. (1999). Chimpanzees, Pan troglodytes, follow gaze geometrically. Animal Behaviour, 58, 769–777.
Trepagnier, C. (1996). A possible origin for the social and communicative deficits of autism. Focus on Autism and Other Developmental Disabilities, 11, 170–182.
van der Geest, J.N., Lagers-van Haselen, G.C., van Hagen, J.M., Govaerts, L.C.P., de Coo, I.F.M., de Zeeuw, C.I., & Frens, M.A. (2004). Saccade dysmetria in Williams-Beuren syndrome. Neuropsychologia, 42, 569–576.
Wainwright-Sharp, J., & Bryson, S. (1993). Visual orienting deficits in high-functioning people with autism. Journal of Autism and Developmental Disorders, 23 (1), 1–13.
Watson, J.S., & Ramey, C.T. (1985). Reactions to response-contingent stimulation in early infancy. In J. Oates (Ed.), Cognitive development in infancy (pp. 219–227). Hillsdale, NJ: Erlbaum.
Whalen, C., & Schreibman, L. (2003). Joint attention training for children with autism using behavior modification procedures. Journal of Child Psychology and Psychiatry, 44 (3), 456–468.
Wilson, M. (2002). Six views of embodied cognition. Psychonomic Bulletin and Review, 9 (4), 625–636.
Woodward, A.L. (2003). Infants' developing understanding of the link between looker and object. Developmental Science, 6 (3), 297–311.