Developmental Science 5:2 (2002), pp 151–185
ARTICLE WITH PEER COMMENTARIES AND RESPONSE Blackwell Publishers Ltd
Learning to perceive object unity: a connectionist account Denis Mareschal1 and Scott P. Johnson2 1. Centre for Brain and Cognitive Development, Birkbeck College, London, UK 2. Department of Psychology, Cornell University, USA
Abstract To explore questions of how human infants begin to perceive partly occluded objects, we devised two connectionist models of perceptual development. The models were endowed with an existing ability to detect several kinds of visual information that have been found important in infants’ and adults’ perception of object unity (motion, co-motion, common motion, relatability, parallelism, texture and T-junctions). They were then presented with stimuli consisting of either one or two objects and an occluding screen. The models’ task was to determine whether the object or objects were joined when such a percept was ambiguous, after specified amounts of training with events in which a subset of possible visual information was provided. The model that was trained in an enriched environment achieved superior levels of performance and was able to generalize veridical percepts to a wide range of novel stimuli. Implications for perceptual development in humans, current theories of development and origins of knowledge are discussed.
Introduction We inhabit a visual world that is filled with objects. Many of the objects we see are partly occluded by other, nearer surfaces, and it is routine for objects to go in and out of sight. Our impression of this visual array, nevertheless, is not one of fleeting or partial images (consistent with what is projected onto the retina), but rather an environment composed of solid, continuous, permanent entities. The visual system, therefore, is adept at imparting structure to an incompletely specified visual array. How does this way of experiencing the world arise? Does the young infant possess similar percepts to adults, in that he or she is born with impressions of segregated, coherent objects at various distances? Or does the infant’s visual world consist of a series of disjoint, unrelated shapes that do not cohere into a sensible array until some period of development? These questions have long interested philosophers and psychologists. James (1890) described the neonate’s perceptual experience as fundamentally chaotic: ‘The baby, assailed by eyes, ears, nose, skin, and entrails at once, feels it all as one great blooming, buzzing confusion’ (vol. 1, p. 488). James went on to suggest that ‘Infants
must go through a long education of eye and ear before they can perceive the realities which adults perceive. Every perception is an acquired perception’ (vol. 2, p. 78; emphasis in original). This position was echoed by Piaget (1952, 1954), who proposed that at birth, the infant’s visual world consists of a patchwork or ‘tableaux’ of moving colors and shapes, as opposed to segregated, coherent objects. Perceptual organization was thought to emerge only gradually over the first two postnatal years, via direct manual experience with objects and coordination of visual, auditory and tactile information.

More recent work on infants’ object perception has called into question these descriptions of young infants’ capabilities and experiences. For example, Kellman and Spelke (1983) investigated the conditions under which 4-month-old infants perceive the unity of two surfaces (e.g. two rod parts) that extend from behind a nearer, occluding box (Figure 1a). Kellman and Spelke (1983; Kellman, Spelke & Short, 1987) found that after habituation to a display in which the two surfaces underwent common motion behind a stationary occluder (reported by adults to consist of a single object behind an occluder), the infants looked longer at two disjoint rod parts (a ‘broken’ rod; see Figure 1b) than at a single,
Figure 1 Displays used by Kellman and Spelke (1983) to explore young infants’ perception of object unity. A: A partly occluded rod moves relative to a stationary occluder. B: Broken rod. C: Complete rod. After habituation to A, infants often show a preference for B relative to C, indicating perception of the rod’s unity in A.
A second conclusion drawn from the early work on object unity concerned the possibility that for young infants, some percepts and concepts are qualitatively similar to those of adults: ‘Humans may begin life with the notion that the environment is composed of things that are coherent, that move as units independently of one another, and that tend to persist, maintaining their coherence and boundaries as they move’ (Kellman & Spelke, 1983, p. 521). Spelke (1990, 1994) has since proposed that the earliest kinds of object perception can be characterized as reasoning in accord with fundamental physical principles. One of these is the principle of contact: visible surfaces that undergo a common, rigid motion tend to be connected (Spelke & Van de Walle, 1993).

More recent research has explored further both the possibility that young infants utilize only a limited range of available visual information in object perception tasks, and the notion that core principles guide early object perception. In the next two sections of this paper, some of this evidence will be presented. Notably, there is not yet an adequate account of the development of object perception that can encompass the full range of evidence, although progress has been made toward such a theory. We describe in a subsequent section computational models that were designed to investigate whether and how the perception of object unity in an ambiguous stimulus (such as depicted in Figure 1a) might be learned. Before describing the models, we review evidence concerning the roles of various sources of information in young infants’ perception of object unity, and the ontogenetic origins of this skill.

What visual cues are important in young infants’ object perception?

Johnson and colleagues (Johnson & Aslin, 1996; Johnson & Náñez, 1995) probed in detail the kinds of visual information 4-month-olds use in object unity tasks.
The first question was whether depth cues (binocular disparity, motion parallax, and accommodation and convergence, all potentially available in the Kellman & Spelke, 1983 rod-and-box displays) were necessary for perception of object unity in this age group (Johnson & Náñez, 1995). This was investigated with a two-dimensional, computer-generated display consisting of two rod parts, undergoing common motion, above and below an occluding box (Figure 2a). The objects were presented against a textured background consisting of a regular array of dots, in like manner to the Kellman and Spelke displays. After the infants were habituated to the rod-and-box display, they preferred a broken rod relative to a complete rod, replicating the Kellman and Spelke results. This implies that the remaining information in the display was sufficient
A connectionist model of perceptual development
Figure 2 Displays employed to investigate the role of texture and edge orientation in young infants’ perception of object unity. A: Rod parts are aligned across the occluder, against a textured background. As the rod moves, it covers and uncovers progressively the texture, providing depth information. B: Rod parts are aligned across the occluder, against a matte black background, with no texture information for depth. C: Rod parts are not aligned, but are relatable (if extended, they would meet behind the occluder). D: Rod parts are neither aligned nor relatable. Four-month-old infants perceive unity only in A, underscoring the importance of edge alignment and texture to veridical object percepts. (Adapted from Johnson & Aslin, 1996.)
around the surface (and not through it), the infants appeared to perceive it as opaque (although adults judged this latter stimulus to contain a translucent object; Johnson & Aslin, 2000). In two-dimensional displays, therefore, background texture may be necessary for segregation of visible surfaces into their constituent depth planes by this age group.

Johnson and Aslin (1996) next explored the role of orientation of the rod parts’ edges, asking if misaligned edges may also impact perception of the rod’s unity. This was accomplished in two ways. First, a display was constructed with rod edges that were not aligned, but were relatable – that is, the edges would meet at an angle greater than 90° if extended behind the occluder (Figure 2c; see Kellman & Shipley, 1991 for a formal definition of relatability). Second, a display was devised in which the rod edges were neither aligned nor relatable (Figure 2d). In both conditions, posthabituation test displays (broken and complete rods) matched the visible rod portions in the habituation display. In the former condition, there was no consistent test display preference, and in the latter condition, there was a preference for the complete rod. These two findings imply that the infants attended to rod orientation in perception of its unity: when edges are misaligned (Figure 2c), perception of object unity appears to be indeterminate, and when edges are neither aligned nor relatable, infants seem to perceive disjoint objects (similar results were obtained recently by Johnson, Bremner, Slater & Mason, 2000, and by Smith, Johnson & Spelke, in press). Note that in all three of these conditions, the rod parts underwent common motion, and thus would be predicted to specify unity to 4-month-olds on the Kellman (1996) account of unit formation.

How does perception of object unity develop?
A second line of research has addressed the conclusion that humans begin postnatal life with certain kinds of object reasoning skills (Spelke, 1990, 1994; Spelke & Van de Walle, 1993). In an investigation of the possibility that perception of object unity is available from birth, Slater, Morison, Somers, Mattock, Brown and Taylor (1990) tested neonates with rod-and-box displays and reported consistently longer looking at a complete rod, relative to a broken rod, the opposite result relative to findings with 4-month-olds (Kellman & Spelke, 1983). This result demonstrates that neonates achieved figure–ground segregation in rod-and-box displays, clearly distinguishing the rod parts from the occluder and background, but they did not appear to perceive the unity of the rod parts. Instead, the neonates responded only to what was directly visible in the display, failing to make
Denis Mareschal and Scott P. Johnson
the ‘perceptual inference’ necessary to posit the existence of the hidden portion of the rod.1 This finding with neonates implies further that veridical perception of object unity, in the sense that performance corresponds to that of adults, emerges some time between birth and 4 months of age. This possibility was explored by habituating 2-month-olds with the rod-and-box display that had been shown previously to 4-month-olds (in which the older infants had apparently perceived the rod parts’ unity), followed by the same complete and broken rod test displays (Johnson & Náñez, 1995). The younger infants showed no consistent posthabituation preference, suggesting that they had no clear percept of either unity or disjoint objects. It is possible, however, that the display presented to the 2-month-olds contained insufficient visual information to activate veridical surface segregation. This possibility was probed with displays in which this information was enhanced by showing more of the rod’s surface (Johnson & Aslin, 1995). In this case, 2-month-olds preferred a broken rod display during test, indicating perception of the rod parts’ unity during habituation.

A similar logic was adopted in an investigation of neonates’ perception of object unity in enhanced displays containing additional information relative to the displays used previously (by Slater et al., 1990): more visible rod surface, greater depth difference, background texture, and so on (Slater, Johnson, Brown & Badenoch, 1996). Even with this additional information, however, the neonates preferred a complete rod during test, indicating perception of disjoint objects.

Progress toward a comprehensive account

The pattern of results across experiments leads to several conclusions. First, by 4 months, infants rely on multiple sources of information in object perception tasks: no single visual cue, such as common motion, drives perception of object unity.
Second, perception of object unity develops – that is, surface segregation skills undergo change, improving rapidly after birth. There are no published reports of any other object perception task that has been presented to infants from birth through the first several postnatal months (see Johnson, 2000), and at present there is no direct evidence that would suggest that humans are born with object reasoning skills. Despite this recent progress in our understanding of perceptual development, fundamental questions remain regarding the origins of object perception. We can make a start toward answering these questions by outlining

1 Note that both the neonate’s and the adult’s percepts are entirely consistent with the available evidence. What has changed is the bias in how the infants respond to ambiguous events.
some possibilities regarding perception of object unity. First, it might be that unity perception develops more or less as the visual system matures, and the infant is thereby able to take note of available information as improvements occur in acuity, color and luminance discrimination, depth perception, and so on. Second, infants may experience objects in accord with some core principles (such as contact), but may not exhibit evidence of these principles due to limitations in our testing procedures, or an inability to access the full range of available visual information that might trigger veridical percepts (see Jusczyk, Johnson, Spelke & Kennedy, 1999). Third, unity perception might be learned. On this account, visual skills are sufficient at birth (or very soon after birth) to abstract those visual cues specifying surface segregation, but the neonate fails to recognize that partly occluded and fully visible objects seen at different times might be one and the same. That is, visual sensitivity is sufficient to impart clear percepts of all visible surfaces in an array, but what is missing is the ability to link separated edges across a spatial gap.

What kind of evidence would allow us to distinguish between these contrasting views? One important tool with which to explore this and related questions is connectionist (computational) modeling, which has been successful in exploring a range of developmental phenomena (Elman, Bates, Johnson, Karmiloff-Smith, Parisi & Plunkett, 1996; Mareschal & Shultz, 1996; Mareschal, 2001). Connectionist models consist of networks of interconnected processing nodes, analogous to neurons, designed to learn through interactions with a specific ‘environment’ created by the modeler. Such models are often produced by arranging the nodes in layers, with connections within and between the layers.
One common approach is the incorporation of an input layer that is responsible for initial processing of stimulus information, an output layer that provides a response, and an intermediary hidden layer that enables the internal ‘re-representation’ of information in the environment. Representations are embodied in the weights assigned to connections or as patterns of activation across a bank of nodes, and are developed by extracting the statistical regularities present in the environment.

Computational models can provide rigorous and tangible accounts of development, because the time course and nature of learning can be captured and made explicit, and in implementing a model, the modeler is forced to make explicit what is meant by ‘representation’, ‘acquired knowledge’, ‘innate knowledge’, and so on. If a model can be shown to acquire a particular behavior, then the constraints built into that model (in terms of prior ‘knowledge’, information processing mechanisms and learning algorithms) constitute possible
candidate constraints on human learning. Of course, it is always possible that humans operate using a different set of constraints. Models do not provide definitive answers to questions of human information processing. What they provide, instead, is a set of possible solutions. In the present article we report on two models of the development of perception of object unity, with the goal of explaining human performance across development. The models were first trained by exposure to simple events in a simulated, schematic visual environment. In these events, unified and disjoint ‘objects’ moved past and behind an occluding ‘screen’. The models were endowed with the ability to extract object motion, common motion of two objects, accretion and deletion of texture, T-junctions (an intersection where one surface occludes another, so-called because the projected edge of the far surface stops at the edge of the near surface, analogous to the stem and bar of a T, respectively) and edge alignment. The models also possessed a short-term memory, such that when an object became occluded, a rapidly decaying trace of that object’s representation remained. After varying amounts of training, we presented novel test events that incorporated the visual cues to which the model was sensitive. Test events always presented partly occluded (never fully visible) objects. Using the Johnson and Aslin (1996) and Kellman and Spelke (1983) strategy, we systematically included or omitted cues across displays and observed the models’ responses. We found that after sufficient training, the models responded appropriately to object unity under conditions of partial occlusion, demonstrating the potential importance of learning in the development of object perception. The nature of the training environment (i.e. the cues that were made available) was critical in determining performance.
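The kind of layered, error-driven network described above can be sketched in a few lines. Everything below (the layer sizes, learning rate, toy OR task and the name TinyNet) is an illustrative choice, not a detail of the models reported in this article; the sketch only shows how backpropagation lets a layered net absorb the statistical regularities of its training ‘environment’ into its connection weights.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyNet:
    """Minimal input -> hidden -> output network trained by backpropagation."""

    def __init__(self, n_in, n_hid, n_out, lr=0.5):
        self.lr = lr
        # One extra weight per row serves as a bias term.
        self.w1 = [[random.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hid)]
        self.w2 = [[random.uniform(-0.5, 0.5) for _ in range(n_hid + 1)]
                   for _ in range(n_out)]

    def forward(self, x):
        self.x = x + [1.0]                                  # bias input
        self.h = [sigmoid(sum(w * v for w, v in zip(row, self.x)))
                  for row in self.w1]
        hb = self.h + [1.0]
        self.y = [sigmoid(sum(w * v for w, v in zip(row, hb)))
                  for row in self.w2]
        return self.y

    def backward(self, target):
        # Error terms for sigmoid units: (t - y) * y * (1 - y).
        dy = [(t - y) * y * (1 - y) for t, y in zip(target, self.y)]
        dh = [self.h[j] * (1 - self.h[j]) *
              sum(dy[k] * self.w2[k][j] for k in range(len(dy)))
              for j in range(len(self.h))]
        hb = self.h + [1.0]
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] += self.lr * dy[k] * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] += self.lr * dh[j] * self.x[i]

# A toy 'environment': the logical OR of two binary features.
data = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
        ([1.0, 0.0], [1.0]), ([1.0, 1.0], [1.0])]
net = TinyNet(2, 4, 1)
for _ in range(5000):
    x, t = random.choice(data)
    net.forward(x)
    net.backward(t)
```

After training on randomly sampled events, the network’s output should exceed 0.5 exactly for those input patterns whose target is 1: the regularity of the toy environment has been internalized in the weights, which is the sense of ‘acquired knowledge’ at issue in the text.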
two disjoint objects. After training, the models applied this ‘knowledge’ to novel events in which object unity was not directly visible (i.e. when two rod parts were visible above and below the occluder). The key to learning to perceive unity in these ambiguous stimuli was the presence of a short-term perceptual memory and exposure to objects that became occluded and unoccluded.

The first assumption: detection, then utilization

Infants are born with a functional visual system, and exhibit marked preferences for some classes of stimuli over others: moving stimuli are preferred to static stimuli, patterned stimuli to unpatterned stimuli, high contrast to low contrast, and horizontal contours to vertical contours, among others (see Slater, 1995 for a review). At birth, infants also provide evidence of figure–ground segregation. Recall that neonates preferred a complete to a broken rod after habituation to a rod-and-box display, implying perception of disjoint objects (i.e. two rod parts) in the original display (Slater et al., 1996). When habituated to a complete rod in front of an occluder, in contrast, neonates subsequently preferred a broken rod test display (Slater et al., 1990). These findings suggest that the neonates formed a clear impression of distinct, segregated surfaces in both displays: two rod parts separate from the occluder and background in the former condition, and a single rod separate from the occluder and background in the latter condition.

Despite these visual skills, effective utilization of visual information in object segregation tasks lags behind its detection at birth. For example, T-junctions were available as cues for relative depth in the Slater et al. (1990, 1996) occluded-rod displays, but the neonates did not appear to have perceived occlusion in these displays.
That is, the rod surfaces appear to have been perceived to end at the rod–box intersection, rather than continue behind, suggesting that T-junctions were detected (and contributed to figure–ground segregation) but misclassified as indicating edge termination rather than edge continuation. It is unlikely that the infants simply perceived the rod and occluder surfaces to occupy the same depth plane, because even at birth infants can distinguish objects at different distances (Slater, Mattock & Brown, 1990). The potential role of surface motion in neonates’ perceptual segregation is less clear, due to a complex developmental trajectory for motion sensitivity (see Banton & Bertenthal, 1997). It has been claimed that infants younger than 4 to 6 weeks of age lack cortical mechanisms subserving motion discrimination, a claim based in part on infants’ preferential looking toward one side of a stimulus containing regions moving in opposite directions vs a uniform pattern on the other side
Figure 3 Architecture of the models. See text for details.
sented to the retina represented objects, their orientation and motions, and the background. This information was processed by seven encapsulated perceptual modules, each of which identified the presence of one of the following cues during specific portions of training and test events: (a) motion anywhere on the display; (b) co-motion of objects in the upper and lower halves of the display, whether in-phase or out-of-phase; (c) common motion of objects in the upper and lower halves of the display; (d) parallelism of object edges in the upper and lower halves of the display; (e) relatability of object edges in the upper and lower halves of the display; (f) texture deletion and accretion; and (g) T-junctions. We chose these particular cues because of the importance of motion (i.e. cues a, b and c), edge orientation (cues d and e), and depth (cues f and g) to young infants’ perception of object unity (Johnson & Aslin, 1996; Kellman & Spelke, 1983).

Each perceptual module fed into a layer of hidden units with sigmoid activation functions, which in turn fed into a response (output) layer. The response units determined the model’s decision as to whether the ambiguous stimulus (i.e. the partly occluded rod) contained a single object, two disjoint objects, or neither (a response we termed ‘indeterminate’). Unity was also a ‘primitive’, like the other cues, in that a model could perceive it directly in unambiguous cases (i.e. when the object was visible to one side of the occluder). These types of response to unity are consistent with evidence from human neonates. In the absence of any occlusion, neonates can discriminate between a broken and an unbroken visible rod. Indeed, this is a necessary precondition for interpreting the looking-time behaviors of neonates in experimental studies of the perception of object unity (e.g. Slater et al., 1990). In the absence of direct perception (i.e.
when the objects were partly occluded) the perception of unity was mediated by its association with other, directly perceivable, cues.
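As a rough sketch of the mediated route just described, the seven cue detectors can be treated as inputs feeding a sigmoid hidden layer, which feeds two linear response units. The cue names, hidden-layer size and random weights below are placeholders, not the article’s actual parameters; see Figure 3 and the Appendix for the real architecture.

```python
import math
import random

random.seed(1)

# Hypothetical names for the seven cue detectors described in the text.
CUES = ["motion", "co_motion", "common_motion", "parallelism",
        "relatability", "texture_accretion_deletion", "t_junctions"]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mediated_response(cues, w_in, w_out):
    """Forward pass of the mediated route: seven cue detections feed a
    sigmoid hidden layer, which feeds two linear response units."""
    x = [cues[name] for name in CUES]
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x))) for row in w_in]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]

n_hidden = 5  # placeholder size, not the article's figure
w_in = [[random.uniform(-1, 1) for _ in CUES] for _ in range(n_hidden)]
w_out = [[random.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(2)]

# An event in which every cue is detected.
event = {name: 1.0 for name in CUES}
out = mediated_response(event, w_in, w_out)  # two linear unity outputs
```

In the models themselves it is training (described below) that shapes these weights so that the two output units come to predict unity from the directly perceivable cues.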
Figure 5 Five time steps through training Event 1. The object (unified, in this case) appears both as fully visible and partly occluded at different times during the event.
to incorporate or omit cues known to mediate perception of object unity: motion, alignment, relatability, T-junctions, and accretion and deletion of texture. Figure 6 lists the perceptual cues present in each event. All events began with the object (or objects) moving on to the display from the side. During this initial portion of the event, the object was unobstructed from view. The object moved across the display, passed behind the area occupied by the occluding screen, reappeared on the other side of the screen, and continued off the display (Figure 5). The perceptual modules
Figure 4 Schematic depictions of training and test events. Refer to Figure 6 for the cues available in each event.
The bottom half of the network (see Figure 3) encompasses perceptual abilities that are functional at the earliest time of testing unity. Each module was designed to compute the presence or absence of a single cue in the displays experienced by the model. The modules incorporated general neural computational principles of summation, excitation, inhibition and local computation. However, there was no learning involved. The modules were tailored to the specific nature of the model’s experience and were intended as analogues of the neonate’s visual system, but not to embody its anatomy or physiology.2 Nevertheless, they did instantiate some of the basic principles believed to underlie the computation of the associated visual cues (see Spillman & Werner, 1990 for a review). For a more complete description of each perceptual module, see the Appendix.

What drives learning?

Learning was driven by an error feedback signal partly obtained directly from the environment and partly from within the system (through memory).

2
The fact that these modules are operational at the earliest time of testing does not necessarily imply that they are ‘hardwired’ from birth. For example, Nakisa and Plunkett (1998) described a set of simulations in which networks evolve over many generations to become excellent learners of phonological discriminations. Networks from the final generation are unable to discriminate phonemes prior to any experience, but only a few minutes of real world speech are necessary for categorical perception of phonemes. One can imagine a similar scheme in which a short time of visual experience would fine-tune a set of crude perceptual modules, but at present this remains an empirical question.
with −1 < Ti < +1, 0 < µ < 1, and Ei = 0.0 when the rod is occluded. Ei is the unity feedback signal obtained from the environment (by direct perception) for output i, and µ is a parameter controlling the depth of memory. When Ei = 0.0 (i.e. there is no direct percept of unity), the target (Ti(t)) is derived entirely from the memory component µ·Ti(t − 1), the second term in the right-hand side of equation 1. The weight updates are computed according to an error reduction algorithm (backpropagation) that minimizes the difference between the actual output and the target output activations. The system self-organizes in such a way as to minimize the difference between its unity prediction and what it perceives as true in its environment. There is no external agent providing the network with the desired answer. All target information required for updating weights is obtained directly from the environment (through direct perception) in the same way as the perceptual input is obtained, or from within the system (through memory). In other words, this is an example of unsupervised learning. Similar accounts of self-organization using backpropagation networks can be found elsewhere (e.g. Mareschal, French & Quinn, 2000; Munakata, McClelland, Johnson & Siegler, 1997; see also Baldi, Chauvin & Hornik, 1995, for formal proofs of the equivalence of some linear backpropagation networks with some linear self-organizing systems).

The model’s unity response was driven by a combination of activation from the direct and mediated routes. When direct perception was possible, the activation from this route overrode that of the mediated route by saturating an output unit’s response towards +1 or −1. When direct perception was not possible, the unity response was mediated through its associations with other cues that are directly available. In the present paper, we are interested in assessing the mediated route’s performance.
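The full statement of equation 1 falls outside this excerpt, but the behaviour of the target signal is described above: direct perception dominates when it is available, and during occlusion the target decays as the memory component µ·Ti(t − 1). A sketch under that reading, with an illustrative µ = 0.8 (the article’s parameter value is not given here):

```python
def unity_target(E_t, T_prev, mu=0.8):
    """Self-generated training target for one output unit.

    E_t:    unity signal from direct perception (0.0 while the rod
            is occluded; saturated toward +1/-1 when visible).
    T_prev: the previous time step's target (short-term memory trace).
    mu:     memory-depth parameter, 0 < mu < 1 (0.8 is illustrative).
    """
    if E_t != 0.0:
        return E_t            # direct perception dominates the target
    return mu * T_prev        # occlusion: decaying memory trace only

# A unified object is directly visible for three steps, then occluded.
T = 0.0
trace = []
for E in [1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]:
    T = unity_target(E, T)
    trace.append(round(T, 3))
# trace == [1.0, 1.0, 1.0, 0.8, 0.64, 0.512, 0.41]
```

The decaying trace is what allows backpropagation to associate the cues that remain directly perceivable during occlusion with a nonzero unity target, so no external teacher is needed.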
The degree to which the model’s mediated response was correct when direct perception was not possible reflects how well it responded to incomplete information. The degree to which the mediated route’s prediction was correct when direct perception was possible reflects how well the network has internalized general information about objects that applies across its entire learning environment. Network performance can be assessed either when direct perception is possible (events 3, 4 and 9 to 26 in Figure 4), or when it is not possible (e.g. on events 1, 2, 5, 6, 7 and 8). In assessing the model’s performance we compared the output of the mediated route with direct perception when available. When direct perception was not possible, the network’s response was compared to the modeler’s knowledge of what condition the ambiguous stimulus was derived from. A mediated response was scored as
correct if it accurately predicted the origins of the ambiguous event (e.g. a unified object was perceived when the event was caused by a unified object). It was scored as incorrect when it predicted the opposite origins of the ambiguous event (e.g. two objects were perceived when a single object caused the event), and it was scored as indeterminate when the output was either (+1, +1) or (−1, −1). Because output units were linear, for the purposes of scoring the network responses the output values were classified as +1 if they were positive and −1 if they were negative. These responses were then compared with human responses under similar conditions to evaluate how well the model matches human data.

A network’s performance was tested by presenting it with events consisting of the ambiguous segment of the trajectory only. In other words, during testing, the networks could not use information available in the unambiguous segment of the trajectory to derive unity. The test results reported below were scored on what would correspond to time steps 7 and 8 of a full 14 time step event.

We report on two models that each contained the same architecture and training procedures previously described. Model 1 was trained in a ‘simple’ perceptual environment (a small subset of the events depicted in Figure 4), and Model 2 was trained with an ‘enriched’ environment (a larger subset of the events). To anticipate, we found that both models learned to predict unity in an ambiguous event, but the model that experienced an enriched environment acquired the most general knowledge of the relation between the presence of individual perceptual cues and the percept of unity.
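The scoring scheme described above (threshold each linear output at zero, treat matching signs as indeterminate) can be written compactly. Which of the two mixed sign patterns codes ‘unified’ versus ‘disjoint’ is not specified in this excerpt, so the assignment below is an assumed convention for illustration:

```python
def score_response(y1, y2):
    """Classify a two-unit network response following the scoring rule in
    the text: linear outputs are classified as +1 if positive and -1 if
    negative; (+1, +1) and (-1, -1) are scored as indeterminate. Mapping
    (+1, -1) to 'unified' is a hypothetical convention."""
    a = 1 if y1 > 0 else -1
    b = 1 if y2 > 0 else -1
    if a == b:
        return "indeterminate"    # (+1, +1) or (-1, -1)
    return "unified" if (a, b) == (1, -1) else "disjoint"
```

For example, under this convention `score_response(0.3, -0.7)` returns 'unified', while `score_response(-0.2, -0.9)` returns 'indeterminate'.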
idiosyncratic series of events determined by the random selection procedure. Networks were periodically tested for prediction of unity after specific intervals during training (10, 50, 100, 500, 1,000, 1,500, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000 and 8,000 epochs). Figure 7 shows the performance of the ten networks across all 26 events during testing. Consider first the four events on which Model 1 was trained (events 1 to 4). By 50 epochs, the networks correctly signaled the presence of a single object or two distinct objects on the unambiguous portion of the trajectory of these events (when the object elements are to the left or right of the occluder, not shown in Figure 7). When the networks were tested with the ambiguous portion of the trajectory (when the object elements are above and below the occluder), they quickly reached high levels of performance in all of the familiarization events. Learning was more rapid in events 3 and 4 because there are no ambiguous portions in these events. When tested with ambiguous events, accurate performance was delayed. Nevertheless, by 1,500 epochs all networks perceived events 1 and 2 as arising from a single, partly occluded object. To understand the generality of knowledge encoded in the mediated route, consider next performance on events with which the networks were not trained (events 5 to 26, Figure 7). By 1,500 epochs, the networks performed correctly on the ambiguous portions of 15 of the remaining 22 test stimuli (68.2% of the events), but failed on events 10, 11, 19 to 21, 23 and 24. Note that events 15 to 26 are all displays with two objects in which there is no common motion. Failure on these events is tantamount to perceiving a single unified object even though the two object components are moving in the opposite direction. 
These networks also failed to perform correctly on 3 of the 26 events (events 9, 10 and 22) when tested with the visible unambiguous segment of the event, even after maximum training (8,000 epochs). In summary, the networks that were trained with a simple perceptual environment learned to perceive object unity quite rapidly in most of the displays with unified objects, and perceived disjoint objects in most of the other displays. However, the networks were able to generalize their knowledge only to events that were relatively similar to those that were experienced during learning. It is not immediately clear why the networks failed to use some of the cue combinations appropriately. Inspection of conditions that led to unsuccessful performance does not lead to a straightforward interpretation of all instances of failure, although it is notable that events 19, 20, 21, 23 and 24 each contain some combination of T-junctions, relatability and co-motion (but not common motion). These are all cues that can lead to the percept of unity, especially on the training set.
Denis Mareschal and Scott P. Johnson
Figure 7 Model 1 performance. Ten networks, each with random initial weights between nodes (perceptual modules, hidden units and output units), were trained with events 1–4. The networks were able to generalize learning to novel events, but performance was constrained by the limited training experience.
Model 2: Learning in an enriched environment In the second model, we extended the range of learning experiences by using a training set that was more representative of the entire range of events. This training set was
A connectionist model of perceptual development
Figure 8 Model 2 performance. Ten networks, each with random initial weights between nodes (perceptual modules, hidden units and output units), were trained with events 1, 2 and 17–22. The architecture was identical to the networks in Model 1, but performance was superior due to the enriched training environment.
unambiguous portion of the test events on which they were trained. The more rapid adaptation of the mediated route, relative to Model 1, was due to the more frequent exposure to disjoint objects in the training environment. Figure 8 shows the performance of the ten networks during the ambiguous portions of all 26 events across
it was in the previous time step, and 0 if there is no change. There are exactly as many frames with texture as frames without texture, but an imbalance arises when we pass from a training event with texture to one without texture. Even though this latter event does not have any texture elements, the texture module will respond initially by signaling that there has been texture deletion in the first time frame, because the network has gone from ‘seeing’ texture to ‘not seeing’ texture when passing between events. In other words, the first frame of any event without texture that follows an event with texture will be marked as having texture deletion. This happens in 25% of the events. Each event consists of 14 frames, so the networks experience a texture deletion output (1/14) × 0.25 ≈ 0.02 (about 2%) more often. That is, approximately 51% of frames have the texture deletion feature active whereas 49% have the feature inactive. As noted previously, the enriched training set has many more disjoint events than unified events, which implies that there is a slightly higher correlation between the presence of texture deletion and the presence of disjoint objects. For these networks, therefore, the presence of texture deletion is a weak predictor of two objects being present. The correlations that underlie this association are very small, and it is thus a very weak link that only comes into play when the other cues are well balanced. Notably, the networks eventually overcame the tendency to associate texture with disjoint objects, perceiving unity in event 1. Consider next performance on events to which the networks were not exposed during training. By the end of 8,000 epochs, the networks achieved accurate performance on 14 of these 18 events (77.8%, a higher success rate than Model 1’s 68.2%), suggesting that the additional training events led to greater generalization of knowledge relative to Model 1.
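The frequency argument can be checked directly (variable names are ours):

```python
# A spurious texture-deletion signal fires on the first frame of an
# untextured event that follows a textured one: 25% of events, 1 of 14 frames.
frames_per_event = 14
p_texture_to_no_texture = 0.25
extra = (1 / frames_per_event) * p_texture_to_no_texture  # ~0.018, i.e. ~2%
# Shifting that small fraction of frames from 'inactive' to 'active' turns an
# otherwise balanced 50/50 split into roughly 51% active vs. 49% inactive:
active_fraction = 0.5 + extra / 2
print(f"extra: {extra:.3f}, active: {active_fraction:.3f}")
# prints "extra: 0.018, active: 0.509"
```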
Inspection of Figure 8 reveals that the enhanced knowledge of Model 2 incorporates a role for both motion and alignment in perception of object unity. In events 5 and 6, for example, in which there is no motion, unity is perceived accurately by 1,500 epochs. This percept is achieved more quickly than in comparable displays with motion (events 1 and 2), suggesting that motion is a cue that biases against unity perception. As in the case of the texture cue described previously, this counterintuitive result (relative to human performance) can be accounted for by appealing to the nature of the training set. Recall that the majority of training events consisted of disjoint objects, and these all contained co-motion as a cue (but not common motion). Motion, therefore, in the form of co-motion, became associated with disjoint objects; later in training, common motion (available as a cue in events 1 and 2 in the training set) became associated with unity.
If perception of unity in events 1 and 2 was not achieved primarily on the basis of motion, what cue or cues led to accurate performance? Note that alignment (the combination of parallelism and relatability) was present in these two events but none of the other training events, leading the networks to associate alignment with unity, rather than with disjoint objects. In the absence of motion, therefore, the networks more quickly perceived unity when alignment was available (events 5 and 6). The networks also seemed to use parallelism and relatability separately as cues for unity, even though this led to inaccurate performance: unity was perceived in events 7 and 8, each with relatability but not parallelism, and events 23 and 24, each with parallelism but not relatability. This response pattern was due to the association of each cue with unity in training events 1 and 2. This tendency to perceive unity from parallelism and relatability was overcome, however, with the additional information for disjoint objects provided by the lack of T-junctions in comparable displays (events 9 and 10, and events 25 and 26, respectively). Lack of T-junctions was associated consistently with separate objects during training in events 17, 18, 21 and 22. In summary, the networks in Model 2 learned to perceive either unified or disjoint objects in a wide range of new events. Performance was superior relative to Model 1, due to the provision of a richer training set. The idiosyncrasies of Model 1 did not characterize performance in Model 2, whose responses were more readily interpreted in light of training experience. The few instances of inaccurate performance in Model 2 were explained by appealing to the nature of learning in connectionist networks, and the limitations of the training environment. These powerful statistical learners extracted regularities that were unique to their training environment and that did not reflect regularities characteristic of the human environment.
Increasing the richness and complexity of this environment (thereby bringing it more in line with the infant’s environment) should eradicate these spurious correlations. For these networks, therefore, their responses were not inaccurate (strictly speaking), given the perceptual environment they were provided.
Figure 9 Connection weights (and resting activations in parentheses), after training, in one of the Model 2 networks. See text for details.
contribute to these representations at different points in development. The simulations reported in the previous section (Model 2) were repeated with the same architecture and training environment, with one exception: an initial investigation of the role of the hidden units in the models described previously showed that only two of the three hidden units played any functional role (i.e. the third hidden unit was redundant). To simplify the analyses of weights in the network, the simulations reported in this section were run using two instead of three hidden units. Weight matrices after training A better understanding of how the different cues contribute to the percept of unity can be obtained by looking at the strength of the connection weights in the networks. Figure 9 depicts a schematic view of the connection weights in one network after 8,000 epochs of training with events 1, 2 and 17 to 22. At the top of the figure are the two output units: one of these signals unity, and the other signals disjoint objects (recall that an output across these units of (+1, −1) is scored as a ‘one object’ or unity response, and an output of (−1, +1) is scored as a ‘two objects’ response). At the center of the figure are the two hidden units, and at the bottom are the seven perceptual modules. Each hidden unit and
bias that is always present, the motion cue effectively resets the baseline activation to below zero. So, although there is an initial bias toward producing a ‘one object’ response, that bias is reversed in the presence of motion. As discussed subsequently, the presence of other cues negatively and positively associated with either hidden unit can tip that unit’s response, and the network’s response, in one direction or the other. Next, note that the other cues will each tend to have different effects on hidden unit 2, the stronger of the two hidden units. Recall that this hidden unit is initially biased to produce a ‘one object’ response, and that an input of strong negative activation will produce the opposite response. Hidden unit 2 has a strong negative weight with the co-motion cue (−2.284). Along with the motion cue, therefore, the presence of co-motion tends to produce a ‘two objects’ response. T-junctions are most strongly associated with the presence of one object (1.620), followed by common motion (0.847), relatability (0.254) and parallelism (0.114). Finally, for this network, the texture cue is a weak predictor of two objects being present. Note that hidden unit 2 has picked out many of the cues that we would normally see as predicting unity, and that both relatability and parallelism provide independent contributions to the ‘one object’ response. Hidden unit 1, the weaker of the two hidden units, has developed a completely different set of associations. Specifically, this unit has developed negative associations with all cues. As a result, as long as something is present on a display, its response will counteract the initial bias towards responding with unity by pushing the network towards a ‘two objects’ response. This embodies the fact that the large majority of training events involve two objects. Hidden unit 2 must thus fire strongly to bring the network to a ‘one object’ response. Any weak response from hidden unit 2 (e.g. 
when cues provide conflicting information) allows hidden unit 1 to activate output unit 2 (the ‘two objects’ unit) by firing negatively. Examination of these connection weights, therefore, illustrates how the network is able to combine evidence in a complex fashion from a series of perceptual modules to make a unity prediction. In particular, no single cue is sufficient to perceive unity, and cues take on different degrees of importance in different contexts. In the next section we explore how these associations build up with learning.

Learning cue associations

Figure 10 shows the connection weights between the outputs of the perceptual modules and the hidden units across development. Connection weights are represented by squares. The larger the magnitude of the weight, the larger the square.
Figure 10 Development of connection weights between the seven perceptual modules and bias unit, and the two hidden units. Right to left: 1. Motion. 2. Co-motion. 3. Common motion. 4. Parallelism. 5. Relatability. 6. Texture. 7. T-junctions. Top to bottom: Activation strengths of each connection after varying numbers of epochs.
At 2,000 epochs, the network begins to rely on different perceptual cues to produce its response and the basic structure of the mature weight matrix (discussed in the previous section) begins to appear. Hidden unit 2 acquires a fairly large positive bias (thereby biasing it to produce a unity response). Motion, common motion, relatability and T-junctions have become positively associated with this unit, and hence these all become cues that signal a single unified object. In contrast, co-motion and texture become strongly associated with the percept of two objects. At this point, hidden unit 1 has the same (though weaker) pattern of associations between perceptual cues and the output response as hidden unit 2. Initially, then, hidden unit 1 is redundant. By 3,000 epochs, the weights have continued to grow in magnitude, but have remained in essentially the same pattern as at 2,000 epochs. A weaker response from the perceptual modules, therefore, will trigger the same responses from the network, which is becoming more sensitive to weaker forms of the same evidence in interpreting an ambiguous stimulus. By 4,000 epochs, hidden unit 1 has emerged with its own role: all the connections between the perceptual modules and this unit have become negative. At this point, hidden unit 1 acts to bias the network against a unity response, thereby ensuring that hidden unit 2 elicits a unity response only when there is strong evidence to do so. From 5,000 epochs onwards, the pattern of associations between the perceptual modules and the hidden units remains stable. The changes with development involve an increase in the magnitude of the weights, leading to an increased sensitivity to the relevant cues. By 6,000 epochs, parallelism decreases in association with hidden unit 2 until it reaches a negligible level at 8,000 epochs.3 The texture cue also weakens its association with hidden unit 2. 
In contrast, all weights linking the perceptual modules with hidden unit 1 continue to increase in magnitude. In summary, throughout development the network progressively relies on an increasing range of cues to determine its response. The connection weights embody the association between perceptual cues and unity in both ambiguous and unambiguous contexts. The challenge for the network is to discover a set of weights that
3. Interestingly, parallelism did not become associated with unity at any point during this network’s development, unlike in the majority of the Model 2 networks reported previously. This is the only instance in which the present network and the previous networks differed with respect to which cues predicted unity. The difference may lie in the somewhat stochastic nature of the development of connectionist networks, due to variations in starting conditions, randomized training schedules, and the fact that there is no unique set of cue combinations that can be used to perceive unity correctly.
is consistent with both contexts, and is consistent with both one-object and two-object events. Many of the associations that emerge reflect the fact that the majority of events this network experiences are unambiguous (fully visible) and arise from two distinct moving objects. These sorts of events are the first that are learned in the network’s mediated route. Only later in development (4,000 epochs and afterward) does the mediated route begin to learn associations that allow it to function with ambiguous events. An important finding from the analyses in this section and the previous section is that no single cue is necessary or sufficient for the perception of unity. Instead, the importance of a cue depends on the context in which it appears. Individual cues do not maintain constant importance as markers of unity across all events, but acquire more or less importance in accord with other available information.
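To illustrate how such a weight matrix combines cues, the fragment below pushes a few cue vectors through a single tanh hidden unit. Only the five quoted weights (co-motion −2.284, T-junctions 1.620, common motion 0.847, relatability 0.254, parallelism 0.114) come from the text; the bias (+1.0), the motion weight (−1.5) and the texture weight (−0.1) are hypothetical values chosen to reproduce the qualitative behaviour described, and the cue vectors are illustrative rather than actual training events:

```python
import math

# Weights onto 'hidden unit 2'; bias, motion and texture values are guesses.
W = {"motion": -1.5, "co_motion": -2.284, "common_motion": 0.847,
     "parallelism": 0.114, "relatability": 0.254, "texture": -0.1,
     "t_junctions": 1.620}
BIAS = 1.0  # hypothetical positive bias toward a 'one object' response

def hidden2(cues):
    """Activation of the unity-favouring hidden unit for a set of cues."""
    net = BIAS + sum(W[c] * cues.get(c, 0.0) for c in W)
    return math.tanh(net)

assert hidden2({}) > 0                                 # bias favours unity
assert hidden2({"motion": 1, "co_motion": 1}) < 0      # -> 'two objects'
assert hidden2({"motion": 1, "common_motion": 1,
                "t_junctions": 1}) > 0                 # -> 'one object'
```

The point of the sketch is that no cue acts alone: the same motion signal is overridden or reinforced depending on which other cues accompany it.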
ity to segment figure from ground could progress to an ability to perceive partial occlusion. From there, more sophisticated percepts emerge, such as the ability to represent objects while fully occluded, but these percepts too are fragile initially (Johnson, Bremner, Slater, Mason, Foster & Cheshire, 2002; Johnson, 2000, 2001). Such an account does not rely on the positing of innate knowledge in infants to explain the development of object percepts. Our models are consistent with this account. They implement an initial sensitivity to visual information (that has been shown to support veridical responses to partly occluded objects). They then show how these visual cues can be combined to produce a representation of the ambiguous state of a partly occluded object: a representation that was not present in any form prior to experience with the environment. It might be argued that the network has not actually acquired any new representation of unity because there was a set of ‘unity’ nodes present from the onset (i.e. the output units that we labeled externally as corresponding to unity). It is true that this is a network with a fixed architecture, and therefore a predetermined number of representational nodes (see Mareschal & Shultz, 1996, for a discussion of networks with a variable number of nodes), but we believe that the network has nevertheless acquired a new level of representation. The output is initially not linked to any perceptual input. At this early stage, although we are labeling the output as ‘unity’, it has no semantic content because it is not grounded in any perceptual experience and consequently cannot be used to refer to any event in the environment. This is followed by a stage in which the output is linked to certain perceptual cues that lead the network to respond to the ambiguous stimulus in Figure 1a as though it were caused by two broken but partly occluded rods. 
At this stage the nodes have acquired semantic content that is grounded in perceptual input and leads them to classify ambiguous events as ‘not unified’. In other words, the network now has a representation of what is unified and what is not unified, although this representation differs from the normal adult representation. Finally, at the end of training, the unity output nodes have become associated to yet another set of perceptual cues. The response is still perceptually grounded (and therefore can be used to refer to something in the environment). However, the semantic content of the response has changed. Now, the network’s unity response when presented with an ambiguous stimulus is the same as that of adults, and the ambiguous stimulus is perceived as corresponding to a single unbroken but partly occluded rod. Within the constraints imposed by the limited perceptual cues available, the network’s representation of ‘unity’ has acquired the same semantic content as that of adults.
object surfaces based on what was directly perceived, although these segregation skills were not pre-specified. After repeated exposure to events in which objects were seen first fully visible and then underwent partial occlusion, percepts of unity emerged that matched human adult percepts. The use of cues to perceive unity changed with exposure. Both these latter patterns of performance are also observed in infants. Overall, then, the progression from responding to partly occluded objects as disjoint objects to perceiving object unity characterizes both our connectionist models and human infants, implying common developmental mechanisms. This lends plausibility to our account of these mechanisms as arising from an early perceptual competence (but see footnote 2 again) and experience viewing objects as they become progressively occluded and disoccluded. The models employed simplistic perceptual modules and experienced a relatively impoverished environment (as compared to the natural world). Thus, it is unlikely that specific predictions about infants’ reliance on particular cues can be derived from these models. This is because the infants experience a much richer environment than the networks did. Nevertheless, the models embodied the computational principles by which human infants might learn. They successfully show how associative mechanisms can be used to combine perceptual cues in such a way as to derive a unity response similar to that of adults, from a perceptually ambiguous event. It is also notable that early in training, the models made no use of perceptual information in making a response. After 2,000 epochs, however, there was a relatively sudden emergence of cue use whose basic pattern remained largely unchanged throughout development. That is, there was a sudden shift in performance, strongly resembling a stage (cf. Elman et al., 1996). 
Of course, there are also important differences in how these connectionist models and humans use information to parse scenes containing partly occluded objects. Our models were not intended to tell us about which cues infants use to perceive unity (indeed our selection of input cues was driven by previous experimental studies with infants). They were designed to test the prediction that perceptual sensitivity and association lead to a response bias towards ambiguous stimuli that has been interpreted as evidence of object knowledge. Texture, for example, was not used by the models as a depth cue, as is the case for human adults and infants (Gibson, 1979; Johnson & Aslin, 1996, 2000). What do we mean by knowledge? Computational models force the user to be explicit about what is meant by the term ‘knowledge’. In the present
The models in this paper demonstrate how the perception of unity could be mediated by available information in the absence of direct evidence. We do not wish to claim that there is anything special about unity in this case: it is relatively straightforward to generalize this account to other perceptual cues. Any one of these cues could be mediated by indirect associations with other cues. A more complex network could be devised, in which, if one cue could not be computed, its association with other computable cues could be used to derive a value for that cue. However, whether the resulting network would be computationally tractable is an open question.
Appendix: Details of the model architecture

The networks were designed with an input layer (the perceptual modules), an output layer (the response units) and a layer of hidden units between the perceptual modules and the response units. They were exposed to a simple perceptual environment. There were no direct connections from the perceptual modules to the output; all cue relations, therefore, had to be encoded across the hidden units. Networks began with random initial weights (mean = 0.0, range = 0.01). During training, the weights between the perceptual modules, hidden units and the output units were updated at every epoch (i.e. image presentation) using the backpropagation algorithm4 (Chauvin & Rumelhart, 1995) with a learning rate of 0.5 and momentum of 0.03. The memory parameter (µ) was set to 0.4. A network’s performance was tested by presenting it with events consisting of the ambiguous segment of the trajectory only. In other words, during testing, the networks could not use information available in the unambiguous segment of the trajectory to derive unity.

Output representations

The unity response was coded across two linear output units with activations ranging across the interval (−1, +1). An output activation pair of (+1, −1) signified that the surfaces were unified, and (−1, +1) signified that the surfaces were not unified. A response of (+1, +1) or (−1, −1) was interpreted as indeterminate.
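The training procedure can be sketched as follows. Only the network shape (seven perceptual inputs, hidden units, two linear outputs), the backpropagation rule, the learning rate of 0.5 and the momentum of 0.03 come from the text; the tanh hidden units, squared-error loss and example pattern are our assumptions, and the recurrent memory parameter (µ = 0.4) is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.uniform(-0.005, 0.005, (3, 7))    # input -> hidden (small random)
W2 = rng.uniform(-0.005, 0.005, (2, 3))    # hidden -> linear outputs
dW1_prev = np.zeros_like(W1)
dW2_prev = np.zeros_like(W2)
LR, MOMENTUM = 0.5, 0.03                   # values reported in the text

def step(x, target):
    """One backpropagation update with momentum; returns the squared error."""
    global W1, W2, dW1_prev, dW2_prev
    h = np.tanh(W1 @ x)                    # hidden activations (assumed tanh)
    y = W2 @ h                             # linear output units
    err = y - target
    gW2 = np.outer(err, h)                 # gradients of 0.5 * ||err||^2
    gW1 = np.outer((W2.T @ err) * (1 - h**2), x)
    dW2 = -LR * gW2 + MOMENTUM * dW2_prev  # momentum smooths the updates
    dW1 = -LR * gW1 + MOMENTUM * dW1_prev
    W2 = W2 + dW2
    W1 = W1 + dW1
    dW2_prev, dW1_prev = dW2, dW1
    return float((err**2).sum())

x = np.array([1., 1., 0., 0., 0., 0., 0.])   # e.g. motion + co-motion active
target = np.array([-1., 1.])                 # 'two objects' response
losses = [step(x, target) for _ in range(200)]
assert losses[-1] < losses[0]                # error falls with training
```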
4. We have no necessary commitment to backpropagation. Any connectionist algorithm that implements gradient descent search in multi-layered networks could be used equally well, such as the leabra algorithm (O’Reilly, 1996, 1998).
Input representations

The input consisted of a 196-bit vector mapping all the units on a 14 × 14 grid. In the center of the grid was a 4 × 4-unit occluder. All units corresponding to the position of the occluder and visible object parts were given a value of 1. All other units were given a value of 0, except that, when background texture was present, units at positions on which there was a texture element (i.e. the dots seen in Figure 4) were given a value of 0.2. Each event consisted of a sequence of 14 snapshots in which the object moved progressively across the display.
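The coding above can be sketched like this (the exact grid coordinates of the occluder and the example object/texture cells are ours):

```python
import numpy as np

def make_input(object_cells, texture_cells):
    """Build the 196-element input vector for one snapshot of an event."""
    grid = np.zeros((14, 14))
    grid[5:9, 5:9] = 1.0                  # central 4x4 occluder, coded 1
    for r, c in texture_cells:
        grid[r, c] = 0.2                  # background texture dots, coded 0.2
    for r, c in object_cells:
        grid[r, c] = 1.0                  # visible object parts, coded 1
    return grid.flatten()                 # 196-bit input vector

x = make_input(object_cells=[(2, 6), (3, 6), (11, 6)],
               texture_cells=[(0, 0), (13, 13)])
assert x.shape == (196,)
assert x[5 * 14 + 5] == 1.0               # an occluder cell
assert x[0] == 0.2                        # a texture cell
```

A full event would be a sequence of 14 such vectors with the object cells shifted between snapshots.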
Figure 11 Architecture of three perceptual modules. A. Motion module. B. Co-motion module. C. Texture module. See text for details.
Positions that were newly occupied with the current input were provided with positive activation and positions that were occupied at the previous time step, but were no longer occupied, were provided with negative activation. The direction of motion was determined by observing the relative position of positive and negative activation. A Direction Buffer computed a weighted sum of the negative and positive activation (in which locations along the horizontal retinal axis were increasingly weighted from left to right) for each of the top and bottom halves of the display. A positive sum indicated that
the object was moving to the right, a negative sum indicated that the object was moving to the left, and 0 indicated that there was no motion. Comparing these two values for the top and bottom halves of the display allowed the module to compute whether there was the same kind of motion in the top and bottom halves of the display. If there was common motion, the module output 1, or 0 otherwise.
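A minimal sketch of this computation, under our reading of the text (the signed change map and left-to-right column weighting follow the description; treating 'both halves stationary' as no common motion is our interpretation):

```python
import numpy as np

def directions(prev_frame, curr_frame):
    """Sign of motion (+1 right, -1 left, 0 none) for top and bottom halves."""
    change = np.sign(curr_frame) - np.sign(prev_frame)  # +1 new, -1 vacated
    col_weights = np.arange(1, 15)                      # increase left->right
    top = float((change[:7] * col_weights).sum())       # Direction Buffer sums
    bottom = float((change[7:] * col_weights).sum())
    return np.sign(top), np.sign(bottom)

def common_motion(prev_frame, curr_frame):
    """Output 1 if both halves move in the same direction, else 0."""
    top, bottom = directions(prev_frame, curr_frame)
    return 1 if top == bottom and top != 0 else 0
```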
passed into a memory buffer (Past.Texture.Sum) for use at the next time step. The values of Past.Texture.Sum and Texture.Sum were passed to an output unit that computed the difference between the two. The output unit responded with a 1 if the difference was not 0 (i.e. there had been texture accretion or deletion), or a 0 if there was no difference (i.e. no background texture was present).
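The buffer-and-compare scheme can be sketched as follows (counting the 0.2-valued cells as Texture.Sum is our assumption about how the sum is computed; note that a textured first input after start-up fires for the same reason as the event-transition artifact discussed in the main text):

```python
import numpy as np

class TextureModule:
    """Signals texture accretion/deletion between consecutive time steps."""
    def __init__(self):
        self.past_texture_sum = 0.0        # the Past.Texture.Sum buffer

    def step(self, frame):
        texture_sum = float((frame == 0.2).sum())   # count texture cells
        out = 1 if texture_sum != self.past_texture_sum else 0
        self.past_texture_sum = texture_sum         # buffer for next step
        return out                         # 1 = accretion/deletion detected
```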
The parallelism module

This module computed an approximation to the tangent of the angle that the object’s axis of principal length made with the horizontal, for both the upper and lower halves of the display, and compared these two values. A Cartesian coordinate system was set up by weighting the columns and the rows of the display according to the position of the row and column with respect to an origin at the center of the display. The X- and Y-components of object segments in the upper and lower halves of the display were computed within this coordinate system. The ratio of the Y-component over the X-component was used as an approximation to the tangent of the angle. If the tangents in the top and bottom halves were equal, the module output 1, or 0 otherwise.

The T-junction module

This module focused on the area immediately above and below the edge of the occluding screen and computed whether there was a gap along these edges. If there was a gap, the absence of T-junctions was computed and the module output a 0, or a 1 if there was no gap. The cues detected by the perceptual modules were primitives, and other cues were computed as combinations of these primitives. For example, collinearity was indicated by positive responses from both the parallelism and relatability modules. Parallelism and relatability are more primitive cues than collinearity because the latter cannot be computed without the former, whereas the converse is not true (i.e. both parallelism and relatability can be computed independently of collinearity). Also, co-motion, common motion, parallelism and relatability can only be computed when there is more than one possible object present (i.e. when there are two objects in either an ambiguous or unambiguous situation).
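One plausible reading of the parallelism module's tangent approximation (the text leaves the exact computation schematic; here each half's segment is reduced to its vertical endpoints, and the Y-extent over the X-extent stands in for the tangent):

```python
import numpy as np

def segment_tangent(half):
    """Approximate tangent of a segment's angle to the horizontal."""
    ys, xs = np.nonzero(half)
    if len(xs) == 0:
        return None                       # no object segment in this half
    i, j = ys.argmin(), ys.argmax()       # endpoints along the vertical
    dy, dx = float(ys[j] - ys[i]), float(xs[j] - xs[i])
    return dy / dx if dx != 0 else float("inf")   # vertical -> infinite slope

def parallelism(frame):
    """Output 1 if the segments above and below the occluder are parallel."""
    top = segment_tangent(frame[:7])
    bottom = segment_tangent(frame[7:])
    return 1 if top is not None and top == bottom else 0
```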
Acknowledgements This research was supported by NSF grant BCS-9910779, ESRC grant R000238340, ESRC grant R000239112 and European Commission HP-RTN grant CT-2000-00065. Portions of this research were presented at the 1998 meeting of the International Society for Infant Studies, Atlanta, GA and the 1999 meeting of the Cognitive Science Society, Vancouver, BC.
References

Baldi, P., Chauvin, Y., & Hornik, K. (1995). Backpropagation and unsupervised learning in linear networks. In Y. Chauvin & D.E. Rumelhart (Eds.), Backpropagation: Theory, architectures, and applications (pp. 389–432). Hillsdale, NJ: Erlbaum. Banton, T., & Bertenthal, B.I. (1997). Multiple developmental pathways for motion processing. Optometry and Vision Science, 74, 751–760. Bornstein, M.H. (1985). Habituation of attention as a measure of visual information processing in human infants: summary, systematization, and synthesis. In G. Gottlieb &
Kellman, P.J., Spelke, E.S., & Short, K.R. (1987). Infant perception of object unity from translatory motion in depth and vertical translation. Child Development, 57, 72–86. LaPlante, D.P., Orr, R.R., Neville, K., Vorkapich, L., & Sasso, D. (1996). Discrimination of stimulus rotation by newborns. Infant Behavior and Development, 19, 271–279. Laplante, D.P., Orr, R.R., Vorkapich, L., & Neville, K.E. (2000). Multiple dimension processing by newborns. International Journal of Behavioral Development, 24, 231–240. Mareschal, D. (2001). Connectionist methods in infancy research. In J. Fagen & H. Hayne (Eds.), Progress in infancy research (Vol. 2, pp. 71–119). Mahwah, NJ: Erlbaum. Mareschal, D., French, R.M., & Quinn, P.C. (2000). A connectionist account of asymmetric category learning in early infancy. Developmental Psychology, 36, 635–645. Mareschal, D., & Shultz, T.R. (1996). Generative connectionist networks and constructivist cognitive development. Cognitive Development, 11, 571–605. Marr, D. (1982). Vision. San Francisco: Freeman. Mitchell, T.M. (1997). Machine learning. New York: McGrawHill. Morgan, C.L. (1903). Introduction to comparative psychology. New York: Scribner. Munakata, Y., McClelland, J.L., Johnson, M.H., & Siegler, R.S. (1997). Rethinking infant knowledge: towards an adaptive process account of successes and failures in object permanence tasks. Psychological Review, 104, 686–713. Nakissa, R.C., & Plunkett, K. (1998). Evolution of a rapidly learned representation for speech. Language and Cognitive Processes, 13, 105–127. Náñez, J.S. (1988). Perception of impending collision in 3- to 6-week-old infants. Infant Behavior and Development, 11, 447–463. Needham, A., Baillargeon, R., & Kaufman, L. (1997). Object segregation in infancy. In C. Rovee-Collier & L. Lipsitt (Eds.), Advances in infancy research (Vol. 11, pp. 1–44). Norwood, NJ: Ablex. O’Reilly, R.C. (1996). 
Biologically plausible error-driven learning using local activation differences: the generalized recirculation algorithm. Neural Computation, 8, 895–938. O’Reilly, R.C. (1998). Six principles for biologically-based computational models of cortical cognition. Trends in Cognitive Sciences, 2, 455–462. Piaget, J. (1952). The origins of intelligence in children. New York: International Universities Press. Piaget, J. (1954). The construction of reality in the child. New York: Basic Books. Slater, A. (1995). Visual perception and memory at birth. In C. Rovee-Collier & L.P. Lipsitt (Eds.), Advances in infancy research (Vol. 9, pp. 107–162). Norwood, NJ: Ablex. Slater, A., Bremner, G., Johnson, S.P., Sherwood, P., Hayes, R., & Brown, E. (2000). Newborn infants’ preference for attractive faces: the role of internal and external facial features. Infancy, 1, 265–274. Slater, A., Brown, E., & Badenoch, M. (1997). Intermodal perception at birth: newborn infants’ memory for arbitrary auditory-visual pairings. Early Development and Parenting, 6, 99–104.
Slater, A., Johnson, S.P., Brown, E., & Badenoch, M. (1996). Newborn infants’ perception of partly occluded objects. Infant Behavior and Development, 19, 145–148. Slater, A., Mattock, A., & Brown, E. (1990). Size constancy at birth: newborn infants’ responses to retinal and real size. Journal of Experimental Child Psychology, 49, 314–322. Slater, A., Morison, V., Somers, M., Mattock, A., Brown, E., & Taylor, D. (1990). Newborn and older infants’ perception of partly occluded objects. Infant Behavior and Development, 13, 33–49. Slater, A., Von der Schulenburg, C., Brown, E., Badenoch, M., Butterworth, G., Parsons, S., & Samuels, C. (1998). Newborn infants prefer attractive faces. Infant Behavior and Development, 21, 345–354. Smith, W.C., Johnson, S.P., & Spelke, E.S. (in press). Motion and edge sensitivity in perception of object unity. Cognitive Psychology. Spelke, E.S. (1990). Principles of object perception. Cognitive Science, 14, 29–56. Spelke, E.S. (1994). Initial knowledge: six suggestions. Cognition, 50, 431–445. Spelke, E.S., & Van de Walle, G. (1993). Perceiving and reasoning about objects: insights from infants. In N. Eilan,