Abstract For infants, the first problem in learning a word is to map the word to its referent; a second problem is to remember that mapping when the word and/or referent are again encountered. Recent infant studies suggest that spatial location plays a key role in how infants solve both problems. Here we provide a new theoretical model and new empirical evidence on how the body – and its momentary posture – may be central to these processes. The present study uses a name-object mapping task in which names are either encountered in the absence of their target (experiments 1–3, 6 & 7), or when their target is present but in a location previously associated with a foil (experiments 4, 5, 8 & 9). A humanoid robot model (experiments 1–5) is used to instantiate and test the hypothesis that body-centric spatial location, and thus the bodies’ momentary posture, is used to centrally bind the multimodal features of heard names and visual objects. The robot model is shown to replicate existing infant data and then to generate novel predictions, which are tested in new infant studies (experiments 6–9). Despite spatial location being task-irrelevant in this second set of experiments, infants use body-centric spatial contingency over temporal contingency to map the name to object. Both infants and the robot remember the name-object mapping even in new spatial locations. However, the robot model shows how this memory can emerge –not from separating bodily information from the word-object mapping as proposed in previous models of the role of space in word-object mapping – but through the body’s momentary disposition in space.

Citation: Morse AF, Benitez VL, Belpaeme T, Cangelosi A, Smith LB (2015) Posture Affects How Robots and Infants Map Words to Objects. PLoS ONE 10(3): e0116012. https://doi.org/10.1371/journal.pone.0116012 Academic Editor: Vincent M. Reid, Lancaster University, UNITED KINGDOM Received: February 14, 2014; Accepted: December 3, 2014; Published: March 18, 2015 Copyright: © 2015 Morse et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited Funding: This work has been supported by the EU FP7 projects ITALK (no. 214668) and Poeticon++ (no. 288382), NIH grant R21HD068475 and NSF GRFP. Competing interests: The authors have declared that no competing interests exist.

Introduction Robotics makes clear the evolutionary feat that is biological intelligence [1–4]. Smooth and effective action in a constantly changing physical world requires the continuous coupling of sensors and effectors to those changing physical realities [2,5–8]. However, an intelligent system that does more than react also needs stable cognitive products such as categories and decisions that are at least partially decoupled from the here-and-now on which sensing and acting are so dependent [9]. Building artificial devices that can perform both sensorimotor and cognitive tasks—for example, walk up hills and learn language—has not yet been achieved [10,11]. One relevant debate in the study of both artificial and biological intelligence is how cognitive and sensory-motor components should relate to each other. One possibility, sometimes known as the cognitivist solution [12], is that cognition receives information from the sensors and passes information to effectors but is fundamentally distinct and separate from sensorimotor processes in the symbolic character of its cognitive computations [13–15]. The alternative possibility, sometimes known as the embodiment solution [1,7,8] is that there are no distinct computational principles for cognition versus perception and action and that, instead, cognitive processes emerge out of and are dynamically coupled to sensorimotor systems [1,6,16–22]. In this view, cognitive products, although partially decoupled from the here-and-now, are realized in and through the same sensory-motor systems involved in action and perception [21]. Although this second approach seems to promise a smooth integration of cognitive decisions with perception and action, there are many open questions about just how such an “embodied” solution would actually work to solve any specific cognitive task [23–25]. Here we provide an example of an embodied solution to the task of mapping heard names to seen things. Our novel solution integrates cognitive and sensorimotor processes by using bodily posture as the frame of reference for both planning actions and mapping names to objects. Within this cognitive architecture, the internal representations of objects and names are dynamically bound to the body’s momentary disposition in space; nonetheless, performance has a “cognitive” signature in that the robot can recognize a learned name for an object in new postural positions. We also show that this solution may be relevant to biological forms of intelligence by testing novel predictions from the robot model with 16-month-old human toddlers. The problem we consider is how a naïve learner, an infant, might map a name to an object and then later show memory for that mapping. This mapping problem has generated considerable theoretical interest [26,27] as everyday learning contexts contain multiple objects and unknown names. There are several cognitive solutions in which objects and names are represented as mental entities independent of their experienced spatial relation to the body [26–28]. However, in building a robot model, the body cannot be ignored; for a physically-realized learner to learn anything at all from real time experiences with words and objects, the body must, at the right moment, turn its sensors to the referred to object. Because sensors are in the body, the body—and its spatial disposition—is essential to the initial mapping of the name to a thing. But is the body relevant to more than this initial mapping? A growing body of research with infant learners [29–31] suggests that the spatial consistency of the direction of attention to the name, to the object, and across repeated encounters with the name and object increases the likelihood of mapping the name to the object and remembering that mapping. One account of these findings, the Dynamic Neural Field model (DNF) [31] binds features by virtue of their shared spatial position (without specification of the spatial frame as body-centric or allocentric) and then projects the result of this binding to a second field that represents words and objects but has no spatial component. Thus, the DNF learns words and objects through spatial correspondence but the specific learned mapping between words and objects remains independent from spatial information. The model as a whole system is still able to demonstrate spatial biases because object features, primed by words, project back into the object-space map which contains memories of the locations where those object features have been previously encountered. We propose an alternative account that was motivated by the task of building a physical device that can map words to objects and then show through physical behavior that it has remembered that mapping. The robotic model maps words—as does the DNF– through space, but in the robotic model those spatial representations are body-centric and always tied to the momentary posture of the learner. The robotic model generalizes these learned mappings to new spatial locations despite the through-body, and thus indirect, mapping between words and objects. In this model; words, objects, and their remembered associations are always through posture and hence are an explicit example of learned sensorimotor knowledge. Finally, this robotic posture model makes novel predictions not made by the DNF that are tested and confirmed in young children.

Infant Results Studies of word-object mapping by 1–1 1/2 year old infants, including variants of the Original Baldwin task and the Interference task [29,31,38] show that the spatial consistency of experienced names and words are critical for learning. However, there are several possible explanations of why spatial consistency matters for early learning (see, [29]), and the role of posture, the body’s momentary disposition in space as it orients to events in the world, has never been directly tested. Accordingly, Experiments 6 to 9 were replications of the Robot experiments with toddlers: Experiment 6—Original Baldwin task with no posture shift (Robot Experiment 1), Experiment 7—Original Baldwin task with posture shift at Step 5 (Robot Experiment 3), Experiment 8—Interference task with no posture shift (Robot Experiment 4), and Experiment 9—Interference task with posture shift at Step 5 (Robot Experiment 5). The timeline of events for the infant experiments are shown in Fig. 4. PPT PowerPoint slide

PowerPoint slide PNG larger image

larger image TIFF original image Download: Fig 4. Timeline of experiment 6 (above) and experiment 8 (below). Steps 1–4 expose the infant to the target and foil objects in consistent left and right locations. In step 5 the infant is told ‘this is a modi’ while the objects are out of sight (hidden in buckets) in experiment 6, or while the foil object is in the target object location and being attended in experiment 8. Steps 6 & 7 repeat the original exposure of the target and foil, and in step 8 the infant is shown both objects in a new location and asked ‘where is the modi’. Experiments 7 and 9 follow the same timeline with the addition that step 5 occurs in a different posture from all other steps. (The individual shown in this figure has given written informed consent (as outlined in PLOS consent form) to publish this image). https://doi.org/10.1371/journal.pone.0116012.g004 The results (Fig. 5) replicated the qualitative findings with robots: Postural consistency enabled infants to link a name and object experienced at distinct times but created interference when different meaningful events (different objects) are associated with overlapping postural stance. However for Experiment 8 (interference task), in which the target is named in the foil location associated with a different posture, infant results show a significantly below chance level of selection of the target, while in the robot experiments this is not significantly below chance. This suggests that infant memory for the object previously associated with a postural direction of attention is stronger than was assumed in the robot model. The key point, however, is this: Although posture is not typically considered a relevant factor in human cognition (but see [39–42]), humans, like the robot, are faced with the problem of how to integrate cognition with bodily actions that always depend on the momentary posture and thus one expedient solution—for toddlers as well as the robot model—may be the dynamic binding of cognition to body posture. The infants, like the robot, were tested for their word-target mappings in new locations and postural stances. Thus, the infant results, like the robot model, indicate that sensorimotor representations tied to postural orientation (and thus smoothly linked to bodily action) can yield one signature character of cognition: the seeming independence of knowledge from the body. PPT PowerPoint slide

PowerPoint slide PNG larger image

larger image TIFF original image Download: Fig 5. Comparison between the Child and Robot data showing the means of the proportion of correct choices (and standard error of the means) for all experiments, and using the low-learning rate robot data. Dotted line denotes chance, p < 0.05. Specific values for the child data and the robot data is as follows: For the Original Baldwin task, when objects and names were separately linked to the same posture, the robot correctly mapped the name to the target (Exp1), M = 0.71 (SD = 0.41), at above chance levels, t(19) = 2.2, p < 0.05, d = 0.51. Infants also correctly mapped the name to the target (Exp6), M = 0.71 (SD = 0.20), at above chance levels, t(15) = 4.16, p < 0.001, d = 1.04. In Experiment 2, where the locations of objects was switched, the robot failed to map the name to the target, M = 0.46, p = 0.64, but did so reliably less often than in the standard Baldwin condition t(38) = 0.03, p < 0.05, d = 0.58. In the Baldwin task with posture change, when Step 5, the naming event, was experienced in a new posture, the robot (Exp3) and the infants (Exp7) failed to map the name to the object, both preforming at chance Robot; M = 0.42, p = 0.85, Child; M = 0.41, p = 0.16, and did so reliably less often than in the standard Baldwin Task where there was no posture shift; Robot; t(38) = 2.49, p < 0.05, d = 0.78, Infant; t(30) = 3.73, p < 0.001, d = 1.32. In the Interference task, the toddlers showed the same interference effect as the robot, and as the toddlers in Samuelson et al., 2011; when the target object was explicitly named at a location and posture associated with the distractor object, both the robot (Exp4) and children (Exp8) selected the target referent at below chance levels however only the child data was significantly below chance, M = 0.36 (SD = 0.4), t(19) = -1.5, p = 0.07, d = 0.34, Robot data p = 0.07. For the Interference task with a posture change, when the Phase 1 experiences were distinguished from the Phase 2 naming events by a poster shift, although performance was not above chance, both children (Exp9) and the robot (Exp5) the interference effect present in Experiment 4 & 8 was reduced p = 0.09 & p = 0.13 respectively. However for both child data and robot data the named target in the posture shift condition was reliably selected more often than when there was no posture shift Child; t(30) = -2.59, p < 0.05, d = 0.91, Robot; t(38) = -1.87, p < 0.05, d = 0.24. https://doi.org/10.1371/journal.pone.0116012.g005 Directly comparing the robot and child data (see Fig. 5) we can see a very close match between the overall performance for the children and the low-learning-rate robot in each condition (within 2%) and yet there are differences: on the interference task with no posture shift, the child data (experiment 8) was significantly below chance while the robot data (experiment 4) was not. In every experimental condition we can see that the standard error is slightly greater for the robot vs child data, showing that the robot data was slightly more variable than the child data. We attribute both differences to inherent noise in the camera image driving the visual system leading to variation in the object boundaries and hence variation in the color information driving the color SOM. The key result is the similar qualitative pattern for the robot and the children, a pattern that implicates a role for momentary posture in young children’s mapping of a name to an object.

Conclusion The link between posture and mapping in the robot model and in the infants challenges theories that disassociate learning and memories for words and objects from the body and its disposition in space. The robot model is also an advance over the DNF model on the role of spatial consistency [31] because that model makes no prediction about posture as it does not specify the relevant spatial frame. The robot model—in the service of a physical body that must acquire and demonstrate information in space—proposes label-to-posture and object-to-posture associations, and thereby not only specifies the relevant spatial frame, body posture, but makes it central to learning. The infant results support this position; young learners, who also have bodies in space, exhibit learning patterns like the robot that are tied to the body’s position in space. This link between mapping labels to objects via posture in infants has potentially far reaching implications. Many atypical patterns of motor development are co-morbid with cognitive developmental disorders [43–45]. The link between abnormal movement patterns and poor attentional control in children, poor language learning, and at risk cognitive development (see, [45,46]) are well-known but not well understood. The present model provides hypotheses toward a mechanistic understanding of the developmental dependencies between sensory-motor processes and early cognitive development in that the coordination of the body in space plays a role in binding multimodal events and in retrieving those memories. One open question given the present results concerns development. Is the tie to posture for infant learning a fact about infants or a general fact about how the brain and how its functional networks extend into the body [47]. One possibility is that the findings are about early infant learning and perhaps related to the immature hippocampal system [48]. Another possibility is that the link between memories and the body found here is generally true, even in adults, albeit perhaps not so easily shown. First, the spatial frames of reference for eye, head, trunk, and hand [49–53] interact and bias each other suggesting a role for overall posture in the bodily frame of reference. Second, localized spatial attention plays a fundamental role in binding sensory information in adults [e.g. 54,55] and the brain networks that underlie localized attention overlap with the networks underlying the spatial frames for bodily action [56–61]. In sum, future work is needed to determine whether a role for posture in mapping words to objects is specific to and perhaps diagnostic of an immature learning system or whether it reflects the more general principle of how cortical networks extend to the sensory-motor surface [47]. The robotic model used the bodily frame of reference that is necessary for planning actions to also integrate information across space, time, and modalities, an expedient engineering solution that may also have been discovered through biological evolution. This solution made novel and confirmed predictions about the role of posture in infant learning. There are other ways to explain why posture shifts have the experimental effects they do [see 62]. The posture solution, however, derives from evidence and theoretical principles concerning the centrality of the body’s posture to all sensory-motor functions. Momentary posture determines the nature and structure of both input and planned actions, and in so doing provides a continuous context through which sensory-inputs gain their meaning.

Method Robot method 100 randomly initialized robot models (further details in S1 File) participated with equal numbers assigned to each of the 5 robot experiments. The two objects were a red sensory ball, and a yellow plastic cup. Each object was between 7–9cm in length and width and height. Each served as the target referent for half the robot models in each experiment. In Experiments 1 (Baldwin task) and 2 (switch task), the robot could see the white surface of the table during the naming event. In Experiments 1 (Baldwin task) and 4 (interference task), half the robot models were in a seated position through out the experiment and half in an upright standing position. In the Posture Change procedures of Experiments 3 (posture change) and 5 (interference task with posture change), half the robot models sat during Phase 1 (Object presentation), then stood at Phase 2 (Naming); the other robot models, stood at Object presentation then sat at Naming. In each trial of all experiments each individual network was tested 4 times. A video camera recorded the interaction for the later coding. A video of the interaction with the robot for experiments 4 and 5 is included in the supplementary information for this paper (S1 Video). Infant method 64 toddlers, half male, within +/- 3 weeks of 16 months participated with equal numbers assigned to each of the 4 Experiments (6–9) (6 additional children were recruited but did not contribute data due to shyness, poor performance on control objects at test as described below, or experimental error). Parents of all child participants provided written informed consent prior to the experiment. All experimental protocols and the consent materials were approved by the Indiana University Institutional Review Board. Children in all experiments were from monolingual middle-class homes in a Midwestern town. Participant names were obtained from public birth announcements. The two objects were a transparent cube with moving colored beads inside and a bright blue plastic toy garlic press that opened and closed. These were selected through extensive pilot testing such that infants demonstrated no significant preference for one toy over the other. For half the infants in each experiment the garlic press was the target and for the other half the cube was. Each object was between 5–7cm in length, 4–7 cm in height, and 4 cm in width. Each served as the target referent for half the children in each experiment. In Experiments 6 (Baldwin task) and 7 (posture change), two identical grey plastic buckets, 15 cm high with a diameter of 12 cm, were also used to hide the two objects during the naming event. For testing of children’s knowledge of the name-referent mapping, a small transparent container (10 X 10 cm, 5 cm high) was also used. In addition, small toys, typical instances of a banana, spoon, duck, and cat, were used as warm up and control test items. In the no-posture shift procedures of Experiments 6 (Baldwin task) and 8 (interference task), half the children sat on their parent’s lap through out the experiment and half stood on the parent’s lap; in both cases parents’ supported the trunk with the their hands. In the Posture Change procedures of Experiments 7 (posture change) and 9 (interference task with posture change), half the children sat during Phase 1 (Object presentation), then stood at Phase 2 (Naming), then sat for Phase 3 (Object re-presentation); the other children, stood at Object presentation, sat at Naming, then stood for the Object re-presentation phase. All children sat during the Test. In all conditions, the mother was instructed to hold her child—with both hands—at the waist, and other than to help her child stand or sit as instructed, not to participate in the experiment and explicitly not to touch, mention or motion any of the toys, nor to help the child in anyway. A video camera recorded the interaction for the later coding and to affirm parental compliance. Experiments 6 (Baldwin task) and 8 (interference task) followed the original procedure in the Baldwin task used with toddlers [35] and the results in the Constant Posture procedure of Experiment 6 constitute a replication of those previous findings. The experiment began with warm-up trials (with the child in the posture assigned for the initial trials). The warm-up trials were designed to familiarize the child with the testing procedure used at the end of the session, and to familiarize the child with interacting with the experimenter. The experimenter presented the child with two familiar objects (e.g., banana and kitty) and told the child the names (“See this banana? Look, here is a kitty”). The experimenter then put the objects in the test container and told the child to get one object, e.g., “Get the banana.” Correct choices were cheered and incorrect choices were corrected. This was repeated until the child correctly indicated the requested object on 3 consecutive trials. Phase 1 of the Experiment with the novel objects began immediately. The target object was presented first, either 25 cm to the right or 25 cm to the left of midline (counterbalanced across children). The experimenter held the object up to the right or left of midline, saying “Look at this. See this” for 5s while performing an action with it (shaking the cube or opening and closing the toy garlic press), and then placing it on the table at the side it was presented on, following an imaginary line approximately 25 cm off midline so that the child looked and reached to the object on that side. After the child examined the object for approximately 5s, the experimenter took it back, again moving it along the same imaginary line on which it had been presented. The experimenter then presented the distractor object 25 cm off midline on the opposite side and repeated the procedure 3 more times, making a total of 4 presentations for each object. The target and distractor objects were always presented on opposite sides, but at a consistent side for each object (e.g., for one child, the cube was presented always on the left, and the garlic press always on the right). For Phase 2, Naming, the experimenter placed the target and distracter objects in separate buckets, out of view of the child. The buckets, with the toys inside, were placed on the table, one 25 cm to the right of midline, the other 25 cm to the left such that the bucket containing the target was on the same side as the presentation of the target during the familiarization phase. The bucket containing the distracter object was on the opposite side, again remaining consistent with where the side the distractor object was presented on during the Naming phase. Next the experimenter tapped the bucket with the target object, and said “modi” three times, while looking straight into the child’s eyes. Following the naming event, for the Object re-presentation phase, the Experimenter gave the child first the distractor, with no naming, following the procedure in Phase 1, and then the target. This imposes a delay of about 10–15 sec between the naming event and the testing. The test trials were structured identically to the warm-up trials but without feedback. Both objects were placed in the same transparent container at midline, with no consistent spatial arrangement, and the child was asked to “Get the modi.” During this period, the experimenter maintained her gaze directly at the child’s eyes. Four test trials with the target and distractor were alternated with 4 control trials in which children were asked to select between pairs of familiar toys previously seen in the warm-up. These were included to maintain interest, to break up the requests for the target, and to insure children understood the task. Children who failed to select the correct objects on 3 of 4 control trials were not included in the final sample. The task took approximately 15 minutes. During the procedure, one video camera was focused on the child and parent. Performances were scored offline from the video. The procedure in Experiments 8 (interference task) and 9 (interference task with posture change), was identical except for Phase 2, the Naming event. Here the target object was placed in full view on the table 25 cm to the right or left of midline and on the opposite side from its location during Phase 1. The experimenter looked at the object and said “Modi. A Modi. Modi.” After the naming moment, the objects were then re-presented to the child at the same locations as in Phase 1.

Acknowledgments This work has been supported by the EU FP7 projects ITALK (no. 214668) and Poeticon++ (no. 288382), NIH grant R21HD068475 and NSF GRFP.

Author Contributions Conceived and designed the experiments: AFM VLB TB AC LBS. Performed the experiments: AFM VLB. Analyzed the data: AFM VLB LBS. Contributed reagents/materials/analysis tools: AC LBS. Wrote the paper: AFM VLB LBS.