Abstract Echolocating bats rely on active sound emission (echolocation) to map novel environments and navigate through them. Many theoretical frameworks have been proposed to explain how they do so, but few attempts have been made to build an actual robot that mimics their abilities. Here, we present the ‘Robat’, a fully autonomous bat-like terrestrial robot that relies on echolocation to move through a novel environment while mapping it solely based on sound. Using the echoes reflected from the environment, the Robat delineates the borders of objects it encounters and classifies them using an artificial neural network, thus creating a rich map of its environment. Unlike most previous attempts to apply sonar in robotics, we focus on a biological, bat-like approach that relies on a single emitter and two ears, and we apply a biologically plausible signal processing approach to extract information about objects’ position and identity.

Author summary Many animals are able to map a new environment even while moving through it for the first time. Bats can do this by emitting sound and extracting information from the echoes reflected from objects in their surroundings. In this study, we mimicked this ability by developing a robot that emits sound like a bat and analyzes the returning echoes to generate a map of space. Our Robat had an ultrasonic speaker mimicking the bat’s mouth and two ultrasonic microphones mimicking its ears. It moved autonomously through novel outdoor environments and mapped them using sound only. It was able to negotiate obstacles and move around them, to avoid dead-ends, and even to recognize whether the object in front of it was a plant or not. We show the great potential of using sound for future robotic applications.

Citation: Eliakim I, Cohen Z, Kosa G, Yovel Y (2018) A fully autonomous terrestrial bat-like acoustic robot. PLoS Comput Biol 14(9): e1006406. https://doi.org/10.1371/journal.pcbi.1006406 Editor: Joseph Ayers, Northeastern University, UNITED STATES Received: March 27, 2018; Accepted: July 29, 2018; Published: September 6, 2018 Copyright: © 2018 Eliakim et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Data Availability: All relevant data are within the paper and its Supporting Information files. Funding: This project was funded by ONRG grant no. N62909-13-1-N066 and by the Israeli ministry of science technology and space, grant no. 3-11874. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing interests: The authors have declared that no competing interests exist.

Introduction The growing use of autonomous robots emphasizes the need for new sensory approaches to facilitate tasks such as obstacle avoidance, object recognition and path planning. One of the most challenging tasks faced by many robots is the problem of generating a map of an unknown environment while simultaneously navigating through this environment for the first time [1]. This problem is routinely solved by echolocating bats, which perceive their surroundings acoustically (other animals also solve this task on a daily basis using a range of sensory modalities) [2]. By emitting sound signals and analyzing the returning echoes, bats can orient through a new environment and probably also map it [3] [4]. Inspired by this ability, we present the ‘Robat’, a fully autonomous terrestrial robot that relies solely on bat-like sonar to orient through a novel environment and map it. Using a biologically plausible system with two receivers (ears) and a single emitter (mouth) that produced frequency-modulated (FM) chirps at a typical bat rate, the Robat managed to move through a large, novel outdoor environment and map it in real time. There have been many attempts to use airborne sonar for mapping the environment and moving through it using non-biological approaches, for example by using an array of multiple narrow-band speakers [5, 6] [7] and/or multiple microphones [8]. These studies showed that by using multiple emitters, or by carefully scanning the environment with a sonar beam as if it were a laser, one can map the environment acoustically, but these approaches are very far from the biological solution [9]. A bat emits relatively few sonar emissions towards an object, and it must rely on only two receivers (its ears) to extract spatial information from its very wide bio-sonar beam, which can reach 60 degrees (full width at a 6 dB drop in amplitude on both sides [10] [11] [12]).
Unlike the narrow-band signals typically used in robotic applications, the bat’s wide-band signals provide ample spatial information, allowing it to localize multiple reflectors within a single beam. This is the approach we aimed to test and mimic in this study. Numerous studies have shown that echoes generated by emitting bat-like sonar signals contain spatial information that can be exploited for localization and identification of objects [13] [14] [15] [16] [17] [18]. Several previous attempts have been made to model and mimic bats’ abilities to localize objects and map their surroundings [19] [20]. One of the most comprehensive attempts to use a biological approach to map the environment was ‘BatSLAM’ [21], which relied on mammalian brain-like computation for simultaneous localization and mapping of a novel environment using biomimetic sonar. Using a biological representation of the data (the cochleogram), the BatSLAM algorithm generated topological maps in which the nodes represent unique places in the environment and the edges represent the robot’s displacements between them. The approach of recognizing a location based on its unique acoustic signature was further broadened by Vanderelst et al. [6], who classified a wide range of natural scenes based on their acoustic statistics, once again without extracting their spatial characteristics. Vanderelst et al. limited the information extracted from the echoes to the acoustic resolution available to a bat, and were still successful in achieving useful scene recognition. Our work differs from these former studies in two important respects: (1) our Robat moved through the environment autonomously, while the previous robots were driven by the user; (2) we mapped the 2D structure of the environment, while they mapped the position of the robot in the environment.
Namely, in our approach the outlines of the objects encountered by the Robat were delineated, so that obstacle-free paths were revealed for future use. In these previous studies, objects in the environment were mapped as locations with a unique acoustic representation, so that when encountered again the agent could localize itself on the acoustic map, but no spatial information about the objects’ size or orientation was extracted. When moving autonomously, such information is essential for movement planning. In addition to mapping, our Robat had to move autonomously through the environment while avoiding obstacles. Some previous studies have modeled orientation and obstacle avoidance using a biological, echolocation-based approach. For example, Vanderelst et al. [9] suggested a simple sensorimotor approach for obstacle avoidance based on turning away from the louder of the two echoes received by the ears. They showed that a simulated agent can move through a novel environment without any mapping of the positions or borders of the objects within it. This approach might be beneficial when an animal wants to move fast through the environment without intending to return to specific locations within it, but if the animal needs to find its way back to some point in this environment (e.g., its roost), or to plan its movement to a specific location, some mapping must be performed. The robust low-level sensorimotor heuristic presented in [9] could, for example, be combined with higher-level mapping algorithms (e.g., [22]). To the best of our knowledge, our Robat is the first fully autonomous, bat-like, biologically plausible robot that moves through a novel environment while mapping it solely based on echo information: delineating the borders of objects and the free paths between them, and recognizing their type.

Discussion In this study, we built an autonomous robot that moves through a novel environment and maps it acoustically using bat-like bio-sonar. We achieved high mapping accuracy despite our simple approach, demonstrating the great potential of using active wide-band sound emissions to map the environment. We created a 2D topographic map (rather than a topological map), which would allow planning future movements through the environment. The statistical approach presented in [9] is therefore complementary to ours, allowing specific locations to be classified based on their echoes. For example, when navigating back to a specific location using the map created by the Robat, their approach could be used to validate arrival at the desired location and to help adjust the map to improve its accuracy. The Robat was much slower than a real bat, stopping for ca. 30 seconds every 0.5 m to acquire echoes. This slowness, however, was merely a result of the mechanical limitations of our system, mainly the slow gimbal. Using a speaker with a wider beam (which would eliminate the need to turn at each location) would allow the Robat to acquire echoes on the move, while moving as fast as a bat. Importantly, despite our stopping for echo recording, the acoustic information we acquired did not differ from that received by a bat, except that a bat’s echoes would also be slightly Doppler-shifted (but this would probably not affect any of our results). In some respects, our processing was not fully bat-like. We used a sampling rate of 250 kHz, which provides a finer time resolution than the theoretical time precision of the auditory system [34]. Bats and other small mammals have been shown to estimate azimuth with an accuracy of <10 degrees (the exact accuracy depends on the azimuth; e.g., [35, 36]).
This accuracy corresponds to an interaural time difference of <10 μs, which is in accordance with our sampling rate (sampling at 250 kHz is equivalent to an error of ~5 μs when estimating time differences between the two ears). Therefore, even if our computation was different from that of a bat (which does not cross-correlate two highly sampled time signals), the overall accuracy allowed by our approach was not better than that of a bat. Moreover, due to the inflation and interpolation method that we used to delineate the borders of the objects, the effective accuracy of our mapping was much lower than that allowed by this high sampling rate, and probably much lower than that available to bats [31, 32]. Therefore, we hypothesize that using an auditory preprocessing model such as that used in BatSLAM [21] would probably not change our results dramatically. Another advantage that we had over real bats was the relatively large distance between the two ears, which were spaced 7 cm apart, ca. twice the spacing in a large bat. This probably allowed more accurate azimuth estimations, but, once again, we hypothesize that because of the use of inflation, this did not improve our performance dramatically. Importantly, we managed to extract information about multiple objects within a single sonar beam. On average, in each echo that contained reflections (some echoes did not), we detected 4.1 objects positioned at azimuths between -50 and 50 degrees. Another important difference between the Robat and an actual bat is the lack of an external ear in the Robat. The angle-dependent frequency response of the external ear allows bats (and other animals) to gain information about the location of a sound source in three dimensions. Because we relied on temporal information for object localization, we used only a first approximation of an ear.
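The timing argument above can be checked with a little arithmetic. The sketch below is a minimal illustration, assuming a far-field source, a speed of sound of 343 m/s, and the textbook relation ITD = d·sin(θ)/c between ear spacing d and azimuth θ. These modeling choices are ours, not necessarily the paper’s, and the helper names are hypothetical.

```python
import numpy as np

C_SOUND = 343.0  # assumed speed of sound in air [m/s]


def itd_seconds(azimuth_deg, ear_spacing_m, c=C_SOUND):
    """Far-field interaural time difference for a source at azimuth_deg (0 = straight ahead)."""
    return ear_spacing_m * np.sin(np.radians(azimuth_deg)) / c


def azimuth_from_itd_deg(itd_s, ear_spacing_m, c=C_SOUND):
    """Invert the far-field model; clipping guards against |c*ITD/d| > 1 caused by noise."""
    return np.degrees(np.arcsin(np.clip(c * itd_s / ear_spacing_m, -1.0, 1.0)))


sample_period_s = 1.0 / 250_000  # one sample at 250 kHz = 4 microseconds
# Azimuth corresponding to a one-sample ITD error near straight ahead, for the
# Robat's 7 cm spacing and for half of it (roughly a large bat, per the text).
for spacing in (0.07, 0.035):
    step = azimuth_from_itd_deg(sample_period_s, spacing)
    print(f"d = {spacing * 100:.1f} cm: one-sample ITD error ~ {step:.2f} deg")
```

Under these assumptions, with 7 cm spacing one 4 μs sample moves the estimated azimuth by about 1.1 degrees near straight ahead, and by about 2.2 degrees for half that spacing. Both values are well below the inflation scale used for mapping.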
Adding a structure mimicking the external ear could further improve our localization performance, and it would be essential for expanding our mapping to 3D. To better mimic the bat’s beam, we used three beams (directed 60 degrees apart), but this made our task easier than a bat’s, because we could analyze the echoes returning from each direction separately. We therefore also tested an approach in which we summed the three echoes collected (with different headings) at each acquisition point, thus mimicking a wider beam. Even with this degraded data, we were able to map the environment with a decent accuracy of 1.14 ± 0.70 m (mean ± STD, S6b Fig), an accuracy that would allow future planning of trajectories while avoiding obstacles on the way. In some respects, our approach was probably much more simplistic than that of a bat. For example, the obstacle avoidance algorithm was very simple, and a better approach would probably use control theory to steer the Robat around obstacles [37]. In terms of mission priority, we used serial processing, in which the Robat first processes new incoming sensory information, then performs the urgent low-level tasks of obstacle avoidance and path planning, and only every several acquisitions performs the high-level process of map integration. There is much evidence that the mammalian brain also performs sensory tasks sequentially (e.g., [38]), but it would be interesting to test procedures for parallel processing in the future. In addition to mapping the positions of objects in the environment, a complete map should also include information about the objects, such as their type or identity. To show that such information is available in the echoes, we developed a classifier that categorizes objects based on their echoes. We hypothesize that the moderate classification performance that we achieved (68%) was a result of our choice of categories.
We trained the classifier to distinguish between plant and non-plant objects, but these are not always two well-separated groups. For example, the echo of an artificial object such as a fence will have vegetation-like acoustic features, and indeed most of the classifier’s mistakes were recognitions of non-plants as plants. Bats might thus divide their world of objects differently, perhaps into diffusive vs. glint-reflecting objects. Altogether, we show how a rather simple signal processing approach allows a robot to autonomously move through and map a new environment based on acoustic information. Our work thus demonstrates the great potential of using acoustic echoes to map and navigate, a potential that echolocating bats translate into action on a daily basis.

Materials and methods Acquisition The Robat was based on the ‘Komodo’ robotic platform (Robotican, Israel). The bio-sonar sensor was mounted on a DJI Ronin gimbal, which allowed turning the sensing unit relative to the base of the robot in a stable manner. The sensing unit included an ultrasonic speaker acting as the bat’s mouth (VIFA XT25SC90-04) and two ultrasonic microphones acting as the bat’s ears, spaced 7 cm apart (Avisoft-Bioacoustics CM16/CMPA40-5V condenser). The speaker and the microphones were connected to D/A and A/D converters based on the USB-1608GX-2AO NI DAQ board, sampling at 250 kS/s at each ear. The emitted signal was a 10 ms FM chirp sweeping from 100 kHz down to 20 kHz. It was amplified using a Sony amplifier (XM-GS4). A uEye RGB camera was used for image collection, for validation purposes only. Three 2.4 GHz/5.8 GHz antennae were mounted at the rear of the Robat for wireless communication between the Robat and a stationary base station. This allowed viewing the map created by the Robat in real time, but importantly, all calculations and decisions were performed on the Robat itself. Mapping While moving, the Robat stopped every 0.5 m (based on its odometry measurements), and the sonar system (emitter and receivers) was rotated to three different headings (0, 60 and -60 degrees) relative to the direction of movement. At each heading, a sound signal (see above) was emitted and the echoes were recorded. Each recording was 0.035 s long, equivalent to a range of ca. 6 m (0.035 s × 343 m/s / 2), so farther objects were ignored at each emission. The signal-to-echo delay and the difference in the echoes’ times of arrival at the two ears (i.e., the interaural time difference) were used together to map the environment. To this end, the received signals were cross-correlated with the theoretical emitted signal.
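To make the matched-filtering step concrete, here is a minimal NumPy sketch. It is not the authors’ code. Assumptions: the sweep is linear with no amplitude envelope (the text only gives 100 to 20 kHz over 10 ms), recording time zero coincides with emission onset, the speed of sound is 343 m/s, and the absolute value of the cross-correlation serves as the detection signal. The function names are hypothetical.

```python
import numpy as np

FS = 250_000      # sampling rate per ear [Hz] (from the text)
C_SOUND = 343.0   # assumed speed of sound in air [m/s]


def fm_chirp(f_start=100e3, f_end=20e3, duration_s=0.010, fs=FS):
    """Linear FM sweep from f_start to f_end (the linear shape is an assumption)."""
    t = np.arange(int(round(duration_s * fs))) / fs
    rate = (f_end - f_start) / duration_s  # sweep rate [Hz/s]; negative = downward
    phase = 2 * np.pi * (f_start * t + 0.5 * rate * t ** 2)
    return np.sin(phase)


def matched_filter(recording, template):
    """Cross-correlate a recording with the emitted template.

    Element k of the output is the correlation at a lag of k samples, i.e. for an
    echo whose copy of the template starts at sample k. The absolute value is
    normalised so that the strongest reflection equals 1.
    """
    xc = np.correlate(recording, template, mode="full")[len(template) - 1:]
    env = np.abs(xc)
    return env / env.max()


def delay_to_range(delay_s, c=C_SOUND):
    """Convert two-way travel time to one-way distance."""
    return c * delay_s / 2.0
```

As a usage example, an echo from a reflector 2 m away arrives about 11.7 ms after emission (2 × 2 m / 343 m/s), which is near sample 2915 at 250 kHz. The argmax of `matched_filter` then falls at that sample, and `delay_to_range(2915 / FS)` returns roughly 2 m.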
The cross-correlated signal was normalized relative to the maximum value of the recording, and a peak detection function was used to find peaks of interest (Python peakutils, with a minimal peak distance of 0.002 s and a minimal amplitude of 0.3). To match peaks arriving at the right and left ears, for each peak detected in one ear, an equivalent peak was searched for in the other ear within a window of ±0.001 s. If a peak was found, the Pearson correlation was used to determine whether the two echoes were reflections of the same object. For this purpose, a segment of 0.01 s around each peak was cut, and the correlation between the two time signals (one from each ear) was computed. Only correlations higher than 0.9 were accepted. This threshold was conservative and could therefore cause objects to be missed, but it reduced the localization of non-existent (artifact) objects. Because the Robat emitted every 0.5 m, there was much overlap between the echoes of consecutive emissions. We were therefore likely to detect an object several times, so a conservative approach was chosen. In addition to its position, each object on the map was defined by three parameters, “C | T | P”, where C is the Pearson correlation coefficient between the left and right ears for the specific point, T is the object’s type based on its acoustic classification (either artificial or a plant), and P is the classification probability (see below for the classification process). Results in the indoor controlled environment showed that, using two ears, the mean error in distance estimation was 1.3 ± 2.1 cm (mean ± STD, S3 Fig) and the mean azimuth estimation error was 1.2 ± 0.7 degrees (mean ± STD, S3 Fig). Importantly, these are the results for a single reflector, so accuracy in the real environment, where many reflections are received at each point, will be lower. Every 5 Robat steps, newly localized objects were integrated into the map created so far.
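The peak detection, binaural matching and localization steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ implementation:

- `scipy.signal.find_peaks` stands in for the peakutils call, with the 0.3 threshold applied to the max-normalised correlation.
- When several right-ear peaks fall inside the ±1 ms window, the nearest one is taken.
- The Pearson correlation is computed on the raw recordings.
- Positions come from a far-field model: range from the mean two-way delay, and azimuth (positive to the left) from the ITD via arcsin.

All function names are hypothetical.

```python
import numpy as np
from scipy.signal import find_peaks

FS = 250_000         # samples per second per ear (from the text)
C_SOUND = 343.0      # assumed speed of sound [m/s]
EAR_SPACING = 0.07   # microphone spacing [m] (from the text)


def detect_peaks(mf, fs=FS, min_dist_s=0.002, min_height=0.3):
    """Indices of peaks in a max-normalised matched-filter output."""
    idx, _ = find_peaks(mf, height=min_height, distance=int(round(min_dist_s * fs)))
    return idx


def match_ears(left_mf, right_mf, left_rec, right_rec, fs=FS,
               window_s=0.001, seg_s=0.01, r_min=0.9):
    """Return (left_idx, right_idx, r) for peak pairs accepted as the same reflector."""
    left_peaks, right_peaks = detect_peaks(left_mf, fs), detect_peaks(right_mf, fs)
    win = int(round(window_s * fs))
    half = int(round(seg_s * fs / 2))
    n = min(len(left_rec), len(right_rec))
    pairs = []
    for i in left_peaks:
        cands = right_peaks[np.abs(right_peaks - i) <= win]
        if cands.size == 0:
            continue
        j = int(cands[np.argmin(np.abs(cands - i))])  # nearest candidate (assumption)
        if min(i, j) - half < 0 or max(i, j) + half > n:
            continue  # the 10 ms segment would run off the recording
        a = left_rec[i - half:i + half]
        b = right_rec[j - half:j + half]
        r = np.corrcoef(a, b)[0, 1]  # Pearson correlation of the two segments
        if r > r_min:
            pairs.append((int(i), j, float(r)))
    return pairs


def localize(i_left, i_right, fs=FS, c=C_SOUND, d=EAR_SPACING):
    """Range [m] and azimuth [deg] (positive = left) in the sensor frame."""
    delay = 0.5 * (i_left + i_right) / fs  # mean two-way delay [s]
    rng = c * delay / 2.0
    itd = (i_right - i_left) / fs          # > 0 when the left ear hears the echo first
    az = np.degrees(np.arcsin(np.clip(c * itd / d, -1.0, 1.0)))
    return rng, float(az)


def to_world(rng, az_deg, heading_deg, x0=0.0, y0=0.0):
    """Place a detection in the map frame, given the sensor heading
    (robot heading plus 0 or +/-60 degrees) and the robot position (x0, y0)."""
    theta = np.radians(heading_deg + az_deg)
    return x0 + rng * np.cos(theta), y0 + rng * np.sin(theta)
```

The strict `r > r_min` test mirrors the text’s “higher than 0.9”. Lowering `r_min` would trade more detections for more artifact objects, which is the trade-off the text describes.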
This was done using an Iterative-Object-Inflation algorithm, which inflated points into squares and connected them. To this end, the entire area around the Robat was divided into a grid of 2000 × 2000 pixels (5 × 5 cm² each, i.e., 100 × 100 m in total). Each detected object was placed in the corresponding pixel on the map and was inflated to an area of 20 × 20 pixels around its center (i.e., 1 × 1 m², S4 Fig). This procedure creates a binary map, with 1’s depicting objects and 0’s depicting free paths. Pixels along the trajectory that the Robat had previously moved through always received the value 0, depicting an open path (even if they were within the 20 × 20 window of a detected object). Movement and obstacle avoidance We chose a very simple obstacle avoidance approach, also known as the ‘bug algorithm’ [39]. During the exploration process, the Robat moved forward in steps of 0.5 m between consecutive acquisition points. When detecting an obstacle less than 1.2 m in front of it, the Robat turned 90 degrees to the right and performed a 1 m step to the right (after checking that there was no obstacle ahead). After performing this 1 m step, the Robat turned 90 degrees to the left and acquired an echo. If no obstacle was detected (meaning that the obstacle had been passed), the Robat continued straight (i.e., in its original direction before turning right). If the way was still blocked (i.e., the obstacle had not been passed), the Robat turned right again and kept moving to the right (90 degrees relative to its original direction). Summing echoes from all three headings To better mimic the bat, whose beam is much wider than the Robat’s, we examined an approach of summing the echoes returning from the three different headings (mentioned above) into one superimposed echo, and then running the same detection, localization and mapping algorithms as described above.
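The map update and the obstacle-avoidance rule above can be sketched as follows. This is a minimal illustration, not the authors’ code. Assumptions: the start pose sits at the grid centre, the travelled path is rasterised by sampling it at half-cell spacing, and the motion commands are hypothetical `(name, value)` tuples rather than any real robot API. The behaviour when the right side is also blocked is not specified in the text, so the sketch simply returns a placeholder stop.

```python
import numpy as np

CELL_M = 0.05           # 5 x 5 cm cells (from the text)
GRID_N = 2000           # 2000 x 2000 cells, i.e. 100 x 100 m
INFLATE = 20            # each detection becomes a 20 x 20 block (1 x 1 m)
OBSTACLE_RANGE_M = 1.2  # obstacle-avoidance trigger distance (from the text)


def world_to_cell(x, y, origin=(GRID_N // 2, GRID_N // 2)):
    """Metric (x, y) to (row, col); putting the start pose at the grid centre is an assumption."""
    return origin[0] + int(np.floor(y / CELL_M)), origin[1] + int(np.floor(x / CELL_M))


def integrate_objects(grid, objects_xy, trajectory_xy):
    """Inflate detections into 20 x 20 blocks of 1s, then force the travelled path to 0."""
    half = INFLATE // 2
    for x, y in objects_xy:
        r, c = world_to_cell(x, y)
        grid[max(r - half, 0):max(r + half, 0), max(c - half, 0):max(c + half, 0)] = 1
    # Sample each path segment at half-cell spacing so the cells along it are cleared.
    for (xa, ya), (xb, yb) in zip(trajectory_xy[:-1], trajectory_xy[1:]):
        n = int(np.ceil(np.hypot(xb - xa, yb - ya) / (CELL_M / 2))) + 1
        for x, y in zip(np.linspace(xa, xb, n), np.linspace(ya, yb, n)):
            r, c = world_to_cell(x, y)
            if 0 <= r < GRID_N and 0 <= c < GRID_N:
                grid[r, c] = 0
    return grid


def bug_commands(front_dist_m, right_side_clear=True):
    """One decision of the right-hand 'bug' rule, as hypothetical (name, value) commands."""
    if front_dist_m >= OBSTACLE_RANGE_M:
        return [("forward_m", 0.5)]
    if not right_side_clear:
        return [("stop", 0)]  # placeholder: this case is not specified in the text
    # Side-step right, face the original heading again and re-acquire. Calling this
    # again on the new echo either resumes forward motion or repeats the side-step.
    return [("turn_deg", -90), ("forward_m", 1.0), ("turn_deg", 90), ("acquire", 0)]
```

Because `integrate_objects` clears the path after inflating, a detection whose 1 × 1 m square overlaps the route already driven never closes that route on the map. This matches the rule that trajectory pixels always receive 0.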
Evaluation of the mapping accuracy To examine the acoustic map generated by the Robat, inspired by [40], we collected aerial images using a drone (DJI Phantom 4, DJI) to construct a complete ground-truth map of the area. This procedure was only performed for the large palm greenhouse (40 × 5 m²). The contour of the objects on both sides of the trail in the greenhouse was extracted and compared to the contour of the inflated map that was acoustically reconstructed by the Robat (both contours were marked manually). Each of the two contours was fit by a polynomial function with 55 coefficients, which was then sampled at 500 points to obtain a high-resolution description of the contour. The two contours (real and Robat-estimated) were compared by calculating the root-mean-square distance between them over these 500 points (S5 Fig). Classification Acoustic-based object classification was performed using a neural network trained on a binary task: classifying whether an object was a plant or not. Only objects located closer than 3 m to the sensing unit were classified. Echoes 0.035 s long from both the right and the left ear were used. After removing the emitted signal, these recordings were passed through three band-pass filters (20-40 kHz, 40-60 kHz and 60-100 kHz), so that each echo was represented by 6 signals (3 filters × 2 ears). Next, a set of 21 acoustic features (Table 2) was extracted from each band-passed recording, following T. Giannakopoulos [41]. Each signal was divided into 23 equally spaced, overlapping windows, and the 21 features were extracted for each window, generating a total of 483 dimensions per signal (21 features × 23 windows). Each of the 6 signals (483 dimensions each) was thus fed to a classifier, and the majority decision of the six classifiers was used.
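The contour comparison can be sketched as follows. This is a minimal illustration, assuming each manually marked contour can be written as a lateral position y that is a function of position x along the trail, and reading “55 coefficients” as a polynomial of degree 54 (if order 55 was meant, set `n_coef=56`). We use NumPy’s Chebyshev basis for numerical stability at this high degree. It spans the same polynomials as a plain power series, but the choice is ours, not the paper’s. The function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import Chebyshev


def fit_contour(x, y, n_coef=55):
    """Least-squares polynomial with n_coef coefficients (degree n_coef - 1).
    A Chebyshev basis is used for numerical stability; it spans the same polynomials."""
    return Chebyshev.fit(x, y, deg=n_coef - 1)


def contour_rms(x_true, y_true, x_est, y_est, n_coef=55, n_samples=500):
    """RMS distance between two fitted contours sampled at n_samples common x positions."""
    lo = max(np.min(x_true), np.min(x_est))   # compare only where both contours exist
    hi = min(np.max(x_true), np.max(x_est))
    xs = np.linspace(lo, hi, n_samples)
    y_t = fit_contour(x_true, y_true, n_coef)(xs)
    y_e = fit_contour(x_est, y_est, n_coef)(xs)
    return float(np.sqrt(np.mean((y_t - y_e) ** 2)))
```

Here the distance is measured vertically, between the two fits at the same x. This is a simplification of a true point-to-curve distance that is reasonable when both contours run roughly parallel to the trail.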


Table 2. Classification features. https://doi.org/10.1371/journal.pcbi.1006406.t002

The data were fed into a neural network with the following architecture:
Input layer: 483 elements
First layer: 105 elements with a ReLU activation function
Dropout: 0.5
Second layer: 50 elements with a ReLU activation function
Third layer: 6 elements with a ReLU activation function
Output layer: 1 element with a sigmoid activation function
We used Python’s TensorFlow (through its Keras API) to construct and train this three-layer neural network. The training sets included 788 plant examples and 628 non-plant examples collected at several sites on campus. We used the camera mounted on the Robat to label the echoes. Finally, to assess the statistical significance of our classification, we ran 100 permutations in which the training data were randomly assigned to the two classes (plants and non-plants), trained a classifier for each permutation, and tested it on the same test data. We also tested several additional classification methods before choosing the neural network: a KNN (k-nearest neighbors) classifier with five different distance measures (Mahalanobis, Euclidean, correlation, Minkowski and Canberra), two additional approaches for dimensionality reduction before the KNN (PCA and LDA), and a linear SVM classifier. For all classifiers, we used the same input features (see above). The results were similar for most classifiers, but the neural network performed slightly better than the others (S8 Fig).
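The label-permutation significance test described above can be sketched as follows. This is a minimal illustration, and it swaps in a nearest-class-mean classifier as a stand-in for the neural network so that the sketch runs with NumPy alone. Only the procedure follows the text: shuffle the training labels, retrain, and score on the same fixed test data. The classifier and the function names are hypothetical and are not the authors’ code.

```python
import numpy as np


def fit_centroids(X, y):
    """Stand-in classifier (nearest class mean), NOT the network described above."""
    return np.stack([X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)])


def predict_centroids(centroids, X):
    """Assign each row of X to the class whose mean is closest (squared Euclidean)."""
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def label_permutation_test(X_train, y_train, X_test, y_test, n_perm=100, seed=0):
    """Compare the real test accuracy with the accuracies of classifiers trained on
    randomly shuffled training labels; the test data stay fixed throughout."""
    rng = np.random.default_rng(seed)

    def test_accuracy(train_labels):
        model = fit_centroids(X_train, train_labels)
        return float(np.mean(predict_centroids(model, X_test) == y_test))

    observed = test_accuracy(y_train)
    null = np.array([test_accuracy(rng.permutation(y_train)) for _ in range(n_perm)])
    # One-sided empirical p-value with the usual +1 correction.
    p_value = (1 + np.count_nonzero(null >= observed)) / (1 + n_perm)
    return observed, null, p_value
```

Shuffling with `rng.permutation` keeps the class counts of the training set, here 788 plant and 628 non-plant examples, in every permutation. Only the pairing between echoes and labels is broken.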

Acknowledgments Thanks to the Manna Food Security fellowship, which supported I. Eliakim in this research, and to Kineret Manevich and Yuval Sapir from the Botanical Garden, Tel Aviv University, for helping with the outdoor experiments. Thanks to Aya Goldstein for the aerial footage of the greenhouse.