In this section, we present the simulation experiments for the two case studies. Sections 4.1 and 4.2, respectively, describe the simulation platform and setups. Sections 4.3 and 4.4, respectively, analyze the inferred models and classifiers. Section 4.5 compares Turing Learning with a metric-based identification method and mathematically analyzes this latter method. Section 4.6 presents further results of testing the generality of Turing Learning through exploring different scenarios, which include: (i) simultaneously inferring the control of the agent and an aspect of its morphology; (ii) using artificial neural networks as a model representation, thereby removing the assumption of a known agent control system structure; (iii) separating the replicas and the agents, thereby allowing for a potentially simpler experimental setup; and (iv) inferring arbitrary reactive behaviors.

Simulation platform

We use the open-source Enki library (Magnenat et al. 2011), which models the kinematics and dynamics of rigid objects, and handles collisions. Enki has a built-in 2D model of the e-puck. The robot is represented as a disk of diameter \(7.0\hbox { cm}\) and mass \(150\hbox { g}\). The inter-wheel distance is \(5.1\hbox { cm}\). The speed of each wheel can be set independently. Enki induces noise on each wheel speed by multiplying the set value by a number in the range (0.95, 1.05) chosen randomly with uniform distribution. The maximum speed of the e-puck is \(12.8{\hbox { cm/s}}\), forwards or backwards. The line-of-sight sensor is simulated by casting a ray from the e-puck’s front and checking the first item with which it intersects (if any). The range of this sensor is unlimited in simulation.

In the object clustering case study, we model objects as disks of diameter \(10\hbox { cm}\) with mass \(35\hbox { g}\) and a coefficient of static friction with the ground of 0.58, which makes it movable by a single e-puck.

The robot’s control cycle is updated every \(0.1\hbox { s}\), and the physics is updated every \(0.01\hbox { s}\).

Simulation setups

In all simulations, we use an unbounded environment. For the aggregation case study, we use groups of 11 individuals—10 agents and 1 replica that executes a model. The initial positions of individuals are generated randomly in a square region of sides \(331.66\hbox { cm}\), following a uniform distribution (average area per individual = \(10000\hbox { cm}^2\)). For the object clustering case study, we use groups of 5 individuals—4 agents and 1 replica that executes a model—and 10 cylindrical objects. The initial positions of individuals and objects are generated randomly in a square region of sides \(100\hbox { cm}\), following a uniform distribution (average area per object = \(1000\hbox { cm}^2\)). In both case studies, individual starting orientations are chosen randomly in \([-\pi ,\pi )\) with uniform distribution.

We performed 30 runs of Turing Learning for each case study. Each run lasted 1000 generations. The model and classifier populations each consisted of 100 solutions (\(\mu = 50\), \(\lambda = 50\)). In each trial, classifiers observed individuals for \(10\hbox { s}\) at \(0.1\hbox { s}\) intervals (100 data points). In both setups, the total number of samples for the agents in each generation was equal to \(n_t \times n_a\), where \(n_t\) is the number of trials performed (one per model) and \(n_a\) is the number of agents in each trial.

Analysis of inferred models

In order to objectively measure the quality of the models obtained through Turing Learning, we define two metrics. Given a candidate model (candidate controller) \({\mathbf {x}}\) and the agent (original controller) \({\mathbf {p}}\), where \({\mathbf {x}}\in {\mathbb {R}}^{2n}\) and \({\mathbf {p}}\in [-1,1]^{2n}\), we define the absolute error (AE) in a particular parameter \(i\in \{1,2,\dots ,2n\}\) as:

$$\begin{aligned} {\mathrm {AE}}_i = \left| x_i-p_i\right| . \end{aligned}$$ (9)

We define the mean absolute error (MAE) over all parameters as:

$$\begin{aligned} {\mathrm {MAE}} = \frac{1}{2n}\sum _{i=1}^{2n} {\mathrm {AE}}_i. \end{aligned}$$ (10)

Fig. 4 Model parameters Turing Learning inferred from swarms of simulated agents performing (a) aggregation and (b) object clustering. Each box corresponds to the models with the highest subjective fitness in the 1000th generation of 30 runs. The dashed black lines correspond to the values of the parameters that the system is expected to learn (i.e., those of the agent) (Color figure online) Full size image

Figure 4 shows a box plotFootnote 4 of the parameters of the inferred models with the highest subjective fitness value in the final generation. It can be seen that Turing Learning identifies the parameters for both behaviors with good accuracy (dashed black lines represent the ground truth, that is, the parameters of the observed swarming agents). In the case of aggregation, the means (standard deviations) of the AEs in the parameters are (from left to right in Fig. 4a): 0.01 (0.01), 0.01 (0.01), 0.07 (0.07), and 0.06 (0.04). In the case of object clustering, these values are as follows: 0.03 (0.03), 0.04 (0.03), 0.02 (0.02), 0.03 (0.03), 0.08 (0.13), and 0.08 (0.09).

Fig. 5 Evolutionary dynamics of model parameters for the (a) aggregation and (b) object clustering case studies. Curves represent median parameter values of the models with the highest subjective fitness across 30 runs of Turing Learning. Dashed black lines indicate the ground truth (Color figure online) Full size image

We also investigate the evolutionary dynamics. Figure 5 shows how the model parameters converge over generations. In the aggregation case study (see Fig. 5a), the parameters corresponding to \(I=0\) are learned first. After around 50 generations, both \(v_{\ell 0}\) and \(v_{r0}\) closely approximate their true values (\(-0.7\) and \(-1.0\)). For \(I=1\), it takes about 200 generations for both \(v_{\ell 1}\) and \(v_{r1}\) to converge. A likely reason for this effect is that an agent spends a larger proportion of its time seeing nothing \((I=0)\) than seeing other agents \((I=1)\)—simulations revealed these percentages to be 91.2 and \(8.8\,\%\) respectively (mean values over 100 trials).

In the object clustering case study (see Fig. 5b), the parameters corresponding to \(I=0\) and \(I=1\) are learned faster than the parameters corresponding to \(I=2\). After about 200 generations, \(v_{\ell 0}\), \(v_{r0}\), \(v_{\ell 1}\), and \(v_{r1}\) start to converge; however, it takes about 400 generations for \(v_{\ell 2}\) and \(v_{r2}\) to approximate their true values. Note that an agent spends the highest proportion of its time seeing nothing \((I=0)\), followed by seeing objects \((I=1)\) and seeing other agents \((I=2)\)—simulations revealed these proportions to be 53.2, 34.2, and \(12.6\,\%\), respectively (mean values over 100 trials).

Fig. 6 a Dispersion of 50 simulated agents (red box) or replicas (blue boxes), executing one of the 30 inferred models in the aggregation case study. b Dispersion of 50 objects when using a swarm of 25 simulated agents (red box) or replicas (blue boxes), executing one of the 30 inferred models in the object clustering case study. In both (a) and (b), boxes show the distributions obtained after \(400\hbox { s}\) over 30 trials. The models are from the 1000th generation. The dashed black lines indicate the minimum dispersion that 50 individuals/objects can possibly achieve (Graham and Sloane 1990). See Sect. 4.3 for details (Color figure online) Full size image

Although the inferred models approximate the agents well in terms of parameters, it is not uncommon in swarm systems that small changes in individual behavior lead to vastly different emergent behaviors, especially when using large numbers of agents (Levi and Kernbach 2010). For this reason, we evaluate the quality of the emergent behaviors that the models give rise to. In the case of aggregation, we measure dispersion of the swarm after some elapsed time as defined in Gauci et al. (2014c).Footnote 5 For each of the 30 models with the highest subjective fitness in the final generation, we performed 30 trials with 50 replicas executing the model. For comparison, we also performed 30 trials using the agent [see Eq. (6)]. The set of initial configurations was the same for the replicas and the agents. Figure 6a shows the dispersion of agents and replicas after \(400\hbox { s}\). All models led to aggregation. We performed a statistical testFootnote 6 on the final dispersion of the individuals between the agents and replicas for each model. There is no statistically significant difference in 30 out of 30 cases (tests with Bonferroni correction).

In the case of object clustering, we use the dispersion of the objects after \(400\hbox { s}\) as a measure of the emergent behavior. We performed 30 trials with 25 individuals and 50 objects for the agent and each model. The results are shown in Fig. 6b. In the final dispersion of objects by the agent or any of the models (replicas), there is no statistically significant difference in 26 out of 30 cases (tests with Bonferroni correction).

Analysis of generated classifiers

The primary outcome of the Turing Learning method (and of any system identification method) is the model, which has been discussed in the previous section. However, the generated classifiers can also be considered as a useful by-product. For instance they could be used to detect abnormal agents in a swarm. We now analyze the performance of the classifiers. For the remainder of this paper, we consider only the aggregation case study.

To assess the performance of the classifiers, we measure the percentage of correct judgments over agents and a wide range of models. The models are uniformly distributed over the four-dimensional parameter space of the agents [see Eqs. (5) and (6)]; each element takes a value in \([-1,1]\). To keep the analysis of classifiers within a reasonable computation time, we discretize this space using 11 settings per parameter; thus, each parameter is chosen from the set \(\{-1.0, -0.8, \ldots , 0.8, 1.0\}\). This discretized space is a grid consisting of \(|{\mathcal {X}}|=11^4=14641\) points (i.e., models). The model is executed by a replica mixed into a group of 10 agents (as in the Turing Learning runs). The motion data are fed to each classifier, which makes 10 judgments per individual. If the classifier consistently judges the individual as a model (i.e., not an agent) in 10 out of 10 trials, it outputs a “model” decision. Otherwise, it outputs “agent”. This conservative approach is used to minimize the risk of false-positive detection of abnormal behavior. The classifier’s performance (i.e., decision accuracy) is computed by combining the percentage of correct judgments about models \((50\,\%\hbox { weight})\) with the percentage of correct judgments about agents \((50\,\%\hbox { weight})\), analogous to the solution quality definition in Eq. (2).

Fig. 7 Average decision accuracy of the best classifiers over 1000 generations (nonlinear scale) in 30 runs of Turing Learning. The error bars show standard deviations. See text for details Full size image

We performed 10 trials using a set of initial configurations common to all classifiers. Figure 7 shows the average decision accuracy of the classifier with the highest subjective fitness during the evolution (best classifier (subjective)) in 30 runs of Turing Learning. The accuracy of the classifier increases in the first 5 generations, then drops, and fluctuates within range 62–\(80\,\%\). For a comparison, we also plot the highest decision accuracy that a single classifier achieved during the post-evaluation for each generation. This classifier is referred to best classifier (objective). Interestingly, the accuracy of the best classifier (objective) increases almost monotonically, reaching a level above \(95\,\%\). To select the best classifier (objective), all the classifiers were post-evaluated using the aforementioned 14641 models.

At first sight, it seems counterintuitive that the best classifier (subjective) has a low decision accuracy. This phenomenon, however, can be explained when considering the model population. We have shown in the previous section (see Fig. 5a) that the models converge rapidly at the beginning of the coevolutions. As a result, when classifiers are evaluated in later generations, the trials are likely to include models very similar to each other. Classifiers that become overspecialized to this small set of models (the ones dominating the later generations) have a higher chance of being selected during the evolutionary process. These classifiers may however have a low performance when evaluated across the entire model space.

Note that our analysis does not exclude the potential existence of models for which the performance of the classifiers degenerates substantially. As reported by Nguyen et al. (2015), well-trained classifiers, which in their case are represented by deep neural networks, can be easily fooled. For instance, the classifiers may label a random-looking image as a guitar with high confidence. However, in this degenerate case, the image was obtained through evolutionary learning, while the classifiers remained static. By contrast, in Turing Learning, the classifiers are coevolving with the models, and hence have the opportunity to adapt to such a situation.

A metric-based system identification method: mathematical analysis and comparison with Turing Learning

In order to compare Turing Learning against a metric-based method, we employ the commonly used least-square approach. The objective is to minimize the differences between the observed outputs of the agents and of the models, respectively. Two outputs are considered—an individual’s linear and angular speed. Both outputs are considered over the whole duration of a trial. Formally,

$$\begin{aligned} {e_{m} = \sum \limits _{i=1}^{n_a} \sum \limits _{t=1}^{T} \left\{ \left( s_m^{(t)} - s_i^{(t)}\right) ^2 + \left( \omega _m^{(t)} - \omega _i^{(t)}\right) ^2 \right\} ,} \end{aligned}$$ (11)

where \(s_m^{(t)}\) and \(s_i^{(t)}\) are the linear speed of the model and of agent i, respectively, at time step t; \(\omega _m^{(t)}\) and \(\omega _i^{(t)}\) are the angular speed of the model and of the agent i, respectively, at time step t; \(n_a\) is the number of agents in the group; T is the total number of time steps in a trial.

Mathematical analysis

We begin our analysis by first analyzing an abstract version of the problem.

Theorem 1

Consider two binary random variables X and Y. Variable X takes value \(x_1\) with probability p, and value \(x_2\), otherwise. Variable Y takes value \(y_1\) with the same probability p, and value \(y_2\), otherwise. Variables X and Y are assumed to be independent of each other. Assuming \(y_1\) and \(y_2\) are given, then the metric \(D={\mathbb {E}}\{(X-Y)^2\}\) has a global minimum at \(X^*\) with \(x_1^*=x_2^*={\mathbb {E}}\{Y\}\). If \(p\in (0,1)\), the solution is unique.

Proof

The probability of (i) both \(x_1\) and \(y_1\) being observed is \(p^2\); (ii) both \(x_1\) and \(y_2\) being observed is \(p (1 - p)\); (iii) both \(x_2\) and \(y_1\) being observed is \((1 - p) p\); (iv) both \(x_2\) and \(y_2\) being observed is \((1-p)^2\). The expected error value, D, is then given as

$$\begin{aligned} D = p^2\left( x_1 - y_1\right) ^2 + p (1 - p)\left( x_1 - y_2\right) ^2 + (1-p) p \left( x_2 - y_1\right) ^2 + (1-p)^2\left( x_2 - y_2\right) ^2. \end{aligned}$$ (12)

To find the minimum expected error value, we set the partial derivatives w.r.t. \(x_1\) and \(x_2\) to 0. For \(x_1\), we have:

$$\begin{aligned} \frac{\partial D}{\partial x_1} = 2p^{2}\left( x_1 - y_1\right) + 2 p (1 - p) \left( x_1 - y_2\right) = 0, \end{aligned}$$ (13)

from which we obtain \(x^*_1 = p y_1 + (1-p) y_2={\mathbb {E}}\{Y\}\). Similarly, setting \(\frac{\partial D}{\partial x_2} = 0\), we obtain \(x^*_2 = p y_1 + (1-p) y_2={\mathbb {E}}\{Y\}\). Note that at these values of \(x_1\) and \(x_2\), the second-order partial derivatives are both positive [assuming \(p\in (0,1)\)]. Therefore, the (global) minimum of D is at this stationary point. \(\square \)

Corollary 1

If \(p\in (0,1)\) and \(y_1

e y_2\), then \(X^*

e Y\).

Proof

As \(p\in (0,1)\), the only global minimum exists at \(X^*\). As \(x^*_1 = x^*_2\) and \(y_1

e y_2\), it follows that \(X^*

e Y\). \(\square \)

Corollary 2

Consider two discrete random variables X and Y with values \(x_1\), \(x_2, \ldots , x_n\), and \(y_1\), \(y_2, \ldots , y_n\), respectively, \(n>1\). Variable X takes value \(x_i\) with probability \(p_i\) and variable Y takes value \(y_i\) with the same probability \(p_i\), \(i=1,2,\dots ,n\), where \(\sum \limits _{i=1}^{n} p_i = 1\) and \(\exists i,j:y_i

e y_j\). Variables X and Y are assumed to be independent of each other. Metric D has a global minimum at \(X^*

e Y\) with \(x_1^*= x_2^* = \ldots = x_n^*={\mathbb {E}}\{Y\}\). If all \(p_i\in (0,1)\), then \(X^*\) is unique.

Proof

This proof, which is omitted here, can be obtained by examining the first and second derivatives of a generalized version of Eq. (12). Rather than four \((2^2)\) cases, there are \(n^2\) cases to be considered. \(\square \)

Corollary 3

Consider a sequence of pairs of binary random variables (\(X_t\), \(Y_t\)), \(t=1,\ldots ,T\). Variable \(X_t\) takes value \(x_1\) with probability \(p_t\), and value \(x_2\), otherwise. Variable \(Y_t\) takes value \(y_1\) with the same probability \(p_t\), and value \(y_2

e y_1\), otherwise. For all t, variables \(X_t\) and \(Y_t\) are assumed to be independent of each other. If all \(p_t\in (0,1)\), then the metric \(D={\mathbb {E}}\left\{ \sum _{t=1}^T(X_t-Y_t)^2\right\} \) has one global minimum at \(X^*

e Y\).

Proof

The case \(T=1\) has already been considered (Theorem 1 and Corollary 1). For the case \(T=2\), we extend Eq. (13) to take into account \(p_1\) and \(p_2\), and obtain

$$\begin{aligned} x_1\left( p_1^2+p_1-p_1^2+p_2^2+p_2-p_2^2\right) = y_1\left( p_1^2+p_2^2\right) +y_2\left( p_1-p_1^2+p_2-p_2^2\right) . \end{aligned}$$ (14)

This can be rewritten as:

$$\begin{aligned} x_1=\frac{p_1^2+p_2^2}{p_1+p_2}y_1+\frac{p_1(1-p_1)+p_2(1-p_2)}{p_1+p_2}y_2. \end{aligned}$$ (15)

As \(y_2

e y_1\), \(x_1\) can only be equal to \(y_1\) if \(p_1^2+p_2^2 = p_1 + p_2\), which is equivalent to \(p_1(1-p_1)+p_2(1-p_2) = 0\). This is however not possible for any \(p_1,p_2\in (0,1)\). Therefore, \(X^*

e Y\).Footnote 7

For the general case, \(T\ge 1\), the following equation can be obtained (proof omitted).

$$\begin{aligned} x_1=\frac{\sum _{t=1}^{T}p_t^2}{\sum _{t=1}^{T}p_t}y_1+\frac{\sum _{t=1}^{T}p_t(1-p_t)}{\sum _{t=1}^{T}p_t}y_2. \end{aligned}$$ (16)

The same argument applies—\(x^*_1\) cannot be equal to \(y_1\). Therefore, \(X^*

e Y\).\(\square \)

Implications for our scenario: The metric-based approach considered in this paper is unable to infer the correct behavior of the agent. In particular, the model that is globally optimal w.r.t. the expected value for the error function defined by Eq. (11) is different from the agent. This observation follows from Corollary 1 for the aggregation case study (two sensor states), and from Corollary 2 for the object clustering case study (three sensor states). It exploits the fact that the error function is of the same structure as the metric in Corollary 3—a sum of square error terms. The summation over time is not of concern—as was shown in Corollary 3, the distributions of sensor reading values (inputs) of the agent and of the model do not need to be stationary. However, we need to assume that for any control cycle, the actual inputs of agents and models are not correlated with each other. Note that the sum in Eq. (11) comprises two square error terms: one for the linear speed of the agent, and the other for the angular speed. As our simulated agents employ a differential drive with unconstrained motor speeds, the linear and angular speeds are decoupled. In other words, the linear and angular speeds can be chosen independently of each other, and optimized separately. This means that Eq. (11) can be thought of as two separate error functions: one pertaining to the linear speeds, and the other to the angular speeds.

Fig. 8 Model parameters a metric-based evolutionary method inferred from swarms of simulated agents performing (a) aggregation and (b) object clustering. Each box corresponds to the models with the highest fitness in the 1000th generation of 30 runs. The dashed black lines correspond to the values of the parameters that the system is expected to learn (i.e., those of the agent) (Color figure online) Full size image

Comparison with Turing Learning

To verify whether the theoretical result (and its assumptions) holds in practice, we used an evolutionary algorithm with a single population of models. The algorithm was identical to the model optimization sub-algorithm in Turing Learning except for the fitness calculation, where the metric of Eq. (11) was employed. We performed 30 evolutionary runs for each case study. Each evolutionary run lasted 1000 generations. The simulation setup and number of fitness evaluations for the models were kept the same as in Turing Learning.

Figure 8a shows the parameter distribution of the evolved models with highest fitness in the last generation over 30 runs. The distributions of the evolved parameters corresponding to \(I=0\) and \(I=1\) are similar. This phenomenon can be explained as follows. In the identification problem that we consider, the method has no knowledge of the input, that is, whether the agent perceives another agent \((I=1)\) or not \((I=0)\). This is consistent with Turing Learning as the classifiers that are used to optimize the models also do not have any knowledge of the inputs. The metric-based algorithm seems to construct controllers that do not respond differently to either input, but work as good as it gets on average, that is, for the particular distribution of inputs, 0 and 1. For the left wheel speed, both parameters are approximately \(-0.54\). This is almost identical to the weighted mean \((-0.7*0.912 + 1.0*0.088=-0.5504)\), which takes into account that parameter \(v_{\ell 0}=-0.7\) is observed around 91.2 % of the time, whereas parameter \(v_{\ell 1}=1\) is observed around 8.8 % of the time (see also Sect. 4.3). The parameters related to \(I=1\) evolved well as the agent’s parameters are identical regardless of the input \((v_{r0}=v_{r1}=-1.0)\). For both \(I=0\) and \(I=1\), the evolved parameters show good agreement with Theorem 1. As the model and the agents are only observed for \(10\hbox { s}\) in the simulation trials, the probabilities of seeing a 0 or a 1 are nearly constant throughout the trial. Hence, this scenario approximates very well the conditions of Theorem 1, and the effects of non-stationary probabilities on the optimal point (Corollary 3) are minimal. Similar results were found when inferring the object clustering behavior (see Fig. 8b).

By comparing Figs. 4 and 8, one can see that Turing Learning outperforms the metric-based evolutionary algorithm in terms of model accuracy in both case studies. As argued before, due to the unpredictable interactions in swarms, the traditional metric-based method is not suited for inferring the behaviors.

Generality of Turing Learning

In the following, we present four orthogonal studies testing the generality of Turing Learning. The experimental setup in each section is identical to that described previously (see Sect. 4.2), except for the modifications discussed within each section.

Simultaneously inferring control and morphology

In the previous sections, we assumed that we fully knew the agents’ morphology, and only their behavior (controller) was to be identified. We now present a variation where one aspect of the morphology is also unknown. The replica, in addition to the four controller parameters, takes a parameter \(\theta \in \left[ 0,2\pi \right] \hbox {rad}\), which determines the horizontal field of view of its sensor, as shown in Fig. 9 (the sensor is still binary). Note that the agents’ line-of-sight sensors of the previous sections can be considered as sensors with a field of view of \(0\hbox { rad}\).

Fig. 9 A diagram showing the angle of view of the agent’s sensor investigated in Sect. 4.6.1 Full size image

The models now have five parameters. As before, we let Turing Learning run in an unbounded search space (i.e., now, \({\mathbb {R}}^5\)). However, as \(\theta \) is necessarily bounded, before a model is executed on a replica, the parameter corresponding to \(\theta \) is mapped to the range \((0, 2\pi )\) using an appropriately scaled logistic sigmoid function. The controller parameters are directly passed to the replica. In this setup, the classifiers observe the individuals for \(100\hbox { s}\) in each trial (preliminary results indicated that this setup required a longer observation time).

Fig. 10 Turing Learning simultaneously inferring control and morphological parameters (field of view). The agents’ field of view is (a) \(0\hbox { rad}\) and (b) \(\pi /3\hbox { rad}\). Boxes show distributions for the models with the highest subjective fitness in the 1000th generation over 30 runs. Dashed black lines indicate the ground truth Full size image

Figure 10a shows the parameters of the subjectively best models in the last (1000th) generations of 30 runs. The means (standard deviations) of the AEs in each model parameter are as follows: 0.02 (0.01), 0.02 (0.02), 0.05 (0.07), 0.06 (0.06), and 0.01 (0.01). All parameters including \(\theta \) are still learned with high accuracy.

The case where the true value of \(\theta \) is \(0\hbox { rad}\) is an edge case, because given an arbitrarily small \(\epsilon >0\), the logistic sigmoid function maps an unbounded domain of values onto \((0,\epsilon )\). This makes it simpler for Turing Learning to infer this parameter. For this reason, we also consider another scenario where the agents’ angle of view is \(\pi /3\hbox { rad}\) rather than \(0\hbox { rad}\). The controller parameters for achieving aggregation in this case are different from those in Eq. (6). They were found by rerunning a grid search with the modified sensor. Figure 10b shows the results from 30 runs with this setup. The means (standard deviations) of the AEs in each parameter are as follows: 0.04 (0.04), 0.03 (0.03), 0.05 (0.06), 0.05 (0.05), and 0.20 (0.19). The controller parameters are still learned with good accuracy. The accuracy in the angle of view is noticeably lower, but still reasonable.

Inferring behavior without assuming a known control system structure

In the previous sections, we assumed the agent’s control system structure to be known and only inferred its parameters. To further investigate the generality of Turing Learning, we now represent the model in a more general form, namely a (recurrent) Elman neural network (Elman 1990). The network inputs and outputs are identical to those used for our reactive models. In other words, the Elman network has one input (I) and two outputs representing the left and right wheel speed of the robot. A bias is connected to the input and hidden layers of the network, respectively. We consider three network structures with one, three, and five hidden neurons, which correspond, respectively, to 7, 23, and 47 weights to be optimized. Except for a different number of parameters to be optimized, the experimental setup is equivalent in all aspects to that of Sect. 4.2.

Fig. 11 Turing Learning can infer an agent’s behavior without assuming its control system structure to be known. These plots show the steady-state outputs (in the 20th time step) of the inferred neural networks with the highest subjective fitness in the 1000th generation of 30 simulation runs. Two outliers in (c) are not shown. a One hidden neuron, b Three hidden neurons, c Five hidden neurons Full size image

Fig. 12 Dynamic outputs of the inferred neural network with median performance. The network’s input in each case was \(I=0\) (time steps 1–10), \(I=1\) (time steps 11–20), and \(I = 0\) (time steps 21–30). See text for details. a One hidden neuron, b Three hidden neurons, c Five hidden neurons Full size image

We first analyze the steady-state behavior of the inferred network models. To obtain their steady-state outputs, we fed them with a constant input (\(I=0\) or \(I=1\) depending on the parameters) for 20 time steps. Figure 11 shows the outputs in the final time step of the inferred models with the highest subjective fitness in the last generation in 30 runs for the three cases. In all cases, the parameters of the swarming agent can be inferred correctly with reasonable accuracy. More hidden neurons lead to worse results, probably due to the larger search space.

We now analyze the dynamic behavior of the inferred network models. Figure 12 shows the dynamic output of 1 of the 30 neural networks. The chosen neural network is the one exhibiting the median performance according to metric \(\sum

olimits _{i=1}^{4} \sum

olimits _{t=1}^{20} (o_{it} - p_{i})^2\), where \(p_i\) denotes the \(i\hbox {th}\) parameter in Eq. (6), and \(o_{it}\) denotes the output of the neural network in the \(t\hbox {th}\) time step corresponding to the \(i\hbox {th}\) parameter in Eq. (6). The inferred networks react to the inputs rapidly and maintain a steady-state output (with little disturbance). The results show that Turing Learning can infer the behavior without assuming the agent’s control system structure to be known.

Separating the replicas and the agents

In our two case studies, the replica was mixed into a group of agents. In the context of animal behavior studies, a robot replica may be introduced into a group of animals and recognized as a conspecific (Halloy et al. 2007; Faria et al. 2010). However, if behaving abnormally, the replica may disrupt the behavior of the swarm (Bjerknes and Winfield 2013). For the same reason, the insertion of a replica that exhibits different behavior or is not recognized as conspecific may disrupt the behavior of the swarm and hence the models obtained may be biased. In this case, an alternative method would be to isolate the influence of the replica(s). We performed an additional simulation study where agents and replicas were never mixed. Instead, each trial focused on either a group of agents, or of replicas. All replicas in a trial executed the same model. The group size was identical in both cases. The tracking data of the agents and the replicas from each sample were then fed into the classifiers for making judgments.

Fig. 13 Model parameters inferred by a variant of Turing Learning that observes swarms of aggregating agents and swarms of replicas in isolation, thereby avoiding potential bias. Each box corresponds to the models with the highest subjective fitness in the 1000th generation of 30 simulation runs Full size image

The distribution of the inferred model parameters is shown in Fig. 13. The results show that Turing Learning can still identify the model parameters well. There is no significant difference between either approach in the case studies considered in this paper. The method of separating replicas and agents is recommended if potential biases are suspected.

Inferring other reactive behaviors

The aggregation controller that agents used in our case study was originally synthesized by searching over the parameter space defined in Eq. (5) with \(n=2\), using a metric to assess the swarm’s global performance (Gauci et al. 2014c). Each of these points produces a global behavior. Some of these behaviors are particularly interesting, such as the circle formation behavior reported in Gauci et al. (2014a).

We now investigate whether Turing Learning can infer arbitrary controllers in this space, irrespective of the global behaviors they lead to. We generated 1000 controllers randomly in the parameter space defined in Eq. (5), with uniform distribution. For each controller, we performed one run, and selected the subjectively best model in the last (1000th) generation.

Fig. 14 Turing Learning inferring the models for 1000 randomly generated agent behaviors. For each behavior, one run of Turing Learning was performed and the model with the highest subjective fitness after 1000 generations was considered. a Histogram of the models’ MAE [(defined in Eq. (10); 43 points that have an MAE larger than 0.1 are not shown]; and b AEs [defined in Eq. (9)] for each model parameter Full size image

Figure 14a shows a histogram of the MAE of the inferred models. The distribution has a single mode close to zero and decays rapidly for increasing values. Over \(89\,\%\) of the 1000 cases have an error below 0.05. This suggests that the accuracy of Turing Learning is not highly sensitive to the particular behavior under investigation (i.e., most behaviors are learned equally well). Figure 14b shows the AEs of each model parameter. The means (standard deviations) of the AEs in each parameter are as follows: 0.01 (0.05), 0.02 (0.07), 0.07 (0.6), and 0.05 (0.2). We performed a statistical test on the AEs between the model parameters corresponding to \(I=0\) (\(v_{\ell 0}\) and \(v_{r0}\)) and \(I=1\) (\(v_{\ell 1}\) and \(v_{r1}\)). The AEs of the inferred \(v_{\ell 0}\) and \(v_{r0}\) are significantly lower than those of \(v_{\ell 1}\) and \(v_{r1}\). This is likely due to the reason reported in Sect. 4.3, that is, an agent is likely to spend more time seeing nothing \((I=0)\) than seeing other agents \((I=1)\) in each trial.