The cost of an empirical bit in biophysics has fallen dramatically, and high-precision data are now abundant. However, biological systems are notoriously complex, multiscale, and inhomogeneous, so that we often lack intuition for transforming such measurements into theoretical frameworks. Modern machine learning can be used as an aid. Here we apply our Sir Isaac platform for automatic inference of a model of the escape response behavior in a roundworm directly from time series data. The automatically constructed model is more accurate than that curated manually, is biophysically interpretable, and makes nontrivial predictions about the system.

The roundworm Caenorhabditis elegans exhibits robust escape behavior in response to rapidly rising temperature. The behavior lasts for a few seconds, shows history dependence, involves both sensory and motor systems, and is too complicated to model mechanistically using currently available knowledge. Instead we model the process phenomenologically, and we use the Sir Isaac dynamical inference platform to infer the model in a fully automated fashion directly from experimental data. The inferred model requires incorporation of an unobserved dynamical variable and is biologically interpretable. The model makes accurate predictions about the dynamics of the worm behavior, and it can be used to characterize the functional logic of the dynamical system underlying the escape response. This work illustrates the power of modern artificial intelligence to aid in discovery of accurate and interpretable models of complex natural systems.

The quantitative biology revolution of recent decades has resulted in an unprecedented ability to measure dynamics of complex biological systems in response to perturbations with the accuracy previously reserved for inanimate, physical systems. For example, the entire escape behavior of a roundworm Caenorhabditis elegans in response to a noxious temperature stimulus can be measured for many seconds in hundreds of worms (1, 2). At the same time, theoretical understanding of such living dynamical systems has lagged behind, largely because, in the absence of symmetries, averaging, and small parameters to guide our intuition, building mathematical models of such complex biological processes has remained a very delicate art. Recent years have shown the emergence of automated modeling approaches, which use modern machine-learning methods to automatically infer the dynamical laws underlying a studied experimental system and predict its future dynamics (3⇓⇓⇓⇓⇓⇓⇓⇓⇓⇓–14). However, arguably, these methods have not yet been applied to any real experimental data with dynamics of a priori unknown structure to produce interpretable dynamical representations of the system. Thus, their ability to build not just statistical but physical models of data (15) which are interpretable by humans, answer interesting scientific questions, and guide future discovery remains unclear.

Here we apply the Sir Isaac platform for automated inference of dynamical equations underlying time series data to infer a model of the C. elegans escape response, averaged over a population of worms. We show that Sir Isaac is able not only to fit the observed data, but also to make predictions about the worm dynamics that extend beyond the data used for training. The inferred optimal model has an easily interpretable form, with the identified interactions and the inferred latent dynamical variable connecting naturally to known mechanisms of C. elegans sensorimotor control. And by analyzing the dynamical structure of the model—number of dynamic variables, number of attractors (distinct behaviors), etc.—we can generalize these results across many biophysical systems.

Results

Automated Dynamical Inference. Sir Isaac (7, 8) is one of the new generation of machine-learning algorithms able to infer a dynamical model of time series data, with the model expressed in terms of a system of differential equations. Compared with other approaches, Sir Isaac is able to infer dynamics (at least for synthetic test systems) that are (i) relatively low dimensional, (ii) have unobserved (hidden or latent) variables, (iii) have arbitrary nonlinearities, (iv) rely only on noisy measurements of the system’s state variables and not of the rate of change of these variables, and (v) are expressed in terms of an interpretable system of coupled differential equations. Briefly, the algorithm sets up a complete and nested hierarchy of nonlinear dynamical models. Nestedness means that each next model in the hierarchy is more complex (in the sense of having a larger explanatory power) (16⇓–18) than the previous one and includes it as a special case. Completeness means that any sufficiently general dynamics can be approximated arbitrarily well by some model within the hierarchy. That is, the only restriction on the dynamics is that they are continuous and do not have infinite rates of change. Two such hierarchies have been developed, one based on S systems (19) and the other on sigmoidal networks (20). Both progressively add hidden dynamical variables to the model and then couple them to the previously introduced variables using nonlinear interactions of specific forms. Sir Isaac then uses a semianalytical formulation of Bayesian model selection (8, 18, 21, 22) to choose the model in the hierarchy that best balances the quality of fit vs. overfitting and is, therefore, expected to produce the best generalization. The sigmoidal network hierarchy is especially well suited to modeling biological systems, where rates of change of variables usually saturate over some scale, and it is the sole focus of our study.

Experimental Model System. Nociception evokes a rapid escape behavior designed to protect the animal from potential harm (23, 24). C. elegans, a small nematode with a simple nervous system, is a classic model organism used in the study of nociception. A variety of studies have used C. elegans to elucidate genes and neurons mediating nociception to a variety of aversive stimuli including high osmolarity and mechanical, chemical, and thermal stimuli (25⇓⇓–28). However, a complete dynamic understanding of the escape response at the neuronal, as well as the molecular, level is not fully known. Recent studies have quantified the behavioral escape response of the worm when thermally stimulated with laser heating (1, 2), and these data are the focus of our study. The response is dynamic: When the stimulus is applied to the animal’s head, it quickly withdraws, briefly accelerating backward, and eventually returns to forward motion, usually in a different direction. Various features of this response change with the level of laser heating, such as the length of time moving in the reverse direction and the maximum speed attained.

Fits and Predictions. We use the worm center-of-mass speed, v, as the variable whose dynamics need to be explained in response to the laser heating pulse. We define v > 0 as the worm crawling forward and v < 0 as the worm retreating backward. The input to the model is the underlying temperature, h ( t ) , which can be approximated as h ( t ) = I h 0 ( t ) , where I is the experimentally controlled laser current, and h 0 is the temperature template, described in Materials and Methods. Based on trajectories of 201 worms in response to laser currents ranging between 9.6 mA and 177.4 mA, we let Sir Isaac determine the most likely dynamical system explaining these data within the sigmoidal networks model class (8) (see Materials and Methods for a detailed description of the modeling and inference). The inferred model has a latent (unobserved) dynamical variable, hereafter referred to as x 2 , in addition to the speed. v and x 2 are coupled by nonlinear interactions. However, some of these nonlinear interactions may be insignificant and may be present simply because the nested hierarchy introduces them before some other interaction terms that are necessary to explain the data. Thus, we reduce the model by setting parameters that are small to zero one by one and in various combinations, refitting such reduced models, and using Bayesian model selection to choose between the reduced model and the original Sir Isaac inferred model. The resulting model is d v d t = − v τ 1 + V 1 h ( t ) + W 11 1 + e θ 1 + v / v s + W 12 1 + e x 2 , [1] d x 2 d t = − x 2 τ 2 + V 2 h ( t ) , x 2 ( t = 0 ) = 0 . [2]The form of each term is set a priori by the choice of the sigmoidal model class, and we choose v s as a relevant velocity scale. Everything else about the model is inferred from the data using the Bayesian model selection procedure, including the number of terms, the number of hidden dynamical variables, and values of all other constants [see Materials and Methods for inferred values of τ 1 , θ 1 , V 1 , V 2 , W 11 , W 12 , and v ( t = 0 ) ; the model uses default values for τ 2 = 1.0 s and x 2 ( t = 0 ) = 0 ]. Interestingly, the inferred model reveals that the latent dynamical variable x 2 is a linear low-pass filtered (integrated) version of the heat signal. The fits produced by this model are compared with data in Fig. 1A, showing an excellent agreement (see Materials and Methods for quantification of the quality of fits). Surprisingly, the quality of the fit for this automatically generated model, with very little human input, is better than that of a state-of-the-art manually constructed probabilistic, nondynamical model (2): Only about 10% of explainable variance in the data remains unexplained by the model for times between 100 ms and 2 s after the stimulus, compared with about 20% for the manual model (Materials and Methods). Fig. 1. The escape response behavior is fitted and predicted well by the inferred model. (A) Colored lines and shaded bands represent the empirical mean and the SD of the mean, respectively, of the escape response velocity for five groups of worms stimulated with laser currents in different ranges (38, 42, 39, 41, and 41 subjects in each group). Dashed lines and bands of the corresponding color show means and SDs of the mean (Materials and Methods) of fits to these empirical data by the chosen model. Only the velocity in the range of time [ − 1,2.25 ] s relative to the time of the onset of the laser stimulus was used for fitting. (B and C) While most worms are still moving backward during this range of time, the inferred model predicts, without any additional free parameters, the time at which a worm’s speed again becomes positive both as a function of (B) applied laser current and (C) peak observed worm reverse speed. These predictions (red curves) agree with the binned averages (orange circles) of individual worms’ behavior (blue circles). Bins correspond to eight quantiles each with 25 or 26 trials, and error bars are the SEM. However, the quality of the fit may not be surprising in itself since the Sir Isaac model hierarchy can fit any dynamics using sufficient data. A utility of a mathematical model is in its ability to make predictions about data that were not used in fitting. Thus, we use the inferred model to predict when the worm will return to forward motion. This usually happens at times well after t = 2.25 s, and only data for smaller times were used for fitting. Fig. 1 B and C compares these predictions with experiments, showing very good predictions. Such ability to extrapolate beyond the training range is usually an indication that the model captures the underlying physics and is not purely statistical (15), giving us confidence in using the model for inferences about the worm.