In this section, we introduce the operation of an RNN and its connection to the dynamics of waves. An RNN converts a sequence of inputs into a sequence of outputs by applying the same basic operation to each member of the input sequence in a step-by-step process (Fig. 1A). Memory of previous time steps is encoded into the RNN's hidden state, which is updated at each step. The hidden state allows the RNN to retain memory of past information and to learn temporal structure and long-range dependencies in data (15, 16). At a given time step, $t$, the RNN operates on the current input vector in the sequence, $x_t$, and the hidden state vector from the previous step, $h_{t-1}$, to produce an output vector, $y_t$, as well as an updated hidden state, $h_t$.

Fig. 1 Conceptual comparison of a standard RNN and a wave-based physical system. (A) Diagram of an RNN cell operating on a discrete input sequence and producing a discrete output sequence. (B) Internal components of the RNN cell, consisting of the trainable dense matrices $W^{(h)}$, $W^{(x)}$, and $W^{(y)}$. Activation functions for the hidden state and output are represented by $\sigma^{(h)}$ and $\sigma^{(y)}$, respectively. (C) Diagram of the directed graph of the RNN cell. (D) Diagram of a recurrent representation of a continuous physical system operating on a continuous input sequence and producing a continuous output sequence. (E) Internal components of the recurrence relation for the wave equation when discretized using finite differences. (F) Diagram of the directed graph of discrete time steps of the continuous physical system and illustration of how a wave disturbance propagates within the domain.

While many variations of RNNs exist, a common implementation (17) is described by the following update equations

$$h_t = \sigma^{(h)}\left( W^{(h)} \cdot h_{t-1} + W^{(x)} \cdot x_t \right) \tag{1}$$

$$y_t = \sigma^{(y)}\left( W^{(y)} \cdot h_t \right) \tag{2}$$

which are diagrammed in Fig. 1B. The dense matrices $W^{(h)}$, $W^{(x)}$, and $W^{(y)}$ are optimized during training, while $\sigma^{(h)}(\cdot)$ and $\sigma^{(y)}(\cdot)$ are nonlinear activation functions. The operation prescribed by Eqs. 1 and 2, when applied to each element of an input sequence, can be described by the directed graph shown in Fig. 1C.
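
As a concrete illustration of Eqs. 1 and 2, the following is a minimal sketch of a single RNN cell applied over a sequence. The dimensions, random weights, and the choice of tanh and abs for $\sigma^{(h)}(\cdot)$ and $\sigma^{(y)}(\cdot)$ are illustrative assumptions, not details from the text.

```python
import numpy as np

def rnn_step(h_prev, x_t, W_h, W_x, W_y):
    """One step of the RNN defined by Eqs. 1 and 2.

    tanh and abs stand in for the activations sigma^(h) and sigma^(y);
    these are illustrative choices, not prescribed by the text.
    """
    h_t = np.tanh(W_h @ h_prev + W_x @ x_t)  # Eq. 1: update the hidden state
    y_t = np.abs(W_y @ h_t)                  # Eq. 2: produce the output
    return h_t, y_t

# Apply the same cell to every element of an input sequence (Fig. 1A).
rng = np.random.default_rng(0)
n_h, n_x, n_y, T = 16, 4, 3, 100
W_h = rng.normal(size=(n_h, n_h)) / np.sqrt(n_h)
W_x = rng.normal(size=(n_h, n_x))
W_y = rng.normal(size=(n_y, n_h))
h = np.zeros(n_h)
for x_t in rng.normal(size=(T, n_x)):
    h, y = rnn_step(h, x_t, W_h, W_x, W_y)
```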

We now discuss the connection between the dynamics in the RNN as described by Eqs. 1 and 2 and the dynamics of a wave-based physical system. As an illustration, the dynamics of a scalar wave field distribution $u(x, y, z)$ are governed by the second-order partial differential equation (Fig. 1D)

$$\frac{\partial^2 u}{\partial t^2} - c^2 \cdot \nabla^2 u = f \tag{3}$$

where $\nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}$ is the Laplacian operator, $c = c(x, y, z)$ is the spatial distribution of the wave speed, and $f = f(x, y, z, t)$ is a source term. A finite difference discretization of Eq. 3, with a temporal step size of $\Delta t$, results in the recurrence relation

$$\frac{u_{t+1} - 2 u_t + u_{t-1}}{\Delta t^2} - c^2 \cdot \nabla^2 u_t = f_t \tag{4}$$
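
To make the recurrence of Eq. 4 concrete, the sketch below advances a two-dimensional scalar field by one time step using a five-point Laplacian stencil. The zero (fixed) boundaries and the absence of damping are simplifying assumptions; they are not the absorbing boundary scheme used later in the paper.

```python
import numpy as np

def laplacian(u, dx):
    """Five-point finite-difference Laplacian; boundary cells are held at zero."""
    lap = np.zeros_like(u)
    lap[1:-1, 1:-1] = (u[2:, 1:-1] + u[:-2, 1:-1] + u[1:-1, 2:]
                       + u[1:-1, :-2] - 4.0 * u[1:-1, 1:-1]) / dx**2
    return lap

def wave_step(u_t, u_tm1, c, f_t, dt, dx):
    """Solve Eq. 4 for the next field:
    u_{t+1} = 2 u_t - u_{t-1} + dt^2 * (c^2 * laplacian(u_t) + f_t)."""
    return 2.0 * u_t - u_tm1 + dt**2 * (c**2 * laplacian(u_t, dx) + f_t)
```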

Here, the subscript, $t$, indicates the value of the scalar field at a fixed time step. The wave system's hidden state is defined as the concatenation of the field distributions at the current and immediately preceding time steps, $h_t \equiv [u_t, u_{t-1}]^T$, where $u_t$ and $u_{t-1}$ are vectors given by the flattened fields represented on a discretized grid over the spatial domain. The update of the wave equation may then be written as

$$h_t = A(h_{t-1}) \cdot h_{t-1} + P^{(i)} \cdot x_t \tag{5}$$

$$y_t = \left( P^{(o)} \cdot h_t \right)^2 \tag{6}$$

where the sparse matrix $A$ describes the update of the wave fields $u_t$ and $u_{t-1}$ without a source (Fig. 1E). The derivation of Eqs. 5 and 6 is given in section S1.
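
The same step can be phrased in the concatenated-state form of Eqs. 5 and 6. Rather than assembling the sparse matrices explicitly, this sketch (reusing wave_step from above) applies source and probe masks directly; the masks play the role of the fixed $P^{(i)}$ and $P^{(o)}$, and their construction is our own placeholder.

```python
def hidden_step(h, x_t, c, src_mask, prb_mask, dt, dx):
    """Eq. 5: advance the hidden state h = (u_t, u_{t-1});
    Eq. 6: read out the squared field (intensity) at the probe cells."""
    u_t, u_tm1 = h
    f_t = src_mask * x_t                            # P^(i) x_t: inject the scalar input
    u_tp1 = wave_step(u_t, u_tm1, c, f_t, dt, dx)   # sparse A applied to the state
    y_t = u_tp1[prb_mask] ** 2                      # (P^(o) h_t)^2
    return (u_tp1, u_t), y_t
```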

For sufficiently large field strengths, the dependence of $A$ on $h_{t-1}$ can be achieved through an intensity-dependent wave speed of the form $c = c_{\mathrm{lin}} + u_t^2 \cdot c_{\mathrm{nl}}$, where $c_{\mathrm{nl}}$ is nonzero only in regions of material with a nonlinear response. In practice, this form of nonlinearity is encountered in a wide variety of wave physics, including shallow water waves (18), nonlinear optical materials via the Kerr effect (19), and, in acoustics, bubbly fluids and soft materials (20). Additional discussion and analysis of the nonlinear material responses are provided in section S2. Similar to the $\sigma^{(y)}(\cdot)$ activation function in the standard RNN, a nonlinear relationship between the hidden state, $h_t$, and the output, $y_t$, of the wave equation is typical in wave physics when the output corresponds to a wave intensity measurement, as we assume here for Eq. 6.
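
In the discrete update, this nonlinearity amounts to recomputing the local speed from the current field before each call to wave_step, which is what makes the effective update matrix $A$ a function of $h_{t-1}$. A minimal sketch; the lower clip on the speed is our own numerical guard, not part of the physical model.

```python
import numpy as np

def nonlinear_speed(u_t, c_lin, c_nl):
    """Pointwise intensity-dependent speed, c = c_lin + u_t**2 * c_nl.
    c_nl is nonzero only where the material responds nonlinearly."""
    return np.maximum(c_lin + u_t**2 * c_nl, 1e-3)  # clip: avoid nonpositive speeds

# Example usage inside the time loop (wave_step as sketched above):
# c = nonlinear_speed(u_t, c_lin, c_nl)
# u_tp1 = wave_step(u_t, u_tm1, c, f_t, dt, dx)
```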

Similar to the standard RNN, the connections between the hidden state and the input and output of the wave equation are also defined by linear operators, given by $P^{(i)}$ and $P^{(o)}$. These matrices define the injection and measurement points within the spatial domain. Unlike the standard RNN, where the input and output matrices are dense, the input and output matrices of the wave equation are sparse because they are nonzero only at the locations of the injection and measurement points. Moreover, these matrices are unchanged by the training process. Additional information on the construction of these matrices is given in section S3.
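
Construction details are given in section S3; purely as an illustration, sparse one-hot matrices over the flattened grid could be assembled as below. The grid size and the source/probe coordinates are hypothetical placeholders, not the layout used in the paper.

```python
from scipy.sparse import csr_matrix

def point_matrix(points, nx, ny):
    """One-hot sparse matrix selecting the given (ix, iy) cells of the
    flattened nx-by-ny grid; one row per point."""
    rows = list(range(len(points)))
    cols = [ix * ny + iy for ix, iy in points]
    data = [1.0] * len(points)
    return csr_matrix((data, (rows, cols)), shape=(len(points), nx * ny))

nx = ny = 64                                                 # placeholder grid
P_o = point_matrix([(63, 16), (63, 32), (63, 48)], nx, ny)   # measurement points
P_i = point_matrix([(0, 32)], nx, ny).T                      # injection point
```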

The trainable free parameter of the wave equation is the distribution of the wave speed, $c(x, y, z)$. In practical terms, this corresponds to the physical configuration and layout of materials within the domain. Thus, when modeled numerically in discrete time (Fig. 1E), the wave equation defines an operation that maps onto that of an RNN (Fig. 1B). Like the RNN, the full time dynamics of the wave equation may be represented as a directed graph, but in contrast, the nearest-neighbor coupling enforced by the Laplacian operator causes information to propagate through the hidden state with a finite velocity (Fig. 1F).

Training a physical system to classify vowels

We now demonstrate how the dynamics of the wave equation can be trained to classify vowels through the construction of an inhomogeneous material distribution. For this task, we use a dataset consisting of 930 raw audio recordings of 10 vowel classes from 45 male speakers and 48 female speakers (21). For our learning task, we select a subset of 279 recordings corresponding to three vowel classes, represented by the vowel sounds ae, ei, and iy, as contained in the words had, hayed, and heed, respectively (Fig. 2A).

Fig. 2 Schematic of the vowel recognition setup and the training procedure. (A) Raw audio waveforms of spoken vowel samples from three classes. (B) Layout of the vowel recognition system. Vowel samples are independently injected at the source, located at the left of the domain, and propagate through the center region, indicated in green, where a material distribution is optimized during training. The dark gray region represents an absorbing boundary layer. (C) For classification, the time-integrated power at each probe is measured and normalized to be interpreted as a probability distribution over the vowel classes. (D) Using automatic differentiation, the gradient of the loss function with respect to the density of material in the green region is computed. The material density is updated iteratively, using gradient-based stochastic optimization techniques until convergence.

The physical layout of the vowel recognition system consists of a two-dimensional domain in the x-y plane, infinitely extended along the z direction (Fig. 2B). The audio waveform of each vowel, represented by $x^{(i)}$, is injected by a source at a single grid cell on the left side of the domain, emitting waveforms that propagate through a central region with a trainable distribution of the wave speed, indicated by the light green region in Fig. 2B. Three probe points are defined on the right-hand side of this region, each assigned to one of the three vowel classes. To determine the system's output, $y^{(i)}$, we measure the time-integrated power at each probe (Fig. 2C). After the simulation evolves for the full duration of the vowel recording, this integral gives a nonnegative vector of length 3, which is then normalized by its sum and interpreted as the system's predicted probability distribution over the vowel classes. An absorbing boundary region, represented by the dark gray region in Fig. 2B, is included in the simulation to prevent energy from building up inside the computational domain. The derivation of the discretized wave equation with a damping term is given in section S1.
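
A sketch of this readout, assuming the per-step probe intensities have been collected into an array; the function name is ours, not from the paper's code.

```python
import numpy as np

def vowel_probabilities(probe_power):
    """probe_power: array of shape (T, 3), instantaneous power at each probe.
    Returns the time-integrated power normalized by its sum, interpreted as
    a probability distribution over the three vowel classes."""
    integrated = probe_power.sum(axis=0)   # integrate over the recording duration
    return integrated / integrated.sum()   # nonnegative and sums to 1
```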

For the purposes of our numerical demonstration, we consider binarized systems consisting of two materials: a background material with a normalized wave speed $c_0 = 1.0$ and a second material with $c_1 = 0.5$. We assume that the second material has a nonlinear parameter, $c_{\mathrm{nl}} = -30$, while the background material has a linear response. In practice, the wave speeds would be modified to correspond to the different materials being used. For example, in an acoustic setting, the material distribution could consist of air, where the sound speed is 331 m/s, and porous silicone rubber, where the sound speed is 150 m/s (22). The initial distribution of the wave speed consists of a uniform region of material with a speed that is midway between those of the two materials (Fig. 2D). This choice of starting structure allows the optimizer to shift the density of each pixel toward either one of the two materials, producing a binarized structure consisting of only those two materials. To train the system, we perform backpropagation through the model of the wave equation to compute the gradient of the cross-entropy loss function of the measured outputs with respect to the density of material in each pixel of the trainable region. This approach is mathematically equivalent to the adjoint method (23), which is widely used for inverse design (24, 25). Then, we use this gradient information to update the material density using the Adam optimization algorithm (26), repeating until convergence on a final structure (Fig. 2D).
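
A hedged sketch of this training loop in PyTorch, chosen here because it provides automatic differentiation; simulate (a differentiable implementation of Eqs. 5 and 6 plus the probe readout) and dataset are hypothetical placeholders, and the linear density-to-speed map is a simplified stand-in for the projection and binarization scheme described above.

```python
import torch

rho = torch.full((64, 64), 0.5, requires_grad=True)  # uniform starting density
opt = torch.optim.Adam([rho], lr=0.01)               # Adam optimizer (26)

def wave_speed(rho, c0=1.0, c1=0.5):
    """Map a density in [0, 1] to a speed between the two materials;
    a simplified stand-in for the binarization scheme in the text."""
    return c0 + rho.clamp(0.0, 1.0) * (c1 - c0)

for epoch in range(30):
    for waveform, label in dataset:                  # hypothetical data loader
        probs = simulate(wave_speed(rho), waveform)  # hypothetical forward model
        loss = -torch.log(probs[label])              # cross-entropy, single sample
        opt.zero_grad()
        loss.backward()                              # backprop through the wave model
        opt.step()
```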

The confusion matrices over the training and testing sets for the starting structure are shown in Fig. 3 (A and B), averaged over five cross-validated training runs. Here, the confusion matrix gives the percentage of correctly predicted vowels along its diagonal entries and the percentage of incorrectly predicted vowels for each class in its off-diagonal entries. The starting structure cannot perform the recognition task. Figure 3 (C and D) shows the final confusion matrices after optimization for the training and testing sets, averaged over five cross-validated training runs. The trained confusion matrices are diagonally dominant, indicating that the structure can indeed perform vowel recognition. More information on the training procedure and numerical modeling is provided in Materials and Methods.
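
For reference, a row-normalized confusion matrix of this kind can be computed as follows; the helper is ours, not from the paper's code.

```python
import numpy as np

def confusion_matrix(true_labels, pred_labels, n_classes=3):
    """Entry [i, j] is the percentage of class-i samples predicted as
    class j, so correct predictions fall on the diagonal."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(true_labels, pred_labels):
        cm[t, p] += 1.0
    return 100.0 * cm / cm.sum(axis=1, keepdims=True)
```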

Fig. 3 Vowel recognition training results. Confusion matrix over the training and testing datasets for the initial structure (A and B) and final structure (C and D), indicating the percentage of correctly (diagonal) and incorrectly (off-diagonal) predicted vowels. Cross-validated training results showing the mean (solid line) and SD (shaded region) of the (E) cross-entropy loss and (F) prediction accuracy over 30 training epochs and five folds of the dataset, which consists of a total of 279 vowel samples from male and female speakers. (G to I) Time-integrated intensity distribution for a randomly selected input from each class: (G) ae vowel, (H) ei vowel, and (I) iy vowel.