a–e, We validated the structure of change inferred with nearest-neighbour statistics (Fig. 3) with an approach based on linear regression in the high-dimensional spectrogram space (see Supplementary Methods). Unlike with nearest-neighbour-based statistics, here each rendition must first be assigned to a cluster (that is, a syllable; compare with Fig. 2a), and each cluster is analysed separately. a, Illustration of the linearization scheme. First, we infer the (local) DiSC on days k and k + 1 (grey arrow) as the vector of linear-regression coefficients relating production day to variability of renditions from days k − 1 and k + 2. Second, we infer the direction of within-day change (green arrow) as the linear-regression coefficients relating the period within a day to variability of renditions from days k and k + 1, orthogonalized to the DiSC. Third, we infer the direction of across-day change (orange arrow) as the linear-regression coefficients relating production day to variability of renditions from days k and k + 1, orthogonalized to the DiSC and within-day change. All three sets of coefficients, and the corresponding directions in spectrogram space, typically vary across days, syllables and birds. The progression of song along the DiSC and along the (orthogonalized) directions of within-day and across-day change is obtained by projecting renditions from days k and k + 1 onto the corresponding directions. b, Example rendition of syllable b as in Fig. 1 (top, encapsulated by red lines) and inferred coefficients (directions in spectrogram space; bottom) for day k = 57. Bright and dark shades of grey mark spectrogram bins for which power increases or decreases, respectively, over the corresponding timescales in a. c, Dependency of cross-validated regression quality (fraction of variance explained; y axis) on the regularization constant (λ) for the estimation of the DiSC. 
One regularization constant was chosen for each syllable and direction by minimizing the leave-one-out cross-validation error on the training set. d, Progression of syllable b along the directions of change shown in b, during days 57 and 58. Renditions from each day are binned into ten consecutive periods on the basis of production time within the day (analogous to the ten periods in Fig. 3a, b; curves and error bars represent means and 95% bootstrapped confidence intervals). For simplicity of visualization, the time elapsed (x axis) during the night between days k and k + 1 is not shown to scale. The position along the DiSC for the morning of day k + 1 is close to that for the evening of day k, indicating overall strong consolidation (left). The position along the direction of within-day change is reset overnight, implying that the underlying changes are not consolidated (middle). The position along the direction of across-day change jumps overnight, consistent with offline learning (right). We note that strong consolidation, weak consolidation and offline learning have all been reported previously, albeit in different behaviours and species (refs. 2, 4, 14, 15, 23, 24). The charts in d show that these different patterns of change can occur in the very same syllable along distinct spectral features (see also Fig. 1h and Extended Data Fig. 8). By considering features with different projections onto these directions, a wide range of consolidation patterns can be uncovered (see also Fig. 1h). e, As for d, but averaged across all four-day windows during days 60–69 and over all syllables and birds (same five birds as in Figs. 2, 3). The resulting averages include contributions from the entire behavioural repertoire, including regressions, typical renditions and anticipations. The two right-most panels show concurrent progression along the DiSC and the direction of within-day or across-day change, combining data from the first and second, or first and third, panels in e. 
These representations are analogous to, and in qualitative agreement with, the behavioural trajectories in Fig. 3h–k (typical). f, Analogous to e, but computed on vocalizations represented by 32 acoustic features instead of spectrograms. Directions as in e can be retrieved, but progression along the DiSC appears noisier, suggesting that the 32 acoustic features do not fully capture, in particular, the slow spectral changes occurring over development (see also Extended Data Fig. 9). g, h, Contribution of individual acoustic features to the directions of slow, within-day and across-day change. As in f, the directions are computed in the space of 32 acoustic features. g, Distribution of coefficients in the retrieved orthonormalized directions. Thick and thin black bars represent means and 95% confidence intervals; crosses show outliers; thin vertical lines represent medians. h, Means (solid lines) and medians (dotted lines) of the signed (left) or unsigned (right) distributions in g. Most coefficients are small and variable, indicating that the alignment between any of the 32 acoustic features and the inferred directions of change is weak and highly variable across time, syllables and birds.
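
The linearization scheme in a can be illustrated with a minimal sketch. This is not the authors' implementation (that is described in the Supplementary Methods). It assumes ridge regression as the regularized estimator, a fixed λ rather than one selected by cross-validation, and Gram-Schmidt orthogonalization. All function names, variable names and the synthetic data are hypothetical.

```python
import numpy as np

def ridge_direction(X, y, lam):
    """Ridge coefficients relating y (day or period) to features X; both are centred.
    Assumption: ridge is used here as one possible regularized linear regression."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    gram = Xc.T @ Xc + lam * np.eye(X.shape[1])
    return np.linalg.solve(gram, Xc.T @ yc)

def orthogonalize(v, basis):
    """Remove the components of v along each orthonormal vector in basis, then normalise."""
    for b in basis:
        v = v - (v @ b) * b
    return v / np.linalg.norm(v)

def directions_of_change(X_outer, day_outer, X_inner, day_inner, period_inner, lam=1.0):
    """Return unit vectors for the DiSC, within-day and across-day directions.
    X_outer: renditions from days k - 1 and k + 2; X_inner: renditions from days k and k + 1."""
    disc = orthogonalize(ridge_direction(X_outer, day_outer, lam), [])
    within = orthogonalize(ridge_direction(X_inner, period_inner, lam), [disc])
    across = orthogonalize(ridge_direction(X_inner, day_inner, lam), [disc, within])
    return disc, within, across

# Synthetic stand-in for spectrogram data (hypothetical; 40 bins per rendition).
rng = np.random.default_rng(0)
n, d = 300, 40
day_outer = rng.integers(0, 2, n).astype(float)      # 0 = day k - 1, 1 = day k + 2
day_inner = rng.integers(0, 2, n).astype(float)      # 0 = day k,     1 = day k + 1
period_inner = rng.integers(0, 10, n).astype(float)  # ten periods within a day
slow, fast = rng.normal(size=d), rng.normal(size=d)  # planted directions of change
X_outer = rng.normal(size=(n, d)) + np.outer(day_outer, slow)
X_inner = (rng.normal(size=(n, d))
           + np.outer(day_inner, 0.3 * slow)
           + np.outer(period_inner, 0.1 * fast))

disc, within, across = directions_of_change(
    X_outer, day_outer, X_inner, day_inner, period_inner, lam=1.0)

# Progression along each direction: project centred renditions from days k and k + 1.
X_centred = X_inner - X_inner.mean(axis=0)
pos_disc = X_centred @ disc
pos_within = X_centred @ within
pos_across = X_centred @ across
```

Binning `pos_disc`, `pos_within` and `pos_across` by period within each day would give curves of the kind shown in d. Because each direction is normalised before the next is orthogonalized against it, the three vectors are mutually orthogonal by construction.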