A cohort of digital embryos

Images of five live embryos developing from the 32-cell stage at 4–6 hours post-fertilisation (hpf) until the hatching blastula were acquired with two-photon microscopy and processed by our automated reconstruction workflow5,6 (Fig. 1a and Supplementary Table 1). Nuclear and membrane staining were obtained by RNA injection at the one-cell stage (Fig. 2a). This produced spatiotemporal sets of cell centres, segmented membrane shapes, and the complete cell lineage tree (Fig. 2b,c,e) via automated identification of cell filiation across consecutive time steps. Image acquisition lasted 3–8 consecutive hours with a constant time resolution of 2–5 min (Fig. 2d). Our visualisation interface Mov-IT5,6 helped validate and correct cell tracking, and manually label cells at the 32-cell stage according to their classification into four cell types with known distinct fates: mesomeres (Mes), macromeres (Mac), large micromeres (LMic) and small micromeres (SMic) (Supplementary Fig. 1b)11,19. Labels were propagated along the cell lineage (Supplementary Video 1). This data was suitable to investigate the parameters characterising cell behaviour, including cell displacements, cell divisions, cell volume, cell shape and cell contact changes.

Figure 1 Methodological workflow. The technical content of each box is described in detail in the supplementary material. Bottom to top: increasing levels of abstraction, from raw data to theory and modelling. The concept of “augmented phenomenology” (second tier) represents the superposition of raw data and its reconstruction. Features extracted from the augmented phenomenology are combined into an organised dataset conveying maximum biological meaning and leading to the formulation of theoretical hypotheses. (a to d) Upward arrowheads indicate derivation from data, including reconstruction of digital specimens and statistical analysis leading to probabilistic models. (e and f, h and j) Downward arrowheads indicate prediction testing, whether analytically (e) or by simulation (f), (h) and (j). Horizontal arrow: (g) Aggregation step leading to the design of a “normal” prototype from measurable individual cell features across the five specimens. (i) The prototype is used as an input into the biomechanical model. (k) Bidirectional arrow indicating the quantitative comparison between model simulations and digital reconstruction. (l) Feedback loop tuning the parameter values of the biomechanical model as a function of realism. Full size image

Figure 2 Reconstruction of digital sea urchins from 3D+ time in vivo and in toto imaging. (a to c) Raw and reconstructed data from one specimen (embryo 3) at different stages of development. Scale bar 20 μm. (a) Volume rendering of raw images (Amira software) from two-photon laser scanning microscopy. Total cell number indicated bottom left. (b,c) Reconstructed digital embryo, displayed with Mov-IT software. (b) Detected nuclear centres with colour code for cell types propagated along the lineage: mesomeres (Mes) in blue, macromeres (Mac) in red, large micromeres (LMic) in pink, and small micromeres (SMic) in purple. (c) Surface rendering of segmented cell membranes, colour code as in (b). (d) Temporal observation window for each specimen of the cohort, time scale in hours post-fertilisation (hpf) after temporal rescaling (details in Supplementary Fig. 2). Arrowheads for embryo 3 indicate the temporal window displayed in (e). (e) Flat representation of the reconstructed cell lineage of embryo 3 with the same cell type colour code as in (b), time scale in hpf. Full size image

Temporal and spatial rescaling of embryo-level dynamics

The quantitative inter-individual comparison of morphogenetic features was first measured at the global embryo level at each time step via the total number of cells N(t), total cellular volume W(t) and total cellular surface area Z(t). Because embryos appeared to grow at different speeds, a spatiotemporal rescaling of the corresponding curves was necessary to establish a baseline for inter-individual comparison and the assessment of variability (Fig. 3a–c), i.e. to provide comparable charts while preserving the overall shape of the functions. This rescaling consisted of an affine transformation in time and a linear transformation in space, tuned by parameters specific to each embryo (Supplementary Fig. 2c,f). After this preliminary step, the temporal evolution of the above three global measures was found to be highly reproducible between specimens of the cohort (Fig. 3a–c and Supplementary Fig. 2). Egg size and temperature fluctuations are potential factors explaining growth rate variation at the level of the embryo. Using this rescaling, we provided a correspondence between hpf and developmental stages (Supplementary Table 2).

Figure 3 Multi-level statistics and probabilistic model for the cohort and its prototype. (a to c) Temporally rescaled macroscopic quantities (whole-embryo level) in each specimen, top groups of curves in each chart and each cell population (lower four groups of curves, top to bottom: Mes, Mac, LMic, SMic). Same embryo colour code as in Fig. 2d (details in Supplementary Fig. 2). (d to f) Statistics of individual cell features in each cell group clustered by common type and generation. Mean and standard deviation bars represent normal and log-normal approximations of the cell feature distributions. In black: statistics aggregated into a prototype (see Supplementary Figs 3 and 4). (g to i) Embryo-level features in one measured specimen (embryo 3) and in simulation based on the probabilistic model (see Supplementary Fig. 11 for details). In black: mean of 300 simulated cell lineages; in grey: their standard deviation; in colour: measured values (top curve: whole embryo, lower four curves: separate populations using the same cell type colours as Fig. 2b). Shaded areas: incomplete cycles, for which an accurate estimation was not possible. (j to l) Same as (a to c) with the prototype curves added in black and grey. Full size image

Coarse-grained statistical analysis of cell features along the cell lineage

We further investigated the possibility to relate the global similarity of embryos (macroscopic level of description) to individual cell features (microscopic level of description). To this aim, we examined how six features of individual cells were distributed along the cell lineage: cell cycle lengths x i (t), mitosis times m i (t) (Supplementary Fig. 1a), average cell volumes (t) and cell surface areas (t), average daughter/mother volume ratios a i (t) = and daughter/mother surface area ratios b i (t) = . Because symmetries in the embryos, namely rotations around the animal-vegetal axis, and variations in single cell positions from one embryo to the other prevented the identification of individual cells across specimens, a “coarse-graining” transition to a mesoscopic and more generic level of description was required (Fig. 1c). To this aim, cells were clustered into groups g according to their type k i = 1, …, 4 (Mes, Mac, LMic, SMic) and their generation rank n i = 6, …, 10. Assuming that cells were indistinguishable within the same group, individual cell features were “exchangeable” sequences of random variables, hence obeyed a theorem stating that such sequences are mixtures of independent and identically distributed (i.i.d.) variables20. The above six cell features thus produced six distributions of individual cell features at the level of each group: X g , M g , , , A g and B g , which we set out to measure.

Due to limits on the observation windows, however (Fig. 2d), some cell cycles were incompletely or not recorded, hence some groups g were missing cells. In total, there were 252 exploitable distributions, corresponding to 4 cell types across 5 cell generations or less (depending on the dataset) and 4–6 cell features (depending on the cell cycle and the availability of A g and B g ) in each one of the 5 embryos (Fig. 3d–f and Supplementary Figs 3 and 4). Graphical assessment of these histograms, supported by a statistical test (Supplementary Figs 5 and 6), led to their classification into normal distributions for cell cycle lengths and mitosis times, and log-normal distributions for average volumes and surfaces. This allowed us to calculate the mean μ and standard deviation σ of each random variable in each cell group g and provide an idealised parametric representation of the whole cohort’s dataset in the form of 252 (μ, σ) pairs.

From these statistical measures, we concluded that random variables were largely independent from each other along one cell lineage: we detected no significant correlations among single features (temporal or spatial) of mother, daughter and sister cells (Supplementary Fig. 9 and Supplementary Table 3). Moreover, while the average cell cycle lengthened through generations with increasing fluctuations, the cell cycle length of a daughter did not depend on the cell cycle length of its mother. At the spatial level, we observed a variation in the range of ±20% around 0.5 for the average daughter/mother volume ratio, which mitigated the common assumption of average cell volume conservation across mitosis through consecutive generations at cleavage stages. However, although variability in volume and surface area at the individual cell level was not counterbalanced across consecutive generations, global uniformity could still be observed at the macroscopic level, as shown in Fig. 3 for the evolution of the total cellular volume.

A multi-level probabilistic model relating individual cell features to embryonic dynamics

Based on this data analysis, we were able to formally relate individual cell features to embryo-wide features via a multi-level probabilistic model of the cell lineage (Fig. 1d). At its core, it consisted of positing the following three recursive relationships: M g = M g−1 + X g , = A g and = B g where g − 1 = (n − 1, k) denotes the mother group of g (Supplementary Table 3–5). To evaluate this model, we calculated the differences between the empirical distributions (M g , , ) measured directly on the embryos’ final groups and their counterparts predicted from applying the iterative sum or product of (X g , A g , B g ) on the first group. Among 48 eligible final groups, 75% of them showed a close proximity between measured and predicted parameter sets and 21% were in good agreement, confirming the accuracy of the model. In sum, we propose here a generic methodology to identify the probabilistic laws of the prototypic cell behaviours underlying blastula formation.

Next, using this model, realistic virtual cell lineages were generated and their statistics compared to the real data. Starting from the 32-cell stage, each simulated cell divided into two and its cell cycle length x i was drawn from the idealised normal distributions (μ Xg , σ Xg ). Given the relationships between cell features (Supplementary Table 5), similar to the relationships between random variables, the mitosis time of cell i was deduced by summing the cell cycle lengths of its ancestors j: m i = ∑ x j . For each embryo, 300 realisations of the cell lineage were produced by varying the random generator’s seed. Comparison between these simulated specimens and empirical values on the three macroscopic quantities N, W and Z showed that embryo-level dynamics for each cell type was accurately reproduced using the empirical parameters measured in each cell group (Fig. 3g–i and Supplementary Fig. 11).

The number of cells over time N(t) displayed an alternation of plateaus, when no cells divided, and rapid increase during periods of high mitotic activity (Fig. 3a). During these periods, the slope of N(t) reflected the spread of mitoses over time, which our model showed to be caused by continuously desynchronising cell cycles along the lineage, and whose variance of division times within a cell group was equal to the sum of variances of cell cycle lengths in ancestor groups (Supplementary Note 2.3.1). In other words, we showed that the variability of division times came from accumulated variations in cell cycle lengths along the cell lineage. This result contrasts with previous statements about putative successive periods of synchrony, metachrony and asynchrony21. Moreover, there seemed to be no need for a “mitotic gradient” to explain the spatial distribution of mitoses21, as our study suggested that it could simply arise from the variability of cell cycle lengths.

The total cellular volume W(t) was globally conserved while undergoing alternating phases of contraction and expansion, which were also interpreted as emerging from the collective behaviour of individual cells characterised by desynchronising cycles but otherwise similar individual dynamics throughout these cycles (Fig. 3b and Supplementary Fig. 7). The total cellular surface area Z(t) followed a comparable evolution, but increased globally as cells became more cylindrical (Fig. 3c and Supplementary Fig. 8).

Designing a data-driven prototype

In sum, each specimen of the cohort could be represented by a reduced number of parameters, the (μ, σ) pairs, sufficient to reproduce the embryo-level dynamics. To obtain a unified view of development during cleavage stages, a prototypical representation of the cohort was defined as the “centroid” of the five specimens in parameter space (Fig. 1g)22. This methodology was also used to define prototypical statistics for each cell group (black bars in Fig. 3d–f and Supplementary Figs 3 and 4), where intra-individual variability was represented by standard deviations computed for each cell feature. Similar to individual specimens, cell lineages were also simulated based on these statistics and the resulting prototypical embryo-level dynamics provided a representation of the “normal” development of sea urchin blastula (Figs 1h and 3j–l).

Spatial embedding through biomechanical modelling

In a last step, to understand the relations between individual cell features and the actual shape of the embryo, the cell lineage prototype was embedded in space via a biomechanical model (Figs 1i and 4). Here each cell was represented by a cylindrical particle oriented along the apicobasal axis of the epithelium (Fig. 4a and Supplementary Fig. 12). As cells are extremely small and sticky, inertia was negligible with respect to viscosity and cell displacements could be represented by an overdamped equation of motion (Supplementary Note 3.1.1). The epithelialisation of the forming blastula led to a decomposition of the forces exerted between neighbouring cells into tangential components and normal components, respectively: “attraction-repulsion” forces maintaining the integrity of the cell volume by stiffness and adhesion (Fig. 4b and Supplementary Fig. 13); and “planarity conservation” forces maintaining the monolayered structure of the epithelium and modelling the blastocoel turgor pressure (Supplementary Fig. 14). In the first case, the adhesion coefficient ω adh could take two values, homotypic (ω adh,o ) or heterotypic (ω adh,e ), depending on the match between the two cell types in contact. In addition, cell division was oriented inside the tangential domain at an angle drawn randomly from a uniform distribution.

Figure 4 Spatial embedding in a biomechanical model of the prototypical multi-level probabilistic model. (a) Principle of cell axis computation. (b) Intensity of the attraction-repulsion force as a function of the interparticle distance r, with and . (c) Phenotypic phase diagram: the similarity of the simulation with digital embryos is computed for various pairs of homotypic and heterotypic adhesion coefficients (ω adh,o , ω adh,e ). The green parametric region corresponds to the domain of best fit (Supplementary Video 2 shows simulations of representative embryos for each region). (d) Upper panel: example of multilayered aspherical “deviant” embryo. Lower panel: most realistic embryo. For both, section along the sagittal plane. (e to g) Landscapes of the three objective functions: cell type population border similarity D s , epithelial layer planarity P s , and embryo shape sphericity S s (details concerning the calculation of these three metrics are described in the supplementary material). The fitness colour map goes from blue (best values) to red (worst values). The white dotted lines delimit the domains where the differences between observed and simulated embryonic developments are minimal. These lines are reproduced in the general phase diagram of (c). Full size image

Fitting the free parameters to the experimental data