a, FACS gating strategy before scRNA-seq. Live cells were selected on the basis of DAPI staining. Four sequential gates (P1–P4) were used; cells from gate P4 were used for scRNA-seq. SSC, side scatter; FSC, forward scatter; H, height; W, width; A, area. b, Box plot showing the median number of transcripts (left) and genes (right) detected per cell for SORT-seq experiments on E14-IB10 (E14-S, n = 5,951 cells from 26 biologically independent samples) and LfngT2AVenus gastruloids (Lfng-S, n = 4,592 cells from 74 biologically independent samples), and for 10x Genomics experiments on LfngT2AVenus gastruloids (Lfng-10x, n = 14,659 cells from 74 biologically independent samples). SORT-seq and 10x Genomics analyses were performed in parallel on the same 74 biologically independent LfngT2AVenus gastruloids; all cells extracted from these gastruloids were pooled and split into two tubes, of which one was used for SORT-seq and the other for 10x Genomics. The box extends from the lower to the upper quartile; whiskers are 1.5× the interquartile range; flier points are those past the end of the whiskers. c, UMAP plot for each experiment separately (n = 5,883, 4,589 and 14,636 cells for E14-S, Lfng-S and Lfng-10x, respectively; Methods). The E14-S cells (n = 5,883) were extracted from n = 26 biologically independent samples; the Lfng-S and Lfng-10x cells (n = 4,589 and 14,636, respectively) were extracted from n = 74 biologically independent samples that were pooled and then split into one tube for SORT-seq and one tube for 10x Genomics. The colour of each cell is the same as the colour of that particular cell in Fig. 1a. d, UMAP plot obtained by analysing all the cells from the different experiments together (n = 25,202 cells from 100 biologically independent samples), in which cells are coloured according to their batch (Methods, Supplementary Table 1). The black line indicates the symmetry line in clusters 1–8 used to generate the linearized UMAP plot in Extended Data Fig. 2d (Methods). e, Fraction of E14-IB10 (n = 26 biologically independent samples) and LfngT2AVenus (n = 74 biologically independent samples) cells in each scRNA-seq cluster from Fig. 1a. Blue, green and black numbers, number of E14-IB10, LfngT2AVenus and total cells in each cluster, respectively (Supplementary Tables 1, 4). f, Fraction of cells for each cell type in each plate in SORT-seq experiments (Lfng-S, n = 19 plates containing cells from n = 74 biologically independent gastruloids; E14-S, n = 30 plates containing cells from n = 26 biologically independent gastruloids), and in each experimental batch in 10x Genomics experiments (Lfng-10x, n = 2 independent batches containing cells extracted from n = 44 and 30 biologically independent gastruloids, respectively, with 2 technical replicates each). In the box plots, centre line is median; box limits are the 1st and 3rd quartiles; and whiskers denote the range. g, Fraction of cells detected in the E8.5 mouse embryo scRNA-seq dataset (ref. 4) with which we compared our gastruloid scRNA-seq data. Exact numbers in each cluster are indicated. h, Dot plot showing the number of overlapping genes between significantly upregulated genes (n = 79, 87, 84, 22, 84, 66, 82, 78, 100, 97, 100, 96 and 90 genes for clusters 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 and 13 (respectively) in the gastruloid dataset, determined using the two-sided t-test, followed by selection of genes with fold change above 1.01 and P value below 0.01; n = 7, 20, 21, 35, 200, 39, 23, 200, 200, 95, 54, 21, 58, 57, 200, 81, 135, 28, 200 and 200 genes for the embryonic-cell types reported on the x axis, determined in ref. 4 and selecting genes with P value below 0.01) for each gastruloid cluster (n = 25,202 cells extracted from 100 biologically independent samples) and each E8.5 mouse embryonic-cell type (ref. 4).
Dot colour indicates the probability of finding such a number of overlapping genes between the two sets by random chance (P value determined by one-sided binomial testing; no adjustments for multiple comparisons were made). Dot size represents the number of overlapping genes. i, Dot plot showing overlapping genes between significantly upregulated genes for each gastruloid scRNA-seq cluster (n = 79, 87, 84, 22, 84, 66, 82, 78, 100, 97, 100, 96 and 90 genes for clusters 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 and 13, respectively (Supplementary Table 2); scRNA-seq dataset obtained for 25,202 cells that were extracted from n = 100 biologically independent gastruloids), and upregulated genes for each E7.0–E8.5 mouse embryonic-cell type (ref. 4). Dot colour indicates the probability of finding such a number of overlapping genes between the two sets by random chance (P value determined by one-sided binomial testing; no adjustments for multiple comparisons were made), and dot size represents the number of overlapping genes. Blue, embryonic stage. 10x, 10x Genomics; Ant, anterior; EnD, endoderm; haemato, haemato-endothelial; prog, progenitors; S, SORT-seq (ref. 33).
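
The one-sided binomial P value used for dot colour in panels h and i can be illustrated with a minimal sketch. The legend does not state how the test was parametrized, so the null model here is an assumption: each of the `n_cluster` upregulated genes of a gastruloid cluster is taken to fall in the embryonic gene set independently with probability `n_embryo / n_universe`, where `n_universe` (the number of genes considered) is a hypothetical parameter not given in the legend. The function name and arguments are illustrative only.

```python
from math import comb


def binomial_overlap_pvalue(k, n_cluster, n_embryo, n_universe):
    """One-sided P value P(X >= k) for X ~ Binomial(n_cluster, n_embryo / n_universe).

    k          -- observed number of overlapping genes
    n_cluster  -- number of upregulated genes in the gastruloid cluster
    n_embryo   -- number of upregulated genes in the embryonic-cell type
    n_universe -- assumed size of the background gene set (hypothetical)
    """
    if not 0 <= k <= n_cluster:
        raise ValueError("k must lie between 0 and n_cluster")
    if not 0 < n_embryo <= n_universe:
        raise ValueError("n_embryo must lie between 1 and n_universe")
    # Chance that a single cluster gene lands in the embryonic set under the null.
    p = n_embryo / n_universe
    # Upper tail of the binomial distribution, summed exactly.
    return sum(
        comb(n_cluster, i) * p**i * (1 - p) ** (n_cluster - i)
        for i in range(k, n_cluster + 1)
    )


# Example with gene-set sizes taken from the legend (79 and 200 genes);
# the overlap of 10 and the universe of 20,000 genes are made-up inputs.
example_p = binomial_overlap_pvalue(10, 79, 200, 20000)
```

Because no correction for multiple comparisons was applied, each P value computed this way would be read per cluster and cell-type pair rather than across the whole dot plot.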