a, Percentage of cells with at least one target site captured. Cells from embryo 2 were run on two 10x lanes. b, Scatter plot showing the relationship between the mean number of UMIs sequenced per target site (a proxy for expression level) and the percentage of cells in which the target site is detected, which we refer to as ‘target-site capture’. In general, the percentage of cells in which a target site is captured increases with its mean number of UMIs. Using a full-length, intron-containing EF1α promoter in mouse embryos leads to a higher number of UMIs, which generally results in better target-site capture. c, Percentage of cells in which a given intBC is detected, across all seven embryos profiled in this study. d, Target-site capture and expression level across tissues for embryo 5, which uses a truncated EF1α promoter to drive transcription of the target site. Each row corresponds to a different intBC, indicated in the top left of the histogram. Left, the percentage of cells in each tissue in which the target site is captured. Right, violin plots representing the distribution of UMIs for the target site in each tissue. White dots indicate the median UMI count for cells from a given tissue, edges indicate the interquartile range, and whiskers denote the full range of the data. The dashed line marks a ten-UMI threshold. The target site may be expressed at different levels in different tissues, which makes capture more likely in some tissues than in others. The biased capture of target sites carrying the intBCs AGGACAAA and ATTGCTTG may also be explained by mosaic integration after the first cell cycle, as these sites are preferentially captured in extra-embryonic lineages, which are restricted early in development. e, Target-site capture and expression level across tissues for embryo 7, which drives target-site expression from an intron-containing EF1α promoter. Each row corresponds to a different intBC, indicated in the top left of the histogram.
Left, the percentage of cells in each tissue in which the target site is captured. Right, violin plots representing the distribution of UMIs for the target site in each tissue, as in d. The dashed line marks a ten-UMI threshold for visual reference. Although tissue-specific expression may explain some of the variation in target-site capture, high expression (as estimated from the number of UMIs) can still coincide with low capture rates, as observed for the intBC TGGCGGGG. One possibility is that particular indels destabilize the transcript, leading to either poor expression or poor capture. f, Scatter plots showing the relationship between the estimated relative frequency of each indel and the median number of cells that carry it. Because the fraction of cells within an embryo that carry a given indel depends on when the mutation occurred, it cannot be used to calculate the underlying indel frequency distribution. Instead, we estimate this frequency from the presence or absence of each indel across all target-site integrations in all mice, which reduces biases from cellular expansion but assumes that any given indel occurs only once in the history of each intBC. Because the number of integrations is small, these estimates may be imprecise. The number of cells marked with an indel increases with its estimated frequency, which suggests that we underestimate the frequency of particularly frequent indels. This is probably because we cannot distinguish identical indels in the same target site that arose from multiple independent repair events (convergent indels). The most frequent insertions are single-base insertions and tend to be highly biased towards a single nucleotide (for example, 92:1:I is always an ‘A’ in 5 of the 7 embryos, and is an ‘A’ in at least 88% of instances in every embryo).
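The two summary quantities plotted in b and f can be illustrated with a minimal Python sketch. It assumes a hypothetical input layout that is not taken from the study: one dictionary per cell mapping an intBC to its UMI count, and one set of observed indel alleles per (embryo, intBC) integration. The barcodes (`bc1`, `bc2`, `bc3`), the allele strings other than 92:1:I, and all counts are toy values. "Mean UMIs" is assumed here to be averaged over the cells in which the site is detected, and "relative frequency" is assumed to be normalized so that it sums to 1 over all indels. Both are illustrative choices, not the authors' exact definitions.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy data: one dict per cell, mapping intBC -> UMI count for that target site.
cells = [
    {"bc1": 12, "bc2": 3},
    {"bc1": 5},
    {"bc2": 40, "bc3": 2},
    {},  # a cell in which no target site was captured
]


def capture_stats(cells):
    """Return {intBC: (percent of cells with the site captured, mean UMIs in those cells)}."""
    umis = {}
    for cell in cells:
        for intbc, n in cell.items():
            if n > 0:  # a site counts as captured if at least one UMI is observed
                umis.setdefault(intbc, []).append(n)
    return {
        intbc: (100.0 * len(counts) / len(cells), mean(counts))
        for intbc, counts in umis.items()
    }


# Hypothetical toy data: (embryo, intBC) -> indel alleles seen in any cell of that integration.
observed = {
    ("embryo1", "bc1"): {"92:1:I", "100:3:D"},
    ("embryo1", "bc2"): {"92:1:I"},
    ("embryo2", "bc3"): {"92:1:I", "101:5:D"},
}


def relative_indel_frequency(observed):
    """Presence/absence estimate: each integration contributes an indel at most once,
    regardless of how many cells carry it (i.e. ignoring clonal expansion)."""
    counts = Counter()
    for indels in observed.values():
        counts.update(set(indels))  # set() enforces one count per integration
    total = sum(counts.values())
    return {indel: c / total for indel, c in counts.items()}


print(capture_stats(cells))
print(relative_indel_frequency(observed))
```

Because `set()` collapses repeated alleles within one integration, an indel that arose several times independently in the same target site (a convergent indel) is still counted once. This is the mechanism the legend proposes for why frequent indels are underestimated.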