Gene trees and ancestral recombination graphs (ARGs) are powerful, rich data structures for detecting signatures of natural selection in DNA sequences.

Methodological advances in inferring genome-wide genealogies provide an alternative and complementary way to infer natural selection, one that uses the full data set rather than a small number of traditional summary statistics.

In this review, we discuss the biological importance of studying selection and recent advances in selection simulators. Furthermore, we review traditional summary statistics and methods that aggregate multiple statistics, including approximate Bayesian computation (ABC) and supervised machine learning methods.

Finally, we discuss future directions in the scalable inference of sequences of gene trees and ARGs, and in their use for studying selection.

Methods to detect signals of natural selection from genomic data have traditionally emphasized the use of simple summary statistics. Here, we review a new generation of methods that consider combinations of conventional summary statistics and/or richer features derived from inferred gene trees and ancestral recombination graphs (ARGs). We also review recent advances in methods for population genetic simulation and ARG reconstruction. Finally, we describe opportunities for future work on a variety of related topics, including the genetics of speciation, estimation of selection coefficients, and inference of selection on polygenic traits. Together, these emerging methods offer promising new directions in the study of natural selection.

Glossary

Ancestral recombination graph (ARG): a data structure that specifies the genealogical relationships among a sample of chromosomes while accounting for the recombination events in the history of the sample.

Balancing selection: a selective process that favors genetic diversity and therefore tends to maintain genetic variation at a locus for longer than expected under genetic drift alone.

Classifier: a model that assigns samples to discrete categories.

Complex trait (or polygenic trait): a trait that does not follow Mendelian inheritance patterns and is typically influenced by a large number of loci.

Composite likelihood (also known as pseudo-likelihood): an approximate likelihood function constructed by combining a collection of component likelihood functions, often by assuming independence where it is not strictly warranted.
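The independence assumption behind a composite likelihood can be illustrated with a minimal sketch: per-site log-likelihoods are simply summed, as if sites were unlinked. Here each site's likelihood is taken from a table of probabilities for observing a given derived-allele count (for example, a normalized site frequency spectrum). The function name and both probability tables are invented for illustration; this is a greatly simplified picture of how composite-likelihood sweep scans combine per-site information, not any specific tool's model.

```python
import math


def composite_log_likelihood(derived_counts, site_probs):
    """Sum per-site log-likelihoods, treating sites as if they were independent.

    derived_counts: derived-allele count observed at each site.
    site_probs: hypothetical model probability of each count at a single site,
        given as a dict mapping count -> probability.
    """
    # Summing logs multiplies the per-site likelihoods and ignores linkage
    # between sites, which is what makes this a composite likelihood.
    return sum(math.log(site_probs[k]) for k in derived_counts)


# Two invented single-site models: one neutral-like, one skewed toward rare alleles.
neutral = {1: 0.5, 2: 0.3, 3: 0.2}
skewed = {1: 0.7, 2: 0.2, 3: 0.1}
counts = [1, 1, 2, 1, 3]

# Log composite likelihood ratio comparing the two hypothetical models.
log_ratio = 2 * (composite_log_likelihood(counts, skewed)
                 - composite_log_likelihood(counts, neutral))
print(log_ratio)
```

Because linked sites are not truly independent, such ratios are usually calibrated against simulations rather than against standard likelihood-ratio asymptotics.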

Deep neural network: an artificial neural network with more than two layers, used to model complex, nonlinear functions of the input data.

Distribution of fitness effects (DFE): the genome-wide distribution of selection coefficients for a set of variants.

Effect size: a measure of the influence of a particular variant on the value of a phenotype.

FST: a relative measure of divergence that compares genetic variation in the total population with variation within subpopulations.
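As a rough illustration of the relative divergence measure defined above (FST), the sketch below computes a Nei-style estimator, (H_T − H_S) / H_T, for a single biallelic site from subpopulation allele frequencies. H_T is the expected heterozygosity of the pooled population and H_S the average expected heterozygosity within subpopulations. This is one of several FST estimators, chosen here for simplicity; it omits sample-size corrections such as those in Weir and Cockerham's estimator, and the function name is hypothetical.

```python
def fst_from_frequencies(freqs, weights=None):
    """Nei-style F_ST = (H_T - H_S) / H_T at one biallelic site.

    freqs: frequency of one allele in each subpopulation.
    weights: optional relative subpopulation sizes (equal by default).
    """
    if weights is None:
        weights = [1.0] * len(freqs)
    total = sum(weights)
    w = [x / total for x in weights]
    # Allele frequency in the pooled (total) population.
    p_bar = sum(wi * p for wi, p in zip(w, freqs))
    # Expected heterozygosity of the pooled population.
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    # Weighted mean expected heterozygosity within subpopulations.
    h_s = sum(wi * 2.0 * p * (1.0 - p) for wi, p in zip(w, freqs))
    if h_t == 0.0:
        return 0.0  # monomorphic site: no variation to partition
    return (h_t - h_s) / h_t


print(fst_from_frequencies([0.1, 0.9]))  # strongly differentiated subpopulations
print(fst_from_frequencies([0.5, 0.5]))  # identical subpopulations
```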

GWAS summary statistics: a summary of a genome-wide association study (GWAS) describing the marginal association of each individual allele with a trait of interest, typically including a P value, an effect size estimate, and its standard error.

Haplotype: a group of alleles on a single contiguous DNA sequence that are inherited together from a single parent.

Hard sweep: the increase in frequency of a newly arising beneficial mutation, together with its haplotype background. In a ‘complete’ hard sweep, the beneficial mutation reaches fixation.

Genetic hitchhiking: the process by which alleles in linkage disequilibrium with a beneficial allele at a site under positive selection increase in frequency.

Linkage disequilibrium (LD): a statistically nonrandom association of alleles at two or more loci.
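To make the definition of linkage disequilibrium concrete, the sketch below computes two standard pairwise LD measures from phased haplotypes at two biallelic loci: the coefficient D = p_AB − p_A p_B and its normalized form r² = D² / (p_A(1 − p_A) p_B(1 − p_B)). The 0/1 input encoding and the function name are assumptions made for illustration.

```python
def ld_stats(locus_a, locus_b):
    """Return (D, r^2) for two biallelic loci observed on the same haplotypes.

    locus_a, locus_b: 0/1 allele calls, one entry per haplotype, in the same order.
    """
    n = len(locus_a)
    if n == 0 or n != len(locus_b):
        raise ValueError("loci must be non-empty and of equal length")
    p_a = sum(locus_a) / n  # frequency of allele 1 at locus A
    p_b = sum(locus_b) / n  # frequency of allele 1 at locus B
    # Frequency of haplotypes carrying allele 1 at both loci.
    p_ab = sum(1 for x, y in zip(locus_a, locus_b) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r^2 is undefined when either locus is monomorphic; report 0.0 in that case.
    r2 = d * d / denom if denom > 0 else 0.0
    return d, r2


print(ld_stats([0, 0, 1, 1], [0, 0, 1, 1]))  # alleles perfectly associated
print(ld_stats([0, 1, 0, 1], [0, 0, 1, 1]))  # alleles independent
```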

Partial sweep: the increase in frequency of a beneficial mutation, together with its haplotype background, without the beneficial mutation reaching fixation.

Polygenic score: a numeric score that measures the expected influence of a collection of assayed genotypes on a trait.
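A common construction of a polygenic score is an additive weighted sum: each variant's allele dosage multiplied by its estimated effect size (for example, taken from GWAS summary statistics), summed across variants. The minimal sketch below assumes that additive model; practical pipelines add steps such as variant selection or effect-size shrinkage, which are omitted here. The function name and the effect sizes in the usage line are made up.

```python
def polygenic_score(dosages, effect_sizes):
    """Additive polygenic score: sum over variants of dosage * effect size.

    dosages: effect-allele counts (0, 1, or 2) for one diploid individual.
    effect_sizes: per-variant effect size estimates, in the same variant order.
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have equal length")
    return sum(g * beta for g, beta in zip(dosages, effect_sizes))


# Hypothetical individual genotyped at three variants with invented effect sizes.
print(polygenic_score([2, 1, 0], [0.5, 0.25, 1.0]))
```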

Polygenic selection: selection on a complex trait that is determined by the alleles at multiple loci across the genome. As a result, polygenic selection simultaneously alters allele frequencies at many genomic loci.

Recurrent neural network (RNN): a class of artificial neural networks designed to analyze temporal and sequential data.
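What lets a recurrent neural network handle sequential data is a hidden state that is updated at each position and carried forward, h_t = tanh(W_x x_t + W_h h_{t−1} + b), with the same weights reused at every step. The sketch below runs this forward pass for a single-layer (Elman-style) RNN in plain Python. The weights are arbitrary toy values rather than a trained model; practical RNNs, including gated variants such as LSTMs, are built and trained with deep-learning libraries.

```python
import math


def rnn_forward(inputs, w_x, w_h, b):
    """Run a single-layer Elman RNN over a sequence of input vectors.

    inputs: list of input vectors x_1..x_T (each a list of floats).
    w_x: hidden-by-input weight matrix; w_h: hidden-by-hidden weight matrix;
    b: hidden bias vector. Returns the hidden state after every step.
    """
    hidden_size = len(b)
    h = [0.0] * hidden_size  # initial hidden state h_0 = 0
    states = []
    for x in inputs:
        # h_t = tanh(W_x x_t + W_h h_{t-1} + b); the comprehension reads the
        # previous h, because h is reassigned only after it finishes.
        h = [
            math.tanh(
                sum(w_x[i][j] * x[j] for j in range(len(x)))
                + sum(w_h[i][k] * h[k] for k in range(hidden_size))
                + b[i]
            )
            for i in range(hidden_size)
        ]
        states.append(h)
    return states


# Toy run: a single nonzero input is "remembered" in later hidden states.
states = rnn_forward([[1.0], [0.0], [0.0]],
                     w_x=[[0.5], [-0.5]],
                     w_h=[[0.9, 0.0], [0.0, 0.9]],
                     b=[0.0, 0.0])
print(states)
```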

Relative TMRCA halfway (RTH): the time at which half of the samples in a population of interest reach a common ancestor, as a fraction of the time to the most recent common ancestor of all the samples in the population.

Sequentially Markov coalescent (SMC): an approximation of the coalescent with recombination that assumes that the distribution of the genealogy at position i depends only on the genealogy at position i − 1 and not on earlier genealogies.

Site frequency spectrum (SFS): the distribution of allele frequencies within a population.
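Assuming the ancestral state of every site is known, so that derived alleles can be coded as 1, the unfolded SFS can be tallied from a sample of n haplotypes as in the sketch below. Entry k counts the sites at which exactly k haplotypes carry the derived allele; entries 0 and n correspond to monomorphic sites, and segregating sites fall in 1..n − 1. When ancestral states are unknown, a folded spectrum based on minor-allele counts is used instead. The string encoding and function name are illustrative assumptions.

```python
def unfolded_sfs(haplotypes):
    """Unfolded SFS from equal-length 0/1 strings (1 = derived allele).

    Returns a list xi of length n + 1, where xi[k] is the number of sites at
    which exactly k of the n sampled haplotypes carry the derived allele.
    """
    n = len(haplotypes)
    num_sites = len(haplotypes[0])
    xi = [0] * (n + 1)
    for site in range(num_sites):
        k = sum(int(h[site]) for h in haplotypes)  # derived-allele count
        xi[k] += 1
    return xi


haps = ["1100", "1000", "1010", "0001"]  # 4 haplotypes, 4 sites (toy data)
print(unfolded_sfs(haps))
```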

Soft sweep: the increase in frequency of a standing genetic variant, together with its associated haplotype backgrounds, after that variant becomes beneficial, for example, due to a change in the environment. In a ‘complete’ soft sweep, the beneficial mutation reaches fixation.

Supervised machine learning: a technique that learns a model from labeled training samples and then uses the learned model to assign a discrete category or a continuous value to an unlabeled sample (test sample).
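The train-then-predict workflow of supervised machine learning can be shown with a nearest-centroid classifier, used here as a deliberately simple stand-in for the much richer models applied to selection inference. Training learns one mean feature vector per label, and prediction assigns an unlabeled sample to the nearest centroid. The feature vectors (imagined as pairs of summary statistics per genomic window) and the labels "sweep" and "neutral" are invented toy values.

```python
import math


def fit_centroids(features, labels):
    """Learn one mean feature vector (centroid) per class label."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


# Toy labeled training data: two made-up features per window.
train_x = [[-2.0, 0.9], [-1.8, 0.8], [0.1, 0.2], [-0.2, 0.1]]
train_y = ["sweep", "sweep", "neutral", "neutral"]
centroids = fit_centroids(train_x, train_y)
print(predict(centroids, [-1.5, 0.7]))  # classify an unlabeled test sample
```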

Tajima's D: a summary statistic that compares the average number of pairwise differences with the number of segregating sites.
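A direct implementation of Tajima's (1989) D is sketched below. The numerator is the difference between the average number of pairwise differences (pi) and Watterson's estimator S / a1, where S is the number of segregating sites. The denominator is the standard-deviation term from Tajima's paper. Negative values indicate an excess of rare variants, and positive values an excess of intermediate-frequency variants. The 0/1 string input and function name are illustrative assumptions.

```python
import math


def tajimas_d(haplotypes):
    """Tajima's (1989) D from equal-length 0/1 strings, one per sampled haplotype."""
    n = len(haplotypes)
    if n < 4:
        # With n < 4 the variance terms vanish, so D is undefined.
        raise ValueError("Tajima's D requires at least 4 sequences")
    num_sites = len(haplotypes[0])
    # Per-site allele counts; a site is segregating if 0 < k < n.
    counts = [sum(int(h[i]) for h in haplotypes) for i in range(num_sites)]
    seg = [k for k in counts if 0 < k < n]
    s = len(seg)
    if s == 0:
        return float("nan")  # D is undefined without segregating sites
    # pi: average number of pairwise differences, accumulated site by site.
    pairs = n * (n - 1) / 2
    pi = sum(k * (n - k) for k in seg) / pairs
    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    # Numerator: pi minus Watterson's estimator S / a1.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


print(tajimas_d(["1000", "0100", "0010", "0001"]))  # only singletons
print(tajimas_d(["11", "11", "00", "00"]))  # only intermediate-frequency sites
```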

Time to the most recent common ancestor (TMRCA): the most recent time at which a given set of lineages trace back to a common ancestor.
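Within a single gene tree, the TMRCA of a set of lineages is the age of their lowest common ancestor. The sketch below finds it from a parent-pointer representation of one tree. The node names, node times, and dict-based encoding are illustrative assumptions. Along a recombining sequence, each local tree of an ARG can give a different TMRCA for the same set of samples.

```python
def tmrca(parent, time, lineages):
    """TMRCA of a set of sampled lineages in a single gene tree.

    parent: dict node -> parent node (the root has no entry).
    time: dict node -> node age (samples at 0, ancestors are older).
    lineages: the sampled nodes of interest.
    """
    def path_to_root(node):
        path = []
        while node is not None:
            path.append(node)
            node = parent.get(node)
        return path

    # Common ancestors are the nodes shared by every lineage's root path;
    # the most recent of them is the one with the smallest age.
    shared = set(path_to_root(lineages[0]))
    for leaf in lineages[1:]:
        shared &= set(path_to_root(leaf))
    return min(time[node] for node in shared)


# Toy tree ((a, b), c): a and b coalesce at time 1.5, all three at time 4.0.
parent = {"a": "ab", "b": "ab", "ab": "root", "c": "root"}
time = {"a": 0.0, "b": 0.0, "c": 0.0, "ab": 1.5, "root": 4.0}
print(tmrca(parent, time, ["a", "b"]))
print(tmrca(parent, time, ["a", "c"]))
```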