Active smoking is an established critical factor for epigenetic modifications in blood DNA [4]. To our knowledge, this is the first systematic literature review on this topic. We identified 17 eligible articles, which explored the association of active smoking exposure with epigenetic changes in blood DNA. Overall, 1460 smoking-related CpG sites were identified in 14 EWASs, 62 of which were discovered by multiple (≥3) studies. The most frequently reported sites were cg05575921 (AHRR), cg03636183 (F2RL3), cg19859270 (GPR15), and other loci within the intergenic regions 2q37.1 and 6p21.33. Prominent findings for these smoking-related genes were further analyzed in GSMSs to disclose dose-response relationships of smoking intensities and time since cessation with methylation levels. Taken together, these studies suggest the possibility of using methylation markers for a refined quantification of smoking exposure and to better predict the risks of smoking-related diseases.

Smoking-induced methylation could occur in many regions of the human genome. In the annotated gene regions, approximately half of the smoking-related CpG sites are located in the body of specific genes (e.g., AHRR and F2RL3). Notably, their effect sizes were commonly larger than those of sites located in other gene regions, including the 1st exon, untranslated region (UTR), and transcription start site (TSS) (Additional file 1: Table S1, Fig. 2). Additionally, about one quarter of smoking-related loci are located in intergenic regions, such as 2q37.1 and 6p21.33 (Additional file 1: Table S1).

For the sites located in gene bodies, epidemiological studies have observed biologically plausible associations with smoking-induced chronic diseases and cancers. For instance, the first discovered smoking-associated site, cg03636183, is located in the body of F2RL3 (the coagulation factor II receptor-like 3 gene) [5]; it was consistently confirmed in multiple EWASs and replicated across racial groups [15, 17–19, 22]. F2RL3 encodes the thrombin protease-activated receptor-4 (PAR-4), a protein that is expressed in various tissues throughout the body, including blood leucocytes and lung tissue, and plays a key role in platelet activation and cell signaling. This could partly explain why the methylation pattern of F2RL3 was found to be related to risks of cardiovascular diseases (CVD) and lung cancer, as well as to total mortality [26, 27]. Nevertheless, whether F2RL3 is a mediator or merely an indicator of smoking-related risk is still not well understood.

The strongest and most consistent associations have been reported for CpG sites located in the body of AHRR, a well-known tumor suppressor. Smoking could trigger the generation of polycyclic aromatic hydrocarbons (PAHs) that affect the aryl hydrocarbon receptor (AHR), leading to alterations in the expression (and methylation status) of AHRR [3]. Thus, this gene could mediate detoxification of PAHs and might be involved in the metabolism of endogenous toxins from cigarette smoking [28]. A recent study by Zhang et al. disclosed clear dose-response relationships of AHRR methylation with both current and lifetime smoking exposure, as well as with smoking-related mortality outcomes [29]. Methylation at cg05575921 in AHRR and at a locus in 6p21.33 was additionally suggested to be a promising candidate for enhancing cardiovascular risk prediction [29].

Even without a full understanding of the pathophysiological mechanisms, the strong and consistent associations of a variety of CpG sites with smoking suggest their potential use as main correlates of smoking exposure. Two GSMSs have demonstrated the potential of several sites within F2RL3 as promising biomarkers for both current and past smoking exposure [8, 23]. A recent study found that cg05575921 within AHRR was both sensitive and specific for current smoking in adults, with an area under the curve (AUC) of 0.99, and efforts to evaluate methylation of cg05575921 as a biomarker to guide smoking cessation are ongoing [30]. Further studies on loci within additional smoking-related genes are needed to explore precise dose-response relationships, to describe smoking exposure more comprehensively, and to understand the underlying molecular mechanisms.
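To make the AUC figure quoted above concrete, the following minimal Python sketch computes an AUC for a single methylation marker as the probability that a randomly chosen current smoker has a lower methylation value than a randomly chosen never smoker. The beta values are hypothetical and chosen for illustration only, and the direction (lower methylation in smokers) is an assumption of this sketch, not a result taken from the cited study [30].

```python
def auc_lower_is_positive(cases, controls):
    """AUC for a marker where LOWER values indicate the positive class.

    Computed as the Mann-Whitney probability that a randomly chosen case
    has a lower value than a randomly chosen control (ties count 0.5).
    """
    wins = 0.0
    for case_value in cases:
        for control_value in controls:
            if case_value < control_value:
                wins += 1.0
            elif case_value == control_value:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical beta values (fraction methylated) at one CpG site; not real data.
current_smokers = [0.55, 0.60, 0.62, 0.70, 0.81]
never_smokers = [0.78, 0.84, 0.86, 0.88, 0.90]

auc = auc_lower_is_positive(current_smokers, never_smokers)
print(f"AUC = {auc:.2f}")
```

An AUC of 0.5 would mean the marker separates the two groups no better than chance, whereas values close to 1 mean that almost every smoker lies below almost every never smoker.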

In addition to the critical sites in gene bodies, there are several smoking-related sites within other genome regions, such as cg19859270, which is located in the 1st exon of GPR15 (G-protein-coupled receptor 15), and several loci in the intergenic region 2q37.1. These sites might make additional contributions to smoking exposure evaluation through a smoking-related methylation signature. Along with the significant loci in AHRR and F2RL3, they could facilitate the construction of a quantitative approach with better specificity to differentiate never smokers from former smokers (validated AUC = 0.83, positive predictive value = 0.85) [7]. GPR15 was first reported in 2012 in the study of Wan et al., along with its relationships with current and long-term smoking [12]. Afterwards, the study of Tsaprouni et al. showed that this gene was the only one with a clear trend of increased gene expression in smokers compared to non-smokers, with a prominent negative correlation between gene expression and methylation [20]. The authors thus presumed that the reduction of methylation levels at locus cg19859270 within GPR15 in smokers would lead to increased transcription. This differential expression of GPR15 in smokers compared with never smokers was further confirmed by recent studies of Bauer et al. and Kõks et al. [31, 32]. In addition, as an HIV co-receptor, this gene was recently reported to interact with the ethnicity-dependent differential prevalence of HIV, especially HIV-2 in African Americans [33]. Moreover, the six significant sites in locus 2q37.1 are located directly adjacent to a cluster of alkaline phosphatase genes [34].

In the implementation and interpretation of EWASs and GSMSs based on blood samples, a potential limitation deserving particular attention is that whole blood DNA represents a mixture of DNA from various types of leucocytes that show partly different methylation patterns. Hence, smoking-related differential methylation may, in theory, partly reflect smoking-related shifts in leucocyte distribution. The majority of EWASs adjusted their analyses for leucocyte distribution with the algorithm of Houseman et al. [25]. However, although smoking is known to increase the overall number of leucocytes [35], its impact on leucocyte distribution remains unclear [15–17, 20, 21]. Recently, GPR15 methylation has been shown to be linked with chronic inflammation via regulation of T cell migration [32], which raises the possibility that for some loci, like GPR15, differential methylation might reflect a shift in blood cell mixture. On the other hand, a recent study compared smoking-related methylation profiles in buccal and whole blood samples and found that effect sizes in blood samples were similar to those in buccal samples [36]. This suggests that cell type distribution has no major impact on the majority of smoking-related differential methylation. Nevertheless, even if smoking-related methylation patterns were partly due to confounding by leucocyte distribution, they might still be useful as biomarkers for smoking exposure.
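The confounding concern described above can be illustrated with a small, noise-free toy example in Python. It is only a sketch of covariate adjustment in a linear model and not the algorithm of Houseman et al. [25], which estimates cell proportions from reference methylation profiles. Here a single hypothetical leucocyte fraction is assumed to be already known, and all numbers are invented so that the true smoking effect (-0.10) is known by construction.

```python
from statistics import mean


def slope(x, y):
    """Ordinary least-squares slope of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after regressing it on x (with intercept)."""
    b = slope(x, y)
    mx, my = mean(x), mean(y)
    return [yi - (my + b * (xi - mx)) for xi, yi in zip(x, y)]


# Hypothetical, noise-free toy data (not from any study).
smoker = [0, 0, 0, 0, 1, 1, 1, 1]
cell_fraction = [0.10, 0.12, 0.14, 0.16, 0.18, 0.20, 0.22, 0.24]
# Methylation built with a true smoking effect of -0.10 and a
# cell-composition effect of -0.50 per unit of cell fraction.
methylation = [0.90 - 0.10 * s - 0.50 * c for s, c in zip(smoker, cell_fraction)]

# Unadjusted: the smoking coefficient absorbs part of the cell-composition effect.
unadjusted = slope(smoker, methylation)

# Adjusted (Frisch-Waugh-Lovell): remove the cell-fraction component from both
# variables, then regress residual on residual. This equals the smoking
# coefficient of a model that includes cell fraction as a covariate.
adjusted = slope(
    residuals(cell_fraction, smoker),
    residuals(cell_fraction, methylation),
)

print(f"unadjusted smoking effect: {unadjusted:.3f}")
print(f"adjusted smoking effect:   {adjusted:.3f}")
```

Because smokers in this toy data set also have a higher cell fraction, the unadjusted estimate overstates the smoking effect, while the adjusted estimate recovers the value used to construct the data.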

Our review was limited to smoking-associated DNA methylation changes among adults. Similar to the findings in adults, EWASs investigating the role of maternal smoking in newborns also identified differentially methylated CpG sites in several smoking-related genes, such as AHRR, MYO1G, and GFI1, but with less pronounced effect sizes. Interestingly, several loci, such as cg23067299 (AHRR) and cg05549655 (CYP1A1), have so far been discovered only in studies assessing the impact of maternal smoking in newborns, whereas none of these studies reported differential methylation of cg03636183 (F2RL3) and cg19859270 (GPR15), two critical loci associated with adult smoking. These discrepancies are likely to be explained by differences in exposure pathways and population susceptibilities [3, 37], but more and larger EWASs in the respective age groups are needed to further clarify similarities and differences.

The number of known smoking-related CpG sites continues to increase. Given that smoking is an established risk factor for many chronic diseases, these loci could have important applications as objective biomarkers of both current and lifetime smoking exposure and for quantifying risks of smoking-related diseases. Recent GSMSs have already demonstrated strong dose-response relationships between methylation signatures and current and lifetime smoking exposure, as well as time since cessation of smoking [7, 8]. Furthermore, strong associations have been demonstrated between methylation signatures and a variety of major disease endpoints, including coronary artery disease, lung cancer, and asthma [29, 38–41]. With further refinement of methylation signatures and further evaluation of their predictive value for these and additional disease outcomes, smoking-related methylation signatures might become a valuable tool for enhanced risk stratification and for risk-adapted screening and treatment decisions in clinical practice. As a promising example, Teschendorff et al. constructed a smoking index score and subsequently showed that it was able to discriminate normal tissue from cancer tissue rather well [36], thereby demonstrating that smoking-related methylation indices could be useful risk indicators of smoking-induced health disorders.
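As a rough illustration of how several CpG sites could be combined into one index, the Python sketch below averages direction-weighted z-scores relative to a never-smoker reference. This is a generic construction chosen for illustration and is not the smoking index of Teschendorff et al. [36]. The reference means, standard deviations, effect directions, and individual profiles are all hypothetical.

```python
# Hypothetical reference means and standard deviations in never smokers
# (beta values); illustrative numbers only, not taken from any study.
REFERENCE = {
    "cg05575921": (0.88, 0.04),
    "cg03636183": (0.70, 0.05),
    "cg19859270": (0.92, 0.02),
}

# Assumed direction of the smoking effect at each site: -1 means lower
# methylation in smokers, so a decrease in methylation raises the score.
DIRECTION = {"cg05575921": -1, "cg03636183": -1, "cg19859270": -1}


def smoking_score(betas):
    """Average direction-weighted z-score across the reference sites."""
    total = 0.0
    for site, (ref_mean, ref_sd) in REFERENCE.items():
        z = (betas[site] - ref_mean) / ref_sd
        total += DIRECTION[site] * z
    return total / len(REFERENCE)


# Two hypothetical individuals: one matching the reference means exactly,
# one with markedly lower methylation at all three sites.
reference_like = {"cg05575921": 0.88, "cg03636183": 0.70, "cg19859270": 0.92}
hypomethylated = {"cg05575921": 0.60, "cg03636183": 0.55, "cg19859270": 0.88}

print(smoking_score(reference_like))
print(smoking_score(hypomethylated))
```

In practice, weights would be estimated from data rather than fixed at plus or minus one, and any such score would need validation against measured exposure and disease outcomes before use.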

This review has specific strengths and limitations. Strengths include the comprehensive search in two main databases, as well as strict adherence to standards of study selection, classification, and reporting. Several limitations should also be noted. First, despite this comprehensive search strategy, we cannot exclude the possibility of having missed relevant studies, especially studies reported in languages other than English or without full-length reports. Second, in most studies, smoking exposure was exclusively ascertained by self-report, which is known to be imperfect and most likely led to underestimation of true effects. Finally, our review was restricted to methylation patterns in blood DNA associated with active smoking. The focus on this sample matrix was a conscious decision due to its special relevance and ubiquitous availability in large epidemiological studies as well as in routine point-of-care settings. Future research should address specific smoking-associated methylation signatures in various types of tissues (e.g., tumor samples, buccal cells). Also, apart from active smoking, methylation signatures reflecting passive smoking would be of major interest.