This umbrella review aimed to summarise and critically evaluate the evidence from SRs and NRs of the effects of MLT on health and to identify the biological mechanisms of action involved. In total, 195 reviews were included (96% of the reviews were published after 2000). Of the reviews, 99 included evidence from in vitro or animal experiments, which highlights the still experimental phase of some MLT research and the translational potential for human trials.

There was considerable clinical and methodological heterogeneity in terms of populations evaluated (from neonates to the elderly), doses, excipients, quality or purity of MLT preparations, comparators, outcome measures, study designs, lengths of follow-up, settings, etc. Despite that, the present review lends support to the notion that both endogenous and exogenous MLT are associated with improved health outcomes. However, caution is advised for the use or supplementation of MLT in some autoimmune conditions, such as rheumatoid arthritis, asthma or organ transplantation, as MLT has been reported to stimulate the function of the immune system via the production of interleukins (IL-1, IL-2, IL-6 and IL-12), interferon γ (IFN-γ), Th cells, cytotoxic T cells, and B- and T-cell precursors [44].

Overall, although the connection between MLT and health seems well founded, there is less systematic evidence connecting MLT with specific diseases. The physiological role of MLT, as uncovered by various experimental studies, points quite robustly to a direct relation between MLT and critical elements of health. However, the connection with specific conditions needs to be researched comprehensively. Thus, we suggest the need for high-quality primary data and we underline the importance of targeted studies on specific conditions, such as Alzheimer’s or cardiovascular diseases.

Mechanisms of action

Some of the effects of MLT are mediated via anti-oxidative (e.g. [45,46,47,48,49]), anti-inflammatory (e.g. [50,51,52]), anti-apoptotic (e.g. [53, 54]), anti-nociceptive (e.g. [33, 55]), anti-hypertensive (e.g. [56,57,58]), cytoprotective, neuroprotective, cardioprotective or nephroprotective effects (e.g. [59,60,61,62,63,64]), and by enhancing mitochondrial function and protecting nuclear and mitochondrial DNA or regulating homeostasis (e.g. [53, 65]; Table 1). Even though some of the mechanisms of action are well established, the role of confounding factors such as diet, exercise, sleep and genetics in the relationship between MLT and health remains largely unquantified, which limits the generalisability of the results. We here identify three important factors that future researchers could take into account. Firstly, climatic conditions, and especially latitude, could bias the physiological response. Secondly, the urban environment and the presence of LED light could disrupt circadian rhythms and suppress the production of MLT. Finally, the overall cultural background could also have a significant impact, as it affects nutrition and clothing.

Safety

AEs of exogenous MLT and MLT analogues were reported in 11 (5.6%) of the included reviews. Two reviews pooled the safety data [40, 66]. In Liu and Wang [40], there were more subjective reports of at least one AE after treatment with ramelteon compared to placebo (RR = 1.11, 1.03 to 1.20, P < 0.01; seven studies). In Huang et al. [66], however, agomelatine showed a lower rate of discontinuation due to AEs compared with selective serotonin reuptake inhibitors or serotonin–norepinephrine reuptake inhibitors (RR = 0.38, 95% CI = 0.25 to 0.57). AEs were typically mild and included worsening of symptoms (seizures, asthma or headaches), transient headaches and dizziness, abdominal pain, pharyngitis, back pain and asthenia, somnolence, fatigue, nasopharyngitis, upper respiratory infection, nausea, diarrhoea, dyspepsia, dysmenorrhoea, dry mouth, increased alanine aminotransferase, nightmares, morning drowsiness, enuresis, rash and hypothermia (Additional file 5: Table S5). Given the overwhelming benefits of MLT treatment and the few, mild AEs reported (including for long-term use), the risk–benefit ratio favours MLT.
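For readers less familiar with the risk ratios quoted above, the following minimal sketch shows how an RR and an approximate 95% CI can be obtained from the event counts of a single two-arm trial, using the standard log-transformation method. It is not the pooling method used in the cited meta-analyses, which combine several trials; the function name and the event counts are hypothetical and purely illustrative.

```python
import math


def risk_ratio(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Return (RR, lower, upper) for one two-arm trial.

    Uses the log method: the standard error of ln(RR) is
    sqrt(1/a - 1/n1 + 1/c - 1/n2); z = 1.96 gives an approximate 95% CI.
    Assumes at least one event in each arm.
    """
    rr = (events_trt / n_trt) / (events_ctl / n_ctl)
    se_log_rr = math.sqrt(
        1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts (NOT taken from any included review):
# 30 of 100 participants report an AE on treatment, 20 of 100 on placebo.
rr, lower, upper = risk_ratio(30, 100, 20, 100)
print(f"RR = {rr:.2f}, 95% CI = {lower:.2f} to {upper:.2f}")
```

An interval that excludes 1, as in both pooled estimates quoted above, indicates a difference between arms at the conventional 5% level; in this hypothetical single trial the interval includes 1.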

Cost-effectiveness

Only two reviews undertook any health economic analysis of MLT. One review stated that the cost of a 30-tablet pack of 2 mg of Circadin was £15.39 [67], whereas Liira et al. [38] ‘did not find evidence on the cost-effectiveness of the drugs in the included trials’. More cost-effectiveness or cost-benefit analyses would be required to confirm the economic benefits of MLT and to inform various stakeholders and policymakers.

Quality (and quantity) of primary data

In 154 (79%) of the reviews, the quality of the primary data was not evaluated. In the 41 reviews (21%) that did evaluate it, the quality of the primary data ranged from poor to high (average = moderate), as judged by the authors of the included reviews, primarily using the Cochrane Risk of Bias Tool. The relatively low number of primary studies (median 9) included in the SRs or NRs may be a concern, and signals the need for more research into a wide range of conditions and clinical areas including oncology, emergency medicine, neurology, metabolic diseases, cardiovascular medicine, gynaecology, paediatrics, psychiatry, mental health, gastrointestinal diseases and pain management.

Review quality

The methodological quality of the included reviews was frequently poor (Additional file 3: Table S3). Most of the articles that scored poorly on the Oxman checklist (quality rating scale) were NRs, which are often of poorer quality than SRs. As these articles do contribute relevant information, we decided to include them in our study. However, 36 (18.4%) of the reviews scored 6–9 on the Oxman checklist, meaning they had minimal or no flaws.

Strengths and weaknesses

This umbrella review has important strengths, such as the inclusion and critical appraisal of 195 review articles, identification of gaps and uncertainties in the evidence base, and categorisation of significant health-related effects and associated mechanisms of action. However, this umbrella review of both SRs and NRs has several limitations that ought to be kept in mind when interpreting its results. First and foremost, even though comprehensive searches were employed, there is no guarantee that all relevant SRs of MLT were included. The searches were restricted to the past 21 years, thereby omitting older and potentially important reviews; reviews published in languages other than English were also omitted.

Secondly, many SRs analysed the same primary studies. This overlap between SRs is important when interpreting the results of this overview (Additional file 4: Table S4, Fig. 2). For instance, because overlapping studies lead to double counting of patient data, the total number of patients included in our analyses cannot be estimated. Also, in the subset of 31 MAs, 238 RCTs were included. These RCTs were frequently used in more than one MA (range = 1–4, mean = 1.4, SD = 0.66), meaning that there were overlapping studies and double counting of the data (Fig. 2). To further illustrate this, three [31, 37, 68] of five MAs [31, 37, 68, 69, 70] evaluating MLT for cancers relied on the same data from the same four primary trials (Lissoni 1996, 1997, 1999, 2003). However, the overall amount of overlap, calculated as the corrected covered area (CCA), was 1.2%, which is 'slight' according to Pieper's formula.
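The overlap calculation referred to above can be sketched as follows. This is a minimal illustration, not the analysis code used for this review. It assumes the usual definition CCA = (N - r) / (r x c - r), where N is the total number of study inclusions across reviews (counting repeats), r is the number of distinct primary studies and c is the number of reviews, and it assumes overlap is graded as slight (0–5%), moderate (6–10%), high (11–15%) or very high (> 15%). The function names and the toy citation matrix are hypothetical.

```python
def corrected_covered_area(citation_matrix):
    """Compute the CCA from a studies-by-reviews inclusion matrix.

    citation_matrix[i][j] is True if primary study i is included in review j.
    Every study is assumed to appear in at least one review, and at least
    two reviews are assumed (otherwise the denominator is zero).
    """
    r = len(citation_matrix)        # rows: distinct primary studies
    c = len(citation_matrix[0])     # columns: reviews
    n = sum(sum(1 for included in row if included) for row in citation_matrix)
    return (n - r) / (r * c - r)


def overlap_category(cca):
    """Map a CCA (as a proportion) to the assumed overlap grades."""
    pct = cca * 100
    if pct <= 5:
        return "slight"
    if pct <= 10:
        return "moderate"
    if pct <= 15:
        return "high"
    return "very high"


# Toy example (hypothetical): 4 primary studies across 3 reviews.
toy_matrix = [
    [True, True, True],     # study A appears in all three reviews
    [True, False, False],   # study B appears in review 1 only
    [False, True, False],   # study C appears in review 2 only
    [True, False, True],    # study D appears in reviews 1 and 3
]
cca = corrected_covered_area(toy_matrix)
print(f"{cca:.1%} {overlap_category(cca)}")  # 37.5% very high
```

Because the denominator grows with both the number of studies and the number of reviews, a large matrix can yield a low overall CCA even when a small cluster of reviews, such as those on a single indication, shares most of its primary trials.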

Thirdly, although four SRs were methodologically sound (Oxman checklist score ≥ 6), they were based on poor-quality primary data, which might seem contradictory.

Fourthly, because of insufficient data, we did not evaluate whether there was evidence of small-study effects (publication bias) using funnel plot asymmetry [23].

Fifthly, reviewing SRs might lose nuances embedded in the original data, such as conflicts of interest, sources of funding, validity, generalisability, etc.

Sixthly, various animal, human and in vitro models; different modes of administration; and exogenous and endogenous MLT were frequently analysed together, thereby giving limited understanding of how the results vary depending on the health outcomes evaluated.

Lastly, there is no commonly accepted cut-off point differentiating NRs from SRs using the Oxman scoring system. For example, a review that scored 2–3 on the scale (indicating the presence of major flaws) may be classified as either an NR or an SR, the definition itself being arbitrary. In another example, reviews that could be judged as narrative with extensive flaws (a score of 1 or below), e.g. De Jonghe et al. [71], may include information about the number of primary studies and total sample size, i.e. 9 studies and 330 participants. On the other hand, reviews that had no flaws (a score of 6–9) may not have that information, e.g. Liira et al. [38]. Taken together, these limitations reduce the conclusiveness of our findings, making them prone to criticism.