As with all meta-analyses, the study findings may be affected by publication, search or selection bias affecting the studies ultimately included in the analysis. Where possible, steps were taken to minimise the effects of bias on the analysis, but the degree to which these steps were successful is difficult to quantify.

The primary aim of the present systematic review and meta-analysis was to understand whether capnography added to standard patient monitoring (consisting of pulse oximetry and visual inspection of ventilation) reduces the incidence (or odds) of adverse events during PSA based on randomised controlled trials (RCTs) of patients undergoing a variety of ambulatory surgical procedures. The analysis was based on the hypothesis that earlier and more sensitive detection of ventilatory changes with capnography may allow for more timely intervention and prevention of potential adverse events, such as cardiac dysrhythmias. Throughout the analyses, we sought to provide the highest level of synthesised evidence with respect to the clinical utility of capnography monitoring during PSA. To mitigate potential pitfalls due to non-standard endpoints, particular emphasis was placed on maintaining a consistent definition of adverse events across all studies included.

In current clinical practice, patient monitoring during PSA often relies on visual assessment of ventilation and use of pulse oximetry, which reflects hypoxaemia. 10–14 To date, a mandate to include capnography in patient monitoring, as a means of early detection of alveolar hypoventilation, has remained a topic of debate. 15 In particular, there has been a perceived gap between various study outcomes and evidence of improved patient safety. No studies have provided ‘hard proof’ that addition of capnography to patient monitoring may reduce severe morbidity and mortality during PSA (in part because of ethical considerations to ensure patient rescue). Previous efforts to use meta-analysis to determine the utility of capnography to identify clinically significant respiratory depression have been faulted for large heterogeneity and non-standard endpoints. 16 17

The administration of procedural sedation and analgesia (PSA) involves achieving a drug-induced depression in level of consciousness and pain to ensure the comfort and cooperation of patients undergoing non-surgical and minor surgical procedures. Significant adverse events associated with PSA are relatively rare but not inconsequential, and can include severe oxygen desaturation, bradycardia, hypotension and cardiac arrest. 1 2 Consensus dictates that levels of sedation are directly related to patient risk during PSA, as is the potential for unintended progression from moderate to deep sedation. 3 Generally speaking, most cardiopulmonary events associated with PSA stem from poor or absent ventilation cascading into hypoxia, tissue injury and cardiac decompensation (see online supplementary figure 1 ). In turn, maintaining patient safety involves the identification of respiratory compromise to prompt the use of clinical intervention before further complications occur. 4–9

No patients, service users or laypeople were involved in the design or conduct of this study. Outcome measures were all related to patient safety during PSA, but were not developed based on an explicit elicitation of patient priorities, experience and preferences.

Sensitivity analyses were specified a priori and the tested conditions were: (1) inclusion of high-quality studies only; (2) inclusion of only moderate sedation; (3) inclusion of only studies with low risk of bias; (4) inclusion of only studies based in the USA; (5) inclusion of only studies based in Europe; (6) exclusion of paediatric data; (7) exclusion of gender-specific studies; (8) exclusion of data in patients <30 years of age. No formal statistical comparisons were made between sensitivity analyses, and intervention effects were not calculated for the excluded studies, thereby mitigating the introduction of type 1 error into the analysis.

The main outcome reported for each endpoint was the pooled mean risk ratio (RR), except when the incidence of an endpoint was less than 1%. In these instances, the Peto method, a fixed-effects model designed specifically for the analysis of rare endpoints, was used. Because the Peto method reports only an odds ratio (OR), the pooled mean OR was also presented for all analyses to allow comparison between all endpoints analysed. In all cases, the 95% CI is reported to allow assessment of significance.
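To make the two effect measures concrete, the following is a minimal sketch of how an RR with a log-scale 95% CI and a pooled Peto OR can be computed from 2×2 counts. It is illustrative only and is not the Review Manager implementation used in the analysis; the function names and the event counts are hypothetical.

```python
import math

def risk_ratio(events_t, n_t, events_c, n_c, z=1.96):
    """Risk ratio with a 95% CI computed on the log scale.

    Assumes at least one event in each arm (no zero-cell correction).
    """
    rr = (events_t / n_t) / (events_c / n_c)
    se = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    return rr, math.exp(math.log(rr) - z * se), math.exp(math.log(rr) + z * se)

def peto_odds_ratio(tables, z=1.96):
    """Pooled Peto OR with 95% CI.

    tables: one (events_t, n_t, events_c, n_c) tuple per study.
    """
    sum_diff = 0.0  # sum over studies of observed minus expected events
    sum_var = 0.0   # sum over studies of the hypergeometric variance
    for events_t, n_t, events_c, n_c in tables:
        n = n_t + n_c
        m = events_t + events_c
        expected = n_t * m / n
        sum_diff += events_t - expected
        sum_var += n_t * n_c * m * (n - m) / (n ** 2 * (n - 1))
    log_or = sum_diff / sum_var
    se = 1 / math.sqrt(sum_var)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical rare-event counts (capnography arm first, control arm second).
rare = [(2, 500, 6, 500), (1, 400, 3, 410)]
print(peto_odds_ratio(rare))
```

The Peto estimate needs no continuity correction when a single arm has zero events, which is one reason it is commonly chosen for rare endpoints.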

Data extraction, initial data consolidation and summary statistics were performed in Microsoft Excel. Data for each endpoint were subsequently entered into Review Manager V.5.3.4 for results synthesis. 21 Heterogeneity of data was evaluated using the χ² and I² statistics presented by Review Manager V.5.3.4, with I² further categorised by the tentative Higgins et al heterogeneity categories of low, moderate and high. 22 After analysis of heterogeneity, the meta-analysis calculated the mean intervention effect across all eligible studies using a random-effects model as described by DerSimonian and Laird. 23 An estimate of between-study variation was provided by the Mantel-Haenszel methodology. 24
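As an illustration of the quantities named above, the sketch below computes Cochran's Q (the χ² heterogeneity statistic), I² and a DerSimonian and Laird random-effects pooled estimate from per-study log effects and their variances. It uses generic inverse-variance weights, which is an assumption made for simplicity; Review Manager's Mantel-Haenszel-based calculation may give slightly different values. The function name and inputs are hypothetical.

```python
import math

def dersimonian_laird(log_effects, variances, z=1.96):
    """Random-effects pooling of log effects (eg, log RR) using DerSimonian-Laird."""
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, log_effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, log_effects))
    df = len(log_effects) - 1

    # I^2: percentage of total variation attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Method-of-moments estimate of the between-study variance tau^2.
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to each within-study variance.
    re_weights = [1 / (v + tau2) for v in variances]
    pooled_log = sum(w * y for w, y in zip(re_weights, log_effects)) / sum(re_weights)
    se = math.sqrt(1 / sum(re_weights))
    return {
        "pooled": math.exp(pooled_log),
        "ci_low": math.exp(pooled_log - z * se),
        "ci_high": math.exp(pooled_log + z * se),
        "q": q,
        "i2": i2,
        "tau2": tau2,
    }
```

When Q does not exceed its degrees of freedom, tau² is truncated to zero and the random-effects result coincides with the fixed-effect inverse-variance result.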

Risk of bias in results was evaluated independently from the quality assessment through the declaration of funding sources and conflicts of interest. A study funded by industry scored 2, and any declared conflicts of interest relating to industry funding outside of the current research publication scored 1. A study with low potential for bias, therefore, would have a score of 0. A high potential for bias was defined as a score of 3, while a score of 1–2 was considered to indicate moderate potential for bias. The absence of industry funding was not taken to signify an absence of bias, but the presence of industry funding or conflicts of interest was assumed to be an indicator of bias. 20
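The scoring rule described above can be expressed as a short sketch. The function and argument names are hypothetical, and the sketch assumes each study contributes at most one funding flag and one conflict-of-interest flag.

```python
def bias_score(industry_funded: bool, industry_conflict: bool) -> int:
    """Industry funding scores 2; a declared industry-related conflict scores 1."""
    return (2 if industry_funded else 0) + (1 if industry_conflict else 0)

def bias_category(score: int) -> str:
    """Map the 0-3 score to the low/moderate/high categories used in the text."""
    if score == 0:
        return "low"
    if score in (1, 2):
        return "moderate"
    if score == 3:
        return "high"
    raise ValueError("score must be between 0 and 3")

# Hypothetical study: industry funded, no other conflicts declared.
print(bias_category(bias_score(industry_funded=True, industry_conflict=False)))
```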

Assessment of article quality was conducted on a study (as opposed to outcome) level using a modified Jadad score, 19 with additional criteria added to make the adaptation specific to monitoring. The Jadad score assesses studies based on their design (randomised and blinded) and their reporting (all patients accounted for), with a maximal score of 5 (high quality) and a minimal score of 0 (low quality). Additional items included here were endpoint definitions, patient population, hospital location at which patients underwent sedation and the staff responsible for monitoring. In line with the Jadad score, items related to trial design could score up to twice as highly as items relating to trial reporting. The reporting of the inclusion/exclusion criteria and endpoint definitions scored one point each, and reporting the location of sedation and the monitoring staff scored half a point each, making the maximal score 8 (high quality). For the purposes of analysing study quality, studies with scores of 0–5.5 were considered to be low quality, while studies scoring 6.0–8.0 were designated as high-quality studies.
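The arithmetic of the modified score can be sketched as follows. The function and argument names are hypothetical; the point values and the 6.0 threshold are those stated above, and the raw Jadad score (0 to 5) is taken as an input rather than derived.

```python
def modified_jadad(jadad: int,
                   reports_criteria: bool,
                   reports_endpoint_definitions: bool,
                   reports_location: bool,
                   reports_monitoring_staff: bool) -> float:
    """Raw Jadad score (0-5) plus up to 3 points for monitoring-specific reporting."""
    if not 0 <= jadad <= 5:
        raise ValueError("raw Jadad score must be between 0 and 5")
    score = float(jadad)
    score += 1.0 if reports_criteria else 0.0              # inclusion/exclusion criteria
    score += 1.0 if reports_endpoint_definitions else 0.0  # endpoint definitions
    score += 0.5 if reports_location else 0.0              # location of sedation
    score += 0.5 if reports_monitoring_staff else 0.0      # staff responsible for monitoring
    return score

def quality_label(score: float) -> str:
    """Scores of 6.0-8.0 are high quality; 0-5.5 are low quality."""
    return "high" if score >= 6.0 else "low"

# Hypothetical study: raw Jadad of 3 with all four additional items reported.
print(quality_label(modified_jadad(3, True, True, True, True)))
```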

Predefined endpoints of interest were desaturation/hypoxaemia (the primary endpoint, with severe desaturation defined as SpO2 ≤85%), apnoea, aspiration, bradycardia, hypotension, premature procedure termination, respiratory failure, use of assisted/bag-mask ventilation and death during PSA. The protocol was left open for the analysis of other patient safety endpoints that were reported by ≥3 studies. Cardiac arrest and death were considered to be representative of severe morbidity and mortality. Notably, the present analysis examined individual endpoints as opposed to composite endpoints (eg, desaturation, apnoea or respiratory depression) and included analyses of more specific endpoints, such as oxygen desaturation <90% and <85%.

Literature searches were conducted in PubMed, the Cochrane Library and EMBASE. Search terms were a combination of MeSH (Medical Subject Heading) terms and free-text searches within article titles and abstracts. Searches aimed to identify all literature reporting on randomised, controlled trials in patients receiving sedation during ambulatory surgery and in which visual assessment of ventilation and pulse oximetry monitoring (control) was compared with control plus capnography. ‘Grey’ or unpublished literature (including congress abstracts) was included in the search strategy and, as the review protocol was not registered in advance, the full search strategy (see online supplementary table 1 ) and additional details are provided in the supplementary data. Only articles or abstracts published on or after 1 January 1995 were included and all searches were performed on 15 January 2017. A previous systematic review in this area did not identify any study prior to 1995, 16 and studies published prior to 1995 were considered unlikely to reflect modern clinical practice. No language exclusion was applied and inclusion was not dependent on the capnography monitor in use. After duplicate removal, title and abstract screening (see online supplementary table 2 ) was performed independently by RS and RFP using Sourcerer (Covalence Research Ltd, London, UK). 18 Full-text versions of all non-excluded articles were retrieved by MM and reviewed independently by RS and RFP. Data were then extracted independently by RS and RFP into data extraction forms in Microsoft Excel (Microsoft Corporation, Redmond, Washington). Any discrepancies in the extracted data were resolved by reference to the original study, reaching consensus between RS and RFP. All extracted endpoint data were reviewed by JRL and MMRFS for clinical utility to ensure that all synthesised data relate to clinically equivalent endpoints. Extracted data included the number of patients with events and the population at risk, in addition to items required to assess article quality and bias. Reference lists of included studies were not searched.

A series of sensitivity analyses were conducted in which the studies included in the estimation of the RR and OR were varied. The results of these analyses are presented in table 2 and show that results were generally robust to the studies included for data synthesis. There were limited data available to assess the impact of capnography monitoring during moderate sedation.

The need for assisted ventilation is reduced with capnography monitoring. The ORs for the assisted ventilation endpoint are presented for all studies (A), high-quality studies (B), studies with low risk of bias (C) and studies with the endpoint specified as bag-mask ventilation (D).

Only one study reported ‘respiratory failure,’ which was treated with assisted bag-mask ventilation. 28 In contrast, the number of studies (n=6) reporting assisted and/or bag-mask ventilation was sufficient to perform a meta-analysis of this endpoint as a surrogate for respiratory failure. 5 28 29 31 32 34 Due to the low number of events, a Peto fixed-effects OR model was used to assess this endpoint. Analysis found no evidence of heterogeneity (I²=0%, low) and demonstrated a significant reduction in assisted ventilation with capnography monitoring (OR 0.47, 95% CI 0.23 to 0.95). In every case, the need to provide assisted ventilation was lower in the capnography arm than in the control arm ( figure 2 ). Three studies were of high quality and had a low risk of bias; meta-analysis of these studies gave an OR of 0.56 (95% CI 0.27 to 1.20). Three studies specified assisted ventilation as bag-mask ventilation, and for this subset of studies the OR was 0.56 (95% CI 0.26 to 1.25).

There was one clear outlier in the apnoea analysis, with data from Klare et al reporting an RR of 11.71 (95% CI 5.30 to 25.90). 34 Apnoea in this study was undefined for the standard of care arm, but in the capnography arm the apnoea criterion was the absence of exhaled CO2 for ≥15 s. Different criteria between trial arms may explain the large difference in detected apnoea, and capnography would be expected to detect apnoea earlier than standard of care monitoring. Excluding this study from the analysis resulted in an RR of 0.85 (95% CI 0.65 to 1.12; OR 0.73, 95% CI 0.43 to 1.24).

Apnoea was less widely reported or reported in combination with disordered respiration. Comparable endpoints were reported in five studies, of which three were high quality. 5 6 25 33 34 There was substantial heterogeneity in the apnoea outcomes (I²=92%, high) and the analysis yielded a non-significant RR of 1.17 (95% CI 0.72 to 1.89). In an analysis including exclusively high-quality studies, the RR favoured capnography but remained non-significant at 0.89 (95% CI 0.64 to 1.23; see online supplementary figure 8 ).

Six studies, three of high quality, reported bradycardia outcomes. 25 28–30 33 34 The definition of bradycardia (heart rate <50 bpm) was consistent among five of the six trials and there was no evidence of heterogeneity between the studies (I²=0%, low). In four studies, the incidence of bradycardia was higher in the capnography arm than in the control arm. Overall, capnography monitoring was associated with a non-significant increase in bradycardia (RR 1.15, 95% CI 0.89 to 1.48; OR 1.16, 95% CI 0.88 to 1.54), and outcomes were not affected by the inclusion of only high-quality studies or only studies with low risk of bias (see online supplementary figure 7 ).

Synthesising estimates from high-quality studies supported the analysis of all studies, with the RR of 0.57 (95% CI 0.36 to 0.92) and OR of 0.53 (95% CI 0.31 to 0.89) each reduced by 0.02 and the CIs widened (see online supplementary figure 6 ). There was moderate heterogeneity between studies (I²=64%, moderate). Focusing on the six studies reporting an endpoint of SpO2 </≤85%, there was moderate heterogeneity and the RR was estimated at 0.56 (95% CI 0.41 to 0.78). Overall, a 40% reduction in the incidence of severe desaturation events would be expected with the use of capnography monitoring relative to standard of care.

Seven studies, of which four were classified as high quality, reported severe desaturation. 5 25 27–30 34 All but one of the studies defined severe desaturation as SpO2 </≤85%. The analysis for this endpoint was aligned with the significant reduction in the odds of mild desaturation with the inclusion of capnography, with an RR of 0.59 (95% CI 0.43 to 0.81) and OR of 0.55 (95% CI 0.38 to 0.78). As with mild desaturation, there was evidence of heterogeneity (I²=47%, moderate).

All studies ( table 1 ) reported mild desaturation, with the definition varying from an oxygen saturation (SpO2) of <95% to <90% for ≥15 s. 5 6 25–35 There was evidence of heterogeneity (I²=50%, moderate) in the primary analysis. Results indicated that capnography significantly reduced the incidence of mild desaturation (RR 0.77, 95% CI 0.67 to 0.89; OR 0.67, 95% CI 0.55 to 0.82; figure 1 ). The odds of a mild desaturation event were reduced by over 30% when capnography monitoring was used, compared with monitoring without capnography. If only high-quality studies (n=7, eight populations) were included (see online supplementary figure 5 ), there was evidence of heterogeneity (I²=61%, moderate) but the outcome did not differ: RR 0.75 (95% CI 0.62 to 0.92; OR 0.63, 95% CI 0.48 to 0.83). Using exclusively studies with equivalent definitions of mild desaturation (<90%, n=8, nine populations), evidence of heterogeneity (I²=57%, moderate) was still present; the RR estimated from these studies was 0.76 (95% CI 0.65 to 0.89; OR 0.64, 95% CI 0.51 to 0.80).

Literature searches of PubMed, the Cochrane Library and EMBASE returned 385, 87 and 804 articles, respectively. After removal of 270 duplicates (62 Cochrane, 208 EMBASE), 1006 articles remained for abstract screening. Although reasons for exclusion varied (see online supplementary table 2 ), the two independent reviewers agreed upon a total of 24 articles to be retained for full-text review (Cohen’s kappa, 1.0). Eleven articles were excluded on full-text review (see online supplementary figure 2 ) because they: reported duplicate data (n=5), did not report patient safety data (n=3), did not include sedation (n=2) or compared two different capnography monitors (n=1). The 13 articles included for analysis are presented in table 1 and included data on 14 patient groups (one study, published by Mehta et al, provided separate data on colonoscopy and esophagogastroduodenoscopy). 25 All studies reported desaturation endpoints, although the definition did vary by study (see online supplementary table 3 ). Other endpoints were heterogeneously reported, but were in most cases reported by ≥3 studies making meta-analysis feasible as per the predefined protocol. Results reported are from random-effects models unless otherwise stated. Results for the use of supplemental oxygen and hypotension are provided in the supplementary information only (see online supplementary figures 3 and 4 , respectively).
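The screening counts and the reported agreement statistic can be checked with a short sketch. The kappa function below is a generic two-rater, two-category implementation; the cell counts passed to it are an assumption inferred from the reported kappa of 1.0 (no disagreements between the two reviewers), not data taken from the study.

```python
def cohens_kappa(both_include, only_first, only_second, both_exclude):
    """Cohen's kappa for two reviewers making include/exclude decisions."""
    n = both_include + only_first + only_second + both_exclude
    observed = (both_include + both_exclude) / n
    first_include = (both_include + only_first) / n
    second_include = (both_include + only_second) / n
    # Agreement expected by chance from each reviewer's marginal inclusion rate.
    expected = first_include * second_include + (1 - first_include) * (1 - second_include)
    return (observed - expected) / (1 - expected)

# Counts reported in the text.
retrieved = 385 + 87 + 804   # PubMed, Cochrane Library, EMBASE
screened = retrieved - 270   # after duplicate removal
full_text = 24
included = full_text - (5 + 3 + 2 + 1)  # exclusions on full-text review

print(screened, included)
# Assumed cells: perfect agreement on 24 inclusions and the remaining exclusions.
print(cohens_kappa(full_text, 0, 0, screened - full_text))
```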

Discussion

The findings of this meta-analysis of recent RCTs comparing visual assessment of ventilation and pulse oximetry monitoring with and without capnography during PSA showed that the odds of oxygen desaturation and assisted ventilation events were significantly reduced with the use of capnography. Other endpoints that could be affected by capnography monitoring were also considered but no significant differences were detected. Of potential clinical importance was the consistency of data across multiple high-quality clinical trials reporting a reduced incidence of assisted ventilation with capnography monitoring. No endpoints assessed in the meta-analysis indicated significant patient safety concerns with capnography.

Physician concerns for patient safety often focus on mortality and severe morbidity. Using the need for assisted ventilation as a proxy, there was evidence that severe morbidity may differ between control and capnography arms in the present meta-analysis. Although we note that no single trial showed a significant difference in this outcome, the information now exists to perform a power calculation to determine the number of patients that would be required to be enrolled in a prospective clinical trial to demonstrate a significant reduction in patient harm. The incidence of mortality and severe morbidity events during nurse-administered PSA has been reported to be 1 event per 303 procedures (0.33%).36 Taking this value along with the assumption that capnography could prevent 50% of events (in line with the estimate from our analysis), trial-size estimation methodology reported by Zhong indicated that 27 726 patients would be required to demonstrate statistical superiority.37 Under the alternative assumption that capnography would prevent 10% of events, the required enrolment would be >900 000 patients. As such, we submit that the feasibility of performing superiority trials is low, which leaves meta-analyses, such as the present study, as the only viable alternative for determining the impact of capnography on such critical patient endpoints.
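The scale of these enrolment figures can be illustrated with a textbook sample size calculation for comparing two proportions. This is a sketch under stated assumptions, not the methodology of Zhong: it uses the normal approximation with an assumed two-sided alpha of 0.05 and 80% power, neither of which is stated in the text, so it is expected to give totals of the same order of magnitude as those quoted rather than to reproduce them exactly. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def total_sample_size(p_control, relative_reduction, alpha=0.05, power=0.80):
    """Total enrolment (two equal arms) to detect a relative reduction in event rate."""
    p_treat = p_control * (1 - relative_reduction)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    per_arm = math.ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treat) ** 2)
    return 2 * per_arm

baseline = 1 / 303  # reported incidence of mortality and severe morbidity events
print(total_sample_size(baseline, 0.50))  # capnography prevents 50% of events
print(total_sample_size(baseline, 0.10))  # capnography prevents 10% of events
```

Because the required size scales with the inverse square of the absolute risk difference, a fivefold smaller relative effect at this low baseline rate increases the enrolment by more than an order of magnitude.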

Our analysis is timely given the ongoing debate as to whether the addition of capnography to patient monitoring during PSA adds value.17 Without doubt, potential technical and financial burdens have further limited adoption of capnography monitoring in various clinical settings.15 17 Nevertheless, it is important to recognise that patient safety benefits may offset a number of these concerns if the outcomes are applicable to current medical practice.38 In this regard, the 13 trials identified in the present analysis were all recent, with the first published study identified in 2006. The data used in the present meta-analysis therefore represent modern medical practice, and provide consistent evidence of improvements in patient safety with the use of capnography monitoring.

Our findings further substantiate a previously published meta-analysis (Waugh et al), which found that capnography monitoring was more likely to detect adverse events, but was faulted for large endpoint heterogeneity.16 In the present meta-analysis, we focused on identifying high-quality studies, and on maintaining consistent definitions across all included studies. The results show that the addition of capnography to patient monitoring during PSA results in increased patient safety, with significant reductions in mild and severe levels of oxygen desaturation, as well as the need for assisted ventilation.

A recent meta-analysis by Conway et al reported a significant benefit with capnography during colonoscopy only with respect to hypoxaemia.39 However, the Conway et al meta-analysis identified and screened only a fraction of the literature included in the present analysis (388 papers in Conway et al, compared with 1006 papers in the current study) and retrieved fewer RCTs (6 vs 13). In addition, Conway et al excluded two trials in which an independent observer monitored capnography output for all patients, and signalled to the attending physician when respiratory compromise was identified with capnography either immediately (intervention) or after a specified delay (control).5 6 The rationale for this study design was to prevent unnecessary patient harm while avoiding investigator bias. Based on our understanding, the two trials excluded in the Conway et al analysis were the only studies in the literature that could be considered fully blinded. Among the other studies, the attending physician would have been aware of study arm assignment.27 29 32

As with other major assessment tools such as Delphi, CONSORT and the Cochrane risk of bias tool, blinding is an integral part of the Jadad score used in the present analysis.19 40 The trials excluded from the Conway et al analysis are both considered to be ‘high quality’ in the present analysis, driven in part by the inclusion of blinding in the scoring methodology. Other included trials, though potentially more representative of current clinical practice, are open to operator bias, the consequences of which were demonstrated in 2012 by Veerus et al.41

The Jadad score is a widely used score of clinical study quality.42 In the present analysis, the scale was modified to make it more applicable to monitoring studies by including parameters such as monitoring staff and procedure location. One potential limitation of the present quality appraisal approach was the lack of validation of the modifications to the Jadad score; however, as might have been anticipated, the modified score does significantly correlate with the raw Jadad score (adjusted R²=0.93, p<0.01). Furthermore, analysis of mild desaturation data using a mixed model that took either the Jadad score or the modified Jadad score as a covariate found no significant difference between the models in the heterogeneity accounted for (approximately 50% for both models).

Another ongoing debate in PSA concerns the clinical importance of seemingly minor endpoints, such as mild desaturation (oxygenation <90% for 15 s). Although such endpoints have traditionally been considered transient and perhaps clinically insignificant during PSA, several recent studies of common intraoperative events have suggested that mild desaturation may have more impact on postsurgical outcomes than has previously been recognised.43 For example, Dunham et al looked retrospectively and determined that surgical patients who experienced perioperative hypoxaemia/desaturation had a significant increase in their length of hospital stay (+2.0 days, p<0.0001).44 In turn, the impact of transient desaturation during PSA in terms of patient outcomes and quality of life may yet be of importance but remains to be determined.

Across all of the randomised trials included in the analysis, there was one report of patient mortality, which occurred in the standard of care arm of the trial presented by Klare et al.34 Only the largest trials reported any requirement for assisted/bag-mask ventilation, which is used as an intervention during, and thereby is a proxy measure for, potentially life-threatening events. Although it is widely accepted that much larger studies would be useful to assess whether or not capnography monitoring affects major patient morbidity and mortality, there has been no determination to date of the trial size that would be required. Power calculations furthered by our meta-analysis suggest such a large RCT is likely to be impractical.

For healthcare providers, the most significant finding may be the consistency of data surrounding assisted ventilation and severe oxygen desaturation with capnography. Two closed claim reviews both found that inadequate oxygenation/ventilation was the most frequent event leading to a claim related to PSA outside the operating room.45 46 The potential cost burden is demonstrated by the median cost of a claim settled being US$330 000 (in 2007 US$).45 The authors reported that better monitoring would have reduced the number of claims.45 A similar message was returned following the fourth National Audit Project in the UK, which analysed major complications of airway management in the National Health Service and determined that capnography monitoring could have led to earlier identification of airway obstruction, potentially preventing 74% of death or neurological injury cases.47 48 Studies included in the present meta-analysis reported that disordered ventilation as detected by capnography preceded desaturation events by 30–60 s.

The meta-analysis did find a non-significant increase in bradycardia with capnography monitoring. However, in each of the trials reporting a higher incidence, the patients in the capnography arm had larger doses and increased use of multiple agents for inducing PSA. Such confounding is plausible, may not be unusual and was discussed as a possible factor in the trial outcomes by Campbell et al.49 All other findings of the current analysis were in line with expectations around the potential benefits of capnography; as further substantiated by the results of our meta-analysis, earlier identification of respiratory compromise appears to result in more timely intervention and prevention of its escalation into patient harm.

As with all data synthesis projects, the present study is only as accurate and reliable as the data underlying it. In the literature, there are examples of newly published clinical trials that do not align with the results of published meta-analyses, and meta-analysis results changing on the publication of new data.50 51 The systematic nature of study identification and inclusion criteria in the present analysis was designed to identify all available literature and provide the most robust estimates of intervention effect. However, the included studies came from a variety of hospital settings, in which the rate of patient safety events might vary. This is apparent in the clinical trial results presented by Mehta et al, where colonoscopy and esophagogastroduodenoscopy were assessed independently due to differences in outcomes.25 Analyses for particular settings were undertaken, but were then limited by reduced data availability. In total, this analysis represented 5460 patients (control 2755 and capnography 2705) over 13 studies. Between trials, the number of patients enrolled varied between 132 and 986. Notably, of the six studies that identified rare outcomes (eg, use of assisted ventilation), five enrolled >500 patients.