According to the BMJ Rapid Recommendations process,15 a guideline panel provided critical oversight to the review and identified populations, subgroups, and outcomes of interest. The panel included six content experts (five orthopedic or trauma surgeons and one physiotherapist), six methodologists (four of whom are also front line clinicians), and four patients with personal experience of fractures (one of whom had used LIPUS). All patients received personal training and support to optimize contributions throughout the guideline development process. The members of the patient panel led the interpretation of the results based on what they expected the typical values and preferences of patients to be, as well as the variation between patients.

We searched Medline, PubMed, Embase, CINAHL, and the Cochrane Central Register of Controlled Trials up to 16 November 2016, using a combination of keywords and MeSH terms for fracture, orthopedic surgical procedures, and ultrasound. We also searched the trial registries clinicaltrials.gov and isrctn.com. An experienced research librarian designed the search strategies (appendix 1). Two reviewers independently scanned the reference lists of eligible studies and related systematic reviews, as well as all studies citing eligible randomized controlled trials on Google Scholar.

Two reviewers, independently and in duplicate, screened the titles and abstracts of identified articles and acquired the full text of any article that either reviewer judged to be potentially eligible. They independently applied the eligibility criteria to the full texts and, when consensus could not be reached, resolved disagreements through discussion or adjudication by a third reviewer.

We included randomized controlled trials that compared LIPUS with a sham device or no device in patients with any type of fracture regardless of location (long bone or other bone), type (fresh fracture, delayed union, non-union, or stress fracture), or clinical management (operative or non-operative). We included any type of osteotomy, including distraction osteogenesis. We excluded trials published only as protocols or abstracts if we were unable to get the final results from investigators.

Two reviewers used standardized forms to independently abstract data; they resolved disagreements by discussion or involved a third reviewer when required. Extracted data included characteristics of patients and fractures, clinical management, risk of bias, intervention details, statements about compliance with treatment, and outcomes.

Two reviewers independently assessed risk of bias using a modified Cochrane risk of bias instrument that includes response options of “definitely or probably yes” (assigned a low risk of bias) or “definitely or probably no” (assigned a high risk of bias), an approach we have previously validated.18 At the study level, we assessed generation of randomization sequence, concealment of allocation, blinding of patients and caregivers, and outcome reporting (by comparing each publication with its corresponding published protocol, when available). For each outcome within studies, we assessed blinding of outcome assessors, loss to follow-up, and additional limitations. We considered ≥20% loss to follow-up to represent a high risk of bias, unless the investigators performed appropriate sensitivity analyses showing the robustness of the results. We categorized a trial as being at low risk of bias for a particular outcome if we identified no limitation for any risk of bias item. As a post hoc sensitivity analysis, we alternatively considered a more conservative threshold of ≥10% loss to follow-up because the categorization of three trials depended on this criterion (19%, 28%, and 31% loss to follow-up for the outcome radiographic healing in references 16, 19, and 20, respectively).
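The outcome-level rule described above can be illustrated with a minimal Python sketch. The function name, argument names, and the example trial are hypothetical and are not taken from the review; the sketch only shows how moving the loss to follow-up threshold from 20% to 10% can change a trial's categorization.

```python
def outcome_risk_of_bias(item_limitations, loss_to_follow_up,
                         threshold=0.20, robust_sensitivity_analysis=False):
    """Return "low" or "high" risk of bias for one outcome in one trial.

    item_limitations: dict mapping a risk of bias item to True when a
    limitation was judged "definitely or probably" present (hypothetical
    data structure).
    loss_to_follow_up: proportion of randomized patients lost (0 to 1).
    """
    # Loss to follow-up at or above the threshold counts as a limitation
    # unless the investigators showed the results were robust.
    attrition_problem = (loss_to_follow_up >= threshold
                         and not robust_sensitivity_analysis)
    if attrition_problem or any(item_limitations.values()):
        return "high"
    return "low"


# Hypothetical trial: no other limitations, 19% loss to follow-up.
no_limitations = {
    "sequence_generation": False,
    "allocation_concealment": False,
    "blinding_patients_caregivers": False,
    "outcome_reporting": False,
    "blinding_outcome_assessors": False,
}
primary = outcome_risk_of_bias(no_limitations, 0.19)
sensitivity = outcome_risk_of_bias(no_limitations, 0.19, threshold=0.10)
```

With the 20% threshold this hypothetical trial is categorized as low risk of bias; with the 10% threshold it becomes high risk of bias.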

Patients identified functional recovery (time to return to work and time to full weight bearing), pain reduction, and number of operations for subsequent fracture or related to osteotomy (re-operation for operatively managed fracture and osteotomy) as the most important outcomes for patients considering LIPUS for bone healing. Because many clinicians currently base their management on time to radiographic healing, a surrogate outcome important only insofar as it influences patient experience, the panel requested its inclusion in our review. We extracted all outcomes that fell into these categories as well as adverse effects related to the ultrasound device.

Synthesis of results

We pooled treatment effects of LIPUS on similar outcomes across eligible trials, regardless of clinical subgroups, focusing on complete case analysis. We calculated pooled estimates and associated 95% confidence intervals using random effects models for meta-analyses with three or more studies and fixed effects models for meta-analyses with two studies. We examined heterogeneity associated with all pooled analyses using both the χ² test and the I² statistic. We performed the statistical analyses using SAS version 9.4, R version 3.1, and Review Manager 5.3.
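As an illustration of this pooling strategy, the following Python sketch performs inverse variance meta-analysis and reports Cochran's Q (the χ² statistic for heterogeneity) and I². It assumes the DerSimonian-Laird estimator for the between-study variance, which the text does not specify, and all function names are hypothetical; the actual analyses were run in the software packages named above.

```python
import math


def choose_model(n_studies):
    # Rule stated in the text: random effects for three or more studies,
    # fixed effects for two studies.
    return "random" if n_studies >= 3 else "fixed"


def pool(estimates, variances, model=None):
    """Inverse variance pooling of study estimates (for example, log ratios).

    Assumption: DerSimonian-Laird is used for the random effects model.
    """
    k = len(estimates)
    if model is None:
        model = choose_model(k)

    # Fixed effect weights and estimate, also needed for Q.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q and I² (percentage of variability beyond chance).
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Between-study variance (zero under the fixed effects model).
    tau2 = 0.0
    if model == "random" and df > 0:
        c = sum_w - sum(wi ** 2 for wi in w) / sum_w
        tau2 = max(0.0, (q - df) / c)

    w_star = [1.0 / (v + tau2) for v in variances]
    estimate = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {
        "model": model,
        "estimate": estimate,
        "se": se,
        "ci95": (estimate - 1.96 * se, estimate + 1.96 * se),
        "q": q,
        "df": df,
        "i2": i2,
        "tau2": tau2,
    }


# Hypothetical study estimates and variances, for illustration only.
result = pool([0.0, 0.0, 3.0], [1.0, 1.0, 1.0])
```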

For time to event outcomes, we pooled hazard ratios. For studies that did not apply methods of survival analysis, we considered time to event reported as a continuous variable (for example, days to return to work) at the longest follow-up time. We used the ratio of means (mean LIPUS/mean control) as the relative effect measure to account for baseline differences in fracture healing depending on type of bone (such as scaphoid, clavicle, or tibia) and type of fracture or procedure (such as stress fracture or distraction osteogenesis). We pooled the natural logarithm of the ratio of means and presented the results as percentage difference (relative change). For studies that reported the proportion of patients who achieved the event at a specific time point, we calculated risk ratios.
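The ratio of means calculation can be sketched in Python as follows. The variance of the log ratio of means uses the standard delta method approximation, which is an assumption here because the text does not state the variance formula; the function names and the example numbers are hypothetical.

```python
import math


def log_ratio_of_means(mean_lipus, sd_lipus, n_lipus,
                       mean_control, sd_control, n_control):
    """Return (log ratio of means, approximate variance) for one study.

    Assumption: variance from the delta method,
    sd_t^2 / (n_t * mean_t^2) + sd_c^2 / (n_c * mean_c^2).
    """
    log_rom = math.log(mean_lipus / mean_control)
    variance = (sd_lipus ** 2 / (n_lipus * mean_lipus ** 2)
                + sd_control ** 2 / (n_control * mean_control ** 2))
    return log_rom, variance


def percentage_difference(log_rom):
    # Back-transform a (pooled) log ratio of means to a relative change.
    return (math.exp(log_rom) - 1.0) * 100.0


# Hypothetical study: 80 days to healing with LIPUS, 100 days with control.
log_rom, variance = log_ratio_of_means(80.0, 8.0, 16, 100.0, 10.0, 25)
change = percentage_difference(log_rom)
```

In this hypothetical example the ratio of means is 0.8, which corresponds to a relative change of -20%, that is, 20% fewer days in the LIPUS group.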

When studies used different instruments to measure the same construct on a continuous scale, we converted all instruments to the most commonly used instrument among studies and then pooled results using the weighted mean difference.21

For the outcomes number of subsequent operations and adverse events related to the device, we calculated both risk ratios, which are preferable in case of varying baseline risks, and risk differences, which allow inclusion of studies with zero events in both groups.
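A short Python sketch illustrates why both measures are reported. The function names are hypothetical, and the sketch deliberately returns no risk ratio when the control group has no events rather than applying a continuity correction, which is a simplification and not a description of the review's software.

```python
def risk_ratio(events_lipus, n_lipus, events_control, n_control):
    """Risk ratio, or None when it is undefined (no events in control)."""
    if events_control == 0:
        return None
    return (events_lipus / n_lipus) / (events_control / n_control)


def risk_difference(events_lipus, n_lipus, events_control, n_control):
    """Risk difference; defined even when both groups have zero events."""
    return events_lipus / n_lipus - events_control / n_control


# Hypothetical trial with no events in either group: the risk ratio is
# undefined, but the risk difference still contributes a value of zero.
rr_zero = risk_ratio(0, 50, 0, 50)
rd_zero = risk_difference(0, 50, 0, 50)

# Hypothetical trial with events in both groups.
rr = risk_ratio(5, 50, 10, 50)
rd = risk_difference(5, 50, 10, 50)
```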

In consultation with the expert and patient guideline panel, we prespecified three subgroup hypotheses to explain heterogeneity of effects between studies: LIPUS will show larger effects in studies at high risk of bias; effects will differ based on clinical subgroups; and LIPUS will show larger effects in studies with greater patient compliance. In consultation with the six clinical experts on the parallel guideline panel, we classified eligible randomized controlled trials according to the following five clinical subgroups: operatively managed fresh fractures, non-operatively managed fresh fractures, stress fractures, non-union, and osteotomy (including distraction osteogenesis). Because compliance was inconsistently reported, two reviewers independently categorized trials using response options of “definitely or probably high compliance” or “definitely or probably moderate compliance,” using as a guide a definition of high compliance as at least 80% of patients who applied LIPUS for at least 80% of the total time prescribed. We conducted univariable tests of interaction to establish whether the effect sizes of the subgroups differed significantly from each other and, to test independence of subgroup effects, performed multivariable meta-regression in which we included risk of bias (high versus low), compliance with LIPUS treatment (high versus moderate), and clinical subgroups (as above) as independent variables in a single model.
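Two of the steps above lend themselves to a small Python sketch: the guide definition of high compliance, and a test of interaction between two subgroups. Both functions are hypothetical illustrations. The compliance function encodes only the 80%/80% guide, not the reviewers' judgment, and the interaction test assumes a simple z test on the difference between two pooled subgroup estimates, which may differ from the procedure used in the review.

```python
import math


def compliance_category(fraction_of_prescribed_time):
    """Apply the 80%/80% guide to per-patient adherence fractions (0 to 1)."""
    n = len(fraction_of_prescribed_time)
    adherent = sum(1 for f in fraction_of_prescribed_time if f >= 0.80)
    return "high" if adherent / n >= 0.80 else "moderate"


def interaction_test(estimate_a, se_a, estimate_b, se_b):
    """Two-sided z test for a difference between two subgroup estimates.

    Estimates should be on the scale used for pooling (for example, log
    ratio of means). Returns (z, p value).
    """
    z = (estimate_a - estimate_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    # Two-sided p value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical adherence data: four of five patients used the device for
# at least 80% of the prescribed time.
category = compliance_category([1.0, 0.9, 0.85, 0.8, 0.5])

# Hypothetical pooled estimates for high and low risk of bias subgroups.
z, p = interaction_test(-0.30, 0.08, -0.02, 0.05)
```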

Only one outcome, days to radiographic healing, included enough studies to perform all planned subgroup analyses; our protocol had prespecified a minimum of three studies per group. We assessed the credibility of significant subgroup effects using the criteria suggested by Sun and colleagues.22 Based on the finding that risk of bias seemed to independently explain the high heterogeneity in the outcome days to radiographic healing, we performed subgroup analysis by risk of bias for all outcomes.

The authors and the guideline panel achieved consensus in categorizing the quality of evidence for all reported outcomes as high, moderate, low, or very low using the GRADE (Grading of Recommendations Assessment, Development and Evaluation) approach. In the GRADE approach, randomized controlled trials begin as high quality but can be rated down because of risk of bias, inconsistency, indirectness, imprecision, or publication bias.23 We considered rating down for inconsistency if the magnitude and direction of effects were dissimilar, the confidence intervals had minimal overlap, the test of heterogeneity was significant, or the I² was high.24 For outcomes with 10 or more studies, we inspected symmetry of funnel plots and performed Egger’s statistical test for publication bias.25
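For readers unfamiliar with Egger's test, the Python sketch below shows its core calculation: an ordinary least squares regression of the standardized effect (effect divided by its standard error) on precision (one divided by the standard error), in which an intercept far from zero suggests funnel plot asymmetry. The function name and the data are hypothetical, and the sketch stops at the t statistic because a p value requires the t distribution, which the Python standard library does not provide.

```python
import math


def egger_intercept(effects, standard_errors):
    """Return (intercept, t statistic, degrees of freedom) for Egger's test."""
    n = len(effects)
    if n < 3:
        raise ValueError("at least three studies are needed for the regression")

    x = [1.0 / se for se in standard_errors]                  # precision
    y = [e / se for e, se in zip(effects, standard_errors)]   # standardized effect

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual variance and the standard error of the intercept.
    df = n - 2
    residual_ss = sum((yi - intercept - slope * xi) ** 2
                      for xi, yi in zip(x, y))
    se_intercept = math.sqrt(residual_ss / df * (1.0 / n + mean_x ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else None
    return intercept, t_stat, df


# Hypothetical effects and standard errors, for illustration only. The
# review applied the test only to outcomes with 10 or more studies.
intercept, t_stat, df = egger_intercept([1.0, 1.0, 1.25], [1.0, 0.5, 0.25])
```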

To calculate absolute effects, we applied the effect estimate from the meta-analysis to the control arm of the TRUST trial, which enrolled patients with tibia fractures and had the largest sample size of any eligible study that was at low risk of bias. The approach to rating certainty of individual outcomes was fully contextualized—that is, in rating quality about any individual outcome, we took into account the findings on the other outcomes.
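The conversion from relative to absolute effects can be sketched in Python as follows. The function names and the baseline values are hypothetical; they are not the TRUST trial data, and the sketch assumes that the relative effect applies uniformly to the baseline value.

```python
def absolute_risk_difference_per_1000(baseline_risk, risk_ratio):
    """Absolute change in events per 1000 patients for a given risk ratio."""
    return 1000.0 * baseline_risk * (risk_ratio - 1.0)


def absolute_mean_difference(baseline_mean, ratio_of_means):
    """Absolute change in a continuous outcome for a given ratio of means."""
    return baseline_mean * (ratio_of_means - 1.0)


# Hypothetical baseline values standing in for a control arm.
fewer_events = absolute_risk_difference_per_1000(0.20, 0.75)
fewer_days = absolute_mean_difference(100.0, 0.90)
```

With a hypothetical baseline risk of 20% and a risk ratio of 0.75, the absolute effect is 50 fewer events per 1000 patients; with a baseline mean of 100 days and a ratio of means of 0.90, it is 10 fewer days.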