There has been an increasing incidence of Lyme disease (LD) in Canada and the United States corresponding to the expanding range of the Ixodes tick vector and Lyme disease agent (Borrelia burgdorferi sensu stricto). There are many diagnostic tests for LD available in North America, all of which have some performance issues, and physicians are concerned about the appropriate use and interpretation of these tests. The objective of this systematic review is to summarize the North American evidence on the accuracy of diagnostic tests and test regimes at various stages of LD. Included in the review are 48 studies on diagnostic tests used in North America published since 1995. Thirteen studies examined a two-tier serological test protocol vs. clinical diagnosis, 24 studies examined single assays vs. clinical diagnosis, 9 studies examined single immunoblot vs. clinical diagnosis, 7 studies compared culture or PCR direct detection methods vs. clinical diagnosis, 22 studies compared two or more tests with each other and 8 studies compared a two-tiered serological test protocol to another test. Recent studies examining the sensitivity and specificity of various test protocols noted that the Immunetics® C6 B. burgdorferi ELISA™ and the two tier approach have superior specificity compared to proposed replacements, and the CDC recommended western blot algorithm has equivalent or superior specificity over other proposed test algorithms. There is a dramatic increase in test sensitivity with progression of B. burgdorferi infection from early to late LD. Direct detection methods, culture and PCR of tissue or blood samples were not as sensitive or timely compared to serological testing. It was also noted that there are a large number of both commercial (n = 42) and in-house developed tests used by private laboratories which have not been evaluated in the primary literature.

The objective of this systematic review is to summarize the North American evidence on the accuracy of diagnostic tests and test regimes used to diagnose LD in patients presenting with clinical symptoms in North America at various stages of disease and to address the question of whether there is evidence of superior, equivalent or poor performance by the commercial (approved by the FDA and/or HC) and in house laboratory tests captured in this review. To the best of our knowledge this systematic review is a significant update to Dumler (2001) [ 24 ] and is complementary to a recent systematic review on European Lyme disease diagnostic tests [ 25 ].

The diagnostic tests available for confirmation of human LD have variable sensitivity and specificity depending on the stage of infection. It is therefore important to monitor the literature on available tests for LD, both to promote the tests that perform most effectively and to address concerns about the performance of non-validated tests and test protocols using evidence-informed strategies for decision making [ 17 , 18 ]. Currently in Canada and the United States, a two-tiered serology protocol is the only validated diagnostic approach for LD recommended by the United States CDC and the Public Health Agency of Canada [ 17 , 18 ]. This two-tiered test is typically an enzyme immunoassay (EIA) to detect IgM or IgG antibodies to B. burgdorferi in serum; if the sample is positive or equivocal on the screening assay, a western blot is then used to detect serum IgM or IgG antibodies to B. burgdorferi. Use of IgM testing is recommended during the first 30 days of infection, after which only IgG tests should be used. Currently, only serology tests have been licensed for use by the FDA and the Health Canada Medical Devices Branch (HC) for LD testing [ 19 , 20 ]. Other direct detection tests such as PCR may be commercially available, but they have not been licensed for use by a governing body. There are a number of commercial EIA kits that are licensed by the FDA and/or HC and use either whole cell preparations of B. burgdorferi and/or purified recombinant or chimeric antigens (see S2 Text ). Other EIAs reported in the literature have been developed within the reporting laboratory and have not been commercialized or undergone licensing; these will be referred to as in-house developed tests [ 21 , 22 ]. The EIAs have good sensitivity after 30 days of infection, but typically suffer from lower specificity [ 22 ].
In 1995, the Centers for Disease Control and Prevention (CDC) adopted criteria for interpreting the results of the western blot for LD and most commercialized tests follow these guidelines [ 23 ].
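The two-tier logic described above can be sketched in a few lines of Python. This is an illustrative reading of the protocol as summarised in this section, not a clinical tool: the function name, argument names and return labels are hypothetical, and the blot results are taken as already interpreted (positive or not) by whatever band criteria the laboratory applies.

```python
from typing import Optional


def two_tier_result(eia: str, days_since_onset: int,
                    igm_blot_positive: Optional[bool] = None,
                    igg_blot_positive: Optional[bool] = None) -> str:
    """Return 'negative', 'positive' or 'blot required' (hypothetical labels)."""
    eia = eia.lower()
    if eia not in {"positive", "equivocal", "negative"}:
        raise ValueError("eia must be 'positive', 'equivocal' or 'negative'")

    # Tier 1: a negative screening EIA ends the protocol.
    if eia == "negative":
        return "negative"

    # Tier 2: positive or equivocal EIA samples go on to western blot.
    if igm_blot_positive is None and igg_blot_positive is None:
        return "blot required"

    # IgM is only considered during the first 30 days; IgG counts at any time.
    igm_counts = days_since_onset <= 30 and bool(igm_blot_positive)
    if igm_counts or bool(igg_blot_positive):
        return "positive"
    return "negative"


print(two_tier_result("equivocal", 10))                # blot required
print(two_tier_result("positive", 10, True, False))   # positive
print(two_tier_result("positive", 45, True, False))   # negative (IgM ignored after 30 days)
```

Treating the 30-day boundary as inclusive is an assumption of this sketch; the text only states that IgM testing is recommended during the first 30 days.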

Lyme disease incidence has increased since 1975 as the tick vectors have expanded their geographic range across the north eastern and upper mid-western states in the US and more recently into Canada [ 2 , 13 ]. Range and spread of ticks and B. burgdorferi is facilitated by migratory birds and terrestrial hosts [ 14 ]. There is increasing evidence that climate change will result in further northward expansion of the tick vector’s range in Canada, resulting in increased future risk of LD among Canadians [ 15 , 16 ].

Ticks of the genus Ixodes transmit the spirochete when they feed. Ixodes scapularis, the blacklegged tick, is the main vector in northeastern and upper midwestern United States and Canada while I. pacificus is the major vector in western United States and western Canada [ 9 , 10 ]. The primary vectors of LD in Europe and Asia are I. ricinus and I. persulcatus respectively [ 6 , 11 ]. The principal natural hosts of immature stages of the ticks and B. burgdorferi include rodents, other small and medium sized mammals, reptiles and birds, while adult female ticks feed mainly on deer [ 12 ].

Lyme disease (LD) is the most common tick-borne infection in North America [ 1 , 2 ]. It was first publicly recognized in the United States in 1975 in the towns of Lyme and Old Lyme, Connecticut, as a result of an investigation into 51 cases (39 children) with a similar form of arthritis, although the first case was described five years earlier by a dermatologist in Wisconsin [ 3 , 4 ]. In North America early signs of infection may include erythema migrans (EM, a characteristic skin rash that often has a bull's-eye appearance), fever and non-specific symptoms like headache and lethargy [ 5 , 6 ]. If untreated, the disease can progress to disseminated LD with neurological, cardiac and arthritic manifestations [ 7 ]. Lyme disease in North America is caused by Borrelia burgdorferi sensu stricto (hereafter called B. burgdorferi). Recently Borrelia mayonii was identified and may be responsible for a proportion of cases; however, the performance of LD diagnostic tests in identifying B. mayonii infection is not available [ 8 ]. In Europe B. afzelii, B. garinii, B. burgdorferi, B. spielmanii, B. bissettii and B. bavariensis cause disease with a wider variety of symptoms than reported in North America; a number of genospecies including B. garinii occur in Asia.

Meta-analytic summaries of sensitivity, specificity, likelihood ratios and diagnostic odds ratio are presented in the tables where possible. Model diagnostics including goodness of fit, normality, influential and outlying points, publication bias and heterogeneity were examined where possible. Publication bias was not evaluated when heterogeneity was >60% or there were fewer than 10 lines of data. Meta-regression using the bivariate model was used to examine whether predetermined covariates explained some of the between-study variation, provided there were sufficient data to fit the model (>10 data lines per covariate).
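For readers less familiar with these measures, the following minimal sketch shows how each one is derived from a single 2x2 table (one line of data). It uses the standard textbook definitions rather than the pooled estimates produced by the models in this review; the function name and the example counts are hypothetical.

```python
def accuracy_measures(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy measures from one 2x2 table.

    Assumes at least one case (tp + fn > 0) and one control (tn + fp > 0).
    """
    def ratio(num, den):
        # Report infinity instead of failing when a denominator is zero,
        # e.g. the positive likelihood ratio when specificity is 100%.
        return num / den if den else float("inf")

    sensitivity = tp / (tp + fn)   # proportion of cases testing positive
    specificity = tn / (tn + fp)   # proportion of controls testing negative
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": ratio(sensitivity, 1 - specificity),
        "lr_negative": ratio(1 - sensitivity, specificity),
        "diagnostic_odds_ratio": ratio(tp * tn, fp * fn),
    }


# Hypothetical counts: 100 cases and 100 controls.
example = accuracy_measures(tp=45, fp=2, fn=55, tn=98)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

The diagnostic odds ratio is computed directly from the cell counts, which is algebraically the same as dividing the positive likelihood ratio by the negative one.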

The dataset was managed in Microsoft Excel; each line of data represents a single test accuracy outcome, and one study may have several comparisons and thus several lines of data. Each comparison was extracted, grouped and coded according to the tests and type of outcome reported. When there were four or more lines of data for a category, meta-analysis was conducted using hierarchical logistic regression and bivariate models in Stata 13 using the metandi and midas command packages. These models have been designed to account for the correlation between sensitivity and specificity [ 39 ] and they overcome the often violated assumptions of a linear regression model [ 40 , 41 ]. These hierarchical models use 2x2 cell counts to compute log transformations of proportions for the analysis [ 39 ]. Without covariates, the hierarchical summary receiver operating characteristic (HSROC) and bivariate models are equivalent although their assumptions are different: HSROC assumes there is an underlying Receiver-Operating Characteristic (ROC) for each study, whereas the bivariate model directly models the log-odds transformed sensitivity and specificity assuming a bivariate normal distribution between studies [ 42 ].
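As an illustration of the log-odds scale on which the bivariate model works, the sketch below transforms a proportion from one line of data to its logit and back. It shows only the per-study transformation and does not reproduce the hierarchical model fitted in Stata. The 0.5 continuity correction for zero cells and the approximate standard error are common conventions assumed here for illustration; they are not taken from this review.

```python
import math


def logit_with_se(events: float, total: float, correction: float = 0.5):
    """Log-odds of events/total and its approximate standard error.

    For sensitivity pass (TP, TP + FN); for specificity pass (TN, TN + FP).
    """
    non_events = total - events
    if events == 0 or non_events == 0:
        # Assumed convention: add a continuity correction to avoid log(0).
        events += correction
        non_events += correction
    logit = math.log(events / non_events)
    se = math.sqrt(1 / events + 1 / non_events)
    return logit, se


def inverse_logit(x: float) -> float:
    """Map a log-odds value back to a proportion."""
    return 1 / (1 + math.exp(-x))


# Hypothetical line of data: 45 of 100 cases test positive.
logit_sens, se_sens = logit_with_se(45, 100)
print(round(inverse_logit(logit_sens), 2))  # back-transforms to 0.45
```

Working on the logit scale keeps back-transformed estimates and confidence limits inside the 0 to 1 range, which is one reason these models are preferred over a linear model on raw proportions.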

For this review the stages of LD are as follows: Early / acute LD (stage 1) is defined as those patients presenting with EM and/or associated manifestations who have experienced signs and symptoms of LD for less than 30 days [ 7 ]. Stage 2 illness is early disseminated LD, which includes manifestations of early neurological LD, cardiac LD and multiple EMs [ 36 ]. Stage 3 is late LD, typically with manifestations of Lyme arthritis and late neurological LD [ 36 ]. Those patients tested after antibiotic therapy are described as convalescent, with the stage of LD assigned prior to treatment. Post treatment Lyme syndrome is defined as a condition where despite treatment the patient continues to experience illness [ 37 ]. “Chronic LD” is a condition that most infectious disease experts do not recognise as being caused by B. burgdorferi; it occurs in patients with non-specific illness who do not test positive on Food and Drug Administration (FDA) approved serological tests, so these patients have been excluded from this review [ 38 ].

Included papers examined the accuracy of diagnostic tests for LD in North America after 1995, and included studies that compared results of one test using a validated test panel, results of clinical diagnosis, or a gold standard test result or investigated inter-test agreement. The recommendations for two-tier testing occurred in 1995, so we limited the review to studies conducted after 1994. Studies that screened an asymptomatic population for LD were excluded from this study. No inclusion or exclusion criteria were implemented on the type of control group; instead it was evaluated as a source of variation between study results (heterogeneity). The control group was usually a mix of one or more categories of healthy volunteers from non-LD endemic or LD endemic regions, or asymptomatic blood donors. In some studies, patients with diseases that have similar signs and symptoms to LD or have humoral responses that overlap with LD and are known to cross-react (e.g. rheumatoid arthritis, systemic lupus erythematosus, syphilis, autoimmune disorders, leptospirosis, periodontitis, relapsing fever, tularemia, Southern Tick-associated Rash Illness (STARI), multiple sclerosis, and Epstein-Barr virus infection) were included as controls to more precisely define test specificity. Studies often used well-defined samples from serum repositories or panels, like those developed by CDC [ 32 ], a research institute [ 33 , 34 ] or a commercial company [ 35 ]. These results were included in this systematic review and the impact of patient-based or panel samples on the outcome was investigated.

The data extraction form captured all pertinent study details and results. The systematic review was managed in DistillerSR (Evidence Partners, Ottawa, ON, Canada) a web-based systematic review management software. Each form was completed by two reviewers working independently and conflicts were resolved by consensus. Data were exported to Microsoft Excel 2010 (Microsoft Corp., USA), prepared for summarization and analysed in STATA v. 13 (StataCorp., USA). The study protocol and PRISMA evaluation can be found in the supplementary material ( S1 Text , S3 Text ).

Studies identified in the scoping review that evaluated diagnostic tests for humans were fully evaluated in this systematic review. The systematic review tools include a confirmation of relevance, location of study, availability of extractable data and a quality assessment form based on the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool [ 27 – 29 ]. This tool assesses the risk of bias and other methodological quality domains to evaluate the extent to which the results of each study or group of studies could be biased. The QUADAS-2 tool assessed the four quality domains ( Table 1 ) with respect to patient selection, the diagnostic tests used, the reference standard and flow and timing of the study [ 28 ]. An additional section was added to evaluate comparison tests and capture the presence of funding bias [ 30 , 31 ].

The scoping review search strategy was developed and pretested by three individuals with extensive experience in knowledge synthesis, zoonotic diseases and library science. The following search algorithm was implemented in eight bibliographic databases (BIOSIS via Web of Knowledge, CAB Abstracts, Scopus, PubMed, PsycINFO, APA PsycNet, Sociological Abstracts, and EconLit) with no limitation on the search, followed by a comprehensive search for grey literature [ 26 ]: (lyme OR borrelia) AND ("host" OR sentinel OR landscaping OR "vector" OR "vectors" OR "monitor" OR "monitoring" OR surveillance OR reservoir OR reservoirs OR prevalence OR educate OR education OR barrier OR barriers OR intervene OR intervention OR incidence OR rate OR prevent OR prevention OR control OR risk OR risks OR attitude OR attitudes OR perception OR perceptions OR diagnostic). The search was conducted on September 13–14, 2013. No update of the search has been performed because analysis indicated the findings would not change with the addition of new papers; thus the resources required to conduct the update were not prioritized. The protocol for the scoping review is available upon request.

This systematic review was preceded by a scoping review conducted by Greig et al (2016) to identify, classify and characterise the current state of scientific knowledge on surveillance methods, prevention and control strategies, diagnostic tests, risk factors, and societal attitudes and perceptions towards LD in humans and B. burgdorferi in tick vectors and vertebrate reservoirs [ 26 ]. Briefly, the scoping review methodology was designed to characterise the primary literature on LD in humans or B. burgdorferi in tick vectors or reservoirs; thus studies not on LD or B. burgdorferi were excluded from the scoping review. Additionally, the primary research had to address one of the following topics: surveillance/monitoring, prevalence, incidence, societal attitudes and perceptions in North America and global prevention and control strategies, diagnosis and risk factors. Research on clinical LD and treatment was considered outside the scope of this review. Each relevant paper was classified by purpose, study design, location of the study, B. burgdorferi, host species investigated, vector species investigated, sampling dates, diagnostic tests used, and whether the paper contained extractable data.

Testing for LD in patients exhibiting signs and symptoms of LD for less than 30 days is challenging as the performance of available test protocols is not optimal for making clinical decisions. This is largely due to the time required for the infected individual’s immune system to mount a reaction. This is why researchers have explored the use of a variety of targets including VlsE and C6 expressed after infection, Osp C and Fla B expressed by the feeding tick to detect infection sooner [ 71 , 72 ]. However, cross-reactivity and genetic variability within the targets has limited the diagnostic performance of any single target [ 73 , 74 ]. Thus the results of expected sensitivities and specificities in Table 8 emphasize the importance of physician evaluation and informed judgement when deciding to treat rather than rely entirely on imperfect serological test protocols. Notable findings in the table include the higher specificity associated with the two-tier testing method and the poor and highly variable sensitivity of serological tests in the initial stages of disease when an individual is mounting an immune response to B. burgdorferi.

Table 7 contains studies that looked at various samples and culture sensitivity in early LD as well as the use of various PCRs to identify B. burgdorferi infection. In one study there was agreement between culture of serum vs. plasma; however, whole blood classified more samples positive compared to serum, resulting in little agreement [ 58 , 68 ]. The confirmation of B. burgdorferi presence in culture using qPCR both increased the sensitivity and shortened the length of culture time before a positive result could be obtained [ 69 ]. A study examining the sensitivity of direct qPCR targeting flaB or recA genes compared to culture of 2 mm EM biopsy samples showed little agreement, and qPCR targeting the recA gene was more sensitive compared to the flaB target [ 70 ].

The results of inter-test comparisons are summarised in Tables 5 – 7 . Note that in these tables we have positive agreement and negative agreement that indicate how well the two tests agreed to classify samples as positive or negative respectively. Thus, positive agreement is the probability that test 2 is positive if test 1 is positive and negative agreement is the probability that test 2 is negative if test 1 is negative. Table 5 has comparisons between the two-tier serological tests compared to other tests and Table 6 includes studies that examined various assays and immunoblots for agreement.
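The two agreement measures defined above are conditional proportions taken from the cross-classification of the two tests. A minimal sketch follows, with a hypothetical function name and hypothetical counts:

```python
from typing import Optional, Tuple


def agreement(both_pos: int, t1_pos_t2_neg: int,
              t1_neg_t2_pos: int, both_neg: int
              ) -> Tuple[Optional[float], Optional[float]]:
    """Positive and negative agreement of test 2 relative to test 1.

    Positive agreement = P(test 2 positive | test 1 positive)
    Negative agreement = P(test 2 negative | test 1 negative)
    Returns None for a measure when test 1 has no samples in that class.
    """
    t1_pos = both_pos + t1_pos_t2_neg
    t1_neg = both_neg + t1_neg_t2_pos
    positive = both_pos / t1_pos if t1_pos else None
    negative = both_neg / t1_neg if t1_neg else None
    return positive, negative


# Hypothetical comparison of two tests on 100 samples.
pos_agree, neg_agree = agreement(both_pos=40, t1_pos_t2_neg=10,
                                 t1_neg_t2_pos=5, both_neg=45)
print(pos_agree, neg_agree)  # 0.8 0.9
```

Because both measures are conditioned on test 1, swapping the roles of the two tests generally gives different values.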

Three studies (eight lines of data) were captured with information on the use of PCR to identify B. burgdorferi in early LD [ 59 , 61 , 62 ]. Samples included blood and tissue biopsies and each PCR used different primers. Eshoo et al (2012) used blood samples and multi-locus PCR targeting eight different loci to both detect and genotype B. burgdorferi; the sensitivity was 62% (40–79) and the specificity was 100% [ 61 ]. Liveris et al (2012) used a nested PCR on serum samples and biopsy samples with a sensitivity of 40.6% and 42.6% respectively [ 59 ]. They also implemented a qPCR on plasma samples demonstrating a sensitivity of 33.8%. Two nested PCR primer sets targeting the Osp A gene were investigated in neurological LD, both acute and late cases, using cerebrospinal fluid samples; they reported a sensitivity of 37.5–50% in acute cases and 12.5–25% in late cases [ 62 ]. Across the direct detection studies sensitivity was low and in most cases lower than that of the two-tier test regime, assays or immunoblots reported for early LD.

Six studies (13 lines of data) examined bacterial isolation by culture and PCR detection of B. burgdorferi in a variety of human samples from cases of early and disseminated LD [ 57 – 62 ]. Meta-analysis was not possible within this group of studies because there were not enough lines of data within each detection method. The most commonly used medium is Barbour-Stoenner-Kelly (BSK) medium, which has been modified by some authors to improve its sensitivity [ 63 ]. Three studies attempted to isolate B. burgdorferi from blood (serum/plasma) of patients with early LD (stage 1) and the sensitivity of this approach was 27%, 71% and 94% [ 57 – 59 ]. With respect to the last of these, it has been suggested that laboratory contamination may account for the very high sensitivity reported [ 64 ]. Two studies reported sensitivities of 62–81% from biopsy samples of EM during early LD [ 59 , 60 ], although both sample sizes were very small. Phillips et al. evaluated an “MPM” medium for detection of B. burgdorferi in the blood of LD patients who had been previously treated, but then relapsed [ 65 ]. They reported a sensitivity of 91.5% in these patients; however, two studies were unable to reproduce these results and both demonstrated that the BSK-H culture was superior [ 66 , 67 ].

The Immunodot Borrelia Dot Blot IgG/IgM test by General Biometric Inc. was examined in one study; the results are shown in Table 4 . A non-significant increasing trend in sensitivity with disease progression was noted (stage 1 50% (95%CI 19, 87), stage 2 70% (35, 93) and stage 3 100% (63, 100)) [ 52 ]. In one small study, the Viramed Biotech Borrelia burgdorferi B31 IgG/IgM Virablot demonstrated sensitivity and specificity comparable to the other immunoblots evaluated [ 35 ]. One in house recombinant immunoblot (data not shown) did not perform well in the published study, with sensitivities ranging from 7 to 60 percent for different targets [ 56 ].

The BBI western blot was evaluated in two separate studies using the same CDC test panel, but slightly different classification criteria [ 35 , 46 ]. One study used the BBI criteria (IgG positive with 3+ bands of 20, 23, 31, 34, 35, 39 and 83 kDa; IgM positive with 2+ bands of 23, 39, 41 and 83 kDa), which define a positive sample differently from the CDC criteria (IgG positive with 5+ bands of 18, 23, 28, 30, 39, 41, 45, 58, 66 and 83 to 93 kDa; IgM positive with 2+ bands of 23, 39 and 41 kDa). Sensitivity was 77% and 93% for IgM and IgG blots respectively using the CDC criteria, compared to 93% and 100% using the BBI criteria, although the difference was not significant. Specificity ranged from 77–99%, with gains in sensitivity accompanied by slight losses in specificity ( Table 4 ).
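The two sets of band-counting rules quoted above can be written as a small lookup to show how the same blot may be classified differently. This is a sketch only: band labels are stored as strings, the CDC "83 to 93 kDa" band is given the hypothetical label "83-93", and the step of assigning an observed band to a label is left to the reader of the blot.

```python
# Minimum number of scored bands required for a positive result, and the
# set of scored bands (kDa), as listed in the text for each criteria set.
CRITERIA = {
    "CDC": {
        "IgG": (5, {"18", "23", "28", "30", "39", "41", "45", "58", "66", "83-93"}),
        "IgM": (2, {"23", "39", "41"}),
    },
    "BBI": {
        "IgG": (3, {"20", "23", "31", "34", "35", "39", "83"}),
        "IgM": (2, {"23", "39", "41", "83"}),
    },
}


def blot_positive(observed_bands, criteria: str, isotype: str) -> bool:
    """True if enough of the scored bands are present on the blot."""
    min_bands, scored = CRITERIA[criteria][isotype]
    return len(set(observed_bands) & scored) >= min_bands


# Hypothetical IgG blot showing four bands.
bands = {"20", "23", "39", "41"}
print(blot_positive(bands, "CDC", "IgG"))  # False: only 3 of the CDC IgG bands
print(blot_positive(bands, "BBI", "IgG"))  # True: 3 of the BBI IgG bands
```

The hypothetical blot is negative under the stricter CDC IgG rule but positive under the BBI rule, which mirrors the trade-off between sensitivity and specificity described in the text.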

The MarDx ® Lyme Disease Marblot Strip test system was evaluated in four studies (7 lines of data) on select LD groups and across early to late LD groups [ 52 – 55 ]. A meta-regression controlling for group indicated that the test performed significantly better on late LD patients, but whether the investigator evaluated results for IgM, IgG or both in parallel did not significantly affect the sensitivity or specificity.

Recombinant proteins and/or chimeric proteins from Osp A-F (mainly A and C) targets were used to develop assays and tested on early LD patients. All studies were based on in house ELISAs with small sample sizes and the reported sensitivities varied from target to target ranging from 0–86%. Other assays included the use of Poly-ethylene Glycol (PEG)-peptide conjugates in an ELISA that reported 100% sensitivity and specificity on a small sample [ 49 ]. An indirect hemagglutination antibody (IHA) test using B. burgdorferi strains B31 and B126 had a low sensitivity 46–48% and a specificity of 98–99% which is comparable to other tests for early LD [ 50 ].

Whole cell sonicate (WCS) ELISAs for early LD included 10 lines from 6 studies ( Table 3 ). Three commercial test kits were included across six lines and three studies: Lyme Stat Test Kit, VIDAS Lyme Screen II and Wampole Bb ELISA test system (see S2 Text ). These performed differently than the four in house WCS ELISAs and the authors did not offer an explanation for the divergent results.

ELISA performance on early stage 1 LD was investigated in 53 lines of data (16 studies), Table 3 . These were further grouped by type of ELISA to understand where variation between studies was occurring. ELISAs targeting C6 included four lines (three studies) on the Immunetics® C6 B. burgdorferi ELISA™ kit and seven lines (four studies) on unlicensed C6 ELISAs ( Table 3 ). Accounting for whether the C6 ELISA was licensed explained 27% of the heterogeneity between studies and indicated the commercial ELISAs had a non-significantly higher sensitivity, 91% (81–100) vs. 64% (47–80), and similar specificity, 97% (94–100) vs. 97% (95–99), over all stages of LD.

First tier serological tests including enzyme-linked immunosorbent assays (ELISA) and other serological assays were evaluated in 23 studies (119 lines of data) with well-defined and whole cell targets, Table 3 . There were a mix of FDA-licensed tests and in house tests. Similar to the two-tiered tests, test performance for patients with stage 1 LD was highly variable and had poor sensitivity. In later stages of LD, the sensitivity improved. The overall specificity varied by test and between studies more than was reported for the two-tier tests.

Thirteen studies evaluated the two-tier serological test protocol for diagnosis of LD at different stages of disease and after antibiotic therapy. Table 2 provides the meta-analytic summaries demonstrating low sensitivity, 46.3% (95%CI 39.1–53.7), for early (stage 1) LD patients and increasing sensitivity with stage 2, 89.7% (78.3–95.4), and stage 3, 99.4% (95.7–99.9) LD. There was relatively high specificity (98.3%–99.9%) across control groups. Most false positives within the control groups were patients with diseases known to produce antibodies that cross-react in serological tests for B. burgdorferi. Nine studies (14 lines of data) presented results for two-tier serological testing where at least one of the tests was not FDA licensed (designed in house by the reporting laboratory), Table 2 . Heterogeneity analysis of sensitivity and specificity on the impact of using non-commercial tests was not significant. At the early stage of LD the two-tier testing method was good for ruling in LD if the patient tested positive, but had very poor predictive value for ruling out LD, which is why it is recommended to retest after 30 days [ 21 ]. However, for convalescent patients treated at stage 1 LD, sensitivity remained low even after 30 days.
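The rule-in versus rule-out asymmetry for early LD can be made concrete with likelihood ratios. The sketch below uses the pooled stage 1 sensitivity (46.3%) and the lower bound of the reported specificity range (98.3%) from this paragraph; the 50% pre-test probability is a purely hypothetical value chosen for illustration and is not an estimate from this review.

```python
def likelihood_ratios(sensitivity: float, specificity: float):
    """Positive and negative likelihood ratios for a test."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pre_test_probability: float,
                          likelihood_ratio: float) -> float:
    """Update a pre-test probability using odds x likelihood ratio."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Two-tier testing in stage 1 LD, values taken from the text above.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.463, specificity=0.983)

PRE_TEST = 0.50  # hypothetical pre-test probability, for illustration only
after_positive = post_test_probability(PRE_TEST, lr_pos)
after_negative = post_test_probability(PRE_TEST, lr_neg)

print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
print(f"probability after a positive result: {after_positive:.2f}")  # about 0.96
print(f"probability after a negative result: {after_negative:.2f}")  # about 0.35
```

Under these assumptions a positive result raises the probability of LD substantially, whereas a negative result leaves it at roughly one in three, which is consistent with the recommendation to retest after 30 days.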

The QUADAS-2 tool results, Table 1 , indicated that there was an unclear risk of bias in 84% of studies, meaning the study received an unclear or high risk of bias score on one or more domains (see S1 Dataset ). No studies were excluded from the analysis based on their QUADAS assessment. In two studies it was apparent that the sample population was not appropriately enrolled in the study as the case population and control population were enrolled at different times and places, which could lead to biased (exaggerated) results for test accuracy [ 43 , 44 ]. Appropriate blinding was often not addressed in many papers and unexplained exclusion of observations from the analysis was another common reporting issue. Many of the studies (28.6%) had authors employed by or funded by commercial companies that supplied one or more of the tests evaluated. In four of these studies the risk of funding bias was identified to be very high [ 43 , 45 – 47 ].

In the scoping review, 485 articles focused on diagnosis of LD in humans globally and were further evaluated for inclusion in this systematic-review meta-analysis. The decision tree for selection of articles and reasons for exclusion of potentially relevant studies in this systematic review is shown in Fig 1 . Forty-eight relevant diagnostic test evaluations conducted in North America between 1995 and 2013 were included in this systematic review (see S2 Text and S1 Dataset ).

Discussion

The 48 studies included in this analysis were all conducted in the United States from 1995 onwards. The samples included patients or historical samples where the clinical presentation fit the diagnosis of LD. Within the results we summarized results for all stages of LD, separate stages 1–3 LD and convalescent stages 1–3 LD to facilitate an evaluation of trends, similarities and differences by test, stage of disease and treatment status. There were a few studies that differentiated acute samples <7 days and early Lyme samples 7–30 days, but not enough to analyse predictive values within early LD. Similarly there were studies that used culture positive patients exclusively, however the culture status of the patients did not significantly account for the heterogeneity. Stage 1, 2, and 3 convalescent LD groups were sampled in a number of studies and are summarized separately from samples drawn pre-treatment as it is known that there are differences in the immune response depending on the length of LD prior to treatment [5,6].

In the United States it was recently estimated that less than 12% of Lyme disease tests were for true infections [75]. The LD test results for patients who do not meet the clinical criteria can be used to rule out LD, but a positive test is likely to be a false positive. Thus, the overuse of these assays to diagnose LD has been an on-going discussion and challenge for topic-specialists and physicians [76]. The literature summarised in this systematic review was based on research conducted from 1995, when the CDC adopted the recommendations for two-tier testing of LD acquired in North America. Their goal was to improve the specificity of LD testing by recommending the use of a sensitive EIA followed by a more specific western blot for positive and equivocal samples [23]. Most of the research on diagnostic tests in North America was based on serology, mainly antibody based assays detecting an immune response against B. burgdorferi. As of May 2015 there were 42 tests approved by the FDA for use in the United States and 22 approved by Health Canada Medical Devices Branch for use in Canada; however, only a few of these tests were evaluated in the primary literature and all the literature published since 1995 was conducted in the United States (see S2 Text).

Recent studies examining inter-laboratory agreement and the sensitivity and specificity of various test protocols noted that the C6 ELISA alone and the two-tier approach have superior specificity compared to proposed replacements, and that the CDC-recommended western blot algorithm has equivalent or superior specificity over other proposed test algorithms [77]. The findings of this review are in agreement with other authors that sensitivity was highest for ELISAs targeting C6 and these showed less variability in test sensitivity compared to other tests and test protocols [77]. The C6 ELISAs, particularly the commercial assays, had promising sensitivity, specificity and agreement of results with two-tier protocols, which is likely why the Immunetics® C6 B. burgdorferi ELISA™ has become widely used in place of some WCS assays. Although we did not summarize results of inter-laboratory agreement studies in this systematic review, the requirement for technical expertise and subjectivity in result interpretation for many LD tests, particularly western blots, contributes to poor agreement between technicians, tests and/or laboratories [77].

Factors that affect the sensitivity and interpretation of the results include type of sample and stage of disease in addition to possible variations in the type, target and conduct of the diagnostic tests. In this systematic review all relevant studies examining the efficacy of serological tests used serum samples from patients. There were no studies that employed the use of synovial fluid or cerebrospinal fluid for diagnosis of LD with serological assays. However in the last few years a number of studies have emerged from Europe on assays designed for cerebrospinal fluid samples in the diagnosis of neuroborreliosis which is a more common clinical presentation in Europe [78–80].

Throughout our results there was a positive association between duration of infection/ stage of disease and sensitivity of serological LD tests [34,47,60]. Thus, recommendations include re-testing after 30 days if the initial serological test was done during the early (non-disseminated) stages of infection and employing IgM assays as well as IgG assays to detect early immune reactions [21,81]. Other sources of heterogeneity between studies may include whether the case sampling frame included only samples from culture positive LD patients. Similarly, the impact of type of sample, prospective vs. retrospective patients and sample libraries or serum panels for test performance was investigated wherever possible in the analysis. The control group samples in the captured studies ranged from groups of healthy individuals from endemic and non-endemic areas to controls with diseases known to cross-react with LD diagnostic assays. Despite this, most studies reported a consistently high specificity for LD regardless of the composition of the control group and where there were differences (Tables 2–4), these were not statistically significant in most cases.

A wide range of assays was identified in this systematic review, from assays that employed whole-cell sonicates (WCS), mainly from B. burgdorferi B31 or other North American isolates, to those using recombinant proteins targeting antigens that are highly expressed in vivo, e.g. VlsE. Some of the captured research indicates that the VlsE targets improve test performance [45,82]. Similarly, the C6 peptide, which is derived from the VlsE lipoprotein, has shown equivalent or better sensitivity compared to the WCS ELISAs in this systematic review and improved specificity in patients with diseases that often cross-react, and may also be used to identify some species of Borrelia acquired in Europe [47,73,74,82–84]. Subjectivity and inconsistency of the criteria used to evaluate western blot results have been noted as a source of confusion for patients and physicians in the interpretation of diagnostic results [85,86]. In studies where the CDC western blot interpretation was compared with different criteria, some showed gains in sensitivity with alternate criteria, but this was usually accompanied by a reduction in specificity to below an acceptable level [46].

Direct detection of B. burgdorferi from LD patient samples continues to be a challenge. B. burgdorferi requires culture in a complex medium for 8 to 12 weeks before the culture is considered negative, which makes this approach unsuitable in a clinical setting. Recent studies have attempted to improve the utility of culture by changing the protocol, for example, using 60 ml of BSK in a closed tube, incubated at 32–33°C for 8–12 weeks [58,59]. Another study used 15 ml and 2 ml starter cultures, then at day six seeded a long-term culture in a caliper jar with 15 ml of fresh BSK for up to 16 weeks at 34°C [57]. Variations that had positive effects on culture growth included adding serum, a reducing agent and rifampicin [57–59]. The use of PCR to confirm bacterial isolation improves sensitivity compared to visual confirmation by staining with acridine orange and using dark-field or fluorescent microscopy [69]. The specimen, the stage of LD and the laboratory technician’s experience all affect the likelihood of obtaining a successful B. burgdorferi culture. In early LD, a biopsy sample from an EM lesion taken within the first week of symptoms has the highest sensitivity, whereas in early disseminated infections sensitivity is higher if isolation is attempted on large-volume plasma samples [59,69].

Bacterial isolation has had limited success with late manifestations of LD and with cerebrospinal fluid and synovial fluid samples [87,88]. Research continues to focus on improving the sensitivity and speed of culture. Recent papers claiming major breakthroughs for B. burgdorferi isolation have failed validation [57] or could not be replicated [65] by others [64,67]. PCR for detection of B. burgdorferi DNA in LD patient samples is affected by many of the same limitations as culture, with the exceptions that results may be obtained faster and that PCR may be more sensitive in samples with a low concentration of B. burgdorferi. The variability of methodologies, gene targets and primers from study to study continues to complicate the interpretation of PCR results [59,61,62]. Overall, the sensitivities reported in PCR studies conducted in North America were lower than those of studies that employed a two-tiered serology diagnostic protocol [59,61,62]. Due to the above limitations, bacterial isolation and PCR are not routinely used as diagnostic tools in clinical practice, although bacterial isolation is considered the gold standard to confirm diagnosis.

From the peer-reviewed literature we identified validation data for only a small proportion of licensed assays and for a number of “in house” tests which are used by several laboratories across North America. The performance of “in house” tests cannot be validated or critiqued, as the composition of the test is not always publicly available or evaluated in the peer-reviewed literature; thus, comparing their performance to licensed tests is less informative. Studies of the variable performance of diagnostic testing schemes across laboratories have demonstrated that deviations from recommended diagnostic schemes often lead to a decrease in specificity and to results that are discordant with approved testing schemes [89]. Thus, the performance of these “in house” assays and of some of the older commercial assays has not been evaluated against well-characterised panels of serum from patients with the full spectrum of LD clinical symptoms, with appropriate numbers of healthy controls and patients with look-alike diseases [32].

Future work on diagnostic tests for LD includes continued improvement in the sensitivity of all tests, particularly for early LD samples, and in the ability to distinguish between active infection and previous infections. On-going work on new immunoassay techniques and combinations of antigen targets that may help inform disease stage will hopefully improve LD diagnostics in the future [60,79,90]. Development of point-of-care tests that do not require highly specialized technical skills or subjective interpretation of the results would help address some of the criticisms of immunoblot techniques. This systematic review summarizes research in North America on the accuracy of diagnostic tests for LD conducted since 1995. The commercially available Immunetics® C6 B. burgdorferi ELISA™ shows the most promise as a possible standalone test or as part of a two-tiered test protocol; however, it did not overcome the low sensitivity of LD diagnostic tests in patients with early LD. Addressing this shortcoming is a significant challenge to improving LD diagnostics.