In the most recent episode of Emergency Medicine Cases Journal Jam, Rory, Anton, and I cover the evidence for (and against) using BNP in the emergency department. These are my notes.

Summary

Looking at observational data, BNP and NT-proBNP both appear to have good sensitivities for CHF, but only moderate to poor specificities. There are a number of RCTs looking at BNP use in the emergency department setting. Two studies demonstrated a decrease in hospital length of stay and total costs, but 4 other studies showed no difference. Two studies looked at ED length of stay, 1 demonstrating a statistical but clinically insignificant difference and the other showing no difference. None of the studies demonstrated a change in ED treatment, mortality, or hospital readmission. There are a large number of problems with these studies, including the lack of a clear gold standard for CHF, a lack of blinding, incorporation bias, and spectrum bias. These problems are discussed further in the discussion section. I have never worked in an emergency department where BNP testing has been available, and after reviewing this literature I think that is probably a good thing. It is easy to get excited about tests with high sensitivities, but the use of diagnostic tests is complex and fraught with unintended consequences. I think the best evidence to date suggests that BNP testing does not provide any patient-important benefit to emergency department patients.

Background

BNP (B-type natriuretic peptide) is a cardiac neurohormone that is produced by cardiac myocytes in response to myocardial stretch and therefore is elevated in the setting of heart failure. (McCullough 2002) Cells actually release a precursor (proBNP) into circulation, which is then cleaved into the active BNP and an inactive fragment called N-terminal proBNP. The half life of BNP is about 20 minutes whereas the half life of NT-proBNP is about 1-2 hours. Clinically speaking, it appears that BNP and NT-proBNP are interchangeable, without clinically meaningful differences in accuracy. (Carpenter 2012)

The Evidence

As always with these reviews, there are a lot of studies to cover. I have only included a handful of the observational studies, to provide a general sense of the strengths and weaknesses of this data. The studies that are most relevant to emergency medicine practice are the randomized control trials conducted in emergency department settings, and so I have included all such studies in this review. I mention some of the EBM limitations of each study, but the discussion section at the end of the post provides a more in-depth summary of the limitations of this data.

Observational data

Maisel AS, Krishnaswamy P, Nowak RM. Rapid measurement of B-type natriuretic peptide in the emergency diagnosis of heart failure. The New England journal of medicine. 2002; 347(3):161-7. PMID: 12124404 [free full text]

Methods: This is a prospective, multi-centre, observational cohort that included 1586 patients presenting to emergency departments with a chief complaint of dyspnea. Patients were excluded if they had advanced renal failure (creatinine clearance < 15 mL/min), acute MI, or an obvious cause of dyspnea such as trauma. All patients had a BNP drawn and the treating emergency physician recorded patient information, including an estimate of the clinical probability of CHF at the time of disposition using a visual analogue score. The final diagnosis of CHF was based on 2 independent cardiologists who had access to all clinical data 30 days after the ED visit, but were blinded to the BNP.

Results: The cardiologists disagreed with the emergency physician’s initial diagnosis 14% of the time, but they disagreed with each other (despite having access to more clinical information) 11% of the time, so the emergency physicians were incredibly accurate. The different sensitivities and specificities of various cut-offs are listed in this figure:

Comments: When the emergency physician was sure it wasn’t CHF (<5%), they were incredibly accurate (correct 92% of the time). Similarly, when the emergency physician was sure of a diagnosis of CHF (>95%), they were correct 95% of the time. It is only the intermediate group where BNP has any chance of helping, but BNP is less accurate in this intermediate cohort. (They don’t present these results in this study, but in a secondary analysis later published by Schwam 2004.)

McCullough PA, Nowak RM, McCord J. B-type natriuretic peptide and clinical judgment in emergency diagnosis of heart failure: analysis from Breathing Not Properly (BNP) Multinational Study. Circulation. 2002; 106(4):416-22. PMID: 12135939 [free full text]

Methods: This is a secondary analysis of the prospective, multi-centre, observational cohort presented in Maisel 2002. The treating emergency physician recorded patient information, including an estimate of the clinical probability of CHF at the time of disposition using a visual analogue score. The final diagnosis of CHF was based on 2 independent cardiologists who had access to all clinical data 30 days after the ED visit, but were blinded to the BNP. (Overall, the two cardiologists disagreed 11% of the time, and at one site they disagreed 24% of the time, giving you a sense of the problem with this “gold standard”.)

Results: The emergency physicians’ clinical estimation of CHF was trimodal: there were patients they were pretty sure didn’t have CHF, patients they were pretty sure did have CHF, and then another spike right at the 50% guess mark. Overall, the ED physician’s impression had a sensitivity of 49%, a specificity of 96%, and a positive likelihood ratio of 11.5 (negative likelihood ratio not reported) for CHF using the cardiologist as the gold standard. In comparison, BNP (using a cutoff of 100 pg/mL) had a sensitivity of 90%, a specificity of 73%, a positive likelihood ratio of 3.4, and again the negative likelihood ratio wasn’t reported. So to summarize, BNP was more sensitive but less specific than the emergency department diagnosis. The area under the curve was only marginally improved, from 0.88 on physician judgement alone, to 0.90 with BNP alone, to an estimated 0.93 if they were combined (which they weren’t in this study).
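Because the negative likelihood ratios were not reported, it may help to see how both ratios follow from sensitivity and specificity. The snippet below is a minimal sketch, assuming the rounded sensitivities and specificities quoted above; the function name is my own. Because the inputs are rounded, the positive likelihood ratios come out close to, but not exactly at, the published 11.5 and 3.4, and the negative likelihood ratios are back-of-the-envelope estimates rather than values from the paper.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary test; inputs are fractions between 0 and 1."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Rounded values quoted in the text (cardiologist opinion as the reference standard).
physician = likelihood_ratios(sensitivity=0.49, specificity=0.96)
bnp = likelihood_ratios(sensitivity=0.90, specificity=0.73)

print(f"Physician judgement: LR+ {physician[0]:.1f}, LR- {physician[1]:.2f}")
print(f"BNP at 100 pg/mL:    LR+ {bnp[0]:.1f}, LR- {bnp[1]:.2f}")
```

With these inputs, the estimated negative likelihood ratio is about 0.53 for physician judgement and about 0.14 for BNP, which fits the summary above: BNP is the better rule-out test and physician judgement is the better rule-in test.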

Comments:

In the write up, they spend a bunch of time talking about patients in whom BNP would have corrected the physician’s impression, but they don’t talk at all about the patients in whom the BNP would have been wrong (which is fairly often given the specificity of 73%). This is somewhat misleading.

The emergency physicians were 74% accurate on day one. In comparison, the cardiologists were only 90% accurate at best (they disagreed with each other 11% of the time) 1 month later with a tremendous amount of extra clinical data.

I think it is pretty unfair to equate an estimate of >80% probability of CHF with “ruling in” and <20% with “ruling out”. Those are still pretty uncertain numbers. Using the more realistic numbers of <5% to rule out and >95% to rule in, physician judgement actually outperformed BNP. (Schwam 2004)

Maisel A, Hollander JE, Guss D. Primary results of the Rapid Emergency Department Heart Failure Outpatient Trial (REDHOT). A multicenter study of B-type natriuretic peptide levels, emergency department decision making, and outcomes in patients presenting with shortness of breath. Journal of the American College of Cardiology. 2004; 44(6):1328-33. PMID: 15364340 [free full text]

Methods: This is a prospective, multi-centre, observational trial that included 464 adult patients who were treated in the emergency department for CHF. Patients were excluded if they had a BNP <101 pg/mL. ED physicians classified patients into NYHA functional classifications. They were supposed to do BNP levels every 3 hours, but don’t present that data because 75% of the patients were gone by 3 hours. 90% of the patients were admitted.

Results: The NYHA classification did not correlate with 90 day mortality or readmission, but the BNP level did. However, it isn’t clear that the BNP level would actually be helpful, as the AUC for 90 day outcomes was only 0.67. 90 day events were seen in 9% of patients with a BNP <200 pg/mL as compared to 29% of those with a BNP >200 pg/mL. Unfortunately, although 9 and 29% are clearly different, I am not sure the difference is enough to help make clinical decisions in the emergency department. Both groups still seem to require admission.

Januzzi JL, Camargo CA, Anwaruddin S. The N-terminal Pro-BNP investigation of dyspnea in the emergency department (PRIDE) study. The American journal of cardiology. 2005; 95(8):948-54. PMID: 15820160

Methods: This was a prospective observational trial that included 599 adult (over 21 years) patients who presented to the emergency department with dyspnea. They excluded patients with severe renal dysfunction (Cr >2.5 mg/dL), significant ST changes, and obvious chest trauma. They also excluded patients with unblinded BNP levels drawn, but they don’t tell us how often that occurred. All patients had an NT-proBNP drawn. At the end of the emergency department visit, emergency physicians were asked to rate the likelihood of CHF on a scale of 0-100. The final diagnosis was by consensus of 2 cardiologists blinded to NT-proBNP result. The cardiologists disagreed with each other 10% of the time.

Results: Mean NT-proBNP levels were higher in patients with a final diagnosis of CHF (4,054 pg/mL vs 131 pg/mL). The optimal cutoff was 900 pg/mL, resulting in a sensitivity of 90% and a specificity of 85%. They then break the numbers down into different cutoffs for ruling in and ruling out, with different numbers for older and younger patients, which results in moderately good numbers, but they don’t tell us how many patients would actually fit into each group. This is a common practice in these studies, but defining the ideal cut-offs based on the data collected will significantly overinflate sensitivity and specificity, and we should expect worse numbers in subsequent validations of those numbers. The NT-proBNP was marginally better than emergency physician judgement (area under the curve 0.94 vs 0.90, p=0.006), but remember that cardiologists (the gold standard) disagree 10% of the time, so all these numbers sit on decidedly soft ground.

Systematic reviews of observational data

Roberts E, Ludman AJ, Dworzynski K. The diagnostic accuracy of the natriuretic peptides in heart failure: systematic review and diagnostic meta-analysis in the acute care setting. BMJ (Clinical research ed.). 2015; 350:h910. PMID: 25740799 [free full text]

Methods: This is a systematic review and meta-analysis that includes 37 observational cohorts looking at the diagnostic performance of BNP and NT-proBNP.

Results: Using a cutoff of 100 ng/L for BNP, the pooled sensitivity, specificity, positive predictive value, and negative predictive value were 95%, 63%, 67%, and 94%. At a cutoff of 100-500 ng/L, the numbers were 85%, 86%, 85%, and 86%. For NT-proBNP using a cutoff of 300 ng/L, the pooled sensitivity, specificity, positive predictive value, and negative predictive value were 99%, 43%, 64%, and 98%. There was no evidence that either BNP or NT-proBNP was better.

Comments: There are a number of problems with these numbers (lack of a clear gold standard, spectrum bias, and inappropriate exclusions) that are explored further in the discussion. My biggest concern is the lack of specificity of these tests. If used widely in the emergency department, we might expect to see a significant increase in false positives and the use of confirmatory testing.
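The concern about false positives can be made concrete with Bayes' theorem, because predictive values depend on how common CHF is among the patients tested. The sketch below is illustrative only: it takes the pooled sensitivity and specificity for BNP at 100 ng/L from above, and the three prevalence values are my own assumptions (45% was picked because it roughly reproduces the pooled predictive values; the lower values stand in for broader, less selective testing).

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from test characteristics and disease prevalence (all fractions)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Pooled BNP values at 100 ng/L quoted above; prevalences are assumed for illustration.
for prevalence in (0.10, 0.25, 0.45):
    ppv, npv = predictive_values(0.95, 0.63, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

Under these assumptions the positive predictive value drops from roughly 68% at a prevalence of 45% to roughly 22% at a prevalence of 10%, which is the mechanism behind the worry that wider use would generate many more false positives.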

Martindale JL, Wakai A, Collins SP. Diagnosing Acute Heart Failure in the Emergency Department: A Systematic Review and Meta-analysis. Academic emergency medicine. 2016; 23(3):223-42. PMID: 26910112 [free full text]

Methods: This systematic review includes data from 52 cohorts covering 17,893 patients.

Results: For diagnosing CHF, BNP has a sensitivity of 93.5% (95%CI 82.6-94.2), specificity of 52.9%, positive likelihood ratio of 2.2, and negative likelihood ratio of 0.11 (95%CI 0.07-0.16). The numbers for NT-proBNP are essentially the same, with a sensitivity of 90.4%, specificity of 38.2%, positive likelihood ratio 1.8, and negative likelihood ratio 0.09. Interval LRs might improve the numbers a little bit, but the vast majority of patients fall into an indeterminate range. As a comparison, just looking for B lines on lung ultrasound had a sensitivity of 85%, specificity of 93%, positive likelihood ratio of 7.4, and negative likelihood ratio of 0.16.
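To see what these likelihood ratios mean at the bedside, they can be converted into post-test probabilities using the odds form of Bayes' theorem. This is a minimal sketch: the likelihood ratios are the pooled values quoted above, while the 50% pre-test probability is purely an assumption chosen to represent a clinically uncertain patient, and the function name is my own.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert a pre-test probability and a likelihood ratio into a post-test probability."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Pooled likelihood ratios quoted in the text.
likelihood_ratio_by_result = {
    "BNP positive": 2.2,
    "BNP negative": 0.11,
    "B lines present": 7.4,
    "B lines absent": 0.16,
}

pretest = 0.50  # assumed pre-test probability for an uncertain patient
for result, lr in likelihood_ratio_by_result.items():
    print(f"{result}: {post_test_probability(pretest, lr):.0%}")
```

Starting from 50%, a positive BNP only moves the probability to about 69%, whereas B lines on ultrasound move it to about 88%. A negative BNP is more useful, bringing the probability down to about 10%.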

Comments: The BNP numbers are better than isolated parts of the classic history and physical, but seem worse than bedside ultrasound. There are some significant problems with the data that forms the basis for these numbers. The most important is the lack of a gold standard test for CHF, so the standard is a cardiologist’s opinion (and cardiologists frequently disagree with each other). Another problem is that CHF is not a single disease: gradual fluid overload is different than sympathetically driven flash pulmonary edema which is again different from an acute valve rupture. Test characteristics may be quite different in different populations. Furthermore, although BNP looks better than individual physical exam findings (like an S3), no physician uses pieces of the history and physical in isolation. The combination of multiple parts of the history and physical may (and probably does) result in highly accurate diagnoses in the hands of experienced physicians. None of these studies looked at physician judgment. The biggest problem is that observational data can get us information about sensitivities and specificities, but they can’t actually tell us if the tests are helping our patients. For that, we need randomized control trials.

Bottom line: In these observational trials, BNP at very low levels has a high sensitivity, but a low specificity. Using higher cut-offs results in higher specificity, but lower sensitivity. It isn’t clear that BNP can actually outperform physician judgement. BNP accuracy gets worse in patients that emergency physicians are less certain about. It isn’t clear what the best cut-off is, and many of these studies retrospectively overfit cutoffs to the data they collected. There are some major limitations to the data, the biggest of which is the lack of a clear gold standard for diagnosing CHF.

RCTs of emergency department management

Mueller C, Scholer A, Laule-Kilian K. Use of B-type natriuretic peptide in the evaluation and management of acute dyspnea. The New England journal of medicine. 2004; 350(7):647-54. PMID: 14960741 [free full text]

Methods: This is a single-center, industry sponsored, prospective, randomized, controlled, single-blind study out of Switzerland. They included 452 adult patients (out of 665 screened) presenting to the emergency department with acute non-traumatic dyspnea. All patients received the usual ED workup, but were randomized to have BNP available or not. In the BNP group, guidance was given that a level below 100 pg/mL made the diagnosis of CHF unlikely and a level above 500 pg/mL made CHF the most likely diagnosis. Patients were excluded if they had renal disease (creatinine > 250 umol/L) or cardiogenic shock.

Results: There were fewer admissions in the BNP group (75% vs 80%, p=0.008) and fewer patients were admitted to ICU (15% vs 24%, p=0.01). Those numbers are high, and tell us that this test was being applied to a sick group of patients. It took 90 minutes for the standard group to get appropriate therapy (which seems really long to me), and only 60 minutes in the BNP group, but “appropriate” is dependent on the final diagnosis, and the BNP result might have been incorporated into this definition. Hospital length of stay was shorter in the BNP group (median 8 vs 11 days). Costs were also lower in the BNP group ($5410 vs $7264, p=0.006), but those costs are essentially entirely driven by the long inpatient stays. Decisions about admissions, ICU admissions, and discharge are all subjective, and may represent the fact that physicians are more comfortable using lab tests than clinical judgment to guide their decisions. There was no difference in the objective outcomes of mortality or readmission.

Comments: Overall, this is a pretty good study. It is an RCT, which is exactly what we want, but rarely get, when assessing diagnostic tests. However, the physicians making clinical decisions were not blinded and the outcomes that were changed were all based on subjective physician judgment, as well as being dependent on the baseline care being provided. The results might be different if the baseline length of admission was 5 days instead of 11, or if the admission rate was lower than 80%. These are reasons to be cautious of single center studies.

Furthermore, it is important to consider whether some selection bias might have been at play. The final diagnosis of COPD was made more commonly in the BNP group (23% vs 11%), when we would expect the final diagnoses to be pretty similar with proper randomization. On the other hand, that discrepancy may represent incorporation bias because of incomplete blinding, where the BNP value results in rather than predicts the final diagnosis.

Moe GW, Howlett J, Januzzi JL, Zowall H. N-terminal pro-B-type natriuretic peptide testing improves the management of patients with suspected acute heart failure: primary results of the Canadian prospective randomized multicenter IMPROVE-CHF study. Circulation. 2007; 115(24):3103-10. PMID: 17548729 [free full text]

Methods: This is an industry-sponsored, multicenter, partially blinded RCT looking at 500 adult patients presenting to one of 7 Canadian EDs with dyspnea suspected to be of cardiac origin. NT-proBNP was collected on all study subjects, but only physicians in the intervention arm had access to the results in the ED. CHF was considered to be ruled out with a level < 300 pg/mL and ruled in with a level > 450 pg/mL for patients below 50 years old and >900 pg/mL for patients over 50 years old. ED physicians were asked to estimate the likelihood of CHF before enrollment. The final diagnosis was based on the opinion of 2 cardiologists 60 days later who had access to all clinical data except the NT-proBNP levels. They don’t tell us how often these 2 cardiologists disagreed. Patients were excluded if they had a serum creatinine > 250 umol/L, acute MI, malignancy, or an obvious cause of dyspnea such as pneumothorax.

Results: The median ED length of stay was slightly less in the NT-proBNP group (5.6 vs 6.3 hours, p=0.03), although I am not sure those numbers are really importantly different. Also, it isn’t clear to me how one lab test results in a change in the median time at 6 hours. There were fewer readmissions in the NT-proBNP group, although the result is technically not significant (13% vs 20%, p=0.05). Total medical costs were also reduced in the NT-proBNP group ($5180 vs $6129, p=0.02). (However, the ED visit costs were the same between the two groups. This is a recurrent finding: the BNP result may affect inpatient decisions, but has much less (or no) impact on ED decisions.) Although not statistically significant, the mortality was higher in the BNP group (4.5 vs 2.4%, p=0.19). There were no differences in initial hospitalizations, hospital LOS, or ICU admissions. These results held true when looking at the subgroup of patients with an intermediate probability of CHF (20-80%). There was a marginal improvement in accuracy, with the area under the curve improving from 0.83 (95%CI 0.80-0.84) to 0.90 (95%CI 0.90-0.93) with the addition of BNP, although BNP alone (AUC 0.86 95%CI 0.84-0.89) was not better than clinical judgement alone.

Comments: This is another well performed (but industry sponsored) RCT that shows some small improvements with BNP, but a lack of blinding is still a concern, and objective outcomes were not changed.

Rutten JH, Steyerberg EW, Boomsma F. N-terminal pro-brain natriuretic peptide testing in the emergency department: beneficial effects on hospitalization, costs, and outcome. American heart journal. 2008; 156(1):71-7. PMID: 18585499

Methods: This is a single-centre RCT of 477 adult patients presenting to an emergency department in the Netherlands with acute dyspnea. Patients were excluded if they were dialysis dependent, presented with trauma, or were in cardiogenic shock. In this setting, the emergency department evaluation was only done by an emergency medicine resident 24% of the time, with the initial evaluation of most patients being done by residents in internal medicine, cardiology, or pulmonology. Although not stated clearly, it appears that outcome assessors in this trial were not blinded. The NT-proBNP level was provided only to physicians in the intervention group, with the guidance that a level below 93 pg/mL for males and 144 pg/mL for females ruled out heart failure, whereas levels above 1017 pg/mL ruled in the diagnosis. These patients were younger than other studies, with a mean age of 58.6 years. Unlike other studies, ⅔ of the patients had been seen by their primary care doctor and were referred to the emergency department. The clinicians judged the pre-test probability of acute heart failure to be high (>75%) in 15% of patients, indefinite in 21%, and unlikely (<25%) in 59% of the population.

Results: Hospital length of stay was significantly shorter in the NT-proBNP group (1.9 vs 3.9 days, p=0.04). Costs were not statistically different, but the point estimate was lower in the NT-proBNP group ($4984 vs $6352). There were no differences in the rate of hospitalization or the ED length of stay, so BNP seemed to help the admitting team but not the ED physician. There was no change in mortality or readmission rate between the groups.

Comments:

Looking more closely at the numbers: among the low risk patients (and I am not sure using 25% as a cutoff is really low risk), there were 61 patients with an NT-proBNP above 1017 pg/mL. Of these, 29 had their diagnosis changed to cardiac, but that means there were 32 false positives even in the highest BNP range. There were another 93 patients with an NT-proBNP in the indeterminate range, and they don’t tell us what they did with those patients clinically.

Of the 70 patients that the physicians thought had CHF, only 2 had low NT-proBNP numbers, but they don’t tell us what their final diagnoses were.

Of the patients who were thought to be indeterminate clinically, NT-proBNP was thought to rule in CHF in 59 patients, rule it out in 22, and be unhelpful in 24. However, because these results were not blinded, it seems like the final diagnosis of CHF (or gold standard) included the NT-proBNP results (incorporation bias). Therefore, it is not clear whether the NT-proBNP was actually helpful in diagnosing CHF, or whether it was just used as the definition of CHF in a kind of self-fulfilling circular diagnostic reasoning.

I will also note that none of these studies talk about adjustments for multiple comparisons, but they all make multiple comparisons. In a study like this, with multiple negative outcomes and just a single positive outcome, it doesn’t seem correct to emphasize the positive outcome in the abstract.

Again, decisions on when to discharge are somewhat subjective, and degree of certainty, even if unfounded, may impact physician willingness to discharge. I would prefer objective measures such as hypoxia or a walk test.

Resident judgement may not be the ideal comparison.
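The point about multiple comparisons in the comments above can be illustrated with a quick calculation. This is only a sketch under stated assumptions: it counts the 6 outcomes listed in the results as summarized here (hospital length of stay, costs, hospitalization, ED length of stay, mortality, and readmission; the paper itself may test more), it treats those comparisons as independent, and it uses a simple Bonferroni correction, which is my choice for illustration and not something the authors applied.

```python
def familywise_error_rate(n_comparisons, alpha=0.05):
    """Chance of at least one false positive across independent comparisons."""
    return 1 - (1 - alpha) ** n_comparisons


def bonferroni_threshold(n_comparisons, alpha=0.05):
    """Per-comparison p-value threshold that keeps the overall error rate near alpha."""
    return alpha / n_comparisons


n_outcomes = 6          # assumed: outcomes listed in the results as summarized above
length_of_stay_p = 0.04  # p-value quoted above for hospital length of stay

print(f"Chance of at least one false positive: {familywise_error_rate(n_outcomes):.0%}")
print(f"Bonferroni threshold: {bonferroni_threshold(n_outcomes):.4f}")
print(f"Length of stay still significant: {length_of_stay_p < bonferroni_threshold(n_outcomes)}")
```

Under these assumptions there is roughly a 26% chance of at least one spuriously positive outcome, and a p-value of 0.04 would not survive the correction.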

Schneider HG, Lam L, Lokuge A. B-type natriuretic peptide testing, clinical outcomes, and health services use in emergency department patients with dyspnea: a randomized trial. Annals of internal medicine. 2009; 150(6):365-71. PMID: 19293069

Methods: This is a single-blind RCT out of 2 Australian emergency departments. They included 612 patients over the age of 40 presenting with severe dyspnea (triage category 1-3). Patients were excluded if they had trauma, cardiogenic shock, or a creatinine > 248 umol/L. All patients had a BNP drawn, but only the intervention group emergency physicians had access to the BNP results. There was more hypertension, prior CHF, and orthopnea at baseline in the BNP group. The final diagnosis was CHF in 45% of patients and 85% were admitted to hospital. Physicians were advised that a BNP < 100 ng/L made the diagnosis of heart failure unlikely, whereas a BNP > 500 ng/L made heart failure likely. The final diagnosis was based on 2 clinicians (at least one cardiologist) with all clinical data available except the BNP. (The cardiologists only had moderate agreement, with a kappa of 0.79.)

Results: There were no differences between the groups for any outcomes. Admission rates were 85% and 87%. Length of stay was 4.4 vs 5.0 days. There was no change in the use of bronchodilators, diuretics, vasodilators, steroids, angiotensin-converting enzyme inhibitors, or noninvasive positive pressure ventilation. Mortality and readmission were also unchanged.

Singer AJ, Birkhahn RH, Guss D. Rapid Emergency Department Heart Failure Outpatients Trial (REDHOT II): a randomized controlled trial of the effect of serial B-type natriuretic peptide testing on patient management. Circulation. Heart failure. 2009; 2(4):287-93. PMID: 19808351 [free full text]

Methods: This is an industry-sponsored, convenience sampled, multi-centre non-blinded RCT. They included 385 adult patients presenting with signs or symptoms of acute CHF requiring therapy. Patients were excluded if they had a BNP < 100 pg/mL, acute MI, ACS with ECG changes, or dialysis. 88% of patients were admitted to hospital. (Using the test you are studying as an exclusion criterion guarantees the results of your trial will not extrapolate well to real world use.) The control group had no BNP testing. The intervention group had a BNP measured at baseline, 3, 6, 9, and 12 hours, and then daily until discharge (so this study is more relevant to internal medicine than emergency medicine). Physicians in the control group could order a BNP from the lab at their discretion. Physicians were not given instruction on how to interpret the results. The primary outcome was hospital length of stay.

Results: There were no differences between the groups in terms of admission (85% vs 84%), hospital length of stay (6.5 vs 6.5 days), mortality (5.5% vs 3.7%), return visits (23.7% vs 20.2%), or the use of CHF medications.

Comments: This study was not blinded. Half of the patients in the control group had a BNP drawn on day one, so although they didn’t have the multiple BNP protocol, there may not have been much difference in the management of these patients, so we probably shouldn’t be too surprised that there were no differences in outcomes.

Boldanova T, Noveanu M, Breidthardt T. Impact of history of heart failure on diagnostic and prognostic value of BNP: results from the B-type Natriuretic Peptide for Acute Shortness of Breath Evaluation (BASEL) study. International journal of cardiology. 2010; 142(3):265-72. PMID: 19185372

Methods: This is a post-hoc analysis of the RCT described in Mueller 2004, looking specifically at the patients with a previous diagnosis of heart failure. Among 452 patients presenting to the emergency department with acute dyspnea, 64 (14%) had a prior history of CHF.

Results: BNP had a similar diagnostic accuracy in both groups, but it wasn’t great in either. Using the optimal single cut point of 403 pg/mL in patients with a history of heart failure, you get a sensitivity of 80% and a specificity of 77%. The single cutoff for patients without heart failure was 289 pg/mL, resulting in a sensitivity of 81% and specificity of 83%. Using two cutoffs, you can get better numbers, as seen in the table below, but in this study they don’t tell us how many patients actually fall into those groups.

Meisel SR, Januzzi JL, Medvedovski M. Pre-admission NT-proBNP improves diagnostic yield and risk stratification – the NT-proBNP for EValuation of dyspnoeic patients in the Emergency Room and hospital (BNP4EVER) study. European heart journal. Acute cardiovascular care. 2012; 1(2):99-108. PMID: 24062895 [free full text]

Methods: This is a prospective, randomized, two-centre trial that included a convenience sample of 470 adult patients presenting to the emergency department during daytime hours with objective evidence of dyspnea. Patients were excluded if they had the presence of another overt disease known to cause dyspnea (COPD, pneumonia, MI). All patients had an NT-proBNP drawn, but physicians were blinded in the control group. Before the NT-proBNP results were reported, physicians made a provisional diagnosis of either acute heart failure or not. (This was done early, as the blood was drawn 11 minutes from ED presentation.) No primary outcome is explicitly stated. The gold standard is not explicitly discussed, but it sounds like they just used the discharge diagnosis as the gold standard, which is a big problem because physicians were not blinded, which will result in incorporation bias.

Results: There was no difference in admission to hospital (90.0% without NT-proBNP and 93.4% with NT-proBNP, p=0.32). There was no difference in hospital length of stay (4.8 vs 5.3 days, p=0.13). More patients left the hospital with a diagnosis of heart failure in the NT-proBNP group (54.7% vs 44%) but without a clear gold standard, it is impossible to know if this is an improvement, or simply a representation of incorporation bias. There was no change in 2 year mortality.

Steinhart BD, Levy P, Vandenberghe H. A Randomized Control Trial Using a Validated Prediction Model for Diagnosing Acute Heart Failure in Undifferentiated Dyspneic Emergency Department Patients-Results of the GASP4Ar Study. Journal of cardiac failure. 2017; 23(2):145-152. PMID: 27565045

Methods: This is another multi-center randomized trial looking at 201 patients with an intermediate probability of CHF (21-79%), as determined by the treating emergency physicians. However, it is a little bit different than the others in that it is looking at a decision tool that incorporates NT-proBNP rather than BNP alone. Patients were randomized so that the physician either got the results of the decision tool displayed to them or not. (An actual post-test probability was displayed, as well as advice that >80% ruled in the diagnosis whereas <20% ruled it out.) The final diagnosis was based on 2 cardiologists, who were blinded to the rule, but not to the NT-proBNP level or any of the other components of the rule.

Results: There was no difference in the accuracy of the emergency physicians’ diagnoses whether or not they had access to this decision tool. There were no differences in any clinical outcomes, including time to diagnosis, time to discharge, length of stay, ICU admission rate, or 60 day survival. The authors claim that the rule would have reclassified 48% of patients with a 95% accuracy, but the data is not presented in a way I can really interpret, and I am not sure what that means when it didn’t change the emergency physicians’ diagnoses or any clinical outcomes.

Caveats: The cardiologists here were not blinded to any clinical data, including the BNP, which makes incorporation bias very likely. I don’t think the cut-offs they chose are very helpful. Using an 80% post-test probability as a “rule in” means that you will still be wrong 20% of the time. Same with using 20% as a “rule out”. The authors don’t present any data on false positives or negatives. There is an interesting bit of information in the discussion. The gold standard cardiologists changed their diagnosis based on the NT-proBNP results 20% of the time (incorporation bias), but more importantly, when they were given 60 day clinical outcomes, they changed their diagnosis back 28% of the time. Presumably, that means that the cardiologists were misled by the NT-proBNP result 28% of the time!

Bottom Line: Although two studies demonstrated an improvement in hospital length of stay, most of the RCT data seems to indicate no patient important benefit from BNP use in the emergency department.

Discussion

Understanding Diagnostic studies

When assessing the evidence for diagnostic tests, we often ask the wrong questions. We love to talk about a test’s sensitivity and specificity, but rarely consider its overall clinical impacts. According to Fryback (1991), there are 6 levels of diagnostic test quality that we should consider:

1. Technical quality of the test information

2. Diagnostic accuracy

3. Change in the physician’s diagnostic thinking

4. Change in management plans

5. Change in patient outcomes

6. Societal costs and benefits

All too often, we stop at diagnostic accuracy. It will be argued that we should be using a test in clinical practice because the test is highly sensitive or highly specific (although rarely both). However, glancing through this list should remind us that we really need to focus on patient oriented outcomes. We need to ask ourselves if new tests are better than physician judgement, better than current practice, and will actually change patient outcomes for the better. We should also be considering the cost and harms of tests.

Although, in isolation, BNP outperforms individual clinical features in the diagnosis of heart failure, it does not appear to significantly outperform a physician’s overall clinical impression. (Yealy 2009) In terms of patient oriented outcomes, the studies are mixed. One study demonstrated a decrease in admission rate, but 5 showed no change. Two studies demonstrated a decrease in hospital length of stay, but 4 did not. Costs were decreased in 2 studies, but that seems to simply be a way of restating that patients stay in hospital for fewer days. Two studies discuss specific treatments, and there were no changes with BNP. Mortality and readmissions were not changed in any study. When you consider the bias in the positive studies, and the number of different comparisons that are being made, the summary is pretty negative for BNP. At best, BNP might impact hospital length of stay, and therefore may be appropriate for our inpatient colleagues, but not for ED use.

Has the emergency department assessment of CHF changed since these studies?

These studies were mostly completed in an era before point of care ultrasound was being widely used to assess for CHF in emergency departments. In isolation, POCUS seems to perform better than BNP in the diagnosis of CHF. POCUS has a positive likelihood ratio of 7.4 and a negative likelihood ratio of 0.16, compared to a LR+ of 2.2 and LR- of 0.11 with BNP. (Martindale 2016) Therefore, comparing BNP to physicians who weren’t using POCUS is unfair. It seems very unlikely that BNP will add anything to the combination of clinical assessment and POCUS based on the data currently available.
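
To make these likelihood ratios a little more concrete, the short Python sketch below converts a pretest probability into a post-test probability using the standard odds form of Bayes' theorem (post-test odds = pretest odds x likelihood ratio). The likelihood ratios are the ones quoted above from Martindale 2016. The 50% pretest probability is purely an illustrative assumption, and the function names are hypothetical.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a post-test probability via odds."""
    if not 0 < pretest_probability < 1:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Likelihood ratios quoted in the text (Martindale 2016)
LIKELIHOOD_RATIOS = {
    "POCUS": {"positive": 7.4, "negative": 0.16},
    "BNP": {"positive": 2.2, "negative": 0.11},
}

# Assumption for illustration only: a patient with a 50% pretest probability
ASSUMED_PRETEST = 0.50


def summarize(pretest=ASSUMED_PRETEST):
    """Post-test probability after a positive and a negative result, per test."""
    return {
        test: {
            result: post_test_probability(pretest, lr)
            for result, lr in ratios.items()
        }
        for test, ratios in LIKELIHOOD_RATIOS.items()
    }


if __name__ == "__main__":
    for test, probabilities in summarize().items():
        print(
            f"{test}: positive -> {probabilities['positive']:.0%}, "
            f"negative -> {probabilities['negative']:.0%}"
        )
```

Under that assumed 50% starting point, a positive POCUS moves the probability of CHF to roughly 88%, while a positive BNP only moves it to roughly 69%. A negative result leaves roughly 14% and 10% respectively. Changing `ASSUMED_PRETEST` shows how much the answer depends on the clinical impression you started with.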

The “gold” standard

In these studies, emergency physicians are about 80% accurate in the initial diagnosis of CHF. This number requires some perspective to understand. At the end of these trials, after extensive investigation and the passage of time that allows a diagnosis to become more obvious, when 2 cardiologists were asked to make the final diagnosis, they only agreed with each other 80-90% of the time. (Maisel 2002, Januzzi 2005, Schneider 2009) Therefore, if one cardiologist was assumed to be the gold standard, the other cardiologist would only be 80-90% accurate in diagnosing CHF, even with a lot of information that is not available in the emergency department. Seen in that light, an 80% accuracy based solely on the information available in a limited emergency department encounter is remarkable.

The lack of a true gold standard is problematic when trying to get precise estimates of the sensitivity, specificity, and likelihood ratios for BNP. Any unblinding will lead to incorporation bias, where the BNP result is used to make the diagnosis of CHF, rather than predicting it.

CHF really isn’t a single disease. It is a syndrome with a number of very different etiologies, from fluid overload, to acute valve failure, to arrhythmias, to infiltrative cardiomyopathies. It is not clear that BNP will perform equally for every etiology, and so grouping them all together also reduces the precision of our estimate.

The imprecise gold standard is less important in RCTs with objective outcomes. Although the accuracy of the final diagnosis can still be questioned, what really matters in the RCTs is what happens when the test is put into the hands of physicians. Can we improve objective outcomes, like mortality or readmission to hospital? In the data we have so far, the answer is a clear no.

Lack of blinding

As an inherent part of the trial design in the RCTs presented, treating physicians were not blinded. They had to know the BNP result in order to act upon it. This was a necessary methodological design, but it does increase the chance of bias, especially when measuring subjective outcomes that the unblinded physicians have control over.

Determining cutoffs (retrospectively)

Almost all of these studies use different cutoffs. A number of them perform advanced statistics to try to determine the “optimum” cut-off. That optimum cut-off is then used to present the test characteristics (sensitivity, specificity, etc.). However, determining the cut-off retrospectively like this will “overfit” it to the current data set, which results in artificially high sensitivities and specificities that are unlikely to be reproduced in future prospective validation.
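
The optimism that comes from picking a cut-off after seeing the data can be sketched with a small simulation. Everything here is an assumption for illustration: the BNP values are randomly generated from made-up log-normal distributions (they are not taken from any of the trials), the “optimum” is defined using Youden's index (sensitivity + specificity - 1), which may not be the method any given study used, and all function names are hypothetical.

```python
import random


def youden_index(values, has_chf, cutoff):
    """Sensitivity + specificity - 1 when 'value >= cutoff' is called positive."""
    tp = sum(1 for v, d in zip(values, has_chf) if d and v >= cutoff)
    fn = sum(1 for v, d in zip(values, has_chf) if d and v < cutoff)
    tn = sum(1 for v, d in zip(values, has_chf) if not d and v < cutoff)
    fp = sum(1 for v, d in zip(values, has_chf) if not d and v >= cutoff)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def best_cutoff(values, has_chf, candidates):
    """Pick the candidate cut-off with the highest Youden index in this data set."""
    return max(candidates, key=lambda c: youden_index(values, has_chf, c))


def simulate_cohort(rng, n_per_group):
    """Simulated (made-up) BNP values: n patients with CHF and n without."""
    values, has_chf = [], []
    for _ in range(n_per_group):
        values.append(rng.lognormvariate(6.0, 1.0))  # hypothetical CHF group
        has_chf.append(True)
        values.append(rng.lognormvariate(4.5, 1.0))  # hypothetical non-CHF group
        has_chf.append(False)
    return values, has_chf


def average_optimism(n_studies=200, n_per_group=30, seed=0):
    """Mean drop in Youden index when each study's 'optimal' cut-off is re-tested."""
    rng = random.Random(seed)
    candidates = range(50, 1001, 25)
    total = 0.0
    for _ in range(n_studies):
        # Small "derivation" study: the cut-off is chosen after seeing the data
        derive_values, derive_labels = simulate_cohort(rng, n_per_group)
        cutoff = best_cutoff(derive_values, derive_labels, candidates)
        # Large independent "validation" cohort from the same distributions
        validate_values, validate_labels = simulate_cohort(rng, 1000)
        total += (
            youden_index(derive_values, derive_labels, cutoff)
            - youden_index(validate_values, validate_labels, cutoff)
        )
    return total / n_studies


if __name__ == "__main__":
    print(f"Average optimism in Youden index: {average_optimism():.3f}")
```

By construction, the chosen cut-off looks at least as good in the derivation data as any other candidate, so on average its performance is expected to drop when it is applied to new patients. The size of that drop in this sketch depends entirely on the assumed distributions and sample sizes, so it should not be read as an estimate for any real BNP study.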

Furthermore, many of these studies use multiple different cutoffs: a low number for high sensitivity and a high number for high specificity. Although this might be a useful clinical approach, it can make the numbers look artificially high. These studies frequently omit information on how many patients fell into each group, and what they did with the patients who fell between the two cutoffs. An interval likelihood ratio approach might make sense (where a different likelihood ratio is applied depending on how high your BNP is), but in the one study that presented this approach, it was clear that almost all patients fall into the middle, indeterminate range. (Martindale 2016) This reminds us that BNP will perform differently depending on the pretest chance of having the disease, which leads us to our next discussion point: spectrum bias.

Spectrum bias

Tests don’t work the same in all patients. The higher the clinical uncertainty, the less well tests perform. (This is spectrum bias.) By including all comers with dyspnea, these studies can artificially inflate both the sensitivity and specificity. It is true that in patients who have almost no chance of CHF, the BNP is almost always low. There are lots of these patients, making sensitivity look excellent, but I don’t need a test to tell me that these patients don’t have CHF. I already knew. Similarly, patients in florid CHF have high BNPs, and so if included can make BNP’s specificity look great, but I also don’t need a test in these patients. It is precisely in the intermediate risk patients that we get indeterminate BNP results, making the sensitivity and specificity in real life practice worse than seen in these trials.

We can actually see the existence of spectrum bias in these trials. The sensitivity of BNP is very high in patients whom emergency physicians were sure didn’t have CHF. These were patients we would not test in real practice. However, the sensitivity falls in the patients that the emergency physicians were less sure about. This is spectrum bias. We should expect that BNP will have worse diagnostic numbers in real life practice than we see in the studies presented here. (Schwam 2004; Montori 2005)

False positives

When I am working up a patient presenting with acute dyspnea, my first priority is not to confirm that I am treating CHF. I can safely treat CHF and confirm the diagnosis over time using response to therapy. My primary concern is ruling out other important diagnoses, such as pulmonary embolism. BNP is very frequently elevated in patients with PE and right heart strain. (Coutance 2008) None of these trials comment on alternative diagnoses among the patients with high BNP values. They note false positives among the patients with high BNP (patients whose final diagnoses were something other than CHF), but they don’t discuss the clinical details. One can easily imagine a disastrous clinical scenario where CHF was “confirmed” by a high BNP, while the patient dies from their PE. It is unclear from this data how often such misdiagnoses occur, but they certainly will.

Practical considerations

If one was interested in implementing BNP into their practice, there are a number of practical difficulties to consider as well. These studies all used different assays, and different cutoffs, and calculate different “optimal” numbers, making it very difficult to determine the best approach to take in the ED. (Carpenter 2012) Furthermore, age, gender, renal impairment and obesity all impact BNP levels. (Carpenter 2012; Plichart 2016) Other comorbid conditions, such as hypertension, coronary artery disease, atrial fibrillation, and chronic respiratory disease, can also impact BNP levels. (Plichart 2016) Therefore, BNP is less accurate in exactly those patients in whom we would want to use it (older, obese, hypertensive patients with renal problems are a diagnostic challenge when presenting with dyspnea).

Another consideration is that, at the cutoffs generally used here, although sensitivity is usually quite high, BNP lacks specificity. That lack of specificity may lead to increased diagnostic uncertainty and increased testing in low risk populations. (Carpenter 2012) We didn’t see increased testing in the studies presented here, although it has been seen elsewhere. (Pfisterer 2009) However, these studies had strict inclusion and exclusion criteria. Broader use of BNP would almost certainly result in more indeterminate or inaccurate results, resulting in more downstream testing.

We also have to ask how accurate we want BNP testing to be. Pulmonary embolism and sepsis can both produce results above 1,000 pg/mL. In deciding on optimal cutoffs, we have to consider the probability and severity of the false positives. If significant false positives, such as PE or sepsis, are possible, no rule in cut-off will be acceptable and significant harms are possible. (Schwam 2004)

BNP Bottom Line

To date, BNP has not been available in the emergency department where I work. After reviewing this literature, I think that is a good thing. Although there may be a role for BNP in inpatients, the data available to date makes it pretty clear that BNP does not help patients in the emergency department.

Other FOAMed Resources

EMNerd: The case of the dubious squire and The case of the dubious squire continues

EPMonthly: Is BNP Worthwhile for CHF?

EMDocs: BNP Level in the Emergency Department: Does it Change Management?

References

Carpenter CR, Keim SM, Worster A, Rosen P. Brain natriuretic peptide in the evaluation of emergency department dyspnea: is there a role? The Journal of emergency medicine. 2012; 42(2):197-205. PMID: 22123173

Coutance G, Le Page O, Lo T, Hamon M. Prognostic value of brain natriuretic peptide in acute pulmonary embolism. Crit Care. 2008; 12(4):R109.

Fryback DG, Thornbury JR. The efficacy of diagnostic imaging. Medical decision making : an international journal of the Society for Medical Decision Making. 1991; 11(2):88-94. PMID: 1907710

Januzzi JL, Camargo CA, Anwaruddin S. The N-terminal Pro-BNP investigation of dyspnea in the emergency department (PRIDE) study. The American journal of cardiology. 2005; 95(8):948-54. PMID: 15820160

Lord SJ, Irwig L, Simes RJ. When is measuring sensitivity and specificity sufficient to evaluate a diagnostic test, and when do we need randomized trials? Annals of internal medicine. 2006; 144(11):850-5. PMID: 16754927

Maisel AS, Krishnaswamy P, Nowak RM. Rapid measurement of B-type natriuretic peptide in the emergency diagnosis of heart failure. The New England journal of medicine. 2002; 347(3):161-7. PMID: 12124404 [free full text]

Maisel A, Hollander JE, Guss D. Primary results of the Rapid Emergency Department Heart Failure Outpatient Trial (REDHOT). A multicenter study of B-type natriuretic peptide levels, emergency department decision making, and outcomes in patients presenting with shortness of breath. Journal of the American College of Cardiology. 2004; 44(6):1328-33. PMID: 15364340 [free full text]

Martindale JL, Wakai A, Collins SP. Diagnosing Acute Heart Failure in the Emergency Department: A Systematic Review and Meta-analysis. Academic emergency medicine. 2016; 23(3):223-42. PMID: 26910112 [free full text]

McCullough PA, Nowak RM, McCord J. B-type natriuretic peptide and clinical judgment in emergency diagnosis of heart failure: analysis from Breathing Not Properly (BNP) Multinational Study. Circulation. 2002; 106(4):416-22. PMID: 12135939 [free full text]

Meisel SR, Januzzi JL, Medvedovski M. Pre-admission NT-proBNP improves diagnostic yield and risk stratification – the NT-proBNP for EValuation of dyspnoeic patients in the Emergency Room and hospital (BNP4EVER) study. European heart journal. Acute cardiovascular care. 2012; 1(2):99-108. PMID: 24062895 [free full text]

Moe GW, Howlett J, Januzzi JL, Zowall H. N-terminal pro-B-type natriuretic peptide testing improves the management of patients with suspected acute heart failure: primary results of the Canadian prospective randomized multicenter IMPROVE-CHF study. Circulation. 2007; 115(24):3103-10. PMID: 17548729 [free full text]

Montori VM, Wyer P, Newman TB, Keitz S, Guyatt G. Tips for learners of evidence-based medicine: 5. The effect of spectrum of disease on the performance of diagnostic tests. CMAJ. 2005; 173(4):385-90. PMID: 16103513 [free full text]

Mueller C, Scholer A, Laule-Kilian K. Use of B-type natriuretic peptide in the evaluation and management of acute dyspnea. The New England journal of medicine. 2004; 350(7):647-54. PMID: 14960741 [free full text]

Pfisterer M, Buser P, Rickli H. BNP-guided vs symptom-guided heart failure therapy: the Trial of Intensified vs Standard Medical Therapy in Elderly Patients With Congestive Heart Failure (TIME-CHF) randomized trial. JAMA. 2009; 301(4):383-92. PMID: 19176440

Plichart M, Orvoën G, Jourdain P. Brain natriuretic peptide usefulness in very elderly dyspnoeic patients: the BED study. European journal of heart failure. 2017; 19(4):540-548. PMID: 28025867

Roberts E, Ludman AJ, Dworzynski K. The diagnostic accuracy of the natriuretic peptides in heart failure: systematic review and diagnostic meta-analysis in the acute care setting. BMJ (Clinical research ed.). 2015; 350:h910. PMID: 25740799 [free full text]

Rutten JH, Steyerberg EW, Boomsma F. N-terminal pro-brain natriuretic peptide testing in the emergency department: beneficial effects on hospitalization, costs, and outcome. American heart journal. 2008; 156(1):71-7. PMID: 18585499

Schneider HG, Lam L, Lokuge A. B-type natriuretic peptide testing, clinical outcomes, and health services use in emergency department patients with dyspnea: a randomized trial. Annals of internal medicine. 2009; 150(6):365-71. PMID: 19293069

Schwam E. B-type natriuretic peptide for diagnosis of heart failure in emergency department patients: a critical appraisal. Academic emergency medicine : official journal of the Society for Academic Emergency Medicine. 2004; 11(6):686-91. PMID: 15175210 [free full text]

Singer AJ, Birkhahn RH, Guss D. Rapid Emergency Department Heart Failure Outpatients Trial (REDHOT II): a randomized controlled trial of the effect of serial B-type natriuretic peptide testing on patient management. Circulation. Heart failure. 2009; 2(4):287-93. PMID: 19808351 [free full text]

Yealy DM, Hsieh M. BNP is not a value-added routine test in the emergency department. Annals of emergency medicine. 2009; 53(3):387-9. PMID: 19054594

Cite this article as: Justin Morgenstern, "BNP in the emergency department: The evidence", First10EM blog, March 13, 2018. Available at: https://first10em.com/bnp/