Discoveries in Health Policy: Case Study: When Do Evaluators Think a Test's Incremental Accuracy is Too Small?

Wednesday, April 27, 2022

When do evaluators, like tech assessment committees, think an increase in test accuracy is "too small" to be impactful? This is something modern molecular diagnostics run into all the time. Does a molecular expression test in breast cancer or prostate cancer provide enough added value, over pathology grade and tumor size?

We have a case study, although from radiology not pathology, this week in JAMA Internal Medicine.

Ominously, the research article and the op-ed run under the banner "LESS IS MORE" (see also the home page for this series here). If your new technology is reviewed under a "LESS IS MORE" logo, it's probably not a good sign.

The two articles are Bell et al., a systematic review and meta-analysis of adding the Coronary Artery Calcium Score (CACS) to a "traditional CV risk assessment" (e.g. things like BP and cholesterol), and an op-ed by Gallo & Brown. Note that the op-ed banner title includes "PRIMUM NON NOCERE" (first do no harm), which is probably not a good sign in an article about your product.

I think the key point is summarized by Gallo & Brown. Existing predictors have a C statistic of .70 to .80, and CACS adds about .03 (e.g. 0.73 to 0.76). They find this unimpressive and give some reasons why.

Sources

BELL (Article) here

GALLO (Op Ed) here

Discussion

Usually, the popular statistic "area under the curve" (AUC) is a hard way to show "added clinical value." If the AUC of the standard of care is .75, and the AUC of your molecular test is .79, what does that mean for outcomes and care? First, AUC is an abstract concept. Second, AUC (or ROC) treats the test as binary, with no middle ground, and is built from pure sensitivity (in, say, 100 known positive patients) and pure specificity (in 100 known negative patients), so it is hard to extrapolate to real-world base rates. The same AUC means very different things clinically if there are 10 true cases per 20 tests given, or 5 true cases per 100 tests given.
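To make the base-rate point concrete, here is a minimal Python sketch. The sensitivity and specificity of 0.80 are made-up illustrative values, not figures from Bell et al. or Gallo & Brown, and the function name `expected_counts` is hypothetical. It applies the same test to the two base rates mentioned above (10 cases per 20 tests, and 5 cases per 100 tests) and counts what happens per 100 people tested.

```python
def expected_counts(sensitivity, specificity, prevalence, n_tested=100):
    """Expected test outcomes per n_tested people at a given base rate."""
    diseased = prevalence * n_tested
    disease_free = n_tested - diseased
    true_pos = sensitivity * diseased
    false_neg = diseased - true_pos
    true_neg = specificity * disease_free
    false_pos = disease_free - true_neg
    # Positive predictive value: share of positive results that are real cases.
    ppv = true_pos / (true_pos + false_pos)
    return {
        "true_pos": true_pos,
        "false_pos": false_pos,
        "false_neg": false_neg,
        "true_neg": true_neg,
        "ppv": ppv,
    }


# Assumed, illustrative accuracy figures (not taken from the articles).
SENSITIVITY = 0.80
SPECIFICITY = 0.80

high_base_rate = expected_counts(SENSITIVITY, SPECIFICITY, prevalence=10 / 20)
low_base_rate = expected_counts(SENSITIVITY, SPECIFICITY, prevalence=5 / 100)

for label, result in [("10 per 20", high_base_rate), ("5 per 100", low_base_rate)]:
    print(
        f"{label}: {result['true_pos']:.0f} true positives, "
        f"{result['false_pos']:.0f} false positives, PPV {result['ppv']:.0%}"
    )
```

Under these assumed numbers, the identical test yields about 10 false positives per 100 tested at the higher base rate, where most positive results are real, and about 19 at the lower base rate, where most positive results are false alarms. The ROC curve is the same in both settings.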

Usually it's better to translate to terms like "Current standard has 20 false positives per 100 patients, but our test cuts that in half. This means 10 fewer patients per 100 get a needless biopsy."
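As a rough sketch of that translation, the Python snippet below converts two specificities into false positives per 100 patients tested. All inputs are assumptions chosen only to reproduce the 20-versus-10 example: a 20% base rate, a specificity of 0.75 for the current standard, and 0.875 for the new test. The function name `false_positives_per_100` is hypothetical.

```python
def false_positives_per_100(specificity, prevalence):
    """False positives expected among 100 patients tested."""
    disease_free = 100 * (1 - prevalence)
    return disease_free * (1 - specificity)


# Assumed inputs, picked so the output matches the 20 vs. 10 example.
PREVALENCE = 0.20
STANDARD_SPECIFICITY = 0.75
NEW_TEST_SPECIFICITY = 0.875

standard_fp = false_positives_per_100(STANDARD_SPECIFICITY, PREVALENCE)
new_test_fp = false_positives_per_100(NEW_TEST_SPECIFICITY, PREVALENCE)
avoided_biopsies = standard_fp - new_test_fp

print(f"Current standard: {standard_fp:.0f} false positives per 100 patients")
print(f"New test: {new_test_fp:.0f} false positives per 100 patients")
print(f"Needless biopsies avoided per 100 patients: {avoided_biopsies:.0f}")
```

The answer depends on the assumed base rate, so a real submission would need to state the population being tested alongside the claim.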