This puzzled me, because I had just published a paper with coauthor Felix Frueh on bringing some structure and order to communications about clinical utility. It had never occurred to me that a test could have greater clinical validity than another, similar test, yet lower clinical utility. I believe this conclusion can be drawn only if there is too much “slippage” in the way clinical validity and clinical utility are defined and used. It is more useful to define them so that clinical utility depends on clinical validity plus a use context. If the clinical validity of one test is higher, and the use context is the same, the clinical utility of the other test cannot be better, although it might be the same.
Returning to the Definitions
In a 2014 comprehensive review, Parkinson and colleagues defined clinical validity as “the association between the biomarker and the pathophysiological state or clinical presentation of illness."1 This is essentially the same as that used by Hayes and colleagues in 2013: clinical validity is “how well the test relates to the clinical outcome of interest, such as survival or response to therapy."2
Following these definitions, the test relates what happens in a test tube (a chemical or molecular analysis) to a clinically relevant phenomenon. To be comprehensive, sometimes the test report is analytical (glucose = 125), sometimes it is genomic (we find mutation BRAF V600E is present), sometimes it is an abstract score (“recurrence score = 35"), and sometimes it is binary (a strep lateral flow immunotest is “positive").
Purely analytical tests are related to clinical states by common knowledge or definitions. For combination diagnostics, the clinical correlation is statistical (BRAF V600E is strongly associated with a high chance of response to sorafenib). For multiple analyte tests with algorithms (MAAA tests), there is usually a double report, one part being algorithmic and one being a clinical variable (a recurrence score of 35 correlates with a 23% chance of 10-year recurrence).
- Glucose = 125: common knowledge or definitions or protocols (not on report)
- BRAF mutation = V600E: high correlation with sorafenib response in malignant melanoma (usually on report, “interpretation”)
- Recurrence score = 35: correlated with 23% risk of 10-year recurrence in ER+, Node- breast cancer if tamoxifen treated (on report)
- Strep lateral flow immunotest = positive: strep present (on report)
We do something with tests, and that is what brings about their clinical utility. For Parkinson et al. (2014), clinical utility means that “the results of the assay lead to a clinical decision that has been shown with a high level of evidence to improve outcomes.”3 For Hayes et al. (2013), it is “whether the results of the test provide information that contributes to and improves current optimal management of the patient’s disease.”4
The commercial payer’s question, with which I opened this essay, led me to think that these definitions of clinical validity and clinical utility were not designed to help people agree where the two concepts start and stop, or how they meet at a border zone. However, it is possible to think about these concepts in such a way that there is a bright-line border between them.
The Three Buckets for “AV," “CV," and “CU"
A metaphor that makes the difference clear is this:
- Analytical validity lives in a test tube.
- Clinical validity lives on a report and in a data file.
- Clinical utility lives in a patient.
The simplest concept is probably analytical validity: it concerns the measurement itself, including how results are expressed, repeatability, reproducibility, interfering substances, and analytical sensitivity (for example, in ng/ml). Thus, “analytical validity lives in a test tube,” or at least inside the equipment that is doing the measuring.
Clinical validity lives in a report, in a data file, on a chalk board. We know that BRAF V600E is associated with sorafenib response in malignant melanoma because clinical trials showed us this was the case. We know that a range of breast cancer genomic tests – including Mammaprint, Oncotype DX, the BCI, and Prosigna – correlate with breast cancer recurrence rates because well-designed large databases tell us so.
It’s best to view clinical validity and clinical utility as separate categories even when the contents, for a test in question and its use case, are similar. For example, we might say offhand that the clinical validity and the clinical utility of a BRAF test are the same: a V600E mutation is associated with clinical response, and the lack of this mutation is associated with non-response. In health technology assessments, tests like BRAF get a fast pass, because the clinical validity and clinical utility are so similar.5 However, we avoid a host of later problems if we take the position that even for these tests, the clinical validity and clinical utility are not “the same.”
- BRAF mutation = V600E
  - Clinical validity: V600E is correlated with increased survival when treated with sorafenib.
  - Clinical utility: When a V600E patient is treated with sorafenib, he/she lives longer.
- Her2neu positive
  - Clinical validity: Positive Her2neu is correlated with increased survival if treated with Herceptin.
  - Clinical utility: When a Her2neu positive patient is treated with Herceptin, she lives longer.
Here, while “clinical validity” and “clinical utility” sound the same, they are not the same thing. Clinical validity is a correlation between an analytical test report and the outside clinical world. Clinical utility is something that really happens as a result of what we “do” for a patient. This carries forward the idea that clinical validity “lives” on a test report, on a chalk board, or in a paper, whereas clinical utility “lives” in the patient herself or himself.
Another way of using our metaphor is to say that analytical validity happens in the real world, although in a very tiny real world inside a test tube, and clinical validity “lives” on a report that we are confident is true because of various prior data. Clinical utility, again, lives in the real three-dimensional living world of drugs, therapies, and patients.
Yet another way of saying this: if only one factory could make Herceptin, and it blew up, and there were no more Herceptin supplies, the clinical utility of a Her2neu report for a doctor and patient would be gone (at least for now, and with regard to prescribing Herceptin). However, the clinical validity would be just as true: the test report is in our hand, and cohorts of test-positive patients were reported to live “N” months longer, with survival varying greatly by Her2neu status.
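The Herceptin factory thought experiment can be restated as a small Python sketch. This is only an illustration of the argument, not a model from the cited papers: the class names, fields, and the `has_clinical_utility` function are all hypothetical, and the rule that utility requires both supporting evidence and an available therapy is a deliberate simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClinicalValidity:
    """Evidence linking a test result to an outcome (lives on a report)."""
    test_result: str
    associated_outcome: str
    supported_by_data: bool


@dataclass(frozen=True)
class UseContext:
    """What can actually be done for the patient (hypothetical fields)."""
    therapy: str
    therapy_available: bool


def has_clinical_utility(validity: ClinicalValidity, context: UseContext) -> bool:
    """Simplified assumption: utility needs valid evidence AND an action we can take."""
    return validity.supported_by_data and context.therapy_available


her2_validity = ClinicalValidity(
    test_result="Her2neu positive",
    associated_outcome="longer survival when treated with Herceptin",
    supported_by_data=True,
)

factory_running = UseContext(therapy="Herceptin", therapy_available=True)
factory_destroyed = UseContext(therapy="Herceptin", therapy_available=False)

# The validity object is identical in both calls; only the use context changes.
print(has_clinical_utility(her2_validity, factory_running))    # True
print(has_clinical_utility(her2_validity, factory_destroyed))  # False
```

Nothing about `her2_validity` changes between the two calls, which is the point of the metaphor: the report stays true while the utility depends on what can be done in the patient's world.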
A Graphic Model of Clinical Utility
In 2014, Frueh and Quinn published a six-question approach to communications about clinical utility, including a figure that shows a one-way relationship between clinical validity and clinical utility. The model focuses on decision making for new tests, where the clinical utility is comparative to the status quo:
However, there is only an increase in clinical utility for that patient because we did something different than the status quo, a “change in management”:
And there could only be a change in management because there was something different about the information we had with the new test, relative to the status quo with the old test (or no test):
This last graphic, shown above, only makes sense if two tests that have “the same clinical validity” have the same impact on clinical utility in the same use case. This is easy to see if we have two thermometers, or two glucose meters, that give exactly the same reports. It’s also easy to see if we have two different BRAF tests that give the same (or very nearly the same) V600E mutation or wild type reports. The model shown above is generalized for diagnostic tests. For example, if an old MRI scanner has 0.5 cm resolution, and a new MRI scanner has 0.2 cm resolution, the new scanner may yield more accurate radiology reports (clinical validity), which lead to better outcomes (correct surgical decisions and other clinical decisions not misled by false positives and false negatives). On the other hand, if two MRI scanners have exactly the same image field size, slice thickness, contrast ratio, and resolution, it would be difficult to imagine that two indistinguishable imaging results from, say, a Siemens and a Philips MRI scanner would lead to different diagnoses in the reading room or different clinical actions.
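The chain described above (different information, then a change in management, then a change in outcome) can be sketched in a few lines of Python. The decision rule, the report values, and the function names below are assumptions made up for illustration; the only claim being shown is that when management depends on the report alone, two tests with identical reports cannot differ in how many patients have their management changed.

```python
def decide(report: str) -> str:
    """Hypothetical rule: management depends only on what the report says."""
    return "targeted therapy" if report == "V600E" else "standard care"


def patients_with_changed_management(old_reports, new_reports) -> int:
    """Count patients whose management differs between two sets of reports."""
    return sum(decide(old) != decide(new) for old, new in zip(old_reports, new_reports))


# Illustrative reports for three patients (invented data, not from any study).
status_quo = ["unknown", "unknown", "unknown"]   # no test performed
test_a = ["V600E", "wild type", "V600E"]
test_b = ["V600E", "wild type", "V600E"]         # same reports as test A

print(patients_with_changed_management(status_quo, test_a))  # 2
print(patients_with_changed_management(status_quo, test_b))  # 2
print(patients_with_changed_management(test_a, test_b))      # 0
```

Under these assumptions, either test changes management for the same two patients relative to the status quo, and switching from one test to the other changes nothing.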
1 Parkinson DR et al. (2014) Evidence of clinical utility: an unmet need in molecular diagnostics for patients with cancer. Clin Cancer Res 20:1428-44.
2 Hayes DF et al. (2013) Tumor biomarker diagnostics: Breaking a vicious cycle. Science Translat Med 5:196cm6.
3 Parkinson et al. (ref. 1); citing Olson S, Berger AC (2012) Genome based diagnostics: clarifying pathways to clinical use [Workshop]. Institute of Medicine.
4 Hayes et al. (ref. 2).
5 The Palmetto MolDX evaluation process provides a “fast track” for combination diagnostics approved with drugs in FDA pivotal drug trials. The BCBS Tech Evaluation Center provided a rapid satisfactory report on BRAF kinase testing in 2011. http://www.bcbs.com/blueresources/tec/vols/26/26_07.pdf