Saturday, June 17, 2023

Diagnostic Test Coverage: Would It Be Improved by GRADE's "EVIDENCE TO DECISION" Framework?

HEADER.  This blog quickly surveys several approaches to molecular test evaluation.  Then, we describe the GRADE "Evidence to Decision" approach, based on a pair of articles.


Everyone would agree that when diagnostic test evaluators (like MolDx or MACs or LBMs) make coverage decisions, they start with evidence, follow a pathway, and end up at a decision.   Is this systematic enough?  Or is it inherently a context-dependent process that plays out differently case by case?  

GRADE - the evidence and policy consortium - has a pair of papers to help structure such decisions, called their "evidence-to-decision" process.   Let's take a look at the status quo and then a closer look at GRADE "E2D".

First: Alternatives to E2D...

Entry point 1: "Clinical Utility?"

While "analytical validity, clinical validity, clinical utility" is a cliché, definitions of "clinical utility" are usually vague and general, far from a roadmap or instruction book on how to do it.  Plus, judgment is usually involved.

Entry Point 2: CMS LCD Format

While CMS doesn't provide any rulebook for LCD decisions, it does provide a format.  The top section is a coverage summary.   The next section is an evidence summary.   The third and last section is an evidence "analysis."   

Examples quickly show the difference.  Evidence review is: "Smith et al. is a 100 patient study of Test X, showing 80% accuracy."   Analysis is "80% accuracy is not good enough for Medicare coverage."  These two parts could also be styled as "objective, subjective" in sequence.

Entry point 3: Standard of Care, Better than Something We Already Do, or "New"

In several talks, I've used the following slide, which I developed directly after hearing a MolDx talk in February 2023.


The chart is a framework for thinking, and pretty easy to explain.  A test is covered if it is a "standard of care" - e.g. EGFR in lung cancer, BRCA in breast cancer.  It's standard of care (SOC) if pretty much all major guidelines and reviews endorse it as a please-use service.   Here's a grin: If you have to start arguing and debating whether it's a standard of care or not - it's not a standard of care.

The second category is something novel, but serving a well-accepted function, and usually better than something we already do.  On this basis (as I understand it), MolDx covered donor-derived DNA tests for renal graft rejection (better than older tests), and MolDx covered (case by case) minimal residual disease tests for colon cancer relapse, since the covered tests are considered to perform better than the SOC (quarterly CT scans).   The MolDx MRD LCD incorporates text that each covered test must first have been shown to be better than, for example, CT scans.

The third category is something novel, and something we don't currently do, so risk/benefit is "uncertain," and it "requires data points to reduce this uncertainty about benefit and risk."  (I think this is a short but fair summary of MolDx's recent draft LCD on a squamous skin cancer test.)

I do have a somewhat tortured relationship with this slide.  I got the idea from MolDx talks and even LCD texts, but I may be misquoting them, or framing it differently than they would, so I don't want to attribute it to them, either.

Entry point 4: Frueh-Quinn's 6 Questions

I'm still proud of a system Felix Frueh and I came up with in 2014, based on 6 key questions.  The idea, modeled on Koch's postulates for infection, is that one question is not enough to provide a roadmap ("Does this pathogen cause disease?" or "Does this clinical test have high clinical utility?"), but 40 questions are too many to keep in our heads and would mix small and large points.  For Koch, 4 postulates, and for us, 6 questions, seemed right-sized.

  1. Who should be tested and under what circumstances? 
  2. What does the test tell us, that we did not know?  (Information delta over SOC test).
  3. Can we act on the information provided by the test?  (Ideally).
  4. Will we act on the information provided by the test?  (E.g. real world data).
  5. Does the outcome change in a way we find value in?  (Whether less pain, fewer unneeded surgeries, faster TAT, lower cost, fewer false negatives, etc).
  6. Is it a reasonable value?   (E.g. the Foundation Medicine test is $3500, not $35 and not $35,000).

The nearly ten years of experience since 2014 have shown me that if publications, a dossier, or an LCD request badly fail any one or more of these questions, success is less likely, sometimes sharply so.  If you're asked Q1 or Q2 above, and your answer is vague handwaving and mumbling, or guesses, the final outcome is probably in doubt.  Similarly, if you track the analysis above and have the most trouble with, say, Q2, that's the one to review, rework, and publish more on.
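The pass/fail logic here - where badly failing any single question puts the final outcome in doubt - can be sketched as a simple checklist.  (An illustrative sketch only; the pass/fail scoring and function names are my own framing, not a formal scoring system.)

```python
# Illustrative sketch: the six Frueh-Quinn questions as a checklist.
# A dossier that badly fails any single question is flagged as at risk.
FRUEH_QUINN_QUESTIONS = [
    "Who should be tested, and under what circumstances?",
    "What does the test tell us that we did not know?",
    "Can we act on the information provided by the test?",
    "Will we act on the information provided by the test?",
    "Does the outcome change in a way we find value in?",
    "Is the test a reasonable value?",
]

def assess_dossier(answers):
    """answers: dict mapping question number (1-6) to True (clear answer)
    or False (vague handwaving).  Returns the list of failed questions."""
    return [q for i, q in enumerate(FRUEH_QUINN_QUESTIONS, start=1)
            if not answers.get(i, False)]

# Example: a dossier with a weak answer to Q2 (information delta over SOC).
failed = assess_dossier({1: True, 2: False, 3: True, 4: True, 5: True, 6: True})
outcome_in_doubt = len(failed) > 0  # any failure puts the outcome in doubt
```

The point of the sketch is the decision rule: one bad answer is enough to flag the whole package, which matches how these reviews tend to play out in practice.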

Back to: GRADE "E2D" from Evidence to a Decision

GRADE E2D is based on two papers, a Part 1 and a Part 2, published in 2016 in BMJ.  See Alonso-Coello et al., Part 1, BMJ 2016;353:i2016, and Part 2, BMJ 2016;353:i2089.  (Open access.)   The GRADE authors specifically highlight that their work is intended to assist policy decisions - including "coverage decisions."

Here's a description of the GRADE system.   For the actual system, see the links above.

GRADE E2D Summarized

GRADE stands for "Grading of Recommendations Assessment, Development, and Evaluation." It's a broad set of many papers offering systematic approaches to rating the quality of evidence in systematic reviews and creating guidelines in healthcare.   In short, these numerous GRADE publications work together and aim to make evidence-based recommendations more understandable and transparent.

See the GRADE working group for the full set of publications.

The GRADE Evidence to Decision (EtD) framework is a structure to help guideline panels make decisions. The framework ensures that all important factors that could influence a decision are considered. Here are the elements typically considered in the GRADE EtD framework:

Problem: Understanding the issue or the health problem.

Desirable Effects: Estimating the outcomes, benefits, and the magnitude of these effects.

Undesirable Effects: Evaluating potential harms or downsides and their severity.

Certainty of Evidence: Assessing the confidence in the estimates of effects. The evidence can be rated as high, moderate, low, or very low.

Values and Preferences: Considering the priorities or values of the patients or population the guideline applies to.

Resource Use: Examining cost-effectiveness, which includes considering the balance between potential benefits and costs.

Equity: Considering if the recommendation may affect health equity.

Acceptability: Evaluating if the stakeholders find the recommendation acceptable.

Feasibility: Assessing if the recommendation can be implemented.

By using this framework, guideline developers can move toward more consistent and transparent recommendations.  These elements ensure that recommendations are based on a comprehensive view - not just the research evidence, but also patient values and resources.

Grade in Practice

The GRADE Evidence to Decision (E2D) framework is particularly useful when making decisions about healthcare interventions, including diagnostic tests. Here's a more detailed outline of how it might be implemented:

Define the question: The first step is to clearly define the question, including specifying the population, intervention (e.g., diagnostic test), comparison, and outcomes (PICO).

Collect the published evidence: Collect and evaluate the relevant research evidence. This can include systematic reviews, individual studies, or any other relevant sources of information.

Assess the certainty of the evidence: GRADE has a system for rating the quality of the evidence for each important outcome. This involves assessing factors like risk of bias, inconsistency, indirectness, imprecision, and publication bias.

Estimate the effects: The desirable and undesirable effects of the intervention are estimated. This could involve statistical measures such as relative risk or odds ratios, and may also consider patient-important outcomes and effect sizes.  For diagnostic tests, the important changes may be counts of true positive, false positive, and false negative results.  (See the "Summary of Findings" tables, Fig 1 and Fig 2, in E2D Part 2.)

Consider values and preferences: The values and preferences of the population for whom the recommendation is intended are considered. This might involve surveys or interviews with patients, for example.

Assess resource use: The costs of the intervention, as well as the economic efficiency (cost-effectiveness), are evaluated.

Consider equity, acceptability, and feasibility: These considerations are more qualitative and involve thinking about how the intervention might impact different population groups (equity), whether it would be accepted by stakeholders (acceptability), and whether it could be realistically implemented (feasibility).

After going through these steps, a recommendation is made based on the overall balance of benefits and harms, the certainty of the evidence, and other considerations. The strength of the recommendation is also rated as either 'strong' or 'conditional', based on factors such as the balance between desirable and undesirable effects, the certainty of the evidence, and variability in values and preferences.
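The workflow above - ordered steps feeding into a strong-versus-conditional recommendation - can be condensed into a short sketch.  (This is my own toy condensation; the step labels and the decision rule in `recommend` are illustrative, not GRADE's official criteria.)

```python
# Illustrative sketch of the GRADE E2D workflow as ordered steps,
# condensed from the description above (not an official GRADE artifact).
E2D_STEPS = [
    "Define the question (PICO)",
    "Collect the published evidence",
    "Assess the certainty of the evidence",
    "Estimate the desirable and undesirable effects",
    "Consider values and preferences",
    "Assess resource use",
    "Consider equity, acceptability, and feasibility",
]

CERTAINTY_LEVELS = ("high", "moderate", "low", "very low")

def recommend(certainty, benefits_outweigh_harms):
    """Toy decision rule: a strong recommendation needs high or moderate
    certainty AND a clear benefit/harm balance; otherwise conditional."""
    assert certainty in CERTAINTY_LEVELS
    if certainty in ("high", "moderate") and benefits_outweigh_harms:
        return "strong"
    return "conditional"
```

In real E2D panels the strong/conditional call weighs many more factors (variability in values, resource use, equity), so the two-input rule here is only a reminder of the shape of the decision, not its content.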

Remember, the application of GRADE requires careful thought and judgment. Even with a systematic approach, there can be challenging aspects to interpret, and expertise in the field can greatly aid these decisions. However, the E2D framework can be helpful for making more transparent, evidence-based decisions about whether and when to adopt a new diagnostic test or other healthcare intervention.

I think that thought processes like these are underway when MACs shift from "evidence" paragraphs to "analysis" paragraphs, but simply typing the word "Analysis" as the title of a CMS LCD section doesn't guarantee a good analysis.  GRADE E2D should help educate the LCD author to write a good analysis and summary.


Review assisted by GPT4.