Discoveries in Health Policy: An AI-Based Teaching Lesson: Developing Tests That Will Pass or Fail MolDx Review (Example: DCIS Breast Cancer)

AI Prompt: This non-coverage LCD is for tests that evaluate recurrence risk (and thus radiation benefit) in DCIS breast cancer. DL40142.

The LCD document is of high value to many companies developing molecular tests, because they try to understand how MolDx scientists and clinicians think when making coverage decisions and writing policies.

Explain with examples how MolDx extensively reviewed data for these "DCIS" tests and found that the data was lacking. Either molecular data was not replicated enough, not powerful enough, or not clearly demonstrated to be better than nomograms or clinical guidance rules for low-risk patients who would not benefit from added postsurgical radiation.

Your audience is CEOs and chief science officers developing molecular tests at startup companies, typically at the Series B or C stage. You can write at the graduate or postgraduate level.

Note also the focus on event rates - if events are rare (like 5 relapses per 100) then, for example, 100 cases will not be a large enough "N" to give a good statistical sample of the target rare events.

###

While using AI to generate the lessons is fun, if you really want to understand how MolDx thinks, download and read the full original policy decision.

###

AI CORNER RESOURCES (including a podcast)

ChatGPT writes below.
See an alternate write-up by Google LM.
See a 17-minute Google LM "Teaching Podcast" - here.

###

What Startups Should Learn from MolDx’s LCD on DCIS Risk Stratification Tests

For CEOs and CSOs at molecular diagnostic startups, the MolDx non-coverage policy on DCISionRT and Oncotype DCIS Score shows how MolDx evaluates molecular diagnostics. The LCD gives an unusually detailed view of MolDx's expectations, both in its reasoning and in its evidentiary thresholds.

Key Lessons from MolDx’s Evaluation Approach:

1. MolDx Demands Direct Comparison to Existing Tools, Not Just “Added Insight.”

MolDx explicitly expects molecular diagnostics to demonstrate clear superiority over existing clinicopathologic tools like the MSKCC nomogram or Van Nuys Prognostic Index (VNPI).

✅ Example:
MolDx criticized DCISionRT and Oncotype DCIS Score for failing to show they outperform these established models. In one head-to-head study of MSKCC vs. Oncotype DCIS Score, there was 92% concordance, meaning the molecular test did not identify risk groups meaningfully differently in practice.

Implication for Startups:
You must benchmark your test directly against the current standard of care — and show where and why you are better. “Adds another data point” is not sufficient.
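To make the concordance figure concrete, here is a minimal sketch of how agreement between a clinical nomogram and a molecular test could be tabulated. The patient labels are invented for illustration (constructed so that agreement comes out at 92%); they are not data from the LCD or from any study, and the function name `concordance` is hypothetical.

```python
from collections import Counter

def concordance(labels_a, labels_b):
    """Return (fraction of cases where both tools agree, table of paired calls)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both tools must classify the same patients")
    pairs = Counter(zip(labels_a, labels_b))
    agree = sum(count for (a, b), count in pairs.items() if a == b)
    return agree / len(labels_a), pairs

# Hypothetical cohort of 100 patients, NOT real study data.
nomogram = ["low"] * 60 + ["high"] * 40
molecular = ["low"] * 56 + ["high"] * 4 + ["high"] * 36 + ["low"] * 4

rate, table = concordance(nomogram, molecular)
print(f"concordance: {rate:.0%}")
for (nomogram_call, molecular_call), count in sorted(table.items()):
    print(f"nomogram={nomogram_call:4s} molecular={molecular_call:4s} n={count}")
```

In a tabulation like this, only the discordant cells represent patients for whom the molecular test could change management, so those are the cells where outcome data would need to show the molecular call was the better one.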

2. Data Must Be Statistically Robust and Replicated Across Cohorts.

MolDx repeatedly criticizes the small sample sizes and inconsistent results in the low-risk cohorts for these tests.

✅ Example:
MolDx highlighted that DCISionRT’s low-risk group often had single-digit recurrence events (e.g., 4 vs. 12 recurrences), which leads to wide confidence intervals and inconclusive findings (p > 0.05).

MolDx also pointed out that newer cutoffs (e.g., DS2 = 2.8) were derived from re-analysis of old data without sufficient external validation. Studies lacked independent validation cohorts, leaving questions of reproducibility.

Implication for Startups:
Design your validation studies for statistical power within the subpopulations that will drive clinical decisions. Ensure external validation in independent cohorts is complete before seeking coverage.

3. Clinical Utility Must Be Proven, Not Theorized.

MolDx requires evidence that the test changes management in a way that affects health outcomes, especially in Medicare populations. Impacting radiation decisions isn’t enough if the benefit is statistically trivial.

✅ Example:
DCISionRT’s “low-risk” patients still showed relative risk reduction with RT; MolDx noted that the observed benefit (~5%) wasn’t statistically conclusive due to small numbers, and therefore claims about safely omitting RT are not substantiated.

Implication for Startups:
You need robust clinical utility studies showing how your test leads to safer or more efficient care, not just changes in provider behavior.

4. MolDx Expects Clear Definition of “Low Risk” Tied to Clinical Impact.

MolDx is not impressed by arbitrary score cutoffs or marketing terms. They define “sufficiently low risk” explicitly: an absolute reduction in recurrence risk from RT of ≤5%. Tests must clearly identify such patients with validated accuracy.

✅ Example:
MolDx criticized studies for inconsistent definitions of "low risk," varied inclusion of factors like age or tumor size, and inconsistent recurrence rates across cohorts. This muddles the claim that molecular tests are identifying patients who can safely avoid RT.

Implication for Startups:
You must define and defend your clinical risk thresholds in terms aligned with clinical practice impact and Medicare's benefit-to-risk frameworks.
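One way to read the 5% threshold is as a statement about absolute risk reduction (ARR) and its uncertainty, not just the point estimate. The sketch below computes ARR with a simple Wald confidence interval, which is a crude approximation when event counts are small. All counts are hypothetical, and treating "upper confidence bound at or below 5%" as the test is an assumption made here for illustration, not a rule stated in the LCD.

```python
from math import sqrt
from statistics import NormalDist

def arr_with_ci(events_no_rt, n_no_rt, events_rt, n_rt, confidence=0.95):
    """Absolute risk reduction from RT with a Wald confidence interval."""
    p_no_rt = events_no_rt / n_no_rt
    p_rt = events_rt / n_rt
    arr = p_no_rt - p_rt
    se = sqrt(p_no_rt * (1 - p_no_rt) / n_no_rt + p_rt * (1 - p_rt) / n_rt)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return arr, arr - z * se, arr + z * se

# Hypothetical "low-risk" cohorts with the same event rates at two sizes.
small = arr_with_ci(events_no_rt=8, n_no_rt=150, events_rt=4, n_rt=150)
large = arr_with_ci(events_no_rt=80, n_no_rt=1500, events_rt=40, n_rt=1500)

for name, (arr, lower, upper) in (("small", small), ("large", large)):
    verdict = "consistent with ARR <= 5%" if upper <= 0.05 else "cannot rule out ARR > 5%"
    print(f"{name}: ARR={arr:.1%}, 95% CI {lower:.1%} to {upper:.1%} -> {verdict}")
```

With these made-up numbers, both cohorts show the same point estimate, but only the larger one has an interval narrow enough to exclude a benefit above 5%.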

5. “Rhetorical Proof” Is Not Scientific Proof.

MolDx calls out the fallacy of tests being used merely to persuade patients or doctors (“rhetorical proof”) rather than meeting a robust evidentiary standard.

✅ Example:
MolDx observed that some subject matter experts (SMEs) championed DCISionRT because it reassures patients about skipping RT, but it rejected that rationale as insufficient for coverage unless backed by statistically sound data.

Implication for Startups:
Avoid leaning on "decision-support narratives" unless backed by robust data. MolDx wants evidence of superior performance, not convenience in shared decision-making.

6. Multi-Factor Models Must Show Incremental Value.

If your test combines molecular and clinical factors, you must show how the molecular piece specifically adds value over the clinical data alone.

✅ Example:
MolDx criticized both DCISionRT and Oncotype DCIS for including overlapping clinicopathologic factors already in nomograms — questioning whether the biomarkers provide independent predictive power.

Implication for Startups:
Your analytical and clinical validation must isolate the molecular contribution and demonstrate why it matters.

7. Medicare-Centric Evidence Is Required.

MolDx repeatedly emphasizes the need for Medicare-relevant data, especially in older populations, which are both the predominant DCIS cohort and the primary Medicare demographic.

Implication for Startups:
Ensure your validation cohorts reflect the age, health status, and clinical context of Medicare beneficiaries.

Final Thought: MolDx’s Transparent Playbook

This LCD shows that MolDx operates with a clear set of expectations:

Direct comparison to existing tools.
Statistically robust evidence.
Clinical utility tied to meaningful outcomes.
Medicare population relevance.

For startups at Series B/C stage, this clarity can help de-risk product development and align evidence-generation strategies for reimbursement success.

###

PLUS

###

Additional Lesson: Rare Event Rates Require Much Larger Sample Sizes

One of the most important statistical lessons from the MolDx LCD — and one that many startup diagnostics companies overlook — is the impact of rare event rates on study design and evidentiary thresholds.

Why Rare Events Demand Larger N:

In DCIS, the relevant clinical question is often identifying patients whose risk of recurrence after surgery (with or without RT) is very low, typically around 5% over 10 years. That is, out of 100 patients, only five or fewer will experience a recurrence.

This low event rate creates a major statistical hurdle:

In any cohort of 100 patients, observing only 5 events leaves your study underpowered to detect meaningful differences between groups.
Small absolute numbers (4 vs. 6 recurrences, or 5 vs. 7) generate wide confidence intervals and increase the risk of false-negative or non-significant findings even if a real effect exists.
With such small numbers of events, conclusions about the impact of radiation therapy or the predictive power of a biomarker become fragile and often unreliable.
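The width of the uncertainty can be seen directly by computing a confidence interval for an observed 5% event rate at different cohort sizes. This is a minimal sketch using the Wilson score interval; the cohort sizes are illustrative assumptions, and real DCIS studies would use time-to-event methods rather than a simple proportion.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(events, n, confidence=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = events / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Same observed 5% recurrence rate at three illustrative cohort sizes.
for n in (100, 1000, 3000):
    events = n * 5 // 100
    lower, upper = wilson_interval(events, n)
    print(f"{events:3d} events / {n:4d} patients: 95% CI {lower:.1%} to {upper:.1%}")
```

At 5 events in 100 patients the interval spans roughly 2% to 11%, so the same data are compatible with a recurrence risk well under or well over the 5% figure.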

MolDx’s Specific Criticism:

MolDx repeatedly points out that the published studies on DCISionRT and Oncotype DCIS were consistently underpowered for their intended purpose.
For example:

In the so-called low-risk groups, there were often only single-digit numbers of recurrences.
Some studies split these few events across treatment arms (e.g., 4 vs. 12 recurrences with or without RT), leading to non-significant p-values and wide 95% confidence intervals.
A study showing a 1% or 2% difference in recurrence might look reassuring to clinicians, but MolDx will not consider such findings statistically reliable without appropriate sample sizes.

MolDx specifically rejected claims that the tests could identify populations with no benefit from RT, pointing out that such conclusions were built on insufficient event counts and underpowered analyses.

Implications for Startups Developing Molecular Tests:

If you are developing a test to predict rare events (≤ 5% incidence), you must design studies accordingly:

Sample sizes must be large enough to accumulate a statistically meaningful number of events in each risk category and treatment arm.
Hundreds, not dozens, of events may be necessary. This typically requires cohorts numbering in the low thousands, not the low hundreds, especially if you intend to stratify further (by RT vs. no RT, age, endocrine therapy use, etc.).
If using existing retrospective datasets, ensure your sample size planning considers event rates, not just patient counts.
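As a rough planning aid, the standard normal-approximation formula for comparing two proportions shows how quickly the required cohort grows as event rates fall. This is a sketch under simplifying assumptions: recurrence is treated as a binary outcome, the arms are equal in size, the test is two-sided at alpha 0.05 with 80% power, and the recurrence rates are hypothetical. The function name `n_per_arm` is made up for this example.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_arm(p_control, p_treated, alpha=0.05, power=0.80):
    """Patients per arm needed to detect p_control vs. p_treated (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treated) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_control * (1 - p_control) + p_treated * (1 - p_treated))
    ) ** 2
    return ceil(numerator / (p_control - p_treated) ** 2)

# Hypothetical recurrence rates without RT vs. with RT.
for p_control, p_treated in ((0.10, 0.05), (0.05, 0.03)):
    n = n_per_arm(p_control, p_treated)
    expected_events = n * (p_control + p_treated)
    print(f"{p_control:.0%} vs {p_treated:.0%}: {n} per arm, "
          f"about {expected_events:.0f} events expected in total")
```

Under these assumptions, a 10% vs. 5% comparison needs a few hundred patients per arm, while a 5% vs. 3% comparison needs roughly 1,500 per arm, and that is before any further stratification by age or endocrine therapy.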

Summary for CEOs and CSOs:

When event rates are low, small-N studies are misleading. They create the illusion of precision without the statistical power to support regulatory or payer coverage decisions. MolDx expects mature companies to understand this. If you are developing a test targeting rare outcomes, plan for large, well-powered studies from the outset or risk an unfavorable coverage decision.

AI "Mind Map" (Google LM)

Friday, July 18, 2025
