In the last couple of months, Prasad and Makary at FDA have published two major position papers on FDA evidence standards for drug trials.
Meanwhile, a recent NEJM Evidence paper by Stewart et al. addresses evidence levels for imaging [diagnostics].
I asked ChatGPT to discuss any insights that emerge from comparing the three papers.
### AI Corner ### ChatGPT 5.2
Overview
Two recent NEJM papers by Prasad and Makary signal a philosophical shift at FDA: away from rigid “two-trial” frequentist dogma toward a more integrated, Bayesian-style evaluation of total evidence—mechanism, effect size, coherence, and postmarket data.
Their 2026 paper makes one strong pivotal trial the new default; their 2025 “plausible mechanism pathway” allows approval of highly targeted therapies when biology and clinical response align, even without traditional RCTs.
In contrast, Stewart et al. in NEJM Evidence 2024 caution that diagnostics—especially imaging—have historically been adopted on plausibility and accuracy without proof of improved patient outcomes, leading to overuse and harm.
Comparing the three highlights a central policy tension: how to avoid procedural rigidity that stifles innovation while preventing premature adoption based on weak surrogate logic.
The discussion also explores how payors view “evidence plus judgment,” and what this means for genomic risk tests where RCTs may be unethical. Readers gain insight into evolving FDA philosophy and its implications for diagnostics, reimbursement, and evidence standards.
### ChatGPT continues...
Prasad V, Makary MA. One Pivotal Trial, the New Default Option for FDA Approval — Ending the Two-Trial Dogma. N Engl J Med. 2026;394:815–817.
In this 2026 NEJM Sounding Board, Prasad and Makary announce a major shift in FDA policy: moving from a historical default of two adequate and well-controlled trials to a new default of one pivotal trial plus confirmatory evidence for marketing approval. They argue that the two-trial convention emerged in an era of weaker biologic understanding and that modern drug development—anchored in mechanistic science, biomarker concordance, Bayesian interpretation, and robust trial design—can establish credibility without duplicative studies. The authors emphasize that trial quality (appropriate controls, clinically meaningful endpoints, prespecification, blinding, statistical rigor, and real-world comparators) matters more than trial count. They contend that reducing duplicative trials will lower development costs and time to market, potentially mitigating drug prices, while allowing FDA reviewers to focus on the integrity of a single well-designed study. They stress that this is not deregulation: the FDA retains authority to require additional studies when mechanisms are uncertain, endpoints are labile or surrogate, or trial limitations warrant replication. The reform is framed as both pragmatic and innovation-promoting, anchored by expanded postmarket data collection.
Prasad V, Makary MA. FDA’s New Plausible Mechanism Pathway. N Engl J Med. 2025;393:2365–2367.
In this 2025 Sounding Board, Prasad and Makary outline a “plausible mechanism pathway” designed to enable approval of bespoke, mutation-specific therapies when randomized trials are infeasible. Using a neonatal case of CRISPR-based base editing for CPS1 deficiency as a paradigm, they define five pillars: (1) a clearly defined molecular abnormality with established causal linkage to disease; (2) a therapy directly targeting that abnormality; (3) a well-characterized natural history in untreated patients; (4) evidence of successful target engagement or editing; and (5) clinically meaningful improvement beyond regression to the mean. The pathway contemplates single-patient expanded-access cases evolving into platform approvals once several successful cases accumulate, followed by postmarketing commitments focused on real-world effectiveness and off-target safety. The authors present this framework as necessary for rare, genetically defined, often fatal conditions and potentially applicable to broader settings where strong biologic plausibility and concordant clinical response substitute for traditional large randomized trials. The tone is explicitly pro-innovation, arguing that existing regulatory structures are too burdensome for precision-era therapeutics.
Stewart C, Davenport MS, Miglioretti DL, Smith-Bindman R. Types of Evidence Needed to Assess the Clinical Value of Diagnostic Imaging. NEJM Evid. 2024;3(7).
Stewart and colleagues argue that diagnostic imaging is widely adopted based largely on diagnostic accuracy studies and expert consensus rather than randomized trials demonstrating improved patient outcomes. They document overuse of CT, PET, and MRI; highlight harms including radiation exposure, incidental findings, overdiagnosis, anxiety, unnecessary procedures, and cost cascades; and critique the reliance on Delphi-derived appropriateness criteria that lack strong outcome data. The authors call for a higher evidentiary bar: pragmatic randomized trials for new imaging technologies, large cohort and registry-based studies to quantify true disease risk associated with imaging findings (especially incidentalomas), and regulatory reform of the 510(k) pathway, which permits device clearance based on “substantial equivalence” without robust clinical outcome data. They advocate for acceptable miss thresholds, improved reporting language, multispecialty consensus, and financial reforms to reduce low-value imaging. The paper is explicitly cautious about premature adoption of technologies absent demonstrated patient benefit.
##
##
Short Comparison: Prasad 2025 vs. Prasad 2026
The 2025 “plausible mechanism pathway” focuses on ultra-precise, biologically deterministic therapies—often for rare monogenic diseases—where mechanistic clarity and dramatic individual response can substitute for randomized trials. The 2026 “one pivotal trial” proposal applies more broadly across therapeutics, reframing replication as optional when a single rigorous trial plus confirmatory evidence suffices. Both papers de-emphasize rigid proceduralism (e.g., two trials or conventional RCTs) in favor of biologic coherence, effect magnitude, and integrated evidence. The 2025 paper is more radical in tolerating n-of-1–type evidence anchored in molecular specificity; the 2026 paper preserves randomized trials but reduces their number.
Collective Insights: The Two Prasad Papers vs. Stewart
Viewed together, the two Prasad essays advance a philosophy of regulatory flexibility grounded in mechanistic science, Bayesian reasoning, and postmarket surveillance—arguing that credibility derives from coherence across biologic plausibility, clinical response, and confirmatory signals rather than trial multiplicity. Stewart, in contrast, emphasizes the historical overreach of diagnostic technologies adopted on the basis of plausibility and accuracy without outcome validation. Where Prasad foregrounds innovation and adaptive approval pathways, Stewart foregrounds overuse, overdiagnosis, and the downstream harms of insufficient evidence. The tension is instructive: Prasad’s framework assumes that mechanistic precision and strong effect sizes reduce uncertainty; Stewart warns that diagnostic domains repeatedly overestimated benefit when plausibility and surrogate logic were mistaken for outcome improvement. Together, the papers highlight a central policy challenge: how to calibrate evidentiary standards so that transformative therapies are not stifled, yet technologies—especially diagnostics—are not normalized without proof that patients live longer or better.
##
##
Do the two Prasad NEJM papers give insights into the Stewart diagnostics evidence paper?
##
1. Do the two Prasad FDA papers reflect a pivot away from strict frequentist dogma?
Broadly, yes.
In the 2026 “one pivotal trial” essay, Prasad and Makary explicitly argue that credibility is not a function of trial count but of total evidence: effect size, biologic plausibility, control quality, endpoint choice, alignment with biomarkers, and statistical coherence. They note that many of these elements are captured in a Bayesian interpretation. The move away from the “two-trial dogma” is, in spirit, a move away from purely frequentist replication logic (“be lucky twice”) toward integrated evidentiary judgment.
In the 2025 “plausible mechanism pathway” paper, the shift is even clearer. The framework allows approval in settings where:
- the molecular abnormality is known,
- the therapy directly corrects it,
- natural history is predictable,
- target engagement is demonstrated,
- and clinical course improves beyond regression to the mean.
That is not a frequentist paradigm dependent on large randomized sampling distributions. It is a Bayesian-like coherence model: prior biologic certainty + observed concordant response → sufficient posterior credibility.
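Spelled out, the coherence model is essentially Bayes' rule (a sketch of the logic; the papers do not formalize it this way):

```latex
P(\mathrm{effective} \mid \mathrm{response}) =
\frac{P(\mathrm{response} \mid \mathrm{effective})\, P(\mathrm{effective})}
     {P(\mathrm{response})}
```

A strong mechanistic prior, combined with an observed response that would be very unlikely absent a true effect (a high likelihood ratio), can yield high posterior credibility even from a handful of patients.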
Neither paper rejects randomized trials. But both reduce reliance on rigid frequentist replication and elevate biologic plausibility and cumulative evidence.
2. How would such a shift affect evidence in diagnostics?
Diagnostics already live in a space where:
- Randomizing to “withhold obviously indicated therapy” can be unethical.
- Randomizing to “give clearly futile or toxic therapy” is unethical.
- Many tests stratify risk rather than directly treat disease.
Strict frequentist thinking can produce a rhetorical trap:
“Show me a randomized trial proving outcome benefit, or the test is unproven.”
But in many diagnostic settings:
- If a genomic assay shows an Oncotype score of 5, the prior probability that chemotherapy provides benefit is extremely low.
- If the score is 35, the prior probability of benefit is high.
- The assay is not the intervention; it refines the posterior risk estimate guiding therapy.
A Bayesian-aligned framework would recognize that:
- Risk prediction + established treatment effect curves can generate strong clinical inferences.
- Randomizing patients at extreme risk strata may violate equipoise.
- The relevant question may be calibration and discrimination, not direct randomization.
This is closer to how oncology actually functions: treatment effects are conditional on baseline risk, and diagnostic tools refine that baseline risk.
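A toy sketch of this conditional-benefit logic (all numbers are hypothetical and illustrative, not drawn from any trial): if a therapy's relative risk reduction is assumed roughly constant across strata, then a test that refines baseline risk directly rescales the expected absolute benefit, without the test itself needing an outcome RCT.

```python
def absolute_benefit(baseline_risk: float, relative_risk_reduction: float) -> float:
    """Expected absolute risk reduction when a treatment's relative effect
    is assumed constant across baseline-risk strata (a common, contestable
    simplification)."""
    return baseline_risk * relative_risk_reduction

# Hypothetical numbers only: a test stratifies 10-year recurrence risk,
# and the therapy is assumed to cut recurrence by 30% in relative terms.
RRR = 0.30
low_score = absolute_benefit(0.05, RRR)   # low-score stratum: 5% baseline risk
high_score = absolute_benefit(0.40, RRR)  # high-score stratum: 40% baseline risk

print(f"low-score stratum:  {low_score:.3f} absolute benefit")   # 0.015
print(f"high-score stratum: {high_score:.3f} absolute benefit")  # 0.120
```

The inference hinges on two assumptions, and this is exactly where Stewart's caution bites: the constant relative effect must actually hold across strata, and the test's risk estimates must be well calibrated.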
3. Do the Prasad papers offer insight for this diagnostic dilemma?
Yes—indirectly but meaningfully.
Both essays emphasize:
- Mechanistic coherence (biologic plausibility matters).
- Magnitude of effect (large, consistent signals may not require duplication).
- Contextualized inference (natural history + concordant response can substitute for replication).
- Quality over quantity (one rigorous study may outweigh two mediocre ones).
Applied to diagnostics:
- A genomic classifier validated across large cohorts with strong calibration, biologic rationale, and reproducibility may produce compelling evidence even without an RCT in every subgroup.
- If extreme risk strata create ethical barriers to randomization, the absence of an RCT is not necessarily evidence of absence.
- The relevant evidentiary question becomes: does the test improve decision quality and align therapy with expected benefit?
That said, Stewart’s imaging paper reminds us of a counterweight: plausibility and accuracy alone have historically led to overuse and harm when outcome impact was assumed rather than demonstrated.
So the synthesis is important:
| Prasad view | Stewart view |
|---|---|
| Avoid rigid procedural dogma | Avoid premature adoption |
| Mechanism + coherence can justify approval | Diagnostic accuracy ≠ outcome benefit |
| One strong study may suffice | Demand outcome-relevant evidence |
The insight for genomics and risk stratification tools (like Oncotype) is this:
- When a test changes treatment assignment in a way that is already supported by RCT data conditional on risk, Bayesian integration is reasonable.
- When a test introduces new interventions or expands use beyond known effect curves, outcome evidence becomes more important.
4. Bottom Line
The two Prasad papers reflect a philosophical shift from strict frequentist replication toward integrated, Bayesian-like evidentiary reasoning grounded in mechanism and coherence.
For diagnostics—especially genomic risk stratifiers where extreme results eliminate equipoise—this thinking supports the legitimacy of high-quality cohort validation, mechanistic alignment, and effect-modification logic in lieu of unethical RCTs.
But the Stewart caution remains: Bayesian flexibility should not become a license for weak surrogate reasoning. The key is disciplined integration of biologic plausibility, effect size, calibration, and patient-centered outcomes—not blind adherence either to p-values or to plausibility alone.
##
##
I've heard payor medical directors say there are no evidence decisions without "judgment."
It seems that frequentist and RCT demands try to avoid judgment, whereas Bayesian and other contextual approaches elevate the role of judgment.
##
1. Frequentism as an Attempt to Minimize Judgment
Classical frequentist RCT culture is often framed as a way to reduce subjectivity in decision-making. By pre-specifying endpoints, locking statistical analysis plans, using fixed alpha thresholds (e.g., p < 0.05), and sometimes requiring replication in two independent trials, the system aims to constrain discretion. In regulatory and payer settings, this structure serves an institutional purpose: it protects against arbitrariness, favoritism, and post hoc reinterpretation. However, frequentism does not eliminate judgment; it relocates it. Decisions about alpha levels, endpoint selection, inclusion criteria, control arms, subgroup definitions, and replication standards all require normative choices. Frequentist methods create procedural guardrails, but they do not remove human interpretation from the evidentiary process.
2. Bayesian Thinking Makes Judgment Explicit
Bayesian and contextual approaches acknowledge openly what frequentism attempts to bracket off: prior beliefs and background knowledge always influence inference. Biological plausibility, natural history, magnitude of effect, coherence across studies, and mechanistic alignment all shape how new evidence should update our beliefs. Rather than asking whether data cross an arbitrary threshold, Bayesian reasoning asks how much the new evidence should revise prior expectations. This approach elevates judgment—but in a structured way. It does not eliminate rigor; instead, it integrates statistical evidence with domain knowledge. The key difference is philosophical: frequentism attempts to suppress priors, while Bayesian reasoning formalizes them.
3. The Prasad Shift Is About Where Judgment Lives
In both FDA essays, Prasad and Makary move away from rigid procedural markers—such as the “two-trial dogma”—and toward a broader evaluation of evidentiary coherence. The emphasis shifts from counting trials to assessing credibility: quality of controls, strength of effect size, biological mechanism, alignment with biomarkers, natural history, and overall inferential integrity. This is not an abandonment of statistics but a rejection of ritualistic replication as a proxy for truth. Judgment is not removed; it is repositioned at the level of total evidence rather than mechanical rule adherence. The shift reflects a belief that scientific maturity allows for more integrated assessment.
4. Why Payors Say This
Payor medical directors operate in a fundamentally different decision environment than regulators. The FDA asks whether safety and efficacy are adequately demonstrated; payors ask whether a service is worth covering at a given price, for a given population, under real-world constraints. Even when an RCT is positive, payors must consider representativeness, comparator appropriateness, budget impact, downstream utilization, expansion risk, and opportunity cost. No statistical paradigm can fully resolve those questions. Coverage determinations inherently involve normative judgments about value, risk tolerance, and system sustainability. That is why payors often say that there are no evidence decisions without judgment—because statistical significance does not equate to coverage appropriateness.
5. The Diagnostic Example (Oncotype-Type Scenario)
In genomic risk stratification, ethical limits constrain trial design. It would be unethical to randomize a patient with a very low recurrence score to toxic chemotherapy, just as it would be unethical to withhold treatment from someone with a very high score when existing evidence strongly predicts benefit. In such settings, strict demands for randomized evidence in every subgroup misunderstand how clinical inference works. If prior RCTs demonstrate that chemotherapy benefit scales with recurrence risk, and a genomic assay refines that risk estimate, then clinical decisions can be informed through posterior probability reasoning. The absence of randomization at extreme strata does not necessarily imply evidentiary weakness. Instead, inference depends on calibration, validation, biological plausibility, and alignment with known treatment-response gradients.
6. The Deep Tension
Frequentist frameworks aim to protect against bias through standardized rules, but they can become insensitive to context or mechanistic insight. Bayesian or contextual frameworks embrace integration and flexibility, but they risk overconfidence, enthusiasm bias, or conflicts of interest if poorly governed. The tension is therefore not between “objective” and “subjective” reasoning, but between different methods of structuring uncertainty. Institutional actors gravitate toward procedural safeguards because they are defensible. Yet clinical reality often demands contextual reasoning. The challenge is not to eliminate judgment but to discipline it.
7. Implications for Diagnostics Policy
Diagnostics frequently operate in a domain where randomized trials are impractical, unethical, or insufficient to answer the most relevant questions. Tests refine risk, influence treatment allocation, and shape decision thresholds rather than directly produce outcomes. In such cases, demanding traditional outcome RCTs in every setting may be infeasible. However, Stewart’s caution remains essential: diagnostic accuracy alone does not guarantee improved patient outcomes, and plausibility can mislead. A mature evidentiary framework for diagnostics may rely on robust cohort validation, calibration metrics, mechanistic coherence, decision-analytic modeling, and real-world performance data—while maintaining vigilance against overuse and harm. The goal is not to abandon rigor, but to adapt it.
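One evidence stream named above, calibration, can be checked without randomization: within each risk stratum of a validation cohort, mean predicted risk should approximate the observed event rate. A minimal sketch with synthetic, illustrative data (the cohort, cutoff, and function names are invented for this example):

```python
from statistics import mean

# Synthetic illustration only: (predicted_risk, observed_event) pairs,
# as might come from a validation cohort for a risk-stratifying test.
cohort = [
    (0.05, 0), (0.06, 0), (0.04, 0), (0.05, 0),   # low-risk stratum
    (0.35, 0), (0.40, 1), (0.42, 1), (0.38, 0),   # high-risk stratum
]

def calibration_by_stratum(data, cutoff=0.20):
    """Crude calibration check: for each stratum, return
    (mean predicted risk, observed event rate)."""
    report = {}
    for name, stratum in (("low", [d for d in data if d[0] < cutoff]),
                          ("high", [d for d in data if d[0] >= cutoff])):
        preds = [p for p, _ in stratum]
        events = [e for _, e in stratum]
        report[name] = (mean(preds), mean(events))
    return report

for name, (pred, obs) in calibration_by_stratum(cohort).items():
    print(f"{name}-risk stratum: predicted {pred:.2f} vs observed {obs:.2f}")
```

Real calibration assessment would use larger cohorts, finer strata, and formal metrics, but the point stands: this kind of validation, not randomization, is often the relevant evidentiary test for a risk-stratifying diagnostic.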
8. The Mature View
The most defensible position likely combines elements of both paradigms. Frequentist discipline ensures transparency and methodological integrity in study conduct. Bayesian integration allows coherent interpretation across multiple streams of evidence. Judgment remains unavoidable, particularly in regulatory and coverage decisions, but it can be structured, explicit, and accountable. Attempting to remove judgment entirely is unrealistic; relying on unstructured judgment is dangerous. The future of evidence policy lies in acknowledging that statistical methods guide decisions, but human judgment ultimately resolves them.
[End ChatGPT]