Tuesday, November 25, 2025

CMS Issues "Request for Information" - Strategic Directions for Medicare Advantage

CMS has issued its CY2027 proposed rule for Medicare Advantage.  It includes a "request for information" about future strategic directions for the program.

###

CMS issues four major Medicare rules each year.  In the spring, we have the Inpatient Rule, which finalizes in August, ahead of the October fiscal year.  In the summer, we have the Physician and the Hospital Outpatient rules, which publish November 1, ahead of the new calendar year.

And around November, we get the Medicare Advantage proposals, which finalize in the spring, ahead of the next MA contract year.

Find the MA press release here:
https://www.cms.gov/newsroom/press-releases/cms-proposes-new-policies-strengthen-quality-access-competition-medicare-advantage-part-d

Find the fact sheet here:
https://www.cms.gov/newsroom/fact-sheets/contract-year-2027-medicare-advantage-part-d-proposed-rule

Find the actual proposed rule here (paginated publication on 11/28):
https://www.federalregister.gov/public-inspection/2025-21456/medicare-program-contract-year-2027-policy-and-technical-changes-to-the-medicare-advantage-program

###

The word "coverage" occurs 726 times, but I don't see the words LCD or NCD this year.  "Prior authorization" appears 17 times, "denial" three times, and "artificial intelligence" twice.
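For readers who want to reproduce these tallies, here is a minimal Python sketch of the method: count case-insensitive occurrences of each term in the rule's text. The sample string below is illustrative; for the real counts, pass in the full text of the proposed rule (e.g., saved locally from the Federal Register page), and note that counts will vary slightly with the exact text version used.

```python
import re

def term_counts(text, terms):
    """Count case-insensitive occurrences of each term in the text."""
    lowered = text.lower()
    return {t: len(re.findall(re.escape(t.lower()), lowered)) for t in terms}

# Demo on a small sample; substitute the full rule text for the real tally.
sample = "Coverage rules require prior authorization; a denial of coverage..."
print(term_counts(sample, ["coverage", "prior authorization", "denial"]))
# → {'coverage': 2, 'prior authorization': 1, 'denial': 1}
```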

The request for information on "Future Directions in Medicare Advantage" starts on inspection copy page 6-11.  

Comments are due by January 26, 2026.

###

AI CORNER

###

CMS’s RFI on Future Directions in Medicare Advantage (MA) is framed as a broad re-think of the program’s payment and benefit architecture, with CMMI explicitly positioned as the vehicle for testing ideas that would require statutory waivers. It mixes some narrow technical questions with several genuinely foundational ones about risk adjustment, quality bonuses, benefits, and data infrastructure.


1. Overall purpose and levers for change

CMS emphasizes that MA now covers over half of Medicare beneficiaries and that the current architecture (payment, risk adjustment, quality bonuses, and benefit design) may need modernization to support competition, equity, and value, while limiting gaming and inappropriate spending.

They highlight two distinct channels for change:

  • Conventional rulemaking – e.g., annual rate announcements, refinements to the HCC model, and regulatory changes that are “national in scale.”

  • CMMI models under §1115A – explicitly called out as the vehicle for testing more ambitious payment and benefit designs, especially those that need statutory flexibilities or waivers (for example, departures from current bidding rules or benefit rules that cannot be done in ordinary rulemaking). The Value-Based Insurance Design (VBID) model is cited as the MA-specific precedent.

There’s a clear tension in the text: CMS wants to address long-standing critiques (coding intensity, opaque quality bonuses, low-value supplemental benefits) but signals that many of the more disruptive ideas will likely be piloted first via CMMI rather than imposed system-wide.


2. Modernizing risk adjustment

Risk adjustment is treated as a central lever for MA’s future, not just a technical side issue. CMS underscores that:

  • RA is integral to plan payment, competition, and incentives for enrollee selection, care management, and coding behavior.

  • Current diagnosis-based HCC models, layered on demographic factors, have produced coding-related overpayments, differential impacts across plan types, and heavy administrative overhead.

The RFI requests comment on several structural questions, many of which could lend themselves to CMMI testing:

  • Re-balancing goals: How to simultaneously promote competition, maintain a level playing field (especially for smaller or less-resourced MA plans), reduce administrative burden, and ensure accurate payments—especially for high-need and underserved populations.

  • Alternative model designs:

    • Whether to move toward models that rely less heavily on diagnosis coding and more on other predictors of risk (utilization, pharmacy, functional status, social risk, etc.).

    • Whether to exclude or de-emphasize diagnoses from certain sources (e.g., “chart reviews,” home assessments) or require corroboration via follow-up encounters.

    • Potential “next-generation” models, including those that could use machine learning/AI as the prediction engine instead of standard linear models, with a request for input on safeguards, transparency, and fairness if AI is used.

  • Temporal rules for diagnoses: Over what time windows diagnoses should count; how to treat persistent but stable conditions; and how to treat conditions that have resolved but remain in the record.

  • Guardrails against gaming and upcoding: CMS openly invites comment on mechanisms to dampen coding incentives while preserving incentives for genuine care management and early detection.

Analytically, this section signals a willingness to consider fundamental re-engineering of RA, not just coefficient tweaks—especially if CMMI models can demonstrate alternative approaches that reduce coding games without destabilizing the program.
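To make the mechanics concrete, here is a toy sketch of how an HCC-style additive risk score works and where one of the RFI's guardrail ideas (de-emphasizing diagnoses from certain sources, such as unsupported chart reviews) would plug in. All coefficients, category names, and the source-exclusion parameter are invented for illustration; this is not CMS's actual model.

```python
# Illustrative only: an HCC-style score is a demographic base factor plus
# additive coefficients for each qualifying diagnosis group. Values invented.
DEMOGRAPHIC_FACTORS = {("F", "70-74"): 0.40, ("M", "70-74"): 0.44}
HCC_COEFFICIENTS = {"diabetes_w_complications": 0.30, "chf": 0.33, "copd": 0.34}

def risk_score(sex, age_band, diagnoses, excluded_sources=()):
    """Additive risk score; optionally drop diagnoses from certain submission
    sources, one of the guardrails the RFI floats against coding intensity."""
    score = DEMOGRAPHIC_FACTORS[(sex, age_band)]
    for hcc, source in diagnoses:
        if source in excluded_sources:
            continue  # e.g., ignore chart-review-only diagnoses
        score += HCC_COEFFICIENTS[hcc]
    return round(score, 2)

dx = [("chf", "encounter"), ("copd", "chart_review")]
print(risk_score("F", "70-74", dx))                                    # 1.07
print(risk_score("F", "70-74", dx, excluded_sources={"chart_review"})) # 0.73
```

The "next-generation" alternatives the RFI raises would replace this hand-set additive sum with a learned predictor drawing on utilization, pharmacy, and functional-status features, which is precisely why the RFI pairs that idea with questions about transparency and fairness safeguards.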


3. Re-designing quality bonus payments (QBP)

The QBP section is less about adding new Stars measures and more about questioning whether the entire bonus framework is fit for purpose.

Key strategic themes:

  • Lag and misalignment – CMS notes the multi-year lag between measurement periods and when QBP affects plan payment and bids, creating a disconnect between current quality performance and current financial incentives.

  • Concerns about gaming and opacity – Building on MedPAC and other critiques, CMS acknowledges that QBP may have become a blunt, expensive instrument with unclear marginal benefit for quality.

  • Role for CMMI – CMS explicitly contemplates a CMMI model to test alternative QBP structures that could:

    • Tighten the link between recent performance and payment.

    • Better target high-value improvements (e.g., outcomes, equity, patient experience).

    • Reduce opportunities for gaming via low-value process measures.

Although Star Ratings are referenced, the emphasis is squarely on the structure and timing of quality bonuses, not on the mechanics of individual measures.


4. Well-being and nutrition: reframing benefits

The RFI devotes a full section to “Well-Being and Nutrition,” which goes beyond narrow SDOH pilots and treats well-being as a core organizing concept for future MA benefit policy.

CMS:

  • Defines well-being broadly (emotional health, social connection, meaning/purpose, fulfillment) and explicitly links it to person-centered care, prevention, and long-term health outcomes.

  • Invites comment on tools, measures, and constructs that could rigorously capture well-being (including complementary/integrative health, self-management, and resilience).

  • Poses targeted questions around nutrition-related benefits, including medically tailored meals, healthy food stipends, and other nutrition supports. They ask how MA plans should design, target, and evaluate such benefits, especially in populations with chronic disease.

This is one of the places where a short phrase (“well-being and nutrition”) masks a foundational design question: whether MA benefit policy should formally prioritize long-horizon, preventive and holistic interventions, and how those benefits should be standardized, valued, and evaluated within the payment architecture.


5. Data, reporting, and benefit-use information

Outside Section VIII proper, CMS’s Supplemental RFI asks for comment on simplifying or reshaping data reporting while preserving oversight, with a list that includes:

  • Network adequacy reporting.

  • Medical loss ratio (MLR) reporting.

  • “Benefit, including supplemental benefit, usage and utilization data reporting.”

  • SNP model of care reporting.

Although it is only one bullet, the request for input on benefit and supplemental benefit utilization data has deep implications. CMS is effectively signaling interest in a more systematic, possibly standardized data layer on what benefits are offered, who uses them, and with what outcomes. That information would be critical to:

  • Evaluating the real-world value of supplemental and “flex” benefits (e.g., meals, transportation, social needs benefits).

  • Designing future RA and QBP reforms that differentially reward plans for high-value benefit structures rather than for simply offering long menus of rarely used benefits.

  • Supporting CMMI experiments that tie payment to demonstrated benefit uptake and impact.

Here the tension is explicit: CMS wants richer, more actionable data on benefits but also asks for ideas to simplify and align reporting mechanisms, including automation and alignment with existing data flows. They also invite comment on which data elements are genuinely burdensome versus essential, signaling openness to pruning low-value reporting while building up high-value benefit-use data.


6. Marketing oversight, TPMOs, and AI decision tools

The RFI also includes a cluster of questions about marketing, agents/brokers, and third-party marketing organizations (TPMOs), again ranging from small technical edits to broader structural issues.

CMS seeks input on:

  • Refining the TPMO definition and segmentation (size, scope, role) to better target oversight.

  • Adjusting translation thresholds, testimonial rules, use of the Medicare card image, and outbound enrollment verification requirements.

  • How to hold “bad actors” accountable while not overly burdening compliant entities.

  • How to use data and technology—including AI—to monitor markets and power decision-support tools for beneficiaries and caregivers.

This is another area where a handful of technical RFIs hint at a larger shift: moving toward data-driven, technology-enabled oversight and decision support, with AI-enabled tools both as an opportunity (better plan choice, fraud detection) and a risk (bias, opaque steering) that CMS expects to regulate more actively.


7. Cross-cutting policy tensions

Across these sections, several strategic tensions recur:

  • Innovation vs. statutory constraint: CMS repeatedly points to CMMI and its waiver authority as the way to test models that cannot be implemented through ordinary rulemaking, acknowledging limits of the current statute.

  • Competition vs. equity/level playing field: CMS wants a more competitive MA market but is sensitive to the ways RA, QBP, and marketing practices can disadvantage smaller or safety-net-oriented plans.

  • Program integrity vs. administrative burden: From RA documentation rules to benefit and network reporting, CMS is asking whether it can rebalance what data are collected and how, to focus on high-value oversight while shedding low-yield requirements.

  • Short-term vs. long-term value: The focus on well-being, nutrition, and supplemental benefit data indicates an interest in shifting the MA paradigm toward long-horizon health investments, but CMS is clearly unsure how to operationalize this within annual bidding and payment cycles.


In short, while the RFI contains its share of small-bore questions, it is best read as an open invitation to debate the next generation architecture of MA—how the program pays plans (risk adjustment, bonuses), what it expects them to offer (benefit and well-being design), and what data and models (including CMMI pilots with waiver authority) will be needed to steer MA toward higher value and more equitable outcomes.




Big News: FDA to Down-Classify Many Companion Diagnostics as Class II (510k)

Last year, when FDA was sparring with stakeholders and courts over its LDT regulations, FDA promised to downclassify many types of diagnostics from Class III to Class II.   They went radio-silent from April to November, but now, the regulation is in print.

This is big news because it changes the landscape of how hard it is to get an FDA label as a companion diagnostic.  It also means that new ranges of tests will qualify for Medicare benefits.  NCD 90.2, for NGS testing in cancer, automatically covers NGS tests that are "cleared or approved" as CDx.   And sole-source tests (run from one lab) are eligible for ADLT pricing rules if they are "cleared or approved."   Now the range of "cleared" tests will be larger.

See an early essay on LinkedIn by Karin Hughes, PhD, here.

See the Fed Reg regulation proposal here.  Comments are due in 60 days, by January 26, 2026.

The rule runs 14 pages and covers many considerations and details.  The regulation for the 510(k) (Class II) CDx will be at 21 CFR 866.6075, as “Nucleic Acid-Based Test Systems for Use with a Corresponding Approved Oncology Therapeutic Product.”  Within the 14-page publication, the actual regulation at 21 CFR will be about 680 words long (two full columns of the Federal Register).


###

AI CORNER

###

FDA Down-Classifies 
Key Oncology Companion Diagnostics:
A Policy-Level Summary

In a significant regulatory shift, FDA has proposed reclassifying a cluster of oncology companion diagnostics (CDx) and CDx-adjacent molecular tests from Class III (PMA) to Class II with special controls. The 14-page notice marks FDA’s first broad structural change to CDx oversight in more than a decade and reflects the agency’s conclusion that these technologies are now mature, well-characterized, and manageable within the 510(k) framework.

Rationale for Reclassification

FDA’s justification rests on two main pillars:

1. A Large, Stable PMA Evidence Base

FDA points to 17 prior PMA approvals and a panel-track supplement covering tests such as NGS panels, PCR hotspot assays, MSI/LOH signatures, and gene-specific mutation assays. Across these submissions, the agency notes:

  • Highly consistent technology (PCR and NGS platforms with known performance boundaries).

  • Longstanding and predictable risk modes.

  • No unique or recurring post-market safety events.

  • Extensive analytic validation data accumulated over more than a decade.

In effect, FDA signals that this body of PMA experience now functions as the predicate scientific foundation for safe 510(k) benchmarking.

2. Technological Maturity and Standardized Methods

The agency emphasizes that these oncology molecular devices no longer present the novelty or uncertainty that once justified PMA status. Their design, specimen types, analytic characteristics, and error profiles are sufficiently well understood that FDA can now define the controls needed to mitigate risks without PMA-level scrutiny.

The agency also implicitly acknowledges that analytic performance expectations for CDx have converged across platforms, making differentiated regulatory treatment (PMA vs. 510(k)) less scientifically justified than it once appeared.

Special Controls Framework

FDA proposes a structured set of special controls that anchor the new Class II classification. These include:

  • Analytic validity requirements: Defined performance characteristics for accuracy, precision, LoD, reportable range, reference materials, and quality system parameters.

  • Clinical validity expectations: A requirement that sponsors demonstrate the test’s ability to identify the biomarker for which the associated therapeutic product is indicated.

  • Design and labeling requirements: Clear statements on intended use, limitations, specimen types, cutoff justification, and the relationship between biomarker detection and associated therapies.

  • Quality system-oriented controls: Ensuring calibration, reagent stability, and software verification meet standards previously demonstrated in PMAs.

  • Post-market controls where appropriate, although FDA explicitly states that historical PMA experience shows minimal adverse trends.

Importantly, these special controls are structured such that substantial equivalence to a well-characterized archetype is possible, eliminating the need for clinical trials for each new test if analytic comparability is demonstrated.

FDA’s Regulatory Method: A Retrospective Meta-Analysis of PMAs

The agency’s method is unusually transparent: it explicitly “looks back” across the entire set of oncology CDx PMAs, extracts shared risk profiles, and evaluates the sufficiency of accumulated scientific knowledge. This retrospective analysis provides the risk-benefit justification required under the statute’s reclassification provisions (including the six-year rule for PMA vintage).

In short, FDA is codifying what many regulators have long concluded informally—that these tests behave like a well-bounded technological family, and that PMA is no longer proportionate to risk.

Summary of the Actual Regulatory Text (Final Pages)

In the final codified section, FDA formally:

  1. Amends the classification regulations to place these oncology CDx molecular assays into Class II.

  2. Defines the device type as tests intended to detect specified genetic alterations for the purpose of selecting patients for targeted therapies.

  3. Lists the special controls, which function as binding performance, design, labeling, and verification requirements necessary for reasonable assurance of safety and effectiveness.

  4. Establishes that future devices in this category will follow the 510(k) pathway, provided they meet the special controls and demonstrate substantial equivalence.

This codified amendment is concise—essentially a legal wrapper around the analytic and risk-based reasoning laid out in the earlier narrative. But its practical impact is substantial: it opens a 510(k) pathway for a major class of oncology molecular diagnostics that, until now, were locked into PMA.

####

####

With this change, how does this future FDA rule and method for CDx compare to the current recently revised European union rule and method for CDx? Would you say this change brings the two regimes into closer alignment, or would you say the alignment WAS close, and now the FDA is pulling away? Asking because of the harmonization thing.

####

Here is a more concise, paragraph-style summary that preserves the key policy and practical points.


Concise Practical Comparison: FDA’s New CDx Pathway vs. IVDR

Under the EU IVDR, all companion diagnostics are locked into Class C, requiring a full technical file, Notified Body review, and a mandatory EMA (or national authority) consultation on the drug–test linkage. In practice, this produces long, multi-step reviews often running 12–24 months, driven partly by Notified Body capacity constraints. Costs are substantial: NB technical-file reviews, EMA consultation fees, and heavy internal preparation. Change control also remains strict; many modifications trigger new formal reviews. IVDR has therefore become a high-friction, high-burden system for CDx, even when the underlying assay technology is mature.

Historically, the FDA PMA pathway for oncology CDx was similarly burdensome: sponsors routinely submitted massive analytic and clinical datasets, endured multi-year PMA cycles, and managed ongoing PMA supplement obligations. The process could be exacting and expensive, with user fees in the hundreds of thousands of dollars and multi-year internal program costs.

The new FDA proposal fundamentally changes this equation for nucleic-acid based oncology CDx. After reviewing 17 PMAs and a decade of experience, FDA concludes that these technologies are now well understood, with risks fully characterizable through special controls. As a result, these CDx will move from Class III (PMA) to Class II (510(k)). The required evidence—robust analytical validation and a clear demonstration of clinical relevance—remains rigorous, but the submission is now structured against a defined regulatory template with narrower debates than PMA. User fees fall dramatically, and review timelines align with the 90-day 510(k) process, even if oncology CDx will receive close scrutiny.

Practically, this means that for follow-on EGFR, BRAF, MSI, LOH, or similar oncology NAAT/NGS CDx, the U.S. will become materially faster, cheaper, and more predictable than Europe. The scientific expectations of FDA and IVDR remain broadly aligned—both still require strong analytical and clinical performance, drug-test linkage, and clear labeling—but the operational burden diverges sharply. IVDR continues to function as a high-cost, multi-agency process; FDA is reclassifying mature CDx technology into a streamlined, mid-tier regulatory track.

The net effect is that scientific harmonization persists, but regulatory workload is no longer harmonized. FDA’s move aligns the evidence standards but pulls away from Europe in efficiency, creating a lighter U.S. pathway for many oncology CDx than the IVDR can presently offer.

###

Sections (iii) and (iv) at the end of the rule seem to require any CLAIMS FOR THE DIAGNOSTIC LABEL to match SIMILAR CLAIMS ON THE DRUG LABEL.   Let's say I investigate GENE A, which the biopharma sponsor declined to study.  I find 20% of patients have GENE A and they have 0% chance of drug response.  I go to FDA with my data, my claim, and my check for 510(k) review.  But FDA won't pass my test unless the FDA drug label also discusses GENE A and its link to drug failure.  The biopharma is uninterested in this labeling update, saying it sees some problem with the data.

###

Here is a clear, expert-level explanation of the regulatory tension you’ve identified, and why it is not just theoretical: it is built into the structure of the new rule.


FDA’s New Rule Creates a Required Link Between Test Claims and Drug Labeling — and That Creates a Real-World Stalemate Scenario

The FDA special controls for the new Class II oncology test category (sections (iii) and (iv)) impose a strict alignment requirement:

  • Any clinical claim made by the test about predicted drug benefit, lack of benefit, or risk must map directly to language already present in the FDA-approved drug label.

  • The test manufacturer may not imply new drug-response information that is not already articulated in the drug labeling.

  • The test’s labeling cannot exceed, reinterpret, or contradict the drug’s own efficacy or risk information.

This means, functionally:

The test’s claims about drug response must “foot to” the drug’s label; the drug’s label becomes the limiting reagent for the test’s claims.

This is exactly the issue you are raising in the hypothetical scenario.


The Hypothetical You Gave: Why It Raises a Regulatory Collision

Scenario

You develop a test for GENE A. You have strong evidence:

  • GENE A → 0% response to Drug ABC

  • Non-GENE A → 50% response

You apply for 510(k) clearance for your GENE A test as a drug-benefit predictor (not essential, but providing clinically meaningful information about drug benefit).

Problem

The Drug ABC label does not discuss GENE A at all. Drug ABC’s sponsor (Big Pharma) has no interest in introducing such a biomarker, because it narrows the eligible market. The label is silent.

Under the new FDA rule

Your test claims cannot exceed the information in the drug label. Since the drug label does not mention GENE A, FDA will not allow you to make any drug-benefit claims—or even cautious “strongly associated” language—about Drug ABC in your test labeling.

In other words:

You cannot get labeling-level recognition of your GENE A → no benefit finding unless the drug manufacturer cooperates.

And the drug manufacturer has no incentive.


Will this create face-offs or stalemates?

In practical terms: yes, and in some cases it will block clearance.

This is the same structural bottleneck that has existed for true companion diagnostics for 15 years, but now it is extended to the newly created “non-essential but drug-informative” tests.

Why it’s a stalemate:

  1. The test manufacturer needs the drug label to say something—anything—about the biomarker.
    Without that, FDA cannot allow drug-response language in the test’s labeling.

  2. The drug manufacturer may strongly prefer silence.
    Adding biomarker-negative non-responders to the drug label shrinks the indicated population, invites payer restrictions, and may trigger postmarket regulatory requirements (analytic validity, postmarketing studies, REMS-like scrutiny).

  3. FDA cannot compel the drug sponsor to update labeling based on an IVD sponsor’s evidence—unless FDA reopens the drug application, which drug sponsors fiercely resist.

  4. Therefore, the IVD sponsor is stuck:
    The test is scientifically actionable but regulatorily ineligible to make the clinically useful claim.

This has happened before under PMA, and now becomes relevant for Class II / 510(k) tests as well.


The deeper regulatory logic: FDA is avoiding the “shadow CDx” problem

FDA’s concern is that a test could effectively redefine a drug’s benefit–risk profile without the drug sponsor ever submitting supportive data. Hence the rule:

If the test says something about drug benefit or risk,
the drug label must already say it.

This is meant to:

  • Prevent IVDs from implicitly narrowing drug indications

  • Prevent “off-label drug labeling by proxy”

  • Keep drug–IVD biological claims inside the drug’s regulatory authorization

  • Avoid contradictory or clinically destabilizing claims

But it creates the regulatory stalemate you described.


So what will happen in real life?

1. Some scientifically valid tests will be blocked from meaningful labeling.

Tests could still be cleared as purely analytical devices, but without any mention of drug response. This neuters commercial utility.

2. Some test sponsors will need to negotiate with pharma.

This becomes a business-development negotiation, not a scientific one.
Pharma may demand:

  • Co-development agreements

  • Exclusivity

  • Shared IP

  • Favorable economics

  • Or simply refuse

3. FDA may see industry pressure to define a path for evidence-based third-party biomarker claims

—but historically FDA has been extremely reluctant to create such a pathway.

4. In oncology, where independent prognostic markers are common, this could chill innovation

unless the test-maker positions the assay as, say:

  • a prognostic test,

  • a stratification biomarker,

  • a therapy-agnostic classifier,
    but not as a drug-benefit predictor, even if scientifically true.

This undermines clinical reality but aligns with FDA’s legal constraints.


Bottom-line sentence for your blog

The new Class II/CDx-adjacent category still hardwires the rule that test claims about drug benefit or risk must map exactly to the drug’s FDA labeling. As a result, test makers can discover real drug-response markers—but unless the drug sponsor cooperates and updates its own label, FDA will not allow the test to make those claims. This creates a structural stalemate in cases where pharma has no incentive to acknowledge biomarker-negative non-responders.



Sidebar: Historical Precedents for the “Labeling Stalemate” Problem

FDA has run into this drug–test alignment conflict before. A few examples illustrate the pattern:

KRAS and anti-EGFR therapy (circa 2008–2012)

Independent academic groups showed that KRAS-mutant colorectal cancer patients had essentially 0% response to cetuximab or panitumumab. For years, FDA could not allow IVD manufacturers to make this claim because the drug labels had not yet been updated. Only after strong clinical consensus—and eventually cooperation from drug sponsors—did FDA revise the labels and allow KRAS testing to be formally recognized.

UGT1A1 & irinotecan toxicity

Robust data linked UGT1A1*28 genotype to heightened neutropenia risk. Many labs developed tests, but IVDs could not claim drug-related risk until the irinotecan label incorporated genotype-specific dosing language. Some assay submissions were effectively stalled until the drug label caught up.

CYP2D6 & tamoxifen metabolism

By 2009, dozens of studies linked CYP2D6 variants to altered tamoxifen activation. But FDA never incorporated CYP2D6 into the tamoxifen label, citing inconsistent data, so no IVD could claim predictive information for tamoxifen efficacy even though many clinicians used the information informally. This remains a canonical example of a real biomarker that FDA kept out of test labeling because it was absent from drug labeling.

PD-L1 assays

PD-L1 is the opposite scenario: the drug sponsors wanted the biomarker in the label. Because pharma cooperated, multiple PD-L1 assays moved smoothly through PMA with explicit drug-linked claims. This underscores that drug-sponsor alignment, not analytical science, often determines whether a biomarker becomes “real” in FDA’s regulatory universe.

####

KRAS and anti-EGFR therapy (what actually happened)

The KRAS non-responder finding did not originate from diagnostic companies. It emerged when investigators performed retrospective analyses of banked samples from large cetuximab/panitumumab trials. These academic groups—working outside the IVD industry—showed that KRAS-mutant metastatic colorectal cancer patients had virtually zero likelihood of benefit.

For several years, however, the drug labels did not yet include KRAS, even though the science was increasingly definitive. During that interval, no diagnostic manufacturer could obtain FDA-recognized drug-response labeling, even if their KRAS assay was analytically excellent.

Once EMA and oncology societies (ASCO/ESMO) endorsed KRAS as a true “negative predictor” and the drug sponsors cooperated in revising labeling, FDA updated the cetuximab/panitumumab labels—and only then could IVDs claim KRAS as an anti-EGFR non-responder marker.

KRAS is the archetype of a biomarker discovered academically that remained “invisible” in FDA IVD labeling until the corresponding drug label was amended.



####

The academics would have even less time and money to get a generic KRAS test approved, or to pursue the multiple device-drug regulatory changes with small armies of regulatory experts and lawyers.

####

Exactly — and that point deserves to be made explicitly, because it exposes the structural asymmetry baked into FDA’s “test labeling must match drug labeling” rule.

Here is a clean, polished paragraph that makes this clear:


Why academics cannot fix this problem

In cases like KRAS, UGT1A1, or similar biomarkers discovered through retrospective analyses of banked biospecimens, the scientists who discover the predictive effect have no realistic path to turning that discovery into FDA-recognized test claims. Academic groups lack the multi-million-dollar budgets, regulatory lawyers, full-time submissions teams, and multi-year program management needed to pursue:

  • a stand-alone IVD submission (even now a 510(k) with substantial special-controls burden),

  • a labeling change to a drug they do not own, and

  • simultaneous coordination across CDRH and the drug review center.

Since FDA will not allow a test to make drug-response claims unless the drug’s own labeling speaks to the biomarker, academics are structurally excluded. They cannot initiate drug-label revisions; they cannot file supplements for a biologic they do not sponsor; and they cannot force a pharmaceutical company to acknowledge a biomarker that shrinks its indicated population.

Thus, the very parties who might generate the strongest biomarker evidence (academic cooperative groups) are the least able to convert that evidence into regulated drug-linked test claims. The system functionally depends on the drug sponsor’s willingness to take on the regulatory burden — and when that willingness is absent, the evidence cannot migrate into FDA-recognized IVD labeling.


###

See a deep-dive sidebar about EU vs FDA

https://brucedocumentblog.blogspot.com/2025/11/cdx-usa-vs-eu-and-changes.html


Bipartisan Policy Center and Medicare: They Can't Stop Writing

I ran across five recent documents from the Bipartisan Policy Center on Medicare reform.  


If you like this, try also the "Committee for a Responsible Federal Budget."
###

AI CORNER

###

ChatGPT 5 / Here’s a structured take on the three groups you asked for.


1. The 54-page white paper (N0): Strengthening Primary Care: Medicare Physician Payment and Related Reforms

Big frame.
The report reframes Medicare Part B reform explicitly as primary-care policy and not just fee-schedule engineering. It opens with the now-familiar BPC storyline: chronic disease is driving unsustainable Medicare spending; the U.S. underinvests in primary care (5–8% of spend vs ~13% in peer countries); and Part B is both the problem child and the lever for change. The chart on page 5, showing Part B rising from ~$231B (2011) to a projected $1.17T (2034), is doing a lot of rhetorical work here: Part B is presented as the largest and fastest-growing piece of Medicare, and therefore the natural focus for sustainability policy.

Diagnosis: three intertwined failures.

  1. Misaligned incentives that keep clinicians in FFS and out of APMs.

    • MACRA’s APM bonus is phasing out; the remaining differential in conversion-factor updates is too small to matter.

    • APMs demand upfront investments and operational changes that are especially hard for small and rural practices, and hybrid payment (APCM codes, partial capitation) is underdeveloped as a bridge.

    • Integration of primary care with specialty and behavioral health is a key ambition but underpowered in existing models.

  2. Structural undervaluation of primary-care work in the MPFS and lack of data.

    • CMS is depicted as overly dependent on RUC survey data and specialty-dominated recommendations; empirical data on time, intensity, and resource use are too thin.

    • There is no consistent federal definition or tracking of “primary-care spend” across programs, making target-setting and accountability almost impossible.

  3. Crippling administrative complexity, especially in APMs.

    • Measure clutter, unaligned quality metrics, and non-interoperable EHR requirements are framed as primary reasons clinicians stay in or drift back to plain FFS.

Solutions: what’s materially new in BPC’s thinking.

The report’s recommendations are not just “more ACOs” but a package that tries to rebalance FFS, APM incentives, and primary-care infrastructure:

  • Rebuild the APM incentive structure, not just extend it.
    The report calls for extending and restructuring the Advanced APM bonus, with an explicit shift toward prospective, per-beneficiary, risk-adjusted payments rather than all-or-nothing thresholds based on total Part B revenue.

  • Create a formal HHS advisory body on MPFS valuation.
    This is one of the clearest “new” institutional proposals. BPC wants a FACA-governed advisory body inside CMS to complement (and de-bias) the RUC by:

    • relying more heavily on empirical data (claims, EHRs, time-motion studies),

    • prioritizing primary-care and care-coordination services for review, and

    • systematizing identification of misvalued services.

  • Track and eventually set targets for primary-care spending.
    HHS would define “primary-care spending,” report it across federal programs, and use that to inform policy targets—essentially building the same infrastructure states like Rhode Island built for commercial plans, but at the federal level.

  • Align quality measurement and reduce reporting load.
    The report explicitly backs convergence of measures across Medicare, Medicaid, and private payers, using Medicare’s agenda-setting role to force simplification rather than adding yet another measure set for each model.

  • Support hybrid and prospective primary-care payments.
    It highlights new APCM codes as a step toward hybrid models but warns that if valuation and beneficiary cost-sharing aren’t addressed, they will be marginal rather than transformative.

Net effect: the 54-pager is BPC’s “integrated theory” document. It ties Medicare sustainability, primary care, APM design, and MPFS reform into a single policy program, with primary care explicitly cast as the system’s leverage point rather than just another stakeholder.


2. N1, N2, N3 in sequence: the three-brief policy staircase

The three briefs function as a stepwise argument: N1 = problem + history; N2 = barriers; N3 = actionable recommendations. Read together, they show BPC tightening from broad concern about MACRA’s underperformance to a concrete legislative/regulatory agenda.

N1 – The Need for Medicare Part B Physician Payment Reform (Issue Brief #1)

This brief sets up the macro problem and the political economy:

  • MACRA hasn’t delivered on its promise.
    It walks through the SGR era, the annual “doc fixes,” and MACRA’s intent to move clinicians into APMs via the QPP (MIPS vs Advanced APMs). Then it shows that FFS remains dominant, and MACRA’s formula continues to generate unsustainable cuts that require yearly congressional patches.

  • Part B spending is the pressure point.
    N1 reprises the chart showing Part B at ~49% of Medicare benefit outlays in 2023 and growing at ~9% annually through 2034. It emphasizes that this growth is not clearly associated with measurable gains in quality or outcomes, and that beneficiaries are bearing higher premiums and deductibles.

  • Political alignment.
    Bi-partisan concern is flagged explicitly: both parties accept that the current “yearly patch” dynamic is untenable, and both profess support for increasing APM participation and primary-care strength.

Functionally, N1 is BPC’s case memo to Congress: it validates the sense of crisis, documents MACRA’s structural flaws, and primes the reader to accept that something bigger than annual patches is now required.

N2 – Key Barriers to Clinicians’ Participation in Promising APMs (Issue Brief #2)

N2 dives into why APMs aren’t scaling, organizing barriers into three clusters.

  1. Misaligned incentives and a flawed bonus design.

    • Expiring Advanced APM bonus; dwindling differential in conversion-factor updates.

    • All-or-nothing thresholds (≥35% of Medicare patients or ≥50% of Part B revenue through an APM) that can punish clinicians who are part-way through the transition.

    • Bonus amounts keyed to total Part B revenue rather than the population actually in the APM—advantaging large, high-volume systems and doing little to reward marginal high-value care.

  2. Fee-schedule misvaluation and its drag on APMs.
    The brief explains how APMs sit on an MPFS “chassis,” so undervaluation of primary-care and care-coordination codes flows directly into ACO and other model economics. It highlights the lack of robust empirical data and CMS’s reliance on RUC survey data as core technical obstacles.

  3. Fragmented APM landscape + MSSP structural issues + admin burden.

    • Multiple overlapping models with shifting rules.

    • MSSP benchmarks that “ratchet down” when ACOs succeed, discouraging continued participation.

    • Risk adjustment that under-captures the complexity and cost of high-need patients.

    • Quality reporting and EHR interoperability burdens estimated in the billions annually, which are particularly punitive for primary care and small/rural practices.

N2 is where BPC’s analytic voice comes through most strongly; it’s essentially a barrier taxonomy and sets up the logic for each of N3’s recommendations.

N3 – Recommendations to Modernize Medicare Part B Physician Payment and Related Reforms (Issue Brief #3)

N3 is the action menu that flows directly from N2’s barrier map.

Highlights (in roughly the order presented):

  • Rebuild the Advanced APM bonus.
    Extend the bonus (no lower than the 1.88% level) with no gap, then convert it to a flat, risk-adjusted per-beneficiary payment and drop the problematic patient/payment thresholds.

  • Create a CMS/HHS advisory body on MPFS valuation.
    Focus on empirical data, transparency, and systematic reviews of misvalued services; explicitly complementary to CMS’s proposed efficiency adjustments and expanded data use in the 2025 PFS rule.

  • Simplify and rationalize APMs, especially ACOs.

    • Elevate the most promising primary-care-focused models, particularly those using prospective payments.

    • Establish a permanent higher-risk MSSP track for “graduated” ACOs.

    • Fix benchmarking and risk adjustment (including using EHR data) to avoid penalizing successful ACOs and better capture high-need patients’ costs.

  • Attack administrative burden directly.

    • Implement and enforce a national data-exchange framework (e.g., CMS Interoperability Framework) for APM participants by a target date (e.g., 2028).

    • Align quality metrics across payers to move toward a unified reporting system.

  • Strengthen primary-care infrastructure.

    • Consolidate and streamline federal HIT and EHR support grants, with explicit attention to small and rural primary-care practices.

As a sequence, N1→N2→N3 shows BPC’s full theory of change: you fix MACRA’s incentive structure, clean up APM design, and simultaneously re-platform MPFS and primary-care infrastructure. The long N0 report is then a deeper dive on one pillar—primary care—inside this larger framework.


3. The two comment letters (PFS and OPPS): “live-fire exercises” of the framework

The PFS and OPPS comment letters are essentially where BPC takes the intellectual architecture above and tests it against real regulatory text. They show BPC applying the same themes—site neutrality, empirical valuation, primary-care support, digital health, and rural equity—to specific code proposals.

(a) OPPS/ASC rule comment (CMS-1834-P, Sept 15, 2025)

Core themes.

  • Site neutrality & volume control.
    BPC explicitly supports CMS’s use of OPPS volume-control authority to extend the 2019 “unnecessary volume” policy to drug administration in excepted off-campus PBDs, with a carve-out for rural sole community hospitals. They link this to their earlier recommendation for broader site-neutral payments for services safely furnished in multiple ambulatory settings, with savings partially reinvested in rural and safety-net hospitals.

  • Alignment with their 2023 Sustaining and Improving Medicare report.
    The letter repeats the argument that payment differentials between MPFS and OPPS drive consolidation and billing shifts that inflate total program outlays and beneficiary cost-sharing—essentially importing the “FFS chassis + misaligned incentives” critique into OPPS.

  • Rural Emergency Hospital quality measurement & SDOH.
    BPC backs CMS’s proposal to offer an eCQM access/timeliness measure as an alternative to the median ED arrival-to-departure measure, framing this as consistent with their earlier rural-health work. They also urge CMS to retain SDOH-1 and SDOH-2 measures, while acknowledging burden and urging ongoing stakeholder engagement—a nice example of their “pro-SDOH but administratively sober” posture.

  • Hospital price transparency.
    The letter supports CMS’s push for meaningful, accurate pricing data and explicitly invokes BPC’s prior 2020 transparency work on uniform data collection.

In short, the OPPS letter deploys the site-neutrality and rural-reinvestment planks of the broader agenda and ties OPPS policy back to Part B alignment and SDOH measurement.

(b) PFS rule comment (CMS-1832-P, Sept 12, 2025)

This letter is more tightly connected to the physician-payment and primary-care work, and you can almost read it as the “short regulatory version” of N0 + N1–3.

Key positions:

  • Practice expense RVUs and valuation methodology.
    BPC strongly supports CMS’s efforts to update PE methodology, use a wider range of empirical data, and better capture cost differences across settings. This is explicitly linked to their June 2025 recommendation for an HHS advisory body inside CMS to make the valuation process more empirical and transparent.

  • Telehealth services list.
    They endorse the proposal to permanently maintain all previously approved telehealth codes, including those added during the PHE, and to shift the review standard toward “can it be safely furnished by interactive telecommunications?” with strong program-integrity oversight. That aligns with their generic telehealth view: stable benefit design + clinical-appropriateness test + guardrails.

  • Direct supervision via two-way audio/video and “incident-to.”
    BPC takes a notably skeptical line on incident-to billing:

    • It argues incident-to makes quality and fraud oversight harder,

    • recommends prohibiting incident-to billing for any practitioner who can bill directly, and

    • at minimum urges targeted monitoring for telehealth incident-to claims, referencing pending Hill legislation on telehealth oversight.
      That is a relatively strong position and consistent with their emphasis on clear accountability and data specificity.

  • Efficiency adjustment and misvalued services.
    BPC supports CMS’s proposed efficiency adjustment to non time-based codes and again ties this to their call for an advisory body and more empirical studies for valuation. They explicitly encourage CMS to use automatic adjustments and systematic methods to identify and correct misvalued, high-growth codes.

  • Remote physiological and therapeutic monitoring (RPM/RTM).
    They back new codes that allow less rigid data-collection days and time blocks, arguing that clinically effective remote monitoring often requires only short bursts of data and largely asynchronous analysis. They then press for:

    • patient-selection and re-assessment requirements,

    • EHR integration and response-time expectations,

    • removal of geographic payment dampers for rural providers, and

    • richer claims data elements (device types, data type, ordering clinician) to support evidence generation and program integrity.

Across both letters, the through-line is clear: site neutrality + primary-care support + empirical valuation + digital innovation with guardrails. The comment letters show BPC in “reg-ops mode,” continuously reinforcing the intellectual scaffolding from N0 and the issue briefs, but translating it into very specific stances on CMS proposals.


Compare the big white paper 54pp with the triplet of smaller ones.

###

How the 54-page white paper differs from — and relates to — the N1/N2/N3 triplet

1. Scope and Purpose

  • 54-page report (N0):
    A comprehensive, integrated blueprint for modernizing Medicare physician payment through the lens of strengthening primary care. It addresses the entire ecosystem: MPFS valuation, primary-care underinvestment, APM structure, quality measurement, administrative burden, rural issues, and system-wide sustainability.

  • Triplet (N1/N2/N3):
    A modular, staged argument focused specifically on MACRA, APMs, and Part B reform mechanics.

    • N1: Why Medicare physician payment reform is needed.

    • N2: The specific barriers preventing APM adoption.

    • N3: Targeted, actionable recommendations to fix those barriers.

2. Intellectual Center of Gravity

  • 54-page report:
    Primary care is the central organizing principle. Everything else—APMs, valuation, interoperability, metrics—is framed as either supporting or inhibiting robust primary care.

  • Triplet:
    MACRA and the APM architecture are the center.  Primary care appears often, but as one part of the mechanics of Part B reform, not the conceptual anchor.

3. Policy Breadth

  • 54-page:
    Broad, system-level. Includes:

    • National primary-care spending targets

    • Federal advisory body on RVUs

    • Hybrid payments, APCM codes

    • APM redesign

    • Quality-measurement simplification

    • Interoperability and admin-burden reduction
      It functions as “the full blueprint.”

  • Triplet:
    Narrower and more tactical. Each paper takes one slice:

    • Problem definition

    • Barrier taxonomy

    • Recommended fixes

  • Together they form a policy staircase rather than a full system redesign.

4. Relationship Between the Two Sets

  • The 54-page report is the umbrella, giving BPC’s complete argument for primary-care-centric Medicare reform.

  • The three briefs are the technical spine of that argument: a concise MACRA/APM package that can be handed to Congress.

  • Much of the 54-page analysis absorbs and extends the triplet’s insights, embedding them in a broader narrative linking primary care, MPFS valuation, and long-term Medicare sustainability.

5. Use Case for Readers

  • 54-page: For senior policymakers who want the whole theory and the cross-cutting rationale.

  • Triplet: For Hill staff, budget analysts, and CMS policy teams who need quick, structured modules they can plug into legislation or rulemaking discussions.

####
Do they say much about "diagnostic tests" or the specialty of Pathology?
####

Short answer:
No — across all four documents (the long 54-page white paper and the N1/N2/N3 triplet), diagnostic testing and the specialty of pathology are almost entirely absent.
When they appear at all, it is brief, indirect, and framed only in relation to primary care, APMs, or valuation mechanics, not as a domain of its own.

Below is the concise landscape:


1. In the 54-page report (N0): almost no discussion of diagnostics or pathology

The long primary-care-oriented report focuses overwhelmingly on:

  • undervaluation of cognitive/primary-care services,

  • APM incentive structure,

  • quality measures,

  • interoperability,

  • site-of-service dynamics, and

  • rural primary care.

Diagnostic testing is nearly invisible.
It is mentioned only in passing when discussing:

  • practice expense data or equipment inputs in RVUs (as examples of CMS data sources),

  • the need for data-sharing between primary care and specialists,

  • general “tests” as part of patient management.

There is no direct commentary on:

  • the role of lab diagnostics in APMs,

  • reforming coding/payment of clinical laboratory services,

  • the specialty of pathology,

  • clinical lab economics,

  • laboratory contribution to primary care value.

Even when discussing undervaluation, the focus is on evaluation and management, not pathology or diagnostic services.


2. In N1/N2/N3 (the triplet): silence on diagnostics and pathology

The triplet makes no substantive reference to lab diagnostics, laboratory workflow, or the pathology profession.

Their analytic frame is:

  • MACRA history

  • the failure of APM uptake

  • the barriers to risk-bearing

  • the flaws of the MPFS physician-work valuation system

  • the need for empirical time/intensity data

  • quality-measure alignment

  • rural and small-practice participation

  • care-management and primary-care financing

When diagnostic testing is mentioned at all, it is only in generic phrases like:

  • “tests ordered by clinicians”

  • “services delivered under the fee schedule”

  • “misvalued services” (meaning E/M and procedural codes, not labs)

Pathologists and laboratorians never appear.
There is no discussion of:

  • pathology specialty payment,

  • diagnostic coding policies,

  • clinical-laboratory contribution to APM models,

  • pathology reporting burdens,

  • molecular diagnostics or precision medicine.


3. Comment letters (PFS & OPPS): still very little on diagnostics

Even here—where diagnostic policy might naturally arise—BPC does not engage with laboratory issues.

In the PFS comment letter:

  • They support CMS updating practice-expense RVUs, but only in the context of physician office practice costs.

  • They endorse telehealth expansions, supervision rules, RPM/RTM coding, and efficiency adjustments—none of which involve lab diagnostics.

  • No discussion of Part B laboratory services, molecular diagnostics, or CLFS.

In the OPPS/ASC letter:

  • They discuss drug administration, site neutrality, and price transparency—but not lab tests.

  • No mention of hospital outpatient lab payments, packaged lab services, or OPPS-specific molecular test rules.

In short: laboratory testing is entirely orthogonal to the policy agenda they are advancing.


Why this absence is notable

BPC’s entire project is built around:

  • sustaining Medicare,

  • strengthening primary care,

  • making APMs workable,

  • rationalizing MPFS valuation,

  • reducing administrative burden,

  • improving rural equity.

They frame the crisis and its solutions almost entirely through the lens of clinicians who deliver longitudinal, face-to-face care.
Pathology and laboratory medicine—which sit outside that direct patient-facing E/M world—are not part of the narrative.

In fact, BPC’s framework implicitly accepts the CMS view that:

health-system sustainability hinges on primary care and APM incentives,
not on the organization or valuation of diagnostic testing.


Bottom line for policy watchers

Across the entire set of documents:

  • Diagnostics and pathology = almost completely absent.

  • Primary care = the anchor point of the entire reform logic.

  • APM and MPFS valuation = the dominant technical concerns.

If you want, I can also give you:

  • (1) a quick analysis of why BPC leaves diagnostics out (structural, political, and narrative reasons),

  • (2) what this means for pathology’s role in upcoming congressional conversations,

  • (3) how diagnostic-policy issues could piggyback on BPC’s larger framework.

CMS Issues Final Lab Prices for CY2026

On November 25, 2025, CMS issued final lab test prices for new tests for CY2026.

###

These regularly come out during Thanksgiving week; they set the prices for new lab test codes for CY2026.

Find the annual meeting page here, scroll down to the "Meeting Notice" section, and then scroll down to "CLFS Test Code Payment Determinations for CY2026 Final."

https://www.cms.gov/medicare/payment/fee-schedules/clinical-laboratory-fee-schedule-clfs/annual-public-meetings

Or this link should drop down the Zip file for you:

https://www.cms.gov/files/zip/cy-2026-final-payment-determinations.zip


What Happened

There were 93 agenda items, four of which were deleted from the process. Of the 89 active items, 75 were finalized as proposed and 14 were changed. Eleven were changed from one crosswalk to another, one was changed from crosswalk (CW) to gapfill (GF), and two were changed from GF to CW.

20 codes go to "gapfill"; all but one were proposed as gapfill. That one code switched from CW to GF (as just noted).

Of 59 codes crosswalked, most were crosswalked to a single code x1.   

Other crosswalks were fractional, multiple, or additive. Only one was a fraction: 0094U x 0.5 for 0583U. I didn't see an upward fraction (like x1.25 or x1.5), which CMS almost never uses. Three crosswalks were additive (e.g., to Code 1 + Code 2). There were a couple of cases of crosswalk to a multiple (x2).

One code, 0523U (Pillar CDx OncoReveal), was crosswalked to code 0022U minus 81449; the 'minus' function is extremely rare in crosswalks. That's (23 genes, about $1,900) minus (5-50 genes, RNA only, about $600).
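For readers new to crosswalk math, the pricing logic above can be sketched as simple arithmetic on payment components. The dollar figures below are placeholders loosely based on the approximate prices mentioned in this post, not actual CLFS rates:

```python
# Hypothetical sketch of CLFS crosswalk arithmetic (placeholder prices,
# not actual CLFS rates).
def crosswalk_price(components):
    """Sum (price * multiplier) terms. Multipliers may be fractional
    (x0.5), whole multiples (x2), or negative (the rare 'minus')."""
    return sum(price * mult for price, mult in components)

# Fractional crosswalk, e.g. 0583U priced at 0094U x 0.5:
half = crosswalk_price([(1000.00, 0.5)])                 # 500.0

# Additive crosswalk (Code 1 + Code 2):
additive = crosswalk_price([(300.00, 1), (450.00, 1)])   # 750.0

# Subtractive crosswalk, as for 0523U = 0022U minus 81449
# (roughly $1,900 minus $600 in the post's approximate figures):
minus = crosswalk_price([(1900.00, 1), (600.00, -1)])    # 1300.0
```

The same one-line formula covers every crosswalk type CMS used this cycle; only the multipliers change.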


Monday, November 24, 2025

Fixing the Shrinking RVU: Insights Into the Debates

The real-dollar value of the Medicare RVU, on which all physician payments and many outpatient technical services are based, has been shrinking for years.  A new 12-page report from the "Committee for a Responsible Federal Budget" collates much of the history and argumentation, useful even where you disagree with its conclusions.

###

The real-dollar value of the Medicare RVU has been shrinking for years - see a one-page update from AMA.


The advisory body MedPAC will be discussing physician reimbursement adequacy at its December 4-5, 2025, meeting.

One recent major action by CMS was to reduce many technical valuations by a 2.5% "efficiency factor" effective January 2026, allowing funding to be redistributed toward primary care E&M claims.  See, e.g., a summary at Holland & Knight here.
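As a rough sketch of how such an efficiency factor operates, the arithmetic is simply a percentage haircut applied when RVUs are multiplied by the conversion factor. The RVU and conversion-factor values below are hypothetical, not the actual CY2026 figures:

```python
# Illustrative only: how a 2.5% "efficiency factor" trims a payment
# built from RVUs and the conversion factor. RVU and CF values are
# hypothetical, not actual CY2026 figures.
EFFICIENCY_FACTOR = 0.025

def mpfs_payment(total_rvus, conversion_factor, apply_efficiency=False):
    """Payment = total RVUs x conversion factor, optionally reduced
    by the efficiency factor (for affected technical codes)."""
    rate = conversion_factor
    if apply_efficiency:
        rate *= (1 - EFFICIENCY_FACTOR)
    return round(total_rvus * rate, 2)

base = mpfs_payment(5.0, 33.00)                            # 165.0
reduced = mpfs_payment(5.0, 33.00, apply_efficiency=True)  # 160.88
```

The difference between the two results is the pool of dollars that, in a budget-neutral fee schedule, gets redistributed elsewhere (here, toward E&M services).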

###

An entity called the "Committee for a Responsible Federal Budget" has released a 12-page white paper in which it supports the "efficiency discount."  Find it here:

https://www.crfb.org/sites/default/files/media/documents/HSI%20PFS%20Final.pdf

While many readers will oppose that conclusion, I'm highlighting the 12-page white paper because it contains an extensively footnoted discussion of the history and present status of a range of issues - from the "efficiency" debate, to overall RVU deflation, to the way that growing numbers of Nurse Practitioners and other professionals are billed (direct vs incident-to).



###
If you like this, try also the Bipartisan Policy Center.
  

###

The big beautiful bill (H.R. 1) of July 2025 will lead to some changes in whether and how certain health care graduate students can borrow education funds, from 2026 forward.   In a nutshell, a limited number of degrees will be considered "professional" (law, medicine, dentistry, a master's in theology for ministers).  Others, like nurse practitioner degrees, will not be.   (Degrees like the MBA or MPH will be officially classed as "non-professional," with lower federal loan caps.)   From my general knowledge, we want to encourage the production of N.P.s and other kinds of primary care clinicians - hopefully this will continue to be revisited.


Sunday, November 23, 2025

Head of CMMI - New Rules for CMMI

CMMI is Medicare's Center for Innovation, created by the 2010 Affordable Care Act.  It has a mixed history, probably a weak one, given 15 years and billions of dollars of experience.

See a new article by Gita Deo, chief of staff of CMMI, and Abe Sutton, head of CMMI.  It's open access in Health Affairs:

https://www.healthaffairs.org/do/10.1377/hauthor20251114.865163/full/


To me, the most interesting thing about the short article was the tone.  It gets pretty close to, "Don't do X.  Only idiots do X."

##

CMMI's themes are to protect patients, protect providers, and protect taxpayers.   All models should include downside risk, because this is almost always a hallmark of previous, successful models.   Downside risk means: don't just spread money on top of problems.

Also, be sure you have rock-solid outcome measures.  By law, CMMI projects, to succeed, must save costs and improve quality (or save costs at the same quality).   That means you must have a control group, whether randomized or constructed, against which outcomes can be quantified.
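The statutory test described above can be sketched as a simple comparison against a control group. All figures below are hypothetical:

```python
# Minimal sketch (hypothetical data) of the statutory test for a CMMI
# model: spending must fall relative to a control group while quality
# holds steady or improves.
def model_passes(spend_model, spend_control, qual_model, qual_control):
    """True if the model saves money AND quality is at least as good."""
    saves_money = spend_model < spend_control
    quality_ok = qual_model >= qual_control
    return saves_money and quality_ok

# Per-beneficiary spend $9,400 vs $10,000 control; quality score 0.82 vs 0.80:
model_passes(9400, 10000, 0.82, 0.80)   # True

# Spends more than control, so it fails regardless of quality gains:
model_passes(10500, 10000, 0.90, 0.80)  # False
```

The real evaluations are difference-in-differences designs with risk adjustment, but the pass/fail logic reduces to this two-part comparison.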

##

Gita Deo is chief of staff at CMMI; she previously earned a master's in public health and worked in a provider system and at McKinsey.   Sutton worked previously at McKinsey and at the Trump-01 White House, where he led some effective renal initiatives.   He used the interregnum (Biden years) to pick up a law degree at Harvard.

##


Friday, November 21, 2025

LANCET Editor Tries to Put Medical AI Into Perspective; so AI Responds

Richard Horton has been editor of LANCET since 1995, when he was in his mid-thirties.   This fall, he writes a pair of op-eds on AI as, well, "A reservoir of illusions."   Is he really that negative?   Let's take a closer look.

  • Offline: A Reservoir of Illusions (Part 1)
  • Lancet editor Richard Horton on medical AI, with a focus on Marc Andreessen
    • Here.
  • Offline: A Reservoir of Illusions (Part 2)
    • Horton on medical AI, focus on Emily Bender's book, "The AI Con: How to Fight Big Tech..."
    • Here.
Before we look at Horton's articles, if you like this topic, see two articles in JAMA Internal Medicine this week.  We have Bressman et al., Software as a Medical Practitioner - Time to License Artificial Intelligence?   Here.  And also Steiner, Scientific Writing in the Age of Artificial Intelligence.  Here.

Steiner has a case study in which AI writes, or shortens, medical journal abstracts, and he's not too convinced this meets his bar for quality or accuracy.   I'd just mention that AI can be trained to do this better (e.g., give it 500 well-written abstracts first, before asking it to write or edit), and that five human authors, writing or re-writing an abstract, would never come up with the same text, word choices, or edits.  Each human editor would pick or alter different things than his colleagues.

Working from Horton's Op Ed 2, Bender's book against AI (which I haven't read but know only from this Op Ed) argues that AI is just regurgitating text strings.   I see more to it than that.

For example, recently FDA held a full-day workshop on digital medicine and psychotherapy apps.  (My blog here.)  FDA provided a 27-page pre-read and a several-page rapid post-meeting summary.   FDA also provided a link to a 7-hour archived video.   I recorded the video, got an auto-transcription from Otter.ai, and fed the three documents (the pre-read, the short summary, and the full transcript of 160pp, 62,000 words) to ChatGPT 5.   I asked, for example, what you would discover in the full transcript that you wouldn't glean from the pre-read or meeting summary - without having to watch the 7-hour meeting myself.   I thought the result was interesting; you can decide; blog link at top of paragraph.

###
Looking at Horton's articles, I asked ChatGPT 5 to review them for us.  The result, meaning the "extruded text" [Bender], follows.


###
AI CORNER
###

In Part 1 (“A reservoir of illusions, part 1”), Horton is mostly setting the stage. He contrasts Marc Andreessen’s techno-optimist manifesto with the current investment boom and breathless media narrative around AI. Andreessen is quoted as saying that AI will save lives, end environmental crisis, create material abundance, and that any deceleration of AI is tantamount to “a form of murder”. 

Horton treats this as Exhibit A of “cyber-futurism”: AI as universal problem solver, enemies named as the SDGs, precautionary principle, risk management, and social responsibility. He acknowledges genuine successes such as AlphaFold and the economic centrality of big tech, but the column’s punchline is that this optimism has become a “bubble”, and that critics who puncture it deserve much closer attention.

In Part 2 (“A reservoir of illusions, part 2”), Horton essentially hands the microphone to Emily Bender and Alex Hanna and endorses much of their thesis. He highlights their claim that “AI” is largely a marketing term: large models don’t think, don’t understand, aren’t sentient, can’t empathize, and can’t care. He repeats Bender/Hanna’s line that ChatGPT is a “souped-up autocomplete” or “text-extruding machine”, with no internal mechanism for source reliability, factual curation, or genuine evaluation. He stresses their view that such systems cannot replace human creativity or relationships, “add nothing to the human condition”, and are “all but irrelevant” to the Majority World, where political will and resources matter more than text synthesis.

Turning to health and science, Part 2 becomes more concrete and pointed. Horton concedes narrow, low-stakes utility—transcription, scheduling, simple triage—but frames the clinical and scientific hype as dangerous: automating diagnosis and care remains “unproven,” and using AI in peer review is a “serious abrogation of scholarly duty.” He foregrounds worries about AI-generated fraudulent manuscripts, collusion between editors and authors, and the risk that if editorial systems cannot cope, “the whole edifice of science” could be undermined. He then moves from Bender/Hanna’s critique to empirical examples: a radiology study where AI increased false positives without improving outcomes; experimental work showing AI systems more willing than humans to execute unethical instructions, lowering practical barriers to misconduct; and evidence that algorithms amplify age and gender bias, deepening inequities. He closes with the IBM line: a computer can’t be held accountable, so it mustn’t make management decisions—implicitly, nor life-and-death clinical ones.

Is Horton “too negative” or just prudently cautious?

For an expert audience in medical AI, I'd characterise his stance as deliberately counter-balancing the hype rather than methodically even-handed.

Where he’s strongest / most defensible:

  • He is right that health-AI marketing routinely overruns its evidence base. Many products are launched on retrospective AUCs, reader-study surrogates, or workflow anecdotes rather than hard patient outcomes. Asking that AI be treated as a medical intervention, with explicit questions about information sources, reliability, effects on decisions, efficiency, quality, outcomes, and unintended harms, is exactly the posture most regulators and serious developers now espouse.

  • His concern about scholarly integrity is not exaggerated. Journal workflows, especially in high-volume, open-access environments, are already stressed; the combination of LLMs and paper mills is a genuine systemic risk. Highlighting fabricated AI-driven submissions and collusive editorial behaviour is a useful warning shot to the publishing community.

  • He is right to stress accountability and opacity: deploying opaque models in high-stakes decisions, without clear responsibility and post-deployment monitoring, is a governance failure even if the model’s ROC curve looks good. The IBM quote at the end is rhetorically old-fashioned but still a useful ethical anchor.

Where he overreaches or flattens the landscape:

  • By leaning heavily on Bender/Hanna’s line that these systems “add nothing to the human condition” and are “all but irrelevant” to the Majority World, he slides from valid scepticism into overstatement. There is a sizeable space between “AI will save the world” and “AI adds nothing”, and a lot of serious work in global health, imaging, and language accessibility sits in that middle space.

  • He tends to conflate LLM limitations with "AI" writ large. The critique of chatbots as text-extruders with no internal fact model is accurate for vanilla LLMs, but he doesn't distinguish generative chat systems from narrow supervised models (e.g., risk scores, triage models, segmentation networks) that are trained and validated on labelled clinical data, sometimes with outcome endpoints and prospective evaluation. That conflation makes his critique feel, to experts, a bit like attacking a specific architecture while rhetorically indicting the entire field.

  • On the other side, he also uses extreme techno-optimism as "the AI side": Andreessen's claims that any slowing of AI is essentially murder, and that AI will end the environmental crisis and sustain 50 billion people. Those are real quotes, not invented straw men, but they are hardly representative of mainstream clinical AI research, where the discourse is about incremental gains, safety, and reimbursement. Framing the debate as Andreessen vs. Bender/Hanna obscures the much larger middle camp of "cautious, data-driven optimists."

So I’d say: he isn’t fabricating straw men, but he is selectively choosing the loudest techno-optimist (Andreessen) and the loudest critics (Bender/Hanna) as his poles, and he lets their most sweeping lines stand without much nuance. That’s understandable for a short “Offline” column, but it means the pieces read more like rhetorical correction than a balanced policy analysis.

How should technical and policy readers read these pieces?

From a medical-AI/health-policy vantage point, Horton is useful as a mood indicator and a reminder of external perceptions, more than as a fine-grained map of the field.

  • For developers, these columns are a reminder that serious journals are deeply worried about fraud, bias, and hollow “AI-washed” claims. Any translational AI program that doesn’t bake in prospective evaluation, post-market monitoring, and clear lines of human accountability will simply reinforce the concerns Horton amplifies.

  • For policymakers, they crystallise a defensible regulatory attitude: treat AI as a medical intervention; require robust evidence tied to clinical endpoints; scrutinise claims about automation of judgment, empathy, or care; and do not let workflow efficiency gains substitute for patient-relevant benefit.

  • For both groups, the pieces probably under-represent the constructive middle ground—hybrid workflows, assistive decision support, carefully bounded automation, and AI used to widen access rather than replace clinicians—but that’s precisely the opening for your blog’s readers: to articulate what responsible, empirically grounded AI in medicine looks like, in contrast both to Andreessen-style manifestos and to “adds nothing to the human condition” nihilism.

In short: Horton is intentionally anti-hype, sometimes rhetorically sharp, occasionally over-generalizing, but not unserious. He’s not a technical critic of model architectures; he’s a cultural critic warning that we may be building an “edifice of illusions” on top of weak evidence and brittle institutions. For expert readers, the productive move is to accept the core cautions, note where he’s painting with too broad a brush, and then show—using real data and real deployments—how to do better than either pole of the Andreessen ↔ Bender/Hanna spectrum.  // Chat GPT 5