## AI CORNER
Here are three “consensus-level” views of where AI in medicine stands and where it’s headed.
- Teo et al. (Nature Medicine, 2025) synthesize the technical and translational state of generative AI (GAI) across healthcare—from model families and training paradigms to deployment patterns—arguing that multimodal, agentic, and reasoning-oriented systems are now capable of assisting with complex, multistage clinical and research tasks.
- Angus et al. (JAMA, 2025) distill deliberations from a multi-stakeholder summit into a policy and infrastructure agenda, emphasizing that AI tools already influence care and operations, yet evaluation and oversight remain fragmented; they propose concrete priorities for measurement, data infrastructure, incentives, and end-to-end engagement.
- Wong et al. for ESMO (Annals of Oncology, 2025) narrow the aperture to oncology, offering a pragmatic, three-tier framework (patient-facing, HCP-facing, and background institutional systems) with Delphi-vetted do’s and don’ts tailored to LLM use in clinics.
Zooming in: Teo et al. (Nature Medicine).
Teo and colleagues anchor GAI within a clear development pipeline: broad pretraining on large corpora; fine-tuning (including RLHF/RLAIF and newer group-relative optimization) for usefulness and safety; and deployment techniques such as retrieval-augmented generation (RAG) for factual grounding—across text, image, audio, video, and multimodal inputs. Their thesis is that foundation, multimodal, and agentic models reduce data-label dependency, exhibit stronger reasoning, and expand from documentation/education into diagnostics, clinical support, and research acceleration. They also situate synthetic-data methods (VAEs, GANs, diffusion) as enablers for privacy-preserving development while noting fidelity and bias pitfalls. The review’s figures and pipeline framing are explicitly translational—mapping how specialty fine-tuning, human feedback, and RAG integrate into service lines from clinical documentation to imaging and integrated decision support. Overall, Teo et al. read as the field’s technical-to-clinical bridge, setting a baseline vocabulary (foundation models, agentic workflows, multimodal RLHF) and a menu of evaluation targets (from preclinical benchmarking to real-world impact).
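To make the RAG pattern concrete, here is a minimal sketch in Python. The toy corpus, the lexical-overlap scorer, and all function names are illustrative assumptions of ours (a production system would use dense embeddings, a vector index, and an actual LLM for generation), but the shape of the loop, retrieve evidence and then ground the prompt in it, is the one Teo et al. describe.

```python
# Minimal sketch of retrieval-augmented generation (RAG).
# The corpus, scorer, and function names are illustrative, not from
# Teo et al.; real systems use dense embeddings and an LLM.
from collections import Counter

CORPUS = [
    "Metformin is first-line therapy for type 2 diabetes.",
    "Annual low-dose CT screening is recommended for high-risk smokers.",
    "Anthracyclines carry a dose-dependent risk of cardiotoxicity.",
]

def score(query: str, doc: str) -> float:
    """Crude lexical-overlap score standing in for embedding similarity."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum((q & d).values())

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k corpus passages most similar to the query."""
    return sorted(CORPUS, key=lambda doc: score(query, doc), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Ground the model's answer in retrieved evidence before generation."""
    evidence = "\n".join(f"- {p}" for p in retrieve(query))
    return f"Answer using only this evidence:\n{evidence}\n\nQuestion: {query}"

print(build_prompt("What is the first-line drug for type 2 diabetes?"))
```

The point of the pattern is that generation is constrained by retrieved evidence rather than by the model's parametric memory alone, which is why Teo et al. frame RAG as a factual-grounding technique.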
Zooming in: Angus et al. (JAMA Summit Report).
Angus and coauthors argue that the classic “develop → evaluate → (maybe) regulate → disseminate → monitor” device pathway doesn’t fit AI, because effects are heavily context- and interface-dependent and often only visible at scale. In response, they propose a pragmatic four-point program aimed at effectiveness (not just safety/compliance):
- Multistakeholder engagement across the life cycle (patients/clinicians in design; regulators, developers, and health systems co-evaluating during deployment).
- Right-sized measurement tools to evaluate health consequences efficiently (beyond certifications), with standards that work during deployment.
- Nationally representative data infrastructure and a learning environment (e.g., sandboxes, federated platforms) enabling rapid, generalizable causal inference.
- Aligned incentives (policy and market) to fund methods, interoperability, and system participation.
Their bottom line: AI will disrupt every part of health and health-care delivery; whether that disruption improves health for all hinges on building an ecosystem for rapid, robust, generalizable evidence of benefits/harms, not just local pilots.
Oncology focus: Wong et al. (ESMO ELCAP).
Wong and the ESMO task force convert the general discourse into oncology-specific guidance, classifying LLM tools into Type 1 (patient-facing), Type 2 (HCP-facing), and Type 3 (background/institutional) systems, each with practical recommendations and consensus “do/don’t” statements vetted by a Delphi process. The guidance highlights opportunities—patient education and symptom management, workflow streamlining, data extraction—while pairing them with cautions on privacy, bias, validation, and the risk of unsupervised use; critically, patient-facing tools should complement—not replace—professional advice, and HCP/background systems require systematic validation, transparency, and continuous monitoring. The result is a pragmatic checklist and taxonomy that oncology departments can adopt as they formalize governance, training, and monitoring for LLMs in real pathways (triage, documentation, multilingual communication, trial matching, billing).
________
What we learn across the three—and the current AI zeitgeist.
Taken together, Teo supplies the technical and translational map; Angus sets the policy/data/incentive architecture to measure real-world benefit; Wong/ESMO operationalizes this in oncology with a simple, actionable schema for deployment and oversight.
The shared through-line is a pivot from “can it work?” to “does it improve outcomes and equity at scale in my setting, under supervision?” That pivot entails (1) adopting foundation/agentic models with retrieval and human-in-the-loop controls (Teo), (2) building national and federated data/measurement layers and aligning incentives to evaluate effectiveness (Angus), and (3) instituting service-line-specific guardrails and governance so LLMs augment, not supplant, clinicians and care teams (ESMO).
Where It’s Going
AI in healthcare is moving toward systems that are multimodal, agentic, and measured. These models will be able to see, read, and reason across data types—from clinical text and imaging to sensor signals and lab results—creating a more complete picture of patient care. They will act as agents that can handle multistep tasks under defined clinical or institutional policies, while continuous and representative monitoring ensures that performance remains tied to real-world, patient-relevant outcomes.
Oncology is poised to remain a leading testbed for these innovations. Its high data density, structured workflows, and strong guideline culture make it well suited for ELCAP-style governance and the kind of measurement framework outlined by Angus and colleagues. Across healthcare, the mandate is clear: design for supervision and proof. That means pairing technical progress with institutional guardrails, transparent evaluation, and national-scale evidence systems—so that the next wave of AI in oncology and diagnostics delivers not just automation, but demonstrable improvements in access, safety, cost, and survival.
______________
Cited sources
- Teo ZL, Thirunavukarasu AJ, et al. “Generative artificial intelligence in medicine.” Nature Medicine 31, Oct 2025. https://doi.org/10.1038/s41591-025-03983-2
- Angus DC, Khera R, et al. “AI, Health, and Health Care Today and Tomorrow: The JAMA Summit Report on Artificial Intelligence.” JAMA (online Oct 13, 2025). https://doi.org/10.1001/jama.2025.18490
- Wong EYT, Verlingue L, et al. “ESMO guidance on the use of Large Language Models in Clinical Practice (ELCAP).” Annals of Oncology (2025). https://doi.org/10.1016/j.annonc.2025.09.001
### Sidebar: Awards Dinner - Who Wins?
Which is the best-written paper, and which the least? And which brings insights that would be hard to find elsewhere?
After reviewing the three papers together, here’s a comparative professional assessment:
Best Written: Teo et al. (Nature Medicine, 2025)
Teo’s paper stands out for clarity, structure, and narrative flow. It translates a vast technical landscape—foundation models, multimodal systems, synthetic data—into an accessible, logically sequenced argument about how generative AI is evolving from single-task prediction to integrative reasoning. Each section builds toward a coherent vision of translation from lab to clinic. The prose is balanced—precise but not dense—and the figures clearly map conceptual transitions. It’s the paper most likely to serve as a durable reference for researchers entering or expanding within medical AI.
Most Insightful / Hard-to-Find Perspective: Angus et al. (JAMA, 2025)
The JAMA Summit Report provides insights that are difficult to find elsewhere, especially its focus on measurement science and incentive design. Whereas most reviews emphasize model architecture or regulation, Angus reframes AI progress as contingent on evaluation infrastructure—representative data, national measurement systems, and “right-sized” metrics for clinical effectiveness. That framing connects policy, economics, and evidence generation in a way few AI papers do. Its pragmatic four-point roadmap (engagement, metrics, data, incentives) transforms abstract governance talk into actionable priorities.
Most Practical but Less Original: Wong et al. (Annals of Oncology, 2025 – ESMO ELCAP)
Wong’s consensus document is clear and useful, especially for clinical implementers, but conceptually less original. It offers a disciplined, oncology-specific framework—patient-facing, clinician-facing, and background systems—grounded in Delphi consensus. Its strength lies in practical guardrails and professional tone; its limitation is that it systematizes existing best practices rather than introducing new paradigms. Its value is normalization: giving hospitals a ready-made taxonomy for safe LLM deployment, even if it is less intellectually ambitious than Teo or Angus.
Summary Judgment
- Best written: Teo et al., Nature Medicine — elegant synthesis, technical depth, and narrative clarity.
- Most insightful / hardest to find elsewhere: Angus et al., JAMA — unique systems-level perspective on evaluation and incentives.
- Least original (though practical): Wong et al., ESMO — valuable operational guide but conceptually derivative.
Together, they form a continuum: Teo defines what AI can do, Angus defines how we must measure and govern it, and Wong defines how to make it safe in daily clinical use.
### Sidebar: What Does “Agentic” Mean?
For readers who want to explore the concept further, see Karunanasake et al., “Agentic Medical Large Language Models: From Responders to Reasoners,” International Journal of Medical Informatics, 2025.
https://doi.org/10.1016/j.ijmedinf.2025.105054
In current AI terminology, “agentic” models are those that go beyond passive response generation. Instead of answering one question at a time, they can plan, execute, and evaluate sequences of actions toward a defined goal.
- In medicine, this could mean a system that not only summarizes a chart but also retrieves guidelines, checks for drug interactions, drafts a note, and asks for clinician approval—all within specified safety constraints.
Karunanasake and colleagues describe agentic AI as combining three capabilities (a toy version is sketched in code after this list):
- Perception and reasoning: Integrating multiple data streams (text, images, signals) into coherent context.
- Planning and policy adherence: Breaking complex tasks into steps while staying within institutional and ethical boundaries.
- Self-monitoring and reflection: Checking its own reasoning or uncertainty before suggesting an action.
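Here is a minimal Python sketch of that loop under stated assumptions: the planner, the task list, and the uncertainty rule are all hypothetical stand-ins of ours, not from Karunanasake et al. A real agent would delegate planning and self-checks to an LLM and call actual clinical tools.

```python
# Illustrative supervised agentic loop: plan, act, self-check, and
# defer to a human before any consequential step. All names and the
# task list are hypothetical, not from Karunanasake et al.

def plan(goal: str) -> list[str]:
    """Break a goal into ordered steps (a real agent would use an LLM)."""
    return ["summarize chart", "retrieve guidelines",
            "check drug interactions", "draft note"]

def act(step: str) -> str:
    """Execute one step; stands in for tool calls (EHR, guideline search)."""
    return f"result of '{step}'"

def confident(result: str) -> bool:
    """Self-monitoring hook: escalate when uncertainty is high (toy rule)."""
    return "interaction" not in result

def run(goal: str) -> None:
    for step in plan(goal):
        result = act(step)
        if confident(result):
            print(f"[{step}] done: {result}")
        else:
            print(f"[{step}] uncertain -> flagged for clinician review")
    print("Draft ready; awaiting clinician approval before filing.")

run("prepare oncology visit note")
```

The design choice worth noticing is the approval gate: the agent can sequence its own steps, but any low-confidence result, and the final output, routes through a human, which is the supervision model the sidebar describes.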
In short, agentic models shift from tools to collaborators, working under human oversight but capable of taking initiative within defined limits. They represent the next stage in moving from “LLMs as chatbots” to “LLMs as supervised assistants” across healthcare and oncology.
### Sidebar: What Is ELCAP?
ESMO Large Language Models in Clinical Application and Practice
According to the Annals of Oncology consensus paper by Wong et al. (2025), ESMO ELCAP stands for the European Society for Medical Oncology Guidance on the Use of Large Language Models in Clinical Practice. It was developed by ESMO’s Real World Data and Digital Health Task Force between November 2024 and February 2025 as a Delphi-based expert consensus on how large language models (LLMs) should be evaluated and deployed safely in oncology.
Core Meaning and Structure
ELCAP’s purpose is to provide a structured, three-tier framework and practical guidance for clinicians, patients, and developers on LLM use in cancer care.
The framework divides applications into:
- Type 1 – Patient-facing systems: Chatbots for symptom queries, education, and lifestyle or symptom tracking.
- Type 2 – Healthcare-professional-facing systems: Clinical decision support, multilingual communication, documentation drafting, and education tools.
- Type 3 – Background institutional systems: Data extraction, alert systems, trial matching, billing, and analytics for research and quality improvement.
Each type is paired with examples and “practical guidance” sections addressing oversight, validation, privacy, transparency, and regulatory heterogeneity.
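To show how a department might encode this taxonomy operationally, here is a toy Python sketch. The tier names follow Wong et al., but the oversight checklist below is an illustrative paraphrase of the guidance’s themes, not verbatim ESMO requirements.

```python
# Toy encoding of the ELCAP three-tier taxonomy as a lookup structure.
# Tier names follow Wong et al.; the governance mapping is illustrative.
from enum import Enum

class ElcapTier(Enum):
    PATIENT_FACING = 1  # Type 1: chatbots, education, symptom tracking
    HCP_FACING = 2      # Type 2: decision support, documentation drafting
    BACKGROUND = 3      # Type 3: data extraction, trial matching, billing

# Hypothetical minimum-oversight checklist keyed by tier.
OVERSIGHT = {
    ElcapTier.PATIENT_FACING: ["complement professional advice", "privacy review"],
    ElcapTier.HCP_FACING: ["systematic validation", "transparency to users"],
    ElcapTier.BACKGROUND: ["continuous monitoring", "audit logging"],
}

def requirements(tier: ElcapTier) -> list[str]:
    """Look up the minimum oversight steps for a tool's tier."""
    return OVERSIGHT[tier]

print(requirements(ElcapTier.HCP_FACING))
```

The value of a structure like this is that governance checks can be attached to the tier, not renegotiated per tool, which is the practical intent of the ELCAP classification.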
Conceptual Goals
ELCAP highlights both opportunities—such as workflow efficiency, patient empowerment, and improved data handling—and risks including privacy breaches, algorithmic bias, and unsupervised or non-validated use.
Its overarching message is that:
- Patient-facing LLMs must complement, not replace, professional care.
- Professional and background systems must undergo systematic validation, continuous monitoring, and human oversight.
In Context
In short, ESMO ELCAP is oncology’s first field-specific, consensus-driven blueprint for LLM governance. It operationalizes the broad AI principles from Teo et al. (Nature Medicine) and Angus et al. (JAMA) into a tangible, oncology-oriented framework—one that can be directly applied in hospital IT systems, tumor boards, and patient-education platforms.