Can AI generate first-draft dermatopathology reports? The model was trained on thousands of paired slides and reports.
See the article by Tran et al. in Nature Communications here.
See the discussion by Luis Cano on LinkedIn here.
(On the Nature page, see also the "similar content being viewed by others" sidebar for related papers.)
Here's the authors' abstract:
Histopathology is the reference standard for diagnosing the presence and nature of many diseases, including cancer. However, analyzing tissue samples under a microscope and summarizing the findings in a comprehensive pathology report is time-consuming, labor-intensive, and non-standardized.
To address this problem, we present HistoGPT, a vision language model that generates pathology reports from a patient’s multiple full-resolution histology images. It is trained on 15,129 whole slide images from 6705 dermatology patients with corresponding pathology reports. The generated reports match the quality of human-written reports for common and homogeneous malignancies, as confirmed by natural language processing metrics and domain expert analysis. We evaluate HistoGPT in an international, multi-center clinical study and show that it can accurately predict tumor subtypes, tumor thickness, and tumor margins in a zero-shot fashion. Our model demonstrates the potential of artificial intelligence to assist pathologists in evaluating, reporting, and understanding routine dermatopathology cases.
###
AI CORNER. What's worth noting?
###
The study presents HistoGPT, a vision-language model trained on over 15,000 dermatopathology slides and their reports, which can automatically generate comprehensive pathology reports from gigapixel whole slide images (WSIs). In clinical evaluations, HistoGPT demonstrated human-level performance in drafting reports and predicting key diagnostic features such as tumor subtype, thickness, and margins, including zero-shot generalization to new institutions and datasets.
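For readers who want a concrete picture of what "generating a report from a slide" means mechanically, here is a minimal sketch in which a small autoregressive text decoder cross-attends to precomputed whole-slide patch embeddings and predicts the next report token. This is an illustration under generic assumptions, not HistoGPT's actual architecture or training setup; all dimensions, layer counts, and the toy vocabulary are made up.

```python
# Minimal sketch (NOT HistoGPT's actual architecture): a small text decoder
# that cross-attends to precomputed WSI patch embeddings to predict the next
# report token. All sizes and names are illustrative assumptions.
import torch
import torch.nn as nn

class ToyReportGenerator(nn.Module):
    def __init__(self, vocab_size=1000, d_model=256, patch_dim=512):
        super().__init__()
        self.patch_proj = nn.Linear(patch_dim, d_model)   # project tile features to decoder width
        self.tok_emb = nn.Embedding(vocab_size, d_model)  # report token embeddings
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)     # next-token logits

    def forward(self, patch_feats, token_ids):
        # patch_feats: (batch, n_patches, patch_dim) precomputed tile embeddings
        # token_ids:   (batch, seq_len) report tokens generated so far
        memory = self.patch_proj(patch_feats)             # slide context for cross-attention
        tgt = self.tok_emb(token_ids)
        seq_len = token_ids.size(1)
        causal = torch.triu(                              # standard causal mask
            torch.full((seq_len, seq_len), float("-inf")), diagonal=1
        )
        hidden = self.decoder(tgt, memory, tgt_mask=causal)
        return self.lm_head(hidden)

# Toy forward pass: 100 patch embeddings for one slide, 5 report tokens so far.
model = ToyReportGenerator()
logits = model(torch.randn(1, 100, 512), torch.randint(0, 1000, (1, 5)))
print(logits.shape)  # torch.Size([1, 5, 1000])
```

A real system of this kind would use a pretrained histology encoder to produce the tile features and a much larger pretrained language decoder; the point here is only the conditioning pattern of image features feeding text generation.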
10 Key Clinical Insights and Surprises:
- Human-level Report Quality: In a blinded comparison, dermatopathologists could not distinguish HistoGPT's reports from human-written ones in 45% of cases and even preferred the AI report in ~15% of cases.
- Zero-Shot Performance: Without any fine-tuning, HistoGPT predicted tumor thickness, margins, and subtypes with meaningful clinical accuracy, showing emergent zero-shot diagnostic capability.
- Tumor Thickness Prediction: The model predicted tumor thickness with a Pearson correlation of 0.52 and an RMSE of 1.8 mm, outperforming specialized models like PLIP and HistoCLIP (a toy metric sketch follows this list).
- Subtype Classification from Text Alone: Even though subtype labels weren't explicitly included in training, HistoGPT inferred BCC subtypes such as "infiltrating" vs. "superficial" from descriptive report text, with an F1 score of 0.63.
- Margin Detection: HistoGPT detected positive tumor margins with an F1 score of 74%, a clinically valuable feature that often requires manual attention.
- Superior to GPT-4V and BioGPT: HistoGPT outperformed GPT-4V and BioGPT-1B on both semantic-similarity and keyword-matching benchmarks for medical report accuracy.
- Generalizes Across Hospitals: In real-world testing at Mayo Clinic, Münster, and Radboud, the AI generated reports rated clinically useful for common skin conditions like BCC and nevus, despite differences in language, scanners, and reporting styles.
- Performs Better on Common Diagnoses: HistoGPT achieved high accuracy on high-prevalence diagnoses (e.g., BCC, nevus) but struggled with rare diseases, inflammatory conditions, and re-excisions, revealing a long-tail training limitation.
- Interpretability via Saliency Maps: The model offers text-to-image attention maps that visually link phrases in its reports to specific histologic regions, supporting pathologist trust and review (a toy overlay sketch also follows this list).
- Failure Mechanisms Resemble Human Errors: Misclassifications often mimicked human diagnostic pitfalls (e.g., melanoma mimicking squamous cell carcinoma), underscoring both the promise and the realistic boundaries of AI in pathology.
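To make the agreement numbers quoted above concrete (Pearson correlation, RMSE, F1), here is a short sketch of how such metrics are typically computed. The arrays are invented toy values, not data from the study, and the paper's actual evaluation pipeline may differ.

```python
# Hedged sketch of the kinds of agreement metrics quoted above (Pearson r,
# RMSE, F1). The values below are made-up illustrations, not study data.
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import f1_score

# Tumor thickness (mm): ground truth vs. values parsed from generated reports.
truth = np.array([0.4, 1.2, 3.5, 0.9, 2.1])
pred  = np.array([0.5, 1.0, 2.8, 1.1, 2.6])
r, _ = pearsonr(truth, pred)                      # linear agreement
rmse = np.sqrt(np.mean((truth - pred) ** 2))      # typical error in mm
print(f"Pearson r = {r:.2f}, RMSE = {rmse:.2f} mm")

# Margin status (1 = positive margin) extracted from report text vs. truth.
y_true = [1, 0, 1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1]
print(f"Margin F1 = {f1_score(y_true, y_pred):.2f}")
```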
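And for the interpretability point, a toy sketch of what a text-to-image attention overlay can look like: a per-tile weight map for one report phrase rendered over a slide thumbnail. The grid, weights, and phrase are synthetic placeholders; HistoGPT's actual attention extraction may work differently.

```python
# Toy saliency overlay: per-tile attention weights for one report phrase
# drawn over a slide thumbnail. All data here is synthetic.
import numpy as np
import matplotlib.pyplot as plt

grid_h, grid_w = 20, 30                           # tiles per slide (toy grid)
thumbnail = np.random.rand(grid_h, grid_w)        # stand-in for the WSI thumbnail
attn = np.random.dirichlet(np.ones(grid_h * grid_w)).reshape(grid_h, grid_w)

plt.imshow(thumbnail, cmap="gray")
plt.imshow(attn, cmap="inferno", alpha=0.5)       # phrase-to-region heatmap
plt.title('Attention for phrase: "basaloid tumor nests"')
plt.axis("off")
plt.savefig("saliency_overlay.png", dpi=150)
```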