Discoveries in Health Policy: Journal Club: AI More Accurate than Radiologists, Yet Combined Results Not That Great?

Tuesday, August 1, 2023

TAKE HOME LESSON:

This is an interesting study, presented so far as a working white paper but presumably planned for peer review. In some contexts, AI can be more accurate than radiologists. This would suggest that if you start with radiologists and add AI, the combined accuracy should go up.

Not necessarily so, say these authors (summary: Uh-oh). Note that the conclusions apply to a particular situation at a particular point in time. For an article with the opposite viewpoint, see here.

See a tweet by thought leader Ethan Mollick here:

https://twitter.com/emollick/status/1686176146700857344

"In this study, AI was more accurate than two thirds of radiologists, yet when radiologists had AI help their diagnoses did not improve. Why? Humans ignored the AI’s advice when it conflicted with their views. A big barrier to future human-AI collaboration..."

See the underlying preprint white paper (Agarwal et al.) here:

https://blueprintcdn.com/wp-content/uploads/2023/07/Blueprint-Discussion-Paper-2023.10-Agarwal-Moehring-Rajpurkar-Salz_2.pdf

The source, Blueprint Labs at MIT, describes itself as "a non-partisan research lab based at MIT with affiliates at universities and institutions across the world."

See the Agarwal abstract here:

While Artificial Intelligence (AI) algorithms have achieved performance levels comparable to human experts on various predictive tasks, human experts can still access valuable contextual information not yet incorporated into AI predictions. Humans assisted by AI predictions could outperform both human-alone or AI-alone.
We conduct an experiment with professional radiologists that varies the availability of AI assistance and contextual information to study the effectiveness of human-AI collaboration and to investigate how to optimize it.
Our findings reveal that (i) providing AI predictions does not uniformly increase diagnostic quality, and (ii) providing contextual information does increase quality. Radiologists do not fully capitalize on the potential gains from AI assistance because of large deviations from the benchmark Bayesian model with correct belief updating.
The observed errors in belief updating can be explained by radiologists’ partially underweighting the AI’s information relative to their own and not accounting for the correlation between their own information and AI predictions.
In light of these biases, we design a collaborative system between radiologists and AI. Our results demonstrate that, unless the documented mistakes can be corrected, the optimal solution involves assigning cases either to humans or to AI, but rarely to a human assisted by AI.
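
To make the two updating errors named in the abstract a bit more concrete, here is a toy sketch in Python. It is not the paper's model. It assumes the radiologist's read and the AI score are each an unbiased, noisy estimate of the same underlying quantity, with correlated noise, and it compares an optimally weighted average against one that ignores the correlation and one that underweights the AI. The function names and all the numbers are hypothetical, chosen only for illustration.

```python
def optimal_ai_weight(sd_human, sd_ai, rho):
    """Weight on the AI signal that minimizes the error variance of
    w * ai + (1 - w) * human, for two unbiased signals whose noise
    has correlation rho. Undefined if both signals are identical
    (equal standard deviations and rho == 1)."""
    cov = rho * sd_human * sd_ai
    return (sd_human**2 - cov) / (sd_human**2 + sd_ai**2 - 2 * cov)


def combined_variance(w_ai, sd_human, sd_ai, rho):
    """Error variance of the weighted average for a given AI weight."""
    cov = rho * sd_human * sd_ai
    return (w_ai**2 * sd_ai**2
            + (1 - w_ai)**2 * sd_human**2
            + 2 * w_ai * (1 - w_ai) * cov)


# Hypothetical inputs: the AI is somewhat less noisy than the human,
# and their errors are positively correlated.
SD_HUMAN, SD_AI, RHO = 1.0, 0.8, 0.5

scenarios = {
    "human alone": 0.0,
    "AI alone": 1.0,
    "optimal weighting": optimal_ai_weight(SD_HUMAN, SD_AI, RHO),
    # Treats the two signals as independent when they are not.
    "ignores correlation": optimal_ai_weight(SD_HUMAN, SD_AI, 0.0),
    # Arbitrary low weight standing in for underweighting the AI.
    "underweights AI": 0.3,
}

for label, w in scenarios.items():
    var = combined_variance(w, SD_HUMAN, SD_AI, RHO)
    print(f"{label:>20}: AI weight {w:.2f}, error variance {var:.3f}")
```

With these made-up inputs, the optimally weighted combination beats both the human alone and the AI alone, while the reader who underweights the AI ends up better than the human alone but worse than the AI alone. That is the general flavor of the abstract's claim, under the assumptions stated here.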

BQ: I think they are reading too much into their data, and that the finding is confined to a particular scenario at a particular point in time.

For a ChatGPT summary of the paper's intro and conclusions, see here. The body of the paper is densely mathematical.