Thursday, October 16, 2025

Journal Club: Explaining a Complex Nature Paper to Me (Ma et al., Intrinsic-hoc bioinformatics)

The current issue of Nature Biotechnology has a paper from Ma et al. (Ohio State), called:

  • Ad hoc, post hoc and intrinsic-hoc in bioinformatics. Ma et al.

Find it at Nature Biotechnol 43:1595, here.  

The title is intriguing, I'll propose, because it sounds good but you can't really tell what the paper is about, and you probably have no idea what "intrinsic-hoc analysis" in bioinformatics is going to mean.

You see that there's no abstract.  Looking ahead, you see there is a really dense graphic:

Eeek.  Click to enlarge (if you dare)

How I Got Help from AI

I actually made several quick trips to AI-Land to try and understand Ma et al.

PROMPT ONE

Does not have abstract. Seems intriguing but TLDR and difficult to see the forest for the trees even for an md phd.

This gave me a fairly long, bullet-point by bullet-point walk-through of the paper.

So I asked for an explanation in paragraph form, which I have often found easier to absorb than 40 or 50 bullets.

PROMPT TWO

Can you offer in paragraph form.

I also found this pretty dense and I asked it to try again and simplify.

PROMPT THREE

That's good, but it's still pretty idea-dense and hard for even an MD PhD to really "grok" and "absorb" and discuss. Can you try again?

That was much better. See this essay-style answer clipped below. But I had one more question: I asked it for some clinical examples to bring the paper's ideas to life.

PROMPT FOUR

Can you offer some actual clinical (or, I suppose, scientific) situations or scenarios that bring these problems and their solution paths to life?

####

Below, I show the results of Prompt Three (the second version in paragraph form) and Prompt Four (the examples).

####

ChatGPT 5 writes:

Sure — here’s a clearer and more conversational version that still respects the science but is much easier to take in and talk about:

ESSAY (Prompt 3)


This paper tries to bring some order to the chaos of modern bioinformatics by sorting most data-analysis methods into three broad styles of thinking: ad hoc, post hoc, and intrinsic-hoc. The authors’ point is that each style has its strengths, its blind spots, and its natural place in the workflow.

An ad hoc analysis is the quick, practical kind that most of us start with—run a t-test, check which genes look different, or use BLAST to find a sequence match. These methods are fast, clear, and easy to explain, but their results can shift a lot depending on which dataset or parameter you use.

A post hoc strategy is what you do when you start worrying that those simple results aren’t stable. You pull together multiple analyses or datasets, compare them, and look for the patterns that hold up across runs. This gives more confidence but adds layers of complexity and makes the results harder to interpret.

Finally, intrinsic-hoc approaches try to avoid those after-the-fact fixes by building biological knowledge into the model itself—for example, structuring a neural network around real metabolic pathways or forcing a protein-docking model to respect molecular symmetry. These are “smart from the start” methods, but they’re also harder to design and less flexible outside their home territory.

The takeaway is simple: start with ad hoc for speed and clarity, move to post hoc when you need reliability, and use intrinsic-hoc when you want the model to think biologically. The authors hope this framework helps researchers be more deliberate about how they analyze data, instead of just piling on new methods whenever the old ones break down.
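To make the "intrinsic-hoc" idea from the essay a little more concrete, here is a toy sketch of a pathway-masked layer: each pathway node may only receive input from its own member genes. This is an illustration only, not the method from Ma et al.; the gene names, pathway names, memberships, and weights are all made up.

```python
# Toy "pathway-aware" layer: a linear layer whose connections are masked so that
# each pathway node only sees its own member genes.
# All names and numbers here are hypothetical placeholders.

def pathway_mask(genes, pathways):
    """Return (pathway_names, mask) where mask[i][j] is 1 if gene i is in pathway j."""
    names = sorted(pathways)
    mask = [[1 if g in pathways[p] else 0 for p in names] for g in genes]
    return names, mask

def pathway_layer(expr, weights, mask):
    """Compute pathway activities as a masked weighted sum of gene expression."""
    n_pathways = len(mask[0])
    return [
        sum(expr[i] * weights[i][j] * mask[i][j] for i in range(len(expr)))
        for j in range(n_pathways)
    ]

genes = ["G1", "G2", "G3"]
pathways = {"P_a": {"G1", "G2"}, "P_b": {"G3"}}  # made-up membership
expr = [1.0, 2.0, 3.0]                            # made-up expression values
weights = [[1.0, 1.0] for _ in genes]             # untrained placeholder weights

names, mask = pathway_mask(genes, pathways)
activity = pathway_layer(expr, weights, mask)
print(dict(zip(names, activity)))
```

In a real model the weights would be learned, but the mask would stay fixed, which is what lets you read each hidden node as a named pathway.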

##

EXAMPLES (Prompt 4)

##

Excellent idea — the paper is abstract, but its argument becomes much clearer when grounded in real-world examples. Here’s how the ad hoc → post hoc → intrinsic-hoc progression might play out in practice, across biomedical and clinical contexts:


1️⃣ Cancer Gene Expression Study

Scenario: You’re studying gene expression in breast cancer to find markers that predict chemotherapy response.

  • Ad hoc: You start with a standard differential expression analysis—say, DESeq2—to identify genes that differ between responders and non-responders. You get a list of “significant” genes, but when you repeat the analysis with another normalization method or dataset, the list changes.

  • Post hoc: To stabilize things, you perform a meta-analysis across several datasets or run multiple algorithms (edgeR, limma, DESeq2) and take the consensus. This gives a more robust set of markers that recur across methods.

  • Intrinsic-hoc: Even the consensus list still includes biologically puzzling genes. So you move to a pathway-aware model—say, a neural network whose structure mirrors known signaling cascades. Now the model not only predicts treatment response but shows which biological processes drive it, yielding insight that’s more interpretable and credible.
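The post hoc step in this scenario ("take the consensus") can be sketched as a simple vote across methods. This is a minimal sketch under stated assumptions: the hit lists are invented placeholders rather than real output from DESeq2, edgeR, or limma, and the function name `consensus_genes` is hypothetical.

```python
from collections import Counter

def consensus_genes(hit_lists, min_votes=2):
    """Return genes flagged by at least `min_votes` methods, sorted by name."""
    # Use set() so a gene listed twice by one method still counts as one vote.
    votes = Counter(g for hits in hit_lists.values() for g in set(hits))
    return sorted(g for g, n in votes.items() if n >= min_votes)

# Invented hit lists, one per method (placeholder gene names)
hits = {
    "DESeq2": ["GENE_A", "GENE_B", "GENE_C", "GENE_X"],
    "edgeR": ["GENE_A", "GENE_B", "GENE_Y"],
    "limma": ["GENE_A", "GENE_C", "GENE_B"],
}

print(consensus_genes(hits, min_votes=3))  # genes all three methods agree on
print(consensus_genes(hits, min_votes=2))  # genes at least two methods agree on
```

Raising `min_votes` trades sensitivity for stability, which is the same tension the post hoc bullet describes.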


2️⃣ Clinical Decision Support for Sepsis

Scenario: You’re developing an AI tool to predict which ICU patients are headed for sepsis.

  • Ad hoc: You build a logistic regression model using the latest ICU data and a handful of lab values (WBC, lactate, blood pressure). It performs well on your hospital’s data but fails when tested elsewhere—too dependent on local quirks.

  • Post hoc: You retrain and validate across multiple hospitals, integrating different EHR systems and datasets. You use ensemble methods to average predictions and remove site bias.

  • Intrinsic-hoc: You embed domain constraints—such as the known sequence of physiological deterioration or organ-system interdependence—directly into a time-series model. Now the algorithm respects clinical logic rather than treating data patterns as free-floating correlations.
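The post hoc step here ("ensemble methods to average predictions") might look roughly like the sketch below, which averages the risk from several per-hospital logistic models. Everything numeric is an assumption: the coefficients, intercepts, and patient values are made up for illustration and have no clinical meaning, and the function names are hypothetical.

```python
import math

def logistic_risk(coefs, intercept, features):
    """Risk from one logistic model: sigmoid(intercept + sum(coef * feature))."""
    z = intercept + sum(coefs[name] * features[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-z))

def ensemble_risk(site_models, features):
    """Average the predicted risk across per-site models."""
    risks = [logistic_risk(m["coefs"], m["intercept"], features) for m in site_models]
    return sum(risks) / len(risks)

# Made-up per-hospital models (coefficients are NOT clinically meaningful)
site_models = [
    {"intercept": -2.0, "coefs": {"lactate": 0.8, "wbc": 0.05}},
    {"intercept": -1.5, "coefs": {"lactate": 0.6, "wbc": 0.10}},
    {"intercept": -2.5, "coefs": {"lactate": 1.0, "wbc": 0.02}},
]

# Made-up patient values, assumed already scaled the same way at every site
patient = {"lactate": 2.0, "wbc": 12.0}

per_site = [logistic_risk(m["coefs"], m["intercept"], patient) for m in site_models]
pooled = ensemble_risk(site_models, patient)
print(per_site, pooled)
```

The averaged risk always falls between the most and least alarmed site, which is how the ensemble damps any single hospital's local quirks.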


3️⃣ Variant Interpretation in Genomics

Scenario: A lab receives an uncertain variant in a rare disease gene.

  • Ad hoc: The analyst runs standard pathogenicity tools (PolyPhen, SIFT) and gets conflicting predictions.

  • Post hoc: They combine several tools, apply a consensus scoring system (like REVEL or CADD), and integrate data from population frequency and conservation metrics to reach a more robust judgment.

  • Intrinsic-hoc: In research mode, they move toward a model that builds in known protein structure, domain function, and evolutionary constraints—so the algorithm’s reasoning aligns with molecular biology, not just statistical correlation.


4️⃣ Drug Repurposing in COVID-19

Scenario: Early in the pandemic, researchers wanted to find existing drugs that might block viral replication.

  • Ad hoc: They screened public transcriptomic datasets for genes up- or down-regulated by infection, then matched them with drug-response signatures. The results were noisy and inconsistent.

  • Post hoc: Teams began pooling multiple transcriptomic and proteomic datasets, using consensus ranking to identify common drug targets.

  • Intrinsic-hoc: Later models integrated viral life-cycle biology directly into their networks—embedding host–virus interaction maps and known drug–target mechanisms—producing more credible predictions like remdesivir and baricitinib.


💡 Takeaway

These examples show how the paper’s categories aren’t academic hair-splitting—they describe the real evolution of rigor in biomedical analysis.

  • Ad hoc = quick, often fragile insights.

  • Post hoc = robustness through integration.

  • Intrinsic-hoc = embedding biological meaning into the computation itself.

Each step costs more effort but yields more trustworthy, interpretable science—exactly the kind clinicians and translational researchers need.