I asked Claude Opus and ChatGPT to build a classification system for the 102 PLA codes on the agenda for this summer's CMS Annual Laboratory Meeting. CMS will price the 102 tests by crosswalk or gapfill between now and November.
I noted that one approach is repeated binary decisions (human vs microbial; then microbial as pathogen panels vs other; and so on), but I emphasized that other classification schemes were possible.
- Here is a cloud zip file with (1) the original CMS Excel, (2) the Claude Excel, and (3) the ChatGPT Excel: three files in one zip.
- Note that both AIs produced complex Excel workbooks with many rows, columns, and tabs. ChatGPT even included a bar chart.
###
AI Attempts to Classify 100 New PLA Codes — Gets Migraine
The AMA CPT Editorial Panel creates special proprietary lab codes called PLA codes, or “Proprietary Laboratory Analyses.” In many cases, these describe tests unique to one laboratory, although some are FDA-approved, kit-distributed tests. There are now roughly 700 PLA codes, but no stable public classification system for them.
For the June 10, 2026 CMS Annual Laboratory Meeting, CMS posted a list of 102 new PLA codes to be priced by crosswalk or gapfill between now and November. The CMS spreadsheet includes a “Category” column, but it has not yet been populated. Public presentations proceed code by code, but by the July expert advisory meeting, CMS will impose some kind of category structure on the group.
So I asked two AI systems — Claude Opus and ChatGPT — to classify the 102 codes using the long descriptor column. As a starting point, I suggested one possible approach: repeated binary decisions, such as human versus microbial; then microbial as pathogen panel versus other; and so on. But I also emphasized that other classification patterns might work better. The challenge will be that PLA codes are not just “oncology” or “infectious disease.” They mix analyte, technology, clinical intent, specimen type, and algorithmic output in ways that do not fit neatly into one tree.
Claude’s Approach
Claude immediately noticed that the first word of the descriptor is a poor classifier. “Oncology” alone captures a large block of the codes, while many other tests scatter into tiny singleton categories. Worse, some infectious disease tests are labeled by pathogen name — for example Babesia or Bartonella — rather than by a consistent infectious disease heading.
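To make the first-word problem concrete, here is a minimal Python sketch of that naive grouping. The five descriptors are invented for illustration (they are not rows from the CMS file), and `leading_term` is a hypothetical helper, not anything Claude actually ran.

```python
from collections import Counter

# Hypothetical descriptors, invented for illustration; not taken from the CMS list.
descriptors = [
    "Oncology (lung), DNA methylation analysis, plasma, algorithm reported as risk score",
    "Oncology (solid tumor), targeted genomic sequence analysis, tissue",
    "Infectious disease (bacterial), DNA, pathogen panel, PCR",
    "Babesia species, antibody detection, serum",
    "Bartonella species, antibody detection, serum",
]


def leading_term(descriptor: str) -> str:
    """Return the text before the first '(' or ',' as the naive bucket label."""
    head = descriptor
    for sep in ("(", ","):
        head = head.split(sep, 1)[0]
    return head.strip()


# Count how many descriptors fall under each leading term.
buckets = Counter(leading_term(d) for d in descriptors)

# Labels that capture exactly one descriptor (the "long tail").
singletons = sorted(label for label, n in buckets.items() if n == 1)

print(buckets.most_common())
print(singletons)
```

Even in this toy list, the two serology tests land in their own pathogen-named buckets instead of joining "Infectious disease," which is the inconsistency Claude flagged.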
Claude’s core conclusion was that the code set is irreducibly multi-dimensional. Method, analyte, specimen, intent, and output all matter. A hereditary cancer panel is germline genetics in an oncology domain. A lung cancer methylation test is epigenetic, oncology-related, and possibly exposure-related. A fetal genome test is prenatal in specimen context but rare-disease-oriented in clinical content. A single classification tree can force clarity, but it may also bury important information.
Claude therefore produced three complementary schemes.
Scheme A — Analyte / Molecular Target Spine
Claude recommended this as the primary classification. The idea is to classify first by what biological material is interrogated, because that is relatively objective and also relevant to crosswalk/gapfill pricing.
The top classes were:
Germline / constitutional DNA
Somatic / tumor nucleic acid
Epigenetic / methylation
RNA / cfDNA signature, non-tumor
Protein / metabolite / small molecule
Cellular / functional
Pathogen / microbial
Host antibody / serology
Fetal / prenatal and identity
This produced a more balanced distribution than simply using the leading phrase of the descriptor. In Claude’s view, that was a sign that the classification was doing real work.
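The balance claim can be checked with simple arithmetic. The sketch below uses the per-class counts Claude reported (quoted in the extracts later in this post) and the 31-code "Oncology" bucket from the first-word grouping; the variable names are mine.

```python
# Class counts as reported in Claude's Scheme A summary (quoted later in this post).
scheme_a = {
    "Germline / constitutional DNA": 20,
    "Somatic / tumor nucleic acid": 20,
    "Epigenetic / methylation": 18,
    "RNA / cfDNA signature, non-tumor": 7,
    "Protein / metabolite / small molecule": 10,
    "Cellular / functional": 6,
    "Pathogen / microbial": 9,
    "Host antibody / serology": 6,
    "Fetal / prenatal and identity": 6,
}

TOTAL_CODES = 102
FIRST_WORD_ONCOLOGY = 31  # size of the "Oncology" bucket under first-word grouping

# Every code should be assigned exactly once.
total = sum(scheme_a.values())

# Share of all codes held by the single largest bucket under each approach.
largest_share_a = max(scheme_a.values()) / TOTAL_CODES
largest_share_first_word = FIRST_WORD_ONCOLOGY / TOTAL_CODES

print(total)
print(f"{largest_share_a:.1%} vs {largest_share_first_word:.1%}")
```

The nine classes sum to 102, and the largest Scheme A class holds about a fifth of the codes, compared with nearly a third for "Oncology" under the first-word grouping.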
Scheme B — Faceted Matrix
Claude also built a faceted system, tagging every code across independent axes:
analyte / method
clinical intent
specimen
result type
This is probably the most powerful structure for real analysis. It allows the user to pivot the same 102 codes by method, intent, specimen, or algorithmic status. For example, algorithmic or MAAA-like tests can be isolated regardless of whether they are oncology, transplant, infectious disease, or risk prediction. Similarly, monitoring tests such as MRD or transplant rejection can be grouped together even if their analytes differ.
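A faceted table like this is easy to picture in code. The following is only a sketch under stated assumptions: the four records use placeholder IDs and made-up tag values, and `TaggedCode` and `pivot` are hypothetical names, not the structure of Claude's workbook.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedCode:
    code: str         # placeholder ID, not a real PLA code
    method: str       # analyte / method facet
    intent: str       # clinical intent facet
    specimen: str     # specimen facet
    result_type: str  # result type facet


# Invented examples; the tag values are assumptions for illustration only.
codes = [
    TaggedCode("EX01", "NGS", "monitoring", "plasma", "algorithmic"),
    TaggedCode("EX02", "methylation", "risk", "blood", "algorithmic"),
    TaggedCode("EX03", "PCR", "detection", "swab", "qualitative"),
    TaggedCode("EX04", "NGS", "monitoring", "plasma", "quantitative"),
]


def pivot(rows, facet):
    """Group code IDs by the value of one facet, leaving the other facets untouched."""
    groups = defaultdict(list)
    for row in rows:
        groups[getattr(row, facet)].append(row.code)
    return dict(groups)


by_intent = pivot(codes, "intent")
by_result = pivot(codes, "result_type")

print(by_intent)
print(by_result)
```

Because each facet is stored independently, the same rows can be regrouped by any axis without reassigning a code, which is what a single category column cannot do.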
Scheme C — Binary Bifurcation Tree
Claude also built the binary tree I had suggested: nucleic acid versus not; human versus pathogen; germline versus acquired; tumor versus epigenetic versus other. It works, and it teaches well. But Claude regarded it as the weakest of the three because too many tests are composites. The tree is useful as a map, but it is not enough as a pricing taxonomy.
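For readers who think in code, the tree can be sketched as a short function. The question order follows Claude's own description (nucleic acid or not, then human or pathogen, then germline or acquired, then tumor, epigenetic, or second-genome); the function name, argument names, and the two example calls are my assumptions.

```python
ACQUIRED_KINDS = {"tumor", "epigenetic", "second-genome"}


def leaf_path(nucleic_acid, human=False, germline=False, acquired_kind=None):
    """Walk the tree and return the list of branch labels down to the leaf."""
    path = ["nucleic acid" if nucleic_acid else "not nucleic acid"]
    if not nucleic_acid:
        return path  # Q1 ends the walk for non-nucleic-acid tests
    path.append("human" if human else "pathogen")
    if not human:
        return path  # Q2 ends the walk for pathogen tests
    path.append("germline" if germline else "acquired")
    if germline:
        return path  # Q3 ends the walk for germline tests
    if acquired_kind not in ACQUIRED_KINDS:
        raise ValueError(f"acquired_kind must be one of {sorted(ACQUIRED_KINDS)}")
    path.append(acquired_kind)  # Q4: last split for acquired changes
    return path


# Two composites discussed in this post, placed by hand for illustration.
hereditary_cancer_panel = leaf_path(nucleic_acid=True, human=True, germline=True)
lung_methylation_test = leaf_path(True, True, False, "epigenetic")

print(hereditary_cancer_panel)
print(lung_methylation_test)
```

Neither leaf path mentions oncology, even though both tests are cancer-related. That is the composite problem in miniature: each test gets one path, and whatever else it is gets stranded.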
ChatGPT’s Approach
ChatGPT built a different but broadly compatible workbook. It classified all 102 codes into a primary nested taxonomy with additional descriptive axes.
Its top-level categories were:
Oncology
Constitutional / inherited genetics
Infectious disease / microbiology
Non-oncology risk / disease activity algorithms
Pharmacogenomics / toxicology / exposure
Metabolic / mitochondrial functional assays
Transplant
Specialty chemistry / hematology / immunology
The workbook also added subcategories, technology family, specimen axis, result/output axis, and flags such as algorithm, cfDNA/liquid biopsy, MRD, methylation, WGS/WES, and antimicrobial resistance or susceptibility testing.
This approach is somewhat more clinical-domain-oriented than Claude’s Scheme A. For example, oncology is kept as a major top-level bucket rather than decomposed first into germline, somatic, methylation, or tumor monitoring. That makes the output intuitive for policy readers, but it risks making “oncology” too large and internally heterogeneous.
ChatGPT also included a taxonomy sheet and a decision-tree sheet, so the workbook was not merely a list of assigned labels. It attempted to describe how CMS or advisors might apply the structure consistently.
Takeaway
The exercise illustrates why PLA codes are difficult to classify. They are not organized around a single axis. A useful taxonomy has to decide whether the first cut should be clinical domain, analyte, method, specimen, clinical intent, or result type. Each choice has advantages.
For pricing policy, Claude’s analyte/method spine may be the most rigorous starting point, because it tracks what the test actually measures and often relates to cost structure. But the faceted approach may be even more useful in practice, because it allows CMS and advisors to separately identify algorithmic tests, methylation tests, MRD tests, infectious disease panels, fetal/prenatal tests, and germline rare-disease tests.
The binary tree is helpful pedagogically, but probably too rigid for actual PLA pricing work. The codes are not a tidy botanical garden. They are more like a rainforest after a thunderstorm: oncology, inherited genetics, methylation, transplant, infectious disease, toxicology, mitochondrial function, and AI risk scores all tangled together. A single category column can be populated, but the better answer is probably a primary category plus several facets.
In other words: yes, AI can classify the 102 PLA codes. But it also discovers why a single CMS “Category” column is a deceptively small box for a very complicated zoo.
### EXTRACTS OF AI "THINKING" ###
Of independent interest, here is some text that Claude showed as it was thinking:
- I have the full code list (~100 PLA codes). Before I build anything, let me work through the structure programmatically so my groupings and counts are exact, tagging each code along several dimensions.
- The raw first-word gives 30 inconsistent buckets (Oncology alone = 31; a long tail of singletons; and note serology tests filed under the pathogen name — "Babesia," "Bartonella" — rather than "Infectious disease").
- That's exactly why a real scheme is needed. Now let me build and apply a structured one.
- The facets populate cleanly and confirm the core finding: this code set is irreducibly multi-dimensional.
- Method splits NGS 35 / methylation 21 / PCR 14 / immunoassay 13 / functional 6 / mass-spec 3; intent splits detection 32 / risk 27 / therapy-selection 13 / monitoring 12 / variant-ID 12.
- No single binary tree captures all of that without forcing arbitrary either/or choices.
- Let me build the deliverable with a clean primary spine plus the facets, applying explicit per-code assignments.
- Clean: only the donor-cfDNA transplant code (X253U) needs a home; it belongs with the transplant family. Let me finalize and build the workbook with all three schemes.
- All 102 codes classified. Let me add a scheme-definitions sheet, format the workbook professionally, and finalize.
I worked through all 102 codes and built three complementary classifications. Here's the thinking, then the applied file.
The leading parenthetical (the "declared domain") is not usable as-is: Oncology swallows 31 codes while the rest scatter into ~29 near-singletons, and it's internally inconsistent — the Lyme/Babesia/Bartonella serologies are filed under the pathogen name rather than "Infectious disease." So any rational scheme has to read past the first word.
The deeper issue is that these codes are genuinely multi-dimensional. A hereditary-cancer panel is a germline analyte sitting in the oncology domain; a lung-cancer methylation test is epigenetic + oncology + smoking-exposure; a fetal genome is rare-disease content on a prenatal specimen. Any single tree forces you to pick one and strand the others. That observation drives the recommendation: don't rely on one axis.
Scheme A — analyte/molecular-target spine (my recommended primary)
I'd populate CMS's Category column off what biological material is interrogated, because it's the most objective, near-MECE first cut and it tracks methodology and cost — which is what a crosswalk/gapfill panel actually weighs. Nine top classes, each with a clinical-family second level:
- Germline / constitutional DNA (20) — broad genome/exome rare-disease, hereditary cancer, targeted panels, PGx, carrier
- Somatic / tumor nucleic acid (20) — CGP/therapy-selection, MRD/ctDNA monitoring, early detection, single-marker
- Epigenetic / methylation (18) — constitutional risk scores, exposure markers, exposure-linked cancer risk
- RNA / cfDNA signature, non-tumor (7) — transplant rejection, expression subtyping, therapy-response
- Protein / metabolite / small molecule (10)
- Cellular / functional (6) — mito enzyme complexes, ektacytometry, tumor-cell enumeration
- Pathogen / microbial (9)
- Host antibody / serology (6)
- Fetal / prenatal & identity (6)
The counts come out balanced (20/20/18/10/9/7/6/6/6) rather than one giant bucket, which is a good sign the cut is doing real work.
Scheme B — faceted matrix (what I'd actually pivot on)
Rather than force the multi-analyte tests into one box, I tagged every code on four independent axes — analyte-method × clinical intent × specimen × result type — so advisors can sort any direction.
The payoff: the result-type axis isolates the 38 algorithmic/MAAA tests, which carry distinct pricing treatment, and the intent axis cleanly separates the 12 monitoring (MRD/rejection) codes from the 27 risk/prognosis codes regardless of analyte.
Scheme C — binary bifurcation tree (the form you floated)
It works and it teaches well: Q1 nucleic-acid vs not → Q2 (if NA) human vs pathogen → Q3 (if human) germline vs acquired → Q4 (if acquired) tumor / epigenetic / second-genome. I included the leaf path for each code. But it's the weakest of the three precisely because of the composites above — which is why I'd lead with A and keep B alongside it.
###
Here's a screenshot of the multi-tab Claude Excel file. Here's a cloud link, but if Google Docs gets confused, see the zip version provided earlier in this blog.
###
Here's what ChatGPT came up with.
- I’ll inspect the workbook structure and the descriptor fields, then build a classification framework and apply it to the codes. I’ll aim for a nested taxonomy plus a simpler top-level alternative, so you can compare what might be most useful for CMS/advisors.
- I’ve confirmed the sheet has 102 code rows and a populated Long Code Descriptor column. The emerging structure is not just “clinical specialty”; several descriptors are better grouped by purpose and biology—oncology/MRD, infectious disease, inherited/constitutional genetics, transplant, PGx/toxicology/exposure, non-oncology algorithmic risk tests, and mitochondrial/functional assays.
Done — I created a working classification workbook with:
- Classified PLA Codes, with all 102 rows classified by:
  - proposed CMS top-level category
  - nested subcategory
  - more granular class
  - clinical area
  - technology family
  - specimen axis
  - result/output axis
  - flags such as algorithm, cfDNA/liquid biopsy, MRD, methylation, WGS/WES, AMR/AST
- Summary — counts by top-level and subcategory, with a chart.
- Taxonomy — definitions, descriptor cues, and example codes for each proposed category.
- Decision Tree — a practical bifurcation-style classifier CMS/advisors could apply consistently