Saturday, September 23, 2017

PAMA Raw Data Provides Unprecedented Instant View of US Lab Industry - For Free

On September 22, 2017, CMS released PAMA median pricing data that will be used to reset Medicare's national fee schedule for lab tests - both conventional tests and genomics.  (Here).

CMS also provides ALL the raw data in an open access web interface that allows filtering and data pulls.   The data website is here.  
(This is the same data I have reported on and used for physician and lab Medicare Part B claims, e.g. here.  I have some instructions for using the Part B data set, here; the PAMA dataset is nearly the same, so the Part B instructions will help you understand the PAMA web interface.) 
Between the Medicare 2015 Part B claims data online and the new PAMA data for 1H2016, anyone can have an unprecedented view of the US lab industry - for free.   

Imputing National Volume Data from CMS Part B Public Data and New PAMA Public Data

For example, we already knew that in CY2015 data from CMS, there were about 19,000 uses of BRCA full sequencing under code 81211 (here).

I made a simple data dump of the new 1H2016 PAMA private payer data for BRCA and stored it in the cloud here.  It shows 94,977 uses of 81211 paid by private payers.   (Remember, that's for 1H2016, a half year).  

This suggests that if we add CMS Part B 2015 data (19,000 uses) and we "double" the PAMA 1H2016 data to account for a full year (about 190,000 uses), there are about 209,000 uses of BRCA sequencing 81211 per year, and Medicare cases are a little under 10% of the annual total of BRCA sequencing cases.
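The imputation above is simple enough to check in a few lines. This is only a sketch using the two counts quoted in this post; the variable names are mine, and doubling the half-year figure assumes the second half of 2016 looked like the first.

```python
# Counts quoted in the post for BRCA full sequencing, code 81211.
medicare_cy2015_uses = 19_000   # CMS Part B, CY2015 (approximate)
private_1h2016_uses = 94_977    # PAMA private payer data, 1H2016 (half year)

# Assumption: 2H2016 private payer volume equals 1H2016 volume.
private_annual_uses = 2 * private_1h2016_uses
total_annual_uses = medicare_cy2015_uses + private_annual_uses
medicare_share = medicare_cy2015_uses / total_annual_uses

print(f"Imputed private payer uses per year: {private_annual_uses:,}")
print(f"Imputed total uses per year: {total_annual_uses:,}")
print(f"Medicare share of total: {medicare_share:.1%}")
```

Note that this mixes a 2015 Medicare count with a 2016 private payer count, so it is a first-cut estimate rather than a same-year total.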

Wild West of Price Ranges

You also see bizarre stuff.  For example, the table reports 22 payments for BRCA sequencing 81211 at $44,207.   And 27 payments at $30,969.   And 28 payments at $17,612.  But PAMA data also shows dozens of paid claims for 81211 trickling in at the triple digits ($150, $149, $145).

The density distribution of BRCA 81211 price points is non-normal (non-Gaussian):

Sole-Source Labs

You don't see this detailed price distribution data for sole-source labs.  For example, if you filter the raw data for 81519, Oncotype DX Breast, you don't get any of the raw data - neither volumes nor prices.  

Nonetheless, in the PAMA zip file of data, CMS does provide summary statistics such as 25th percentile and 75th percentile for all tests, including sole source proprietary tests.  

For Oncotype DX 81519, this file shows a 2017 current Medicare price of $3443, with PAMA new weighted median of $3873, and a 25th/75th percentile range of ($3713, $4153), which is pretty tight.  However, CMS also tells us the minimum reported payment for 81519 was $1 and the maximum was $38,632 (click to enlarge):

Wild World of Clinical Chemistry Price Ranges

The spread and strangeness aren't limited to genetics.  I checked a routine clinical chemistry code, 86003 (blood-based IgE allergy testing).  While the PAMA price will eventually fall to around $4.50 from around $7 today, there were dozens of PAMA reported payments exceeding $500 per test, and some even exceeding $1000.  

Did a Few Labs Report Total Revenue and Number of Tests, Rather Than Per-Test Price?

Possibly, some labs misread the instructions and reported total revenue and number of tests, rather than price and number of tests.  For example, a lab might have gotten paid for 10 tests at $10 and reported "10 tests, $100" to CMS.   This could have happened:

But in other cases, like for G0483 testing (drug test panel), there are many cases where only N=1 test is reported at astronomical pricing like $5,000:

Also, I noted in other analyses that some codes (like CYP testing and drug tox testing) seemed more prone to super-high prices while others (like cystic fibrosis testing) did not.   Drug screen 80303 had a median price of $623 but a 75th percentile price of $5812 and an average price of $2687, suggesting a highly skewed distribution of pricing.
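To see why an average far above the median signals a skewed distribution, here is a small sketch of volume-weighted percentiles. The price and volume rows are hypothetical, not PAMA data, and the percentile definition used (the lowest price at which cumulative volume reaches the target share) is one common convention; CMS's exact method may differ.

```python
def weighted_percentile(rows, q):
    """Return the lowest price at which cumulative volume reaches q of the total.

    rows: iterable of (price, volume) pairs; q: fraction between 0 and 1.
    """
    ordered = sorted(rows)
    total = sum(volume for _, volume in ordered)
    running = 0
    for price, volume in ordered:
        running += volume
        if running >= q * total:
            return price
    raise ValueError("rows must be non-empty with positive volume")


def weighted_mean(rows):
    """Volume-weighted average price."""
    rows = list(rows)
    total = sum(volume for _, volume in rows)
    return sum(price * volume for price, volume in rows) / total


# Hypothetical (price, volume) rows with a long right tail.
rows = [(150, 40), (600, 30), (2500, 15), (6000, 10), (30000, 5)]

median_price = weighted_percentile(rows, 0.50)
p75_price = weighted_percentile(rows, 0.75)
mean_price = weighted_mean(rows)
print(median_price, p75_price, mean_price)
```

In this made-up example a handful of very high payments pulls the mean to several times the median, which is the same pattern as the 80303 figures above.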

In addition, I've been told that $10,000 and $20,000 and higher bills for lab tests do occur, so they don't have to be explained by mis-numbered multiples.
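One rough way to test the total-revenue hypothesis is to divide each suspiciously high reported price by its reported volume and see whether the result lands near the typical price for that code. The sketch below does this on hypothetical rows; the function name and both thresholds (3x the median to count as "high", 25% tolerance to count as "near") are arbitrary assumptions of mine, not anything CMS uses.

```python
from statistics import median


def flag_possible_totals(rows, high_multiple=3.0, tolerance=0.25):
    """Flag (price, volume) rows that look like total revenue, not a per-test price.

    A row is flagged when its volume is above 1, its reported price is far
    above the median reported price, and price / volume is close to that median.
    """
    typical = median(price for price, _ in rows)
    flagged = []
    for price, volume in rows:
        if volume <= 1 or price <= high_multiple * typical:
            continue
        per_test = price / volume
        if abs(per_test - typical) <= tolerance * typical:
            flagged.append((price, volume))
    return flagged


# Hypothetical rows: most near $10, one "10 tests, $100", one N=1 at $5,000.
rows = [(10.0, 50), (9.5, 40), (10.5, 30), (11.0, 20), (100.0, 10), (5000.0, 1)]
flagged = flag_possible_totals(rows)
print(flagged)
```

The "10 tests, $100" row is flagged, but the single test at $5,000 is not, because with N=1 there is no multiple to divide out. That matches the point above: misreported totals can explain some outliers, not all of them.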

Very high payment tests, though a minority of all tests, could drive industry profitability (here).

Sifting the Large Data

Going back to the original CMS data website, in a few minutes I produced a table online of all CLFS codes, rolled up by HCPCS code (so the 5M lines become 1300 lines), then rolled up by Volume/Sum for each line and Price/Average for each line.  I exported this as a CSV file and quickly saved it as an Excel workbook file.  I then multiplied volume x price to create a new column of "dollar volume."  I then doubled the utilization and payment columns to impute a full-year volume.   This summed up to $20B, which is about one-third of the US lab industry if you assume the latter is 2% of our $3T healthcare system.[*]    Is this perfect data?  No.  Is it fast?  Yes - what I just described took a minute or two.   The online sort and window function looked like this:

A data view of the Excel (262 kb) that I just described looks like this:

For example, the top ten codes add up to 41% of the $20B annual dollar volume in the data.  (Note, this is illustrative only; I think the online rollup function calculated an average price based on the number of prices reported, i.e. the number of data lines, not a volume-weighted average price, so the totals and multiplications aren't exact.)  The point is you can get first-cut answers in seconds out of 5M lines of data.
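The same rollup can be sketched offline, which also shows how much the simple-average caveat can matter. Everything here is hypothetical: the codes, prices, and volumes are made up, and the row layout (code, price, volume) is an assumption rather than the actual column layout of the CMS export.

```python
from collections import defaultdict

# Hypothetical half-year rows: (code, reported price, volume at that price).
rows = [
    ("CODE_A", 10.0, 900),
    ("CODE_A", 100.0, 100),
    ("CODE_B", 50.0, 200),
    ("CODE_B", 70.0, 200),
]


def rollup(rows):
    """Roll up price/volume lines to one summary per code."""
    groups = defaultdict(list)
    for code, price, volume in rows:
        groups[code].append((price, volume))

    summary = {}
    for code, lines in groups.items():
        volume = sum(v for _, v in lines)
        simple_avg = sum(p for p, _ in lines) / len(lines)  # one vote per data line
        dollars = sum(p * v for p, v in lines)              # exact dollar volume
        summary[code] = {
            "volume": volume,
            "simple_avg_price": simple_avg,
            "weighted_avg_price": dollars / volume,
            "dollars_from_simple_avg": volume * simple_avg,
            "dollars_exact": dollars,
        }
    return summary


summary = rollup(rows)
for code, s in summary.items():
    # Doubling imputes a full year from half-year data.
    print(code, 2 * s["dollars_from_simple_avg"], 2 * s["dollars_exact"])
```

For CODE_A, where most volume sits at the low price, the simple-average shortcut overstates dollar volume almost threefold; for CODE_B, where volume is spread evenly, the two methods agree. So the shortcut is fine for a first look, but codes with a few high-priced, low-volume lines are worth recomputing with weights.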

The spreadsheet is in the cloud here.