Claims Denials Analysis#
Data from: https://www.cms.gov/marketplace/resources/data/public-use-files
Sheet IDs
2 = Individual Marketplace Medical Qualified Health Plans (QHPs)
3 = Individual Marketplace Stand Alone Dental Plans (SADPs)
4 = Small Business Health Options Program (SHOP) Medical and Dental Plans
import polars as pl
import pandas as pd
import altair as alt
alt.data_transformers.disable_max_rows()
# For some reason, I can't get Polars to skip the header rows properly,
# so I'm reading with pandas and converting to Polars for the reader.
df = pl.from_pandas(pd.read_excel(
"data/Transparency_in_Coverage_PUF.xlsx",
sheet_name=1,
skiprows=2,
na_values=["**", "*"]))
df.head(5)
State | Issuer_Name | Issuer_ID | Is_Issuer_New_to_Exchange? (Yes_or_No) | SADP_Only? | Plan_ID | QHP/SADP | Plan_Type | Metal_Level | URL_Claims_Payment_Policies | Issuer_Claims_Received | Issuer_Claims_Denials | Issuer_Internal_Appeals_Filed | Issuer_Number_Internal_Appeals_Overturned | Issuer_Percent_Internal_Appeals_Overturned | Issuer_External_Appeals_Filed | Issuer_Number_External_Appeals_Overturned | Issuer_Percent_External_Appeals_Overturned | Plan_Number_Claims_Received | Plan_Number_Claims_Denied | Plan_Number_Claims_Denied_Referral_Required | Plan_Number_Claims_Denied_Out_of_Network | Plan_Number_Claims_Denied_Services_Excluded | Plan_Number_Claims_Denied_Not_Medically_Necessary_Excl_Behavioral_Health | Plan_Number_Claims_Denied_Not_Medically_Necessary_Behavioral_Health_Only | Plan_Number_Claims_Denied_Other | Rate_Review | Financial_Information | Average Monthly Enrollment | Average Monthly Disenrollment |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
str | str | i64 | str | str | str | str | str | str | str | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | str | str | f64 | f64 |
"AK" | "Premera Blue C… | 38344 | "No" | "No" | "38344AK1060001… | "QHP" | "PPO" | "Gold" | "https://www.pr… | 603348.0 | 143208.0 | 2156.0 | 1079.0 | 50.05 | null | null | null | 259304.0 | 59633.0 | 11433.0 | 143.0 | 6071.0 | 818.0 | null | 89942.0 | "https://ratere… | "https://www.na… | 6428.0 | 728.0 |
"AK" | "Premera Blue C… | 38344 | "No" | "No" | "38344AK1060002… | "QHP" | "PPO" | "Silver" | "https://www.pr… | 603348.0 | 143208.0 | 2156.0 | 1079.0 | 50.05 | null | null | null | 100437.0 | 26239.0 | 4720.0 | 95.0 | 2663.0 | 182.0 | null | 36773.0 | "https://ratere… | "https://www.na… | 2109.0 | 232.0 |
"AK" | "Premera Blue C… | 38344 | "No" | "No" | "38344AK1060004… | "QHP" | "PPO" | "Bronze Expande… | "https://www.pr… | 603348.0 | 143208.0 | 2156.0 | 1079.0 | 50.05 | null | null | null | 115878.0 | 26136.0 | 4786.0 | 20.0 | 2765.0 | 127.0 | null | 43716.0 | "https://ratere… | "https://www.na… | 3605.0 | 479.0 |
"AK" | "Premera Blue C… | 38344 | "No" | "No" | "38344AK1070002… | "QHP" | "PPO" | "Bronze Expande… | "https://www.pr… | 603348.0 | 143208.0 | 2156.0 | 1079.0 | 50.05 | null | null | null | 75481.0 | 20063.0 | 1322.0 | 37.0 | 1586.0 | 87.0 | null | 31298.0 | "https://ratere… | "https://www.na… | 3139.0 | 391.0 |
"AK" | "Premera Blue C… | 38344 | "No" | "No" | "38344AK1080001… | "QHP" | "PPO" | "Gold" | "https://www.pr… | 603348.0 | 143208.0 | 2156.0 | 1079.0 | 50.05 | null | null | null | null | null | null | null | null | null | null | null | "https://ratere… | "https://www.na… | null | null |
Tackling datasets like this#
It’s often overwhelming to tackle data when you have no idea how it’s structured.
But that’s exactly what data visualizations are for: exploration.
Drop unnecessary columns#
At first glance, I can tell the Excel file already separates the plan types, so several columns are unnecessary:
QHP/SADP: the sheets already split these apart
SADP_Only?: not useful since we’re not working with dental plans
Issuer_ID: most likely not useful
URL_Claims_Payment_Policies: for now, I don’t need the link
working_df = df.drop("SADP_Only?", "Issuer_ID", "URL_Claims_Payment_Policies",
"Is_Issuer_New_to_Exchange? (Yes_or_No)",
"Rate_Review", "Financial_Information")
working_df.columns
['State',
'Issuer_Name',
'Plan_ID',
'QHP/SADP',
'Plan_Type',
'Metal_Level',
'Issuer_Claims_Received',
'Issuer_Claims_Denials',
'Issuer_Internal_Appeals_Filed',
'Issuer_Number_Internal_Appeals_Overturned',
'Issuer_Percent_Internal_Appeals_Overturned',
'Issuer_External_Appeals_Filed',
'Issuer_Number_External_Appeals_Overturned',
'Issuer_Percent_External_Appeals_Overturned',
'Plan_Number_Claims_Received',
'Plan_Number_Claims_Denied',
'Plan_Number_Claims_Denied_Referral_Required',
'Plan_Number_Claims_Denied_Out_of_Network',
'Plan_Number_Claims_Denied_Services_Excluded',
'Plan_Number_Claims_Denied_Not_Medically_Necessary_Excl_Behavioral_Health',
'Plan_Number_Claims_Denied_Not_Medically_Necessary_Behavioral_Health_Only',
'Plan_Number_Claims_Denied_Other',
'Average Monthly Enrollment',
'Average Monthly Disenrollment']
Wide Dataframes#
Looking at the column names, you can see it’s one of those “wide” dataframes that are good for reporting and traditional data visualization.
Each plan gets a row of data, and each metric corresponds to a column.
Since we have a lot of plans/rows, the best bet is to start by plotting our first metric, Plan_Number_Claims_Received, per plan.
EDA: Scatter Graph#
The first thing I’d like to see is the number of claims for each plan, broken out by state, since State has few enough values that it definitely won’t overload one axis.
alt.Chart(working_df).mark_circle().encode(
x=alt.X("State:O"),
y=alt.Y("Plan_Number_Claims_Received:Q")
)
Note: Missing States#
Hmm, we’re missing some states; the x-axis clearly doesn’t have 50 states. Where are NY, NJ, CT… all the Northeast? (I’m from NY.)
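A quick way to confirm exactly which states are absent (this check is my addition; the 50-state list and the toy `present` set are illustrative, where the notebook would build `present` from `working_df`):

```python
# Which of the 50 states never appear? "present" is a toy stand-in --
# in the notebook it would be:
#   present = set(working_df.get_column("State").unique().to_list())
all_states = {
    "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA",
    "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD",
    "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ",
    "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC",
    "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY",
}
present = {"AK", "AL", "AR"}  # toy subset for illustration

missing = sorted(all_states - present)
print(len(missing))     # 47
print("NY" in missing)  # True
```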
Issuer vs. Plan-level#
Because there are two levels of data here, we pretty much have to work with two different dataframes.
Let’s start with Issuer
Issuer Level Data#
issuer_level_df = (
working_df
.filter(pl.col("Issuer_Claims_Received").is_not_null())
.group_by("State", "Issuer_Name", "Plan_Type")
.agg(
pl.first("Issuer_Claims_Received"),
pl.first("Issuer_Claims_Denials"),
pl.first("Issuer_Internal_Appeals_Filed"),
pl.first("Issuer_Number_Internal_Appeals_Overturned"),
pl.first("Issuer_Percent_Internal_Appeals_Overturned"),
pl.col("Metal_Level")
)
)
issuer_level_df.head(5)
State | Issuer_Name | Plan_Type | Issuer_Claims_Received | Issuer_Claims_Denials | Issuer_Internal_Appeals_Filed | Issuer_Number_Internal_Appeals_Overturned | Issuer_Percent_Internal_Appeals_Overturned | Metal_Level |
---|---|---|---|---|---|---|---|---|
str | str | str | f64 | f64 | f64 | f64 | f64 | list[str] |
"AK" | "Premera Blue C… | "PPO" | 603348.0 | 143208.0 | 2156.0 | 1079.0 | 50.05 | ["Gold", "Silver", … "Bronze"] |
"AL" | "Blue Cross and… | "PPO" | 7.557971e6 | 1.710597e6 | 1974.0 | 569.0 | 28.82 | ["Gold", "Silver", … "Bronze Expanded"] |
"AL" | "Bright Health … | "EPO" | 99090.0 | 6396.0 | 20.0 | 0.0 | 0.0 | ["Gold", "Bronze Expanded", … "Silver"] |
"AR" | "QCA Health Pla… | "POS" | 1.954143e6 | 362988.0 | 297.0 | 135.0 | 45.45 | ["Silver", "Silver", … "Gold"] |
"AR" | "HMO Partners, … | "POS" | 919095.0 | 152474.0 | 311.0 | 68.0 | 21.86 | ["Silver", "Gold", … "Silver"] |
View 1: Issuer-level Viz#
Let’s get started. I want to look at the distribution of denial rates.
One of the cool things built into Altair is the ability to do transformations and calculations in Altair’s own syntax, reducing the data modeling required before visualizing.
Add State-to-Region mapping#
Create a Polars DataFrame with the state abbreviation and region data
state_region = pl.DataFrame({
"State": ["AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA",
"HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD",
"MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ",
"NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC",
"SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"],
"region": ["South", "West", "West", "South", "West", "West", "Northeast", "South", "South", "South",
"West", "West", "Midwest", "Midwest", "Midwest", "Midwest", "South", "South", "Northeast", "South",
"Northeast", "Northeast", "Midwest", "South", "Midwest", "West", "Midwest", "West", "Northeast", "Northeast",
"West", "Northeast", "South", "Midwest", "Midwest", "South", "West", "Northeast", "Northeast", "South",
"Midwest", "South", "West", "Northeast", "South", "West", "West", "South", "Midwest", "West"]
})
issuer_level_df = issuer_level_df.join(state_region, on=["State"], how='left')
View 2: Bubble Charts, sized by Denial Rates#
I know I’m encoding percent denial twice (generally a no-no), but since the chart is expected to be crowded, it’ll help me develop the visualization further.
percent_denial_fig = (
alt.Chart(issuer_level_df.to_pandas()).mark_circle().encode(
x=alt.X("Issuer_Claims_Received:Q").title("Total Number of Claims"),
y=alt.Y("percent_denial:Q").title("Percentage of Claims Denied (%)"),
size=alt.Size("percent_denial:Q").title(None),
color=alt.Color("State:N"),
tooltip=["State:N", "Issuer_Name:N", "Plan_Type:N", "Metal_Level:N"],
column="region:N",
)
.transform_flatten(
["Metal_Level"]
)
.transform_calculate(
percent_denial = "datum.Issuer_Claims_Denials / datum.Issuer_Claims_Received * 100"
)
)
percent_denial_fig
Honestly, this still doesn’t separate well, since there are so many dimensions.
One of my curiosities is comparing mean denial rates across states. This is harder than you’d think: a simple mean of issuer-level rates would ignore claim volume, so instead I’ll aggregate total denials over total claims per state.
Leveraging “Selectors” in Altair#
Selectors allow Altair to “communicate” selections across different figures, even if they are completely different charts.
There’s a bit of a learning curve, but once a selection like state_selection is bound to a column (in this case, State), you can use state_selection pretty universally across Altair’s expressions:
.add_params(state_selection)
.transform_filter(state_selection)
alt.condition(state_selection, ...)
state_selection = alt.selection_point(fields=['State'])
state_level_average = (
alt.Chart(issuer_level_df.to_pandas()).mark_bar().encode(
x=alt.X("mean_percent_denial:Q").title("Percentage of Claims Denied (%)"),
y=alt.Y("State:N").title("State").sort("-x"),
color=alt.condition(
state_selection,
alt.Color("region:N"),
alt.value("gray")
)
)
.transform_aggregate(
total_claims = "sum(Issuer_Claims_Received)",
total_denials = "sum(Issuer_Claims_Denials)",
groupby = ["State", "region"]
)
.transform_calculate(
mean_percent_denial = "datum.total_denials / datum.total_claims * 100"
)
.add_params(state_selection)
)
state_level_average
# This is likely a bad habit but it makes layout easier.
View 3: Add State-level Deniers (Top 10)#
Since it’s tough to show ALL issuers, let’s limit the view to the top deniers.
One of the key tools here is to rank and then filter using Altair transforms.
top_issuers = (
alt.Chart(issuer_level_df.to_pandas()).mark_bar().encode(
x=alt.X("Issuer_Name:N").sort("-y"),
y=alt.Y("percent_denial:Q").stack(None)
).transform_aggregate(
claims_received = "mean(Issuer_Claims_Received)", # There is no "drop duplicates" related to Metal (Gold/Silver/Bronze)
claims_denied = "mean(Issuer_Claims_Denials)", # There is no "drop duplicates" related to Metal (Gold/Silver/Bronze)
groupby=["Issuer_Name", "State", "region"]
).transform_calculate(
percent_denial = "datum.claims_denied / datum.claims_received * 100"
).transform_filter(
alt.datum.claims_received != None  # use the post-aggregate field name
)
)
top_issuers
Create an interactive chart, piecing view 1, view 2, view 3 together#
Bad habit#
(
(state_level_average.encode(
color=alt.condition(
state_selection,
alt.Color("region:N"),
alt.value("lightgray")
),
opacity=alt.condition(
state_selection,
alt.value(1.0),
alt.value(0.2)
)
).properties(
width=100,
height=500
)
) |
(
percent_denial_fig.add_params(state_selection).encode(
color=alt.condition(
state_selection,
alt.Color("region:N"),
alt.value("lightgray")
),
opacity=alt.condition(
state_selection,
alt.value(1.0),
alt.value(0.2)
)
)
.properties(
width=100,
height=200
) & top_issuers.encode(
opacity=alt.condition(
state_selection,
alt.value(1.0),
alt.value(0.2)
),
color=alt.condition(
state_selection,
alt.Color("region:N"),
alt.value("lightgray")
),
).add_params(state_selection).transform_filter(state_selection).transform_window(
rank='rank(percent_denial)',
sort=[alt.SortField('percent_denial', order='descending')]
).transform_filter(
alt.datum.rank <= 20
).properties(
height=200
)
)
)