Claims Denials Analysis#

Data from: https://www.cms.gov/marketplace/resources/data/public-use-files

Sheet IDs

  • 2 = Individual Marketplace Medical Qualified Health Plans (QHPs)

  • 3 = Individual Marketplace Stand Alone Dental Plans (SADPs)

  • 4 = Small Business Health Options Program (SHOP) Medical and Dental Plans


import polars as pl
import pandas as pd
import altair as alt

alt.data_transformers.disable_max_rows()


# For some reason, I can't get Polars to skip the header rows properly,
# so I'm using Pandas for the read and converting to Polars.
# Note: pandas sheet_name is 0-indexed, so sheet_name=1 selects the second
# sheet (sheet ID 2 above, the individual-market QHP sheet).
df = pl.from_pandas(pd.read_excel(
    "data/Transparency_in_Coverage_PUF.xlsx",
    sheet_name=1,
    skiprows=2,
    na_values=["**", "*"]))
df.head(5)
shape: (5, 30)

(Output abridged: the first five rows are Premera Blue Cross QHP plans in AK, across 30 columns running from State, Issuer_Name, and Issuer_ID through the issuer- and plan-level claims, denials, and appeals counts to Rate_Review, Financial_Information, Average Monthly Enrollment, and Average Monthly Disenrollment.)

Tackling datasets like this#

It’s often overwhelming to tackle data when you have no idea how it’s structured.

But what are data visualizations for? EXPLORATION.

Drop unnecessary columns#

At first glance, I can tell:

  • The Excel file already separates QHP and SADP plans into sheets

  • The SADP_Only? column is not useful since we’re not working with dental plans

  • Issuer_ID is most likely not useful

  • URL_Claims_Payment_Policies: for now, I don’t need the link

working_df = df.drop(
    "SADP_Only?", "Issuer_ID", "URL_Claims_Payment_Policies",
    "Is_Issuer_New_to_Exchange? (Yes_or_No)",
    "Rate_Review", "Financial_Information",
)
working_df.columns
['State',
 'Issuer_Name',
 'Plan_ID',
 'QHP/SADP',
 'Plan_Type',
 'Metal_Level',
 'Issuer_Claims_Received',
 'Issuer_Claims_Denials',
 'Issuer_Internal_Appeals_Filed',
 'Issuer_Number_Internal_Appeals_Overturned',
 'Issuer_Percent_Internal_Appeals_Overturned',
 'Issuer_External_Appeals_Filed',
 'Issuer_Number_External_Appeals_Overturned',
 'Issuer_Percent_External_Appeals_Overturned',
 'Plan_Number_Claims_Received',
 'Plan_Number_Claims_Denied',
 'Plan_Number_Claims_Denied_Referral_Required',
 'Plan_Number_Claims_Denied_Out_of_Network',
 'Plan_Number_Claims_Denied_Services_Excluded',
 'Plan_Number_Claims_Denied_Not_Medically_Necessary_Excl_Behavioral_Health',
 'Plan_Number_Claims_Denied_Not_Medically_Necessary_Behavioral_Health_Only',
 'Plan_Number_Claims_Denied_Other',
 'Average Monthly Enrollment',
 'Average Monthly Disenrollment']

Wide Dataframes#

Looking at the column names, you can see it’s one of those “wide” dataframes that are good for reporting and traditional data visualization.

Each plan gets a row of data, and all the metrics correspond to a column.
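If a visualization ever needed those metric columns in long form instead, the frame could be unpivoted. A minimal sketch, not used in the analysis below, assuming the Polars version here still exposes melt (newer releases rename it to unpivot); denial_reason_cols and long_df are just illustrative names:

denial_reason_cols = [
    "Plan_Number_Claims_Denied_Referral_Required",
    "Plan_Number_Claims_Denied_Out_of_Network",
    "Plan_Number_Claims_Denied_Services_Excluded",
]
# Sketch only: reshape a few per-plan denial-reason columns from wide
# (one column per reason) to long (one row per plan and reason).
long_df = working_df.melt(
    id_vars=["State", "Issuer_Name", "Plan_ID"],
    value_vars=denial_reason_cols,
    variable_name="denial_reason",
    value_name="claims_denied",
)
long_df.head(5)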

Since we have a lot of plans/rows, our best bet is to plot our first metric, Plan_Number_Claims_Received, per plan.

EDA: Scatter Graph#

The first thing I’d like to see is the number of claims for each plan, by state, since State definitely won’t overload one axis.

alt.Chart(df).mark_circle().encode(
    x=alt.X("State:O"),
    y=alt.Y("Plan_Number_Claims_Received:Q")
)

Note: Missing States#

Hmm, we’re missing some states; the x-axis clearly doesn’t have all 50. Where are NY, NJ, CT… all of the Northeast? (I’m from NY.) Most likely this is because the PUF only covers issuers on the federally facilitated exchange, and many Northeast states run their own state-based exchanges.

Issuer vs. Plan-level#

Because there are two levels of data here (issuer totals repeated on every plan row, plus the per-plan counts), we pretty much have to work with two separate dataframes.

Let’s start with the issuer level; a sketch of the plan-level counterpart follows its output.

Issuer Level Data#

issuer_level_df = (
    working_df
    .filter(pl.col("Issuer_Claims_Received").is_not_null())
    .group_by("State", "Issuer_Name", "Plan_Type")
    .agg(
        # Issuer-level figures are repeated on every plan row, so take the first value
        pl.first("Issuer_Claims_Received"),
        pl.first("Issuer_Claims_Denials"),
        pl.first("Issuer_Internal_Appeals_Filed"),
        pl.first("Issuer_Number_Internal_Appeals_Overturned"),
        pl.first("Issuer_Percent_Internal_Appeals_Overturned"),
        # Keep the list of metal levels each issuer offers
        pl.col("Metal_Level")
    )
)

issuer_level_df.head(5)
shape: (5, 9)

| State | Issuer_Name | Plan_Type | Issuer_Claims_Received | Issuer_Claims_Denials | Issuer_Internal_Appeals_Filed | Issuer_Number_Internal_Appeals_Overturned | Issuer_Percent_Internal_Appeals_Overturned | Metal_Level |
|---|---|---|---|---|---|---|---|---|
| AK | Premera Blue C… | PPO | 603348.0 | 143208.0 | 2156.0 | 1079.0 | 50.05 | ["Gold", "Silver", … "Bronze"] |
| AL | Blue Cross and… | PPO | 7.557971e6 | 1.710597e6 | 1974.0 | 569.0 | 28.82 | ["Gold", "Silver", … "Bronze Expanded"] |
| AL | Bright Health … | EPO | 99090.0 | 6396.0 | 20.0 | 0.0 | 0.0 | ["Gold", "Bronze Expanded", … "Silver"] |
| AR | QCA Health Pla… | POS | 1.954143e6 | 362988.0 | 297.0 | 135.0 | 45.45 | ["Silver", "Silver", … "Gold"] |
| AR | HMO Partners, … | POS | 919095.0 | 152474.0 | 311.0 | 68.0 | 21.86 | ["Silver", "Gold", … "Silver"] |

View 1: Issuer-level Viz#

Let’s get started. I want to look at the distribution of denial rates.

One of the cool things built into Altair is the ability to do transformations and calculations in the chart spec itself, which reduces the data modeling required before visualizing.
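As a minimal sketch of that idea on a toy dataframe (not the real data; toy and denial_rate are made-up names), a derived field can be computed entirely inside the chart spec:

# Toy example: denial_rate never exists in the dataframe; Altair's
# transform_calculate derives it at render time.
toy = pd.DataFrame({"issuer": ["A", "B"], "received": [100, 250], "denied": [12, 40]})
alt.Chart(toy).mark_bar().encode(
    x=alt.X("issuer:N"),
    y=alt.Y("denial_rate:Q"),
).transform_calculate(
    denial_rate="datum.denied / datum.received"
)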

Add State-to-Region mapping#

Create a Polars DataFrame with the state abbreviation and region data

# Region assignments follow the U.S. Census Bureau's four regions
state_region = pl.DataFrame({
    "State": ["AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA",
              "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD",
              "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ",
              "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC",
              "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"],
    "region": ["South", "West", "West", "South", "West", "West", "Northeast", "South", "South", "South",
               "West", "West", "Midwest", "Midwest", "Midwest", "Midwest", "South", "South", "Northeast", "South",
               "Northeast", "Midwest", "Midwest", "South", "Midwest", "West", "Midwest", "West", "Northeast", "Northeast",
               "West", "Northeast", "South", "Midwest", "Midwest", "South", "West", "Northeast", "Northeast", "South",
               "Midwest", "South", "South", "West", "Northeast", "South", "West", "South", "Midwest", "West"]
})
issuer_level_df = issuer_level_df.join(state_region, on=["State"], how='left')
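A quick sanity check, as a sketch: a left join silently leaves nulls for unmatched keys, so any row returned here would mean a state abbreviation is missing from the mapping above.

# Should come back empty if every state found a region
issuer_level_df.filter(pl.col("region").is_null()).select("State").unique()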

View 2: Bubble Charts, sized by Denial Rates#

I know I’m encoding the denial percentage twice (generally a no-no), but since the chart is expected to be crowded, the redundant encoding will help me develop the visualization further.

percent_denial_fig = (
    alt.Chart(issuer_level_df.to_pandas()).mark_circle().encode(
        x=alt.X("Issuer_Claims_Received:Q").title("Total Number of Claims"),
        y=alt.Y("percent_denial:Q").title("Percentage of Claims Denied (%)"),
        size=alt.Size("percent_denial:Q").title(None),
        color=alt.Color("State:N"),
        tooltip=["State:N", "Issuer_Name:N", "Plan_Type:N", "Metal_Level:N"],
        column="region:N",
    )
    # Explode the Metal_Level lists so each metal level gets its own point
    .transform_flatten(
        ["Metal_Level"]
    )
    # Multiply by 100 so the value matches the "(%)" axis label
    .transform_calculate(
        percent_denial = "100 * datum.Issuer_Claims_Denials / datum.Issuer_Claims_Received"
    )
)
percent_denial_fig

Honestly, this still doesn’t separate well, since there are so many dimensions.

One of my curiosities is to compare mean denial rates across states. This is harder than you’d think, since you’d ideally weight each issuer by its claims volume rather than just averaging the issuer-level rates.
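As a sketch of the difference, here is the same comparison done directly in Polars: a naive mean of issuer denial rates next to a claims-weighted rate (the weighted version is what the Altair chart below computes). state_rates, naive_mean, and weighted_rate are illustrative names.

# Sketch: two candidate "state denial rates" that can disagree.
# naive_mean treats every issuer equally; weighted_rate divides total
# denials by total claims, so large issuers dominate.
state_rates = (
    issuer_level_df
    .group_by("State")
    .agg(
        naive_mean=(pl.col("Issuer_Claims_Denials")
                    / pl.col("Issuer_Claims_Received")).mean(),
        weighted_rate=(pl.col("Issuer_Claims_Denials").sum()
                       / pl.col("Issuer_Claims_Received").sum()),
    )
)
state_rates.head(5)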

Leveraging “Selectors” in Altair#

Selectors allow Altair to “communicate” using selections across different figures, even if they are completely different charts.

There’s a bit of a learning curve, but once a selection like state_selection is bound to a field (in this case, State), you can use state_selection pretty universally across Altair’s expressions:

  • .add_params(state_selection)

  • .transform_filter(state_selection)

  • alt.condition(state_selection...)

state_selection = alt.selection_point(fields=['State'])
state_level_average = (
    alt.Chart(issuer_level_df.to_pandas()).mark_bar().encode(
        x=alt.X("mean_percent_denial:Q").title("Percentage of Claims Denied (%)"),
        y=alt.Y("State:N").title("State").sort("-x"),
        color=alt.condition(
            state_selection,
            alt.Color("region:N"),
            alt.value("gray")
        )
    )
    # Roll issuer totals up to the state level
    .transform_aggregate(
        total_claims = "sum(Issuer_Claims_Received)",
        total_denials = "sum(Issuer_Claims_Denials)",
        groupby = ["State", "region"]
    )
    # Claims-weighted denial rate, times 100 to match the "(%)" axis label
    .transform_calculate(
        mean_percent_denial = "100 * datum.total_denials / datum.total_claims"
    )
    .add_params(state_selection)
)
state_level_average
# This is likely a bad habit but it makes layout easier. 

View 3: Add State-level Deniers (Top 10)#

Since it’s tough to show ALL issuers, let’s go with the top 10 and bottom 10 deniers.

One of the key tools is to rank and then filter using Altair transforms; the ranking itself is applied in the combined chart (see the sketch after the next cell).

top_issuers = (
    alt.Chart(issuer_level_df.to_pandas()).mark_bar().encode(
        x=alt.X("Issuer_Name:N").sort("-y"),
        y=alt.Y("percent_denial:Q").stack(None)
    ).transform_filter(
        # Filter on the raw field before the aggregate below drops it
        alt.datum.Issuer_Claims_Received != None
    ).transform_aggregate(
        # There is no "drop duplicates" transform, so the mean collapses the
        # duplicate rows per issuer left over from the Metal_Level/Plan_Type split
        claims_received = "mean(Issuer_Claims_Received)",
        claims_denied = "mean(Issuer_Claims_Denials)",
        groupby=["Issuer_Name", "State", "region"]
    ).transform_calculate(
        percent_denial = "datum.claims_denied / datum.claims_received"
    )
)
top_issuers
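The rank-and-filter step itself happens in the combined chart in the next section. As a standalone sketch of the pattern on top_issuers (top_10_issuers is an illustrative name):

# Sketch of the rank-then-filter pattern: rank issuers by denial rate
# inside the spec, then keep only the top 10.
top_10_issuers = top_issuers.transform_window(
    rank="rank(percent_denial)",
    sort=[alt.SortField("percent_denial", order="descending")],
).transform_filter(
    alt.datum.rank <= 10
)
top_10_issuers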

Create an interactive chart, piecing views 1, 2, and 3 together#

Bad habit#

# Left panel: state-level average denial rate, highlighted by the selection
state_bars = state_level_average.encode(
    color=alt.condition(
        state_selection,
        alt.Color("region:N"),
        alt.value("lightgray")
    ),
    opacity=alt.condition(
        state_selection,
        alt.value(1.0),
        alt.value(0.2)
    )
).properties(
    width=100,
    height=500
)

# Top-right panel: issuer bubble chart, faceted by region
bubbles = percent_denial_fig.add_params(state_selection).encode(
    color=alt.condition(
        state_selection,
        alt.Color("region:N"),
        alt.value("lightgray")
    ),
    opacity=alt.condition(
        state_selection,
        alt.value(1.0),
        alt.value(0.2)
    )
).properties(
    width=100,
    height=200
)

# Bottom-right panel: top-ranked issuers within the selected state(s)
top_issuer_bars = (
    top_issuers.encode(
        opacity=alt.condition(
            state_selection,
            alt.value(1.0),
            alt.value(0.2)
        ),
        color=alt.condition(
            state_selection,
            alt.Color("region:N"),
            alt.value("lightgray")
        ),
    )
    .add_params(state_selection)
    .transform_filter(state_selection)
    .transform_window(
        rank='rank(percent_denial)',
        sort=[alt.SortField('percent_denial', order='descending')]
    )
    .transform_filter(alt.datum.rank <= 20)
    .properties(height=200)
)

state_bars | (bubbles & top_issuer_bars)