Claims Denials Analysis#
Data from: https://www.cms.gov/marketplace/resources/data/public-use-files
Sheet IDs
2 = Individual Marketplace Medical Qualified Health Plans (QHPs)
3 = Individual Marketplace Stand Alone Dental Plans (SADPs)
4 = Small Business Health Options Program (SHOP) Medical and Dental Plans
import polars as pl
import pandas as pd
import altair as alt
alt.data_transformers.disable_max_rows()
# For some reason, I can't get Polars to skip the header rows properly,
# so I'm reading with pandas and converting to Polars for the reader.
df = pl.from_pandas(pd.read_excel(
"data/Transparency_in_Coverage_PUF.xlsx",
sheet_name=1,
skiprows=2,
na_values=["**", "*"]))
df.head(5)
State | Issuer_Name | Issuer_ID | Is_Issuer_New_to_Exchange? (Yes_or_No) | SADP_Only? | Plan_ID | QHP/SADP | Plan_Type | Metal_Level | URL_Claims_Payment_Policies | Issuer_Claims_Received | Issuer_Claims_Denials | Issuer_Internal_Appeals_Filed | Issuer_Number_Internal_Appeals_Overturned | Issuer_Percent_Internal_Appeals_Overturned | Issuer_External_Appeals_Filed | Issuer_Number_External_Appeals_Overturned | Issuer_Percent_External_Appeals_Overturned | Plan_Number_Claims_Received | Plan_Number_Claims_Denied | Plan_Number_Claims_Denied_Referral_Required | Plan_Number_Claims_Denied_Out_of_Network | Plan_Number_Claims_Denied_Services_Excluded | Plan_Number_Claims_Denied_Not_Medically_Necessary_Excl_Behavioral_Health | Plan_Number_Claims_Denied_Not_Medically_Necessary_Behavioral_Health_Only | Plan_Number_Claims_Denied_Other | Rate_Review | Financial_Information | Average Monthly Enrollment | Average Monthly Disenrollment |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
str | str | i64 | str | str | str | str | str | str | str | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | str | str | f64 | f64 |
"AK" | "Premera Blue C… | 38344 | "No" | "No" | "38344AK1060001… | "QHP" | "PPO" | "Gold" | "https://www.pr… | 603348.0 | 143208.0 | 2156.0 | 1079.0 | 50.05 | null | null | null | 259304.0 | 59633.0 | 11433.0 | 143.0 | 6071.0 | 818.0 | null | 89942.0 | "https://ratere… | "https://www.na… | 6428.0 | 728.0 |
"AK" | "Premera Blue C… | 38344 | "No" | "No" | "38344AK1060002… | "QHP" | "PPO" | "Silver" | "https://www.pr… | 603348.0 | 143208.0 | 2156.0 | 1079.0 | 50.05 | null | null | null | 100437.0 | 26239.0 | 4720.0 | 95.0 | 2663.0 | 182.0 | null | 36773.0 | "https://ratere… | "https://www.na… | 2109.0 | 232.0 |
"AK" | "Premera Blue C… | 38344 | "No" | "No" | "38344AK1060004… | "QHP" | "PPO" | "Bronze Expande… | "https://www.pr… | 603348.0 | 143208.0 | 2156.0 | 1079.0 | 50.05 | null | null | null | 115878.0 | 26136.0 | 4786.0 | 20.0 | 2765.0 | 127.0 | null | 43716.0 | "https://ratere… | "https://www.na… | 3605.0 | 479.0 |
"AK" | "Premera Blue C… | 38344 | "No" | "No" | "38344AK1070002… | "QHP" | "PPO" | "Bronze Expande… | "https://www.pr… | 603348.0 | 143208.0 | 2156.0 | 1079.0 | 50.05 | null | null | null | 75481.0 | 20063.0 | 1322.0 | 37.0 | 1586.0 | 87.0 | null | 31298.0 | "https://ratere… | "https://www.na… | 3139.0 | 391.0 |
"AK" | "Premera Blue C… | 38344 | "No" | "No" | "38344AK1080001… | "QHP" | "PPO" | "Gold" | "https://www.pr… | 603348.0 | 143208.0 | 2156.0 | 1079.0 | 50.05 | null | null | null | null | null | null | null | null | null | null | null | "https://ratere… | "https://www.na… | null | null |
Tackling datasets like this#
It’s often overwhelming to tackle data when you have no idea how it’s structured.
But that’s exactly what data visualizations are for: exploration.
Drop unnecessary columns#
At first glance, I can tell the Excel file already separates the plan types, so several columns are unnecessary:
QHP/SADP: the sheets already split these apart
SADP_Only?: not useful since we’re not working with dental plans
Issuer_ID: most likely not useful
URL_Claims_Payment_Policies: for now, I don’t need the link
working_df = df.drop("SADP_Only?", "Issuer_ID", "URL_Claims_Payment_Policies",
"Is_Issuer_New_to_Exchange? (Yes_or_No)",
"Rate_Review", "Financial_Information")
working_df.columns
['State',
'Issuer_Name',
'Plan_ID',
'QHP/SADP',
'Plan_Type',
'Metal_Level',
'Issuer_Claims_Received',
'Issuer_Claims_Denials',
'Issuer_Internal_Appeals_Filed',
'Issuer_Number_Internal_Appeals_Overturned',
'Issuer_Percent_Internal_Appeals_Overturned',
'Issuer_External_Appeals_Filed',
'Issuer_Number_External_Appeals_Overturned',
'Issuer_Percent_External_Appeals_Overturned',
'Plan_Number_Claims_Received',
'Plan_Number_Claims_Denied',
'Plan_Number_Claims_Denied_Referral_Required',
'Plan_Number_Claims_Denied_Out_of_Network',
'Plan_Number_Claims_Denied_Services_Excluded',
'Plan_Number_Claims_Denied_Not_Medically_Necessary_Excl_Behavioral_Health',
'Plan_Number_Claims_Denied_Not_Medically_Necessary_Behavioral_Health_Only',
'Plan_Number_Claims_Denied_Other',
'Average Monthly Enrollment',
'Average Monthly Disenrollment']
Wide Dataframes#
Looking at the column names, you can see it’s one of those “wide” dataframes that are good for reporting and traditional data visualization.
Each plan gets a row of data, and each metric corresponds to a column.
Since we have a lot of plans/rows, the best bet is to start by plotting our first metric, Plan_Number_Claims_Received, per plan.
EDA: Scatter Graph#
The first thing I’d like to see is the number of claims for each plan, broken out by state, since State has few enough values that it definitely won’t overload one axis.
alt.Chart(working_df).mark_circle().encode(
x=alt.X("State:O"),
y=alt.Y("Plan_Number_Claims_Received:Q")
)
Note: Missing States#
Hmm, we’re missing some states; the x-axis clearly doesn’t have 50 states. Where are NY, NJ, CT… all the Northeast? (I’m from NY.)
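A quick way to confirm exactly which states are absent (this check is my addition; the 50-state list and the toy `present` set are illustrative, where the notebook would build `present` from `working_df`):

```python
# Which of the 50 states never appear? "present" is a toy stand-in --
# in the notebook it would be:
#   present = set(working_df.get_column("State").unique().to_list())
all_states = {
    "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA",
    "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD",
    "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ",
    "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC",
    "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY",
}
present = {"AK", "AL", "AR"}  # toy subset for illustration

missing = sorted(all_states - present)
print(len(missing))     # 47
print("NY" in missing)  # True
```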
Issuer vs. Plan-level#
Because there are two levels of data here, we pretty much have to work with two different dataframes.
Let’s start with Issuer
Issuer Level Data#
issuer_level_df = (
working_df
.filter(pl.col("Issuer_Claims_Received").is_not_null())
.group_by("State", "Issuer_Name", "Plan_Type")
.agg(
pl.first("Issuer_Claims_Received"),
pl.first("Issuer_Claims_Denials"),
pl.first("Issuer_Internal_Appeals_Filed"),
pl.first("Issuer_Number_Internal_Appeals_Overturned"),
pl.first("Issuer_Percent_Internal_Appeals_Overturned"),
pl.col("Metal_Level")
)
)
issuer_level_df.head(5)
State | Issuer_Name | Plan_Type | Issuer_Claims_Received | Issuer_Claims_Denials | Issuer_Internal_Appeals_Filed | Issuer_Number_Internal_Appeals_Overturned | Issuer_Percent_Internal_Appeals_Overturned | Metal_Level |
---|---|---|---|---|---|---|---|---|
str | str | str | f64 | f64 | f64 | f64 | f64 | list[str] |
"AK" | "Premera Blue C… | "PPO" | 603348.0 | 143208.0 | 2156.0 | 1079.0 | 50.05 | ["Gold", "Silver", … "Bronze"] |
"AL" | "Blue Cross and… | "PPO" | 7.557971e6 | 1.710597e6 | 1974.0 | 569.0 | 28.82 | ["Gold", "Silver", … "Bronze Expanded"] |
"AL" | "Bright Health … | "EPO" | 99090.0 | 6396.0 | 20.0 | 0.0 | 0.0 | ["Gold", "Bronze Expanded", … "Silver"] |
"AR" | "QCA Health Pla… | "POS" | 1.954143e6 | 362988.0 | 297.0 | 135.0 | 45.45 | ["Silver", "Silver", … "Gold"] |
"AR" | "HMO Partners, … | "POS" | 919095.0 | 152474.0 | 311.0 | 68.0 | 21.86 | ["Silver", "Gold", … "Silver"] |
View 1: Issuer-level Viz#
Let’s get started. I want to look at the distribution of denial rates.
One of the cool things built into Altair is the ability to do transformations and calculations in Altair’s own syntax, reducing the data modeling required before visualizing.
Add State-to-Region mapping#
Create a Polars DataFrame with the state abbreviation and region data
state_region = pl.DataFrame({
"State": ["AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA",
"HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD",
"MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ",
"NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC",
"SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"],
"region": ["South", "West", "West", "South", "West", "West", "Northeast", "South", "South", "South",
"West", "West", "Midwest", "Midwest", "Midwest", "Midwest", "South", "South", "Northeast", "South",
"Northeast", "Northeast", "Midwest", "South", "Midwest", "West", "Midwest", "West", "Northeast", "Northeast",
"West", "Northeast", "South", "Midwest", "Midwest", "South", "West", "Northeast", "Northeast", "South",
"Midwest", "South", "West", "Northeast", "South", "West", "West", "South", "Midwest", "West"]
})
issuer_level_df = issuer_level_df.join(state_region, on=["State"], how='left')
View 2: Bubble Charts, sized by Denial Rates#
I know I’m encoding percent denial twice (generally a no-no), but since the chart is expected to be crowded, it’ll help me develop the visualization further.
percent_denial_fig = (
alt.Chart(issuer_level_df.to_pandas()).mark_circle().encode(
x=alt.X("Issuer_Claims_Received:Q").title("Total Number of Claims"),
y=alt.Y("percent_denial:Q").title("Percentage of Claims Denied (%)"),
size=alt.Size("percent_denial:Q").title(None),
color=alt.Color("State:N"),
tooltip=["State:N", "Issuer_Name:N", "Plan_Type:N", "Metal_Level:N"],
column="region:N",
)
.transform_flatten(
["Metal_Level"]
)
.transform_calculate(
percent_denial = "datum.Issuer_Claims_Denials / datum.Issuer_Claims_Received * 100"
)
)
percent_denial_fig
Honestly, this still doesn’t separate well, since there are so many dimensions.
One of my curiosities is comparing mean denial rates across states. This is harder than you’d think: a simple mean of issuer-level rates would ignore claim volume, so instead I’ll aggregate total denials over total claims per state.
Leveraging “Selectors” in Altair#
Selectors allow Altair to “communicate” selections across different figures, even if they are completely different charts.
There’s a bit of a learning curve, but once a selection like state_selection is bound to a column (in this case, State), you can use state_selection pretty universally across Altair’s expressions:
.add_params(state_selection)
.transform_filter(state_selection)
alt.condition(state_selection, ...)
state_selection = alt.selection_point(fields=['State'])
state_level_average = (
alt.Chart(issuer_level_df.to_pandas()).mark_bar().encode(
x=alt.X("mean_percent_denial:Q").title("Percentage of Claims Denied (%)"),
y=alt.Y("State:N").title("State").sort("-x"),
color=alt.condition(
state_selection,
alt.Color("region:N"),
alt.value("gray")
)
)
.transform_aggregate(
total_claims = "sum(Issuer_Claims_Received)",
total_denials = "sum(Issuer_Claims_Denials)",
groupby = ["State", "region"]
)
.transform_calculate(
mean_percent_denial = "datum.total_denials / datum.total_claims * 100"
)
.add_params(state_selection)
)
state_level_average
# This is likely a bad habit but it makes layout easier.
View 3: Add State-level Deniers (Top 10)#
Since it’s tough to show ALL issuers, let’s limit the view to the top deniers.
One of the key tools here is to rank and then filter using Altair transforms.
top_issuers = (
alt.Chart(issuer_level_df.to_pandas()).mark_bar().encode(
x=alt.X("Issuer_Name:N").sort("-y"),
y=alt.Y("percent_denial:Q").stack(None)
).transform_aggregate(
claims_received = "mean(Issuer_Claims_Received)", # There is no "drop duplicates" related to Metal (Gold/Silver/Bronze)
claims_denied = "mean(Issuer_Claims_Denials)", # There is no "drop duplicates" related to Metal (Gold/Silver/Bronze)
groupby=["Issuer_Name", "State", "region"]
).transform_calculate(
percent_denial = "datum.claims_denied / datum.claims_received * 100"
).transform_filter(
alt.datum.claims_received != None  # use the post-aggregate field name
)
)
top_issuers
Create an interactive chart, piecing view 1, view 2, view 3 together#
Bad habit#
(
(state_level_average.encode(
color=alt.condition(
state_selection,
alt.Color("region:N"),
alt.value("lightgray")
),
opacity=alt.condition(
state_selection,
alt.value(1.0),
alt.value(0.2)
)
).properties(
width=100,
height=500
)
) |
(
percent_denial_fig.add_params(state_selection).encode(
color=alt.condition(
state_selection,
alt.Color("region:N"),
alt.value("lightgray")
),
opacity=alt.condition(
state_selection,
alt.value(1.0),
alt.value(0.2)
)
)
.properties(
width=100,
height=200
) & top_issuers.encode(
opacity=alt.condition(
state_selection,
alt.value(1.0),
alt.value(0.2)
),
color=alt.condition(
state_selection,
alt.Color("region:N"),
alt.value("lightgray")
),
).add_params(state_selection).transform_filter(state_selection).transform_window(
rank='rank(percent_denial)',
sort=[alt.SortField('percent_denial', order='descending')]
).transform_filter(
alt.datum.rank <= 20
).properties(
height=200
)
)
)