Bar Chart
Bar charts are the bread and butter of data visualization. In my experience, bar charts alone can cover roughly 90% of everyday data analytics needs.
Online examples often jump straight to one final chart built with 50 lines of code, which can be daunting for beginners. Here, I instead add the code incrementally, a line or two at a time, and show the output after each step, which I find is the best way to learn.
Most data visualization experts create charts to produce "dashboard" output. For me, data visualization is primarily a way to understand data, and only a few of the charts I make end up being useful as final output, often by chance.
This kind of "iterative" development is often unintentionally discouraged by today's "Scrum" culture, where you are expected to define your X and Y before you even get to the data. Data analysis, especially on new data, requires a lot of exploration.
Playground Data
Polars is my favorite dataframe library. For the data, I use OpenTargets, an interesting open-source effort in the biomedical space.
I downloaded the known-drugs table (as tab-separated values, TSV) from the OpenTargets page for IgA glomerulonephritis: https://platform.opentargets.org/disease/EFO_0004194
import polars as pl
# Read the tab-separated file downloaded from OpenTargets
df = pl.read_csv("data/EFO_0004194-known-drugs.tsv", separator="\t")
# Preview the first five rows
df.head(5)
diseaseId | diseaseName | drugId | drugName | type | mechanismOfAction | actionType | symbol | name | phase | status | source |
---|---|---|---|---|---|---|---|---|---|---|---|
str | str | str | str | str | str | str | str | str | i64 | str | str |
"EFO_0004194" | "IGA glomerulon… | "CHEMBL429910" | "DAPAGLIFLOZIN" | "Small molecule… | "Sodium/glucose… | "Inhibitor" | "SLC5A2" | "solute carrier… | 4 | "Not yet recrui… | "https://clinic… |
"EFO_0004194" | "IGA glomerulon… | "CHEMBL1168" | "RAMIPRIL" | "Small molecule… | "Angiotensin-co… | "Inhibitor" | "ACE" | "angiotensin I … | 4 | "Not yet recrui… | "https://clinic… |
"EFO_0004194" | "IGA glomerulon… | "CHEMBL1535" | "HYDROXYCHLOROQ… | "Small molecule… | "Toll-like rece… | "Antagonist" | "TLR7" | "toll like rece… | 4 | "Completed" | "https://clinic… |
"EFO_0004194" | "IGA glomerulon… | "CHEMBL1639" | "ALISKIREN" | "Small molecule… | "Renin inhibito… | "Inhibitor" | "REN" | "renin" | 4 | "Completed" | "https://clinic… |
"EFO_0004194" | "IGA glomerulon… | "CHEMBL4594217" | "CANAGLIFLOZIN" | "Small molecule… | "Sodium/glucose… | "Inhibitor" | "SLC5A2" | "solute carrier… | 4 | "Not yet recrui… | "https://clinic… |
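Data Viz as Exploratory Data Analysis
Preview the Data
For starters, a good way to explore a complex dataset like this one is to begin with its seemingly obvious columns.
The data contains row-by-row records of known drugs for IgA glomerulonephritis (IgA nephropathy), including each drug's target (symbol), clinical trial phase, trial status, and a source link. A single drug can appear in several rows: for example, CHEMBL1535 shows up once for the TLR7 target and more than once for TLR9, with different trial statuses. This is why the charts below count distinct drugId values rather than rows.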
Base Bar Chart
The status column is relatively easy to understand, even if you don't come from a biomedical background.
Let's start by plotting the number of distinct drugs (unique drugId values) for each status.
import altair as alt
fig = (
    alt.Chart(df).mark_bar().encode(
        x="distinct(drugId):Q",  # number of unique drugs per bar
        y=alt.Y("status:N")      # one bar per trial status
    )
)
fig
Adding a Tooltip
A tooltip shows the underlying values when you hover over a bar.
import altair as alt
fig = (
    alt.Chart(df).mark_bar().encode(
        x="distinct(drugId):Q",
        y=alt.Y("status:N"),
        tooltip=[
            "distinct(drugId):Q",
            "status:N"
        ]
    )
)
fig
Reorder Y-axis (status) by X-axis (count)
Altair lets you sort one encoding channel by another. Here, we want to sort the status values by the quantity mapped to x (distinct(drugId)), in descending order. In .sort("-x"), "x" refers to the x channel and the leading minus sign means descending.
As you add more complex encodings and parameters, you need to switch from plain column-name strings to Altair's channel objects, such as alt.X(), alt.Y(), or alt.Color(). The string shorthand is convenient, but only up to a point.
import altair as alt
fig = (
    alt.Chart(df).mark_bar().encode(
        x="distinct(drugId):Q",
        y=alt.Y("status:N").sort("-x"),  # longest bar at the top
        tooltip=[
            "distinct(drugId):Q",
            "status:N"
        ]
    )
)
fig
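Add Color Encoding
Mapping status to color gives each bar its own color and adds a legend, although the color repeats what the y-axis already shows.
import altair as alt
fig = (
    alt.Chart(df).mark_bar().encode(
        x="distinct(drugId):Q",
        y=alt.Y("status:N").sort("-x"),
        color=alt.Color("status:N"),
        tooltip=[
            "distinct(drugId):Q",
            "status:N"
        ]
    )
)
fig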
Remove Unnecessary Titles
Often, I find the default titles unnecessary or not presentation-ready.
In such cases, it's easy to change encoding channel titles (e.g., axis titles and legend titles) in Altair: chain .title() onto each encoding definition. Passing None, as in .title(None), removes the title entirely.
import altair as alt
fig = (
    alt.Chart(df).mark_bar().encode(
        x=alt.X("distinct(drugId):Q").title("Unique Drugs"),
        y=alt.Y("status:N").sort("-x").title("Status"),
        color=alt.Color("status:N").title(None),  # no legend title
        tooltip=[
            "distinct(drugId):Q",
            "status:N"
        ]
    )
)
fig
Add a Chart Title
To add an overall title to the chart, use .properties(title=...).
import altair as alt
fig = (
    alt.Chart(df).mark_bar().encode(
        x=alt.X("distinct(drugId):Q").title("Unique Drugs"),
        y=alt.Y("status:N").sort("-x").title("Status"),
        color=alt.Color("status:N").title(None),
        tooltip=[
            "distinct(drugId):Q",
            "status:N"
        ]
    )
).properties(
    title="Drug Trials for IgA Nephropathy",
)
fig
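The same pattern works for any other categorical column. For example, here is the number of unique drugs for each mechanismOfAction:
import altair as alt
fig = (
    alt.Chart(df).mark_bar().encode(
        x=alt.X("distinct(drugId):Q").title("Unique Drugs"),
        y=alt.Y("mechanismOfAction:N").sort("-x").title("Mechanism of Action"),
        tooltip=[
            "distinct(drugId):Q",
            "mechanismOfAction:N"
        ]
    )
)
fig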
df
diseaseId | diseaseName | drugId | drugName | type | mechanismOfAction | actionType | symbol | name | phase | status | source |
---|---|---|---|---|---|---|---|---|---|---|---|
str | str | str | str | str | str | str | str | str | i64 | str | str |
"EFO_0004194" | "IGA glomerulon… | "CHEMBL429910" | "DAPAGLIFLOZIN" | "Small molecule… | "Sodium/glucose… | "Inhibitor" | "SLC5A2" | "solute carrier… | 4 | "Not yet recrui… | "https://clinic… |
"EFO_0004194" | "IGA glomerulon… | "CHEMBL1168" | "RAMIPRIL" | "Small molecule… | "Angiotensin-co… | "Inhibitor" | "ACE" | "angiotensin I … | 4 | "Not yet recrui… | "https://clinic… |
"EFO_0004194" | "IGA glomerulon… | "CHEMBL1535" | "HYDROXYCHLOROQ… | "Small molecule… | "Toll-like rece… | "Antagonist" | "TLR7" | "toll like rece… | 4 | "Completed" | "https://clinic… |
"EFO_0004194" | "IGA glomerulon… | "CHEMBL1639" | "ALISKIREN" | "Small molecule… | "Renin inhibito… | "Inhibitor" | "REN" | "renin" | 4 | "Completed" | "https://clinic… |
"EFO_0004194" | "IGA glomerulon… | "CHEMBL4594217" | "CANAGLIFLOZIN" | "Small molecule… | "Sodium/glucose… | "Inhibitor" | "SLC5A2" | "solute carrier… | 4 | "Not yet recrui… | "https://clinic… |
"EFO_0004194" | "IGA glomerulon… | "CHEMBL1535" | "HYDROXYCHLOROQ… | "Small molecule… | "Toll-like rece… | "Antagonist" | "TLR9" | "toll like rece… | 4 | "Completed" | "https://clinic… |
"EFO_0004194" | "IGA glomerulon… | "CHEMBL578" | "ENALAPRIL" | "Small molecule… | "Angiotensin-co… | "Inhibitor" | "ACE" | "angiotensin I … | 4 | "Completed" | "https://clinic… |
"EFO_0004194" | "IGA glomerulon… | "CHEMBL1456" | "MYCOPHENOLATE … | "Small molecule… | "Inosine-5'-mon… | "Inhibitor" | "IMPDH1" | "inosine monoph… | 4 | "Unknown status… | "https://clinic… |
"EFO_0004194" | "IGA glomerulon… | "CHEMBL1069" | "VALSARTAN" | "Small molecule… | "Type-1 angiote… | "Antagonist" | "AGTR1" | "angiotensin II… | 4 | "Unknown status… | "https://clinic… |
"EFO_0004194" | "IGA glomerulon… | "CHEMBL1690" | "HYDROXYCHLOROQ… | "Small molecule… | "Toll-like rece… | "Antagonist" | "TLR7" | "toll like rece… | 4 | "Completed" | "https://clinic… |
"EFO_0004194" | "IGA glomerulon… | "CHEMBL1535" | "HYDROXYCHLOROQ… | "Small molecule… | "Toll-like rece… | "Antagonist" | "TLR9" | "toll like rece… | 4 | "Unknown status… | "https://clinic… |
"EFO_0004194" | "IGA glomerulon… | "CHEMBL608" | "PROBUCOL" | "Small molecule… | "ATP-binding ca… | "Inhibitor" | "ABCA1" | "ATP binding ca… | 4 | "Completed" | "https://clinic… |
… | … | … | … | … | … | … | … | … | … | … | … |
"EFO_0004194" | "IGA glomerulon… | "CHEMBL4594460" | "AT-1501" | "Unknown" | "CD40 ligand in… | "Inhibitor" | "CD40LG" | "CD40 ligand" | 2 | "Recruiting" | "https://clinic… |
"EFO_0004194" | "IGA glomerulon… | "CHEMBL1742986" | "ATACICEPT" | "Protein" | "Tumor necrosis… | "Inhibitor" | "TNFSF13" | "TNF superfamil… | 2 | "Active, not re… | "https://clinic… |
"EFO_0004194" | "IGA glomerulon… | "CHEMBL4594614" | "SIBEPRENLIMAB" | "Antibody" | "Tumor necrosis… | "Inhibitor" | "TNFSF13" | "TNF superfamil… | 2 | "Active, not re… | "https://clinic… |
"EFO_0004194" | "IGA glomerulon… | "CHEMBL4594579" | "FELZARTAMAB" | "Antibody" | "Lymphocyte dif… | "Inhibitor" | "CD38" | "CD38 molecule" | 2 | "Recruiting" | "https://clinic… |
"EFO_0004194" | "IGA glomerulon… | "CHEMBL1487" | "ATORVASTATIN" | "Small molecule… | "HMG-CoA reduct… | "Inhibitor" | "HMGCR" | "3-hydroxy-3-me… | 2 | "Completed" | "https://clinic… |
"EFO_0004194" | "IGA glomerulon… | "CHEMBL3989871" | "AVACOPAN" | "Small molecule… | "C5a anaphylato… | "Antagonist" | "C5AR1" | "complement C5a… | 2 | "Completed" | "https://clinic… |
"EFO_0004194" | "IGA glomerulon… | "CHEMBL1370" | "BUDESONIDE" | "Small molecule… | "Glucocorticoid… | "Agonist" | "NR3C1" | "nuclear recept… | 2 | "Completed" | "https://clinic… |
"EFO_0004194" | "IGA glomerulon… | "CHEMBL2103830" | "FOSTAMATINIB" | "Small molecule… | "Tyrosine-prote… | "Inhibitor" | "SYK" | "spleen associa… | 2 | "Completed" | "https://clinic… |
"EFO_0004194" | "IGA glomerulon… | "CHEMBL578" | "ENALAPRIL" | "Small molecule… | "Angiotensin-co… | "Inhibitor" | "ACE" | "angiotensin I … | 2 | "Completed" | "https://clinic… |
"EFO_0004194" | "IGA glomerulon… | "CHEMBL3989516" | "FOSTAMATINIB D… | "Small molecule… | "Tyrosine-prote… | "Inhibitor" | "SYK" | "spleen associa… | 2 | "Withdrawn" | "https://clinic… |
"EFO_0004194" | "IGA glomerulon… | "CHEMBL4297722" | "CEMDISIRAN" | "Oligonucleotid… | "Complement C5 … | "Rnai inhibitor… | "C5" | "complement C5" | 2 | "Active, not re… | "https://clinic… |
"EFO_0004194" | "IGA glomerulon… | "CHEMBL4594614" | "SIBEPRENLIMAB" | "Antibody" | "Tumor necrosis… | "Inhibitor" | "TNFSF13" | "TNF superfamil… | 1 | "Completed" | "https://clinic… |