Error bars
Visualize repeated measurements across categories with automatic aggregation and configurable error bars.
Each x category can contain multiple y values. The chart shows the mean as a point and the spread as error bars, computed by Altair's native mark_errorbar.
Example
```python
import polars as pl
from plotutils.uncertainty import plot_confidence_scatter

df = pl.DataFrame({
    "category": ["Low"] * 10 + ["Medium"] * 10 + ["High"] * 10,
    "value": [1.0, 0.8, 1.2, 0.9, 1.1, 1.0, 0.7, 1.3, 0.95, 1.05,
              2.5, 2.3, 2.7, 2.4, 2.6, 2.5, 2.2, 2.8, 2.45, 2.55,
              4.0, 3.8, 4.2, 3.9, 4.1, 4.0, 3.7, 4.3, 3.95, 4.05],
})

chart = plot_confidence_scatter(
    df,
    x_col="category",
    y_col="value",
    extent="stdev",
)
```
The extent parameter controls the error bar type:
| extent | Description |
|---|---|
| "ci" (default) | Bootstrap 95% confidence interval |
| "stdev" | ±1 standard deviation |
| "stderr" | Standard error of the mean |
| "iqr" | Interquartile range (25th–75th percentile) |
!!! note
    The default extent="ci" uses bootstrap resampling, which is non-deterministic.
    Use extent="stdev" or extent="stderr" for reproducible output.
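To make the four extents concrete, here is a minimal standard-library sketch that computes each interval for the "Low" group from the example. It is an approximation under stated assumptions, not the code Altair runs: the helper `summarize`, its `n_boot` and `seed` parameters, and the quantile method are all hypothetical choices, so the numbers may differ slightly from what the chart draws.

```python
import math
import random
import statistics

# The ten "Low" measurements from the example above.
low = [1.0, 0.8, 1.2, 0.9, 1.1, 1.0, 0.7, 1.3, 0.95, 1.05]


def summarize(values, extent="stdev", n_boot=1000, seed=0):
    """Return (mean, lower, upper) for one category (hypothetical helper)."""
    mean = statistics.fmean(values)
    if extent == "stdev":
        sd = statistics.stdev(values)  # sample standard deviation
        return mean, mean - sd, mean + sd
    if extent == "stderr":
        se = statistics.stdev(values) / math.sqrt(len(values))
        return mean, mean - se, mean + se
    if extent == "iqr":
        # Assumption: linear interpolation between data points for quartiles.
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        return mean, q1, q3
    if extent == "ci":
        # Percentile bootstrap of the mean; a fixed seed makes it repeatable.
        rng = random.Random(seed)
        means = sorted(
            statistics.fmean(rng.choices(values, k=len(values)))
            for _ in range(n_boot)
        )
        return mean, means[int(0.025 * n_boot)], means[int(0.975 * n_boot) - 1]
    raise ValueError(f"unknown extent: {extent!r}")


for name in ("stdev", "stderr", "iqr", "ci"):
    mean, lower, upper = summarize(low, extent=name)
    print(f"{name:>6}: mean={mean:.3f}  [{lower:.3f}, {upper:.3f}]")
```

The "stdev", "stderr", and "iqr" branches depend only on the data. The "ci" branch depends on the random draws, which is why an unseeded bootstrap can produce slightly different bars from run to run.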
Numeric x-axis with custom labels
When x values are numeric (e.g., model capacity, regularization strength),
pass x_labels to display readable labels while keeping a quantitative axis
with proper spacing:
```python
import polars as pl
from plotutils.uncertainty import plot_confidence_scatter

df = pl.DataFrame({
    "x": [1.0] * 10 + [2.0] * 10 + [3.0] * 10,
    "y": [1.0, 0.8, 1.2, 0.9, 1.1, 1.0, 0.7, 1.3, 0.95, 1.05,
          2.5, 2.3, 2.7, 2.4, 2.6, 2.5, 2.2, 2.8, 2.45, 2.55,
          4.0, 3.8, 4.2, 3.9, 4.1, 4.0, 3.7, 4.3, 3.95, 4.05],
})

# x_col and y_col are omitted because they default to "x" and "y".
chart = plot_confidence_scatter(
    df,
    x_labels={1.0: "Low", 2.0: "Medium", 3.0: "High"},
    extent="stdev",
    scale_type="log",  # optional log scale on x
)
```
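One way a mapping like `x_labels` can be turned into tick labels on a quantitative axis is through the `values` and `labelExpr` properties of an Altair axis. The sketch below builds such an expression with the standard library only. Whether `plot_confidence_scatter` does it this way is an assumption, and `build_label_expr` is a hypothetical helper introduced for illustration.

```python
import json


def build_label_expr(x_labels):
    """Build a Vega expression mapping tick values to labels (hypothetical helper)."""
    expr = "''"  # fallback: unmapped ticks get an empty label
    # Wrap from the largest key inward so the smallest key is tested first.
    for value, label in sorted(x_labels.items(), reverse=True):
        # json.dumps gives a correctly quoted and escaped string literal.
        expr = f"datum.value == {value} ? {json.dumps(label)} : {expr}"
    return expr


x_labels = {1.0: "Low", 2.0: "Medium", 3.0: "High"}
tick_values = sorted(x_labels)           # place ticks only at the mapped positions
label_expr = build_label_expr(x_labels)  # chained ternary over datum.value
print(label_expr)

# With Altair installed, the two pieces could be passed to an axis like this:
#   alt.X("x:Q", axis=alt.Axis(values=tick_values, labelExpr=label_expr))
```

Keeping the axis quantitative preserves the real distances between x values, so unevenly spaced settings stay unevenly spaced, while the expression only changes the text shown at each tick.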
Reference
plotutils.uncertainty.plot_confidence_scatter(df, x_col='x', y_col='y', title='', width=600, height=400, x_title=None, y_title=None, point_color='steelblue', extent='ci', identity_line=False, identity_line_color='gray', zero=False, x_labels=None, scale_type='linear')
Create a scatter plot with error bars using Altair.
The function aggregates the multiple y values in each x category, computing the mean and the error bar extent selected by extent (a bootstrap 95% confidence interval by default).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Polars DataFrame with raw data (multiple y values per x category). | required |
| x_col | str | Column for the x-axis (categorical or numeric). | 'x' |
| y_col | str | Column for y values (aggregated per x category). | 'y' |
| title | str | Plot title. | '' |
| width | int | Chart width in pixels. | 600 |
| height | int | Chart height in pixels. | 400 |
| x_title | str or None | X-axis title (defaults to the column name). | None |
| y_title | str or None | Y-axis title (defaults to the column name). | None |
| point_color | str | Color for points and error bars. | 'steelblue' |
| extent | str | Error bar extent: "ci" (95% CI), "stdev", "stderr", or "iqr". | 'ci' |
| identity_line | bool | If True, adds a y = x identity line. | False |
| identity_line_color | str | Color of the identity line. | 'gray' |
| zero | bool | If True, the y-axis scale includes zero. | False |
| x_labels | dict[float, str] or None | Mapping of numeric x values to custom labels (enables a quantitative x-axis with labelled ticks). | None |
| scale_type | str | Scale type for both axes: "linear" or "log". | 'linear' |
Returns:
| Type | Description |
|---|---|
| LayerChart | Altair layered chart with points and error bars. |
Source code in src/plotutils/uncertainty.py