Error bars

Visualize repeated measurements across categories with automatic aggregation and configurable error bars.

Each x category can contain multiple y values. The chart shows the mean as a point and the spread as error bars, computed by Altair's native mark_errorbar.

Example

import polars as pl
from plotutils.uncertainty import plot_confidence_scatter

df = pl.DataFrame({
    "category": ["Low"] * 10 + ["Medium"] * 10 + ["High"] * 10,
    "value": [1.0, 0.8, 1.2, 0.9, 1.1, 1.0, 0.7, 1.3, 0.95, 1.05,
              2.5, 2.3, 2.7, 2.4, 2.6, 2.5, 2.2, 2.8, 2.45, 2.55,
              4.0, 3.8, 4.2, 3.9, 4.1, 4.0, 3.7, 4.3, 3.95, 4.05],
})

chart = plot_confidence_scatter(
    df,
    x_col="category",
    y_col="value",
    extent="stdev",
)

The `extent` parameter controls the error bar type:

`extent`	Description
`"ci"` (default)	Bootstrap 95% confidence interval
`"stdev"`	±1 standard deviation
`"stderr"`	±1 standard error of the mean
`"iqr"`	Interquartile range (25th–75th percentile)
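
To make the deterministic extents concrete, the sketch below computes them by hand for the "Low" group from the example, using only the standard library. It is an illustration, not the code path `plot_confidence_scatter` uses: the chart's intervals are computed by the renderer via `mark_errorbar`. The sketch assumes the sample (n - 1) standard deviation and linearly interpolated quartiles, so the rendered values may differ slightly.

```python
import math
import statistics

# The "Low" group from the example data.
low = [1.0, 0.8, 1.2, 0.9, 1.1, 1.0, 0.7, 1.3, 0.95, 1.05]

mean = statistics.fmean(low)
stdev = statistics.stdev(low)          # sample standard deviation (n - 1)
stderr = stdev / math.sqrt(len(low))   # standard error of the mean

# Quartiles with linear interpolation; the renderer's exact quantile
# method is an assumption here and may differ slightly.
q1, _median, q3 = statistics.quantiles(low, n=4, method="inclusive")

intervals = {
    "stdev": (mean - stdev, mean + stdev),
    "stderr": (mean - stderr, mean + stderr),
    "iqr": (q1, q3),
}
for name, (lower, upper) in intervals.items():
    print(f"{name:>6}: {lower:.3f} to {upper:.3f}")
```

With ten observations per group, the standard error is roughly a third of the standard deviation, so `"stderr"` bars are noticeably shorter than `"stdev"` bars for the same data.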

Note: The default `extent="ci"` uses bootstrap resampling, which is non-deterministic. Use `extent="stdev"`, `extent="stderr"`, or `extent="iqr"` for reproducible output.
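
The non-determinism comes from random resampling. As a rough illustration, here is a minimal percentile-bootstrap sketch in plain Python. The helper `bootstrap_ci` and its parameters (`n_resamples`, `level`, `seed`) are hypothetical and are not part of `plotutils` or Altair; the renderer's own bootstrap may differ in its details.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=1000, level=0.95, seed=None):
    """Percentile bootstrap confidence interval for the mean (illustrative)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    # Keep the central `level` fraction of the resampled means.
    lower_idx = round((1 - level) / 2 * n_resamples)
    upper_idx = round((1 + level) / 2 * n_resamples) - 1
    return means[lower_idx], means[upper_idx]


low = [1.0, 0.8, 1.2, 0.9, 1.1, 1.0, 0.7, 1.3, 0.95, 1.05]

ci_a = bootstrap_ci(low, seed=0)
ci_b = bootstrap_ci(low, seed=1)   # a different seed typically shifts the bounds
ci_a_again = bootstrap_ci(low, seed=0)

print(ci_a, ci_b)
print(ci_a == ci_a_again)  # same seed, same resamples
```

Without a fixed seed, each run draws different resamples, so the interval endpoints move slightly from render to render. That is why the closed-form extents are the safer choice for snapshot tests or committed figures.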

Numeric x-axis with custom labels

When x values are numeric (e.g., model capacity, regularization strength), pass `x_labels` to display readable labels while keeping a quantitative axis with proper spacing:

df = pl.DataFrame({
    "x": [1.0] * 10 + [2.0] * 10 + [3.0] * 10,
    "y": [1.0, 0.8, 1.2, 0.9, 1.1, 1.0, 0.7, 1.3, 0.95, 1.05,
          2.5, 2.3, 2.7, 2.4, 2.6, 2.5, 2.2, 2.8, 2.45, 2.55,
          4.0, 3.8, 4.2, 3.9, 4.1, 4.0, 3.7, 4.3, 3.95, 4.05],
})

chart = plot_confidence_scatter(
    df,
    x_labels={1.0: "Low", 2.0: "Medium", 3.0: "High"},
    extent="stdev",
    scale_type="log",  # optional: log scale (applied to both axes)
)
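
How `x_labels` is turned into tick labels is handled by the private helper `_build_x_encodings`, which is not shown on this page. One common Altair approach, sketched here purely as an assumption, is to pin the tick positions and supply a Vega label expression that maps each tick value to its label. The function `label_expr` below is hypothetical.

```python
import json


def label_expr(x_labels: dict[float, str]) -> str:
    """Build a Vega expression mapping tick values to labels (hypothetical helper)."""
    expr = "datum.label"  # fall back to the default tick label
    # Build the nested ternary from the last entry outwards so the
    # finished expression tests the keys in their original order.
    for value, label in reversed(list(x_labels.items())):
        expr = f"datum.value == {json.dumps(value)} ? {json.dumps(label)} : {expr}"
    return expr


x_labels = {1.0: "Low", 2.0: "Medium", 3.0: "High"}
expr = label_expr(x_labels)
print(expr)
```

Under this assumption, the expression would be passed to Altair as `alt.Axis(values=list(x_labels), labelExpr=expr)` on a quantitative x encoding, which keeps numeric spacing between ticks while showing the readable labels.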

Reference

`plotutils.uncertainty.plot_confidence_scatter(df, x_col='x', y_col='y', title='', width=600, height=400, x_title=None, y_title=None, point_color='steelblue', extent='ci', identity_line=False, identity_line_color='gray', zero=False, x_labels=None, scale_type='linear')`

Create a scatter plot with error bars using Altair.

The function aggregates multiple y values per x category, computing the mean and the selected error bar extent automatically.

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	Polars DataFrame with raw data (multiple y values per x category).	required
`x_col`	`str`	Column for x-axis (categorical or numeric).	`'x'`
`y_col`	`str`	Column for y values (will be aggregated per x category).	`'y'`
`title`	`str`	Plot title.	`''`
`width`	`int`	Chart width in pixels.	`600`
`height`	`int`	Chart height in pixels.	`400`
`x_title`	`str or None`	X-axis title (defaults to `x_col`).	`None`
`y_title`	`str or None`	Y-axis title (defaults to `y_col`).	`None`
`point_color`	`str`	Color for points and error bars.	`'steelblue'`
`extent`	`str`	Error bar extent: "ci" (95% CI), "stdev", "stderr", or "iqr".	`'ci'`
`identity_line`	`bool`	If True, adds y = x identity line.	`False`
`identity_line_color`	`str`	Color of the identity line.	`'gray'`
`zero`	`bool`	If True, y-axis scale includes zero.	`False`
`x_labels`	`dict[float, str] or None`	Mapping of numeric x values to custom labels (enables quantitative x-axis with labelled ticks).	`None`
`scale_type`	`str`	Scale type for both axes: "linear" or "log".	`'linear'`

Returns:

Type	Description
`LayerChart`	Altair layered chart with points and error bars.

Source code in src/plotutils/uncertainty.py

def plot_confidence_scatter(
    df: pl.DataFrame,
    x_col: str = "x",
    y_col: str = "y",
    title: str = "",
    width: int = 600,
    height: int = 400,
    x_title: str | None = None,
    y_title: str | None = None,
    point_color: str = "steelblue",
    extent: ErrorBarExtent = "ci",
    identity_line: bool = False,
    identity_line_color: str = "gray",
    zero: bool = False,
    x_labels: dict[float, str] | None = None,
    scale_type: Literal["linear", "log"] = "linear",
) -> alt.LayerChart:
    """Create a scatter plot with error bars using Altair.

    The function aggregates multiple y values per x category, computing
    the mean and the selected error bar extent automatically.

    Parameters
    ----------
    df : pl.DataFrame
        Polars DataFrame with raw data (multiple y values per x category).
    x_col : str
        Column for x-axis (categorical or numeric).
    y_col : str
        Column for y values (will be aggregated per x category).
    title : str
        Plot title.
    width, height : int
        Chart dimensions.
    x_title, y_title : str or None
        Axis titles (defaults to column names).
    point_color : str
        Color for points and error bars.
    extent : str
        Error bar extent: "ci" (95% CI), "stdev", "stderr", or "iqr".
    identity_line : bool
        If True, adds y = x identity line.
    identity_line_color : str
        Color of the identity line.
    zero : bool
        If True, y-axis scale includes zero.
    x_labels : dict[float, str] or None
        Mapping of numeric x values to custom labels (enables quantitative
        x-axis with labelled ticks).
    scale_type : str
        Scale type for both axes: "linear" or "log".

    Returns
    -------
    alt.LayerChart
        Altair layered chart with points and error bars.
    """
    alt.data_transformers.disable_max_rows()

    x_title = x_title or x_col
    y_title = y_title or y_col

    # Sort for deterministic SVG rendering (vl-convert renders data in input order)
    df = df.sort(x_col, y_col)

    _scale = alt.Scale(zero=zero, type=scale_type)
    x_enc = _build_x_encodings(x_col, x_title, x_labels, _scale)

    # Error bars with aggregation
    error_bars = (
        alt.Chart(df)
        .mark_errorbar(extent=extent, color=point_color)
        .encode(
            x=x_enc.full,
            y=alt.Y(f"{y_col}:Q", title=y_title, scale=_scale),
        )
    )

    # Points showing mean
    points = (
        alt.Chart(df)
        .mark_point(filled=True, color=point_color)
        .encode(
            x=x_enc.simple,
            y=alt.Y(f"mean({y_col}):Q", scale=_scale),
        )
    )

    layers = [error_bars, points]

    # Identity line
    if identity_line:
        y_min = float(df[y_col].min())  # type: ignore[arg-type]
        y_max = float(df[y_col].max())  # type: ignore[arg-type]
        layers.insert(
            0, _build_identity_line(y_min, y_max, color=identity_line_color, scale=_scale)
        )

    chart = alt.layer(*layers).properties(width=width, height=height)

    if title:
        chart = chart.properties(title=title)

    return chart.configure_axis(
        gridColor="gray", gridDash=[3, 3], gridOpacity=0.5
    ).configure_view(strokeWidth=0)