Prediction errors

Visualize predicted versus true values for regression models. True values are plotted on the x-axis and predictions on the y-axis. An identity line (y = x) is always drawn: points on the line are perfect predictions, points above it are overestimates, and points below it are underestimates.
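To make the reading of the plot concrete, here is a minimal sketch in plain Python that labels each observation by the sign of its error (prediction minus truth). The helper name classify_prediction and the tol parameter are hypothetical and not part of plotutils.

```python
def classify_prediction(true_value, pred_value, tol=0.0):
    """Label a point by where it falls relative to the y = x line.

    Hypothetical helper for illustration only; not part of plotutils.
    """
    error = pred_value - true_value  # positive means the point lies above the line
    if error > tol:
        return "overestimate"
    if error < -tol:
        return "underestimate"
    return "perfect"


true_values = [1.0, 2.0, 3.0, 4.0]
pred_values = [1.1, 2.0, 2.8, 4.1]

labels = [classify_prediction(t, p) for t, p in zip(true_values, pred_values)]
print(labels)  # ['overestimate', 'perfect', 'underestimate', 'overestimate']
```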

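Both axes share the same scale domain: the combined range of the true and predicted values, padded by 2% of that range on each side. With the default square chart (600x600), one data unit spans the same number of pixels on both axes, so the identity line is a 45-degree diagonal and the visual distance from it is geometrically accurate.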
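The shared domain can be sketched in a few lines of plain Python. This mirrors the padding rule described above (2% of the combined range on each side); the function name shared_domain is an assumption made for illustration and is not exported by plotutils.

```python
def shared_domain(true_values, pred_values, pad_fraction=0.02):
    """Return one (min, max) domain covering both columns, with padding.

    Illustrative sketch only; plotutils computes this internally.
    """
    lo = min(min(true_values), min(pred_values))
    hi = max(max(true_values), max(pred_values))
    pad = (hi - lo) * pad_fraction  # 2% of the combined range by default
    return lo - pad, hi + pad


true_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
pred_values = [1.1, 2.2, 2.8, 4.1, 5.3, 5.9, 7.2, 7.8, 9.1, 10.3]

domain = shared_domain(true_values, pred_values)
print(domain)  # approximately (0.814, 10.486)
```

Because the same pair of bounds is used for x and y, the diagonal runs from one corner of the plotting area to the opposite one.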

Basic usage

import polars as pl
from plotutils.uncertainty import plot_predictions_errors

df = pl.DataFrame({
    "true": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "pred": [1.1, 2.2, 2.8, 4.1, 5.3, 5.9, 7.2, 7.8, 9.1, 10.3],
})

chart = plot_predictions_errors(df)

Color and shape encoding

Use color_col and shape_col to distinguish groups, such as different models or train/test splits. Both columns are treated as nominal (categorical):

df = pl.DataFrame({
    "true":  [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "pred":  [1.1, 2.2, 2.8, 4.1, 5.3, 5.9, 7.2, 7.8],
    "model": ["A", "A", "B", "B", "A", "A", "B", "B"],
    "split": ["train", "test", "train", "test", "train", "test", "train", "test"],
})

chart = plot_predictions_errors(df, color_col="model", shape_col="split")

Multi-panel comparison

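Combine with hchart / vchart to compare across conditions with independent axes per panel. This is useful when different splits or datasets have very different value ranges. The example below assumes that df contains a group column in addition to the columns defined above: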

from plotutils.concat import hchart

chart = hchart(
    column="split",
    row="group",
    df=df,
    func=plot_predictions_errors,
    color_col="model",
    shape_col="model",
)

See the Concatenation page for full examples.

Reference

plotutils.uncertainty.plot_predictions_errors(df, true_col='true', pred_col='pred', title='', width=600, height=600, x_title=None, y_title=None, point_color='steelblue', color_col=None, shape_col=None, identity_line_color='gray', point_size=60, point_opacity=0.7)

Create a prediction-vs-truth scatter plot with a y = x identity line.

Each point represents one observation. The x-axis shows the true (ground-truth) value and the y-axis shows the predicted value. Both axes share the same scale domain so the identity line is a true diagonal. Points on the identity line represent perfect predictions.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | DataFrame | Polars DataFrame with at least true_col and pred_col columns. | required |
| true_col | str | Column for true (ground-truth) values (x-axis). | 'true' |
| pred_col | str | Column for predicted values (y-axis). | 'pred' |
| title | str | Plot title. | '' |
| width | int | Chart dimensions. Default 600x600 (square) to reinforce equal axis scaling. | 600 |
| height | int | Chart dimensions. Default 600x600 (square) to reinforce equal axis scaling. | 600 |
| x_title | str or None | Axis titles (defaults to column names). | None |
| y_title | str or None | Axis titles (defaults to column names). | None |
| point_color | str | Fixed color for all points. Ignored when color_col is set. | 'steelblue' |
| color_col | str or None | Column mapped to point color (nominal). When set, point_color is ignored and Altair picks a categorical palette automatically. | None |
| shape_col | str or None | Column mapped to point shape (nominal). When set, each unique value gets a distinct marker shape. | None |
| identity_line_color | str | Color of the y = x dashed identity line. | 'gray' |
| point_size | int | Size (area) of the scatter points. | 60 |
| point_opacity | float | Opacity of the scatter points (0.0 to 1.0). | 0.7 |

Returns:

| Type | Description |
| --- | --- |
| LayerChart | Altair layered chart with scatter points and identity line. |

Source code in src/plotutils/uncertainty.py
def plot_predictions_errors(
    df: pl.DataFrame,
    true_col: str = "true",
    pred_col: str = "pred",
    title: str = "",
    width: int = 600,
    height: int = 600,
    x_title: str | None = None,
    y_title: str | None = None,
    point_color: str = "steelblue",
    color_col: str | None = None,
    shape_col: str | None = None,
    identity_line_color: str = "gray",
    point_size: int = 60,
    point_opacity: float = 0.7,
) -> alt.LayerChart:
    """Create a prediction-vs-truth scatter plot with a y = x identity line.

    Each point represents one observation. The x-axis shows the true
    (ground-truth) value and the y-axis shows the predicted value. Both
    axes share the same scale domain so the identity line is a true
    diagonal. Points on the identity line represent perfect predictions.

    Parameters
    ----------
    df : pl.DataFrame
        Polars DataFrame with at least *true_col* and *pred_col* columns.
    true_col : str
        Column for true (ground-truth) values (x-axis).
    pred_col : str
        Column for predicted values (y-axis).
    title : str
        Plot title.
    width, height : int
        Chart dimensions. Default 600x600 (square) to reinforce equal
        axis scaling.
    x_title, y_title : str or None
        Axis titles (defaults to column names).
    point_color : str
        Fixed color for all points. Ignored when *color_col* is set.
    color_col : str or None
        Column mapped to point color (nominal). When set, *point_color*
        is ignored and Altair picks a categorical palette automatically.
    shape_col : str or None
        Column mapped to point shape (nominal). When set, each unique
        value gets a distinct marker shape.
    identity_line_color : str
        Color of the y = x dashed identity line.
    point_size : int
        Size (area) of the scatter points.
    point_opacity : float
        Opacity of the scatter points (0.0 -- 1.0).

    Returns
    -------
    alt.LayerChart
        Altair layered chart with scatter points and identity line.
    """
    alt.data_transformers.disable_max_rows()

    x_title = x_title or true_col
    y_title = y_title or pred_col

    # Sort for deterministic SVG rendering
    sort_cols = [true_col, pred_col]
    if color_col is not None:
        sort_cols.append(color_col)
    if shape_col is not None and shape_col not in sort_cols:
        sort_cols.append(shape_col)
    df = df.sort(*sort_cols)

    # Shared scale: same domain for both axes, with 2% padding for visibility
    raw_min = float(
        min(df[true_col].min(), df[pred_col].min())  # type: ignore[arg-type]
    )
    raw_max = float(
        max(df[true_col].max(), df[pred_col].max())  # type: ignore[arg-type]
    )
    pad = (raw_max - raw_min) * 0.02
    global_min = raw_min - pad
    global_max = raw_max + pad
    shared_scale = alt.Scale(domain=[global_min, global_max])

    # Encodings
    x_enc = alt.X(f"{true_col}:Q", title=x_title, scale=shared_scale)
    y_enc = alt.Y(f"{pred_col}:Q", title=y_title, scale=shared_scale)

    encode_kwargs: dict = {"x": x_enc, "y": y_enc}

    if color_col is not None:
        encode_kwargs["color"] = alt.Color(f"{color_col}:N")

    if shape_col is not None:
        encode_kwargs["shape"] = alt.Shape(f"{shape_col}:N")

    tooltip_cols = [f"{true_col}:Q", f"{pred_col}:Q"]
    if color_col is not None:
        tooltip_cols.append(f"{color_col}:N")
    if shape_col is not None and shape_col != color_col:
        tooltip_cols.append(f"{shape_col}:N")
    encode_kwargs["tooltip"] = tooltip_cols

    # Build scatter layer
    mark_kwargs: dict = {
        "filled": True,
        "size": point_size,
        "opacity": point_opacity,
    }
    if color_col is None:
        mark_kwargs["color"] = point_color

    points = alt.Chart(df).mark_point(**mark_kwargs).encode(**encode_kwargs)

    # Identity line (always shown, behind points)
    identity_layer = _build_identity_line(
        global_min, global_max, color=identity_line_color, scale=shared_scale
    )

    layers: list[alt.Chart | alt.LayerChart] = [identity_layer, points]

    chart = alt.layer(*layers).properties(width=width, height=height)

    if title:
        chart = chart.properties(title=title)

    return chart.configure_axis(
        gridColor="gray", gridDash=[3, 3], gridOpacity=0.5
    ).configure_view(strokeWidth=0)
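One edge case follows from the padding step in the listing above: when every true and predicted value is identical, raw_max - raw_min is 0, so pad is 0 and both ends of the domain coincide. The sketch below shows one possible guard. It is not part of plotutils, and the name padded_domain and the fallback_pad parameter are hypothetical.

```python
def padded_domain(raw_min, raw_max, pad_fraction=0.02, fallback_pad=1.0):
    """Pad a (min, max) range, falling back to a fixed pad when the span is zero.

    Hypothetical guard for illustration; plot_predictions_errors does not do this.
    """
    span = raw_max - raw_min
    # A zero span would yield a zero pad, so use the fixed fallback instead.
    pad = span * pad_fraction if span > 0 else fallback_pad
    return raw_min - pad, raw_max + pad


print(padded_domain(5.0, 5.0))  # (4.0, 6.0)
```

For any non-degenerate range the result matches the 2% rule used in the listing.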