Prediction Errors

Visualize predicted vs. true values for regression models. An identity line (y = x) is always shown: points on the line are perfect predictions, points above it are overestimates (predicted > true), and points below it are underestimates (predicted < true).
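As a small illustration of how to read the chart, the sign of the residual (predicted minus true) decides which side of the identity line a point falls on. The helper name `side_of_identity` is made up for this sketch and is not part of plotutils.

```python
def side_of_identity(true_value: float, pred_value: float) -> str:
    """Classify a point relative to the y = x line (x = true, y = predicted)."""
    residual = pred_value - true_value
    if residual > 0:
        return "above (overestimate)"
    if residual < 0:
        return "below (underestimate)"
    return "on the line (perfect)"


# Three observations with the same true value and different predictions
examples = [(10.0, 12.5), (10.0, 7.0), (10.0, 10.0)]
labels = [side_of_identity(t, p) for t, p in examples]
```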

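Both axes share the same scale domain: the combined minimum and maximum of the true and predicted columns, padded by 2% of that range on each side. At the default square 600x600 size the identity line is therefore a true diagonal, and distances from it are visually comparable along both axes.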
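The shared-domain computation can be sketched in plain Python. This mirrors the arithmetic in the source listing further down, but the function name `padded_shared_domain` and the list-based inputs are assumptions made for illustration (the library itself works on Polars columns).

```python
def padded_shared_domain(true_values, pred_values, pad_fraction=0.02):
    """Return one [min, max] domain covering both series, padded on each side."""
    raw_min = float(min(min(true_values), min(pred_values)))
    raw_max = float(max(max(true_values), max(pred_values)))
    pad = (raw_max - raw_min) * pad_fraction  # 2% of the combined range
    return [raw_min - pad, raw_max + pad]


# Combined range is 0..100, so the pad is 2 units on each side
domain = padded_shared_domain([0.0, 50.0, 100.0], [5.0, 40.0, 90.0])
```

Here `domain` comes out as `[-2.0, 102.0]`. Note that when every value is identical the range is zero, so the pad is zero as well and the domain collapses to a single value.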

Basic usage
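A minimal sketch of a basic call, assuming plotutils, Polars, and Altair are installed and that the function is importable from `plotutils.uncertainty` as shown in the Reference signature. The data is synthetic, and the names `make_demo_rows` and `basic_chart` are illustrative only.

```python
import random


def make_demo_rows(n: int = 100, seed: int = 0) -> dict[str, list[float]]:
    """Synthetic regression results: predictions are the truth plus Gaussian noise."""
    rng = random.Random(seed)
    true = [rng.uniform(0.0, 100.0) for _ in range(n)]
    pred = [t + rng.gauss(0.0, 5.0) for t in true]
    return {"true": true, "pred": pred}


def basic_chart():
    """Build the default chart; imports are local so the data helper has no extra dependencies."""
    import polars as pl
    from plotutils.uncertainty import plot_predictions_errors

    df = pl.DataFrame(make_demo_rows())
    return plot_predictions_errors(
        df,
        true_col="true",
        pred_col="pred",
        title="Predicted vs. true",
    )
```

Calling `basic_chart()` in a notebook cell should display the chart; since the return value is an Altair chart, it can also be written to disk with `.save("prediction_errors.html")`.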

Color and shape encoding

Use color_col and shape_col to distinguish groups, for example different models or train/test splits.
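A hedged sketch of grouped encoding, under the same installation assumptions as any other call to the function. The column names `model` and `split`, the toy values, and the helper names `make_grouped_rows` and `grouped_chart` are all invented for this example.

```python
def make_grouped_rows() -> dict[str, list]:
    """Toy results for two models, each evaluated on a train and a test split."""
    true = [1.0, 2.0, 3.0, 4.0]
    rows: dict[str, list] = {"true": [], "pred": [], "model": [], "split": []}
    for model, bias in (("linear", 0.5), ("forest", -0.2)):
        for i, t in enumerate(true):
            rows["true"].append(t)
            rows["pred"].append(t + bias)  # constant bias per model
            rows["model"].append(model)
            rows["split"].append("train" if i < 2 else "test")
    return rows


def grouped_chart():
    """Color by model and vary marker shape by split."""
    import polars as pl
    from plotutils.uncertainty import plot_predictions_errors

    df = pl.DataFrame(make_grouped_rows())
    return plot_predictions_errors(df, color_col="model", shape_col="split")
```

Because `color_col` is set, `point_color` is ignored, and both grouping columns are added to the tooltip alongside the true and predicted values.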

Multi-panel comparison

Combine with hchart / vchart to compare across conditions with independent axes per panel. This is useful when different splits or datasets have very different value ranges.

Reference

`plotutils.uncertainty.plot_predictions_errors(df, true_col='true', pred_col='pred', title='', width=600, height=600, x_title=None, y_title=None, point_color='steelblue', color_col=None, shape_col=None, identity_line_color='gray', point_size=60, point_opacity=0.7)`

Create a prediction-vs-truth scatter plot with a y = x identity line.

Each point represents one observation. The x-axis shows the true (ground-truth) value and the y-axis shows the predicted value. Both axes share the same scale domain so the identity line is a true diagonal. Points on the identity line represent perfect predictions.

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	Polars DataFrame with at least true_col and pred_col columns.	required
`true_col`	`str`	Column for true (ground-truth) values (x-axis).	`'true'`
`pred_col`	`str`	Column for predicted values (y-axis).	`'pred'`
`title`	`str`	Plot title.	`''`
`width`	`int`	Chart width in pixels. The 600x600 default is square to reinforce equal axis scaling.	`600`
`height`	`int`	Chart height in pixels. The 600x600 default is square to reinforce equal axis scaling.	`600`
`x_title`	`str or None`	X-axis title. Defaults to the true_col column name.	`None`
`y_title`	`str or None`	Y-axis title. Defaults to the pred_col column name.	`None`
`point_color`	`str`	Fixed color for all points. Ignored when color_col is set.	`'steelblue'`
`color_col`	`str or None`	Column mapped to point color (nominal). When set, point_color is ignored and Altair picks a categorical palette automatically.	`None`
`shape_col`	`str or None`	Column mapped to point shape (nominal). When set, each unique value gets a distinct marker shape.	`None`
`identity_line_color`	`str`	Color of the y = x dashed identity line.	`'gray'`
`point_size`	`int`	Size (area) of the scatter points.	`60`
`point_opacity`	`float`	Opacity of the scatter points (0.0 -- 1.0).	`0.7`

Returns:

Type	Description
`LayerChart`	Altair layered chart with scatter points and identity line.

Source code in src/plotutils/uncertainty.py

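# Module-level imports assumed by this listing (not shown in the original excerpt)
import altair as alt
import polars as pl


def plot_predictions_errors(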
    df: pl.DataFrame,
    true_col: str = "true",
    pred_col: str = "pred",
    title: str = "",
    width: int = 600,
    height: int = 600,
    x_title: str | None = None,
    y_title: str | None = None,
    point_color: str = "steelblue",
    color_col: str | None = None,
    shape_col: str | None = None,
    identity_line_color: str = "gray",
    point_size: int = 60,
    point_opacity: float = 0.7,
) -> alt.LayerChart:
    """Create a prediction-vs-truth scatter plot with a y = x identity line.

    Each point represents one observation. The x-axis shows the true
    (ground-truth) value and the y-axis shows the predicted value. Both
    axes share the same scale domain so the identity line is a true
    diagonal. Points on the identity line represent perfect predictions.

    Parameters
    ----------
    df : pl.DataFrame
        Polars DataFrame with at least *true_col* and *pred_col* columns.
    true_col : str
        Column for true (ground-truth) values (x-axis).
    pred_col : str
        Column for predicted values (y-axis).
    title : str
        Plot title.
    width, height : int
        Chart dimensions. Default 600x600 (square) to reinforce equal
        axis scaling.
    x_title, y_title : str or None
        Axis titles (defaults to column names).
    point_color : str
        Fixed color for all points. Ignored when *color_col* is set.
    color_col : str or None
        Column mapped to point color (nominal). When set, *point_color*
        is ignored and Altair picks a categorical palette automatically.
    shape_col : str or None
        Column mapped to point shape (nominal). When set, each unique
        value gets a distinct marker shape.
    identity_line_color : str
        Color of the y = x dashed identity line.
    point_size : int
        Size (area) of the scatter points.
    point_opacity : float
        Opacity of the scatter points (0.0 -- 1.0).

    Returns
    -------
    alt.LayerChart
        Altair layered chart with scatter points and identity line.
    """
    alt.data_transformers.disable_max_rows()

    x_title = x_title or true_col
    y_title = y_title or pred_col

    # Sort for deterministic SVG rendering
    sort_cols = [true_col, pred_col]
    if color_col is not None:
        sort_cols.append(color_col)
    if shape_col is not None and shape_col not in sort_cols:
        sort_cols.append(shape_col)
    df = df.sort(*sort_cols)

    # Shared scale: same domain for both axes, with 2% padding for visibility
    raw_min = float(
        min(df[true_col].min(), df[pred_col].min())  # type: ignore[arg-type]
    )
    raw_max = float(
        max(df[true_col].max(), df[pred_col].max())  # type: ignore[arg-type]
    )
    pad = (raw_max - raw_min) * 0.02
    global_min = raw_min - pad
    global_max = raw_max + pad
    shared_scale = alt.Scale(domain=[global_min, global_max])

    # Encodings
    x_enc = alt.X(f"{true_col}:Q", title=x_title, scale=shared_scale)
    y_enc = alt.Y(f"{pred_col}:Q", title=y_title, scale=shared_scale)

    encode_kwargs: dict = {"x": x_enc, "y": y_enc}

    if color_col is not None:
        encode_kwargs["color"] = alt.Color(f"{color_col}:N")

    if shape_col is not None:
        encode_kwargs["shape"] = alt.Shape(f"{shape_col}:N")

    tooltip_cols = [f"{true_col}:Q", f"{pred_col}:Q"]
    if color_col is not None:
        tooltip_cols.append(f"{color_col}:N")
    if shape_col is not None and shape_col != color_col:
        tooltip_cols.append(f"{shape_col}:N")
    encode_kwargs["tooltip"] = tooltip_cols

    # Build scatter layer
    mark_kwargs: dict = {
        "filled": True,
        "size": point_size,
        "opacity": point_opacity,
    }
    if color_col is None:
        mark_kwargs["color"] = point_color

    points = alt.Chart(df).mark_point(**mark_kwargs).encode(**encode_kwargs)

    # Identity line (always shown, behind points).
    # _build_identity_line is a private helper that is not included in this listing.
    identity_layer = _build_identity_line(
        global_min, global_max, color=identity_line_color, scale=shared_scale
    )

    layers: list[alt.Chart | alt.LayerChart] = [identity_layer, points]

    chart = alt.layer(*layers).properties(width=width, height=height)

    if title:
        chart = chart.properties(title=title)

    return chart.configure_axis(
        gridColor="gray", gridDash=[3, 3], gridOpacity=0.5
    ).configure_view(strokeWidth=0)
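The listing above calls `_build_identity_line`, whose source is not shown on this page. The following is only a guess at what such a helper might look like, written as a standalone sketch: the names `identity_line_rows` and `build_identity_line_sketch`, the field names `x` and `y`, and the `[4, 4]` dash pattern are all assumptions, not the library's actual code.

```python
def identity_line_rows(lo: float, hi: float) -> list[dict[str, float]]:
    """Two endpoints of the y = x segment spanning the shared domain."""
    return [{"x": lo, "y": lo}, {"x": hi, "y": hi}]


def build_identity_line_sketch(lo, hi, color="gray", scale=None):
    """Hypothetical stand-in for _build_identity_line (not the library's implementation)."""
    import altair as alt

    # Reuse the caller's scale so the line and the points share one domain
    scale = scale if scale is not None else alt.Scale(domain=[lo, hi])
    data = alt.Data(values=identity_line_rows(lo, hi))
    return (
        alt.Chart(data)
        .mark_line(color=color, strokeDash=[4, 4])  # dash pattern is an assumption
        .encode(
            x=alt.X("x:Q", scale=scale),
            y=alt.Y("y:Q", scale=scale),
        )
    )
```

Drawing the line from the two domain corners, rather than from the data, keeps it spanning the full padded plot area regardless of where the points fall.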
