Deviations

Visualize how individual measurements deviate from their per-group mean. Subtracting the group mean removes the "level" effect and leaves only the scatter, which makes it easy to compare consistency across conditions that have very different absolute values.

Basic usage

import polars as pl
from plotutils.uncertainty import plot_deviations

df = pl.DataFrame({
    "category": ["Low"] * 10 + ["Medium"] * 10 + ["High"] * 10,
    "value": [1.0, 0.8, 1.2, 0.9, 1.1, 1.0, 0.7, 1.3, 0.95, 1.05,
              2.5, 2.3, 2.7, 2.4, 2.6, 2.5, 2.2, 2.8, 2.45, 2.55,
              4.0, 3.8, 4.2, 3.9, 4.1, 4.0, 3.7, 4.3, 3.95, 4.05],
})

chart = plot_deviations(df, x_col="category", y_col="value")

Each point shows y - mean(y) for its group, so every group is centered on zero. The dashed red horizontal line at zero marks the group mean.
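To make the transformation concrete, here is a minimal plain-Python sketch of the same arithmetic, y - mean(y) within each group. It uses only the standard library rather than polars, and the helper name group_deviations is hypothetical: it is not part of plotutils and is only meant to illustrate what ends up on the y-axis.

```python
from collections import defaultdict
from statistics import fmean


def group_deviations(categories, values):
    """Return (category, value - group mean) pairs, in input order.

    Hypothetical helper for illustration; not part of plotutils.
    """
    # Collect the values belonging to each category.
    grouped = defaultdict(list)
    for cat, val in zip(categories, values):
        grouped[cat].append(val)

    # One mean per category, then subtract it from every member.
    means = {cat: fmean(vals) for cat, vals in grouped.items()}
    return [(cat, val - means[cat]) for cat, val in zip(categories, values)]


categories = ["Low"] * 3 + ["High"] * 3
values = [0.8, 1.0, 1.2, 3.8, 4.0, 4.2]

deviations = group_deviations(categories, values)
for cat, dev in deviations:
    print(f"{cat:>4}: {dev:+.2f}")
```

Both groups yield deviations of roughly -0.2, 0 and +0.2 even though their means differ by 3, which is exactly the level effect being removed.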

Relative deviations

Use relative=True to express deviations as a fraction of the group mean, (y - mean(y)) / mean(y). This is useful when comparing groups with very different magnitudes, since the y-axis becomes dimensionless:

chart = plot_deviations(df, x_col="category", y_col="value", relative=True)
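As a rough illustration of why the relative view matters, the sketch below compares the largest absolute and relative deviation for the "Low" and "High" groups from the basic usage data. It is plain Python with the standard library only. The helper max_abs_deviation is a hypothetical name introduced here, and it assumes a non-zero group mean when relative=True.

```python
from statistics import fmean


def max_abs_deviation(values, relative=False):
    """Largest |y - mean|, optionally divided by the group mean.

    Hypothetical helper for illustration; assumes a non-zero mean
    when relative=True.
    """
    mean = fmean(values)
    scale = mean if relative else 1.0
    return max(abs((v - mean) / scale) for v in values)


low = [1.0, 0.8, 1.2, 0.9, 1.1, 1.0, 0.7, 1.3, 0.95, 1.05]
high = [4.0, 3.8, 4.2, 3.9, 4.1, 4.0, 3.7, 4.3, 3.95, 4.05]

for name, vals in [("Low", low), ("High", high)]:
    absolute = max_abs_deviation(vals)
    fraction = max_abs_deviation(vals, relative=True)
    print(f"{name}: absolute={absolute:.3f} relative={fraction:.3f}")
```

The two groups have the same absolute spread (about 0.3), but relative to their means the "Low" group varies by about 30% while the "High" group varies by about 7.5%.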

Tolerance bands

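Add symmetric reference lines with add_levels to mark acceptable deviation thresholds. Levels are given in the units of the y-axis: absolute deviations by default, fractions of the group mean when relative=True. For example, add_levels=[0.1, 0.2] draws dashed lines at ±0.1 and ±0.2: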

df = pl.DataFrame({
    "x": ["A"] * 10 + ["B"] * 10,
    "y": [1.0, 1.1, 0.9, 1.2, 0.8, 1.0, 1.1, 0.9, 1.05, 0.95,
          2.0, 2.1, 1.9, 2.2, 1.8, 2.0, 2.1, 1.9, 2.05, 1.95],
})

chart = plot_deviations(
    df,
    x_col="x",
    y_col="y",
    add_levels=[0.1, 0.2],
)
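The sketch below shows how a list of levels expands into symmetric y positions, mirroring the list comprehension in the source listing, and how one might count the points that fall outside a given level. It is plain Python with the standard library only. symmetric_levels and count_outside are hypothetical helper names, and the small eps is an assumption added here to absorb floating-point noise at the boundary.

```python
from statistics import fmean


def symmetric_levels(levels):
    """Expand [a, b] into [a, -a, b, -b], the y positions of the rules."""
    return [v for lv in levels for v in (lv, -lv)]


def count_outside(values, level, eps=1e-9):
    """Count points whose |y - mean| exceeds level.

    Hypothetical helper; eps keeps points that sit on the line
    (up to float rounding) from being counted as outside.
    """
    mean = fmean(values)
    return sum(abs(v - mean) > level + eps for v in values)


group_a = [1.0, 1.1, 0.9, 1.2, 0.8, 1.0, 1.1, 0.9, 1.05, 0.95]

print(symmetric_levels([0.1, 0.2]))   # [0.1, -0.1, 0.2, -0.2]
print(count_outside(group_a, 0.1))    # points beyond the inner lines
print(count_outside(group_a, 0.2))    # points beyond the outer lines
```

For group "A" in the example data, two points (1.2 and 0.8) lie beyond the ±0.1 lines and none lie beyond ±0.2.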

Reference

plotutils.uncertainty.plot_deviations(df, x_col, y_col, title='', relative=False, add_levels=None, x_labels=None, scale_type='linear')

Create a plot showing deviations of y values from their per-group mean.

Computes y - mean(y) per x group. When relative is True, computes (y - mean(y)) / mean(y) instead. A horizontal line at 0 is always drawn. Additional symmetric level lines (e.g. tolerance bands) can be added via add_levels.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | DataFrame | Raw data with multiple y values per x category. | required |
| x_col | str | Column for x-axis (categorical or numeric). | required |
| y_col | str | Column for y values. | required |
| title | str | Plot title. | '' |
| relative | bool | If True, deviations are divided by the group mean. | False |
| add_levels | list[float] or None | Extra horizontal levels drawn symmetrically at +level and -level. | None |
| x_labels | dict[float, str] or None | Mapping of numeric x values to custom labels (enables quantitative x-axis with labelled ticks). | None |
| scale_type | str | Scale type for the x-axis: "linear" or "log". | 'linear' |

Source code in src/plotutils/uncertainty.py
def plot_deviations(
    df: pl.DataFrame,
    x_col: str,
    y_col: str,
    title: str = "",
    relative: bool = False,
    add_levels: list[float] | None = None,
    x_labels: dict[float, str] | None = None,
    scale_type: Literal["linear", "log"] = "linear",
) -> alt.LayerChart:
    """Create a plot showing deviations of y values from their per-group mean.

    Computes ``y - mean(y)`` per x group. When *relative* is True, computes
    ``(y - mean(y)) / mean(y)`` instead. A horizontal line at 0 is always
    drawn. Additional symmetric level lines (e.g. tolerance bands) can be
    added via *add_levels*.

    Parameters
    ----------
    df : pl.DataFrame
        Raw data with multiple y values per x category.
    x_col : str
        Column for x-axis (categorical or numeric).
    y_col : str
        Column for y values.
    title : str
        Plot title.
    relative : bool
        If True, deviations are divided by the group mean.
    add_levels : list[float] or None
        Extra horizontal levels drawn symmetrically at +level and -level.
    x_labels : dict[float, str] or None
        Mapping of numeric x values to custom labels (enables quantitative
        x-axis with labelled ticks).
    scale_type : str
        Scale type for the x-axis: "linear" or "log".
    """
    alt.data_transformers.disable_max_rows()

    # Compute deviations from per-group mean
    dev_col = "deviation"
    df_dev = df.with_columns(
        (
            (pl.col(y_col) - pl.col(y_col).mean().over(x_col))
            / (pl.col(y_col).mean().over(x_col) if relative else 1)
        ).alias(dev_col)
    )

    # Sort for deterministic SVG rendering (vl-convert renders data in input order)
    df_dev = df_dev.sort(x_col, dev_col)

    y_title = dev_col if not relative else "relative deviation"

    _scale = alt.Scale(type=scale_type)
    x_enc = _build_x_encodings(x_col, x_col, x_labels, _scale)

    points = (
        alt.Chart(df_dev)
        .mark_point(filled=True, color="steelblue")
        .encode(
            x=x_enc.full,
            y=alt.Y(f"{dev_col}:Q", title=y_title),
        )
    )

    # Horizontal zero line
    zero_line = (
        alt.Chart(pl.DataFrame({"y": [0.0]}))
        .mark_rule(color="red", strokeDash=[5, 5])
        .encode(y="y:Q")
    )

    layers: list[alt.Chart | alt.LayerChart] = [points, zero_line]

    # Optional symmetric level lines
    if add_levels:
        level_values = [v for lv in add_levels for v in (lv, -lv)]
        level_df = pl.DataFrame({"y": level_values})
        level_lines = (
            alt.Chart(level_df)
            .mark_rule(color="black", strokeDash=[3, 3])
            .encode(y="y:Q")
        )
        layers.append(level_lines)

    chart = alt.layer(*layers).properties(width=600, height=400)

    if title:
        chart = chart.properties(title=title)

    return chart.configure_axis(
        gridColor="gray", gridDash=[3, 3], gridOpacity=0.5
    ).configure_view(strokeWidth=0)