UpSet Plot

UpSet plots visualize set intersections as a scalable alternative to Venn diagrams. They work well for 3–30 sets by replacing overlapping shapes with a matrix layout.

An UpSet plot has three components:

Intersection bars (top): size (cardinality) of each intersection.
Intersection matrix (bottom): which sets participate in each intersection, shown as dots joined by connecting lines.
Set-size bars (left): total membership of each individual set.

Tip: always include the set-size bars (the default) so readers can judge intersection sizes relative to their parent sets.

Basic example

import polars as pl
from plotutils.upset import plot_upset

df = pl.DataFrame(
    {
        "Drama": [1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0],
        "Comedy": [0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1],
        "Action": [0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0],
        "Sci-Fi": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
    }
)
chart = plot_upset(df, title="Movie genre overlaps")
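The chart above reduces df to one size per set plus one row per intersection. The following pure-Python sketch derives those components from the same membership data, so you can check the numbers by hand. It is an illustration, not plotutils' internal preprocessing. The helper name upset_components is hypothetical. The sketch assumes exclusive intersections, the usual UpSet convention: each element is counted once, under its exact combination of sets.

```python
from collections import Counter


def upset_components(columns: dict[str, list[int]]) -> tuple[dict[str, int], list[dict]]:
    """Hypothetical helper: derive UpSet components from {set_name: [0/1, ...]}."""
    names = list(columns)
    n_rows = len(columns[names[0]]) if names else 0

    # Set-size bars (left): total membership of each individual set.
    set_sizes = {name: sum(1 for v in col if v) for name, col in columns.items()}

    # Intersection bars (top): count each element under its exact combination
    # of sets (exclusive intersections), so every element lands in one bar.
    combos = Counter(
        tuple(name for name in names if columns[name][row]) for row in range(n_rows)
    )

    # Matrix (bottom): the member sets are the filled dots; degree counts them.
    intersections = [
        {"sets": list(combo), "cardinality": count, "degree": len(combo)}
        for combo, count in combos.items()
    ]
    return set_sizes, intersections


genres = {
    "Drama": [1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0],
    "Comedy": [0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1],
    "Action": [0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0],
    "Sci-Fi": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
}
set_sizes, intersections = upset_components(genres)
print(set_sizes)  # {'Drama': 9, 'Comedy': 9, 'Action': 7, 'Sci-Fi': 6}
print(len(intersections))  # 10
```

With exclusive intersections, each set's size equals the sum of the cardinalities of the intersections it takes part in. Drama's 9 members, for example, split as 2 + 2 + 1 + 1 + 3 across the Drama, Drama+Comedy, Drama+Action, Drama+Comedy+Action and Drama+Sci-Fi bars.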

Filtering and sorting

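You can filter intersections by degree (the number of participating sets) and limit how many are shown. Sort keys are "frequency" (the default) and "degree"; pass a list for multi-level sorting. Unless sort_order overrides it, frequency sorts descending (biggest first) and degree sorts ascending (simplest first).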

chart = plot_upset(
    df,
    sort_by=["degree", "frequency"],
    min_degree=1,
    n_intersections=10,
    title="Degree ≥ 1, sorted by degree then frequency",
)
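The sorting and filtering rules can be sketched in plain Python. The helper select_intersections is hypothetical, not the library's code. It follows the documented sort_order defaults and assumes two things: the degree filters are applied before sorting, and truncation to n_intersections happens last. Each intersection is assumed to be a dict with "cardinality" and "degree" keys.

```python
from __future__ import annotations

from collections.abc import Sequence
from typing import Literal

SortKey = Literal["frequency", "degree"]
SortDir = Literal["ascending", "descending"]

# Field read by each sort key, and its natural direction when sort_order is None.
_FIELD = {"frequency": "cardinality", "degree": "degree"}
_DEFAULT_DIR = {"frequency": "descending", "degree": "ascending"}


def select_intersections(
    intersections: list[dict],
    sort_by: SortKey | Sequence[SortKey] = "frequency",
    sort_order: SortDir | Sequence[SortDir] | None = None,
    min_degree: int = 0,
    max_degree: int | None = None,
    n_intersections: int | None = None,
) -> list[dict]:
    """Hypothetical helper: filter by degree, multi-level sort, then truncate."""
    keys = [sort_by] if isinstance(sort_by, str) else list(sort_by)
    if sort_order is None:
        orders = [_DEFAULT_DIR[k] for k in keys]
    elif isinstance(sort_order, str):
        orders = [sort_order] * len(keys)  # one direction for every key
    else:
        orders = list(sort_order)
    if len(orders) != len(keys):
        raise ValueError("sort_order needs one direction per sort_by key")

    rows = [
        r for r in intersections
        if r["degree"] >= min_degree and (max_degree is None or r["degree"] <= max_degree)
    ]
    # Sort by the least significant key first and the primary key last;
    # Python's stable sort keeps ties in the order set by earlier passes.
    for sort_key, order in reversed(list(zip(keys, orders))):
        rows.sort(key=lambda r: r[_FIELD[sort_key]], reverse=(order == "descending"))
    return rows[:n_intersections]  # slicing with None keeps every row


sample = [
    {"sets": ["A"], "cardinality": 5, "degree": 1},
    {"sets": ["A", "B"], "cardinality": 2, "degree": 2},
    {"sets": ["B"], "cardinality": 3, "degree": 1},
    {"sets": ["A", "B", "C"], "cardinality": 6, "degree": 3},
    {"sets": ["C"], "cardinality": 4, "degree": 1},
]
print([r["cardinality"] for r in select_intersections(sample)])  # [6, 5, 4, 3, 2]
print([r["cardinality"] for r in select_intersections(sample, sort_by=["degree", "frequency"])])  # [5, 4, 3, 2, 6]
```

Python guarantees sort stability even with reverse=True. That is why repeated single-key sorts compose into a multi-level sort.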

Without set sizes

Set show_set_sizes=False to hide the left-side bar chart and show only the intersection bars and matrix.

chart = plot_upset(df, show_set_sizes=False, title="No set-size bars")
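Hiding the set-size bars mainly saves horizontal space. The sizing arithmetic can be sketched as follows. The constants are read from the plot_upset source in the Reference section: 30 px per set with a 100 px floor for the matrix, a 120 px set-size panel, and 5 px spacing. The helper upset_layout itself is hypothetical. Its figures cover only the plotting areas; Vega-Lite adds axes, labels and the title on top.

```python
def upset_layout(
    n_sets: int, width: int = 600, height: int = 300, show_set_sizes: bool = True
) -> dict[str, int]:
    """Hypothetical helper: plotting-area sizes in pixels, excluding axes/title."""
    # Matrix: 30 px per set row, never shorter than 100 px.
    matrix_height = max(n_sets * 30, 100)
    dims = {
        "matrix_height": matrix_height,
        # Cardinality bars sit directly on top of the matrix (spacing=0).
        "plot_height": height + matrix_height,
        "plot_width": width,
    }
    if show_set_sizes:
        # Set-size panel is 120 px wide, joined to the main column with 5 px spacing.
        dims["plot_width"] = 120 + 5 + width
        dims["set_bar_thickness"] = max(8, matrix_height // (n_sets + 1))
    return dims


print(upset_layout(4))
# {'matrix_height': 120, 'plot_height': 420, 'plot_width': 725, 'set_bar_thickness': 24}
print(upset_layout(4, show_set_sizes=False))
# {'matrix_height': 120, 'plot_height': 420, 'plot_width': 600}
```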

When to use UpSet plots

Beyond the classic "which sets overlap?" question, UpSet plots are useful in many practical scenarios:

Reference

plotutils.upset.plot_upset(df, set_cols=None, *, sort_by='frequency', sort_order=None, min_degree=0, max_degree=None, n_intersections=None, show_set_sizes=True, title='', width=600, height=300, bar_color='#4c78a8', dot_color='#333333', line_color='#333333')

Create an UpSet plot for visualizing set intersections.

Parameters:

df (DataFrame, required): Binary membership matrix. Each row is an element; each column listed in set_cols contains 0 / 1 (or bool) indicating membership.

set_cols (list[str] | None, default None): Columns to treat as sets. When None, every column in df is used.

sort_by (str or list[str], default 'frequency'): Sort key(s). Accepted values are "frequency" (intersection size) and "degree" (number of participating sets). A list applies multi-level sorting, e.g. ["degree", "frequency"] sorts primarily by degree, then by size within each degree.

sort_order (str, list[str], or None, default None): Sort direction(s). A single string applies to all keys; a list sets the direction per key. When None, each key uses its natural default: "descending" for frequency (biggest first) and "ascending" for degree (simplest first).

min_degree (int, default 0): Hide intersections whose degree is below this threshold.

max_degree (int | None, default None): Hide intersections whose degree is above this threshold.

n_intersections (int | None, default None): Show only the first n intersections (after sorting).

show_set_sizes (bool, default True): If True, a horizontal bar chart of individual set sizes is displayed to the left of the matrix.

title (str, default ''): Chart title. An empty string adds no title.

width (int, default 600): Width of the cardinality bar chart (and matrix) in pixels. When show_set_sizes=True, the set-size bars add a further 120 px on the left.

height (int, default 300): Height of the cardinality bar chart in pixels. The matrix height is derived from the number of sets (30 px per set, at least 100 px).

bar_color (str, default '#4c78a8'): Fill colour for the cardinality and set-size bars.

dot_color (str, default '#333333'): Fill colour for the active dots in the intersection matrix.

line_color (str, default '#333333'): Stroke colour for the connecting lines in the matrix.

Returns:

VConcatChart | HConcatChart: Composited Altair chart. It is an HConcatChart when show_set_sizes=True, otherwise a VConcatChart.

Source code in src/plotutils/upset.py
def plot_upset(
    df: pl.DataFrame,
    set_cols: list[str] | None = None,
    *,
    sort_by: _SORT_KEY | list[_SORT_KEY] = "frequency",
    sort_order: _SORT_DIR | list[_SORT_DIR] | None = None,
    min_degree: int = 0,
    max_degree: int | None = None,
    n_intersections: int | None = None,
    show_set_sizes: bool = True,
    title: str = "",
    width: int = 600,
    height: int = 300,
    bar_color: str = "#4c78a8",
    dot_color: str = "#333333",
    line_color: str = "#333333",
) -> alt.VConcatChart | alt.HConcatChart:
    """Create an UpSet plot for visualizing set intersections.

    Parameters
    ----------
    df : pl.DataFrame
        Binary membership matrix.  Each row is an element; each column
        listed in *set_cols* contains ``0`` / ``1`` (or ``bool``) indicating
        membership.
    set_cols : list[str] | None
        Columns to treat as sets.  When ``None`` every column in *df* is
        used.
    sort_by : str or list[str]
        Sort key(s).  Accepted values are ``"frequency"`` (intersection
        size) and ``"degree"`` (number of participating sets).  A list
        applies multi-level sorting, e.g. ``["degree", "frequency"]``
        sorts primarily by degree, then by size within each degree.
    sort_order : str, list[str], or None
        Sort direction(s).  A single string applies to all keys; a list
        sets the direction per key.  When ``None`` (default), each key
        uses its natural default: ``"descending"`` for frequency (biggest
        first) and ``"ascending"`` for degree (simplest first).
    min_degree : int
        Hide intersections whose degree is below this threshold.
    max_degree : int | None
        Hide intersections whose degree is above this threshold.
    n_intersections : int | None
        Show only the first *n* intersections (after sorting).
    show_set_sizes : bool
        If ``True`` a horizontal bar chart of individual set sizes is
        displayed to the left of the matrix.
    title : str
        Chart title.
    width : int
        Width of the cardinality bar chart (and matrix) in pixels.
    height : int
        Height of the cardinality bar chart in pixels.
    bar_color : str
        Fill colour for the cardinality and set-size bars.
    dot_color : str
        Fill colour for the active dots in the intersection matrix.
    line_color : str
        Stroke colour for the connecting lines in the matrix.

    Returns
    -------
    alt.VConcatChart | alt.HConcatChart
        Composited Altair chart.
    """
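    # The matrix holds one row per (intersection, set) pair and can exceed
    # Altair's default 5,000-row limit. Note that this setting is global.
    alt.data_transformers.disable_max_rows()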

    if set_cols is None:
        set_cols = df.columns

    data = _preprocess_upset(
        df, set_cols,
        sort_by=sort_by, sort_order=sort_order,
        min_degree=min_degree, max_degree=max_degree,
        n_intersections=n_intersections,
    )

    # -- shared hover selection --
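    highlight = alt.selection_point(
        name="upset_highlight",
        on="pointerover",
        fields=["_intersection_id"],
        # empty=True keeps every mark in its base colour while nothing is
        # hovered, so bar_color/dot_color/line_color are the default look.
        empty=True,
    )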

    # -- shared encodings --
    # Explicit domain list guarantees sort order across vconcat sub-charts
    # (EncodingSortField can be unreliable when resolve_scale merges scales).
    x_domain = data.intersection_df.sort("_order")["_intersection_id"].to_list()
    n_sets = len(set_cols)
    ordered_set_names = data.set_sizes_df["set_name"].to_list()
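    # Escape backslashes and single quotes so set names such as
    # "Children's" cannot break the Vega expression string literal.
    quoted_names = [
        "'" + name.replace("\\", "\\\\").replace("'", "\\'") + "'"
        for name in ordered_set_names
    ]
    y_label_expr = (
        "{"
        + ", ".join(f"{i}: {q}" for i, q in enumerate(quoted_names))
        + "}[datum.value]"
    )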
    matrix_height = max(n_sets * 30, 100)

    # ── cardinality bar chart (top) ───────────────────────────────────
    bar_chart = (
        alt.Chart(data.intersection_df)
        .mark_bar()
        .encode(
            x=alt.X("_intersection_id:N", sort=x_domain, axis=None),
            y=alt.Y("cardinality:Q", title="Intersection Size"),
            color=alt.condition(
                highlight, alt.value(bar_color), alt.value("#ddd"),
            ),
            tooltip=[
                alt.Tooltip("_sets_label:N", title="Sets"),
                alt.Tooltip("cardinality:Q", title="Size"),
                alt.Tooltip("degree:Q", title="Degree"),
                alt.Tooltip("_pct_label:N", title="% of set"),
            ],
        )
        .properties(width=width, height=height)
    )

    # ── intersection matrix (bottom) ──────────────────────────────────
    y_axis = alt.Axis(
        values=list(range(n_sets)),
        tickCount=n_sets,
        labelExpr=y_label_expr,
        title=None,
        grid=False,
    )
    y_scale = alt.Scale(domain=[-0.5, n_sets - 0.5])
    y_enc = alt.Y("_y_pos:Q", scale=y_scale, axis=y_axis)

    bg_dots = (
        alt.Chart(data.matrix_df)
        .mark_circle(size=80, color="#e0e0e0")
        .encode(
            x=alt.X("_intersection_id:N", sort=x_domain, axis=None),
            y=y_enc,
        )
    )

    active_dots = (
        alt.Chart(data.matrix_df.filter(pl.col("_member") == 1))
        .mark_circle(size=80)
        .encode(
            x=alt.X("_intersection_id:N", sort=x_domain, axis=None),
            y=y_enc,
            color=alt.condition(
                highlight, alt.value(dot_color), alt.value("#888"),
            ),
            tooltip=[
                alt.Tooltip("_sets_label:N", title="Sets"),
                alt.Tooltip("cardinality:Q", title="Size"),
                alt.Tooltip("degree:Q", title="Degree"),
                alt.Tooltip("_pct_label:N", title="% of set"),
            ],
        )
    )

    matrix_layers: list[alt.Chart] = [bg_dots, active_dots]

    if len(data.lines_df) > 0:
        connecting_lines = (
            alt.Chart(data.lines_df)
            .mark_rule(strokeWidth=2)
            .encode(
                x=alt.X("_intersection_id:N", sort=x_domain, axis=None),
                # same scale object as the dots, so rules and dots stay aligned
                y=alt.Y("_y_min:Q", scale=y_scale),
                y2=alt.Y2("_y_max:Q"),
                color=alt.condition(
                    highlight, alt.value(line_color), alt.value("#bbb"),
                ),
            )
        )
        # lines between background and active dots
        matrix_layers = [bg_dots, connecting_lines, active_dots]

    matrix_chart = (
        alt.layer(*matrix_layers)
        .add_params(highlight)
        .properties(width=width, height=matrix_height)
    )

    # ── assemble main column ──────────────────────────────────────────
    main = alt.vconcat(bar_chart, matrix_chart, spacing=0).resolve_scale(
        x="shared",
    )

    # ── optional set-size bars (left) ─────────────────────────────────
    if show_set_sizes:
        set_bar_width = 120
        bar_thickness = max(8, matrix_height // (n_sets + 1))
        # Use mark_bar oriented horizontally: x encodes the set size,
        # y is the categorical position (quantitative with custom labels).
        # The `size` param sets the bar thickness in pixels and `orient`
        # forces horizontal so the bar extends from x=0 to x=set_size.
        set_size_chart = (
            alt.Chart(data.set_sizes_df)
            .mark_bar(size=bar_thickness, orient="horizontal")
            .encode(
                x=alt.X(
                    "set_size:Q",
                    title="Set Size",
                    scale=alt.Scale(reverse=True),
                ),
                y=alt.Y("_y_pos:Q", scale=y_scale, axis=y_axis),
                color=alt.value(bar_color),
                tooltip=[
                    alt.Tooltip("set_name:N", title="Set"),
                    alt.Tooltip("set_size:Q", title="Size"),
                ],
            )
            .properties(width=set_bar_width, height=matrix_height)
        )
        # empty spacer above the set-size bars to align with the bar chart
        spacer = (
            alt.Chart(pl.DataFrame({"x": [0]}))
            .mark_point(opacity=0, size=0)
            .encode(x=alt.X("x:Q", axis=None), y=alt.Y("x:Q", axis=None))
            .properties(width=set_bar_width, height=height)
        )
        left = alt.vconcat(spacer, set_size_chart, spacing=0)
        chart = alt.hconcat(left, main, spacing=5)
    else:
        chart = main

    if title:
        chart = chart.properties(title=title)

    return chart.configure_axis(
        gridColor="gray", gridDash=[3, 3], gridOpacity=0.5,
    ).configure_view(strokeWidth=0)
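plot_upset reads several precomputed columns from _preprocess_upset, whose source is not shown here. The matrix table supplies _intersection_id, _y_pos and _member, and the connecting-line table supplies _y_min and _y_max. The sketch below shows one plausible way to build those rows in plain Python. The helper matrix_tables and its input format (member-set lists already in x order) are hypothetical. It assumes lines are emitted only for intersections with two or more sets, which would explain the empty-lines_df check in the code.

```python
def matrix_tables(
    intersections: list[list[str]], ordered_set_names: list[str]
) -> tuple[list[dict], list[dict]]:
    """Hypothetical: rows for the dot matrix and its connecting lines.

    intersections holds the member sets of each intersection, already in x order.
    """
    y_pos = {name: i for i, name in enumerate(ordered_set_names)}
    matrix_rows: list[dict] = []
    line_rows: list[dict] = []
    for idx, members in enumerate(intersections):
        member_ys = []
        for name in ordered_set_names:
            is_member = int(name in members)
            # One row per (intersection, set) pair: every pair gets a grey
            # background dot; rows with _member == 1 also get an active dot.
            matrix_rows.append(
                {"_intersection_id": idx, "_y_pos": y_pos[name], "_member": is_member}
            )
            if is_member:
                member_ys.append(y_pos[name])
        # The vertical rule runs from the lowest to the highest member dot;
        # a single-set intersection has nothing to connect.
        if len(member_ys) > 1:
            line_rows.append(
                {"_intersection_id": idx, "_y_min": min(member_ys), "_y_max": max(member_ys)}
            )
    return matrix_rows, line_rows


matrix, lines = matrix_tables([["A"], ["A", "C"], ["A", "B", "C"]], ["A", "B", "C"])
print(len(matrix))  # 9
print(lines)
# [{'_intersection_id': 1, '_y_min': 0, '_y_max': 2}, {'_intersection_id': 2, '_y_min': 0, '_y_max': 2}]
```

For the A and C intersection, the rule passes over B's grey background dot. The code draws the lines between the background and active dots, so the member dots end up on top of the line.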