UpSet Plot

UpSet plots visualize set intersections and scale better than Venn diagrams: overlapping shapes are replaced by a matrix layout, which stays readable for roughly 3–30 sets.

An UpSet plot has three components:

Component	Position	Shows
Intersection bars	top	size (cardinality) of each intersection
Intersection matrix	bottom	which sets participate (dots + connecting lines)
Set size bars	left	total membership of each individual set

Tip — always include the set-size bars (the default) so readers can judge intersection sizes relative to their parent sets.

Basic example

import polars as pl
from plotutils.upset import plot_upset

df = pl.DataFrame(
    {
        "Drama": [1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0],
        "Comedy": [0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1],
        "Action": [0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0],
        "Sci-Fi": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
    }
)
chart = plot_upset(df, title="Movie genre overlaps")
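
To make the bar heights concrete, the plain-Python sketch below tallies the same data by hand. It assumes each row is counted once, under the exact combination of genres it belongs to (the usual UpSet convention). The preprocessing inside plot_upset is not shown on this page, so treat this as an illustration rather than the library's own code. The helper name tally_intersections is made up for this example.

```python
from collections import Counter

# Same membership matrix as the basic example, as plain lists.
membership = {
    "Drama": [1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0],
    "Comedy": [0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1],
    "Action": [0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0],
    "Sci-Fi": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
}


def tally_intersections(columns):
    """Count rows per exact combination of sets (hypothetical helper)."""
    names = list(columns)
    counts = Counter()
    for row in zip(*columns.values()):
        # The combination is the tuple of set names whose flag is 1.
        combo = tuple(name for name, flag in zip(names, row) if flag)
        counts[combo] += 1
    return counts


intersections = tally_intersections(membership)  # heights of the top bars
set_sizes = {name: sum(flags) for name, flags in membership.items()}  # left bars

for combo, size in intersections.most_common(3):
    print(" & ".join(combo), size)
print(set_sizes)
```

Under that assumption the tallest bar is the Comedy-only combination with 5 rows, followed by Drama with Sci-Fi at 3, while the set sizes are 9, 9, 7 and 6.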

Filtering and sorting

You can filter intersections by degree (the number of participating sets) with min_degree and max_degree, and cap how many are shown with n_intersections. Sort by "frequency" (the default) or "degree", or pass a list for multi-level sorting.

chart = plot_upset(
    df,
    sort_by=["degree", "frequency"],
    min_degree=1,
    n_intersections=10,
    title="Degree ≥ 1, sorted by degree then frequency",
)
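
The effect of these options can be sketched in plain Python. The function below is a hypothetical stand-in, not the library's preprocessing: it applies the degree bounds, sorts by degree ascending and then size descending (the documented defaults for those two keys), and keeps the first n entries. The sizes are hand-counted from the basic example's data, and the order of ties is an artifact of this sketch.

```python
# (combination, size) pairs, hand-counted from the basic example's data.
intersections = [
    (("Drama",), 2),
    (("Comedy",), 5),
    (("Action",), 2),
    (("Sci-Fi",), 1),
    (("Drama", "Comedy"), 2),
    (("Drama", "Action"), 1),
    (("Comedy", "Action"), 1),
    (("Action", "Sci-Fi"), 2),
    (("Drama", "Sci-Fi"), 3),
    (("Drama", "Comedy", "Action"), 1),
]


def select_intersections(items, min_degree=0, max_degree=None, n_intersections=None):
    """Filter by degree, sort by degree then size, keep the first n (a sketch)."""
    kept = [
        (combo, size)
        for combo, size in items
        if len(combo) >= min_degree
        and (max_degree is None or len(combo) <= max_degree)
    ]
    # Degree ascending, then size descending; ties keep their input order.
    kept.sort(key=lambda item: (len(item[0]), -item[1]))
    return kept if n_intersections is None else kept[:n_intersections]


shown = select_intersections(intersections, min_degree=2, n_intersections=3)
for combo, size in shown:
    print(len(combo), " & ".join(combo), size)
```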

Without set sizes

Set show_set_sizes=False to hide the left-side bar chart and show only the intersection bars and matrix.

chart = plot_upset(df, show_set_sizes=False, title="No set-size bars")
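
Hiding the set-size bars also narrows the figure. A rough estimate of the plotting area follows, using constants read off the source listing at the end of this page (a 120 px set-size panel, a 5 px gap, and 30 px of matrix height per set with a 100 px minimum). The function name is hypothetical, and axes, titles and padding are ignored, so the rendered chart will be somewhat larger.

```python
def approximate_plot_size(n_sets, width=600, height=300, show_set_sizes=True):
    """Estimate the plotting area in pixels, ignoring axes, titles and padding."""
    matrix_height = max(n_sets * 30, 100)  # 30 px per set, at least 100 px
    total_width = width
    if show_set_sizes:
        total_width += 120 + 5  # set-size panel plus the gap beside it
    return total_width, height + matrix_height


print(approximate_plot_size(4))                        # (725, 420)
print(approximate_plot_size(4, show_set_sizes=False))  # (600, 420)
```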

When to use UpSet plots

Beyond the classic "which sets overlap?" question, UpSet plots are useful in many practical scenarios:

Missing-data patterns — Encode each column as a set ("has missing value in column X"). The intersection bars immediately reveal which combinations of missing columns are most common, helping decide imputation strategies.
Constraint / rule satisfaction — Each set represents a rule or constraint that a record satisfies. The plot shows how many records satisfy which combinations, making it easy to spot rules that rarely co-occur or always fire together.
Feature co-occurrence — In NLP or product analytics, each set is a feature (tag, keyword, flag) present on an item. The plot highlights which feature bundles dominate.
Multi-label classification — Each set is a predicted or true label. The plot shows how labels overlap across samples, exposing common multi-label patterns.
Survey / questionnaire analysis — Each set is a response option selected by respondents (e.g. "Which tools do you use?"). The plot shows which tool combinations are most popular.
Genomics / pathway membership — Genes can belong to multiple biological pathways. The plot reveals which pathway overlaps contain the most genes.
Error / alert co-occurrence — In monitoring systems, each set is an alert type. The plot shows which alert combinations fire together, helping diagnose correlated failures.
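
As an illustration of the missing-data case, the sketch below turns a handful of made-up records into the 0/1 membership layout described in the Reference section. The records, the column names and the "... missing" set labels are all invented for this example.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"age": 34, "income": None, "city": "Oslo"},
    {"age": None, "income": None, "city": "Lima"},
    {"age": 51, "income": 72000, "city": None},
    {"age": 29, "income": 48000, "city": "Kyiv"},
]
columns = ["age", "income", "city"]

# One 0/1 list per column: 1 means "this record is missing that column".
missing = {
    f"{col} missing": [int(rec[col] is None) for rec in records]
    for col in columns
}
print(missing)
```

Wrapped in pl.DataFrame(missing), this has the shape plot_upset expects. Going by the parameter description, min_degree=1 should then leave out records with nothing missing.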

Reference

`plotutils.upset.plot_upset(df, set_cols=None, *, sort_by='frequency', sort_order=None, min_degree=0, max_degree=None, n_intersections=None, show_set_sizes=True, title='', width=600, height=300, bar_color='#4c78a8', dot_color='#333333', line_color='#333333')`

Create an UpSet plot for visualizing set intersections.

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	Binary membership matrix. Each row is an element; each column listed in `set_cols` contains `0` / `1` (or `bool`) indicating membership.	required
`set_cols`	`list[str] | None`	Columns to treat as sets. When `None`, every column in `df` is used.	`None`
`sort_by`	`str | list[str]`	Sort key(s). Accepted values are `"frequency"` (intersection size) and `"degree"` (number of participating sets). A list applies multi-level sorting, e.g. `["degree", "frequency"]` sorts primarily by degree, then by size within each degree.	`'frequency'`
`sort_order`	`str | list[str] | None`	Sort direction(s). A single string applies to all keys; a list sets the direction per key. When `None` (default), each key uses its natural default: `"descending"` for frequency (biggest first) and `"ascending"` for degree (simplest first).	`None`
`min_degree`	`int`	Hide intersections whose degree is below this threshold.	`0`
`max_degree`	`int | None`	Hide intersections whose degree is above this threshold.	`None`
`n_intersections`	`int | None`	Show only the first n intersections (after sorting).	`None`
`show_set_sizes`	`bool`	If `True`, a horizontal bar chart of individual set sizes is displayed to the left of the matrix.	`True`
`title`	`str`	Chart title.	`''`
`width`	`int`	Width of the cardinality bar chart (and matrix) in pixels.	`600`
`height`	`int`	Height of the cardinality bar chart in pixels.	`300`
`bar_color`	`str`	Fill colour for the cardinality and set-size bars.	`'#4c78a8'`
`dot_color`	`str`	Fill colour for the active dots in the intersection matrix.	`'#333333'`
`line_color`	`str`	Stroke colour for the connecting lines in the matrix.	`'#333333'`
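
The way `sort_by` and `sort_order` combine can be sketched as a small resolver. This is a hypothetical helper written from the parameter descriptions above, not code taken from the library. In particular, the length check on a list-valued `sort_order` is an assumption about how a mismatch would be handled.

```python
# Natural directions as described for sort_order=None.
NATURAL_ORDER = {"frequency": "descending", "degree": "ascending"}


def resolve_sort_order(sort_by, sort_order=None):
    """Pair each sort key with a direction (hypothetical helper)."""
    keys = [sort_by] if isinstance(sort_by, str) else list(sort_by)
    if sort_order is None:
        # Each key falls back to its own natural default.
        directions = [NATURAL_ORDER[key] for key in keys]
    elif isinstance(sort_order, str):
        # A single string applies to every key.
        directions = [sort_order] * len(keys)
    else:
        directions = list(sort_order)
        if len(directions) != len(keys):  # assumed behaviour, not documented
            raise ValueError("sort_order must have one entry per sort key")
    return list(zip(keys, directions))


print(resolve_sort_order(["degree", "frequency"]))
print(resolve_sort_order(["degree", "frequency"], "descending"))
print(resolve_sort_order("frequency", ["ascending"]))
```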

Returns:

Type	Description
`VConcatChart | HConcatChart`	Composited Altair chart: an `HConcatChart` when `show_set_sizes` is `True`, otherwise a `VConcatChart`.

Source code in src/plotutils/upset.py

def plot_upset(
    df: pl.DataFrame,
    set_cols: list[str] | None = None,
    *,
    sort_by: _SORT_KEY | list[_SORT_KEY] = "frequency",
    sort_order: _SORT_DIR | list[_SORT_DIR] | None = None,
    min_degree: int = 0,
    max_degree: int | None = None,
    n_intersections: int | None = None,
    show_set_sizes: bool = True,
    title: str = "",
    width: int = 600,
    height: int = 300,
    bar_color: str = "#4c78a8",
    dot_color: str = "#333333",
    line_color: str = "#333333",
) -> alt.VConcatChart | alt.HConcatChart:
    """Create an UpSet plot for visualizing set intersections.

    Parameters
    ----------
    df : pl.DataFrame
        Binary membership matrix.  Each row is an element; each column
        listed in *set_cols* contains ``0`` / ``1`` (or ``bool``) indicating
        membership.
    set_cols : list[str] | None
        Columns to treat as sets.  When ``None`` every column in *df* is
        used.
    sort_by : str or list[str]
        Sort key(s).  Accepted values are ``"frequency"`` (intersection
        size) and ``"degree"`` (number of participating sets).  A list
        applies multi-level sorting, e.g. ``["degree", "frequency"]``
        sorts primarily by degree, then by size within each degree.
    sort_order : str, list[str], or None
        Sort direction(s).  A single string applies to all keys; a list
        sets the direction per key.  When ``None`` (default), each key
        uses its natural default: ``"descending"`` for frequency (biggest
        first) and ``"ascending"`` for degree (simplest first).
    min_degree : int
        Hide intersections whose degree is below this threshold.
    max_degree : int | None
        Hide intersections whose degree is above this threshold.
    n_intersections : int | None
        Show only the first *n* intersections (after sorting).
    show_set_sizes : bool
        If ``True`` a horizontal bar chart of individual set sizes is
        displayed to the left of the matrix.
    title : str
        Chart title.
    width : int
        Width of the cardinality bar chart (and matrix) in pixels.
    height : int
        Height of the cardinality bar chart in pixels.
    bar_color : str
        Fill colour for the cardinality and set-size bars.
    dot_color : str
        Fill colour for the active dots in the intersection matrix.
    line_color : str
        Stroke colour for the connecting lines in the matrix.

    Returns
    -------
    alt.VConcatChart | alt.HConcatChart
        Composited Altair chart.
    """
    alt.data_transformers.disable_max_rows()

    if set_cols is None:
        set_cols = df.columns

    data = _preprocess_upset(
        df, set_cols,
        sort_by=sort_by, sort_order=sort_order,
        min_degree=min_degree, max_degree=max_degree,
        n_intersections=n_intersections,
    )

    # -- shared hover selection --
    highlight = alt.selection_point(
        name="upset_highlight",
        on="pointerover",
        fields=["_intersection_id"],
        empty=False,
    )

    # -- shared encodings --
    # Explicit domain list guarantees sort order across vconcat sub-charts
    # (EncodingSortField can be unreliable when resolve_scale merges scales).
    x_domain = data.intersection_df.sort("_order")["_intersection_id"].to_list()
    n_sets = len(set_cols)
    ordered_set_names = data.set_sizes_df["set_name"].to_list()
    y_label_expr = (
        "{"
        + ", ".join(f"{i}: '{name}'" for i, name in enumerate(ordered_set_names))
        + "}[datum.value]"
    )
    matrix_height = max(n_sets * 30, 100)

    # ── cardinality bar chart (top) ───────────────────────────────────
    bar_chart = (
        alt.Chart(data.intersection_df)
        .mark_bar()
        .encode(
            x=alt.X("_intersection_id:N", sort=x_domain, axis=None),
            y=alt.Y("cardinality:Q", title="Intersection Size"),
            color=alt.condition(
                highlight, alt.value(bar_color), alt.value("#ddd"),
            ),
            tooltip=[
                alt.Tooltip("_sets_label:N", title="Sets"),
                alt.Tooltip("cardinality:Q", title="Size"),
                alt.Tooltip("degree:Q", title="Degree"),
                alt.Tooltip("_pct_label:N", title="% of set"),
            ],
        )
        .properties(width=width, height=height)
    )

    # ── intersection matrix (bottom) ──────────────────────────────────
    y_axis = alt.Axis(
        values=list(range(n_sets)),
        tickCount=n_sets,
        labelExpr=y_label_expr,
        title=None,
        grid=False,
    )
    y_scale = alt.Scale(domain=[-0.5, n_sets - 0.5])
    y_enc = alt.Y("_y_pos:Q", scale=y_scale, axis=y_axis)

    bg_dots = (
        alt.Chart(data.matrix_df)
        .mark_circle(size=80, color="#e0e0e0")
        .encode(
            x=alt.X("_intersection_id:N", sort=x_domain, axis=None),
            y=y_enc,
        )
    )

    active_dots = (
        alt.Chart(data.matrix_df.filter(pl.col("_member") == 1))
        .mark_circle(size=80)
        .encode(
            x=alt.X("_intersection_id:N", sort=x_domain, axis=None),
            y=y_enc,
            color=alt.condition(
                highlight, alt.value(dot_color), alt.value("#888"),
            ),
            tooltip=[
                alt.Tooltip("_sets_label:N", title="Sets"),
                alt.Tooltip("cardinality:Q", title="Size"),
                alt.Tooltip("degree:Q", title="Degree"),
                alt.Tooltip("_pct_label:N", title="% of set"),
            ],
        )
    )

    matrix_layers: list[alt.Chart] = [bg_dots, active_dots]

    if len(data.lines_df) > 0:
        connecting_lines = (
            alt.Chart(data.lines_df)
            .mark_rule(strokeWidth=2)
            .encode(
                x=alt.X("_intersection_id:N", sort=x_domain, axis=None),
                y=alt.Y("_y_min:Q", scale=y_scale),
                y2=alt.Y2("_y_max:Q"),
                color=alt.condition(
                    highlight, alt.value(line_color), alt.value("#bbb"),
                ),
            )
        )
        # lines between background and active dots
        matrix_layers = [bg_dots, connecting_lines, active_dots]

    matrix_chart = (
        alt.layer(*matrix_layers)
        .add_params(highlight)
        .properties(width=width, height=matrix_height)
    )

    # ── assemble main column ──────────────────────────────────────────
    main = alt.vconcat(bar_chart, matrix_chart, spacing=0).resolve_scale(
        x="shared",
    )

    # ── optional set-size bars (left) ─────────────────────────────────
    if show_set_sizes:
        set_bar_width = 120
        bar_thickness = max(8, matrix_height // (n_sets + 1))
        # Use mark_bar oriented horizontally: x encodes the set size,
        # y is the categorical position (quantitative with custom labels).
        # The `size` param sets the bar thickness in pixels and `orient`
        # forces horizontal so the bar extends from x=0 to x=set_size.
        set_size_chart = (
            alt.Chart(data.set_sizes_df)
            .mark_bar(size=bar_thickness, orient="horizontal")
            .encode(
                x=alt.X(
                    "set_size:Q",
                    title="Set Size",
                    scale=alt.Scale(reverse=True),
                ),
                y=alt.Y("_y_pos:Q", scale=y_scale, axis=y_axis),
                color=alt.value(bar_color),
                tooltip=[
                    alt.Tooltip("set_name:N", title="Set"),
                    alt.Tooltip("set_size:Q", title="Size"),
                ],
            )
            .properties(width=set_bar_width, height=matrix_height)
        )
        # empty spacer above the set-size bars to align with the bar chart
        spacer = (
            alt.Chart(pl.DataFrame({"x": [0]}))
            .mark_point(opacity=0, size=0)
            .encode(x=alt.X("x:Q", axis=None), y=alt.Y("x:Q", axis=None))
            .properties(width=set_bar_width, height=height)
        )
        left = alt.vconcat(spacer, set_size_chart, spacing=0)
        chart = alt.hconcat(left, main, spacing=5)
    else:
        chart = main

    if title:
        chart = chart.properties(title=title)

    return chart.configure_axis(
        gridColor="gray", gridDash=[3, 3], gridOpacity=0.5,
    ).configure_view(strokeWidth=0)
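
The matrix axis labels in the listing come from a Vega expression that indexes an object literal by tick value. A standalone sketch of that string construction is shown below. It quotes each name with `json.dumps` so that names containing apostrophes still produce a valid expression, which assumes the expression parser accepts JSON-style string literals as JavaScript does. The function name `build_label_expr` is made up for this example.

```python
import json


def build_label_expr(set_names):
    """Build a Vega expression mapping tick value i to set_names[i]."""
    entries = ", ".join(
        # json.dumps adds the quotes and escapes awkward characters.
        f"{i}: {json.dumps(name)}" for i, name in enumerate(set_names)
    )
    return "{" + entries + "}[datum.value]"


print(build_label_expr(["Drama", "Sci-Fi", "Children's"]))
```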