UpSet Plot
UpSet plots visualize set intersections as a scalable alternative to Venn diagrams. They work well for 3–30 sets by replacing overlapping shapes with a matrix layout.
An UpSet plot has three components:
| Component | Position | Shows |
|---|---|---|
| Intersection bars | top | size (cardinality) of each intersection |
| Intersection matrix | bottom | which sets participate (dots + connecting lines) |
| Set size bars | left | total membership of each individual set |
Tip — always include the set-size bars (the default) so readers can judge intersection sizes relative to their parent sets.
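To make the three components concrete, here is a minimal plain-Python sketch of the numbers behind them. The toy data and variable names are hypothetical, and it assumes the usual UpSet convention that each element is counted in exactly one (exclusive) combination; this page does not confirm how plot_upset aggregates internally.

```python
from collections import Counter

# Hypothetical toy data: element -> the sets it belongs to.
memberships = {
    "a": {"Drama"},
    "b": {"Drama", "Comedy"},
    "c": {"Drama", "Action"},
    "d": {"Comedy"},
    "e": {"Drama", "Comedy"},
}

# Set-size bars: total membership of each individual set.
set_sizes = Counter(name for sets in memberships.values() for name in sets)

# Intersection bars: elements that belong to exactly this combination.
intersection_sizes = Counter(frozenset(sets) for sets in memberships.values())

# Intersection matrix: printed here as one text row per combination,
# with "x" for a participating set and "." for a non-participating one.
set_names = sorted(set_sizes)
for combo, size in intersection_sizes.most_common():
    dots = " ".join("x" if name in combo else "." for name in set_names)
    print(f"{dots}   size={size}")
for name in set_names:
    print(f"{name}: {set_sizes[name]} members")
```

In this toy data the Drama + Comedy intersection holds 2 elements, which is half of Drama (4 members) but two thirds of Comedy (3 members). That comparison is only possible when the set sizes are shown next to the intersections.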
Basic example
import polars as pl
from plotutils.upset import plot_upset
df = pl.DataFrame(
    {
        "Drama": [1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0],
        "Comedy": [0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1],
        "Action": [0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0],
        "Sci-Fi": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
    }
)
chart = plot_upset(df, title="Movie genre overlaps")
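As a sanity check on what the chart should display, the same membership data can be tabulated without any plotting. This is a sketch in plain Python (no polars or plotutils), and it assumes exclusive intersections, i.e. each film is counted once under its exact genre combination.

```python
from collections import Counter

# Same binary membership data as the basic example, column by column.
columns = {
    "Drama": [1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0],
    "Comedy": [0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1],
    "Action": [0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0],
    "Sci-Fi": [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
}
names = list(columns)

# Walk the matrix row by row and record which sets each film belongs to.
combos = Counter(
    frozenset(name for name, flag in zip(names, row) if flag)
    for row in zip(*columns.values())
)

# Total membership per set (what the set-size bars show).
set_sizes = {name: sum(flags) for name, flags in columns.items()}

for combo, size in combos.most_common(3):
    print(sorted(combo), size)
print(set_sizes)
```

For this data, comedy-only films form the largest group (5 of 20), followed by Drama + Sci-Fi (3), so those should be the two tallest intersection bars under the default frequency sort.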
Filtering and sorting
You can filter intersections by degree (number of participating sets) and
limit the total number shown. Sorting can be by "frequency" (default) or
"degree", and you can pass a list for multi-level sorting.
chart = plot_upset(
    df,
    sort_by=["degree", "frequency"],
    min_degree=1,
    n_intersections=10,
    title="Degree ≥ 1, sorted by degree then frequency",
)
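The interaction between degree filtering, multi-level sorting, and truncation can be sketched in plain Python. The helper select_intersections below is hypothetical and is not part of plotutils. It assumes that frequency sorts descending and degree ascending, and that filtering happens before sorting. Only the truncation step ("first n after sorting") is stated in the reference below.

```python
def select_intersections(sizes, sort_by=("frequency",), min_degree=0,
                         max_degree=None, n_intersections=None):
    """Filter by degree, sort by the given keys, then keep the first n."""
    rows = [
        (combo, size)
        for combo, size in sizes.items()
        if len(combo) >= min_degree
        and (max_degree is None or len(combo) <= max_degree)
    ]

    def sort_key(row):
        combo, size = row
        # Assumed directions: larger frequency first, smaller degree first.
        keys = {"frequency": -size, "degree": len(combo)}
        return tuple(keys[name] for name in sort_by)

    rows.sort(key=sort_key)  # stable, so ties keep their input order
    return rows[:n_intersections]  # a None limit keeps every row


# Hypothetical exclusive intersection sizes, keyed by participating sets.
sizes = {
    frozenset(): 4,
    frozenset({"A"}): 5,
    frozenset({"B"}): 2,
    frozenset({"A", "B"}): 3,
    frozenset({"A", "C"}): 1,
    frozenset({"A", "B", "C"}): 2,
}

top = select_intersections(
    sizes, sort_by=("degree", "frequency"), min_degree=1, n_intersections=3
)
for combo, size in top:
    print(sorted(combo), size)
```

With min_degree=1 the empty combination (elements in no set) is dropped before sorting, so it cannot take up one of the n_intersections slots.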
Without set sizes
Set show_set_sizes=False to hide the left-side bar chart and show only the
intersection bars and matrix.
When to use UpSet plots
Beyond the classic "which sets overlap?" question, UpSet plots are useful in many practical scenarios:
- Missing-data patterns — Encode each column as a set ("has missing value in column X"). The intersection bars immediately reveal which combinations of missing columns are most common, helping decide imputation strategies.
- Constraint / rule satisfaction — Each set represents a rule or constraint that a record satisfies. The plot shows how many records satisfy which combinations, making it easy to spot rules that rarely co-occur or always fire together.
- Feature co-occurrence — In NLP or product analytics, each set is a feature (tag, keyword, flag) present on an item. The plot highlights which feature bundles dominate.
- Multi-label classification — Each set is a predicted or true label. The plot shows how labels overlap across samples, exposing common multi-label patterns.
- Survey / questionnaire analysis — Each set is a response option selected by respondents (e.g. "Which tools do you use?"). The plot shows which tool combinations are most popular.
- Genomics / pathway membership — Genes can belong to multiple biological pathways. The plot reveals which pathway overlaps contain the most genes.
- Error / alert co-occurrence — In monitoring systems, each set is an alert type. The plot shows which alert combinations fire together, helping diagnose correlated failures.
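Most of these scenarios reduce to the same preparation step: turning raw records into a binary membership matrix. As one illustration, the missing-data case can be sketched in plain Python as follows. The records and field names are hypothetical.

```python
# Hypothetical records with missing values encoded as None.
records = [
    {"age": 34, "income": None, "city": "Oslo"},
    {"age": None, "income": None, "city": "Lima"},
    {"age": 51, "income": 72000, "city": None},
    {"age": 29, "income": 48000, "city": "Pune"},
]
fields = ["age", "income", "city"]

# One binary column per field: 1 where the value is missing, else 0.
missing = {
    f"{field} missing": [int(record[field] is None) for record in records]
    for field in fields
}

for name, flags in missing.items():
    print(name, flags)
```

The resulting dict has the same column-oriented 0/1 shape as the data in the basic example, so it should be usable in the same way. Complete records end up with degree 0, and min_degree=1 hides that combination so that only actual missingness patterns are shown.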
Reference
plotutils.upset.plot_upset(df, set_cols=None, *, sort_by='frequency', sort_order=None, min_degree=0, max_degree=None, n_intersections=None, show_set_sizes=True, title='', width=600, height=300, bar_color='#4c78a8', dot_color='#333333', line_color='#333333')
Create an UpSet plot for visualizing set intersections.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Binary membership matrix. Each row is an element; each column listed in set_cols contains … | required |
| set_cols | list[str] \| None | Columns to treat as sets. When … | None |
| sort_by | str or list[str] | Sort key(s). Accepted values are 'frequency' and 'degree'. | 'frequency' |
| sort_order | str, list[str], or None | Sort direction(s). A single string applies to all keys; a list sets the direction per key. When … | None |
| min_degree | int | Hide intersections whose degree is below this threshold. | 0 |
| max_degree | int \| None | Hide intersections whose degree is above this threshold. | None |
| n_intersections | int \| None | Show only the first n intersections (after sorting). | None |
| show_set_sizes | bool | If True, show the set-size bars on the left. | True |
| title | str | Chart title. | '' |
| width | int | Width of the cardinality bar chart (and matrix) in pixels. | 600 |
| height | int | Height of the cardinality bar chart in pixels. | 300 |
| bar_color | str | Fill colour for the cardinality and set-size bars. | '#4c78a8' |
| dot_color | str | Fill colour for the active dots in the intersection matrix. | '#333333' |
| line_color | str | Stroke colour for the connecting lines in the matrix. | '#333333' |
Returns:
| Type | Description |
|---|---|
| VConcatChart \| HConcatChart | Composited Altair chart. |
Source code in src/plotutils/upset.py