Make it easy to use categorical bar charts with linked selections #6797

@MarcSkovMadsen

Add built-in categorical_agg operation for link_selections with categorical Bars

Problem

HoloViews' link_selections is a powerful cross-filtering mechanism, but it only works out of the box with numeric dimensions via hv.operation.histogram. There is no built-in operation for categorical dimensions. Every natural approach a user tries fails:

Failure 1 — histogram() on a categorical dimension

from holoviews.operation import histogram
hist_cat = histogram(points, dimension='category')
ValueError: Categorical data found. Cannot create histogram from categorical data.

histogram explicitly rejects non-numeric data, so it cannot be repurposed for categorical counts.

Failure 2 — Pre-aggregated hv.Bars with link_selections

bars_agg = hv.Bars(
    df.groupby('category').size().reset_index(name='count'),
    kdims='category', vdims='count',
)
layout = ls(points) + ls(bars_agg)

The layout renders initially, but the moment a selection is made (lasso, box-select, or programmatic expression), it raises:

CallbackError: linked_selection aborted because it could not display selection
for all elements: One or more dimensions in the expression dim('x')>0 could not
resolve on ':Dataset   [category]   (count)' Ensure all dimensions referenced by
the expression are present on the supplied object on ':Bars   [category]   (count)'.

Because bars_agg was constructed from an independent DataFrame, it has no lineage back to points and cannot resolve the cross-filter expression.

Failure 3 — Raw hv.Bars from unaggregated data

bars_raw = hv.Bars(df, kdims=['category'])
print(bars_raw.kdims)  # [Dimension('category')]
print(bars_raw.vdims)  # [Dimension('x'), Dimension('y')]

No exception is raised, but the result is silently wrong. HoloViews auto-detects the remaining DataFrame columns (x, y) as vdims, producing 100 individual bars rather than a 4-bar categorical count chart. The element technically renders with link_selections, but the visualization is meaningless — it is not a count (or any aggregation) over categories.
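For comparison, the intended result is easy to state in plain pandas. The sketch below rebuilds the same kind of sample data and contrasts the number of categories with the number of source rows. It does not involve HoloViews at all; it only illustrates the gap between what a count chart should hold and what the raw element holds.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df = pd.DataFrame({
    "category": rng.choice(["A", "B", "C", "D"], 100),
    "x": rng.normal(size=100),
    "y": rng.normal(size=100),
})

# What a categorical count chart should plot: one value per category.
expected = df.groupby("category").size()

# What hv.Bars(df, kdims=['category']) holds instead: one entry per source row.
n_rows = len(df)

print(f"{len(expected)} aggregated values vs {n_rows} raw rows")
```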


Minimum reproducible example

Self-contained script demonstrating all three failures (HoloViews 1.22.1, Python 3.12):

import holoviews as hv
import numpy as np
import pandas as pd

hv.extension('bokeh')

# --- Sample data ---
rng = np.random.default_rng(42)
df = pd.DataFrame({
    'category': rng.choice(['A', 'B', 'C', 'D'], 100),
    'x': rng.normal(size=100),
    'y': rng.normal(size=100),
})

ls = hv.link_selections.instance()
points = hv.Points(df, kdims=['x', 'y'], vdims=['category'])

# --- Failure 1: histogram on categorical dimension ---
from holoviews.operation import histogram

try:
    hist_cat = histogram(points, dimension='category')
    print('Attempt 1 succeeded (unexpected)')
except Exception as e:
    print(f'Attempt 1 — histogram on categorical:\n  {type(e).__name__}: {e}\n')

# --- Failure 2: Pre-aggregated Bars with link_selections ---
bars_agg = hv.Bars(
    df.groupby('category').size().reset_index(name='count'),
    kdims='category', vdims='count',
)
try:
    layout = ls(points) + ls(bars_agg)
    # Trigger a selection to expose the error
    from holoviews.util.transform import dim
    ls.selection_expr = (dim('x') > 0)
    hv.render(layout)
    print('Attempt 2 succeeded (unexpected)')
except Exception as e:
    print(f'Attempt 2 — Pre-aggregated Bars:\n  {type(e).__name__}: {e}\n')

# --- Failure 3: Raw Bars from unaggregated data (silent wrong behavior) ---
bars_raw = hv.Bars(df, kdims=['category'])
print(f'Attempt 3 — Raw Bars auto-detected vdims: {bars_raw.vdims}')
print(f'  Expected 4 category-count bars, got {len(bars_raw)} individual rows.')
print('  No exception, but the chart is meaningless — not a categorical aggregation.\n')

Proposed solution — a built-in categorical_agg operation

Add a new operation to holoviews.operation (e.g. categorical_agg) that generalises what histogram does for numeric dimensions to categorical dimensions, supporting arbitrary aggregation functions — not just counting.

Suggested API

from holoviews.operation import categorical_agg

# Count occurrences per category (default)
bars_count = categorical_agg(points, dimension='category')

# Sum a numeric value dimension per category
bars_sum = categorical_agg(points, dimension='category', value_dimension='y', function=np.sum)

# Mean
bars_mean = categorical_agg(points, dimension='category', value_dimension='y', function=np.mean)

# Standard deviation
bars_std = categorical_agg(points, dimension='category', value_dimension='y', function=np.std)

# Min / Max
bars_min = categorical_agg(points, dimension='category', value_dimension='y', function=np.min)
bars_max = categorical_agg(points, dimension='category', value_dimension='y', function=np.max)

# All work seamlessly with link_selections
ls = hv.link_selections.instance()
layout = ls(points) + ls(bars_count) + ls(bars_mean)

Key parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| dimension | str | (required) | Categorical dimension to group by |
| value_dimension | str or None | None | Numeric dimension to aggregate. When None, counts rows per category. |
| function | callable | np.size | Aggregation function applied to value_dimension (or row count when value_dimension is None). Any function accepting an array and returning a scalar: np.sum, np.mean, np.std, np.min, np.max, or a custom callable. |
| label | str or None | None | Label for the value axis. Auto-generated from function name if None (e.g. "mean(y)"). |
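The auto-generated label could follow a rule like the one sketched below. The helper name default_label is hypothetical and not part of HoloViews. The "Count" fallback for the pure-count case and the "agg" fallback for callables without a __name__ are assumptions taken from the workaround further down, not a settled design.

```python
import functools

import numpy as np


def default_label(function, value_dimension=None, label=None):
    """Hypothetical helper: pick the value-axis label for an aggregation."""
    if label is not None:
        # An explicit label always wins.
        return label
    if value_dimension is None:
        # Pure row count: the function is not applied to any dimension.
        return "Count"
    # Callables such as functools.partial objects have no __name__.
    name = getattr(function, "__name__", "agg")
    return f"{name}({value_dimension})"


print(default_label(np.mean, "y"))
print(default_label(functools.partial(np.percentile, q=90), "y"))
```

One consequence of this rule is that a lambda would be labelled "<lambda>(y)", so passing an explicit label is the safer choice for custom callables.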

Requirements

  • Must be a holoviews.core.Operation subclass so that it preserves data lineage: link_selections recurses into the operation's source element to resolve cross-filter expressions, then re-runs the operation on the filtered subset.
  • Returns hv.Bars with kdims=[dimension] and vdims=[label].
  • Should handle edge cases: empty selections (return zero-height bars for all categories), missing categories in a filtered subset (fill with 0 or NaN as appropriate for the aggregation).
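The edge-case requirement can be sketched with NumPy alone. The helper below is hypothetical (the name aggregate_fixed_categories and its signature are illustrative, not a proposed API). It assumes the full category list is captured from the unfiltered source, so that categories missing from a selection still get a bar: 0 for counts, NaN for any other aggregation.

```python
import numpy as np


def aggregate_fixed_categories(cat_vals, categories, num_vals=None, function=np.mean):
    """Hypothetical helper: aggregate per category over a fixed category list."""
    # dtype=object keeps the comparison elementwise even for an empty selection.
    cat_vals = np.asarray(cat_vals, dtype=object)
    if num_vals is not None:
        num_vals = np.asarray(num_vals, dtype=float)

    result = {}
    for cat in categories:
        mask = cat_vals == cat
        if num_vals is None:
            # Counting: a category with no rows has a count of zero.
            result[cat] = int(mask.sum())
        elif mask.any():
            result[cat] = float(function(num_vals[mask]))
        else:
            # Other aggregations are undefined on zero rows.
            result[cat] = float("nan")
    return result


categories = ["A", "B", "C", "D"]          # taken from the unfiltered source
subset_cats = ["A", "A"]                   # a selection that only hit category A
subset_y = [1.0, 3.0]

counts = aggregate_fixed_categories(subset_cats, categories)
means = aggregate_fixed_categories(subset_cats, categories, subset_y, np.mean)
empty = aggregate_fixed_categories([], categories)
```

Keeping the category list fixed also keeps the bar positions stable while the selection changes, which the workaround below does not do because it recomputes np.unique on each filtered subset.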

Working workaround

Until a built-in operation exists, this custom Operation subclass achieves the same result. The pattern comes from a real-world energy trading dashboard:

import param
import numpy as np
import holoviews as hv
from holoviews.core import Operation

class categorical_agg(Operation):
    """Aggregate a categorical dimension, returning Bars.

    Preserves data lineage back to the source element so that
    link_selections can resolve all source dimensions during cross-filtering.
    """

    dimension = param.String(doc="Categorical dimension to group by")
    value_dimension = param.String(default=None, allow_None=True,
        doc="Numeric dimension to aggregate. None means count rows.")
    function = param.Callable(default=np.size,
        doc="Aggregation function (np.sum, np.mean, np.std, np.min, np.max, ...)")
    label = param.String(default=None, allow_None=True,
        doc="Label for the value axis. Auto-generated if None.")

    def _process(self, element, key=None):
        cat_vals = element.dimension_values(self.p.dimension, expanded=True)
        unique_cats = np.unique(cat_vals)

        if self.p.value_dimension is None:
            # Pure count
            _, counts = np.unique(cat_vals, return_counts=True)
            agg_label = self.p.label or "Count"
            data = list(zip(unique_cats, counts))
        else:
            num_vals = element.dimension_values(self.p.value_dimension, expanded=True)
            results = []
            for cat in unique_cats:
                mask = cat_vals == cat
                results.append(self.p.function(num_vals[mask]))
            func_name = getattr(self.p.function, '__name__', 'agg')
            agg_label = self.p.label or f"{func_name}({self.p.value_dimension})"
            data = list(zip(unique_cats, results))

        return hv.Bars(data, kdims=[self.p.dimension], vdims=[agg_label])

Full working example with cross-filtering

import holoviews as hv
import numpy as np
import pandas as pd
import panel as pn
import param
from holoviews.core import Operation
from holoviews.operation import histogram


class categorical_agg(Operation):
    """Aggregate a categorical dimension, returning Bars.

    Preserves data lineage back to the source element so that
    link_selections can resolve all source dimensions during cross-filtering.
    """

    dimension = param.String(doc="Categorical dimension to group by")
    value_dimension = param.String(
        default=None,
        allow_None=True,
        doc="Numeric dimension to aggregate. None means count rows.",
    )
    function = param.Callable(
        default=np.size,
        doc="Aggregation function (np.sum, np.mean, np.std, np.min, np.max, ...)",
    )
    label = param.String(
        default=None,
        allow_None=True,
        doc="Label for the value axis. Auto-generated if None.",
    )

    def _process(self, element, key=None):
        cat_vals = element.dimension_values(self.p.dimension, expanded=True)
        unique_cats = np.unique(cat_vals)

        if self.p.value_dimension is None:
            # Pure count
            _, counts = np.unique(cat_vals, return_counts=True)
            agg_label = self.p.label or "Count"
            data = list(zip(unique_cats, counts))
        else:
            num_vals = element.dimension_values(self.p.value_dimension, expanded=True)
            results = []
            for cat in unique_cats:
                mask = cat_vals == cat
                results.append(self.p.function(num_vals[mask]))
            func_name = getattr(self.p.function, "__name__", "agg")
            agg_label = self.p.label or f"{func_name}({self.p.value_dimension})"
            data = list(zip(unique_cats, results))

        return hv.Bars(data, kdims=[self.p.dimension], vdims=[agg_label])


hv.extension("bokeh")

rng = np.random.default_rng(42)
df = pd.DataFrame(
    {
        "x": rng.normal(0, 1, 200),
        "y": rng.normal(0, 1, 200),
        "category": rng.choice(["A", "B", "C", "D"], 200),
    }
)

ls = hv.link_selections.instance()
points = hv.Points(df, kdims=["x", "y"], vdims=["category"])

scatter = ls(points)
bars_count = ls(categorical_agg(points, dimension="category"))
bars_mean = ls(
    categorical_agg(points, dimension="category", value_dimension="y", function=np.mean)
)
hist_x = ls(histogram(points, dimension="x", num_bins=20))

layout = scatter + bars_count + bars_mean + hist_x

pn.extension()
pn.panel(layout, sizing_mode="stretch_both").servable()

Why this works

HoloViews operations maintain a reference to their source element. When link_selections encounters a selection expression like dim('x') > 0, it recurses into the operation's source (points) where x exists, applies the filter, then re-runs the operation on the filtered subset. Pre-aggregated hv.Bars lack this source reference, so the expression cannot resolve and raises CallbackError.
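The same contrast can be shown as a plain pandas analogy. This is not how HoloViews implements lineage internally; count_per_category is a hypothetical stand-in for the operation, and the example only illustrates why a filter on x needs access to the source rows.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
source = pd.DataFrame({
    "category": rng.choice(["A", "B", "C", "D"], 100),
    "x": rng.normal(size=100),
})


def count_per_category(frame):
    """Stand-in for the operation: derive counts from whatever rows it is given."""
    return frame.groupby("category").size()


# Operation-style: the filter runs on the source, where 'x' exists,
# and the aggregation is recomputed on the surviving rows.
selected = count_per_category(source[source["x"] > 0])

# Pre-aggregated: the table was computed once and 'x' is no longer a column,
# so an expression on 'x' has nothing to act on.
pre_aggregated = count_per_category(source).reset_index(name="count")
can_filter = "x" in pre_aggregated.columns

print(list(pre_aggregated.columns), "filterable on x:", can_filter)
```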


Documentation ask

The Linked Brushing user guide currently only demonstrates numeric dimensions with histogram. Please update it to:

  1. Show the categorical_agg operation (once built in) as the recommended approach for categorical Bars, including examples of count, sum, mean, etc.
  2. Document the general principle: operations that derive from a source element preserve lineage, enabling link_selections to cross-filter through them.
  3. Demonstrate how to build and use a custom Operation with linked selections.

Related issues

  • #3842 — Original linked selections proposal
  • PR #3951 — Initial link_selections implementation
