PX figure overlay / layering #2648

nicolaskruchten · 2020-07-16T20:26:54Z

PX today is nowhere near as powerful as something like ggplot in terms of layering: you can't take a bar chart and a line chart and overlay them easily, even though they have the same subplot information. In principle, so long as x, y, facet_col and facet_row are the same or compatible, I should be able to "overlay" PX figures.

`px.overlay` notes

in order to check matching titles, the shape of the plot must be taken into consideration because the titles are only given to the outer elements
if the titles don't match, it could fail. But we might want to compare datasets, so perhaps given the geometry of the 2 plots is compatible, the titles could be combined somehow, or secondary y-axes could be used.
px sets all the y-ranges in the same row and all the x-ranges in the same column to be identical, so when combining charts, it would be a matter of finding the range that fits all the combined data
for simplicity in comparing axes, only 'xy'-type traces should be supported for now
compatibility of subplot geometry is quite complex, as the layout is just a collection of axes that are configured to line up in a desired way (e.g., producing a grid that looks like subplots). Because an infinite number of layouts is possible, we could say that figures are combinable if the axes in the layout match exactly. However, there are many exceptions where combining is still plausible: for example, if one subplot has an inset and the other doesn't, the main subplot data could be combined (show two line charts on the same subplot, say) and the inset could be placed over this. But detecting this situation from the axis descriptions and data references alone is difficult. A similar case can be made for secondary y-axes. A more complete comparison of layouts would compare the topology of axis anchor references, as well as their positions in the final subplot.

Simple implementation (first iteration)

the axes (as described in fig.layout) of the two figures are compared; if they don't match exactly, the function fails.
somehow it must be known what is being shown in each figure; this could maybe be extracted from the figure title. This will be used in a legend where the colors of the two overlaid plots will be annotated (e.g., stuff from figure 1 will be blue and stuff from figure 2 will be red).
A simple use case might be the observation of an optimization algorithm. In one figure, each subplot shows the error or cost surface (represented as a contour or a heatmap) for a pair of variables, in another figure the progression of an optimization algorithm is shown as line segments. This case is simple because the trace types are different, but the axes are the same.
This becomes more complicated if the color kwarg (to px.scatter, say) was used because multiple series are compared in this case by varying their color. In this case, we could just continue the cycle of colors and append the new data to the plot, but also in this case it probably is easier for the user to extend the DataFrame.
- consider the following example, imagine animals can also dine out in
  restaurants (and smoke) and we have a different tips data set (called
  tips2) with the keys: total_bill tip species smoker day time. Then we
  can imagine doing the following:
  
  fig1 = px.scatter(tips, x="total_bill", y="tip", facet_col="smoker",
                    facet_row="time", color="sex")
  fig2 = px.scatter(tips2, x="total_bill", y="tip", facet_col="smoker",
                    facet_row="time", color="species")
  # px.combine is the function proposed in this issue, not an existing API
  fig = px.combine(fig1, fig2)
  
  Then the legend could have the title 'sex or species' and the categories
  would be 'male', 'female', 'dog', 'cat', 'bison' etc.
  But in that case it would be easier to combine the data from the beginning.
  
  # rename takes a mapping {old_name: new_name}, not a set
  tips = tips.rename({"sex": "sex_species"}, axis="columns")
  tips2 = tips2.rename({"species": "sex_species"}, axis="columns")
  tips_combo = pd.concat([tips, tips2], axis=0, ignore_index=True)
  px.scatter(tips_combo, ..., color="sex_species")
A case where we cannot simply combine two data columns is if we had an additional column that wasn't categorical, say "calories consumed". It's true we could pass size="calories consumed" to px.scatter, but say we want to use an axis to observe the information; then it might be nice to have two legends and either a secondary y-axis showing the other information, or have the axis extended to cover the range of both columns.
Another case where we cannot combine data columns is if two trace types, say scatter and bar, are to be overlaid. If the subplot titles match, then this is just a matter of using one of the two provided layouts, combining the data, and updating the ranges. It would be helpful to have a legend to indicate what is represented by each trace.
- A more realistic case for this is like the tips example, but we have exact times for all the data-points. We make a bar chart where the x-axis of each bar is the day of the week and the bar represents the total value of transactions (say sum of bill amounts) on that day. They could be stacked bars or a solid bar (in that case a histogram). Then we overlay a scatter plot where the transactions are plotted by their exact time. We might want a secondary y-axis in this case because the single transaction values will be much less than the total. Also the x-axis becomes a little more complicated because it plots a continuous value, not a category (although it should always be a continuous value and the axis should be a time-like axis).
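
A minimal sketch of the bar-plus-scatter case with a secondary y-axis. It works on the plain-dict form of a figure (as returned by `fig.to_dict()`) rather than on Figure objects, and assumes both figures have a single cartesian subplot. The function name `overlay_on_secondary_y` is hypothetical, and the numbers are toy values, not taken from the tips data set.

```python
import copy


def overlay_on_secondary_y(fig0: dict, fig1: dict) -> dict:
    """Return fig0 with fig1's traces added on a right-hand y-axis.

    Assumption: both figures use only the default 'x'/'y' subplot.
    """
    out = copy.deepcopy(fig0)
    out.setdefault("data", [])
    layout = out.setdefault("layout", {})
    for trace in copy.deepcopy(fig1.get("data", [])):
        trace["xaxis"] = "x"   # share fig0's x-axis
        trace["yaxis"] = "y2"  # draw against the secondary y-axis
        out["data"].append(trace)
    # 'overlaying' and 'side' are the layout attributes that make
    # yaxis2 a secondary axis drawn on top of the first one.
    layout["yaxis2"] = {"overlaying": "y", "side": "right"}
    title = fig1.get("layout", {}).get("yaxis", {}).get("title")
    if title is not None:
        layout["yaxis2"]["title"] = copy.deepcopy(title)
    return out


# Toy data: daily totals as bars, individual transactions as points.
bars = {
    "data": [{"type": "bar", "name": "daily total",
              "x": ["2020-07-16", "2020-07-17"], "y": [100, 60]}],
    "layout": {"yaxis": {"title": {"text": "sum of total_bill"}}},
}
points = {
    "data": [{"type": "scatter", "mode": "markers", "name": "transactions",
              "x": ["2020-07-16 12:30", "2020-07-16 19:00", "2020-07-17 13:15"],
              "y": [40, 60, 60]}],
    "layout": {"yaxis": {"title": {"text": "total_bill"}}},
}
combined = overlay_on_secondary_y(bars, points)
```

The resulting dict can be passed to `go.Figure(combined)` for display. Keeping the trace names means the legend already distinguishes the two layers.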

Types of overlays:

- distribution / subset : like the optimization above where the distribution is
  the error contour and the subset is the path the optimization algorithm
  takes
- multidimensional comparison : like the case where in one "plane" we have
  the tip amount and the other "plane" the calories consumed.
- raw data / model : like the case above where we show the value of
  transactions in a day and each transaction individually

nicholas-esterer · 2020-10-27T20:59:11Z

As there are many, many possibilities, for the first iteration we could perhaps try this:

axes in layouts must be exactly the same (same number of axes, same domain, same anchor, etc.); this can then work even for strangely shaped plots
if titles can be extracted from the layout (by inspecting "title" and fig._grid_ref), then they are either checked against the other plot (function only succeeds if they match) or combined to make new titles
the ranges of the resulting plot are updated so all the data fits, or we introduce a secondary y axis in each subplot, if it doesn't already exist
- this means probably only 'xy' type plots are allowed
- detecting secondary y axes might be hard to do
making a legend in every case is a bit confusing, especially if we are combining two different trace types, so that requires a little more thought / guidance
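
The "update the ranges so all the data fits" step could look roughly like the sketch below. It takes trace dicts (as found in `fig.to_dict()["data"]`) that share an axis and assumes their `x`/`y` values are plain sequences of numbers. The helper name `combined_range` and the `pad` parameter are hypothetical.

```python
from numbers import Real


def combined_range(traces, coord, pad=0.05):
    """Return [lo, hi] covering every numeric `coord` value, or None.

    `coord` is "x" or "y". Non-numeric values (e.g. category labels)
    are skipped; if nothing numeric is found the result is None.
    """
    values = [v for t in traces for v in t.get(coord, [])
              if isinstance(v, Real)]
    if not values:
        return None
    lo, hi = min(values), max(values)
    margin = (hi - lo) * pad  # a little headroom around the data
    return [lo - margin, hi + margin]


# Traces from both figures that land in the same row of subplots.
row_traces = [{"y": [1, 4]}, {"y": [-2, 3]}]
print(combined_range(row_traces, "y", pad=0))  # [-2, 4]
```

Because PX keeps the y-ranges in a row and the x-ranges in a column identical, the function would be called once per row for `"y"` and once per column for `"x"`.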

nicholas-esterer · 2020-10-28T17:23:30Z

Detecting the top and right titles is a bit difficult / hacky because they are just annotations.
Also, because the figures don't have much metadata, the combine function would have to do a lot of guessing as to what a particular graph object was meant to be (is an annotation supposed to be a title, or is it pointing out something on the graph?).
Wondering if a better approach might be to have something like:
px.combine(trace_types=[px.scatter,px.bar],trace_args=[dict(x="total_bill"...),dict(y="tips"...)])
that gives you these 2 plots superimposed.

nicolaskruchten · 2020-10-29T12:08:12Z

If looking at titles is too complicated, then feel free to assume that px.whatever() can add whatever metadata you need to the figure. There is a layout.meta attribute that PX could put stuff into.
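
As a rough illustration of the metadata idea, the sketch below stores the PX arguments under `layout.meta` and compares them between two figures. `layout.meta` is a real layout attribute, but the key name `px_overlay`, the stored fields and both helper functions are hypothetical. Figures are plain dicts here (the shape of `fig.to_dict()`), and `meta` is assumed to be either absent or a dict.

```python
import copy

META_KEY = "px_overlay"  # hypothetical key, not something PX writes today


def with_px_meta(fig: dict, **px_args) -> dict:
    """Return a copy of fig with the given PX arguments recorded in layout.meta."""
    out = copy.deepcopy(fig)
    meta = out.setdefault("layout", {}).setdefault("meta", {})
    meta[META_KEY] = dict(px_args)
    return out


def px_meta(fig: dict) -> dict:
    """Read the recorded PX arguments, or {} if there are none."""
    meta = fig.get("layout", {}).get("meta")
    return meta.get(META_KEY, {}) if isinstance(meta, dict) else {}


def same_facets(fig0, fig1, keys=("x", "y", "facet_col", "facet_row")):
    """True if both figures were built with the same x/y/facet arguments."""
    m0, m1 = px_meta(fig0), px_meta(fig1)
    return all(m0.get(k) == m1.get(k) for k in keys)


a = with_px_meta({"data": []}, x="total_bill", y="tip", facet_col="smoker", color="sex")
b = with_px_meta({"data": []}, x="total_bill", y="tip", facet_col="smoker", color="species")
c = with_px_meta({"data": []}, x="total_bill", y="tip", facet_col="day")
```

With this in place the combine step would not need to guess which annotations are titles: it can read the column names directly.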

nicolaskruchten · 2020-10-29T12:10:05Z

Maybe in a first iteration we could just look at subplots: consider two figures "compatible" if they have the exact same number of x/y axes in the exact same domains, and no other subplots like scene/geo/ternary/polar etc.
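
A minimal sketch of that compatibility test, written against layout dicts (as in `fig.to_dict()["layout"]`). The function names are hypothetical, the list of rejected subplot types only covers the four named above, and a missing `domain` is treated as plotly's default of `[0, 1]`.

```python
import re

AXIS_KEY = re.compile(r"^[xy]axis\d*$")
OTHER_SUBPLOT = re.compile(r"^(scene|geo|ternary|polar)\d*$")


def axis_signature(layout: dict) -> dict:
    """Map each cartesian axis key to its (domain, anchor) pair."""
    return {
        key: (tuple(axis.get("domain", (0.0, 1.0))), axis.get("anchor"))
        for key, axis in layout.items()
        if AXIS_KEY.match(key)
    }


def compatible(layout0: dict, layout1: dict, tol: float = 1e-9) -> bool:
    """True if both layouts have the same x/y axes in the same domains."""
    for layout in (layout0, layout1):
        if any(OTHER_SUBPLOT.match(key) for key in layout):
            return False  # non-cartesian subplots are out of scope
    sig0, sig1 = axis_signature(layout0), axis_signature(layout1)
    if sig0.keys() != sig1.keys():
        return False
    for key in sig0:
        (dom0, anchor0), (dom1, anchor1) = sig0[key], sig1[key]
        if anchor0 != anchor1:
            return False
        if any(abs(p - q) > tol for p, q in zip(dom0, dom1)):
            return False
    return True


single = {
    "xaxis": {"anchor": "y", "domain": [0.0, 1.0]},
    "yaxis": {"anchor": "x", "domain": [0.0, 1.0]},
}
faceted = {
    "xaxis": {"anchor": "y", "domain": [0.0, 0.49]},
    "xaxis2": {"anchor": "y2", "domain": [0.51, 1.0]},
    "yaxis": {"anchor": "x", "domain": [0.0, 1.0]},
    "yaxis2": {"anchor": "x2", "domain": [0.0, 1.0]},
}
```

Comparing domains with a tolerance avoids rejecting two figures whose facet domains differ only by floating-point rounding.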

nicolaskruchten · 2020-11-02T21:33:37Z

Some notes from our conversation today:

we could restrict ourselves to single-subplot or facet_col-only (including facet_col_wrap!) to avoid the "labels on the right due to facet_row" problem
we could reuse the x/y/legend titles if and only if they are the same between fig0 and fig1, otherwise leave them blank and let the user use fig.update_xaxes(title=...) to set them
what to do when users are combining two single-trace figures where the traces have no names? we maybe could use the y-axis titles as the trace names?
we should reflow reused colors between fig0 and fig1 using fig0.layout.colorway falling back to fig0.layout.template.layout.colorway
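
The color reflow in the last note might be sketched as follows, again on plain figure dicts. It looks up the colorway with the fallback described above, counts the distinct colors already used in fig0, and gives each distinct color in fig1 the next free entry of the cycle, so faceted traces that shared a color keep sharing one. The helper names are hypothetical, and only string colors in `marker.color` or `line.color` are handled.

```python
import copy
from itertools import cycle, islice


def find_colorway(layout: dict):
    """layout.colorway, falling back to layout.template.layout.colorway."""
    return (layout.get("colorway")
            or layout.get("template", {}).get("layout", {}).get("colorway"))


def trace_color(trace: dict):
    """Return the trace's marker or line color if it is a single string."""
    for part in ("marker", "line"):
        color = trace.get(part, {}).get("color")
        if isinstance(color, str):
            return color
    return None


def reflow_colors(fig0: dict, fig1: dict) -> list:
    """Return copies of fig1's traces recolored to continue fig0's cycle."""
    traces = copy.deepcopy(fig1.get("data", []))
    colorway = find_colorway(fig0.get("layout", {}))
    if not colorway:
        return traces  # nothing to reflow against
    used = {trace_color(t) for t in fig0.get("data", [])} - {None}
    fresh = islice(cycle(colorway), len(used), None)
    remap = {}
    for trace in traces:
        old = trace_color(trace)
        if old is None:
            continue
        if old not in remap:
            remap[old] = next(fresh)
        for part in ("marker", "line"):
            if isinstance(trace.get(part, {}).get("color"), str):
                trace[part]["color"] = remap[old]
    return traces


fig0 = {
    "data": [{"marker": {"color": "blue"}}, {"marker": {"color": "red"}}],
    "layout": {"template": {"layout": {"colorway": ["blue", "red", "green", "purple"]}}},
}
fig1 = {"data": [{"marker": {"color": "blue"}}, {"marker": {"color": "red"}},
                 {"marker": {"color": "blue"}}]}
recolored = reflow_colors(fig0, fig1)
```

Here the three fig1 traces end up green, purple and green, while `fig1` itself is left unchanged.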

nicolaskruchten · 2020-11-17T13:13:55Z

gvwilson commented Jun 25, 2024

ndrezn commented Aug 23, 2024
