PX figure overlay / layering #2648

Open
nicolaskruchten opened this issue Jul 16, 2020 · 9 comments
@nicolaskruchten
Contributor

nicolaskruchten commented Jul 16, 2020

PX today is nowhere near as powerful as something like ggplot in terms of layering: you can't take a bar chart and a line chart and overlay them easily, even though they have the same subplot information. In principle, so long as x, y, facet_col and facet_row are the same or compatible, I should be able to "overlay" PX figures.

See also Composition: #2647

@nicholas-esterer
Contributor

px.overlay notes

  • in order to check matching titles, the shape of the plot must be taken into consideration because the titles are only given to the outer elements
  • if the titles don't match, it could fail. But we might want to compare datasets, so perhaps given the geometry of the 2 plots is compatible, the titles could be combined somehow, or secondary y-axes could be used.
  • px sets all the y-ranges in the same row and all the x-ranges in the same column to be identical, so when combining charts, it would be a matter of finding the range that fits all the combined data
  • for simplicity in comparing axes, only 'xy'-type traces should be supported for now
  • compatibility of subplot geometry is quite complex, because the layout is just a collection of axes configured to line up in a desired way (e.g., producing a grid that looks like subplots). Because an infinite number of layouts is possible, we could say that figures are combinable only if the axes in the layout match exactly. However, there are many exceptions where combining is still plausible: for example, if one figure's subplot has an inset and the other's doesn't, the main subplot data could be combined (say, two line charts shown on the same subplot) and the inset placed over the result. Detecting this situation from the axis descriptions and data references alone is difficult, and a similar case can be made for secondary y-axes. A more complete comparison of layouts would compare the topology of the axis anchor references, as well as each axis's position in the final subplot.
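
The range point in the notes above can be sketched without any Plotly calls. This is a minimal sketch, assuming each axis range is a `[low, high]` pair of numbers as stored under `layout.xaxis.range` or `layout.yaxis.range`; the helper name `union_range` is hypothetical and not part of PX.

```python
def union_range(range_a, range_b):
    """Return the smallest [low, high] range that contains both inputs.

    Either argument may be None (axis left on autorange); in that case the
    other range is returned unchanged. Hypothetical helper, not part of PX.
    Note: a reversed axis (high listed first) comes back in ascending order.
    """
    if range_a is None:
        return None if range_b is None else list(range_b)
    if range_b is None:
        return list(range_a)
    low = min(min(range_a), min(range_b))
    high = max(max(range_a), max(range_b))
    return [low, high]
```

Applied per row (for y) and per column (for x), this would keep the PX convention that axes in the same row or column share a range.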

Simple implementation (first iteration)

  • the axes (as described in fig.layout) of the two figures are compared; if they
    don't match exactly, the function fails.

  • what each figure shows must be known somehow; this could maybe
    be extracted from the figure title. It will be used in a legend that annotates the colors of the two overlaid plots (e.g., stuff from figure 1 will be blue and stuff from figure 2 will be red).

  • A simple use case might be the observation of an optimization algorithm. In one figure, each subplot shows the error or cost surface (represented as a contour or a heatmap) for a pair of variables; in another figure, the progression of the optimization algorithm is shown as line segments. This case is simple because the trace types are different but the axes are the same.

  • This becomes more complicated if the color kwarg (to px.scatter, say) was used, because then multiple series are already being compared by varying their color. We could just continue the cycle of colors and append the new data to the plot, but in this case it is probably easier for the user to extend the DataFrame.

    • consider the following example: imagine animals can also dine out in
      restaurants (and smoke), and we have a different tips data set (called
      tips2) with the keys total_bill, tip, species, smoker, day, time. Then we
      can imagine doing the following:

      fig1 = px.scatter(tips, x="total_bill", y="tip", facet_col="smoker",
                        facet_row="time", color="sex")
      fig2 = px.scatter(tips2, x="total_bill", y="tip", facet_col="smoker",
                        facet_row="time", color="species")
      fig = px.combine(fig1, fig2)

      Then the legend could have the title 'sex or species' and the categories
      would be 'male', 'female', 'dog', 'cat', 'bison' etc.
      But in that case it would be easier to combine the data from the beginning.

      tips = tips.rename({"sex": "sex_species"}, axis="columns")
      tips2 = tips2.rename({"species": "sex_species"}, axis="columns")
      tips_combo = pd.concat([tips, tips2], axis=0, ignore_index=True)
      px.scatter(tips_combo,...,color="sex_species")

  • A case where we cannot simply combine two data columns is when there is an additional column that isn't categorical, say "calories consumed". We could pass size="calories consumed" to px.scatter, but if we want to use an axis to show this information, it might be nice to have two legends and either a secondary y-axis showing the other column, or the existing axis extended to cover the range of both columns.

  • Another case where we cannot combine data columns is when two trace types, say scatter and bar, are to be overlaid. If the subplot titles match, then this is just a matter of using one of the two provided layouts, combining the data, and updating the ranges. It would be helpful to have a legend to indicate what is represented by each trace.

    • A more realistic case for this is like the tips example, but with exact times for all the data points. We make a bar chart where the x position of each bar is the day of the week and the bar height is the total value of transactions (say, the sum of bill amounts) on that day. The bars could be stacked or solid (in the latter case, effectively a histogram). Then we overlay a scatter plot where the transactions are plotted by their exact time. We might want a secondary y-axis in this case, because the single transaction values will be much smaller than the daily totals. The x-axis also becomes a little more complicated, because the scatter plots a continuous value rather than a category (arguably the axis should always be a continuous, time-like axis).
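
The bar-plus-scatter case above can be sketched on the plain-dict form of a figure (the structure `fig.to_dict()` returns). This is only an illustration under assumptions: the helper name `overlay_on_secondary_y` is hypothetical, both inputs are assumed to be single-subplot figures whose traces use the default `x`/`y` axes, and the sample numbers are made up.

```python
import copy


def overlay_on_secondary_y(base, extra):
    """Overlay the traces of `extra` on `base`, using a right-hand y-axis.

    Both arguments are figure dicts with "data" and "layout" keys.
    Hypothetical helper: assumes a single subplot on the default x/y axes.
    The inputs are not modified.
    """
    combined = copy.deepcopy(base)
    combined.setdefault("data", [])
    combined.setdefault("layout", {})
    for trace in copy.deepcopy(extra.get("data", [])):
        trace["yaxis"] = "y2"  # bind the overlaid trace to the second y-axis
        combined["data"].append(trace)
    # "overlaying" and "side" are the layout attributes that draw yaxis2
    # on top of the primary y-axis, with its ticks on the right.
    combined["layout"]["yaxis2"] = {"overlaying": "y", "side": "right"}
    return combined


# Made-up data: daily totals as bars, individual bills as markers.
bars = {
    "data": [{"type": "bar", "x": ["Thur", "Fri"], "y": [1096.3, 325.9],
              "name": "total per day"}],
    "layout": {},
}
points = {
    "data": [{"type": "scatter", "mode": "markers",
              "x": ["Thur", "Thur", "Fri"], "y": [27.2, 22.8, 28.97],
              "name": "single bills"}],
    "layout": {},
}
fig_dict = overlay_on_secondary_y(bars, points)
```

The resulting dict can be handed to `plotly.graph_objects.Figure(fig_dict)` for display. Giving each trace a `name` is what lets the legend say which trace is which.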

Types of overlays:

- distribution / subset : like the optimization above where the distribution is
  the error contour and the subset is the path the optimization algorithm
  takes
- multidimensional comparison : like the case where in one "plane" we have
  the tip amount and the other "plane" the calories consumed.
- raw data / model : like the case above where we show the value of
  transactions in a day and each transaction individually

@nicholas-esterer
Contributor

As there are many, many possibilities, for the first iteration we could perhaps try this:

  • axes in layouts must be exactly the same (same number of axes, same domain, same anchor, etc.); that way this can also work for strangely shaped plots
  • if titles can be extracted from the layout (by inspecting "title" and fig._grid_ref), then they are either checked against the other plot (function only succeeds if they match) or combined to make new titles
  • the ranges of the resulting plot are updated so all the data fits, or we introduce a secondary y axis in each subplot, if it doesn't already exist
    • this means probably only 'xy' type plots are allowed
    • detecting secondary y axes might be hard to do
  • making a legend in every case is a bit confusing, especially if we are combining two different trace types, so that requires a little more thought / guidance
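
The "check or combine" choice for titles in the list above could look roughly like this. It is a sketch under assumptions: the titles are taken to be plain strings that have already been extracted from the layout, and the function name `merge_titles` and the `" / "` separator are hypothetical.

```python
def merge_titles(title_a, title_b, combine=False):
    """Reconcile two axis titles for an overlaid figure.

    Returns the shared title when both agree. When they differ, either
    joins them (combine=True) or raises ValueError so the overlay fails.
    Hypothetical helper, not part of PX.
    """
    if title_a == title_b:
        return title_a
    if combine:
        return f"{title_a} / {title_b}"
    raise ValueError(f"axis titles differ: {title_a!r} vs {title_b!r}")
```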

@nicholas-esterer
Contributor

Detecting the top and right titles is a bit difficult / hacky because they are just annotations.
Also, because the figures don't carry much metadata, the combine function would have to do a lot of guessing about what a particular graph object was meant to be (is an annotation supposed to be a title, or does it point out something on the graph?).
I wonder if a better approach might be to have something like:
px.combine(trace_types=[px.scatter,px.bar],trace_args=[dict(x="total_bill"...),dict(y="tips"...)])
which gives you these 2 plots superimposed.

@nicolaskruchten
Contributor Author

If looking at titles is too complicated, then feel free to assume that px.whatever() can add whatever metadata you need to the figure. There is a layout.meta attribute that PX could put stuff into.
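
As a rough illustration of the `layout.meta` idea, here is a sketch on the plain-dict form of a figure. The key names `px_chart_type` and `px_args` and the helper `tag_with_px_meta` are invented for this sketch; nothing in this thread says PX writes them.

```python
def tag_with_px_meta(fig_dict, chart_type, **px_args):
    """Record how a figure was built under layout.meta.

    `fig_dict` is a figure in dict form and is modified in place.
    The key names ("px_chart_type", "px_args") are hypothetical.
    """
    layout = fig_dict.setdefault("layout", {})
    meta = layout.setdefault("meta", {})
    meta["px_chart_type"] = chart_type
    meta["px_args"] = dict(px_args)
    return fig_dict


fig_dict = tag_with_px_meta(
    {"data": [], "layout": {}},
    "scatter",
    x="total_bill", y="tip", facet_col="smoker",
)
```

A combine function could then read `fig_dict["layout"]["meta"]` instead of guessing from titles and annotations.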

@nicolaskruchten
Contributor Author

Maybe in a first iteration we could just look at subplots: consider two figures "compatible" if they have the exact same number of x/y axes in the exact same domains, and no other subplots like scene/geo/ternary/polar etc.
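
A minimal sketch of that compatibility test, working on layout dicts (as found under `fig.to_dict()["layout"]`). The function names are hypothetical, only the four non-cartesian subplot types named above are checked, and domains are compared for exact equality, which may be too strict for floating-point values in practice.

```python
import re

# Matches xaxis, xaxis2, yaxis, yaxis3, ...
AXIS_KEY = re.compile(r"^[xy]axis\d*$")
# Layout keys that introduce a non-cartesian subplot (scene, scene2, geo, ...).
OTHER_SUBPLOT_KEY = re.compile(r"^(scene|geo|ternary|polar)\d*$")


def axis_domains(layout):
    """Map each x/y axis key in a layout dict to its domain as a tuple.

    An axis without an explicit domain spans the full range [0, 1].
    """
    return {
        key: tuple(value.get("domain", (0.0, 1.0)))
        for key, value in layout.items()
        if AXIS_KEY.match(key)
    }


def layouts_compatible(layout_a, layout_b):
    """True if both layouts have the same x/y axes in the same domains
    and neither contains a scene/geo/ternary/polar subplot."""
    for layout in (layout_a, layout_b):
        if any(OTHER_SUBPLOT_KEY.match(key) for key in layout):
            return False
    return axis_domains(layout_a) == axis_domains(layout_b)
```

Comparing only axis keys and domains means titles, ranges and annotations are ignored here; those would be reconciled in a later step.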

@nicolaskruchten
Contributor Author

Some notes from our conversation today:

  • we could restrict ourselves to single-subplot or facet_col-only (including facet_col_wrap!) to avoid the "labels on the right due to facet_row" problem
  • we could reuse the x/y/legend titles if and only if they are the same between fig0 and fig1, otherwise leave them blank and let the user use fig.update_xaxes(title=...) to set them
  • what to do when users are combining two single-trace figures where the traces have no names? we maybe could use the y-axis titles as the trace names?
  • we should reflow reused colors between fig0 and fig1 using fig0.layout.colorway falling back to fig0.layout.template.layout.colorway
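
The color reflow in the last point could be sketched as follows, again on figure dicts. This is an assumption-laden illustration: the helper names are hypothetical, only `marker.color` is set (which suits marker-based scatter and bar traces, not line colors), and the fallback chain is the one described above: `fig0.layout.colorway`, then `fig0.layout.template.layout.colorway`.

```python
import copy
from itertools import cycle


def get_colorway(layout):
    """Return layout.colorway, else layout.template.layout.colorway, else None."""
    colorway = layout.get("colorway")
    if colorway:
        return list(colorway)
    template_layout = layout.get("template", {}).get("layout", {})
    colorway = template_layout.get("colorway")
    return list(colorway) if colorway else None


def reflow_colors(fig0, fig1):
    """Return copies of all traces, recolored from fig0's colorway.

    fig1's traces continue the color cycle where fig0's traces stop, so
    the two figures no longer reuse the same colors. If fig0 has no
    colorway, the traces are returned with their colors untouched.
    """
    traces = copy.deepcopy(fig0.get("data", []) + fig1.get("data", []))
    colorway = get_colorway(fig0.get("layout", {}))
    if colorway is None:
        return traces
    colors = cycle(colorway)
    for trace in traces:
        trace.setdefault("marker", {})["color"] = next(colors)
    return traces
```

The colors wrap around once the colorway is exhausted, which matches "continue the cycle of colors" from the earlier notes.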

@nicolaskruchten
Contributor Author

See also: #2146 for a different concept

@gvwilson
Contributor

Hi - we are trying to tidy up the stale issues and PRs in Plotly's public repositories so that we can focus on things that are still important to our community. Since this one has been sitting for several years, I'm going to close it; if it is still a concern, please add a comment letting us know what recent version of our software you've checked it with so that I can reopen it and add it to our backlog. Thanks for your help - @gvwilson

@ndrezn
Member

ndrezn commented Aug 23, 2024

Reopening as there's an open PR related to this ticket.

@ndrezn ndrezn reopened this Aug 23, 2024