API: Don't add extra attributes to matplotlib axes #54485

mroeschke · 2023-08-10T17:39:27Z

Currently there are a few places where pandas adds extra attributes to a matplotlib axis:

pandas/pandas/plotting/_matplotlib/core.py

Line 1425 in a936863

def _ts_plot(self, ax: Axes, x, data, style=None, **kwds):

pandas/pandas/plotting/_matplotlib/core.py

Line 487 in a936863

orig_ax.right_ax, new_ax.left_ax = new_ax, orig_ax

Since axes are stateful and there is no way to clear these attributes via matplotlib public APIs, these attributes can cause issues when the axes are reused (discovered by running tests via pytest-randomly).
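To illustrate the reuse problem, here is a minimal sketch. It assumes a made-up attribute name, `_pinned_state`, standing in for the attributes pandas sets. It shows that `Axes.clear()` resets the plotted artists but leaves arbitrary instance attributes in place.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Hypothetical attribute, standing in for the ones pandas pins on the axes.
ax._pinned_state = ["left over from an earlier plot call"]

ax.plot([0, 1], [0, 1])
ax.clear()  # removes the plotted artists and resets the axes

# Arbitrary instance attributes are not touched by clear().
state_survived_clear = hasattr(ax, "_pinned_state")
lines_after_clear = len(ax.lines)

plt.close(fig)
```

Any later call that receives this `ax` would still see the stale attribute, which is the kind of cross-test leakage a randomized test order can expose.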

rsm-23 · 2023-08-17T15:07:31Z

@mroeschke for clarification, do we just stop adding this?
orig_ax.right_ax, new_ax.left_ax = new_ax, orig_ax
I am trying to understand exactly which changes we are looking for. Thanks in advance.

mroeschke · 2023-08-17T15:42:29Z

Ideally all existing functionality should still work without adding extra attributes to a matplotlib axis, so these attributes still need to be passed along somehow (IMO this is probably a nontrivial change).
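One possible way to pass state along without touching the axes is a side table keyed by the `Axes` object. The sketch below is only an assumption about how that could look, not what pandas does: `_AXES_STATE` and `get_axes_state` are hypothetical names, and the `"plot_data"` key merely mirrors the kind of data currently pinned on the axes.

```python
import weakref

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Hypothetical side table: maps an Axes to private plotting state.
# Weak keys let an entry disappear once its Axes is garbage collected.
_AXES_STATE = weakref.WeakKeyDictionary()


def get_axes_state(ax):
    """Return the mutable state dict for ``ax``, creating it on first use."""
    return _AXES_STATE.setdefault(ax, {})


fig, ax = plt.subplots()
fig2, other_ax = plt.subplots()

# Record state for one axes; nothing is set on the Axes object itself.
get_axes_state(ax)["plot_data"] = [("series-1", "line")]

same_dict_returned = get_axes_state(ax) is get_axes_state(ax)
other_axes_state = dict(get_axes_state(other_ax))
attribute_was_added = hasattr(ax, "plot_data")

plt.close(fig)
plt.close(fig2)
```

A caveat with this design: if a stored value holds a strong reference to another `Axes` that is itself a key (as `right_ax`/`left_ax` would), the entries keep each other alive, so such values would need to be stored as `weakref.ref` objects.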

jbrockmendel · 2023-11-09T01:46:25Z

I've been looking at this and am currently skeptical that we can get rid of this ugly pattern entirely. We have some tests that seem to rely pretty directly on storing state in ax, e.g. this excerpt from test_from_resampling_area_line_mixed_high_to_low:

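import matplotlib as mpl
import matplotlib.pyplot  # makes mpl.pyplot available
import numpy as np
from pandas import DataFrame, date_range

kind1 = "line"
kind2 = "area"

# High-frequency (weekly) and low-frequency (month-end) indexes
idxh = date_range("1/1/1999", periods=52, freq="W")
idxl = date_range("1/1/1999", periods=12, freq="ME")
high = DataFrame(
    np.random.default_rng(2).random((len(idxh), 3)),
    index=idxh,
    columns=[0, 1, 2],
)
low = DataFrame(
    np.random.default_rng(2).random((len(idxl), 3)),
    index=idxl,
    columns=[0, 1, 2],
)
_, ax = mpl.pyplot.subplots()
# Two separate .plot calls share the same ax
high.plot(kind=kind1, stacked=True, ax=ax)
low.plot(kind=kind2, stacked=True, ax=ax)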

In the last line here, the ax object is the only thing that can be storing the state. IIUC it detects that something is already plotted on ax and resamples low to match the freq of the existing x-axis. Honestly, I know resampling is happening, but I'm still trying to figure out why.

Some ways that come to mind to avoid this:

Just don't support this multiple-call usage
Try to back out the relevant state from whatever state variables matplotlib is using to store the information
Provide some other API to plot low and high in a single call
Make our own object to hold the state, maybe return it from plot (xref API: decide what to return from plotting functions #4020)
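As a rough illustration of the second option (backing state out of what matplotlib already tracks), the sketch below recovers a secondary axes from the shared x-axis grouping instead of from a pinned `right_ax` attribute. The helper name `find_twin_axes` is hypothetical. It is also looser than the current attributes, because it returns any axes linked through `sharex`, not only those created by `twinx`.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt


def find_twin_axes(ax):
    """Return the other axes that share an x-axis with ``ax``.

    Hypothetical helper: includes axes joined via ``sharex`` as well as
    those created by ``twinx``.
    """
    siblings = ax.get_shared_x_axes().get_siblings(ax)
    return [other for other in siblings if other is not ax]


fig, left = plt.subplots()
right = left.twinx()  # secondary y-axis sharing the x-axis with `left`

twins_of_left = find_twin_axes(left)
twins_of_right = find_twin_axes(right)

plt.close(fig)
```

This only covers the left/right relationship. Other pinned state would still need a different source.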

jbrockmendel · 2023-11-09T21:05:13Z

Disabling the place where we pin "right_ax" and "left_ax" breaks 30 tests. In 16 of those, the test itself directly accesses one of these attributes. It tentatively looks like many of them follow the pattern of making multiple .plot calls with a re-used ax.

williambdean · 2024-07-03T19:09:02Z

I believe that this example is related:

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

n_dates = 52 * 3
dates = pd.date_range("2022-01-01", periods=n_dates, freq="W-MON")

seed = sum(map(ord, "Order matters"))
rng = np.random.default_rng(seed)
data = rng.normal(size=n_dates).cumsum()

ser = pd.Series(data, index=dates)
padding = 15

fig, axes = plt.subplots(nrows=2, ncols=2, sharex=False, sharey=True)

def plot_time_series(pandas: bool, ax: plt.Axes): 
    if pandas: 
        ser.plot(ax=ax)
    else: 
        ax.plot(dates, data)

def plot_fill_between(ax: plt.Axes): 
    ax.fill_between(dates, data - padding, data + padding, alpha=0.25)

ax = axes[0, 0]
plot_time_series(pandas=True, ax=ax)
plot_fill_between(ax)
ax.set(title="time-series first", ylabel="pandas.Series.plot")

ax = axes[0, 1]
plot_fill_between(ax)
plot_time_series(pandas=True, ax=ax)
ax.set(title="time-series second")

ax = axes[1, 0]
plot_time_series(pandas=False, ax=ax)
plot_fill_between(ax)
ax.set(title="", ylabel="plt.plot")

ax = axes[1, 1]
plot_fill_between(ax)
plot_time_series(pandas=False, ax=ax)
ax.set(title="")

plt.show()

The behavior seems to come from here:

pandas/pandas/plotting/_matplotlib/timeseries.py

Lines 141 to 144 in dcb5494

    
           # clear current axes and data 
        
           # TODO #54485 
        
           ax._plot_data = []  # type: ignore[attr-defined] 
        
           ax.clear()

This was originally raised in matplotlib/matplotlib#28505.

mroeschke added Visualization plotting API Design labels Aug 10, 2023

mroeschke mentioned this issue Sep 23, 2023

TYP: towards matplotlib 3.8 #55253

Merged

mroeschke mentioned this issue Nov 8, 2023

REF: make plotting less stateful (6) #55886

Merged

5 tasks

jbrockmendel mentioned this issue Nov 13, 2023

REF: Remove un-used attribute-pinning in plotting #55944

Merged

5 tasks

williambdean mentioned this issue Jul 3, 2024

Pull out seasonality as YearlyFourier and MonthlyFourier pymc-labs/pymc-marketing#802

Merged

13 tasks
