Commit da7d473

TomAugspurger authored and jorisvandenbossche committed
DEPR: Change boxplot return_type kwarg (#12216)
* DEPR: Change boxplot return_type kwarg

  Part of #6581
  Deprecation started in #7096

  Changes the default value of `return_type` in DataFrame.boxplot and
  DataFrame.plot.box from None to 'axes'.

* API: Change faceted boxplot return_type

  Aligns behavior of `Groupby.boxplot` and DataFrame.boxplot(by=.)
  to return a Series.
1 parent 4488f18 commit da7d473

7 files changed

+73
-62
lines changed

doc/source/visualization.rst

+18 -17
@@ -456,28 +456,29 @@ columns:

 .. _visualization.box.return:

-Basically, plot functions return :class:`matplotlib Axes <matplotlib.axes.Axes>` as a return value.
-In ``boxplot``, the return type can be changed by argument ``return_type``, and whether the subplots is enabled (``subplots=True`` in ``plot`` or ``by`` is specified in ``boxplot``).
+.. warning::

-When ``subplots=False`` / ``by`` is ``None``:
+   The default changed from ``'dict'`` to ``'axes'`` in version 0.19.0.

-* if ``return_type`` is ``'dict'``, a dictionary containing the :class:`matplotlib Lines <matplotlib.lines.Line2D>` is returned. The keys are "boxes", "caps", "fliers", "medians", and "whiskers".
-  This is the default of ``boxplot`` in historical reason.
-  Note that ``plot.box()`` returns ``Axes`` by default same as other plots.
-* if ``return_type`` is ``'axes'``, a :class:`matplotlib Axes <matplotlib.axes.Axes>` containing the boxplot is returned.
-* if ``return_type`` is ``'both'`` a namedtuple containing the :class:`matplotlib Axes <matplotlib.axes.Axes>`
-  and :class:`matplotlib Lines <matplotlib.lines.Line2D>` is returned
+In ``boxplot``, the return type can be controlled by the ``return_type``, keyword. The valid choices are ``{"axes", "dict", "both", None}``.
+Faceting, created by ``DataFrame.boxplot`` with the ``by``
+keyword, will affect the output type as well:

-When ``subplots=True`` / ``by`` is some column of the DataFrame:
+================ ======= ==========================
+``return_type=`` Faceted Output type
+---------------- ------- --------------------------

-* A dict of ``return_type`` is returned, where the keys are the columns
-  of the DataFrame. The plot has a facet for each column of
-  the DataFrame, with a separate box for each value of ``by``.
+``None``         No      axes
+``None``         Yes     2-D ndarray of axes
+``'axes'``       No      axes
+``'axes'``       Yes     Series of axes
+``'dict'``       No      dict of artists
+``'dict'``       Yes     Series of dicts of artists
+``'both'``       No      namedtuple
+``'both'``       Yes     Series of namedtuples
+================ ======= ==========================

-Finally, when calling boxplot on a :class:`Groupby` object, a dict of ``return_type``
-is returned, where the keys are the same as the Groupby object. The plot has a
-facet for each key, with each facet containing a box for each column of the
-DataFrame.
+``Groupby.boxplot`` always returns a Series of ``return_type``.

 .. ipython:: python
    :okwarning:
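
As a reading aid for the new documentation table, here is a minimal pure-Python sketch of the documented mapping from `return_type` and faceting to the output container. The function name `expected_output`, the constant names, and the string labels are illustrative assumptions, not part of pandas; the sketch encodes only what the table states.

```python
# Hypothetical helper: encodes the documentation table, not pandas internals.
VALID_RETURN_TYPES = (None, "axes", "dict", "both")

# (return_type, faceted) -> description of what boxplot returns
_OUTPUT_TABLE = {
    (None, False): "axes",
    (None, True): "2-D ndarray of axes",
    ("axes", False): "axes",
    ("axes", True): "Series of axes",
    ("dict", False): "dict of artists",
    ("dict", True): "Series of dicts of artists",
    ("both", False): "namedtuple",
    ("both", True): "Series of namedtuples",
}


def expected_output(return_type=None, faceted=False):
    """Look up the documented output type for a boxplot call."""
    if return_type not in VALID_RETURN_TYPES:
        raise ValueError(
            "return_type must be one of %r" % (VALID_RETURN_TYPES,))
    return _OUTPUT_TABLE[(return_type, bool(faceted))]
```

Only the `None` rows differ in kind between the faceted and unfaceted case: every explicit `return_type` becomes a Series when faceted, while `None` yields the raw array of axes.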

doc/source/whatsnew/v0.19.0.txt

+2 -1
@@ -494,6 +494,7 @@ API changes
 - ``__setitem__`` will no longer apply a callable rhs as a function instead of storing it. Call ``where`` directly to get the previous behavior. (:issue:`13299`)
 - Passing ``Period`` with multiple frequencies to normal ``Index`` now returns ``Index`` with ``object`` dtype (:issue:`13664`)
 - ``PeriodIndex.fillna`` with ``Period`` has different freq now coerces to ``object`` dtype (:issue:`13664`)
+- Faceted boxplots from ``DataFrame.boxplot(by=col)`` now return a ``Series`` when ``return_type`` is not None. Previously these returned an ``OrderedDict``. Note that when ``return_type=None``, the default, these still return a 2-D NumPy array. (:issue:`12216`, :issue:`7096`)
 - More informative exceptions are passed through the csv parser. The exception type would now be the original exception type instead of ``CParserError``. (:issue:`13652`)
 - ``astype()`` will now accept a dict of column name to data types mapping as the ``dtype`` argument. (:issue:`12086`)
 - The ``pd.read_json`` and ``DataFrame.to_json`` has gained support for reading and writing json lines with ``lines`` option see :ref:`Line delimited json <io.jsonl>` (:issue:`9180`)
@@ -1282,9 +1283,9 @@ Removal of prior version deprecations/changes

   Now legacy time rules raises ``ValueError``. For the list of currently supported offsets, see :ref:`here <timeseries.offset_aliases>`

+- The default value for the ``return_type`` parameter for ``DataFrame.plot.box`` and ``DataFrame.boxplot`` changed from ``None`` to ``"axes"``. These methods will now return a matplotlib axes by default instead of a dictionary of artists. See :ref:`here <visualization.box.return>` (:issue:`6581`).
 - The ``tquery`` and ``uquery`` functions in the ``pandas.io.sql`` module are removed (:issue:`5950`).

-
 .. _whatsnew_0190.performance:

 Performance Improvements

pandas/tests/plotting/common.py

+4 -3
@@ -5,8 +5,8 @@
 import os
 import warnings

-from pandas import DataFrame
-from pandas.compat import zip, iteritems, OrderedDict
+from pandas import DataFrame, Series
+from pandas.compat import zip, iteritems
 from pandas.util.decorators import cache_readonly
 from pandas.types.api import is_list_like
 import pandas.util.testing as tm
@@ -445,7 +445,8 @@ def _check_box_return_type(self, returned, return_type, expected_keys=None,
                     self.assertIsInstance(r, Axes)
                 return

-            self.assertTrue(isinstance(returned, OrderedDict))
+            self.assertTrue(isinstance(returned, Series))
+
             self.assertEqual(sorted(returned.keys()), sorted(expected_keys))
             for key, value in iteritems(returned):
                 self.assertTrue(isinstance(value, types[return_type]))

pandas/tests/plotting/test_boxplot_method.py

+15 -13
@@ -92,6 +92,12 @@ def test_boxplot_legacy(self):
         lines = list(itertools.chain.from_iterable(d.values()))
         self.assertEqual(len(ax.get_lines()), len(lines))

+    @slow
+    def test_boxplot_return_type_none(self):
+        # GH 12216; return_type=None & by=None -> axes
+        result = self.hist_df.boxplot()
+        self.assertTrue(isinstance(result, self.plt.Axes))
+
     @slow
     def test_boxplot_return_type_legacy(self):
         # API change in https://github.com/pydata/pandas/pull/7096
@@ -103,10 +109,8 @@ def test_boxplot_return_type_legacy(self):
         with tm.assertRaises(ValueError):
             df.boxplot(return_type='NOTATYPE')

-        with tm.assert_produces_warning(FutureWarning):
-            result = df.boxplot()
-            # change to Axes in future
-            self._check_box_return_type(result, 'dict')
+        result = df.boxplot()
+        self._check_box_return_type(result, 'axes')

         with tm.assert_produces_warning(False):
             result = df.boxplot(return_type='dict')
@@ -140,6 +144,7 @@ def _check_ax_limits(col, ax):
         p = df.boxplot(['height', 'weight', 'age'], by='category')
         height_ax, weight_ax, age_ax = p[0, 0], p[0, 1], p[1, 0]
         dummy_ax = p[1, 1]
+
         _check_ax_limits(df['height'], height_ax)
         _check_ax_limits(df['weight'], weight_ax)
         _check_ax_limits(df['age'], age_ax)
@@ -163,8 +168,7 @@ def test_boxplot_legacy(self):
         grouped = self.hist_df.groupby(by='gender')
         with tm.assert_produces_warning(UserWarning):
             axes = _check_plot_works(grouped.boxplot, return_type='axes')
-        self._check_axes_shape(list(axes.values()), axes_num=2, layout=(1, 2))
-
+        self._check_axes_shape(list(axes.values), axes_num=2, layout=(1, 2))
         axes = _check_plot_works(grouped.boxplot, subplots=False,
                                  return_type='axes')
         self._check_axes_shape(axes, axes_num=1, layout=(1, 1))
@@ -175,7 +179,7 @@ def test_boxplot_legacy(self):
         grouped = df.groupby(level=1)
         with tm.assert_produces_warning(UserWarning):
             axes = _check_plot_works(grouped.boxplot, return_type='axes')
-        self._check_axes_shape(list(axes.values()), axes_num=10, layout=(4, 3))
+        self._check_axes_shape(list(axes.values), axes_num=10, layout=(4, 3))

         axes = _check_plot_works(grouped.boxplot, subplots=False,
                                  return_type='axes')
@@ -184,8 +188,7 @@ def test_boxplot_legacy(self):
         grouped = df.unstack(level=1).groupby(level=0, axis=1)
         with tm.assert_produces_warning(UserWarning):
             axes = _check_plot_works(grouped.boxplot, return_type='axes')
-        self._check_axes_shape(list(axes.values()), axes_num=3, layout=(2, 2))
-
+        self._check_axes_shape(list(axes.values), axes_num=3, layout=(2, 2))
         axes = _check_plot_works(grouped.boxplot, subplots=False,
                                  return_type='axes')
         self._check_axes_shape(axes, axes_num=1, layout=(1, 1))
@@ -226,8 +229,7 @@ def test_grouped_box_return_type(self):
             expected_keys=['height', 'weight', 'category'])

         # now for groupby
-        with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
-            result = df.groupby('gender').boxplot()
+        result = df.groupby('gender').boxplot(return_type='dict')
         self._check_box_return_type(
             result, 'dict', expected_keys=['Male', 'Female'])

@@ -347,7 +349,7 @@ def test_grouped_box_multiple_axes(self):
         with tm.assert_produces_warning(UserWarning):
             returned = df.boxplot(column=['height', 'weight', 'category'],
                                   by='gender', return_type='axes', ax=axes[0])
-        returned = np.array(list(returned.values()))
+        returned = np.array(list(returned.values))
         self._check_axes_shape(returned, axes_num=3, layout=(1, 3))
         self.assert_numpy_array_equal(returned, axes[0])
         self.assertIs(returned[0].figure, fig)
@@ -357,7 +359,7 @@ def test_grouped_box_multiple_axes(self):
             returned = df.groupby('classroom').boxplot(
                 column=['height', 'weight', 'category'],
                 return_type='axes', ax=axes[1])
-        returned = np.array(list(returned.values()))
+        returned = np.array(list(returned.values))
         self._check_axes_shape(returned, axes_num=3, layout=(1, 3))
         self.assert_numpy_array_equal(returned, axes[1])
         self.assertIs(returned[0].figure, fig)
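
The test changes in this file swap `returned.values()` for `returned.values` because a dict exposes `values` as a method while a Series exposes it as an attribute. For calling code that has to cope with both the old and the new return container, a duck-typed helper along the following lines may be useful. `container_values` and the `SeriesLike` stand-in are hypothetical names, used here so the sketch runs without pandas installed.

```python
from collections import OrderedDict


class SeriesLike(object):
    """Hypothetical stand-in for pandas.Series: `values` is an attribute."""

    def __init__(self, data, index):
        self.index = list(index)
        self.values = list(data)


def container_values(returned):
    """Return the plotted objects as a list for dict-like or Series-like input."""
    vals = returned.values
    # dict.values is a bound method; Series.values is a plain attribute.
    if callable(vals):
        vals = vals()
    return list(vals)


# Old-style (OrderedDict) and new-style (Series-like) return containers.
old_style = OrderedDict([("height", "ax0"), ("weight", "ax1")])
new_style = SeriesLike(["ax0", "ax1"], index=["height", "weight"])
```

With a real Series, `values` is a NumPy array, which is not callable, so the attribute branch applies there as well.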

pandas/tests/plotting/test_frame.py

+4 -1
@@ -1221,6 +1221,9 @@ def test_boxplot_return_type(self):
         result = df.plot.box(return_type='axes')
         self._check_box_return_type(result, 'axes')

+        result = df.plot.box()  # default axes
+        self._check_box_return_type(result, 'axes')
+
         result = df.plot.box(return_type='both')
         self._check_box_return_type(result, 'both')

@@ -1230,7 +1233,7 @@ def test_boxplot_subplots_return_type(self):

         # normal style: return_type=None
         result = df.plot.box(subplots=True)
-        self.assertIsInstance(result, np.ndarray)
+        self.assertIsInstance(result, Series)
         self._check_box_return_type(result, None, expected_keys=[
             'height', 'weight', 'category'])


pandas/tools/plotting.py

+24 -21
@@ -2247,7 +2247,7 @@ class BoxPlot(LinePlot):
     # namedtuple to hold results
     BP = namedtuple("Boxplot", ['ax', 'lines'])

-    def __init__(self, data, return_type=None, **kwargs):
+    def __init__(self, data, return_type='axes', **kwargs):
         # Do not call LinePlot.__init__ which may fill nan
         if return_type not in self._valid_return_types:
             raise ValueError(
@@ -2266,7 +2266,7 @@ def _args_adjust(self):
                 self.sharey = False

     @classmethod
-    def _plot(cls, ax, y, column_num=None, return_type=None, **kwds):
+    def _plot(cls, ax, y, column_num=None, return_type='axes', **kwds):
         if y.ndim == 2:
             y = [remove_na(v) for v in y]
             # Boxplot fails with empty arrays, so need to add a NaN
@@ -2339,7 +2339,7 @@ def maybe_color_bp(self, bp):

     def _make_plot(self):
         if self.subplots:
-            self._return_obj = compat.OrderedDict()
+            self._return_obj = Series()

             for i, (label, y) in enumerate(self._iter_data()):
                 ax = self._get_ax(i)
@@ -2691,14 +2691,17 @@ def plot_series(data, kind='line', ax=None, # Series unique
     grid : Setting this to True will show the grid
     layout : tuple (optional)
         (rows, columns) for the layout of the plot
-    return_type : {'axes', 'dict', 'both'}, default 'dict'
-        The kind of object to return. 'dict' returns a dictionary
-        whose values are the matplotlib Lines of the boxplot;
+    return_type : {None, 'axes', 'dict', 'both'}, default None
+        The kind of object to return. The default is ``axes``
         'axes' returns the matplotlib axes the boxplot is drawn on;
+        'dict' returns a dictionary whose values are the matplotlib
+        Lines of the boxplot;
         'both' returns a namedtuple with the axes and dict.

-        When grouping with ``by``, a dict mapping columns to ``return_type``
-        is returned.
+        When grouping with ``by``, a Series mapping columns to ``return_type``
+        is returned, unless ``return_type`` is None, in which case a NumPy
+        array of axes is returned with the same shape as ``layout``.
+        See the prose documentation for more.

     kwds : other plotting keyword arguments to be passed to matplotlib boxplot
            function
@@ -2724,7 +2727,7 @@ def boxplot(data, column=None, by=None, ax=None, fontsize=None,

     # validate return_type:
     if return_type not in BoxPlot._valid_return_types:
-        raise ValueError("return_type must be {None, 'axes', 'dict', 'both'}")
+        raise ValueError("return_type must be {'axes', 'dict', 'both'}")

     from pandas import Series, DataFrame
     if isinstance(data, Series):
@@ -2769,23 +2772,19 @@ def plot_group(keys, values, ax):
             columns = [column]

     if by is not None:
+        # Prefer array return type for 2-D plots to match the subplot layout
+        # https://github.com/pydata/pandas/pull/12216#issuecomment-241175580
         result = _grouped_plot_by_column(plot_group, data, columns=columns,
                                          by=by, grid=grid, figsize=figsize,
                                          ax=ax, layout=layout,
                                          return_type=return_type)
     else:
+        if return_type is None:
+            return_type = 'axes'
         if layout is not None:
             raise ValueError("The 'layout' keyword is not supported when "
                              "'by' is None")

-        if return_type is None:
-            msg = ("\nThe default value for 'return_type' will change to "
-                   "'axes' in a future release.\n To use the future behavior "
-                   "now, set return_type='axes'.\n To keep the previous "
-                   "behavior and silence this warning, set "
-                   "return_type='dict'.")
-            warnings.warn(msg, FutureWarning, stacklevel=3)
-            return_type = 'dict'
         if ax is None:
             ax = _gca()
         data = data._get_numeric_data()
@@ -3104,12 +3103,12 @@ def boxplot_frame_groupby(grouped, subplots=True, column=None, fontsize=None,
                               figsize=figsize, layout=layout)
         axes = _flatten(axes)

-        ret = compat.OrderedDict()
+        ret = Series()
         for (key, group), ax in zip(grouped, axes):
             d = group.boxplot(ax=ax, column=column, fontsize=fontsize,
                               rot=rot, grid=grid, **kwds)
             ax.set_title(pprint_thing(key))
-            ret[key] = d
+            ret.loc[key] = d
         fig.subplots_adjust(bottom=0.15, top=0.9, left=0.1,
                             right=0.9, wspace=0.2)
     else:
@@ -3175,17 +3174,21 @@ def _grouped_plot_by_column(plotf, data, columns=None, by=None,

     _axes = _flatten(axes)

-    result = compat.OrderedDict()
+    result = Series()
+    ax_values = []
+
     for i, col in enumerate(columns):
         ax = _axes[i]
         gp_col = grouped[col]
         keys, values = zip(*gp_col)
         re_plotf = plotf(keys, values, ax, **kwargs)
         ax.set_title(col)
         ax.set_xlabel(pprint_thing(by))
-        result[col] = re_plotf
+        ax_values.append(re_plotf)
         ax.grid(grid)

+    result = Series(ax_values, index=columns)
+
     # Return axes in multiplot case, maybe revisit later # 985
     if return_type is None:
         result = axes
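
The control flow that the `pandas/tools/plotting.py` changes establish in `boxplot` can be summarized in a small standalone sketch: validate `return_type` first, then substitute `'axes'` for `None` only when `by` is not given. `resolve_return_type` is a hypothetical name and the function is a simplification for illustration, not the code in the module.

```python
# Hypothetical constant mirroring the values accepted by the validation step.
VALID_RETURN_TYPES = (None, "axes", "dict", "both")


def resolve_return_type(return_type=None, by=None):
    """Return the effective return_type after validation and defaulting.

    Simplified sketch: with `by` set, None is passed through so the caller
    can hand back the raw array of axes; without `by`, None becomes 'axes'.
    """
    if return_type not in VALID_RETURN_TYPES:
        raise ValueError("return_type must be {'axes', 'dict', 'both'}")
    if by is None and return_type is None:
        return "axes"
    return return_type
```

Before this commit, the unfaceted `None` branch emitted a `FutureWarning` and fell back to `'dict'`; the sketch reflects the new behavior only.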

pandas/util/testing.py

+6 -6
@@ -880,12 +880,12 @@ def assert_attr_equal(attr, left, right, obj='Attributes'):

 def assert_is_valid_plot_return_object(objs):
     import matplotlib.pyplot as plt
-    if isinstance(objs, np.ndarray):
-        for el in objs.flat:
-            assert isinstance(el, plt.Axes), ('one of \'objs\' is not a '
-                                              'matplotlib Axes instance, '
-                                              'type encountered {0!r}'
-                                              ''.format(el.__class__.__name__))
+    if isinstance(objs, (pd.Series, np.ndarray)):
+        for el in objs.ravel():
+            msg = ('one of \'objs\' is not a matplotlib Axes instance, '
+                   'type encountered {0!r}')
+            assert isinstance(el, (plt.Axes, dict)), msg.format(
+                el.__class__.__name__)
     else:
         assert isinstance(objs, (plt.Artist, tuple, dict)), \
             ('objs is neither an ndarray of Artist instances nor a '
