API: Return axes from boxplot #7096

Closed
wants to merge 3 commits into from
32 changes: 32 additions & 0 deletions doc/source/groupby.rst
@@ -909,6 +909,38 @@ To see the order in which each row appears within its group, use the

df.groupby('A').cumcount(ascending=False) # kwarg only

Plotting
~~~~~~~~

Groupby also works with some plotting methods. For example, suppose we
suspect that some features in a DataFrame may differ by group. In this case,
the values in column 1 where the group is "B" are 3 higher on average.

.. ipython:: python

np.random.seed(1234)
df = DataFrame(np.random.randn(50, 2))
df['g'] = np.random.choice(['A', 'B'], size=50)
df.loc[df['g'] == 'B', 1] += 3

We can easily visualize this with a boxplot:

.. ipython:: python

@savefig groupby_boxplot.png
bp = df.groupby('g').boxplot()

The result of calling ``boxplot`` is a dictionary whose keys are the values
of our grouping column ``g`` ("A" and "B"). The values of the resulting dictionary
can be controlled by the ``return_type`` keyword of ``boxplot``.
See the :ref:`visualization documentation<visualization.box>` for more.
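
As a minimal sketch of working with that result, the snippet below looks up
the per-group value by its group key. It assumes a pandas version that accepts
``return_type``, an off-screen matplotlib backend, and made-up column names
``x`` and ``y``; the result is treated only as a keyed container, since its
exact type may differ between versions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no display needed

import numpy as np
import pandas as pd

rng = np.random.RandomState(1234)
df = pd.DataFrame(rng.randn(50, 2), columns=["x", "y"])  # hypothetical column names
df["g"] = ["A", "B"] * 25          # deterministic groups, 25 rows each
df.loc[df["g"] == "B", "y"] += 3   # shift column y for group B

# One facet per group; return_type selects what is stored under each key.
bp = df.groupby("g").boxplot(return_type="axes")

for key in sorted(bp.keys()):
    ax = bp[key]                   # matplotlib Axes for this group
    ax.set_ylabel("value")
```

Requesting ``'axes'`` here is only one choice; the other ``return_type``
values change what is stored under each key, not the keys themselves.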

.. warning::

For historical reasons, ``df.groupby("g").boxplot()`` is not equivalent
to ``df.boxplot(by="g")``. See :ref:`here<visualization.box.return>` for
an explanation.

Examples
--------

7 changes: 7 additions & 0 deletions doc/source/release.rst
@@ -208,6 +208,9 @@ API Changes
returns a different Index (:issue:`7088`). Previously the index was unintentionally sorted.
- arithmetic operations with **only** ``bool`` dtypes now raise an error
(:issue:`7011`, :issue:`6762`, :issue:`7015`)
- :meth:`DataFrame.boxplot` has a new keyword argument, `return_type`. It accepts ``'dict'``,
``'axes'``, or ``'both'``; with ``'both'``, a namedtuple with the matplotlib
axes and a dict of matplotlib Lines is returned.

Review comment (Contributor): this should go in the plotting sub-section (you can move it when you merge), or I can do it afterwards.

Deprecations
~~~~~~~~~~~~
@@ -258,6 +261,10 @@ Deprecations
Use the `percentiles` keyword instead, which takes a list of percentiles to display. The
default output is unchanged.

- The default return type of :func:`boxplot` will change from a dict to a matplotlib Axes
in a future release. You can use the future behavior now by passing ``return_type='axes'``
to boxplot.

Prior Version Deprecations/Changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

8 changes: 8 additions & 0 deletions doc/source/v0.14.0.txt
@@ -210,6 +210,10 @@ API changes
# this now raises for arith ops like ``+``, ``*``, etc.
NotImplementedError: operator '*' not implemented for bool dtypes

- :meth:`DataFrame.boxplot` has a new keyword argument, `return_type`. It accepts ``'dict'``,
``'axes'``, or ``'both'``; with ``'both'``, a namedtuple with the matplotlib
axes and a dict of matplotlib Lines is returned.
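
The selection that ``return_type`` describes can be sketched in plain Python.
This is an illustration of the idea only, not the pandas implementation:
``BoxplotResult``, ``pick_return``, and the stand-in objects are hypothetical
names invented for this sketch.

```python
from collections import namedtuple

# Hypothetical names throughout; this is not the pandas implementation.
BoxplotResult = namedtuple("BoxplotResult", ["ax", "lines"])


def pick_return(ax, lines, return_type="dict"):
    """Return the object selected by return_type."""
    if return_type == "dict":
        return lines                      # only the dict of line artists
    if return_type == "axes":
        return ax                         # only the axes
    if return_type == "both":
        return BoxplotResult(ax=ax, lines=lines)
    raise ValueError("return_type must be 'dict', 'axes', or 'both'")


# Stand-ins for a real Axes and a real dict of Line2D lists.
fake_ax = object()
fake_lines = {"boxes": [], "medians": []}

result = pick_return(fake_ax, fake_lines, return_type="both")
```

A namedtuple keeps ``'both'`` usable either by attribute access or by plain
tuple unpacking.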


.. _whatsnew_0140.display:

@@ -554,6 +558,10 @@ Deprecations
Use the `percentiles` keyword instead, which takes a list of percentiles to display. The
default output is unchanged.

- The default return type of :func:`boxplot` will change from a dict to a matplotlib Axes
in a future release. You can use the future behavior now by passing ``return_type='axes'``
to boxplot.

.. _whatsnew_0140.enhancements:

Enhancements
37 changes: 37 additions & 0 deletions doc/source/visualization.rst
@@ -304,6 +304,43 @@ columns:

plt.close('all')

.. _visualization.box.return:

The return type of ``boxplot`` depends on two keyword arguments: ``by`` and ``return_type``.
When ``by`` is ``None``:

* if ``return_type`` is ``'dict'``, a dictionary containing the :class:`matplotlib Lines <matplotlib.lines.Line2D>` is returned. The keys are "boxes", "caps", "fliers", "medians", and "whiskers".
This is the default.
* if ``return_type`` is ``'axes'``, a :class:`matplotlib Axes <matplotlib.axes.Axes>` containing the boxplot is returned.
* if ``return_type`` is ``'both'``, a namedtuple containing the :class:`matplotlib Axes <matplotlib.axes.Axes>`
and :class:`matplotlib Lines <matplotlib.lines.Line2D>` is returned.
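
The three cases listed above for ``by=None`` can be tried side by side. The
following is a sketch under a few assumptions: a pandas version that accepts
``return_type``, an off-screen matplotlib backend, made-up column names, and
that the namedtuple unpacks as axes first and lines second.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no display needed

import numpy as np
import pandas as pd
from matplotlib.axes import Axes

rng = np.random.RandomState(0)
df = pd.DataFrame(rng.randn(20, 2), columns=["x", "y"])  # hypothetical columns

lines = df.boxplot(return_type="dict")   # dict of lists of Line2D artists
ax = df.boxplot(return_type="axes")      # the Axes the boxes were drawn on
both = df.boxplot(return_type="both")    # namedtuple bundling the two

# Assumed field order: the axes first, then the dict of lines.
ax_b, lines_b = both
```

Passing ``return_type`` explicitly keeps the code independent of whichever
default a given release uses.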

When ``by`` is some column of the DataFrame, a dict is returned whose values have
the type selected by ``return_type`` and whose keys are the columns of the DataFrame.
The plot has a facet for each column of the DataFrame, with a separate box for each
value of ``by``.

Finally, when calling boxplot on a :class:`Groupby` object, a dict is returned whose
values have the type selected by ``return_type`` and whose keys are the group keys of
the Groupby object. The plot has a facet for each key, with each facet containing a
box for each column of the DataFrame.

.. ipython:: python

np.random.seed(1234)
df_box = DataFrame(np.random.randn(50, 2))
df_box['g'] = np.random.choice(['A', 'B'], size=50)
df_box.loc[df_box['g'] == 'B', 1] += 3

@savefig boxplot_groupby.png
bp = df_box.boxplot(by='g')

Compare to:

.. ipython:: python

@savefig groupby_boxplot_vis.png
bp = df_box.groupby('g').boxplot()
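
The difference between the two calls also shows up in the keys of what they
return. The sketch below assumes a pandas version that accepts
``return_type``, an off-screen matplotlib backend, and hypothetical column
names ``x`` and ``y``; it only inspects the keys and makes no claim about the
container type.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no display needed

import numpy as np
import pandas as pd

rng = np.random.RandomState(1234)
df_demo = pd.DataFrame(rng.randn(50, 2), columns=["x", "y"])  # hypothetical names
df_demo["g"] = ["A", "B"] * 25   # deterministic groups, 25 rows each

# Facet per column, one box per group value: keyed by column name.
by_result = df_demo.boxplot(by="g", return_type="axes")

# Facet per group, one box per column: keyed by group value.
gb_result = df_demo.groupby("g").boxplot(return_type="axes")

by_keys = sorted(by_result.keys())
gb_keys = sorted(gb_result.keys())
```

Here ``by_keys`` holds the column names and ``gb_keys`` holds the group
values, which mirrors the two facet layouts described above.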

.. _visualization.area_plot:

Area Plot
31 changes: 7 additions & 24 deletions pandas/core/frame.py
@@ -4856,36 +4856,19 @@ def _put_str(s, space):
DataFrame.hist = gfx.hist_frame


@Appender(_shared_docs['boxplot'] % _shared_doc_kwargs)
def boxplot(self, column=None, by=None, ax=None, fontsize=None,
rot=0, grid=True, **kwds):
"""
Make a box plot from DataFrame column/columns optionally grouped
(stratified) by one or more columns

Parameters
----------
data : DataFrame
column : column names or list of names, or vector
Can be any valid input to groupby
by : string or sequence
Column in the DataFrame to group by
ax : matplotlib axis object, default None
fontsize : int or string
rot : int, default None
Rotation for ticks
grid : boolean, default None (matlab style default)
Axis grid lines

Returns
-------
ax : matplotlib.axes.AxesSubplot
"""
rot=0, grid=True, figsize=None, layout=None, return_type=None,
**kwds):
import pandas.tools.plotting as plots
import matplotlib.pyplot as plt
ax = plots.boxplot(self, column=column, by=by, ax=ax,
fontsize=fontsize, grid=grid, rot=rot, **kwds)
fontsize=fontsize, grid=grid, rot=rot,
figsize=figsize, layout=layout, return_type=return_type,
**kwds)
plt.draw_if_interactive()
return ax

DataFrame.boxplot = boxplot

ops.add_flex_arithmetic_methods(DataFrame, **ops.frame_flex_funcs)