
Commit 2d43ac7

Merge pull request #3330 from plotly/distfuncs2
PX ECDF
2 parents ed48215 + 787b16d commit 2d43ac7

14 files changed (+511, -64 lines)

CHANGELOG.md

+1
@@ -11,6 +11,7 @@ This project adheres to [Semantic Versioning](http://semver.org/).
1111
- `px.scatter` and `px.density_contours` now support new `trendline` types `'rolling'`, `'expanding'` and `'ewm'` [#2997](https://github.com/plotly/plotly.py/pull/2997)
1212
- `px.scatter` and `px.density_contours` now support new `trendline_options` argument to parameterize trendlines, with support for constant control and log-scaling in `'ols'` and specification of the fraction used for `'lowess'`, as well as pass-through to Pandas for `'rolling'`, `'expanding'` and `'ewm'` [#2997](https://github.com/plotly/plotly.py/pull/2997)
1313
- `px.scatter` and `px.density_contours` now support new `trendline_scope` argument that accepts the value `'overall'` to request a single trendline for all traces, including across facets and animation frames [#2997](https://github.com/plotly/plotly.py/pull/2997)
14+
- A new `px.ecdf()` function for Empirical Cumulative Distribution Functions [#3330](https://github.com/plotly/plotly.py/pull/3330)
1415

1516
### Fixed
1617
- Fixed regression introduced in version 5.0.0 where pandas/numpy arrays with `dtype` of Object were being converted to `list` values when added to a Figure ([#3292](https://github.com/plotly/plotly.py/issues/3292), [#3293](https://github.com/plotly/plotly.py/pull/3293))

doc/python/box-plots.md

+8-3
@@ -6,7 +6,7 @@ jupyter:
66
extension: .md
77
format_name: markdown
88
format_version: '1.2'
9-
jupytext_version: 1.6.0
9+
jupytext_version: 1.4.2
1010
kernelspec:
1111
display_name: Python 3
1212
language: python
@@ -20,7 +20,7 @@ jupyter:
2020
name: python
2121
nbconvert_exporter: python
2222
pygments_lexer: ipython3
23-
version: 3.7.6
23+
version: 3.7.7
2424
plotly:
2525
description: How to make Box Plots in Python with Plotly.
2626
display_as: statistical
@@ -36,13 +36,18 @@ jupyter:
3636
thumbnail: thumbnail/box.jpg
3737
---
3838

39-
A [box plot](https://en.wikipedia.org/wiki/Box_plot) is a statistical representation of numerical data through their quartiles. The ends of the box represent the lower and upper quartiles, while the median (second quartile) is marked by a line inside the box. For other statistical representations of numerical data, see [other statistical charts](https://plotly.com/python/statistical-charts/).
39+
<!-- #region -->
40+
A [box plot](https://en.wikipedia.org/wiki/Box_plot) is a statistical representation of the distribution of a variable through its quartiles. The ends of the box represent the lower and upper quartiles, while the median (second quartile) is marked by a line inside the box. For other statistical representations of numerical data, see [other statistical charts](https://plotly.com/python/statistical-charts/).
41+
42+
43+
Alternatives to box plots for visualizing distributions include [histograms](https://plotly.com/python/histograms/), [violin plots](https://plotly.com/python/violin/), [ECDF plots](https://plotly.com/python/ecdf-plots/) and [strip charts](https://plotly.com/python/strip-charts/).
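For intuition about what the box encodes, here is a minimal plain-Python sketch (not part of this PR) that computes the lower quartile, median and upper quartile of a sample. It assumes the `"inclusive"` interpolation rule from Python's `statistics` module. Plotly's `quartilemethod` options (`"linear"`, `"inclusive"`, `"exclusive"`) may place the box ends slightly differently.

```python
import statistics

def box_summary(values):
    """Return (lower quartile, median, upper quartile) of a list of numbers.

    Uses statistics.quantiles with method="inclusive", i.e. linear
    interpolation between order statistics; this is one convention among
    several and is not necessarily plotly's default.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q2, q3

print(box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```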
4044

4145
## Box Plot with `plotly.express`
4246

4347
[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/).
4448

4549
In a box plot created by `px.box`, the distribution of the column given as the `y` argument is represented.
50+
<!-- #endregion -->
4651

4752
```python
4853
import plotly.express as px

doc/python/ecdf-plots.md

+182
@@ -0,0 +1,182 @@
1+
---
2+
jupyter:
3+
jupytext:
4+
notebook_metadata_filter: all
5+
text_representation:
6+
extension: .md
7+
format_name: markdown
8+
format_version: '1.2'
9+
jupytext_version: 1.4.2
10+
kernelspec:
11+
display_name: Python 3
12+
language: python
13+
name: python3
14+
language_info:
15+
codemirror_mode:
16+
name: ipython
17+
version: 3
18+
file_extension: .py
19+
mimetype: text/x-python
20+
name: python
21+
nbconvert_exporter: python
22+
pygments_lexer: ipython3
23+
version: 3.7.7
24+
plotly:
25+
description: How to add empirical cumulative distribution function (ECDF) plots.
26+
display_as: statistical
27+
language: python
28+
layout: base
29+
name: Empirical Cumulative Distribution Plots
30+
order: 16
31+
page_type: u-guide
32+
permalink: python/ecdf-plots/
33+
thumbnail: thumbnail/figure-labels.png
34+
---
35+
36+
### Overview
37+
38+
[Empirical cumulative distribution function plots](https://en.wikipedia.org/wiki/Empirical_distribution_function) are a way to visualize the distribution of a variable, and Plotly Express has a built-in function, `px.ecdf()` to generate such plots. [Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/).
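To make the definition concrete, the sketch below computes an ECDF by hand in plain Python: for each distinct value it reports the fraction of the sample at or below that value. It is an illustration only; `px.ecdf()` does this for you, and its exact point placement (for example with tied values) may differ.

```python
from bisect import bisect_right

def ecdf(values):
    """Return (xs, ys): the sorted distinct values and, for each, the
    fraction of the sample that is at or below it."""
    data = sorted(values)
    n = len(data)
    xs = sorted(set(data))
    # bisect_right gives the number of sorted items <= x
    ys = [bisect_right(data, x) / n for x in xs]
    return xs, ys

xs, ys = ecdf([3, 1, 2, 2, 5])
print(xs, ys)  # [1, 2, 3, 5] [0.2, 0.6, 0.8, 1.0]
```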
39+
40+
Alternatives to ECDF plots for visualizing distributions include [histograms](https://plotly.com/python/histograms/), [violin plots](https://plotly.com/python/violin/), [box plots](https://plotly.com/python/box-plots/) and [strip charts](https://plotly.com/python/strip-charts/).
41+
42+
### Simple ECDF Plots
43+
44+
Providing a single column to the `x` argument yields a basic ECDF plot.
45+
46+
```python
47+
import plotly.express as px
48+
df = px.data.tips()
49+
fig = px.ecdf(df, x="total_bill")
50+
fig.show()
51+
```
52+
53+
Providing multiple columns leverages Plotly Express's [wide-form data support](https://plotly.com/python/wide-form/) to show multiple variables on the same plot.
54+
55+
```python
56+
import plotly.express as px
57+
df = px.data.tips()
58+
fig = px.ecdf(df, x=["total_bill", "tip"])
59+
fig.show()
60+
```
61+
62+
It is also possible to map another variable to the color dimension of a plot.
63+
64+
```python
65+
import plotly.express as px
66+
df = px.data.tips()
67+
fig = px.ecdf(df, x="total_bill", color="sex")
68+
fig.show()
69+
```
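With `color="sex"`, each group becomes its own trace. The plain-Python sketch below assumes that, under the default normalization, each trace is normalized by its own number of rows, which is why every colored curve ends at 1. The group labels and numbers are made up for illustration.

```python
from bisect import bisect_right
from collections import defaultdict

def ecdf_by_group(values, groups):
    """Compute a separate ECDF for each group label.

    Each group is normalized by its own size, so every curve ends at 1.
    Returns {label: (sorted distinct xs, fractions at or below each x)}.
    """
    buckets = defaultdict(list)
    for value, label in zip(values, groups):
        buckets[label].append(value)
    curves = {}
    for label, vals in buckets.items():
        data = sorted(vals)
        n = len(data)
        xs = sorted(set(data))
        curves[label] = (xs, [bisect_right(data, x) / n for x in xs])
    return curves

curves = ecdf_by_group([10, 20, 15, 30, 25], ["Male", "Female", "Male", "Female", "Male"])
print(curves["Female"])  # ([20, 30], [0.5, 1.0])
```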
70+
71+
### Configuring the Y axis
72+
73+
By default, the Y axis shows probability, but it is also possible to show raw counts by setting the `ecdfnorm` argument to `None`, or percentages by setting it to `"percent"`.
74+
75+
```python
76+
import plotly.express as px
77+
df = px.data.tips()
78+
fig = px.ecdf(df, x="total_bill", color="sex", ecdfnorm=None)
79+
fig.show()
80+
```
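The three Y-axis scalings differ only by a constant factor. This plain-Python sketch (an illustration, not plotly's code) treats every row as one point, so tied values each add to the running count, and mirrors the `ecdfnorm` values described above: `None`, `"probability"` and `"percent"`.

```python
def ecdf_y(values, ecdfnorm="probability"):
    """Return sorted values and the ECDF height at each one, scaled as a
    running count (None), a fraction ("probability") or a percentage ("percent")."""
    data = sorted(values)
    counts = list(range(1, len(data) + 1))  # running count of rows at each sorted point
    if ecdfnorm is None:
        return data, counts
    if ecdfnorm not in ("probability", "percent"):
        raise ValueError("ecdfnorm must be None, 'probability' or 'percent'")
    scale = 100 if ecdfnorm == "percent" else 1
    total = len(data)
    return data, [scale * c / total for c in counts]

print(ecdf_y([4, 1, 3, 2], None))       # ([1, 2, 3, 4], [1, 2, 3, 4])
print(ecdf_y([4, 1, 3, 2], "percent"))  # ([1, 2, 3, 4], [25.0, 50.0, 75.0, 100.0])
```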
81+
82+
If a `y` column is provided, the Y axis shows the cumulative sum of `y` rather than a cumulative count.
83+
84+
```python
85+
import plotly.express as px
86+
df = px.data.tips()
87+
fig = px.ecdf(df, x="total_bill", y="tip", color="sex", ecdfnorm=None)
88+
fig.show()
89+
```
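Summing `y` instead of counting rows amounts to sorting the rows by `x` and accumulating `y`. The plain-Python sketch below shows that idea with made-up numbers (roughly what `ecdfnorm=None` with a `y` column displays); it is not plotly's implementation.

```python
from itertools import accumulate

def weighted_ecdf(xs, ys):
    """Sort rows by x and return (sorted x, running sum of y)."""
    rows = sorted(zip(xs, ys))  # order rows by x (ties broken by y)
    sorted_x = [x for x, _ in rows]
    running_sum = list(accumulate(y for _, y in rows))
    return sorted_x, running_sum

print(weighted_ecdf([30, 10, 20], [3, 1, 2]))  # ([10, 20, 30], [1, 3, 6])
```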
90+
91+
### Reversed and Complementary CDF plots
92+
93+
By default, the Y value represents the fraction of the data that is *at or below* the value on the X axis. Setting `ecdfmode` to `"reversed"` reverses this, with the Y axis representing the fraction of the data *at or above* the X value. Setting `ecdfmode` to `"complementary"` plots `1-ECDF`, meaning that the Y values represent the fraction of the data *above* the X value.
94+
95+
In `standard` mode (the default), the right-most point is at 1 (or the total count/sum, depending on `ecdfnorm`) and the left-most point is above 0.
96+
97+
```python
98+
import plotly.express as px
99+
fig = px.ecdf(x=[1,2,3,4], markers=True, ecdfmode="standard",
100+
              title="ecdfmode='standard' (Y=fraction at or below X value, this is the default)")
101+
fig.show()
102+
```
103+
104+
In `reversed` mode, the left-most point is at 1 (or the total count/sum, depending on `ecdfnorm`) and the right-most point is above 0.
105+
106+
```python
107+
import plotly.express as px
108+
fig = px.ecdf(x=[1,2,3,4], markers=True, ecdfmode="reversed",
109+
title="ecdfmode='reversed' (Y=fraction at or above X value)")
110+
fig.show()
111+
```
112+
113+
In `complementary` mode, the right-most point is at 0 and no point is at 1 (or the total count/sum), because the CCDF is defined as 1-ECDF and the ECDF has no point at 0.
114+
115+
```python
116+
import plotly.express as px
117+
fig = px.ecdf(x=[1,2,3,4], markers=True, ecdfmode="complementary",
118+
title="ecdfmode='complementary' (Y=fraction above X value)")
119+
fig.show()
120+
```
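The three `ecdfmode` settings can be reproduced by hand for the same data, `x=[1,2,3,4]`. The plain-Python sketch below evaluates each variant at every distinct value: fraction at or below (`standard`), fraction at or above (`reversed`) and fraction strictly above (`complementary`, i.e. 1 minus the standard ECDF). It illustrates the definitions and is not plotly's code.

```python
def ecdf_modes(values):
    """Return xs and the Y values for the standard, reversed and
    complementary ECDF variants, evaluated at each sorted distinct value."""
    data = sorted(values)
    n = len(data)
    xs = sorted(set(data))
    standard = [sum(v <= x for v in data) / n for x in xs]   # at or below x
    reversed_ = [sum(v >= x for v in data) / n for x in xs]  # at or above x
    complementary = [1 - s for s in standard]                # strictly above x
    return xs, standard, reversed_, complementary

xs, std, rev, comp = ecdf_modes([1, 2, 3, 4])
print(std)   # [0.25, 0.5, 0.75, 1.0]
print(rev)   # [1.0, 0.75, 0.5, 0.25]
print(comp)  # [0.75, 0.5, 0.25, 0.0]
```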
121+
122+
### Orientation
123+
124+
By default, plots are oriented vertically (i.e. the variable is on the X axis and counted/summed upwards), but this can be overridden with the `orientation` argument.
125+
126+
```python
127+
import plotly.express as px
128+
df = px.data.tips()
129+
fig = px.ecdf(df, x="total_bill", y="tip", color="sex", ecdfnorm=None, orientation="h")
130+
fig.show()
131+
```
132+
133+
### Markers and/or Lines
134+
135+
ECDF plots can be configured to show lines and/or markers.
136+
137+
```python
138+
import plotly.express as px
139+
df = px.data.tips()
140+
fig = px.ecdf(df, x="total_bill", color="sex", markers=True)
141+
fig.show()
142+
```
143+
144+
```python
145+
import plotly.express as px
146+
df = px.data.tips()
147+
fig = px.ecdf(df, x="total_bill", color="sex", markers=True, lines=False)
148+
fig.show()
149+
```
150+
151+
### Marginal Plots
152+
153+
ECDF plots also support [marginal plots](https://plotly.com/python/marginal-plots/).
154+
155+
```python
156+
import plotly.express as px
157+
df = px.data.tips()
158+
fig = px.ecdf(df, x="total_bill", color="sex", markers=True, lines=False, marginal="histogram")
159+
fig.show()
160+
```
161+
162+
```python
163+
import plotly.express as px
164+
df = px.data.tips()
165+
fig = px.ecdf(df, x="total_bill", color="sex", marginal="rug")
166+
fig.show()
167+
```
168+
169+
### Facets
170+
171+
ECDF plots also support [faceting](https://plotly.com/python/facet-plots/).
172+
173+
```python
174+
import plotly.express as px
175+
df = px.data.tips()
176+
fig = px.ecdf(df, x="total_bill", color="sex", facet_row="time", facet_col="day")
177+
fig.show()
178+
```
179+

doc/python/graph-objects.md

+20-4
@@ -56,13 +56,19 @@ Graph objects have several benefits compared to plain Python dictionaries:
5656
5. Graph object constructors and update methods accept "magic underscores" (e.g. `go.Figure(layout_title_text="The Title")` rather than `dict(layout=dict(title=dict(text="The Title")))`) for more compact code.
5757
6. Graph objects support attached rendering (`.show()`) and exporting functions (`.write_image()`) that automatically invoke the appropriate functions from [the `plotly.io` module](https://plotly.com/python-api-reference/plotly.io.html).
5858

59-
### When to use Graph Objects Directly
59+
### When to use Graph Objects vs Plotly Express
6060

61-
The recommended way to create figures is using the [functions in the plotly.express module](https://plotly.com/python-api-reference/), [collectively known as Plotly Express](/python/plotly-express/), which all return instances of `plotly.graph_objects.Figure`, so every figure produced with the plotly library, actually uses graph objects under the hood, unless manually constructed out of dictionaries.
61+
The recommended way to create figures is using the [functions in the plotly.express module](https://plotly.com/python-api-reference/), [collectively known as Plotly Express](/python/plotly-express/), which all return instances of `plotly.graph_objects.Figure`, so every figure produced with the `plotly` library actually uses graph objects under the hood, unless manually constructed out of dictionaries.
6262

6363
That said, certain kinds of figures are not yet possible to create with Plotly Express, such as figures that use certain 3D trace-types like [`mesh`](/python/3d-mesh/) or [`isosurface`](/python/3d-isosurface-plots/). In addition, certain figures are cumbersome to create by starting from a figure created with Plotly Express, for example figures with [subplots of different types](/python/mixed-subplots/), [dual-axis plots](/python/multiple-axes/), or [faceted plots](/python/facet-plots/) with multiple different types of traces. To construct such figures, it can be easier to start from an empty `plotly.graph_objects.Figure` object (or one configured with subplots via the [make_subplots() function](/python/subplots/)) and progressively add traces and update attributes as above. Every `plotly` documentation page lists the Plotly Express option at the top if a Plotly Express function exists to make the kind of chart in question, and then the graph objects version below.
6464

65-
Note that the figures produced by Plotly Express **in a single function-call** are [easy to customize at creation-time](/python/styling-plotly-express/), and to [manipulate after creation](/python/creating-and-updating-figures/) using the `update_*` and `add_*` methods. The figures produced by Plotly Express can always be built from the ground up using graph objects, but this approach typically takes **5-100 lines of code rather than 1**. Here is a simple example of how to produce the same figure object from the same data, once with Plotly Express and once without. The data in this example is in "long form" but [Plotly Express also accepts data in "wide form"](/python/wide-form/) and the line-count savings from Plotly Express over graph objects are comparable. More complex figures such as [sunbursts](/python/sunburst-charts/), [parallel coordinates](/python/parallel-coordinates-plot/), [facet plots](/python/facet-plots/) or [animations](/python/animations/) require many more lines of figure-specific graph objects code, whereas switching from one representation to another with Plotly Express usually involves changing just a few characters.
65+
Note that the figures produced by Plotly Express **in a single function-call** are [easy to customize at creation-time](/python/styling-plotly-express/), and to [manipulate after creation](/python/creating-and-updating-figures/) using the `update_*` and `add_*` methods.
66+
67+
### Comparing Graph Objects and Plotly Express
68+
69+
The figures produced by Plotly Express can always be built from the ground up using graph objects, but this approach typically takes **5-100 lines of code rather than 1**.
70+
71+
Here is a simple example of how to produce the same figure object from the same data, once with Plotly Express and once without. The data in this example is in "long form" but [Plotly Express also accepts data in "wide form"](/python/wide-form/) and the line-count savings from Plotly Express over graph objects are comparable. More complex figures such as [sunbursts](/python/sunburst-charts/), [parallel coordinates](/python/parallel-coordinates-plot/), [facet plots](/python/facet-plots/) or [animations](/python/animations/) require many more lines of figure-specific graph objects code, whereas switching from one representation to another with Plotly Express usually involves changing just a few characters.
6672

6773
```python
6874
import pandas as pd
@@ -73,11 +79,17 @@ df = pd.DataFrame({
7379
"Number Eaten": [2, 1, 3, 1, 3, 2],
7480
})
7581

82+
83+
# Plotly Express
84+
7685
import plotly.express as px
7786

7887
fig = px.bar(df, x="Fruit", y="Number Eaten", color="Contestant", barmode="group")
7988
fig.show()
8089

90+
91+
# Graph Objects
92+
8193
import plotly.graph_objects as go
8294

8395
fig = go.Figure()
@@ -88,4 +100,8 @@ fig.update_layout(legend_title_text = "Contestant")
88100
fig.update_xaxes(title_text="Fruit")
89101
fig.update_yaxes(title_text="Number Eaten")
90102
fig.show()
91-
```
103+
```

doc/python/histograms.md

+9-4
@@ -36,14 +36,19 @@ jupyter:
3636
thumbnail: thumbnail/histogram.jpg
3737
---
3838

39-
In statistics, a [histogram](https://en.wikipedia.org/wiki/Histogram) is representation of the distribution of numerical data, where the data are binned and the count for each bin is represented. More generally, in plotly a histogram is an aggregated bar chart, with several possible aggregation functions (e.g. sum, average, count...).
39+
<!-- #region -->
40+
In statistics, a [histogram](https://en.wikipedia.org/wiki/Histogram) is a representation of the distribution of numerical data, where the data are binned and the count for each bin is represented. More generally, in Plotly a histogram is an aggregated bar chart, with several possible aggregation functions (e.g. sum, average, count...) which can be used to visualize data on categorical and date axes as well as linear axes.
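The binning-and-counting step can be sketched in plain Python with explicit bin edges. This is an illustration under stated assumptions: bins are half-open `[left, right)` except the last, which also includes its right edge (a common convention). Plotly chooses its own bin edges unless you configure them, so real counts may differ.

```python
from bisect import bisect_right

def histogram_counts(values, edges):
    """Count values in each bin [edges[i], edges[i+1]); the last bin also
    includes its right edge. Values outside the edges are ignored."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v == edges[-1]:
            counts[-1] += 1
        elif edges[0] <= v < edges[-1]:
            # bisect_right finds the first edge greater than v
            counts[bisect_right(edges, v) - 1] += 1
    return counts

print(histogram_counts([1, 2, 2, 3, 7, 10], [0, 5, 10]))  # [4, 2]
```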
4041

41-
If you're looking instead for bar charts, i.e. representing *raw, unaggregated* data with rectangular
42+
43+
Alternatives to histograms for visualizing distributions include [violin plots](https://plotly.com/python/violin/), [box plots](https://plotly.com/python/box-plots/), [ECDF plots](https://plotly.com/python/ecdf-plots/) and [strip charts](https://plotly.com/python/strip-charts/).
44+
45+
> If you're looking instead for bar charts, i.e. representing *raw, unaggregated* data with rectangular
4246
bars, go to the [Bar Chart tutorial](/python/bar-charts/).
4347

4448
## Histograms with Plotly Express
4549

4650
[Plotly Express](/python/plotly-express/) is the easy-to-use, high-level interface to Plotly, which [operates on a variety of types of data](/python/px-arguments/) and produces [easy-to-style figures](/python/styling-plotly-express/).
51+
<!-- #endregion -->
4752

4853
```python
4954
import plotly.express as px
@@ -160,7 +165,7 @@ fig = px.histogram(df, x="total_bill", color="sex")
160165
fig.show()
161166
```
162167

163-
#### Using histfunc
168+
#### Aggregating with functions other than `count`
164169

165170
For each bin of `x`, one can compute a function of data using `histfunc`. The argument of `histfunc` is the dataframe column given as the `y` argument. Below the plot shows that the average tip increases with the total bill.
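To show what a per-bin aggregation such as `histfunc="avg"` computes, here is a plain-Python sketch that averages `y` within fixed `x` bins. The bin edges and data are assumptions made for illustration. Plotly picks its own bins, and here values on the final right edge are simply dropped.

```python
from bisect import bisect_right
from collections import defaultdict

def binned_average(xs, ys, edges):
    """Average of y within each x bin [edges[i], edges[i+1]).

    Empty bins give None; x values outside [edges[0], edges[-1]) are ignored.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y in zip(xs, ys):
        if edges[0] <= x < edges[-1]:
            i = bisect_right(edges, x) - 1
            sums[i] += y
            counts[i] += 1
    return [sums[i] / counts[i] if counts[i] else None for i in range(len(edges) - 1)]

print(binned_average([5, 8, 15, 18, 25], [1, 3, 2, 4, 6], [0, 10, 20, 30]))  # [2.0, 3.0, 6.0]
```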
166171

@@ -193,7 +198,7 @@ fig.show()
193198

194199
#### Visualizing the distribution
195200

196-
With the `marginal` keyword, a subplot is drawn alongside the histogram, visualizing the distribution. See [the distplot page](https://plotly.com/python/distplot/)for more examples of combined statistical representations.
201+
With the `marginal` keyword, a [marginal distribution plot](https://plotly.com/python/marginal-plots/) is drawn alongside the histogram, visualizing the distribution. See [the distplot page](https://plotly.com/python/distplot/) for more examples of combined statistical representations.
197202

198203
```python
199204
import plotly.express as px

doc/python/line-and-scatter.md

+56
@@ -67,6 +67,15 @@ fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species",
6767
fig.show()
6868
```
6969

70+
Color can be [continuous](https://plotly.com/python/colorscales/) as follows, or [discrete/categorical](https://plotly.com/python/discrete-color/) as above.
71+
72+
```python
73+
import plotly.express as px
74+
df = px.data.iris()
75+
fig = px.scatter(df, x="sepal_width", y="sepal_length", color='petal_length')
76+
fig.show()
77+
```
78+
7079
The `symbol` argument can be mapped to a column as well. A [wide variety of symbols](https://plotly.com/python/marker-style/) are available.
7180

7281
```python
@@ -104,6 +113,53 @@ fig.update_traces(marker_size=10)
104113
fig.show()
105114
```
106115

116+
### Error Bars
117+
118+
Scatter plots support [error bars](https://plotly.com/python/error-bars/).
119+
120+
```python
121+
import plotly.express as px
122+
df = px.data.iris()
123+
df["e"] = df["sepal_width"]/100
124+
fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species",
125+
error_x="e", error_y="e")
126+
fig.show()
127+
```
128+
129+
### Marginal Distribution Plots
130+
131+
Scatter plots support [marginal distribution plots](https://plotly.com/python/marginal-plots/).
132+
133+
```python
134+
import plotly.express as px
135+
df = px.data.iris()
136+
fig = px.scatter(df, x="sepal_length", y="sepal_width", marginal_x="histogram", marginal_y="rug")
137+
fig.show()
138+
```
139+
140+
### Faceting
141+
142+
Scatter plots support [faceting](https://plotly.com/python/facet-plots/).
143+
144+
```python
145+
import plotly.express as px
146+
df = px.data.tips()
147+
fig = px.scatter(df, x="total_bill", y="tip", color="smoker", facet_col="sex", facet_row="time")
148+
fig.show()
149+
```
150+
151+
### Linear Regression and Other Trendlines
152+
153+
Scatter plots support [linear and non-linear trendlines](https://plotly.com/python/linear-fits/).
154+
155+
```python
156+
import plotly.express as px
157+
158+
df = px.data.tips()
159+
fig = px.scatter(df, x="total_bill", y="tip", trendline="ols")
160+
fig.show()
161+
```
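`trendline="ols"` fits an ordinary least squares line (Plotly Express relies on the `statsmodels` package for this, so it must be installed). As a sketch of the underlying fit for a single trace with a constant term, not Plotly's own code, the slope and intercept can be computed in plain Python:

```python
def ols_fit(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)                       # spread of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))  # co-variation
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

print(ols_fit([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```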
162+
107163
## Line plots with Plotly Express
108164

109165
```python
