Commit 6813682

Svetlana Karslioglu authored and pytorchmergebot committed
Add Design Philosophy (pytorch#79248)
Pull Request resolved: pytorch#79248 Approved by: https://github.com/albanD
1 parent 24050a5 commit 6813682

File tree

1 file changed, +157 -0 lines changed

docs/source/community/design.rst

Lines changed: 157 additions & 0 deletions

PyTorch Design Philosophy
=========================

This document is designed to help contributors and module maintainers
understand the high-level design principles that have developed over
time in PyTorch. These are not meant to be hard-and-fast rules, but to
serve as a guide to help trade off different concerns and to resolve
disagreements that may come up while developing PyTorch. For more
information on contributing, module ownership, and how to escalate a
disagreement to the Core Maintainers, please see `PyTorch
Governance <https://pytorch.org/docs/master/community/governance.html>`__.

Design Principles
-----------------

Principle 1: Usability over Performance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This principle may be surprising! As one Hacker News poster wrote:
*PyTorch is amazing! [...] Although I’m confused. How can a ML framework be
not obsessed with speed/performance?* See the `Hacker News discussion on
PyTorch <https://news.ycombinator.com/item?id=28066093>`__.

Soumith’s blog post on `Growing the PyTorch
Community <https://soumith.ch/posts/2021/02/growing-opensource/>`__
goes into this in some depth, but at a high level:

- PyTorch’s primary goal is usability.
- A secondary goal is *reasonable* performance.

We believe that maintaining the flexibility to support researchers who
are building on top of our abstractions remains critical. We cannot see
what future workloads will be, but we know we want them to be built
first on PyTorch, and that requires flexibility.

In more concrete terms, we operate in a *usability-first* manner and try
to avoid jumping to *restriction-first* regimes (for example, static
shapes, graph-mode only) without a clear-eyed view of the tradeoffs.
Often there is a temptation to impose strict user restrictions upfront
because doing so can simplify implementation, but this comes with risks:

- The performance gain may not be worth the user friction, either
  because the benefit is not compelling enough or because it applies
  only to a relatively narrow set of subproblems.
- Even if the performance benefit is compelling, the restrictions can
  fragment the ecosystem into different sets of limitations that can
  quickly become incomprehensible to users.

We want users to be able to seamlessly move their PyTorch code to
different hardware and software platforms, to interoperate with
different libraries and frameworks, and to experience the full richness
of the PyTorch user experience, not a least common denominator subset.

Principle 2: Simple Over Easy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here, we borrow from `The Zen of
Python <https://peps.python.org/pep-0020/>`__:

- *Explicit is better than implicit*
- *Simple is better than complex*

A more concise way of describing these two goals is `Simple Over
Easy <https://www.infoq.com/presentations/Simple-Made-Easy/>`__. Let’s
start with an example, because *simple* and *easy* are often used
interchangeably in everyday English. Consider how one may
model `devices <https://pytorch.org/docs/master/tensor_attributes.html#torch.device>`__
in PyTorch:

- **Simple / Explicit (to understand, debug):** every tensor is
  associated with a device. The user explicitly specifies tensor device
  movement. Operations that require cross-device movement result in an
  error.
- **Easy / Implicit (to use):** the user does not have to worry about
  devices; the system figures out the globally optimal device placement.

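To make the contrast concrete, here is a small, self-contained Python
sketch of the simple/explicit option. This is not real PyTorch code: the
``Tensor`` class, ``add`` function, and device strings below are
illustrative stand-ins assumed only for this example.

```python
from dataclasses import dataclass

@dataclass
class Tensor:
    """Toy stand-in for a tensor that carries an explicit device."""
    data: list
    device: str = "cpu"

    def to(self, device):
        # Device movement only happens when the user asks for it.
        return Tensor(list(self.data), device)

def add(a, b):
    # Simple/explicit rule: operations never move data implicitly;
    # mixing devices is an immediate, local error.
    if a.device != b.device:
        raise RuntimeError(
            f"expected both tensors on the same device, "
            f"got {a.device!r} and {b.device!r}"
        )
    return Tensor([x + y for x, y in zip(a.data, b.data)], a.device)

x = Tensor([1, 2, 3])                # defaults to "cpu"
y = Tensor([4, 5, 6], device="gpu")
try:
    add(x, y)                        # fails loudly at the call site
except RuntimeError as err:
    print(err)
z = add(x, y.to("cpu"))              # explicit movement, then it works
print(z.data, z.device)              # [5, 7, 9] cpu
```

The error surfaces exactly where the mixed-device call happens, which is
what makes the explicit design straightforward to debug.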

In this specific case, and as a general design philosophy, PyTorch
favors exposing simple and explicit building blocks over APIs that are
easy to use for practitioners. The simple version is immediately
understandable and debuggable by a new PyTorch user: you get a clear
error at the point in the program where an operator requiring
cross-device movement is actually invoked. The easy solution may let a
new user move faster initially, but debugging such a system can be
complex: How did the system make its determination? What is the API for
plugging into such a system, and how are objects represented in its IR?

Some classic arguments in favor of this sort of design come from `A
Note on Distributed Computing <https://dl.acm.org/doi/book/10.5555/974938>`__
(TL;DR: do not model resources with very different performance
characteristics uniformly; the details will leak) and the `End-to-End
Principle <http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf>`__
(TL;DR: building smarts into the lower layers of the stack can prevent
building performant features at higher layers, and often doesn’t work
anyway). For example, we could build operator-level or global device
movement rules, but the precise choices aren’t obvious, and building an
extensible mechanism has unavoidable complexity and latency costs.

A caveat here is that this does not mean higher-level “easy” APIs are
not valuable; there is certainly value, for example, in higher levels
of the stack supporting efficient tensor computations across
heterogeneous compute in a large cluster. Instead, what we mean is that
focusing on simple lower-level building blocks helps inform the easy
API while still maintaining a good experience when users need to leave
the beaten path. It also allows space for innovation and the growth of
more opinionated tools, at a rate we cannot support in the PyTorch core
library but ultimately benefit from, as evidenced by
our `rich ecosystem <https://pytorch.org/ecosystem/>`__. In other
words, not automating at the start allows us to reach good levels of
automation faster.

Principle 3: Python First with Best In Class Language Interoperability
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This principle began as **Python First**:

    PyTorch is not a Python binding into a monolithic C++ framework.
    It is built to be deeply integrated into Python. You can use it
    naturally like you would use `NumPy <https://www.numpy.org/>`__,
    `SciPy <https://www.scipy.org/>`__,
    `scikit-learn <https://scikit-learn.org/>`__,
    or other Python libraries. You can write your new neural network
    layers in Python itself, using your favorite libraries, and use
    packages such as `Cython <https://cython.org/>`__ and
    `Numba <http://numba.pydata.org/>`__. Our goal is to not reinvent
    the wheel where appropriate.

One thing PyTorch has needed to deal with over the years is Python
overhead: we first rewrote the `autograd` engine in C++, then the majority
of operator definitions, then developed TorchScript and the C++
frontend.

Still, working in Python easily provides the best experience for our
users: it is flexible, familiar, and, perhaps most importantly, has a
huge ecosystem of scientific computing libraries and extensions
available for use. This fact motivates a few of our most recent
investments, which attempt to hit a Pareto optimal point close to the
Python usability end of the curve:

- `TorchDynamo <https://dev-discuss.pytorch.org/t/torchdynamo-an-experiment-in-dynamic-python-bytecode-transformation/361>`__,
  a Python frame evaluation tool capable of speeding up existing
  eager-mode PyTorch programs with minimal user intervention.
- `torch_function <https://pytorch.org/docs/master/notes/extending.html#extending-torch>`__
  and `torch_dispatch <https://dev-discuss.pytorch.org/t/what-and-why-is-torch-dispatch/557>`__
  extension points, which have enabled Python-first functionality to be
  built on top of C++ internals, as demonstrated by the `torch.fx
  tracer <https://pytorch.org/docs/stable/fx.html>`__
  and `functorch <https://github.com/pytorch/functorch>`__,
  respectively.

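The extension-point idea can be sketched in plain Python. The names
``dispatchable``, ``__my_function__``, and ``Traced`` below are
hypothetical stand-ins invented for this sketch (the real protocols are
``__torch_function__`` and ``__torch_dispatch__``): a library entry
point gives argument types a chance to intercept the call, so that
Python-first tooling can layer on top of built-in operators without
patching internals.

```python
def dispatchable(func):
    """Wrap a library entry point so that argument types implementing
    __my_function__ can intercept the call before it runs."""
    def wrapper(*args, **kwargs):
        for arg in args:
            handler = getattr(type(arg), "__my_function__", None)
            if handler is not None:
                return handler(func, args, kwargs)
        return func(*args, **kwargs)
    return wrapper

@dispatchable
def dot(a, b):
    # Toy numeric op standing in for a lower-level operator.
    return sum(x * y for x, y in zip(a, b))

class Traced(list):
    """A user-defined type that logs every intercepted op."""
    calls = []

    @staticmethod
    def __my_function__(func, args, kwargs):
        Traced.calls.append(func.__name__)
        # Delegate to the default implementation on plain lists.
        return func(*(list(a) for a in args), **kwargs)

print(dot([1, 2], [3, 4]))          # 11 -- plain, undispatched path
print(dot(Traced([1, 2]), [3, 4]))  # 11 -- intercepted, then delegated
print(Traced.calls)                 # ['dot']
```

Note that user code never modifies ``dot`` itself; the interception
lives entirely in the argument type, which is the property that lets
tools such as tracers wrap existing operators.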

These design principles are not hard-and-fast rules but hard-won
choices; they anchor how we built PyTorch to be the debuggable,
hackable, and flexible framework it is today. As we gain more
contributors and maintainers, we look forward to applying these core
principles with you across our libraries and ecosystem. We are also
open to evolving them as we learn new things and as the AI space
evolves, as we know it will.
