Add Design Philosophy (pytorch#79248)

Svetlana Karslioglu · pytorchmergebot · commit 68136828e02b · 2022-06-10T21:21:05.000Z
Pull Request resolved: pytorch#79248 Approved by: https://github.com/albanD
diff --git a/docs/source/community/design.rst b/docs/source/community/design.rst
@@ -0,0 +1,157 @@
+PyTorch Design Philosophy
+=========================
+
+This document is designed to help contributors and module maintainers
+understand the high-level design principles that have developed over
+time in PyTorch. These are not meant to be hard-and-fast rules, but to
+serve as a guide to help trade off different concerns and to resolve
+disagreements that may come up while developing PyTorch. For more
+information on contributing, module ownership, and how to escalate a
+disagreement to the Core Maintainers, please see `PyTorch
+Governance <https://pytorch.org/docs/master/community/governance.html>`__.
+
+Design Principles
+-----------------
+
+Principle 1: Usability over Performance
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This principle may be surprising! As one Hacker News poster wrote:
+*PyTorch is amazing! [...] Although I’m confused. How can a ML framework be
+not obsessed with speed/performance?* See `Hacker News discussion on
+PyTorch <https://news.ycombinator.com/item?id=28066093>`__.
+
+Soumith’s blog post on `Growing the PyTorch
+Community <https://soumith.ch/posts/2021/02/growing-opensource/?fbclid=IwAR1bvN_xZ8avGvu14ODJzS8Zp7jX1BOyfuGUf-zoRawpyL-s95Vjxf88W7s>`__
+goes into this in some depth, but at a high level:
+
+-  PyTorch’s primary goal is usability
+-  A secondary goal is to have *reasonable* performance
+
+We believe that maintaining the flexibility to support researchers
+who are building on top of our abstractions remains critical. We
+can’t foresee what future workloads will be, but we know we want
+them to be built first on PyTorch, and that requires
+flexibility.
+
+In more concrete terms, we operate in a *usability-first* manner and try
+to avoid jumping to *restriction-first* regimes (for example, static shapes,
+graph-mode only) without a clear-eyed view of the tradeoffs. Often there
+is a temptation to impose strict user restrictions upfront because it
+can simplify implementation, but this comes with risks:
+
+-  The performance gain may not be worth the user friction, either because
+   the performance benefit is not compelling enough or it only applies to
+   a relatively narrow set of subproblems.
+-  Even if the performance benefit is compelling, the restrictions can
+   fragment the ecosystem into different sets of limitations that can
+   quickly become incomprehensible to users.
+
+We want users to be able to seamlessly move their PyTorch code to
+different hardware and software platforms, to interoperate with
+different libraries and frameworks, and to experience the full richness
+of the PyTorch user experience, not a least common denominator subset.
+
+Principle 2: Simple Over Easy
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Here, we borrow from `The Zen of
+Python <https://peps.python.org/pep-0020/>`__:
+
+-  *Explicit is better than implicit*
+-  *Simple is better than complex*
+
+A more concise way of describing these two goals is `Simple Over
+Easy <https://www.infoq.com/presentations/Simple-Made-Easy/>`__. Let’s start with an example because *simple* and *easy* are
+often used interchangeably in everyday English. Consider how one may
+model `devices <https://pytorch.org/docs/master/tensor_attributes.html#torch.device>`__
+in PyTorch:
+
+-  **Simple / Explicit (to understand, debug):** every tensor is associated
+   with a device. The user explicitly specifies tensor device movement.
+   Operations that require cross-device movement result in an error.
+-  **Easy / Implicit (to use):** the user does not have to worry about
+   devices; the system figures out the globally optimal device
+   placement.
+
+In this specific case, and as a general design philosophy, PyTorch
+favors exposing simple and explicit building blocks rather than APIs
+that are easy for practitioners to use. The simple version is immediately
+understandable and debuggable by a new PyTorch user: you get a clear
+error if you call an operator requiring cross-device movement at the
+point in the program where the operator is actually invoked. The easy
+solution may let a new user move faster initially, but debugging such a
+system can be complex: How did the system make its determination? What
+is the API for plugging into such a system and how are objects
+represented in its IR?
+
+Some classic arguments in favor of this sort of design come from `A
+Note on Distributed
+Computing <https://dl.acm.org/doi/book/10.5555/974938>`__ (TLDR: Do not
+model resources with very different performance characteristics
+uniformly; the details will leak) and the `End-to-End
+Principle <http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf>`__
+(TLDR: building smarts into the lower layers of the stack can prevent
+building performant features at higher layers in the stack, and often
+doesn’t work anyway). For example, we could build operator-level or
+global device movement rules, but the precise choices aren’t obvious and
+building an extensible mechanism has unavoidable complexity and latency
+costs.
+
+A caveat here is that this does not mean that higher-level “easy” APIs
+are not valuable; there is certainly value in, for example,
+higher levels of the stack supporting efficient tensor computations
+across heterogeneous compute in a large cluster. Instead, what we mean
+is that focusing on simple lower-level building blocks helps inform the
+easy API while still maintaining a good experience when users need to
+leave the beaten path. It also allows space for innovation and the
+growth of more opinionated tools at a rate we cannot support in the
+PyTorch core library, but from which we ultimately benefit, as evidenced by
+our `rich ecosystem <https://pytorch.org/ecosystem/>`__. In other
+words, not automating at the start allows us to potentially reach levels
+of good automation faster.
+
+Principle 3: Python First with Best In Class Language Interoperability
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This principle began as **Python First**:
+
+  PyTorch is not a Python binding into a monolithic C++ framework.
+  It is built to be deeply integrated into Python. You can use it
+  naturally like you would use `NumPy <https://www.numpy.org/>`__,
+  `SciPy <https://www.scipy.org/>`__, `scikit-learn <https://scikit-learn.org/>`__,
+  or other Python libraries. You can write your new neural network
+  layers in Python itself, using your favorite libraries and use
+  packages such as `Cython <https://cython.org/>`__ and
+  `Numba <http://numba.pydata.org/>`__. Our goal is to not reinvent
+  the wheel where appropriate.
+
+One thing PyTorch has needed to deal with over the years is Python
+overhead: we first rewrote the ``autograd`` engine in C++, then the majority
+of operator definitions, then developed TorchScript and the C++
+frontend.
+
+Still, working in Python provides easily the best experience for our
+users: it is flexible, familiar, and perhaps most importantly, has a
+huge ecosystem of scientific computing libraries and extensions
+available for use. This fact motivates a few of our most recent
+investments, which attempt to hit a Pareto optimal point close to the
+Python usability end of the curve:
+
+-  `TorchDynamo <https://dev-discuss.pytorch.org/t/torchdynamo-an-experiment-in-dynamic-python-bytecode-transformation/361>`__,
+   a Python frame evaluation tool capable of speeding up existing
+   eager-mode PyTorch programs with minimal user intervention.
+-  `__torch_function__ <https://pytorch.org/docs/master/notes/extending.html#extending-torch>`__
+   and `__torch_dispatch__ <https://dev-discuss.pytorch.org/t/what-and-why-is-torch-dispatch/557>`__
+   extension points, which have enabled Python-first functionality to be
+   built on top of C++ internals, such as the `torch.fx
+   tracer <https://pytorch.org/docs/stable/fx.html>`__
+   and `functorch <https://github.com/pytorch/functorch>`__,
+   respectively.
+
+These design principles are not hard-and-fast rules, but hard-won
+choices that anchor how we built PyTorch to be the debuggable, hackable,
+and flexible framework it is today. As we gain more contributors and
+maintainers, we look forward to applying these core principles with you
+across our libraries and ecosystem. We are also open to evolving them as
+we learn new things and the AI space evolves, as we know it will.