| 1 | +PyTorch Design Philosophy |
| 2 | +========================= |
| 3 | + |
| 4 | +This document is designed to help contributors and module maintainers |
| 5 | +understand the high-level design principles that have developed over |
| 6 | +time in PyTorch. These are not meant to be hard-and-fast rules, but to |
| 7 | +serve as a guide to help trade off different concerns and to resolve |
| 8 | +disagreements that may come up while developing PyTorch. For more |
| 9 | +information on contributing, module ownership, and how to escalate a |
| 10 | +disagreement to the Core Maintainers, please see `PyTorch |
| 11 | +Governance <https://pytorch.org/docs/master/community/governance.html>`__. |
| 12 | + |
| 13 | +Design Principles |
| 14 | +----------------- |
| 15 | + |
| 16 | +Principle 1: Usability over Performance |
| 17 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 18 | + |
| 19 | +This principle may be surprising! As one Hacker News poster wrote: |
| 20 | +*PyTorch is amazing! [...] Although I’m confused. How can a ML framework be |
| 21 | +not obsessed with speed/performance?* See `Hacker News discussion on |
| 22 | +PyTorch <https://news.ycombinator.com/item?id=28066093>`__. |
| 23 | + |
| 24 | +Soumith’s blog post on `Growing the PyTorch |
| 25 | +Community <https://soumith.ch/posts/2021/02/growing-opensource/?fbclid=IwAR1bvN_xZ8avGvu14ODJzS8Zp7jX1BOyfuGUf-zoRawpyL-s95Vjxf88W7s>`__ |
| 26 | +goes into this in some depth, but at a high level: |
| 27 | + |
| 28 | +- PyTorch’s primary goal is usability |
| 29 | +- A secondary goal is to have *reasonable* performance |
| 30 | + |
| 31 | +We believe that maintaining the flexibility to support |
| 32 | +researchers who are building on top of our abstractions remains |
| 33 | +critical. We can’t foresee what future workloads will look like, but we |
| 34 | +know we want them to be built first on PyTorch, and that requires |
| 35 | +flexibility. |
| 36 | + |
| 37 | +In more concrete terms, we operate in a *usability-first* manner and try |
| 38 | +to avoid jumping to *restriction-first* regimes (for example, static shapes, |
| 39 | +graph-mode only) without a clear-eyed view of the tradeoffs. Often there |
| 40 | +is a temptation to impose strict user restrictions upfront because it |
| 41 | +can simplify implementation, but this comes with risks: |
| 42 | + |
| 43 | +- The performance may not be worth the user friction, either because |
| 44 | + the performance benefit is not compelling enough or it only applies to |
| 45 | + a relatively narrow set of subproblems. |
| 46 | +- Even if the performance benefit is compelling, the restrictions can |
| 47 | + fragment the ecosystem into different sets of limitations that can |
| 48 | + quickly become incomprehensible to users. |
| 49 | + |
| 50 | +We want users to be able to seamlessly move their PyTorch code to |
| 51 | +different hardware and software platforms, to interoperate with |
| 52 | +different libraries and frameworks, and to experience the full richness |
| 53 | +of the PyTorch user experience, not a least-common-denominator subset. |
| 54 | + |
| 55 | +Principle 2: Simple Over Easy |
| 56 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 57 | + |
| 58 | +Here, we borrow from `The Zen of |
| 59 | +Python <https://peps.python.org/pep-0020/>`__: |
| 60 | + |
| 61 | +- *Explicit is better than implicit* |
| 62 | +- *Simple is better than complex* |
| 63 | + |
| 64 | +A more concise way of describing these two goals is `Simple Over |
| 65 | +Easy <https://www.infoq.com/presentations/Simple-Made-Easy/>`__. Let’s start with an example because *simple* and *easy* are |
| 66 | +often used interchangeably in everyday English. Consider how one may |
| 67 | +model `devices <https://pytorch.org/docs/master/tensor_attributes.html#torch.device>`__ |
| 68 | +in PyTorch: |
| 69 | + |
| 70 | +- **Simple / Explicit (to understand, debug):** every tensor is associated |
| 71 | + with a device. The user explicitly specifies tensor device movement. |
| 72 | + Operations that require cross-device movement result in an error. |
| 73 | +- **Easy / Implicit (to use):** the user does not have to worry about |
| 74 | + devices; the system figures out the globally optimal device |
| 75 | + placement. |
| 76 | + |
| 77 | +In this specific case, and as a general design philosophy, PyTorch |
| 78 | +favors exposing simple and explicit building blocks rather than APIs |
| 79 | +that are merely easy for practitioners to use. The simple version is immediately |
| 80 | +understandable and debuggable by a new PyTorch user: you get a clear |
| 81 | +error if you call an operator requiring cross-device movement at the |
| 82 | +point in the program where the operator is actually invoked. The easy |
| 83 | +solution may let a new user move faster initially, but debugging such a |
| 84 | +system can be complex: How did the system make its determination? What |
| 85 | +is the API for plugging into such a system and how are objects |
| 86 | +represented in its IR? |
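
To make the contrast concrete, here is a minimal sketch of the simple / explicit device model described above. It is a toy written in plain Python, not PyTorch's implementation: ``ToyTensor``, its ``to`` method, and ``DeviceMismatchError`` are hypothetical names chosen for illustration. It shows a cross-device operation failing at the exact call site, and the fix being an explicit, visible move.

```python
from dataclasses import dataclass


class DeviceMismatchError(RuntimeError):
    """Hypothetical error type for this sketch; raised on cross-device operations."""


@dataclass(frozen=True)
class ToyTensor:
    """Toy stand-in for a tensor: a tuple of values tagged with a device name."""

    values: tuple
    device: str = "cpu"

    def to(self, device: str) -> "ToyTensor":
        # Explicit movement: the user decides when data changes device.
        return ToyTensor(self.values, device)

    def __add__(self, other: "ToyTensor") -> "ToyTensor":
        # No implicit placement: mixed devices fail right where the operator is invoked.
        if self.device != other.device:
            raise DeviceMismatchError(
                f"expected both operands on one device, got {self.device} and {other.device}"
            )
        summed = tuple(a + b for a, b in zip(self.values, other.values))
        return ToyTensor(summed, self.device)


a = ToyTensor((1.0, 2.0), device="cpu")
b = ToyTensor((3.0, 4.0), device="accelerator")

try:
    a + b  # fails here, at the call site
    mismatch_message = None
except DeviceMismatchError as err:
    mismatch_message = str(err)

# The fix is an explicit, visible move.
result = a.to("accelerator") + b
```

An "easy" variant would silently pick a device inside ``__add__``. That would remove the error, but it would also hide where data moves and why, which is the debugging cost the paragraph above describes.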
| 87 | + |
| 88 | +Some classic arguments in favor of this sort of design come from `A |
| 89 | +Note on Distributed |
| 90 | +Computing <https://dl.acm.org/doi/book/10.5555/974938>`__ (TLDR: Do not |
| 91 | +model resources with very different performance characteristics |
| 92 | +uniformly; the details will leak) and the `End-to-End |
| 93 | +Principle <http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf>`__ |
| 94 | +(TLDR: building smarts into the lower layers of the stack can prevent |
| 95 | +building performant features at higher layers in the stack, and often |
| 96 | +doesn’t work anyway). For example, we could build operator-level or |
| 97 | +global device movement rules, but the precise choices aren’t obvious and |
| 98 | +building an extensible mechanism has unavoidable complexity and latency |
| 99 | +costs. |
| 100 | + |
| 101 | +A caveat here is that this does not mean that higher-level “easy” APIs |
| 102 | +are not valuable; there is certainly value in, for example, |
| 103 | +higher levels of the stack supporting efficient tensor computations |
| 104 | +across heterogeneous compute in a large cluster. Instead, what we mean |
| 105 | +is that focusing on simple lower-level building blocks helps inform the |
| 106 | +easy API while still maintaining a good experience when users need to |
| 107 | +leave the beaten path. It also allows space for innovation and the |
| 108 | +growth of more opinionated tools at a rate we cannot support in the |
| 109 | +PyTorch core library, but that we ultimately benefit from, as evidenced by |
| 110 | +our `rich ecosystem <https://pytorch.org/ecosystem/>`__. In other |
| 111 | +words, not automating at the start allows us to potentially reach levels |
| 112 | +of good automation faster. |
| 113 | + |
| 114 | +Principle 3: Python First with Best In Class Language Interoperability |
| 115 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 116 | + |
| 117 | +This principle began as **Python First**: |
| 118 | + |
| 119 | + PyTorch is not a Python binding into a monolithic C++ framework. |
| 120 | + It is built to be deeply integrated into Python. You can use it |
| 121 | + naturally like you would use `NumPy <https://www.numpy.org/>`__, |
| 122 | + `SciPy <https://www.scipy.org/>`__, `scikit-learn <https://scikit-learn.org/>`__, |
| 123 | + or other Python libraries. You can write your new neural network |
| 124 | + layers in Python itself, using your favorite libraries and use |
| 125 | + packages such as `Cython <https://cython.org/>`__ and |
| 126 | + `Numba <http://numba.pydata.org/>`__. Our goal is to not reinvent |
| 127 | + the wheel where appropriate. |
| 128 | + |
| 129 | +One thing PyTorch has needed to deal with over the years is Python |
| 130 | +overhead: we first rewrote the ``autograd`` engine in C++, then the majority |
| 131 | +of operator definitions, then developed TorchScript and the C++ |
| 132 | +frontend. |
| 133 | + |
| 134 | +Still, working in Python provides easily the best experience for our |
| 135 | +users: it is flexible, familiar, and perhaps most importantly, has a |
| 136 | +huge ecosystem of scientific computing libraries and extensions |
| 137 | +available for use. This fact motivates a few of our most recent |
| 138 | +investments, which attempt to hit a Pareto optimal point close to the |
| 139 | +Python usability end of the curve: |
| 140 | + |
| 141 | +- `TorchDynamo <https://dev-discuss.pytorch.org/t/torchdynamo-an-experiment-in-dynamic-python-bytecode-transformation/361>`__, |
| 142 | + a Python frame evaluation tool capable of speeding up existing |
| 143 | + eager-mode PyTorch programs with minimal user intervention. |
| 144 | +- `torch_function <https://pytorch.org/docs/master/notes/extending.html#extending-torch>`__ |
| 145 | + and `torch_dispatch <https://dev-discuss.pytorch.org/t/what-and-why-is-torch-dispatch/557>`__ |
| 146 | + extension points, which have enabled Python-first functionality to be |
| 147 | + built on top of C++ internals, such as the `torch.fx |
| 148 | + tracer <https://pytorch.org/docs/stable/fx.html>`__ |
| 149 | + and `functorch <https://github.com/pytorch/functorch>`__ |
| 150 | + respectively. |
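
As a rough illustration of the ``__torch_function__`` extension point mentioned above, the sketch below defines a tensor subclass that records which ``torch`` functions are called on it, entirely in Python. ``LoggingTensor`` and ``calls`` are hypothetical names for this example. The override signature follows the pattern in the extending notes linked above, and the sketch assumes a reasonably recent PyTorch release is installed.

```python
import torch

# Hypothetical log for this sketch: names of torch functions seen by the subclass.
calls = []


class LoggingTensor(torch.Tensor):
    """Tensor subclass that records each torch function applied to it."""

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # Record the call, then defer to the default behavior.
        calls.append(func.__name__)
        return super().__torch_function__(func, types, args, kwargs)


# View an ordinary tensor as the subclass, then use the regular torch API.
x = torch.ones(2).as_subclass(LoggingTensor)
y = torch.add(x, x)
```

The interception happens in Python, while the addition itself still runs in the C++ operator implementation. This is the sense in which these extension points let Python-first tooling sit on top of C++ internals.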
| 151 | + |
| 152 | +These design principles are not hard-and-fast rules, but hard-won |
| 153 | +choices that anchor how we built PyTorch to be the debuggable, hackable |
| 154 | +and flexible framework it is today. As we have more contributors and |
| 155 | +maintainers, we look forward to applying these core principles with you |
| 156 | +across our libraries and ecosystem. We are also open to evolving them as |
| 157 | +we learn new things and the AI space evolves, as we know it will. |