| 1 | +--- |
| 2 | +title: "Exploring 'True' Nullability in GraphQL" |
| 3 | +tags: ["spec"] |
| 4 | +date: 2024-08-14 |
| 5 | +byline: Benjie Gillam |
| 6 | +--- |
| 7 | + |
| 8 | +One of GraphQL's early decisions was to handle "partial failures"; this was a |
| 9 | +critical feature for Facebook - if one part of their backend infrastructure |
| 10 | +became degraded they wouldn't want to just render an error page; instead they |
| 11 | +wanted to serve the user a page with as much working data as they could. |
| 12 | + |
| 13 | +## Null propagation |
| 14 | + |
| 15 | +To accomplish this, if an error occurred within a resolver, the resolver's value |
| 16 | +would be replaced with a `null`, and an error would be added to the `errors` |
| 17 | +array in the response. However, what if that field was marked as non-null? To |
| 18 | +solve that apparent contradiction, GraphQL introduced the "error propagation" |
| 19 | +behavior (also known colloquially as "null bubbling") - when a `null` (from an |
| 20 | +error or otherwise) occurs in a non-nullable position, the parent position |
| 21 | +(either a field or a list item) is made `null` and this behavior would repeat if |
| 22 | +the parent position was also non-nullable. |
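
As a rough illustration of the propagation rule described above, here is a minimal TypeScript sketch. It assumes a hypothetical, heavily simplified model of an executed selection (`ResultNode`, `completeValue` and `completeData` are names made up for this example, and lists are left out); it is not how any particular GraphQL implementation is written.

```typescript
// Hypothetical, simplified model of an executed selection; not a real GraphQL library API.
type ResultNode =
  | { kind: "leaf"; nonNull: boolean; value: string | number | boolean | null }
  | { kind: "object"; nonNull: boolean; fields: { [fieldName: string]: ResultNode } };

type Json = string | number | boolean | null | { [key: string]: Json };

// Returns the completed value, or `undefined` to signal that a null landed in a
// non-nullable position and must propagate to the parent.
function completeValue(node: ResultNode): Json | undefined {
  if (node.kind === "leaf") {
    return node.value === null && node.nonNull ? undefined : node.value;
  }
  const out: { [key: string]: Json } = {};
  for (const fieldName of Object.keys(node.fields)) {
    const completed = completeValue(node.fields[fieldName]);
    if (completed === undefined) {
      // The child could not be null, so this object becomes null instead;
      // if this object is non-nullable too, keep propagating upwards.
      return node.nonNull ? undefined : null;
    }
    out[fieldName] = completed;
  }
  return out;
}

// At the top level there is no parent left, so `data` itself becomes null.
function completeData(root: ResultNode): Json {
  const completed = completeValue(root);
  return completed === undefined ? null : completed;
}

// A non-nullable field that resolved to null (for example because it errored).
const erroredLeaf: ResultNode = { kind: "leaf", nonNull: true, value: null };
const versionLeaf: ResultNode = { kind: "leaf", nonNull: false, value: "1.0" };

const nullableUser: ResultNode = { kind: "object", nonNull: false, fields: { fullName: erroredLeaf } };
const strictUser: ResultNode = { kind: "object", nonNull: true, fields: { fullName: erroredLeaf } };

// With a nullable `user`, propagation stops there and `version` survives.
const stopsAtUser = completeData({
  kind: "object",
  nonNull: false,
  fields: { user: nullableUser, version: versionLeaf },
});

// With a non-nullable `user`, the null reaches the root and all data is lost.
const reachesRoot = completeData({
  kind: "object",
  nonNull: false,
  fields: { user: strictUser, version: versionLeaf },
});
```

In this sketch the only difference between the two results is whether `user` is declared non-nullable, which is exactly the cascade the complications below are about.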
| 23 | + |
| 24 | +This solved the issue, and meant that GraphQL's nullability promises were still |
| 25 | +honoured; but it wasn't without complications. |
| 26 | + |
| 27 | +### Complication 1: partial failures |
| 28 | + |
| 29 | +We want to be resilient to systems failing; but errors that occur in |
| 30 | +non-nullable positions cascade to surrounding parts of the query, making less |
| 31 | +and less data available to be rendered. This seems contrary to our "partial |
| 32 | +failures" aim, but it's easy to solve - we just make sure that the positions |
| 33 | +where we expect errors to occur are nullable so that errors don't propagate |
| 34 | +further. Clients now need to ensure they handle any nulls that occur in these |
| 35 | +positions; but that seems like a fair trade. |
| 36 | + |
| 37 | +### Complication 2: nullable epidemic |
| 38 | + |
| 39 | +But, it turns out, almost any field in your GraphQL schema could raise an |
| 40 | +error - errors might not only be caused by backend services becoming |
| 41 | +unavailable or responding in unexpected ways; they can also be caused by |
| 42 | +simple programming errors in your business logic, data consistency |
| 43 | +errors (e.g. expecting a boolean but receiving a float), or any other |
| 44 | +cause. |
| 45 | + |
| 46 | +Since we don't want to "blow up" the entire response if any such issue occurred, |
| 47 | +we've moved to strongly encourage nullable usage throughout a schema, only |
| 48 | +adding the non-nullable `!` marker to positions where we're truly sure that |
| 49 | +field is extremely unlikely to error. As a result, developers consuming the |
| 50 | +GraphQL API have to handle null in more positions than they would expect, |
| 51 | +giving them a harder time. |
| 52 | + |
| 53 | +### Complication 3: normalized caching |
| 54 | + |
| 55 | +Many modern GraphQL clients use a "normalized" cache, such that updates pulled |
| 56 | +down from the API in one query can automatically update all the previously |
| 57 | +rendered data across the application. This helps ensure consistency for users, |
| 58 | +and is a powerful feature. |
| 59 | + |
| 60 | +But if an error occurs in a non-nullable position, it's |
| 61 | +[no longer safe](https://github.com/graphql/nullability-wg/issues/20) to store |
| 62 | +the data to the normalized cache. |
| 63 | + |
| 64 | +## The Nullability Working Group |
| 65 | + |
| 66 | +At first, we thought the solution to this was to give clients control over the |
| 67 | +nullability of a response, so we set up the Client-Controlled Nullability (CCN) |
| 68 | +Working Group. Later, we renamed the working group to the Nullability WG to show |
| 69 | +that it encompassed all potential solutions to this problem. |
| 70 | + |
| 71 | +### Client-controlled nullability |
| 72 | + |
| 73 | +The first CCN WG proposal was that we could adorn the queries we issue to the |
| 74 | +server with sigils indicating our desired nullability overrides for the given |
| 75 | +fields - a `?` would be added to fields where we don't mind if they're null, but |
| 76 | +we definitely want errors to stop there; and a `!` to fields where we |
| 77 | +definitely don't want a null to occur. This would give consumers control over |
| 78 | +where errors/nulls were handled; but after much exploration of the topic over |
| 79 | +years we found numerous issues that traded one set of concerns for another. |
| 80 | + |
| 81 | +We needed a better solution. |
| 82 | + |
| 83 | +### True nullability schema |
| 84 | + |
| 85 | +Jordan Eldredge |
| 86 | +[proposed](https://github.com/graphql/nullability-wg/discussions/22) that making |
| 87 | +fields nullable to handle error propagation was hiding the "true" nullability of |
| 88 | +the data. Instead, he suggested, we should have the schema represent the true |
| 89 | +nullability, and put the responsibility on clients to use the `?` CCN operator |
| 90 | +to handle errors in the relevant places. |
| 91 | + |
| 92 | +However, this would mean that clients such as Relay would want to add `?` in |
| 93 | +every position, causing an "explosion" of question marks, because really what |
| 94 | +Relay desired was to disable null propagation entirely. |
| 95 | + |
| 96 | +### A new type |
| 97 | + |
| 98 | +Getting the relevant experts together at GraphQLConf 2023 re-energized the |
| 99 | +discussions and sparked new ideas. After seeing Stephen Spalding's "Nullability |
| 100 | +Sandwich" talk and chatting with Jordan, Stephen and others in amongst the |
| 101 | +seating, Benjie had an idea that felt right to him. He grabbed his laptop and |
| 102 | +sat quietly for an hour at one of the tables in the sponsors room and wrote up |
| 103 | +[the spec edits](https://github.com/graphql/graphql-spec/pull/1046) to represent |
| 104 | +a "null only on error" type. This type would allow us to express the "true" |
| 105 | +nullability of a field whilst also indicating that errors may happen that should |
| 106 | +be handled, but would not "blow up" the response. |
| 107 | + |
| 108 | +To maintain backwards compatibility, clients would need to opt in to seeing this |
| 109 | +new type (otherwise it would masquerade as nullable); and it would be their |
| 110 | +choice of how to handle the nullability of this position, knowing that the data |
| 111 | +would only contain a `null` there if a matching error existed in the `errors` |
| 112 | +list. |
| 113 | + |
| 114 | +A |
| 115 | +[number of alternative syntaxes](https://gist.github.com/benjie/19d784721d1658b89fd8954e7ee07034) |
| 116 | +were suggested for this, but none were well liked. |
| 117 | + |
| 118 | +### A new approach to client error handling |
| 119 | + |
| 120 | +Also around the time of GraphQLConf 2023 the Relay team shared |
| 121 | +[a presentation](https://docs.google.com/presentation/u/2/d/1rfWeBcyJkiNqyxPxUIKxgbExmfdjA70t/edit?pli=1#slide=id.p8) |
| 122 | +on some of their thinking around errors. In particular, they |
| 123 | +discussed the `@catch` directive which would give users control over how errors |
| 124 | +were represented in the data being rendered, allowing the client to |
| 125 | +differentiate an error from a legitimate null. Over the following months, many |
| 126 | +behaviors were discussed at the Nullability WG; one particularly compelling one |
| 127 | +was that clients could throw the error when an errored field was read, and rely |
| 128 | +on framework mechanics (such as React's |
| 129 | +[error boundaries](https://legacy.reactjs.org/docs/error-boundaries.html)) to |
| 130 | +handle them. |
| 131 | + |
| 132 | +### A new mode |
| 133 | + |
| 134 | +Lee [proposed](https://github.com/graphql/graphql-wg/discussions/1410) that we |
| 135 | +introduce a schema directive, `@strictNullability`, whereby we would change what |
| 136 | +the syntax meant - `Int?` for nullable, `Int` for null-only-on-error, and `Int!` |
| 137 | +for never-null. This proposal was well liked, but wasn't a clear win: it |
| 138 | +introduced many complexities, not least migration costs. |
| 139 | + |
| 140 | +### A pivotal discussion |
| 141 | + |
| 142 | +Lee and Benjie had a call where they discussed all of this in depth, including |
| 143 | +their two respective solutions and the pros and cons of each. It was clear that |
| 144 | +neither solution was quite there, but we were getting closer and closer to one. |
| 145 | +This long, detailed and highly technical discussion inspired Benjie to write up |
| 146 | +[a new proposal](https://github.com/graphql/nullability-wg/discussions/58), |
| 147 | +which has since been iterated on further and which we describe below. |
| 148 | + |
| 149 | +## Our latest proposal |
| 150 | + |
| 151 | +We're now proposing a new opt-in mode to solve the nullability problem. It's |
| 152 | +important to note that clients and servers that don't opt in will be completely |
| 153 | +unaffected by this change (and a client may opt in without a server opting in, |
| 154 | +and vice versa, without causing any issues - in these cases, traditional mode |
| 155 | +will be used). |
| 156 | + |
| 157 | +### No-error-propagation mode |
| 158 | + |
| 159 | +The new proposal centers around the premise of allowing clients to disable the |
| 160 | +"error propagation" behavior discussed above. |
| 161 | + |
| 162 | +Clients that opt in to this behavior take responsibility for interpreting the |
| 163 | +response as a whole, correlating the `data` and `errors` properties of the |
| 164 | +response. With error propagation disabled and the fact that any field could |
| 165 | +potentially throw an error, all positions in `data` can potentially contain a |
| 166 | +`null` value. Clients in this mode must cross-check any `null` values against |
| 167 | +`errors` to determine if it's a true null, or an error. |
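
The cross-check described above might be sketched as follows. This is only an illustration: `classifyNull` and the type names are hypothetical, the response values are invented, and the sketch assumes that each error's `path` points at exactly the position that holds the `null` (which is what we'd expect once errors no longer propagate).

```typescript
// Minimal shapes mirroring the `data` / `errors` parts of a GraphQL response.
type ResponsePath = (string | number)[];
type ResponseError = { message: string; path?: ResponsePath };
type ExecutionResult = { data: unknown; errors?: ResponseError[] };

// Hypothetical helper: call it for a position in `data` that holds `null`.
// It reports whether that null is backed by an entry in `errors`.
function classifyNull(result: ExecutionResult, path: ResponsePath): "error" | "true-null" {
  const wanted = JSON.stringify(path);
  for (const error of result.errors || []) {
    if (error.path !== undefined && JSON.stringify(error.path) === wanted) {
      return "error";
    }
  }
  return "true-null";
}

// Invented example response: both fields are null, but only one of them errored.
const exampleResult: ExecutionResult = {
  data: { user: { nickname: null, avatarUrl: null } },
  errors: [{ message: "Image service unavailable", path: ["user", "avatarUrl"] }],
};

const nicknameKind = classifyNull(exampleResult, ["user", "nickname"]);
const avatarKind = classifyNull(exampleResult, ["user", "avatarUrl"]);
```

Here `nickname` is a legitimate null, whereas `avatarUrl` is null only because its resolver failed.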
| 168 | + |
| 169 | +### "Smart" clients |
| 170 | + |
| 171 | +The no-error-propagation mode is intended for use by "smart" clients such as |
| 172 | +Relay, Apollo Client, URQL and others which understand GraphQL deeply and are |
| 173 | +responsible for the storage and retrieval of fetched GraphQL data. These clients |
| 174 | +are well positioned to handle the responsibilities outlined above. |
| 175 | + |
| 176 | +By disabling error propagation, these clients will be able to safely update |
| 177 | +their stores (including normalized stores) even when errors occur. They can also |
| 178 | +re-implement traditional GraphQL error propagation on top of these new |
| 179 | +foundations, shielding application developers from needing to learn this new |
| 180 | +behavior (whilst still allowing them to reap the benefits!). They can even take |
| 181 | +on advanced behaviors, such as throwing the error when the application developer |
| 182 | +attempts to read from an errored field, allowing the developer to handle errors |
| 183 | +with their own more natural error boundaries. |
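
To make the "throw on read" idea a little more concrete, here is a small sketch of what such a read path could look like. The `readField` function and its types are hypothetical and do not correspond to the API of Relay, Apollo Client, URQL or any other client; a real client would do this inside its store rather than on the raw response.

```typescript
type ErroredPath = (string | number)[];
type FieldError = { message: string; path: ErroredPath };

// Hypothetical reader: throws if the requested position errored, otherwise
// walks `data` along `path` and returns whatever is stored there.
function readField(data: unknown, errors: FieldError[], path: ErroredPath): unknown {
  const wanted = JSON.stringify(path);
  for (const error of errors) {
    if (JSON.stringify(error.path) === wanted) {
      // Surface the field error to the caller, e.g. for an error boundary to catch.
      throw new Error(error.message);
    }
  }
  let cursor: unknown = data;
  for (const key of path) {
    if (cursor === null || typeof cursor !== "object") return undefined;
    cursor = (cursor as { [key: string]: unknown })[key];
  }
  return cursor;
}

// Invented example data: `avatarUrl` errored, `nickname` is a legitimate null.
const storedData: unknown = { user: { nickname: null, avatarUrl: null } };
const storedErrors: FieldError[] = [
  { message: "Image service unavailable", path: ["user", "avatarUrl"] },
];

const nickname = readField(storedData, storedErrors, ["user", "nickname"]);
```

Reading `["user", "avatarUrl"]` from the same data would throw instead of returning `null`, so the application never mistakes the failure for missing data.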
| 184 | + |
| 185 | +### True nullability |
| 186 | + |
| 187 | +Just like in traditional mode, for clients operating in no-error-propagation |
| 188 | +mode fields are either nullable or non-nullable. However, unlike in traditional |
| 189 | +mode, no-error-propagation mode allows for errors to be represented in any |
| 190 | +position: |
| 191 | + |
| 192 | +- nullable (e.g. `Int`): a value, an error, or a true `null`; |
| 193 | +- non-nullable (e.g. `Int!`): a value **or an error**. |
| 194 | + |
| 195 | +_(In traditional mode, non-nullable fields cannot represent an error because the |
| 196 | +error propagates to the nearest nullable position.)_ |
| 197 | + |
| 198 | +Since this mode allows every field, whether nullable or non-nullable, to |
| 199 | +represent an error, the schema can safely indicate to clients in this mode the |
| 200 | +true intended nullability of a field. If the schema designer knows that a field |
| 201 | +should never be null unless an error occurs, they would mark the field as |
| 202 | +non-nullable (but only for clients in no-error-propagation mode; see "schema |
| 203 | +developers" below). |
| 204 | + |
| 205 | +### Client reflection of true nullability |
| 206 | + |
| 207 | +Smart clients can ask the schema about the "true" nullability of each field via |
| 208 | +introspection, and can generate a derived SDL by combining that information with |
| 209 | +their knowledge of how the client handles errors. This derived SDL would look |
| 210 | +like the traditional representation of the schema, but with more fields |
| 211 | +represented as non-nullable where the true nullability of the underlying schema |
| 212 | +is reflected. Application developers would issue queries and mutations in the |
| 213 | +same way they always had, but now their generated types don't need to handle |
| 214 | +`null` in as many positions as before, increasing developer happiness. |
| 215 | + |
| 216 | +### Schema developers |
| 217 | + |
| 218 | +Schemas that wish to add support for indicating the "true nullability" of a |
| 219 | +field in no-error-propagation mode need to be able to discern which types show |
| 220 | +up as non-nullable in both modes (traditional non-null types), and which types |
| 221 | +show up as non-nullable only in no-error-propagation mode. For this latter |
| 222 | +concern we've introduced the concept of a "semantic" non-null type: |
| 223 | + |
| 224 | +- "strict" (traditional) non-nullable - shows up as non-nullable in both |
| 225 | + traditional mode and no-error-propagation mode |
| 226 | +- "semantic" non-nullable, aka "null only on error" - shows up as non-nullable |
| 227 | + only in no-error-propagation mode; in traditional mode it will masquerade as |
| 228 | + nullable |
| 229 | + |
| 230 | +Only clients that opt in to seeing the true nullability will see this |
| 231 | +difference, otherwise the nullability of the chosen mode (traditional or |
| 232 | +no-error-propagation) will be reflected by introspection. |
| 233 | + |
| 234 | +### Representation in SDL |
| 235 | + |
| 236 | +Application developers will only need to deal with traditional SDL that |
| 237 | +represents traditional nullability concerns. If these developers are using |
| 238 | +"smart" clients then they should get this SDL from the client rather than from |
| 239 | +the server; this allows them to see the nullability that the client guarantees |
| 240 | +based on how it will handle the "true" nullability of the schema, how it handles |
| 241 | +errors, and factoring in any local schema extensions that may have been added. |
| 242 | + |
| 243 | +Client-derived SDL (see "client reflection of true nullability" above) can be |
| 244 | +used for concerns such as code generation, which will work in the traditional |
| 245 | +way with no need for changes (but happier developers since there will be fewer |
| 246 | +nullable positions!). |
| 247 | + |
| 248 | +However, schema developers and people working on "smart" clients may need to |
| 249 | +represent the differences between "strict" and "semantic" non-nullable in SDL. |
| 250 | +For these people, we're introducing the `@extendedNullability` document |
| 251 | +directive. When this directive is present at the top of a document, the `!` |
| 252 | +symbol means that a type will appear as non-nullable only in |
| 253 | +no-error-propagation mode, and a new `!!` symbol will represent that a type will |
| 254 | +appear as non-nullable in both traditional and no-error-propagation mode. |
| 255 | + |
| 256 | +| Traditional Mode | No-error-propagation mode | Example | |
| 257 | +| ---------------- | ------------------------- | ------- | |
| 258 | +| Nullable | Nullable | `Int` | |
| 259 | +| Nullable | Non-nullable | `Int!` | |
| 260 | +| Non-nullable\* | Non-nullable | `Int!!` | |
| 261 | + |
| 262 | +The `!!` symbol is designed to look a little scary - it should be used with |
| 263 | +caution (like `!` in traditional schemas) because it is the symbol that means |
| 264 | +that errors will propagate in traditional mode, "blowing up" parent selection |
| 265 | +sets. |
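
The table above can also be read as a small mapping from a type written under `@extendedNullability` to the type each mode would see in traditional notation. The sketch below encodes just that mapping; `typeSeenIn` is a hypothetical helper, and to keep things short it only looks at the outermost `!` / `!!` of a named type (list types are deliberately not handled).

```typescript
type Mode = "traditional" | "no-error-propagation";

// Hypothetical helper: given a named type as written under `@extendedNullability`,
// return how it would appear in the given mode using traditional `!` notation.
function typeSeenIn(mode: Mode, extendedType: string): string {
  if (extendedType.slice(-2) === "!!") {
    // "Strict" non-null: non-nullable in both modes.
    return extendedType.slice(0, -2) + "!";
  }
  if (extendedType.slice(-1) === "!") {
    // "Semantic" non-null: masquerades as nullable in traditional mode.
    const named = extendedType.slice(0, -1);
    return mode === "no-error-propagation" ? named + "!" : named;
  }
  // Plain nullable types look the same in both modes.
  return extendedType;
}

const semanticInTraditional = typeSeenIn("traditional", "Int!");
const semanticInNoPropagation = typeSeenIn("no-error-propagation", "Int!");
const strictInTraditional = typeSeenIn("traditional", "Int!!");
```

So a field written as `Int!` would show up as `Int` to a traditional client and as `Int!` to a client in no-error-propagation mode, while `Int!!` shows up as `Int!` to both.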
| 266 | + |
| 267 | +## Get involved |
| 268 | + |
| 269 | +Like all GraphQL Working Groups, the Nullability Working Group is open to all. |
| 270 | +Whether you work on a GraphQL client or are just a GraphQL user with thoughts on |
| 271 | +nullability, we want to hear from you - add yourself to an |
| 272 | +[upcoming working group meeting](https://github.com/graphql/nullability-wg/) or chat |
| 273 | +with us in the #nullability-wg channel in |
| 274 | +[the GraphQL Discord](https://discord.graphql.org). This solution is not yet |
| 275 | +merged into the specification, so there's still time for iteration and |
| 276 | +alternative ideas! |