Commit f6fef9f

Nullability post
1 parent ff20471 commit f6fef9f

2 files changed: +277 -0 lines changed

.prettierignore (+1)

@@ -2,4 +2,5 @@ public/
 pnpm-lock.yaml
 *.mdx
 !src/pages/blog/2024-04-11-announcing-new-graphql-website/index.mdx
+!src/pages/blog/2024-08-14-exploring-true-nullability.mdx
 *.jpg

@@ -0,0 +1,276 @@
---
title: "Exploring 'True' Nullability in GraphQL"
tags: ["spec"]
date: 2024-08-14
byline: Benjie Gillam
---

One of GraphQL's early decisions was to handle "partial failures"; this was a
critical feature for Facebook - if one part of their backend infrastructure
became degraded they wouldn't want to just render an error page, instead they
wanted to serve the user a page with as much working data as they could.

## Null propagation

To accomplish this, if an error occurred within a resolver, the resolver's value
would be replaced with a `null`, and an error would be added to the `errors`
array in the response. However, what if that field was marked as non-null? To
solve that apparent contradiction, GraphQL introduced the "error propagation"
behavior (also known colloquially as "null bubbling") - when a `null` (from an
error or otherwise) occurs in a non-nullable position, the parent position
(either a field or a list item) is made `null`, and this behavior repeats if
the parent position is also non-nullable.

This solved the issue, and meant that GraphQL's nullability promises were still
honoured; but it wasn't without complications.
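
The propagation rule described above can be sketched in a few lines of
TypeScript. This is a simplified model rather than how any particular GraphQL
server implements it: the `ResolvedNode` shape, the `complete` function and the
sample field names are all assumptions made for illustration, and lists and the
`errors` array are left out.

```typescript
// Hypothetical model: a resolved value tagged with the nullability of its position.
type ResolvedNode =
  | { kind: "leaf"; nonNull: boolean; value: string | number | boolean | null }
  | { kind: "object"; nonNull: boolean; fields: { [name: string]: ResolvedNode } };

type Json = string | number | boolean | null | { [key: string]: Json };

// Returns the completed value, or `undefined` to signal "a null reached a
// non-nullable position, so the parent position has to deal with it".
function complete(node: ResolvedNode): Json | undefined {
  if (node.kind === "leaf") {
    return node.value === null && node.nonNull ? undefined : node.value;
  }
  const result: { [key: string]: Json } = {};
  for (const [name, child] of Object.entries(node.fields)) {
    const completed = complete(child);
    if (completed === undefined) {
      // The child could not be null: this object becomes null instead, and
      // the null keeps bubbling if this position is non-nullable too.
      return node.nonNull ? undefined : null;
    }
    result[name] = completed;
  }
  return result;
}

const query: ResolvedNode = {
  kind: "object",
  nonNull: false,
  fields: {
    me: {
      kind: "object",
      nonNull: false,
      fields: {
        name: { kind: "leaf", nonNull: true, value: "Ada" },
        // Suppose the resolver for this non-nullable field raised an error.
        avatarUrl: { kind: "leaf", nonNull: true, value: null },
      },
    },
    serverTime: { kind: "leaf", nonNull: false, value: 123 },
  },
};

console.log(JSON.stringify(complete(query) ?? null));
// {"me":null,"serverTime":123}
```

In this sketch the single failing `avatarUrl` field costs us the whole `me`
object, including the perfectly good `name`.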

### Complication 1: partial failures

We want to be resilient to systems failing; but errors that occur in
non-nullable positions cascade to surrounding parts of the query, making less
and less data available to be rendered. This seems contrary to our "partial
failures" aim, but it's easy to solve - we just make sure that the positions
where we expect errors to occur are nullable so that errors don't propagate
further. Clients now needed to ensure they handle any nulls that occur in these
positions; but that seemed like a fair trade.

### Complication 2: nullable epidemic

But, it turns out, almost any field in your GraphQL schema could raise an
error: errors might not only be caused by backend services becoming unavailable
or responding in unexpected ways; they can also be caused by simple programming
errors in your business logic, data consistency errors (e.g. expecting a
boolean but receiving a float), or any other cause.

Since we don't want to "blow up" the entire response if any such issue occurs,
we've moved to strongly encourage nullable usage throughout a schema, only
adding the non-nullable `!` marker to positions where we're truly sure that
field is extremely unlikely to error. This means that developers consuming the
GraphQL API have to handle null in more positions than they would expect,
giving them a harder time.

### Complication 3: normalized caching

Many modern GraphQL clients use a "normalized" cache, such that updates pulled
down from the API in one query can automatically update all the previously
rendered data across the application. This helps ensure consistency for users,
and is a powerful feature.

But if an error occurs in a non-nullable position, it's
[no longer safe](https://github.com/graphql/nullability-wg/issues/20) to store
the data to the normalized cache.

## The Nullability Working Group

At first, we thought the solution to this was to give clients control over the
nullability of a response, so we set up the Client-Controlled Nullability (CCN)
Working Group. Later, we renamed the working group to the Nullability WG to show
that it encompassed all potential solutions to this problem.

### Client-controlled nullability

The first CCN WG proposal was that we could adorn the queries we issue to the
server with sigils indicating our desired nullability overrides for the given
fields - a `?` would be added to fields where we don't mind if they're null, but
we definitely want errors to stop there; and a `!` would be added to fields
where we definitely don't want a null to occur. This would give consumers
control over where errors/nulls were handled; but after much exploration of the
topic over the years we found numerous issues, and the proposal traded one set
of concerns for another.

We needed a better solution.

### True nullability schema

Jordan Eldredge
[proposed](https://github.com/graphql/nullability-wg/discussions/22) that making
fields nullable to handle error propagation was hiding the "true" nullability of
the data. Instead, he suggested, we should have the schema represent the true
nullability, and put the responsibility on clients to use the `?` CCN operator
to handle errors in the relevant places.

However, this would mean that clients such as Relay would want to add `?` in
every position, causing an "explosion" of question marks, because really what
Relay desired was to disable null propagation entirely.

### A new type

Getting the relevant experts together at GraphQLConf 2023 re-energized the
discussions and sparked new ideas. After seeing Stephen Spalding's "Nullability
Sandwich" talk and chatting with Jordan, Stephen and others in amongst the
seating, Benjie had an idea that felt right to him. He grabbed his laptop and
sat quietly for an hour at one of the tables in the sponsors room and wrote up
[the spec edits](https://github.com/graphql/graphql-spec/pull/1046) to represent
a "null only on error" type. This type would allow us to express the "true"
nullability of a field whilst also indicating that errors may happen that should
be handled, but would not "blow up" the response.

To maintain backwards compatibility, clients would need to opt in to seeing this
new type (otherwise it would masquerade as nullable); and it would be their
choice of how to handle the nullability of this position, knowing that the data
would only contain a `null` there if a matching error existed in the `errors`
list.

A
[number of alternative syntaxes](https://gist.github.com/benjie/19d784721d1658b89fd8954e7ee07034)
were suggested for this, but none were well liked.

### A new approach to client error handling

Also around the time of GraphQLConf 2023 the Relay team shared
[a presentation](https://docs.google.com/presentation/u/2/d/1rfWeBcyJkiNqyxPxUIKxgbExmfdjA70t/edit?pli=1#slide=id.p8)
on some of the things they were thinking around errors. In particular they
discussed the `@catch` directive which would give users control over how errors
were represented in the data being rendered, allowing the client to
differentiate an error from a legitimate null. Over the coming months, many
behaviors were discussed at the Nullability WG; one particularly compelling one
was that clients could throw the error when an errored field was read, and rely
on framework mechanics (such as React's
[error boundaries](https://legacy.reactjs.org/docs/error-boundaries.html)) to
handle them.

### A new mode

Lee [proposed](https://github.com/graphql/graphql-wg/discussions/1410) that we
introduce a schema directive, `@strictNullability`, whereby we would change what
the syntax meant - `Int?` for nullable, `Int` for null-only-on-error, and `Int!`
for never-null. This proposal was well liked, but wasn't a clear win: it
introduced many complexities, not least migration costs.

### A pivotal discussion

Lee and Benjie had a call where they discussed all of this in depth, including
their two respective solutions and their pros and cons. It was clear that
neither solution was quite there, but we were getting closer and closer to a
solution. This long, detailed and highly technical discussion inspired Benjie
to write up
[a new proposal](https://github.com/graphql/nullability-wg/discussions/58),
which has since been iterated on further, and which we describe below.

## Our latest proposal

We're now proposing a new opt-in mode to solve the nullability problem. It's
important to note that clients and servers that don't opt in will be completely
unaffected by this change (and a client may opt in without a server opting in,
and vice-versa, without causing any issues - in these cases, traditional mode
will be used).

### No-error-propagation mode

The new proposal centers around the premise of allowing clients to disable the
"error propagation" behavior discussed above.

Clients that opt in to this behavior take responsibility for interpreting the
response as a whole, correlating the `data` and `errors` properties of the
response. With error propagation disabled and the fact that any field could
potentially throw an error, all positions in `data` can potentially contain a
`null` value. Clients in this mode must cross-check any `null` values against
`errors` to determine if it's a true null, or an error.
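
As a rough illustration of that cross-check, the sketch below compares the path
of a `null` in `data` with the `path` of each entry in `errors`. The
`classifyNull` and `pathsEqual` helpers and the sample response are
hypothetical; the sketch assumes that, with propagation disabled, an error's
`path` points at exactly the position that became `null`.

```typescript
type PathSegment = string | number;

// Minimal shape of an entry in the response's `errors` list.
interface ResponseError {
  message: string;
  path?: PathSegment[];
}

function pathsEqual(a: PathSegment[], b: PathSegment[]): boolean {
  return a.length === b.length && a.every((segment, i) => segment === b[i]);
}

// Decide whether a `null` found at `path` in `data` is a real null, or
// whether it stands in for an error reported at the same path.
function classifyNull(
  path: PathSegment[],
  errors: ResponseError[] = [],
): "true-null" | "error" {
  const errored = errors.some(
    (e) => e.path !== undefined && pathsEqual(e.path, path),
  );
  return errored ? "error" : "true-null";
}

// Hypothetical response: both fields are null, but for different reasons.
const exampleResponse = {
  data: { user: { nickname: null, avatarUrl: null } },
  errors: [
    { message: "Avatar service unavailable", path: ["user", "avatarUrl"] },
  ],
};

console.log(classifyNull(["user", "nickname"], exampleResponse.errors));
// true-null
console.log(classifyNull(["user", "avatarUrl"], exampleResponse.errors));
// error
```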

### "Smart" clients

The no-error-propagation mode is intended for use by "smart" clients such as
Relay, Apollo Client, URQL and others which understand GraphQL deeply and are
responsible for the storage and retrieval of fetched GraphQL data. These clients
are well positioned to handle the responsibilities outlined above.

By disabling error propagation, these clients will be able to safely update
their stores (including normalized stores) even when errors occur. They can also
re-implement traditional GraphQL error propagation on top of these new
foundations, shielding application developers from needing to learn this new
behavior (whilst still allowing them to reap the benefits!). They can even take
on advanced behaviors, such as throwing the error when the application developer
attempts to read from an errored field, allowing the developer to handle errors
with their own more natural error boundaries.
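
To make the "throw on read" idea concrete, here is a minimal sketch of a read
helper over a stored response. It is not the API of Relay, Apollo Client or
URQL: `StoredResponse`, `readField` and `isPrefixOf` are hypothetical names, and
the choice to also throw when an ancestor of the requested field errored is an
assumption of this sketch.

```typescript
type Segment = string | number;

interface StoredResponse {
  data: unknown;
  errors?: { message: string; path?: Segment[] }[];
}

// True when `prefix` is the start of (or equal to) `path`.
function isPrefixOf(prefix: Segment[], path: Segment[]): boolean {
  return (
    prefix.length <= path.length &&
    prefix.every((segment, i) => segment === path[i])
  );
}

// Read a field from a stored response, throwing if that field (or one of
// its ancestors) is associated with an error.
function readField(response: StoredResponse, path: Segment[]): unknown {
  const failure = (response.errors ?? []).find(
    (e) => e.path !== undefined && isPrefixOf(e.path, path),
  );
  if (failure !== undefined) {
    throw new Error(`Cannot read ${path.join(".")}: ${failure.message}`);
  }
  let current: unknown = response.data;
  for (const segment of path) {
    if (current === null || typeof current !== "object") return null;
    current = (current as { [key: string]: unknown })[segment];
  }
  return current ?? null;
}

const stored: StoredResponse = {
  data: { user: { name: "Ada", avatarUrl: null } },
  errors: [
    { message: "Avatar service unavailable", path: ["user", "avatarUrl"] },
  ],
};

console.log(readField(stored, ["user", "name"])); // Ada

try {
  readField(stored, ["user", "avatarUrl"]);
} catch (e) {
  // A UI framework's error boundary could catch this instead.
  console.log(e instanceof Error ? e.message : e);
}
```

Because the healthy `name` field is still readable, only the part of the UI
that actually depends on `avatarUrl` needs to fall back to an error state.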

### True nullability

Just like in traditional mode, for clients operating in no-error-propagation
mode fields are either nullable or non-nullable. However, unlike in traditional
mode, no-error-propagation mode allows for errors to be represented in any
position:

- nullable (e.g. `Int`): a value, an error, or a true `null`;
- non-nullable (e.g. `Int!`): a value **or an error**.

_(In traditional mode, non-nullable fields cannot represent an error because the
error propagates to the nearest nullable position.)_

Since this mode allows every field, whether nullable or non-nullable, to
represent an error, the schema can safely indicate to clients in this mode the
true intended nullability of a field. If the schema designer knows that a field
should never be null unless an error occurs, they would mark the field as
non-nullable (but only for clients in no-error-propagation mode; see "schema
developers" below).

### Client reflection of true nullability

Smart clients can ask the schema about the "true" nullability of each field via
introspection, and can generate a derived SDL by combining that information with
their knowledge of how the client handles errors. This derived SDL would look
like the traditional representation of the schema, but with more fields
represented as non-nullable where the true nullability of the underlying schema
is reflected. Application developers would issue queries and mutations in the
same way they always have, but now their generated types don't need to handle
`null` in as many positions as before, increasing developer happiness.
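
One way to picture how a client might derive that SDL is sketched below. The
three nullability kinds, the two client behaviors and the `derivedSdlType`
function are all hypothetical names, not part of any proposal text or client
API. The sketch assumes that a client which throws when an errored field is
read can present "null only on error" fields as non-nullable, whereas a client
that surfaces errors as `null` has to keep them nullable.

```typescript
// How the underlying schema describes a field (hypothetical labels).
type SchemaNullability = "nullable" | "null-only-on-error" | "never-null";

// How the client surfaces an errored field to application code (hypothetical labels).
type ClientErrorBehavior = "throw-on-read" | "null-on-error";

// Render the field type as it would appear in the client-derived,
// traditional-looking SDL.
function derivedSdlType(
  namedType: string,
  nullability: SchemaNullability,
  behavior: ClientErrorBehavior,
): string {
  const nonNull =
    nullability === "never-null" ||
    (nullability === "null-only-on-error" && behavior === "throw-on-read");
  return nonNull ? `${namedType}!` : namedType;
}

console.log(derivedSdlType("Int", "null-only-on-error", "throw-on-read")); // Int!
console.log(derivedSdlType("Int", "null-only-on-error", "null-on-error")); // Int
console.log(derivedSdlType("Int", "nullable", "throw-on-read")); // Int
```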

### Schema developers

Schemas that wish to add support for indicating the "true nullability" of a
field in no-error-propagation mode need to be able to discern which types show
up as non-nullable in both modes (traditional non-null types), and which types
show up as non-nullable only in no-error-propagation mode. For this latter
concern we've introduced the concept of a "semantic" non-null type:

- "strict" (traditional) non-nullable - shows up as non-nullable in both
  traditional mode and no-error-propagation mode
- "semantic" non-nullable, aka "null only on error" - shows up as non-nullable
  only in no-error-propagation mode; in traditional mode it will masquerade as
  nullable

Only clients that opt in to seeing the true nullability will see this
difference, otherwise the nullability of the chosen mode (traditional or
no-error-propagation) will be reflected by introspection.

### Representation in SDL

Application developers will only need to deal with traditional SDL that
represents traditional nullability concerns. If these developers are using
"smart" clients then they should get this SDL from the client rather than from
the server; this allows them to see the nullability that the client guarantees
based on how it will handle the "true" nullability of the schema, how it handles
errors, and factoring in any local schema extensions that may have been added.

Client-derived SDL (see "client reflection of true nullability" above) can be
used for concerns such as code generation, which will work in the traditional
way with no need for changes (but happier developers since there will be fewer
nullable positions!).

However, schema developers and people working on "smart" clients may need to
represent the differences between "strict" and "semantic" non-nullable in SDL.
For these people, we're introducing the `@extendedNullability` document
directive. When this directive is present at the top of a document, the `!`
symbol means that a type will appear as non-nullable only in
no-error-propagation mode, and a new `!!` symbol will represent that a type
will appear as non-nullable in both traditional and no-error-propagation mode.

| Traditional mode | No-error-propagation mode | Example |
| ---------------- | ------------------------- | ------- |
| Nullable         | Nullable                  | `Int`   |
| Nullable         | Non-nullable              | `Int!`  |
| Non-nullable\*   | Non-nullable              | `Int!!` |

The `!!` symbol is designed to look a little scary - it should be used with
caution (like `!` in traditional schemas) because it is the symbol that means
that errors will propagate in traditional mode, "blowing up" parent selection
sets.
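
The table above can be restated as a small lookup. This is only a sketch of the
mapping for a type written under `@extendedNullability`: the `nullabilityIn`
function and the `Mode` type are hypothetical names, and the sketch looks only
at the outermost marker, so list item types are not handled.

```typescript
type Mode = "traditional" | "no-error-propagation";

// Given a type as written in an `@extendedNullability` document, report how
// its outermost position appears to a client in the given mode.
function nullabilityIn(
  mode: Mode,
  extendedType: string,
): "nullable" | "non-nullable" {
  if (extendedType.endsWith("!!")) {
    // "Strict" non-null: non-nullable in both modes.
    return "non-nullable";
  }
  if (extendedType.endsWith("!")) {
    // "Semantic" non-null: masquerades as nullable in traditional mode.
    return mode === "no-error-propagation" ? "non-nullable" : "nullable";
  }
  return "nullable";
}

for (const type of ["Int", "Int!", "Int!!"]) {
  console.log(
    type,
    nullabilityIn("traditional", type),
    nullabilityIn("no-error-propagation", type),
  );
}
// Int nullable nullable
// Int! nullable non-nullable
// Int!! non-nullable non-nullable
```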

## Get involved

Like all GraphQL Working Groups, the Nullability Working Group is open to all.
Whether you work on a GraphQL client or are just a GraphQL user with thoughts on
nullability, we want to hear from you - add yourself to an
[upcoming working group](https://github.com/graphql/nullability-wg/) meeting or
chat with us in the #nullability-wg channel in
[the GraphQL Discord](https://discord.graphql.org). This solution is not yet
merged into the specification, so there's still time for iteration and
alternative ideas!
