Commit f0655b7

garloff and fkr authored
Security advisory for Linux LPEs copy.fail and Dirty Frag. (#363)
* Security advisory for Linux LPEs copy.fail and Dirty Frag.
  Signed-off-by: Kurt Garloff <kurt@garloff.de>
* Instructions how to use kubectl node-shell
  Signed-off-by: Kurt Garloff <kurt@garloff.de>
* Also add changelog
  Signed-off-by: Kurt Garloff <kurt@garloff.de>
* More on stable kernels.
  Signed-off-by: Kurt Garloff <kurt@garloff.de>
* Mention that we patched community infra.
  Signed-off-by: Kurt Garloff <kurt@garloff.de>
* Use node v20.
  Signed-off-by: Kurt Garloff <kurt@garloff.de>
* Use node v20.
  Signed-off-by: Kurt Garloff <kurt@garloff.de>
* Fix the "here" links that are complained about.
  Signed-off-by: Kurt Garloff <kurt@garloff.de>
* Fix links.
  Signed-off-by: Kurt Garloff <kurt@garloff.de>
* One more link.
  Signed-off-by: Kurt Garloff <kurt@garloff.de>
* Update blog/2026-05-10-kernel-root-exploits.md
  Co-authored-by: Felix Kronlage-Dammers <fkr@hazardous.org>
  Signed-off-by: Kurt Garloff <kurt@garloff.de>
* Release. And fix info on the disclosure process.
  Signed-off-by: Kurt Garloff <kurt@garloff.de>

---------

Signed-off-by: Kurt Garloff <kurt@garloff.de>
Co-authored-by: Felix Kronlage-Dammers <fkr@hazardous.org>
1 parent cc73dec commit f0655b7

1 file changed

Lines changed: 256 additions & 0 deletions

@@ -0,0 +1,256 @@
1+
---
2+
slug: kernel_local_root_exploits
3+
title: Linux Kernel local root exploits CVE-2026-31431, -43284, -43500
4+
authors: [garloff]
5+
tags: [security, linux, cve, copy.fail, dirtyfrag]
6+
---
7+
8+
## Linux root exploits (Local Privilege Escalation)
9+
10+
Unix is designed as a multi-user system. Different users have their own
11+
files and processes and can work without interference from others.
12+
Linux lives in that tradition. It has advanced the concept with namespaces
13+
where users can also have a private view on networking, process list, filesystems
14+
and other pieces that are traditionally shared (read-only) on a Unix system,
15+
also including some resource management to enhance performance isolation.
16+
17+
It is the operating system kernel's job to keep the separation safe; in
18+
particular, normal users must not be able to obtain system administrator (root)
19+
privileges. Where the kernel fails to ensure this, we have a "local root"
20+
vulnerability, a Local Privilege Escalation (LPE).
21+
22+
The Linux kernel is a large and complex beast. On one hand it has sophisticated
23+
mechanisms to get really good performance out of increasingly complex hardware.
24+
On the other hand, it comes with a huge variety of device drivers. From time to
25+
time, vulnerabilities are found, reported and fixed. The Linux kernel has several
26+
LPEs per year. Most of the time, they affect only a small fraction of users
27+
(typically by being located in a device driver or somewhat exotic feature)
28+
and often they are hard to exploit, needing to win a race condition with
29+
many attempts and sometimes causing crashes along the way (which may well be noticed).
30+
31+
We don't normally report on these LPEs. They get fixed by the upstream Linux kernel
32+
developers, shipped as stable updates by the maintainers and shipped to the end
33+
users via kernel updates from the Linux distributors.
34+
35+
## copy.fail and Dirty Frag
36+
37+
The currently highly visible Linux kernel issues [copy.fail](https://copy.fail/)
38+
and [Dirty Frag](https://github.com/V4bel/dirtyfrag) are both LPEs (local root
39+
vulnerabilities). The reason we report on them is that they both affect
40+
most Linux users (with kernels from the last 9 years) and are easy to exploit.
41+
42+
Like [Dirty Pipe](https://dirtypipe.cm4all.com/) and, before it,
43+
[Dirty Cow](https://dirtycow.ninja/), both LPEs rely on improper protection
44+
of the page cache.
45+
The Linux kernel keeps contents from file systems in the page cache; when code
46+
gets executed, it is mapped into your virtual memory. When the memory page is
47+
accessed and not yet loaded into your physical memory, a page fault occurs and
48+
the relevant blocks are loaded from disk — or the access is denied and your
49+
program receives `SIGSEGV` and is terminated. Copying pages is costly and the
50+
kernel avoids it to achieve higher performance. If you write to a memory page,
51+
the kernel may receive a page fault on a read-only mapping (that it created to
52+
avoid copying) and only then do the copy to create a private writable copy.
53+
This approach is called copy-on-write (COW) and is common in modern operating
54+
systems. If a page from the page cache is changed in memory, it is also marked
55+
"dirty", so the kernel knows it needs to write the changes back to the file system.
56+
57+
In copy.fail, the `aead` crypto module did some cryptography in place, avoiding
58+
the need to allocate an extra buffer. Unfortunately, it requires 4 extra bytes
59+
under some conditions; normally aead is used by IPsec and that location is a
60+
designated place in a network buffer. However, a local attacker can make this
61+
write happen to a page cache page by using `splice`. This way, the copy of the
62+
`sudo` binary in the page cache can be overwritten, allowing the attacker to circumvent the
63+
safeguards there. The attacker can trivially become root — as the page is not
64+
dirtied, no trace of the corruption will be visible on the disk.
65+
[copy.fail](https://copy.fail/) has been assigned CVE-2026-31431.
66+
67+
In Dirty Frag, a network buffer that is split over several fragments is not
68+
properly handled and the fragmented buffer is not properly COW'ed. The AEAD
69+
crypto operation then again overwrites 4 bytes. A local attacker can trigger
70+
this and again become root very quickly by overwriting the page cache's view of
71+
`sudo`. (Of course other sensitive binary code could be overwritten in memory.)
72+
This can be triggered via the IPsec `esp_input` (for both IPv4 and IPv6) as well
73+
as via the `rxrpc` code. The esp variant requires the privilege to create user
74+
namespaces and then allows easy writes of 4 bytes at a time. It has been assigned
75+
CVE-2026-43284. The rxrpc variant overwrites 8 bytes and does not require the
76+
namespace creation privileges, but as these bytes are encrypted,
77+
the attacker needs to brute-force them in order to achieve a controlled result. This
78+
variant was assigned CVE-2026-43500.
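Since the esp variant depends on being able to create user namespaces, it can be useful to see whether they are switched off entirely on a given machine. The following is a minimal sketch that reads the mainline sysctl file `user.max_user_namespaces`; the function name `userns_limit` and its optional path argument are illustrative assumptions, the argument exists only so the function can be tried on sample data. A non-zero limit does not prove that unprivileged users may create namespaces, because distributions may add restrictions of their own, and the result says nothing about the rxrpc variant.

```bash
#!/bin/bash
# Sketch: report whether user namespace creation is disabled outright.
# userns_limit is a hypothetical helper; the default path is the mainline
# sysctl file for user.max_user_namespaces.
userns_limit() {
    limit_file="${1:-/proc/sys/user/max_user_namespaces}"
    if [ ! -r "$limit_file" ]; then
        # File missing or unreadable: no statement possible.
        echo "unknown"
        return 0
    fi
    limit=$(cat "$limit_file")
    if [ "$limit" = "0" ]; then
        echo "disabled"
    else
        echo "enabled (limit $limit)"
    fi
}

userns_limit
```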
79+
80+
_Exploiting these vulnerabilities requires access to the system and the ability
81+
to execute code there, thus the categorization as Local Privilege Escalation (LPE),
82+
not Remote Code Execution (RCE)._
83+
84+
## Impact
85+
86+
Any system where normal (non-root) users can log in to execute code under their
87+
own control is no longer secure: The users can use the publicly available
88+
exploits to gain root privileges and get access to whatever the (virtual)
89+
machine has access to. This means accessing other users' data as well as secrets
90+
that may be stored by the system administrator.
91+
92+
Such systems are less common these days than they were 20 years ago. The reason
93+
is that virtualization has become a commodity, so in many scenarios, individual
94+
users may use their own virtual machine rather than having access to a shared
95+
(virtual) machine.
96+
97+
Note that these vulnerabilities do NOT break the isolation of virtual machines.
98+
VMs remain as securely isolated as they would be without these vulnerabilities.
99+
These LPEs do NOT establish a virtualization escape.
100+
101+
There is however a common scenario where individual users and workloads
102+
are running inside a container. The LPEs also allow escaping from containers.
103+
Running a shell inside a Kubernetes pod allows you to get control of the
104+
Kubernetes node and thus of everything that your Kubernetes cluster has
105+
access to. Running untrusted code in a container is thus very risky — something
106+
that will affect e.g. CI setups.
107+
108+
## Fixes
109+
110+
A fix to the Linux kernel for copy.fail was silently merged at the end of March
111+
2026 (for 7.0-rc7) and has also been merged into the stable kernel series (6.18.22,
112+
6.12.85, 6.6.137).
113+
It just disables the in-place optimization for `algif_aead`. As of early May,
114+
Linux distributors are in the process of shipping fixed kernels.
115+
Without a fixed kernel, a workaround is to place a file `copyfail.conf` in
116+
`/etc/modprobe.d/` with the contents:
117+
118+
```shell
119+
# Temporary workaround for copy.fail CVE-2026-31431
120+
install algif_aead /bin/false
121+
```
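An `install ... /bin/false` line only affects future load attempts through `modprobe`; a module that is already loaded stays loaded until it is removed or the machine reboots. The sketch below lists which of the given modules currently appear in `/proc/modules`. The helper name `loaded_modules` and its file argument are assumptions made for illustration, the file is passed explicitly so the function can be tried on sample data.

```bash
#!/bin/bash
# Sketch: print each of the named modules that is currently loaded.
# loaded_modules is a hypothetical helper, not part of any tool.
# Usage: loaded_modules <modules-file> <module>...
loaded_modules() {
    modules_file="$1"; shift
    for mod in "$@"; do
        # Each line of /proc/modules starts with the module name and a space.
        if grep -q "^${mod} " "$modules_file" 2>/dev/null; then
            echo "$mod"
        fi
    done
}

loaded_modules /proc/modules algif_aead
```

Empty output means the module is not loaded as a module right now. Code that is built into the kernel rather than built as a module does not appear in this list.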
122+
123+
The fixes for Dirty Frag are still in development as of May 8. The first fixes
124+
have been merged upstream and released in 7.0.5, 6.18.28, 6.12.87, 6.6.138,
125+
6.1.172, 5.15.206 and 5.10.255 but there is
126+
[more to come for rxrpc](https://lwn.net/ml/all/2026050859-ahead-anchovy-05e2@gregkh/).
127+
The responsible disclosure process for Dirty Frag unfortunately failed due to the
128+
[patches being spotted](https://www.openwall.com/lists/oss-security/2026/05/07/12),
129+
so the upstream maintainers and the distributors this time did not have time
130+
to carefully prepare and test fixes ahead of the publication of the issue.
131+
So we have to expect that it will take a few days until all Linux distributors
132+
manage to ship tested fixed kernels.
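For machines running an upstream stable kernel, the release numbers listed above can be compared against the running kernel. The following is a rough sketch under stated assumptions: it relies on GNU `sort -V`, the function names are illustrative, and the version table only encodes the first Dirty Frag fix releases named in this advisory. It is not meaningful for distribution kernels, which typically backport fixes without changing the upstream version number, and it does not cover the rxrpc fixes still in progress.

```bash
#!/bin/bash
# Sketch: compare a kernel release against the first Dirty Frag fix
# release of its stable series. All function names are hypothetical.

# Succeeds if version $1 is at or above version $2 (GNU sort -V ordering).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# First fixed release per stable series, as listed in this advisory.
first_dirtyfrag_fix() {
    case "$1" in
        7.0)  echo 7.0.5 ;;
        6.18) echo 6.18.28 ;;
        6.12) echo 6.12.87 ;;
        6.6)  echo 6.6.138 ;;
        6.1)  echo 6.1.172 ;;
        5.15) echo 5.15.206 ;;
        5.10) echo 5.10.255 ;;
        *)    return 1 ;;
    esac
}

check_kernel() {
    # Keep only the leading numeric part, e.g. "6.6.100-custom" -> "6.6.100".
    release=$(printf '%s' "$1" | sed 's/[^0-9.].*//')
    series="${release%.*}"    # e.g. 6.12.87 -> 6.12
    if fixed=$(first_dirtyfrag_fix "$series"); then
        if version_at_least "$release" "$fixed"; then
            echo "$release: includes first Dirty Frag fixes ($fixed)"
        else
            echo "$release: older than $fixed"
        fi
    else
        echo "$release: series $series not listed"
    fi
}

check_kernel "$(uname -r)"
```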
133+
134+
A fully effective workaround is again to prevent loading the affected modules
135+
by placing another file `dirtyfrag.conf` in `/etc/modprobe.d/`:
136+
137+
```shell
138+
# Temporary workaround for Dirty Frag CVE-2026-43284, CVE-2026-43500
139+
# This breaks IPsec
140+
install esp4 /bin/false
141+
install esp6 /bin/false
142+
install rxrpc /bin/false
143+
```
144+
145+
Note that these workarounds prevent IPsec from working.
146+
147+
If a system is suspected to already have been exploited, the system owner can
148+
dispose of the page cache by doing `echo 3 > /proc/sys/vm/drop_caches` as root
149+
and unload the affected modules to prevent re-exploitation.
150+
This will discard the modified page cache pages — however an attacker could have
151+
used the gained privileges to install further backdoors etc. on the system, so
152+
it will need to be reinstalled or fully audited to be considered trustworthy again.
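The two steps described here, dropping the page cache and unloading the affected modules, could be scripted roughly as follows. This is a sketch, not a vetted incident-response procedure: the `run` wrapper, the function name `cleanup_page_cache` and the `DRY_RUN` variable are illustrative assumptions, and by default the commands are only printed. Running it for real requires root, and unloading a module fails if it is in use or not loaded.

```bash
#!/bin/bash
# Sketch: drop the page cache and unload the modules named in this advisory.
# DRY_RUN, run and cleanup_page_cache are hypothetical names.
# With DRY_RUN=1 (the default) nothing is changed on the system.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

cleanup_page_cache() {
    # Discard clean page cache pages, including any that were tampered with.
    run sh -c 'echo 3 > /proc/sys/vm/drop_caches'
    # Unload the affected modules one by one and report failures.
    for mod in algif_aead esp4 esp6 rxrpc; do
        run modprobe -r "$mod" || echo "could not unload $mod (not loaded or in use?)" >&2
    done
}

cleanup_page_cache
```

Setting `DRY_RUN=0` executes the commands instead of printing them. As stated above, this only removes the modified pages and does not make a compromised system trustworthy again.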
153+
154+
## SCS IaaS Cloud Provider exposure
155+
156+
None of the control-plane / management systems in a normal SCS cloud infrastructure
157+
can be logged in to by normal users. The LPEs thus cannot be exploited. However,
158+
should another exploit be found and used successfully, the LPEs may be used
159+
to escalate privileges further, e.g. breaking out of the containers that run
160+
the OpenStack services or Ceph or some of the management tools and thus removing
161+
one layer of a defense-in-depth concept.
162+
163+
Cloud Providers are advised to install updated kernels to reestablish the defense.
164+
They can apply the module loading prevention measures in the meantime. Providers
165+
are advised to use this with care on the network nodes — if these need to support
166+
IPsec (e.g. for OpenStack's VPNaaS which is part of Neutron), the non-loadable
167+
modules may prevent correct operation. Please note that there is no known remote
168+
exploit via IPsec, so a temporary trade-off to live without the defense-in-depth
169+
and not break IPsec (which would create security and functionality issues for
170+
customers) may be justified.
171+
172+
Cloud providers often provide VM images for their customers.
173+
To help customers maintain the security separation inside their VMs,
174+
providers are advised to watch out for the availability of new distribution images
175+
and provide them promptly via their image service (Glance).
176+
177+
## SCS Kubernetes Provider exposure
178+
179+
The default implementation with SCS Cluster Stacks is vulnerable; the current
180+
node images have a kernel that is affected by these weaknesses. This allows a user
181+
to break out of the containers running in the cluster to take over the node
182+
VM and other containers.
183+
184+
With Cluster-API and the SCS Cluster Stacks building
185+
on them, creating, updating and removing Kubernetes clusters has become
186+
a commodity; it is thus normal to create clusters per development team and
187+
not share them. In this scenario, the breakout may allow a developer to
188+
take over containers of their teammates, which may not constitute a real danger
189+
in many setups. For clusters shared across teams or, worse, for setups where several
190+
clusters that belong to different entities share a control plane, this becomes
191+
more serious.
192+
193+
Note that the LPEs also remove a layer of defense in depth: if a user of
194+
a service running in a k8s cluster exploits a vulnerability that lets them
195+
execute code in the container, the LPEs can then be used to escalate the
196+
privileges further.
197+
198+
As soon as new kernels become available, the node images will be rebuilt and
199+
shipped with the next cluster stack patch releases. For users, the normal
200+
rolling upgrade will then be all that's needed to be secure against these LPEs
201+
again.
202+
203+
We will update this advisory as soon as new node images are available.
204+
205+
For highly critical workloads, cluster operators can log in to the nodes
206+
and deploy the mechanisms to prevent loading the above-mentioned modules.
207+
(Again, this will break IPsec.) Note that logging in to nodes in an SCS
208+
Cluster Stack cluster is not possible by default; it requires booting
209+
into a rescue image (if the cluster runs on OpenStack) to inject an SSH
210+
key, or using a tool like kubectl-node-shell with the appropriate
211+
privileges.
212+
213+
```bash
214+
for node in $(kubectl get nodes | grep -v '^NAME' | awk '{print $1;}'); do
215+
kubectl node_shell "$node" -- bash -c 'echo -e "# Temporarily disable algif_aead (copy.fail)\ninstall algif_aead /bin/false" > /etc/modprobe.d/disable-aead-copyfail.conf'
216+
kubectl node_shell "$node" -- bash -c 'echo -e "# Temporarily disable esp4, esp6, rxrpc (Dirty Frag)\ninstall esp4 /bin/false\ninstall esp6 /bin/false\ninstall rxrpc /bin/false" > /etc/modprobe.d/disable-esp46-rxrpc-dirtyfrag.conf'
217+
done
218+
```
219+
220+
## SCS Cloud users
221+
222+
Customers of SCS IaaS clouds are responsible for their own VMs. For VMs
223+
that are exposed, they should use the documented workaround inside their VMs,
224+
online-update and reboot into a fixed kernel or redeploy their VMs based
225+
on a fixed upstream image.
226+
227+
Customers that do their own Kubernetes Container Cluster Management
228+
with e.g. SCS Cluster Stacks are advised to watch out for new node
229+
images and then perform the rolling upgrade. If their use scenario puts
230+
them at increased risk, they are advised to prevent the module loading
231+
in the meantime, as advised above.
232+
233+
## SCS community infrastructure
234+
235+
The SCS community infrastructure was secured on May 8 by disabling the
236+
relevant modules.
237+
238+
## Thanks
239+
240+
The authors would like to thank Taeyang Lee at Xint (who initiated the
241+
research on copy.fail) and Hyunwoo Kim (@v4bel, who discovered Dirty Frag).
242+
They would also like to thank the upstream Linux kernel maintainers and
243+
Linux distributors for their reliable work on handling the issues and
244+
getting fixes out.
245+
246+
## Sovereign Cloud Stack Security Contact
247+
248+
SCS security contact is [security@scs.community](mailto:security@scs.community), as published on
249+
[https://sovereigncloudstack.org/.well-known/security.txt](https://scs.community/.well-known/security.txt).
250+
251+
## Version history
252+
253+
- Initial Draft, v0.1, 2026-05-08, 17:15 CEST.
254+
- kubectl node-shell instructions, v0.2, 2026-05-09, 12:45 CEST.
255+
- Mention successful patching of community infra, v0.3, 2026-05-09, 13:30 CEST.
256+
- Correct facts on the failure of the responsible disclosure. Release as v1.0, 2026-05-09, 20:00 CEST.
