Merge pull request moby#5765 from AkihiroSuda/rootless
rootless: update docs and examples
crazy-max authored Mar 4, 2025
2 parents f7999fe + 3a91b50 commit 1c41f9b
Showing 11 changed files with 317 additions and 74 deletions.
111 changes: 72 additions & 39 deletions docs/rootless.md
@@ -12,34 +12,48 @@ Rootless mode allows running BuildKit daemon as a non-root user.

[RootlessKit](https://github.com/rootless-containers/rootlesskit/) needs to be installed.

```console
$ rootlesskit buildkitd
```bash
rootlesskit buildkitd
```

```console
$ buildctl --addr unix:///run/user/$UID/buildkit/buildkitd.sock build ...
```bash
buildctl --addr unix:///run/user/$UID/buildkit/buildkitd.sock build ...
```
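The socket address passed to `buildctl --addr` above depends on the daemon user's runtime directory. A minimal bash sketch for computing it instead of hard-coding it, assuming the rootless daemon places its socket under `$XDG_RUNTIME_DIR`, which on systemd hosts is usually `/run/user/$UID`. `buildkit_rootless_addr` is a hypothetical helper, not part of BuildKit.

```bash
#!/usr/bin/env bash
# Sketch: derive the rootless buildkitd socket address for buildctl.
# Assumption: the socket lives under the runtime dir, taken from the first
# argument, else $XDG_RUNTIME_DIR, else /run/user/<current UID>.
buildkit_rootless_addr() {
  local runtime_dir="${1:-${XDG_RUNTIME_DIR:-/run/user/$(id -u)}}"
  printf 'unix://%s/buildkit/buildkitd.sock\n' "$runtime_dir"
}
# Usage (not run here):
#   buildctl --addr "$(buildkit_rootless_addr)" build ...
```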

To isolate BuildKit daemon's network namespace from the host (recommended):
```console
$ rootlesskit --net=slirp4netns --copy-up=/etc --disable-host-loopback buildkitd
```
> [!TIP]
> To isolate BuildKit daemon's network namespace from the host (recommended):
> ```bash
> rootlesskit --net=slirp4netns --copy-up=/etc --disable-host-loopback buildkitd
> ```
## Running BuildKit in Rootless mode (containerd worker)
[RootlessKit](https://github.com/rootless-containers/rootlesskit/) needs to be installed.
Run containerd in rootless mode using rootlesskit following [containerd's document](https://github.com/containerd/containerd/blob/main/docs/rootless.md).
```bash
containerd-rootless.sh
CONTAINERD_NAMESPACE=default containerd-rootless-setuptool.sh install-buildkit-containerd
```
$ containerd-rootless.sh
```
Then let buildkitd join the same namespace as containerd.
<details>
<summary>Advanced guide</summary>
<p>
Alternatively, you can specify the full command line flags as follows:
```bash
containerd-rootless.sh --config /path/to/config.toml
containerd-rootless-setuptool.sh nsenter -- buildkitd --oci-worker=false --containerd-worker=true
```
$ containerd-rootless-setuptool.sh nsenter -- buildkitd --oci-worker=false --containerd-worker=true --containerd-worker-snapshotter=native
```
</p>
</details>
## Containerized deployment
@@ -48,36 +62,45 @@ See [`../examples/kubernetes`](../examples/kubernetes).
### Docker
```console
$ docker run \
```bash
docker run \
--name buildkitd \
-d \
--security-opt seccomp=unconfined \
--security-opt apparmor=unconfined \
--device /dev/fuse \
moby/buildkit:rootless --oci-worker-no-process-sandbox
$ buildctl --addr docker-container://buildkitd build ...
```
--security-opt systempaths=unconfined \
moby/buildkit:rootless
If you don't mind using `--privileged` (almost safe for rootless), the `docker run` flags can be shortened as follows:

```console
$ docker run --name buildkitd -d --privileged moby/buildkit:rootless
buildctl --addr docker-container://buildkitd build ...
```
#### About `--device /dev/fuse`
Adding `--device /dev/fuse` to the `docker run` arguments is required only if you want to use `fuse-overlayfs` snapshotter.
> [!TIP]
> If you don't mind using `--privileged` (almost safe for rootless), the `docker run` flags can be shortened as follows:
>
> ```bash
> docker run --name buildkitd -d --privileged moby/buildkit:rootless
> ```
#### About `--oci-worker-no-process-sandbox`
Justification of the `--security-opt` flags:
By adding `--oci-worker-no-process-sandbox` to the `buildkitd` arguments, BuildKit can be executed in a container without adding `--privileged` to `docker run` arguments.
However, you still need to pass `--security-opt seccomp=unconfined --security-opt apparmor=unconfined` to `docker run`.
* `seccomp=unconfined`: For allowing several syscalls such as `unshare` (used by runc) and `mount` (used by snapshotters, etc).
Note that `--oci-worker-no-process-sandbox` allows build executor containers to `kill` (and potentially `ptrace` depending on the seccomp configuration) an arbitrary process in the BuildKit daemon container.
* `apparmor=unconfined`: For allowing mounting filesystems, etc.
This flag is not needed when the host operating system does not use AppArmor.
To allow running rootless `buildkitd` without `--oci-worker-no-process-sandbox`, run `docker run` with `--security-opt systempaths=unconfined`. (For Kubernetes, set `securityContext.procMount` to `Unmasked`.)
* `systempaths=unconfined`: For disabling the masks for the `/proc` mount in the container, so that each of `ExecOp`
(corresponds to a `RUN` instruction in Dockerfile) can have a dedicated `/proc` filesystem.
`systempaths=unconfined` potentially allows reading and writing dangerous kernel files from a container, but it is safe when you are running `buildkitd` as non-root.
The `--security-opt systempaths=unconfined` flag disables the masks for the `/proc` mount in the container and potentially allows reading and writing dangerous kernel files, but it is safe when you are running `buildkitd` as non-root.
> [!TIP]
> Instead of `--security-opt systempaths=unconfined`, `buildkitd` can be also executed with `--oci-worker-no-process-sandbox` (flag of `buildkitd`, not `docker`)
> to avoid creating a new PID namespace and mounting a new `/proc` for it.
>
> Using `--oci-worker-no-process-sandbox` is discouraged, as it cannot terminate processes that did not exit during an `ExecOp`.
> Also, `--oci-worker-no-process-sandbox` allows `ExecOp` containers to `kill` (and potentially `ptrace` depending on the seccomp configuration) an arbitrary process in the BuildKit daemon container.
>
> Despite these caveats, the [Kubernetes examples](../examples/kubernetes) use `--oci-worker-no-process-sandbox`, as Kubernetes lacks the equivalent of `systempaths=unconfined`.
> (`securityContext.procMount=Unmasked` is similar, but different in the sense that it depends on `hostUsers: false`)
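The justification list above notes that `apparmor=unconfined` is only needed when the host uses AppArmor. A bash sketch of assembling the `docker run` security flags accordingly, assuming AppArmor counts as active when `/sys/module/apparmor/parameters/enabled` reads `Y`. The helper `buildkitd_security_opts` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: build the `docker run` security flags for a rootless buildkitd
# container, printed one argument per line.
# Assumption: AppArmor is treated as active when the (overridable) file
# /sys/module/apparmor/parameters/enabled starts with "Y".
buildkitd_security_opts() {
  local aa_enabled="${1:-/sys/module/apparmor/parameters/enabled}"
  local opts=(--security-opt seccomp=unconfined)
  # apparmor=unconfined only matters on hosts that use AppArmor
  if [[ -r "$aa_enabled" && "$(<"$aa_enabled")" == Y* ]]; then
    opts+=(--security-opt apparmor=unconfined)
  fi
  # Unmask /proc so that each ExecOp can get a dedicated /proc
  opts+=(--security-opt systempaths=unconfined)
  printf '%s\n' "${opts[@]}"
}
# Usage (not run here):
#   mapfile -t opts < <(buildkitd_security_opts)
#   docker run --name buildkitd -d "${opts[@]}" moby/buildkit:rootless
```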
### Change UID/GID
@@ -90,7 +113,7 @@ Actual ID (shown in the host and the BuildKit daemon container)| Mapped ID (show
... | ...
165535 | 65536
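The mapping in this table follows from the daemon user (UID 1000) plus the subordinate range in `/etc/subuid` (`user:100000:65536` in the `docker exec` output). A bash sketch of that arithmetic, with those values as assumed defaults; `mapped_id` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: translate an "actual" ID (host / BuildKit daemon container) into
# the ID seen inside build executor containers.
# Assumed defaults: daemon user 1000, subordinate range 100000:65536.
mapped_id() {
  local actual="$1" self="${2:-1000}" sub_start="${3:-100000}" sub_count="${4:-65536}"
  if (( actual == self )); then
    echo 0                                  # the daemon user appears as root
  elif (( actual >= sub_start && actual < sub_start + sub_count )); then
    echo $(( actual - sub_start + 1 ))      # subordinate IDs start at 1
  else
    echo "ID $actual is not mapped" >&2
    return 1
  fi
}
```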
```
```console
$ docker exec buildkitd id
uid=1000(user) gid=1000(user)
$ docker exec buildkitd ps aux
@@ -99,15 +122,16 @@ PID USER TIME COMMAND
13 user 0:00 /proc/self/exe buildkitd --addr tcp://0.0.0.0:1234
21 user 0:00 buildkitd --addr tcp://0.0.0.0:1234
29 user 0:00 ps aux
$ docker exec buildkitd cat /etc/subuid
user:100000:65536
```
To change the UID/GID configuration, you need to modify and build the BuildKit image manually.
```
$ vi Dockerfile
$ make images
$ docker run ... moby/buildkit:local-rootless ...
```bash
vi Dockerfile
make images
docker run ... moby/buildkit:local-rootless ...
```
## Troubleshooting
@@ -120,7 +144,9 @@ $ rootlesskit buildkitd --oci-worker-snapshotter=fuse-overlayfs
```
### Error related to `fuse-overlayfs`
Try running `buildkitd` with `--oci-worker-snapshotter=native`:
Run `docker run` with `--device /dev/fuse`.
Also try running `buildkitd` with `--oci-worker-snapshotter=native`:
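A bash sketch of choosing between the two snapshotters named in this section, assuming FUSE is usable when `/dev/fuse` is a character device (for example after `docker run --device /dev/fuse`) and a `fuse-overlayfs` binary is on `PATH`. `pick_snapshotter` is hypothetical and only chooses between these two options.

```bash
#!/usr/bin/env bash
# Sketch: pick an OCI-worker snapshotter for rootless buildkitd, falling
# back to "native" when FUSE does not look usable.
# Assumption: FUSE is usable if the device path (default /dev/fuse) is a
# character device and fuse-overlayfs is found on PATH.
pick_snapshotter() {
  local dev="${1:-/dev/fuse}"
  if [[ -c "$dev" ]] && command -v fuse-overlayfs >/dev/null 2>&1; then
    echo fuse-overlayfs
  else
    echo native
  fi
}
# Usage (not run here):
#   rootlesskit buildkitd --oci-worker-snapshotter="$(pick_snapshotter)"
```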
```console
$ rootlesskit buildkitd --oci-worker-snapshotter=native
@@ -137,12 +163,19 @@ Run `sysctl -w user.max_user_namespaces=N` (N=positive integer, like 63359) on t
See [`../examples/kubernetes/sysctl-userns.privileged.yaml`](../examples/kubernetes/sysctl-userns.privileged.yaml).
### Error `fork/exec /proc/self/exe: permission denied` with `This error might have happened because /proc/sys/kernel/apparmor_restrict_unprivileged_userns is set to 1`
Add `kernel.apparmor_restrict_unprivileged_userns=0` to `/etc/sysctl.conf` (or `/etc/sysctl.d`) and run `sudo sysctl -p`.
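A bash sketch of detecting this restriction and writing a drop-in under `/etc/sysctl.d`. The file name `99-rootless-buildkit.conf` and both helpers are hypothetical. Note that `sysctl -p` without an argument reads only `/etc/sysctl.conf`, so a drop-in has to be loaded by path or with `sysctl --system`.

```bash
#!/usr/bin/env bash
# Sketch: detect the AppArmor restriction on unprivileged user namespaces
# and write a sysctl drop-in that lifts it.
# Assumptions: paths are overridable for testing; the drop-in file name
# 99-rootless-buildkit.conf is hypothetical.
apparmor_userns_restricted() {
  local f="${1:-/proc/sys/kernel/apparmor_restrict_unprivileged_userns}"
  [[ -r "$f" && "$(<"$f")" == 1 ]]
}

write_userns_dropin() {
  local dir="${1:-/etc/sysctl.d}"
  printf 'kernel.apparmor_restrict_unprivileged_userns=0\n' \
    > "$dir/99-rootless-buildkit.conf"
}
# Usage (as root, not run here):
#   apparmor_userns_restricted && write_userns_dropin \
#     && sysctl -p /etc/sysctl.d/99-rootless-buildkit.conf
```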
### Error `mount proc:/proc (via /proc/self/fd/6), flags: 0xe: operation not permitted`
This error is known to happen when BuildKit is executed in a container without the `--oci-worker-no-sandbox` flag.
Make sure that `--oci-worker-no-process-sandbox` is specified (See [below](#docker)).
This error is known to happen when BuildKit is executed in a container without the `--security-opt systempaths=unconfined` flag.
Make sure to specify it (See [above](#docker)).
## Distribution-specific hint
Using an Ubuntu kernel is recommended.
### Ubuntu, 24.04 or later
Add `kernel.apparmor_restrict_unprivileged_userns=0` to `/etc/sysctl.conf` (or `/etc/sysctl.d`) and run `sudo sysctl -p`.
### Container-Optimized OS from Google
Make sure to have an `emptyDir` volume below:
```yaml
56 changes: 34 additions & 22 deletions examples/kubernetes/README.md
@@ -6,16 +6,26 @@ This directory contains Kubernetes manifests for `Pod`, `Deployment` (with `Serv
* `StatefulSet`: good for client-side load balancing, without registry-side cache
* `Job`: good if you don't want to have daemon pods

Using Rootless mode (`*.rootless.yaml`) is recommended because Rootless mode image is executed as non-root user (UID 1000) and doesn't need `securityContext.privileged`.
See [`../../docs/rootless.md`](../../docs/rootless.md).
## Variants

See also ["Building Images Efficiently And Securely On Kubernetes With BuildKit" (KubeCon EU 2019)](https://kccnceu19.sched.com/event/MPX5).
- `*.privileged.yaml`: Launches the Pod as the fully privileged root user.
- `*.rootless.yaml`: Launches the Pod as a non-root user, whose UID is 1000.
- `*.userns.yaml`: Launches the Pod as a non-root user. The UID is determined by kubelet.
Needs kubelet and kube-apiserver to be reconfigured to enable the
[`UserNamespacesSupport`](https://kubernetes.io/docs/tasks/configure-pod-container/user-namespaces/) feature gate.
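For the `*.userns.yaml` variant, one way to confirm that kubelet really started the container in a user namespace is to inspect `/proc/self/uid_map` inside it. A bash sketch, assuming the heuristic that the initial namespace maps the full range `0 0 4294967295`; `in_user_namespace` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: succeed if a uid_map file (default /proc/self/uid_map) describes
# a non-initial user namespace.
# Heuristic: the initial namespace maps 0 -> 0 for 4294967295 IDs.
in_user_namespace() {
  local map="${1:-/proc/self/uid_map}" inside outside count
  read -r inside outside count < "$map" || return 2
  ! [[ "$inside" == 0 && "$outside" == 0 && "$count" == 4294967295 ]]
}
# Usage (not run here):
#   kubectl exec deploy/buildkitd -- cat /proc/self/uid_map > uid_map.txt
#   in_user_namespace uid_map.txt && echo "user namespace in use"
```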

It is recommended to use `*.rootless.yaml` to minimize the chance of container breakout attacks.

See also:
- [`../../docs/rootless.md`](../../docs/rootless.md).
- ["Building Images Efficiently And Securely On Kubernetes With BuildKit" (KubeCon EU 2019)](https://kccnceu19.sched.com/event/MPX5).

## `Pod`

```console
$ kubectl apply -f pod.rootless.yaml
$ buildctl \
```bash
kubectl apply -f pod.rootless.yaml

buildctl \
--addr kube-pod://buildkitd \
build --frontend dockerfile.v0 --local context=/path/to/dir --local dockerfile=/path/to/dir
```
@@ -29,25 +39,27 @@ If rootless mode doesn't work, try `pod.privileged.yaml`.
Setting up mTLS is highly recommended.

`./create-certs.sh SAN [SAN...]` can be used for creating certificates.
```console
$ ./create-certs.sh 127.0.0.1
```bash
./create-certs.sh 127.0.0.1
```

The daemon certificates are created as a `Secret` manifest named `buildkit-daemon-certs`.
```console
$ kubectl apply -f .certs/buildkit-daemon-certs.yaml
```bash
kubectl apply -f .certs/buildkit-daemon-certs.yaml
```

Apply the `Deployment` and `Service` manifest:
```console
$ kubectl apply -f deployment+service.rootless.yaml
$ kubectl scale --replicas=10 deployment/buildkitd
```bash
kubectl apply -f deployment+service.rootless.yaml

kubectl scale --replicas=10 deployment/buildkitd
```

Run `buildctl` with TLS client certificates:
```console
$ kubectl port-forward service/buildkitd 1234
$ buildctl \
```bash
kubectl port-forward service/buildkitd 1234

buildctl \
--addr tcp://127.0.0.1:1234 \
--tlscacert .certs/client/ca.pem \
--tlscert .certs/client/cert.pem \
@@ -58,10 +70,10 @@ $ buildctl \
## `StatefulSet`
`StatefulSet` is useful for consistent hash mode.

```console
$ kubectl apply -f statefulset.rootless.yaml
$ kubectl scale --replicas=10 statefulset/buildkitd
$ buildctl \
```bash
kubectl apply -f statefulset.rootless.yaml
kubectl scale --replicas=10 statefulset/buildkitd
buildctl \
--addr kube-pod://buildkitd-4 \
build --frontend dockerfile.v0 --local context=/path/to/dir --local dockerfile=/path/to/dir
```
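`StatefulSet` pods have stable names (`buildkitd-0`, `buildkitd-1`, ...), so a client can route a given project to the same daemon and reuse its cache. As a stand-in for the tooling in `./consistenthash`, here is a bash sketch of rendezvous (highest-random-weight) hashing with `cksum`. This is a swapped-in technique, not necessarily what that directory implements, and `pick_buildkitd_pod` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: rendezvous hashing over StatefulSet pod ordinals.
# Each replica i gets the score cksum("<key>/<i>"); the highest score wins,
# ties going to the lower ordinal.
pick_buildkitd_pod() {
  local key="$1" replicas="$2" best=-1 best_i=0 i h
  for (( i = 0; i < replicas; i++ )); do
    h=$(printf '%s/%d' "$key" "$i" | cksum | cut -d' ' -f1)
    if (( h > best )); then
      best=$h
      best_i=$i
    fi
  done
  printf 'kube-pod://buildkitd-%d\n' "$best_i"
}
# Usage (not run here):
#   buildctl --addr "$(pick_buildkitd_pod my-project 10)" \
#     build --frontend dockerfile.v0 --local context=/path/to/dir --local dockerfile=/path/to/dir
```

Scaling from N to N+1 replicas only moves the keys whose new winner is the added pod, which limits cache misses after a `kubectl scale`.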
@@ -70,8 +82,8 @@ See [`./consistenthash`](./consistenthash) for how to use consistent hashing.

## `Job`

```console
$ kubectl apply -f job.rootless.yaml
```bash
kubectl apply -f job.rootless.yaml
```

To push the image to the registry, you also need to mount `~/.docker/config.json`
5 changes: 3 additions & 2 deletions examples/kubernetes/deployment+service.rootless.yaml
@@ -13,8 +13,6 @@ spec:
metadata:
labels:
app: buildkitd
annotations:
container.apparmor.security.beta.kubernetes.io/buildkitd: unconfined
# see buildkit/docs/rootless.md for caveats of rootless mode
spec:
containers:
@@ -54,6 +52,9 @@ spec:
# Needs Kubernetes >= 1.19
seccompProfile:
type: Unconfined
# Needs Kubernetes >= 1.30
appArmorProfile:
type: Unconfined
# To change UID/GID, you need to rebuild the image
runAsUser: 1000
runAsGroup: 1000
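This hunk drops the `container.apparmor.security.beta.kubernetes.io/buildkitd` annotation in favor of `securityContext.appArmorProfile`, which the manifest comment says needs Kubernetes >= 1.30. A bash sketch of checking a server version against that threshold; it compares only the major and minor numbers, and `supports_apparmor_profile_field` is hypothetical. The usage line assumes `kubectl` and `jq` are installed.

```bash
#!/usr/bin/env bash
# Sketch: succeed if a Kubernetes version string is >= 1.30, the minimum
# the manifest comment gives for securityContext.appArmorProfile.
# Simplification: only major and minor numbers are compared.
supports_apparmor_profile_field() {
  local v="${1#v}" major minor rest
  IFS=. read -r major minor rest <<< "$v"
  minor="${minor%%[!0-9]*}"   # drop non-numeric suffixes, if any
  (( major > 1 || (major == 1 && minor >= 30) ))
}
# Usage (not run here):
#   v=$(kubectl version -o json | jq -r .serverVersion.gitVersion)
#   supports_apparmor_profile_field "$v" || echo "keep the beta AppArmor annotation"
```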
77 changes: 77 additions & 0 deletions examples/kubernetes/deployment+service.userns.yaml
@@ -0,0 +1,77 @@
# Depends on feature gate UserNamespacesSupport
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: buildkitd
name: buildkitd
spec:
replicas: 1
selector:
matchLabels:
app: buildkitd
template:
metadata:
labels:
app: buildkitd
spec:
hostUsers: false
containers:
- name: buildkitd
image: moby/buildkit:master
args:
- --addr
- unix:///run/buildkit/buildkitd.sock
- --addr
- tcp://0.0.0.0:1234
- --tlscacert
- /certs/ca.pem
- --tlscert
- /certs/cert.pem
- --tlskey
- /certs/key.pem
# the probe below will only work after Release v0.6.3
readinessProbe:
exec:
command:
- buildctl
- debug
- workers
initialDelaySeconds: 5
periodSeconds: 30
# the probe below will only work after Release v0.6.3
livenessProbe:
exec:
command:
- buildctl
- debug
- workers
initialDelaySeconds: 5
periodSeconds: 30
securityContext:
# Not really privileged
privileged: true
ports:
- containerPort: 1234
volumeMounts:
- name: certs
readOnly: true
mountPath: /certs
volumes:
# buildkit-daemon-certs must contain ca.pem, cert.pem, and key.pem
- name: certs
secret:
secretName: buildkit-daemon-certs
---
apiVersion: v1
kind: Service
metadata:
labels:
app: buildkitd
name: buildkitd
spec:
ports:
- port: 1234
protocol: TCP
selector:
app: buildkitd
4 changes: 2 additions & 2 deletions examples/kubernetes/job.privileged.yaml
@@ -8,11 +8,11 @@ spec:
restartPolicy: Never
initContainers:
- name: prepare
image: alpine:3.10
image: busybox
command:
- sh
- -c
- "echo FROM hello-world > /workspace/Dockerfile"
- "echo -e 'FROM alpine\nRUN apk add gcc\n' > /workspace/Dockerfile"
volumeMounts:
- name: workspace
mountPath: /workspace
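The `prepare` init container relies on `echo -e`, which busybox `sh` understands but which is not portable (dash, for example, prints `-e` literally), and the trailing `\n` plus echo's own newline leaves an empty last line. A bash sketch of the same step with `printf`, for reuse outside this manifest; `write_test_dockerfile` is hypothetical and takes the target directory (`/workspace` in the manifest) as a parameter.

```bash
#!/usr/bin/env bash
# Sketch: write the same two-line Dockerfile as the init container,
# using printf so the output does not depend on the shell's echo.
write_test_dockerfile() {
  local dir="${1:-/workspace}"
  printf 'FROM alpine\nRUN apk add gcc\n' > "$dir/Dockerfile"
}
```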