@@ -189,6 +189,112 @@ In addition to any devices configured with this setting, the runtime MUST also s
* [`/dev/ptmx`][pts.4].
  A [bind-mount or symlink of the container's `/dev/pts/ptmx`][devpts].

+ ## <a name="configLinuxNetworkDevices" />Network Devices
+
+ Linux network devices are entities that send and receive data packets. They are
+ not represented as files in the `/dev` directory. Instead, they are represented
+ by the [`net_device`][net_device] data structure in the Linux kernel. A network
+ device can belong to only one network namespace at a time and uses a set of
+ operations distinct from regular file operations. Network devices can be
+ categorized as **physical** or **virtual**:
+
+ * **Physical network devices** correspond to hardware interfaces, such as
+   Ethernet cards (e.g., `eth0`, `enp0s3`). They are directly associated with
+   physical network hardware.
+ * **Virtual network devices** are software-defined interfaces, such as loopback
+   devices (`lo`), virtual Ethernet pairs (`veth`), bridges (`br0`), VLANs, and
+   MACVLANs. They are created and managed by the kernel and do not correspond
+   to physical hardware.
+
+ This schema focuses solely on moving existing network devices identified by name
+ from the host network namespace into the container network namespace. It does
+ not cover the complexities of network device creation or network configuration,
+ such as IP address assignment, routing, and DNS setup.
+
+ **`netDevices`** (object, OPTIONAL) - A set of network devices that MUST be made
+ available in the container. The runtime is responsible for moving these devices;
+ the underlying mechanism is implementation-defined.
+
+ The entry key is the name of the network device in the host network namespace.
+ Entry values are objects with the following properties:
+
+ * **`name`** *(string, OPTIONAL)* - the name of the network device inside the
+   container namespace. If not specified, the device keeps the name it has on
+   the host.
+
+ The runtime MUST check if moving the network interface to the container
+ namespace is possible. If a network device with the specified name already
+ exists in the container namespace, the runtime MUST
+ [generate an error](runtime.md#errors), unless the user has provided a template
+ by appending `%d` to the new name. In that case, the runtime MUST allow the
+ move, and the kernel will generate a unique name for the interface within the
+ container's network namespace.
+
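As an illustrative aside (not part of the change itself), the naming rules above can be sketched in Python. The names `resolve_container_name`, `NetDeviceError`, and the `existing` set are hypothetical. The sketch also assumes that a `%d` template resolves to the lowest unused index. That models the kernel's name allocation; a runtime would leave that step to the kernel rather than implement it.

```python
class NetDeviceError(Exception):
    """Raised when a device cannot be moved under the requested name."""


def resolve_container_name(host_name, entry, existing):
    """Return the name the device would carry inside the container.

    host_name: key of the netDevices entry (device name on the host).
    entry:     the entry value, e.g. {"name": "net%d"} or {}.
    existing:  set of interface names already in the container namespace.
    """
    # An omitted "name" means the device keeps its host name.
    requested = entry.get("name") or host_name

    if requested.endswith("%d"):
        # Template case: assume the lowest free index is chosen
        # (an approximation of the kernel's behaviour, see lead-in).
        prefix = requested[:-2]
        index = 0
        while f"{prefix}{index}" in existing:
            index += 1
        return f"{prefix}{index}"

    if requested in existing:
        # Fixed name that collides: the runtime must report an error.
        raise NetDeviceError(
            f"interface {requested!r} already exists in the container namespace"
        )
    return requested
```

For example, with `{"lo", "net0"}` already present in the container namespace, an entry `"eth0": {"name": "net%d"}` would resolve to `net1` under these assumptions, while `"eth0": {"name": "net0"}` would be rejected.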
+ The runtime MUST preserve the existing network interface attributes, as defined
+ by the kernel, including IP addresses, enabling users to preconfigure the
+ interfaces.
+
+ The runtime MUST set the network device state to "up" after moving it to the
+ network namespace to allow the container to send and receive network traffic
+ through that device.
+
+ ### Namespace Lifecycle and Container Termination
+
+ The runtime MUST NOT actively manage the interface's lifecycle and configuration
+ *within* the container's network namespace. This is because network interfaces
+ are inherently tied to the network namespace itself, and their lifecycle is
+ therefore managed by the owner of the network namespace. Typically, this
+ ownership and management are handled by higher-level container runtime
+ orchestrators, rather than the processes running directly within the container.
+
+ The runtime **MUST NOT** attempt to move the interface out of the namespace
+ before the namespace is deleted. This design decision is based on the following:
+
+ * **Namespace Ownership:** Network interfaces are tied to the network namespace,
+   which may not always be directly managed by the runtime.
+ * **Abrupt Termination:** Even when the runtime manages the namespace, it cannot
+   reliably participate in its deletion if the container's processes terminate
+   abruptly (e.g., due to a crash).
+
+ During network namespace deletion, the kernel's built-in namespace cleanup
+ mechanisms take over, as described in [network_namespaces(7)][net_namespaces.7]:
+ "When a network namespace is freed (i.e., when the last process in the namespace
+ terminates), its physical network devices are moved back to the initial network
+ namespace." All physical network devices that can be migrated between network
+ namespaces are moved to the initial network namespace, while virtual devices
+ (veth, macvlan, ...) are destroyed.
+
+ If users require custom handling of the interface lifecycle during namespace
+ deletion, they can use existing features of the namespace orchestrator or
+ employ post-stop hooks.
+
+ **Physical Interface Renaming and Systemd**
+
+ When a physical interface is renamed within a container and the container's
+ network namespace is later deleted, the kernel will move the interface back to
+ the root namespace under its new name. In case of a name conflict in the root
+ namespace, the kernel will rename it to `dev%d`. To ensure predictable interface
+ names in the root namespace, users can use systemd's `udevd` and `networkd`
+ rules. Refer to [systemd Predictable Network Interface Names][predictable-network-interfaces-names]
+ for more information on configuring predictable names.
+
+ ### Example
+
+ #### Moving a device with a renamed interface inside the container:
+
+ ```json
+ "netDevices": {
+     "eth0": {
+         "name": "container_eth0"
+     }
+ }
+ ```
+
## <a name="configLinuxControlGroups" />Control groups

Also known as cgroups, they are used to restrict resource usage for a container and handle device access.
@@ -975,6 +1081,9 @@ subset of the available options.
[mknod.1]: https://man7.org/linux/man-pages/man1/mknod.1.html
[mknod.2]: https://man7.org/linux/man-pages/man2/mknod.2.html
[namespaces.7_2]: https://man7.org/linux/man-pages/man7/namespaces.7.html
+ [net_device]: https://docs.kernel.org/networking/netdevices.html
+ [net_namespaces.7]: https://man7.org/linux/man-pages/man7/network_namespaces.7.html
+ [predictable-network-interfaces-names]: https://systemd.io/PREDICTABLE_INTERFACE_NAMES
[null.4]: https://man7.org/linux/man-pages/man4/null.4.html
[personality.2]: https://man7.org/linux/man-pages/man2/personality.2.html
[pts.4]: https://man7.org/linux/man-pages/man4/pts.4.html