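ospp project (feature) add namespace overlayfs cgroup (#949)

## Development progress

## namespace
- pid_namespace: basic implementation; isolation is built on data structures such as pid_struct
- mnt_namespace: basic implementation; mount points are isolated by giving each namespace its own mount tree
- user_namespace: a supporting namespace, currently limited to a single global static instance

## overlayfs
- Stacks several file systems: mount takes multiple paths, which serve as the mount paths of the individual file systems plus the path of the final merged layer
- Copy-up mechanism: every layer except the topmost is read-only, which gives copy-on-write; a file that has to be modified is first copied to the upper layer and modified there
- Whiteout special files: mark files in a lower layer as deleted, masking them from the merged view

## cgroups
- cgroups are still at the framework stage; concrete subsystems such as memory and CPU will be implemented later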
1 parent 84c528f, commit f5b2038
Showing 43 changed files with 2,279 additions and 56 deletions.
@@ -0,0 +1,13 @@
====================================
Containerization
====================================

This is the documentation for containerization in DragonOS.

It mainly covers namespace, overlayfs, and cgroup.

.. toctree::
   :maxdepth: 2

   namespaces/index
   filesystem/unionfs/index
@@ -0,0 +1,14 @@
====================================
Namespaces
====================================

DragonOS namespaces currently support pid_namespace and mnt_namespace; further refinement is planned.
Namespaces are an important building block of containerization.

Because the OS is currently single-user, user_namespace is a global static.

.. toctree::
   :maxdepth: 1

   pid_namespace
   mnt_namespace
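@@ -0,0 +1,19 @@
# Mount namespace

## Underlying architecture

pcb -> nsproxy -> mnt_namespace

Each mounted file system has its own independent mount point. In terms of data structures, this is a red-black tree of mounts. Mounts are independent in each namespace, so mounting or unmounting a file system in one namespace does not affect the others.

## System call interface

- clone
    - CLONE_NEWNS creates a new mount namespace, providing independent file system mount points.
- unshare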
    - Calling unshare() with the CLONE_NEWNS flag moves the calling process into a new mount namespace; child processes created afterwards inherit it.
- setns
    - Joins the process to the specified namespace.
- chroot
    - Changes the root directory of the current process to the specified path, providing file system isolation.
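
The pcb -> nsproxy -> mnt_namespace chain and the effect of CLONE_NEWNS can be sketched in user-space Rust. This is only an illustration under stated assumptions: `Pcb`, `NsProxy`, `MntNamespace`, and `clone_process` are hypothetical stand-ins rather than the DragonOS kernel types, a `BTreeMap` plays the role of the red-black tree of mounts, and the flag value is the one Linux uses.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};

/// Hypothetical mount namespace: an ordered map from mount point to
/// file system name (standing in for the red-black tree of mounts).
#[derive(Debug, Default)]
pub struct MntNamespace {
    mounts: Mutex<BTreeMap<String, String>>,
}

impl MntNamespace {
    pub fn mount(&self, path: &str, fs: &str) {
        self.mounts
            .lock()
            .unwrap()
            .insert(path.to_string(), fs.to_string());
    }

    /// Returns true if something was mounted at `path`.
    pub fn umount(&self, path: &str) -> bool {
        self.mounts.lock().unwrap().remove(path).is_some()
    }

    pub fn is_mounted(&self, path: &str) -> bool {
        self.mounts.lock().unwrap().contains_key(path)
    }

    /// Copy the current mount table into a fresh, independent namespace.
    fn duplicate(&self) -> MntNamespace {
        MntNamespace {
            mounts: Mutex::new(self.mounts.lock().unwrap().clone()),
        }
    }
}

/// Hypothetical nsproxy: the per-process bundle of namespace references.
#[derive(Debug, Clone)]
pub struct NsProxy {
    pub mnt_ns: Arc<MntNamespace>,
}

/// Hypothetical process control block.
#[derive(Debug)]
pub struct Pcb {
    pub nsproxy: NsProxy,
}

/// Same numeric value as the Linux CLONE_NEWNS flag.
pub const CLONE_NEWNS: u64 = 0x0002_0000;

impl Pcb {
    /// The first process, living in an empty initial mount namespace.
    pub fn new_init() -> Pcb {
        Pcb {
            nsproxy: NsProxy {
                mnt_ns: Arc::new(MntNamespace::default()),
            },
        }
    }

    /// clone(): share the parent's mount namespace unless CLONE_NEWNS is
    /// set, in which case the child gets a private copy of the mount table.
    pub fn clone_process(&self, flags: u64) -> Pcb {
        let mnt_ns = if flags & CLONE_NEWNS != 0 {
            Arc::new(self.nsproxy.mnt_ns.duplicate())
        } else {
            Arc::clone(&self.nsproxy.mnt_ns)
        };
        Pcb {
            nsproxy: NsProxy { mnt_ns },
        }
    }
}
```

A child created with `clone_process(CLONE_NEWNS)` starts with the parent's mounts but later mounts and unmounts stay private, while a child created with `clone_process(0)` keeps sharing the parent's table.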
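@@ -0,0 +1,21 @@
# Process namespace
:::{note}
Author: 操丰毅 [email protected]

October 30, 2024
:::
pid_namespace is a kind of namespace in the kernel that implements process isolation: processes running in different namespaces each get an independent view of PIDs.

## Underlying architecture

pcb -> nsproxy -> pid_namespace
- Each pid_namespace has its own PID allocator and its own orphan-process reaper, and manages the PIDs inside it independently.
- Detailed information about each process is stored in the proc file system. The entry found under a given PID number holds that process's information, recorded as seen from its pid_namespace.
- Limits on pid_namespace and similar resources are controlled and managed by ucount.

## System call interface

- clone
    - CLONE_NEWPID creates a new PID namespace. With this flag, the child process runs inside the new PID namespace, where process IDs start from 1.
- unshare
    - After unshare() is called with the CLONE_NEWPID flag, all child processes created afterwards run in the new namespace.
- getpid
    - Calling getpid() inside a namespace returns the process ID of the process in its current PID namespace.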
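
The per-namespace allocator and the namespace-relative result of getpid() can be sketched as follows. This is a simplified user-space model, not the DragonOS code: `PidNamespace`, `PidStruct`, `alloc_in`, and `pid_at` are hypothetical names, and giving a process one number per namespace level it is visible in follows the Linux design, which is an assumption here.

```rust
/// Hypothetical PID namespace with its own allocator; PIDs start at 1.
#[derive(Debug)]
pub struct PidNamespace {
    /// Nesting level: 0 for the root namespace.
    pub level: usize,
    next_pid: u32,
}

impl PidNamespace {
    pub fn new(level: usize) -> Self {
        Self { level, next_pid: 1 }
    }

    /// Hand out the next free PID in this namespace.
    pub fn alloc(&mut self) -> u32 {
        let pid = self.next_pid;
        self.next_pid += 1;
        pid
    }
}

/// Hypothetical per-process PID record: one number per namespace the
/// process is visible in, outermost (root) namespace first.
#[derive(Debug)]
pub struct PidStruct {
    numbers: Vec<u32>,
}

impl PidStruct {
    /// Allocate a PID in every namespace of `chain`, from the root down to
    /// the process's own namespace. Returns None for an empty chain.
    pub fn alloc_in(chain: &mut [PidNamespace]) -> Option<Self> {
        if chain.is_empty() {
            return None;
        }
        Some(Self {
            numbers: chain.iter_mut().map(|ns| ns.alloc()).collect(),
        })
    }

    /// getpid(): the PID as seen from the process's own (innermost) namespace.
    pub fn getpid(&self) -> u32 {
        self.numbers[self.numbers.len() - 1]
    }

    /// The PID as seen from the namespace at `level`, if visible there.
    pub fn pid_at(&self, level: usize) -> Option<u32> {
        self.numbers.get(level).copied()
    }
}
```

In this model the first process created in a new namespace sees itself as PID 1, while the root namespace still tracks it under a different number.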
@@ -13,4 +13,5 @@ todo: the file system module is being refactored, so the documentation is temporarily unavailable; expected in 2023-4
   vfs/index
   sysfs
   kernfs
   unionfs/index
@@ -0,0 +1,10 @@
====================================
Union File System
====================================
Union Filesystem:
OverlayFS merges multiple file systems (called "layers") into a single logical file system, so that users see one unified directory structure.

.. toctree::
   :maxdepth: 1

   overlayfs
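@@ -0,0 +1,26 @@
# overlayfs

OverlayFS is one of the most widely used union file systems. Its principle is simple and it is easy to use; it is mainly used in containers.
In Docker, OverlayFS is one of the default storage drivers. Docker creates a separate upper directory for each container, while containers started from the same image share its read-only lower layers. This design makes resource sharing between containers more efficient and reduces storage requirements.
## Architecture
overlayfs has two main layers plus a virtual merged layer:
- Lower Layer: usually a read-only file system. It may consist of multiple layers.
- Upper Layer: the writable layer; all write operations are performed here.
- Merged Layer: the logical view obtained by merging the upper and lower layers, presented to the user as the final file system.


## How it works
- Read:
    - OverlayFS reads from the Upper Layer first. If the file does not exist there, it reads the content from the Lower Layer.
- Write:
    - If a file lives in the Lower Layer and a write is attempted, the system copies it up to the Upper Layer and writes there. If the file already exists in the Upper Layer, the write goes to it directly.
- Delete:
    - When a file is deleted, OverlayFS creates an entry marked as a whiteout in the upper layer, which hides the file in the lower layer.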
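
The read and delete rules above can be sketched with in-memory maps standing in for the layers. This is a minimal model under stated assumptions, not the DragonOS implementation: `Overlay`, `UpperEntry`, and all method names are hypothetical, and files are just byte vectors keyed by name.

```rust
use std::collections::HashMap;

/// What the upper layer records for a name.
#[derive(Debug, Clone, PartialEq)]
pub enum UpperEntry {
    File(Vec<u8>),
    /// Marks a name as deleted, hiding any lower-layer file.
    Whiteout,
}

/// Hypothetical overlay: one writable upper layer over read-only lowers.
#[derive(Debug, Default)]
pub struct Overlay {
    upper: HashMap<String, UpperEntry>,
    /// Read-only lower layers, highest priority first.
    lowers: Vec<HashMap<String, Vec<u8>>>,
}

impl Overlay {
    pub fn new(lowers: Vec<HashMap<String, Vec<u8>>>) -> Self {
        Self {
            upper: HashMap::new(),
            lowers,
        }
    }

    /// Create (or overwrite) a file directly in the upper layer.
    pub fn create(&mut self, name: &str, data: &[u8]) {
        self.upper
            .insert(name.to_string(), UpperEntry::File(data.to_vec()));
    }

    /// Read through the merged view: upper first, then the lower layers.
    /// A whiteout in the upper layer hides everything below it.
    pub fn read(&self, name: &str) -> Option<&[u8]> {
        match self.upper.get(name) {
            Some(UpperEntry::File(data)) => Some(data.as_slice()),
            Some(UpperEntry::Whiteout) => None,
            None => self
                .lowers
                .iter()
                .find_map(|layer| layer.get(name))
                .map(|data| data.as_slice()),
        }
    }

    /// Delete from the merged view. Lower layers are never modified: if the
    /// name exists below, a whiteout is recorded in the upper layer instead.
    /// Returns false if the file was not visible in the first place.
    pub fn remove(&mut self, name: &str) -> bool {
        if self.read(name).is_none() {
            return false;
        }
        let in_lower = self.lowers.iter().any(|layer| layer.contains_key(name));
        if in_lower {
            self.upper.insert(name.to_string(), UpperEntry::Whiteout);
        } else {
            self.upper.remove(name);
        }
        true
    }

    /// True if the upper layer currently holds a whiteout for `name`.
    pub fn is_whiteout(&self, name: &str) -> bool {
        matches!(self.upper.get(name), Some(UpperEntry::Whiteout))
    }
}
```

Deleting a file that exists only in the upper layer simply removes it; a whiteout is needed only when a lower layer would otherwise make the name reappear.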

## Copy-up
- Copy-on-write
  When a file from the lower layer is modified, it is copied to the upper layer (this is called copy-up). All later modifications happen on the copy in the upper layer.
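
A minimal sketch of copy-up on write, assuming a single lower layer and whole-file copies. `CowOverlay`, `OvlError`, and the method names are hypothetical and chosen for illustration only; the kernel code operates on inodes rather than on byte vectors.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum OvlError {
    NotFound,
}

/// Hypothetical two-layer overlay used to illustrate copy-up.
#[derive(Debug, Default)]
pub struct CowOverlay {
    upper: HashMap<String, Vec<u8>>,
    /// Read-only: nothing in this sketch ever mutates it.
    lower: HashMap<String, Vec<u8>>,
}

impl CowOverlay {
    pub fn with_lower(lower: HashMap<String, Vec<u8>>) -> Self {
        Self {
            upper: HashMap::new(),
            lower,
        }
    }

    /// Copy a lower-layer file into the upper layer if it is not there yet.
    /// Returns Ok(true) if a copy was made, Ok(false) if it already existed.
    pub fn copy_up(&mut self, name: &str) -> Result<bool, OvlError> {
        if self.upper.contains_key(name) {
            return Ok(false);
        }
        let data = self.lower.get(name).ok_or(OvlError::NotFound)?.clone();
        self.upper.insert(name.to_string(), data);
        Ok(true)
    }

    /// Write `buf` at `offset`; the write always lands in the upper layer.
    pub fn write_at(&mut self, name: &str, offset: usize, buf: &[u8]) -> Result<usize, OvlError> {
        self.copy_up(name)?;
        let file = self.upper.get_mut(name).ok_or(OvlError::NotFound)?;
        let end = offset + buf.len();
        if file.len() < end {
            // Grow the file, zero-filling any gap before `offset`.
            file.resize(end, 0);
        }
        file[offset..end].copy_from_slice(buf);
        Ok(buf.len())
    }

    /// Merged view: the upper copy wins over the lower one.
    pub fn read(&self, name: &str) -> Option<&[u8]> {
        self.upper
            .get(name)
            .or_else(|| self.lower.get(name))
            .map(|data| data.as_slice())
    }

    /// Contents of the lower layer only, to show that it stays untouched.
    pub fn lower(&self, name: &str) -> Option<&[u8]> {
        self.lower.get(name).map(|data| data.as_slice())
    }
}
```

After the first write the merged view shows the modified upper copy, while the lower layer still holds the original bytes.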
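

## Implementation logic
An OvlInode is constructed that implements the IndexNode trait and represents the inode of the upper or lower layer. The concrete file and directory operations are all in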
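@@ -0,0 +1,6 @@
use super::CgroupSubsysState;

/// Memory controller state for one cgroup (framework only for now).
struct MemCgroup {
    /// Generic per-subsystem state shared with the cgroup core.
    css: CgroupSubsysState,
    /// Identifier of this memory cgroup.
    id: u32,
}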
@@ -0,0 +1,48 @@
#![allow(dead_code, unused_variables, unused_imports)]
pub mod mem_cgroup;

// `Weak` comes from `sync` rather than `rc` so that it pairs with `Arc`.
use alloc::{
    collections::LinkedList,
    sync::{Arc, Weak},
    vec::Vec,
};

use alloc::boxed::Box;

use crate::filesystem::vfs::IndexNode;

pub struct Cgroup {
    css: Weak<CgroupSubsysState>,
    /// Depth of this cgroup in the hierarchy
    level: u32,
    /// Maximum supported depth
    max_depth: u32,
    /// Number of visible descendants
    nr_descendants: u32,
    /// Number of dying descendants
    nr_dying_descendants: u32,
    /// Maximum number of descendants allowed
    max_descendants: u32,
    /// Number of css_sets
    nr_populated_csets: u32,
    /// Count of child groups that contain tasks
    nr_populated_domain_children: u32,
    /// Count of threaded child groups that contain tasks
    nr_populated_threaded_children: u32,
    /// Number of live threaded child cgroups
    nr_threaded_children: u32,
    /// Inode associated with this cgroup
    kernfs_node: Box<dyn IndexNode>,
}

/// Per-subsystem state used to account for the controlled resource
pub struct CgroupSubsysState {
    cgroup: Arc<Cgroup>,
    /// Sibling nodes
    sibling: LinkedList<Arc<Cgroup>>,
    /// Child nodes
    children: LinkedList<Arc<Cgroup>>,
}

pub struct CgroupSubsys {}

/// A set of CgroupSubsysState entries
pub struct CssSet {
    subsys: Vec<Arc<CgroupSubsysState>>,
}
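
The commit only defines the data layout, so how `level`, `max_depth`, `nr_descendants`, and `max_descendants` will interact is not specified. One plausible reading, borrowed from the Linux cgroup v2 hierarchy limits and therefore an assumption here, is that creating a child checks every ancestor's limits first. The `CgroupTree`, `CgroupNode`, and `create_child` names below are hypothetical, and the tree is a plain index-based arena rather than the `Arc`/`Weak` structure above.

```rust
/// Hypothetical, trimmed-down cgroup node with only the limit counters.
#[derive(Debug)]
pub struct CgroupNode {
    parent: Option<usize>,
    level: u32,
    /// Assumed meaning: how many levels of descendants may exist below.
    max_depth: u32,
    nr_descendants: u32,
    max_descendants: u32,
}

/// Arena of nodes addressed by index.
#[derive(Debug, Default)]
pub struct CgroupTree {
    nodes: Vec<CgroupNode>,
}

impl CgroupTree {
    /// Create a tree with a single root and return the root's id.
    pub fn new_root(max_depth: u32, max_descendants: u32) -> (Self, usize) {
        let root = CgroupNode {
            parent: None,
            level: 0,
            max_depth,
            nr_descendants: 0,
            max_descendants,
        };
        (Self { nodes: vec![root] }, 0)
    }

    /// Create a child under `parent`, or return None if the parent id is
    /// invalid or any ancestor's depth or descendant limit would be exceeded.
    pub fn create_child(
        &mut self,
        parent: usize,
        max_depth: u32,
        max_descendants: u32,
    ) -> Option<usize> {
        // Pass 1: check limits on every ancestor, nearest first.
        let mut depth_below = 1u32; // distance from the ancestor to the new child
        let mut cur = Some(parent);
        while let Some(id) = cur {
            let node = self.nodes.get(id)?;
            if node.nr_descendants >= node.max_descendants || depth_below > node.max_depth {
                return None;
            }
            depth_below += 1;
            cur = node.parent;
        }

        // Pass 2: commit by bumping the descendant count of every ancestor.
        let level = self.nodes[parent].level + 1;
        let mut cur = Some(parent);
        while let Some(id) = cur {
            self.nodes[id].nr_descendants += 1;
            cur = self.nodes[id].parent;
        }
        self.nodes.push(CgroupNode {
            parent: Some(parent),
            level,
            max_depth,
            nr_descendants: 0,
            max_descendants,
        });
        Some(self.nodes.len() - 1)
    }

    pub fn level(&self, id: usize) -> Option<u32> {
        self.nodes.get(id).map(|n| n.level)
    }

    pub fn nr_descendants(&self, id: usize) -> Option<u32> {
        self.nodes.get(id).map(|n| n.nr_descendants)
    }
}
```

Checking all ancestors before mutating any counter keeps a rejected creation from leaving partially updated counts behind.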
@@ -0,0 +1,41 @@
use super::OvlInode;
use crate::{
    filesystem::vfs::{IndexNode, Metadata},
    libs::spinlock::SpinLock,
};
use alloc::sync::Arc;
use system_error::SystemError;

impl OvlInode {
    /// Copy this inode's data from the lower layer into the upper layer.
    /// Does nothing if an upper inode already exists.
    pub fn copy_up(&self) -> Result<(), SystemError> {
        let mut upper_inode = self.upper_inode.lock();
        if upper_inode.is_some() {
            return Ok(());
        }

        let lower_inode = self.lower_inode.as_ref().ok_or(SystemError::ENOENT)?;

        let metadata = lower_inode.metadata()?;
        // NOTE: the `upper_inode` guard is still held here, and
        // create_upper_inode() locks the same SpinLock again; if the lock is
        // not reentrant this call appears to deadlock.
        let new_upper_inode = self.create_upper_inode(metadata.clone())?;

        // Read the whole lower file into memory, then write it to the new upper inode.
        let mut buffer = vec![0u8; metadata.size as usize];
        let lock = SpinLock::new(crate::filesystem::vfs::FilePrivateData::Unused);
        lower_inode.read_at(0, metadata.size as usize, &mut buffer, lock.lock())?;

        new_upper_inode.write_at(0, metadata.size as usize, &buffer, lock.lock())?;

        *upper_inode = Some(new_upper_inode);

        Ok(())
    }

    /// Create an empty inode with the same name, type, and mode under the
    /// root of the upper file system.
    fn create_upper_inode(&self, metadata: Metadata) -> Result<Arc<dyn IndexNode>, SystemError> {
        // NOTE: copy_up() only gets here when `upper_inode` is None, so as
        // written this lookup returns ENOSYS; the upper root probably has to
        // come from the overlay's upper layer instead.
        let upper_inode = self.upper_inode.lock();
        let upper_root_inode = upper_inode
            .as_ref()
            .ok_or(SystemError::ENOSYS)?
            .fs()
            .root_inode();
        upper_root_inode.create_with_data(&self.dname()?.0, metadata.file_type, metadata.mode, 0)
    }
}
@@ -0,0 +1,32 @@
use alloc::sync::Arc;

use alloc::vec::Vec;

use crate::filesystem::vfs::IndexNode;

use super::{OvlInode, OvlSuperBlock};
#[derive(Debug)]
pub struct OvlEntry {
    numlower: usize, // number of lower layers
    lowerstack: Vec<OvlPath>,
}

impl OvlEntry {
    pub fn new() -> Self {
        Self {
            // Hard-coded for now; `lowerstack` itself starts out empty.
            numlower: 2,
            lowerstack: Vec::new(),
        }
    }
}
#[derive(Debug)]
pub struct OvlPath {
    layer: Arc<OvlLayer>,
    inode: Arc<dyn IndexNode>,
}
#[derive(Debug)]
pub struct OvlLayer {
    pub mnt: Arc<OvlInode>, // mount point
    pub index: u32,         // 0 is the upper read-write layer, >0 are lower read-only layers
    pub fsid: u32,          // file system identifier
}