<!DOCTYPE html>
<html lang="zh">
<head>
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="HandheldFriendly" content="True" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="robots" content="index, follow" />
<link href="https://fonts.googleapis.com/css2?family=Source+Code+Pro:ital,wght@0,400;0,700;1,400&family=Source+Sans+Pro:ital,wght@0,300;0,400;0,700;1,400&display=swap" rel="stylesheet">
<link rel="stylesheet" type="text/css" href="https://blog.tonychow.me/theme/stylesheet/style.min.css">
<link id="pygments-light-theme" rel="stylesheet" type="text/css"
href="https://blog.tonychow.me/theme/pygments/colorful.min.css">
<link rel="stylesheet" type="text/css" href="https://blog.tonychow.me/theme/font-awesome/css/fontawesome.css">
<link rel="stylesheet" type="text/css" href="https://blog.tonychow.me/theme/font-awesome/css/brands.css">
<link rel="stylesheet" type="text/css" href="https://blog.tonychow.me/theme/font-awesome/css/solid.css">
<link href="https://blog.tonychow.me/feeds/all.atom.xml" type="application/atom+xml" rel="alternate" title="Tonychow's Blog Atom">
<!-- Chrome, Firefox OS and Opera -->
<meta name="theme-color" content="#333333">
<!-- Windows Phone -->
<meta name="msapplication-navbutton-color" content="#333333">
<!-- iOS Safari -->
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent">
<!-- Microsoft EDGE -->
<meta name="msapplication-TileColor" content="#333333">
<meta name="author" content="tonychow" />
<meta name="description" content="" />
<meta name="keywords" content="linux, system-call, fork, source-reading">
<meta property="og:site_name" content="Tonychow's Blog"/>
<meta property="og:title" content="Analysis of the fork System Call in Linux"/>
<meta property="og:description" content=""/>
<meta property="og:locale" content="en_US"/>
<meta property="og:url" content="https://blog.tonychow.me/syscall-fork-in-linux.html"/>
<meta property="og:type" content="article"/>
<meta property="article:published_time" content="2013-06-27 00:00:00+08:00"/>
<meta property="article:modified_time" content=""/>
<meta property="article:author" content="https://blog.tonychow.me/author/tonychow.html">
<meta property="article:section" content="linux"/>
<meta property="article:tag" content="linux"/>
<meta property="article:tag" content="system-call"/>
<meta property="article:tag" content="fork"/>
<meta property="article:tag" content="source-reading"/>
<meta property="og:image" content="/images/avatar.jpg">
<title>Tonychow's Blog – Analysis of the fork System Call in Linux</title>
</head>
<body class="light-theme">
<aside>
<div>
<a href="https://blog.tonychow.me/">
<img src="/images/avatar.jpg" alt="" title="">
</a>
<h1>
<a href="https://blog.tonychow.me/"></a>
</h1>
<p>Go/Python backend developer</p>
<ul class="social">
<li>
<a class="sc-github" href="https://github.com/chow1937" target="_blank">
<i class="fab fa-github"></i>
</a>
</li>
</ul>
</div>
</aside>
<main>
<nav>
<a href="https://blog.tonychow.me/">Home</a>
<a href="/archives.html">Archives</a>
<a href="/categories.html">Categories</a>
<a href="/tags.html">Tags</a>
<a href="https://blog.tonychow.me/feeds/all.atom.xml">Atom</a>
</nav>
<article class="single">
<header>
<h1 id="syscall-fork-in-linux">Analysis of the fork System Call in Linux</h1>
<p>
Posted on Thu 27 June 2013 in <a href="https://blog.tonychow.me/category/linux.html">linux</a>
• 10 min read
</p>
</header>
<div>
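<h3>1 Related Concepts and a Brief Overview</h3>
<p>In this part I introduce the relevant concepts, such as processes and process address spaces, and give a brief prose description of how the fork system call proceeds.</p>
<h4>1.1 Processes</h4>
<p>An operating system is a layer of software between the computer hardware and application (user) programs. By abstracting hardware resources, it hides their complex details, states, and operations from applications, and it also isolates applications from the hardware so that they cannot manipulate it arbitrarily and create security risks. The operating system offers applications several important abstractions, and the process is one of the most fundamental of them.</p>
<p>We usually think of a process as a running instance of a program. While a program sits on a storage medium, it is merely a description of instructions, data, and how they are organized. The operating system can load a program into memory and run it as a process, which is one running instance of that program; loading the same program several times produces several independent running instances. A process consists of more than the program's executable instructions: it also includes resources such as open files, pending signals, internal kernel data, processor state, a memory address space with its memory mappings, and so on.</p>
<p>In early time-sharing systems and process-oriented operating systems (such as early Unix and Linux), the process was regarded as the basic unit of execution. In thread-oriented operating systems (such as Linux 2.6 and later), the process is the basic unit of resource allocation, while the thread is the basic unit of execution. A process is a container for threads and holds one or more of them. In fact, Linux implements threads and processes in a uniform way: a thread is just a special kind of process, as the analysis below will show.</p>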
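<h4>1.2 Process Address Space</h4>
<p>When a process is created, the operating system also creates an independent virtual memory address space for it. This virtual address space gives the process the illusion that it has all of memory to itself; it is private to that process and unaffected by other processes, which is also what keeps processes separate from one another. Although each process has a very large virtual address space, that does not mean each process actually owns that much physical memory: a part of virtual memory is mapped to physical memory only when it is actually used. Furthermore, a process cannot access or modify every address in its virtual address space. A typical process address space is divided into several segments, such as stack, heap, text, data, and bss. The figure below (from Advanced Programming in the UNIX Environment) shows the logical layout of a process address space on an Intel x86 machine:</p>
<p><img alt="Process address space" src="../images/linux-process-memory-model.webp"></p>
<p>From low addresses to high addresses, the figure shows:</p>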
<ul>
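<li>The text segment holds the machine instructions corresponding to the program's code, i.e. the instructions the CPU will execute. The text segment is sharable, so a frequently executed program needs only one copy of it in memory. The text segment is also read-only, which prevents a process from accidentally modifying its own instructions.</li>
<li>The data segment holds the program's initialized variables.</li>
<li>The bss segment holds uninitialized variables, which the kernel initializes to 0 or null pointers before the program starts running.</li>
<li>The heap segment is where the program's dynamic memory allocations take place.</li>
<li>The stack segment: each time a function is called, its return address and the caller's context, such as some register values, are saved here. The called function also allocates space here for its local (temporary) variables.</li>
<li>Above the stack are the command-line arguments and environment variables.</li>
<li>The space above that is kernel space, which ordinary processes are not allowed to access.</li>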
</ul>
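<p>The stack and the heap also grow in opposite directions: the stack grows from high addresses toward low addresses, while the heap grows from low addresses toward high addresses. The stack is normally capped by a fixed size limit, whereas the heap has no comparable fixed cap and can keep growing until it reaches the limits of the system. Between the stack and the heap lies a very large unused gap.</p>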
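<p>To see this layout on a real system, the following user-space sketch records the address of one object from each segment. It is a minimal illustration that assumes a typical Linux/x86-64 build with the default linker layout; the names segdemo_initialized, segdemo_uninitialized, struct seg_addrs and segment_addresses are hypothetical. Address space layout randomization changes the exact addresses from run to run, but on such a system the relative order (text, then data, then bss, with the heap well below the stack) should stay the same.</p>

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical globals used only to locate the data and bss segments. */
int segdemo_initialized = 1;   /* has an initializer: goes in .data */
int segdemo_uninitialized;     /* no initializer: goes in .bss (zeroed at start) */

/* One address per segment, stored as integers so they can be compared. */
struct seg_addrs {
    uintptr_t text, data, bss, heap, stack;
};

void segment_addresses(struct seg_addrs *out)
{
    int local = 0;                               /* automatic variable: on the stack */
    int *dynamic = malloc(sizeof *dynamic);      /* dynamic allocation: on the heap */

    out->text  = (uintptr_t)&segment_addresses;  /* this function's code: text */
    out->data  = (uintptr_t)&segdemo_initialized;
    out->bss   = (uintptr_t)&segdemo_uninitialized;
    out->heap  = (uintptr_t)dynamic;
    out->stack = (uintptr_t)&local;
    free(dynamic);
}
```

<p>Printing the five fields in hexadecimal shows how far apart the stack is from everything else, which is the large gap mentioned above.</p>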
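<h4>1.3 Process Descriptor</h4>
<p>In Linux, whenever a process is created, the kernel allocates a process descriptor for it. In general operating-system terminology, the process descriptor is also called the PCB (process control block). It records the process's state, identifiers, open files, pending signals, file-system information, and other resources. Each process descriptor represents one process, and all process descriptors in the system are linked into a circular doubly linked task list, from which the operating system schedules processes, deciding which process may occupy the CPU and which should give it up. In Linux, the process descriptor is a structure of type task_struct, laid out as shown in the figure below:</p>
<p><img alt="Process descriptor" src="../images/linux-task-struct.webp"></p>
<p>task_struct is a fairly large data structure, and it also points to other kinds of data structures: for example, thread_info holds the process's low-level thread information, mm_struct describes the process's memory, files_struct describes the files (file descriptors) the process has open, and so on. Because task_struct is complex, we will analyze it in more detail below.</p>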
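<p>The circular doubly linked task list mentioned above is built from a list node embedded inside each element rather than from separately allocated nodes. The sketch below imitates that idiom in user space, modeled on the kernel's struct list_head and the container_of macro; struct toy_task, list_init, list_add_tail and sum_pids are hypothetical stand-ins, not the kernel's real definitions.</p>

```c
#include <stddef.h>

/* A node embedded in each element, in the style of the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* An empty circular list is a head that points to itself in both directions. */
void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert node just before head, i.e. at the tail of the circular list. */
void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Hypothetical, heavily simplified stand-in for task_struct. */
struct toy_task {
    int pid;
    struct list_head tasks;   /* links this task into the task list */
};

/* Walk the list once, from head->next back around to head. */
int sum_pids(struct list_head *head)
{
    int sum = 0;
    for (struct list_head *p = head->next; p != head; p = p->next)
        sum += container_of(p, struct toy_task, tasks)->pid;
    return sum;
}
```

<p>Because the node lives inside the structure, linking a task into the list needs no extra allocation, and container_of turns a pointer to the node back into a pointer to the task.</p>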
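<h4>1.4 System Calls</h4>
<p>Operating-system kernel code runs in kernel space, while applications, including the programs we normally write, run in user space. The operating system restricts and protects kernel space so that user applications cannot modify it; in other words, only the kernel may access kernel space, and applications cannot access it directly. Combining this with what we know about virtual memory and process address spaces, we can also see that kernel space, including its page tables, stays resident in memory and is never swapped out.</p>
<p>As mentioned above, the operating system isolates hardware resources from applications. So how does an application operate hardware or obtain resources when it needs to? The answer is that the kernel provides a set of services, such as I/O and process management, that applications invoke through system calls, as shown below:</p>
<p><img alt="System calls" src="../images/linux-os.webp"></p>
<p>A system call is essentially a function call, i.e. a sequence of instructions, but unlike ordinary application code it runs in kernel space. When an application invokes a system call, execution switches from user space to kernel space to run kernel code. Different architectures implement system calls in different ways. On i386, an application running in user space first places the system call number and its arguments in the appropriate registers and then executes the software interrupt instruction int 0x80. After the software interrupt occurs, the kernel uses the system call number in the register to run the corresponding system call.</p>
<p>As the figure shows, an application can invoke the kernel's system calls directly through the system call interface, or it can call C library functions, which in turn invoke the relevant system calls through that same interface. Some C library functions do extra processing before or after the system call, while others are just thin wrappers around it.</p>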
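<p>The difference between calling a C library wrapper and issuing the system call more directly can be seen from user space. In this sketch, which assumes glibc on Linux, pid_via_libc goes through the getpid() wrapper, while pid_via_raw_syscall uses the generic syscall() function with the SYS_getpid number; both function names are illustrative.</p>

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Through the C library's dedicated wrapper. */
pid_t pid_via_libc(void)
{
    return getpid();
}

/* Through the generic syscall() entry point: the caller supplies the
   system call number itself instead of using a dedicated wrapper. */
pid_t pid_via_raw_syscall(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

<p>syscall() is itself a small C library routine: it places the number and arguments where the architecture's system call convention expects them and enters the kernel, so both paths end up in the same kernel code.</p>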
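<h4>1.5 The fork System Call</h4>
<p>fork is one of the many system calls Linux provides; on i386 it is system call number 2. Linux needs a mechanism for creating new processes, and fork is the one it provides for creating a new process from an existing one. In our programs we usually call the C library's fork function, which wraps the fork system call. fork duplicates the current process to create a new one. The original and the new process are parent and child; they share the same text segment, and after the child is created both continue executing from the point right after the fork call. fork is called once but returns twice: in the parent it returns the child's PID, and in the child it returns 0. It works this way because a child can always obtain its parent's PID with getppid (the process descriptor records the parent), whereas a process has no equivalent way to look up the PID of a child it has just created, so the parent must receive it as fork's return value.</p>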
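<p>The "called once, returns twice" behavior can be checked with a small user-space sketch using POSIX fork, waitpid and getppid; the function name fork_return_values_demo is illustrative. The child confirms that getppid() reports the parent's PID and encodes the result in its exit status, and the parent checks that status.</p>

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 if both sides of fork() saw the expected values, -1 otherwise. */
int fork_return_values_demo(void)
{
    pid_t parent = getpid();
    pid_t ret = fork();
    if (ret < 0)
        return -1;                       /* fork failed: no child exists */
    if (ret == 0) {
        /* Child: fork() returned 0; the parent's PID comes from getppid(). */
        _exit(getppid() == parent ? 0 : 1);
    }
    /* Parent: fork() returned the child's PID, which waitpid() needs. */
    int status;
    if (waitpid(ret, &status, 0) != ret)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```
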
<p>The fork system call proceeds roughly as follows:</p>
<ol>
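<li>The parent calls fork. Because this is a system call, it triggers the int software interrupt and enters kernel space;</li>
<li>Based on the system call number, the kernel calls sys_fork, which is implemented on top of clone and therefore calls into the clone system call path;</li>
<li>The clone system call takes a set of flags that specify what the parent and child will share, such as the virtual address space, file-system information, and file descriptors. When fork calls clone, it passes only the SIGCHLD flag, which means the child will send the parent a SIGCHLD signal when it terminates;</li>
<li>The clone path calls do_fork, which does most of the work of fork. do_fork first performs some error checking and the initialization needed before copying the parent, and then calls copy_process;</li>
<li>copy_process is the main function that copies the parent's kernel state and related resources. It then calls copy_thread to copy the parent's execution state, including register values and the instruction pointer, and to set up the child's stack;</li>
<li>copy_thread also does one more thing: it writes 0 into the register that holds the child's return value and points the child's instruction pointer at the assembly routine ret_from_fork. So although the child runs the same code as the parent, there are some differences. When the child is first scheduled, it does not return through do_fork; it starts in ret_from_fork, does some cleanup, and then exits to user space, where the user-space code reads fork's return value from that register and gets 0;</li>
<li>copy_process returns a pointer to the child. Back in do_fork, once copy_process has returned successfully, the child is woken up and added to the scheduler's run queue, and do_fork returns the child's PID.</li>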
</ol>
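<p>Because copy_process gives the child its own copy of the parent's memory (on Linux the pages are shared copy-on-write until one side writes to them), a write made in the child is invisible to the parent. A minimal user-space sketch; fork_copy_counter and fork_copies_memory_demo are hypothetical names.</p>

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int fork_copy_counter = 1;

/* Returns 0 if the child's write stayed in the child, -1 otherwise. */
int fork_copies_memory_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        fork_copy_counter = 42;          /* modifies only the child's copy */
        _exit(fork_copy_counter);        /* report what the child saw */
    }
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    /* The child saw 42, but the parent's copy is still 1. */
    return (WEXITSTATUS(status) == 42 && fork_copy_counter == 1) ? 0 : -1;
}
```
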
<p>Linux has three ways to create a new process: fork, vfork, and clone. fork is implemented via clone, and vfork and clone both rely on do_fork for the rest of the work.</p>
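<p>For comparison with fork, a vfork() child runs in the parent's memory and the parent is suspended until the child calls _exit() or an exec function, so the child must do nothing else. The sketch below (function name hypothetical) follows that rule and only passes an exit status back to the parent.</p>

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create a child with vfork() and return its exit status, or -1 on error. */
int vfork_demo(void)
{
    pid_t pid = vfork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(7);        /* a vfork child may only call _exit() or exec */
    /* The parent resumes only after the child has called _exit(). */
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```
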
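<h3>2 Source Code Analysis</h3>
<p>This part analyzes the relevant source code in detail, using Linux kernel version 3.6.11. The analysis does not cover all of the relevant code, only its most important parts.</p>
<h4>2.1 Process Descriptor</h4>
<p>In Linux, the process descriptor is a data structure of type task_struct, defined in include/linux/sched.h in the Linux source tree.</p>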
<div class="highlight"><pre><span></span><code><span class="k">struct</span> <span class="nc">task_struct</span> <span class="p">{</span>
<span class="k">volatile</span> <span class="kt">long</span> <span class="n">state</span><span class="p">;</span> <span class="cm">/* -1 unrunnable, 0 runnable, >0 stopped */</span>
<span class="kt">void</span> <span class="o">*</span><span class="n">stack</span><span class="p">;</span>
<span class="n">atomic_t</span> <span class="n">usage</span><span class="p">;</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">flags</span><span class="p">;</span> <span class="cm">/* per process flags, defined below */</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">ptrace</span><span class="p">;</span>
<span class="p">...</span>
</code></pre></div>
<p>task_struct stores the process's state in the state member. There are five main process states, also defined in sched.h:</p>
<div class="highlight"><pre><span></span><code><span class="cp">#define TASK_RUNNING 0</span>
<span class="cp">#define TASK_INTERRUPTIBLE 1</span>
<span class="cp">#define TASK_UNINTERRUPTIBLE 2</span>
<span class="cp">#define __TASK_STOPPED 4</span>
<span class="cp">#define __TASK_TRACED 8</span>
</code></pre></div>
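<p>TASK_RUNNING: the process is runnable; it is either currently running or waiting in a run queue to run.</p>
<p>TASK_INTERRUPTIBLE: the process is sleeping (blocked), waiting for some condition to become true; when it does, the process is woken up and enters TASK_RUNNING. A signal can also wake it early.</p>
<p>TASK_UNINTERRUPTIBLE: the same as TASK_INTERRUPTIBLE, except that a process in this state is not woken up by signals and does not return to TASK_RUNNING because of them. This state is used when the process must not be interrupted, or when the event it is waiting for is expected to occur very soon.</p>
<p>__TASK_STOPPED: the process's execution has been stopped; it is not running and cannot run. A process enters this state when it receives SIGSTOP, SIGTSTP, SIGTTIN, or SIGTTOU.</p>
<p>__TASK_TRACED: the process is being traced by another process, for example through ptrace.</p>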
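<p>The kernel reports these states to user space as a single letter in /proc/[pid]/stat, for example R for running, S for interruptible sleep, D for uninterruptible sleep, and T for stopped. The sketch below forks a child that stops itself with SIGSTOP and then reads the child's state letter; proc_state and stopped_child_state are hypothetical helper names, and the sketch assumes /proc is mounted.</p>

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return the one-letter state of pid from /proc/<pid>/stat, or '?' on error. */
char proc_state(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    /* Format is "pid (comm) state ..."; comm may itself contain ')'. */
    char *rparen = strrchr(buf, ')');
    if (!rparen || rparen[1] != ' ')
        return '?';
    return rparen[2];
}

/* Fork a child that stops itself and return the state letter seen for it. */
char stopped_child_state(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return '?';
    if (pid == 0) {
        raise(SIGSTOP);            /* the child enters the stopped state */
        _exit(0);
    }
    int status;
    char state = '?';
    if (waitpid(pid, &status, WUNTRACED) == pid && WIFSTOPPED(status))
        state = proc_state(pid);
    kill(pid, SIGKILL);            /* SIGKILL also terminates a stopped process */
    waitpid(pid, &status, 0);      /* reap the child so no zombie is left */
    return state;
}
```
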
<div class="highlight"><pre><span></span><code><span class="p">...</span>
<span class="kt">int</span> <span class="n">prio</span><span class="p">,</span> <span class="n">static_prio</span><span class="p">,</span> <span class="n">normal_prio</span><span class="p">;</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">rt_priority</span><span class="p">;</span>
<span class="k">const</span> <span class="k">struct</span> <span class="nc">sched_class</span> <span class="o">*</span><span class="n">sched_class</span><span class="p">;</span>
<span class="k">struct</span> <span class="nc">sched_entity</span> <span class="n">se</span><span class="p">;</span>
<span class="k">struct</span> <span class="nc">sched_rt_entity</span> <span class="n">rt</span><span class="p">;</span>
<span class="p">...</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">policy</span><span class="p">;</span>
</code></pre></div>
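<p>This part holds the scheduling information. The scheduler uses it to decide which process should run next and, together with the process state, to keep scheduling fair and efficient. prio, static_prio, and normal_prio are the process's dynamic priority, static priority, and normal priority, respectively. rt_priority is the real-time priority, and sched_class is the scheduling class. se and rt are both scheduling entities, one for normal processes and one for real-time processes. policy gives the process's scheduling policy; the policies are also defined in include/linux/sched.h, as follows:</p>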
<div class="highlight"><pre><span></span><code><span class="cm">/*</span>
<span class="cm"> * Scheduling policies</span>
<span class="cm"> */</span>
<span class="cp">#define SCHED_NORMAL 0</span>
<span class="cp">#define SCHED_FIFO 1</span>
<span class="cp">#define SCHED_RR 2</span>
<span class="cp">#define SCHED_BATCH 3</span>
<span class="cm">/* SCHED_ISO: reserved but not implemented yet */</span>
<span class="cp">#define SCHED_IDLE 5</span>
</code></pre></div>
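<p>That is, there are these scheduling policies:</p>
<ul>
<li>SCHED_NORMAL, for normal processes;</li>
<li>SCHED_FIFO, first-in, first-out real-time scheduling;</li>
<li>SCHED_RR, round-robin real-time scheduling with time slices;</li>
<li>SCHED_BATCH, for non-interactive, CPU-bound batch processes;</li>
<li>SCHED_IDLE, for very low-priority work that should run only when the system has little else to do.</li>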
</ul>
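<p>From user space these policies are visible through sched_getscheduler(), where SCHED_NORMAL is spelled SCHED_OTHER. The sketch maps policy numbers to names and, as an aside on static_prio, reproduces the kernel's nice-to-priority arithmetic (MAX_RT_PRIO is 100, so a nice value n maps to a static priority of 120 + n) under hypothetical DEMO_ macro names. It assumes a C library, such as glibc, that exposes SCHED_BATCH and SCHED_IDLE when _GNU_SOURCE is defined.</p>

```c
#define _GNU_SOURCE
#include <sched.h>

/* Illustrative copies of the kernel's priority arithmetic: real-time
   priorities occupy 0..99, and nice -20..19 maps to 100..139. */
#define DEMO_MAX_RT_PRIO 100
#define DEMO_NICE_TO_PRIO(nice) (DEMO_MAX_RT_PRIO + (nice) + 20)

/* Name of a policy value as returned by sched_getscheduler(). */
const char *policy_name(int policy)
{
    switch (policy) {
    case SCHED_OTHER: return "SCHED_OTHER";   /* user-space name of SCHED_NORMAL */
    case SCHED_FIFO:  return "SCHED_FIFO";
    case SCHED_RR:    return "SCHED_RR";
    case SCHED_BATCH: return "SCHED_BATCH";
    case SCHED_IDLE:  return "SCHED_IDLE";
    default:          return "unknown";
    }
}
```

<p>For an ordinary process, policy_name(sched_getscheduler(0)) yields "SCHED_OTHER".</p>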
<p>A process also has several identifiers, which identify the particular process and the thread group it belongs to:</p>
<div class="highlight"><pre><span></span><code><span class="p">...</span>
<span class="kt">pid_t</span> <span class="n">pid</span><span class="p">;</span>
<span class="kt">pid_t</span> <span class="n">tgid</span><span class="p">;</span>
<span class="p">...</span>
</code></pre></div>
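<p>In user space the naming is inverted: getpid() returns the task's tgid, while the gettid system call returns its pid field. In a single-threaded process, or in a process's main thread, the two are equal; other threads share the tgid but each has its own pid. This is a sketch with illustrative function names; it calls gettid through syscall() because older C libraries have no wrapper for it.</p>

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* getpid() reports the kernel's tgid (thread group ID). */
pid_t my_tgid(void)
{
    return getpid();
}

/* The gettid system call reports the kernel's per-task pid field. */
pid_t my_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}
```
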
<p>task_struct also defines several pointers to other, related processes:</p>
<div class="highlight"><pre><span></span><code><span class="p">...</span>
<span class="cm">/*</span>
<span class="cm"> * pointers to (original) parent process, youngest child, younger sibling,</span>
<span class="cm"> * older sibling, respectively. (p->father can be replaced with</span>
<span class="cm"> * p->real_parent->pid)</span>
<span class="cm"> */</span>
<span class="k">struct</span> <span class="nc">task_struct</span> <span class="n">__rcu</span> <span class="o">*</span><span class="n">real_parent</span><span class="p">;</span> <span class="cm">/* real parent process */</span>
<span class="k">struct</span> <span class="nc">task_struct</span> <span class="n">__rcu</span> <span class="o">*</span><span class="n">parent</span><span class="p">;</span> <span class="cm">/* recipient of SIGCHLD, wait4() reports */</span>
<span class="cm">/*</span>
<span class="cm"> * children/sibling forms the list of my natural children</span>
<span class="cm"> */</span>
<span class="k">struct</span> <span class="nc">list_head</span> <span class="n">children</span><span class="p">;</span> <span class="cm">/* list of my children */</span>
<span class="k">struct</span> <span class="nc">list_head</span> <span class="n">sibling</span><span class="p">;</span> <span class="cm">/* linkage in my parent's children list */</span>
<span class="k">struct</span> <span class="nc">task_struct</span> <span class="o">*</span><span class="n">group_leader</span><span class="p">;</span> <span class="cm">/* threadgroup leader */</span>
<span class="p">...</span>
</code></pre></div>
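<p>As the comments in this code say, real_parent points to the process's real (original) parent, while parent points to the process that receives SIGCHLD and to which wait4() reports on this process. The two are normally the same; they differ while the process is being traced with ptrace, in which case parent points to the tracer. If a process is orphaned because its parent exits, the kernel reparents it to another process, such as init. children is the list of this process's children, and sibling links this process into its parent's children list. The group_leader pointer points to the leader of the thread group.</p>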
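<p>Reparenting of an orphan can be observed from user space. In this sketch (hypothetical function name, Linux 3.4 or later), the caller marks itself a child subreaper with prctl(PR_SET_CHILD_SUBREAPER), which makes orphaned descendants reparent to it instead of to init. A child then exits while its own child (the grandchild) is still running, and the grandchild reports its new parent through a pipe.</p>

```c
#define _GNU_SOURCE
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the PID an orphaned grandchild reports as its new parent, or -1. */
pid_t orphan_new_parent(void)
{
    int fds[2];
    if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) != 0 || pipe(fds) != 0)
        return -1;

    pid_t child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child == 0) {
        pid_t me = getpid();
        if (fork() == 0) {                   /* grandchild */
            while (getppid() == me)          /* poll until our parent has exited */
                usleep(1000);
            pid_t p = getppid();
            _exit(write(fds[1], &p, sizeof p) == (ssize_t)sizeof p ? 0 : 1);
        }
        _exit(0);                            /* exit, orphaning the grandchild */
    }

    close(fds[1]);
    waitpid(child, NULL, 0);
    pid_t new_parent = -1;
    if (read(fds[0], &new_parent, sizeof new_parent) != (ssize_t)sizeof new_parent)
        new_parent = -1;
    close(fds[0]);
    waitpid(-1, NULL, 0);                    /* reap the reparented grandchild */
    prctl(PR_SET_CHILD_SUBREAPER, 0, 0, 0, 0);
    return new_parent;
}
```

<p>Without the subreaper setting, the reported PID would instead be that of init (or of whichever ancestor the system designates as the reaper).</p>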
<div class="highlight"><pre><span></span><code><span class="p">...</span>
<span class="n">cputime_t</span> <span class="n">utime</span><span class="p">,</span> <span class="n">stime</span><span class="p">,</span> <span class="n">utimescaled</span><span class="p">,</span> <span class="n">stimescaled</span><span class="p">;</span>
<span class="n">cputime_t</span> <span class="n">gtime</span><span class="p">;</span>
<span class="p">...</span>
<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">nvcsw</span><span class="p">,</span> <span class="n">nivcsw</span><span class="p">;</span> <span class="cm">/* context switch counts */</span>
<span class="k">struct</span> <span class="nc">timespec</span> <span class="n">start_time</span><span class="p">;</span> <span class="cm">/* monotonic time */</span>
<span class="k">struct</span> <span class="nc">timespec</span> <span class="n">real_start_time</span><span class="p">;</span> <span class="cm">/* boot based time */</span>
<span class="cm">/* mm fault and swap info: this can arguably be seen as either</span>
<span class="cm">mm-specific or thread-specific */</span>
<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">min_flt</span><span class="p">,</span> <span class="n">maj_flt</span><span class="p">;</span>
<span class="k">struct</span> <span class="nc">task_cputime</span> <span class="n">cputime_expires</span><span class="p">;</span>
<span class="k">struct</span> <span class="nc">list_head</span> <span class="n">cpu_timers</span><span class="p">[</span><span class="mi">3</span><span class="p">];</span>
<span class="p">...</span>
</code></pre></div>
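<p>A process has a lifetime, from creation to termination, and much time-related information about that lifetime is kept in the process descriptor. In the code above there are several members of type cputime_t. utime and stime are the CPU time the process has spent in user mode and in kernel mode, respectively, measured in cputime_t units (clock ticks in the default configuration). utimescaled and stimescaled also record CPU time in these two modes, but scaled according to the processor frequency. gtime is the guest time, i.e. the time spent running a virtual CPU. nvcsw and nivcsw count voluntary and involuntary context switches. start_time and real_start_time both record when the process was created: start_time uses monotonic time, while real_start_time uses boot-based time, which also counts time the system spent suspended. cputime_expires caches the expiry times of the process's CPU-time timers, one value for each of the three lists in cpu_timers.</p>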
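<p>User space can read the same per-process time accounting with getrusage(), whose ru_utime and ru_stime fields report user-mode and kernel-mode CPU time. The sketch below (function names hypothetical) burns some CPU time and then reads the total; note that getrusage reports struct timeval values rather than the kernel's internal cputime_t units.</p>

```c
#define _GNU_SOURCE
#include <sys/resource.h>
#include <time.h>

/* Spin until roughly ms milliseconds of CPU time have been consumed. */
void burn_cpu(long ms)
{
    clock_t start = clock();
    if (start == (clock_t)-1)
        return;                          /* processor time unavailable */
    volatile unsigned long sink = 0;
    while ((clock() - start) * 1000 / CLOCKS_PER_SEC < ms)
        sink++;
}

/* User plus system CPU time of the calling process in microseconds, or -1. */
long long cpu_time_us(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return (long long)(ru.ru_utime.tv_sec + ru.ru_stime.tv_sec) * 1000000LL
         + ru.ru_utime.tv_usec + ru.ru_stime.tv_usec;
}
```
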
<div class="highlight"><pre><span></span><code><span class="cm">/* filesystem information */</span>
<span class="k">struct</span> <span class="nc">fs_struct</span> <span class="o">*</span><span class="n">fs</span><span class="p">;</span>
<span class="cm">/* open file information */</span>
<span class="k">struct</span> <span class="nc">files_struct</span> <span class="o">*</span><span class="n">files</span><span class="p">;</span>
</code></pre></div>
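<p>The process descriptor also stores file-system-related information, such as the two members above: fs describes the process's relationship with the file system, including its current working directory and root directory, while files points to the table of files the process has open.</p>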
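<p>When fork copies the files table without CLONE_FILES, the child gets its own descriptor table, but the copied descriptors refer to the same open files as the parent's, so state such as the file offset is shared. This sketch assumes a writable temporary directory for tmpfile(); offset_after_child_seek is a hypothetical name.</p>

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return the offset the parent observes after its child seeks, or -1. */
off_t offset_after_child_seek(void)
{
    FILE *tmp = tmpfile();               /* anonymous temporary file */
    if (!tmp)
        return -1;
    int fd = fileno(tmp);
    pid_t pid = fork();
    if (pid < 0) {
        fclose(tmp);
        return -1;
    }
    if (pid == 0) {
        lseek(fd, 100, SEEK_SET);        /* move the shared file offset */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    off_t off = lseek(fd, 0, SEEK_CUR);  /* the parent's descriptor sees it too */
    fclose(tmp);
    return off;
}
```
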
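<p>The process descriptor holds much more important information, such as virtual memory information, inter-process communication mechanisms, pipes, and some interrupt and locking mechanisms. For details, refer directly to the definition of task_struct in the Linux source.</p>
<h4>2.2 The fork System Call</h4>
<p>The fork system call actually invokes the function sys_fork. Taking the Alpha architecture as the example, sys_fork is an assembly routine defined in arch/alpha/kernel/entry.S.</p>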
<div class="highlight"><pre><span></span><code> <span class="p">.</span><span class="n">align</span> <span class="mi">4</span>
<span class="p">.</span><span class="n">globl</span> <span class="n">sys_fork</span>
<span class="p">.</span><span class="n">ent</span> <span class="n">sys_fork</span>
<span class="nl">sys_fork</span><span class="p">:</span>
<span class="p">.</span><span class="n">prologue</span> <span class="mi">0</span>
<span class="n">mov</span> <span class="n">$sp</span><span class="p">,</span> <span class="n">$21</span>
<span class="n">bsr</span> <span class="n">$1</span><span class="p">,</span> <span class="n">do_switch_stack</span>
<span class="n">bis</span> <span class="n">$31</span><span class="p">,</span> <span class="n">SIGCHLD</span><span class="p">,</span> <span class="n">$16</span>
<span class="n">mov</span> <span class="n">$31</span><span class="p">,</span> <span class="n">$17</span>
<span class="n">mov</span> <span class="n">$31</span><span class="p">,</span> <span class="n">$18</span>
<span class="n">mov</span> <span class="n">$31</span><span class="p">,</span> <span class="n">$19</span>
<span class="n">mov</span> <span class="n">$31</span><span class="p">,</span> <span class="n">$20</span>
<span class="n">jsr</span> <span class="n">$26</span><span class="p">,</span> <span class="n">alpha_clone</span>
<span class="n">bsr</span> <span class="n">$1</span><span class="p">,</span> <span class="n">undo_switch_stack</span>
<span class="n">ret</span>
<span class="p">.</span><span class="n">end</span> <span class="n">sys_fork</span>
</code></pre></div>
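<p>As shown above, sys_fork keeps the entry stack pointer in $21, saves the remaining registers with do_switch_stack, places SIGCHLD in the first argument register ($16) and zero in $17 through $20, and then jumps to alpha_clone, which receives the saved stack pointer in $21 as its regs argument.</p>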
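<p>The same fork semantics can be requested from user space by issuing the raw clone system call with only SIGCHLD in the flags and no new stack, mirroring the arguments sys_fork sets up. This is a sketch, not a description of how any C library's fork() is implemented; it assumes an architecture where the flags argument comes first and the stack argument second (true for x86-64, i386, ARM and arm64, but not, for example, s390), and the function name is hypothetical.</p>

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Raw clone with flags = SIGCHLD and stack = 0, i.e. fork semantics.
   Returns the child's exit status as seen by the parent, or -1. */
int raw_clone_as_fork(void)
{
    /* Assumes flags first and stack second; the remaining arguments are
       zero here, so their architecture-specific order does not matter. */
    long ret = syscall(SYS_clone, (long)SIGCHLD, 0L, (void *)0, (void *)0, 0L);
    if (ret < 0)
        return -1;
    if (ret == 0)
        _exit(7);                        /* child: clone returned 0, like fork */
    int status;
    if (waitpid((pid_t)ret, &status, 0) != (pid_t)ret || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```
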
<h4>2.3 alpha_clone</h4>
<p>alpha_clone is defined in arch/alpha/kernel/process.c in the source tree:</p>
<div class="highlight"><pre><span></span><code><span class="cm">/*</span>
<span class="cm"> * "alpha_clone()".. By the time we get here, the</span>
<span class="cm"> * non-volatile registers have also been saved on the</span>
<span class="cm"> * stack. We do some ugly pointer stuff here.. (see</span>
<span class="cm"> * also copy_thread)</span>
<span class="cm"> *</span>
<span class="cm"> * Notice that "fork()" is implemented in terms of clone,</span>
<span class="cm"> * with parameters (SIGCHLD, 0).</span>
<span class="cm"> */</span>
<span class="kt">int</span>
<span class="nf">alpha_clone</span><span class="p">(</span><span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">clone_flags</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">usp</span><span class="p">,</span>
<span class="kt">int</span> <span class="n">__user</span> <span class="o">*</span><span class="n">parent_tid</span><span class="p">,</span> <span class="kt">int</span> <span class="n">__user</span> <span class="o">*</span><span class="n">child_tid</span><span class="p">,</span>
<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">tls_value</span><span class="p">,</span> <span class="k">struct</span> <span class="nc">pt_regs</span> <span class="o">*</span><span class="n">regs</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">usp</span><span class="p">)</span>
<span class="n">usp</span> <span class="o">=</span> <span class="n">rdusp</span><span class="p">();</span>
<span class="k">return</span> <span class="n">do_fork</span><span class="p">(</span><span class="n">clone_flags</span><span class="p">,</span> <span class="n">usp</span><span class="p">,</span> <span class="n">regs</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">parent_tid</span><span class="p">,</span> <span class="n">child_tid</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div>
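<p>As the comment notes, by the time alpha_clone runs, the non-volatile registers have also been saved on the stack. If usp is 0, the function uses the current user stack pointer obtained with rdusp(), and it then calls do_fork with the corresponding arguments.</p>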
<h4>2.4 do_fork</h4>
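<p>Most of the work of creating a new process is done in do_fork, which copies the parent's resources according to the flag argument to produce the new process. do_fork is defined in kernel/fork.c in the source tree.</p>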
<div class="highlight"><pre><span></span><code><span class="cm">/*</span>
<span class="cm"> * Ok, this is the main fork-routine.</span>
<span class="cm"> *</span>
<span class="cm"> * It copies the process, and if successful kick-starts</span>
<span class="cm"> * it and waits for it to finish using the VM if required.</span>
<span class="cm"> */</span>
<span class="kt">long</span> <span class="nf">do_fork</span><span class="p">(</span><span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">clone_flags</span><span class="p">,</span>
<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">stack_start</span><span class="p">,</span>
<span class="k">struct</span> <span class="nc">pt_regs</span> <span class="o">*</span><span class="n">regs</span><span class="p">,</span>
<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">stack_size</span><span class="p">,</span>
<span class="kt">int</span> <span class="n">__user</span> <span class="o">*</span><span class="n">parent_tidptr</span><span class="p">,</span>
<span class="kt">int</span> <span class="n">__user</span> <span class="o">*</span><span class="n">child_tidptr</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">struct</span> <span class="nc">task_struct</span> <span class="o">*</span><span class="n">p</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">trace</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">long</span> <span class="n">nr</span><span class="p">;</span>
<span class="p">...</span>
</code></pre></div>
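<p>First, let us look at the parameters of do_fork. clone_flags is a set of flags that mainly controls how the parent's resources are copied. The low byte of clone_flags holds the number of the signal sent to the parent when the child exits, and the higher bits hold the various other flags. These flags are also defined in include/linux/sched.h:</p>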
<div class="highlight"><pre><span></span><code><span class="cm">/*</span>
<span class="cm"> * cloning flags:</span>
<span class="cm"> */</span>
<span class="cp">#define CSIGNAL 0x000000ff </span><span class="cm">/* signal mask to be sent at exit */</span><span class="cp"></span>
<span class="cp">#define CLONE_VM 0x00000100 </span><span class="cm">/* set if VM shared between processes */</span><span class="cp"></span>
<span class="cp">#define CLONE_FS 0x00000200 </span><span class="cm">/* set if fs info shared between processes */</span><span class="cp"></span>
<span class="cp">#define CLONE_FILES 0x00000400 </span><span class="cm">/* set if open files shared between processes */</span><span class="cp"></span>
<span class="cp">#define CLONE_SIGHAND 0x00000800 </span><span class="cm">/* set if signal handlers and blocked signals shared */</span><span class="cp"></span>
<span class="cp">#define CLONE_PTRACE 0x00002000 </span><span class="cm">/* set if we want to let tracing continue on the child too */</span><span class="cp"></span>
<span class="cp">#define CLONE_VFORK 0x00004000 </span><span class="cm">/* set if the parent wants the child to wake it up on mm_release */</span><span class="cp"></span>
<span class="cp">#define CLONE_PARENT 0x00008000 </span><span class="cm">/* set if we want to have the same parent as the cloner */</span><span class="cp"></span>
<span class="cp">#define CLONE_THREAD 0x00010000 </span><span class="cm">/* Same thread group? */</span><span class="cp"></span>
<span class="cp">#define CLONE_NEWNS 0x00020000 </span><span class="cm">/* New namespace group? */</span><span class="cp"></span>
<span class="cp">#define CLONE_SYSVSEM 0x00040000 </span><span class="cm">/* share system V SEM_UNDO semantics */</span><span class="cp"></span>
<span class="cp">#define CLONE_SETTLS 0x00080000 </span><span class="cm">/* create a new TLS for the child */</span><span class="cp"></span>
<span class="cp">#define CLONE_PARENT_SETTID 0x00100000 </span><span class="cm">/* set the TID in the parent */</span><span class="cp"></span>
<span class="cp">#define CLONE_CHILD_CLEARTID 0x00200000 </span><span class="cm">/* clear the TID in the child */</span><span class="cp"></span>
<span class="cp">#define CLONE_DETACHED 0x00400000 </span><span class="cm">/* Unused, ignored */</span><span class="cp"></span>
<span class="cp">#define CLONE_UNTRACED 0x00800000 </span><span class="cm">/* set if the tracing process can't force CLONE_PTRACE on this clone */</span><span class="cp"></span>
<span class="cp">#define CLONE_CHILD_SETTID 0x01000000 </span><span class="cm">/* set the TID in the child */</span><span class="cp"></span>
</code></pre></div>
<ul>
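<li>CLONE_VM: the parent and child share the virtual memory (VM);</li>
<li>CLONE_FS: the parent and child share file-system information, including the working directory;</li>
<li>CLONE_FILES: the parent and child share open files;</li>
<li>CLONE_SIGHAND: the parent and child share signal handlers;</li>
<li>CLONE_PTRACE: if the parent is being traced, the child is traced too;</li>
<li>CLONE_VFORK: used by vfork;</li>
<li>CLONE_PARENT: the child gets the same parent as the calling process;</li>
<li>CLONE_THREAD: the child is placed in the same thread group;</li>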
</ul>
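<p>As mentioned earlier, Linux implements threads and processes uniformly: both are represented by task_struct. The difference is that multiple threads share the resources of one process, including the virtual address space, file-system information, open files, and signal handlers. Creating a thread is much like creating an ordinary process, except that the call to clone passes flags specifying which resources to share, typically: clone(CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND, 0).</p>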
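<p>The effect of these sharing flags can be demonstrated with the C library's clone() wrapper: the child writes to a variable in the caller's stack frame, and whether the parent sees the write depends on CLONE_VM. This sketch assumes glibc's clone() on Linux and a downward-growing stack (so the top of the buffer is passed), and it adds SIGCHLD to the flags so the parent can reap the child with waitpid(); the helper names are hypothetical.</p>

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define SHARE_STACK_SIZE (64 * 1024)

/* Child body: overwrite the int the parent passed in. */
static int set_to_42(void *arg)
{
    *(volatile int *)arg = 42;
    return 0;
}

/* Run set_to_42 in a child created with the given sharing flags and
   return the value the parent sees afterwards, or -1 on error. */
int clone_shared_vm_demo(int flags)
{
    int value = 1;
    char *stack = malloc(SHARE_STACK_SIZE);
    if (!stack)
        return -1;
    /* Top of the buffer, since the stack grows down on common architectures. */
    pid_t pid = clone(set_to_42, stack + SHARE_STACK_SIZE, flags | SIGCHLD, &value);
    if (pid < 0) {
        free(stack);
        return -1;
    }
    int reaped = waitpid(pid, NULL, 0) == pid;
    free(stack);
    return reaped ? value : -1;
}
```

<p>With flags 0 the child writes to its own copy and the function returns 1; with CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND it returns 42, because parent and child share one address space.</p>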
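<p>The do_fork parameter stack_start is the starting address of the user-mode stack. regs points to the saved register set that holds the call's arguments: when the process switches from user mode to kernel mode, the values of the general-purpose registers are saved in this structure on the kernel stack. stack_size is the size of the user-mode stack; it is usually unnecessary and set to 0. parent_tidptr and child_tidptr are user-space addresses at which the child's TID is stored, in the parent's memory and in the child's memory respectively.</p>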
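<p>The parent_tidptr argument can be seen in action through the C library's clone() wrapper, which accepts it as an optional trailing argument. With CLONE_PARENT_SETTID the kernel stores the child's TID at that address in the parent's memory, so it should equal clone()'s return value. This is a sketch assuming glibc on Linux and a downward-growing stack; the function names are hypothetical.</p>

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define SETTID_STACK_SIZE (64 * 1024)

/* The child has nothing to do; returning ends it with status 0. */
static int exit_at_once(void *arg)
{
    (void)arg;
    return 0;
}

/* Returns 1 if the kernel stored the child's TID in parent_tid,
   0 if it did not, and -1 on error. */
int parent_settid_demo(void)
{
    pid_t parent_tid = 0;
    char *stack = malloc(SETTID_STACK_SIZE);
    if (!stack)
        return -1;
    pid_t pid = clone(exit_at_once, stack + SETTID_STACK_SIZE,
                      CLONE_PARENT_SETTID | SIGCHLD, NULL, &parent_tid);
    if (pid < 0) {
        free(stack);
        return -1;
    }
    waitpid(pid, NULL, 0);
    free(stack);
    return parent_tid == pid;
}
```
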
<div class="highlight"><pre><span></span><code><span class="cm">/*</span>
<span class="cm"> * Do some preliminary argument and permissions checking before we</span>
<span class="cm"> * actually start allocating stuff</span>
<span class="cm"> */</span>
<span class="k">if</span> <span class="p">(</span><span class="n">clone_flags</span> <span class="o">&</span> <span class="n">CLONE_NEWUSER</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">clone_flags</span> <span class="o">&</span> <span class="n">CLONE_THREAD</span><span class="p">)</span>
<span class="k">return</span> <span class="o">-</span><span class="n">EINVAL</span><span class="p">;</span>
<span class="cm">/* hopefully this check will go away when userns support is</span>
<span class="cm"> * complete</span>
<span class="cm"> */</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">capable</span><span class="p">(</span><span class="n">CAP_SYS_ADMIN</span><span class="p">)</span> <span class="o">||</span> <span class="o">!</span><span class="n">capable</span><span class="p">(</span><span class="n">CAP_SETUID</span><span class="p">)</span> <span class="o">||</span>
<span class="o">!</span><span class="n">capable</span><span class="p">(</span><span class="n">CAP_SETGID</span><span class="p">))</span>
<span class="k">return</span> <span class="o">-</span><span class="n">EPERM</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div>
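<p>This code checks that the combination of clone_flags is valid, because the flags must follow certain rules; if they do not, an error code is returned. It also checks permissions. Here, requesting a new user namespace (CLONE_NEWUSER) together with CLONE_THREAD fails with -EINVAL, and requesting a new user namespace without the CAP_SYS_ADMIN, CAP_SETUID, and CAP_SETGID capabilities fails with -EPERM.</p>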
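<p>The kernel applies the same kind of validation to other flag combinations; for example, it rejects CLONE_SIGHAND without CLONE_VM with EINVAL, since sharing signal handlers requires sharing the address space. The sketch issues that combination through the raw clone system call; it assumes an architecture where the flags argument comes first, and the function name is hypothetical.</p>

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Request CLONE_SIGHAND without CLONE_VM and return the errno the kernel
   reports, or 0 if the call unexpectedly succeeded. */
int sighand_without_vm_errno(void)
{
    /* Assumes flags are the first clone argument and the stack the second,
       as on x86-64, i386, ARM and arm64; the other arguments are zero. */
    long ret = syscall(SYS_clone, (long)(CLONE_SIGHAND | SIGCHLD), 0L,
                       (void *)0, (void *)0, 0L);
    if (ret == 0)
        _exit(0);                        /* never let an unexpected child run on */
    if (ret > 0) {
        waitpid((pid_t)ret, NULL, 0);    /* reap the unexpected child */
        return 0;
    }
    return errno;
}
```
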
<div class="highlight"><pre><span></span><code><span class="cm">/*</span>
<span class="cm"> * Determine whether and which event to report to ptracer. When</span>
<span class="cm"> * called from kernel_thread or CLONE_UNTRACED is explicitly</span>
<span class="cm"> * requested, no event is reported; otherwise, report if the event</span>
<span class="cm"> * for the type of forking is enabled.</span>
<span class="cm"> */</span>
<span class="k">if</span> <span class="p">(</span><span class="n">likely</span><span class="p">(</span><span class="n">user_mode</span><span class="p">(</span><span class="n">regs</span><span class="p">))</span> <span class="o">&&</span> <span class="o">!</span><span class="p">(</span><span class="n">clone_flags</span> <span class="o">&</span> <span class="n">CLONE_UNTRACED</span><span class="p">))</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">clone_flags</span> <span class="o">&</span> <span class="n">CLONE_VFORK</span><span class="p">)</span>
<span class="n">trace</span> <span class="o">=</span> <span class="n">PTRACE_EVENT_VFORK</span><span class="p">;</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">((</span><span class="n">clone_flags</span> <span class="o">&</span> <span class="n">CSIGNAL</span><span class="p">)</span> <span class="o">!=</span> <span class="n">SIGCHLD</span><span class="p">)</span>
<span class="n">trace</span> <span class="o">=</span> <span class="n">PTRACE_EVENT_CLONE</span><span class="p">;</span>
<span class="k">else</span>
<span class="n">trace</span> <span class="o">=</span> <span class="n">PTRACE_EVENT_FORK</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">likely</span><span class="p">(</span><span class="o">!</span><span class="n">ptrace_event_enabled</span><span class="p">(</span><span class="n">current</span><span class="p">,</span> <span class="n">trace</span><span class="p">)))</span>
<span class="n">trace</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div>
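<p>This code decides which event, if any, to report to the ptracer (a tracer such as gdb or strace attached through ptrace). When do_fork is called from kernel_thread, or CLONE_UNTRACED is set in the flags, no event is reported. Otherwise the event is chosen according to the kind of process creation (vfork, clone or fork), and it is reported only if the tracer has enabled that event type.</p>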
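<p>The branch above can be modelled in user space to see which event each kind of call selects. This is an illustrative sketch, not kernel code: classify_trace and enum trace_kind are hypothetical names, the from_user and event_enabled parameters stand in for user_mode(regs) and ptrace_event_enabled(), and the fallback constant values are assumed from the kernel header include/uapi/linux/sched.h.</p>

```c
#define _GNU_SOURCE
#include <sched.h>    /* CLONE_* constants (GNU extension) */
#include <signal.h>   /* SIGCHLD */
#include <stdbool.h>

/* Fallbacks, assumed from include/uapi/linux/sched.h, in case the
 * libc headers do not expose these names. */
#ifndef CSIGNAL
#define CSIGNAL        0x000000ffUL  /* low byte: signal sent to parent on exit */
#endif
#ifndef CLONE_VM
#define CLONE_VM       0x00000100UL
#endif
#ifndef CLONE_SIGHAND
#define CLONE_SIGHAND  0x00000800UL
#endif
#ifndef CLONE_VFORK
#define CLONE_VFORK    0x00004000UL
#endif
#ifndef CLONE_THREAD
#define CLONE_THREAD   0x00010000UL
#endif
#ifndef CLONE_UNTRACED
#define CLONE_UNTRACED 0x00800000UL
#endif

/* Hypothetical stand-ins for PTRACE_EVENT_FORK / _VFORK / _CLONE. */
enum trace_kind { TRACE_NONE, TRACE_FORK, TRACE_VFORK, TRACE_CLONE };

/* from_user:     stands in for user_mode(regs)
 * event_enabled: stands in for ptrace_event_enabled(current, trace) */
enum trace_kind classify_trace(bool from_user, unsigned long clone_flags,
                               bool event_enabled)
{
    enum trace_kind trace;

    /* kernel_thread() callers and CLONE_UNTRACED never report an event */
    if (!from_user || (clone_flags & CLONE_UNTRACED))
        return TRACE_NONE;

    if (clone_flags & CLONE_VFORK)
        trace = TRACE_VFORK;
    else if ((clone_flags & CSIGNAL) != SIGCHLD)
        trace = TRACE_CLONE;          /* exit signal is not SIGCHLD */
    else
        trace = TRACE_FORK;

    /* report only if the tracer enabled this event type */
    return event_enabled ? trace : TRACE_NONE;
}
```

<p>With these rules a plain fork (exit signal SIGCHLD, no other flags) maps to the fork event, vfork maps to the vfork event, and a thread-style clone whose exit signal is not SIGCHLD maps to the clone event.</p>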
<p>do_fork then calls copy_process:</p>
<div class="highlight"><pre><span></span><code><span class="n">p</span> <span class="o">=</span> <span class="n">copy_process</span><span class="p">(</span><span class="n">clone_flags</span><span class="p">,</span> <span class="n">stack_start</span><span class="p">,</span> <span class="n">regs</span><span class="p">,</span> <span class="n">stack_size</span><span class="p">,</span>
<span class="n">child_tidptr</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="n">trace</span><span class="p">);</span>
</code></pre></div>
<h4>2.5 copy_process</h4>
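<p>copy_process is also defined in kernel/fork.c in the kernel source tree. It duplicates the parent process to create the new process, i.e. the child. copy_process copies the registers and, depending on each clone flag, either copies the relevant parts of the parent's environment or shares them with the parent.</p>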
<div class="highlight"><pre><span></span><code><span class="k">static</span> <span class="k">struct</span> <span class="nc">task_struct</span> <span class="o">*</span><span class="n">copy_process</span><span class="p">(</span><span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">clone_flags</span><span class="p">,</span>
<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">stack_start</span><span class="p">,</span>
<span class="k">struct</span> <span class="nc">pt_regs</span> <span class="o">*</span><span class="n">regs</span><span class="p">,</span>
<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">stack_size</span><span class="p">,</span>
<span class="kt">int</span> <span class="n">__user</span> <span class="o">*</span><span class="n">child_tidptr</span><span class="p">,</span>
<span class="k">struct</span> <span class="nc">pid</span> <span class="o">*</span><span class="n">pid</span><span class="p">,</span>
<span class="kt">int</span> <span class="n">trace</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">retval</span><span class="p">;</span>
<span class="k">struct</span> <span class="nc">task_struct</span> <span class="o">*</span><span class="n">p</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">cgroup_callbacks_done</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
</code></pre></div>
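<p>As the parameter list and the call in do_fork show, copy_process receives all of do_fork's arguments except parent_tidptr, plus two more: pid, and trace, which says which ptrace event (if any) should be reported for the child. At the start of the function an uninitialized task_struct pointer p is declared.</p>
<p>copy_process also validates the clone flags:</p>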
<div class="highlight"><pre><span></span><code><span class="k">if</span> <span class="p">((</span><span class="n">clone_flags</span> <span class="o">&</span> <span class="p">(</span><span class="n">CLONE_NEWNS</span><span class="o">|</span><span class="n">CLONE_FS</span><span class="p">))</span> <span class="o">==</span> <span class="p">(</span><span class="n">CLONE_NEWNS</span><span class="o">|</span><span class="n">CLONE_FS</span><span class="p">))</span>
<span class="k">return</span> <span class="n">ERR_PTR</span><span class="p">(</span><span class="o">-</span><span class="n">EINVAL</span><span class="p">);</span>
<span class="k">if</span> <span class="p">((</span><span class="n">clone_flags</span> <span class="o">&</span> <span class="n">CLONE_THREAD</span><span class="p">)</span> <span class="o">&&</span> <span class="o">!</span><span class="p">(</span><span class="n">clone_flags</span> <span class="o">&</span> <span class="n">CLONE_SIGHAND</span><span class="p">))</span>
<span class="k">return</span> <span class="n">ERR_PTR</span><span class="p">(</span><span class="o">-</span><span class="n">EINVAL</span><span class="p">);</span>
<span class="k">if</span> <span class="p">((</span><span class="n">clone_flags</span> <span class="o">&</span> <span class="n">CLONE_SIGHAND</span><span class="p">)</span> <span class="o">&&</span> <span class="o">!</span><span class="p">(</span><span class="n">clone_flags</span> <span class="o">&</span> <span class="n">CLONE_VM</span><span class="p">))</span>
<span class="k">return</span> <span class="n">ERR_PTR</span><span class="p">(</span><span class="o">-</span><span class="n">EINVAL</span><span class="p">);</span>
<span class="k">if</span> <span class="p">((</span><span class="n">clone_flags</span> <span class="o">&</span> <span class="n">CLONE_PARENT</span><span class="p">)</span> <span class="o">&&</span>
<span class="n">current</span><span class="o">-></span><span class="n">signal</span><span class="o">-></span><span class="n">flags</span> <span class="o">&</span> <span class="n">SIGNAL_UNKILLABLE</span><span class="p">)</span>
<span class="k">return</span> <span class="n">ERR_PTR</span><span class="p">(</span><span class="o">-</span><span class="n">EINVAL</span><span class="p">);</span>
</code></pre></div>
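<p>Read together, these checks encode dependencies between the flags: CLONE_THREAD requires CLONE_SIGHAND, which in turn requires CLONE_VM, and CLONE_NEWNS cannot be combined with CLONE_FS. The fourth check refuses CLONE_PARENT for an unkillable process such as init, because the resulting sibling of init would never be reaped. The user-space sketch below restates the first three checks so they can be tried out; validate_clone_flags is a hypothetical name, and the fallback constant values are assumed from the kernel header include/uapi/linux/sched.h.</p>

```c
#define _GNU_SOURCE
#include <errno.h>    /* EINVAL */
#include <sched.h>    /* CLONE_* constants (GNU extension) */

/* Fallbacks, assumed from include/uapi/linux/sched.h. */
#ifndef CLONE_VM
#define CLONE_VM      0x00000100UL
#endif
#ifndef CLONE_FS
#define CLONE_FS      0x00000200UL
#endif
#ifndef CLONE_FILES
#define CLONE_FILES   0x00000400UL
#endif
#ifndef CLONE_SIGHAND
#define CLONE_SIGHAND 0x00000800UL
#endif
#ifndef CLONE_THREAD
#define CLONE_THREAD  0x00010000UL
#endif
#ifndef CLONE_NEWNS
#define CLONE_NEWNS   0x00020000UL
#endif

/* Hypothetical user-space copy of the first three checks in copy_process.
 * Returns 0 if the combination is accepted, -EINVAL otherwise. */
int validate_clone_flags(unsigned long clone_flags)
{
    /* a new mount namespace may not share root/cwd (fs_struct) with the parent */
    if ((clone_flags & (CLONE_NEWNS | CLONE_FS)) == (CLONE_NEWNS | CLONE_FS))
        return -EINVAL;
    /* a thread in the same thread group must share signal handlers */
    if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
        return -EINVAL;
    /* handlers are code addresses, so sharing them needs a shared address space */
    if ((clone_flags & CLONE_SIGHAND) && !(clone_flags & CLONE_VM))
        return -EINVAL;
    return 0;
}
```

<p>A plain fork passes none of these sharing flags and is accepted, and the flag set used for threads (CLONE_VM, CLONE_FS, CLONE_FILES, CLONE_SIGHAND and CLONE_THREAD, among others) satisfies each dependency.</p>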
<p>copy_process then calls a series of helper functions, beginning with dup_task_struct:</p>
<div class="highlight"><pre><span></span><code><span class="n">p</span> <span class="o">=</span> <span class="n">dup_task_struct</span><span class="p">(</span><span class="n">current</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">p</span><span class="p">)</span>
<span class="k">goto</span> <span class="n">fork_out</span><span class="p">;</span>
</code></pre></div>
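<p>dup_task_struct allocates a new kernel stack, thread_info structure and task_struct for the new process. thread_info is a fairly small structure that mainly holds a pointer to the process's task_struct and some other low-level data. The new structures are filled with the values of the current process, so at this point the parent's and the child's process descriptors are identical. current is a macro that yields the descriptor of the process currently executing, i.e. the process that made the system call, which is the parent.</p>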
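<p>The "copy everything, then reset what must not be shared" pattern can be illustrated with a toy descriptor. In the sketch below, struct toy_task, struct list_node and the helper functions are hypothetical simplifications (the real task_struct has far more fields and uses the kernel's list_head). It shows why a verbatim copy is not enough on its own: the child's copied children list head still points into the parent's list, which is why copy_process later reinitializes children and sibling with INIT_LIST_HEAD.</p>

```c
#include <stdbool.h>

/* Hypothetical circular doubly linked list in the style of list_head. */
struct list_node { struct list_node *next, *prev; };

static void init_list_head(struct list_node *h) { h->next = h; h->prev = h; }
static bool list_empty(const struct list_node *h) { return h->next == h; }
static void list_add(struct list_node *n, struct list_node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Hypothetical, drastically reduced stand-in for task_struct. */
struct toy_task {
    int prio;
    struct list_node children;   /* head of this task's list of children */
    struct list_node sibling;    /* link in the parent's children list */
};

/* Returns true if the copy-then-reset sequence behaves as described. */
bool toy_fork_demo(void)
{
    struct toy_task parent = { .prio = 120 };
    struct toy_task older = { .prio = 120 };   /* an existing child of parent */
    struct toy_task child;

    init_list_head(&parent.children);
    list_add(&older.sibling, &parent.children);

    /* like dup_task_struct: copy the whole descriptor */
    child = parent;
    bool same_prio = child.prio == parent.prio;
    /* the copied list head still points at the parent's existing child */
    bool aliased = child.children.next == &older.sibling;

    /* like copy_process: give the child its own, empty lists */
    init_list_head(&child.children);
    init_list_head(&child.sibling);

    return same_prio && aliased && list_empty(&child.children)
        && parent.children.next == &older.sibling;
}
```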
<div class="highlight"><pre><span></span><code><span class="k">if</span> <span class="p">(</span><span class="n">atomic_read</span><span class="p">(</span><span class="o">&</span><span class="n">p</span><span class="o">-></span><span class="n">real_cred</span><span class="o">-></span><span class="n">user</span><span class="o">-></span><span class="n">processes</span><span class="p">)</span> <span class="o">>=</span>
<span class="n">task_rlimit</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">RLIMIT_NPROC</span><span class="p">))</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">capable</span><span class="p">(</span><span class="n">CAP_SYS_ADMIN</span><span class="p">)</span> <span class="o">&&</span> <span class="o">!</span><span class="n">capable</span><span class="p">(</span><span class="n">CAP_SYS_RESOURCE</span><span class="p">)</span> <span class="o">&&</span>
<span class="n">p</span><span class="o">-></span><span class="n">real_cred</span><span class="o">-></span><span class="n">user</span> <span class="o">!=</span> <span class="n">INIT_USER</span><span class="p">)</span>
<span class="k">goto</span> <span class="n">bad_fork_free</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div>
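<p>With the core data structures in place, the kernel checks whether the new process would exceed the user's process limit (RLIMIT_NPROC). If the user already owns at least that many processes, and the caller has neither CAP_SYS_ADMIN nor CAP_SYS_RESOURCE and is not root (INIT_USER), copy_process jumps to the bad_fork_free error path.</p>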
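<p>The limit used in this check is the same RLIMIT_NPROC that user space can query with getrlimit(2); it is also the "max processes" row of /proc/[pid]/limits. A minimal sketch, where get_nproc_limit is a hypothetical helper name:</p>

```c
#include <sys/resource.h>

/* Hypothetical helper: reads RLIMIT_NPROC for the calling process.
 * task_rlimit(p, RLIMIT_NPROC) in copy_process corresponds to the soft
 * limit (rlim_cur). Returns 0 on success, -1 on error. */
int get_nproc_limit(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NPROC, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;   /* compared with the user's process count */
    *hard = rl.rlim_max;   /* ceiling an unprivileged process may raise soft to */
    return 0;
}
```

<p>On the machine used later in this article, the limits file reports 1024 (soft) and 31683 (hard). Note that the count compared against the limit is kept per user, not per process tree.</p>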
<div class="highlight"><pre><span></span><code><span class="n">p</span><span class="o">-></span><span class="n">did_exec</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">delayacct_tsk_init</span><span class="p">(</span><span class="n">p</span><span class="p">);</span> <span class="cm">/* Must remain after dup_task_struct() */</span>
<span class="n">copy_flags</span><span class="p">(</span><span class="n">clone_flags</span><span class="p">,</span> <span class="n">p</span><span class="p">);</span>
<span class="n">INIT_LIST_HEAD</span><span class="p">(</span><span class="o">&</span><span class="n">p</span><span class="o">-></span><span class="n">children</span><span class="p">);</span>
<span class="n">INIT_LIST_HEAD</span><span class="p">(</span><span class="o">&</span><span class="n">p</span><span class="o">-></span><span class="n">sibling</span><span class="p">);</span>
<span class="n">rcu_copy_process</span><span class="p">(</span><span class="n">p</span><span class="p">);</span>
<span class="n">p</span><span class="o">-></span><span class="n">vfork_done</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="n">spin_lock_init</span><span class="p">(</span><span class="o">&</span><span class="n">p</span><span class="o">-></span><span class="n">alloc_lock</span><span class="p">);</span>
<span class="n">init_sigpending</span><span class="p">(</span><span class="o">&</span><span class="n">p</span><span class="o">-></span><span class="n">pending</span><span class="p">);</span>
</code></pre></div>
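<p>This code first sets did_exec in the child's descriptor p to 0. The field records whether the process has called execve; it is used, for example, by setpgid, which refuses to change the process group of a child that has already exec'd. Because the child is not identical to the parent, parts of its descriptor are then cleared and reset to initial values: as shown above, the children and sibling lists, the set of pending signals and other fields are initialized. The code also calls copy_flags, shown below:</p>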
<div class="highlight"><pre><span></span><code><span class="k">static</span> <span class="kt">void</span> <span class="nf">copy_flags</span><span class="p">(</span><span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">clone_flags</span><span class="p">,</span> <span class="k">struct</span> <span class="nc">task_struct</span> <span class="o">*</span><span class="n">p</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">new_flags</span> <span class="o">=</span> <span class="n">p</span><span class="o">-></span><span class="n">flags</span><span class="p">;</span>
<span class="n">new_flags</span> <span class="o">&=</span> <span class="o">~</span><span class="p">(</span><span class="n">PF_SUPERPRIV</span> <span class="o">|</span> <span class="n">PF_WQ_WORKER</span><span class="p">);</span>
<span class="n">new_flags</span> <span class="o">|=</span> <span class="n">PF_FORKNOEXEC</span><span class="p">;</span>
<span class="n">p</span><span class="o">-></span><span class="n">flags</span> <span class="o">=</span> <span class="n">new_flags</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div>
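<p>copy_flags updates the flags of the new child. It clears PF_SUPERPRIV, which records that the process has used superuser privileges, and PF_WQ_WORKER, which marks a workqueue worker thread, so neither is inherited. It then sets PF_FORKNOEXEC, which indicates that the process has been forked but has not yet called exec.</p>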
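<p>The effect of copy_flags on the flag word can be checked with plain bit arithmetic. In this sketch the PF_* values are assumptions taken from include/linux/sched.h of the 3.x kernels discussed here (they are kernel-internal, not a stable ABI), and child_flags is a hypothetical name:</p>

```c
/* Assumed values (3.x include/linux/sched.h); kernel-internal, may change. */
#define PF_WQ_WORKER  0x00000020UL  /* workqueue worker thread */
#define PF_FORKNOEXEC 0x00000040UL  /* forked but has not exec'd yet */
#define PF_SUPERPRIV  0x00000100UL  /* used superuser privileges */

/* Same bit operations as copy_flags, applied to a copy of the parent's flags. */
unsigned long child_flags(unsigned long parent_flags)
{
    unsigned long new_flags = parent_flags;

    new_flags &= ~(PF_SUPERPRIV | PF_WQ_WORKER);  /* not inherited */
    new_flags |= PF_FORKNOEXEC;                    /* cleared again by exec */
    return new_flags;
}
```

<p>All other bits pass through unchanged, so apart from these three the child inherits the parent's process flags.</p>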
<div class="highlight"><pre><span></span><code><span class="n">retval</span> <span class="o">=</span> <span class="n">perf_event_init_task</span><span class="p">(</span><span class="n">p</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">retval</span><span class="p">)</span>
<span class="k">goto</span> <span class="n">bad_fork_cleanup_policy</span><span class="p">;</span>
<span class="n">retval</span> <span class="o">=</span> <span class="n">audit_alloc</span><span class="p">(</span><span class="n">p</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">retval</span><span class="p">)</span>
<span class="k">goto</span> <span class="n">bad_fork_cleanup_policy</span><span class="p">;</span>
<span class="cm">/* copy all the process information */</span>
<span class="n">retval</span> <span class="o">=</span> <span class="n">copy_semundo</span><span class="p">(</span><span class="n">clone_flags</span><span class="p">,</span> <span class="n">p</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">retval</span><span class="p">)</span>
<span class="k">goto</span> <span class="n">bad_fork_cleanup_audit</span><span class="p">;</span>
<span class="n">retval</span> <span class="o">=</span> <span class="n">copy_files</span><span class="p">(</span><span class="n">clone_flags</span><span class="p">,</span> <span class="n">p</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">retval</span><span class="p">)</span>
<span class="k">goto</span> <span class="n">bad_fork_cleanup_semundo</span><span class="p">;</span>
<span class="n">retval</span> <span class="o">=</span> <span class="n">copy_fs</span><span class="p">(</span><span class="n">clone_flags</span><span class="p">,</span> <span class="n">p</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">retval</span><span class="p">)</span>
<span class="k">goto</span> <span class="n">bad_fork_cleanup_files</span><span class="p">;</span>
<span class="n">retval</span> <span class="o">=</span> <span class="n">copy_sighand</span><span class="p">(</span><span class="n">clone_flags</span><span class="p">,</span> <span class="n">p</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">retval</span><span class="p">)</span>
<span class="k">goto</span> <span class="n">bad_fork_cleanup_fs</span><span class="p">;</span>
<span class="n">retval</span> <span class="o">=</span> <span class="n">copy_signal</span><span class="p">(</span><span class="n">clone_flags</span><span class="p">,</span> <span class="n">p</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">retval</span><span class="p">)</span>
<span class="k">goto</span> <span class="n">bad_fork_cleanup_sighand</span><span class="p">;</span>
<span class="n">retval</span> <span class="o">=</span> <span class="n">copy_mm</span><span class="p">(</span><span class="n">clone_flags</span><span class="p">,</span> <span class="n">p</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">retval</span><span class="p">)</span>
<span class="k">goto</span> <span class="n">bad_fork_cleanup_signal</span><span class="p">;</span>
<span class="n">retval</span> <span class="o">=</span> <span class="n">copy_namespaces</span><span class="p">(</span><span class="n">clone_flags</span><span class="p">,</span> <span class="n">p</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">retval</span><span class="p">)</span>
<span class="k">goto</span> <span class="n">bad_fork_cleanup_mm</span><span class="p">;</span>
<span class="n">retval</span> <span class="o">=</span> <span class="n">copy_io</span><span class="p">(</span><span class="n">clone_flags</span><span class="p">,</span> <span class="n">p</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">retval</span><span class="p">)</span>
<span class="k">goto</span> <span class="n">bad_fork_cleanup_namespaces</span><span class="p">;</span>
<span class="n">retval</span> <span class="o">=</span> <span class="n">copy_thread</span><span class="p">(</span><span class="n">clone_flags</span><span class="p">,</span> <span class="n">stack_start</span><span class="p">,</span> <span class="n">stack_size</span><span class="p">,</span> <span class="n">p</span><span class="p">,</span> <span class="n">regs</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">retval</span><span class="p">)</span>
<span class="k">goto</span> <span class="n">bad_fork_cleanup_io</span><span class="p">;</span>
</code></pre></div>
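<p>Depending on clone_flags, the code above either shares or copies the parent's open files, filesystem information, signal handlers, address space, namespaces and other resources. These resources are normally shared only among the threads of one process. fork passes none of the CLONE_* sharing flags, so the child gets its own copy of the file descriptor table, the filesystem information, the signal handlers and the address space (the memory itself stays shared copy-on-write until one of the processes writes to it); namespaces, by contrast, remain shared unless a CLONE_NEW* flag is given.</p>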
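<p>The effect of copy_mm for fork can be observed from user space: a write the child makes to a global variable is not visible to the parent, because without CLONE_VM the child has its own address space. A minimal sketch; the function and variable names are hypothetical:</p>

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 10;   /* same virtual address in parent and child */

/* Forks a child that overwrites counter, waits for it, and returns the
 * value the parent still sees (or -1 on error). */
int value_after_child_writes(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        counter = 99;                /* write goes to the child's private copy */
        _exit(counter == 99 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)
        || WEXITSTATUS(status) != 0)
        return -1;
    return counter;                  /* unchanged in the parent */
}
```

<p>If the child were instead created with clone() and CLONE_VM, as threads are, both would operate on the same memory and the parent would see the new value.</p>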
<div class="highlight"><pre><span></span><code><span class="k">if</span> <span class="p">(</span><span class="n">pid</span> <span class="o">!=</span> <span class="o">&</span><span class="n">init_struct_pid</span><span class="p">)</span> <span class="p">{</span>
<span class="n">retval</span> <span class="o">=</span> <span class="o">-</span><span class="n">ENOMEM</span><span class="p">;</span>
<span class="n">pid</span> <span class="o">=</span> <span class="n">alloc_pid</span><span class="p">(</span><span class="n">p</span><span class="o">-></span><span class="n">nsproxy</span><span class="o">-></span><span class="n">pid_ns</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">pid</span><span class="p">)</span>
<span class="k">goto</span> <span class="n">bad_fork_cleanup_io</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div>
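<p>When do_fork calls copy_process it passes NULL for pid, which is not the address of init_struct_pid, so the child has no PID yet at this point. The code above therefore allocates one with alloc_pid, in the child's PID namespace (p->nsproxy->pid_ns).</p>
<p>Finally, copy_process performs the remaining setup and returns a pointer to the new child's task_struct to do_fork.</p>
<h4>2.6 Back to do_fork</h4>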
<div class="highlight"><pre><span></span><code><span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">IS_ERR</span><span class="p">(</span><span class="n">p</span><span class="p">))</span> <span class="p">{</span>
<span class="p">...</span>
<span class="n">wake_up_new_task</span><span class="p">(</span><span class="n">p</span><span class="p">);</span>
<span class="cm">/* forking complete and child started to run, tell ptracer */</span>
<span class="k">if</span> <span class="p">(</span><span class="n">unlikely</span><span class="p">(</span><span class="n">trace</span><span class="p">))</span>
<span class="n">ptrace_event</span><span class="p">(</span><span class="n">trace</span><span class="p">,</span> <span class="n">nr</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">clone_flags</span> <span class="o">&</span> <span class="n">CLONE_VFORK</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">wait_for_vfork_done</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="o">&</span><span class="n">vfork</span><span class="p">))</span>
<span class="n">ptrace_event</span><span class="p">(</span><span class="n">PTRACE_EVENT_VFORK_DONE</span><span class="p">,</span> <span class="n">nr</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div>
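<p>Back in do_fork, if copy_process succeeded, wake_up_new_task wakes the newly created child so that it can be scheduled. If a tracer asked for it, the corresponding ptrace event is then reported. For vfork (CLONE_VFORK), the parent additionally waits until the child has released the borrowed address space by calling exec or exiting. At this point the fork call has completed successfully.</p>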
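<p>From user space, the outcome of this sequence shows up in fork's two return values: the parent receives the PID that alloc_pid assigned to the child, the child receives 0, and the child's parent PID is the caller. The sketch below checks this through a pipe; fork_pids_consistent is a hypothetical name:</p>

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the PID fork returned to the parent equals the child's
 * getpid() and the child's getppid() equals the parent's getpid();
 * 0 if not; -1 on error. */
int fork_pids_consistent(void)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t parent = getpid();
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                       /* child: fork returned 0 */
        pid_t ids[2] = { getpid(), getppid() };
        close(fds[0]);
        ssize_t n = write(fds[1], ids, sizeof ids);
        _exit(n == (ssize_t)sizeof ids ? 0 : 1);
    }

    /* parent: fork returned the child's PID */
    pid_t ids[2];
    close(fds[1]);
    ssize_t n = read(fds[0], ids, sizeof ids);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    if (n != (ssize_t)sizeof ids)
        return -1;
    return ids[0] == pid && ids[1] == parent;
}
```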
<h3>3 Analysis of an Example Program</h3>
<p>In this part I analyze a concrete example program.</p>
<h4>3.1 Example code</h4>
<div class="highlight"><pre><span></span><code><span class="cp">#include</span> <span class="cpf"><stdio.h></span><span class="cp"></span>
<span class="cp">#include</span> <span class="cpf"><stdlib.h></span><span class="cp"></span>
<span class="cp">#include</span> <span class="cpf"><unistd.h></span><span class="cp"></span>
<span class="cp">#include</span> <span class="cpf"><sys/types.h></span><span class="cp"></span>
<span class="cp">#define LEN 1024 * 1024</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span> <span class="o">**</span><span class="n">argv</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">pid_t</span> <span class="n">pid</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">num</span> <span class="o">=</span> <span class="mi">10</span><span class="p">,</span> <span class="n">i</span><span class="p">;</span>
<span class="kt">char</span> <span class="o">*</span><span class="n">p</span><span class="p">;</span>
<span class="n">p</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="n">LEN</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">char</span><span class="p">));</span>
<span class="n">pid</span> <span class="o">=</span> <span class="n">fork</span><span class="p">();</span>
<span class="k">if</span> <span class="p">(</span><span class="n">pid</span> <span class="o">></span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="cm">/*parent process.*/</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"parent %d process get %d!It stores in %x.</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span>
<span class="n">getpid</span><span class="p">(),</span> <span class="n">num</span><span class="p">,</span> <span class="o">&</span><span class="n">num</span><span class="p">);</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"parent have a piece of memory start from %x.</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span>
<span class="n">p</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="cm">/*child process.*/</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"child %d process get %d!It stores in %x.</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span>
<span class="n">getpid</span><span class="p">(),</span> <span class="n">num</span><span class="p">,</span> <span class="o">&</span><span class="n">num</span><span class="p">);</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"child have a piece of memory start from %x.</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span>
<span class="n">p</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">while</span><span class="p">(</span><span class="mi">1</span><span class="p">){}</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div>
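<p>This program calls fork once to create a child; the parent and the child then each print the value and address of num and the start address of the block allocated with malloc. The infinite while loop at the end keeps both processes alive so that their process information can be inspected.</p>
<h4>3.2 Analysis</h4>
<p>A screenshot of the program's output:</p>
<p><img alt="Execution result" src="../images/linux-fork-analysis.webp"></p>
<p>By testing the value that fork returned in pid, we make the parent and the child execute different code.</p>
<p>Running ps aux | grep a.out gives the PIDs of the parent and the child:</p>
<div class="highlight"><pre><span></span><code>$ ps aux <span class="p">|</span> grep a.out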
tonychow <span class="m">32261</span> <span class="m">93</span>.8 <span class="m">0</span>.0 <span class="m">3056</span> <span class="m">272</span> pts/1 R+ <span class="m">10</span>:57 <span class="m">4</span>:11 ./a.out
tonychow <span class="m">32262</span> <span class="m">93</span>.3 <span class="m">0</span>.0 <span class="m">3056</span> <span class="m">52</span> pts/1 R+ <span class="m">10</span>:57 <span class="m">4</span>:10 ./a.out
</code></pre></div>
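<p>While a process is alive, the kernel exposes information about it in the /proc/[pid] directory, and we can read these files to analyze the process. Using the PIDs above, ls -al /proc/32261 lists the contents of the parent's directory:</p>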
<div class="highlight"><pre><span></span><code><span class="nv">$ls</span> -al /proc/32261
总用量 <span class="m">0</span>
dr-xr-xr-x <span class="m">8</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">10</span>:59 .
dr-xr-xr-x <span class="m">267</span> root root <span class="m">0</span> 5月 <span class="m">31</span> <span class="m">12</span>:18 ..
dr-xr-xr-x <span class="m">2</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 attr
-rw-r--r-- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 autogroup
-r-------- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 auxv
-r--r--r-- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 cgroup
--w------- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 clear_refs
-r--r--r-- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:02 cmdline
-rw-r--r-- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 comm
-rw-r--r-- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 coredump_filter
-r--r--r-- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 cpuset
lrwxrwxrwx <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 cwd -> /home/tonychow/code/c/fork-analysis
-r-------- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:03 environ
lrwxrwxrwx <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 exe -> /home/tonychow/code/c/fork-analysis/a.out
dr-x------ <span class="m">2</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">10</span>:59 fd
dr-x------ <span class="m">2</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 fdinfo
-r-------- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 io
-r--r--r-- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 latency
-r--r--r-- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 limits
-rw-r--r-- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 loginuid
-r--r--r-- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 maps
-rw------- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 mem
-r--r--r-- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 mountinfo
-r--r--r-- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 mounts
-r-------- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 mountstats
dr-xr-xr-x <span class="m">6</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 net
dr-x--x--x <span class="m">2</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 ns
-rw-r--r-- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 oom_adj
-r--r--r-- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 oom_score
-rw-r--r-- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 oom_score_adj
-r--r--r-- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 pagemap
-r--r--r-- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 personality
lrwxrwxrwx <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 root -> /
-rw-r--r-- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 sched
-r--r--r-- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 schedstat
-r--r--r-- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 sessionid
-r--r--r-- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 smaps
-r--r--r-- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 stack
-r--r--r-- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:02 stat
-r--r--r-- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 statm
-r--r--r-- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:02 status
-r--r--r-- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 syscall
dr-xr-xr-x <span class="m">3</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 task
-r--r--r-- <span class="m">1</span> tonychow tonychow <span class="m">0</span> 6月 <span class="m">27</span> <span class="m">11</span>:06 wchan
</code></pre></div>
<p>The listing contains a set of information files covering the process's status, I/O, limits, open files, namespaces and other resources. The status information of each of the two processes:</p>
<div class="highlight"><pre><span></span><code>$ cat /proc/32261/status
Name: a.out
State: R <span class="o">(</span>running<span class="o">)</span>
Tgid: <span class="m">32261</span>
Pid: <span class="m">32261</span>
PPid: <span class="m">12747</span>
...
<span class="nv">$cat</span> /proc/32262/status
Name: a.out
State: R <span class="o">(</span>running<span class="o">)</span>
Tgid: <span class="m">32262</span>
Pid: <span class="m">32262</span>
PPid: <span class="m">32261</span>
...
</code></pre></div>
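<p>Both processes are in the running state, and process 32261 is the parent of process 32262 (see its PPid field). Next, let us look at the memory mappings of the parent:</p>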
<div class="highlight"><pre><span></span><code><span class="nv">$cat</span> /proc/32261/maps
<span class="m">08048000</span>-08049000 r-xp <span class="m">00000000</span> fd:02 <span class="m">20979068</span> /home/tonychow/code/c/fork-analysis/a.out
<span class="m">08049000</span>-0804a000 rw-p <span class="m">00000000</span> fd:02 <span class="m">20979068</span> /home/tonychow/code/c/fork-analysis/a.out
4b94d000-4b96c000 r-xp <span class="m">00000000</span> fd:01 <span class="m">793014</span> /usr/lib/ld-2.15.so
4b96c000-4b96d000 r--p 0001e000 fd:01 <span class="m">793014</span> /usr/lib/ld-2.15.so
4b96d000-4b96e000 rw-p 0001f000 fd:01 <span class="m">793014</span> /usr/lib/ld-2.15.so
4b970000-4bb1b000 r-xp <span class="m">00000000</span> fd:01 <span class="m">809017</span> /usr/lib/libc-2.15.so
4bb1b000-4bb1c000 ---p 001ab000 fd:01 <span class="m">809017</span> /usr/lib/libc-2.15.so
4bb1c000-4bb1e000 r--p 001ab000 fd:01 <span class="m">809017</span> /usr/lib/libc-2.15.so
4bb1e000-4bb1f000 rw-p 001ad000 fd:01 <span class="m">809017</span> /usr/lib/libc-2.15.so
4bb1f000-4bb22000 rw-p <span class="m">00000000</span> <span class="m">00</span>:00 <span class="m">0</span>
b76a4000-b77a6000 rw-p <span class="m">00000000</span> <span class="m">00</span>:00 <span class="m">0</span>
b77be000-b77c0000 rw-p <span class="m">00000000</span> <span class="m">00</span>:00 <span class="m">0</span>
b77c0000-b77c1000 r-xp <span class="m">00000000</span> <span class="m">00</span>:00 <span class="m">0</span> <span class="o">[</span>vdso<span class="o">]</span>
bf92a000-bf94b000 rw-p <span class="m">00000000</span> <span class="m">00</span>:00 <span class="m">0</span> <span class="o">[</span>stack<span class="o">]</span>
</code></pre></div>
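<p>Taken together with the program's output above, this shows that the int variable num is stored on the stack, while the memory obtained through malloc lives on the heap.</p>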
<div class="highlight"><pre><span></span><code>$ ls -l /proc/32261/fd
total <span class="m">0</span>
lrwx------ <span class="m">1</span> tonychow tonychow <span class="m">64</span> Jun <span class="m">27</span> <span class="m">10</span>:59 <span class="m">0</span> -> /dev/pts/1
lrwx------ <span class="m">1</span> tonychow tonychow <span class="m">64</span> Jun <span class="m">27</span> <span class="m">10</span>:59 <span class="m">1</span> -> /dev/pts/1
lrwx------ <span class="m">1</span> tonychow tonychow <span class="m">64</span> Jun <span class="m">27</span> <span class="m">10</span>:59 <span class="m">2</span> -> /dev/pts/1
</code></pre></div>
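<p>Listing the process's file descriptors shows the three standard ones: standard input (0), standard output (1) and standard error (2). All three point to the terminal /dev/pts/1.</p>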
<div class="highlight"><pre><span></span><code>$ cat /proc/32261/limits
limit                     soft limit           hard limit           units
max cpu <span class="nb">time</span>              unlimited            unlimited            seconds
max file size             unlimited            unlimited            bytes
max data size             unlimited            unlimited            bytes
max stack size            <span class="m">8388608</span>              unlimited            bytes
max core file size        <span class="m">0</span>                    unlimited            bytes
max resident <span class="nb">set</span>          unlimited            unlimited            bytes
max processes             <span class="m">1024</span>                 <span class="m">31683</span>                processes
max open files            <span class="m">1024</span>                 <span class="m">4096</span>                 files
max locked memory         <span class="m">65536</span>                <span class="m">65536</span>                bytes
max address space         unlimited            unlimited            bytes
max file locks            unlimited            unlimited            locks
max pending signals       <span class="m">31683</span>                <span class="m">31683</span>                signals
max msgqueue size         <span class="m">819200</span>               <span class="m">819200</span>               bytes
max nice priority         <span class="m">0</span>                    <span class="m">0</span>
max realtime priority     <span class="m">0</span>                    <span class="m">0</span>
max realtime timeout      unlimited            unlimited            us
</code></pre></div>
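<p>The command cat /proc/32261/limits shows the resource limits attached to this process. Limits are set per process and inherited by its children. They cover CPU time, maximum file size, maximum stack size, number of processes, number of open files, maximum address space and more. Each limit has a soft value and a hard value. The soft value is the one actually enforced. The hard value is the ceiling up to which an unprivileged process may raise its soft value.</p>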
<h2>4 Summary</h2>
<p>This analysis of the fork system call in Linux leads to the following conclusions:</p>
<ul>
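<li>fork is one of the most important ways of creating a process in Linux. In the kernel, fork, vfork and clone are all handled by the same function, do_fork() in kernel/fork.c, so processes and threads are created through the same path. glibc's fork() wrapper is itself implemented on top of the clone system call;</li>
<li>In Linux, the threads of a process are really child tasks that share most of their creator's resources. The kernel controls what is shared through clone_flags: for example, CLONE_VM shares the address space, CLONE_FILES the file descriptor table and CLONE_SIGHAND the signal handlers;</li>
<li>Linux is, in the end, just software, but an enormously complex piece of software. Its source tree is clearly divided into subsystems. Once you get down to individual functions, though, a newcomer can find it hard to tell what a parameter means when there are no comments. That is when a search engine becomes indispensable.</li>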
</ul>
<h2>5 Main References</h2>
<ul>
<li>Robert Love, Linux System Programming (Chinese edition), Southeast University Press</li>
<li>Robert Love, Linux Kernel Development (Chinese edition), China Machine Press</li>
<li>W. Richard Stevens, Advanced Programming in the UNIX Environment (Chinese edition), Posts &amp; Telecom Press</li>
<li><a href="http://zh.wikipedia.org/wiki/%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F">Wikipedia (Chinese): Operating system</a></li>
<li><a href="http://zh.wikipedia.org/wiki/%E8%A1%8C%E7%A8%8B">Wikipedia (Chinese): Process</a></li>
<li><a href="https://zh.wikipedia.org/wiki/%E7%B3%BB%E7%BB%9F%E8%B0%83%E7%94%A8">Wikipedia (Chinese): System call</a></li>
<li><a href="http://zh.wikipedia.org/zh-cn/%E8%A1%8C%E7%A8%8B%E6%8E%A7%E5%88%B6%E8%A1%A8">Wikipedia (Chinese): Process control block</a></li>
<li><a href="http://www.quora.com/Linux-Kernel/After-a-fork-where-exactly-does-the-childs-execution-start">Robert Love's answer on Quora about fork</a></li>
</ul>
</div>
<div class="tag-cloud">
<p>
<a href="https://blog.tonychow.me/tag/linux.html">linux</a>
<a href="https://blog.tonychow.me/tag/system-call.html">system-call</a>
<a href="https://blog.tonychow.me/tag/fork.html">fork</a>
<a href="https://blog.tonychow.me/tag/source-reading.html">source-reading</a>
</p>
</div>
</article>
<footer>
<p>
© 2017 - This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/deed.en_US" target="_blank">Creative Commons Attribution-ShareAlike</a>
</p>
<p>
Built with <a href="http://getpelican.com" target="_blank">Pelican</a> using <a href="http://bit.ly/flex-pelican" target="_blank">Flex</a> theme
</p><p>
<a rel="license"
href="http://creativecommons.org/licenses/by-sa/4.0/"
target="_blank">
<img alt="Creative Commons License"
title="Creative Commons License"
style="border-width:0"
src="https://i.creativecommons.org/l/by-sa/4.0/80x15.png"
width="80"
height="15"/>
</a>
</p> </footer>
</main>
<script type="application/ld+json">
{
"@context" : "http://schema.org",
"@type" : "Blog",
"name": " Tonychow's Blog ",
"url" : "https://blog.tonychow.me",
"image": "/images/avatar.jpg",
"description": "tonychow's Thoughts and Writings"
}
</script>
</body>
</html>