include(`macros.m4')
\pagebreak
\pdfbookmark[0]{process manipulation, program execution}{processes}
\begin{slide}
\sltitle{Contents}
\slidecontents{4}
\end{slide}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
ifdef([[[NOSPELLCHECK]]], [[[
\pdfbookmark[1]{text/data/bss/stack/heap}{procmem}
]]])
\begin{slide}
\sltitle{Process memory layout in userspace}
\begin{center}
\input{img/tex/mem_user.tex}
\end{center}
\end{slide}
\begin{itemize}
\item Each process has 3 basic segments (memory segments, not
hardware segments):
\begin{itemize}
\item text \dots{} program code
\item data \dots{} initialized variables
\item stack
\end{itemize}
\item \texttt{text} and \texttt{data} sections are stored in the executable file
\item The sections for initialized and non-initialized variables and the heap are
all considered data
\item It is also possible to connect segments of shared memory
(\texttt{shm\_open}) or files (\texttt{mmap}) into the address space.
\item The text is shared between all processes which execute the same code.
The data segment and stack are private for each process.
\item Each system can use a different layout of a process address space
(and typically it is indeed so). See the next slide which also shows
sections for \texttt{mmap} and \emph{heap}.
\item ifdef([[[NOSPELLCHECK]]], [[[\emph{bss}]]]) \dots{} non-initialized
variables (\texttt{bss} comes from the IBM 7090 assembler and stands for
\uv{block started by symbol}).
While the program is running, the \texttt{data}, \texttt{bss} and heap
sections (not shown in the picture) make up data segments of the process.
Heap size can be changed using the \texttt{brk} and \texttt{sbrk} system calls.
\item Note: non-initialized variables here mean variables with static storage,
i.e. global variables and variables declared as \texttt{static} (both inside and
outside of functions), that are not explicitly set to a value. All these
variables are automatically initialized with zeroes before the program is
started. Therefore it is not necessary to store their values in the binary. Once
one of these variables is given an initial value in the source code, it becomes
part of the data section stored in the executable file on disk.
\item \emph{(User) stack} \dots{} local non-static variables, function
parameters (on certain architectures in certain modes - e.g. 32-bit x86), return
addresses. Each process has 2 stacks -- one for a user mode and another for
kernel mode. The user stack automatically grows according to its use (except
for threads where each thread has its own limited stack).
\item \emph{User area (u-area)} \dots{} contains process information used by
the kernel which is not needed when the process is swapped out to disk
(number of open files, signal handling settings, number of shared memory segments,
program arguments, environment variables, current working directory, etc.).
This area is accessible only to the kernel which will see just the area
of a currently running process. The rest of the data needed even if the process
is not currently running or while swapped out to disk is stored in the
\texttt{proc} structure. \texttt{proc} structures for all processes are always
resident in memory and accessible in a kernel mode.
\end{itemize}
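The placement of variables described above can be observed with a small C
sketch. The names used here (\texttt{print\_layout} and the variables) are
hypothetical and chosen only for illustration; the printed addresses differ
between systems and, with address space randomization, between runs.

```c
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;    /* has an initializer: stored in data */
int uninitialized_global;       /* no initializer: bss, zeroed before start */

/* Print one representative address from each region of the address space. */
void print_layout(void)
{
    static int uninitialized_static;    /* bss as well, although function-local */
    int local = 0;                      /* lives on the user stack */
    int *heap_obj = malloc(sizeof (*heap_obj));

    /* Casting a function pointer to void * is a common extension,
     * it is not guaranteed by strict ISO C. */
    printf("text:  %p\n", (void *)print_layout);
    printf("data:  %p\n", (void *)&initialized_global);
    printf("bss:   %p %p\n", (void *)&uninitialized_global,
        (void *)&uninitialized_static);
    printf("heap:  %p\n", (void *)heap_obj);
    printf("stack: %p\n", (void *)&local);
    free(heap_obj);
}
```

Comparing the output with the output of a tool such as \texttt{pmap} for the
same process shows which mapping each address belongs to.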
\begin{slide}
\sltitle{Example: Solaris 11 x86 32-bit}
\begin{center}
\includegraphics[width=54mm]{img/eps/x86-memory-proc-mem-layout.eps}
\end{center}
\end{slide}
\hlabel{SOLARIS_PROC_ADDR_SPACE}
\begin{itemize}
\item The following can be deduced from the image:
\begin{itemize}
\item maximum size of kernel for Solaris 11 x86 32-bit is 256 megabytes
\item there is free space between kernel and memory reserved for \texttt{mmap}
\item stack grows towards lower addresses and its size is limited to 128 megabytes
\end{itemize}
\item A \emph{heap} is a part of the memory that can be extended by processes
using the \texttt{brk} and \texttt{sbrk} syscalls and is used by the
\texttt{malloc} function. The \texttt{malloc} allocator gradually extends the
heap on demand, and manages acquired memory and distributes it to the process in
chunks. When \texttt{free} is called, it does not mean that the memory is
returned to the kernel; it is only returned to the allocator.
It depends on the allocator implementation whether this memory is returned
to the kernel.
\item The \texttt{mmap} area is used for mapping files into memory, which
includes shared libraries. Some allocators also use this memory internally,
e.g. when a process requests larger chunks of memory at once. It is possible
to exclusively use just \texttt{mmap}, and that is transparent to the
application. When using \texttt{mmap} it is possible to return the memory to the
kernel (using \texttt{munmap}), in contrast to the \texttt{brk}/\texttt{sbrk}
based implementation.
\item The picture was taken from [McDougall-Mauro] and does not contain space
for non-initialized variables. If you try to print the address of such a
variable on this system, you will find out that both initialized and
non-initialized variables share a common data segment, labeled in the image as
,,executable -- DATA''. Example: \example{pmap/proc-addr-space.c}.
\item The kernel mapping is not necessary, for example on Solaris running on
\emph{\texttt{amd64}} architecture (i.e. 64-bit) the kernel is no longer mapped
into user space.
\item Neither \texttt{brk} nor \texttt{sbrk} is part of the standard, hence portable
applications should not use them; if similar functionality is needed, they
should use \texttt{mmap}, see page \pageref{MMAP}.
\end{itemize}
%%%%%
\begin{slide}
\sltitle{Process memory layout in kernel}
\begin{center}
\input{img/tex/mem_kernel.tex}
\end{center}
\end{slide}
\begin{itemize}
\item A process enters kernel mode either by a
\emph{trap induced by the CPU} (page fault, unknown instruction, etc.),
\emph{timer} (to invoke scheduler), \emph{interrupt} from a peripheral device,
or synchronous trap (a standard library uses it to hand over the control to
the kernel to service a \emph{system call}).
\item There is only one copy of the kernel text and data in the memory,
shared by all processes. The kernel text as a whole is resident in memory and
not swapped out to disk.
\item \emph{kernel text} \dots{} code of the operating system kernel,
loaded when the system is booting up and is always resident in memory.
Some implementations allow modules to be added to the kernel at runtime
(e.g. when a device is connected, the matching device driver module is
automatically loaded), so it is not necessary to rebuild
the kernel and reboot the system whenever a change is needed.
\item \emph{data and kernel \texttt{bss}} \dots{} contain data structures used
by the kernel, including the u-area of the currently running process.
\item \emph{kernel stack} \dots{} independent for each process, is empty
when the process is in the user mode (and therefore uses the user stack).
\end{itemize}
%%%%%
\pdfbookmark[1]{segments - text/data/stack}{memsegments}
\begin{slide}
\sltitle{Process memory segments}
\begin{center}
\input{img/tex/memory_segments.tex}
\end{center}
\end{slide}
\begin{itemize}
\item This is what the representation of memory segments looks like in the kernel.
\item The core feature of this architecture is a \emph{memory object}, a
mapping abstraction between a part of memory and a place where data is normally
stored (so called \emph{backing store} or \emph{data object}). This place can
be e.g. swap space or a file. Address space of a process is a set of mappings to
different data objects. There exists also an \emph{anonymous object} that does
not have persistent backing store (it is used e.g. for a stack). Physical memory
then serves as a cache for data of these mapped objects.
\item This coarsely described architecture is called VM (\emph{Virtual
Memory}), and was introduced in SunOS 4.0. The virtual memory architecture
of SVR4 is based on this architecture. More information can be found in
[Vahalia] and in the original white paper from 1987 that introduced this architecture:
Gingell, R. A., Moran J. P., Shannon, W. A. -- \emph{Virtual Memory
Architecture in SunOS}.
\item To determine which memory segments the address space of a process consists of,
various tools can be used: \texttt{pmap(1)} on Solaris, NetBSD and in some Linux
distributions, \texttt{procmap(1)} on OpenBSD, or \texttt{vmmap(1)} on macOS.
\end{itemize}
%%%%%
\pdfbookmark[1]{virtual memory}{virtmem}
\begin{slide}
\sltitle{Virtual memory}
\begin{center}
\input{img/tex/virt_mem.tex}
\end{center}
\end{slide}
\begin{itemize}
\item Each process sees its own address space as a contiguous interval of
(virtual) addresses from zero to some maximal value. Accessible addresses
are those for which there is a mapping, i.e. there is a memory
segment (see the previous slide).
\item The kernel divides the memory into pages. Each page has its own location in
physical memory. This location is determined by kernel page tables and the
memory pages can be arbitrarily mixed w.r.t. their placement in the virtual
address space of a process.
\item If a page is not used it can be swapped out to disk.
\item The kernel memory management maintains the mapping from virtual addresses
used by processes and the kernel to physical addresses. It also reads in pages
from a disk upon a page fault.
\end{itemize}
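The division of the virtual address space into pages can be made concrete with
a short sketch that splits an address into the start of its page and the offset
within that page. The helper names \texttt{page\_offset} and
\texttt{page\_start} are hypothetical; the page size is queried with
\texttt{sysconf}.

```c
#include <stdint.h>
#include <unistd.h>

/* Offset of a virtual address within the page that contains it. */
uintptr_t page_offset(const void *addr)
{
    uintptr_t page_size = (uintptr_t)sysconf(_SC_PAGESIZE);

    return (uintptr_t)addr % page_size;
}

/* Virtual address at which the page containing addr starts. */
uintptr_t page_start(const void *addr)
{
    return (uintptr_t)addr - page_offset(addr);
}
```

Note that these are virtual addresses only; where the page resides in physical
memory is decided by the kernel and is not visible through this interface.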
%%%%%
\pdfbookmark[1]{Process states}{procstates}
\begin{slide}
\sltitle{Process states}
\begin{center}
\input{img/tex/proc_states.tex}
\end{center}
\end{slide}
\begin{itemize}
\item After the process is terminated either using the \texttt{\_exit} call
or as a consequence of a signal, it will transition to a zombie state as the
kernel needs to store a return value for the process. The whole memory
of the process is freed, the only remaining piece is the \texttt{proc}
structure. The process can go away for good only after its parent
retrieves its return value using the \texttt{wait} call.
If the original parent is no longer available, the
\texttt{init} process, which became the new parent, will call \texttt{wait}.
\item In today's Unix systems processes are usually not swapped out as a whole,
only individual pages are.
\item A process is put to sleep if it requests it, e.g. when it is waiting for the
completion of an operation with a peripheral device. \emph{Preemption}, on
the other hand, is the involuntary removal of the CPU from a process by the scheduler.
\end{itemize}
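The transition through the zombie state can be sketched in C as follows. The
function name \texttt{spawn\_and\_reap} is hypothetical; the sketch uses
\texttt{waitpid}, a variant of \texttt{wait} that waits for one specific child.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that terminates with the given status and collect that
 * status in the parent. Returns the exit status of the child, -1 on error.
 */
int spawn_and_reap(int status)
{
    int wstatus;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(status);      /* child: terminate immediately */

    /*
     * From its termination until the call below collects the status,
     * the child is a zombie: only its proc structure is kept.
     */
    if (waitpid(pid, &wstatus, 0) == -1)
        return -1;
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

If the parent slept for a while before calling \texttt{waitpid}, the child
would be visible during that time in the \texttt{ps} output as a zombie.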
%%%%%
\begin{slide}
\sltitle{Process scheduling}
\begin{itemize}
\item \emph{preemptive} -- if a process does not give up CPU
(e.g. by entering a sleep to wait on some event), the CPU is taken away
after a time quantum expiration.
\item processes are classified into queues according to a priority,
a CPU is assigned to the first ready process from the queue with the highest
priority.
\item SVR4 introduced priority queues and real-time support with guaranteed
maximal response time
\item contrary to the previous versions, in SVR4 a higher number means a higher
priority
\end{itemize}
\end{slide}
\begin{itemize}
\item \emsl{Preemptive scheduling relies on periodic timer interrupts}
which take the CPU away from the running process and pass it
to the kernel (the scheduler is activated).
\item The other variant is non-preemptive (cooperative) scheduling, where a process
keeps running until it gives up the CPU, i.e. until it makes a system call
that switches the context to a different process. The downside of cooperative
scheduling is that one process can hold the CPU forever and block all other processes.
\item Unix uses only preemptive scheduling for user processes.
\item There is also a \emph{tickless kernel} that uses a variable timer tick.
\item The traditional (historical) UNIX \emsl{kernel} uses cooperative scheduling,
i.e. a process running in kernel mode is not switched out until it gives up the CPU
by itself.
\emsl{Modern Unix kernels are preemptive}, mainly because of real-time
systems, where it must be possible to take the CPU away from
a running process immediately, without waiting until it returns from kernel
mode or enters sleep by itself. Note that UNIX has scheduled user processes
preemptively from its very beginning; only its kernel was originally non-preemptive.
\item With preemptive scheduling, processes can be interrupted at any time and
the CPU given to another process. Therefore a process can never be sure
that a given operation (spanning more than one instruction, besides system calls
with guaranteed atomicity) will be executed atomically, without being
influenced by other processes. If it is necessary to ensure atomicity of an
operation, processes must synchronize. This problem is avoided in cooperative
scheduling: the atomicity of a given operation is simply ensured by not giving
up the CPU while the operation is still in progress.
\end{itemize}
%%%%%
\pdfbookmark[1]{Priority classes for process scheduling}{prioclasses}
\begin{slide}
\sltitle{Priority classes}
\setlength{\baselineskip}{0.8\baselineskip}
\begin{itemize}
\item \emsl{system}
\begin{itemize}
\item priority 60 to 99
\item reserved for system processes (\texttt{pageout},
\texttt{sched}, \dots)
\item fixed priority
\end{itemize}
\item \emsl{real-time}
\begin{itemize}
\item priority 100 to 159
\item fixed priority
\item a time quantum corresponds to priority value
\end{itemize}
\item \emsl{time-shared}
\begin{itemize}
\item priority 0 to 59
\item dynamic two-part priority, a fixed user part and a
dynamic system part -- if a process uses the CPU extensively,
its priority is decreased (and its time quantum increased)
\end{itemize}
\end{itemize}
\end{slide}
\begin{itemize}
\item The system class is used only by the kernel; a user process running
in kernel mode retains its own scheduling characteristics.
\item Processes in the real-time class have the highest priority and so must
be configured correctly, otherwise they could block the rest of the system from
getting any CPU time.
\item If a process in a time-shared class is put to sleep and is waiting on an
event, the system priority is temporarily assigned to it. After a wake-up, such
a process will be assigned a CPU earlier than other non-sleeping processes to
finish the operation as soon as possible as it could hold some locks later
needed by other processes.
\item Fixed part of the priority in the time-shared class can be set using\\
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{int \funnm{setpriority}(int \emph{which}, id\_t \emph{who},
int \emph{prio});}\\ or \\ \texttt{int \funnm{nice}(int \emph{in{}cr});} \\
]]])
The \emph{which} value determines how the \emph{who} argument is interpreted.
If \emph{which} is e.g. \emph{\texttt{PRIO\_PGRP}}, \emph{who} holds a
process group ID. Note that the \funnm{nice} call returns the new nice value.
As -1 is a valid nice value, it is necessary to clear \texttt{errno} before the
call and, if the function returns -1, check whether \texttt{errno} was set.
\item The priority class and nice value of a given process can be displayed with
the \texttt{-l} option of the \texttt{ps} command or by explicitly specifying
the fields to be printed out (see the \texttt{-o} option).
\item The \texttt{renice} command can be used to manipulate the priority/nice
value. It usually uses the \texttt{setpriority} system call to do that.
Non-privileged users can only increase the nice value (example from
macOS 10.13):
\begin{verbatim}
$ /bin/sleep 200 & # use /bin/sleep to avoid invoking shell built-in
[1] 36877
$ ps -O pri,nice -p $!
PID PRI NI TT STAT TIME COMMAND
36877 31 0 s003 S 0:00.00 sleep 200
$ renice 10 -p $!
$ ps -O pri,nice -p $!
PID PRI NI TT STAT TIME COMMAND
36877 31 10 s003 SN 0:00.00 sleep 200
\end{verbatim}
On Linux (kernel 5.10) this looks different: after \texttt{renice}
is run, the priority is decreased by the nice value.
\end{itemize}
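The \texttt{errno} handling needed for the \texttt{nice} call, as described
above, can be sketched like this. The wrapper name \texttt{change\_nice} and
its interface are hypothetical and serve only as an illustration.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Change the nice value of the calling process by delta.
 * On success stores the new nice value in *result and returns 0,
 * on failure returns -1 with errno set by nice().
 */
int change_nice(int delta, int *result)
{
    int value;

    errno = 0;              /* -1 may be a legitimate return value */
    value = nice(delta);
    if (value == -1 && errno != 0)
        return -1;          /* a real error, e.g. EPERM */
    *result = value;
    return 0;
}
```

Calling \texttt{change\_nice(0, \&value)} leaves the nice value unchanged and
just reports the current one.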
%%%%%
\pdfbookmark[1]{process groups}{procgrps}
\begin{slide}
\sltitle{Process groups, controlling terminals}
\begin{itemize}
\item every process belongs to a \emph{process group}
\item each group can have a leading process, so called \emph{group leader}
\item every process can have a \emph{controlling terminal} (usually it is a
login terminal)
\item special file \texttt{/dev/tty} is associated with a controlling terminal
of each process
\item each terminal is associated with a process group called a
\emph{controlling group}
\item \emph{job control} is a mechanism for suspending, resuming, and
terminating process groups and for controlling their access to terminals
\item \emph{session} is a collection of process groups created for the purpose
of job control
\end{itemize}
\end{slide}
\begin{itemize}
\item When a user logs into a system, a new session is created. The session
contains one process group that only has a single process -- one with the user's
shell. That process is also the leader of that single process group and also a
session leader. If job control is on, each command or pipeline will
create a new process group, and one process from each group will always become a
process group leader. One of the groups can be running in the foreground, the
rest will be running in the background. Signals which are generated from the
keyboard (i.e. those triggered by combination of keys, not by executing the
\texttt{kill} command) are sent only to the group running in the foreground.
\item If the job control is off, command execution in the background means that
the shell will not be waiting for its completion. There exists only one group of
processes, and keyboard generated signals are sent to all processes running in
the foreground and background. Processes cannot be moved to the background from
the foreground and ifdef([[[NOSPELLCHECK]]], [[[vice versa]]]).
\item When a process that has a controlling terminal opens the \texttt{/dev/tty}
file, it gets access to its own controlling terminal, i.e. if two processes with
different controlling terminals open this file, each will be accessing a different terminal.
\item In \texttt{bash} a process group (job) can be stopped temporarily using
\texttt{Ctrl-Z}. Then it can be resumed again using ,,\texttt{fg \%N}'' where
\texttt{N} is the number from the \texttt{jobs} command listing. Or, it can
be resumed in the background, using ,,\texttt{bg \%N}''. The job specification
(\texttt{\%N}) is optional; if omitted, the most recently stopped job will be
used. More information can be found in the ``JOB CONTROL'' section in the
\texttt{bash} man page.
\end{itemize}
\pdfbookmark[1]{fork}{fork}
\begin{slide}
\sltitle{Create a new process: \texttt{fork()}}
\begin{center}
\input{img/tex/fork.tex}
\end{center}
\end{slide}
\begin{itemize}
\item \hlabel{FORK} The child process is almost an exact copy of its parent
except for the following:
\begin{itemize}
\item The child process has its own unique process ID and a different parent process ID.
\item If the parent had multiple threads, the child will only have the one that
called \texttt{fork}; this is explained further on page \pageref{FORKALL}.
\item Child process resource utilization counters are set to 0.
\item \texttt{alarm} settings and file locks are not inherited.
\end{itemize}
\item The file descriptor tables are exact copies in both processes. That means
that several processes can share, and move, a common file position. Signal masks are
not changed, more on that on page \pageref{SIGNALBLOCKINGEXAMPLE}.
\item For efficiency and less memory consumption, the address space is not copied
but a \emph{copy-on-write} mechanism is used.
\item The parent gets its child's PID as the return value and the
child gets 0 because the child can easily get its parent's PID via
\texttt{getppid}. Imagine how the parent would figure out the new child's PID otherwise,
especially if it had already spawned multiple children.
\item Example: \example{fork/fork.c}
\item \hlabel{VFORK} There is also \texttt{vfork}, used in the past to avoid
copying the parent's address space for a child that was going to replace it in a
subsequent \texttt{exec} anyway. This problem was partially solved by the already
mentioned copy-on-write mechanism. See \example{fork/vfork.c} on how it works.
\item The problems of \texttt{vfork} are largely solved by
\texttt{posix\_spawn} if it is implemented in the kernel (usually it is just a
\texttt{vfork} wrapper). See page \pageref{SPAWN}.
\end{itemize}
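The sharing of a file position between a parent and its child can be
demonstrated with the following sketch. The function name
\texttt{shared\_offset\_demo} is hypothetical, and a temporary file created by
\texttt{tmpfile} is used here only as a convenient scratch file.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns the file position seen by the parent after the child has written
 * five bytes through the inherited descriptor, or -1 on error.
 */
long shared_offset_demo(void)
{
    FILE *tmp = tmpfile();  /* scratch file, removed automatically */
    off_t pos;
    pid_t pid;
    int fd;

    if (tmp == NULL)
        return -1;
    fd = fileno(tmp);

    pid = fork();
    if (pid == -1) {
        fclose(tmp);
        return -1;
    }
    if (pid == 0) {
        /*
         * Child: fork() returned 0. The descriptor refers to the same
         * open file as in the parent, so this write moves the common
         * file position.
         */
        _exit(write(fd, "hello", 5) == 5 ? 0 : 1);
    }

    /* Parent: fork() returned the PID of the child. */
    if (waitpid(pid, NULL, 0) == -1) {
        fclose(tmp);
        return -1;
    }
    pos = lseek(fd, 0, SEEK_CUR);   /* query the position, do not move it */
    fclose(tmp);
    return (long)pos;
}
```

Although the parent itself never wrote to the file, the reported position
reflects the five bytes written by the child.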
%%%%%
ifdef([[[NOSPELLCHECK]]], [[[
\pdfbookmark[1]{getpid, getpgrp, getppid, getsid}{getp}
]]])
\begin{slide}
\sltitle{Process identification}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{pid\_t \funnm{getpid}(void);}
]]])
\begin{itemize}
\item returns the process ID of the calling process.
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{pid\_t \funnm{getpgrp}(void);}
]]])
\begin{itemize}
\item returns the PGID of the calling process
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{pid\_t \funnm{getppid}(void);}
]]])
\begin{itemize}
\item returns the process ID of the parent process.
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{pid\_t \funnm{getsid}(pid\_t \emph{pid});}
]]])
\begin{itemize}
\item returns the session ID for process \texttt{pid} (0 means for the calling
process)
\end{itemize}
\end{slide}
\begin{description}
\item[process groups] make it possible to send signals to a group of processes
at once
\item[session] is a collection of processes created for \emph{job control}.
The processes of the session share one \emph{controlling terminal}.
A session includes one or more process groups. At most one group in the session
runs in the foreground (\emph{foreground process group}) and has access to the
controlling terminal for input and output; the rest run in the background
(\emph{background process groups}) and have only optional access to the output,
or none at all.
A disallowed operation with a terminal will stop the process. To verify, start
a process in the background, for example a simple shell script that does
\texttt{read a}. On the next shell prompt, you will be notified by the shell
that the process was stopped (you can set your shell to notify you about such
events asynchronously -- if you know what you are doing). The process was
actually stopped by the \texttt{SIGTTIN} signal. How to verify that? Run
\texttt{strace -o output ./the-script \&} and check the output file.
\item[parent process:] Each process (besides \texttt{swapper},
\texttt{pid~==~0})
has a parent, i.e. the process that created it with the \texttt{fork} syscall.
If the parent exits before the child, the \texttt{init} process becomes its
adoptive parent and will take care of the zombie after the child ends.
\end{description}
\begin{itemize}
\item Information about running processes can be obtained programmatically
through non-standard APIs (e.g. the \texttt{libproc} library on Solaris, built
on top of the \texttt{procfs} filesystem that is mounted under \texttt{/proc}).
\item Note that using the value returned from \texttt{getppid} to check if
the parent exited is not portable (commonly \texttt{init} has
\texttt{pid} 1 however this is not true in many virtualized/containerized
environments, e.g. in \emph{PID namespaces} on Linux, Zones in Solaris).\\
Example: \example{session/getppid.c}.
\end{itemize}
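The identification calls from the slide can be combined into a short sketch
that collects and prints all four identifiers of the calling process. The
structure \texttt{proc\_ids} and both function names are hypothetical.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* All identifiers from the slide, gathered for the calling process. */
struct proc_ids {
    pid_t pid;      /* process ID */
    pid_t ppid;     /* parent process ID */
    pid_t pgid;     /* process group ID */
    pid_t sid;      /* session ID */
};

struct proc_ids current_ids(void)
{
    struct proc_ids ids;

    ids.pid = getpid();
    ids.ppid = getppid();
    ids.pgid = getpgrp();
    ids.sid = getsid(0);    /* 0 means the calling process */
    return ids;
}

void print_ids(void)
{
    struct proc_ids ids = current_ids();

    /* pid_t has no printf conversion of its own, hence the casts. */
    printf("pid=%ld ppid=%ld pgid=%ld sid=%ld\n", (long)ids.pid,
        (long)ids.ppid, (long)ids.pgid, (long)ids.sid);
}
```

When run from an interactive shell with job control, the PGID typically equals
the PID of the first process of the pipeline and the SID equals the PID of the
shell.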
%%%%
ifdef([[[NOSPELLCHECK]]], [[[
\pdfbookmark[1]{setpgrp, setsid}{setp}
]]])
\begin{slide}
\sltitle{Creating a new process group/session}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{int \funnm{setpgid}(pid\_t \emph{pid}, pid\_t \emph{pgid});}
]]])
\begin{itemize}
\item sets the PGID of the process specified by \texttt{pid} to \texttt{pgid}.
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{pid\_t \funnm{setsid}(void);}
]]])
\begin{itemize}
\item creates a new session if the calling process is not a process
group leader
\end{itemize}
\end{slide}
\begin{itemize}
\item For the \funnm{setpgid} syscall the following is true:
\begin{enumerate}
\item ifdef([[[NOSPELLCHECK]]], [[[pid == pgid]]]) : the process with
\emph{\texttt{pid}} will become process group leader
\item ifdef([[[NOSPELLCHECK]]], [[[pid != pgid]]]) : the process with
\emph{\texttt{pid}} will become process group member
\end{enumerate}
\item A process which is not yet a process group leader can become both a session
leader and a process group leader using \texttt{setsid}. If the process already is
a process group leader, \texttt{setsid} will fail. To overcome this it is necessary
to call \texttt{fork} and call \texttt{setsid} in the child process
(and the parent will call \texttt{\_exit()}).
Such a process does not have a controlling terminal, however it can acquire one by
opening a terminal which is not yet the controlling terminal of a session, provided the
\texttt{open} flags argument does not contain the \texttt{O\_NOCTTY} flag,
or in another implementation-dependent way.
\end{itemize}
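The behaviour of \texttt{setsid} with respect to process group leaders can be
verified with the following sketch. The function name \texttt{try\_setsid} and
the meaning of its return values are hypothetical. The call is made in a child
process so that the caller's own session is not affected.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child and call setsid() in it. If leader_first is non-zero, the
 * child first makes itself a process group leader via setpgid(0, 0).
 * Returns 0 if setsid() succeeded and the child became both session and
 * process group leader, 1 if setsid() failed with EPERM, 2 otherwise.
 */
int try_setsid(int leader_first)
{
    int wstatus;
    pid_t pid = fork();

    if (pid == -1)
        return 2;
    if (pid == 0) {
        if (leader_first && setpgid(0, 0) == -1)
            _exit(2);
        if (setsid() == -1)
            _exit(errno == EPERM ? 1 : 2);
        /* New session: the caller leads the session and a new group. */
        _exit(getsid(0) == getpid() && getpgrp() == getpid() ? 0 : 2);
    }
    if (waitpid(pid, &wstatus, 0) == -1 || !WIFEXITED(wstatus))
        return 2;
    return WEXITSTATUS(wstatus);
}
```

A freshly forked child inherits the process group of its parent, so it is not
a group leader and \texttt{try\_setsid(0)} is expected to succeed, while
\texttt{try\_setsid(1)} is expected to report the \texttt{EPERM} failure.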
%%%%%
%%%%%
\pdfbookmark[1]{exec}{exec}
\begin{slide}
\sltitle{Execute a program: \texttt{exec}}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{extern char **\funnm{environ};\\
int \funnm{execl}(const char *\emph{path}, const char *\emph{arg0}, ... );}
]]])
\begin{itemize}
\item replaces the current process image with a new process image
\item runs a program defined via \emph{path}
\item arguments that follow, including \emph{\texttt{arg0}}, are given to the
program via \texttt{argc} and \texttt{argv} of its \texttt{main()}
\item the argument list must end with \texttt{(char *)0}, i.e. \texttt{NULL}
\item \emph{\texttt{arg0}} should contain the program name (i.e. not the full
path)
\item \emsl{open file descriptors are unaffected by \funnm{exec}}
\begin{itemize}
\item \dots{}aside from file descriptors with flag \texttt{FD\_CLOEXEC}
\end{itemize}
\end{itemize}
\end{slide}
\hlabel{EXEC}
\begin{itemize}
\item \emph{path} must be an absolute or relative path to the executable file.
The \texttt{PATH} environment variable is only used for \funnm{execlp} and
\funnm{execvp} (see one of the slides that follow),
provided \emph{path} does not contain \texttt{'/'}.
\item All variants of these calls are commonly referred to as the \funnm{exec}
call. It is understood that one of the variants is meant, but which one is
usually not important for the discussion.
\item Sometimes \texttt{argv[0]} is different from the executable file name.
For example, \texttt{login} command prefixes the shell file name with
\texttt{'-'}, e.g. \texttt{-bash}. The shell then knows it is supposed to
function as a login shell. A login shell reads \texttt{/etc/profile}, for
example.
\item \funnm{exec} does not transfer the control to the program in memory
directly, at least for dynamically linked programs.
As described on page \pageref{RUNTIMELINKER}, the system (i.e. the
code of the \funnm{exec} call) first maps the dynamic linker, a.k.a. the loader,
to the process address space. The loader then maps all dynamic libraries there
as well, then finally calls the program \texttt{main()}.
\item A useful exercise is to write a simple program calling for example
\texttt{open()} on a distinct file. Then run the program via \texttt{truss(1)}
or \texttt{strace(1)} like this: \texttt{truss ./a.out}. You will see a number
of system calls before the \texttt{open} is actually called. These system calls
can be attributed to the dynamic linker.
\item The \texttt{FD\_CLOEXEC} file descriptor flag is set using the
\texttt{fcntl} system call. It can also be set when the descriptor is created,
by passing the \texttt{O\_CLOEXEC} flag to the \texttt{open} system call; this
flag was added to the standard in POSIX.1-2008, so older systems may not
support it.
\end{itemize}
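Setting the \texttt{FD\_CLOEXEC} flag with \texttt{fcntl} might look like the
following sketch. The helper name \texttt{set\_cloexec} is hypothetical.

```c
#include <fcntl.h>

/*
 * Mark fd to be closed automatically on a successful exec.
 * Returns 0 on success, -1 on error.
 */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1)
        return -1;
    /* Keep any other descriptor flags and add FD_CLOEXEC. */
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}
```

Reading the current flags first is a defensive habit; it avoids clearing
other descriptor flags an implementation might define.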
\begin{slide}
\sltitle{Execute a program: \texttt{exec} (continued)}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{extern char **\funnm{environ};\\
int \funnm{execl}(const char *\emph{path}, const char *\emph{arg0}, ... );}
]]])
\begin{itemize}
\item successful \funnm{exec} never returns as the new process (program) fully
replaced the address space of the calling process
\begin{itemize}
\item ...the original place to return to no longer exists
\end{itemize}
\item signal handlers are set to default
\begin{itemize}
\item ...as the original handler code no longer exists
\end{itemize}
\item the new process inherits \texttt{environ} from the calling process
\end{itemize}
\end{slide}
\begin{itemize}
\item More about signals on page \pageref{SIGNALS}.
\item \funnm{exec} does not change RUID and RGID. If
the executed program has the SUID bit set, the EUID and saved UID of the process are
set to the UID of the owner of the executable file.
\item Today's systems can also execute scripts that start with a line:\\
\texttt{\#!/\emph{interpreter\_path}/\emph{interpreter\_name} \emph{[args]}}
\item The \texttt{system} or \texttt{popen} library calls are more
straightforward to use, however they run the command through a shell,
which has security implications (shell expansion of arguments,
environment variables, command injection, etc.), so it is generally better
to avoid them.
\end{itemize}
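To show what avoiding \texttt{system} might look like, here is a minimal sketch
that lists a directory by calling \funnm{fork} and \funnm{execl} directly. The
function name \texttt{list\_dir} is hypothetical, the sketch assumes that
\texttt{/bin/ls} exists, and \funnm{waitpid} with its status macros is described
on one of the following slides.
\begin{verbatim}
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * List a directory without involving a shell.  The dir string is
 * handed to ls as a single argument, so characters such as ';'
 * have no special meaning.
 * Returns the exit status of ls, or -1 on failure.
 * The function name is made up for this example.
 */
int
list_dir(const char *dir)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return (-1);
    if (pid == 0) {
        /* "--" ends the options so dir is never taken for one. */
        execl("/bin/ls", "ls", "--", dir, (char *)NULL);
        _exit(127);     /* reached only if execl failed */
    }
    if (waitpid(pid, &status, 0) == -1)
        return (-1);
    return (WIFEXITED(status) ? WEXITSTATUS(status) : -1);
}
\end{verbatim}
With \texttt{system}, a directory name like \texttt{"/tmp; rm -rf \$HOME"}
would make the shell run a second command. Here the whole string is just a
(most likely nonexistent) file name passed to \texttt{ls}.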
%%%%%
ifdef([[[NOSPELLCHECK]]], [[[
\pdfbookmark[1]{execv, execle, execl, execve, execlp}{execvariants}
]]])
\begin{slide}
\sltitle{Variants of the \texttt{exec} call}
\setlength{\baselineskip}{0.8\baselineskip}
\texttt{int \funnm{execv}(const char *\emph{path}, char *const \emph{argv}[]);}
\begin{itemize}
\item like \funnm{execl} but arguments are in the \emph{argv} array,
the last item must be \texttt{NULL}
\end{itemize}
\begin{minipage}{\slidewidth}
ifdef([[[NOSPELLCHECK]]], [[[
\vspace{-1ex}\texttt{\begin{tabbing}
int \funnm{execle}(\=const char *\emph{path}, const char *\emph{arg0},
... ,\\\> char *const \emph{envp}[]);
\end{tabbing}}
]]])
\end{minipage}
\begin{itemize}
\item like \funnm{execl} but instead of the global variable \emph{environ},
the \emph{\texttt{envp}} argument is used
\end{itemize}
\begin{minipage}{\slidewidth}
ifdef([[[NOSPELLCHECK]]], [[[
\vspace{-1ex}\texttt{\begin{tabbing}
int \funnm{execve}(\=const char *\emph{path}, char *const \emph{argv}[],\\
\>char *const \emph{envp}[]);
\end{tabbing}}
]]])
\end{minipage}
\begin{itemize}
\item like \funnm{execv} but instead of \emph{\texttt{environ}},
\emph{\texttt{envp}} is used
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{int \funnm{execlp}(const char *\emph{file}, const char *\emph{arg0},
...);\\
int \funnm{execvp}(const char *\emph{file}, char *const \emph{argv}[]);}
]]])
\begin{itemize}
\item like \funnm{execl} and \funnm{execv} but \texttt{PATH} is also used
for searching for the executable file
\end{itemize}
\end{slide}
\begin{itemize}
\item \emsl{l} = list (i.e. list of arguments), \emsl{v} = vector (i.e. an array
of string pointers), \emsl{e}~=~environment (i.e. environment variables are
passed to the function via an argument), \emsl{p} = \texttt{PATH} is used.
\item Apart from \funnm{execlp} and \funnm{execvp}, all variants need a path
to the executable program, either absolute or relative, as they do not search
\texttt{PATH}. Even \funnm{execlp} and \funnm{execvp} search \texttt{PATH} only
if \emph{file} does not contain a slash.
\item All variants aside from \funnm{execle} and \funnm{execve}
pass the environment of the calling process, i.e. the \texttt{environ} array,
to the program being executed.
\item For some unknown historical reasons, there is no ``p'' with ``e'' together
in the standard. However, GNU provides \funnm{execvpe} as an extension.
\item \hlabel{EXEC_DATE} Example: \example{exec/exec-date.c}
\item \hlabel{EXECL} The following use of \funnm{execl} is incorrect as it is
missing the mandatory argument for \texttt{argv[0]}:
\begin{verbatim}
execl("/bin/ls", NULL);
\end{verbatim}
On some systems, the above has very interesting consequences. As \texttt{NULL}
is taken as an expected \texttt{argv[0]}, the data on the stack are then
accepted as the program arguments until the next \texttt{NULL} is found there.
In the following example, run on some version of the FreeBSD system, \texttt{ls}
is trying to list files whose names are the environment strings (the
environment array contains strings of the form \texttt{<varname>=<value>}), as
those were on the stack because the environment was passed to the program being
executed, as we already know. In my case, the output was:
\begin{verbatim}
$ ./a.out
: BLOCKSIZE=K: No such file or directory
: FTP_PASSIVE_MODE=YES: No such file or directory
: HISTCONTROL=ignoredups: No such file or directory
: HISTSIZE=10000: No such file or directory
...
...
\end{verbatim}
Also note that normally \texttt{ls} prints its command name before the colon.
However, given that \texttt{NULL} was put in place of \texttt{argv[0]}, the
command name is printed as an empty string. The '\texttt{:}' character then
leads the output lines, adding to the confusion.
Example: \example{exec/execl-buggy.c}
\end{itemize}
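The difference between inheriting \texttt{environ} and passing \emph{envp}
explicitly can be tried out with a sketch like the one below. It assumes that
\texttt{/bin/sh} exists. The function name \texttt{run\_script\_with\_env} is
made up, and the shell serves only as a handy program for inspecting its
environment. \funnm{waitpid} is described later in this chapter.
\begin{verbatim}
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run "/bin/sh -c script" with exactly the environment given in
 * envp (a NULL-terminated array of "NAME=value" strings) instead
 * of environ.  Returns the exit status of the shell, -1 on failure.
 * The function name is made up for this example.
 */
int
run_script_with_env(const char *script, char *const envp[])
{
    /* argv[0] is the program name, the vector ends with NULL. */
    char *const argv[] = { "sh", "-c", (char *)script, NULL };
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return (-1);
    if (pid == 0) {
        execve("/bin/sh", argv, envp);
        _exit(127);     /* reached only if execve failed */
    }
    if (waitpid(pid, &status, 0) == -1)
        return (-1);
    return (WIFEXITED(status) ? WEXITSTATUS(status) : -1);
}
\end{verbatim}
For instance, with \texttt{envp} set to
\texttt{\{ "GREETING=hello", NULL \}}, the script
\texttt{test "\$GREETING" = hello} should succeed no matter what the
environment of the caller looks like.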
%%%%%
\pdfbookmark[1]{ELF}{ELF}
\begin{slide}
\sltitle{Executable file format}
\begin{itemize}
\item \emsl{a.out} format, in early UNIX versions
\item \emsl{Common Object File Format (COFF)} -- AT\&T System V, superseded
\emsl{a.out}
\item \emsl{Executable and Linkable Format (ELF)} -- new in SVR4, replaced both
older formats
\item ELF format:\quad
\raisetab{\begin{tabular}[t]{|c|}
\hline
ELF header\\
\hline
\quad \quad program header table \quad\quad \\
\hline
section 1\\
\hline
$\vdots$\\
\hline
section N\\
\hline
section header table\\
\hline
\end{tabular}}
\end{itemize}
\end{slide}
\hlabel{ELF}
\begin{itemize}
\item The UNIX standard does not specify what executable file format systems
should use. While most of the UNIX and Unix-like systems (e.g. Linux
distributions) use ELF, there are other widely used systems that do not. One
example is macOS (which is a certified UNIX system) that uses the \emph{Mach-O}
file format, short for \emph{Mach Object}. Each Mach-O file is made
up of one Mach-O header, followed by a series of load commands, followed by one
or more segments, each of which contains between 0 and 255 sections.
\item On Solaris, the \texttt{elfdump} command allows listing sections of the
ELF file in a human readable form. On Linux distributions, use \texttt{readelf}.
On macOS (which does not use ELF but the Mach-O format), use \texttt{objdump}.
\item The \emph{ELF header} contains basic information about the file. Try
``\texttt{readelf -h /bin/ls}'' on any Linux distribution.
\item The \emph{program header table} is present in files used to build a
process image, i.e. in executables and dynamic libraries; relocatable object
files do not need it. Dynamic libraries are usually not executable on their own
(on some systems the dynamic linker, which is delivered in the form of a dynamic
library, can be executed for debugging purposes).
The table contains information on the virtual memory layout. You can list
the table via ``\texttt{elfdump~-p}'' or ``\texttt{readelf~-l}''.
\item Sections contain code, data, symbol table, relocation data, etc.
\item The \emph{section header table} contains information for the linker, see
``\texttt{elfdump -c}'' or ``\texttt{readelf~-S}''.
\item Nice diagram of ELF internals:
\url{https://raw.githubusercontent.com/corkami/pics/master/binary/ELF101.png}
\item Some systems stuck with a format based on the original \emph{a.out} for a
long time. For example, OpenBSD moved from \emph{a.out} to ELF in 2003 when
releasing version 3.4.
\item Today it is common that systems randomly arrange the address space
positions of key data areas of a process, including the base of the executable
and the positions of the stack, heap and libraries. This technique is called
\emph{Address Space Layout Randomization} (ASLR) and its objective is to prevent
an attacker from reliably jumping to, for example, a particular function in
memory. ASLR was first introduced as a Linux kernel patch. OpenBSD was the
first mainstream operating system to support ASLR by default, in version 3.4,
released in 2003. Different systems apply this technique with different
parameters and on different parts of a program. In general it is possible to
introduce randomness into other parts of a system, for example process IDs,
initial TCP sequence numbers, etc.
\end{itemize}
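The very beginning of the ELF header is a four byte magic number,
\texttt{0x7f} followed by the characters \texttt{ELF}. A minimal sketch of a
check for it follows. The function name \texttt{is\_elf} is made up, and real
tools like \texttt{readelf} of course validate much more than these four bytes.
\begin{verbatim}
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if the file starts with the 4-byte ELF magic number
 * (0x7f followed by the letters E, L, F), 0 if it does not,
 * and -1 if the file cannot be opened.
 * The function name is made up for this example.
 */
int
is_elf(const char *path)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };
    unsigned char buf[4];
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return (-1);
    n = fread(buf, 1, sizeof (buf), fp);
    fclose(fp);
    return (n == sizeof (buf) &&
        memcmp(buf, magic, sizeof (magic)) == 0);
}
\end{verbatim}
On a system using ELF, \texttt{is\_elf("/bin/ls")} is expected to return 1
while a shell script or a Mach-O binary should give 0.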
%%%%%
ifdef([[[NOSPELLCHECK]]], [[[
\pdfbookmark[1]{exit, wait, waitpid}{procexit}
]]])
\begin{slide}
\sltitle{Program termination}
\setlength{\baselineskip}{0.6\baselineskip}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{void \funnm{exit}(int \emph{status});}
]]])
\begin{itemize}
\item terminates a process with a return value \emph{status}. Never returns.
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{pid\_t \funnm{wait}(int *\emph{stat\_loc});}
]]])
\begin{itemize}
\item waits for a child process termination, returns its PID and puts
termination information into \emph{\texttt{*stat\_loc}} which can be tested as:
\begin{itemize}
\item \texttt{WIFEXITED(*stat\_loc)} \dots{} process called
\texttt{exit()}
\item \texttt{WEXITSTATUS(*stat\_loc)} \dots{} argument of
\texttt{exit()}
\item \texttt{WIFSIGNALED(*stat\_loc)} \dots{} process was terminated by a
signal
\item \texttt{WTERMSIG(*stat\_loc)} \dots{} signal number
\item \texttt{WIFSTOPPED(*stat\_loc)} \dots{} process stopped
(\texttt{WUNTRACED} flag required, need \funnm{waitpid} below)
\item \texttt{WSTOPSIG(*stat\_loc)} \dots{} stop signal number
\end{itemize}
\end{itemize}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{pid\_t \funnm{waitpid}(pid\_t \emph{pid}, int *\emph{stat\_loc},
int \emph{opts});}
]]])
\begin{itemize}
\item waits for a specific child process termination
\end{itemize}
\end{slide}
\begin{itemize}
\item \emph{\texttt{stat\_loc}} equal to \texttt{NULL} means to ignore the
status information.
\item Function \funnm{\_exit} works as \funnm{exit} but it does not flush stdio
streams and functions registered with the \funnm{atexit} call are not called.
\item There is also \texttt{WIFCONTINUED(*stat\_loc)} which is true if the child
was resumed after having been stopped. However, it is part of an extension
that not all systems support.
\item You can stop a process using ``\texttt{kill -STOP <PID>}'',
and restart it with ``\texttt{kill -CONT <PID>}''.
\item \emph{opts} in \funnm{waitpid} is a bitwise OR of zero or more of the
following flags:
\begin{itemize}
\item \texttt{WNOHANG} \dots{} do not block if no child has status to report;
\funnm{waitpid} returns 0 in that case
\item \texttt{WUNTRACED} \dots{} children of the current process that were
stopped due to a \texttt{SIGTTIN}, \texttt{SIGTTOU}, \texttt{SIGTSTP}, or
\texttt{SIGSTOP} signal also have their status reported. Such processes are
reported only once per such a situation.
\item \texttt{WCONTINUED} \dots{} also report children of the current
process that were restarted after having been stopped (and not waited for
yet). Part of the same extension as \texttt{WIFCONTINUED}.
\item Portable code should use the \texttt{WUNTRACED} and \texttt{WCONTINUED}
flags only if the macro \texttt{\_POSIX\_JOB\_CONT\-ROL} is defined in
\texttt{<unistd.h>}.
\end{itemize}
\item \emph{\texttt{pid}} in \funnm{waitpid}:
\begin{itemize}
\item \texttt{== -1} \dots{} wait for any child
\item \texttt{> 0} \dots{} wait for a specific child
\item \texttt{== 0} \dots{} wait for any child in the same process group as
the calling process
\item \texttt{< -1} \dots{} wait for any child in the process group of
\texttt{abs(pid)}
\end{itemize}
\item There are also the \texttt{wait3} and \texttt{wait4} calls. These are more
generic versions that can also gather resource usage statistics of the
terminated child. They are non-standard.
\item There is also the \texttt{waitid} call which reports the child status and
signal information in a \texttt{siginfo\_t} structure instead of encoding them
in a single integer.
\item A parent should always call one of the wait functions, otherwise the
system will accumulate \emph{zombies} -- terminated processes that occupy
process table slots only to be waited for by their parents. Zombies could
eventually exhaust the process table so that no new process can be created.
Note that if the parent exits, its children are adopted
by the \texttt{init} process that will call \texttt{wait} on such processes.
However, you should always use \texttt{wait} for children even if you know the
parent will exit soon.
\item Actually, you could notify the system that the program will not wait for
its children in which case such zombies will not accumulate. See page
\pageref{IGNORE_SIG_CHLD}.
\end{itemize}
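The status macros from the slide can be exercised with a small sketch such as
the following. The function \texttt{run\_child} and its return convention are
invented for this illustration only.
\begin{verbatim}
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Fork a child that either kills itself with signal sig (if sig
 * is not 0) or exits with exit_code, then wait for it and decode
 * the status.  Returns the exit status (>= 0) if the child exited,
 * -signo if it was terminated by a signal, and -1000 on error.
 * The name and the return convention are made up for this example.
 */
int
run_child(int exit_code, int sig)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return (-1000);
    if (pid == 0) {
        if (sig != 0)
            kill(getpid(), sig);
        _exit(exit_code);
    }
    if (waitpid(pid, &status, 0) == -1)
        return (-1000);
    if (WIFEXITED(status)) {
        printf("%ld exited with status %d\n", (long)pid,
            WEXITSTATUS(status));
        return (WEXITSTATUS(status));
    }
    if (WIFSIGNALED(status)) {
        printf("%ld terminated by signal %d\n", (long)pid,
            WTERMSIG(status));
        return (-WTERMSIG(status));
    }
    return (-1000);
}
\end{verbatim}
For example, \texttt{run\_child(3, 0)} should return 3 while
\texttt{run\_child(0, SIGKILL)} should return \texttt{-SIGKILL}. Note that only
the low 8 bits of the exit status are available via \texttt{WEXITSTATUS}.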
%%%%%
\begin{slide}
\sltitle{Example: start a new process and wait}
\begin{center}
\input{img/tex/fork_wait.tex}
\end{center}
\end{slide}
\begin{itemize}
\item This is a typical way to start a new process and continue after its
termination. The parent could also choose not to wait for the child termination
right away but carry on with its life and wait for the child later.
\item Note that you have to use macros from the previous slide to get the
child's return value out of the status information.
\item \hlabel{WAITPID} Example: \example{wait/wait.c}
\item \hlabel{SPAWN} An alternative to the \texttt{fork}/\texttt{exec}
combination is the \texttt{posix\_spawn} function. A process created by this
function can be waited for with \texttt{waitpid} etc. just like in the case of
\texttt{fork}. Example: \example{exec/spawn.c}
\end{itemize}
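A minimal sketch of the \texttt{posix\_spawn} alternative might look as
follows. The function name \texttt{spawn\_and\_wait} is hypothetical; no file
actions or spawn attributes are used, so \texttt{NULL} is passed for both.
\begin{verbatim}
#include <sys/types.h>
#include <sys/wait.h>
#include <spawn.h>

extern char **environ;

/*
 * Start the program at path with the given argument vector and
 * the current environment, wait for it and return its exit
 * status, or -1 on failure.
 * The function name is made up for this example.
 */
int
spawn_and_wait(const char *path, char *const argv[])
{
    int status;
    pid_t pid;

    /* posix_spawn returns an error number, not -1 with errno. */
    if (posix_spawn(&pid, path, NULL, NULL, argv, environ) != 0)
        return (-1);
    if (waitpid(pid, &status, 0) == -1)
        return (-1);
    return (WIFEXITED(status) ? WEXITSTATUS(status) : -1);
}
\end{verbatim}
Assuming \texttt{/bin/sh} exists, calling the function with that path and the
vector \texttt{\{ "sh", "-c", "exit 7", NULL \}} should return 7.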
%%%%%
\hlabel{PIPEREADWRITE}
\pdfbookmark[1]{pipe}{pipe}
\begin{slide}
\sltitle{\texttt{pipe()}}
\texttt{int \funnm{pipe}(int \emph{fildes}[2]);}
\begin{itemize}
\item creates an \emph{unnamed pipe} and allocates a pair of file descriptors
\begin{itemize}
\item \texttt{fildes[0]} \dots{} for reading from the pipe
\item \texttt{fildes[1]} \dots{} for writing to the pipe
\end{itemize}
\item the system makes sure that:
\begin{itemize}
\item producer blocks on writing if the pipe is full
\item consumer blocks on reading if the pipe is empty
\end{itemize}
\item consumer gets \texttt{EOF} (i.e. \texttt{read()} on \texttt{fildes[0]}
returns \texttt{0}) only if all copies of \texttt{fildes[1]} are closed.
\item \emph{named pipe} (i.e. FIFO, see \funnm{mkfifo}) works the same way.
The difference is that any process (modulo permissions) can use it.
\end{itemize}
\end{slide}
\hlabel{PIPE}
\begin{itemize}
\item A \emph{pipe} is an object with two endpoints that serves for passing
data from one end to the other.
\item An \emph{unnamed pipe} is created by one process and can be passed to its
children only via file descriptors inherited through \funnm{fork}. That
limitation can be worked around via passing an open file descriptor via a
u{}nix-domain socket. However, such a workaround is out of scope for this
class.
\item If the function \funnm{write} writes at most \texttt{PIPE\_BUF} bytes to
the pipe, it is guaranteed that the write will be atomic, i.e. such bytes will
not be intermingled with bytes written by other producers.
\item \hlabel{TWO_WAY_PIPES} The SUSv3 standard does not specify whether
\texttt{fildes[0]} is also open for writing and whether \texttt{fildes[1]} is
also open for reading. FreeBSD and Solaris provide bidirectional pipes while
Linux does not. It is best to assume unidirectional pipes.
\item Writing to a pipe without readers results in the \texttt{SIGPIPE} signal
being delivered to the writing process. The default reaction to that signal is
process termination; if the signal is ignored or caught, \funnm{write} fails
with \texttt{EPIPE}. Example: \example{pipe/broken-pipe.c}
\item Example: \example{pipe/deadlock-in-read.c}
\item \emsl{Important:} the rules for reading from and writing to named pipes
apply to unnamed pipes as well, see page \pageref{NAMEDPIPE}.
\end{itemize}
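The rules above can be put together in a short sketch where a child process
writes a message to a pipe and the parent reads it until EOF. The function name
\texttt{read\_from\_child} is made up for this illustration, and the message is
assumed to be short enough to fit into the pipe.
\begin{verbatim}
#include <sys/types.h>
#include <sys/wait.h>
#include <string.h>
#include <unistd.h>

/*
 * Fork a child that writes msg to a pipe; the parent reads at
 * most size bytes into buf until EOF.  Returns the number of
 * bytes read, or -1 on error.
 * The function name is made up for this example.
 */
ssize_t
read_from_child(const char *msg, char *buf, size_t size)
{
    int pd[2];
    size_t total = 0;
    ssize_t n = 0;
    pid_t pid;

    if (pipe(pd) == -1)
        return (-1);
    if ((pid = fork()) == -1) {
        close(pd[0]);
        close(pd[1]);
        return (-1);
    }
    if (pid == 0) {
        /* Producer: it only writes, so close the reading end. */
        close(pd[0]);
        if (write(pd[1], msg, strlen(msg)) == -1)
            _exit(1);
        _exit(0);       /* exiting closes pd[1] as well */
    }
    /* Consumer: close the writing end or EOF would never come. */
    close(pd[1]);
    while (total < size &&
        (n = read(pd[0], buf + total, size - total)) > 0)
        total += (size_t)n;
    close(pd[0]);
    (void) waitpid(pid, NULL, 0);
    return (n == -1 ? -1 : (ssize_t)total);
}
\end{verbatim}
If the parent kept its copy of \texttt{pd[1]} open, the last \funnm{read} would
block forever instead of returning 0 because a writer, the parent itself, would
still exist.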
%%%%%
\begin{slide}
\sltitle{Example: a pipe between two processes}
\begin{center}
\input{img/tex/pipe.tex}
\end{center}
\end{slide}
\hlabel{FDSHARING}
\begin{itemize}
\item Remember, open file descriptors are unaffected by \funnm{exec} aside from
file descriptors with the \texttt{FD\_CLOEXEC} flag -- those are closed in the
\emsl{successful} \funnm{exec} call.
\item Example: \example{pipe/pipe-and-fork.c}
\item Closing the writing descriptor \texttt{pd[1]} (see
ifdef([[[NOSPELLCHECK]]], [[[{\color[rgb]{1,0,0} $\triangleright$}]]]))
in the consumer process is required as the EOF would not be detected otherwise.