% The title slide
include(`macros.m4')
\begin{slide}
\centerslidestrue
\begin{center}
\title{\LARGE Unix/Linux Programming in C}
\author{(NSWI015)}
\date{Version: \rm\today}
\maketitle
\vspace{2ex}
{\small (c) 2011 -- 2024 Vladim\'{i}r Kotal}\\
{\small (c) 2005 -- 2011, 2016 -- 2022 Jan Pechanec}\\
{\small (c) 1999 -- 2004 Martin Beran}
\vspace{2ex}
Department of SISAL\\
Faculty of Mathematics and Physics, Charles University\\
Malostransk\'{e} n\'{a}m. 25, 118 00 Praha 1
\begin{figure}[htb!]
\includegraphics[scale=0.75]{img/by-nc-sa-small}
\end{figure}
\end{center}
\end{slide}
\begin{itemize}
\item This is official material for the class \emph{Unix/Linux Programming in C}
(NSWI015) lectured at the Faculty of Mathematics and Physics, Charles University
in Prague.
\item This material is published under the
\href{http://creativecommons.org/licenses/by-nc-sa/3.0/cz/}{Creative Commons
BY-NC-SA 3.0} license and is always a work in progress, see the history on
GitHub:\\
\url{https://github.com/devnull-cz/unix-linux-prog-in-c}
\item To download the latest version, go to the
\href{https://github.com/devnull-cz/unix-linux-prog-in-c/releases}{releases}
on GitHub.
\item Source code referenced from this material is published in
\href{http://creativecommons.org/licenses/publicdomain/}{Public Domain} unless
specified otherwise in the files.
\item The source code files can be found on GitHub here:\\
\url{https://github.com/devnull-cz/unix-linux-prog-in-c-src}
\item In case you find any errors either in the text or in the example programs,
we would appreciate you letting us know. Especially do not hesitate to create new
issues on \url{https://github.com/devnull-cz/unix-linux-prog-in-c/issues}.
\end{itemize}
\pagebreak
\begin{slide}
\sltitle{Contents}
\slidecontents{0}
\end{slide}
\begin{itemize}
\item This lecture is mostly about Unix principles and Unix programming in the~C
language.
\item \emsl{The lecture is mostly about system calls, i.e. the interface between
user space and the system kernel.}
\item For the API, we will follow the \emph{Single UNIX Specification,
version~4} (SUSv4). Systems that are submitted to The Open Group for
certification and pass the conformance tests are certified as compliant with the
UNIX~V7 standard. Some versions of AIX, HP-UX and macOS on selected
architectures are compliant with the previous version, SUSv3
(\url{http://www.opengroup.org/openbrand/register/xy.htm}).
\item The specific source code examples linked from this material are usually
tested on Solaris, macOS and Linux.
\end{itemize}
%%%%%
\pdfbookmark[0]{intro, programming utilities}{intro}
\begin{slide}
\sltitle{Contents}
\slidecontents{1}
\end{slide}
\pdfbookmark[1]{Current UNIX and Unix-like Systems}{currentunix}
\begin{slide}
\sltitle{Proprietary UNIX and Unix-like Systems}
\begin{itemize}
\item Sun Microsystems, now Oracle: \emsl{SunOS} (defunct), \emsl{Solaris}
\item Apple: \emsl{macOS} (formerly Mac OS X, Mac OS)
\item SGI: \emsl{IRIX} (in maintenance mode)
\item IBM: \emsl{AIX}
\item HP: \emsl{HP-UX}, \emsl{Tru64 UNIX} (defunct, formerly by Compaq)
\item SCO: \emsl{SCO Unix} (discontinued)
\item BSDi: \emsl{BSD/OS} (discontinued)
\item Xinuos (formerly Novell): \emsl{UNIXware}
\end{itemize}
\end{slide}
\begin{slide}
\sltitle{Open source Unix-like Systems}
\begin{itemize}
\item rather extensive number of \emsl{Linux} distributions
\item \emsl{FreeBSD}
\item \emsl{NetBSD}
\item \emsl{OpenBSD}
\item \emsl{DragonflyBSD}
\begin{itemize}
\item all BSD variants have roots in the 4.3BSD-Lite source code
\end{itemize}
\item \emsl{Minix}, micro-kernel based
\item \emsl{Illumos}, based on Solaris
\end{itemize}
\end{slide}
\begin{itemize}
\item Note that \emsl{Linux is a kernel}, not the whole system, in contrast to
FreeBSD, for example, which covers both the kernel and the userland. It is
better to say a ``Linux distribution'' when you discuss a whole system that is
built around the Linux kernel.
\item FreeBSD and NetBSD forked from 386BSD (now defunct) in 1993, OpenBSD
forked from NetBSD in 1995, and DragonflyBSD forked from FreeBSD in 2003.
386BSD itself was based on 4.3BSD-Lite. However, the history is much more
complicated, as usual.
\item Presently, the ``UNIX'' trademark can only be used by systems that passed
the conformance tests defined in the Single UNIX Specification (SUS).
\item Of the systems listed above, only macOS, AIX, and HP-UX are
UNIX~03 compliant (\url{http://www.opengroup.org/openbrand/register/}). Other,
non-certified systems are often described as ``Unix-like'', even though in many
cases they closely follow the standard. However, the word ``Unix'' is often used
for systems from either group.
\item The above list is a tiny fraction of the whole Unix world. Every
proprietary Unix variant likely came from either UNIX System~V or BSD, and added
its own features. This resulted in quite a few standards as well, see page
\pageref{UNIXSTANDARDS}. In the end, vendors agreed upon a small set of those.
\item If you are interested in a detailed and up-to-date Unix system version
history, go check \url{https://www.levenez.com/unix/}.
\end{itemize}
%%%%%
\pdfbookmark[1]{UNIX standards}{unixstd}
\begin{slide}
\sltitle{UNIX standards}
\begin{itemize}
\renewcommand{\baselinestretch}{0.8}
\item \emsl{SVID} (System~V Interface Definition)
\begin{itemize2}
\item \uv{purple book}, published by AT\&T first in 1985
\item today at version SVID4 from 1995 (SVID3 corresponds to SVR4)
\end{itemize2}
\item \emsl{POSIX} (Portable Operating System Interface)
\begin{itemize2}
\item family of standards published by the IEEE organization marked
P1003.xx, gradually incorporated into ISO standards
\end{itemize2}
\item \emsl{XPG} (X/Open Portability Guide)
\begin{itemize2}
\item recommendation of the X/Open consortium, that was founded in 1984
by leading UNIX platform companies
\end{itemize2}
\item \emsl{Single UNIX Specification}
\begin{itemize2}
\item standard of The Open Group, an organization founded in 1996 by
merging X/Open and OSF
\item today at Version~4 (\emsl{SUSv4})
\item compliance is a requisite condition for using the UNIX trademark
\end{itemize2}
\end{itemize}
\end{slide}
\hlabel{UNIXSTANDARDS}
\begin{itemize}
\item The first thing to know is that the area of UNIX standards is very
complex and hard to comprehend at first sight.
\item AT\&T allowed vendors to call their commercial UNIX variant
``System V'' only if it complied with the SVID standard. AT\&T also
published the \emph{System~V Verification Suite} (SVVS), which checked whether a
given system complied with the standard.
\item POSIX (Portable Operating System Interface) is a standardization effort
of the IEEE organization (Institute of Electrical and Electronics Engineers).
\item SUSv4 is a common standard of The Open Group, IEEE (Std. 1003.1, 2008
Edition) and ISO (ISO/IEC 9945-2008).
\item To certify a given system for the Single UNIX Specification, it is
necessary to pass a series of tests (on a given architecture, e.g. 64-bit x86).
The results of the tests are then evaluated. The tests themselves are grouped
into so-called \emph{test suites}, which are sets of automatic tests that go
through the system and verify whether it implements the interfaces specified in
the standard. For example, for SUSv3 there are 10 such test suites.
\item The POSIX.1-2008 standard is divided into 4 volumes: XBD (Base
Definitions), XSH (System Interfaces), XCU (Shell and Utilities), and XRAT
(Rationale). In terms of the number of interfaces, the biggest of them is XSH,
which contains more than 1000 interfaces.
\item The POSIX interface groups, together with the Xcurses group (which is
not part of the POSIX base in the IEEE Std 1003.1-2001 standard), form the Single
UNIX Specification, which includes 1833 interfaces in total (SUSv4, 2018). The
SUSv4 interface tables are here:
\url{https://unix.org/version4/GS5\_APIs.pdf}
\item Commercial UNIXes largely follow the Single UNIX Specification, which is
built on the POSIX base. Compliance with this standard is the condition for
using the UNIX trademark (the UNIX 98 brand corresponds to SUSv2, UNIX 03
corresponds to SUSv3, and SUSv4 is UNIX V7 -- do not mix it up with the
historical V7 UNIX).
\item We are going to follow SUSv4 for APIs in this lecture. The data structure
definitions and algorithms used by the kernel will be mostly based on
System~V Rel.~4 to keep things simple.
\item On Solaris there is an extensive \texttt{standards(5)} manual page, where
lots of information about standards can be found in one place.
Moreover, individual commands compliant with a standard are placed
into special directories, e.g. the \texttt{tr} program is located in both the
\texttt{/usr/xpg4/bin/} and \texttt{/usr/xpg6/bin/} directories, each holding
a version of the program compliant with the respective standard.
The options and behavior specified by the standard can then be relied upon, e.g.
when writing shell scripts.
\item Also on Solaris, look into the
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{/usr/inc{}lude/sys/fea\-ture\-\_tests.h}]]]) header file.
\end{itemize}
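A program can request a specific standard level at compile time through a
feature test macro and then check what the system headers actually advertise.
The following is only a minimal sketch, assuming a system that provides
\texttt{<unistd.h>}; the helper name \texttt{posix\_version()} is made up for
this illustration, and the printed values differ between systems.

\begin{verbatim}
/* Request the SUSv4 API; this must come before the first header. */
#define _XOPEN_SOURCE 700

#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: the POSIX.1 revision reported at run time. */
long
posix_version(void)
{
    return (sysconf(_SC_VERSION));
}

int
main(void)
{
#ifdef _POSIX_VERSION
    printf("_POSIX_VERSION: %ld\n", (long)_POSIX_VERSION);
#endif
#ifdef _XOPEN_VERSION
    printf("_XOPEN_VERSION: %ld\n", (long)_XOPEN_VERSION);
#endif
    printf("sysconf(_SC_VERSION): %ld\n", posix_version());
    return (0);
}
\end{verbatim}

On a system that follows SUSv4, \texttt{\_POSIX\_VERSION} is expected to be
\texttt{200809L} and \texttt{\_XOPEN\_VERSION} to be \texttt{700}.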
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\pdfbookmark[1]{POSIX}{POSIX}
\begin{slide}
\sltitle{POSIX}
\begin{itemize}
\renewcommand{\baselinestretch}{0.8}
\item statements like ``this system is POSIX compatible'' do not give any
concrete information whatsoever
\begin{itemize}
\item it might comply with POSIX1990 -- what else?
\item the speaker either does not know what POSIX is or thinks you do not know
\item the only reasonable reaction is ``what POSIX?''
\end{itemize}
\item POSIX is a \emsl{family of standards}
\item the first document is \emph{IEEE Std POSIX1003.1-1988}, later after the
introduction of extensions referred to informally as POSIX.1
\item last version of POSIX.1 is \emph{IEEE Std 1003.1-2008, 2016 Edition}
\begin{itemize}
\item contains even content formerly defined by POSIX.2 (Shell and Utilities)
and miscellaneous, formerly standalone extensions.
\end{itemize}
\end{itemize}
\end{slide}
\hlabel{POSIX}
\begin{itemize}
\item The first document is \emph{IEEE Std POSIX1003.1-1988}, formerly simply
referred to as POSIX and later as \emph{POSIX.1}, because the name POSIX now
denotes a whole set of related standards. POSIX.1 back then contained the
programming API, i.e. work with processes, signals, files, timers, etc.
With small changes, it was accepted by the ISO organization
(\emph{ISO 9945-1:1990}) and is referred to as POSIX1990. The IEEE reference is
\emph{IEEE Std POSIX1003.1-1990}. This standard was a great success on its own;
however, it still did not connect the System~V and BSD camps because it did not
contain BSD sockets or System~V IPC (semaphores, messages, shared memory).
Part of the standard is the ``POSIX conformance test suite (PCTS)'', which is
freely available.
\item The name POSIX was suggested by Richard Stallman, who founded the GNU
project in 1983.
\item Important extensions of IEEE Std 1003.1-1990 (they are part of IEEE Std
1003.1, 2004 Edition):
\begin{itemize}
\item \emph{IEEE Std 1003.1b-1993 Realtime Extension}, informally also known as
POSIX.4, because that was its original name before renumbering. Most of this
extension is optional, therefore the claim ``system supports POSIX.1b'' says
even less than ``system is POSIX compatible'', i.e. practically
nothing. The only mandatory part of POSIX.4 is a small addition to signals
compared to POSIX1990. It is therefore always necessary to state what exactly
out of POSIX.4 is implemented -- e.g. shared memory, semaphores, real-time
signals, memory locking, asynchronous I/O, timers, etc.
\item \emph{IEEE Std 1003.1c-1995 Threads}, see page \pageref{POSIXTHREADS}.
\item \emph{IEEE Std 1003.1d-1999 Additional Realtime Extensions}
\item \emph{IEEE Std 1003.1j-2000 Advanced Realtime Extensions}, see page
\pageref{RWLOCKS}.
\item \dots
\end{itemize}
\item The POSIX standards can be found on \url{http://www.open-std.org/}.
The HTML version is freely available, PDF documents have to be purchased.
\end{itemize}
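Because most of the extensions are optional, a program can ask the system at
run time which of them are available. The sketch below uses \texttt{sysconf()}
with option names from \texttt{<unistd.h>}; the structure
\texttt{posix\_option} and the helper \texttt{option\_status()} are made up for
this illustration, and treating every non-positive result as ``not supported''
is a simplifying assumption.

\begin{verbatim}
#include <stdio.h>
#include <unistd.h>

/* Hypothetical table entry: a label and the matching sysconf() name. */
struct posix_option {
    const char *label;
    int name;
};

/* Hypothetical helper: turn a sysconf() result into a readable word. */
const char *
option_status(long value)
{
    return (value > 0 ? "supported" : "not supported");
}

int
main(void)
{
    const struct posix_option options[] = {
        { "asynchronous I/O", _SC_ASYNCHRONOUS_IO },
        { "memory locking", _SC_MEMLOCK },
        { "real-time signals", _SC_REALTIME_SIGNALS },
        { "semaphores", _SC_SEMAPHORES },
        { "shared memory", _SC_SHARED_MEMORY_OBJECTS },
        { "timers", _SC_TIMERS },
        { "threads", _SC_THREADS },
    };
    size_t i;

    for (i = 0; i < sizeof (options) / sizeof (options[0]); ++i)
        printf("%-18s %s\n", options[i].label,
            option_status(sysconf(options[i].name)));
    return (0);
}
\end{verbatim}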
\pdfbookmark[1]{books}{books}
%%%%%
\begin{slide}
\sltitle{Books on Unix system principles and design}
\begin{enumerate}
\item Uresh Vahalia: \emsl{UNIX Internals: The New Frontiers}.
Prentice Hall; \nth{1} edition, 1995
\item Bach, Maurice J.: \emsl{The Design of the UNIX Operating System}.
Prentice Hall, 1986
\item McKusick, M. K., Neville-Neil, G. V.: \emsl{The Design and
Implementation of the FreeBSD Operating System}. Addison-Wesley, 2004
%\item Goodheart, B.; Cox, J.: \emsl{The Magic Garden Explained: the
%Internals of UNIX System~V Release 4}. Prentice Hall, 1994
\item McDougall, R.; Mauro, J.: \emsl{Solaris Internals}. Prentice Hall; \nth{2}
edition, 2006.
\item \emsl{Linux Documentation Project}. \url{http://tldp.org/}
\end{enumerate}
\end{slide}
\begin{itemize}
\item These books are about Unix internals, not about Unix system programming.
\end{itemize}
\begin{enumerate}
\item A great book on Unix in general; it compares the SVR4.2, 4.4BSD,
Solaris~2.x and Mach systems. The \nth{2} edition, scheduled for 2005, never
happened, unfortunately.
\item UNIX classic book. On UNIX System~V Rel.~2, and partially 3 as well.
While outdated, it is one of the best books ever written on Unix. In 1993 a
Czech translation was released as
ifdef([[[NOSPELLCHECK]]],
[[[\emsl{Principy opera\v{c}n\'{\i}ho syst\'{e}mu UNIX}]]]), SAS.
\item Structures, functions, and algorithms of the FreeBSD 5.2 kernel; it is
based on another Unix classic book \emsl{The Design and Implementation of the
4.4 BSD Operating System} by the same author.
\item The best book on the Solaris operating system. The system version in the
book is Solaris~10.
\item Linux documentation project home page.
\end{enumerate}
%%%%%
\begin{slide}
\sltitle{Books on Unix programming}
\begin{enumerate}
\item Stevens, W. R., Rago, S. A.: \emsl{Advanced Programming in the UNIX(r)
Environment}. Addison-Wesley, \nth{2} edition, 2005.
\item Rochkind, M. J.: \emsl{Advanced UNIX Programming},
Addison-Wesley; \nth{2} edition, 2004
\item Stevens, W. R., Fenner B., Rudoff, A. M.: \emsl{UNIX Network
Programming, Vol. 1 -- The Sockets Networking API}. Prentice Hall,
\nth{3} edition, 2004
\item Butenhof, D. R.: \emsl{Programming with POSIX Threads},
Addison-Wesley; \nth{1} edition, 1997
% I don't why but after I switched from FreeBSD to Solaris, I can't typeset
% word "unix" anymore. It's like it wasn't there. Using {} trick helps.
\item UNIX specifications, see \url{http://www.unix.org}
\item manual pages, mainly sections 2 and 3
\end{enumerate}
\end{slide}
\hlabel{REF_PROGRAMMING}
\begin{enumerate}
\item One of the best books on programming in the Unix environment. Does not
cover net\-work\-ing, that is in 3.
\item Another classic book on programming in the Unix environment. Also covers
net\-work\-ing. Not as detailed as books 1 and 3 but that could be to your
advantage. We very much recommend this book, especially if you want just one.
The author can see the big picture, which is quite rare.
\item Unix network programming classic, one of the best on the topic; there is
also volume 2, \emsl{UNIX Network Programming, Volume 2: Interprocess
Communications}, covering interprocess communication in great detail.
\item Great book on programming with threads using POSIX API. Highly
recommended.
\item UNIX specifications.
\item Detailed descriptions of system calls and functions.
\item \hlabel{POSIX4} A book that did not fit the slide and covers topics outside
of the scope of this class: Gall\-meis\-ter, B. R.: \emsl{POSIX.4 Programmers
Guide: Programming for the Real World}, O'Reilly; \nth{1} edition, 1995. A great
book on real-time POSIX extensions with a beautiful cover. See also pages
\pageref{REALTIMEEXTENSIONS} and \pageref{SIGWAITINFO}.
\item[\ldots] Go to Amazon and search for ``unix''. If you ever buy anything,
always check whether there is a newer edition of the same book. Note that they
often still sell older releases as well.
\item[\ldots] You can also buy lots of these books on Amazon in a decent second
hand quality for a fraction of the original price.
\item[\ldots] or you can borrow them from the faculty library!
\end{enumerate}
%%%%%
\begin{slide}
\sltitle{Manual page sections}
\begin{itemize}
\item the convention is that ``(X)'' after a name means the manual page section
\item for example, \texttt{chmod(2)} means the man page for the system call from
section 2, it does \emsl{not} mean a function call
\item \texttt{chmod(1)} means the shell command
\item use ``\texttt{man <N> <name>}'' to get the specific man page
\item example: \texttt{man 2 chmod}
\item see the \texttt{man-pages(7)} man page on Linux on what sections exist
\end{itemize}
\end{slide}
\begin{itemize}
\item Different systems might have a different list of manual page sections, the
numbering may not match, etc. See also \texttt{man(1)}. For example, on
Solaris, the manual page section needs to be provided with the \texttt{-s}
option, i.e. ``\texttt{man -s 2 chmod}''.
\item The \texttt{man} command uses a list of system directories to search for
man pages. If you have manual pages someplace else, perhaps in a local subtree
after you unpacked a tar file you downloaded and you want to check the
documentation, the \texttt{-M} option or the \texttt{MANPATH} environment
variable may come in handy.
\item Sometimes there are entries for the same name in several sections. If
unsure what you are looking for, use the \texttt{-a} option to get all manual
pages for that name (otherwise you get just one, usually from the first
section found), and go through the individual man pages with the \texttt{q}
command for \texttt{less(1)} which is usually the default pager (or possibly
\texttt{more(1)}).
\end{itemize}
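To make the difference between \texttt{chmod(1)} and \texttt{chmod(2)} more
concrete, here is a small sketch that calls the system call documented in
section 2 directly from C. The helper \texttt{set\_and\_get\_mode()} and the
temporary file name pattern are made up for this illustration.

\begin{verbatim}
#define _XOPEN_SOURCE 700

#include <sys/stat.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: change the permission bits with chmod(2) and
 * read them back with stat(2). Returns 0 on success, -1 on failure.
 */
int
set_and_get_mode(const char *path, mode_t mode, mode_t *result)
{
    struct stat sb;

    if (chmod(path, mode) == -1)
        return (-1);
    if (stat(path, &sb) == -1)
        return (-1);
    *result = sb.st_mode & 07777;
    return (0);
}

int
main(void)
{
    char path[] = "/tmp/chmod-demo-XXXXXX";
    mode_t mode;
    int fd;

    if ((fd = mkstemp(path)) == -1) {
        perror("mkstemp");
        return (1);
    }
    close(fd);
    if (set_and_get_mode(path, 0640, &mode) == -1) {
        perror(path);
        unlink(path);
        return (1);
    }
    printf("%s: mode %o\n", path, (unsigned int)mode);
    unlink(path);
    return (0);
}
\end{verbatim}

The shell equivalent would be the \texttt{chmod(1)} command, i.e.
``\texttt{chmod 640 file}'', which in the end has to use the same system call.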
%%%%%
\pdfbookmark[1]{The C Programming Language}{C}
\hlabel{C_LANGUAGE}
\begin{slide}
\sltitle{The C Programming Language}
\begin{itemize}
\item virtually all Unix kernels are written in C. Only some HW dependent parts
are written in assembler.
\item C came into existence in the years 1969-1973, by Dennis M. Ritchie (\dag
2011)
\item it evolved from B, designed by Ken Thompson
\item created as means to rewrite original Unix in a higher language. It also
greatly helped \emsl{portability of the system.}
\item language variants
\begin{itemize}
\item original K\&R C (1978-1979)
\item standard ANSI/ISO C (1989), then next C standard revisions
\end{itemize}
\end{itemize}
\end{slide}
\begin{itemize}
\item The success of C eventually surpassed the success of Unix itself.
\item CPL $\Rightarrow$ BCPL $\Rightarrow$ B (Thompson, interpreted)
$\Rightarrow$ C. Both Thompson and Ritchie worked for Bell Laboratories.
\item It took many years before C reached its first standard. Most work on C
happened in 1972, another peak was in 1977-1979, and then in the 1980s an ANSI
committee was established to provide the first standard on C. For more
information on the early C history, see the paper \emph{Dennis M. Ritchie, The
Development of the C Language}, available freely.
\item K\&R C refers to the C language as described in the first edition of
\emph{The C Prog\-ramm\-ing Language} classic book by Kernighan and Ritchie,
Prentice-Hall, 1978.
\item In 1983 ANSI (American National Standards Institute) formed the committee
X3J11 to create the first C standard. After a long and tedious process the
standard came into existence as ANSI X3.159-1989 ``Programming Language C,'' and
is mostly known as ``ANSI C'', or C89; the command line name for the
compiler itself was \texttt{c89}.
\item The \nth{2} edition of the C book (1988) was updated for the upcoming
standard as it used one of its final drafts. In 1990, ANSI C was adopted by ISO
as ISO/IEC 9899:1990; that standard is sometimes called C90. It's the same as
C89 but it renumbered its sections and removed the rationale document which was
part of ANSI C. That standard was adopted back by ANSI. After C89, ANSI never
got involved in the C standardization anymore, it only adopted each ISO C
standard.
\item The next revision of the language was released in 1999 as ISO 9899:1999,
informally called C99. After that, there were three technical corrigenda,
TC1, TC2, and TC3, so the current version of the C99 standard is the combined
C99+TC1+TC2+TC3, WG14~N1256, dated 2007-09-07. Formally it is a working
draft, located here:
\url{http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf}.
\item After C99, C11 came, officially ISO/IEC 9899:2011. And then C17,
a bug fix standard, in 2018, officially ISO/IEC 9899:2018.
\item Some differences between C89 and C99 -- inline functions, variable
definitions intermixed with code, one-line comments using \texttt{//}, new
functions like \funnm{snprintf}() etc.
\item The ISO C standards are not free but the drafts are. The latest draft for
each standard is virtually the standard itself, it just does not say so.
See \url{http://www.open-std.org/}.
\end{itemize}
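The C99 additions listed above can be seen together in one short program. This
is only an illustrative sketch; the function names \texttt{square()} and
\texttt{square\_to\_string()} are made up here. A strict C89 compiler is
expected to reject it.

\begin{verbatim}
#include <stdio.h>

// C99: one-line comments are allowed.

/* C99: inline functions. */
static inline int
square(int x)
{
    return (x * x);
}

/*
 * Hypothetical helper: print the square of x into buf of the given size.
 * Like snprintf(), it returns the number of characters the full result
 * needs, which may be more than what fits into buf.
 */
int
square_to_string(char *buf, size_t size, int x)
{
    return (snprintf(buf, size, "%d", square(x)));
}

int
main(void)
{
    char buf[8];

    printf("squares:");
    /* C99: a declaration inside the for statement. */
    for (int i = 1; i <= 3; ++i)
        printf(" %d", square(i));
    printf("\n");

    /* C99: a declaration after statements. */
    int needed = square_to_string(buf, sizeof (buf), 12345);
    printf("needed %d characters, stored \"%s\"\n", needed, buf);
    return (0);
}
\end{verbatim}

Since $12345^2 = 152399025$ has 9 digits and the buffer holds only 7 characters
plus the terminating null byte, \texttt{snprintf()} truncates the output to
\texttt{1523990} but still returns 9, so the caller can detect the truncation.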
%%%%%
\begin{slide}
\sltitle{Byte ordering}
\begin{itemize}
\item byte ordering -- depends on the architecture
\begin{itemize}
\item \raisetab{
\begin{tabular}[t]{r|c|c|c|c|}
little endian: 0x11223344 =
&44&33&22&11\\
\multicolumn{1}{r}{\texttt{addr +}}&
\multicolumn{1}{c}{0}&\multicolumn{1}{c}{1}&
\multicolumn{1}{c}{2}&\multicolumn{1}{c}{3}
\end{tabular}}
\item \raisetab{
\begin{tabular}[t]{r|c|c|c|c|}
big endian: 0x11223344 =
&11&22&33&44\\
\multicolumn{1}{r}{\texttt{addr +}}&
\multicolumn{1}{c}{0}&\multicolumn{1}{c}{1}&
\multicolumn{1}{c}{2}&\multicolumn{1}{c}{3}
\end{tabular}}
\end{itemize}
\item little endian -- Intel, ARM (mostly, but it does support both)
\item big endian -- SPARC, MIPS, network byte ordering
\end{itemize}
\end{slide}
\hlabel{BYTE_ORDERING}
\begin{itemize}
\item Be careful when using tools like \texttt{hexdump} that by default print
out a file as 16-bit numbers. The bytes may then not be shown in the order in
which they are stored in the file. For example, take FreeBSD on i386. The first
byte in the file is the character ``i'', which represents the lower 8 bits of
the first 16-bit number, so when the first two bytes are printed out as a 16-bit
number, the byte representing ``i'', i.e. 0x69, is shown as the second byte.
Similarly for ``kl''.
\begin{verbatim}
$ echo -n ijkl > test
$ hexdump test
0000000 6a69 6c6b
0000004
\end{verbatim}
You can use other output formats though, for example as hex bytes and
characters in the same output:
\begin{verbatim}
$ hexdump -C test
00000000 69 6a 6b 6c |ijkl|
00000004
\end{verbatim}
\item The UNIX spec does not list \texttt{hexdump} but defines \texttt{od}
(octal dump). The equivalent output for the \texttt{hexdump} default output is
as follows. Note that since we did that on SPARC, the output is different from
the i386 output above!
\begin{verbatim}
$ od -tx2 test
0000000 696a 6b6c
0000004
\end{verbatim}
\end{itemize}
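The byte order of the machine can also be checked from a C program by looking
at the individual bytes of a multi-byte integer. The following is a minimal
sketch; the helper \texttt{is\_little\_endian()} is made up for this
illustration, while \texttt{htonl()} is the standard function that converts a
32-bit value from the host to the network byte order.

\begin{verbatim}
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: 1 on a little endian machine, 0 otherwise. */
int
is_little_endian(void)
{
    uint32_t value = 0x11223344;
    unsigned char bytes[sizeof (value)];

    /* Copy the in-memory representation, lowest address first. */
    memcpy(bytes, &value, sizeof (value));
    return (bytes[0] == 0x44);
}

int
main(void)
{
    uint32_t net = htonl(0x11223344);
    unsigned char bytes[sizeof (net)];

    memcpy(bytes, &net, sizeof (net));
    printf("host is %s endian\n", is_little_endian() ? "little" : "big");
    /* The network byte order is big endian on every host. */
    printf("network order: %02x %02x %02x %02x\n",
        bytes[0], bytes[1], bytes[2], bytes[3]);
    return (0);
}
\end{verbatim}

The second line of the output is \texttt{network order: 11 22 33 44} on any
machine, while the first line depends on the architecture.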
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{slide}
\sltitle{New line character(s)}
\begin{itemize}
\item in Unix, a text file line ends with a single character \emsl{LF}
\item in Windows (and MS~DOS), a line ends with two characters, \emsl{CR+LF}
\item on Unix, calling ifdef([[[NOSPELLCHECK]]], [[[\verb.putchar('\n').]]])
thus prints only one character
\item ``classic'' Mac~OS used \emsl{CR}
\end{itemize}
\end{slide}
\hlabel{NEWLINECHAR}
\begin{itemize}
\item \emsl{LF}, \emph{line feed}, sometimes also referred to as \emph{new
line}, is a character 0x0A (10). \emsl{CR}, \emph{carriage return}, or simply
\emph{return}, is a character 0x0D (13).
\item To further confuse the enemy, ``classic'' Mac OS used a single \emsl{CR}
as the line break. As present-day macOS comes from the Unix world, it uses
\emsl{LF} now.
\item When you open a text file in the classic \texttt{vi} editor and you see
strange \verb|^M| characters at the end of every line, it is that \emsl{CR}
character from a line separator in a file brought over from a Windows system.
Just get rid of them via \verb|:%s/^V^M//g| where \verb|^X| means Ctrl+X.
ViM (\emph{Vi IMproved}) by default tries to be smarter in such situations but
not always to your benefit.
\item See the \texttt{ascii} man page for the octal, hexadecimal, and decimal
ASCII character sets (i.e. up to character 127 as ASCII table has only 128
characters).
\end{itemize}
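Converting Windows line endings to the Unix ones is simple enough to sketch in
a few lines of C. The function \texttt{strip\_cr()} below is a made up example,
not an existing library call; it drops every \emsl{CR} that is directly
followed by \emsl{LF} and keeps any other \emsl{CR} untouched.

\begin{verbatim}
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: remove every CR that is directly followed by LF,
 * in place. Returns the new number of bytes in buf.
 */
size_t
strip_cr(char *buf, size_t size)
{
    size_t from, to = 0;

    for (from = 0; from < size; ++from) {
        if (buf[from] == '\r' && from + 1 < size &&
            buf[from + 1] == '\n')
            continue;
        buf[to++] = buf[from];
    }
    return (to);
}

int
main(void)
{
    char text[] = "one\r\ntwo\r\n";
    size_t size = strip_cr(text, strlen(text));

    /* 10 bytes on input, 8 bytes after both CR characters are gone. */
    printf("%zu bytes left\n", size);
    return (0);
}
\end{verbatim}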
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\pdfbookmark[1]{C style}{cstyle}
\begin{slide}
\sltitle{C style}
\begin{itemize}
\item C style of the source code files is extremely important
\item there are quite a few ways how to do it:
\begin{verbatim}
int
main(void)
{
    int i;
    char c = 'X';
    for (i = 0; i < 10; ++i)
        printf("%c%d\n", c, i);
    return (0);
}
\end{verbatim}
\end{itemize}
\end{slide}
\begin{itemize}
\item One of the most important things about a C style (well, any style) is
consistency. Often it matters less which exact C style a group of coders picks
than that one specific style is chosen and then religiously followed by all in
the group. A good and rigorously followed C style leads to a smaller number of
bugs in code.
\item A working solution to avoid C style violations is a process that
automatically runs a C style check script before integration into a source code
repository and refuses to accept any changeset that does not follow the chosen
C style.
\item \url{http://mff.devnull.cz/cstyle/}
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{slide}
\sltitle{C style (cont.)}
\begin{itemize}
\item many ways how \emsl{NOT} to do it (so called assembler style):
\begin{verbatim}
int main(void) {
int i = 0; char c;
printf("%d\n", i);
return (0);
}
\end{verbatim}
\item or a schizophrenic style:
\begin{verbatim}
int main(void) {
int i = 0; char c;
if (1)
printf("%d\n", i);i=2;
return (0); }
\end{verbatim}
\end{itemize}
\end{slide}
\begin{itemize}
\item The C style of the source code you write represents you. You will
be judged by other people by the way your source code looks. Always try to
write beautiful code.
\end{itemize}
%%%%%
\begin{slide}
\sltitle{Standard Utilities}
\begin{tabular}{ll}
ifdef([[[NOSPELLCHECK]]], [[[\emsl{cc}, \emsl{c99}$^*$,
\emsl{gcc}$^\dagger$&]]]) C compiler\\
ifdef([[[NOSPELLCHECK]]], [[[\emsl{CC}, \emsl{g++}$^\dagger$&]]]) C++ compiler\\
ifdef([[[NOSPELLCHECK]]], [[[\emsl{ld}&]]]) linker\\
ifdef([[[NOSPELLCHECK]]], [[[\emsl{ldd}&]]]) for listing dynamic object dependencies\\
ifdef([[[NOSPELLCHECK]]], [[[\emsl{cxref}$^*$&]]]) generate a C program cross-reference table\\
ifdef([[[NOSPELLCHECK]]], [[[\emsl{sccs}$^*$&]]]) source code management\\
ifdef([[[NOSPELLCHECK]]], [[[\emsl{make}$^*$&]]]) for maintaining program dependencies\\
ifdef([[[NOSPELLCHECK]]], [[[\emsl{ar}$^*$&]]]) for managing archives\\
ifdef([[[NOSPELLCHECK]]], [[[\emsl{dbx}, \emsl{gdb}$^\dagger$&]]]) debuggers\\
ifdef([[[NOSPELLCHECK]]], [[[\emsl{prof}, \emsl{gprof}$^\dagger$&]]]) profilers\\
\end{tabular}
\hspace{0.5cm}$^*$ SUSv4 $^\dagger$ GNU
\end{slide}
SUSv4
\begin{itemize}
\item The standard C language compiler is \texttt{c99}, required by the
specification. Be careful as the default mode for \texttt{gcc} does not
conform to any of the ISO C standards. Check the manual page
for your version and look for the \texttt{-std=} option to see what the default
is. For example, for version 4.2.1, the default is \texttt{-std=gnu89}, for
version 7.2, it is \texttt{-std=gnu11}.
\item Do not use \texttt{sccs} for source code management. Unless you are
forced to use a centralized source code management (CVS, Subversion, etc.) due
to historical reasons or while working on an existing project, always use a
\emsl{distributed} source code management system when starting a new project.
We recommend Git (\texttt{git}) or Mercurial (\texttt{hg}).
\item Debuggers and profilers are not part of the standard.
\end{itemize}
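To see which language level a compiler really uses with a given set of options,
you can let the program report it. The following is a sketch only: the helper
\texttt{c\_standard\_name()} is made up for this illustration, and
\texttt{\_\_STRICT\_ANSI\_\_} is a GCC specific macro (also provided by
compatible compilers), not a part of any C standard.

\begin{verbatim}
#include <stdio.h>

/* Hypothetical helper: informal name for a __STDC_VERSION__ value. */
const char *
c_standard_name(long version)
{
    if (version >= 201710L)
        return ("C17 or newer");
    if (version >= 201112L)
        return ("C11");
    if (version >= 199901L)
        return ("C99");
    if (version >= 199409L)
        return ("C89 with Amendment 1");
    return ("C89");
}

int
main(void)
{
#ifdef __STDC_VERSION__
    printf("language level: %s\n", c_standard_name(__STDC_VERSION__));
#else
    /* C89 compilers do not have to provide the macro at all. */
    printf("language level: %s\n", c_standard_name(0));
#endif
#ifdef __STRICT_ANSI__
    printf("strict ISO C mode\n");
#else
    printf("extensions enabled (or not a GCC compatible compiler)\n");
#endif
    return (0);
}
\end{verbatim}

Compiling the same file with, for example, \texttt{gcc -std=c99} and
\texttt{gcc -std=gnu11} should give different answers.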
%%%%%
\begin{slide}
\sltitle{File name convention}
\begin{tabular}{ll}
\texttt{*.c} & the C language source code files\\
\texttt{*.cc} & the C++ language source code files\\
\texttt{*.h} & header files\\
\texttt{*.o} & object files\\
\texttt{a.out} & the default executable file name after the compilation
\end{tabular}
\begin{tabular}{ll}
ifdef([[[NOSPELLCHECK]]], [[[
\texttt{/usr/inc{}lude} & system header file root\\
\texttt{/usr/lib/lib*.a} & static libraries\\
\texttt{/usr/lib/lib*.so} & dynamic libraries
]]])
\end{tabular}
\end{slide}
\begin{itemize}
\item Static libraries -- the code of the external functions used is copied
into the target program. Not used much nowadays.
\item Dynamic libraries -- the list of dynamic libraries needed is part of the
program; on execution, the dynamic linker (the path to the dynamic linker is
also part of the program, see page \pageref{RUNTIMELINKER}) loads them into
memory and relocates pointers.
\item Today, dynamic libraries are mostly used as they save disk space and you
do not need to rebuild all the utilities and other programs on library
upgrades.
\item In specific situations, static libraries are still needed though, for
example, in standalone binaries when booting an operating system.
\item The origin of the name \texttt{a.out} is as follows. Initially, even
before C was invented, there were no libraries, no loader, and no link editor
in the first version of the UNIX system: the entire source of a program was
presented to the assembler, and the output file with a fixed name that emerged
was directly executable. So \texttt{a.out} means ``the output of the
assembler''. Even after the system gained a linker and a means of specifying
another name explicitly, it was retained as the default executable result of a
compilation. See the paper \emph{Dennis M. Ritchie, The Development of the C
Language}, which is freely available.
\end{itemize}
%%%%%
\begin{slide}
\sltitle{Process of compilation}
\begin{center}
\input{img/tex/compilation_process.tex}
\end{center}
\end{slide}
\begin{itemize}
\item Non-trivial programs are often split into several source code files that
contain related functions. Such files can be compiled independently, and you
can even use different languages and different compilers for each file. The
advantage is the speed of building, as only modified files are re-compiled (see
page \pageref{MAKE} on the \texttt{make} utility), and also flexibility, as you
can use some of the files in other programs as well.
\item The \emph{compiler} compiles each file into a corresponding object file.
Instead of external function pointers in the compiled code, the object file
contains a table of global symbols.
\item Then, the \emph{linker} combines the built object files and used
libraries into an output file. By default, it also resolves all the references
to make sure all symbols used are available.
\item The code used from static libraries is copied to the executable file.
When using dynamic libraries, the executable only contains a list of them; the
linking itself is then performed by the runtime linker (aka loader) when the
program is executed. For more on the dynamic linking process, see page
\pageref{RUNTIMELINKER}.
\item Linker options decide whether the static or the dynamic version of a
library is used. By default, dynamic libraries are used nowadays. The source
code is the same in either case. There is also a mechanism (\texttt{dlopen},
\texttt{dlsym}\dots) that allows you to load an additional dynamic library
during the program execution and use it. For more information, see page
\pageref{DLOPEN}.
\end{itemize}
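\begin{itemize}
\item The role of the symbol table can be observed on a small file. This is
only a sketch: the file name \texttt{symbols.c} and the names
\texttt{counter} and \texttt{twice} are made up for this illustration, and
the exact output of \texttt{nm} differs between systems.
\begin{verbatim}
/*
 * Assumed build steps (the file name is hypothetical):
 *   cc -c symbols.c           compile only, creates symbols.o
 *   nm symbols.o              list the symbol table of the object
 *   cc -o symbols symbols.o   link, resolves the printf reference
 */
#include <stdio.h>

int counter;    /* global symbol, usable from other object files */

static int
twice(int x)    /* local symbol, invisible to other object files */
{
    return (2 * x);
}

int
main(void)
{
    counter = twice(21);
    /* printf is undefined in symbols.o, the linker finds it in libc. */
    printf("%d\n", counter);
    return (0);
}
\end{verbatim}
In the \texttt{nm} listing, \texttt{printf} is typically marked \texttt{U}
(undefined) and \texttt{main} is marked \texttt{T} (defined in the text
section). The static function may be missing entirely if the compiler decided
to inline it.
\end{itemize}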
%%%%%
\begin{slide}
\sltitle{Compilation of one file: preprocessor}
\begin{center}
\input{img/tex/preprocesor.pstex_t}
\end{center}
\end{slide}
\begin{itemize}
\item The preprocessor performs macro expansion, conditional compilation, and
inserts included files. It also removes comments.
\item The preprocessor output can be obtained via \texttt{cc -E} or by calling
\texttt{cpp} directly. However, some compilers have the preprocessor
functionality built in, so calling the external preprocessor may not give the
same results. You can of course use the preprocessor for anything else where
its functionality comes in handy, not just for C source code.
\item In a situation where you need to fix code full of includes and conditional
compilation, the output after the preprocessor phase may be very helpful to
locate the problem.
\item \texttt{cpp} (or \texttt{cc -E}) also allows you to see the whole tree of
included files, printed on the standard error output. For that, use a separate
\texttt{-H} option (not \texttt{-EH}) and redirect the standard output to
\texttt{/dev/null}:
\begin{verbatim}
$ gcc -E -H tcp/connect.c >/dev/null
. /usr/include/stdio.h
.. /usr/include/sys/cdefs.h
... /usr/include/sys/_symbol_aliasing.h
... /usr/include/sys/_posix_availability.h
.. /usr/include/Availability.h
... /usr/include/AvailabilityInternal.h
.. /usr/include/_types.h
... /usr/include/sys/_types.h
.... /usr/include/machine/_types.h
..... /usr/include/i386/_types.h
etc...
\end{verbatim}
\item You cannot nest comments in C, so wrapping a piece of code that already
contains comments in another comment in order to temporarily disable it will
not work. The preprocessor comes to the rescue -- use the conditional
compilation feature:
\begin{verbatim}
...
#if 0
/* some comment */
some_function();
/* another comment */
another_function();
#endif
...
\end{verbatim}
\end{itemize}
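\begin{itemize}
\item Macro expansion and conditional compilation can be tried on the sketch
below. The macro names \texttt{SQUARE} and \texttt{VERBOSE} are chosen just
for this illustration and do not come from any standard header.
\begin{verbatim}
#include <stdio.h>

/* Parenthesize the argument and the body to keep precedence. */
#define SQUARE(x)   ((x) * (x))

/* VERBOSE may be set when building, e.g. cc -DVERBOSE=1 file.c */
#ifndef VERBOSE
#define VERBOSE 0
#endif

int
main(void)
{
#if VERBOSE
    /* Removed by the preprocessor unless VERBOSE is non-zero. */
    fprintf(stderr, "computing SQUARE(1 + 2)\n");
#endif
    /* Expands to ((1 + 2) * (1 + 2)), so it prints 9. */
    printf("%d\n", SQUARE(1 + 2));
    return (0);
}
\end{verbatim}
Running \texttt{cc -E} on the file shows the expanded macro call and,
depending on the value of \texttt{VERBOSE}, the presence or absence of the
\texttt{fprintf} line.
\end{itemize}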
%%%%%
\begin{slide}
\sltitle{Compilation of one file: compiler}
\begin{center}
\input{img/tex/compiler.tex}
\end{center}
\end{slide}
\begin{itemize}
\item The picture is an example output for the x86 platform, 32-bit,
with AT\&T assembler syntax.
\item Compilation from the C language into assembler.
\item The assembler output file is the result of \texttt{cc -S}.
\end{itemize}
%%%%%
\begin{slide}
\sltitle{Compilation of one file: assembler}
\begin{center}
\input{img/tex/assembler.pstex_t}
\end{center}
\end{slide}
\begin{itemize}
\item Again an example for the x86 platform, 32-bit.
\item Compilation from the assembler language into the object code.
\item The output file is the result of \texttt{cc -c}.
\end{itemize}
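\begin{itemize}
\item The individual phases can be tried one by one on a tiny file. This is a
sketch under assumptions: the file name \texttt{phases.c} and the function
\texttt{add} are made up, and the output file names in the comment are the
usual defaults, which your compiler may choose differently.
\begin{verbatim}
/*
 * Assumed commands, each stopping after one more phase:
 *   cc -E phases.c   preprocessor only, prints to standard output
 *   cc -S phases.c   stop after the compiler, creates phases.s
 *   cc -c phases.c   stop after the assembler, creates phases.o
 *   cc phases.c      all phases including linking, creates a.out
 */
#include <stdio.h>

static int
add(int a, int b)
{
    return (a + b);
}

int
main(void)
{
    printf("%d\n", add(2, 3));
    return (0);
}
\end{verbatim}
\end{itemize}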
%%%%%
\begin{slide}
\sltitle{Compiler}
\renewcommand{\arraystretch}{1.1}
\begin{itemize}
\item usage:\\
\texttt{cc [\emph{options}] \emph{file} \dots}
\item the most important options:\\
\begin{tabular}{ll}
\texttt{-o \emph{file}} & output file name\\
\texttt{-c} & only compile, do not link\\
\texttt{-E} & only preprocessor\\
\texttt{-l} & link with the specified library\\
\texttt{-L\emph{directory}} & add a directory to search when using \texttt{-l}\\
\texttt{-O\emph{level}} & optimization level\\
\texttt{-g} & compile with debug information\\
\texttt{-D\emph{name}} & define a macro for the preprocessor\\
\texttt{-I\emph{directory}} & add a directory to search for \texttt{\#include} files
\end{tabular}
\end{itemize}
\end{slide}
\begin{itemize}
\item \texttt{-l}/\texttt{-L} are actually options for the linker, i.e. the
compiler will pass them on to the linker.
\item Both the compiler and linker have an extensive list of additional options
that influence the generated code and what warnings are printed during the
compilation/linking based on the chosen language and the standard. See manual
pages for \texttt{cc}, \texttt{gcc}, and/or \texttt{ld}.
\end{itemize}
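\begin{itemize}
\item A common use of the \texttt{-D} option is to define \texttt{NDEBUG},
which turns the standard \texttt{assert} macro into a no-op. The sketch below
illustrates that; the helper \texttt{assertions\_enabled()} is a name made up
for this example.
\begin{verbatim}
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: is assert() active in this build? */
static int
assertions_enabled(void)
{
#ifdef NDEBUG
    return (0);
#else
    return (1);
#endif
}

int
main(void)
{
    int answer = 6 * 7;

    /* With cc -DNDEBUG, the assert() below expands to a no-op. */
    assert(answer == 42);
    printf("answer %d, assertions %s\n", answer,
        assertions_enabled() ? "enabled" : "disabled");
    return (0);
}
\end{verbatim}
Building the file once with plain \texttt{cc} and once with
\texttt{cc -DNDEBUG} should change the second part of the printed line.
\end{itemize}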
%%%%%
\pdfbookmark[1]{standard macros}{stdmacros}
\begin{slide}
\sltitle{UNIX standard macros}
\begin{tabbing}
\hskip 13em \= \kill
\verb#__FILE__#, \verb#__LINE__#,\\\verb#__DATE__#, \verb#__TIME__#,\\
\verb#__cplusplus#, etc.
\> are standard macros for the compiler \\\>C/C++\\
\verb#unix# \> always defined if on Unix\\
ifdef([[[NOSPELLCHECK]]], [[[
\verb#mips#, \verb#i386#, \verb#sparc# ]]]) \> hardware architecture\\
\verb#linux#, \verb#__APPLE__#, \verb#sun#, \verb#bsd# \> operating system\\
\verb#_POSIX_SOURCE#,\\\verb#_XOPEN_SOURCE#
\> build using the specific standard\\
\end{tabbing}
\end{slide}
\begin{slide}
\sltitle{UNIX standard macros (cont.)}
To build using a specific standard, you need to define one of the macros below
before any ifdef([[[NOSPELLCHECK]]], [[[\verb.#include.]]]).
Then include \texttt{unistd.h}.
\vspace{2ex}
\begin{tabular}{l@{\hspace{3em}}l}
\emsl{UNIX 98} &\verb.#define _XOPEN_SOURCE 500.\\
\emsl{SUSv3} &\verb.#define _XOPEN_SOURCE 600.\\
\emsl{SUSv4} &\verb.#define _XOPEN_SOURCE 700.\\
\emsl{POSIX1990} &\verb.#define _POSIX_SOURCE.
\end{tabular}
\end{slide}
\begin{itemize}
\item The way it works is that you use specific macros to define what you
want (e.g. \texttt{\_POSIX\_SOURCE}), and then you use other macros (e.g.
\texttt{\_POSIX\_VERSION}) to find out what you actually got. You always have
to include \texttt{unistd.h} after you set the macros and use a compiler that
supports what you want. For example, below we tried to compile
\example{basic-utils/standards.c} which requires SUSv3, on a system supporting
SUSv3 (Solaris 10), using a compiler that only supports SUSv2 (the compiler
defined in SUSv3 is \texttt{c99}). Note that the default behavior of your
compiler might be the same as that of \texttt{c89}.
\begin{verbatim}
$ cat standards.c
#define _XOPEN_SOURCE 600
/* you must #include at least one header !!! */
#include <stdio.h>
int main(void)
{
return (0);
}
$ c89 basic-utils/standards.c
"/usr/include/sys/feature_tests.h", line 336: #error: "Compiler or
options invalid; UNIX 03 and POSIX.1-2001 applications require
the use of c99"
cc: acomp failed for standards.c
\end{verbatim}
%\item the source of macros for standard can be mentioned already on page
%\pageref{UNIXSTANDARDS}. Mentioned header file \texttt{feature\_tests.h}
%on Solaris
\item See the documentation for your compiler about what other macros can be
used.
\item See page \pageref{C_LANGUAGE} for more information on standards.
\item Regarding macros for specific standards, you can find very good
information in chapter 1.5 in [Rochkind]. See also
\example{basic-utils/suvreq.c}.
\begin{verbatim}
#include <stdio.h>

int
main(void)
{
#ifdef unix
    /* taken when the compiler predefines the unix macro */
    printf("Yeah!\n");
#else
    printf("Oh, no.\n");
#endif
    return (0);
}
\end{verbatim}
\item For an example of using \texttt{\_\_LINE\_\_}, see
\example{basic-utils/main\_\_LINE\_\_.c}.
\end{itemize}
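\begin{itemize}
\item The request-and-check pattern described above might look like the
following sketch. It assumes a system and a compiler that support SUSv4; the
values printed depend on your system, and either macro may be missing on a
system that does not implement the respective standard.
\begin{verbatim}
/* Request SUSv4; the definition must precede every #include. */
#define _XOPEN_SOURCE 700

#include <unistd.h>
#include <stdio.h>

int
main(void)
{
#ifdef _POSIX_VERSION
    /* For example, 200809 stands for POSIX.1-2008. */
    printf("_POSIX_VERSION = %ld\n", (long)_POSIX_VERSION);
#else
    printf("_POSIX_VERSION is not defined\n");
#endif
#ifdef _XOPEN_VERSION
    printf("_XOPEN_VERSION = %ld\n", (long)_XOPEN_VERSION);
#endif
    /* Standard compiler macros are handy in diagnostics. */
    printf("printed from %s, line %d\n", __FILE__, __LINE__);
    return (0);
}
\end{verbatim}
\end{itemize}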
%%%%%
\pdfbookmark[1]{link editor}{linker}
\begin{slide}
\sltitle{Link editor (linker)}
\begin{itemize}
\item Invocation:\\
\texttt{ld [\emph{options}] \emph{file} \dots}\\
\texttt{cc [\emph{options}] \emph{file} \dots}
\item Often used options:\\
\begin{tabular}{ll}
\texttt{-o \emph{file}} & output file name (default \texttt{a.out})\\
\texttt{-l\emph{lib}} & link with library \texttt{lib\emph{lib}.so} or
\texttt{lib\emph{lib}.a}\\
\texttt{-L\emph{path}} & path to libraries (\texttt{-l\emph{lib}})\\
\texttt{-shared} & create a dynamic library\\
\texttt{-non\_shared} & create a static executable
\end{tabular}
\end{itemize}
\end{slide}
\begin{itemize}
\item A linker takes one or more objects generated by a compiler and creates a
binary executable, library, or another object file suitable for another linking
phase.
\item Note that different systems support different options. For example,
\texttt{ld} on Solaris does not support \texttt{-shared} and
\texttt{-non\_shared}, and you have to use alternatives.
\item In most cases the linker is not used directly. Some linker-specific
options (like \texttt{-l}) are specified as compiler options. The compiler then
passes these to the linker.