-
Notifications
You must be signed in to change notification settings - Fork 0
/
processes.html
597 lines (506 loc) · 20.9 KB
/
processes.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
---
layout: default
title: Processes
last-updated: September, 2022
---
<div id="nav_bar_2" class="nav">
<ul>
<li><a href="#introduction"> Introduction </a></li>
<li><a href="#actors"> Actors </a></li>
<li><a href="#process-creation"> Process Creation and System Startup</a></li>
<li><a href="#exec">Executing a Different Program</a></li>
<li><a href="#file-descriptors"> File Descriptors </a></li>
<li><a href="#termination">Process Termination</a></li>
<li><a href="#attributes">Process Attributes</a></li>
<li><a href="#what-is">Ya But What Is It?</a></li>
<li><a href="#dr-fraser"> Dr. Brian Fraser </a></li>
<li><a href="#references"> References </a></li>
</ul>
</div>
<div id="introduction" class="content">
<h1>Processes</h1>
<div class="quote-text">
"Understanding is the key to success with Linux."
</div>
<div class="quote-ref">
<a href="https://www.tldp.org/LDP/sag/html/intro.html">Linux System Administrator's Guide</a>
</div>
<p>
A <em>process</em> is an instance of an executing
program.
This essay will chisel out the fundamentals of
processes, as they exist on Unix-like platforms.
</p>
</div>
<div class="content" id="actors">
<h2> Actors </h2>
<div class="aside-right">
<h4>Note:</h4>
<p>
There is a close relationship between a <em>program</em> and a
<em>process</em>— Namely, a program is a <u>file</u>
containing the instructions that the process <u>carries out</u>.
</p>
</div>
<p>
In Unix-like operating systems, processes are the actors of the system:
Left to its own devices, the kernel does nothing.
The system depends upon processes to direct
the kernel, and through the kernel, processes direct the machine.
</p>
<p>
The kernel is responsible for:
</p>
<ol>
<li>
Creating new processes
</li>
<li>
Allocating hardware resources to processes
</li>
<li>
Maintaining information about each
</li>
</ol>
<p>
Per the Unix philosophy, each process has a narrowly defined role,
which it is expected to do well. It might copy a file,
display a window, provide audio services, or display the time.
Many small programs are used together to accomplish
a given task, thereby easing the maintenance and compounding
the utility of each.
</p>
<p>
A given user may have more than one process acting on
their behalf at a time, and there are usually hundreds
of processes on the system at a time.
Much like with files, the kernel maintains a large volume of
information about each process on the machine.
</p>
<h3 id="priveledges"> Interacting with Files: Process Permissions</h3>
<p>
Every process is associated with some user, and we say that
it is executing with that user's <em>permissions.</em>
A process can freely manipulate files owned by
its associated user, and
files created by a process inherit its associated user.
</p>
<p>
Processes acting as user
<em><a href="http://www.linfo.org/root.html">root</a></em>
are special:
They can change any file on the system, without qualification.
That is, they by-default transcend the file
ownership scheme; we often use
<em>superuser</em> to describe this user.
</p>
<p>
In practice, root is the administrative user.
A typical use of superuser privileges is to
mount a physical device to prepare it for system-wide
use.
Superuser privileges are typically acquired
through <code>sudo</code>.
</p>
<h3> See Also: </h3>
<ul>
<li><a href="https://www.sudo.ws/man/1.8.3/sudo.man.html"><code>sudo(8)</code></a> - Execute a command as another user</li>
<li><a href="https://man7.org/linux/man-pages/man7/credentials.7.html"><code>credentials(7)</code></a> - Process identifiers</li>
</ul>
</div>
<div class="content" id="process-creation">
<h2 style="margin-bottom: 0;"> Process Creation</h2>
<h4>and System Startup</h4>
<p>
At the end of its boot sequence, the kernel launches a single program,
<code>/sbin/init</code>,<sup><a href="https://en.wikipedia.org/wiki/Init">[1]</a></sup>
and in so doing creates the system's first process.
While <code>init</code> schemes
<a href="https://arxiv.org/abs/0706.2748">have evolved</a>
over the years, its purpose has not changed:
It is a process with superuser permissions that acts as a
manager of the system and trustee among processes.
</p>
<div class="aside-left">
<h4>Aside:</h4>
<p>
You can check your <em>init</em> program with
</p>
<div class="code">
$ ps --pid 1
</div>
<p>
<a href="https://www.freedesktop.org/wiki/Software/systemd/">systemd</a> is very common today.
</p>
</div>
<p>
<code>init</code>'s first responsibility is to bring up userspace.
It does this by launching a sequence of programs, most of which
act in the background, some of which the user interacts with directly.
After this, it manages available services
and logs events for administrative purposes.
Its final responsibility is to shut the system down cleanly.
</p>
<p>
After the execution of <code>/sbin/init</code>, process creation
is dictated by processes themselves, and is carried
out through the <em>fork()</em> system call.
</p>
<p>
<em>fork()</em> instructs the kernel to duplicate
the current process,
and the two resulting processes are very nearly identical.
Of note, they both continue to execute the same program at the
same spot, have the same open files, and have duplicate memory spaces.
The newly created process is referred to as the <em>child</em> process,
the original is called the <em>parent,</em> and we say
that the child process <em>inherits</em> its parent's attributes.
</p>
<p>
Parent and child do differ in a few ways.
First, upon birth, each process is given its own (unique)
identification number by the kernel.
Second, they each receive a different return value
from the <em>fork()</em> system call.
This allows them to act differently within the same
program:
</p>
<pre>
pid_t rv = fork(); // One process executes this line
if(rv == 0) { // Two processes execute this line
// I am child
} else if (rv > 0) {
// I am parent
}
</pre>
<p>
Their executions thereby diverge, and each is free to act
independently.
</p>
<h3>See Also:</h3>
<ul>
<li><code><a href="https://www.man7.org/linux/man-pages/man2/fork.2.html">fork(2)</code></a> - Create a child process</li>
</ul>
</div>
<div class="content" id="exec">
<h2>Executing a Different Program</h2>
<p>
The <em>exec()</em> family of functions replace the program
currently being executed.
A simple invocation is:
</p>
<pre>
char *args[] = {"ls", "-l", NULL};
execv("/usr/bin/ls", args);
// Execution of *this* program stops here,
// and the process continues executing at /usr/bin/ls
</pre>
<p>
<em>exec()</em> simply loads a new program into
the calling process's memory.
During this operation, the old program and its
memory (its "state") are discarded.
</p>
<p>
<em>exec()</em> usually follows <em>fork()</em>:
</p>
<pre>
pid_t rv = fork();
if( rv == 0 ) {
// I am child
char *args[] = {"ls", "-l", NULL};
execv("/usr/bin/ls", args);
} else if ( rv > 0 ) {
// Execution of parent process continues within this program
}
</pre>
<p>
This sequence results in one process launching another program,
and is traditionally referred to as "fork and exec."
Note that there is no need to adjust other
process attributes, and they are preserved across
<em>exec()</em>.
In particular, the PID of the calling process is left unchanged.
</p>
<h3>See Also:</h3>
<ul>
<li><a href="https://man7.org/linux/man-pages/man1/pstree.1.html"><code>pstree(1)</code></a> - Display a tree of processes</li>
</ul>
</div>
<div id="file-descriptors" class="content">
<h2> File Descriptors</h2>
<p>
Within a process, each open file is associated
with a small nonegative integer.
This number is called a <em>file descriptor;</em>
it is returned by the kernel in response to a request to open
a file.
The process uses this number to refer to the
file whenever it wishes to act upon it.
</p>
<p>
In the following line, a process is instructing the kernel to
open the file <em>rubber_ducky</em> for reading.
The process is storing the kernel's response in the
(integer) variable <em>fd:</em>
</p>
<pre>
int fd = open("rubber_ducky", O_RDONLY);
</pre>
<p>
If this call failed (which it would if <em>./rubber_ducky</em> did not
exist, for example), then <em>fd</em> would be assigned <em>-1,</em>
an invalid file descriptor.
We can therefore check with:
</p>
<pre>
if (fd < 0) {
perror("open"); // Print error message
exit(1); // Terminate process
}
</pre>
<p>
and then continue as though <em>open()</em> had succeeded.
</p>
<p>
In the following line, the kernel is directed to read at-most
<em>buff_size</em> bytes from <em>fd</em>,
and place the data into the memory denoted by <em>buff:</em>
</p>
<pre>
int num_bytes_read = read(fd, buff, buff_size);
</pre>
<p>
If all goes well, the system call will return with a nonnegative value,
the number of bytes read;
if the kernel encountered an error, it will return with a
negative value.
In either case, this value will be assigned to <em>num_bytes_read</em>,
which may be checked similarly to <em>open().</em>
</p>
<p>
<em>write()</em> is handled analogously, as is closing a file.
Every system call has an entry in Section 2 of the man pages.
</p>
<h3>See Also:</h3>
<ul>
<li><code><a href="https://man7.org/linux/man-pages/man2/read.2.html">read(2)</a></code> - Read from a file descriptor</li>
<li><a href="./assets/checks.c">checks.c</a> - Compileable, runnable checks of the code snippets in this chapter</li>
</ul>
</div>
<div class="content" id="termination">
<h2> Process Termination </h2>
<p>
A process may terminate for a few reasons:
</p>
<ul class="big-list">
<li>
Explicitly terminates its own execution
<p>
This is the usual (and preferred) route of process
termination:
The program finishes its job or identifies an error,
and terminates itself.
This may be done by either calling <em>exit()</em>, or
returning from <em>main()</em>.
</p>
</li>
<li>
It is told to terminate
<p>
A process may be told to terminate by another process through
a <em>signal,</em> a primitive form of interprocess communication.
This is abnormal process termination.
</p>
<p>
Signals are brutal business.
For the purposes of process termination,
<code>SIGINT</code> is preferred because it
affords the program the oportunity to exit
cleanly.<sup><a href="#references" title="Kerrisk, p389">[2]</a></sup>
At the command line, it is created and sent
(to the foreground process) with CTRL+C.
</p>
<p>
Command-line tools for sending signals to arbitrary
processes are <code><a href="https://www.man7.org/linux/man-pages/man1/kill.1.html">kill</a></code>,
which refers to processes by PID, and
<code><a href="https://www.man7.org/linux/man-pages/man1/killall.1.html">killall</a></code>,
which refers to processes by program name.
</p>
</li>
<li>
It does something wrong
<p>
Almost always, this happens when the process tries
to read or write a memory location not assigned to it,
a condition known in Linux as <em>segmentation fault</em>.
The kernel steps in, halts program execution and immediately
terminates the process.
Again, this is abnormal program termination.
</p>
</li>
</ul>
<h3>See Also:</h3>
<ul>
<li><a href="https://www.man7.org/linux/man-pages/man1/top.1.html"><code>top(1)</code></a> - Display Linux processes</li>
</ul>
</div>
<div class="content" id="attributes">
<h2>Process Attributes</h2>
<p>
The kernel stores a number of attributes
for each process it is hosting; we will not exhaust them here.
The process's PID is one such piece of information; it cannot
be changed.
</p>
<p>
Each process has a current working directory.
Relative filenames are resolved with respect to
this path, and a process may change it via the
<em>chdir()</em> system call.
</p>
<p>
Environment variables are <em>name=value</em> strings
that are associated with each process.
The kernel does not interpret these variables directly,
but instead simply allows processes to set them, and
retrieve them later.
They are preserved across both <em>fork()</em>
and <em>exec().</em>
</p>
<p>
Environment variables allow a single process to communicate with
all of its descendants.
A good example of their use is <code>LANG</code>,
which stipulates the current locale, including preferred language.
We may view the shell's environment variables with
the <code>printenv</code> command:
</p>
<div class="code">
$ printenv<br>
SHELL=/bin/bash<br>
COLORTERM=truecolor<br>
XDG_CONFIG_DIRS=/etc/xdg<br>
DESKTOP_SESSION=xfce<br>
EDITOR=nvim<br>
LANG=en_US.UTF-8<br>
[...]
</div>
<h3>See Also:</h3>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Environment_variable#True_environment_variables">Environment Variables</a> - Wikipedia</li>
</ul>
</div>
<div class="content" id="what-is">
<h2>Ya But What Is It?</h2>
<p>
What is a process?
</p>
<p>
The short answer is, a section of memory distinct from the
rest of the machine, and some restricted access to the CPU, both
granted by the kernel.
The long answer is the primary abstraction presented to programmers,
and the environment within which all programs execute.
The process is one of the most important ideas
of computer science, and we can only scratch the surface of
what it is here.
</p>
<h3> Memory </h3>
<p>
Modern operating systems
provide each process with its own memory in the
form of a distinct <em>address space.</em>
On all modern computers, memory is byte-addressable, so that each
address refers to exactly one byte.
Addresses themselves are simply numbers, and
an address space is a collection of memory addresses.
</p>
<div class="img-right" style="background: #f1f1f1;">
<a style="outline: solid black 2px; " title="Traced by User:Stannered, original by en:User:Dysprosia / BSD (http://opensource.org/licenses/bsd-license.php)" href="https://commons.wikimedia.org/wiki/File:Virtual_address_space_and_physical_address_space_relationship.svg"><img width="256" alt="Virtual address space and physical address space relationship" src="https://upload.wikimedia.org/wikipedia/commons/thumb/3/32/Virtual_address_space_and_physical_address_space_relationship.svg/512px-Virtual_address_space_and_physical_address_space_relationship.svg.png"></a>
</div>
<p>
For each machine, there exists a unique physical address space.
An address in this space refers to an actual byte of memory
(RAM);
only the kernel has access to this address space.
The kernel, working in conjunction with hardware,
imposes a layer of indirection between each process's
memory references and the physical address space.
This layer is known as <em>virtual memory,</em>
and consists of a collection of virtual addresses
mapped to physical ones.
</p>
<p>
Virtual memory requires support from both the kernel and CPU:
The kernel to assign address translations
(mappings from virtual addresses to physical addresses),
and the CPU to carry them out.
Each time a process references an address, the
CPU translates the virtual address to a physical address, so that for all
intents and purposes, virtual addresses behave exactly
as physical ones.
</p>
<p>
Since each process operates within its own virtual address space,
each process is presented with the illusion
that it has the entire physical address space to itself.
By excluding physical addresses from translation,
the rest of the machine is implicitly protected:
Other sections of memory simply do not exist within the context of the
currently running process, and therefore cannot be read from,
nor written to.
</p>
<h3> CPU </h3>
<p>
Modern CPU's support two distinct modes of operation.
<em>Kernel mode,</em> sometimes called "supervisor mode," is
the unrestricted mode.
Here, all CPU instructions are available, and address translation
is not performed, so that code running in this mode has
unmediated access to the machine.
</p>
<p>
<em>User mode</em> is a proper subset of kernel mode:
A program running
in user mode does not have access to all instructions, and
its memory references are redirected through the virtual
memory scheme mentioned above.
Executing programs in User Mode
is known as <em>limited direct execution.</em><sup><a href="http://pages.cs.wisc.edu/~remzi/OSTEP/cpu-mechanisms.pdf">[3]</a></sup>
It is <em>limited</em> in the sense that each process has access to
only a restricted subset of the CPU's instructions, and
it is <em>direct</em> in the sense that the program runs
directly on hardware.
</p>
<p>
Virtual memory and limited direct execution are intended to
make safe and efficient use of available hardware.
Together, they form a sandboxed environment within which a program
may safely be executed, separate from the rest of the machine.
</p>
<h3>See Also:</h3>
<ul>
<li><a href="https://pages.cs.wisc.edu/~remzi/OSTEP/">Operating Systems: Three Easy Pieces</a> by Remzi H. Arpaci-Dusseau and Andrea C. Arpaci-Dusseau </li>
</ul>
</div>
<div class="content" id="dr-fraser">
<h2 style="margin-bottom: 2pt;"> Dr. Brian Fraser: </h2>
<h4 style="margin-bottom: 18pt;"> Fork and Exec Linux Programming </h4>
<iframe class="video" src="https://www.youtube-nocookie.com/embed/l64ySYHmMmY" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</div>
<div id="references" class="content">
<h2> References </h2>
<ol>
<li>
init - Wikipedia (3 March 2021). Retrieved March 30, 2021, from <a href="https://en.wikipedia.org/wiki/Init">https://en.wikipedia.org/wiki/Init</a>
</li>
<li>
Kerrisk, Michael. <em><a href="https://man7.org/tlpi/index.html">The Linux Programming Interface</a></em>. San Francisco, CA, No Starch Press, 2010.
</li>
<li>
Arpaci-Dusseau, R. H., & Arpaci-Dusseau, A. C. (2018). <em><a href="https://pages.cs.wisc.edu/~remzi/OSTEP/">Operating systems: Three easy pieces</a>.</em> Arpaci-Dusseau Books.
</li>
</ol>
</div>