A comprehensive reference covering Linux commands, how the terminal works under the hood, regular expressions, and bash scripting. Everything from the basics to the science behind how it all works.
- How the Terminal Works
- The Shell — What It Is and Why It Exists
- Bash — A Specific Shell
- How Commands Are Executed
- The /bin Directories
- Environment Variables and PATH
- stdin, stdout, stderr
- Essential Linux Commands
- File Permissions and chmod
- Redirection and Pipes
- Text Processing Commands
- Regular Expressions
- How the Regex State Machine Works
- Bash Scripting
- Bash Variables and Parameter Expansion
- Arithmetic Expansion
- String Manipulation
- Arrays
- Control Flow
- Loops
- Functions
- Common Syntax Errors and Pitfalls
- The Three Types of Conditionals
- Loop Types Deep Dive
- Command Substitution vs Function Calls
- Number System Conversions
- Special Operators Reference
- Source vs Execute
When you type a command in the terminal, a program called the shell is always running and listening. When you press Enter, it goes through these steps:
You type a command
↓
Shell reads and tokenizes it (splits into pieces)
↓
Shell interprets special characters (|, >, $, *)
↓
Shell finds the program in the bin directories
↓
Shell forks (creates a copy of itself)
↓
The copy replaces itself with the target program (exec)
↓
Original shell waits for it to finish
↓
Output is returned to the screen
| Type | How it runs | Example |
|---|---|---|
| Compiled | Translated into machine code before running, CPU runs binary directly | grep, ls, cat (C programs) |
| Interpreted | Read and executed line by line at runtime by an interpreter | Bash scripts .sh |
Key insight: Commands like `grep`, `ls`, and `cat` are pre-compiled C programs stored on your disk. What you type in the terminal are just arguments passed to those programs — they are data, not code that needs compiling.
Every compiled C program has a main function that receives what you type:
int main(int argc, char *argv[], char *envp[])

- `argc` — number of arguments
- `argv` — array of what you typed
- `envp` — array of environment variables
So when you type grep -i "John" myfile.txt:
argv[0] = "grep" // program name
argv[1] = "-i" // first argument
argv[2] = "John" // second argument
argv[3] = "myfile.txt" // third argument

Every command you run in bash is a separate process created through two system calls:

- `fork()` — creates an exact copy of the current bash process in memory
- `exec()` — the copy replaces itself with the target program
bash process
↓
fork()
↓
bash process + copy of bash
↓
exec()
↓
grep process runs
↓
finishes, bash resumes
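You can watch fork and exec from the shell itself. A minimal sketch (`$$` expands to the current shell's PID; `bash -c` is just a convenient way to print the child's PID):

```shell
# $$ expands to the PID of the current shell process.
echo "parent shell PID: $$"

# Running a command forks a child process; its PID is different.
bash -c 'echo "child shell PID: $$"'

# 'exec' skips the fork step: the current shell REPLACES itself
# with the target program and never comes back.
# (Try it in a throwaway terminal - it closes when ls finishes.)
# exec ls
```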
The shell is any program that sits between you and the operating system kernel. It is called a shell because it wraps around the OS like a shell, letting you communicate with the kernel without talking to it directly.
You → Shell → Kernel → Hardware
The kernel is the actual core of the OS that controls hardware, memory, and processes. You cannot talk to it directly. The shell is the layer that translates your human-readable commands into kernel operations through system calls.
| Shell | Full Name | Notes |
|---|---|---|
| `sh` | Bourne Shell | The original, very basic |
| `bash` | Bourne Again Shell | Most common in Linux |
| `zsh` | Z Shell | Default on macOS, more features |
| `fish` | Friendly Interactive Shell | Beginner friendly |
| `ksh` | Korn Shell | Enterprise use |
| `CMD` | Command Prompt | Windows shell |
| `PowerShell` | PowerShell | Advanced Windows shell |
Analogy: Shell is the general concept (like SQL), bash is a specific flavor of it (like MySQL). CMD and PowerShell are to Windows what bash and zsh are to Linux.
Important Note: Bash is just ONE type of shell program. The generic term is "shell" - bash is a specific implementation, like how "car" is generic and "Toyota" is specific.
Bash stands for Bourne Again Shell — a joke name because it replaced the original Bourne Shell written by Stephen Bourne. So bash is literally the "born again" shell.
Bash itself is a compiled C program stored at /bin/bash. When you run a bash script, bash reads your script file line by line and for each line it:
- Reads the line as text
- Parses it to understand the structure
- Finds the appropriate compiled binary
- Executes it via fork and exec
- Moves to the next line
#!/bin/bash
echo "Hello" # bash finds /bin/echo, runs it
ls -l # bash finds /usr/bin/ls, runs it
grep "x" file # bash finds /usr/bin/grep, runs it

The first line of every bash script is the shebang (also called hashbang).

#!/bin/bash

- `#` is called hash (or sharp in music)
- `!` is called bang (old typographer slang)
- Together: sharp + bang = shebang
What it does: When the OS sees #! at the very first line, it reads the path after it and uses that program to interpret the file.
You run: ./myscript.sh
↓
OS reads first line: #!/bin/bash
↓
OS loads /bin/bash and passes the script to it
↓
bash ignores first line (# makes it a comment)
↓
bash starts interpreting from line 2
The shebang is not bash-specific. Any interpreter can be used:
#!/bin/bash # run with bash
#!/usr/bin/python3 # run with python
#!/usr/bin/node # run with nodejs

The compiled programs (grep, ls, etc.) are built once and sit on your disk. What you type in the terminal is just data being passed into those programs — arguments, strings, filenames. Data does not need to be compiled.
compiled grep = "I know how to search ANY text for ANY pattern"
(compiled once, sits on disk)
"John" myfile.txt = the specific pattern and file right now
(just data, no compilation needed)
Analogy: A meat grinder is the compiled program, built once. Whatever meat you put in is the data. The grinder doesn't need to be rebuilt for different types of meat.
| | Compiled (e.g. grep) | Interpreted (e.g. bash script) |
|---|---|---|
| Speed | Fast, CPU runs directly | Slower, interpreter overhead |
| Needs interpreter? | No | Yes |
| After change | Must recompile | Run immediately |
| Stored as | Binary machine code | Plain text |
Analogy: Compiled is like translating an entire book into English first, then reading it. Interpreted is like having a translator sit next to you, reading one sentence at a time.
bin stands for binaries — pre-compiled C programs that the CPU can directly execute.
| Directory | Contents |
|---|---|
| `/bin` | Essential system binaries (ls, cat, grep, cp, rm) |
| `/usr/bin` | Regular user program binaries |
| `/usr/local/bin` | Manually installed binaries |
| `/sbin` | System administration binaries (root only) |
You can find where any command lives:
which grep
# /usr/bin/grep
which ls
# /usr/bin/ls

When any process starts, the OS gives it a block of memory called the environment — a list of key=value pairs:
HOME=/home/seif096
USER=seif096
PATH=/usr/bin:/bin
In C, these are accessible as the third argument to main:
int main(int argc, char *argv[], char *envp[])
// envp[] contains all environment variables

| Type | Visible to |
|---|---|
| `NAME="x"` | Current bash process only |
| `export NAME="x"` | Current process + all child processes |
| Defined in `~/.bashrc` | All terminals for your user |
| Defined in `/etc/environment` | Every process on the system |
Variables flow downward from parent to child only. If a child changes a variable, the parent never sees it — each process has its own copy.
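A quick way to see this one-way flow, sketched with a throwaway child shell:

```shell
LOCAL_VAR="only in this shell"         # not exported
export SHARED_VAR="passed to children" # exported

# The child shell only receives the exported variable:
bash -c 'echo "child sees LOCAL_VAR=[$LOCAL_VAR] SHARED_VAR=[$SHARED_VAR]"'
# child sees LOCAL_VAR=[] SHARED_VAR=[passed to children]

# And changes made by a child never reach the parent:
bash -c 'SHARED_VAR="changed in child"'
echo "parent still sees: $SHARED_VAR"
# parent still sees: passed to children
```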
$PATH is bash's dictionary of where to look for commands:
echo $PATH
# /usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin

When you type grep, bash searches each directory in PATH in order until it finds the binary:
type "grep"
↓
search /usr/local/bin → not found
↓
search /usr/bin → found! /usr/bin/grep
↓
run it
You can add your own programs to this dictionary:
PATH=$PATH:/home/seif096/myprograms

| File | When it runs |
|---|---|
| `/etc/environment` | Every user, system-wide |
| `/etc/profile` | Every user at login |
| `~/.bashrc` | Your user, every terminal |
| `~/.bash_profile` | Your user, at login |
Every process in Linux gets three standard channels automatically:
| Channel | Name | Meaning |
|---|---|---|
| `stdin` | Standard Input | Data flowing into the program |
| `stdout` | Standard Output | Data coming out of the program |
| `stderr` | Standard Error | Error messages coming out |
The pipe | connects stdout of one program to stdin of the next:
cat myfile.txt | grep "John"
↓ ↑
stdout connects stdin
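One subtlety worth seeing in action: a plain pipe carries only stdout, so stderr still lands on your terminal unless you redirect it. A small sketch using a throwaway helper function that writes one line to each channel:

```shell
# A helper that writes to both channels:
msg() {
    echo "normal output"        # goes to stdout
    echo "error output" >&2     # goes to stderr
}

msg > /dev/null    # hides stdout; "error output" still appears on the terminal
msg 2> /dev/null   # hides stderr; "normal output" still appears
msg 2>&1 | grep output   # 2>&1 merges stderr into stdout so the pipe sees both lines
```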
pwd
# /home/seif096/documents

Displays the full path of the directory you are currently in. Think of it as asking your GPS "where am I right now?"
ls stands for list.
ls # list files in current directory
ls /home # list files in specific path
ls -l # long format (detailed view)
ls -a # show hidden files (starting with .)
ls -la # combine options

-rwxr-xr-x 2 john staff 4096 Jan 10 12:00 myfile.txt
| Part | Meaning |
|---|---|
| `-rwxr-xr-x` | File type + permissions |
| `2` | Number of hard links |
| `john` | Owner (user) |
| `staff` | Group |
| `4096` | File size in bytes |
| `Jan 10 12:00` | Last modified date/time |
| `myfile.txt` | File name |
| Character | Meaning |
|---|---|
| `-` | Regular file |
| `d` | Directory |
| `l` | Symbolic link |
-rw-rw-r-- 1 seif096 seif096 0 Feb 13 18:28 test.txt
- `-` — regular file
- `rw-` — owner can read and write
- `rw-` — group can read and write
- `r--` — others can only read
- `1` — one hard link
- `seif096`/`seif096` — owner and group are both seif096 (Linux automatically creates a group with the same name as the user)
- `0` — file is empty (0 bytes)
- `Feb 13 18:28` — last modified
cat stands for concatenate. Original purpose was joining files, but most commonly used to display file contents.
cat file.txt # display file contents
cat file1.txt file2.txt # display both files
cat file1.txt file2.txt > combined # join and save
cat -n myfile.txt # display with line numbers

Important Note: cat merges files vertically (stacks them one after another), while paste merges horizontally (side-by-side columns).
cp source.txt destination/ # copy file to directory
cp file1.txt file2.txt destination/ # copy multiple files
cp -r source_dir/ dest_dir/ # copy directory recursively
cp -p file.txt destination/ # preserve attributes (permissions, timestamps)
cp -v file.txt destination/ # verbose (show what's being copied)

CRITICAL PATH RULE: When you're already inside a directory, use relative paths:
cd ~/Desktop/
cp words.txt numbers.txt Lab1/ # ✅ CORRECT - relative path
# NOT this:
cp ~/Desktop/words.txt ~/Lab1/ # ❌ WRONG - unnecessary full path

Why? You're already in `~/Desktop/`, so `words.txt` is in the current directory. Using full paths when you're already there is redundant and error-prone.
Important: cp creates duplicates - both the original and the copy exist independently on disk, each taking up space.
mv oldname.txt newname.txt # rename file
mv file.txt /destination/ # move to different location
mv file.txt /dest/newname.txt # move and rename

Key Difference from `cp`: `mv` does NOT create a duplicate. The original file is removed from the source location. Only one copy exists on disk.
rm file.txt # delete a file
rm -r myfolder/ # delete directory and everything inside

-r means recursive — goes through all contents inside a directory, and all contents inside those, all the way down:
myfolder/
├── file1.txt ← deleted
├── file2.txt ← deleted
└── subfolder/
├── file3.txt ← deleted
└── deeper/
└── file4.txt ← deleted
Think of it like a tree: recursive means "go deep into every level and repeat the same action."
Every file has permissions for three groups:
| Group | Symbol in chmod | Who |
|---|---|---|
| Owner/User | `u` | The person who owns the file |
| Group | `g` | Users in the file's associated group |
| Others | `o` | Everyone else on the system |
| All | `a` | All three at once |
| Permission | Symbol | Value | Meaning for file | Meaning for directory |
|---|---|---|---|---|
| Read | `r` | 4 | View contents | List contents |
| Write | `w` | 2 | Modify file | Create/delete files inside |
| Execute | `x` | 1 | Run as program | Enter with cd |
| None | `-` | 0 | No permission | No permission |
A group is a collection of users. For example, all developers in a company might be in a group called developers. This makes it easy to give all of them the same permissions without setting permissions individually.
Linux asks three questions: Are you the owner? Are you in the group? Or are you everyone else? Each gets their own permissions.
Symbolic method:
chmod u+x myfile.txt # add execute for owner
chmod g-w myfile.txt # remove write from group
chmod o+r myfile.txt # add read for others
chmod a+x myfile.txt # add execute for everyone
chmod ugo+x myfile.txt # same as a+x
chmod a-r myfile.txt # remove read from everyone
chmod a+r myfile.txt # restore read for everyone

Operators:

- `+` add permission
- `-` remove permission
- `=` set exact permission
Numeric method:
Add up the values for each group:
r = 4
w = 2
x = 1
- = 0
| Permission | Calculation | Value |
|---|---|---|
| `rwx` | 4+2+1 | 7 |
| `rw-` | 4+2+0 | 6 |
| `r-x` | 4+0+1 | 5 |
| `r--` | 4+0+0 | 4 |
chmod 764 myfile.txt
# owner: rwx (7)
# group: rw- (6)
# others: r-- (4)

The file -rw-rw-r-- has numeric value 664.
| Symbol | Affects |
|---|---|
| `o` | Others only (third group) |
| `a` | All three: user + group + others |
chmod o+w myfile.txt # only others get write
chmod a+w myfile.txt # everyone gets write

Problem: You remove read permission with chmod a-r, then try to read the file:
chmod a-r SortedMergedContent.txt
cat SortedMergedContent.txt
# Permission denied ❌

Why it fails: Without read permission, even YOU (the owner) cannot read the file.
Solution: Restore read permission:
chmod a+r SortedMergedContent.txt # restore read for all
# OR
chmod u+r SortedMergedContent.txt # restore read for owner only

Important: Permission restrictions apply to EVERYONE, including the file owner (unless you're root).
A shortcut that points to another file's location. If the original is deleted, the link breaks.
ln -s /original/file shortcut

Two different names pointing to the exact same data on disk. If one name is deleted, the data still exists under the other name.
ln /original/file hardlink

The binary data is fully duplicated on disk. Two completely independent files.
cp file1.txt file2.txt

| | Copy | Hard Link | Symbolic Link |
|---|---|---|---|
| Data duplicated? | Yes | No | No (just a pointer) |
| Disk space | Double | Same | Tiny (just a path) |
| Edit one, affects other? | No | Yes | Yes |
| Delete original, affects other? | No | No | Yes (breaks) |
Analogy: Symbolic link = sticky note saying "document is in warehouse shelf 3". Hard link = same document with two different names on the cover. Copy = completely new document with the same content.
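A sketch you can run in an empty directory to see the difference for yourself (the filenames are made up; `ls -i` prints inode numbers, which reveal that a hard link is literally the same file):

```shell
echo "some data" > original.txt
ln original.txt hardlink.txt      # second name for the same data
ln -s original.txt symlink.txt    # pointer to the *name* original.txt

ls -li original.txt hardlink.txt symlink.txt
# original.txt and hardlink.txt show the SAME inode number

rm original.txt
cat hardlink.txt   # still works - the data survives under its other name
cat symlink.txt    # fails - the name it pointed to is gone
```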
Sends command output to a file instead of the screen. Overwrites existing content.
cat file.txt > newfile.txt # save output to file
echo "hello" > greeting.txt # write text to file

Sends output to a file without overwriting — adds to the end.
echo "line 1" > file.txt
echo "line 2" >> file.txt # adds to end, doesn't overwrite

Analogy: `>` erases the paper and writes new content. `>>` writes at the bottom without erasing.
Feeds a file's contents as input to a command:
tr "a-z" "A-Z" < input.txt > output.txt
# Read from input.txt, convert to uppercase, write to output.txt

Note: This is more efficient than `cat input.txt | tr "a-z" "A-Z"` because it avoids creating an unnecessary cat process.
command 2>/dev/null # suppress error messages
command 2>errors.log # save errors to file

Takes the stdout of one command and sends it as stdin to the next command.
cat myfile.txt | grep "John"

You can chain multiple pipes like an assembly line:

cat myfile.txt | grep "John" | cut -d, -f2 | sort

Each command passes its output to the next one.
Analogy: Like water pipes in a house — output flows from one pipe into the next.
`>` vs `|`:

| Operator | Sends output to | Example |
|---|---|---|
| `>` | A file | `echo "hi" > file.txt` |
| `\|` | Another command | `cat file.txt \| grep "x"` |
Searches for a pattern and prints matching lines.
grep "John" myfile.txt # basic search
grep -i "john" myfile.txt # case insensitive
grep -n "John" myfile.txt # show line numbers
grep -v "John" myfile.txt # invert (show non-matching lines)
grep -c "John" myfile.txt # count matching lines
grep -E "John|Sarah" myfile.txt # extended regex (OR)

| Case | Example | Quotes needed? |
|---|---|---|
| Single word | `grep John file` | Optional |
| With spaces | `grep "John Smith" file` | Required |
| Special characters | `grep "hello$" file` | Required |
Single quotes '' treat everything literally. Double quotes "" allow variable/special character interpretation.
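The difference is easy to see with a throwaway variable (the file and variable names here are made up for the demo):

```shell
NAME="John"

echo "Hello $NAME"    # double quotes expand variables -> Hello John
echo 'Hello $NAME'    # single quotes are fully literal -> Hello $NAME

# The same rule applies to grep patterns:
printf 'John\n$NAME\n' > demo.txt
grep "$NAME" demo.txt   # searches for the text John
grep '$NAME' demo.txt   # searches for the literal text $NAME
```

(In the last line, `$` is not acting as the end-of-line anchor because it is not at the end of the pattern.)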
Extracts specific columns from each line of a file.
# file contains: John,25,Cairo
cut -d, -f1 data.txt # extract column 1: John
cut -d, -f2 data.txt # extract column 2: 25
cut -d, -f3 data.txt # extract column 3: Cairo

- `-d` — delimiter (what separates columns)
- `-f` — field number (which column to extract)

Note: Unlike Ctrl+X, the Linux `cut` command does not delete from the original file. It only reads and extracts. The original stays untouched.
Joins files horizontally (column by column), while cat joins vertically.
paste file1.txt file2.txt # join side by side (tab separated)
paste -d, file1.txt file2.txt # use comma as delimiterComparison:
cat file1 file2 |
paste file1 file2 |
|---|---|
| John | John 25 |
| Sarah | Sarah 30 |
| Mike | Mike 28 |
| 25 | |
| 30 | |
| 28 |
Origin of the name: From the physical act of cutting and pasting paper — placing two columns next to each other like gluing newspaper columns side by side.
Replaces or deletes specific characters one by one. Works at the character level, not word level.
echo "hello world" | tr 'a-z' 'A-Z' # lowercase to uppercase: HELLO WORLD
echo "hello world" | tr ' ' '_' # replace spaces: hello_world
echo "hello world" | tr -d 'l' # delete all l's: heo word

tr always reads from input, so use it with a pipe | or <:

tr 'a-z' 'A-Z' < myfile.txt

Common Pattern - Useless Use of cat:
# Less efficient (creates unnecessary process):
cat SortedMergedContent.txt | tr "a-z" "A-Z" > output.txt
# More efficient (direct input redirection):
tr "a-z" "A-Z" < SortedMergedContent.txt > output.txt

Think of `tr` like find-and-replace in Word, but for individual characters.
head -n 3 file.txt # first 3 lines
tail -n 3 file.txt # last 3 lines

sort file.txt # sort lines alphabetically
sort file.txt > sorted.txt # save sorted output
uniq file.txt # remove adjacent duplicate lines
sort file.txt | uniq # sort first, then remove duplicates

Important: `uniq` only removes adjacent duplicates. Always `sort` first to group duplicates together.
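A common extension of this pattern uses `uniq -c` (a standard flag that prefixes each line with its count) to build a frequency table; the fruits file here is made up for the demo:

```shell
printf 'apple\nbanana\napple\ncherry\napple\n' > fruits.txt

sort fruits.txt | uniq                 # each fruit listed once
sort fruits.txt | uniq -c              # each line prefixed with its count
sort fruits.txt | uniq -c | sort -rn   # most frequent first (-n numeric, -r reverse)
```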
Regular expressions (regex) are patterns used to match text. Each piece of a regex is like a slot that gets filled with one character at a time.
Connection to JavaScript: All these regex concepts work identically in JavaScript. The `/pattern/g` syntax in JS is the same idea — `g` means global (search the entire string), just like `grep` searches the entire file.
| Symbol | Name | Meaning | Example | Matches |
|---|---|---|---|---|
| `^` | Caret | Start of line | `^John` | Lines starting with John |
| `$` | Dollar | End of line | `John$` | Lines ending with John |
| `.` | Period | Any single character | `J.hn` | John, Jahn, Jxhn |
| `*` | Asterisk | Zero or more of previous | `Jo*hn` | Jhn, John, Joohn |
| `[ ]` | Brackets | Any one character in set | `[aeiou]` | Any vowel |
| `[^ ]` | Negated brackets | Any character NOT in set | `[^aeiou]` | Any non-vowel |
| `-` | Hyphen (in brackets) | Range of characters | `[a-z]` | Any lowercase letter |
| `\|` | Pipe (with -E) | OR | `John\|Sarah` | John or Sarah |
| `\{x\}` | Braces | Exactly x repetitions | `Jo\{3\}hn` | Jooohn only |
| `\{x,y\}` | Braces range | Between x and y times | `Jo\{2,4\}hn` | Joohn, Jooohn, Joooohn |
| `\{x,\}` | Braces min | At least x times | `Jo\{2,\}hn` | Joohn, Jooohn, ... |
[a-z] # any lowercase letter
[A-Z] # any uppercase letter
[0-9] # any digit
[a-zA-Z] # any letter
[\s] # whitespace
[^\s] # any non-whitespace character

grep "^John" file # lines starting with John
grep "John$" file # lines ending with John
grep "[Jj]ohn" file # John or john
grep "J.hn" file # J + any char + hn
grep "Jo*hn" file # J + zero or more o's + hn
grep -E "John|Sarah" # John OR Sarah
grep "^T[^\s]*s" file # starts with T, no spaces, contains s
grep -n "^w.*[0-9]$" MergedContent.txt # starts with w, ends with digit (with line numbers)

The backslash \ tells the program to treat the next character literally rather than as something special:
\{ # literal { instead of starting a repetition group
\. # literal . instead of "any character"
\* # literal * instead of "zero or more"

Basic grep has limited regex support. Use -E or egrep for full regex including |:
grep -E "John|Sarah" file
egrep "John|Sarah" file # same thing

When grep receives your pattern, it passes it to a regex engine that builds a state machine — a flowchart of decisions.
A state machine is a series of checkpoints. The engine reads your text one character at a time and moves through the checkpoints:
Pattern: ^T[^\s]*s
State 0 → State 1 → State 2 → State 3 → MATCH
^T [^\s] [^\s]* s
Pattern: ^T[^\s]*s against the word Trains:
T - r - a - i - n - s
State 0: start of line, is it T? yes → move to State 1 ✓
State 1: is r a non-whitespace? yes → move to State 2 ✓
State 2: is a a non-whitespace? yes → stay in State 2 ✓ (loop)
State 2: is i a non-whitespace? yes → stay in State 2 ✓ (loop)
State 2: is n a non-whitespace? yes → stay in State 2 ✓ (loop)
State 3: is s the letter s? yes → MATCH! ✓
The * creates a loop arrow — the state machine keeps looping on the same state as long as the condition is met.
^T [^\s] [^\s]* s
[START] ──→ [SAW T] ──→ [NON-SPACE] ──→ [LOOP] ──→ [MATCH]
↑________|
(keeps looping on
non-whitespace)
[^\s] is not multiple characters — it is a description of what one character is allowed to be. Think of it like a form with blank fields:
[ T ] [ _ ] [ _ ] [ _ ] [ s ]
Where each [ _ ] is filled by one character matching the bracket rule.
#!/bin/bash
echo "Hello World"

chmod +x myscript.sh # give execute permission
./myscript.sh # run it

#!/bin/bash
echo "first argument: $1"
echo "second argument: $2"
echo "all arguments: $@"
echo "number of arguments: $#"

./myscript.sh hello world
# first argument: hello
# second argument: world
# all arguments: hello world
# number of arguments: 2

NAME="seif096"
AGE=20
echo $NAME # seif096
echo $AGE # 20

Rules:

- No spaces around `=` (this is CRITICAL)
- Access with `$`
- No type declaration — everything is a string by default

Common Error:

NAME = "seif096" # ❌ WRONG - bash thinks NAME is a command
NAME="seif096" # ✅ CORRECT

echo "What is your name?"
read NAME
echo "Hello $NAME"

| Syntax | Name | Does |
|---|---|---|
| `$NAME` | Variable expansion | Gets value of variable |
| `${NAME}` | Parameter expansion | Gets value + extra operations |
| `$()` | Command substitution | Runs command, returns output |
| `$(())` | Arithmetic expansion | Evaluates math, returns result |
All `$` syntaxes share the same core idea: "evaluate what is inside me and replace me with the result" before the outer command runs.
Removes ambiguity when attaching text to a variable:
NAME="seif096"
echo $NAME_file # bash reads NAME_file as one variable → empty!
echo ${NAME}_file # bash clearly reads NAME + "_file" → seif096_file

${#NAME} # length of string
${NAME:0:3} # substring extraction
${NAME:-"default"} # use default value if variable is empty
${NAME/seif/user} # replace "seif" with "user"
${NAME%.txt} # remove .txt from end
${NAME#*/} # remove everything up to first /

Runs the command inside and replaces itself with the output:
FILES=$(ls)
DATE=$(date)
echo "Today is $DATE"
# nesting is possible
echo $(echo $(ls))

Backticks ` ` do the same thing but are older and cannot be nested:
FILES=`ls` # old way
FILES=$(ls) # new way (preferred)

Bash treats everything as strings by default. You need special syntax to do math.
y=1
# (each "way" below assumes y starts back at 1)

# Way 1: let
let y=$y+1
echo $y # 2
# Way 2: double parentheses (recommended)
y=$((y+1))
echo $y # 2
# Way 3: expr with backticks (old way, spaces required)
y=`expr $y + 1`
echo $y # 2
# Way 4: without arithmetic → treats as string!
y=$y+1
echo $y # 1+1 (wrong! this is a string)echo $((10 + 5)) # 15 addition
echo $((10 - 5)) # 5 subtraction
echo $((10 * 5)) # 50 multiplication
echo $((10 / 5)) # 2 division
echo $((10 % 3)) # 1 remainder (modulo)
echo $((2 ** 3)) # 8 power (2³)

Inside `$(( ))` you do not need `$` before variable names.
Common Arithmetic Mistakes:
# WRONG - using = instead of == for comparison
if (( x = 5 )); then # ❌ This assigns 5 to x, doesn't compare
echo "x is 5"
fi
# CORRECT - use == for comparison
if (( x == 5 )); then # ✅ This compares
echo "x is 5"
fiStrings are indexed starting at 0. Negative indexes count from the end.
a b c A B C 1 2 3 A B C a b c
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
-4 -3 -2 -1
stringZ=abcABC123ABCabc
echo ${#stringZ} # 15 (using parameter expansion)
echo `expr length $stringZ` # 15 (using expr, old way)

echo ${stringZ:0} # abcABC123ABCabc (from 0 to end)
echo ${stringZ:1} # bcABC123ABCabc (from 1 to end)
echo ${stringZ:7:3} # 23A (3 chars starting at position 7)
echo ${stringZ: -4} # Cabc (last 4 characters)
echo ${stringZ: -4:1} # C (1 char starting 4 from end)

The space before `-4` is required — otherwise bash reads `${stringZ:-4}` as the "use default value" syntax.
| Syntax | Meaning |
|---|---|
| `${#string}` | Length of string |
| `${string:0}` | Everything from position 0 |
| `${string:7:3}` | 3 characters starting at position 7 |
| `${string: -4}` | Last 4 characters |
| `${string: -4:1}` | 1 character starting 4 from end |
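The substring syntax above is enough to build small string utilities. A sketch that reverses a string one character at a time:

```shell
str="hello"
reversed=""

# Walk the string from the last index down to 0,
# taking one character per iteration with ${str:$i:1}
for ((i=${#str}-1; i>=0; i--)); do
    reversed+="${str:$i:1}"
done

echo "$reversed"   # olleh
```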
Important Note: Bash strings are immutable - you cannot change a character directly. You must rebuild the string.
str="hello"
new=""
for ((i=0; i<${#str}; i++)); do
char="${str:$i:1}"
# Process char
new+="$char" # Build new string
done

This does NOT work:
str="hello"
str[0]="H" # ❌ WRONG - Bash strings are not arrays

farm_hosts=(web03 web04 web05 web06 web07)
echo ${farm_hosts[0]} # web03 (first element)
echo ${farm_hosts[2]} # web05 (third element)
echo ${farm_hosts[*]} # all elements
echo ${farm_hosts[@]} # all elements (same as *)
echo ${#farm_hosts[@]} # number of elements

Why `${}`? Without curly braces, `$farm_hosts[*]` is read as `$farm_hosts` + `[*]` literally → `web03[*]`. Curly braces tell bash to treat `farm_hosts[*]` as one expression.
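Two more array forms worth knowing (standard bash, though not listed above): `+=` appends an element, and `${!array[@]}` expands to the indices rather than the values. A short sketch:

```shell
farm_hosts=(web03 web04 web05)
farm_hosts+=(web06)                  # append an element

for idx in "${!farm_hosts[@]}"; do   # iterate over indices, not values
    echo "index $idx -> ${farm_hosts[$idx]}"
done
# index 0 -> web03
# index 1 -> web04
# index 2 -> web05
# index 3 -> web06
```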
farm_hosts=(web03 web04 web05 web06 web07)
for i in ${farm_hosts[*]}
do
echo "item: $i"
done
# item: web03
# item: web04
# item: web05
# item: web06
# item: web07

if [ $T1 = $T2 ]
then
echo "equal"
else
echo "not equal"
fi
`fi` is `if` spelled backwards — bash closes blocks by reversing the opening word (`if`/`fi`, `case`/`esac`); loops simply pair `do` with `done`.
Numbers:

| Operator | Meaning |
|---|---|
| `-eq` | equal |
| `-ne` | not equal |
| `-gt` | greater than |
| `-lt` | less than |
| `-ge` | greater than or equal |
| `-le` | less than or equal |

Strings:

| Operator | Meaning |
|---|---|
| `=` | equal |
| `!=` | not equal |
| `-z` | string is empty |
| `-n` | string is not empty |

Files:

| Operator | Meaning |
|---|---|
| `-f` | file exists |
| `-d` | directory exists |
| `-r` | file is readable |
| `-w` | file is writable |
| `-x` | file is executable |
#!/bin/bash
echo "Enter a filename:"
read FILENAME
if [ -f $FILENAME ]
then
echo "File exists, contents:"
cat $FILENAME
else
echo "File does not exist"
fi

Every command returns an exit code:

- `0` = success
- anything else = failure
grep "John" myfile.txt
echo $? # 0 if found, 1 if not found
if grep "John" myfile.txt
then
echo "Found John"
else
echo "John not found"
fi

if [[ "$1" == 1 ]]; then
echo "Option 1"
elif [[ "$1" == 2 ]]; then
echo "Option 2"
elif [[ "$1" == 3 ]]; then
echo "Option 3"
else
echo "Unknown option"
fi

Important: Only ONE fi at the end closes the entire if/elif/else block.
# iterate over a list
for i in 1 2 3 4 5
do
echo $i
done
# iterate over a range
for i in {1..5}
do
echo $i
done
# iterate with step {start..end..step}
for i in {1..5..2}
do
echo $i # 1 3 5
done
# iterate over command output
for i in $(ls)
do
echo "item: $i"
done

seq generates a number sequence: seq start step end
# seq 1 1 9 → generates 1 2 3 4 5 6 7 8 9
for i in `seq 1 1 9`
do
echo $i
done
# count down
for i in `seq 10 -1 1`
do
echo $i
done

len=10
limit=5
for i in `seq 1 1 $((len-1))`
do
if [ $i -gt $limit ]
then
break # stop the loop immediately
fi
echo $i
done
# prints 1 2 3 4 5 then stops

Runs as long as condition is true.
COUNTER=0
while [ $COUNTER -lt 10 ]
do
echo "The counter is $COUNTER"
let COUNTER+=1
done

Runs as long as condition is false — stops when it becomes true. Opposite of while.
COUNTER=20
until [ $COUNTER -lt 10 ]
do
echo "COUNTER $COUNTER"
let COUNTER-=1
done
# counts down from 20, stops when below 10

| Loop | Runs when | Use when |
|---|---|---|
| `for` | Iterating over a known set | You know how many times upfront |
| `while` | Condition is true | You don't know how many iterations |
| `until` | Condition is false | You want to run until something becomes true |
Two equivalent syntaxes:
# way 1
function myfunction {
echo "hello"
}
# way 2
myfunction() {
echo "hello"
}

Discouraged - don't mix both (bash accepts it, but it's redundant and non-portable):

function myfunction() { # avoid using both 'function' and ()
echo "test"
Important Syntax Rules:

- Space before `{` is REQUIRED:
function myFunc{ # ❌ WRONG - missing space
echo "test"
}
function myFunc { # ✅ CORRECT
echo "test"
}

- Empty functions need a placeholder:
function myFunc {} # ❌ WRONG - empty body
function myFunc { # ✅ CORRECT - use : as placeholder
:
}

myfunction # no parentheses when calling (unlike most languages)
myfunction arg1 arg2

function greet {
echo "Hello $1"
echo "Second arg: $2"
echo "All args: $@"
echo "Number of args: $#"
}
greet "seif096" "world"

Using echo + command substitution (to return a value):
function add {
echo $(($1 + $2))
}
result=$(add 3 5)
echo $result # 8

Using return (exit codes only, 0-255):
function check {
if [ $1 -gt 10 ]
then
return 0 # success / true
else
return 1 # failure / false
fi
}
check 15
echo $? # 0
`return` in bash only returns an exit code (0-255), not a value. For actual values, use `echo` and capture with `$()`.
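Because a function's return value is an exit code, the function can be used directly as an if condition, just like any other command:

```shell
function check {
    [ "$1" -gt 10 ]   # the test's exit code becomes the function's exit code
}

if check 15; then
    echo "15 is greater than 10"
else
    echo "15 is not greater than 10"
fi
# 15 is greater than 10
```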
CRITICAL: echo must come BEFORE return. Code after return never executes:
function test {
return 1
echo "This never prints" # ❌ Never reached
}
function test {
echo "This prints" # ✅ Executes
return 1
}

# global by default
function myfunction {
NAME="seif096" # accessible everywhere
}
# use local to restrict scope
function myfunction {
local NAME="seif096" # only inside this function
}
# Multiple local variables
function myfunction {
local x y z # ✅ Space-separated
# NOT: local x,y,z # ❌ Wrong
}

#!/bin/bash
function increment {
counter=0
inc=1
# if an argument was passed, use it as the increment
if [ "$#" -ne 0 ]; then
inc=$1
fi
# loop 10 times (counting down with seq)
for i in `seq 10 -1 1`; do
echo "The counter is $counter"
let counter=counter+$inc
done
}
# call with no argument → increments by 1
increment
# call with argument 5 → increments by 5
increment 5

Output with increment 5:
The counter is 0
The counter is 5
The counter is 10
The counter is 15
The counter is 20
The counter is 25
The counter is 30
The counter is 35
The counter is 40
The counter is 45
The counter accumulates — it keeps adding `inc` to itself. It prints before adding each time, which is why it starts at 0.
| Feature | Syntax |
|---|---|
| Define | `function name { }` or `name() { }` |
| Call | `name` or `name arg1 arg2` |
| First argument | `$1` |
| All arguments | `$@` |
| Argument count | `$#` |
| Return exit code | `return 0` or `return 1` |
| Return value | `echo value` then capture with `$()` |
| Local variable | `local NAME="value"` |
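A quick sketch showing the global-vs-local rule in action (the variable names are made up for the demo):

```shell
function demo {
    GLOBAL_VAR="set inside the function"   # global by default
    local LOCAL_VAR="only visible inside"  # dies when the function returns
}

demo
echo "GLOBAL_VAR: $GLOBAL_VAR"   # GLOBAL_VAR: set inside the function
echo "LOCAL_VAR: [$LOCAL_VAR]"   # LOCAL_VAR: [] (empty - it's gone)
```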
1. No spaces around = in variable assignment:
NAME = "value" # ❌ WRONG - bash thinks NAME is a command
NAME="value" # ✅ CORRECT

2. Spaces REQUIRED around [ and ]:
if[ "$x" = "y" ]; then # ❌ WRONG - no space after if
if ["$x" = "y" ]; then # ❌ WRONG - no space after [
if [ "$x"="y" ]; then # ❌ WRONG - no spaces around =
if [ "$x" = "y" ]; then # ✅ CORRECT

3. Space REQUIRED before { in functions:
function test{ # ❌ WRONG
echo "hi"
}
function test { # ✅ CORRECT
echo "hi"
}

4. Spaces in [[ and ]]:
if[[ "$x" == "y" ]]; then # ❌ WRONG - no space after if
if [["$x" == "y"]]; then # ❌ WRONG - no space after [[
if [[ "$x"=="y" ]]; then # ❌ WRONG - no spaces around ==
if [[ "$x" == "y" ]]; then # ✅ CORRECT

Always quote variables in [ ] tests:
if [ $NAME = "test" ]; then # ❌ Dangerous - breaks if NAME is empty
if [ "$NAME" = "test" ]; then # ✅ CORRECT - safe

In [[ ]] quoting is optional but recommended:
if [[ $NAME == "test" ]]; then # ✅ Works but not best practice
if [[ "$NAME" == "test" ]]; then # ✅ Better - always quote

Inside arithmetic $(( )), $ is optional:
x=5
echo $((x + 1)) # ✅ Works
echo $(($x + 1)) # ✅ Also works
Everywhere else, $ is required:
echo x # ❌ Prints literal "x"
echo $x # ✅ Prints value of x
# ⚠️ AVOID - bash does accept 'function name()' with both keyword and parentheses,
# but it is redundant and non-portable
function test() {
echo "hi"
}
# ✅ CORRECT - choose one
function test {
echo "hi"
}
# ✅ ALSO CORRECT
test() {
echo "hi"
}
# ❌ WRONG - empty function needs placeholder
function test {
}
# ✅ CORRECT - use : as no-op
function test {
:
}
Using = instead of == in arithmetic:
if (( x = 5 )); then # ❌ WRONG - assigns 5, doesn't compare
echo "x is 5"
fi
if (( x == 5 )); then # ✅ CORRECT - compares
echo "x is 5"
fi
Using % for modulo in [ ]:
if [ ${#str} % 2 = 0 ]; then # ❌ WRONG - [ ] treats this as string
echo "even"
fi
if (( ${#str} % 2 == 0 )); then # ✅ CORRECT - use (( )) for math
echo "even"
fi
In one-line if statements:
if [ "$x" = "y" ] then echo "yes"; fi # ❌ WRONG - missing ; before then
if [ "$x" = "y" ]; then echo "yes"; fi # ✅ CORRECT
In one-line for loops:
for i in {1..5} do echo $i; done # ❌ WRONG - missing ; before do
for i in {1..5}; do echo $i; done # ✅ CORRECT
Understanding substring removal:
file="document.txt"
echo ${file%.txt} # document (removes .txt from end)
echo ${file%.doc} # document.txt (no match, nothing removed)
Using it with variables:
half=$((${#parens} / 2)) # ✅ CORRECT
lParen="${parens:0:half}" # ✅ CORRECT - quote for safety
rParen="${parens:half}" # ✅ CORRECT

| Feature | `[ ]` | `[[ ]]` | `(( ))` |
|---|---|---|---|
| Type | Test command (POSIX) | Bash keyword | Arithmetic evaluation |
| String comparison | `=` only | `==` or `=` | ❌ Not for strings |
| Number comparison | `-eq`, `-lt`, `-gt`... | `-eq`, `-lt`, `-gt` | `==`, `<`, `>` |
| Math operators | ❌ No | ❌ No | ✅ Yes (`+`, `*`...) |
| Pattern matching | ❌ No | ✅ Yes (`*`, `?`) | ❌ No |
| Variable quoting | ✅ Required | Optional (safer) | Not needed |
| Word splitting | ✅ Yes (dangerous) | ❌ No (safe) | N/A |
| AND / OR | `-a` / `-o` | `&&` / `\|\|` | `&&` / `\|\|` |
| Regex matching | ❌ No | ✅ Yes (`=~`) | ❌ No |
| Portable | ✅ POSIX (all shells) | ❌ Bash only | ❌ Bash only |
| When to use | Scripts for any shell | Modern bash scripts | Math comparisons |
# String comparison - use = not ==
if [ "$a" = "$b" ]; then
echo "equal"
fi
# Numeric comparison - use -eq, -ne, -lt, -gt, -le, -ge
if [ "$num" -eq 5 ]; then
echo "equals 5"
fi
# File tests
if [ -f "file.txt" ]; then
echo "file exists"
fi
# String tests
if [ -z "$str" ]; then # true if string is empty
echo "empty"
fi
if [ -n "$str" ]; then # true if string is not empty
echo "not empty"
fi
CRITICAL RULES for [ ]:
- Must have spaces after `[` and before `]`
- Must quote variables or it breaks with empty strings
- Use `=` for strings (NOT `==`)
- Use `-eq`, `-ne`, `-lt`, `-gt` for numbers (NOT `==`, `<`, `>`)
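A short runnable sketch of these rules (the variable names and values are invented):

```shell
num=7
str=""

if [ -z "$str" ]; then               # quoted, so this is safe even when empty
    echo "str is empty"
fi

if [ "$num" -gt 5 ] && [ "$num" -le 10 ]; then
    echo "num is in range"           # chain two [ ] tests with && (clearer than -a)
fi
```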
# String comparison - can use == or =
if [[ "$a" == "$b" ]]; then
echo "equal"
fi
# Pattern matching
if [[ "$filename" == *.txt ]]; then
echo "text file"
fi
# Regex matching
if [[ "$email" =~ ^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$ ]]; then
echo "valid email"
fi
# Logical operators
if [[ "$x" == "yes" && "$y" == "no" ]]; then
echo "both conditions true"
fi
# Multiple conditions
if [[ "$1" == 1 ]] || [[ "$1" == 2 ]]; then
echo "1 or 2"
fi
BENEFITS of [[ ]]:
- Safer - no word splitting
- Pattern matching works (`*`, `?`)
- Can use `==` for strings (more intuitive)
- Better `&&` and `||` support
- Quoting is optional (but still recommended)
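A quick sketch of the pattern and regex matching that `[[ ]]` adds (the filename is made up):

```shell
name="report_2024.csv"

if [[ "$name" == *.csv ]]; then                   # glob pattern match
    echo "csv file"
fi

if [[ "$name" =~ ^[a-z]+_[0-9]{4}\.csv$ ]]; then  # regex match with =~
    echo "matches the name_year.csv layout"
fi
```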
# Direct math comparison - use C-style operators
if (( x == 5 )); then
echo "x equals 5"
fi
# No $ needed for variables inside
if (( length % 2 == 0 )); then
echo "even length"
fi
# Complex expressions
if (( (x + y) * 2 > 100 )); then
echo "result > 100"
fi
# All C operators work
if (( x >= 10 && x <= 20 )); then
echo "x is between 10 and 20"
fi
OPERATORS in (( )):
== equal
!= not equal
< less than
> greater than
<= less than or equal
>= greater than or equal
+ addition
- subtraction
* multiplication
/ division
% modulo
** exponentiation
&& logical AND
|| logical OR

| Situation | Use |
|---|---|
| Math comparison | `(( ))` |
| String comparison | `[[ ]]` |
| Pattern matching | `[[ ]]` |
| File existence | `[[ ]]` or `[ ]` |
| POSIX portable script | `[ ]` |
| Modern bash-only script | `[[ ]]` |
| Arithmetic with operators (`%`, `+`, `-`) | `(( ))` |
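A sketch using all three forms on the same piece of data (the filename is an invented example):

```shell
file="notes.txt"

if [[ "$file" == *.txt ]]; then     # pattern matching → [[ ]]
    echo "text file"
fi

if (( ${#file} % 2 == 1 )); then    # math → (( )); "notes.txt" is 9 chars
    echo "odd-length name"
fi

if [ -n "$file" ]; then             # portable string test → [ ]
    echo "non-empty"
fi
```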
# ❌ WRONG - math in [ ]
if [ ${#str} % 2 = 0 ]; then # Treats as string!
echo "even"
fi
# ✅ CORRECT - use (( )) for math
if (( ${#str} % 2 == 0 )); then
echo "even"
fi
# ❌ WRONG - assignment instead of comparison in (( ))
if (( x = 5 )); then # Assigns 5, doesn't compare!
echo "x is 5"
fi
# ✅ CORRECT - use ==
if (( x == 5 )); then
echo "x is 5"
fi
# ❌ WRONG - using == with [ ]
if [ "$x" == "5" ]; then # Works in bash but not POSIX
echo "x is 5"
fi
# ✅ CORRECT - use = with [ ]
if [ "$x" = "5" ]; then
echo "x is 5"
fi
# Iterate over words
for item in apple banana cherry; do
echo "$item"
done
# Iterate over files
for file in *.txt; do
echo "Found: $file"
done
# Iterate over command output
for user in $(cut -d: -f1 /etc/passwd); do
echo "User: $user"
done
COMMON MISTAKE:
for i in flip; do # ❌ Iterates over word "flip" (1 time)
echo "$i"
done
# This is NOT looping over characters!
# It literally loops over the word "flip" once
This is the CORRECT way to loop over string characters:
str="hello"
# Loop with counter
for ((i=0; i<${#str}; i++)); do
char="${str:$i:1}"
echo "$char"
done
# Reverse loop
for ((i=${#str}-1; i>=0; i--)); do
char="${str:$i:1}"
echo "$char"
done
Syntax breakdown:
for (( initialization; condition; increment )); do
commands
done
Examples:
# Count 0 to 9
for ((i=0; i<10; i++)); do
echo $i
done
# Count by 2s
for ((i=0; i<=10; i+=2)); do
echo $i # 0 2 4 6 8 10
done
# Multiple variables
for ((i=0, j=10; i<10; i++, j--)); do
echo "i=$i j=$j"
done
# Number range
for i in {1..5}; do
echo $i
done
# With step
for i in {0..10..2}; do
echo $i # 0 2 4 6 8 10
done
# Reverse
for i in {10..1}; do
echo $i
done
# Letters
for letter in {a..z}; do
echo $letter
done
Important: Range expansion {1..5} happens BEFORE variable substitution:
n=5
for i in {1..$n}; do # ❌ WRONG - doesn't expand $n
echo $i
done
# Use C-style instead:
for ((i=1; i<=n; i++)); do # ✅ CORRECT
echo $i
done

| Loop Type | When to Use | Example |
|---|---|---|
| `for in list` | Iterating over words, files, arrays | `for word in $words` |
| `for ((;;))` | Counter-based, character iteration | `for ((i=0; i<n; i++))` |
| `for in {}` | Fixed numeric/letter ranges | `for i in {1..10}` |
| `while` | Condition-based, unknown iterations | `while [ $x -lt 10 ]` |
| `until` | Run until condition becomes true | `until [ $x -ge 10 ]` |
$() is command substitution - it runs a command and captures its output:
current_dir=$(pwd)
file_count=$(ls | wc -l)
today=$(date +%Y-%m-%d)
echo "Today is $today"
✅ USE $() when you need to CAPTURE output:
# Capture and store
result=$(command)
# Use output in another command
echo "Result: $(command)"
# Use in conditional
if [[ $(whoami) == "root" ]]; then
echo "Running as root"
fi
❌ DON'T USE $() when you just want to RUN something:
# WRONG - captures output but does nothing with it
$(echo "Hello")
# CORRECT - just displays it
echo "Hello"
CRITICAL CONCEPT: Functions in bash are called like commands, NOT with $():
function greet {
echo "Hello $1"
}
# ❌ WRONG - unnecessary $()
if [[ condition ]]; then
$(greet "World") # Output captured and discarded!
fi
# ✅ CORRECT - just call it
if [[ condition ]]; then
greet "World" # Output displays directly
fi
# ✅ CORRECT - capture if you need the value
message=$(greet "World")
echo "Message: $message"
In many languages:
// JavaScript
result = myFunction(); // Need () to call
# Python
result = my_function()  # Need () to call
In Bash:
# Bash - NO PARENTHESES for function calls
result=$(myfunction) # Call with $() only if capturing output
myfunction # Call directly to run and display
myfunction arg1 arg2 # Call with arguments
function toDecimal {
printf "%d\n" "$1"
}
# WRONG ways to call:
if [[ "$1" == 1 ]]; then
$(toDecimal "$2") # ❌ Captures output, discards it
fi
# CORRECT ways to call:
if [[ "$1" == 1 ]]; then
toDecimal "$2" # ✅ Runs and displays output
fi
# OR capture if needed:
result=$(toDecimal "$2") # ✅ Captures for later use
echo "Result: $result"
Do I need the command's output as a value?
├─ YES → Use $()
│ result=$(command)
│
└─ NO → Just run it
command
function_name
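Following that decision tree, a tiny sketch (the function `stamp` is made up):

```shell
function stamp {
    echo "item-$1"
}

stamp 7                  # just run it: output goes straight to the screen
label=$(stamp 7)         # capture: output lands in the variable instead
echo "captured: $label"
```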
# Decimal (base 10): 0-9
26
# Hexadecimal (base 16): 0-9, A-F
1A # = 1×16 + 10 = 26 in decimal
# Binary (base 2): 0-1
11010 # = 16 + 8 + 2 = 26 in decimal
# Octal (base 8): 0-7
32 # = 3×8 + 2 = 26 in decimal
# Method 1: Using arithmetic expansion with 0x prefix
echo $((0x1A)) # 26
echo $((0xFF)) # 255
# Method 2: Using base#number syntax
echo $((16#1A)) # 26
echo $((16#FF)) # 255
# Method 3: Using printf
printf "%d\n" 0x1A # 26
printf "%d\n" 0xFF # 255
IMPORTANT: 0x tells bash "this is hexadecimal":
echo $((0x1A)) # ✅ Correctly interprets as hex
echo $((1A)) # ❌ Error - not valid without 0x
# Method 1: Using printf (most common)
printf "%X\n" 26 # 1A (uppercase)
printf "%x\n" 26 # 1a (lowercase)
printf "0x%X\n" 26 # 0x1A (with prefix)
# Method 2: Using bc
echo "obase=16; 26" | bc # 1A
Key Point: $(( )) can INPUT any base but ALWAYS OUTPUTS decimal:
echo $((0x1A)) # Input: hex, Output: 26 (decimal)
echo $((16#FF)) # Input: hex, Output: 255 (decimal)
echo $((26)) # Input: decimal, Output: 26 (decimal)
To output hex, you MUST use printf:
printf "%X\n" 26 # Decimal → Hex: 1A
# Binary → Decimal
echo $((2#1010)) # 10
# note: bash has no 0b binary prefix; use the 2#number form above
# Octal → Decimal
echo $((8#77)) # 63
echo $((077)) # 63 (leading 0 means octal)
# Decimal → Binary
echo "obase=2; 26" | bc # 11010
# Decimal → Octal
printf "%o\n" 26 # 32
echo "obase=8; 26" | bc # 32

#!/bin/bash
function toDecimal {
# Accepts hex with or without 0x prefix
local input="$1"
# Add 0x if not present
if [[ ! "$input" =~ ^0x ]]; then
input="0x$input"
fi
printf "%d\n" "$input"
}
function toHex {
printf "%X\n" "$1"
}
# Usage
toDecimal "1A" # 26
toDecimal "0x1A" # 26
toHex 26 # 1A
# ❌ WRONG - trying to convert decimal to hex with $(())
echo $((26)) # Still outputs 26, not 1A
# ✅ CORRECT - use printf
printf "%X\n" 26 # 1A
# ❌ WRONG - missing 0x prefix
echo $((1A)) # Error: invalid number
# ✅ CORRECT - use 0x
echo $((0x1A)) # 26
Usage 1: Modulo (Remainder) - In Arithmetic
echo $((10 % 3)) # 1 (remainder of 10 ÷ 3)
echo $((17 % 5)) # 2 (remainder of 17 ÷ 5)
# Check if even
if (( num % 2 == 0 )); then
echo "even"
fi
Usage 2: Pattern Removal from End - In Parameter Expansion
file="document.txt"
echo ${file%.txt} # document (removes .txt)
path="/home/user/file.txt"
echo ${path%/*} # /home/user (removes /file.txt)
# % removes shortest match from end
file="hello.world.txt"
echo ${file%.*} # hello.world (removes .txt)
# %% removes longest match from end
echo ${file%%.*} # hello (removes .world.txt)
Pattern Removal from Start - In Parameter Expansion
path="/home/user/file.txt"
echo ${path#*/} # home/user/file.txt (removes /)
# # removes shortest match from start
echo ${path#/*/} # user/file.txt (removes /home/)
# ## removes longest match from start
echo ${path##*/} # file.txt (removes /home/user/)
Usage 1: Glob Pattern (Wildcard)
ls *.txt # all .txt files
rm file* # all files starting with "file"
Usage 2: Multiplication - In Arithmetic
echo $((5 * 3)) # 15
Usage 3: All Array Elements
array=(a b c)
echo ${array[*]} # a b c
Single Character Wildcard
ls file?.txt # file1.txt, fileA.txt, etc.
ls ???.txt # abc.txt, xyz.txt, etc.
All Array Elements (Preserving Word Boundaries)
array=(a b c)
echo ${array[@]} # a b c
# Difference from * when quoted:
for item in "${array[*]}"; do
echo "$item" # Prints once: "a b c"
done
for item in "${array[@]}"; do
echo "$item" # Prints 3 times: a, b, c
done| Operator | In Arithmetic | In Parameter Expansion | As Glob |
|---|---|---|---|
% |
Modulo | Remove from end | - |
# |
- | Remove from start | Comment |
* |
Multiply | All array elements | Wildcard |
? |
- | - | Single char |
@ |
- | All array elements | - |
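As a practical sketch of `%` and `##` in parameter expansion, here is a hand-rolled dirname/basename split (the path is an invented example):

```shell
path="/var/log/app/error.log"

dir="${path%/*}"      # % : strip shortest /* from the END   → like dirname
base="${path##*/}"    # ##: strip longest  */ from the START → like basename
ext="${base##*.}"     # extension: strip everything up to the last dot

echo "$dir"    # /var/log/app
echo "$base"   # error.log
echo "$ext"    # log
```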
chmod +x script.sh
./script.sh
What happens:
- New bash process is created (fork)
- Script runs in the new process
- Variables and functions exist only in that process
- Process ends, everything disappears
- Your original shell is unchanged
Your Shell (PID 1000)
│
└─> Fork: New Shell (PID 1001)
│
└─> Runs script
└─> Sets variables
└─> Defines functions
└─> Process ends
│
└─> Back to your shell
Variables/functions gone ❌
source script.sh
# OR
. script.sh
What happens:
- NO new process created
- Script commands run in YOUR current shell
- Variables persist after script ends
- Functions persist after script ends
Your Shell (PID 1000)
│
└─> Reads script line by line
└─> Executes each line in current shell
└─> Variables remain ✅
└─> Functions remain ✅
| Aspect | Execute `./script.sh` | Source `source script.sh` |
|---|---|---|
| New process | ✅ Yes | ❌ No |
| Needs `chmod +x` | ✅ Yes | ❌ No |
| Variables persist | ❌ No | ✅ Yes |
| Functions persist | ❌ No | ✅ Yes |
| Changes PATH | Only in subprocess | In current shell |
| Use case | Run programs | Load config/functions |
Use Execute (./script) when:
- Running standalone programs
- You don't need variables/functions afterwards
- Script should run in isolation
- Script modifies files (doesn't matter which process)
Use Source (source script) when:
- Loading configuration (`.bashrc`, `.bash_profile`)
- Defining functions you want to use
- Setting environment variables for current session
- Activating virtual environments (`source venv/bin/activate`)
- Loading utility functions into current shell
Example 1: Configuration File
# config.sh
export DATABASE_URL="postgres://localhost/mydb"
export API_KEY="secret123"
alias ll='ls -la'
greet() {
echo "Hello from config!"
}
# Execute - variables disappear
./config.sh
echo $DATABASE_URL # Empty! ❌
# Source - variables persist
source config.sh
echo $DATABASE_URL # postgres://localhost/mydb ✅
greet # Hello from config! ✅
ll # Works! ✅
Example 2: Function Library
# utils.sh
function toUpper {
echo "$1" | tr 'a-z' 'A-Z'
}
function toLower {
echo "$1" | tr 'A-Z' 'a-z'
}
# Execute - functions not available
./utils.sh
toUpper "hello" # Command not found ❌
# Source - functions available
source utils.sh
toUpper "hello" # HELLO ✅
toLower "WORLD" # world ✅
Example 3: Python Virtual Environment
# This is why you SOURCE, not execute:
source venv/bin/activate # ✅ Modifies current shell's PATH
# If you executed instead:
./venv/bin/activate # ❌ PATH modified in subprocess, not your shell
❌ WRONG: "Source is like import in Python"
✅ CORRECT: Source is like copy-pasting the file's contents into your current terminal and running them line by line.
❌ WRONG: "Execute needs chmod +x, source doesn't need it"
✅ CORRECT: Execute needs execute permission AND read permission. Source only needs read permission.
# These are identical:
source script.sh
. script.sh
# The dot (.) is the POSIX-standard command
# 'source' is a bash-specific alias for '.'
# See what's loaded in current session
echo $PATH # Environment variables
declare -F # All functions
alias            # All aliases

| Name | Stands For | Story |
|---|---|---|
| `grep` | Global Regular Expression Print | From the old ed editor command `g/re/p` |
| `cat` | Concatenate | Original purpose was joining files |
| `chmod` | Change Mode | Changes file permission mode |
| `pwd` | Print Working Directory | Prints current location |
| `sudo` | Super User Do | Run commands as root |
| `bin` | Binaries | Pre-compiled executable programs |
| `#!` | Shebang / Hashbang | Hash + Bang = Shebang (also old slang for "the whole thing") |
| `tr` | Translate | Translates characters |
| `cp` | Copy | Copies files |
| `mv` | Move | Moves/renames files |
| `rm` | Remove | Removes files |
| `ls` | List | Lists directory contents |
# VARIABLES
NAME="value" # Define (no spaces around =)
echo $NAME # Access
echo ${NAME} # Parameter expansion
result=$(command) # Command substitution
result=$((5 + 3)) # Arithmetic expansion
# CONDITIONALS
[ "$x" = "y" ] # Old test (POSIX)
[[ "$x" == "y" ]] # Modern test (bash)
(( x == 5 )) # Arithmetic test
# LOOPS
for i in {1..5} # Range
for ((i=0; i<n; i++)) # C-style
for item in $list # List iteration
while [ condition ] # While loop
until [ condition ] # Until loop
# FUNCTIONS
function name { } # Define (space before {)
name arg1 arg2 # Call (no parentheses)
echo "result" # Return value (capture with $())
return 0 # Return exit code
local var="value" # Local variable
# STRINGS
${#str} # Length
${str:pos:len} # Substring
${str^^} # Uppercase
${str,,} # Lowercase
${str/old/new} # Replace
# MATH
$((x + y)) # Addition
$((x % 2)) # Modulo
(( x == 5 )) # Comparison
# FILES
cp src dst # Copy
mv src dst # Move
rm file # Remove
cat file # Display
chmod +x file # Make executable
# REDIRECTION
cmd > file # Overwrite
cmd >> file # Append
cmd < file # Input from file
cmd1 | cmd2 # Pipe
# PERMISSIONS
chmod u+x file # Add execute for user
chmod a-r file # Remove read for all
chmod 755 file # rwxr-xr-x
# CONVERSIONS
echo $((0x1A)) # Hex to decimal
printf "%X" 26 # Decimal to hex
# SOURCE vs EXECUTE
./script.sh # Execute (new process)
source script.sh # Source (current process)
. script.sh # Same as source

Notes from Lab 2 — covering IO, data types, compilation, pointers, memory allocation, Makefiles, GDB debugging, and file handling in C.
printf format specifiers:
| Specifier | Prints |
|---|---|
| `%d` | decimal integer |
| `%c` | character |
| `%f` | float/double |
| `%s` | string |
| `%p` | pointer address |
Display width vs precision — common mistake:
printf("%6s", str); // minimum WIDTH of 6 — pads with spaces if shorter, does NOT truncate
printf("%.6s", str); // maximum PRECISION of 6 — truncates if longer
printf("%10.6s", str); // 10 wide, max 6 chars printed
Key distinction: `%6s` controls the minimum space used; `%.6s` controls the maximum characters printed. If you want to print only 6 characters, use `%.6s`, not `%6s`.
scanf — reading input:
scanf("%d", &x); // & required for non-array types
scanf("%s", str); // no & needed — arrays are already pointers
Common mistake: forgetting `&` with scanf on non-array types will crash the program. `scanf` needs the address to know where in memory to store the value.
Another mistake: putting messages inside scanf format string — scanf ignores text, only reads format specifiers. Always use printf for messages, scanf only for reading.
// WRONG
scanf("enter a number: %d", &x); // message is ignored, confusing
// CORRECT
printf("enter a number: ");
scanf("%d", &x);
Escape sequences:
| Sequence | Meaning |
|---|---|
| `\n` | newline — moves to next line |
| `\r` | carriage return — moves to start of SAME line |
| `\t` | horizontal tab |
| `\\` | backslash |
| `\"` | double quote |
`\r` note: Moves cursor back to beginning of current line without going down. Rarely used alone. Windows uses `\r\n` to end lines while Linux uses just `\n` — this can cause bugs when sharing files between systems.
| Type | Size | Range |
|---|---|---|
| `char` | 1 byte | -128 to 127 (signed) |
| `int` | 4 bytes | ~±2 billion |
| `float` | 4 bytes | ~7 digits precision |
| `double` | 8 bytes | ~15 digits precision |
bool in C:
C has no built-in bool. Uses integers instead:
- `0` = false
- any non-zero = true
// old C (C89) — no bool, use int
int isTrue = 1;
int isFalse = 0;
// modern C (C99+) — include header
#include <stdbool.h>
bool isTrue = true;
bool isFalse = false;
Comparison with C++: C++ has `bool` built in without any header. C needs `<stdbool.h>`. Under the hood both are still just integers.
gcc hello.c -o hello # compile → named executable
gcc hello.c # compile → default name a.out
./hello # run executable
gcc hello.c -o hello && ./hello # compile and run in one line
Important flags:
| Flag | Meaning |
|---|---|
| `-o name` | name the output file (output) |
| `-c` | compile to object file only, do not link |
| `-Wall` | show all warnings |
| `-g` | include debug info for gdb |
`-o` is like Save As — without it everything saves as `a.out` and overwrites each other. Always use `-o` to keep files organized.
`void main()` vs `int main()`: The lab slides use `void main()` but this is non-standard. Modern C (C99+) requires `int main()` and `return 0`. gcc follows the standard, so `void main()` may cause errors or warnings. Always use `int main()`.
apt stands for Advanced Package Tool — it is the package manager for Ubuntu/Debian Linux. Think of it as the app store for the terminal.
sudo apt update # refresh list of available packages (always run first)
sudo apt install build-essential # installs gcc, g++, make
sudo apt-get install manpages-dev # installs man pages for C functions
sudo apt -y install gdb # installs debugger (-y = yes to all prompts)
gcc --version # verify installation
`apt` vs `apt-get`: `apt-get` is the older version. They do the same thing. `apt` is preferred now.
`sudo` is needed because installing software affects the whole system, not just your user account — it requires admin permissions.
How apt works with repositories:
Repository (online warehouse of packages)
↓
apt update (refresh what's available)
↓
apt install (download and install)
↓
Your PC
Repository addresses are stored in /etc/apt/sources.list. Adding a new repo:
sudo add-apt-repository ppa:something
What build-essential installs:
- `gcc` — C compiler
- `g++` — C++ compiler
- `make` — build automation tool
What manpages-dev gives you:
man printf # full documentation for printf
man scanf # full documentation for scanf
A Makefile automates the compilation process. Instead of typing gcc commands every time, you just type make.
Build process:
Source files (.c) → Compiler (-c flag) → Object files (.o) → Linker → Executable
Basic Makefile format:
target: dependencies
[TAB] command
Critical: Indentation MUST be a TAB not spaces — make will throw an error with spaces.
Simple example:
all:
gcc main.c -o program
With dependencies (only recompiles changed files):
all: hello
hello: main.o factorial.o hello.o
gcc main.o factorial.o hello.o -o hello
main.o: main.c
gcc -c main.c
factorial.o: factorial.c
gcc -c factorial.c
clean:
rm -f *.o hello
With variables:
CC=gcc
CFLAGS=-c -Wall
SOURCES=main.c hello.c factorial.c
OBJECTS=$(SOURCES:.c=.o)
EXECUTABLE=hello
all: $(EXECUTABLE)
$(EXECUTABLE): $(OBJECTS)
$(CC) $(OBJECTS) -o $@
.c.o:
$(CC) $(CFLAGS) $< -o $@
clean:
rm -f $(OBJECTS) $(EXECUTABLE)
Key Makefile concepts:
| Concept | Example | Meaning |
|---|---|---|
| Variable | `CC=gcc` | reusable value |
| Use variable | `$(CC)` | dereference variable |
| Auto variable | `$@` | target name, e.g. `main.o` |
| Auto variable | `$<` | first dependency, e.g. `main.c` |
| Substitution | `$(SOURCES:.c=.o)` | replace .c with .o in all filenames |
| Pattern rule | `.c.o:` | how to convert any .c to .o |
$(SOURCES:.c=.o) explained:
SOURCES = main.c hello.c factorial.c
↓ ↓ ↓
OBJECTS = main.o hello.o factorial.o
Acts like find-and-replace for file extensions. Add a new .c file to SOURCES and OBJECTS updates automatically.
$< and $@ in pattern rule:
.c.o:
$(CC) $(CFLAGS) $< -o $@
# expands to:
# gcc -c -Wall main.c -o main.o
# ↑ ↑
# $< $@
# (dependency)   (target)
`.c.o:` is a special suffix rule — not a regular target. Make recognizes the two extensions as a conversion rule meaning "for ANY .c file that needs to become a .o file, use this rule." Modern equivalent:
%.o: %.c # newer syntax, same meaning
$(CC) $(CFLAGS) $< -o $@
Dependencies are files OR other targets:
make sees a dependency → is it a file that exists?
YES → use it
NO → look for a target rule to create it
make flags:
| Command | Meaning |
|---|---|
| `make` | looks for file named Makefile automatically |
| `make -f MyMakefile` | use specific file (`-f` = file) |
| `make clean` | run the clean target |
| `make -f Makefile-2 clean` | run clean in specific makefile |
Execution order make follows:
1. look at target
2. check dependencies exist
3. if dependency missing → find rule to create it
4. if .c newer than .o → recompile
5. link everything together
clean target:
clean:
rm -f *.o hello # ✅ correct — only removes .o files
rm -rf *o hello # ⚠️ dangerous — removes ANYTHING ending in letter o
Wildcard warning: `*o` means anything ending with the letter `o` (matches `hello`, `video`, etc). `*.o` means anything ending with `.o` specifically. Always use `*.o`, not `*o`.
On Linux vs Windows: The output of linking is just `hello` with no extension. On Windows it would be `hello.exe`. The `-o` flag names the output file.
int x = 420;
int *pointer = &x; // pointer holds the ADDRESS of x
Memory layout (assuming x happens to live at address 0x31):
x:        [    420    ]       ← the int occupies 4 bytes
address:  0x31 0x32 0x33 0x34
pointer:  0x31                ← stores the address of x
argv is an array of pointers:
char* argv[]
// [] → array of
// * → pointers to
// char → characters (strings)
argv[0] → points to → "hello\0"
argv[1] → points to → "5\0"
argv[2] → points to → "10\0"
argv contains pointers, not the data itself. This is why arguments are always strings — argv stores pointers to character arrays, so even the number 5 is stored as the string "5".
Casting argv — common mistake:
(int)argv[3] // casts memory ADDRESS → garbage large number
(int)*argv[3] // gets first char's ASCII value → '5' becomes 53, not 5
atoi(argv[3]) // correctly parses string "5" → 5 ✅
Why `printf("%s", argv[3])` works but a cast doesn't: `%s` tells printf to follow the pointer and read characters. A cast just reinterprets the raw pointer address as an integer — no string parsing involved.
Three types:
| Type | Size known? | Lifetime | Example |
|---|---|---|---|
| Stack | yes (compile time) | during function call only | local variables |
| Static | yes (compile time) | entire program | global, static variables |
| Heap | no (decided at runtime) | you control it | malloc() |
Stack — LIFO (Last In First Out):
main() called → added to stack
funcA() called → added on top
funcB() called → added on top
funcB() ends → removed
funcA() ends → removed
main() ends → removed
Stack overflow happens when too many function calls fill the stack — usually from infinite recursion.
Static keyword:
void counter(){
static int count = 0; // keeps its value between calls
int x = 0; // resets every call (stack)
count++;
x++;
printf("count=%d x=%d", count, x);
}
counter(); // count=1 x=1
counter(); // count=2 x=1
counter(); // count=3 x=1
Dynamic allocation functions (all in <stdlib.h>):
void *malloc(size_t bytes); // allocate memory
void *calloc(size_t n, size_t size); // allocate + initialize to zero
void *realloc(void *ptr, size_t new_size); // resize existing allocation
void free(void *p); // release memory
sizeof(type)                           // operator (not a function): byte size of a type
malloc example:
int *ids = malloc(sizeof(int) * 40); // space for 40 ints
ids[0] = 5;
free(ids); // always free!
realloc — resize existing memory:
// safe pattern — always use a temp pointer
int *temp = realloc(ids, sizeof(int) * 10);
if(temp == NULL){
free(ids); // original still safe
} else {
ids = temp; // success
}
realloc behavior: If enough space exists next to the current allocation it extends in place. Otherwise it allocates a new block, copies old data, frees the old block — all automatically. Always assign to a temp pointer first in case it fails.
Memory leak: Forgetting `free()` means memory stays occupied even after the program no longer needs it — this can slow down or crash the program over time.
static on a function — file scope:
static char** read_file_lines(...)
Means the function is only visible inside the same .c file. Used to:
- hide internal helper functions (private vs public API)
- prevent naming conflicts between files
- signal intent — "this is internal plumbing"
C vs C++ `static`: In C, `static` on a function only means file-scope restriction. In C++, `static` on a class member means it belongs to the class, not an instance. C++ inherited the C meaning and added its own on top.
#include <string.h>

| Function | Purpose | Safe version |
|---|---|---|
| `strlen(str)` | string length (not counting `\0`) | — |
| `strcpy(dest, src)` | copy string | `strncpy(dest, src, size)` |
| `strcat(dest, src)` | concatenate strings | `strncat(dest, src, size)` |
| `strcmp(a, b)` | compare strings (0 = equal) | `strncmp(a, b, n)` |
| `strchr(str, ch)` | find character, returns pointer or NULL | — |
| `strstr(str, sub)` | find substring, returns pointer or NULL | — |
| `strdup(str)` | duplicate string on heap (must free!) | — |
| `memset(ptr, val, size)` | fill memory with value | — |
| `memcpy(dest, src, size)` | copy memory block | — |
Always prefer the `n` versions: `strcpy` → `strncpy`, `strcat` → `strncat`. The `n` versions respect buffer size limits and prevent overflow.
atoi — ASCII to Integer (<stdlib.h>):
atoi("42") // → 42
atoi("-5") // → -5
atoi("3.14") // → 3 (stops at decimal point)
atoi("abc") // → 0 (can't convert)
atoi("42abc") // → 42 (stops at first non-numeric)
Problem with atoi: Returns 0 for both `"0"` (valid) and `"abc"` (error) — you can't tell them apart. Use `strtol` for safer conversion.
snprintf — safe string formatting:
char buffer[50];
snprintf(buffer, sizeof(buffer), "Name: %s Age: %d", name, age);
Never use `sprintf` — it doesn't check buffer size and causes overflow. Always use `snprintf`, which stops at the size limit.
Opening files:
FILE* file = fopen(filename, mode);

| Mode | File exists | File missing | Notes |
|---|---|---|---|
| `"r"` | opens ✅ | NULL ❌ | read only |
| `"w"` | overwrites | creates ✅ | write |
| `"a"` | adds to end ✅ | creates ✅ | append |
| `"r+"` | opens ✅ | NULL ❌ | read + write |
`"w"` deletes existing content — if the file exists, all its content is wiped. Use `"a"` if you want to keep existing content.
fgets — reading line by line:
char line[256];
while(fgets(line, sizeof(line), file)){
printf("%s", line);
}
How fgets works:
- reads one line at a time
- stops at `\n`, end of file, or buffer size limit — whichever comes first
- includes the `\n` in the result
- returns NULL at EOF
- each call automatically advances to the next line (the file position moves forward)
fgets stops at whichever comes first:
\n (end of line) → stops, includes \n in buffer
256 (buffer full) → stops WITHOUT \n — line was too long!
EOF → returns NULL
If line is longer than buffer:
first fgets → reads 255 chars (reserves 1 for \0)
second fgets → reads remaining chars
To detect truncation:
if(strlen(line) > 0 && line[strlen(line)-1] != '\n'){
    // line was longer than buffer! (check length first to avoid indexing an empty string)
}
Use 1024 for safety — SRT subtitle files usually have short lines but 1024 is a common standard buffer size.
How fgets remembers position — the FILE pointer:
The FILE struct internally stores a position indicator — a byte number saying "I am currently at byte X in the file."
fopen() // position = 0 (start)
fgets() call 1 // reads bytes 0-5 → position = 6
fgets() call 2 // reads bytes 6-11 → position = 12
fgets() call 3 // reads bytes 12-15 → position = 16 (EOF)
Useful position functions:
rewind(file); // reset position to byte 0
ftell(file); // get current position number
fseek(file, 0, SEEK_SET); // jump to start (same as rewind)
fseek(file, 0, SEEK_END); // jump to end
fprintf — writing to files or streams:
printf("Hello\n"); // → stdout
fprintf(stdout, "Hello\n"); // → stdout (same)
fprintf(stderr, "Error!\n"); // → stderr
fprintf(file, "Hello\n"); // → file
`printf` is just `fprintf(stdout, ...)` — the `f` in fprintf stands for file, as it was originally designed to print to any file/stream.
Always use stderr for error messages:
fprintf(stderr, "Error: File does not exist.\n"); // ✅
printf("Error: File does not exist.\n"); // ❌ goes to stdout
This way error messages still appear in the terminal even when stdout is redirected to a file.
The f-function family:
| Function | `f` stands for | Purpose |
|---|---|---|
| `printf` | — | print to stdout |
| `fprintf` | file | print to any stream/file |
| `scanf` | — | read from stdin |
| `fscanf` | file | read from any stream/file |
| `sprintf` | string | print to string buffer (dangerous!) |
| `snprintf` | string + n | print to string buffer with size limit (safe) |
| `sscanf` | string | read from string buffer |
Common file handling pattern:
// read from one file, write to another
FILE* input = fopen(input_filename, "r");
FILE* output = fopen(output_filename, "w");
if(input == NULL){
fprintf(stderr, "Error: Input file does not exist.\n");
return 1;
}
if(output == NULL){
fprintf(stderr, "Error: Could not create output file.\n");
fclose(input); // close input before returning!
return 1;
}
char line[256];
while(fgets(line, sizeof(line), input)){
fprintf(output, "%s", line);
}
fclose(input);
fclose(output);
Always close files before returning — even in error paths. Not closing causes resource leaks.
gcc -g -o myprogram myprogram.c # compile with debug info (-g flag required)
gdb ./myprogram # start debugger
Most used GDB commands:
| Command | Short | Purpose |
|---|---|---|
| `run` | `r` | start program |
| `run arg1 arg2` | `r arg1 arg2` | start with arguments |
| `break 10` | `b 10` | breakpoint at line 10 |
| `break main` | `b main` | breakpoint at function |
| `info break` | | show all breakpoints |
| `delete 1` | | remove breakpoint 1 |
| `delete` | | remove all breakpoints |
| `next` | `n` | next line (steps over function calls) |
| `step` | `s` | next line (steps into functions) |
| `finish` | | run until current function returns |
| `continue` | `c` | continue to next breakpoint |
| `print x` | `p x` | print variable value |
| `set var x = 5` | | assign value to variable |
| `watch x` | | stop when x changes |
| `info watch` | | show watched variables |
| `backtrace` | `where` | show call stack |
| `frame` | | show current function and line |
| `list` | `l` | display source code |
| `list main` | `l main` | show code for specific function |
| `quit` | `q` | exit gdb |
`next` vs `step`: `next` skips over function calls, `step` goes inside them.
// 1. semicolon after function body
int foo(){ }; // ❌ unnecessary (structs need it, functions don't)
int foo(){ } // ✅
// 2. forgetting & in scanf
scanf("%d", x); // ❌ crashes
scanf("%d", &x); // ✅
// 3. message inside scanf
scanf("enter: %d", &x); // ❌ message ignored, confusing
printf("enter: ");
scanf("%d", &x); // ✅
// 4. wrong width specifier
printf("%6s", str); // ❌ if you want truncation
printf("%.6s", str); // ✅ truncates to 6 chars
// 5. casting argv instead of parsing
(int)argv[1] // ❌ casts memory address → garbage
atoi(argv[1]) // ✅ parses string to integer
// 6. forgetting rewind after counting lines
while(fgets(...))count++; // file pointer now at EOF
// must call rewind(file) before reading again!
// 7. not closing file before early return
// ❌ wrong — file never closed:
if(n > count){
return 1;
}
// ✅ right — close before return:
if(n > count){
fclose(file);
return 1;
}
// 8. sprintf instead of snprintf
sprintf(buf, "%s", str); // ❌ dangerous, buffer overflow
snprintf(buf, sizeof(buf), "%s", str); // ✅ safe
// 9. using *o instead of *.o in makefile clean
rm -rf *o hello // ❌ deletes anything ending in letter o
rm -f *.o hello // ✅ only deletes .o files
// 10. forgetting free
int* p = malloc(sizeof(int) * 40);
// ... use p ...
// forgot free(p) → memory leak ❌
free(p); // ✅ always free
// 11. using realloc directly on original pointer
ids = realloc(ids, new_size); // ❌ if fails, ids becomes NULL, original lost
int* temp = realloc(ids, new_size); // ✅ safe pattern
if(temp) ids = temp;

| Feature | C | C++ |
|---|---|---|
| `bool` type | needs `<stdbool.h>` | built in |
| `malloc` return | auto-converts `void*` | must cast: `(int*)malloc(...)` |
| `static` on function | file scope only | file scope OR class member |
| Standard library | `<stdio.h>`, `<stdlib.h>` | also `<iostream>`, `std::string`, etc. |
| Compile with | `gcc` | `g++` |
| String type | `char*` arrays manually | `std::string` |
C++ is nearly a superset of C — most C code is valid C++ (a notable exception: C++ won't implicitly convert `void*`, so malloc's return must be cast). C++ adds classes, templates, the STL, and more. Use `g++` to compile C++ source code.
| Name | Stands For |
|---|---|
| `argc` | Argument Count |
| `argv` | Argument Values (Vector) |
| `printf` | Print Formatted |
| `scanf` | Scan Formatted |
| `fprintf` | File Print Formatted |
| `fgets` | File Get String |
| `fopen` | File Open |
| `fclose` | File Close |
| `malloc` | Memory Allocate |
| `realloc` | Re-Allocate |
| `calloc` | Clear Allocate (zeroed) |
| `atoi` | ASCII To Integer |
| `snprintf` | String N Print Formatted |
| `strlen` | String Length |
| `strcpy` | String Copy |
| `strcmp` | String Compare |
| `strcat` | String Concatenate |
| `strchr` | String Character (find) |
| `strstr` | String String (find substring) |
| `strdup` | String Duplicate |
| `memset` | Memory Set |
| `memcpy` | Memory Copy |
| `rewind` | Rewind (back to start) |
| `ftell` | File Tell (current position) |
| `fseek` | File Seek (jump to position) |
| `apt` | Advanced Package Tool |
| `gcc` | GNU Compiler Collection (originally GNU C Compiler) |
| `gdb` | GNU Debugger |
A deep-dive into UNIX process management, system calls, virtual memory, scheduling, and file handling under the hood. Built from real questions, common confusions, and first-principles thinking.
- What is a Process?
- Program vs Process
- Process Memory Layout
- Static Variables and the Data Segment
- Virtual Memory — Why It Exists
- Page Tables
- Process States in UNIX
- Process Creation — 4 Ways
- Process Termination
- PCB — Process Control Block
- System Calls — The Bridge to the Kernel
- fork() — Cloning a Process
- wait() — Synchronizing with Children
- exit() — Terminating a Process
- execl() — Replacing a Process
- nice() — Scheduling Priority
- Orphan and Zombie Processes
- The UNIX Scheduler
- ps and top Commands
- fopen Under the Hood
- fork() + Files — What Gets Shared
- fork() + Heap Memory — COW
- Lab Code Walkthroughs
- Common Mistakes in Process Management
- Naming Reference — OS Functions
A process is a running instance of a program. The OS keeps track of every process using a process table — each process gets a unique PID (Process ID).
Program on disk (passive) → Process in RAM (active)
exe file sitting there → Running instance with memory, state, identity
Each process has 5 components:
| Component | What it holds |
|---|---|
| Code | The instructions to execute |
| Data | Global and static variables |
| Stack | Temporary data, function calls, local variables |
| User Area | Open files, signal handlers, CPU info |
| Page Table | Virtual → physical memory translation map |
Common confusion: "Aren't a program and a process both just code?"
No. A program is passive — just bytes sitting on disk. A process is the program brought to life in RAM with real resources.
Program = recipe (just text, does nothing)
Process = actually cooking (uses stove, ingredients, time)
Why you need a process and can't just "run the code" directly:
The code alone has no context. A process gives the code:
- Its own memory space — where variables live
- A stack — to track function calls
- A state — running? sleeping? waiting?
- An identity — PID, owner, permissions
- Resources — open files, I/O, signals
The same program can become multiple processes:
chrome.exe on disk = one file
Open Chrome 3 times = 3 separate processes
Each with own tabs, memory, state
All from the same single program file
Does the process edit the original program on disk?
Never. The OS copies needed parts into RAM and works entirely from that copy. The disk file stays untouched — it's read-only from the process's perspective.
If you change a username in an app and it affects other sessions — that's not the process editing the code. It's the process writing to a shared database or file on disk. The code itself is never touched. Other processes see the change because they all read from the same data source.
Process Virtual Memory:
┌─────────────────┐ ← high address
│ Stack │ temporary, grows downward
│ (local vars) │ dies when function returns
├─────────────────┤
│ Heap │ dynamic (malloc), grows upward
├─────────────────┤
│ Data Segment │ global + static variables
│ │ lives entire program lifetime
├─────────────────┤
│ Code Segment │ the executable instructions
└─────────────────┘ ← low address
Stack vs Heap vs Data:
int globalCounter = 0; // DATA segment — lives entire program
static int y = 10; // DATA segment — lives entire program
int main() {
int x = 5; // STACK — dies when main() returns
int* p = malloc(100); // HEAP — lives until free() is called
}

User Area contains:
- Which files are currently open (open file table)
- Signal handling rules (what to do on kill, alarm, etc.)
- CPU register values (saved when process is switched out)
Page Table maps virtual addresses → physical RAM addresses (explained in section 34).
Common confusion: "I put
int x = 5inside main() — why is it on the stack not the data segment?"
It's not about where in the file you wrote it. It's about lifetime.
int globalCounter = 0; // data segment — lives entire program
int main() {
int x = 5; // STACK — temporary, dies when main returns
static int y = 10; // DATA — lives entire program despite being inside main!
}

The static keyword forces a variable into the data segment regardless of where it's declared.
Why static exists — the problem it solves:
// WITHOUT static — resets every call:
void countClicks() {
int counter = 0; // created fresh every call
counter++;
printf("%d", counter); // always prints 1!
}
// WITH static — remembers between calls:
void countClicks() {
static int counter = 0; // created once, stays alive
counter++;
printf("%d", counter); // prints 1, 2, 3...
}Static vs Global:
Global: anyone anywhere can access and accidentally modify it ❌
Static: persists like a global BUT locked to its own scope ✓
int counter = 0; // global — any function can reset this!
void someOtherFunction() {
counter = 0; // oops, accidentally reset it
}

Static gives you persistence with protection — the best of both worlds.
Static in classes (C++):
class Player {
public:
inline static int playerCount = 0; // shared across ALL instances (C++17 inline static)
Player() { playerCount++; }
};
Player p1; // playerCount = 1
Player p2; // playerCount = 2
// same concept — one value living for entire program lifetime

The rule: `static` = "I need a permanent apartment, not a temporary Airbnb room." It goes to the data segment regardless of where you declared it, but the scope (who can access it) stays local.
Common question: "Why can't I just give each process real physical addresses directly?"
Early computers did exactly that — and it was a disaster. Here's why virtual memory was invented:
Without virtual memory:
Process 1 (bank app) at address 0x001
Process 2 (virus) just does:
int* steal = (int*)0x001 → reads bank app memory directly!
With virtual memory, each process only sees its own fake address space. The OS controls the page table — no process can reach another's memory. It's physically impossible at the hardware level.
Without virtual memory, every program would need to know at compile time exactly where in RAM it will be loaded. Impossible because:
Today you run: Chrome + VSCode + Spotify
Tomorrow: Chrome + Discord + Game
Next day: just VSCode
Different combinations → different available spaces →
you'd have to recompile every program every time!
With virtual memory, every program compiles assuming it starts at address 0. The page table maps it to wherever RAM is free.
RAM without virtual memory:
├── 0x000 - 0x100 USED
├── 0x100 - 0x200 FREE (100 units)
├── 0x200 - 0x400 USED
├── 0x400 - 0x500 FREE (100 units)
└── 0x500 - 0x600 FREE (100 units)
New process needs 250 units of CONTIGUOUS memory → FAILS!
Even though 300 units are free total.
With pages, free frames can be scattered anywhere but still appear contiguous to the process.
Without virtual memory:
RAM = 4GB, Program = 6GB → simply cannot run
With virtual memory + swapping:
OS loads only needed pages into RAM
Pushes unused pages to disk
Program runs fine!
The process NEVER knows its physical address.
↓
OS sits in the middle controlling everything through the page table.
↓
A process has NO mechanism to ask "what is my physical address?"
↓
The CPU itself enforces this at hardware level.
Virtual memory in one sentence: Every process lives in a completely isolated world controlled entirely by the OS, with no escape route to physical memory whatsoever.
The problem: Process thinks linearly. RAM is scattered.
int arr[3];
arr[0] // address 100
arr[1] // address 104 (just +4)
arr[2] // address 108 (just +4)

The CPU calculates the next address by simple addition. It has no idea how to jump around scattered physical locations. Virtual memory creates the illusion of contiguity.
How pages work:
Virtual Memory Physical RAM
┌──────────┐ ┌──────────┐
│ Page 1 │ ────────→ │ Frame 5 │
├──────────┤ ├──────────┤
│ Page 2 │ ────────→ │ Frame 2 │
├──────────┤ ├──────────┤
│ Page 3 │ ────────→ │ Frame 9 │
└──────────┘ └──────────┘
- Virtual memory divided into pages
- Physical RAM divided into frames
- Page table maps pages → frames
- Pages don't need to be contiguous in RAM!
Dynamic memory with page tables:
Process needs MORE memory:
OS finds any free frames anywhere in RAM
Adds new virtual pages → mapped to those frames
Process sees contiguous virtual addresses
Nobody else affected
Process needs LESS memory:
OS unmaps pages
Frames returned to free pool
Available for other processes immediately
Swapping — when RAM is full:
RAM is full, process needs more
↓
OS finds a page not used recently
Moves it from RAM → disk
That frame is now free
↓
When process needs that page back:
OS loads from disk → any free frame
Updates page table
Process has no idea this happened!
The process's perspective:
"I have pages 1, 2, 3, 4, 5"
"They all feel contiguous to me"
"Some might be in RAM, some on disk — I don't care"
Reality: completely scattered, some on disk, OS juggling everything
Virtual memory in one line: "Don't worry about the physical place — here are some contiguous virtual pages and I'll handle them."
fork()
↓
Ready/Runnable ←─────────────────────┐
↓ dispatched │ preempted
Running ─────────────────────────────→─┘
│
├── Blocked/Waiting (sleeping, waiting for event)
│ ↓ event occurs
│ Ready again
│
└── Exited/Terminated
↓
Zombie ──── cleaned by wait() ────→ gone
↓ if parent dies first
Orphan
↓ reparented to init (PID 1)
init calls wait() → cleaned
| State | Meaning |
|---|---|
| Running | Currently using the CPU |
| Ready | Could run anytime, waiting for CPU |
| Sleeping | Waiting for an event (like I/O or wait()) |
| Stopped | Frozen by a signal |
| Zombie | Finished but exit code not collected by parent yet |
| Orphan | Parent died before child finished |
1. System initialization
└── OS boots → creates init (PID 1) + system services
before you even see the desktop
2. fork() system call
└── a running process clones itself
most common in UNIX
3. User request
└── you double-click an app or type ./myprogram
OS creates a process for it
4. Batch job
└── scheduled task runs automatically at set time
like a cron job — nobody clicked anything
What happens internally for any creation:
1. Create new PCB entry → assigns unique PID
2. Allocate memory → sets up code, stack, data, user area, page table
3. Copy parent info (if forked) → child inherits parent's environment
4. Add to process table → scheduler can now see it
5. Return PID → parent gets child PID, child gets 0
Process hierarchy:
init (PID 1) ← created at boot, no parent
├── bash (PID 14) ← your terminal
│ ├── chrome (PID 100)
│ └── myprogram (PID 101)
│ └── child (PID 102)
└── system services
Every single process traces back to init (PID 1). It's the only process with no parent.
4 ways a process can end:
├── Normal exit → return 0 from main, or exit(0)
├── Error exit → exit(1) or some non-zero code
├── Fatal error → segfault, divide by zero, unhandled signal
└── Killed → kill -9 <pid> from another process
The PCB is a data structure the OS keeps for every process — the process's complete profile.
PCB for Process 101:
├── PID → 101
├── PPID → 100 (parent)
├── State → running/sleeping/zombie
├── CPU registers → exactly where it was when switched out
├── Program counter → which instruction is next
├── Stack pointer → where its stack is
├── Page Table → virtual → physical memory map
├── Open files → which files it has open
├── Signal handlers → what to do on signals
└── Priority/nice → how much CPU time to give it
Why PCB is essential — context switching:
Chrome running on CPU
↓
Scheduler: "time's up, Spotify's turn"
↓
OS saves Chrome's ENTIRE state to Chrome's PCB:
├── which instruction was executing
├── all register values
└── stack pointer
↓
Load Spotify's state from Spotify's PCB
↓
Spotify runs as if nothing happened
↓
... repeats hundreds of times per second
Without PCB the OS would have nowhere to save state — like waking up with amnesia every time a process gets CPU back.
fork() → creates new PCB
wait() → removes child's PCB after it exits
exit() → marks PCB as zombie until parent collects
nice() → updates priority field in the PCB
ps/top → reads from the process table (all PCBs)
Analogy: PCB is a hospital patient's file. Without it the doctor (OS) has no idea what was happening and has to start from scratch every time.
Processes run in User Mode — restricted, can only access their own data. But sometimes they need to do things only the kernel can do (create processes, read files, allocate memory).
A system call is the formal mechanism to ask the kernel for help:
Process (user mode)
↓
"hey kernel, I need something" ← system call
↓
CPU switches to kernel mode
Kernel validates the request
Kernel does the privileged work
CPU switches back to user mode
Result returned to process
Analogy: You can't go behind the bank counter yourself. You fill out a form (system call), the teller (kernel) does the work behind the barrier, and hands you back the result.
Why system calls are safe:
You (process) → never enter kernel mode yourself
Kernel → validates BEFORE doing anything
CPU hardware → physically enforces the boundary
Result → either success or safe -1 (failure)
System call vs normal function:
Normal function: runs inside your process memory, user mode
System call: crosses into kernel mode, privileged work
fork, wait, execl, exit, nice, open, read...
fork() creates an exact copy of the current process. One process goes in, two come out.
pid_t pid = fork();
if(pid == -1) {
// fork FAILED — no child created
} else if(pid == 0) {
// I am the CHILD — fork returned 0 to me
} else {
// I am the PARENT — fork returned child's actual PID
}

Common confusion: "Shouldn't the child's PID be 0?"
No! The 0 is just fork's return value — a flag saying "you are the child." The child's actual PID is still a real number. getpid() gives the real PID.
if(pid == 0) {
printf("%d", pid); // prints 0 (fork's return value, just a flag)
printf("%d", getpid()); // prints 101 (actual real PID!)
}

What gets copied:
Parent Process: Child Process (copy):
├── Code → ├── Code (same)
├── Stack → ├── Stack (same values)
├── Data → ├── Data (same values)
├── Heap → ├── Heap (independent copy via COW)
├── User Area → ├── User Area (same)
├── Page Table → ├── Page Table (own copy)
└── PID = 100 → └── PID = 101 (different!)
PPID = 100 (parent's PID)
After fork — completely independent:
int x = 10;
pid_t pid = fork();
if(pid == 0) {
x = 999; // child changes x
printf("%d", x); // prints 999
} else {
printf("%d", x); // prints 10! parent unaffected
}

Variables declared before fork are already in the child's memory — fork copies the entire snapshot at that moment. The child wakes up with everything already there, not a blank slate.
Why fork exists:
// Web server pattern — handles multiple users simultaneously:
while(1) {
wait_for_connection();
if(fork() == 0) {
handle_this_user(); // child handles ONE user
exit(0);
}
// parent loops back immediately to accept next user
}

Without fork: User 1 → wait → User 2 → wait → User 3 (sequential, unusable)
With fork: User 1 → child 1
User 2 → child 2 ← all simultaneously!
User 3 → child 3
Forking in a loop for parallel work:
pid_t children[n];
// spawn all workers
for(int i = 0; i < n; i++) {
children[i] = fork();
if(children[i] == 0) {
do_work(chunk[i]);
exit(result);
}
}
// collect results IN ORDER using waitpid
for(int i = 0; i < n; i++) {
waitpid(children[i], &status, 0); // wait for SPECIFIC child
results[i] = WEXITSTATUS(status);
}

Never put waitpid inside the fork loop — it makes everything sequential. Fork all first, then collect. The whole point is getting all workers running simultaneously before collecting.
fork() is not dangerous because:
fork() only:
├── copies YOUR OWN process → no other process touched
├── allocates new memory → kernel controls safely
├── creates new PCB → kernel owns process table
└── returns two integers → harmless
wait() makes the parent block until a child finishes, then collects its exit code and cleans up its PCB entry.
pid_t var_1 = wait(&var_2);
// var_1 = PID of child that finished
// var_2 = raw status (contains exit code packed in)

What happens internally:
parent calls wait()
↓
kernel blocks parent: "sleep until a child finishes"
↓
child calls exit(42)
↓
kernel wakes parent: "your child just finished!"
↓
kernel removes child's PCB from process table
↓
returns to parent:
├── child's PID → var_1
└── exit status → var_2
The status byte structure:
var_2 (raw integer from wait):
[ exit code (next-higher byte) ] [ how it exited (lowest byte) ]
lowest byte = 0 → exited normally (called exit() or return)
lowest byte ≠ 0 → killed by signal
stat_loc & 0x00FF → isolates the lowest byte (check for normal exit)
stat_loc >> 8 → extracts the exit code (throws away the lowest byte)
Macros that do the same thing:
// manual:
if(!(stat_loc & 0x00FF))
printf("%d", stat_loc >> 8);
// using macros (identical result):
if(WIFEXITED(stat_loc))
printf("%d", WEXITSTATUS(stat_loc));WIFEXITED = "Wait — If — Exited normally?" → yes/no WEXITSTATUS = "Wait — Exit — Status" → the actual code
Always check WIFEXITED before WEXITSTATUS — no point reading exit code if process was killed by signal.
wait() handles both cases:
Child still running when wait() called:
→ parent blocks until child finishes
Child already finished before wait() called:
→ OS saved the exit code in the zombie
→ wait() returns IMMEDIATELY with saved status
→ no blocking needed!
waitpid() — wait for specific child:
waitpid(child2, &status, 0); // only waits for child2
// child1 and child3 finishing → parent ignores them
// child2 finishes → parent wakes up!

Without wait — zombie problem:
Child finishes
↓
OS: "hold on, parent might need your exit code"
↓
Keeps child's PCB in process table as ZOMBIE
↓
Parent never calls wait() → zombie stays forever
↓
Process table fills up → system slows → eventually crashes
Zombie vs Orphan:
- Zombie = child died, parent hasn't called wait() yet
- Orphan = parent died, child still running → reparented to init
When parent dies with unresolved zombies:
Parent dies
↓
init (PID 1) inherits all zombies
↓
init automatically calls wait() → cleaned up!
Zombies are only dangerous in long-running processes (servers) that fork many children but never call wait() — zombies accumulate until the process table is full.
exit(42); // sends 42 as exit code to parent

What exit() does:
1. Runs atexit() handlers
2. Flushes stdio buffers
3. Closes all open file descriptors
4. Releases all memory
5. Sends SIGCHLD to parent
6. Passes exit code to parent via wait()
7. Removes from process table (or becomes zombie if parent not listening)
_exit() vs exit():
`_exit()` skips steps 1-2 (no buffer flushing, no atexit handlers). Use it in a child after fork to avoid double-flushing stdio buffers.
Exit code limitation:
exit code is only 8 bits → max value 255!
exit(300) → truncated to 44 (300 % 256 = 44)
WEXITSTATUS = 44 → WRONG!
If your result can exceed 255, use pipes or shared memory instead of exit codes.
execl() replaces the current process with a completely different program. Not a copy — the process itself gets overwritten.
execl("/bin/ps", "ps", "-e", NULL);
//      path     name  flag  end

| Parameter | Meaning |
|---|---|
| First | Absolute path to binary on disk |
| Second | Name of the program (usually same as binary name) |
| Middle params | Any flags, each in "" separated by commas |
| Last | Always NULL — marks end of arguments |
What happens:
execl() called
↓
kernel finds /bin/ps on disk
↓
loads ps into THIS process's memory:
├── replaces code segment ← your code is GONE
├── replaces data segment ← your variables are GONE
└── replaces stack ← your stack is GONE
↓
starts executing ps from the beginning
What stays the same:
REPLACED: KEPT:
├── code segment ├── PID (same process!)
├── data segment ├── PPID
├── stack ├── open files
└── heap └── user area
Never returns on success:
printf("before execl\n");
execl("/bin/ps", "ps", NULL);
printf("after execl\n"); // THIS NEVER PRINTS!
// because the process that would print it no longer exists

execl() only returns if it fails (wrong path, no permission) — it returns -1.
This is why execl is always used WITH fork:
if(fork() == 0) {
execl("/bin/ps", "ps", "-f", NULL); // child gets replaced by ps
} else {
wait(NULL); // parent survives, waits for ps to finish
}

Without fork: your program → execl → your program GONE
With fork: parent → survives
child → execl → replaced by ps → runs → dies
parent → wakes from wait → continues
This is literally how your terminal runs every command:
bash (parent)
↓ fork()
child → execl("ls") → replaced by ls → runs → dies
↓ wait()
bash (parent) → ready for next command
nice(10); // lower priority by 10
nice(-5); // raise priority by 5 (superuser only!)

nice value range:
-20 (highest priority, superuser only)
0 (default, normal)
19 (lowest priority)
higher nice = more "generous" = gives CPU to others = less CPU for you
lower nice = more "selfish" = takes CPU = more CPU for you
getpriority() — read current nice value:
#include <sys/resource.h>
getpriority(PRIO_PROCESS, 0)
// PRIO_PROCESS = get priority of a process
// 0 = this process itself
// returns current nice value

In ps -l output:
PID PR NI CMD
100 20 0 process08 ← default
100 35 15 process08 ← after nice(15)
PR = PR_base + NI (NI affects actual scheduling priority)
Parent dies before child finishes
↓
Child still running, parent is gone
↓
OS automatically reparents child to init (PID 1)
↓
init eventually calls wait() → cleans up properly
Detecting orphan in code:
sleep(3);
printf("parent PID: %d\n", getppid());
// if this prints 1 → you are an orphan!

Why orphans are a problem:
├── Nobody managing the child anymore
├── Resource leaks (open files, connections)
├── If child finishes → becomes zombie under init
├── Loss of control over running process
└── Accumulation can fill process table
Child finishes
↓
Parent alive but not calling wait()
↓
OS keeps child's PCB as zombie:
├── code not running → dead
├── memory freed → dead
└── PCB still in table → NOT fully gone
exit code preserved → waiting for parent
$ ps
PID STATE CMD
101 Z myprogram ← Z = ZOMBIE

Why zombie exists: OS is being cautious — it doesn't want to throw away the exit code in case the parent needs it.
Resolution:
Parent calls wait() → zombie cleaned immediately
Parent never calls wait() → zombie stays until parent dies
Parent dies → init inherits zombie → init cleans it
Key insight: A living but inattentive parent is worse than a dead parent. Dead parent → init takes responsibility. Alive parent → OS waits for parent to call wait(). If parent sleeps for 10 seconds first, child is a zombie for those 10 seconds.
UNIX uses multi-level priority + Round Robin within each level.
Multi-level priority:
Priority 0 (highest) → critical system processes
Priority 10 → normal user processes
Priority 15 (lowest) → background tasks
Higher priority runs STRICTLY FIRST:
└── lower priority doesn't get CPU while higher is ready
Round Robin within same level:
Priority 10 queue:
Chrome → runs 2ms → back of queue
Spotify → runs 2ms → back of queue
VSCode → runs 2ms → back of queue
Chrome → runs 2ms → ...repeats
Each gets a fair time slice. Nobody starves.
Simultaneous or sequential?
Different priority levels → sequential (higher goes first)
Same priority level → simultaneous illusion (round robin)
In reality, processes at similar priorities all feel simultaneous because the switching happens hundreds of times per second — faster than human perception.
$ ps # only YOUR processes in current terminal (static snapshot)
$ ps -e # ALL processes on entire system (static snapshot)
$ ps -f # full detailed view with extra columns
$ ps -ef # full view of everything
$ top # ALL system processes, live updating view

ps columns:
| Column | Meaning |
|---|---|
| PID | Process ID |
| PPID | Parent's PID |
| TTY | Which terminal it's running in |
| TIME | CPU time consumed |
| CMD | Command that started it |
| NI | Nice value (from ps -l) |
| STATE | R=running, S=sleeping, Z=zombie |
top shows extra:
%CPU → how much CPU right now
%MEM → how much RAM using
Tasks summary → total, running, sleeping, stopped, zombie count
ps = photo, top = live video
Most common real-world use:
# something is slow
$ top # find what's eating CPU
# kill a specific process
$ ps -e # find its PID
$ kill -9 <PID> # force kill
# check if your program is running
$ ps -e | grep myprogram # filter output

When you call fopen("file.txt", "r"), two separate objects are created:
The FILE struct — lives on your heap. Created by fopen(), returned as FILE*.
// roughly what FILE looks like:
struct _IO_FILE {
int fd; // the kernel fd integer (e.g. 3)
char buffer[8192]; // userspace buffer (8KB)
char* buf_pos; // where we are in the buffer
int buf_level; // how full the buffer is
int flags; // eof, error, etc.
};

This is what adds buffering on top of raw file access. Actual syscalls only fire when the buffer fills/empties — not on every character.
The kernel's struct file — lives in kernel memory. You never touch it directly, only through syscalls.
struct file contains:
├── offset/cursor ← current read/write position
├── open flags ← O_RDONLY, O_APPEND, etc.
├── reference count ← how many fds point to it
└── pointer to inode ← the actual file on disk
The inode — the file itself. Exists on disk whether anyone has it open or not.
inode contains:
├── file size
├── permissions (rwx)
├── owner (UID, GID)
├── timestamps
└── pointers to data blocks on disk
Your code
│
↓
FILE* f ──→ [FILE struct — your heap]
fd = 3
buffer = [...]
│ syscall (read, write, lseek)
↓
[struct file — kernel memory] ← you never touch this
offset = 512
refcount = 1
│
↓
[inode — filesystem on disk]
size, permissions, data blocks
A file descriptor is just an integer index into the process's fd table. 0, 1, 2 are always stdin, stdout, stderr. Every open() gives you the next available integer.
fd = 3 → just a number
OS uses it as a key to look up the struct file in kernel
What fclose() does:
1. Flushes stdio buffer → write() syscall (unwritten data saved)
2. Calls close(fd) → kernel decrements refcount on struct file
3. Frees the FILE struct from heap
4. If refcount hits 0 → kernel frees struct file
(inode stays alive on disk regardless)
fopen() → refcount = 1
fork() → refcount = 2 (parent + child both hold fd)
parent close → refcount = 1
child close → refcount = 0 → struct file freed
The critical difference between memory and files after fork:
Memory (heap, stack, etc.) → COPIED (independent per process via COW)
FILE* / file descriptor → SHARED (same struct file in kernel)
What happens to a file opened before fork:
FILE* file = fopen("data.txt", "r"); // opened BEFORE fork
fork();

Parent FILE struct → fd 3 ──→ ┐
├── SAME struct file (kernel)
Child FILE struct → fd 3 ──→ ┘ shared cursor! refcount = 2
The FILE struct is COPIED (it's on your heap) but the struct file (kernel object) is NOT copied — just refcount incremented. The cursor is in the kernel object — so it's shared.
The shared cursor problem:
// file has: "Hello World"
// both parent and child have same cursor at position 0
if(pid == 0) {
fgets(buffer, 5, file); // child reads "Hello", cursor → 5
}
else {
fgets(buffer, 5, file); // parent reads "World"! cursor was already at 5!
}Solutions:
// Option 1: open file AFTER fork (each gets independent struct file)
fork();
if(pid == 0) { FILE* f = fopen("data.txt", "r"); }
// Option 2: close unused copy immediately after fork
fork();
if(pid == 0) {
fclose(file); // child doesn't need it → close
// do other work
}

Why your code works despite the shared cursor:
Parent reads file BEFORE fork each iteration:
fgets() → parent fills orders_arr
THEN fork happens
Child inherits cursor position BUT never reads file
Child just closes its reference (refcount decrements)
Parent continues reading next chunk normally
The parallelism is in computation, not I/O. Parent serializes all I/O, children only work on already-read data.
fclose() in child — do you need it?
Calling fclose() in child:
└── decrements refcount → from 2 to 1
parent's reference unaffected
file still open for parent ✓
NOT calling fclose() in child:
└── exit() handles it anyway → refcount decremented on exit
no real difference
Best practice: In child processes that exit immediately, you don't need to manually fclose(). exit() handles all cleanup. Use `_exit()` instead of `exit()` in children after fork to avoid double-flushing stdio buffers.
Copy-on-Write (COW) — fork doesn't actually copy all memory immediately. It's expensive. Instead:
fork() called:
parent page → [physical page A] ← both point here, read-only
child page → [physical page A]
child writes to a variable:
parent page → [physical page A] ← parent unchanged
child page → [physical page B] ← kernel copied ONLY this page
Forking is cheap until you start writing. Pages that neither process writes to are never duplicated — they stay as one shared physical page.
For malloc'd memory:
int* ptr = malloc(sizeof(int));
*ptr = 42;
fork();

After fork:
Parent virtual 0x500 → page table A → physical frame A (value: 42)
Child virtual 0x500 → page table B → physical frame A (value: 42) ← same!
Child writes *ptr = 99:
→ COW triggers
→ kernel copies frame A to frame B
→ Child virtual 0x500 → frame B (value: 99)
→ Parent virtual 0x500 → frame A (value: 42) ← unaffected!
Do you need to free() in the child?
Child never writes to malloc'd memory:
→ COW never triggers
→ one shared physical page (no duplication)
→ no need to free (exit() handles it)
Child writes to malloc'd memory:
→ COW triggers → physical page duplicated
→ child owns its own copy
→ theoretically should free, but exit() handles it anyway
Best practice: just exit() — let the OS clean up
Don't call free() in child — you're paying COW cost for zero benefit
(free() modifies heap metadata → triggers COW on that page → pointless copy)
free() — what it actually does:
1. Marks block as available in heap allocator (userspace, virtual)
2. May eventually call munmap() to release physical pages to kernel
(or it might not — allocator often keeps pages for reuse)
So free() is primarily a userspace operation.
Physical memory release is a side effect that may or may not happen immediately.
int x = 3;
pid = fork();
// child: x = 7 (only in child's memory)
// parent: x = 19 (only in parent's memory)
// proves: complete memory independence after fork

`sleep(1)` in the parent — gives the child time to finish so output isn't mixed up.
// child sleeps 10s, parent has nothing to do and exits
sleep(10);
printf("parent PID: %d\n", getppid()); // prints 1! reparented to init

pid = wait(&stat_loc);
if(!(stat_loc & 0x00FF))         // low byte == 0 → child exited normally
    printf("%d", stat_loc >> 8); // exit code lives in bits 8–15

exit(42); // child sends specific exit code
// parent reads: stat_loc >> 8 == 42

// REVERSED timing:
// child finishes immediately → becomes zombie
// parent sleeps 10 seconds → not listening!
// zombie visible in ps for 10 seconds
// parent wakes → calls wait() → zombie cleaned
sleep(10);
pid = wait(&stat_loc);

if(pid == 0) {
execl("/bin/ps", "ps", "-e", NULL); // child replaced by ps
// child never reaches final printf — its code is gone!
}
// only parent reaches: printf("PID %d terminated\n")

while(1) {} // both parent and child loop forever
// purpose: observe with ps/top, practice kill command
// kill parent → child becomes orphan (PPID changes to 1)
// kill child → disappears from ps

getpriority(PRIO_PROCESS, 0) // read current nice value
system("ps -l"); // show NI column in ps
nice(5); // child: slightly lower priority
nice(15); // parent: much lower priority
while(1) {} // stay alive for observation

system("ps -l") output:
PID NI CMD
100 0 process08 ← before nice
100 15 process08 ← after nice(15) — much less CPU
101 5 process08 ← after nice(5) — slightly less CPU
// 1. forgetting wait() → zombie accumulation
fork();
// parent does nothing after fork
// child dies → zombie stays forever ❌
// fix: always call wait() or waitpid() ✓
// 2. wait() inside fork loop → kills parallelism
for(int i = 0; i < n; i++) {
fork();
wait(&status); // ❌ blocks until child done before next fork
}
// fix: fork loop first, wait loop second ✓
// 3. exit code truncation for large values
exit(300); // ❌ truncated to 44 (300 % 256)
WEXITSTATUS(status) == 44; // wrong answer!
// fix: use pipes or shared memory for values > 255
// 4. reading file in both parent and child (shared cursor)
FILE* f = fopen("file.txt", "r");
fork();
if(pid == 0) fgets(buffer, 10, f); // moves shared cursor ❌
// parent's next read is at wrong position!
// fix: open file after fork, or only read in one process
// 5. freeing in child unnecessarily
fork();
if(pid == 0) {
free(ptr); // ❌ triggers COW for no benefit
exit(0); // exit() handles cleanup anyway
}
// 6. execl without fork → your program is gone
execl("/bin/ps", "ps", NULL); // ❌ your process is replaced!
// fix: always fork first, execl in child
// 7. not checking execl return
execl("/bin/ps", "ps", NULL);
// if we reach here → execl failed!
// always add error handling after execl
// 8. forgetting NULL at end of execl
execl("/bin/ps", "ps", "-e"); // ❌ missing NULL → undefined behavior
execl("/bin/ps", "ps", "-e", NULL); // ✅
// 9. using exit() instead of _exit() in child after fork
// exit() flushes stdio buffers → double flush if parent also exits!
_exit(0); // ✅ skips buffer flush in child

| Name | Stands For |
|---|---|
| `fork()` | Fork (split into two) |
| `wait()` | Wait for child state change |
| `waitpid()` | Wait for specific PID |
| `execl()` | Execute (list of args) |
| `exit()` | Exit process |
| `_exit()` | Exit directly (no cleanup) |
| `getpid()` | Get Process ID |
| `getppid()` | Get Parent Process ID |
| `getpgrp()` | Get Process Group ID |
| `nice()` | Adjust nice value (priority) |
| `getpriority()` | Get current priority |
| `sleep()` | Suspend for N seconds |
| `kill()` | Send signal to process |
| `fopen()` | File Open |
| `fclose()` | File Close |
| `fgets()` | File Get String |
| `WIFEXITED()` | Wait — If — Exited normally? |
| `WEXITSTATUS()` | Wait — Exit — Status (the code) |
| `perror()` | Print Error (with system message) |
| `system()` | Run shell command from C |
| `mmap()` | Memory Map (shared memory) |
| `PID` | Process ID |
| `PPID` | Parent Process ID |
| `PCB` | Process Control Block |
| `COW` | Copy On Write |
| `fd` | File Descriptor |
| `NI` | Nice value (ps column) |
| `PR` | Priority (ps column) |
| `TTY` | Teletype (terminal) |
END OF OS PROCESS MANAGEMENT SECTION 🎓
This section covers UNIX process management from first principles — built through real questions, common confusions, and deep dives into how everything works under the hood. From virtual memory to fork/exec/wait, from zombies to COW, from scheduling to file descriptors.
END OF GUIDE 🎓
This comprehensive guide covers everything from the foundational concepts of how Linux works to advanced bash scripting techniques, common pitfalls, and best practices. Keep it as a reference for your computer engineering journey!