
A deep-dive guide into the Linux ecosystem, from terminal mechanics and state-machine regex to advanced Bash scripting. This reference bridges the gap between basic commands and the underlying C-based logic of the shell. Perfect for developers looking to master automation, text processing, and system-level operations.

seif-a096/into-the-kernel


🐧 Linux & Bash Scripting — Complete Guide

A comprehensive reference covering Linux commands, how the terminal works under the hood, regular expressions, and bash scripting. Everything from the basics to the science behind how it all works.


📚 Table of Contents

  1. How the Terminal Works
  2. The Shell — What It Is and Why It Exists
  3. Bash — A Specific Shell
  4. How Commands Are Executed
  5. The /bin Directories
  6. Environment Variables and PATH
  7. stdin, stdout, stderr
  8. Essential Linux Commands
  9. File Permissions and chmod
  10. Redirection and Pipes
  11. Text Processing Commands
  12. Regular Expressions
  13. How the Regex State Machine Works
  14. Bash Scripting
  15. Bash Variables and Parameter Expansion
  16. Arithmetic Expansion
  17. String Manipulation
  18. Arrays
  19. Control Flow
  20. Loops
  21. Functions
  22. Common Syntax Errors and Pitfalls
  23. The Three Types of Conditionals
  24. Loop Types Deep Dive
  25. Command Substitution vs Function Calls
  26. Number System Conversions
  27. Special Operators Reference
  28. Source vs Execute

1. How the Terminal Works

When you type a command in the terminal, a program called the shell is always running and listening. When you press Enter, it goes through these steps:

You type a command
        ↓
Shell reads and tokenizes it (splits into pieces)
        ↓
Shell interprets special characters (|, >, $, *)
        ↓
Shell finds the program in the bin directories
        ↓
Shell forks (creates a copy of itself)
        ↓
The copy replaces itself with the target program (exec)
        ↓
Original shell waits for it to finish
        ↓
Output is returned to the screen

Compiled vs Interpreted

| Type | How it runs | Example |
|---|---|---|
| Compiled | Translated into machine code before running; the CPU runs the binary directly | grep, ls, cat (C programs) |
| Interpreted | Read and executed line by line at runtime by an interpreter | Bash scripts (.sh) |

Key insight: Commands like grep, ls, cat are pre-compiled C programs stored on your disk. What you type in the terminal are just arguments passed to those programs — they are data, not code that needs compiling.

How a C Program Receives Your Input

Every compiled C program has a main function that receives what you type:

int main(int argc, char *argv[], char *envp[])
  • argc — number of arguments
  • argv — array of what you typed
  • envp — array of environment variables

So when you type grep -i "John" myfile.txt:

argv[0] = "grep"         // program name
argv[1] = "-i"           // first argument
argv[2] = "John"         // second argument
argv[3] = "myfile.txt"   // third argument

Fork and Exec

Every command you run in bash is a separate process created through two system calls:

  • fork() — creates an exact copy of the current bash process in memory
  • exec() — the copy replaces itself with the target program
bash process
      ↓
   fork()
      ↓
bash process  +  copy of bash
                      ↓
                   exec()
                      ↓
              grep process runs
                      ↓
              finishes, bash resumes
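You can watch the fork from bash itself. This is a small sketch using `$BASHPID` (available in bash 4+), which always reports the current process's PID, while `$$` keeps the parent shell's PID even inside a subshell:

```shell
#!/bin/bash
echo "parent shell PID: $$"

# A subshell ( ... ) is a forked copy of bash, so it reports a different PID.
( echo "forked subshell PID: $BASHPID" )
```

Every external command like grep gets its own PID through the same fork-then-exec sequence.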

2. The Shell — What It Is and Why It Exists

The shell is any program that sits between you and the operating system kernel. It is called a shell because it wraps around the OS like a shell, letting you communicate with the kernel without talking to it directly.

You → Shell → Kernel → Hardware

The kernel is the actual core of the OS that controls hardware, memory, and processes. You cannot talk to it directly. The shell is the layer that translates your human-readable commands into kernel operations through system calls.

Types of Shells

| Shell | Full Name | Notes |
|---|---|---|
| sh | Bourne Shell | The original, very basic |
| bash | Bourne Again Shell | Most common on Linux |
| zsh | Z Shell | Default on macOS, more features |
| fish | Friendly Interactive Shell | Beginner friendly |
| ksh | Korn Shell | Enterprise use |
| CMD | Command Prompt | Windows shell |
| PowerShell | PowerShell | Advanced Windows shell |

Analogy: Shell is the general concept (like SQL), bash is a specific flavor of it (like MySQL). CMD and PowerShell are to Windows what bash and zsh are to Linux.

Important Note: Bash is just ONE type of shell program. The generic term is "shell" - bash is a specific implementation, like how "car" is generic and "Toyota" is specific.


3. Bash — A Specific Shell

Bash stands for Bourne Again Shell — a joke name because it replaced the original Bourne Shell written by Stephen Bourne. So bash is literally the "born again" shell.

Bash itself is a compiled C program stored at /bin/bash. When you run a bash script, bash reads your script file line by line and for each line it:

  1. Reads the line as text
  2. Parses it to understand the structure
  3. Finds the appropriate compiled binary
  4. Executes it via fork and exec
  5. Moves to the next line
#!/bin/bash
echo "Hello"      # echo is a bash builtin, so bash runs it itself
ls -l             # bash finds /usr/bin/ls, runs it
grep "x" file     # bash finds /usr/bin/grep, runs it
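One caveat worth knowing: some commands (echo, cd, pwd) are built into bash itself and never trigger a /bin lookup or fork/exec. The `type` builtin shows which is which:

```shell
type -t echo    # builtin  (bash runs it internally)
type -t ls      # file     (an external binary found via PATH)
type ls         # prints the full path, e.g. "ls is /usr/bin/ls"
```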

The Shebang #!/bin/bash

The first line of every bash script is the shebang (also called hashbang).

#!/bin/bash
  • # is called hash (or sharp in music)
  • ! is called bang (old typographer slang)
  • Together: sharp + bang = shebang

What it does: When the OS sees #! at the very first line, it reads the path after it and uses that program to interpret the file.

You run: ./myscript.sh
        ↓
OS reads first line: #!/bin/bash
        ↓
OS loads /bin/bash and passes the script to it
        ↓
bash ignores first line (# makes it a comment)
        ↓
bash starts interpreting from line 2

The shebang is not bash-specific. Any interpreter can be used:

#!/bin/bash          # run with bash
#!/usr/bin/python3   # run with python
#!/usr/bin/node      # run with nodejs

4. How Commands Are Executed

What You Type Is Just Data

The compiled programs (grep, ls, etc.) are built once and sit on your disk. What you type in the terminal is just data being passed into those programs — arguments, strings, filenames. Data does not need to be compiled.

compiled grep = "I know how to search ANY text for ANY pattern"
                (compiled once, sits on disk)

"John" myfile.txt = the specific pattern and file right now
                    (just data, no compilation needed)

Analogy: A meat grinder is the compiled program, built once. Whatever meat you put in is the data. The grinder doesn't need to be rebuilt for different types of meat.

Interpreted Scripts vs Compiled Programs

| | Compiled (e.g. grep) | Interpreted (e.g. bash script) |
|---|---|---|
| Speed | Fast, CPU runs directly | Slower, interpreter overhead |
| Needs interpreter? | No | Yes |
| After change | Must recompile | Run immediately |
| Stored as | Binary machine code | Plain text |

Analogy: Compiled is like translating an entire book into English first, then reading it. Interpreted is like having a translator sit next to you, reading one sentence at a time.


5. The /bin Directories

bin stands for binaries — pre-compiled C programs that the CPU can directly execute.

| Directory | Contents |
|---|---|
| /bin | Essential system binaries (ls, cat, grep, cp, rm) |
| /usr/bin | Regular user program binaries |
| /usr/local/bin | Manually installed binaries |
| /sbin | System administration binaries (root only) |

You can find where any command lives:

which grep
# /usr/bin/grep

which ls
# /usr/bin/ls

6. Environment Variables and PATH

What an Environment Variable Is

When any process starts, the OS gives it a block of memory called the environment — a list of key=value pairs:

HOME=/home/seif096
USER=seif096
PATH=/usr/bin:/bin

In C, these are accessible as the third argument to main:

int main(int argc, char *argv[], char *envp[])
// envp[] contains all environment variables

Variable Scope

Type Visible to
NAME="x" Current bash process only
export NAME="x" Current process + all child processes
Defined in ~/.bashrc All terminals for your user
Defined in /etc/environment Every process on the system

Variables flow downward from parent to child only. If a child changes a variable, the parent never sees it — each process has its own copy.
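A quick way to see this parent-to-child flow, using `bash -c` to start a child process (single quotes so the child, not the parent, expands the variables):

```shell
FOO="local-only"          # plain variable: stays in this shell
export BAR="exported"     # exported: copied into every child's environment

bash -c 'echo "FOO=[$FOO] BAR=[$BAR]"'
# FOO=[] BAR=[exported]   — the child never sees FOO
```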

The PATH Variable

$PATH is bash's dictionary of where to look for commands:

echo $PATH
# /usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin

When you type grep, bash searches each directory in PATH in order until it finds the binary:

type "grep"
        ↓
search /usr/local/bin  → not found
        ↓
search /usr/bin        → found! /usr/bin/grep
        ↓
run it

You can add your own programs to this dictionary:

PATH=$PATH:/home/seif096/myprograms

System Configuration Files

| File | When it runs |
|---|---|
| /etc/environment | Every user, system-wide |
| /etc/profile | Every user at login |
| ~/.bashrc | Your user, every terminal |
| ~/.bash_profile | Your user, at login |

7. stdin, stdout, stderr

Every process in Linux gets three standard channels automatically:

| Channel | Name | Meaning |
|---|---|---|
| stdin | Standard Input | Data flowing into the program |
| stdout | Standard Output | Data coming out of the program |
| stderr | Standard Error | Error messages coming out |

The pipe | connects stdout of one program to stdin of the next:

cat myfile.txt     |      grep "John"
       ↓                       ↑
    stdout      connects     stdin
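The numbers behind the names matter for redirection later: stdin is file descriptor 0, stdout is 1, stderr is 2. A minimal sketch separating the two output channels (`>&2` sends output to fd 2, a form covered more fully in the redirection section):

```shell
echo "normal output"          # goes to stdout (fd 1)
echo "error output" >&2       # goes to stderr (fd 2)

# Redirect each channel independently:
{ echo "ok"; echo "bad" >&2; } > out.log 2> err.log
cat out.log    # ok
cat err.log    # bad
```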

8. Essential Linux Commands

pwd — Print Working Directory

pwd
# /home/seif096/documents

Displays the full path of the directory you are currently in. Think of it as asking your GPS "where am I right now?"


ls — List Directory Contents

ls stands for list.

ls                    # list files in current directory
ls /home              # list files in specific path
ls -l                 # long format (detailed view)
ls -a                 # show hidden files (starting with .)
ls -la                # combine options

Understanding ls -l Output

-rwxr-xr-x  2  john  staff  4096  Jan 10 12:00  myfile.txt
| Part | Meaning |
|---|---|
| -rwxr-xr-x | File type + permissions |
| 2 | Number of hard links |
| john | Owner (user) |
| staff | Group |
| 4096 | File size in bytes |
| Jan 10 12:00 | Last modified date/time |
| myfile.txt | File name |

File Type (First Character)

| Character | Meaning |
|---|---|
| - | Regular file |
| d | Directory |
| l | Symbolic link |

Real Example

-rw-rw-r-- 1 seif096 seif096 0 Feb 13 18:28 test.txt
  • - — regular file
  • rw- — owner can read and write
  • rw- — group can read and write
  • r-- — others can only read
  • 1 — one hard link
  • seif096 / seif096 — owner and group both are seif096 (Linux creates a group with same name as user automatically)
  • 0 — file is empty (0 bytes)
  • Feb 13 18:28 — last modified today

cat — Concatenate

cat stands for concatenate. Original purpose was joining files, but most commonly used to display file contents.

cat file.txt                        # display file contents
cat file1.txt file2.txt             # display both files
cat file1.txt file2.txt > combined  # join and save
cat -n myfile.txt                   # display with line numbers

Important Note: cat merges files vertically (stacks them one after another), while paste merges horizontally (side-by-side columns).


cp — Copy

cp source.txt destination/           # copy file to directory
cp file1.txt file2.txt destination/  # copy multiple files
cp -r source_dir/ dest_dir/          # copy directory recursively
cp -p file.txt destination/          # preserve attributes (permissions, timestamps)
cp -v file.txt destination/          # verbose (show what's being copied)

PATH TIP: When you're already inside a directory, prefer relative paths:

cd ~/Desktop/
cp words.txt numbers.txt Lab1/           # ✅ relative paths — short and clear

# works, but redundant:
cp ~/Desktop/words.txt ~/Desktop/Lab1/   # ❌ full paths you don't need

Why? You're already in ~/Desktop/, so words.txt is in the current directory. Repeating the full path is redundant and error-prone.

Important: cp creates duplicates - both the original and the copy exist independently on disk, each taking up space.


mv — Move

mv oldname.txt newname.txt     # rename file
mv file.txt /destination/      # move to different location
mv file.txt /dest/newname.txt  # move and rename

Key Difference from cp: mv does NOT create a duplicate. The original file is removed from the source location. Only one copy exists on disk.


rm — Remove

rm file.txt          # delete a file
rm -r myfolder/      # delete directory and everything inside

-r means recursive — goes through all contents inside a directory, and all contents inside those, all the way down:

myfolder/
├── file1.txt         ← deleted
├── file2.txt         ← deleted
└── subfolder/
    ├── file3.txt     ← deleted
    └── deeper/
        └── file4.txt ← deleted

Think of it like a tree: recursive means "go deep into every level and repeat the same action."


9. File Permissions and chmod

Permission Groups

Every file has permissions for three groups:

| Group | Symbol in chmod | Who |
|---|---|---|
| Owner/User | u | The person who owns the file |
| Group | g | Users in the file's associated group |
| Others | o | Everyone else on the system |
| All | a | All three at once |

Permission Types

| Permission | Symbol | Value | Meaning for file | Meaning for directory |
|---|---|---|---|---|
| Read | r | 4 | View contents | List contents |
| Write | w | 2 | Modify file | Create/delete files inside |
| Execute | x | 1 | Run as program | Enter with cd |
| None | - | 0 | No permission | No permission |

What Is a Group?

A group is a collection of users. For example, all developers in a company might be in a group called developers. This makes it easy to give all of them the same permissions without setting permissions individually.

Linux asks three questions: Are you the owner? Are you in the group? Or are you everyone else? Each gets their own permissions.

chmod — Change Mode

Symbolic method:

chmod u+x myfile.txt      # add execute for owner
chmod g-w myfile.txt      # remove write from group
chmod o+r myfile.txt      # add read for others
chmod a+x myfile.txt      # add execute for everyone
chmod ugo+x myfile.txt    # same as a+x
chmod a-r myfile.txt      # remove read from everyone
chmod a+r myfile.txt      # restore read for everyone

Operators:

  • + add permission
  • - remove permission
  • = set exact permission

Numeric method:

Add up the values for each group:

r = 4
w = 2
x = 1
- = 0
| Permission | Calculation | Value |
|---|---|---|
| rwx | 4+2+1 | 7 |
| rw- | 4+2+0 | 6 |
| r-x | 4+0+1 | 5 |
| r-- | 4+0+0 | 4 |

chmod 764 myfile.txt
# owner: rwx (7)
# group: rw- (6)
# others: r-- (4)

The file -rw-rw-r-- has numeric value 664.
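You can check the arithmetic yourself. This assumes GNU coreutils (`stat -c '%a'` prints octal permissions; BSD/macOS stat uses different flags):

```shell
touch demo.txt
chmod 764 demo.txt
ls -l demo.txt          # -rwxrw-r-- ... demo.txt
stat -c '%a' demo.txt   # 764
```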

Difference Between o and a

| Symbol | Affects |
|---|---|
| o | Others only (third group) |
| a | All three: user + group + others |

chmod o+w myfile.txt    # only others get write
chmod a+w myfile.txt    # everyone gets write

Common Permission Error Pattern

Problem: You remove read permission with chmod a-r, then try to read the file:

chmod a-r SortedMergedContent.txt
cat SortedMergedContent.txt
# Permission denied ❌

Why it fails: Without read permission, even YOU (the owner) cannot read the file.

Solution: Restore read permission:

chmod a+r SortedMergedContent.txt  # restore read for all
# OR
chmod u+r SortedMergedContent.txt  # restore read for owner only

Important: Permission restrictions apply to EVERYONE, including the file owner (unless you're root).


Hard Links vs Symbolic Links vs Copy

Symbolic Link

A shortcut that points to another file's location. If the original is deleted, the link breaks.

ln -s /original/file shortcut

Hard Link

Two different names pointing to the exact same data on disk. If one name is deleted, the data still exists under the other name.

ln /original/file hardlink

Copy

The binary data is fully duplicated on disk. Two completely independent files.

cp file1.txt file2.txt

| | Copy | Hard Link | Symbolic Link |
|---|---|---|---|
| Data duplicated? | Yes | No | No (just a pointer) |
| Disk space | Double | Same | Tiny (just a path) |
| Edit one, affects other? | No | Yes | Yes |
| Delete original, affects other? | No | No | Yes (breaks) |

Analogy: Symbolic link = sticky note saying "document is in warehouse shelf 3". Hard link = same document with two different names on the cover. Copy = completely new document with the same content.
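A small experiment confirming the hard-link row of the table: the data survives deleting the original name.

```shell
echo "data" > original.txt
ln original.txt hardlink.txt    # second name for the same data on disk
rm original.txt                 # deletes one name, not the data
cat hardlink.txt                # data
```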


10. Redirection and Pipes

> — Redirect Output to File

Sends command output to a file instead of the screen. Overwrites existing content.

cat file.txt > newfile.txt      # save output to file
echo "hello" > greeting.txt     # write text to file

>> — Append to File

Sends output to a file without overwriting — adds to the end.

echo "line 1" > file.txt
echo "line 2" >> file.txt       # adds to end, doesn't overwrite

Analogy: > erases the paper and writes new content. >> writes at the bottom without erasing.

< — Input Redirection

Feeds a file's contents as input to a command:

tr "a-z" "A-Z" < input.txt > output.txt
# Read from input.txt, convert to uppercase, write to output.txt

Note: This is more efficient than cat input.txt | tr "a-z" "A-Z" because it avoids creating an unnecessary cat process.

2> — Redirect Error Messages

command 2>/dev/null    # suppress error messages
command 2>errors.log   # save errors to file
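For example (the exact error text varies by system; what matters is where it goes):

```shell
ls /nonexistent/path 2> errors.log   # screen stays clean; message lands in the file
cat errors.log                        # shows the ls error message

ls /nonexistent/path 2> /dev/null    # message discarded entirely
```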

| — Pipe

Takes the stdout of one command and sends it as stdin to the next command.

cat myfile.txt | grep "John"

You can chain multiple pipes like an assembly line:

cat myfile.txt | grep "John" | cut -d, -f2 | sort

Each command passes its output to the next one.

Analogy: Like water pipes in a house — output flows from one pipe into the next.

> vs |:

| Operator | Sends output to | Example |
|---|---|---|
| `>` | A file | `echo "hi" > file.txt` |
| `\|` | Another command | `cat file.txt \| grep "x"` |

11. Text Processing Commands

grep — Global Regular Expression Print

Searches for a pattern and prints matching lines.

grep "John" myfile.txt              # basic search
grep -i "john" myfile.txt           # case insensitive
grep -n "John" myfile.txt           # show line numbers
grep -v "John" myfile.txt           # invert (show non-matching lines)
grep -c "John" myfile.txt           # count matching lines
grep -E "John|Sarah" myfile.txt     # extended regex (OR)

When to Use Quotes

| Case | Example | Quotes needed? |
|---|---|---|
| Single word | grep John file | Optional |
| With spaces | grep "John Smith" file | Required |
| Special characters | grep "hello$" file | Required |

Single quotes '' treat everything literally. Double quotes "" allow variable/special character interpretation.


cut — Extract Columns

Extracts specific columns from each line of a file.

# file contains: John,25,Cairo
cut -d, -f1 data.txt    # extract column 1: John
cut -d, -f2 data.txt    # extract column 2: 25
cut -d, -f3 data.txt    # extract column 3: Cairo
  • -d — delimiter (what separates columns)
  • -f — field number (which column to extract)

Note: Unlike Ctrl+X, the Linux cut command does not delete from the original file. It only reads and extracts. The original stays untouched.


paste — Join Files Side by Side

Joins files horizontally (column by column), while cat joins vertically.

paste file1.txt file2.txt           # join side by side (tab separated)
paste -d, file1.txt file2.txt       # use comma as delimiter

Comparison:

cat file1 file2        paste file1 file2
---------------        -----------------
John                   John    25
Sarah                  Sarah   30
Mike                   Mike    28
25
30
28

Origin of the name: From the physical act of cutting and pasting paper — placing two columns next to each other like gluing newspaper columns side by side.


tr — Translate Characters

Replaces or deletes specific characters one by one. Works at the character level, not word level.

echo "hello world" | tr 'a-z' 'A-Z'    # lowercase to uppercase: HELLO WORLD
echo "hello world" | tr ' ' '_'         # replace spaces: hello_world
echo "hello world" | tr -d 'l'          # delete all l's: heo word

tr always reads from input so use with pipe | or <:

tr 'a-z' 'A-Z' < myfile.txt

Common Pattern - Useless Use of cat:

# Less efficient (creates unnecessary process):
cat SortedMergedContent.txt | tr "a-z" "A-Z" > output.txt

# More efficient (direct input redirection):
tr "a-z" "A-Z" < SortedMergedContent.txt > output.txt

Think of tr like find-and-replace in Word, but for individual characters.


head and tail

head -n 3 file.txt    # first 3 lines
tail -n 3 file.txt    # last 3 lines

sort and uniq

sort file.txt                # sort lines alphabetically
sort file.txt > sorted.txt   # save sorted output

uniq file.txt                # remove adjacent duplicate lines
sort file.txt | uniq         # sort first, then remove duplicates

Important: uniq only removes adjacent duplicates. Always sort first to group duplicates together.
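Putting the section together — a small end-to-end pipeline over a throwaway CSV (the file and names are made up for the demo):

```shell
printf 'John,25\nSarah,30\nJohn,25\n' > people.csv

# sort groups the duplicate rows, uniq drops them, cut keeps the name column
sort people.csv | uniq | cut -d, -f1
# John
# Sarah
```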


12. Regular Expressions

Regular expressions (regex) are patterns used to match text. Each piece of a regex is like a slot that gets filled with one character at a time.

Connection to JavaScript: All these regex concepts work identically in JavaScript. The /pattern/g syntax in JS is the same idea — g means global (search entire string), just like grep searches the entire file.

Special Characters

| Symbol | Name | Meaning | Example | Matches |
|---|---|---|---|---|
| `^` | Caret | Start of line | `^John` | Lines starting with John |
| `$` | Dollar | End of line | `John$` | Lines ending with John |
| `.` | Period | Any single character | `J.hn` | John, Jahn, Jxhn |
| `*` | Asterisk | Zero or more of previous | `Jo*hn` | Jhn, John, Joohn |
| `[ ]` | Brackets | Any one character in set | `[aeiou]` | Any vowel |
| `[^ ]` | Negated brackets | Any character NOT in set | `[^aeiou]` | Any non-vowel |
| `-` | Hyphen (in brackets) | Range of characters | `[a-z]` | Any lowercase letter |
| `\|` | Pipe (with -E) | OR | `John\|Sarah` | John or Sarah |
| `\{x\}` | Braces | Exactly x repetitions | `Jo\{3\}hn` | Jooohn only |
| `\{x,y\}` | Braces range | Between x and y times | `Jo\{2,4\}hn` | Joohn, Jooohn, Joooohn |
| `\{x,\}` | Braces min | At least x times | `Jo\{2,\}hn` | Joohn, Jooohn, ... |

Character Ranges in Brackets

[a-z]       # any lowercase letter
[A-Z]       # any uppercase letter
[0-9]       # any digit
[a-zA-Z]    # any letter
[[:space:]]    # whitespace (POSIX class — \s is a PCRE shorthand and is unreliable inside grep brackets)
[^[:space:]]   # any non-whitespace character

Examples

grep "^John" file       # lines starting with John
grep "John$" file       # lines ending with John
grep "[Jj]ohn" file     # John or john
grep "J.hn" file        # J + any char + hn
grep "Jo*hn" file       # J + zero or more o's + hn
grep -E "John|Sarah"    # John OR Sarah
grep "^T[^\s]*s" file   # starts with T, no spaces, contains s
grep -n "^w.*[0-9]$" MergedContent.txt  # starts with w, ends with digit (with line numbers)
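To try a few of these against a throwaway file:

```shell
printf 'John\njohn\nJahn\nSmith\n' > names.txt

grep "[Jj]ohn" names.txt    # John, john
grep "J.hn" names.txt       # John, Jahn
grep -c "^J" names.txt      # 2
```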

The \ Escape Character

The backslash \ tells the program to treat the next character literally rather than as something special:

\{    # literal { instead of starting a repetition group
\.    # literal . instead of "any character"
\*    # literal * instead of "zero or more"

Extended Regex with -E

Basic grep has limited regex support. Use -E or egrep for full regex including |:

grep -E "John|Sarah" file
egrep "John|Sarah" file      # same thing

13. How the Regex State Machine Works

When grep receives your pattern, it passes it to a regex engine that builds a state machine — a flowchart of decisions.

What Is a State Machine?

A state machine is a series of checkpoints. The engine reads your text one character at a time and moves through the checkpoints:

Pattern: ^T[^\s]*s

State 0 → State 1 → State 2 → State 3 → MATCH
  ^T       [^\s]     [^\s]*      s

Walking Through an Example

Pattern: ^T[^\s]*s against the word Trains:

T - r - a - i - n - s

State 0: start of line, is it T?     yes → move to State 1 ✓
State 1: is r a non-whitespace?      yes → move to State 2 ✓
State 2: is a a non-whitespace?      yes → stay in State 2 ✓  (loop)
State 2: is i a non-whitespace?      yes → stay in State 2 ✓  (loop)
State 2: is n a non-whitespace?      yes → stay in State 2 ✓  (loop)
State 3: is s the letter s?          yes → MATCH! ✓

The * creates a loop arrow — the state machine keeps looping on the same state as long as the condition is met.

Visualizing the State Machine

         ^T          [^\s]        [^\s]*          s
[START] ──→ [SAW T] ──→ [NON-SPACE] ──→ [LOOP] ──→ [MATCH]
                                            ↑________|
                                         (keeps looping on
                                          non-whitespace)

Bracket Expressions Are Single Slots

[^\s] is not multiple characters — it is a description of what one character is allowed to be. Think of it like a form with blank fields:

[ T ] [ _ ] [ _ ] [ _ ] [ s ]

Where each [ _ ] is filled by one character matching the bracket rule.


14. Bash Scripting

Creating and Running a Script

#!/bin/bash
echo "Hello World"

Save it as myscript.sh, then:

chmod +x myscript.sh    # give execute permission
./myscript.sh           # run it

Script Arguments

#!/bin/bash
echo "first argument: $1"
echo "second argument: $2"
echo "all arguments: $@"
echo "number of arguments: $#"
./myscript.sh hello world
# first argument: hello
# second argument: world
# all arguments: hello world
# number of arguments: 2

15. Bash Variables and Parameter Expansion

Basic Variables

NAME="seif096"
AGE=20

echo $NAME          # seif096
echo $AGE           # 20

Rules:

  • No spaces around = (this is CRITICAL)
  • Access with $
  • No type declaration — everything is a string by default

Common Error:

NAME = "seif096"    # ❌ WRONG - bash thinks NAME is a command
NAME="seif096"      # ✅ CORRECT

User Input

echo "What is your name?"
read NAME
echo "Hello $NAME"

The Three $ Syntaxes

| Syntax | Name | Does |
|---|---|---|
| $NAME | Variable expansion | Gets value of variable |
| ${NAME} | Parameter expansion | Gets value + extra operations |
| $() | Command substitution | Runs command, returns output |
| $(()) | Arithmetic expansion | Evaluates math, returns result |

All $ syntaxes share the same core idea: "evaluate what is inside me and replace me with the result" before the outer command runs.

Why Use ${} Instead of $

Removes ambiguity when attaching text to a variable:

NAME="seif096"
echo $NAME_file       # bash reads NAME_file as one variable → empty!
echo ${NAME}_file     # bash clearly reads NAME + "_file" → seif096_file

Extra Powers of ${}

${#NAME}              # length of string
${NAME:0:3}           # substring extraction
${NAME:-"default"}    # use default value if variable is empty
${NAME/seif/user}     # replace "seif" with "user"
${NAME%.txt}          # remove .txt from end
${NAME#*/}            # remove everything up to first /
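Each of these in action on a sample value (results shown in the comments):

```shell
NAME="seif096_report.txt"

echo ${#NAME}            # 18                  (length)
echo ${NAME:0:4}         # seif                (substring)
echo ${NAME:-"default"}  # seif096_report.txt  (set, so the default is unused)
echo ${NAME/seif/user}   # user096_report.txt  (replace first match)
echo ${NAME%.txt}        # seif096_report      (strip suffix)
echo ${NAME#*_}          # report.txt          (strip through first _)
```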

Command Substitution $()

Runs the command inside and replaces itself with the output:

FILES=$(ls)
DATE=$(date)
echo "Today is $DATE"

# nesting is possible
echo $(echo $(ls))

Backticks ` ` do the same thing but are older and cannot be nested:

FILES=`ls`    # old way
FILES=$(ls)   # new way (preferred)

16. Arithmetic Expansion

Bash treats everything as strings by default. You need special syntax to do math.

The Four Ways

y=1

# Way 1: let
let y=$y+1
echo $y     # 2

# Way 2: double parentheses (recommended)
y=$((y+1))
echo $y     # 2

# Way 3: expr with backticks (old way, spaces required)
y=`expr $y + 1`
echo $y     # 2

# Way 4: without arithmetic → treats as string!
y=$y+1
echo $y     # 1+1  (wrong! this is a string)

Operators Inside $(( ))

echo $((10 + 5))    # 15  addition
echo $((10 - 5))    # 5   subtraction
echo $((10 * 5))    # 50  multiplication
echo $((10 / 5))    # 2   division
echo $((10 % 3))    # 1   remainder (modulo)
echo $((2 ** 3))    # 8   power (2³)

Inside $(( )) you do not need $ before variable names.

Common Arithmetic Mistakes:

# WRONG - using = instead of == for comparison
if (( x = 5 )); then    # ❌ This assigns 5 to x, doesn't compare
    echo "x is 5"
fi

# CORRECT - use == for comparison
if (( x == 5 )); then   # ✅ This compares
    echo "x is 5"
fi

17. String Manipulation

Strings are indexed starting at 0. Negative indexes count from the end.

a  b  c  A  B  C  1  2  3  A  B   C   a   b   c
0  1  2  3  4  5  6  7  8  9  10  11  12  13  14
                                   -4  -3  -2  -1

String Length

stringZ=abcABC123ABCabc

echo ${#stringZ}            # 15  (using parameter expansion)
echo `expr length $stringZ` # 15  (using expr, old way)

Substring Extraction ${string:position:length}

echo ${stringZ:0}       # abcABC123ABCabc  (from 0 to end)
echo ${stringZ:1}       # bcABC123ABCabc   (from 1 to end)
echo ${stringZ:7:3}     # 23A              (3 chars starting at position 7)
echo ${stringZ: -4}     # Cabc             (last 4 characters)
echo ${stringZ: -4:1}   # C                (1 char starting 4 from end)

The space before -4 is required — otherwise bash confuses it with different syntax.

Quick Reference

| Syntax | Meaning |
|---|---|
| ${#string} | Length of string |
| ${string:0} | Everything from position 0 |
| ${string:7:3} | 3 characters starting at position 7 |
| ${string: -4} | Last 4 characters |
| ${string: -4:1} | 1 character starting 4 from end |

Character-by-Character Processing

Important Note: Bash strings are immutable - you cannot change a character directly. You must rebuild the string.

str="hello"
new=""

for ((i=0; i<${#str}; i++)); do
    char="${str:$i:1}"
    # Process char
    new+="$char"  # Build new string
done

This does NOT work:

str="hello"
str[0]="H"    # ❌ WRONG - Bash strings are not arrays
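A concrete version of the rebuild loop — uppercasing each character. (`${char^^}` is bash 4+ case conversion; on older bash you would pipe through tr instead.)

```shell
str="hello"
new=""

for ((i = 0; i < ${#str}; i++)); do
    char="${str:$i:1}"
    new+="${char^^}"     # "process" step: uppercase this char, then append
done

echo "$new"    # HELLO
```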

18. Arrays

Defining and Accessing

farm_hosts=(web03 web04 web05 web06 web07)

echo ${farm_hosts[0]}     # web03  (first element)
echo ${farm_hosts[2]}     # web05  (third element)
echo ${farm_hosts[*]}     # all elements
echo ${farm_hosts[@]}     # all elements (same as * unquoted; quoted "${a[@]}" keeps elements separate)
echo ${#farm_hosts[@]}    # number of elements

Why ${}? Without curly braces, $farm_hosts[*] is read as $farm_hosts + [*] literally → web03[*]. Curly braces tell bash to treat farm_hosts[*] as one expression.

Looping Over an Array

farm_hosts=(web03 web04 web05 web06 web07)

for i in ${farm_hosts[*]}
do
    echo "item: $i"
done
# item: web03
# item: web04
# item: web05
# item: web06
# item: web07

19. Control Flow

If / Else

if [ "$T1" = "$T2" ]    # quote variables so empty values don't break the test
then
    echo "equal"
else
    echo "not equal"
fi

fi is if spelled backwards — bash closes blocks by reversing the opening word (if/fi, case/esac, do/done).

Comparison Operators

Numbers:

| Operator | Meaning |
|---|---|
| -eq | equal |
| -ne | not equal |
| -gt | greater than |
| -lt | less than |
| -ge | greater than or equal |
| -le | less than or equal |

Strings:

| Operator | Meaning |
|---|---|
| = | equal |
| != | not equal |

Files:

| Operator | Meaning |
|---|---|
| -f | file exists (regular file) |
| -d | directory exists |
| -r | file is readable |
| -w | file is writable |
| -x | file is executable |

Strings (emptiness tests):

| Operator | Meaning |
|---|---|
| -z | string is empty |
| -n | string is not empty |

Example with File Check

#!/bin/bash
echo "Enter a filename:"
read FILENAME

if [ -f "$FILENAME" ]    # quote: filenames may contain spaces
then
    echo "File exists, contents:"
    cat "$FILENAME"
else
    echo "File does not exist"
fi

Exit Codes

Every command returns an exit code:

  • 0 = success
  • anything else = failure
grep "John" myfile.txt
echo $?    # 0 if found, 1 if not found

if grep "John" myfile.txt
then
    echo "Found John"
else
    echo "John not found"
fi

elif - Else If

if [[ "$1" == 1 ]]; then
    echo "Option 1"
elif [[ "$1" == 2 ]]; then
    echo "Option 2"
elif [[ "$1" == 3 ]]; then
    echo "Option 3"
else
    echo "Unknown option"
fi

Important: Only ONE fi at the end closes the entire if/elif/else block.


20. Loops

For Loop

# iterate over a list
for i in 1 2 3 4 5
do
    echo $i
done

# iterate over a range
for i in {1..5}
do
    echo $i
done

# iterate with step {start..end..step}
for i in {1..5..2}
do
    echo $i     # 1 3 5
done

# iterate over command output
for i in $(ls)
do
    echo "item: $i"
done
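Bash also supports a C-style for loop, not shown above but already used for character-by-character string processing in the string manipulation section:

```shell
# (( )) gives C-like init/condition/step syntax
for ((i = 1; i <= 9; i += 2)); do
    echo $i     # 1 3 5 7 9
done
```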

For Loop with seq

seq generates a number sequence: seq start step end

# seq 1 1 9 → generates 1 2 3 4 5 6 7 8 9
for i in `seq 1 1 9`
do
    echo $i
done

# count down
for i in `seq 10 -1 1`
do
    echo $i
done

For Loop with Break

len=10
limit=5

for i in `seq 1 1 $((len-1))`
do
    if [ $i -gt $limit ]
    then
        break       # stop the loop immediately
    fi
    echo $i
done
# prints 1 2 3 4 5 then stops

While Loop

Runs as long as condition is true.

COUNTER=0
while [ $COUNTER -lt 10 ]
do
    echo "The counter is $COUNTER"
    let COUNTER+=1
done

Until Loop

Runs as long as condition is false — stops when it becomes true. Opposite of while.

COUNTER=20
until [ $COUNTER -lt 10 ]
do
    echo "COUNTER $COUNTER"
    let COUNTER-=1
done
# counts down from 20, stops when below 10

Loop Comparison

Loop Runs when Use when
for Iterating over a known set You know how many times upfront
while Condition is true You don't know how many iterations
until Condition is false You want to run until something becomes true
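A common use of the while loop above is processing a file line by line with read. A minimal sketch (the file path /tmp/demo_lines.txt is just for illustration):

```shell
# Create a small demo file, then read it line by line.
# IFS= preserves leading whitespace; -r stops backslash interpretation.
printf 'alpha\nbeta\n' > /tmp/demo_lines.txt

while IFS= read -r line; do
    echo "line: $line"
done < /tmp/demo_lines.txt
# line: alpha
# line: beta
```

Note the redirection `< file` feeds the whole loop, not just one command — this is the idiomatic alternative to `for i in $(cat file)`, which would split on every space, not every newline.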

21. Functions

Defining a Function

Two equivalent syntaxes:

# way 1
function myfunction {
    echo "hello"
}

# way 2
myfunction() {
    echo "hello"
}

⚠️ AVOID mixing both styles:

function myfunction() {    # ⚠️ works in bash, but non-portable — pick one style
    echo "test"
}

Important Syntax Rules:

  1. Space before { is REQUIRED:
function myFunc{      # ❌ WRONG - missing space
    echo "test"
}

function myFunc {     # ✅ CORRECT
    echo "test"
}
  2. Empty functions need a placeholder:
function myFunc {}    # ❌ WRONG - empty body

function myFunc {     # ✅ CORRECT - use : as placeholder
    :
}

Calling a Function

myfunction          # no parentheses when calling (unlike most languages)
myfunction arg1 arg2

Function Arguments

function greet {
    echo "Hello $1"
    echo "Second arg: $2"
    echo "All args: $@"
    echo "Number of args: $#"
}

greet "seif096" "world"

Return Values

Using echo + command substitution (to return a value):

function add {
    echo $(($1 + $2))
}

result=$(add 3 5)
echo $result    # 8

Using return (exit codes only, 0-255):

function check {
    if [ $1 -gt 10 ]
    then
        return 0    # success / true
    else
        return 1    # failure / false
    fi
}

check 15
echo $?    # 0

return in bash only returns an exit code (0-255), not a value. For actual values, use echo and capture with $().

CRITICAL: echo must come BEFORE return. Code after return never executes:

function test {
    return 1
    echo "This never prints"    # ❌ Never reached
}

function test {
    echo "This prints"          # ✅ Executes
    return 1
}

Variable Scope

# global by default
function myfunction {
    NAME="seif096"    # accessible everywhere
}

# use local to restrict scope
function myfunction {
    local NAME="seif096"    # only inside this function
}

# Multiple local variables
function myfunction {
    local x y z           # ✅ Space-separated
    # NOT: local x,y,z    # ❌ Wrong
}

Full Example — Increment Function

#!/bin/bash

function increment {
    counter=0
    inc=1

    # if an argument was passed, use it as the increment
    if [ "$#" -ne 0 ]; then
        inc=$1
    fi

    # loop 10 times (counting down with seq)
    for i in `seq 10 -1 1`; do
        echo "The counter is $counter"
        let counter=counter+$inc
    done
}

# call with no argument → increments by 1
increment

# call with argument 5 → increments by 5
increment 5

Output with increment 5:

The counter is 0
The counter is 5
The counter is 10
The counter is 15
The counter is 20
The counter is 25
The counter is 30
The counter is 35
The counter is 40
The counter is 45

The counter accumulates — it keeps adding inc to itself. It prints before adding each time, which is why it starts at 0.

Function Reference

Feature Syntax
Define function name { } or name() { }
Call name or name arg1 arg2
First argument $1
All arguments $@
Argument count $#
Return exit code return 0 or return 1
Return value echo value then capture with $()
Local variable local NAME="value"

22. Common Syntax Errors and Pitfalls

Critical Spacing Rules

1. No spaces around = in variable assignment:

NAME = "value"    # ❌ WRONG - bash thinks NAME is a command
NAME="value"      # ✅ CORRECT

2. Spaces REQUIRED around [ and ]:

if[ "$x" = "y" ]; then       # ❌ WRONG - no space after if
if ["$x" = "y" ]; then       # ❌ WRONG - no space after [
if [ "$x"="y" ]; then        # ❌ WRONG - no spaces around =
if [ "$x" = "y" ]; then      # ✅ CORRECT

3. Space REQUIRED before { in functions:

function test{               # ❌ WRONG
    echo "hi"
}

function test {              # ✅ CORRECT
    echo "hi"
}

4. Spaces in [[ and ]]:

if[[ "$x" == "y" ]]; then    # ❌ WRONG - no space after if
if [["$x" == "y"]]; then     # ❌ WRONG - no space after [[
if [[ "$x"=="y" ]]; then     # ❌ WRONG - no spaces around ==
if [[ "$x" == "y" ]]; then   # ✅ CORRECT

Quoting Variables

Always quote variables in [ ] tests:

if [ $NAME = "test" ]; then      # ❌ Dangerous - breaks if NAME is empty
if [ "$NAME" = "test" ]; then    # ✅ CORRECT - safe

In [[ quoting is optional but recommended:

if [[ $NAME == "test" ]]; then   # ✅ Works but not best practice
if [[ "$NAME" == "test" ]]; then # ✅ Better - always quote

Using $ with Variables

Inside arithmetic $(( )), $ is optional:

x=5
echo $((x + 1))      # ✅ Works
echo $(($x + 1))     # ✅ Also works

Everywhere else, $ is required:

echo x               # ❌ Prints literal "x"
echo $x              # ✅ Prints value of x

Function Definition Mistakes

# ⚠️ AVOID - 'function' plus () works in bash but is non-portable
function test() {
    echo "hi"
}

# ✅ CORRECT - choose one
function test {
    echo "hi"
}

# ✅ ALSO CORRECT
test() {
    echo "hi"
}

# ❌ WRONG - empty function needs placeholder
function test {
}

# ✅ CORRECT - use : as no-op
function test {
    :
}

Arithmetic Comparison Mistakes

Using = instead of == in arithmetic:

if (( x = 5 )); then         # ❌ WRONG - assigns 5, doesn't compare
    echo "x is 5"
fi

if (( x == 5 )); then        # ✅ CORRECT - compares
    echo "x is 5"
fi

Using % for modulo in [ ]:

if [ ${#str} % 2 = 0 ]; then     # ❌ WRONG - [ ] treats this as string
    echo "even"
fi

if (( ${#str} % 2 == 0 )); then  # ✅ CORRECT - use (( )) for math
    echo "even"
fi

Missing Semicolons

In one-line if statements:

if [ "$x" = "y" ] then echo "yes"; fi     # ❌ WRONG - missing ; before then
if [ "$x" = "y" ]; then echo "yes"; fi    # ✅ CORRECT

In one-line for loops:

for i in {1..5} do echo $i; done          # ❌ WRONG - missing ; before do
for i in {1..5}; do echo $i; done         # ✅ CORRECT

File Extension in ${var%pattern}

Understanding substring removal:

file="document.txt"
echo ${file%.txt}        # document (removes .txt from end)
echo ${file%.doc}        # document.txt (no match, nothing removed)

Using it with variables:

half=$((${#parens} / 2))        # ✅ CORRECT
lParen="${parens:0:half}"       # ✅ CORRECT - quote for safety
rParen="${parens:half}"         # ✅ CORRECT
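The `${var%pattern}` form shines in batch renames. A minimal sketch, assuming a throwaway demo directory (/tmp/rename_demo and the file names are hypothetical):

```shell
# Batch-rename *.txt to *.md by stripping the extension with ${file%.txt}
mkdir -p /tmp/rename_demo && cd /tmp/rename_demo
touch a.txt b.txt

for file in *.txt; do
    mv "$file" "${file%.txt}.md"   # remove .txt from the end, append .md
done

ls    # a.md  b.md
```

Because the expansion happens in the shell itself, no external tool like sed or basename is needed inside the loop.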

23. The Three Types of Conditionals

Complete Comparison Chart

Feature [ ] [[ ]] (( ))
Type Test command (POSIX) Bash keyword Arithmetic evaluation
String comparison = (POSIX; == is a bash extension) == or = ❌ Not for strings
Number comparison -eq, -lt, -gt... -eq, -lt, -gt ==, <, >
Math operators ❌ No ❌ No ✅ Yes (+, *...)
Pattern matching ❌ No ✅ Yes (*, ?) ❌ No
Variable quoting ✅ Required Optional (safer) Not needed
Word splitting ✅ Yes (dangerous) ❌ No (safe) N/A
AND / OR -a / -o && / || && / ||
Regex matching ❌ No ✅ Yes (=~) ❌ No
Portable ✅ POSIX (all shells) ❌ Bash only ❌ Bash only
When to use Scripts for any shell Modern bash scripts Math comparisons

[ ] — Test Command (Old Style)

# String comparison - use = not ==
if [ "$a" = "$b" ]; then
    echo "equal"
fi

# Numeric comparison - use -eq, -ne, -lt, -gt, -le, -ge
if [ "$num" -eq 5 ]; then
    echo "equals 5"
fi

# File tests
if [ -f "file.txt" ]; then
    echo "file exists"
fi

# String tests
if [ -z "$str" ]; then      # true if string is empty
    echo "empty"
fi

if [ -n "$str" ]; then      # true if string is not empty
    echo "not empty"
fi

CRITICAL RULES for [ ]:

  1. Must have spaces after [ and before ]
  2. Must quote variables or it breaks with empty strings
  3. Use = for strings (NOT ==)
  4. Use -eq, -ne, -lt, -gt for numbers (NOT ==, <, >)

[[ ]] — Advanced Test (Modern, Recommended)

# String comparison - can use == or =
if [[ "$a" == "$b" ]]; then
    echo "equal"
fi

# Pattern matching
if [[ "$filename" == *.txt ]]; then
    echo "text file"
fi

# Regex matching
if [[ "$email" =~ ^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$ ]]; then
    echo "valid email"
fi

# Logical operators
if [[ "$x" == "yes" && "$y" == "no" ]]; then
    echo "both conditions true"
fi

# Multiple conditions
if [[ "$1" == 1 ]] || [[ "$1" == 2 ]]; then
    echo "1 or 2"
fi

BENEFITS of [[ ]]:

  1. Safer - no word splitting
  2. Pattern matching works (*, ?)
  3. Can use == for strings (more intuitive)
  4. Better && and || support
  5. Quoting is optional (but still recommended)
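One more [[ ]] feature worth knowing: after a successful =~ match, any parenthesized groups are captured into the BASH_REMATCH array. A short sketch:

```shell
# Capture groups from a regex match via BASH_REMATCH
date_str="2024-01-15"

if [[ "$date_str" =~ ^([0-9]{4})-([0-9]{2})-([0-9]{2})$ ]]; then
    echo "year:  ${BASH_REMATCH[1]}"   # 2024
    echo "month: ${BASH_REMATCH[2]}"   # 01
    echo "day:   ${BASH_REMATCH[3]}"   # 15
fi
```

BASH_REMATCH[0] holds the whole match; indexes 1, 2, 3... hold the groups in order.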

(( )) — Arithmetic Evaluation (Math Only)

# Direct math comparison - use C-style operators
if (( x == 5 )); then
    echo "x equals 5"
fi

# No $ needed for variables inside
if (( length % 2 == 0 )); then
    echo "even length"
fi

# Complex expressions
if (( (x + y) * 2 > 100 )); then
    echo "result > 100"
fi

# All C operators work
if (( x >= 10 && x <= 20 )); then
    echo "x is between 10 and 20"
fi

OPERATORS in (( )):

==    equal
!=    not equal
<     less than
>     greater than
<=    less than or equal
>=    greater than or equal
+     addition
-     subtraction
*     multiplication
/     division
%     modulo
**    exponentiation
&&    logical AND
||    logical OR
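(( )) is not limited to if conditions — it also works as a standalone statement for C-style updates. A quick sketch:

```shell
# Standalone arithmetic statements with (( ))
x=5
(( x++ ))        # increment in place
(( x += 10 ))    # compound assignment
echo $x          # 16

sum=0
for ((i=1; i<=4; i++)); do
    (( sum += i ))
done
echo $sum        # 10
```

One caveat: a (( )) expression that evaluates to 0 returns exit status 1, which matters under `set -e`.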

When to Use Which

Situation Use
Math comparison (( ))
String comparison [[ ]]
Pattern matching [[ ]]
File existence [[ ]] or [ ]
POSIX portable script [ ]
Modern bash-only script [[ ]]
Arithmetic with operators %+- (( ))

Common Mistakes

# ❌ WRONG - math in [ ]
if [ ${#str} % 2 = 0 ]; then         # Treats as string!
    echo "even"
fi

# ✅ CORRECT - use (( )) for math
if (( ${#str} % 2 == 0 )); then
    echo "even"
fi

# ❌ WRONG - assignment instead of comparison in (( ))
if (( x = 5 )); then                 # Assigns 5, doesn't compare!
    echo "x is 5"
fi

# ✅ CORRECT - use ==
if (( x == 5 )); then
    echo "x is 5"
fi

# ❌ WRONG - using == with [ ]
if [ "$x" == "5" ]; then             # Works in bash but not POSIX
    echo "x is 5"
fi

# ✅ CORRECT - use = with [ ]
if [ "$x" = "5" ]; then
    echo "x is 5"
fi

24. Loop Types Deep Dive

The Three For Loop Styles

Style 1: List Iteration (Word-Based)

# Iterate over words
for item in apple banana cherry; do
    echo "$item"
done

# Iterate over files
for file in *.txt; do
    echo "Found: $file"
done

# Iterate over command output
for user in $(cut -d: -f1 /etc/passwd); do
    echo "User: $user"
done

COMMON MISTAKE:

for i in flip; do        # ❌ Iterates over word "flip" (1 time)
    echo "$i"
done

# This is NOT looping over characters!
# It literally loops over the word "flip" once

Style 2: C-Style Arithmetic Loop

This is the CORRECT way to loop over string characters:

str="hello"

# Loop with counter
for ((i=0; i<${#str}; i++)); do
    char="${str:$i:1}"
    echo "$char"
done

# Reverse loop
for ((i=${#str}-1; i>=0; i--)); do
    char="${str:$i:1}"
    echo "$char"
done

Syntax breakdown:

for (( initialization; condition; increment )); do
    commands
done

Examples:

# Count 0 to 9
for ((i=0; i<10; i++)); do
    echo $i
done

# Count by 2s
for ((i=0; i<=10; i+=2)); do
    echo $i  # 0 2 4 6 8 10
done

# Multiple variables
for ((i=0, j=10; i<10; i++, j--)); do
    echo "i=$i j=$j"
done

Style 3: Range Expansion

# Number range
for i in {1..5}; do
    echo $i
done

# With step
for i in {0..10..2}; do
    echo $i  # 0 2 4 6 8 10
done

# Reverse
for i in {10..1}; do
    echo $i
done

# Letters
for letter in {a..z}; do
    echo $letter
done

Important: Range expansion {1..5} happens BEFORE variable substitution:

n=5
for i in {1..$n}; do    # ❌ WRONG - doesn't expand $n
    echo $i
done

# Use C-style instead:
for ((i=1; i<=n; i++)); do    # ✅ CORRECT
    echo $i
done

Loop Comparison Table

Loop Type When to Use Example
for in list Iterating over words, files, array for word in $words
for ((;;)) Counter-based, character iteration for ((i=0; i<n; i++))
for in {} Fixed numeric/letter ranges for i in {1..10}
while Condition-based, unknown iterations while [ $x -lt 10 ]
until Run until condition becomes true until [ $x -ge 10 ]

25. Command Substitution vs Function Calls

Understanding $()

$() is command substitution - it runs a command and captures its output:

current_dir=$(pwd)
file_count=$(ls | wc -l)
today=$(date +%Y-%m-%d)

echo "Today is $today"

When to Use $()

✅ USE $() when you need to CAPTURE output:

# Capture and store
result=$(command)

# Use output in another command
echo "Result: $(command)"

# Use in conditional
if [[ $(whoami) == "root" ]]; then
    echo "Running as root"
fi

❌ DON'T USE $() when you just want to RUN something:

# WRONG - captures output but does nothing with it
$(echo "Hello")

# CORRECT - just displays it
echo "Hello"

With Functions

CRITICAL CONCEPT: Functions in bash are called like commands, NOT with $():

function greet {
    echo "Hello $1"
}

# ❌ WRONG - unnecessary $()
if [[ condition ]]; then
    $(greet "World")    # Output captured and discarded!
fi

# ✅ CORRECT - just call it
if [[ condition ]]; then
    greet "World"       # Output displays directly
fi

# ✅ CORRECT - capture if you need the value
message=$(greet "World")
echo "Message: $message"

Why the Confusion?

In many languages:

// JavaScript
result = myFunction(); // Need () to call

# Python
result = my_function()  # Need () to call

In Bash:

# Bash - NO PARENTHESES for function calls
result=$(myfunction)      # Call with $() only if capturing output
myfunction                # Call directly to run and display
myfunction arg1 arg2      # Call with arguments

Practical Examples

function toDecimal {
    printf "%d\n" "$1"
}

# WRONG ways to call:
if [[ "$1" == 1 ]]; then
    $(toDecimal "$2")     # ❌ Captures output, discards it
fi

# CORRECT ways to call:
if [[ "$1" == 1 ]]; then
    toDecimal "$2"        # ✅ Runs and displays output
fi

# OR capture if needed:
result=$(toDecimal "$2")  # ✅ Captures for later use
echo "Result: $result"

Quick Decision Tree

Do I need the command's output as a value?
    ├─ YES → Use $()
    │         result=$(command)
    │
    └─ NO → Just run it
              command
              function_name

26. Number System Conversions

Understanding Number Bases

# Decimal (base 10): 0-9
26

# Hexadecimal (base 16): 0-9, A-F
1A    # = 1×16 + 10 = 26 in decimal

# Binary (base 2): 0-1
11010 # = 16 + 8 + 2 = 26 in decimal

# Octal (base 8): 0-7
32    # = 3×8 + 2 = 26 in decimal

Hex → Decimal Conversions

# Method 1: Using arithmetic expansion with 0x prefix
echo $((0x1A))              # 26
echo $((0xFF))              # 255

# Method 2: Using base#number syntax
echo $((16#1A))             # 26
echo $((16#FF))             # 255

# Method 3: Using printf
printf "%d\n" 0x1A          # 26
printf "%d\n" 0xFF          # 255

IMPORTANT: 0x tells bash "this is hexadecimal":

echo $((0x1A))    # ✅ Correctly interprets as hex
echo $((1A))      # ❌ Error - not valid without 0x

Decimal → Hex Conversions

# Method 1: Using printf (most common)
printf "%X\n" 26            # 1A (uppercase)
printf "%x\n" 26            # 1a (lowercase)
printf "0x%X\n" 26          # 0x1A (with prefix)

# Method 2: Using bc
echo "obase=16; 26" | bc    # 1A

Key Point: $(( )) can INPUT any base but ALWAYS OUTPUTS decimal:

echo $((0x1A))    # Input: hex, Output: 26 (decimal)
echo $((16#FF))   # Input: hex, Output: 255 (decimal)
echo $((26))      # Input: decimal, Output: 26 (decimal)

To output hex, you MUST use printf:

printf "%X\n" 26  # Decimal → Hex: 1A

Other Base Conversions

# Binary → Decimal
echo $((2#1010))            # 10 (bash has no 0b prefix — use base#number)

# Octal → Decimal
echo $((8#77))              # 63
echo $((077))               # 63 (leading 0 means octal)

# Decimal → Binary
echo "obase=2; 26" | bc     # 11010

# Decimal → Octal
printf "%o\n" 26            # 32
echo "obase=8; 26" | bc     # 32
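A classic pitfall follows from the leading-zero octal rule above: numbers read from input (dates, zero-padded IDs) can break arithmetic. A short sketch:

```shell
# A leading zero makes $(( )) parse a number as OCTAL
echo $((032))       # 26  (octal 32 = 3*8 + 2)
# echo $((08))      # error: 8 is not a valid octal digit
echo $((10#08))     # 8   (10# forces base-10 — handy for "08", "09" from dates)
```

The `10#` prefix is the usual fix when handling values like `$(date +%m)`.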

Practical Conversion Functions

#!/bin/bash

function toDecimal {
    # Accepts hex with or without 0x prefix
    local input="$1"

    # Add 0x if not present
    if [[ ! "$input" =~ ^0x ]]; then
        input="0x$input"
    fi

    printf "%d\n" "$input"
}

function toHex {
    printf "%X\n" "$1"
}

# Usage
toDecimal "1A"      # 26
toDecimal "0x1A"    # 26
toHex 26            # 1A

Common Mistakes

# ❌ WRONG - trying to convert decimal to hex with $(())
echo $((26))        # Still outputs 26, not 1A

# ✅ CORRECT - use printf
printf "%X\n" 26    # 1A

# ❌ WRONG - missing 0x prefix
echo $((1A))        # Error: invalid number

# ✅ CORRECT - use 0x
echo $((0x1A))      # 26

27. Special Operators Reference

The % Operator

Usage 1: Modulo (Remainder) - In Arithmetic

echo $((10 % 3))    # 1 (remainder of 10 ÷ 3)
echo $((17 % 5))    # 2 (remainder of 17 ÷ 5)

# Check if even
if (( num % 2 == 0 )); then
    echo "even"
fi

Usage 2: Pattern Removal from End - In Parameter Expansion

file="document.txt"
echo ${file%.txt}        # document (removes .txt)

path="/home/user/file.txt"
echo ${path%/*}          # /home/user (removes /file.txt)

# % removes shortest match from end
file="hello.world.txt"
echo ${file%.*}          # hello.world (removes .txt)

# %% removes longest match from end
echo ${file%%.*}         # hello (removes .world.txt)

The # Operator

Pattern Removal from Start - In Parameter Expansion

path="/home/user/file.txt"
echo ${path#*/}          # home/user/file.txt (removes /)

# # removes shortest match from start
echo ${path#/*/}         # user/file.txt (removes /home/)

# ## removes longest match from start
echo ${path##*/}         # file.txt (removes /home/user/)
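These two expansions mirror the basename and dirname commands, but run inside the shell without forking a process. A quick comparison:

```shell
# ${path##*/} and ${path%/*} behave like basename and dirname
path="/home/user/file.txt"

echo "${path##*/}"    # file.txt   (like basename)
echo "${path%/*}"     # /home/user (like dirname)

# The external commands give the same results, at the cost of a subprocess:
basename "$path"      # file.txt
dirname "$path"       # /home/user
```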

The * Operator

Usage 1: Glob Pattern (Wildcard)

ls *.txt                 # all .txt files
rm file*                 # all files starting with "file"

Usage 2: Multiplication - In Arithmetic

echo $((5 * 3))          # 15

Usage 3: All Array Elements

array=(a b c)
echo ${array[*]}         # a b c

The ? Operator

Single Character Wildcard

ls file?.txt             # file1.txt, fileA.txt, etc.
ls ???.txt               # abc.txt, xyz.txt, etc.

The @ Operator

All Array Elements (Preserving Word Boundaries)

array=(a b c)
echo ${array[@]}         # a b c

# Difference from * when quoted:
for item in "${array[*]}"; do
    echo "$item"         # Prints once: "a b c"
done

for item in "${array[@]}"; do
    echo "$item"         # Prints 3 times: a, b, c
done

Comparison Table

Operator In Arithmetic In Parameter Expansion As Glob
% Modulo Remove from end -
# - Remove from start; length ${#var} Comment
* Multiply All array elements Wildcard
? - - Single char
@ - All array elements -

28. Source vs Execute

Two Ways to Run a Script

Method 1: Execute (Create New Process)

chmod +x script.sh
./script.sh

What happens:

  1. New bash process is created (fork)
  2. Script runs in the new process
  3. Variables and functions exist only in that process
  4. Process ends, everything disappears
  5. Your original shell is unchanged
Your Shell (PID 1000)
    │
    └─> Fork: New Shell (PID 1001)
            │
            └─> Runs script
            └─> Sets variables
            └─> Defines functions
            └─> Process ends
    │
    └─> Back to your shell
        Variables/functions gone ❌

Method 2: Source (Current Process)

source script.sh
# OR
. script.sh

What happens:

  1. NO new process created
  2. Script commands run in YOUR current shell
  3. Variables persist after script ends
  4. Functions persist after script ends
Your Shell (PID 1000)
    │
    └─> Reads script line by line
    └─> Executes each line in current shell
    └─> Variables remain ✅
    └─> Functions remain ✅

Key Differences

Aspect Execute ./script.sh Source source script.sh
New process ✅ Yes ❌ No
Needs chmod +x ✅ Yes ❌ No
Variables persist ❌ No ✅ Yes
Functions persist ❌ No ✅ Yes
Changes PATH Only in subprocess In current shell
Use case Run programs Load config/functions

When to Use Each

Use Execute (./script) when:

  • Running standalone programs
  • You don't need variables/functions afterwards
  • Script should run in isolation
  • Script modifies files (doesn't matter which process)

Use Source (source script) when:

  • Loading configuration (.bashrc, .bash_profile)
  • Defining functions you want to use
  • Setting environment variables for current session
  • Activating virtual environments (source venv/bin/activate)
  • Loading utility functions into current shell

Practical Examples

Example 1: Configuration File

# config.sh
export DATABASE_URL="postgres://localhost/mydb"
export API_KEY="secret123"

alias ll='ls -la'

greet() {
    echo "Hello from config!"
}
# Execute - variables disappear
./config.sh
echo $DATABASE_URL    # Empty! ❌

# Source - variables persist
source config.sh
echo $DATABASE_URL    # postgres://localhost/mydb ✅
greet                 # Hello from config! ✅
ll                    # Works! ✅

Example 2: Function Library

# utils.sh
function toUpper {
    echo "$1" | tr 'a-z' 'A-Z'
}

function toLower {
    echo "$1" | tr 'A-Z' 'a-z'
}
# Execute - functions not available
./utils.sh
toUpper "hello"       # Command not found ❌

# Source - functions available
source utils.sh
toUpper "hello"       # HELLO ✅
toLower "WORLD"       # world ✅

Example 3: Python Virtual Environment

# This is why you SOURCE, not execute:
source venv/bin/activate    # ✅ Modifies current shell's PATH

# If you executed instead:
./venv/bin/activate         # ❌ PATH modified in subprocess, not your shell

Common Misconceptions

❌ WRONG: "Source is like import in Python"

✅ CORRECT: Source is like copy-pasting the file's contents into your current terminal and running them line by line.

❌ WRONG: "Execute needs chmod +x, source doesn't need it"

✅ CORRECT: Execute needs execute permission AND read permission. Source only needs read permission.

The . Shorthand

# These are identical:
source script.sh
. script.sh

# The dot (.) is the POSIX-standard command
# 'source' is a bash-specific alias for '.'

Checking What Sourced Your Current Shell

# See what's loaded in current session
echo $PATH                  # Environment variables
declare -F                  # All functions
alias                       # All aliases

🔤 Naming History

Name Stands For Story
grep Global Regular Expression Print From the old ed editor command g/re/p
cat Concatenate Original purpose was joining files
chmod Change Mode Changes file permission mode
pwd Print Working Directory Prints current location
sudo Super User Do Run commands as root
bin Binaries Pre-compiled executable programs
#! Shebang / Hashbang Hash + Bang = Shebang (also old slang for "the whole thing")
tr Translate Translates characters
cp Copy Copies files
mv Move Moves/renames files
rm Remove Removes files
ls List Lists directory contents

🎯 Quick Reference Card

# VARIABLES
NAME="value"              # Define (no spaces around =)
echo $NAME                # Access
echo ${NAME}              # Parameter expansion
result=$(command)         # Command substitution
result=$((5 + 3))         # Arithmetic expansion

# CONDITIONALS
[ "$x" = "y" ]            # Old test (POSIX)
[[ "$x" == "y" ]]         # Modern test (bash)
(( x == 5 ))              # Arithmetic test

# LOOPS
for i in {1..5}           # Range
for ((i=0; i<n; i++))     # C-style
for item in $list         # List iteration
while [ condition ]       # While loop
until [ condition ]       # Until loop

# FUNCTIONS
function name { }         # Define (space before {)
name arg1 arg2            # Call (no parentheses)
echo "result"             # Return value (capture with $())
return 0                  # Return exit code
local var="value"         # Local variable

# STRINGS
${#str}                   # Length
${str:pos:len}            # Substring
${str^^}                  # Uppercase
${str,,}                  # Lowercase
${str/old/new}            # Replace

# MATH
$((x + y))                # Addition
$((x % 2))                # Modulo
(( x == 5 ))              # Comparison

# FILES
cp src dst                # Copy
mv src dst                # Move
rm file                   # Remove
cat file                  # Display
chmod +x file             # Make executable

# REDIRECTION
cmd > file                # Overwrite
cmd >> file               # Append
cmd < file                # Input from file
cmd1 | cmd2               # Pipe

# PERMISSIONS
chmod u+x file            # Add execute for user
chmod a-r file            # Remove read for all
chmod 755 file            # rwxr-xr-x

# CONVERSIONS
echo $((0x1A))            # Hex to decimal
printf "%X" 26            # Decimal to hex

# SOURCE vs EXECUTE
./script.sh               # Execute (new process)
source script.sh          # Source (current process)
. script.sh               # Same as source


29. Introduction to C Language

Notes from Lab 2 — covering IO, data types, compilation, pointers, memory allocation, Makefiles, GDB debugging, and file handling in C.


IO — printf and scanf

printf format specifiers:

Specifier Prints
%d decimal integer
%c character
%f float/double
%s string
%p pointer address

Display width vs precision — common mistake:

printf("%6s", str);    // minimum WIDTH of 6 — pads with spaces if shorter, does NOT truncate
printf("%.6s", str);   // maximum PRECISION of 6 — truncates if longer
printf("%10.6s", str); // 10 wide, max 6 chars printed

Key distinction: %6s controls minimum space used. %.6s controls maximum characters printed. If you want to print only 6 characters use %.6s not %6s.

scanf — reading input:

scanf("%d", &x);       // & required for non-array types
scanf("%s", str);      // no & needed — the array name decays to a pointer to its first element

Common mistake: forgetting & with scanf on non-array types is undefined behavior and usually crashes the program. scanf needs the address to know where in memory to store the value.

Another mistake: putting a prompt message inside the scanf format string. scanf never prints anything — literal text in its format string is matched against the input instead, so the read quietly fails. Always use printf for messages and scanf only for reading.

// WRONG
scanf("enter a number: %d", &x);   // prints nothing; scanf tries to MATCH this text in the input

// CORRECT
printf("enter a number: ");
scanf("%d", &x);

Escape sequences:

Sequence Meaning
\n newline — moves to next line
\r carriage return — moves to start of SAME line
\t horizontal tab
\\ backslash
\" double quote

\r note: Moves cursor back to beginning of current line without going down. Rarely used alone. Windows uses \r\n to end lines while Linux uses just \n — this can cause bugs when sharing files between systems.


Data Types

Type Size Range
char 1 byte -128 to 127 (signed)
int 4 bytes ~±2 billion
float 4 bytes 7 digits precision
double 8 bytes 15 digits precision

bool in C:

C has no built-in bool. Uses integers instead:

  • 0 = false
  • any non-zero = true
// old C (C89) — no bool, use int
int isTrue = 1;
int isFalse = 0;

// modern C (C99+) — include header
#include <stdbool.h>
bool isTrue = true;
bool isFalse = false;

Comparison with C++: C++ has bool built in without any header. C needs <stdbool.h>. Under the hood both are still just integers.


Compilation & Execution

gcc hello.c -o hello      # compile → named executable
gcc hello.c               # compile → default name a.out
./hello                   # run executable
gcc hello.c -o hello && ./hello   # compile and run in one line

Important flags:

Flag Meaning
-o name name the output file (output)
-c compile to object file only, do not link
-Wall show all warnings
-g include debug info for gdb

-o is like Save As — without it everything saves as a.out and overwrites each other. Always use -o to keep files organized.

void main() vs int main(): The lab slides use void main() but this is non-standard. Modern C (C99+) requires int main() and return 0. gcc follows the standard so void main() may cause errors or warnings. Always use int main().


Package Management — apt

apt stands for Advanced Package Tool — it is the package manager for Ubuntu/Debian Linux. Think of it as the app store for the terminal.

sudo apt update                    # refresh list of available packages (always run first)
sudo apt install build-essential   # installs gcc, g++, make
sudo apt-get install manpages-dev  # installs man pages for C functions
sudo apt -y install gdb            # installs debugger (-y = yes to all prompts)
gcc --version                      # verify installation

apt vs apt-get: apt-get is the older, script-friendly interface; apt is a newer front-end with friendlier output for interactive use. For everyday commands they do the same thing — apt is preferred now.

sudo is needed because installing software affects the whole system, not just your user account — requires admin permissions.

How apt works with repositories:

Repository (online warehouse of packages)
      ↓
apt update  (refresh what's available)
      ↓
apt install (download and install)
      ↓
Your PC

Repository addresses are stored in /etc/apt/sources.list. Adding a new repo:

sudo add-apt-repository ppa:something

What build-essential installs:

  • gcc — C compiler
  • g++ — C++ compiler
  • make — build automation tool

What manpages-dev gives you:

man printf    # full documentation for printf
man scanf     # full documentation for scanf

Makefiles

A Makefile automates the compilation process. Instead of typing gcc commands every time, you just type make.

Build process:

Source files (.c) → Compiler (-c flag) → Object files (.o) → Linker → Executable

Basic Makefile format:

target: dependencies
[TAB] command

Critical: Indentation MUST be a TAB not spaces — make will throw an error with spaces.

Simple example:

all:
    gcc main.c -o program

With dependencies (only recompiles changed files):

all: hello

hello: main.o factorial.o hello.o
    gcc main.o factorial.o hello.o -o hello

main.o: main.c
    gcc -c main.c

factorial.o: factorial.c
    gcc -c factorial.c

clean:
    rm -f *.o hello

With variables:

CC=gcc
CFLAGS=-c -Wall
SOURCES=main.c hello.c factorial.c
OBJECTS=$(SOURCES:.c=.o)
EXECUTABLE=hello

all: $(EXECUTABLE)

$(EXECUTABLE): $(OBJECTS)
    $(CC) $(OBJECTS) -o $@

.c.o:
    $(CC) $(CFLAGS) $< -o $@

clean:
    rm -f $(OBJECTS) $(EXECUTABLE)

Key Makefile concepts:

Concept Example Meaning
Variable CC=gcc reusable value
Use variable $(CC) dereference variable
Auto variable $@ target name e.g. main.o
Auto variable $< first dependency e.g. main.c
Substitution $(SOURCES:.c=.o) replace .c with .o in all filenames
Pattern rule .c.o: how to convert any .c to .o

$(SOURCES:.c=.o) explained:

SOURCES = main.c    hello.c    factorial.c
                ↓          ↓            ↓
OBJECTS = main.o    hello.o    factorial.o

Acts like find-and-replace for file extensions. Add a new .c file to SOURCES and OBJECTS updates automatically.

$< and $@ in pattern rule:

.c.o:
    $(CC) $(CFLAGS) $< -o $@
# expands to:
# gcc -c -Wall main.c -o main.o
#              ↑           ↑
#              $<          $@
#         (dependency)  (target)

.c.o: is a special suffix rule — not a regular target. Make recognizes two extensions as a conversion rule meaning "for ANY .c file that needs to become a .o file, use this rule." Modern equivalent:

%.o: %.c         # newer syntax, same meaning
    $(CC) $(CFLAGS) $< -o $@

Dependencies are files OR other targets:

make sees a dependency → is it a file that exists?
    YES → use it
    NO  → look for a target rule to create it

make flags:

Command Meaning
make looks for file named Makefile automatically
make -f MyMakefile use specific file (-f = file)
make clean run the clean target
make -f Makefile-2 clean run clean in specific makefile

Execution order make follows:

1. look at target
2. check dependencies exist
3. if dependency missing → find rule to create it
4. if .c newer than .o → recompile
5. link everything together

clean target:

clean:
    rm -f *.o hello     # ✅ correct — only removes .o files
    rm -rf *o hello     # ⚠️ dangerous — removes ANYTHING ending in letter o

Wildcard warning: *o means anything ending with the letter o (matches hello, video etc). *.o means anything ending with .o specifically. Always use *.o not *o.

On Linux vs Windows: The output of linking is just hello with no extension. On Windows it would be hello.exe. The -o flag names the output file.


Pointers

int x = 420;
int *pointer = &x;   // pointer holds the ADDRESS of x

Memory layout:

address:  0x31  0x32  0x33  0x34
value:    [   x = 420 (4 bytes)  ]
pointer = 0x31   ← holds x's address, not its value

argv is an array of pointers:

char* argv[]
// [] → array of
// *  → pointers to
// char → characters (strings)

argv[0] → points to "hello\0"
argv[1] → points to "5\0"
argv[2] → points to "10\0"

argv contains pointers, not the data itself. This is why arguments are always strings — argv stores pointers to character arrays, so even the number 5 is stored as the string "5".

Casting argv — common mistake:

(int)argv[3]      // casts memory ADDRESS → garbage large number
(int)*argv[3]     // gets first char's ASCII value → '5' becomes 53, not 5
atoi(argv[3])     // correctly parses string "5" → 5  ✅

Why printf("%s", argv[3]) works but cast doesn't: %s tells printf to follow the pointer and read characters. A cast just reinterprets the raw pointer address as an integer — no string parsing involved.


Memory Allocation

Three types:

| Type | Size known? | Lifetime | Example |
|---|---|---|---|
| Stack | yes (compile time) | during function call only | local variables |
| Static | yes (compile time) | entire program | global, static variables |
| Heap | no (decided at runtime) | you control it | malloc() |

Stack — LIFO (Last In First Out):

main() called       → added to stack
funcA() called      → added on top
funcB() called      → added on top
funcB() ends        → removed
funcA() ends        → removed
main() ends         → removed

Stack overflow happens when too many function calls fill the stack — usually from infinite recursion.

Static keyword:

void counter(){
    static int count = 0;  // keeps its value between calls
    int x = 0;             // resets every call (stack)
    count++;
    x++;
    printf("count=%d x=%d", count, x);
}
counter();  // count=1 x=1
counter();  // count=2 x=1
counter();  // count=3 x=1

Dynamic allocation functions (all in <stdlib.h>):

void *malloc(size_t bytes);          // allocate memory
void *calloc(size_t n, size_t size); // allocate + initialize to zero
void *realloc(void *ptr, size_t new_size); // resize existing allocation
void free(void *p);                  // release memory

sizeof(type) is an operator, not a function (and not from <stdlib.h>): it gives the byte size of a type, used to compute how many bytes to request.

malloc example:

int *ids = malloc(sizeof(int) * 40);  // space for 40 ints
ids[0] = 5;
free(ids);   // always free!

realloc — resize existing memory:

// safe pattern — always use a temp pointer
int *temp = realloc(ids, sizeof(int) * 10);
if(temp == NULL){
    free(ids);   // original still safe
} else {
    ids = temp;  // success
}

realloc behavior: If enough space exists next to current allocation it extends in place. Otherwise it allocates a new block, copies old data, frees old block — all automatically. Always assign to a temp pointer first in case it fails.

Memory leak: Forgetting free() means memory stays occupied even after the program doesn't need it — can slow down or crash the program over time.

static on a function — file scope:

static char** read_file_lines(...)

Means the function is only visible inside the same .c file. Used to:

  • hide internal helper functions (private vs public API)
  • prevent naming conflicts between files
  • signal intent — "this is internal plumbing"

C vs C++ static: In C, static on a function only means file scope restriction. In C++, static on a class member means it belongs to the class not an instance. C++ inherited the C meaning and added its own on top.


String Handling — string.h

#include <string.h>

| Function | Purpose | Safe version |
|---|---|---|
| strlen(str) | string length (not counting \0) | |
| strcpy(dest, src) | copy string | strncpy(dest, src, size) |
| strcat(dest, src) | concatenate strings | strncat(dest, src, size) |
| strcmp(a, b) | compare strings (0 = equal) | strncmp(a, b, n) |
| strchr(str, ch) | find character, returns pointer or NULL | |
| strstr(str, sub) | find substring, returns pointer or NULL | |
| strdup(str) | duplicate string on heap (must free!) | |
| memset(ptr, val, size) | fill memory with value | |
| memcpy(dest, src, size) | copy memory block | |

Always prefer n versions: strcpystrncpy, strcatstrncat. The n versions respect buffer size limits and prevent overflow.

atoi — ASCII to Integer (<stdlib.h>):

atoi("42")      // → 42
atoi("-5")      // → -5
atoi("3.14")    // → 3 (stops at decimal point)
atoi("abc")     // → 0 (can't convert)
atoi("42abc")   // → 42 (stops at first non-numeric)

Problem with atoi: Returns 0 for both "0" (valid) and "abc" (error) — can't tell them apart. Use strtol for safer conversion.

snprintf — safe string formatting:

char buffer[50];
snprintf(buffer, sizeof(buffer), "Name: %s Age: %d", name, age);

Never use sprintf — it doesn't check buffer size and causes overflow. Always use snprintf which stops at the size limit.


File Handling — stdio.h

Opening files:

FILE* file = fopen(filename, mode);

| Mode | File exists | File missing | Notes |
|---|---|---|---|
| "r" | opens ✅ | NULL ❌ | read only |
| "w" | overwrites ⚠️ | creates ✅ | write |
| "a" | adds to end ✅ | creates ✅ | append |
| "r+" | opens ✅ | NULL ❌ | read + write |

"w" deletes existing content — if the file exists all its content is wiped. Use "a" if you want to keep existing content.

fgets — reading line by line:

char line[256];
while(fgets(line, sizeof(line), file)){
    printf("%s", line);
}

How fgets works:

  • reads one line at a time
  • stops at \n, end of file, or buffer size limit — whichever comes first
  • includes the \n in the result
  • returns NULL at EOF
  • each call automatically advances to next line (file pointer moves forward)

fgets stops at whichever comes first:

\n  (end of line)    → stops, includes \n in buffer
256 (buffer full)    → stops WITHOUT \n — line was too long!
EOF                  → returns NULL

If line is longer than buffer:

first fgets  → reads 255 chars (reserves 1 for \0)
second fgets → reads remaining chars

To detect truncation:

if(line[strlen(line)-1] != '\n'){
    // line was longer than buffer!
}

Use 1024 for safety — SRT subtitle files usually have short lines but 1024 is a common standard buffer size.

How fgets remembers position — the FILE pointer:

The FILE struct internally stores a position indicator — a byte number saying "I am currently at byte X in the file."

fopen()          // position = 0 (start)
fgets() call 1   // reads bytes 0-5 → position = 6
fgets() call 2   // reads bytes 6-11 → position = 12
fgets() call 3   // reads bytes 12-15 → position = 16 (EOF)

Useful position functions:

rewind(file);              // reset position to byte 0
ftell(file);               // get current position number
fseek(file, 0, SEEK_SET);  // jump to start (same as rewind)
fseek(file, 0, SEEK_END);  // jump to end

fprintf — writing to files or streams:

printf("Hello\n");                 // → stdout
fprintf(stdout, "Hello\n");        // → stdout (same)
fprintf(stderr, "Error!\n");       // → stderr
fprintf(file, "Hello\n");          // → file

printf is just fprintf(stdout, ...) — the f in fprintf stands for file, as it was originally designed to print to any file/stream.

Always use stderr for error messages:

fprintf(stderr, "Error: File does not exist.\n");  // ✅
printf("Error: File does not exist.\n");            // ❌ goes to stdout

This way error messages still appear in terminal even when stdout is redirected to a file.

The f-function family:

| Function | f stands for | Purpose |
|---|---|---|
| printf | | print to stdout |
| fprintf | file | print to any stream/file |
| scanf | | read from stdin |
| fscanf | file | read from any stream/file |
| sprintf | string | print to string buffer (dangerous!) |
| snprintf | string+n | print to string buffer with size limit (safe) |
| sscanf | string | read from string buffer |

Common file handling pattern:

// read from one file, write to another
FILE* input  = fopen(input_filename,  "r");
FILE* output = fopen(output_filename, "w");

if(input == NULL){
    fprintf(stderr, "Error: Input file does not exist.\n");
    return 1;
}
if(output == NULL){
    fprintf(stderr, "Error: Could not create output file.\n");
    fclose(input);   // close input before returning!
    return 1;
}

char line[256];
while(fgets(line, sizeof(line), input)){
    fprintf(output, "%s", line);
}

fclose(input);
fclose(output);

Always close files before returning — even in error paths. Not closing causes resource leaks.


GDB Debugging

gcc -g -o myprogram myprogram.c   # compile with debug info (-g flag required)
gdb ./myprogram                    # start debugger

Most used GDB commands:

| Command | Short | Purpose |
|---|---|---|
| run | r | start program |
| run arg1 arg2 | r arg1 arg2 | start with arguments |
| break 10 | b 10 | breakpoint at line 10 |
| break main | b main | breakpoint at function |
| info break | | show all breakpoints |
| delete 1 | | remove breakpoint 1 |
| delete | | remove all breakpoints |
| next | n | next line (steps over function calls) |
| step | s | next line (enters function calls) |
| finish | | run until current function returns |
| continue | c | continue to next breakpoint |
| print x | p x | print variable value |
| set var x = 5 | | assign value to variable |
| watch x | | stop when x changes |
| info watch | | show watched variables |
| backtrace | where | show call stack |
| frame | | show current function and line |
| list | l | display source code |
| list main | l main | show code for specific function |
| quit | q | exit gdb |

next vs step: next skips over function calls, step goes inside them.


Common Mistakes in C

// 1. semicolon after function body
int foo(){ }; // ❌ unnecessary (structs need it, functions don't)
int foo(){ }  // ✅

// 2. forgetting & in scanf
scanf("%d", x);   // ❌ crashes
scanf("%d", &x);  // ✅

// 3. message inside scanf
scanf("enter: %d", &x);  // ❌ message ignored, confusing
printf("enter: ");
scanf("%d", &x);          // ✅

// 4. wrong width specifier
printf("%6s", str);   // ❌ if you want truncation
printf("%.6s", str);  // ✅ truncates to 6 chars

// 5. casting argv instead of parsing
(int)argv[1]      // ❌ casts memory address → garbage
atoi(argv[1])     // ✅ parses string to integer

// 6. forgetting rewind after counting lines
while(fgets(...))count++;   // file pointer now at EOF
// must call rewind(file) before reading again!

// 7. not closing file before early return
if(n > count){
    return 1;           // ❌ file never closed!
}
if(n > count){
    fclose(file);       // ✅ close before return
    return 1;
}

// 8. sprintf instead of snprintf
sprintf(buf, "%s", str);              // ❌ dangerous, buffer overflow
snprintf(buf, sizeof(buf), "%s", str); // ✅ safe

// 9. using *o instead of *.o in makefile clean
rm -rf *o hello    // ❌ deletes anything ending in letter o
rm -f *.o hello    // ✅ only deletes .o files

// 10. forgetting free
int* p = malloc(sizeof(int) * 40);
// ... use p ...
// forgot free(p) → memory leak ❌
free(p);  // ✅ always free

// 11. using realloc directly on original pointer
ids = realloc(ids, new_size);   // ❌ if fails, ids becomes NULL, original lost
int* temp = realloc(ids, new_size);  // ✅ safe pattern
if(temp) ids = temp;

C vs C++ Quick Comparison

| Feature | C | C++ |
|---|---|---|
| bool type | needs <stdbool.h> | built in |
| malloc return | auto converts void* | must cast (int*)malloc(...) |
| static on function | file scope only | file scope OR class member |
| Standard library | <stdio.h>, <stdlib.h> | also <iostream>, std::string etc. |
| Compile with | gcc | g++ |
| String type | char* arrays manually | std::string |

C++ is almost a superset of C: nearly all C code is also valid C++ (with a few exceptions, like implicit void* conversions), and C++ adds classes, templates, the STL, and more. Use g++ to compile C++ source code.


Naming Reference — C Functions

| Name | Stands For |
|---|---|
| argc | Argument Count |
| argv | Argument Values (Vector) |
| printf | Print Formatted |
| scanf | Scan Formatted |
| fprintf | File Print Formatted |
| fgets | File Get String |
| fopen | File Open |
| fclose | File Close |
| malloc | Memory Allocate |
| realloc | Re-Allocate |
| calloc | Clear Allocate (zeroed) |
| atoi | ASCII To Integer |
| snprintf | String N Print Formatted |
| strlen | String Length |
| strcpy | String Copy |
| strcmp | String Compare |
| strcat | String Concatenate |
| strchr | String Character (find) |
| strstr | String String (find substring) |
| strdup | String Duplicate |
| memset | Memory Set |
| memcpy | Memory Copy |
| rewind | Rewind (back to start) |
| ftell | File Tell (current position) |
| fseek | File Seek (jump to position) |
| apt | Advanced Package Tool |
| gcc | GNU Compiler Collection (originally GNU C Compiler) |
| gdb | GNU Debugger |

🔧 Operating Systems — Process Management

A deep-dive into UNIX process management, system calls, virtual memory, scheduling, and file handling under the hood. Built from real questions, common confusions, and first-principles thinking.


📚 Table of Contents

  1. What is a Process?
  2. Program vs Process
  3. Process Memory Layout
  4. Static Variables and the Data Segment
  5. Virtual Memory — Why It Exists
  6. Page Tables
  7. Process States in UNIX
  8. Process Creation — 4 Ways
  9. Process Termination
  10. PCB — Process Control Block
  11. System Calls — The Bridge to the Kernel
  12. fork() — Cloning a Process
  13. wait() — Synchronizing with Children
  14. exit() — Terminating a Process
  15. execl() — Replacing a Process
  16. nice() — Scheduling Priority
  17. Orphan and Zombie Processes
  18. The UNIX Scheduler
  19. ps and top Commands
  20. fopen Under the Hood
  21. fork() + Files — What Gets Shared
  22. fork() + Heap Memory — COW
  23. Lab Code Walkthroughs
  24. Common Mistakes in Process Management
  25. Naming Reference — OS Functions

29. What is a Process?

A process is a running instance of a program. The OS keeps track of every process using a process table — each process gets a unique PID (Process ID).

Program on disk (passive)   →   Process in RAM (active)
exe file sitting there      →   Running instance with memory, state, identity

Each process has 5 components:

| Component | What it holds |
|---|---|
| Code | The instructions to execute |
| Data | Global and static variables |
| Stack | Temporary data, function calls, local variables |
| User Area | Open files, signal handlers, CPU info |
| Page Table | Virtual → physical memory translation map |

30. Program vs Process

Common confusion: "Aren't a program and a process both just code?"

No. A program is passive — just bytes sitting on disk. A process is the program brought to life in RAM with real resources.

Program = recipe (just text, does nothing)
Process = actually cooking (uses stove, ingredients, time)

Why you need a process and can't just "run the code" directly:

The code alone has no context. A process gives the code:

  • Its own memory space — where variables live
  • A stack — to track function calls
  • A state — running? sleeping? waiting?
  • An identity — PID, owner, permissions
  • Resources — open files, I/O, signals

The same program can become multiple processes:

chrome.exe on disk = one file
Open Chrome 3 times = 3 separate processes
Each with own tabs, memory, state
All from the same single program file

Does the process edit the original program on disk?

Never. The OS copies needed parts into RAM and works entirely from that copy. The disk file stays untouched — it's read-only from the process's perspective.

If you change a username in an app and it affects other sessions — that's not the process editing the code. It's the process writing to a shared database or file on disk. The code itself is never touched. Other processes see the change because they all read from the same data source.


31. Process Memory Layout

Process Virtual Memory:
┌─────────────────┐  ← high address
│   Stack         │  temporary, grows downward
│   (local vars)  │  dies when function returns
├─────────────────┤
│   Heap          │  dynamic (malloc), grows upward
├─────────────────┤
│   Data Segment  │  global + static variables
│                 │  lives entire program lifetime
├─────────────────┤
│   Code Segment  │  the executable instructions
└─────────────────┘  ← low address

Stack vs Heap vs Data:

int globalCounter = 0;    // DATA segment — lives entire program
static int y = 10;        // DATA segment — lives entire program

int main() {
    int x = 5;            // STACK — dies when main() returns
    int* p = malloc(100); // HEAP  — lives until free() is called
}

User Area contains:

  • Which files are currently open (open file table)
  • Signal handling rules (what to do on kill, alarm, etc.)
  • CPU register values (saved when process is switched out)

Page Table maps virtual addresses → physical RAM addresses (explained in section 34).


32. Static Variables and the Data Segment

Common confusion: "I put int x = 5 inside main() — why is it on the stack not the data segment?"

It's not about where in the file you wrote it. It's about lifetime.

int globalCounter = 0;    // data segment — lives entire program

int main() {
    int x = 5;            // STACK — temporary, dies when main returns
    static int y = 10;    // DATA  — lives entire program despite being inside main!
}

The static keyword forces a variable into the data segment regardless of where it's declared.

Why static exists — the problem it solves:

// WITHOUT static — resets every call:
void countClicks() {
    int counter = 0;   // created fresh every call
    counter++;
    printf("%d", counter);  // always prints 1!
}

// WITH static — remembers between calls:
void countClicks() {
    static int counter = 0;  // created once, stays alive
    counter++;
    printf("%d", counter);  // prints 1, 2, 3...
}

Static vs Global:

Global: anyone anywhere can access and accidentally modify it ❌
Static: persists like a global BUT locked to its own scope ✓
int counter = 0;  // global — any function can reset this!

void someOtherFunction() {
    counter = 0;  // oops, accidentally reset it
}

Static gives you persistence with protection — the best of both worlds.

Static in classes (C++):

class Player {
public:
    static int playerCount;      // declared here, shared across ALL instances
    Player() { playerCount++; }
};
int Player::playerCount = 0;     // defined and initialized once, outside the class

Player p1;  // playerCount = 1
Player p2;  // playerCount = 2
// same concept — one value living for entire program lifetime

The rule: static = "I need a permanent apartment, not a temporary Airbnb room." Goes to data segment, regardless of where you declared it. But the scope (who can access it) stays local.


33. Virtual Memory — Why It Exists

Common question: "Why can't I just give each process real physical addresses directly?"

Early computers did exactly that — and it was a disaster. Here's why virtual memory was invented:

Problem 1 — Security

Without virtual memory:
Process 1 (bank app) at address 0x001
Process 2 (virus) just does:
int* steal = (int*)0x001  → reads bank app memory directly!

With virtual memory, each process only sees its own fake address space. The OS controls the page table — no process can reach another's memory. It's physically impossible at the hardware level.

Problem 2 — Programs need to be compiled to specific addresses

Without virtual memory, every program would need to know at compile time exactly where in RAM it will be loaded. Impossible because:

Today you run: Chrome + VSCode + Spotify
Tomorrow: Chrome + Discord + Game
Next day: just VSCode

Different combinations → different available spaces →
you'd have to recompile every program every time!

With virtual memory, every program compiles assuming it starts at address 0. The page table maps it to wherever RAM is free.

Problem 3 — Fragmentation

RAM without virtual memory:
├── 0x000 - 0x100  USED
├── 0x100 - 0x200  FREE (100 units)
├── 0x200 - 0x400  USED
├── 0x400 - 0x500  FREE (100 units)
└── 0x500 - 0x600  FREE (100 units)

New process needs 250 units of CONTIGUOUS memory → FAILS!
Even though 300 units are free total.

With pages, free frames can be scattered anywhere but still appear contiguous to the process.

Problem 4 — RAM is too small

Without virtual memory:
RAM = 4GB, Program = 6GB → simply cannot run

With virtual memory + swapping:
OS loads only needed pages into RAM
Pushes unused pages to disk
Program runs fine!

The key insight

The process NEVER knows its physical address.
        ↓
OS sits in the middle controlling everything through the page table.
        ↓
A process has NO mechanism to ask "what is my physical address?"
        ↓
The CPU itself enforces this at hardware level.

Virtual memory in one sentence: Every process lives in a completely isolated world controlled entirely by the OS, with no escape route to physical memory whatsoever.


34. Page Tables

The problem: Process thinks linearly. RAM is scattered.

int arr[3];
arr[0]  // address 100
arr[1]  // address 104  (just +4)
arr[2]  // address 108  (just +4)

The CPU calculates next address by simple addition. It has no idea how to jump around scattered physical locations. Virtual memory creates the illusion of contiguity.

How pages work:

Virtual Memory          Physical RAM
┌──────────┐            ┌──────────┐
│ Page 1   │ ────────→  │ Frame 5  │
├──────────┤            ├──────────┤
│ Page 2   │ ────────→  │ Frame 2  │
├──────────┤            ├──────────┤
│ Page 3   │ ────────→  │ Frame 9  │
└──────────┘            └──────────┘

- Virtual memory divided into pages
- Physical RAM divided into frames
- Page table maps pages → frames
- Pages don't need to be contiguous in RAM!

Dynamic memory with page tables:

Process needs MORE memory:
OS finds any free frames anywhere in RAM
Adds new virtual pages → mapped to those frames
Process sees contiguous virtual addresses
Nobody else affected

Process needs LESS memory:
OS unmaps pages
Frames returned to free pool
Available for other processes immediately

Swapping — when RAM is full:

RAM is full, process needs more
        ↓
OS finds a page not used recently
Moves it from RAM → disk
That frame is now free
        ↓
When process needs that page back:
OS loads from disk → any free frame
Updates page table
Process has no idea this happened!

The process's perspective:

"I have pages 1, 2, 3, 4, 5"
"They all feel contiguous to me"
"Some might be in RAM, some on disk — I don't care"

Reality: completely scattered, some on disk, OS juggling everything

Virtual memory in one line: "Don't worry about the physical place — here are some contiguous virtual pages and I'll handle them."


35. Process States in UNIX

fork()
  ↓
Ready/Runnable  ←─────────────────────┐
  ↓ dispatched                         │ preempted
Running ─────────────────────────────→─┘
  │
  ├── Blocked/Waiting  (sleeping, waiting for event)
  │         ↓ event occurs
  │       Ready again
  │
  └── Exited/Terminated
            ↓
          Zombie ──── cleaned by wait() ────→ gone
            ↓ if parent dies first
          Orphan
            ↓ reparented to init (PID 1)
          init calls wait() → cleaned

| State | Meaning |
|---|---|
| Running | Currently using the CPU |
| Ready | Could run anytime, waiting for CPU |
| Sleeping | Waiting for an event (like I/O or wait()) |
| Stopped | Frozen by a signal |
| Zombie | Finished but exit code not collected by parent yet |
| Orphan | Parent died before child finished |

36. Process Creation — 4 Ways

1. System initialization
   └── OS boots → creates init (PID 1) + system services
       before you even see the desktop

2. fork() system call
   └── a running process clones itself
       most common in UNIX

3. User request
   └── you double-click an app or type ./myprogram
       OS creates a process for it

4. Batch job
   └── scheduled task runs automatically at set time
       like a cron job — nobody clicked anything

What happens internally for any creation:

1. Create new PCB entry → assigns unique PID
2. Allocate memory → sets up code, stack, data, user area, page table
3. Copy parent info (if forked) → child inherits parent's environment
4. Add to process table → scheduler can now see it
5. Return PID → parent gets child PID, child gets 0

Process hierarchy:

init (PID 1)              ← created at boot, no parent
├── bash (PID 14)         ← your terminal
│   ├── chrome (PID 100)
│   └── myprogram (PID 101)
│       └── child (PID 102)
└── system services

Every single process traces back to init (PID 1). It's the only process with no parent.


37. Process Termination

4 ways a process can end:
├── Normal exit      → return 0 from main, or exit(0)
├── Error exit       → exit(1) or some non-zero code
├── Fatal error      → segfault, divide by zero, unhandled signal
└── Killed           → kill -9 <pid> from another process

38. PCB — Process Control Block

The PCB is a data structure the OS keeps for every process — the process's complete profile.

PCB for Process 101:
├── PID          → 101
├── PPID         → 100 (parent)
├── State        → running/sleeping/zombie
├── CPU registers → exactly where it was when switched out
├── Program counter → which instruction is next
├── Stack pointer → where its stack is
├── Page Table   → virtual → physical memory map
├── Open files   → which files it has open
├── Signal handlers → what to do on signals
└── Priority/nice → how much CPU time to give it

Why PCB is essential — context switching:

Chrome running on CPU
    ↓
Scheduler: "time's up, Spotify's turn"
    ↓
OS saves Chrome's ENTIRE state to Chrome's PCB:
├── which instruction was executing
├── all register values
└── stack pointer
    ↓
Load Spotify's state from Spotify's PCB
    ↓
Spotify runs as if nothing happened
    ↓
... repeats hundreds of times per second

Without PCB the OS would have nowhere to save state — like waking up with amnesia every time a process gets CPU back.

fork()   → creates new PCB
wait()   → removes child's PCB after it exits
exit()   → marks PCB as zombie until parent collects
nice()   → updates priority field in the PCB
ps/top   → reads from the process table (all PCBs)

Analogy: PCB is a hospital patient's file. Without it the doctor (OS) has no idea what was happening and has to start from scratch every time.


39. System Calls — The Bridge to the Kernel

Processes run in User Mode — restricted, can only access their own data. But sometimes they need to do things only the kernel can do (create processes, read files, allocate memory).

A system call is the formal mechanism to ask the kernel for help:

Process (user mode)
    ↓
"hey kernel, I need something"  ← system call
    ↓
CPU switches to kernel mode
Kernel validates the request
Kernel does the privileged work
CPU switches back to user mode
Result returned to process

Analogy: You can't go behind the bank counter yourself. You fill out a form (system call), the teller (kernel) does the work behind the barrier, and hands you back the result.

Why system calls are safe:

You (process)  → never enter kernel mode yourself
Kernel         → validates BEFORE doing anything
CPU hardware   → physically enforces the boundary
Result         → either success or safe -1 (failure)

System call vs normal function:

Normal function:  runs inside your process memory, user mode
System call:      crosses into kernel mode, privileged work
                  fork, wait, execl, exit, nice, open, read...

40. fork() — Cloning a Process

fork() creates an exact copy of the current process. One process goes in, two come out.

pid_t pid = fork();

if(pid == -1) {
    // fork FAILED — no child created
} else if(pid == 0) {
    // I am the CHILD — fork returned 0 to me
} else {
    // I am the PARENT — fork returned child's actual PID
}

Common confusion: "Shouldn't the child's PID be 0?"

No! The 0 is just fork's return value — a flag saying "you are the child." The child's actual PID is still a real number. getpid() gives the real PID.

if(pid == 0) {
    printf("%d", pid);      // prints 0  (fork's return value, just a flag)
    printf("%d", getpid()); // prints 101 (actual real PID!)
}

What gets copied:

Parent Process:              Child Process (copy):
├── Code          →          ├── Code          (same)
├── Stack         →          ├── Stack         (same values)
├── Data          →          ├── Data          (same values)
├── Heap          →          ├── Heap          (independent copy via COW)
├── User Area     →          ├── User Area     (same)
├── Page Table    →          ├── Page Table    (own copy)
└── PID = 100     →          └── PID = 101     (different!)
                                 PPID = 100    (parent's PID)

After fork — completely independent:

int x = 10;
pid_t pid = fork();

if(pid == 0) {
    x = 999;           // child changes x
    printf("%d", x);   // prints 999
} else {
    printf("%d", x);   // prints 10! parent unaffected
}

Variables declared before fork are already in the child's memory — fork copies the entire snapshot at that moment. The child wakes up with everything already there, not a blank slate.

Why fork exists:

// Web server pattern — handles multiple users simultaneously:
while(1) {
    wait_for_connection();

    if(fork() == 0) {
        handle_this_user();  // child handles ONE user
        exit(0);
    }
    // parent loops back immediately to accept next user
}
Without fork: User 1 → wait → User 2 → wait → User 3 (sequential, unusable)
With fork:    User 1 → child 1
              User 2 → child 2    ← all simultaneously!
              User 3 → child 3

Forking in a loop for parallel work:

pid_t children[n];

// spawn all workers
for(int i = 0; i < n; i++) {
    children[i] = fork();
    if(children[i] == 0) {
        do_work(chunk[i]);
        exit(result);
    }
}

// collect results IN ORDER using waitpid
for(int i = 0; i < n; i++) {
    waitpid(children[i], &status, 0);  // wait for SPECIFIC child
    results[i] = WEXITSTATUS(status);
}

Never put waitpid inside the fork loop — it makes everything sequential. Fork all first, then collect. The whole point is getting all workers running simultaneously before collecting.

fork() is not dangerous because:

fork() only:
├── copies YOUR OWN process   → no other process touched
├── allocates new memory      → kernel controls safely
├── creates new PCB           → kernel owns process table
└── returns two integers      → harmless

41. wait() — Synchronizing with Children

wait() makes the parent block until a child finishes, then collects its exit code and cleans up its PCB entry.

pid_t var_1 = wait(&var_2);
// var_1 = PID of child that finished
// var_2 = raw status (contains exit code packed in)

What happens internally:

parent calls wait()
        ↓
kernel blocks parent: "sleep until a child finishes"
        ↓
child calls exit(42)
        ↓
kernel wakes parent: "your child just finished!"
        ↓
kernel removes child's PCB from process table
        ↓
returns to parent:
├── child's PID → var_1
└── exit status → var_2

The status byte structure:

var_2 (raw integer from wait):
[  exit code (bits 8-15)  ]  [  how it exited (bits 0-7, the low byte)  ]

low byte = 0 → exited normally (called exit() or return)
low byte ≠ 0 → killed by signal

stat_loc & 0x00FF  → isolates the low byte (check if normal exit)
stat_loc >> 8      → extracts the exit code (discards the low byte)

Macros that do the same thing:

// manual:
if(!(stat_loc & 0x00FF))
    printf("%d", stat_loc >> 8);

// using macros (identical result):
if(WIFEXITED(stat_loc))
    printf("%d", WEXITSTATUS(stat_loc));

WIFEXITED = "Wait — If — Exited normally?" → yes/no
WEXITSTATUS = "Wait — Exit — Status" → the actual exit code

Always check WIFEXITED before WEXITSTATUS — no point reading exit code if process was killed by signal.

wait() handles both cases:

Child still running when wait() called:
→ parent blocks until child finishes

Child already finished before wait() called:
→ OS saved the exit code in the zombie
→ wait() returns IMMEDIATELY with saved status
→ no blocking needed!

waitpid() — wait for specific child:

waitpid(child2, &status, 0);  // only waits for child2
// child1 and child3 finishing → parent ignores them
// child2 finishes → parent wakes up!

Without wait — zombie problem:

Child finishes
        ↓
OS: "hold on, parent might need your exit code"
        ↓
Keeps child's PCB in process table as ZOMBIE
        ↓
Parent never calls wait() → zombie stays forever
        ↓
Process table fills up → system slows → eventually crashes

Zombie vs Orphan:

  • Zombie = child died, parent hasn't called wait() yet
  • Orphan = parent died, child still running → reparented to init

When parent dies with unresolved zombies:

Parent dies
        ↓
init (PID 1) inherits all zombies
        ↓
init automatically calls wait() → cleaned up!

Zombies are only dangerous in long-running processes (servers) that fork many children but never call wait() — zombies accumulate until the process table is full.


42. exit() — Terminating a Process

exit(42);  // sends 42 as exit code to parent

What exit() does:

1. Runs atexit() handlers
2. Flushes stdio buffers
3. Closes all open file descriptors
4. Releases all memory
5. Sends SIGCHLD to parent
6. Passes exit code to parent via wait()
7. Removes from process table (or becomes zombie if parent not listening)

_exit() vs exit(): _exit() skips steps 1-2 (no buffer flushing, no atexit handlers). Use in child after fork to avoid double-flushing stdio buffers.

Exit code limitation:

exit code is only 8 bits → max value 255!

exit(300)           → truncated to 44  (300 % 256 = 44)
WEXITSTATUS = 44    → WRONG!

If your result can exceed 255, use pipes or shared memory instead of exit codes.


43. execl() — Replacing a Process

execl() replaces the current process with a completely different program. Not a copy — the process itself gets overwritten.

execl("/bin/ps", "ps", "-e", NULL);
//     path       name  flags  end
| Parameter     | Meaning                                           |
|---------------|---------------------------------------------------|
| First         | Absolute path to binary on disk                   |
| Second        | Name of the program (usually same as binary name) |
| Middle params | Any flags, each in `""` separated by commas       |
| Last          | Always `NULL` — marks end of arguments            |

What happens:

execl() called
        ↓
kernel finds /bin/ps on disk
        ↓
loads ps into THIS process's memory:
├── replaces code segment   ← your code is GONE
├── replaces data segment   ← your variables are GONE
└── replaces stack          ← your stack is GONE
        ↓
starts executing ps from the beginning

What stays the same:

REPLACED:                   KEPT:
├── code segment            ├── PID (same process!)
├── data segment            ├── PPID
├── stack                   ├── open files
└── heap                    └── user area

Never returns on success:

printf("before execl\n");
execl("/bin/ps", "ps", NULL);
printf("after execl\n");   // THIS NEVER PRINTS!
// because the process that would print it no longer exists

Only returns if it fails (wrong path, no permission) — returns -1.

This is why execl is always used WITH fork:

if(fork() == 0) {
    execl("/bin/ps", "ps", "-f", NULL);  // child gets replaced by ps
} else {
    wait(NULL);  // parent survives, waits for ps to finish
}
Without fork: your program → execl → your program GONE
With fork:    parent → survives
              child  → execl → replaced by ps → runs → dies
              parent → wakes from wait → continues

This is literally how your terminal runs every command:

bash (parent)
    ↓ fork()
child → execl("ls") → replaced by ls → runs → dies
    ↓ wait()
bash (parent) → ready for next command

44. nice() — Scheduling Priority

nice(10);   // lower priority by 10
nice(-5);   // raise priority by 5 (superuser only!)

nice value range:
-20 (highest priority, superuser only)
  0 (default, normal)
 19 (lowest priority)

higher nice = more "generous" = gives CPU to others = less CPU for you
lower nice  = more "selfish"  = takes CPU = more CPU for you

getpriority() — read current nice value:

#include <sys/resource.h>

getpriority(PRIO_PROCESS, 0)
// PRIO_PROCESS = get priority of a process
// 0 = this process itself
// returns current nice value

In ps -l output:

PID   PR   NI   CMD
100   20    0   process08   ← default
100   35   15   process08   ← after nice(15)

PR = PR_base + NI   (NI affects actual scheduling priority)

45. Orphan and Zombie Processes

Orphan

Parent dies before child finishes
        ↓
Child still running, parent is gone
        ↓
OS automatically reparents child to init (PID 1)
        ↓
init eventually calls wait() → cleans up properly

Detecting orphan in code:

sleep(3);
printf("parent PID: %d\n", getppid());
// if this prints 1 → you are an orphan!

Why orphans are a problem:

├── Nobody managing the child anymore
├── Resource leaks (open files, connections)
├── If child finishes → becomes zombie under init
├── Loss of control over running process
└── Accumulation can fill process table

Zombie

Child finishes
        ↓
Parent alive but not calling wait()
        ↓
OS keeps child's PCB as zombie:
├── code not running   → dead
├── memory freed       → dead
└── PCB still in table → NOT fully gone
    exit code preserved → waiting for parent

$ ps
PID    STATE    CMD
101    Z        myprogram   ← Z = ZOMBIE

Why zombie exists: OS is being cautious — doesn't want to throw away the exit code in case parent needs it.

Resolution:

Parent calls wait()     → zombie cleaned immediately
Parent never calls wait() → zombie stays until parent dies
Parent dies             → init inherits zombie → init cleans it

Key insight: A living but inattentive parent is worse than a dead parent. Dead parent → init takes responsibility. Alive parent → OS waits for parent to call wait(). If parent sleeps for 10 seconds first, child is a zombie for those 10 seconds.


46. The UNIX Scheduler

UNIX uses multi-level priority + Round Robin within each level.

Multi-level priority:

Priority 0  (highest) → critical system processes
Priority 10           → normal user processes
Priority 15  (lowest) → background tasks

Higher priority runs STRICTLY FIRST:
└── lower priority doesn't get CPU while higher is ready

Round Robin within same level:

Priority 10 queue:
Chrome    → runs 2ms → back of queue
Spotify   → runs 2ms → back of queue
VSCode    → runs 2ms → back of queue
Chrome    → runs 2ms → ...repeats

Each gets a fair time slice. Nobody starves.

Simultaneous or sequential?

Different priority levels → sequential (higher goes first)
Same priority level       → simultaneous illusion (round robin)

In reality, processes at similar priorities all feel simultaneous because the switching happens hundreds of times per second — faster than human perception.


47. ps and top Commands

$ ps          # only YOUR processes in current terminal (static snapshot)
$ ps -e       # ALL processes on entire system (static snapshot)
$ ps -f       # full detailed view with extra columns
$ ps -ef      # full view of everything

$ top         # ALL system processes, live updating view

ps columns:

| Column | Meaning                          |
|--------|----------------------------------|
| PID    | Process ID                       |
| PPID   | Parent's PID                     |
| TTY    | Which terminal it's running in   |
| TIME   | CPU time consumed                |
| CMD    | Command that started it          |
| NI     | Nice value (from `ps -l`)        |
| STATE  | R=running, S=sleeping, Z=zombie  |

top shows extra:

%CPU  → how much CPU right now
%MEM  → how much RAM using
Tasks summary → total, running, sleeping, stopped, zombie count

ps = photo, top = live video

Most common real-world use:

# something is slow
$ top                    # find what's eating CPU

# kill a specific process
$ ps -e                  # find its PID
$ kill -9 <PID>          # force kill

# check if your program is running
$ ps -e | grep myprogram # filter output

48. fopen Under the Hood

When you call fopen("file.txt", "r"), two separate objects are created:

FILE struct (userspace)

Lives on your heap. Created by fopen(), returned as FILE*.

// roughly what FILE looks like:
struct _IO_FILE {
    int fd;              // the kernel fd integer (e.g. 3)
    char buffer[8192];   // userspace buffer (8KB)
    char* buf_pos;       // where we are in the buffer
    int buf_level;       // how full the buffer is
    int flags;           // eof, error, etc.
};

This is what adds buffering on top of raw file access. Actual syscalls only fire when the buffer fills/empties — not on every character.

struct file (kernel space)

Lives in kernel memory. You never touch it directly — only through syscalls.

struct file contains:
├── offset/cursor    ← current read/write position
├── open flags       ← O_RDONLY, O_APPEND, etc.
├── reference count  ← how many fds point to it
└── pointer to inode ← the actual file on disk

inode (filesystem/disk)

The file itself. Exists on disk whether anyone has it open or not.

inode contains:
├── file size
├── permissions (rwx)
├── owner (UID, GID)
├── timestamps
└── pointers to data blocks on disk

The full hierarchy:

Your code
   │
   ↓
FILE* f  ──→  [FILE struct — your heap]
                  fd = 3
                  buffer = [...]
                       │ syscall (read, write, lseek)
                       ↓
              [struct file — kernel memory]   ← you never touch this
                  offset = 512
                  refcount = 1
                       │
                       ↓
              [inode — filesystem on disk]
                  size, permissions, data blocks

fd — file descriptor

Just an integer index into the process's fd table. 0, 1, 2 are always stdin, stdout, stderr. Every open() gives you the next available integer.

fd = 3  →  just a number
           OS uses it as a key to look up the struct file in kernel

fclose() — what it does:

1. Flushes stdio buffer → write() syscall (unwritten data saved)
2. Calls close(fd) → kernel decrements refcount on struct file
3. Frees the FILE struct from heap
4. If refcount hits 0 → kernel frees struct file
   (inode stays alive on disk regardless)

Reference counting:

fopen()       → refcount = 1
fork()        → refcount = 2  (parent + child both hold fd)
parent close  → refcount = 1
child close   → refcount = 0  → struct file freed

49. fork() + Files — What Gets Shared

The critical difference between memory and files after fork:

Memory (heap, stack, etc.) → COPIED (independent per process via COW)
FILE* / file descriptor    → SHARED (same struct file in kernel)

What happens to a file opened before fork:

FILE* file = fopen("data.txt", "r");  // opened BEFORE fork
pid = fork();

Parent FILE struct → fd 3 ──→ ┐
                               ├── SAME struct file (kernel)
Child  FILE struct → fd 3 ──→ ┘    shared cursor! refcount = 2

The FILE struct is COPIED (it's on your heap) but the struct file (kernel object) is NOT copied — just refcount incremented. The cursor is in the kernel object — so it's shared.

The shared cursor problem:

// file has: "Hello World"
// parent and child share the same cursor, starting at position 0

if(pid == 0) {
    fgets(buffer, 6, file);  // child reads "Hello" (5 chars), cursor → 5
}
else {
    fgets(buffer, 6, file);  // parent reads " Worl": cursor was already at 5!
}

Solutions:

// Option 1: open file AFTER fork (each gets an independent struct file)
pid = fork();
if(pid == 0) { FILE* f = fopen("data.txt", "r"); }

// Option 2: close unused copy immediately after fork
pid = fork();
if(pid == 0) {
    fclose(file);    // child doesn't need it → close
    // do other work
}

Why your code works despite the shared cursor:

Parent reads file BEFORE fork each iteration:
    fgets() → parent fills orders_arr
    THEN fork happens
    Child inherits cursor position BUT never reads file
    Child just closes its reference (refcount decrements)
    Parent continues reading next chunk normally

The parallelism is in computation, not I/O. Parent serializes all I/O, children only work on already-read data.

fclose() in child — do you need it?

Calling fclose() in child:
└── decrements refcount → from 2 to 1
    parent's reference unaffected
    file still open for parent ✓

NOT calling fclose() in child:
└── exit() handles it anyway → refcount decremented on exit
    no real difference

Best practice: In child processes that exit immediately, you don't need to manually fclose(). exit() handles all cleanup. Use _exit() instead of exit() in children after fork to avoid double-flushing stdio buffers.


50. fork() + Heap Memory — COW

Copy-on-Write (COW) — fork doesn't actually copy all memory immediately. It's expensive. Instead:

fork() called:
parent page → [physical page A]  ← both point here, read-only
child page  → [physical page A]

child writes to a variable:
parent page → [physical page A]  ← parent unchanged
child page  → [physical page B]  ← kernel copied ONLY this page

Forking is cheap until you start writing. Pages that neither process writes to are never duplicated — they stay as one shared physical page.

For malloc'd memory:

int* ptr = malloc(sizeof(int));
*ptr = 42;
fork();

After fork:
Parent virtual 0x500 → page table A → physical frame A (value: 42)
Child  virtual 0x500 → page table B → physical frame A (value: 42) ← same!

Child writes *ptr = 99:
→ COW triggers
→ kernel copies frame A to frame B
→ Child  virtual 0x500 → frame B (value: 99)
→ Parent virtual 0x500 → frame A (value: 42) ← unaffected!

Do you need to free() in the child?

Child never writes to malloc'd memory:
→ COW never triggers
→ one shared physical page (no duplication)
→ no need to free (exit() handles it)

Child writes to malloc'd memory:
→ COW triggers → physical page duplicated
→ child owns its own copy
→ theoretically should free, but exit() handles it anyway

Best practice: just exit() — let the OS clean up
Don't call free() in child — you're paying COW cost for zero benefit
(free() modifies heap metadata → triggers COW on that page → pointless copy)

free() — what it actually does:

1. Marks block as available in heap allocator (userspace, virtual)
2. May eventually call munmap() to release physical pages to kernel
   (or it might not — allocator often keeps pages for reuse)

So free() is primarily a userspace operation.
Physical memory release is a side effect that may or may not happen immediately.

51. Lab Code Walkthroughs

process01.c — fork basics + variable independence

int x = 3;
pid = fork();
// child:  x = 7  (only in child's memory)
// parent: x = 19 (only in parent's memory)
// proves: complete memory independence after fork

sleep(1) in parent — gives child time to finish so output isn't mixed up.

process02.c — orphan demonstration

// child sleeps 10s, parent has nothing to do and exits
sleep(10);
printf("parent PID: %d\n", getppid());  // prints 1! reparented to init

process03.c — wait + exit code collection

sid = wait(&stat_loc);
if(!(stat_loc & 0x00FF))              // low byte = 0? (normal exit)
    printf("%d", stat_loc >> 8);       // extract exit code (bits 8-15)

process04.c — custom exit code

exit(42);   // child sends specific exit code
// parent reads: stat_loc >> 8 = 42

process05.c — zombie demonstration

// REVERSED timing:
// child finishes immediately → becomes zombie
// parent sleeps 10 seconds → not listening!
// zombie visible in ps for 10 seconds
// parent wakes → calls wait() → zombie cleaned
sleep(10);
pid = wait(&stat_loc);

process06.c — execl demonstration

if(pid == 0) {
    execl("/bin/ps", "ps", "-e", NULL);  // child replaced by ps
    // child never reaches final printf — its code is gone!
}
// only parent reaches: printf("PID %d terminated\n")

process07.c — infinite loops + kill experiment

while(1) {}  // both parent and child loop forever
// purpose: observe with ps/top, practice kill command
// kill parent → child becomes orphan (PPID changes to 1)
// kill child → disappears from ps

process08.c — nice + priority

getpriority(PRIO_PROCESS, 0)  // read current nice value
system("ps -l");               // show NI column in ps
nice(5);                       // child: slightly lower priority
nice(15);                      // parent: much lower priority
while(1) {}                    // stay alive for observation

system("ps -l") output:
PID   NI   CMD
100    0   process08   ← before nice
100   15   process08   ← after nice(15) — much less CPU
101    5   process08   ← after nice(5)  — slightly less CPU

52. Common Mistakes in Process Management

// 1. forgetting wait() → zombie accumulation
fork();
// parent does nothing after fork
// child dies → zombie stays forever ❌
// fix: always call wait() or waitpid() ✓

// 2. wait() inside fork loop → kills parallelism
for(int i = 0; i < n; i++) {
    fork();
    wait(&status);  // ❌ blocks until child done before next fork
}
// fix: fork loop first, wait loop second ✓

// 3. exit code truncation for large values
exit(300);                    // ❌ truncated to 44 (300 % 256)
WEXITSTATUS(status) == 44;    // wrong answer!
// fix: use pipes or shared memory for values > 255

// 4. reading file in both parent and child (shared cursor)
FILE* f = fopen("file.txt", "r");
pid_t pid = fork();
if(pid == 0) fgets(buffer, 10, f);  // moves shared cursor ❌
// parent's next read is at wrong position!
// fix: open file after fork, or only read in one process

// 5. freeing in child unnecessarily
pid_t pid = fork();
if(pid == 0) {
    free(ptr);    // ❌ triggers COW for no benefit
    exit(0);      // exit() handles cleanup anyway
}

// 6. execl without fork → your program is gone
execl("/bin/ps", "ps", NULL);  // ❌ your process is replaced!
// fix: always fork first, execl in child

// 7. not checking execl return
execl("/bin/ps", "ps", NULL);
// if we reach here → execl failed!
// always add error handling after execl

// 8. forgetting NULL at end of execl
execl("/bin/ps", "ps", "-e");      // ❌ missing NULL → undefined behavior
execl("/bin/ps", "ps", "-e", NULL); // ✅

// 9. using exit() instead of _exit() in child after fork
// exit() flushes stdio buffers → double flush if parent also exits!
_exit(0);  // ✅ skips buffer flush in child

53. Naming Reference — OS Functions

| Name | Stands For |
|------|------------|
| fork() | Fork (split into two) |
| wait() | Wait for child state change |
| waitpid() | Wait for specific PID |
| execl() | Execute (list of args) |
| exit() | Exit process |
| _exit() | Exit directly (no cleanup) |
| getpid() | Get Process ID |
| getppid() | Get Parent Process ID |
| getpgrp() | Get Process Group ID |
| nice() | Adjust nice value (priority) |
| getpriority() | Get current priority |
| sleep() | Suspend for N seconds |
| kill() | Send signal to process |
| fopen() | File Open |
| fclose() | File Close |
| fgets() | File Get String |
| WIFEXITED() | Wait — If — Exited normally? |
| WEXITSTATUS() | Wait — Exit — Status (the code) |
| perror() | Print Error (with system message) |
| system() | Run shell command from C |
| mmap() | Memory Map (shared memory) |
| PID | Process ID |
| PPID | Parent Process ID |
| PCB | Process Control Block |
| COW | Copy On Write |
| fd | File Descriptor |
| NI | Nice value (ps column) |
| PR | Priority (ps column) |
| TTY | Teletype (terminal) |

END OF OS PROCESS MANAGEMENT SECTION 🎓

This section covers UNIX process management from first principles — built through real questions, common confusions, and deep dives into how everything works under the hood. From virtual memory to fork/exec/wait, from zombies to COW, from scheduling to file descriptors.

END OF GUIDE 🎓

This comprehensive guide covers everything from the foundational concepts of how Linux works to advanced bash scripting techniques, common pitfalls, and best practices. Keep it as a reference for your computer engineering journey!
