10 changes: 6 additions & 4 deletions Dockerfile
@@ -1,15 +1,17 @@
FROM debian:bookworm-slim
FROM debian:bookworm-slim AS builder

RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
make \
&& rm -rf /var/lib/apt/lists/*

WORKDIR /app
WORKDIR /src
COPY . .

RUN make all

EXPOSE 5555
FROM debian:bookworm-slim
WORKDIR /app
COPY --from=builder /src/bin ./bin

EXPOSE 5555
CMD ["./bin/pcc_server", "5555"]
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 odeliyach

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
33 changes: 21 additions & 12 deletions README.md
@@ -1,27 +1,33 @@
# Printable Characters Counter (PCC)

## Read This First
<a href="https://github.com/odeliyach/Network-Infrastructure-C/actions"><img src="https://github.com/odeliyach/Network-Infrastructure-C/actions/workflows/ci.yml/badge.svg"></a>
<a href="https://github.com/odeliyach/Network-Infrastructure-C/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-MIT-blue.svg"></a>

## 🚀 Read This First
- The authoritative project compass is [`docs/instructions_network.txt`](docs/instructions_network.txt). All development choices, error semantics, and protocol details align with that document.

## Architecture Overview
## 🧠 Architecture Overview
- **Role:** Implements a TCP printable-characters counter with deterministic client/server protocol flow.
- **Server (`pcc_server`)**: Accepts TCP sessions, counts printable bytes (ASCII 32–126) in each byte stream, maintains aggregated frequency counters, and prints statistics on SIGINT once any in-flight client has been handled.
- **Client (`pcc_client`)**: Streams a file over TCP using the mandated protocol, prints the printable-count returned by the server, and exits with zero on success.
- **Concurrency Model:** Iterative single-threaded loop with signal-aware acceptance; SIGINT processing is atomic relative to in-flight client handling as required by the assignment text.

## Protocol Specification
1. Client sends **N** (uint32, network byte order) representing the byte-stream length.
2. Client streams **N** bytes of payload.
3. Server responds with **C** (uint32, network byte order) representing printable-byte count.
4. Server maintains global printable-frequency counters and, on SIGINT, prints only characters with non-zero frequency in ascending ASCII order plus the number of successfully served clients. These outputs match the exact format strings in `instructions_network.txt`.
### 🧠 Protocol Diagram
```mermaid
flowchart LR
A["Client sends N (uint32, network byte order)"] --> B[Server reads N]
B --> C[Client streams N bytes]
C --> D[Server counts printable bytes]
D --> E["Server returns C (uint32, network byte order)"]
```
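
As a rough illustration of the two `uint32` fields in the diagram, the sketch below serializes a length or count into four big-endian wire bytes and parses it back. The helper names `pcc_encode_u32` and `pcc_decode_u32` are hypothetical and not taken from the repository; only `htonl`/`ntohl` are named in this README.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: turn a host-order value into 4 wire bytes (big-endian). */
void pcc_encode_u32(uint32_t host_value, unsigned char out[4])
{
    uint32_t wire = htonl(host_value);
    memcpy(out, &wire, sizeof wire); /* copy the bytes exactly as they go on the wire */
}

/* Hypothetical helper: turn 4 wire bytes back into a host-order value. */
uint32_t pcc_decode_u32(const unsigned char in[4])
{
    uint32_t wire;
    memcpy(&wire, in, sizeof wire);
    return ntohl(wire);
}
```

Under these assumptions, a client would write the four bytes produced by `pcc_encode_u32` before the payload, and the server would apply `pcc_decode_u32` to the first four bytes it reads. Using `memcpy` rather than a pointer cast sidesteps alignment concerns on the receive buffer.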

## Core Networking Concepts
## 🧠 Core Networking Concepts
- **TCP Handshake:** The server binds to `INADDR_ANY` and listens with backlog `10`. The kernel completes the three-way handshake before `accept` returns a connection, so a reliable, ordered byte stream is in place before any protocol framing is exchanged.
- **Network Byte Order (Endianness):** Length and count fields are serialized with `htonl`/`ntohl` to enforce big-endian wire representation across heterogeneous hosts.
- **PCC Algorithm:** Byte streams are ingested in bounded buffers (1 KiB) to avoid oversized allocations. Each byte is classified as printable or non-printable; printable bytes increment both per-connection and global frequency vectors. Only successful sessions contribute to `pcc_total`, ensuring TCP errors or premature disconnects do not contaminate statistics.
- **Robust Error Handling:** All critical system calls (`socket`, `bind`, `listen`, `accept`, `read`, `write`, `connect`) are validated. TCP-specific disconnect conditions (`ETIMEDOUT`, `ECONNRESET`, `EPIPE`) log and continue without mutating global counters; other failures are fatal with descriptive diagnostics.
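
The PCC algorithm bullet above can be sketched roughly as follows: classify each byte of a chunk, record printable bytes in a per-connection histogram, and fold that histogram into the global totals only after the session succeeds. The function names, the `PCC_*` macros, and the array parameters are assumptions made for illustration; the repository's `pcc_total` may be laid out differently.

```c
#include <stddef.h>
#include <stdint.h>

#define PCC_FIRST_PRINTABLE 32
#define PCC_LAST_PRINTABLE 126
#define PCC_RANGE 95 /* 126 - 32 + 1 printable codes */

/* Count printable bytes in one chunk and record them in the session histogram. */
uint32_t pcc_count_chunk(const unsigned char *buf, size_t len,
                         uint32_t session_hist[PCC_RANGE])
{
    uint32_t printable = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] >= PCC_FIRST_PRINTABLE && buf[i] <= PCC_LAST_PRINTABLE) {
            session_hist[buf[i] - PCC_FIRST_PRINTABLE]++;
            printable++;
        }
    }
    return printable;
}

/* Fold a finished session into the global totals; intended for successful sessions only. */
void pcc_merge_session(uint32_t global_hist[PCC_RANGE],
                       const uint32_t session_hist[PCC_RANGE])
{
    for (size_t i = 0; i < (size_t)PCC_RANGE; i++)
        global_hist[i] += session_hist[i];
}
```

Keeping the session histogram separate until the count has been sent back is what lets a failed connection be discarded without touching the global counters.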

## Build, Test, and Run
## 🛠️ Build, Test, and Run
- Toolchain: `gcc`, `make` (flags: `-Wall -Werror -O3 -std=c11 -D_POSIX_C_SOURCE=200809`).
- Build artifacts: `bin/pcc_server`, `bin/pcc_client`.
- Commands:
@@ -32,18 +38,21 @@
- `./bin/pcc_server 5555`
- `./bin/pcc_client 127.0.0.1 5555 ./path/to/file`

## Dockerized Execution
## 🐳 Dockerized Execution
1. Build: `docker build -t pcc .`
2. Run server: `docker run --rm -p 5555:5555 pcc ./bin/pcc_server 5555`
3. Run client (loopback to host or container IP): `docker run --rm --network host -v $(pwd):/data pcc ./bin/pcc_client 127.0.0.1 5555 /data/yourfile`

## CI/CD
## 🔄 CI/CD
- GitHub Actions workflow `ci.yml` performs `make all` and `make test` on every push and pull request to enforce compilation hygiene and protocol smoke coverage.

## Repository Layout
## 🧭 Repository Layout
- `src/` — C sources (`pcc_server.c`, `pcc_client.c`).
- `docs/` — assignment and protocol instructions (`instructions_network.txt`).
- `bin/` — build outputs (generated).

## 🧠 Interview Prep
- Study the mentor-style guide in [`docs/INTERVIEW_PREP.md`](docs/INTERVIEW_PREP.md).

## Academic Policy
Educational Use Only. The logic here is unique and easily detectable by automated plagiarism tools. Use of this code in academic assignments is strictly prohibited.
41 changes: 41 additions & 0 deletions docs/INTERVIEW_PREP.md
@@ -0,0 +1,41 @@
# PCC Interview Prep Guide

## Elevator Pitch (30 seconds)
The Printable Characters Counter (PCC) is a TCP client/server pair that streams arbitrary files over a deterministic protocol defined in `instructions_network.txt`. The client sends a 32-bit length (network byte order), streams that many bytes, and receives a 32-bit printable-character count back. The server is an iterative, signal-aware TCP listener that tallies printable ASCII (32–126) per session, aggregates totals across sessions, and emits statistics atomically on `SIGINT`. It uses bounded buffers, strict return-value checking, and endian-safe serialization (`htonl`/`ntohl`) to remain correct across heterogeneous hosts.

## Deep-Dive Questions & Answers

1. **TCP/IP & Sockets — Why `AF_INET` + `SOCK_STREAM`, and what happens if the connection drops mid-transfer?**
We use IPv4 TCP streams to guarantee ordered, reliable delivery that matches the PCC framing (N length → payload → count). UDP would risk loss/reordering and force us to reimplement reliability. If a drop occurs (e.g., `ETIMEDOUT`, `ECONNRESET`, `EPIPE`), the server treats it as a TCP error, logs to `stderr`, aborts that session, and *does not* merge its partial histogram into `pcc_total`, preserving global correctness.

2. **Concurrency — How does the server handle multiple clients, and why this model?**
The server is intentionally iterative: `accept()` → handle one client fully → loop. This keeps signal handling simple (atomic with respect to the in-flight client), as required by the spec. Alternatives like `select()`/`epoll()` or threading could increase throughput, but would complicate the guarantee that a `SIGINT` snapshot includes exactly the completed clients' stats, and would require guarding `pcc_total` against races.

3. **Data Integrity — Why is network byte order non-negotiable here?**
The length (N) and count (C) are serialized with `htonl`/`ntohl` to force big-endian wire format. Without this, a little-endian host would misinterpret the 32-bit fields, leading to truncated reads, oversized allocations, or wrong counts. Using network byte order makes the PCC protocol portable across architectures and aligns with the assignment’s explicit requirement.

4. **Error Handling — How do we handle partial reads/writes and `EINTR`?**
Both client and server route socket I/O through `read_exact`/`write_exact` helpers that loop until the requested byte count is satisfied, retry on `EINTR`, and treat a zero-length read during the payload as remote closure. This prevents short reads/writes from silently corrupting framing or counts. Not checking return values could leave length fields half-written, causing peers to block or mis-parse the stream.

5. **Signal Safety — How is `SIGINT` handled safely to print stats?**
`sigaction` installs a handler that sets a `sig_atomic_t` flag. The main loop checks the flag between clients; if set mid-request, it finishes the current client before breaking. Statistics are printed after the loop, ensuring `pcc_total` reflects only fully processed sessions and avoiding unsafe work inside the handler.

6. **Edge Cases — What are potential weak spots and how to improve them?**
- **Slowloris/pacing clients:** An attacker could drip bytes and occupy the only handler. Mitigate by adding timeouts (e.g., `SO_RCVTIMEO`) or moving to `select()`/`epoll()` with per-connection timers.
- **Large inputs:** We bound buffers to 1 KiB, but `N` could be large; capping `N` server-side can keep one connection from monopolizing the server for long, and chunk-level progress logging makes such sessions visible.
- **Backlog saturation:** With backlog 10 and iterative handling, bursts can drop connections. A worker pool or event loop would improve availability while still guarding `pcc_total` with synchronization.

7. **What happens if the client misreports N (sends fewer/more bytes)?**
If fewer bytes arrive, the server hits EOF before `N` bytes and aborts the session without updating totals. If the client tries to send more, the server only reads `N` bytes and then returns the count; excess bytes remain unread and the connection closes, preventing protocol poisoning.
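
The partial-read handling from question 4 and the early-EOF case from question 7 can be sketched together. The guide names a `read_exact` helper, but the signature and return convention below are assumptions, not the repository's actual code.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Assumed contract: read exactly `len` bytes from `fd` into `buf`.
 * Returns 0 on success, 1 if the peer closed before `len` bytes arrived,
 * and -1 on any other error (errno is left set by read()).
 */
int read_exact(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = read(fd, p + done, len - done);
        if (n > 0) {
            done += (size_t)n;      /* short read: keep going */
        } else if (n == 0) {
            return 1;               /* EOF before the full count */
        } else if (errno != EINTR) {
            return -1;              /* real error, e.g. ECONNRESET */
        }
        /* n < 0 with errno == EINTR: interrupted by a signal, retry */
    }
    return 0;
}
```

With this convention, a server could treat a return of 1 while reading the payload as a client that sent fewer than `N` bytes and drop the session without updating totals. A matching `write_exact` would follow the same loop around `write`.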

## Key Terminology Cheat Sheet
- **Byte Stream (TCP), File Descriptor, System Calls (`socket`, `bind`, `listen`, `accept`, `connect`, `read`, `write`)**
- **Network Byte Order / Endianness (`htonl`, `ntohl`)**
- **Protocol Framing (length-prefix, bounded buffers)**
- **Printable ASCII (32–126) Histogram**
- **Graceful Shutdown vs. TCP Reset**
- **Signal Handling (`sigaction`, `sig_atomic_t`, `EINTR`)**
- **Partial Read/Write Mitigation, Backlog, `SO_REUSEADDR`**
- **Concurrency Models (Iterative vs. `select`/`epoll`, Worker Pool)**
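
For the signal-handling terms above, a minimal sketch of the flag-based pattern might look like this. The names `stop_requested`, `on_sigint`, `install_sigint_handler`, and `stop_was_requested` are hypothetical, and leaving `SA_RESTART` unset is an assumption about how the server wakes a blocked `accept()`.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* expose sigaction under -std=c11 */
#endif

#include <signal.h>
#include <string.h>

/* Written only by the handler, read only by the main loop. */
static volatile sig_atomic_t stop_requested = 0;

/* The handler does nothing except set the flag; no I/O, no allocation. */
static void on_sigint(int signo)
{
    (void)signo;
    stop_requested = 1;
}

/* Install the SIGINT handler; returns 0 on success, -1 on failure. */
int install_sigint_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0; /* assumption: no SA_RESTART, so a blocked accept() fails with EINTR */
    return sigaction(SIGINT, &sa, NULL);
}

/* The main loop would poll this between clients. */
int stop_was_requested(void)
{
    return stop_requested != 0;
}
```

In this arrangement the accept loop would check `stop_was_requested()` after each client and after an `EINTR` from `accept()`, then print statistics outside the handler.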

Use these keywords to anchor answers, then explain the trade-offs (why iterative over threaded, why network byte order, why retry on `EINTR`) to show senior-level reasoning tied directly to the PCC protocol.