
docs(networking): add network architecture overview #422

Open
IvanHunters wants to merge 2 commits into main from docs/network-architecture
@IvanHunters IvanHunters commented Feb 16, 2026

Summary

  • Add comprehensive network architecture documentation to the Networking section
  • Covers MetalLB (L2/BGP modes), Cilium eBPF (kube-proxy replacement), Kube-OVN (centralized IPAM), and tenant isolation with identity-based eBPF policies
  • All diagrams use Mermaid

Test plan

  • Verify page renders correctly on the Hugo site
  • Verify all Mermaid diagrams render properly

Summary by CodeRabbit

  • Documentation
    • Added comprehensive Network Architecture docs detailing the multi-layer networking stack: external load balancing, service load balancing, pod networking with centralized IPAM, kernel-level policy enforcement, tenant isolation policies, observability (optional), deployment variants, traffic flow diagrams, and YAML examples for configuration and namespace isolation.

Add comprehensive documentation covering the Cozystack networking
stack: MetalLB load balancing (L2 and BGP modes), Cilium eBPF as
kube-proxy replacement, Kube-OVN centralized IPAM, and tenant
isolation with identity-based eBPF policies. All diagrams use Mermaid.

Signed-off-by: ohotnikov.ivan <ohotnikov.ivan@e-queo.net>

netlify bot commented Feb 16, 2026

Deploy Preview for cozystack ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | 29973a5 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/cozystack/deploys/699567b7234ea800086fe989 |
| 😎 Deploy Preview | https://deploy-preview-422--cozystack.netlify.app |

@gemini-code-assist

Summary of Changes

Hello @IvanHunters, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request introduces new documentation that provides a detailed overview of the Cozystack cluster's network architecture. It clarifies how various components like MetalLB, Cilium eBPF, and Kube-OVN work together to manage external load balancing, internal pod networking, and robust tenant isolation. The document aims to enhance understanding of the system's networking capabilities and security enforcement mechanisms.

Highlights

  • New Documentation: Added a comprehensive network architecture overview document to the Networking section.
  • MetalLB Integration: Detailed the use of MetalLB for external load balancing, covering both Layer 2 (ARP) and BGP modes with illustrative diagrams.
  • Cilium eBPF as kube-proxy Replacement: Explained how Cilium eBPF replaces kube-proxy for efficient service load balancing and network policies, including a comparison with traditional iptables.
  • Kube-OVN for Pod Networking: Described Kube-OVN's role in centralized IPAM and overlay networking, highlighting its shared Pod CIDR model.
  • Tenant Isolation: Outlined the tenant isolation strategy using Kube-OVN and Cilium's identity-based eBPF policies, demonstrating how cross-tenant traffic is denied in-kernel.
  • Mermaid Diagrams: Incorporated numerous Mermaid diagrams throughout the document to visually represent network flows and architectural components.


Changelog
  • content/en/docs/v1/networking/architecture.md
    • Added new documentation file detailing the network architecture.
Activity
  • The pull request was created by IvanHunters, introducing new documentation for the network architecture.

coderabbitai bot commented Feb 16, 2026

📝 Walkthrough


Adds a new Network Architecture documentation file describing Cozystack's multi-layer networking stack: external load balancing (MetalLB), service load balancing and eBPF policy enforcement (Cilium), pod networking with centralized IPAM (Kube‑OVN), observability (Hubble), traffic flows, and example policies and configs.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Network Architecture Documentation: `content/en/docs/v1/networking/architecture.md` | New comprehensive doc describing layered networking (MetalLB, Cilium, Kube‑OVN, Hubble), cluster network defaults, Pod CIDR/IPAM model, ingress/egress flows, VLAN and MetalLB modes, Cilium as kube-proxy replacement, eBPF/kernel enforcement, tenant isolation examples, YAML policy snippets, and Mermaid diagrams. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

Poem

🐇 I hop through packets, soft and spry,
MetalLB greets the traffic sky,
Cilium guards the kernel gate,
Kube‑OVN maps each IP and state,
Hubble nods — observability, oh my!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'docs(networking): add network architecture overview' directly and clearly summarizes the main change: adding comprehensive network architecture documentation to the networking section. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds a comprehensive and well-structured documentation page for Cozystack's network architecture. The document is clear, detailed, and makes excellent use of Mermaid diagrams to explain complex concepts like MetalLB modes, Cilium's eBPF-based processing, and tenant isolation. The explanations are accurate and easy to follow. I have a couple of minor suggestions to improve the clarity of two diagrams, but overall this is a great addition to the documentation.

Comment on lines +362 to +365
flowchart LR
A["Pod A"] --> CHECK{"eBPF<br/>Policy Check"}
CHECK -->|"Cross-tenant"| DENY["DENY"]
CHECK -->|"Same tenant"| ALLOW["ALLOW → Pod A'"]

medium

In the "Tenant Isolation" summary diagram, the label ALLOW → Pod A' could be clearer. The A' notation is ambiguous and might be confused with a different state of Pod A. To improve clarity, consider changing it to explicitly state that traffic is allowed to another pod within the same tenant.

Suggested change
 flowchart LR
     A["Pod A"] --> CHECK{"eBPF<br/>Policy Check"}
     CHECK -->|"Cross-tenant"| DENY["DENY"]
-    CHECK -->|"Same tenant"| ALLOW["ALLOW → Pod A'"]
+    CHECK -->|"Same tenant"| ALLOW["ALLOW → Pod in same tenant"]


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@content/en/docs/v1/networking/architecture.md`:
- Around line 277-292: The sentence "All of this happens in kernel space in
approximately 100 nanoseconds." is an unsupported precise latency claim; update
the text in the "Policy Enforcement in Kernel" section to either remove the
numeric value or qualify it and add a citation: e.g., replace with a softened
statement such as "All of this happens in kernel space and is typically
performed in sub-microsecond time on modern hardware" or "…in approximately 100
nanoseconds (hardware- and version-dependent; see [benchmark/source])" and
include a reference to the benchmark or paper if you keep the number; locate the
exact sentence in that section to edit.
🧹 Nitpick comments (1)
content/en/docs/v1/networking/architecture.md (1)

294-316: Avoid absolute security guarantees; qualify the statements.

Phrases like “No userspace bypass” / “no race conditions” / “cannot be bypassed” read as unconditional guarantees. Consider qualifying them (e.g., “by design” or “under correct configuration”) to avoid over-promising.

✏️ Suggested wording
-| **No userspace bypass** | All network traffic must pass through eBPF hooks |
-| **Atomic updates** | Policy changes are atomic — no race conditions |
+| **No userspace bypass (by design)** | All network traffic is expected to pass through eBPF hooks under correct configuration |
+| **Atomic updates** | Policy updates are applied atomically to reduce race windows |
-        EBPF["eBPF Programs<br/>• Attached to network interfaces<br/>• Run in privileged kernel context<br/>• Verified by kernel<br/>• Cannot be bypassed by userspace<br/>• Atomic policy updates"]
+        EBPF["eBPF Programs<br/>• Attached to network interfaces<br/>• Run in privileged kernel context<br/>• Verified by kernel<br/>• Not intended to be bypassed by userspace (with correct configuration)<br/>• Atomic policy updates"]

Comment on lines +277 to +292
### Policy Enforcement in Kernel

When a packet is sent between pods, Cilium enforces policies entirely within kernel space:

```mermaid
flowchart TD
PKT["Packet: 10.244.0.10 → 10.244.1.20"]
STEP1["1. Lookup source identity:<br/>10.244.0.10 → ID 12345 (tenant-a)"]
STEP2["2. Lookup destination identity:<br/>10.244.1.20 → ID 67890 (tenant-b)"]
STEP3["3. Check policy map:<br/>(12345, 67890, TCP, 80) → DENY"]
DROP["4. DROP packet"]
PKT --> STEP1 --> STEP2 --> STEP3 --> DROP
```

All of this happens in kernel space in approximately 100 nanoseconds.

⚠️ Potential issue | 🟡 Minor

Soften or source the “~100 nanoseconds” performance claim.

This is a very specific latency figure and is likely hardware/version dependent. Consider removing the number, qualifying it, or citing a benchmark if you have one.

✏️ Suggested wording
-All of this happens in kernel space in approximately 100 nanoseconds.
+All of this happens in kernel space with very low per-packet overhead (exact latency depends on hardware, kernel, and policy complexity).
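
For readers following the quoted diagram, the lookup sequence it describes can be sketched in plain Python. This is only an illustration of the control flow, not Cilium's implementation: real enforcement uses eBPF maps in the kernel, and the dictionary names, the `verdict` function, the same-tenant ALLOW entry, and the default-deny fallback are assumptions made for this sketch. The IPs and identity numbers are the example values from the diagrams in this PR.

```python
from typing import Dict, Tuple

# Hypothetical stand-ins for the kernel-side maps; values are the
# example IPs and identities used in the diagrams under review.
IP_TO_IDENTITY: Dict[str, int] = {
    "10.244.0.10": 12345,  # tenant-a
    "10.244.0.11": 12345,  # tenant-a
    "10.244.1.20": 67890,  # tenant-b
}

# Key: (source identity, destination identity, protocol, port)
PolicyKey = Tuple[int, int, str, int]
POLICY_MAP: Dict[PolicyKey, str] = {
    (12345, 12345, "TCP", 80): "ALLOW",  # assumed same-tenant rule
    (12345, 67890, "TCP", 80): "DENY",   # cross-tenant rule from the diagram
}


def verdict(src_ip: str, dst_ip: str, proto: str, port: int) -> str:
    """Follow the diagram's steps: resolve both identities, then check policy."""
    src_id = IP_TO_IDENTITY.get(src_ip)  # step 1: source identity
    dst_id = IP_TO_IDENTITY.get(dst_ip)  # step 2: destination identity
    if src_id is None or dst_id is None:
        return "DENY"  # assumption: unknown endpoints are dropped
    # Step 3: policy map lookup; assumption: no matching entry means DENY.
    return POLICY_MAP.get((src_id, dst_id, proto, port), "DENY")
```

Calling `verdict("10.244.0.10", "10.244.1.20", "TCP", 80)` walks the same four steps as the diagram and ends in a drop.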

Add missing technical details: networking stack variants, GENEVE overlay,
Join CIDR, cluster domain, Hubble observability section. Improve tenant
isolation description with Cozystack-specific labels, automatic policy
application, and CNI chaining details.

Signed-off-by: ohotnikov.ivan <ohotnikov.ivan@e-queo.net>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@content/en/docs/v1/networking/architecture.md`:
- Line 420: The sentence "See Enabling Hubble for full configuration details."
is plain text and needs to be converted into a proper Hugo cross-reference;
update the text to use Hugo's relref or a direct URL pointing to the Enabling
Hubble page (replace the plain phrase with something like {{< relref
"path/to/enabling-hubble.md" >}} or the correct URL), ensuring the phrase
"Enabling Hubble" becomes a clickable link; locate this sentence in the content
for content/en/docs/v1/networking/architecture.md and replace it accordingly.
- Around line 6-8: The frontmatter "aliases" entry currently points to the wrong
section; locate the "aliases:" key in the networking/architecture document and
either remove the incorrect alias string
"/docs/v1/reference/applications/architecture" or replace it with the correct
networking URL; if the alias is intentionally a redirect from an old URL, add a
short inline comment in the frontmatter explaining that it's an intentional
redirect so reviewers know it's deliberate.
- Around line 344-357: The NOTE node ("NOTE") is incorrectly placed inside the
US subgraph (User Space) making its kernel-enforcement message semantically
wrong; move or re-declare the NOTE node so it is not inside US or KS (for
example place it after both subgraph declarations or attach it to the edge
between US and KS), and update the flowchart connections so the NOTE still
visually associates with the link (US --> KS) without being nested in the US
subgraph; keep the node identifier NOTE and the subgraph identifiers US and KS
to locate and adjust the diagram.
- Around line 285-299: The Mermaid node ID TB conflicts with the flowchart TD
direction keyword and can break rendering in older Mermaid versions; rename the
node ID (and any references to it) to a non-reserved identifier such as TenantB
or TB_tenant in the flowchart TD block — update the node declaration (TB ->
TenantB) and the edge from TenantB to ENGINE so the labels TA, TC, ENGINE,
ALLOW, and DENY remain unchanged and the diagram renders reliably.

---

Duplicate comments:
In `@content/en/docs/v1/networking/architecture.md`:
- Line 330: Replace the unsupported precise latency claim in the sentence "All
of this happens in kernel space in approximately 100 nanoseconds." by removing
the numeric value or softening it; for example change to "All of this happens in
kernel space and completes in a very small amount of time" or "All of this
happens in kernel space and typically completes in micro- to nanoseconds
depending on hardware" so the statement is not presented as an unsourced,
hardware-specific 100 ns figure.

Comment on lines +6 to +8
aliases:
- /docs/v1/reference/applications/architecture
---

⚠️ Potential issue | 🟡 Minor

Alias path references the wrong section.

/docs/v1/reference/applications/architecture belongs to the applications section, not networking. If this alias is intentional (redirect from an old URL), it should be noted; otherwise it looks like a copy-paste artefact and should be corrected or removed.

✏️ Suggested fix
 aliases:
-  - /docs/v1/reference/applications/architecture
+  - /docs/v1/reference/networking/architecture

(or remove entirely if no old URL needs redirecting)
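
A check like this one could be automated. The following is a rough sketch, assuming a page's section is simply the directory that holds it; the function name `mismatched_aliases` is hypothetical and not part of Hugo or this repository. It would also flag intentional redirects from old URLs, so its output is a prompt for review rather than an error.

```python
from pathlib import PurePosixPath
from typing import List


def mismatched_aliases(content_path: str, aliases: List[str]) -> List[str]:
    """Return aliases whose path does not mention the page's section directory."""
    # Assumption: the section is the directory containing the page,
    # e.g. "networking" for content/en/docs/v1/networking/architecture.md.
    section = PurePosixPath(content_path).parent.name
    return [a for a in aliases if section not in PurePosixPath(a).parts]
```

For the file in this PR, the alias under `reference/applications` is reported, while the suggested `reference/networking` alias is not.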


Comment on lines +285 to +299
```mermaid
flowchart TD
TA["Tenant A — Namespace app-a<br/>Pods: 10.244.0.10, 10.244.0.11"]
TB["Tenant B — Namespace app-b<br/>Pods: 10.244.1.20, 10.244.1.21"]
TC["Tenant C — Namespace app-c<br/>Pods: 10.244.2.30, 10.244.2.31"]
ENGINE{"Cilium eBPF Policy Engine"}
TA --> ENGINE
TB --> ENGINE
TC --> ENGINE
ENGINE -->|"A ↔ A — ALLOWED"| ALLOW["Same-tenant traffic passes"]
ENGINE -->|"A ↔ B — DENIED"| DENY["Cross-tenant traffic dropped"]
```

⚠️ Potential issue | 🟡 Minor

Node ID TB collides with Mermaid's TB (Top-to-Bottom) direction keyword.

TB is an alias for the TD direction in Mermaid's flowchart grammar. Using it as a node ID inside flowchart TD works in most recent Mermaid releases, but has caused rendering failures in older versions. Rename to avoid the ambiguity.

✏️ Suggested fix
-    TA["Tenant A — Namespace app-a<br/>Pods: 10.244.0.10, 10.244.0.11"]
-    TB["Tenant B — Namespace app-b<br/>Pods: 10.244.1.20, 10.244.1.21"]
-    TC["Tenant C — Namespace app-c<br/>Pods: 10.244.2.30, 10.244.2.31"]
+    TENANT_A["Tenant A — Namespace app-a<br/>Pods: 10.244.0.10, 10.244.0.11"]
+    TENANT_B["Tenant B — Namespace app-b<br/>Pods: 10.244.1.20, 10.244.1.21"]
+    TENANT_C["Tenant C — Namespace app-c<br/>Pods: 10.244.2.30, 10.244.2.31"]

     ENGINE{"Cilium eBPF Policy Engine"}

-    TA --> ENGINE
-    TB --> ENGINE
-    TC --> ENGINE
+    TENANT_A --> ENGINE
+    TENANT_B --> ENGINE
+    TENANT_C --> ENGINE

-    ENGINE -->|"A ↔ A — ALLOWED"| ALLOW["Same-tenant traffic passes"]
-    ENGINE -->|"A ↔ B — DENIED"| DENY["Cross-tenant traffic dropped"]
+    ENGINE -->|"A ↔ A — ALLOWED"| ALLOW["Same-tenant traffic passes"]
+    ENGINE -->|"A ↔ B — DENIED"| DENY["Cross-tenant traffic dropped"]
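
To catch this class of collision across other diagrams, a small scan along these lines may help. It is a sketch under stated assumptions: the regular expression only recognizes node declarations written as an identifier directly followed by a shape bracket (`[`, `(`, or `{`), and the keyword set and the function name `colliding_node_ids` are chosen for this example rather than taken from Mermaid's own grammar or tooling.

```python
import re
from typing import List

# Flowchart direction keywords that are risky to reuse as node IDs.
DIRECTION_KEYWORDS = {"TB", "TD", "BT", "RL", "LR"}

# An identifier at the start of a line, followed by an opening shape bracket.
NODE_DECL = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*[\[\(\{]")


def colliding_node_ids(mermaid_source: str) -> List[str]:
    """List node IDs in a flowchart that match a direction keyword."""
    hits: List[str] = []
    for line in mermaid_source.splitlines():
        match = NODE_DECL.match(line)
        if match and match.group(1) in DIRECTION_KEYWORDS:
            if match.group(1) not in hits:
                hits.append(match.group(1))
    return hits
```

The `flowchart TD` header itself is not reported, because `TD` there is not followed by a shape bracket.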

Comment on lines +344 to +357
```mermaid
flowchart TD
subgraph US["User Space"]
PODA["Pod A<br/>(Tenant A)"]
PODB["Pod B<br/>(Tenant B)"]
NOTE["Cannot bypass policy —<br/>traffic MUST go through kernel"]
end
subgraph KS["Kernel Space"]
EBPF["eBPF Programs<br/>• Attached to network interfaces<br/>• Run in privileged kernel context<br/>• Verified by kernel<br/>• Cannot be bypassed by userspace<br/>• Atomic policy updates"]
end
US -->|"all traffic"| KS
```

⚠️ Potential issue | 🟡 Minor

NOTE node about kernel enforcement is visually placed inside the "User Space" subgraph — semantically backwards.

The message "Cannot bypass policy — traffic MUST go through kernel" belongs outside both subgraphs (or on the edge), not inside the User Space box, where it implies the kernel-bypass note is a user-space property.

✏️ Suggested fix
 flowchart TD
     subgraph US["User Space"]
         PODA["Pod A<br/>(Tenant A)"]
         PODB["Pod B<br/>(Tenant B)"]
-        NOTE["Cannot bypass policy —<br/>traffic MUST go through kernel"]
     end
 
     subgraph KS["Kernel Space"]
         EBPF["eBPF Programs<br/>• Attached to network interfaces<br/>• Run in privileged kernel context<br/>• Verified by kernel<br/>• Cannot be bypassed by userspace<br/>• Atomic policy updates"]
     end
 
-    US -->|"all traffic"| KS
+    US -->|"all traffic (cannot bypass policy)"| KS

enabled: true
```
See Enabling Hubble for full configuration details.

⚠️ Potential issue | 🟡 Minor

"Enabling Hubble" cross-reference is not hyperlinked.

The sentence is plain text; readers have no way to navigate to the referenced page. Add the appropriate Hugo relref or URL.

✏️ Suggested fix
-See Enabling Hubble for full configuration details.
+See [Enabling Hubble]({{< relref "enabling-hubble" >}}) for full configuration details.

(adjust path to match the actual page location)

