docs(networking): add network architecture overview#422
IvanHunters wants to merge 2 commits into main
Add comprehensive documentation covering the Cozystack networking stack: MetalLB load balancing (L2 and BGP modes), Cilium eBPF as a kube-proxy replacement, Kube-OVN centralized IPAM, and tenant isolation with identity-based eBPF policies. All diagrams use Mermaid.

Signed-off-by: ohotnikov.ivan <ohotnikov.ivan@e-queo.net>
✅ Deploy Preview for cozystack ready!
Summary of Changes (Gemini Code Assist)

This pull request introduces new documentation that provides a detailed overview of the Cozystack cluster's network architecture. It clarifies how components such as MetalLB, Cilium eBPF, and Kube-OVN work together to handle external load balancing, internal pod networking, and tenant isolation. The document aims to improve understanding of the system's networking capabilities and security enforcement mechanisms.
📝 Walkthrough

Adds a new Network Architecture documentation file describing Cozystack's multi-layer networking stack: external load balancing (MetalLB), service load balancing and eBPF policy enforcement (Cilium), pod networking with centralized IPAM (Kube-OVN), observability (Hubble), traffic flows, and example policies and configs.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~15 minutes
🚥 Pre-merge checks: ✅ 3 passed
Code Review
This pull request adds a comprehensive and well-structured documentation page for Cozystack's network architecture. The document is clear, detailed, and makes excellent use of Mermaid diagrams to explain complex concepts like MetalLB modes, Cilium's eBPF-based processing, and tenant isolation. The explanations are accurate and easy to follow. I have a couple of minor suggestions to improve the clarity of two diagrams, but overall this is a great addition to the documentation.
flowchart LR
A["Pod A"] --> CHECK{"eBPF<br/>Policy Check"}
CHECK -->|"Cross-tenant"| DENY["DENY"]
CHECK -->|"Same tenant"| ALLOW["ALLOW → Pod A'"]
In the "Tenant Isolation" summary diagram, the label ALLOW → Pod A' could be clearer. The A' notation is ambiguous and might be confused with a different state of Pod A. To improve clarity, consider changing it to explicitly state that traffic is allowed to another pod within the same tenant.
Suggested change:
 flowchart LR
 A["Pod A"] --> CHECK{"eBPF<br/>Policy Check"}
 CHECK -->|"Cross-tenant"| DENY["DENY"]
-CHECK -->|"Same tenant"| ALLOW["ALLOW → Pod A'"]
+CHECK -->|"Same tenant"| ALLOW["ALLOW → Pod in same tenant"]
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@content/en/docs/v1/networking/architecture.md`:
- Around line 277-292: The sentence "All of this happens in kernel space in
approximately 100 nanoseconds." is an unsupported precise latency claim; update
the text in the "Policy Enforcement in Kernel" section to either remove the
numeric value or qualify it and add a citation: e.g., replace with a softened
statement such as "All of this happens in kernel space and is typically
performed in sub-microsecond time on modern hardware" or "…in approximately 100
nanoseconds (hardware- and version-dependent; see [benchmark/source])" and
include a reference to the benchmark or paper if you keep the number; locate the
exact sentence in that section to edit.
🧹 Nitpick comments (1)
content/en/docs/v1/networking/architecture.md (1)
294-316: Avoid absolute security guarantees; qualify the statements. Phrases like “No userspace bypass”, “no race conditions”, and “cannot be bypassed” read as unconditional guarantees. Consider qualifying them (e.g., “by design” or “under correct configuration”) to avoid over-promising.
✏️ Suggested wording
-| **No userspace bypass** | All network traffic must pass through eBPF hooks |
-| **Atomic updates** | Policy changes are atomic — no race conditions |
+| **No userspace bypass (by design)** | All network traffic is expected to pass through eBPF hooks under correct configuration |
+| **Atomic updates** | Policy updates are applied atomically to reduce race windows |

-    EBPF["eBPF Programs<br/>• Attached to network interfaces<br/>• Run in privileged kernel context<br/>• Verified by kernel<br/>• Cannot be bypassed by userspace<br/>• Atomic policy updates"]
+    EBPF["eBPF Programs<br/>• Attached to network interfaces<br/>• Run in privileged kernel context<br/>• Verified by kernel<br/>• Not intended to be bypassed by userspace (with correct configuration)<br/>• Atomic policy updates"]
### Policy Enforcement in Kernel

When a packet is sent between pods, Cilium enforces policies entirely within kernel space:

```mermaid
flowchart TD
PKT["Packet: 10.244.0.10 → 10.244.1.20"]
STEP1["1. Lookup source identity:<br/>10.244.0.10 → ID 12345 (tenant-a)"]
STEP2["2. Lookup destination identity:<br/>10.244.1.20 → ID 67890 (tenant-b)"]
STEP3["3. Check policy map:<br/>(12345, 67890, TCP, 80) → DENY"]
DROP["4. DROP packet"]
PKT --> STEP1 --> STEP2 --> STEP3 --> DROP
```

All of this happens in kernel space in approximately 100 nanoseconds.
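To make the four steps concrete, here is a simplified userspace model in Python. It is not Cilium's datapath: real enforcement runs as eBPF programs against BPF maps that Cilium maintains per endpoint, so the flat table below, keyed by (source identity, destination identity, protocol, port) with default deny, is an illustrative assumption. The IPs and identity numbers come from the diagram; the second tenant-a pod sharing identity 12345 and the single `ALLOW` rule are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical IP -> security identity cache, mirroring the diagram:
# 12345 stands for tenant-a and 67890 for tenant-b. Real identities are
# allocated by Cilium from pod labels.
IPCACHE: Dict[str, int] = {
    "10.244.0.10": 12345,  # tenant-a
    "10.244.0.11": 12345,  # tenant-a (assumed to share the identity)
    "10.244.1.20": 67890,  # tenant-b
}

# Simplified policy map keyed by (src identity, dst identity, protocol, port).
# Only allowed tuples are stored, so a missing key means "deny".
POLICY_MAP: Dict[Tuple[int, int, str, int], str] = {
    (12345, 12345, "TCP", 80): "ALLOW",  # hypothetical same-tenant rule
}


def check_packet(src_ip: str, dst_ip: str, proto: str, dport: int) -> str:
    """Return "ALLOW" or "DROP" following the four steps in the diagram."""
    # Steps 1 and 2: resolve source and destination IPs to identities.
    src_id = IPCACHE.get(src_ip)
    dst_id = IPCACHE.get(dst_ip)
    if src_id is None or dst_id is None:
        return "DROP"  # unknown endpoint: default deny
    # Step 3: look up the tuple; absence is treated as DENY.
    verdict = POLICY_MAP.get((src_id, dst_id, proto, dport), "DENY")
    # Step 4: anything that is not explicitly allowed is dropped.
    return "ALLOW" if verdict == "ALLOW" else "DROP"


if __name__ == "__main__":
    print(check_packet("10.244.0.10", "10.244.1.20", "TCP", 80))  # DROP (cross-tenant)
    print(check_packet("10.244.0.10", "10.244.0.11", "TCP", 80))  # ALLOW (same tenant)
```

Keying the lookup on identities rather than IPs is what lets the rules survive pod rescheduling: a pod that comes back with a new IP only changes the IP-to-identity cache, not the policy entries.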
Soften or source the “~100 nanoseconds” performance claim.
This is a very specific latency figure and is likely hardware/version dependent. Consider removing the number, qualifying it, or citing a benchmark if you have one.
✏️ Suggested wording
-All of this happens in kernel space in approximately 100 nanoseconds.
+All of this happens in kernel space with very low per-packet overhead (exact latency depends on hardware, kernel, and policy complexity).
Add missing technical details: networking stack variants, GENEVE overlay, Join CIDR, cluster domain, and a Hubble observability section. Improve the tenant isolation description with Cozystack-specific labels, automatic policy application, and CNI chaining details.

Signed-off-by: ohotnikov.ivan <ohotnikov.ivan@e-queo.net>
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@content/en/docs/v1/networking/architecture.md`:
- Line 420: The sentence "See Enabling Hubble for full configuration details."
is plain text and needs to be converted into a proper Hugo cross-reference;
update the text to use Hugo's relref or a direct URL pointing to the Enabling
Hubble page (replace the plain phrase with something like {{< relref
"path/to/enabling-hubble.md" >}} or the correct URL), ensuring the phrase
"Enabling Hubble" becomes a clickable link; locate this sentence in the content
for content/en/docs/v1/networking/architecture.md and replace it accordingly.
- Around line 6-8: The frontmatter "aliases" entry currently points to the wrong
section; locate the "aliases:" key in the networking/architecture document and
either remove the incorrect alias string
"/docs/v1/reference/applications/architecture" or replace it with the correct
networking URL; if the alias is intentionally a redirect from an old URL, add a
short inline comment in the frontmatter explaining that it's an intentional
redirect so reviewers know it's deliberate.
- Around line 344-357: The NOTE node ("NOTE") is incorrectly placed inside the
US subgraph (User Space) making its kernel-enforcement message semantically
wrong; move or re-declare the NOTE node so it is not inside US or KS (for
example place it after both subgraph declarations or attach it to the edge
between US and KS), and update the flowchart connections so the NOTE still
visually associates with the link (US --> KS) without being nested in the US
subgraph; keep the node identifier NOTE and the subgraph identifiers US and KS
to locate and adjust the diagram.
- Around line 285-299: The Mermaid node ID TB conflicts with the flowchart TD
direction keyword and can break rendering in older Mermaid versions; rename the
node ID (and any references to it) to a non-reserved identifier such as TenantB
or TB_tenant in the flowchart TD block — update the node declaration (TB ->
TenantB) and the edge from TenantB to ENGINE so the labels TA, TC, ENGINE,
ALLOW, and DENY remain unchanged and the diagram renders reliably.
---
Duplicate comments:
In `@content/en/docs/v1/networking/architecture.md`:
- Line 330: Replace the unsupported precise latency claim in the sentence "All
of this happens in kernel space in approximately 100 nanoseconds." by removing
the numeric value or softening it; for example change to "All of this happens in
kernel space and completes in a very small amount of time" or "All of this
happens in kernel space and typically completes in micro- to nanoseconds
depending on hardware" so the statement is not presented as an unsourced,
hardware-specific 100 ns figure.
aliases:
- /docs/v1/reference/applications/architecture
---
Alias path references the wrong section.
/docs/v1/reference/applications/architecture belongs to the applications section, not networking. If this alias is intentional (redirect from an old URL), it should be noted; otherwise it looks like a copy-paste artefact and should be corrected or removed.
✏️ Suggested fix
aliases:
- - /docs/v1/reference/applications/architecture
+ - /docs/v1/reference/networking/architecture
(or remove the alias entirely if no old URL needs redirecting)
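One way to catch this kind of copy-paste artefact before review is a small check that compares the section of each alias with the section the page lives in. The sketch below is hypothetical and not part of the Cozystack tooling: it assumes pages live under `content/en/docs/<version>/<section>/` and that an alias normally keeps the page's section as its second-to-last path segment. It only flags candidates; an alias that points at another section may still be a deliberate redirect from an old URL.

```python
from pathlib import PurePosixPath
from typing import List


def page_section(content_path: str) -> str:
    """Return the directory that contains the page, e.g. 'networking'."""
    return PurePosixPath(content_path).parent.name


def suspicious_aliases(content_path: str, aliases: List[str]) -> List[str]:
    """Flag aliases whose parent segment differs from the page's section.

    Heuristic only: flagged entries need a human to confirm whether they
    are intentional redirects or copy-paste mistakes.
    """
    section = page_section(content_path)
    flagged: List[str] = []
    for alias in aliases:
        # Ignore a trailing slash so '/a/b/' and '/a/b' compare the same way.
        parent = PurePosixPath(alias.rstrip("/")).parent.name
        if parent != section:
            flagged.append(alias)
    return flagged


if __name__ == "__main__":
    page = "content/en/docs/v1/networking/architecture.md"
    print(suspicious_aliases(page, ["/docs/v1/reference/applications/architecture"]))
    print(suspicious_aliases(page, ["/docs/v1/reference/networking/architecture"]))
```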
```mermaid
flowchart TD
TA["Tenant A — Namespace app-a<br/>Pods: 10.244.0.10, 10.244.0.11"]
TB["Tenant B — Namespace app-b<br/>Pods: 10.244.1.20, 10.244.1.21"]
TC["Tenant C — Namespace app-c<br/>Pods: 10.244.2.30, 10.244.2.31"]
ENGINE{"Cilium eBPF Policy Engine"}
TA --> ENGINE
TB --> ENGINE
TC --> ENGINE
ENGINE -->|"A ↔ A — ALLOWED"| ALLOW["Same-tenant traffic passes"]
ENGINE -->|"A ↔ B — DENIED"| DENY["Cross-tenant traffic dropped"]
```
Node ID TB collides with Mermaid's TB (Top-to-Bottom) direction keyword.
TB is an alias for the TD direction in Mermaid's flowchart grammar. Using it as a node ID inside flowchart TD works in most recent Mermaid releases, but has caused rendering failures in older versions. Rename to avoid the ambiguity.
✏️ Suggested fix
- TA["Tenant A — Namespace app-a<br/>Pods: 10.244.0.10, 10.244.0.11"]
- TB["Tenant B — Namespace app-b<br/>Pods: 10.244.1.20, 10.244.1.21"]
- TC["Tenant C — Namespace app-c<br/>Pods: 10.244.2.30, 10.244.2.31"]
+ TENANT_A["Tenant A — Namespace app-a<br/>Pods: 10.244.0.10, 10.244.0.11"]
+ TENANT_B["Tenant B — Namespace app-b<br/>Pods: 10.244.1.20, 10.244.1.21"]
+ TENANT_C["Tenant C — Namespace app-c<br/>Pods: 10.244.2.30, 10.244.2.31"]
ENGINE{"Cilium eBPF Policy Engine"}
- TA --> ENGINE
- TB --> ENGINE
- TC --> ENGINE
+ TENANT_A --> ENGINE
+ TENANT_B --> ENGINE
+ TENANT_C --> ENGINE
- ENGINE -->|"A ↔ A — ALLOWED"| ALLOW["Same-tenant traffic passes"]
- ENGINE -->|"A ↔ B — DENIED"| DENY["Cross-tenant traffic dropped"]
+ ENGINE -->|"A ↔ A — ALLOWED"| ALLOW["Same-tenant traffic passes"]
+ ENGINE -->|"A ↔ B — DENIED"| DENY["Cross-tenant traffic dropped"]
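To find other diagrams in the docs with the same problem, one option is to scan Mermaid flowchart sources for node IDs that equal a direction keyword. This is a rough, regex-based sketch rather than a Mermaid parser: it assumes one statement per line, treats the identifier at the start of a statement and the target after a `-->` edge (with an optional `|label|`) as node IDs, ignores other link styles, and checks only the direction keywords `TB`, `TD`, `BT`, `RL`, and `LR`.

```python
import re
from typing import List, Set

# Mermaid flowchart direction keywords; TB and TD both mean top-to-bottom.
DIRECTION_KEYWORDS = {"TB", "TD", "BT", "RL", "LR"}

# Identifier at the start of a statement, e.g. `TB["Tenant B"]` or `TB --> X`.
LEADING_ID = re.compile(r"^([A-Za-z_][\w-]*)")
# Identifier right after an arrow, with an optional |label|, e.g. `-->|"x"| TB`.
ARROW_TARGET = re.compile(r"-->\s*(?:\|[^|]*\|\s*)?([A-Za-z_][\w-]*)")
# Lines that are headers, subgraph markers, fences, or comments, not nodes.
SKIP_LINE = re.compile(r"^(?:(?:flowchart|graph|subgraph|direction|end)\b|```|%%)")


def reserved_node_ids(diagram: str) -> List[str]:
    """Return node IDs that equal a direction keyword, in first-seen order."""
    found: List[str] = []
    seen: Set[str] = set()
    for line in diagram.splitlines():
        stmt = line.strip()
        if not stmt or SKIP_LINE.match(stmt):
            continue  # e.g. `flowchart TD` must not count TD as a node
        candidates: List[str] = []
        lead = LEADING_ID.match(stmt)
        if lead:
            candidates.append(lead.group(1))
        candidates.extend(ARROW_TARGET.findall(stmt))
        for node_id in candidates:
            if node_id in DIRECTION_KEYWORDS and node_id not in seen:
                seen.add(node_id)
                found.append(node_id)
    return found


if __name__ == "__main__":
    original = 'flowchart TD\nTA["Tenant A"]\nTB["Tenant B"]\nTA --> ENGINE\nTB --> ENGINE'
    renamed = original.replace("TB", "TENANT_B")
    print(reserved_node_ids(original))  # ['TB']
    print(reserved_node_ids(renamed))   # []
```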
```mermaid
flowchart TD
subgraph US["User Space"]
PODA["Pod A<br/>(Tenant A)"]
PODB["Pod B<br/>(Tenant B)"]
NOTE["Cannot bypass policy —<br/>traffic MUST go through kernel"]
end
subgraph KS["Kernel Space"]
EBPF["eBPF Programs<br/>• Attached to network interfaces<br/>• Run in privileged kernel context<br/>• Verified by kernel<br/>• Cannot be bypassed by userspace<br/>• Atomic policy updates"]
end
US -->|"all traffic"| KS
```
NOTE node about kernel enforcement is visually placed inside the "User Space" subgraph — semantically backwards.
The message "Cannot bypass policy — traffic MUST go through kernel" belongs outside both subgraphs (or on the edge), not inside the User Space box, where it implies the kernel-bypass note is a user-space property.
✏️ Suggested fix
flowchart TD
subgraph US["User Space"]
PODA["Pod A<br/>(Tenant A)"]
PODB["Pod B<br/>(Tenant B)"]
- NOTE["Cannot bypass policy —<br/>traffic MUST go through kernel"]
end
subgraph KS["Kernel Space"]
EBPF["eBPF Programs<br/>• Attached to network interfaces<br/>• Run in privileged kernel context<br/>• Verified by kernel<br/>• Cannot be bypassed by userspace<br/>• Atomic policy updates"]
end
- US -->|"all traffic"| KS
+ US -->|"all traffic (cannot bypass policy)"| KS
| enabled: true | ||
| ``` | ||
| See Enabling Hubble for full configuration details. |
"Enabling Hubble" cross-reference is not hyperlinked.
The sentence is plain text; readers have no way to navigate to the referenced page. Add the appropriate Hugo relref or URL.
✏️ Suggested fix
-See Enabling Hubble for full configuration details.
+See [Enabling Hubble]({{< relref "enabling-hubble" >}}) for full configuration details.
(adjust the path to match the actual page location)