@adamoutler
Collaborator

@adamoutler adamoutler commented Dec 8, 2025

This script generates a synthetic CSV inventory of NetAlertX devices, including routers, switches, APs, and leaf nodes with random but reproducible attributes.

(venv) netalertx@2e90633d0dc9 ~/NetAlertX/scripts % ./generate_device_inventory.py --devices 1000 
Wrote 1006 devices to generated-devices.csv

(venv) netalertx@2e90633d0dc9 ~/NetAlertX/scripts % ./generate_device_inventory.py --help
usage: generate_device_inventory.py [-h] [--output OUTPUT] [--seed SEED] [--devices DEVICES] [--switches SWITCHES] [--aps APS] [--site SITE] [--ssid SSID] [--owner OWNER] [--network NETWORK] [--template TEMPLATE]

Generate a synthetic device CSV for NetAlertX

options:
  -h, --help            show this help message and exit
  --output OUTPUT, -o OUTPUT
                        Output CSV path
  --seed SEED           Seed for reproducible output
  --devices DEVICES     Number of leaf nodes to generate
  --switches SWITCHES   Number of switches under the router
  --aps APS             Number of APs under switches
  --site SITE           Site name
  --ssid SSID           SSID placeholder
  --owner OWNER         Owner name for devices
  --network NETWORK     IPv4 network to draw addresses from (must have enough hosts for requested devices)
  --template TEMPLATE   Optional CSV to pull header from; defaults to the sample inventory layout

NetAlertX appears to have an issue importing a large number of devices: it seems to import only 100 of them.

Summary by CodeRabbit

  • New Features
    • Added device inventory generator script for creating synthetic NetAlertX device datasets with configurable network topology (routers → switches → access points), device counts, sites, networks, and ownership. Supports reproducible generation via seed control, unique MAC/IP allocation, and CSV export.


@coderabbitai
Contributor

coderabbitai bot commented Dec 8, 2025

Walkthrough

A new Python script is introduced that generates synthetic NetAlertX device inventory data in CSV format. The script creates a hierarchical device topology with random, seedable data, supports customizable parameters, allocates IP addresses from a CIDR block, and writes output with validated headers and device records.

Changes

Cohort / File(s) | Summary
Device Inventory Generator (scripts/generate-device-inventory.py) | New script with 7 functions: parse_args() for CLI parsing, load_header() for CSV header management, random_mac() for unique MAC generation, prepare_ip_pool() for CIDR-based IP allocation with validation, random_time() for connection timestamp generation, build_row() for constructing device records, and generate_rows() for building hierarchical topology. Includes main() orchestration function. Supports custom site, SSID, owner, network parameters with reproducible randomization via seed.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • IP allocation logic in prepare_ip_pool(): Requires verification of CIDR parsing, host count validation, and error handling for insufficient network capacity
  • Topology building in generate_rows(): Review the hierarchical structure generation (Router → Switches → APs → leaf nodes) and ensure parent-child relationships are correctly established
  • Header field mapping in build_row(): Confirm that device rows align with the CSV header and default values are sensible
  • MAC uniqueness enforcement: Verify that the set-based tracking prevents collisions across generated devices
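The parent-child structure named in the review points above can be pictured with a small sketch. This is not the script's actual generate_rows(): the function name `build_topology`, the tuple layout, and the naming scheme are hypothetical, chosen only to illustrate a Router → Switches → APs → leaf nodes tree in which every parent exists before its children.

```python
import random


def build_topology(rng, switches, aps, leaves):
    """Return (name, device_type, parent_name) tuples for a simple tree.

    Hypothetical helper: the real script builds full CSV rows instead.
    """
    router = "Router-01"
    rows = [(router, "Router", None)]  # the router is the only root

    # Switches hang directly off the router.
    switch_names = [f"Switch-{i:02d}" for i in range(1, switches + 1)]
    rows += [(name, "Switch", router) for name in switch_names]

    # Each AP picks a random switch (or the router if there are no switches).
    ap_names = []
    for i in range(1, aps + 1):
        parent = rng.choice(switch_names) if switch_names else router
        name = f"AP-{i:02d}"
        ap_names.append(name)
        rows.append((name, "AP", parent))

    # Leaf nodes attach to any AP or switch, falling back to the router.
    attach_points = (ap_names + switch_names) or [router]
    for i in range(1, leaves + 1):
        rows.append((f"Node-{i:02d}", "Leaf", rng.choice(attach_points)))
    return rows


if __name__ == "__main__":
    for row in build_topology(random.Random(0), switches=2, aps=3, leaves=5):
        print(row)
```

The sample run in the PR description reports 1006 rows for 1000 leaf nodes, which would be consistent with six infrastructure devices, but the script's actual defaults are not shown in this conversation.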

Poem

🐰 A script hops into being, with MACs both unique and true,
IPs bloom from CIDR gardens, each device gets its due,
Routers, switches, access points in hierarchy they stand,
Synthetic yet reproducible, with seeds held in paw-hand! 🌳✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title 'Add script to generate synthetic device inventory CSV' directly and accurately summarizes the main change: adding a new Python script for generating synthetic device inventory data in CSV format.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (3)
scripts/generate-device-inventory.py (3)

1-1: Set executable permission on the script.

The shebang #!/usr/bin/env python3 is present but the file is not executable. Either mark the file executable (chmod +x) or remove the shebang if it's only meant to be run via python scripts/generate-device-inventory.py.


116-125: Consider warning when the template file is not found.

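Silently falling back to DEFAULT_HEADER when the user explicitly provides a --template path that doesn't exist may cause confusion. A warning would help users identify typos in the path. The suggested line writes to sys.stderr, so it requires import sys if the script does not already import it.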

 def load_header(template_path: Path | None) -> list[str]:
     if not template_path:
         return DEFAULT_HEADER
     try:
         with template_path.open(newline="", encoding="utf-8") as handle:
             reader = csv.reader(handle)
             header = next(reader)
             return header if header else DEFAULT_HEADER
     except FileNotFoundError:
+        print(f"Warning: template file '{template_path}' not found, using default header", file=sys.stderr)
         return DEFAULT_HEADER
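For reference, here is a self-contained version of the suggestion in the diff above, as a sketch rather than the script's actual code. `DEFAULT_HEADER` is a two-column placeholder (the real header is not shown in this conversation), and passing a default to `next()` is an addition of this sketch so that an empty template file also falls back instead of raising StopIteration.

```python
import csv
import sys
from pathlib import Path

# Placeholder only: the script's real default header is not shown here.
DEFAULT_HEADER = ["devFirstConnection", "devLastConnection"]


def load_header(template_path):
    """Return the first CSV row of template_path, or DEFAULT_HEADER."""
    if not template_path:
        return DEFAULT_HEADER
    try:
        with template_path.open(newline="", encoding="utf-8") as handle:
            # next(..., None) avoids StopIteration on an empty file.
            header = next(csv.reader(handle), None)
    except FileNotFoundError:
        print(
            f"Warning: template file '{template_path}' not found, "
            "using default header",
            file=sys.stderr,
        )
        return DEFAULT_HEADER
    return header if header else DEFAULT_HEADER


if __name__ == "__main__":
    print(load_header(Path("does-not-exist.csv")))
```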

212-212: datetime.utcnow() is deprecated.

datetime.utcnow() has been deprecated since Python 3.12. Use a timezone-aware datetime instead.

-    now = dt.datetime.utcnow()
+    now = dt.datetime.now(dt.timezone.utc)
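One consequence of this change is worth illustrating: the replacement returns an aware datetime, which cannot be compared with or subtracted from naive ones. The sketch below shows the aware value, a naive equivalent for code that still expects one, and a formatted string; the timestamp format is an assumption, not necessarily the one the script writes.

```python
import datetime as dt

# Timezone-aware "now" in UTC, the replacement for datetime.utcnow().
now = dt.datetime.now(dt.timezone.utc)

# If downstream code still expects a naive UTC value, drop tzinfo explicitly.
naive_now = now.replace(tzinfo=None)

# Formatting works the same either way (format string is an assumption).
stamp = now.strftime("%Y-%m-%d %H:%M:%S")

if __name__ == "__main__":
    print(now.isoformat(), naive_now.isoformat(), stamp)
```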
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 23aa48e and d5328a3.

📒 Files selected for processing (1)
  • scripts/generate-device-inventory.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.py: Use mylog(level, [message]) for logging; levels are: none/minimal/verbose/debug/trace. Use none for most important messages that should always appear, such as exceptions.
Always set explicit timeouts on subprocess calls. Default to 60s minimum unless plugin config specifies otherwise. Nested subprocess calls need their own timeout.
Never hardcode ports or secrets; always use get_setting_value() to retrieve configuration values.
Use environment variables (NETALERTX_DB, NETALERTX_LOG, etc.) everywhere instead of hardcoding paths like /data/db or relative paths.
Use helper.py functions (timeNowDB, normalize_mac, sanitizers) for time/MAC/string operations. Validate MACs before DB writes.
Add/modify settings via ccd() in server/initialise.py or per-plugin manifest. Define config once and read it via helpers everywhere.
Always leave logging enabled. If there is a possibility it will be difficult to debug with current logging, add more logging.

Files:

  • scripts/generate-device-inventory.py
🪛 Ruff (0.14.7)
scripts/generate-device-inventory.py

1-1: Shebang is present but file is not executable

(EXE001)


130-130: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)


140-140: Avoid specifying long messages outside the exception class

(TRY003)


145-145: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)


146-146: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)


191-191: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)


220-223: Avoid specifying long messages outside the exception class

(TRY003)


226-226: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)


236-236: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)


257-257: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)


273-273: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)


279-279: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)


295-295: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)


296-296: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)


297-297: Useless if-else condition

(RUF034)


303-303: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: docker_dev
🔇 Additional comments (5)
scripts/generate-device-inventory.py (5)

93-113: LGTM!

The argument parser is well-structured with sensible defaults and clear help text.


128-133: LGTM!

The MAC generation logic correctly ensures uniqueness. The use of standard random is appropriate for synthetic test data.
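A minimal sketch of set-based uniqueness tracking follows. It assumes a signature that takes an explicit `random.Random` instance and a `used` set, which may differ from the script's own random_mac(); forcing the locally administered bit is likewise a choice of this sketch, not something confirmed by the review.

```python
import random


def random_mac(rng, used):
    """Return a MAC address not yet in `used`, and record it there."""
    while True:
        octets = [rng.randrange(256) for _ in range(6)]
        # Locally administered, unicast: set bit 1 and clear bit 0
        # of the first octet so generated MACs avoid real vendor prefixes.
        octets[0] = (octets[0] | 0x02) & 0xFE
        mac = ":".join(f"{octet:02x}" for octet in octets)
        if mac not in used:  # retry on the (rare) collision
            used.add(mac)
            return mac


if __name__ == "__main__":
    seen = set()
    generator = random.Random(0)
    print([random_mac(generator, seen) for _ in range(3)])
```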


136-141: LGTM!

Clean implementation using the standard ipaddress module with appropriate error handling.
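The capacity check described for prepare_ip_pool() might look roughly like the sketch below. The signature, the error message, and the use of `strict=False` are assumptions; only `ipaddress.ip_network()` and its `hosts()` method are taken from the standard library as-is.

```python
import ipaddress
from itertools import islice


def prepare_ip_pool(network, needed):
    """Return `needed` host addresses from `network`, or raise ValueError."""
    # strict=False accepts input such as 192.168.50.7/24 (host bits set).
    net = ipaddress.ip_network(network, strict=False)
    # islice avoids materialising every host of a very large network.
    pool = list(islice(net.hosts(), needed))
    if len(pool) < needed:
        raise ValueError(
            f"{net} has only {len(pool)} usable hosts, {needed} requested"
        )
    return pool


if __name__ == "__main__":
    print([str(ip) for ip in prepare_ip_pool("192.168.50.0/24", 3)])
```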


144-148: LGTM!

The random timestamp generation is appropriate for synthetic test data.


318-337: LGTM!

The main function correctly handles seeding, directory creation, and CSV output. The output message provides useful feedback.
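To illustrate why seeding gives reproducible output, the sketch below uses a dedicated `random.Random` instance. Whether the script seeds the module-level generator or its own instance is not shown in this review, and `sample_names` is a hypothetical helper.

```python
import random


def sample_names(seed, count):
    """Generate pseudo-random names that depend only on `seed`."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [f"Node-{rng.randrange(10_000):04d}" for _ in range(count)]


if __name__ == "__main__":
    # Two runs with the same seed produce identical lists.
    print(sample_names(42, 3) == sample_names(42, 3))
```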

Comment on lines +165 to +166
first_seen = random_time(now)
last_seen = random_time(now)

⚠️ Potential issue | 🟡 Minor

devFirstConnection may be after devLastConnection.

Both timestamps are generated independently, so first_seen can end up being more recent than last_seen, which is semantically incorrect. For more realistic test data, ensure the first connection is always before or equal to the last connection.

-    first_seen = random_time(now)
-    last_seen = random_time(now)
+    t1 = random_time(now)
+    t2 = random_time(now)
+    first_seen, last_seen = (t1, t2) if t1 <= t2 else (t2, t1)
🤖 Prompt for AI Agents
In scripts/generate-device-inventory.py around lines 165-166, first_seen and
last_seen are generated independently which can produce first_seen > last_seen;
to fix, generate first_seen and then generate last_seen that is >= first_seen
(e.g., draw last_seen from a time range that starts at first_seen or, if kept
independent, swap the two values when first_seen > last_seen) so the first
connection is always before or equal to the last connection.
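The first option mentioned in the prompt above, drawing last_seen from a range that starts at first_seen, could be sketched as follows. The two-bound `random_time` signature, the `first_and_last_seen` helper, and the 30-day window are all hypothetical; the script's own random_time(now) takes different arguments.

```python
import datetime as dt
import random


def random_time(rng, earliest, latest):
    """Return a random timestamp between earliest and latest, inclusive."""
    span = int((latest - earliest).total_seconds())
    return earliest + dt.timedelta(seconds=rng.randint(0, span))


def first_and_last_seen(rng, now, max_age_days=30):
    """Draw first_seen, then draw last_seen from [first_seen, now]."""
    first_seen = random_time(rng, now - dt.timedelta(days=max_age_days), now)
    last_seen = random_time(rng, first_seen, now)
    return first_seen, last_seen


if __name__ == "__main__":
    current = dt.datetime.now(dt.timezone.utc)
    print(first_and_last_seen(random.Random(0), current))
```

Compared with swapping two independent draws, this skews last_seen toward the present, which may or may not be desirable for test data.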

Comment on lines +297 to +298
name_prefix = "Node" if dev_type == "Server" else "Node"
name = f"{name_prefix}-{idx:02d}"

⚠️ Potential issue | 🟡 Minor

Useless conditional: both branches evaluate to "Node".

This condition has identical outcomes regardless of dev_type. If the intent was to generate distinct prefixes per device type (e.g., "Server", "Laptop", etc.), the logic needs to be completed. Otherwise, simplify to just name_prefix = "Node".

-        name_prefix = "Node" if dev_type == "Server" else "Node"
-        name = f"{name_prefix}-{idx:02d}"
+        name = f"Node-{idx:02d}"

Or, if different prefixes were intended:

name = f"{dev_type}-{idx:02d}"
🧰 Tools
🪛 Ruff (0.14.7)

297-297: Useless if-else condition

(RUF034)

🤖 Prompt for AI Agents
In scripts/generate-device-inventory.py around lines 297-298, the conditional
assigning name_prefix is useless because both branches return "Node"; either
simplify by setting name_prefix = "Node" directly, or implement the intended
distinct prefixes (for example use the device type value or a mapping from
dev_type to prefix) and then build the name accordingly; update the code to
remove the redundant if/else and ensure the chosen prefix matches the intended
naming scheme.
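If distinct prefixes were the intent, a mapping from dev_type to prefix is one way to do it. The sketch below is illustrative only: the device types and prefixes in `NAME_PREFIXES` are invented, and `device_name` is a hypothetical helper.

```python
# Hypothetical mapping; the script's real device types are not listed here.
NAME_PREFIXES = {
    "Server": "SRV",
    "Laptop": "LT",
    "Phone": "PH",
}


def device_name(dev_type, idx):
    """Build a name such as SRV-03, falling back to Node for unknown types."""
    prefix = NAME_PREFIXES.get(dev_type, "Node")
    return f"{prefix}-{idx:02d}"


if __name__ == "__main__":
    print(device_name("Server", 3), device_name("Printer", 12))
```

Note that `:02d` pads only to two digits, so with 1000 leaf nodes the names will not all have the same width.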

@jokob-sk jokob-sk merged commit 4472595 into netalertx:main Dec 8, 2025
6 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Dec 8, 2025
8 tasks