Add script to generate synthetic device inventory CSV #1338

adamoutler · 2025-12-08T16:04:41Z

This script generates a synthetic CSV inventory of NetAlertX devices, including routers, switches, APs, and leaf nodes with random but reproducible attributes.

(venv) netalertx@2e90633d0dc9 ~/NetAlertX/scripts % ./generate_device_inventory.py --devices 1000 
Wrote 1006 devices to generated-devices.csv

(venv) netalertx@2e90633d0dc9 ~/NetAlertX/scripts % ./generate_device_inventory.py --help
usage: generate_device_inventory.py [-h] [--output OUTPUT] [--seed SEED] [--devices DEVICES] [--switches SWITCHES] [--aps APS] [--site SITE] [--ssid SSID] [--owner OWNER] [--network NETWORK] [--template TEMPLATE]

Generate a synthetic device CSV for NetAlertX

options:
  -h, --help            show this help message and exit
  --output OUTPUT, -o OUTPUT
                        Output CSV path
  --seed SEED           Seed for reproducible output
  --devices DEVICES     Number of leaf nodes to generate
  --switches SWITCHES   Number of switches under the router
  --aps APS             Number of APs under switches
  --site SITE           Site name
  --ssid SSID           SSID placeholder
  --owner OWNER         Owner name for devices
  --network NETWORK     IPv4 network to draw addresses from (must have enough hosts for requested devices)
  --template TEMPLATE   Optional CSV to pull header from; defaults to the sample inventory layout
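The options in this help text map onto a standard `argparse` parser. The sketch below reproduces the flag names and help strings from the output above. The default values are placeholders, with one exception: `generated-devices.csv` appears to be the default output path, because the first run wrote to it without `-o`. Everything else (counts, site, SSID, owner, network) is a hypothetical default, not the script's.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    # Flag names and help text mirror the --help output above.
    # All defaults except the output file name are hypothetical placeholders.
    parser = argparse.ArgumentParser(
        description="Generate a synthetic device CSV for NetAlertX"
    )
    parser.add_argument("--output", "-o", type=Path,
                        default=Path("generated-devices.csv"),
                        help="Output CSV path")
    parser.add_argument("--seed", type=int, default=None,
                        help="Seed for reproducible output")
    parser.add_argument("--devices", type=int, default=40,
                        help="Number of leaf nodes to generate")
    parser.add_argument("--switches", type=int, default=2,
                        help="Number of switches under the router")
    parser.add_argument("--aps", type=int, default=3,
                        help="Number of APs under switches")
    parser.add_argument("--site", default="Lab", help="Site name")
    parser.add_argument("--ssid", default="lab-ssid", help="SSID placeholder")
    parser.add_argument("--owner", default="Test Owner",
                        help="Owner name for devices")
    parser.add_argument("--network", default="10.0.0.0/22",
                        help="IPv4 network to draw addresses from "
                             "(must have enough hosts for requested devices)")
    parser.add_argument("--template", type=Path, default=None,
                        help="Optional CSV to pull header from; "
                             "defaults to the sample inventory layout")
    return parser.parse_args(argv)
```

Passing an explicit list, as in `parse_args(["--devices", "1000"])`, makes the parser easy to exercise without touching `sys.argv`.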

There appears to be an issue with NetAlertX importing a large number of devices: it seems to import only 100 of them.

Summary by CodeRabbit

New Features
- Added device inventory generator script for creating synthetic NetAlertX device datasets with configurable network topology (routers → switches → access points), device counts, sites, networks, and ownership. Supports reproducible generation via seed control, unique MAC/IP allocation, and CSV export.

coderabbitai · 2025-12-08T16:04:56Z

Walkthrough

A new Python script is introduced that generates synthetic NetAlertX device inventory data in CSV format. The script creates a hierarchical device topology with random, seedable data, supports customizable parameters, allocates IP addresses from a CIDR block, and writes output with validated headers and device records.
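The Router → Switches → APs → leaf-node hierarchy described here can be sketched as a flat list of rows, each pointing at its parent. This is not the script's `generate_rows()`. The helper name `build_topology`, the dict keys (`name`, `type`, `parent`), the `Switch-NN`/`AP-NN` names and the round-robin parent assignment are all assumptions for illustration.

```python
def build_topology(num_switches: int, num_aps: int, num_leaves: int) -> list[dict]:
    """Return rows for a Router -> Switches -> APs -> leaf-node tree.

    Hypothetical layout: APs are spread round-robin across switches, and leaf
    nodes round-robin across APs (or across switches if there are no APs).
    """
    rows = [{"name": "Router", "type": "Router", "parent": None}]

    switches = [f"Switch-{i:02d}" for i in range(1, num_switches + 1)]
    for name in switches:
        rows.append({"name": name, "type": "Switch", "parent": "Router"})

    aps = [f"AP-{i:02d}" for i in range(1, num_aps + 1)]
    for i, name in enumerate(aps):
        # Fall back to the router when no switches were requested.
        parent = switches[i % len(switches)] if switches else "Router"
        rows.append({"name": name, "type": "AP", "parent": parent})

    # Leaves attach to the lowest available tier.
    leaf_parents = aps or switches or ["Router"]
    for i in range(num_leaves):
        rows.append({
            "name": f"Node-{i + 1:02d}",
            "type": "Leaf",
            "parent": leaf_parents[i % len(leaf_parents)],
        })
    return rows
```

With 2 switches, 3 APs and 10 leaves this yields 16 rows, and every row except the router names an existing parent, which is the parent-child property the review asks to verify.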

Changes

Cohort / File(s)	Summary
Device Inventory Generator `scripts/generate-device-inventory.py`	New script with 7 functions: `parse_args()` for CLI parsing, `load_header()` for CSV header management, `random_mac()` for unique MAC generation, `prepare_ip_pool()` for CIDR-based IP allocation with validation, `random_time()` for connection timestamp generation, `build_row()` for constructing device records, and `generate_rows()` for building hierarchical topology. Includes `main()` orchestration function. Supports custom site, SSID, owner, network parameters with reproducible randomization via seed.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

IP allocation logic in prepare_ip_pool(): Requires verification of CIDR parsing, host count validation, and error handling for insufficient network capacity
Topology building in generate_rows(): Review the hierarchical structure generation (Router → Switches → APs → leaf nodes) and ensure parent-child relationships are correctly established
Header field mapping in build_row(): Confirm that device rows align with the CSV header and default values are sensible
MAC uniqueness enforcement: Verify that the set-based tracking prevents collisions across generated devices
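Set-based MAC tracking of the kind described above can be sketched as a rejection loop that redraws until it finds an unused address. The signature (an explicit `random.Random` plus a shared `used` set) and the locally administered `02:` prefix are assumptions; the script's own `random_mac()` may differ.

```python
import random


def random_mac(rng: random.Random, used: set[str]) -> str:
    """Return a MAC address not yet present in `used`, and record it there."""
    while True:
        # Assumed prefix 0x02: locally administered bit set, multicast bit clear.
        octets = [0x02] + [rng.randint(0x00, 0xFF) for _ in range(5)]
        mac = ":".join(f"{o:02x}" for o in octets)
        if mac not in used:
            used.add(mac)
            return mac
```

Because the generator is seeded, the same seed reproduces the same sequence of addresses, and the set guarantees no duplicates within one run.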

Poem

🐰 A script hops into being, with MACs both unique and true,
IPs bloom from CIDR gardens, each device gets its due,
Routers, switches, access points in hierarchy they stand,
Synthetic yet reproducible, with seeds held in paw-hand! 🌳✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Add script to generate synthetic device inventory CSV' directly and accurately summarizes the main change: adding a new Python script for generating synthetic device inventory data in CSV format.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (3)

scripts/generate-device-inventory.py (3)
1-1: Set executable permission on the script.

The shebang #!/usr/bin/env python3 is present but the file is not executable. Either mark the file executable (chmod +x) or remove the shebang if it's only meant to be run via python scripts/generate-device-inventory.py.

116-125: Consider warning when the template file is not found.

Silently falling back to DEFAULT_HEADER when the user explicitly provides a --template path that doesn't exist may cause confusion. A warning would help users identify typos in the path.
 def load_header(template_path: Path | None) -> list[str]:
     if not template_path:
         return DEFAULT_HEADER
     try:
         with template_path.open(newline="", encoding="utf-8") as handle:
             reader = csv.reader(handle)
             header = next(reader)
             return header if header else DEFAULT_HEADER
     except FileNotFoundError:
+        print(f"Warning: template file '{template_path}' not found, using default header", file=sys.stderr)
         return DEFAULT_HEADER
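A self-contained version of the suggested change is sketched below. Note that the diff calls `sys.stderr`, so the script needs `import sys`. `DEFAULT_HEADER` here is a short hypothetical stand-in for the real column list. The sketch also goes one step beyond the review comment: it uses `next(reader, None)`, because a bare `next(reader)` raises `StopIteration` on an empty template file.

```python
from __future__ import annotations

import csv
import sys
from pathlib import Path

# Hypothetical stand-in; the real DEFAULT_HEADER has the full inventory layout.
DEFAULT_HEADER = ["devMac", "devName", "devFirstConnection", "devLastConnection"]


def load_header(template_path: Path | None) -> list[str]:
    """Read the first row of `template_path`, or fall back to DEFAULT_HEADER."""
    if not template_path:
        return DEFAULT_HEADER
    try:
        with template_path.open(newline="", encoding="utf-8") as handle:
            # next(..., None) returns None for an empty file instead of raising.
            header = next(csv.reader(handle), None)
    except FileNotFoundError:
        print(f"Warning: template file '{template_path}' not found, "
              "using default header", file=sys.stderr)
        return DEFAULT_HEADER
    return header if header else DEFAULT_HEADER
```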

212-212: datetime.utcnow() is deprecated.

datetime.utcnow() is deprecated since Python 3.12. Use timezone-aware datetime instead.
-    now = dt.datetime.utcnow()
+    now = dt.datetime.now(dt.timezone.utc)
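The practical difference is that `utcnow()` returns a naive datetime (`tzinfo` is `None`), while `now(dt.timezone.utc)` returns an aware one. A minimal `random_time(now)` built on the aware form might look like this. The body and the 30-day look-back window are assumptions, since the review shows only the call sites, and the sketch uses the module-level `random` so that a single `random.seed()` in `main()` governs it.

```python
import datetime as dt
import random

# Hypothetical look-back window; the script's actual range is not shown.
MAX_DAYS_BACK = 30


def random_time(now: dt.datetime) -> dt.datetime:
    """Return a random moment up to MAX_DAYS_BACK days before `now`."""
    offset = dt.timedelta(seconds=random.randint(0, MAX_DAYS_BACK * 24 * 3600))
    return now - offset


# Timezone-aware "now"; dt.datetime.utcnow() would return a naive value
# and is deprecated since Python 3.12.
now = dt.datetime.now(dt.timezone.utc)
sample = random_time(now)
```

Subtracting a `timedelta` preserves `tzinfo`, so every timestamp derived from an aware `now` stays aware.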

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 23aa48e and d5328a3.

📒 Files selected for processing (1)

scripts/generate-device-inventory.py (1 hunk)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.py: Use mylog(level, [message]) for logging; levels are: none/minimal/verbose/debug/trace. Use none for the most important messages that should always appear, such as exceptions.
Always set explicit timeouts on subprocess calls. Default to 60s minimum unless plugin config specifies otherwise. Nested subprocess calls need their own timeout.
Never hardcode ports or secrets; always use get_setting_value() to retrieve configuration values.
Use environment variables (NETALERTX_DB, NETALERTX_LOG, etc.) everywhere instead of hardcoding paths like /data/db or relative paths.
Use helper.py functions (timeNowDB, normalize_mac, sanitizers) for time/MAC/string operations. Validate MACs before DB writes.
Add/modify settings via ccd() in server/initialise.py or per-plugin manifest. Define config once and read it via helpers everywhere.
Always leave logging enabled. If there is a possibility it will be difficult to debug with current logging, add more logging.

Files:

scripts/generate-device-inventory.py

🪛 Ruff (0.14.7)

scripts/generate-device-inventory.py

1-1: Shebang is present but file is not executable

(EXE001)

130-130: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)

140-140: Avoid specifying long messages outside the exception class

(TRY003)

145-145: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)

146-146: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)

191-191: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)

220-223: Avoid specifying long messages outside the exception class

(TRY003)

226-226: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)

236-236: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)

257-257: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)

273-273: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)

279-279: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)

295-295: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)

296-296: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)

297-297: Useless if-else condition

(RUF034)

303-303: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: docker_dev

🔇 Additional comments (5)

scripts/generate-device-inventory.py (5)

93-113: LGTM!

The argument parser is well-structured with sensible defaults and clear help text.

128-133: LGTM!

The MAC generation logic correctly ensures uniqueness. The use of standard random is appropriate for synthetic test data.

136-141: LGTM!

Clean implementation using the standard ipaddress module with appropriate error handling.
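CIDR-based allocation with a capacity check can be sketched with the standard `ipaddress` module. The name `prepare_ip_pool` comes from the walkthrough, but its parameters and the sequential (lowest-address-first) allocation are assumptions. `itertools.islice` stops after `needed` hosts, so a large network such as a /8 is never fully materialized.

```python
import ipaddress
from itertools import islice


def prepare_ip_pool(network: str, needed: int) -> list[str]:
    """Return `needed` usable host addresses from `network`, lowest first.

    Raises ValueError for an invalid CIDR (ipaddress raises a ValueError
    subclass) or when the network has too few usable hosts.
    """
    net = ipaddress.IPv4Network(network, strict=False)
    hosts = [str(ip) for ip in islice(net.hosts(), needed)]
    if len(hosts) < needed:
        raise ValueError(f"{net} has only {len(hosts)} usable hosts; {needed} required")
    return hosts
```

For example, a /30 offers two usable hosts, so asking it for three raises `ValueError`, which is the "must have enough hosts for requested devices" condition from the help text.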

144-148: LGTM!

The random timestamp generation is appropriate for synthetic test data.

318-337: LGTM!

The main function correctly handles seeding, directory creation, and CSV output. The output message provides useful feedback.
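The seeding, directory-creation and CSV-writing steps can be sketched as follows. This is a reduced, hypothetical `main()`: the two-column `HEADER`, the `VENDORS` list and the row contents are placeholders, not the script's. The point is the ordering: seed once before any random draw, create the parent directory, then write with `csv.DictWriter`.

```python
from __future__ import annotations

import csv
import random
from pathlib import Path

# Hypothetical placeholders; the real script uses the full inventory header.
HEADER = ["devName", "devVendor"]
VENDORS = ["Acme", "Globex", "Initech"]


def main(output: Path, seed: int | None, count: int) -> None:
    # Seed before any draw so a given seed always produces the same file.
    random.seed(seed)
    rows = [
        {"devName": f"Node-{i:02d}", "devVendor": random.choice(VENDORS)}
        for i in range(1, count + 1)
    ]
    # Create missing parent directories, e.g. for -o out/sub/devices.csv.
    output.parent.mkdir(parents=True, exist_ok=True)
    with output.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=HEADER, restval="")
        writer.writeheader()
        writer.writerows(rows)
    print(f"Wrote {len(rows)} devices to {output}")
```

Running it twice with the same seed produces identical files; passing `seed=None` seeds from system randomness instead.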

coderabbitai · 2025-12-08T16:07:56Z

scripts/generate-device-inventory.py

+    first_seen = random_time(now)
+    last_seen = random_time(now)


⚠️ Potential issue | 🟡 Minor

devFirstConnection may be after devLastConnection.

Both timestamps are generated independently, so first_seen can end up being more recent than last_seen, which is semantically incorrect. For more realistic test data, ensure the first connection is always before or equal to the last connection.

-    first_seen = random_time(now)
-    last_seen = random_time(now)
+    t1 = random_time(now)
+    t2 = random_time(now)
+    first_seen, last_seen = (t1, t2) if t1 <= t2 else (t2, t1)

🤖 Prompt for AI Agents

In scripts/generate-device-inventory.py around lines 165-166, first_seen and last_seen are generated independently which can produce first_seen > last_seen; to fix, generate first_seen and then generate last_seen that is >= first_seen (e.g., draw last_seen from a time range that starts at first_seen or, if kept independent, swap the two values when first_seen > last_seen) so the first connection is always before or equal to the last connection.
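The other option mentioned above, drawing `last_seen` from a range that starts at `first_seen`, enforces the ordering by construction rather than by swapping. The helper name `seen_window`, the explicit `random.Random` parameter and the 30-day window are assumptions for illustration.

```python
import datetime as dt
import random


def seen_window(now: dt.datetime, rng: random.Random,
                max_days_back: int = 30) -> tuple[dt.datetime, dt.datetime]:
    """Return (first_seen, last_seen) with first_seen <= last_seen <= now."""
    span = max_days_back * 24 * 3600
    first_seen = now - dt.timedelta(seconds=rng.randint(0, span))
    # Draw last_seen uniformly from [first_seen, now].
    gap = int((now - first_seen).total_seconds())
    last_seen = first_seen + dt.timedelta(seconds=rng.randint(0, gap))
    return first_seen, last_seen
```

This gives a different joint distribution than sorting two independent draws, but both approaches guarantee `devFirstConnection <= devLastConnection`.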

coderabbitai · 2025-12-08T16:07:56Z

scripts/generate-device-inventory.py

+        name_prefix = "Node" if dev_type == "Server" else "Node"
+        name = f"{name_prefix}-{idx:02d}"


⚠️ Potential issue | 🟡 Minor

Useless conditional: both branches return "Node".

This condition has identical outcomes regardless of dev_type. If the intent was to generate distinct prefixes per device type (e.g., "Server", "Laptop", etc.), the logic needs to be completed. Otherwise, simplify to just name_prefix = "Node".

-        name_prefix = "Node" if dev_type == "Server" else "Node"
-        name = f"{name_prefix}-{idx:02d}"
+        name = f"Node-{idx:02d}"

Or, if different prefixes were intended:

name = f"{dev_type}-{idx:02d}"

🧰 Tools

🪛 Ruff (0.14.7)

297-297: Useless if-else condition

(RUF034)

🤖 Prompt for AI Agents

In scripts/generate-device-inventory.py around lines 297-298, the conditional assigning name_prefix is useless because both branches return "Node"; either simplify by setting name_prefix = "Node" directly, or implement the intended distinct prefixes (for example use the device type value or a mapping from dev_type to prefix) and then build the name accordingly; update the code to remove the redundant if/else and ensure the chosen prefix matches the intended naming scheme.
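If distinct prefixes were the intent, a lookup table keeps the naming scheme in one place. The mapping below is hypothetical: only `"Server"` and the `"Node"` fallback appear in the reviewed code, and `"Laptop"` comes from the review's own example. The helper name `leaf_name` is also illustrative.

```python
# Hypothetical mapping; only "Server" and the "Node" fallback appear in the PR.
NAME_PREFIXES = {
    "Server": "Srv",
    "Laptop": "Laptop",
    "Phone": "Phone",
}


def leaf_name(dev_type: str, idx: int) -> str:
    """Build a leaf-node name such as 'Srv-07'; unknown types fall back to 'Node'."""
    prefix = NAME_PREFIXES.get(dev_type, "Node")
    return f"{prefix}-{idx:02d}"
```

`dict.get` with a default keeps unmapped device types working without an if/else chain.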

coderabbitai bot reviewed Dec 8, 2025

jokob-sk merged commit 4472595 into netalertx:main Dec 8, 2025
6 checks passed

coderabbitai bot mentioned this pull request Dec 8, 2025

Devcontainer-devices #1340

Merged

8 tasks
