-
-
Notifications
You must be signed in to change notification settings - Fork 366
Add script to generate synthetic device inventory CSV #1338
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This script generates a synthetic CSV inventory of NetAlertX devices, including routers, switches, APs, and leaf nodes with random but reproducible attributes.
./generate_device_inventory.py --help main
usage: generate_device_inventory.py [-h] [--output OUTPUT] [--seed SEED] [--devices DEVICES] [--switches SWITCHES] [--aps APS] [--site SITE] [--ssid SSID] [--owner OWNER] [--network NETWORK] [--template TEMPLATE]
Generate a synthetic device CSV for NetAlertX
options:
-h, --help show this help message and exit
--output OUTPUT, -o OUTPUT
Output CSV path
--seed SEED Seed for reproducible output
--devices DEVICES Number of leaf nodes to generate
--switches SWITCHES Number of switches under the router
--aps APS Number of APs under switches
--site SITE Site name
--ssid SSID SSID placeholder
--owner OWNER Owner name for devices
--network NETWORK IPv4 network to draw addresses from (must have enough hosts for requested devices)
--template TEMPLATE Optional CSV to pull header from; defaults to the sample inventory layout
WalkthroughA new Python script is introduced that generates synthetic NetAlertX device inventory data in CSV format. The script creates a hierarchical device topology with random, seedable data, supports customizable parameters, allocates IP addresses from a CIDR block, and writes output with validated headers and device records. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (3)
scripts/generate-device-inventory.py (3)
1-1: Set executable permission on the script.The shebang
#!/usr/bin/env python3is present but the file is not executable. Either mark the file executable (chmod +x) or remove the shebang if it's only meant to be run viapython scripts/generate-device-inventory.py.
116-125: Consider warning when the template file is not found.Silently falling back to
DEFAULT_HEADERwhen the user explicitly provides a--templatepath that doesn't exist may cause confusion. A warning would help users identify typos in the path.def load_header(template_path: Path | None) -> list[str]: if not template_path: return DEFAULT_HEADER try: with template_path.open(newline="", encoding="utf-8") as handle: reader = csv.reader(handle) header = next(reader) return header if header else DEFAULT_HEADER except FileNotFoundError: + print(f"Warning: template file '{template_path}' not found, using default header", file=sys.stderr) return DEFAULT_HEADER
212-212:datetime.utcnow()is deprecated.
datetime.utcnow()is deprecated since Python 3.12. Use timezone-aware datetime instead.- now = dt.datetime.utcnow() + now = dt.datetime.now(dt.timezone.utc)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
scripts/generate-device-inventory.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.py: Usemylog(level, [message])for logging; levels are: none/minimal/verbose/debug/trace. Usenonefor most important messages that should always appear, such as exceptions.
Always set explicit timeouts on subprocess calls. Default to 60s minimum unless plugin config specifies otherwise. Nested subprocess calls need their own timeout.
Never hardcode ports or secrets; always useget_setting_value()to retrieve configuration values.
Use environment variables (NETALERTX_DB,NETALERTX_LOG, etc.) everywhere instead of hardcoding paths like/data/dbor relative paths.
Usehelper.pyfunctions (timeNowDB,normalize_mac, sanitizers) for time/MAC/string operations. Validate MACs before DB writes.
Add/modify settings viaccd()inserver/initialise.pyor per-plugin manifest. Define config once and read it via helpers everywhere.
Always leave logging enabled. If there is a possibility it will be difficult to debug with current logging, add more logging.
Files:
scripts/generate-device-inventory.py
🪛 Ruff (0.14.7)
scripts/generate-device-inventory.py
1-1: Shebang is present but file is not executable
(EXE001)
130-130: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
140-140: Avoid specifying long messages outside the exception class
(TRY003)
145-145: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
146-146: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
191-191: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
220-223: Avoid specifying long messages outside the exception class
(TRY003)
226-226: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
236-236: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
257-257: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
273-273: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
279-279: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
295-295: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
296-296: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
297-297: Useless if-else condition
(RUF034)
303-303: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: docker_dev
🔇 Additional comments (5)
scripts/generate-device-inventory.py (5)
93-113: LGTM!The argument parser is well-structured with sensible defaults and clear help text.
128-133: LGTM!The MAC generation logic correctly ensures uniqueness. The use of standard
randomis appropriate for synthetic test data.
136-141: LGTM!Clean implementation using the standard
ipaddressmodule with appropriate error handling.
144-148: LGTM!The random timestamp generation is appropriate for synthetic test data.
318-337: LGTM!The main function correctly handles seeding, directory creation, and CSV output. The output message provides useful feedback.
| first_seen = random_time(now) | ||
| last_seen = random_time(now) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
devFirstConnection may be after devLastConnection.
Both timestamps are generated independently, so first_seen can end up being more recent than last_seen, which is semantically incorrect. For more realistic test data, ensure the first connection is always before or equal to the last connection.
- first_seen = random_time(now)
- last_seen = random_time(now)
+ t1 = random_time(now)
+ t2 = random_time(now)
+ first_seen, last_seen = (t1, t2) if t1 <= t2 else (t2, t1)🤖 Prompt for AI Agents
In scripts/generate-device-inventory.py around lines 165-166, first_seen and
last_seen are generated independently which can produce first_seen > last_seen;
to fix, generate first_seen and then generate last_seen that is >= first_seen
(e.g., draw last_seen from a time range that starts at first_seen or, if kept
independent, swap the two values when first_seen > last_seen) so the first
connection is always before or equal to the last connection.
| name_prefix = "Node" if dev_type == "Server" else "Node" | ||
| name = f"{name_prefix}-{idx:02d}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Useless conditional: both branches return "Node".
This condition has identical outcomes regardless of dev_type. If the intent was to generate distinct prefixes per device type (e.g., "Server", "Laptop", etc.), the logic needs to be completed. Otherwise, simplify to just name_prefix = "Node".
- name_prefix = "Node" if dev_type == "Server" else "Node"
- name = f"{name_prefix}-{idx:02d}"
+ name = f"Node-{idx:02d}"Or, if different prefixes were intended:
name = f"{dev_type}-{idx:02d}"🧰 Tools
🪛 Ruff (0.14.7)
297-297: Useless if-else condition
(RUF034)
🤖 Prompt for AI Agents
In scripts/generate-device-inventory.py around lines 297-298, the conditional
assigning name_prefix is useless because both branches return "Node"; either
simplify by setting name_prefix = "Node" directly, or implement the intended
distinct prefixes (for example use the device type value or a mapping from
dev_type to prefix) and then build the name accordingly; update the code to
remove the redundant if/else and ensure the chosen prefix matches the intended
naming scheme.
This script generates a synthetic CSV inventory of NetAlertX devices, including routers, switches, APs, and leaf nodes with random but reproducible attributes.
There appears to be an issue with NetAlertX in including a large number of devices. It appears to only import 100 of them.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.