Conversation

@moshloop moshloop (Member) commented Dec 15, 2025

requires: flanksource/duty#1687

Summary by CodeRabbit

  • New Features

    • Added an exec scraper to run scripts with env var support, Git checkout, cloud connections (AWS/GCP/Azure/Kubernetes), artifacts, and JSON/YAML/plain-text parsing; registered in scraper list.
  • Documentation

    • Added comprehensive exec fixtures and an exec README with many usage examples and patterns.
  • CRD / Schema

    • Introduced exec/script and env fields in ScrapeConfig CRD and added comprehensive executable config JSON schemas.
  • Tests

    • Added unit and integration tests for parsing, git checkout, and Backstage integration.
  • Chores

    • Updated .gitignore, bumped tool/dependency versions, Makefile lint version, and CI test setup (pyyaml).


- Add Exec scraper type to api/v1 mirroring playbook ExecAction
- Implement ExecScraper with script execution via duty/shell.Run()
- Support AWS/GCP/Azure/K8s connection credential injection
- Support git checkout before script execution
- Parse stdout as JSON/YAML to create multiple config items
- Register exec scraper in ScraperSpec and common registry
- Add comprehensive unit tests for output parsing
- All tests pass and linting clean
Add 8 example ScrapeConfig fixtures demonstrating exec scraper capabilities:

- exec-simple.yaml: Basic inline bash script with JSON output
- exec-aws-connection.yaml: AWS CLI with credential injection
- exec-git-checkout.yaml: Script execution from Git repository
- exec-python.yaml: Python script with Kubernetes connection
- exec-nodejs.yaml: Node.js script with AWS connection
- exec-env-transform.yaml: Environment variables and transforms
- exec-yaml-output.yaml: YAML output parsing example
- exec-comprehensive.yaml: Multi-feature demonstration

Also includes exec-README.md with detailed documentation covering:
- Feature overview for each fixture
- Key capabilities demonstrated
- Usage patterns and best practices
- Scheduling recommendations
- Add exec_suite_test.go with Ginkgo test suite setup
- Add exec_git_integration_test.go with git checkout integration test
- Test verifies exec scraper can clone repo and execute inline script
- Uses file:// URL to clone config-db repo itself
- Inline script checks for go.mod to verify checkout worked
- All tests pass successfully
Tests the exec scraper's ability to check out git repos and run scripts
- Created fixtures/exec-backstage-catalog.yaml with Python script to scrape Backstage entities
- Parses Components, APIs, Systems, and Domains from catalog-model/examples
- Extracts metadata, properties (type, lifecycle), and tags from multiple sources
- Maps relationships (owner, domain, system, dependsOn, consumesApis) using CEL expressions
- Handles multi-document YAML files with safe_load_all
- Created integration test in scrapers/exec/exec_backstage_integration_test.go
- Tests entity scraping, relationship extraction, properties, and tag aggregation
- Fixed exec_git_integration_test.go unused variable

Note: Git checkout currently fails with 'reference not found' - requires investigation
@coderabbitai coderabbitai bot (Contributor) commented Dec 15, 2025

Walkthrough

Adds an Exec scraping feature: new Exec API type and deepcopy, CRD/JSON schema extensions for exec/script/env/connections, ExecScraper implementation and registration, fixtures and docs, unit and integration tests, and dependency/tooling updates.

Changes

  • API Types & Deepcopy (api/v1/exec.go, api/v1/scrapeconfig_types.go, api/v1/types.go, api/v1/zz_generated.deepcopy.go): Add the Exec API type (script, connections, checkout, env, artifacts); register "exec" in AllScraperConfigs; extend ScraperSpec with an Exec slice; add DeepCopy methods and deepcopy support for Exec.
  • CRD & JSON Schemas (chart/crds/configs.flanksource.com_scrapeconfigs.yaml, chart/crds/configs.flanksource.com_scrapeplugins.yaml, config/schemas/config_exec.schema.json, config/schemas/scrape_config.schema.json): Bump the controller-gen annotation; rename query → script; add exec, env, artifacts, and extensive connection/valueFrom definitions; introduce Exec/ExecConnections/Git/AWS/GCP/Azure/Kubernetes connection schema types and many nested schema objects; adjust CRD descriptions.
  • Scraper Implementation & Registration (scrapers/exec/exec.go, scrapers/common.go): Add ExecScraper with CanScrape and Scrape: iterate Exec configs, run scripts (respecting env/checkout/connections), handle exit codes, parse stdout (JSON → YAML → text), and produce ScrapeResults; register the scraper in the registry.
  • Fixtures & Documentation (fixtures/exec-README.md, fixtures/exec-*.yaml): Add a README and multiple exec fixtures demonstrating inline scripts, AWS/GCP/Azure/Kubernetes connections, Git checkouts, Python/Node.js examples, env/transform usage, and YAML/JSON outputs.
  • Tests & Test Suite (scrapers/exec/exec_suite_test.go, scrapers/exec/exec_test.go, scrapers/exec/exec_git_integration_test.go, scrapers/exec/exec_backstage_integration_test.go): Add unit tests for output parsing and integration tests for Git checkout and the Backstage catalog; include suite setup, helpers, and assertions for parsed entities and relationships.
  • Build / Tooling / Modules (.gitignore, Makefile, go.mod): Add ignore patterns (.db/, bin/, out/, ._DS_Store, exec-checkout/); update GOLANGCI_LINT_VERSION; bump numerous Go module dependencies.
  • Miscellaneous (cmd/server.go): Reorder imports and add a // nolint:staticcheck directive for a LoggerWithConfig usage.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor Scheduler
    participant ExecScraper
    participant Checkout as Git/Checkout
    participant Shell as ShellExecutor
    participant Parser
    participant Results as ResultsCollector

    Scheduler->>ExecScraper: Trigger Scrape(ctx)
    ExecScraper->>ExecScraper: iterate Exec configs

    alt config has checkout
        ExecScraper->>Checkout: perform git checkout
        Checkout-->>ExecScraper: workspace ready / error
    end

    ExecScraper->>Shell: run script (env, connections, workspace)
    Shell-->>ExecScraper: stdout, stderr, exitCode

    alt exitCode == 0
        ExecScraper->>Parser: parseOutput(stdout)
        alt parsed as JSON array/object
            Parser-->>ExecScraper: structured data
            ExecScraper->>Results: create result(s) per item
        else parsed as YAML
            Parser-->>ExecScraper: YAML→JSON data
            ExecScraper->>Results: create result(s)
        else plain text
            Parser-->>ExecScraper: raw text
            ExecScraper->>Results: create single textual result
        end
    else
        ExecScraper->>Results: emit error result (stderr, exitCode)
    end

    ExecScraper-->>Scheduler: aggregated ScrapeResults
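
A minimal Go sketch of the JSON → YAML → plain-text fallback depicted above (the function name and return shape are assumptions drawn from this walkthrough, not the PR's actual code):

package exec

import (
	"encoding/json"

	"gopkg.in/yaml.v3"
)

// parseOutput sketches the fallback chain: try JSON first, then YAML,
// then fall back to the raw text. A JSON array yields one item per element.
func parseOutput(stdout string) []any {
	var v any
	if err := json.Unmarshal([]byte(stdout), &v); err == nil {
		if items, ok := v.([]any); ok {
			return items // one result per array element
		}
		return []any{v} // single object or scalar
	}
	if err := yaml.Unmarshal([]byte(stdout), &v); err == nil {
		return []any{v} // YAML document becomes a single structured result
	}
	return []any{stdout} // plain text: single textual result
}

Note that because YAML accepts bare scalars, the final raw-text branch is rarely reached in practice; a review comment further down examines exactly this.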

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 20.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
  • Title check ❓ Inconclusive: the title "Pr/exec" is vague and generic, using a branch name rather than a descriptive summary of the changes. Resolution: use a clear, descriptive title that summarizes the main feature, for example "Add Exec scraper for executing shell scripts" or "Implement script execution capability for scrape configs".

✅ Passed checks (1 passed)

  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.

@moshloop moshloop requested a review from adityathebe December 15, 2025 09:10
@adityathebe adityathebe self-assigned this Dec 15, 2025
@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
chart/crds/configs.flanksource.com_scrapeconfigs.yaml (1)

1875-2087: Consider marking account as required for clickhouse.azureBlobStorage

The CRD correctly reflects the Go struct (only collection is required). However, the account field is used unconditionally in GetAccountKeyCommand() at runtime without validation—if it's empty, the command will fail. Either mark account as required in the CRD schema, or add controller-level validation to prevent users from creating specs that pass admission but fail at execution.
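
For the first option, a kubebuilder marker on the Go field is enough for admission-time rejection; a sketch with illustrative struct and field names (the real type lives wherever the ClickHouse connection is defined):

// Sketch only: mark account required so specs with an empty value
// fail validation instead of failing later in GetAccountKeyCommand().
type AzureBlobStorage struct {
	// +kubebuilder:validation:Required
	// +kubebuilder:validation:MinLength=1
	Account string `json:"account" yaml:"account"`

	// +kubebuilder:validation:Required
	Collection string `json:"collection" yaml:"collection"`
}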

♻️ Duplicate comments (5)
fixtures/exec-comprehensive.yaml (3)

29-48: Critical YAML syntax error: duplicate "name" key.

Lines 29 and 32 both define a name key in the same YAML mapping. This is the same structural error as in Example 1.


51-62: Critical YAML syntax error: duplicate "name" key.

Lines 51 and 54 both define a name key in the same YAML mapping. This is the same structural error as in Examples 1 and 2.


65-99: Critical YAML syntax error: duplicate "name" key.

Lines 65 and 68 both define a name key in the same YAML mapping. This is the same structural error as in the previous examples.

fixtures/exec-env-transform.yaml (1)

8-11: Critical YAML syntax error: duplicate "name" key.

Lines 8 and 11 both define a name key in the same YAML mapping, which is invalid YAML syntax. This is the same structural error present in exec-comprehensive.yaml.

fixtures/exec-backstage-catalog.yaml (1)

11-14: Critical YAML syntax error: duplicate "name" key.

Lines 11 and 14 both define a name key in the same YAML mapping, which is invalid YAML syntax. This is the same structural error present in other exec fixture files.

Additionally, this fixture is used by the failing integration test exec_backstage_integration_test.go. The duplicate key issue may contribute to unexpected behavior, though the primary test failure appears to be related to git checkout.

🧹 Nitpick comments (2)
go.mod (1)

1-464: Verify dependency updates for compatibility.

The go.mod file shows extensive dependency updates across AWS SDK v2, Azure SDK, Kubernetes libraries, and OpenTelemetry components. While these updates appear to align with the new Exec feature requirements, ensure:

  1. All updated dependencies are tested in integration tests
  2. Breaking changes in major version bumps are addressed
  3. Security advisories for the updated versions are reviewed

Do you want me to generate a script to check for known vulnerabilities in the updated dependencies?

scrapers/exec/exec_test.go (1)

75-98: Plain-text assertion is tightly coupled to internal representation

The plain-text test currently expects the config to be a quoted string ("This is plain text..."), which bakes in how the value is serialized rather than the underlying content. That makes the test brittle if the implementation changes to keep the raw string and defer JSON quoting to a later stage.

Consider relaxing this to assert the underlying text only, e.g. just checking that results[0].Config is a string containing the original message (without enforcing JSON-quoted formatting).
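
For instance, a sketch of the relaxed assertion (Gomega; the results variable and the string-typed Config field are taken from the existing test's context):

// Assert on the underlying text rather than its JSON-quoted serialization.
cfg, ok := results[0].Config.(string)
Expect(ok).To(BeTrue(), "plain-text output should surface as a string")
Expect(cfg).To(ContainSubstring("This is plain text"))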

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7952468 and 538d0cd.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (27)
  • .gitignore (1 hunks)
  • api/v1/exec.go (1 hunks)
  • api/v1/scrapeconfig_types.go (1 hunks)
  • api/v1/types.go (3 hunks)
  • api/v1/zz_generated.deepcopy.go (3 hunks)
  • chart/crds/configs.flanksource.com_scrapeconfigs.yaml (5 hunks)
  • chart/crds/configs.flanksource.com_scrapeplugins.yaml (1 hunks)
  • config/schemas/config_exec.schema.json (1 hunks)
  • config/schemas/scrape_config.schema.json (8 hunks)
  • fixtures/exec-README.md (1 hunks)
  • fixtures/exec-aws-connection.yaml (1 hunks)
  • fixtures/exec-backstage-catalog.yaml (1 hunks)
  • fixtures/exec-comprehensive.yaml (1 hunks)
  • fixtures/exec-env-transform.yaml (1 hunks)
  • fixtures/exec-git-checkout-test.yaml (1 hunks)
  • fixtures/exec-git-checkout.yaml (1 hunks)
  • fixtures/exec-nodejs.yaml (1 hunks)
  • fixtures/exec-python.yaml (1 hunks)
  • fixtures/exec-simple.yaml (1 hunks)
  • fixtures/exec-yaml-output.yaml (1 hunks)
  • go.mod (10 hunks)
  • scrapers/common.go (2 hunks)
  • scrapers/exec/exec.go (1 hunks)
  • scrapers/exec/exec_backstage_integration_test.go (1 hunks)
  • scrapers/exec/exec_git_integration_test.go (1 hunks)
  • scrapers/exec/exec_suite_test.go (1 hunks)
  • scrapers/exec/exec_test.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (8)
api/v1/types.go (2)
api/v1/exec.go (1)
  • Exec (9-26)
api/v1/common.go (1)
  • BaseScraper (240-255)
scrapers/common.go (1)
scrapers/exec/exec.go (1)
  • ExecScraper (12-12)
api/v1/scrapeconfig_types.go (1)
api/v1/exec.go (1)
  • Exec (9-26)
scrapers/exec/exec_backstage_integration_test.go (2)
api/context.go (1)
  • NewScrapeContext (29-34)
scrapers/exec/exec.go (1)
  • ExecScraper (12-12)
api/v1/exec.go (1)
api/v1/common.go (1)
  • BaseScraper (240-255)
scrapers/exec/exec_suite_test.go (3)
api/context.go (2)
  • ScrapeContext (16-27)
  • NewScrapeContext (29-34)
api/v1/scrapeconfig_types.go (1)
  • ScrapeConfig (50-56)
api/v1/config.go (1)
  • ParseConfigs (30-43)
api/v1/zz_generated.deepcopy.go (2)
api/v1/exec.go (1)
  • Exec (9-26)
api/v1/common.go (1)
  • BaseScraper (240-255)
scrapers/exec/exec_test.go (1)
api/v1/exec.go (1)
  • Exec (9-26)
🪛 GitHub Actions: Test
scrapers/exec/exec_backstage_integration_test.go

[error] 35-35: Exec Scraper - Backstage Catalog Integration test failed. got error: failed to execute script: error during git clone: reference not found

scrapers/exec/exec_git_integration_test.go

[error] 33-33: Exec Scraper - Git Checkout Integration test failed. Expected 2 config items from script output but got 1; error: failed to checkout git repository and execute script

🪛 YAMLlint (1.37.1)
fixtures/exec-git-checkout-test.yaml

[error] 12-12: duplication of key "name" in mapping

(key-duplicates)

fixtures/exec-aws-connection.yaml

[error] 11-11: duplication of key "name" in mapping

(key-duplicates)

fixtures/exec-env-transform.yaml

[error] 11-11: duplication of key "name" in mapping

(key-duplicates)

fixtures/exec-backstage-catalog.yaml

[error] 14-14: duplication of key "name" in mapping

(key-duplicates)

fixtures/exec-yaml-output.yaml

[error] 11-11: duplication of key "name" in mapping

(key-duplicates)

fixtures/exec-comprehensive.yaml

[error] 12-12: duplication of key "name" in mapping

(key-duplicates)


[error] 32-32: duplication of key "name" in mapping

(key-duplicates)


[error] 54-54: duplication of key "name" in mapping

(key-duplicates)


[error] 68-68: duplication of key "name" in mapping

(key-duplicates)

🔇 Additional comments (23)
.gitignore (1)

29-34: LGTM!

The ignore patterns are appropriate for build artifacts and the new exec-checkout workflow.

chart/crds/configs.flanksource.com_scrapeplugins.yaml (1)

6-6: LGTM!

Controller-gen version bump to v0.19.0 aligns with the broader API surface changes introduced in this PR.

fixtures/exec-git-checkout-test.yaml (1)

46-46: Verify placeholder URL before deployment.

The checkout URL is set to a placeholder that will be replaced. Ensure that this fixture is only used in tests where the URL is properly substituted, or consider adding a comment indicating this is test-only.

api/v1/zz_generated.deepcopy.go (1)

644-676: LGTM!

The auto-generated DeepCopy implementation for the Exec type correctly handles all fields including pointer fields (Checkout), slices (Env, Artifacts), and nested structs (BaseScraper, Connections).

config/schemas/config_exec.schema.json (1)

1-749: LGTM! Comprehensive schema definition.

The JSON schema provides thorough type definitions for the Exec feature with:

  • Strict validation via additionalProperties: false
  • Proper required field enforcement (script, path, etc.)
  • Complete connection type coverage (AWS, GCP, Azure, K8s, Git)
  • Flexible environment variable and artifact handling

The schema aligns well with the API type definitions in api/v1/exec.go.

fixtures/exec-simple.yaml (1)

1-31: LGTM! Clean example fixture.

This fixture provides a clear, minimal example of the Exec scraper with:

  • Simple inline bash script
  • JSON output via heredoc
  • JSONPath-based field mappings

Good starting point for users learning the Exec feature.

api/v1/scrapeconfig_types.go (1)

114-116: LGTM! Consistent type detection pattern.

The exec type detection follows the same pattern as existing scraper types and is correctly placed at the end of the conditional chain.

scrapers/common.go (1)

7-7: ExecScraper registration wiring looks correct

Importing scrapers/exec and registering exec.ExecScraper{} in All is consistent with the existing pattern for other scrapers.

Also applies to: 31-31
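
For reference, a sketch of that wiring (the registry's element type and the interface's import path are assumed here, not copied from the code):

package scrapers

import (
	"github.com/flanksource/config-db/api" // assumed location of the Scraper interface
	"github.com/flanksource/config-db/scrapers/exec"
)

// All is the registry consulted via CanScrape for each ScrapeConfig.
var All = []api.Scraper{
	// ... existing scrapers ...
	exec.ExecScraper{},
}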

fixtures/exec-README.md (1)

1-174: README coverage and examples look consistent with the Exec feature set

The described fixtures, runtimes, connection handling, output formats, and patterns match the added exec YAMLs and the new Exec scraper behavior. This is a solid, discoverable overview for users.

fixtures/exec-git-checkout.yaml (1)

1-20: Git checkout exec fixture looks structurally sound

The example correctly wires exec, inline Bash, and a checkout block, matching the new Exec model and providing a clear Git‑based usage pattern.

api/v1/types.go (1)

19-20: Exec type is correctly wired into config registry and plugin application

Adding "exec": Exec{} to AllScraperConfigs, the Exec []Exec field on ScraperSpec with json/yaml:"exec,omitempty", and the corresponding ApplyPlugin loop is consistent with how other scraper types are integrated.

Also applies to: 72-73, 145-147
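
A condensed sketch of the touch points described above (tags as quoted; surrounding fields and the map's value type elided or assumed):

// api/v1/types.go (condensed, illustrative)
var AllScraperConfigs = map[string]any{
	// ... existing entries ...
	"exec": Exec{},
}

type ScraperSpec struct {
	// ... other scraper slices ...
	Exec []Exec `json:"exec,omitempty" yaml:"exec,omitempty"`
}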

scrapers/exec/exec_test.go (1)

9-73: Good coverage of parseOutput for structured outputs

The tests for JSON object/array, YAML, and empty output exercise the main branches of parseOutput and at least validate result counts and absence of errors, which should catch regressions in basic parsing behavior.

scrapers/exec/exec_backstage_integration_test.go (1)

14-86: The fixture configuration and repository are properly configured. The exec-backstage-catalog.yaml fixture correctly specifies https://github.com/backstage/backstage and the expected path packages/catalog-model/examples exists in the repository with valid YAML example files. The git checkout configuration contains no apparent errors (the depth: 1 setting is valid, and using the default branch without an explicit ref is standard practice).

To diagnose the actual pipeline failure, please provide the specific error logs from the pipeline run. The assertion about "reference not found" during git clone cannot be confirmed from the fixture configuration alone.

scrapers/exec/exec_git_integration_test.go (1)

14-66: No issues found - fixture and script are correctly implemented.

The fixture file fixtures/exec-git-checkout-test.yaml exists and is properly configured. The script correctly outputs 2 JSON objects matching the test expectations (config-1 and config-2), and the test code is well-formed and properly aligned with the fixture structure. No evidence of pipeline failure or missing components exists.

Likely an incorrect or invalid review comment.

fixtures/exec-comprehensive.yaml (1)

9-26: The YAML structure is correct and intentional—not a syntax error.

Lines 9 and 12 do not create a YAML parsing error. The first name (line 9) identifies the exec scraper configuration, while the second name (line 12) specifies the JSONPath expression to extract the name field from scraped items. This is the documented schema pattern for exec scrapers, as evidenced by:

  • All four examples in this file follow the same pattern (lines 9/12, 29/32, 51/54, 65/68)
  • The README explicitly documents this under "Key Features Demonstrated → Field extraction: JSONPath-based ID, name, type extraction"
  • The comprehensive example demonstrates this pattern works across multiple scraper types

The config-db exec scraper schema intentionally uses name for both the config identifier and the output field extraction specification.

Likely an incorrect or invalid review comment.

scrapers/exec/exec.go (4)

1-16: LGTM!

Clean package structure with appropriate imports and a straightforward CanScrape implementation.


18-48: LGTM!

The scrape loop correctly:

  • Maps configuration fields to the shell execution config
  • Handles execution errors gracefully by appending error results and continuing
  • Captures stderr in the error message for non-zero exit codes
  • Delegates output parsing to parseOutput

50-74: Verify plain text handling behavior.

YAML is extremely permissive—almost any plain text is valid YAML. This means the plain text fallback (lines 71-73) will rarely execute. Instead, plain text outputs will:

  1. Fail JSON parsing
  2. Pass YAML parsing (as a scalar string)
  3. Go through createResultsFromJSON → get re-marshaled as a JSON string

For example, script output Hello World becomes "Hello World" (quoted), and multi-line text gets escaped newlines.

If this is intentional (always produce JSON-compatible output), this is fine. If raw text preservation is expected, consider checking config.Format or adding explicit format detection.
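
A tiny standalone check of that claim (gopkg.in/yaml.v3 assumed here; the scraper may use a different YAML package):

package main

import (
	"fmt"

	"gopkg.in/yaml.v3"
)

func main() {
	var v any
	// Bare text is a valid YAML scalar, so the plain-text branch never fires.
	if err := yaml.Unmarshal([]byte("Hello World"), &v); err == nil {
		fmt.Printf("%T: %q\n", v, v) // prints: string: "Hello World"
	}
}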


76-112: LGTM!

The function correctly handles all JSON types:

  • Arrays produce one result per item
  • Objects and scalars produce single results
  • Marshal errors are captured as error results without stopping processing
config/schemas/scrape_config.schema.json (3)

922-1008: LGTM!

The Exec definition correctly:

  • Inherits BaseScraper fields for consistent metadata
  • Requires script as the only mandatory field
  • References valid definitions for connections, checkout, env, and artifacts

1009-1035: LGTM!

The ExecConnections definition provides flexible cloud provider connection options (AWS, GCP, Azure, Kubernetes) with proper references to their respective connection types.


2995-3000: LGTM!

The exec property is correctly added to ScraperSpec as an optional array of Exec definitions, consistent with other scraper types in the schema.

chart/crds/configs.flanksource.com_scrapeconfigs.yaml (1)

2508-3794: Exec scraper schema looks consistent; ensure it stays generated from the canonical types

The new spec.exec[] schema (artifacts, checkout, connections, env, script) looks structurally consistent with the rest of the CRD:

  • Uses the same valueFrom pattern (configMapKeyRef/helmRef/secretKeyRef/serviceAccount) as other credential-bearing fields.
  • connections mirrors existing AWS/Azure/GCP/Kubernetes connection shapes.
  • env reuses the familiar EnvVar‑like structure.
  • script is required and clearly documented as “inline or path to script file”.

Given the size and complexity, it’s important this section remains derived from the Go API types / JSON Schemas rather than hand-edited, so future changes to Exec/ExecConnections don’t silently diverge from the installed CRD.

If not already part of CI, consider a simple guard:

#!/bin/bash
# Regenerate CRDs & schemas and verify no diff.
make generate-crds generate-schemas  # or the project’s existing targets

git diff -- chart/crds/configs.flanksource.com_scrapeconfigs.yaml config/schemas || true

to ensure the CRD, Go types, and JSON Schemas for exec remain in sync.

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
cmd/server.go (1)

103-109: Scoped nolint for deprecated Echo logger is acceptable, but plan a migration

The localized //nolint:staticcheck // SA1019 ignore. around middleware.LoggerWithConfig is a reasonable way to quiet the deprecation warning for now. Consider migrating to the recommended logging approach from Echo in a follow‑up so this suppression can eventually be removed.
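
A possible shape for that follow-up, assuming Echo v4's middleware.RequestLoggerWithConfig is the intended replacement (a sketch, not a drop-in for this project's logging setup):

package main

import (
	"fmt"

	"github.com/labstack/echo/v4"
	"github.com/labstack/echo/v4/middleware"
)

func main() {
	e := echo.New()
	// RequestLoggerWithConfig is the non-deprecated request logger; the caller
	// chooses which fields to record and where the log line is written.
	e.Use(middleware.RequestLoggerWithConfig(middleware.RequestLoggerConfig{
		LogStatus: true,
		LogURI:    true,
		LogValuesFunc: func(c echo.Context, v middleware.RequestLoggerValues) error {
			fmt.Printf("%d %s\n", v.Status, v.URI)
			return nil
		},
	}))
	e.Logger.Fatal(e.Start(":8080"))
}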

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 538d0cd and d457b0c.

📒 Files selected for processing (9)
  • Makefile (1 hunks)
  • cmd/server.go (2 hunks)
  • fixtures/exec-aws-connection.yaml (1 hunks)
  • fixtures/exec-backstage-catalog.yaml (1 hunks)
  • fixtures/exec-comprehensive.yaml (1 hunks)
  • fixtures/exec-env-transform.yaml (1 hunks)
  • fixtures/exec-git-checkout-test.yaml (1 hunks)
  • fixtures/exec-yaml-output.yaml (1 hunks)
  • scrapers/exec/exec_backstage_integration_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • fixtures/exec-comprehensive.yaml
  • fixtures/exec-yaml-output.yaml
  • fixtures/exec-backstage-catalog.yaml
  • scrapers/exec/exec_backstage_integration_test.go
  • fixtures/exec-aws-connection.yaml
  • fixtures/exec-git-checkout-test.yaml
🔇 Additional comments (3)
cmd/server.go (1)

28-32: Config‑DB imports re‑grouping looks fine

Reordering v1, db, jobs, and utils alongside scrapers is clean and non‑functional; no issues from this change.

Makefile (1)

196-199: golangci‑lint version bump is straightforward; verify CI/local compatibility

Updating GOLANGCI_LINT_VERSION to v2.7.2 fits the existing /v2 install path and should be safe, but it may enable/alter some linters. Please confirm make golangci-lint and CI lints still pass as expected with this version.

fixtures/exec-env-transform.yaml (1)

54-55: .toJSON() is a valid and well-supported method in gomplate v3.

The expression uses gomplate templating (not CEL), and .toJSON() is a documented gomplate function that marshals objects to JSON strings. This pattern is consistently used throughout the codebase for relationship expressions.

Likely an incorrect or invalid review comment.

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
api/v1/exec.go (2)

9-13: Consider adding validation for the Script field.

The Script field lacks an omitempty tag, making it required in the JSON/YAML, but Go strings default to empty. Without validation tags or runtime checks, an empty script could reach the execution layer and cause issues.

Consider one of these approaches:

  1. Add a validation tag if using a validation framework:
-	Script string `json:"script" yaml:"script" template:"true"`
+	Script string `json:"script" yaml:"script" template:"true" validate:"required,min=1"`
  2. Document that validation is performed at runtime in the scraper implementation.

21-22: Consider whether environment variables need templating support.

The Env field lacks a template:"true" tag while other dynamic fields like Script and Artifacts have it. If environment variable values need to support dynamic templating (e.g., referencing other config values), consider adding the template tag.

If templating is needed for environment variables:

-	Env []types.EnvVar `json:"env,omitempty" yaml:"env,omitempty"`
+	Env []types.EnvVar `json:"env,omitempty" yaml:"env,omitempty" template:"true"`
chart/crds/configs.flanksource.com_scrapeconfigs.yaml (1)

3338-3342: Consider renaming boolean serviceAccount flag under exec.connections.kubernetes

Within spec.exec[*].connections.kubernetes, serviceAccount is a boolean toggle (“will allow access to KUBERNETES env vars”), whereas throughout the rest of the CRD serviceAccount is a string field (the SA name) inside various valueFrom blocks.

The meaning difference is clear from the description, but the reused name with a different type may be confusing to users and tooling.

If you have the flexibility, consider renaming this boolean to something like useServiceAccountEnv or enableServiceAccountEnv for clarity and to avoid overloading serviceAccount across different shapes.

Also applies to: 3362-3406

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d457b0c and f9d4206.

📒 Files selected for processing (5)
  • api/v1/exec.go (1 hunks)
  • api/v1/zz_generated.deepcopy.go (3 hunks)
  • chart/crds/configs.flanksource.com_scrapeconfigs.yaml (5 hunks)
  • fixtures/exec-python.yaml (1 hunks)
  • scrapers/exec/exec.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • scrapers/exec/exec.go
  • api/v1/zz_generated.deepcopy.go
  • fixtures/exec-python.yaml
🧰 Additional context used
🧬 Code graph analysis (1)
api/v1/exec.go (1)
api/v1/common.go (2)
  • BaseScraper (240-255)
  • Script (30-35)
🔇 Additional comments (6)
api/v1/exec.go (2)

3-7: Imports are correctly specified and all types exist in duty v1.0.1126.

connection.ExecConnections, connection.GitConnection, types.EnvVar, and shell.Artifact are all active in the codebase with proper code generation support.


15-19: Consider adding template:"true" to Checkout field for consistency.

The Connections field supports templating via template:"true", but Checkout does not. If Git repository URLs or checkout parameters should support dynamic values (e.g., branch names or repository URLs from environment variables), the template tag should be added to maintain consistency with how Script and Artifacts fields are handled.

Note: The Env field also lacks the template tag, suggesting this may be intentional. If repository URLs should remain static for stability or security reasons, the current design is appropriate.

chart/crds/configs.flanksource.com_scrapeconfigs.yaml (4)

6-6: controller-gen version bump looks fine

The CRD annotation is updated to v0.19.0; no issues from the manifest perspective assuming the generator/tooling is aligned with this version in the build pipeline.


1985-2499: ClickHouse + Azure Blob additions are structurally consistent

The new/expanded clickhouse.azureBlobStorage and related ClickHouse fields (clickhouseURL, query, credentials with valueFrom, required collection) mirror the existing credential/valueFrom patterns used elsewhere in the CRD and keep query-based behavior for ClickHouse. I don't see structural or type inconsistencies here.


2508-3795: Exec scraper schema looks coherent; nice reuse of connection/env patterns

The new spec.exec[] block (artifacts, checkout, connections, env, script, transforms, etc.) matches existing schema conventions:

  • artifacts[*].path is correctly required.
  • checkout and nested connection/credential fields follow the established {name,value,valueFrom} pattern.
  • connections covers AWS/Azure/GCP/Kubernetes variants with the same shapes used elsewhere.
  • env[] and script are modelled as expected, with script correctly marked as required.

This all lines up with an “exec + optional git checkout + connection injection” model and should be easy to consume from clients.


3500-3502: Exec script semantics are clear; required flag is appropriate

script is documented as “an inline script to run” and is the only required field in the exec item, which matches the intended API: checkout/env/connections are optional helpers around a core script. This looks like the right balance between flexibility and validation.

Also applies to: 3793-3794

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
scrapers/exec/exec_git_integration_test.go (1)

42-66: Make config field assertions safer to avoid panics on shape changes

The direct type assertions on the nested config fields can panic if the Exec scraper ever changes the shape or type of the returned payload:

configData1 := config1["config"].(map[string]any)
...
configData2 := config2["config"].(map[string]any)

You can keep the test strict while avoiding panics by checking the assertion result:

-		configData1 := config1["config"].(map[string]any)
-		Expect(configData1["key"]).To(Equal("value1"))
-		Expect(configData1["source"]).To(Equal("git-checkout-test"))
+		configData1, ok := config1["config"].(map[string]any)
+		Expect(ok).To(BeTrue(), "config field should be a map[string]any")
+		Expect(configData1["key"]).To(Equal("value1"))
+		Expect(configData1["source"]).To(Equal("git-checkout-test"))
@@
-		configData2 := config2["config"].(map[string]any)
-		Expect(configData2["key"]).To(Equal("value2"))
-		Expect(configData2["source"]).To(Equal("git-checkout-test"))
+		configData2, ok := config2["config"].(map[string]any)
+		Expect(ok).To(BeTrue(), "config field should be a map[string]any")
+		Expect(configData2["key"]).To(Equal("value2"))
+		Expect(configData2["source"]).To(Equal("git-checkout-test"))

This keeps the intent of the assertions but makes the test failure mode clearer and less brittle to internal representation changes.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b2b2e06 and 0f29a7b.

📒 Files selected for processing (3)
  • fixtures/exec-azure-devops.yaml (1 hunks)
  • fixtures/exec-git-checkout-test.yaml (1 hunks)
  • scrapers/exec/exec_git_integration_test.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
scrapers/exec/exec_git_integration_test.go (4)
api/v1/exec.go (1)
  • Exec (9-26)
api/context.go (1)
  • NewScrapeContext (29-34)
scrapers/exec/exec_suite_test.go (1)
  • DefaultContext (22-22)
scrapers/exec/exec.go (1)
  • ExecScraper (13-13)
🪛 GitHub Actions: Test
scrapers/exec/exec_git_integration_test.go

[error] 34-34: Exec Scraper - Git Checkout Integration: should checkout git repository and execute script. Expected 2 config items from script output but got 1. failed to execute script: error during git clone: reference not found.

🔇 Additional comments (1)
fixtures/exec-git-checkout-test.yaml (1)

1-45: Fixture structure and script output look consistent

The ScrapeConfig wiring (type/id/name JSONPaths, inline script JSON payload, and git checkout settings) is coherent and matches the Exec scraper expectations; no issues spotted here.

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
.github/workflows/test.yml (2)

109-110: Consider pinning the PyYAML version for reproducibility.

Same concern as in the test job: installing pyyaml without a version constraint can lead to non-reproducible builds and supply chain risks.

Apply this diff to pin the PyYAML version:

-      - name: Install pyyaml for backstage test
-        run: pip3 install pyyaml
+      - name: Install pyyaml for backstage test
+        run: pip3 install pyyaml==6.0.2

28-29: Pin PyYAML version in both test jobs for reproducibility.

Installing pyyaml without a version constraint reduces build reproducibility and introduces supply chain risk. Pinning dependency versions is widely considered good practice and is crucial for stability and reproducibility, since unpinned builds can break when a new package version introduces changes.

Both the test job (lines 28-29) and test-prod job (lines 109-110) install pyyaml without pinning. Pin both to the same version:

- name: Install pyyaml for backstage test
  run: pip3 install pyyaml==6.0.2

Note: The e2e job correctly does not require this dependency, as backstage integration tests are only part of the main test suite, not the e2e tests.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a00e824 and 062df7a.

📒 Files selected for processing (2)
  • .github/workflows/test.yml (2 hunks)
  • fixtures/exec-backstage-catalog.yaml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • fixtures/exec-backstage-catalog.yaml
