
@moshloop (Member) commented Dec 14, 2025

Summary by CodeRabbit

  • New Features

    • Azure AD scraper with user, group, app registration, and login data collection via Microsoft Graph
    • GitHub Security scraper for vulnerability and alert scanning across repositories
    • OpenSSF Scorecard scraper for security assessment scoring with compliance mappings
    • GCP and Azure connection configuration support
    • Debug and trace mode flags for enhanced troubleshooting
  • Bug Fixes

    • Fixed typo in AWS Instance struct field name
    • Improved Azure DevOps pipeline parameter handling
  • Improvements

    • Enhanced JSON/YAML error handling with contextual logging
    • Added GitHub API rate-limit detection and pause handling
    • Better result formatting and display utilities


- Add GitHubSecurity and OpenSSFScorecard config types to ScraperSpec
- Create GitHubSecurityClient with methods for fetching alerts:
  * Dependabot alerts
  * Code scanning alerts
  * Secret scanning alerts
  * Security advisories
- Implement rate limit checking and backoff logic
- Create GithubSecurityScraper with repository scraping logic
- Add health status calculation based on alert severity
- Add repository properties for alert counts
- Fix duplicate Includes method in azure.go
- Generate deepcopy code for new types

Partial implementation - FIXME markers for:
- Full pagination support
- ConfigInsight creation
- Complete alert filtering
- OpenSSF scraper implementation
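
For illustration, the rate-limit checking and backoff described above boils down to a pause decision like the sketch below (the threshold of 100 remaining requests is the one noted in review; the function name and its wiring into GitHubSecurityClient are illustrative, not the actual code):

package github

import "time"

// rateLimitThreshold is the assumed cutoff below which the scraper pauses.
const rateLimitThreshold = 100

// shouldPauseForRateLimit sketches the backoff decision: if the remaining
// GitHub API budget is below the threshold, pause until the reported reset
// time. remaining and resetAt come from the rate-limit API response.
func shouldPauseForRateLimit(remaining int, resetAt time.Time) (bool, time.Duration) {
	if remaining >= rateLimitThreshold {
		return false, 0
	}
	wait := time.Until(resetAt)
	if wait < 0 {
		wait = 0
	}
	return true, wait
}
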
OpenSSF Scorecard Scraper:
- Add compliance framework mappings (SOC 2, NIST SSDF, CIS Controls)
- Implement ScorecardClient with retry logic and exponential backoff
- Parse scorecard API responses with all check results
- Calculate health status based on score thresholds and critical checks
- Create repository config items with badges, scores, and properties
- Support minScore filtering
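
A minimal sketch of the retry-with-exponential-backoff behaviour listed above, using only the standard library (the function name, retry count, and base delay are illustrative rather than taken from ScorecardClient):

package openssf

import (
	"fmt"
	"net/http"
	"time"
)

// fetchWithRetry retries transient failures (network errors and 5xx responses)
// with an exponentially growing delay: baseDelay, 2*baseDelay, 4*baseDelay, ...
func fetchWithRetry(client *http.Client, url string, maxRetries int, baseDelay time.Duration) (*http.Response, error) {
	var lastErr error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		resp, err := client.Get(url)
		if err == nil && resp.StatusCode < 500 {
			return resp, nil // success, or a non-retryable client error
		}
		if err != nil {
			lastErr = err
		} else {
			resp.Body.Close()
			lastErr = fmt.Errorf("server error: %s", resp.Status)
		}
		if attempt < maxRetries {
			time.Sleep(baseDelay << attempt) // back off before the next attempt
		}
	}
	return nil, fmt.Errorf("giving up after %d attempts: %w", maxRetries+1, lastErr)
}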

GitHub Security Scraper Updates:
- Fix compilation errors
- Remove unused models import
- Fix GetSecurityAdvisories API call
- Add createConfigInsights stub for future implementation

Scraper Registration:
- Register GithubSecurityScraper in All scrapers registry
- Register OpenSSFScorecardScraper in All scrapers registry
- Add openssf import to common.go

Both scrapers:
- Follow existing patterns from GithubActionsScraper
- Implement CanScrape() and Scrape() interface methods
- Create typed config items with health status
- Use ScrapeContext for logging and configuration
- Handle errors gracefully and continue with other repositories

FIXME markers remain for:
- Full pagination support for GitHub alerts
- Complete ConfigInsight creation for alerts
- Advanced filtering logic
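
Both scrapers plug into the registry through the same two-method interface. The skeleton below shows that shape with placeholder types standing in for the real api.ScrapeContext, v1.ScraperSpec, and v1.ScrapeResults (the actual signatures live in api/global.go), so it is a pattern sketch rather than project code:

package sketch

// Placeholder types standing in for the real api / api/v1 types.
type ScrapeContext struct{}
type ScraperSpec struct{ GitHubSecurity []struct{} }
type ScrapeResult struct{ Name, Health string }
type ScrapeResults []ScrapeResult

// ExampleScraper mirrors the pattern both new scrapers follow.
type ExampleScraper struct{}

// CanScrape reports whether the spec contains configuration this scraper
// understands, so the registry only runs it when relevant.
func (ExampleScraper) CanScrape(spec ScraperSpec) bool {
	return len(spec.GitHubSecurity) > 0
}

// Scrape walks the configured repositories, appends one typed config item per
// repository with a computed health status, and continues past per-repo
// failures instead of aborting the whole run.
func (ExampleScraper) Scrape(ctx ScrapeContext) ScrapeResults {
	var results ScrapeResults
	// for each configured repository:
	//   fetch alerts / scorecard, compute health, append a ScrapeResult;
	//   on error, record it and continue with the next repository.
	return results
}
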
GitHub Security Fixtures:
- dependabot-alerts.json: 2 sample alerts (critical lodash, high axios)
- code-scanning-alerts.json: 2 CodeQL alerts (SQL injection, path injection)
- secret-scanning-alerts.json: 2 secret alerts (GitHub PAT, AWS key)
- security-advisories.json: 1 published advisory (SQL injection)

OpenSSF Scorecard Fixtures:
- scorecard-high-score.json: Score 8.2/10 with good security practices
- scorecard-medium-score.json: Score 5.4/10 with some gaps
- scorecard-low-score.json: Score 2.8/10 with critical failures

All fixtures contain realistic data structures matching actual API responses,
for use in unit tests with mocked HTTP clients.

- Document GitHub Security fixtures structure and usage
- Document OpenSSF Scorecard fixtures and expected health status
- Include API endpoint references and compliance mapping info
- Add code examples for using fixtures in tests
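
The fixture usage boils down to loading the JSON and asserting on its fields without any network access. A hypothetical test along these lines (field access assumes the standard Dependabot REST payload shape; this is not the repository's actual test suite):

package github

import (
	"encoding/json"
	"os"
	"testing"
)

// TestDependabotFixture loads the Dependabot fixture and checks that the two
// documented alerts (critical lodash, high axios) are present.
func TestDependabotFixture(t *testing.T) {
	raw, err := os.ReadFile("testdata/security/dependabot-alerts.json")
	if err != nil {
		t.Fatalf("reading fixture: %v", err)
	}

	var alerts []map[string]any
	if err := json.Unmarshal(raw, &alerts); err != nil {
		t.Fatalf("unmarshalling fixture: %v", err)
	}
	if len(alerts) != 2 {
		t.Fatalf("expected 2 alerts, got %d", len(alerts))
	}

	var severities []string
	for _, a := range alerts {
		if adv, ok := a["security_advisory"].(map[string]any); ok {
			if s, ok := adv["severity"].(string); ok {
				severities = append(severities, s)
			}
		}
	}
	if len(severities) != 2 {
		t.Fatalf("expected a severity on every alert, got %v", severities)
	}
}
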
GitHub Security Scraper:
- Add LastAlertScrapeTime sync.Map to track last update timestamp per repo
- Store max alert updated_at timestamp after each scrape
- Use stored timestamp on subsequent runs for incremental fetching
- Add V(2) logging: repository start, completion with alert counts
- Add V(3) logging: incremental vs full scrape mode, total counts per type
- Add Debugf: individual alert type fetch counts (Dependabot, code scan, secrets)
- Add Tracef: individual alert details with numbers, titles, severities
- Add rate limit status logging at V(3) level
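
The incremental-fetch bookkeeping amounts to a process-wide map from repository to the newest updated_at seen so far; a sketch of the mechanism (names follow the description above, not necessarily the code verbatim):

package github

import (
	"sync"
	"time"
)

// LastAlertScrapeTime maps "owner/repo" to the newest alert updated_at seen
// in the previous run.
var LastAlertScrapeTime sync.Map

// sinceFor returns the cutoff for incremental fetching, or the zero time on
// the first scrape of a repository (meaning: fetch everything).
func sinceFor(repo string) time.Time {
	if v, ok := LastAlertScrapeTime.Load(repo); ok {
		if t, ok := v.(time.Time); ok {
			return t
		}
	}
	return time.Time{}
}

// recordLatest stores the max updated_at after a scrape so the next run only
// asks GitHub for alerts updated since then.
func recordLatest(repo string, updatedAt time.Time) {
	if updatedAt.After(sinceFor(repo)) {
		LastAlertScrapeTime.Store(repo, updatedAt)
	}
}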

OpenSSF Scorecard Scraper:
- Add LastScorecardScrapeTime sync.Map to track last scrape per repo
- Implement 24h cache TTL to skip redundant API calls
- Check cache before fetching and skip if within TTL window
- Add V(2) logging: repository start, scorecard completion with scores
- Add V(3) logging: cache expiration, retry attempts with backoff duration
- Add Debugf: overall score and check counts, non-retryable error details
- Add Tracef: API request URLs, individual check scores with compliance counts
- Store scrape time immediately after successful fetch

Performance Benefits:
- GitHub: Only fetch alerts updated since last scrape (reduces API calls)
- OpenSSF: Skip API calls entirely if data less than 24h old
- Both: Detailed logging enables troubleshooting and performance monitoring

Logging Levels:
- V(2): Important operations (scrape start/end, counts)
- V(3): Detailed operational info (cache, retries, incremental mode)
- Debugf: Per-operation results and error details
- Tracef: Individual item processing and API interactions

Changed OpenSSF scorecard scraper to follow the Azure Advisor pattern:
- Repository config item stores full API response JSON as config
- Each security check becomes a separate ConfigAnalysis (insight) record
- Insights are linked to parent repository via external ID

ConfigAnalysis Records:
- Analyzer: Check name (e.g., "Code-Review", "SAST")
- AnalysisType: models.AnalysisTypeSecurity
- Severity: Mapped from check score (0-3=critical, 4-6=high, 7-9=medium, 10=low)
- Status: "failing" (score <10) or "passing" (score 10)
- Summary: Check reason from API
- Messages: Individual detail lines from check
- Analysis JSON contains:
  * Check name, score, max_score, reason
  * Details array
  * Documentation links
  * Compliance mappings (SOC 2, NIST SSDF, CIS Controls)
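
The score-to-severity and status rules above map to a couple of small helpers; a sketch (the real mapping lives in scrapers/openssf/scorecard.go and may differ in detail):

package openssf

// checkSeverity maps a Scorecard check score to an analysis severity:
// 0-3 critical, 4-6 high, 7-9 medium, 10 low.
func checkSeverity(score int) string {
	switch {
	case score <= 3:
		return "critical"
	case score <= 6:
		return "high"
	case score <= 9:
		return "medium"
	default:
		return "low"
	}
}

// checkStatus marks anything short of a perfect score as failing.
func checkStatus(score int) string {
	if score < 10 {
		return "failing"
	}
	return "passing"
}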

Logging Improvements:
- Debug: Analysis record creation count
- Trace: Individual check details with compliance mapping counts
- Trace: Analysis creation with severity and status

Benefits:
- Each check visible as separate insight in UI
- Compliance framework mappings accessible via analysis JSON
- Historical tracking via FirstObserved/LastObserved
- Queryable by severity, status, analyzer
- Follows established AnalysisResult pattern from Azure Advisor

Azure Scraper Cleanup:
- Remove duplicate fetchAppRegistrations method (already in active_directory.go)
- Remove duplicate appToScrapeResult method (already in active_directory.go)
- Remove fetchDNS call (method doesn't exist)
- Fix fetchAppRegistrations call to use correct Entra.AppRegistrations field
- Remove unused msgraph imports: graphcore, applications, msgraphModels

These duplicates were causing compilation errors. The correct implementations
exist in active_directory.go with proper resource selector support.

coderabbitai bot commented Dec 14, 2025

Walkthrough

This pull request introduces three new scraper types (Azure AD, GitHub Security, OpenSSF Scorecard), extends the configuration API with new connection and filter structures, implements corresponding scraper logic with GitHub and OpenSSF API integrations, adds debug-level logging support, and includes test fixtures and supporting infrastructure updates.

Changes

Cohort / File(s) Change Summary
New Scraper Type Definitions
api/v1/azure_ad.go, api/v1/github_security.go, api/v1/openssf.go
Introduces AzureAD, GitHubSecurity, and OpenSSFScorecard configuration structs with embedded BaseScraper and associated filter/connection types; adds CELFilter alias, MsGraphFilter, GitHubSecurityFilters, and OpenSSFRepository supporting types with YAML/JSON tags.
Connection & Configuration Types
api/v1/common.go
Adds GCPConnection and AzureConnection types with authentication fields; introduces Pretty(), PrettyShort() methods on Script and Connection for styled text rendering; adds GetModel() method to GCPConnection.
Core Configuration Extensions
api/v1/types.go
Extends ScraperSpec struct to include GitHubSecurity and OpenSSFScorecard fields as first-class configurable items.
Result Types & Aggregation
api/v1/interface.go
Adds omitempty tags to AnalysisResult and ChangeResult fields; introduces FullScrapeResults type for aggregating multiple result facets; adds MergeScrapeResults() function; extends ScrapeResult with Debug(), PrettyRow(), IsMetadataOnly(), and Pretty() methods; adds pretty:"hide" tags to non-essential fields.
Debug Support
api/context.go, api/v1/scrapeconfig_types.go
Adds IsDebug() method to ScrapeContext; adds IsDebug() and IsTrace() methods to ScrapeConfig reading from Annotations with case-insensitive comparison.
Scraper Registry & Configuration
api/v1/zz_generated.deepcopy.go, scrapers/common.go
Generates DeepCopyInto and DeepCopy methods for all new types; registers GithubSecurityScraper and OpenSSFScorecardScraper in global scraper registry.
GitHub Security Scraper Implementation
scrapers/github/security.go, scrapers/github/security_client.go
Implements GithubSecurityScraper with Scrape and CanScrape methods; introduces GitHubSecurityClient wrapper for GitHub API interactions; adds methods to fetch Dependabot, code scanning, and secret scanning alerts; includes rate-limiting logic and alert aggregation.
OpenSSF Scorecard Scraper Implementation
scrapers/openssf/scorecard.go, scrapers/openssf/client.go, scrapers/openssf/compliance.go
Implements OpenSSFScorecardScraper with repository scorecard fetching; adds ScorecardClient with retry logic and API integration; introduces ComplianceMapping type and GetComplianceMappings() function for compliance framework mappings.
GitHub Security Test Fixtures
scrapers/github/testdata/security/*
Adds README.md documentation and four JSON fixture files (dependabot-alerts.json, code-scanning-alerts.json, secret-scanning-alerts.json, security-advisories.json) containing sample GitHub security alert data.
OpenSSF Scorecard Test Fixtures
scrapers/openssf/testdata/*
Adds README.md documentation and three JSON scorecard fixture files (scorecard-high-score.json, scorecard-medium-score.json, scorecard-low-score.json) with varying compliance check results.
Azure Scraper Updates
scrapers/azure/azure.go, scrapers/azure/devops/client.go
Replaces DNS-based flow with Graph API integration for App Registrations; changes condition checks from constants to string-based keys; updates Pipeline and Run TemplateParameters from map[string]string to map[string]any; modifies GetLabels and GetTags to stringify template parameter values.
AWS Scraper Updates
scrapers/aws/aws.go, scrapers/aws/types.go
Adds Session field of type *aws.Config to AWSContext; introduces empty Beanstalk method; increases ECS task description logging verbosity from V(2) to V(3); refactors DNS alias target handling to mark as Type "Alias" instead of comment.
Processing & Utilities
scrapers/processors/json.go, scrapers/processors/script.go, scrapers/sql/sql.go
Modularizes JSON extraction pipeline with parseProperties, extractItems, and transform helpers; enhances error messages and debug logging for JSON/YAML parsing and script transformation; adds IsDebug()-gated logging for SQL queries and responses using clicky.
Command & Configuration
cmd/run.go
Adds export and save boolean flags; integrates clicky Flags for unified flag handling; removes output-dir/DB connection requirement guard; branches export logic to write files or print aggregated results via clicky.
Dependency Management & Cleanup
go.mod, .gitignore
Substantial version updates and replacements across cloud SDKs (Google Cloud, Azure), OpenTelemetry, and flanksource packages; expands .gitignore to include database directories (.db/, db/), OS artifacts (.DS_Store), and build outputs (out/, bin/).

Sequence Diagram

sequenceDiagram
    participant User
    participant ScrapeContext
    participant GithubSecurityScraper
    participant GitHubSecurityClient
    participant GitHubAPI
    participant Results
    participant Database

    User->>ScrapeContext: Scrape(ctx)
    ScrapeContext->>GithubSecurityScraper: Scrape(ctx)
    
    GithubSecurityScraper->>GithubSecurityScraper: Iterate GitHubSecurity configs
    
    loop For each repository
        GithubSecurityScraper->>GitHubSecurityClient: NewGitHubSecurityClient()
        GitHubSecurityClient-->>GithubSecurityScraper: client
        
        GithubSecurityScraper->>GitHubSecurityClient: CheckRateLimit()
        GitHubSecurityClient->>GitHubAPI: Query rate limit status
        GitHubAPI-->>GitHubSecurityClient: RateLimits
        GitHubSecurityClient-->>GithubSecurityScraper: RateLimits
        
        alt Rate limit exceeded
            GithubSecurityScraper->>GithubSecurityScraper: ShouldPauseForRateLimit()
            Note over GithubSecurityScraper: Pause if needed
        end
        
        GithubSecurityScraper->>GitHubSecurityClient: GetDependabotAlerts()
        GitHubSecurityClient->>GitHubAPI: Fetch Dependabot alerts
        GitHubAPI-->>GitHubSecurityClient: DependabotAlerts
        GitHubSecurityClient-->>GithubSecurityScraper: Alerts
        
        GithubSecurityScraper->>GitHubSecurityClient: GetCodeScanningAlerts()
        GitHubSecurityClient->>GitHubAPI: Fetch code scanning alerts
        GitHubAPI-->>GitHubSecurityClient: CodeScanningAlerts
        GitHubSecurityClient-->>GithubSecurityScraper: Alerts
        
        GithubSecurityScraper->>GitHubSecurityClient: GetSecretScanningAlerts()
        GitHubSecurityClient->>GitHubAPI: Fetch secret scanning alerts
        GitHubAPI-->>GitHubSecurityClient: SecretScanningAlerts
        GitHubSecurityClient-->>GithubSecurityScraper: Alerts
        
        GithubSecurityScraper->>GithubSecurityScraper: Aggregate alert counts & health status
        GithubSecurityScraper->>Results: Append ScrapeResult
        Results-->>GithubSecurityScraper: Updated results
    end
    
    GithubSecurityScraper-->>ScrapeContext: ScrapeResults
    
    alt Save flag set
        ScrapeContext->>Database: Save results to DB
        Database-->>ScrapeContext: Success
    end
    
    ScrapeContext-->>User: Final results

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title check ❓ Inconclusive The title 'Pr/GitHub security' is vague and generic, using a branch name prefix ('Pr/') rather than describing the actual changes made in the pull request. Revise the title to clearly describe the main changes, such as 'Add GitHub Security scraper and OpenSSF Scorecard integration' or 'Implement GitHub security and OpenSSF scanners'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

coderabbitai bot left a comment

Actionable comments posted: 19

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.gitignore (1)

2-2: Fix typo: _DS_Store should be .DS_Store.

Line 2 contains _DS_Store (with underscore) instead of .DS_Store (with dot). This typo means macOS metadata files won't be ignored as intended. Line 30 correctly adds .DS_Store, making line 2 redundant once fixed.

Apply this diff:

-.vscode
-_DS_Store
+.vscode
+.DS_Store
🧹 Nitpick comments (8)
scrapers/aws/aws.go (1)

1994-1996: Add a TODO comment or implement the method.

The Beanstalk method has an empty body. If this is intentional for future implementation, add a TODO comment to document the intent. Otherwise, consider removing it until it's ready to be implemented.

 func (aws Scraper) Beanstalk(ctx *AWSContext, config v1.AWS, results *v1.ScrapeResults) {
-
+  // TODO: Implement Beanstalk scraper
 }
scrapers/azure/azure.go (1)

642-642: Consider using constants for include checks.

Multiple include checks use string literals ("virtualMachines", "resourceGroups", "storageAccounts", "appServices", "privateDns", "trafficManager", "securityGroups", "publicIps") instead of the defined constants at lines 44-60 (e.g., IncludeVirtualMachines, IncludeResourceGroups). Using constants improves type safety and reduces the risk of typos.

Example:

-	if !azure.config.Includes("virtualMachines") {
+	if !azure.config.Includes(IncludeVirtualMachines) {

Also applies to: 739-739, 812-812, 850-850, 898-898, 936-936, 973-973, 1012-1012

cmd/run.go (2)

119-124: Consider graceful error handling for output formatting.

clicky.MustPrint will panic if formatting fails. While acceptable for a CLI tool with unrecoverable errors, explicit error handling would provide clearer diagnostics.

Consider handling errors explicitly:

-		clicky.MustPrint(all.Configs)
-		clicky.MustPrint(all.Changes)
-		clicky.MustPrint(all.Analysis)
-		clicky.MustPrint(all.Relationships)
+		if err := clicky.Print(all.Configs); err != nil {
+			logger.Errorf("failed to print configs: %v", err)
+		}
+		if err := clicky.Print(all.Changes); err != nil {
+			logger.Errorf("failed to print changes: %v", err)
+		}
+		if err := clicky.Print(all.Analysis); err != nil {
+			logger.Errorf("failed to print analysis: %v", err)
+		}
+		if err := clicky.Print(all.Relationships); err != nil {
+			logger.Errorf("failed to print relationships: %v", err)
+		}

127-137: Consider graceful error handling for summary formatting.

Similar to the output formatting above, clicky.MustFormat (line 135) will panic on error. Consider explicit error handling for better diagnostics.

Apply this diff:

-		fmt.Println(clicky.MustFormat(summary))
+		formatted, err := clicky.Format(summary)
+		if err != nil {
+			logger.Errorf("failed to format summary: %v", err)
+		} else {
+			fmt.Println(formatted)
+		}
scrapers/openssf/scorecard.go (2)

21-22: Consider memory management for the global cache.

The LastScorecardScrapeTime sync.Map will grow unboundedly as repositories are scraped over time. Consider adding a cleanup mechanism or using a TTL-based cache to prevent memory leaks in long-running processes.


265-272: Consider using slices.Contains from standard library.

Go 1.21+ provides slices.Contains which is more idiomatic and efficient than a manual loop.

+import "slices"
+
 func containsString(slice []string, item string) bool {
-	for _, s := range slice {
-		if s == item {
-			return true
-		}
-	}
-	return false
+	return slices.Contains(slice, item)
 }
scrapers/github/security.go (1)

291-297: Stub function with FIXME comment.

The createConfigInsights function is unimplemented. Consider removing it if not needed in this PR, or tracking it as a follow-up issue.

Do you want me to open a new issue to track the implementation of createConfigInsights?

api/v1/interface.go (1)

106-106: Redundant omitempty on excluded JSON field.

The tag json:"-,omitempty" is redundant. The - already tells the JSON encoder to skip this field entirely, making ,omitempty unnecessary.

-	_map map[string]any `json:"-,omitempty"`
+	_map map[string]any `json:"-"`
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7952468 and 6922ccd.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (34)
  • .gitignore (1 hunks)
  • api/context.go (1 hunks)
  • api/v1/azure_ad.go (1 hunks)
  • api/v1/common.go (4 hunks)
  • api/v1/github_security.go (1 hunks)
  • api/v1/interface.go (6 hunks)
  • api/v1/openssf.go (1 hunks)
  • api/v1/scrapeconfig_types.go (2 hunks)
  • api/v1/types.go (1 hunks)
  • api/v1/zz_generated.deepcopy.go (8 hunks)
  • cmd/run.go (4 hunks)
  • go.mod (11 hunks)
  • scrapers/aws/aws.go (4 hunks)
  • scrapers/aws/types.go (1 hunks)
  • scrapers/azure/azure.go (10 hunks)
  • scrapers/azure/devops/client.go (3 hunks)
  • scrapers/common.go (2 hunks)
  • scrapers/github/security.go (1 hunks)
  • scrapers/github/security_client.go (1 hunks)
  • scrapers/github/testdata/security/README.md (1 hunks)
  • scrapers/github/testdata/security/code-scanning-alerts.json (1 hunks)
  • scrapers/github/testdata/security/dependabot-alerts.json (1 hunks)
  • scrapers/github/testdata/security/secret-scanning-alerts.json (1 hunks)
  • scrapers/github/testdata/security/security-advisories.json (1 hunks)
  • scrapers/openssf/client.go (1 hunks)
  • scrapers/openssf/compliance.go (1 hunks)
  • scrapers/openssf/scorecard.go (1 hunks)
  • scrapers/openssf/testdata/README.md (1 hunks)
  • scrapers/openssf/testdata/scorecard-high-score.json (1 hunks)
  • scrapers/openssf/testdata/scorecard-low-score.json (1 hunks)
  • scrapers/openssf/testdata/scorecard-medium-score.json (1 hunks)
  • scrapers/processors/json.go (6 hunks)
  • scrapers/processors/script.go (2 hunks)
  • scrapers/sql/sql.go (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (14)
api/v1/openssf.go (1)
api/v1/common.go (1)
  • BaseScraper (271-286)
scrapers/common.go (3)
scrapers/github/security.go (1)
  • GithubSecurityScraper (23-23)
api/global.go (1)
  • Scraper (20-23)
scrapers/openssf/scorecard.go (1)
  • OpenSSFScorecardScraper (25-25)
scrapers/azure/devops/client.go (1)
api/v1/interface.go (1)
  • ChangeResult (79-107)
api/v1/azure_ad.go (1)
api/v1/common.go (2)
  • BaseScraper (271-286)
  • AzureConnection (408-415)
api/v1/scrapeconfig_types.go (2)
api/v1/github_security.go (1)
  • GitHubSecurity (12-27)
api/v1/openssf.go (1)
  • OpenSSFScorecard (4-12)
scrapers/processors/json.go (6)
api/v1/interface.go (2)
  • ScrapeResult (652-723)
  • ChangeResult (79-107)
api/v1/json_types.go (1)
  • JSON (16-16)
api/v1/data.go (2)
  • Properties (26-26)
  • Property (29-39)
api/context.go (1)
  • ScrapeContext (16-27)
api/v1/common.go (3)
  • Template (471-477)
  • BaseScraper (271-286)
  • Script (33-38)
scrapers/processors/script.go (1)
  • RunScript (11-28)
cmd/run.go (2)
api/v1/interface.go (1)
  • MergeScrapeResults (612-648)
db/update.go (1)
  • SaveResults (465-467)
api/v1/github_security.go (1)
api/v1/common.go (1)
  • BaseScraper (271-286)
scrapers/openssf/client.go (1)
api/context.go (1)
  • ScrapeContext (16-27)
scrapers/github/security_client.go (4)
api/context.go (1)
  • ScrapeContext (16-27)
api/v1/github_security.go (1)
  • GitHubSecurity (12-27)
api/global.go (1)
  • Namespace (13-13)
scrapers/trivy/models.go (1)
  • Severity (32-32)
api/v1/zz_generated.deepcopy.go (4)
api/v1/azure_ad.go (7)
  • AzureAD (3-10)
  • AzureAppRegistrations (32-34)
  • MsGraphFilter (14-18)
  • AzureGroups (28-30)
  • AzureLogins (20-22)
  • AzureUsers (24-26)
  • CELFilter (12-12)
api/v1/common.go (2)
  • BaseScraper (271-286)
  • AzureConnection (408-415)
api/v1/github_security.go (3)
  • GitHubSecurity (12-27)
  • GitHubSecurityRepository (30-33)
  • GitHubSecurityFilters (36-45)
api/v1/openssf.go (2)
  • OpenSSFRepository (15-18)
  • OpenSSFScorecard (4-12)
api/v1/types.go (4)
api/v1/github.go (1)
  • GitHubActions (7-27)
api/v1/github_security.go (1)
  • GitHubSecurity (12-27)
api/v1/openssf.go (1)
  • OpenSSFScorecard (4-12)
api/v1/azure.go (1)
  • Azure (25-35)
scrapers/azure/azure.go (2)
api/v1/azure.go (1)
  • Entra (17-23)
api/global.go (1)
  • Scraper (20-23)
api/v1/interface.go (4)
api/v1/types.go (1)
  • ExternalID (165-175)
api/v1/const.go (1)
  • ChangeAction (3-3)
db/models/config_change.go (1)
  • ConfigChange (21-42)
api/v1/json_types.go (1)
  • JSON (16-16)
🪛 GitHub Actions: PR #1783
go.mod

[error] Dependency replacement points to '../duty' but the file /home/runner/work/config-db/duty/go.mod cannot be found.


[error] go mod tidy failed due to missing or incompatible module replacements; updates to go.mod are needed.

🔇 Additional comments (44)
scrapers/processors/script.go (1)

3-9: LGTM: Logger import added for debug logging.

The logger import is correctly added to support the new debug logging functionality.

scrapers/aws/aws.go (2)

59-60: LGTM!

The Session field exposes the AWS configuration to scraper methods, which is properly initialized and used throughout the codebase.


494-494: LGTM!

Increasing the log verbosity level from V(2) to V(3) reduces log noise for ECS task descriptions.

scrapers/github/testdata/security/security-advisories.json (1)

1-67: LGTM!

The test fixture provides comprehensive coverage of the GitHub Security Advisory API response format, including all relevant fields for security analysis (GHSA/CVE identifiers, severity, CVSS scores, CWE mappings, vulnerability details, and credits).

scrapers/azure/devops/client.go (2)

42-56: LGTM!

The change from map[string]string to map[string]any for TemplateParameters provides more flexibility for template parameter types. The updated GetLabels() method correctly stringifies all parameter values using fmt.Sprintf("%v", v), ensuring proper conversion regardless of the underlying type.


88-109: LGTM!

Consistent with the Pipeline changes, Run.TemplateParameters now uses map[string]any and GetTags() properly stringifies values. The implementation correctly handles type conversion and maintains backward compatibility with the existing Variables handling.

scrapers/openssf/client.go (1)

73-104: LGTM!

The retry logic with exponential backoff is well-implemented, and the method properly handles context, logging, and error propagation. The approach of retrying only on retryable errors (server errors, timeouts, connection issues) is appropriate for API clients.

api/v1/common.go (3)

69-95: LGTM!

The new Pretty() and PrettyShort() methods provide rich formatting with syntax highlighting for different script types. The use of lo.Ellipsis() in PrettyShort() to truncate long scripts (200 chars) is appropriate for preview contexts.


395-415: LGTM!

The new GCPConnection and AzureConnection types provide structured configuration for cloud provider credentials. The GetModel() implementation correctly maps GCP credentials to the models.Connection schema.


445-452: LGTM!

The Connection.Pretty() method appropriately calls RedactSecretValues() before rendering connection strings, preventing credential leakage in logs or UI display.

scrapers/aws/types.go (1)

563-563: No remaining references to old field name detected.

The field rename from TrsutedAdvisorChecks to TrustedAdvisorChecks is complete. A search of the codebase found no lingering references to the old field name, confirming that all usages have been properly updated.

api/v1/azure_ad.go (1)

1-34: LGTM!

Clean type definitions for the Azure AD scraper configuration. The struct embedding and YAML/JSON tags follow established patterns in the codebase.

scrapers/sql/sql.go (1)

61-74: LGTM with dependency!

The debug logging implementation using clicky is well-structured and properly guarded. However, ensure the nil guard fix for ctx.IsDebug() in api/context.go is applied to prevent potential panics.

scrapers/github/testdata/security/README.md (1)

1-105: LGTM!

Comprehensive documentation for the GitHub security test fixtures. The structure is clear and includes helpful usage examples.

scrapers/common.go (1)

10-10: LGTM!

Clean registration of the new GitHub Security and OpenSSF Scorecard scrapers in the global registry.

Also applies to: 36-36, 40-40

scrapers/openssf/testdata/README.md (1)

1-160: LGTM!

Excellent documentation for the OpenSSF Scorecard test fixtures. The health status expectations table and compliance framework mappings are particularly helpful for understanding test scenarios.

scrapers/openssf/testdata/scorecard-low-score.json (1)

1-170: LGTM!

Well-structured test fixture for the low-score OpenSSF Scorecard scenario. The data aligns with the expectations documented in the README.

cmd/run.go (1)

178-179: Verify the default behavior change is intentional.

The export flag now defaults to true, which means scraped data will always be printed or exported even without --output-dir. This changes the previous behavior where either an output directory or database connection was required. Ensure this aligns with the intended UX.

scrapers/openssf/testdata/scorecard-medium-score.json (1)

1-145: LGTM! Well-structured medium-score test fixture.

The test data provides realistic coverage for a medium-scoring repository (5.4/10) with a mix of passing and failing checks, suitable for testing scorecard parsing and filtering logic.

api/v1/scrapeconfig_types.go (1)

104-109: LGTM! Type dispatch correctly recognizes new scrapers.

The new type checks for GitHubSecurity and OpenSSFScorecard are properly integrated into the dispatch logic and return the expected type identifiers.

scrapers/openssf/testdata/scorecard-high-score.json (1)

1-247: LGTM! Comprehensive high-score test fixture.

This fixture provides excellent coverage with 19 checks and an overall score of 8.2/10, suitable for testing high-scoring repository scenarios. The additional checks (CI-Tests, CII-Best-Practices, Contributors, Fuzzing, Packaging, Signed-Releases, Webhooks) compared to the medium-score fixture provide more comprehensive test coverage.

scrapers/github/testdata/security/secret-scanning-alerts.json (1)

1-42: LGTM! Well-structured secret scanning alert fixtures.

The test data provides two realistic secret scanning scenarios with different secret types (GitHub PAT and AWS key) and validity states, suitable for testing alert parsing and categorization.

scrapers/github/testdata/security/code-scanning-alerts.json (1)

1-98: LGTM! Detailed code scanning alert fixtures.

The test data includes comprehensive CodeQL alert structures with location details, security classifications, and CWE mappings for both high and medium severity issues, providing good coverage for testing alert processing.

api/v1/openssf.go (1)

1-18: LGTM! Clean OpenSSF scraper type definitions.

The OpenSSFScorecard and OpenSSFRepository types are well-structured with:

  • Proper BaseScraper embedding for consistent scraper behavior
  • Required Repositories field for specifying targets
  • Optional MinScore pointer for filtering (nil = no filter)
  • Consistent JSON/YAML serialization tags
scrapers/github/testdata/security/dependabot-alerts.json (1)

1-136: LGTM! Comprehensive Dependabot alert fixtures.

The test data includes detailed security advisory information with CVSS scores, CWE mappings, vulnerability ranges, and patch versions for both critical and high severity vulnerabilities, providing thorough coverage for testing Dependabot alert parsing.

scrapers/openssf/compliance.go (3)

1-8: LGTM! Well-structured compliance mapping type.

The ComplianceMapping struct is cleanly defined with appropriate JSON tags and omitempty for optional fields.


10-108: LGTM! Comprehensive compliance mappings.

The mappings cover all standard OpenSSF Scorecard checks with appropriate SOC2, NIST SSDF, and CIS control references. The comment referencing the DataSheet source is helpful for traceability.


110-116: LGTM! Safe lookup with zero-value fallback.

The function correctly returns an empty ComplianceMapping when the check name is not found, avoiding nil pointer issues.

scrapers/processors/json.go (5)

440-453: LGTM! Clean refactoring of extraction flow.

The extraction logic is now properly decomposed into transform → extractItems → property parsing → attribute extraction. The debug output on error (lines 450-452) is helpful for troubleshooting.


475-511: LGTM! Property parsing extracted into helper.

The parseProperties method cleanly handles property filtering, link cloning, and templating with appropriate error handling via result.Errorf.


513-530: LGTM! Item extraction properly handles recursion.

The extractItems method correctly recurses with WithoutItems() to prevent infinite loops while processing nested items.


532-557: LGTM! Transform with enhanced debug output.

The transform method provides helpful debug output including the script and input when errors occur, which will aid troubleshooting.


651-660: LGTM! Improved error handling for missing IDs.

The enhanced logic now allows standalone changes with external_change_id while still requiring IDs for regular configs. The debug output using input.Debug() provides better context.

scrapers/openssf/scorecard.go (2)

27-72: LGTM! Well-structured scraper implementation.

The Scrape method properly handles:

  • Iterating over configured repositories
  • TTL-based skip logic to avoid redundant API calls
  • MinScore filtering
  • Error handling with continuation for other repos
  • Comprehensive logging at appropriate verbosity levels

173-204: LGTM! Health calculation with critical check awareness.

The health status logic appropriately considers both the overall score and critical security checks (Code-Review, SAST, etc.), providing meaningful health states.

scrapers/github/security.go (2)

29-67: LGTM! Well-structured scraper with rate limit handling.

The scraper properly:

  • Handles rate limits proactively before making requests
  • Continues processing other repos on individual failures
  • Logs progress at appropriate verbosity levels

220-254: LGTM! Health status calculation with reasonable thresholds.

The health determination logic is clear:

  • Critical alerts → Unhealthy
  • 5+ high alerts → Unhealthy
  • Any high alerts → Warning
  • 10+ medium alerts → Warning
scrapers/github/security_client.go (3)

23-48: LGTM! Proper token resolution with fallback.

The client constructor correctly:

  1. Tries to hydrate connection first
  2. Falls back to environment variable
  3. Creates an authenticated OAuth2 client

144-164: LGTM! Proactive rate limit handling.

The threshold-based approach (100 remaining) with logging provides good protection against hitting rate limits while being transparent about pauses.


59-133: LGTM! Consistent error handling across alert fetchers.

All four alert fetching methods follow a consistent pattern with proper error wrapping and return values.

api/v1/zz_generated.deepcopy.go (1)

1-2160: Auto-generated deepcopy implementations look correct.

This file is generated by controller-gen (line 19) and the new DeepCopy/DeepCopyInto implementations for the added types (AzureAD, GitHubSecurity, OpenSSFScorecard, etc.) follow the standard patterns:

  • Proper nil checks before slice/pointer allocations
  • Correct use of make() for slice copying
  • Appropriate DeepCopyInto calls for nested structs with reference types

The implementations align with the source struct definitions provided in the relevant code snippets.

api/v1/interface.go (3)

598-648: New aggregation type and merge function look reasonable.

The FullScrapeResults struct properly aggregates multiple result facets, and MergeScrapeResults correctly:

  • Skips error-bearing results (lines 616-618)
  • Converts AnalysisResult to ConfigAnalysis model (lines 620-622)
  • Excludes metadata-only configs (lines 635-637)
  • Aggregates external entities (lines 639-644)

725-792: Debug method provides comprehensive output.

The Debug() method thoroughly covers all relevant fields of ScrapeResult and handles different config types (string, map[string]any, map[string]string) appropriately for display.


652-723: ScrapeResult display annotations are well-structured.

The pretty:"hide" tags appropriately hide internal/sensitive fields (DeletedAt, DeleteReason, ConfigClass, Aliases, Source, Config, Ready) from pretty-print output while keeping user-relevant fields visible.

Comment on lines +165 to +167
func (ctx ScrapeContext) IsDebug() bool {
	return ctx.scrapeConfig.Spec.IsDebug()
}

⚠️ Potential issue | 🔴 Critical

Add nil guard to prevent panic.

Both IsDebug() and IsTrace() (line 162) access ctx.scrapeConfig.Spec without checking if ctx.scrapeConfig is nil. Since ScrapeConfig() (line 140) can return nil, calling these methods will panic if the scrape config hasn't been set.

Apply this diff to add nil guards:

 func (ctx ScrapeContext) IsTrace() bool {
+	if ctx.scrapeConfig == nil {
+		return false
+	}
 	return ctx.scrapeConfig.Spec.IsTrace()
 }

 func (ctx ScrapeContext) IsDebug() bool {
+	if ctx.scrapeConfig == nil {
+		return false
+	}
 	return ctx.scrapeConfig.Spec.IsDebug()
 }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In api/context.go around lines 162–167, both IsTrace() and IsDebug() dereference
ctx.scrapeConfig.Spec without guarding for a nil scrapeConfig (ScrapeConfig()
can return nil) which can cause a panic; update both methods to first check if
ctx.scrapeConfig == nil or ctx.scrapeConfig.Spec == nil and return false in that
case, otherwise call and return ctx.scrapeConfig.Spec.IsTrace() / IsDebug().

Comment on lines +47 to +53
// ParseMaxAge converts the MaxAge string to a time.Duration
func (f GitHubSecurityFilters) ParseMaxAge() (time.Duration, error) {
	if f.MaxAge == "" {
		return 0, nil
	}
	return time.ParseDuration(f.MaxAge)
}

⚠️ Potential issue | 🟠 Major

Fix duration parsing to support documented format.

The ParseMaxAge() method uses time.ParseDuration(), which doesn't support the "d" (days) suffix mentioned in the comment on line 43. time.ParseDuration() only accepts units like "ns", "us", "ms", "s", "m", "h". Specifying "90d" will cause a parse error.

Consider implementing custom parsing or updating the documentation to use hours (e.g., "2160h" for 90 days):

 // ParseMaxAge converts the MaxAge string to a time.Duration
 func (f GitHubSecurityFilters) ParseMaxAge() (time.Duration, error) {
 	if f.MaxAge == "" {
 		return 0, nil
 	}
-	return time.ParseDuration(f.MaxAge)
+	// Support days suffix (e.g., "90d")
+	if strings.HasSuffix(f.MaxAge, "d") {
+		days, err := strconv.Atoi(strings.TrimSuffix(f.MaxAge, "d"))
+		if err != nil {
+			return 0, fmt.Errorf("invalid duration format: %w", err)
+		}
+		return time.Duration(days) * 24 * time.Hour, nil
+	}
+	return time.ParseDuration(f.MaxAge)
 }

Comment on lines +624 to +633
for _, change := range r.Changes {
	configChange := models.ConfigChange{
		ChangeType: change.ChangeType,
		Severity:   models.Severity(change.Severity),
		Source:     change.Source,
		Summary:    change.Summary,
		CreatedAt:  change.CreatedAt,
	}
	full.Changes = append(full.Changes, configChange)
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Find ConfigChange struct definition
ast-grep --pattern $'type ConfigChange struct {
  $$$
  CreatedAt $_
  $$$
}'

Repository: flanksource/config-db

Length of output: 47


🏁 Script executed:

# Also search with rg to find ConfigChange definition
rg "type ConfigChange struct" -A 20

Repository: flanksource/config-db

Length of output: 47


🏁 Script executed:

# Find the Changes field type in the request struct around line 96
rg "type.*Request.*struct" -B 5 -A 30 | head -100

Repository: flanksource/config-db

Length of output: 814


🏁 Script executed:

# Find all Go files in db/models directory
fd "\.go$" db/models/ | head -20

Repository: flanksource/config-db

Length of output: 149


🏁 Script executed:

# Search for ConfigChange struct in the entire repo
rg "type ConfigChange" -A 15

Repository: flanksource/config-db

Length of output: 47


🏁 Script executed:

# Search for the request type with Changes field
rg "Changes.*\[\]" api/v1/ -B 5 -A 2

Repository: flanksource/config-db

Length of output: 2165


🏁 Script executed:

# Read the ConfigChange struct from db/models/config_change.go
cat db/models/config_change.go

Repository: flanksource/config-db

Length of output: 3098


🏁 Script executed:

# Find ChangeResult struct definition
rg "type ChangeResult struct" -A 20

Repository: flanksource/config-db

Length of output: 1571


Fix type mismatch for CreatedAt field assignment.

Line 630 assigns change.CreatedAt (type *time.Time) to configChange.CreatedAt (type time.Time). This is a type mismatch and will cause a compilation error. Use lo.FromPtr(change.CreatedAt) or dereference the pointer appropriately, similar to how it's done in NewConfigChangeFromV1 in db/models/config_change.go.

🤖 Prompt for AI Agents
In api/v1/interface.go around lines 624 to 633, the assignment
configChange.CreatedAt = change.CreatedAt is a type mismatch because
change.CreatedAt is *time.Time while configChange.CreatedAt is time.Time;
replace the direct assignment by converting the pointer to a value (e.g.,
configChange.CreatedAt = lo.FromPtr(change.CreatedAt) or dereference with a
nil-check like if change.CreatedAt != nil { configChange.CreatedAt =
*change.CreatedAt } ) following the pattern used in NewConfigChangeFromV1 in
db/models/config_change.go.

Comment on lines +794 to +810
func (s ScrapeResult) PrettyRow(options interface{}) map[string]api.Text {
	row := make(map[string]api.Text)
	row["ID"] = clicky.Text(s.ID)
	row["Name"] = clicky.Text(s.Name)
	row["Type"] = clicky.Text(s.Type)
	row["Health"] = clicky.Text(string(s.Health))
	row["Status"] = clicky.Text(s.Status)
	if s.Error != nil {
		row["Error"] = clicky.Text(s.Error.Error())
	} else {
		row["Error"] = clicky.Text("")
	}
	clicky.MustPrint([]map[string]api.Text{row}, clicky.FormatOptions{JSON: true})
	clicky.MustPrint([]map[string]api.Text{row}, clicky.FormatOptions{Pretty: true})

	return row
}

⚠️ Potential issue | 🔴 Critical

Remove debug print statements from PrettyRow method.

Lines 806-807 contain MustPrint calls that print directly to stdout. This appears to be leftover debug code that should not be in a method designed to return a row representation.

Apply this diff to remove the debug statements:

 func (s ScrapeResult) PrettyRow(options interface{}) map[string]api.Text {
 	row := make(map[string]api.Text)
 	row["ID"] = clicky.Text(s.ID)
 	row["Name"] = clicky.Text(s.Name)
 	row["Type"] = clicky.Text(s.Type)
 	row["Health"] = clicky.Text(string(s.Health))
 	row["Status"] = clicky.Text(s.Status)
 	if s.Error != nil {
 		row["Error"] = clicky.Text(s.Error.Error())
 	} else {
 		row["Error"] = clicky.Text("")
 	}
-	clicky.MustPrint([]map[string]api.Text{row}, clicky.FormatOptions{JSON: true})
-	clicky.MustPrint([]map[string]api.Text{row}, clicky.FormatOptions{Pretty: true})
 
 	return row
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (s ScrapeResult) PrettyRow(options interface{}) map[string]api.Text {
row := make(map[string]api.Text)
row["ID"] = clicky.Text(s.ID)
row["Name"] = clicky.Text(s.Name)
row["Type"] = clicky.Text(s.Type)
row["Health"] = clicky.Text(string(s.Health))
row["Status"] = clicky.Text(s.Status)
if s.Error != nil {
row["Error"] = clicky.Text(s.Error.Error())
} else {
row["Error"] = clicky.Text("")
}
clicky.MustPrint([]map[string]api.Text{row}, clicky.FormatOptions{JSON: true})
clicky.MustPrint([]map[string]api.Text{row}, clicky.FormatOptions{Pretty: true})
return row
}
func (s ScrapeResult) PrettyRow(options interface{}) map[string]api.Text {
row := make(map[string]api.Text)
row["ID"] = clicky.Text(s.ID)
row["Name"] = clicky.Text(s.Name)
row["Type"] = clicky.Text(s.Type)
row["Health"] = clicky.Text(string(s.Health))
row["Status"] = clicky.Text(s.Status)
if s.Error != nil {
row["Error"] = clicky.Text(s.Error.Error())
} else {
row["Error"] = clicky.Text("")
}
return row
}
🤖 Prompt for AI Agents
In api/v1/interface.go around lines 794 to 810 the PrettyRow method contains
leftover debug print statements (clicky.MustPrint calls on lines ~806-807) that
write to stdout; remove those two clicky.MustPrint(...) calls so the method only
builds and returns the row map without printing, leaving the rest of the
function (row construction and return) unchanged.

Comment on lines +816 to +820
func (s ScrapeResult) Pretty() api.Text {
	t := clicky.Text("")

	return t
}

⚠️ Potential issue | 🟡 Minor

Empty Pretty() implementation returns no useful output.

The Pretty() method creates an empty text and returns it immediately without any content. This appears to be a placeholder that was never implemented.

Either implement the method similar to Debug() or remove it if not needed. Would you like me to help generate an implementation?

🤖 Prompt for AI Agents
In api/v1/interface.go around lines 816 to 820, the ScrapeResult.Pretty() method
currently creates and returns an empty clicky.Text, providing no readable
output; implement Pretty() to produce a human-friendly textual representation
(similar to Debug()) by populating the clicky.Text with the key/high-level
fields of ScrapeResult (e.g., status, URL, timestamp, summary of data or errors)
formatted for readability, or if a pretty form is unnecessary remove the method
and its usages; ensure the implementation returns the constructed clicky.Text
populated with the chosen fields.

Comment on lines +155 to +168
func contains(s, substr string) bool {
	return len(s) >= len(substr) && (s == substr || len(s) > len(substr) &&
		(s[:len(substr)] == substr || s[len(s)-len(substr):] == substr ||
			findSubstring(s, substr)))
}

func findSubstring(s, substr string) bool {
	for i := 0; i <= len(s)-len(substr); i++ {
		if s[i:i+len(substr)] == substr {
			return true
		}
	}
	return false
}

🛠️ Refactor suggestion | 🟠 Major

Replace custom string utilities with standard library.

The contains() and findSubstring() functions reimplement functionality already available in the standard library. Use strings.Contains() for simpler, more maintainable code.

Apply this diff:

+import (
+	"strings"
+	// ... other imports
+)

 func (c *ScorecardClient) isRetryable(err error) bool {
 	if err == nil {
 		return false
 	}
 
 	errStr := err.Error()
-	return contains(errStr, "server error") ||
-	       contains(errStr, "timeout") ||
-	       contains(errStr, "connection")
+	return strings.Contains(errStr, "server error") ||
+	       strings.Contains(errStr, "timeout") ||
+	       strings.Contains(errStr, "connection")
 }
-
-func contains(s, substr string) bool {
-	return len(s) >= len(substr) && (s == substr || len(s) > len(substr) &&
-		(s[:len(substr)] == substr || s[len(s)-len(substr):] == substr ||
-		 findSubstring(s, substr)))
-}
-
-func findSubstring(s, substr string) bool {
-	for i := 0; i <= len(s)-len(substr); i++ {
-		if s[i:i+len(substr)] == substr {
-			return true
-		}
-	}
-	return false
-}

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In scrapers/openssf/client.go around lines 155 to 168, the custom contains() and
findSubstring() functions reimplement standard library behavior; replace all
uses of these helpers with strings.Contains from the standard library and remove
the two functions; also add or ensure "strings" is imported in the file and
remove the now-unused function definitions to keep the code clean.

Comment on lines +74 to +84
func shouldSkipScorecardFetch(ctx api.ScrapeContext, repoFullName string) (bool, string) {
	if lastScrape, ok := LastScorecardScrapeTime.Load(repoFullName); ok {
		lastTime := lastScrape.(time.Time)
		timeSinceLastScrape := time.Since(lastTime)
		if timeSinceLastScrape < ScorecardCacheTTL {
			return true, fmt.Sprintf("last scraped %v ago (cache TTL: %v)", timeSinceLastScrape, ScorecardCacheTTL)
		}
		ctx.Logger.V(3).Infof("cache expired for %s (last scraped %v ago)", repoFullName, timeSinceLastScrape)
	}
	return false, ""
}

⚠️ Potential issue | 🟡 Minor

Unsafe type assertion could panic.

The type assertion lastScrape.(time.Time) on line 76 will panic if the stored value is not a time.Time. While this is unlikely given the controlled usage, adding a check makes the code more robust.

Apply this diff for safer type assertion:

 func shouldSkipScorecardFetch(ctx api.ScrapeContext, repoFullName string) (bool, string) {
 	if lastScrape, ok := LastScorecardScrapeTime.Load(repoFullName); ok {
-		lastTime := lastScrape.(time.Time)
+		lastTime, ok := lastScrape.(time.Time)
+		if !ok {
+			return false, ""
+		}
 		timeSinceLastScrape := time.Since(lastTime)
 		if timeSinceLastScrape < ScorecardCacheTTL {
 			return true, fmt.Sprintf("last scraped %v ago (cache TTL: %v)", timeSinceLastScrape, ScorecardCacheTTL)
 		}
 		ctx.Logger.V(3).Infof("cache expired for %s (last scraped %v ago)", repoFullName, timeSinceLastScrape)
 	}
 	return false, ""
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func shouldSkipScorecardFetch(ctx api.ScrapeContext, repoFullName string) (bool, string) {
if lastScrape, ok := LastScorecardScrapeTime.Load(repoFullName); ok {
lastTime := lastScrape.(time.Time)
timeSinceLastScrape := time.Since(lastTime)
if timeSinceLastScrape < ScorecardCacheTTL {
return true, fmt.Sprintf("last scraped %v ago (cache TTL: %v)", timeSinceLastScrape, ScorecardCacheTTL)
}
ctx.Logger.V(3).Infof("cache expired for %s (last scraped %v ago)", repoFullName, timeSinceLastScrape)
}
return false, ""
}
func shouldSkipScorecardFetch(ctx api.ScrapeContext, repoFullName string) (bool, string) {
if lastScrape, ok := LastScorecardScrapeTime.Load(repoFullName); ok {
lastTime, ok := lastScrape.(time.Time)
if !ok {
return false, ""
}
timeSinceLastScrape := time.Since(lastTime)
if timeSinceLastScrape < ScorecardCacheTTL {
return true, fmt.Sprintf("last scraped %v ago (cache TTL: %v)", timeSinceLastScrape, ScorecardCacheTTL)
}
ctx.Logger.V(3).Infof("cache expired for %s (last scraped %v ago)", repoFullName, timeSinceLastScrape)
}
return false, ""
}
🤖 Prompt for AI Agents
In scrapers/openssf/scorecard.go around lines 74 to 84, the code uses an unsafe
type assertion lastScrape.(time.Time) which can panic if the stored value is not
a time.Time; replace it with a safe comma-ok type assertion (e.g., lastTime, ok
:= lastScrape.(time.Time)) and handle the false case by logging a warning
(ctx.Logger.V(2) or similar), removing or ignoring the invalid cache entry, and
proceeding as if there was no last scrape (return false, "") so the function
never panics on bad data.

Comment on lines +274 to +276
func scorecardCheckNameToKebab(name string) string {
	return strings.ToLower(strings.ReplaceAll(name, "-", "-"))
}

⚠️ Potential issue | 🟡 Minor

No-op function: scorecardCheckNameToKebab does nothing meaningful.

This function replaces "-" with "-", which has no effect. If the intent is to convert to kebab-case, the logic needs to be corrected. If this function is not needed, consider removing it.

If the intent is to normalize check names (e.g., spaces or underscores to hyphens):

 func scorecardCheckNameToKebab(name string) string {
-	return strings.ToLower(strings.ReplaceAll(name, "-", "-"))
+	return strings.ToLower(strings.ReplaceAll(name, " ", "-"))
 }

Alternatively, if this function is unused, remove it entirely.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func scorecardCheckNameToKebab(name string) string {
return strings.ToLower(strings.ReplaceAll(name, "-", "-"))
}
func scorecardCheckNameToKebab(name string) string {
return strings.ToLower(strings.ReplaceAll(name, " ", "-"))
}

Comment on lines +559 to 566
func (e Extract) postProcess(ctx api.ScrapeContext, results v1.ScrapeResults) (v1.ScrapeResults, error) {

	if !e.Config.Transform.Masks.IsEmpty() {
		results, err := e.applyMask(results)
		if err != nil {
			return results, fmt.Errorf("e.applyMask(); %w", err)
		}
	}

⚠️ Potential issue | 🟠 Major

Bug: applyMask result is shadowed and discarded.

The results variable from applyMask shadows the function parameter, and any error from applyMask causes the function to return the shadowed (potentially empty) results instead of the original. Additionally, if there's no error, the masked results are discarded because the shadowed variable goes out of scope.

Apply this diff to fix the shadowing issue:

 func (e Extract) postProcess(ctx api.ScrapeContext, results v1.ScrapeResults) (v1.ScrapeResults, error) {
 
 	if !e.Config.Transform.Masks.IsEmpty() {
-		results, err := e.applyMask(results)
+		var err error
+		results, err = e.applyMask(results)
 		if err != nil {
 			return results, fmt.Errorf("e.applyMask(); %w", err)
 		}
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (e Extract) postProcess(ctx api.ScrapeContext, results v1.ScrapeResults) (v1.ScrapeResults, error) {
if !e.Config.Transform.Masks.IsEmpty() {
results, err := e.applyMask(results)
if err != nil {
return results, fmt.Errorf("e.applyMask(); %w", err)
}
}
func (e Extract) postProcess(ctx api.ScrapeContext, results v1.ScrapeResults) (v1.ScrapeResults, error) {
if !e.Config.Transform.Masks.IsEmpty() {
var err error
results, err = e.applyMask(results)
if err != nil {
return results, fmt.Errorf("e.applyMask(); %w", err)
}
}
🤖 Prompt for AI Agents
In scrapers/processors/json.go around lines 559-566, the results from
e.applyMask are being captured with a short declaration which shadows the
function parameter and causes masked results to be discarded or an incorrect
value to be returned on error; replace the short declaration with an assignment
to the existing results variable (e.g. results, err = e.applyMask(results)) so
the function parameter is updated with masked data and on error return the
current results and wrapped error as before.

Comment on lines +34 to +36
		if logger.V(5).Enabled() {
			logger.Infof("Failed to unmarshal script output: %v\n%s\n", err, s)
		}

⚠️ Potential issue | 🔴 Critical

Critical: Potential secrets/PII leakage in debug logs.

Logging the full string s (script output) poses a critical security risk. This output may contain sensitive data such as API keys, tokens, passwords, or PII from the configuration or script results. Even debug-level logs can be exposed through log aggregation systems, CI/CD outputs, or monitoring platforms.

Given the PR context involves GitHub security scrapers, this could leak GitHub tokens or other credentials.

Apply this diff to avoid logging potentially sensitive data:

 	if err := json.Unmarshal([]byte(s), &results); err != nil {
-		if logger.V(5).Enabled() {
-			logger.Infof("Failed to unmarshal script output: %v\n%s\n", err, s)
-		}
+		if logger.V(5).Enabled() {
+			logger.Infof("Failed to unmarshal script output: %v (output length: %d bytes)", err, len(s))
+		}
 		return nil, err
 	}

Alternatively, if the output must be logged for debugging, implement sanitization to redact sensitive patterns (tokens, keys, passwords) before logging.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if logger.V(5).Enabled() {
logger.Infof("Failed to unmarshal script output: %v\n%s\n", err, s)
}
if logger.V(5).Enabled() {
logger.Infof("Failed to unmarshal script output: %v (output length: %d bytes)", err, len(s))
}
🤖 Prompt for AI Agents
In scrapers/processors/script.go around lines 34-36, avoid logging the full
script output string `s` (which may contain secrets/PII); instead, modify the
logger call to not include `s` directly — either log only the error and a safe
indicator (e.g., length or checksum) or log a sanitized/truncated/redacted
version of `s` after applying pattern-based redaction for tokens/keys/passwords;
keep the logger.V(5).Enabled() guard, and ensure any sanitization function is
applied before building the log message so no raw sensitive content is emitted.
