Pr/GitHub security #1783
Conversation
- Add GitHubSecurity and OpenSSFScorecard config types to ScraperSpec
- Create GitHubSecurityClient with methods for fetching alerts:
  - Dependabot alerts
  - Code scanning alerts
  - Secret scanning alerts
  - Security advisories
- Implement rate limit checking and backoff logic
- Create GithubSecurityScraper with repository scraping logic
- Add health status calculation based on alert severity
- Add repository properties for alert counts
- Fix duplicate Includes method in azure.go
- Generate deepcopy code for new types

Partial implementation - FIXME markers for:
- Full pagination support
- ConfigInsight creation
- Complete alert filtering
- OpenSSF scraper implementation
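As a rough illustration of how such a scraper spec could be expressed in Go, here is a minimal, self-contained sketch. The type and field names below (Repositories, Filters, Severities, MaxAge) are assumptions made for illustration; the real definitions live in api/v1/github_security.go and may differ.

```go
package main

import "fmt"

// Hypothetical, simplified mirrors of the new config types; the actual
// structs in api/v1/github_security.go may use different fields and tags.
type GitHubSecurityFilters struct {
	Severities []string // e.g. "critical", "high"
	MaxAge     string   // e.g. "720h"
}

type GitHubSecurityRepository struct {
	Owner string
	Name  string
}

type GitHubSecurity struct {
	Repositories []GitHubSecurityRepository
	Filters      GitHubSecurityFilters
}

func main() {
	spec := GitHubSecurity{
		Repositories: []GitHubSecurityRepository{{Owner: "flanksource", Name: "config-db"}},
		Filters:      GitHubSecurityFilters{Severities: []string{"critical", "high"}, MaxAge: "720h"},
	}
	fmt.Printf("scraping %d repositories\n", len(spec.Repositories))
}
```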
OpenSSF Scorecard Scraper:
- Add compliance framework mappings (SOC 2, NIST SSDF, CIS Controls)
- Implement ScorecardClient with retry logic and exponential backoff
- Parse scorecard API responses with all check results
- Calculate health status based on score thresholds and critical checks
- Create repository config items with badges, scores, and properties
- Support minScore filtering

GitHub Security Scraper Updates:
- Fix compilation errors
- Remove unused models import
- Fix GetSecurityAdvisories API call
- Add createConfigInsights stub for future implementation

Scraper Registration:
- Register GithubSecurityScraper in All scrapers registry
- Register OpenSSFScorecardScraper in All scrapers registry
- Add openssf import to common.go

Both scrapers:
- Follow existing patterns from GithubActionsScraper
- Implement CanScrape() and Scrape() interface methods
- Create typed config items with health status
- Use ScrapeContext for logging and configuration
- Handle errors gracefully and continue with other repositories

FIXME markers remain for:
- Full pagination support for GitHub alerts
- Complete ConfigInsight creation for alerts
- Advanced filtering logic
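The retry-with-exponential-backoff behaviour mentioned for ScorecardClient can be sketched as follows. This is an illustrative, self-contained example under assumed attempt counts and delays, not the actual client code from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// fetchWithBackoff retries fn with exponential backoff plus a little jitter.
// Sketch only; the real ScorecardClient retry logic may differ.
func fetchWithBackoff(maxAttempts int, base time.Duration, fn func() error) error {
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = fn(); err == nil {
			return nil
		}
		// back off: base * 2^attempt, plus up to 100ms of jitter
		sleep := base*time.Duration(1<<attempt) + time.Duration(rand.Intn(100))*time.Millisecond
		time.Sleep(sleep)
	}
	return fmt.Errorf("all %d attempts failed: %w", maxAttempts, err)
}

func main() {
	calls := 0
	err := fetchWithBackoff(3, 200*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errors.New("transient error")
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```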
GitHub Security Fixtures:
- dependabot-alerts.json: 2 sample alerts (critical lodash, high axios)
- code-scanning-alerts.json: 2 CodeQL alerts (SQL injection, path injection)
- secret-scanning-alerts.json: 2 secret alerts (GitHub PAT, AWS key)
- security-advisories.json: 1 published advisory (SQL injection)

OpenSSF Scorecard Fixtures:
- scorecard-high-score.json: score 8.2/10 with good security practices
- scorecard-medium-score.json: score 5.4/10 with some gaps
- scorecard-low-score.json: score 2.8/10 with critical failures

All fixtures contain realistic data structures matching actual API responses, for use in unit tests with mocked HTTP clients.
- Document GitHub Security fixtures structure and usage
- Document OpenSSF Scorecard fixtures and expected health status
- Include API endpoint references and compliance mapping info
- Add code examples for using fixtures in tests
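A hedged sketch of how one of these fixtures could back a mocked HTTP server in a test. The fixture path and endpoint shape are assumptions for illustration; the real tests and client wiring in the repository may differ.

```go
package openssf_test

import (
	"io"
	"net/http"
	"net/http/httptest"
	"os"
	"testing"
)

// TestScorecardFixture shows one way to serve a checked-in fixture instead of
// calling the real Scorecard API; this is an illustrative sketch only.
func TestScorecardFixture(t *testing.T) {
	fixture, err := os.ReadFile("testdata/scorecard-high-score.json")
	if err != nil {
		t.Fatalf("reading fixture: %v", err)
	}

	// Serve the fixture from a local test server.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		_, _ = w.Write(fixture)
	}))
	defer srv.Close()

	resp, err := http.Get(srv.URL + "/projects/github.com/example/repo")
	if err != nil {
		t.Fatalf("request failed: %v", err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	if len(body) == 0 {
		t.Fatal("expected fixture body")
	}
}
```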
GitHub Security Scraper:
- Add LastAlertScrapeTime sync.Map to track last update timestamp per repo
- Store max alert updated_at timestamp after each scrape
- Use stored timestamp on subsequent runs for incremental fetching
- Add V(2) logging: repository start, completion with alert counts
- Add V(3) logging: incremental vs full scrape mode, total counts per type
- Add Debugf: individual alert type fetch counts (Dependabot, code scan, secrets)
- Add Tracef: individual alert details with numbers, titles, severities
- Add rate limit status logging at V(3) level

OpenSSF Scorecard Scraper:
- Add LastScorecardScrapeTime sync.Map to track last scrape per repo
- Implement 24h cache TTL to skip redundant API calls
- Check cache before fetching and skip if within TTL window
- Add V(2) logging: repository start, scorecard completion with scores
- Add V(3) logging: cache expiration, retry attempts with backoff duration
- Add Debugf: overall score and check counts, non-retryable error details
- Add Tracef: API request URLs, individual check scores with compliance counts
- Store scrape time immediately after successful fetch

Performance Benefits:
- GitHub: only fetch alerts updated since the last scrape (reduces API calls)
- OpenSSF: skip API calls entirely if data is less than 24h old
- Both: detailed logging enables troubleshooting and performance monitoring

Logging Levels:
- V(2): important operations (scrape start/end, counts)
- V(3): detailed operational info (cache, retries, incremental mode)
- Debugf: per-operation results and error details
- Tracef: individual item processing and API interactions
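A minimal sketch of the 24h TTL check described above, assuming the timestamps are kept in a package-level sync.Map keyed by repository full name. The variable names and exact wiring in the PR may differ.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// lastScrapeTime maps a repository full name to the time of its last
// successful scrape; assumed structure, mirroring the PR's description.
var lastScrapeTime sync.Map

const cacheTTL = 24 * time.Hour

// shouldSkip reports whether a repository was scraped within the TTL window.
func shouldSkip(repo string) bool {
	v, ok := lastScrapeTime.Load(repo)
	if !ok {
		return false
	}
	last, ok := v.(time.Time)
	if !ok {
		return false // ignore unexpected values instead of panicking
	}
	return time.Since(last) < cacheTTL
}

func main() {
	repo := "flanksource/config-db"
	fmt.Println(shouldSkip(repo)) // false: never scraped
	lastScrapeTime.Store(repo, time.Now())
	fmt.Println(shouldSkip(repo)) // true: within the 24h TTL
}
```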
Changed OpenSSF scorecard scraper to follow the Azure Advisor pattern:
- Repository config item stores full API response JSON as config
- Each security check becomes a separate ConfigAnalysis (insight) record
- Insights are linked to parent repository via external ID

ConfigAnalysis Records:
- Analyzer: check name (e.g., "Code-Review", "SAST")
- AnalysisType: models.AnalysisTypeSecurity
- Severity: mapped from check score (0-3 = critical, 4-6 = high, 7-9 = medium, 10 = low)
- Status: "failing" (score < 10) or "passing" (score 10)
- Summary: check reason from API
- Messages: individual detail lines from check
- Analysis JSON contains:
  - Check name, score, max_score, reason
  - Details array
  - Documentation links
  - Compliance mappings (SOC 2, NIST SSDF, CIS Controls)

Logging Improvements:
- Debug: analysis record creation count
- Trace: individual check details with compliance mapping counts
- Trace: analysis creation with severity and status

Benefits:
- Each check visible as separate insight in UI
- Compliance framework mappings accessible via analysis JSON
- Historical tracking via FirstObserved/LastObserved
- Queryable by severity, status, analyzer
- Follows established AnalysisResult pattern from Azure Advisor
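The score-to-severity mapping described above can be illustrated with a small helper. This is a sketch of the stated thresholds only, not the scraper's actual code.

```go
package main

import "fmt"

// severityForScore maps a Scorecard check score (0-10) to an analysis
// severity using the thresholds in the commit message: 0-3 critical,
// 4-6 high, 7-9 medium, 10 low.
func severityForScore(score int) string {
	switch {
	case score <= 3:
		return "critical"
	case score <= 6:
		return "high"
	case score <= 9:
		return "medium"
	default:
		return "low"
	}
}

func main() {
	for _, s := range []int{0, 3, 4, 6, 7, 9, 10} {
		fmt.Printf("score %2d -> %s\n", s, severityForScore(s))
	}
}
```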
- Remove duplicate fetchAppRegistrations method (already in active_directory.go)
- Remove duplicate appToScrapeResult method (already in active_directory.go)
- Remove fetchDNS call (method doesn't exist)
- Fix fetchAppRegistrations call to use correct Entra.AppRegistrations field
- Remove unused msgraph imports: graphcore, applications, msgraphModels

These duplicates were causing compilation errors. The correct implementations exist in active_directory.go with proper resource selector support.
Walkthrough

This pull request introduces three new scraper types (Azure AD, GitHub Security, OpenSSF Scorecard), extends the configuration API with new connection and filter structures, implements corresponding scraper logic with GitHub and OpenSSF API integrations, adds debug-level logging support, and includes test fixtures and supporting infrastructure updates.
Sequence Diagram

sequenceDiagram
participant User
participant ScrapeContext
participant GithubSecurityScraper
participant GitHubSecurityClient
participant GitHubAPI
participant Results
participant Database
User->>ScrapeContext: Scrape(ctx)
ScrapeContext->>GithubSecurityScraper: Scrape(ctx)
GithubSecurityScraper->>GithubSecurityScraper: Iterate GitHubSecurity configs
loop For each repository
GithubSecurityScraper->>GitHubSecurityClient: NewGitHubSecurityClient()
GitHubSecurityClient-->>GithubSecurityScraper: client
GithubSecurityScraper->>GitHubSecurityClient: CheckRateLimit()
GitHubSecurityClient->>GitHubAPI: Query rate limit status
GitHubAPI-->>GitHubSecurityClient: RateLimits
GitHubSecurityClient-->>GithubSecurityScraper: RateLimits
alt Rate limit exceeded
GithubSecurityScraper->>GithubSecurityScraper: ShouldPauseForRateLimit()
Note over GithubSecurityScraper: Pause if needed
end
GithubSecurityScraper->>GitHubSecurityClient: GetDependabotAlerts()
GitHubSecurityClient->>GitHubAPI: Fetch Dependabot alerts
GitHubAPI-->>GitHubSecurityClient: DependabotAlerts
GitHubSecurityClient-->>GithubSecurityScraper: Alerts
GithubSecurityScraper->>GitHubSecurityClient: GetCodeScanningAlerts()
GitHubSecurityClient->>GitHubAPI: Fetch code scanning alerts
GitHubAPI-->>GitHubSecurityClient: CodeScanningAlerts
GitHubSecurityClient-->>GithubSecurityScraper: Alerts
GithubSecurityScraper->>GitHubSecurityClient: GetSecretScanningAlerts()
GitHubSecurityClient->>GitHubAPI: Fetch secret scanning alerts
GitHubAPI-->>GitHubSecurityClient: SecretScanningAlerts
GitHubSecurityClient-->>GithubSecurityScraper: Alerts
GithubSecurityScraper->>GithubSecurityScraper: Aggregate alert counts & health status
GithubSecurityScraper->>Results: Append ScrapeResult
Results-->>GithubSecurityScraper: Updated results
end
GithubSecurityScraper-->>ScrapeContext: ScrapeResults
alt Save flag set
ScrapeContext->>Database: Save results to DB
Database-->>ScrapeContext: Success
end
ScrapeContext-->>User: Final results
Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 19
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.gitignore (1)
2-2: Fix typo: _DS_Store should be .DS_Store.

Line 2 contains _DS_Store (with underscore) instead of .DS_Store (with dot). This typo means macOS metadata files won't be ignored as intended. Line 30 correctly adds .DS_Store, making line 2 redundant once fixed. Apply this diff:

-.vscode
-_DS_Store
+.vscode
+.DS_Store
🧹 Nitpick comments (8)
scrapers/aws/aws.go (1)
1994-1996: Add a TODO comment or implement the method.

The Beanstalk method has an empty body. If this is intentional for future implementation, add a TODO comment to document the intent. Otherwise, consider removing it until it's ready to be implemented.

 func (aws Scraper) Beanstalk(ctx *AWSContext, config v1.AWS, results *v1.ScrapeResults) {
-
+	// TODO: Implement Beanstalk scraper
 }

scrapers/azure/azure.go (1)
642-642: Consider using constants for include checks.

Multiple include checks use string literals ("virtualMachines", "resourceGroups", "storageAccounts", "appServices", "privateDns", "trafficManager", "securityGroups", "publicIps") instead of the constants defined at lines 44-60 (e.g., IncludeVirtualMachines, IncludeResourceGroups). Using constants improves type safety and reduces the risk of typos. Example:

- if !azure.config.Includes("virtualMachines") {
+ if !azure.config.Includes(IncludeVirtualMachines) {

Also applies to: 739-739, 812-812, 850-850, 898-898, 936-936, 973-973, 1012-1012
cmd/run.go (2)
119-124: Consider graceful error handling for output formatting.
clicky.MustPrint will panic if formatting fails. While acceptable for a CLI tool with unrecoverable errors, explicit error handling would provide clearer diagnostics. Consider handling errors explicitly:

- clicky.MustPrint(all.Configs)
- clicky.MustPrint(all.Changes)
- clicky.MustPrint(all.Analysis)
- clicky.MustPrint(all.Relationships)
+ if err := clicky.Print(all.Configs); err != nil {
+ 	logger.Errorf("failed to print configs: %v", err)
+ }
+ if err := clicky.Print(all.Changes); err != nil {
+ 	logger.Errorf("failed to print changes: %v", err)
+ }
+ if err := clicky.Print(all.Analysis); err != nil {
+ 	logger.Errorf("failed to print analysis: %v", err)
+ }
+ if err := clicky.Print(all.Relationships); err != nil {
+ 	logger.Errorf("failed to print relationships: %v", err)
+ }
127-137: Consider graceful error handling for summary formatting.

Similar to the output formatting above, clicky.MustFormat (line 135) will panic on error. Consider explicit error handling for better diagnostics. Apply this diff:

- fmt.Println(clicky.MustFormat(summary))
+ formatted, err := clicky.Format(summary)
+ if err != nil {
+ 	logger.Errorf("failed to format summary: %v", err)
+ } else {
+ 	fmt.Println(formatted)
+ }

scrapers/openssf/scorecard.go (2)
21-22: Consider memory management for the global cache.

The LastScorecardScrapeTime sync.Map will grow unboundedly as repositories are scraped over time. Consider adding a cleanup mechanism or using a TTL-based cache to prevent memory leaks in long-running processes.

265-272: Consider using slices.Contains from the standard library.

Go 1.21+ provides slices.Contains, which is more idiomatic and efficient than a manual loop.

+import "slices"
+
 func containsString(slice []string, item string) bool {
-	for _, s := range slice {
-		if s == item {
-			return true
-		}
-	}
-	return false
+	return slices.Contains(slice, item)
 }

scrapers/github/security.go (1)
291-297: Stub function with FIXME comment.

The createConfigInsights function is unimplemented. Consider removing it if not needed in this PR, or tracking it as a follow-up issue. Do you want me to open a new issue to track the implementation of createConfigInsights?

api/v1/interface.go (1)
106-106: Redundant omitempty on excluded JSON field.

The tag json:"-,omitempty" is redundant. The "-" already tells the JSON encoder to skip this field entirely, making ",omitempty" unnecessary.

- _map map[string]any `json:"-,omitempty"`
+ _map map[string]any `json:"-"`
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sum is excluded by !**/*.sum
📒 Files selected for processing (34)
- .gitignore (1 hunks)
- api/context.go (1 hunks)
- api/v1/azure_ad.go (1 hunks)
- api/v1/common.go (4 hunks)
- api/v1/github_security.go (1 hunks)
- api/v1/interface.go (6 hunks)
- api/v1/openssf.go (1 hunks)
- api/v1/scrapeconfig_types.go (2 hunks)
- api/v1/types.go (1 hunks)
- api/v1/zz_generated.deepcopy.go (8 hunks)
- cmd/run.go (4 hunks)
- go.mod (11 hunks)
- scrapers/aws/aws.go (4 hunks)
- scrapers/aws/types.go (1 hunks)
- scrapers/azure/azure.go (10 hunks)
- scrapers/azure/devops/client.go (3 hunks)
- scrapers/common.go (2 hunks)
- scrapers/github/security.go (1 hunks)
- scrapers/github/security_client.go (1 hunks)
- scrapers/github/testdata/security/README.md (1 hunks)
- scrapers/github/testdata/security/code-scanning-alerts.json (1 hunks)
- scrapers/github/testdata/security/dependabot-alerts.json (1 hunks)
- scrapers/github/testdata/security/secret-scanning-alerts.json (1 hunks)
- scrapers/github/testdata/security/security-advisories.json (1 hunks)
- scrapers/openssf/client.go (1 hunks)
- scrapers/openssf/compliance.go (1 hunks)
- scrapers/openssf/scorecard.go (1 hunks)
- scrapers/openssf/testdata/README.md (1 hunks)
- scrapers/openssf/testdata/scorecard-high-score.json (1 hunks)
- scrapers/openssf/testdata/scorecard-low-score.json (1 hunks)
- scrapers/openssf/testdata/scorecard-medium-score.json (1 hunks)
- scrapers/processors/json.go (6 hunks)
- scrapers/processors/script.go (2 hunks)
- scrapers/sql/sql.go (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (14)
api/v1/openssf.go (1)
- api/v1/common.go (1): BaseScraper (271-286)

scrapers/common.go (3)
- scrapers/github/security.go (1): GithubSecurityScraper (23-23)
- api/global.go (1): Scraper (20-23)
- scrapers/openssf/scorecard.go (1): OpenSSFScorecardScraper (25-25)

scrapers/azure/devops/client.go (1)
- api/v1/interface.go (1): ChangeResult (79-107)

api/v1/azure_ad.go (1)
- api/v1/common.go (2): BaseScraper (271-286), AzureConnection (408-415)

api/v1/scrapeconfig_types.go (2)
- api/v1/github_security.go (1): GitHubSecurity (12-27)
- api/v1/openssf.go (1): OpenSSFScorecard (4-12)

scrapers/processors/json.go (6)
- api/v1/interface.go (2): ScrapeResult (652-723), ChangeResult (79-107)
- api/v1/json_types.go (1): JSON (16-16)
- api/v1/data.go (2): Properties (26-26), Property (29-39)
- api/context.go (1): ScrapeContext (16-27)
- api/v1/common.go (3): Template (471-477), BaseScraper (271-286), Script (33-38)
- scrapers/processors/script.go (1): RunScript (11-28)

cmd/run.go (2)
- api/v1/interface.go (1): MergeScrapeResults (612-648)
- db/update.go (1): SaveResults (465-467)

api/v1/github_security.go (1)
- api/v1/common.go (1): BaseScraper (271-286)

scrapers/openssf/client.go (1)
- api/context.go (1): ScrapeContext (16-27)

scrapers/github/security_client.go (4)
- api/context.go (1): ScrapeContext (16-27)
- api/v1/github_security.go (1): GitHubSecurity (12-27)
- api/global.go (1): Namespace (13-13)
- scrapers/trivy/models.go (1): Severity (32-32)

api/v1/zz_generated.deepcopy.go (4)
- api/v1/azure_ad.go (7): AzureAD (3-10), AzureAppRegistrations (32-34), MsGraphFilter (14-18), AzureGroups (28-30), AzureLogins (20-22), AzureUsers (24-26), CELFilter (12-12)
- api/v1/common.go (2): BaseScraper (271-286), AzureConnection (408-415)
- api/v1/github_security.go (3): GitHubSecurity (12-27), GitHubSecurityRepository (30-33), GitHubSecurityFilters (36-45)
- api/v1/openssf.go (2): OpenSSFRepository (15-18), OpenSSFScorecard (4-12)

api/v1/types.go (4)
- api/v1/github.go (1): GitHubActions (7-27)
- api/v1/github_security.go (1): GitHubSecurity (12-27)
- api/v1/openssf.go (1): OpenSSFScorecard (4-12)
- api/v1/azure.go (1): Azure (25-35)

scrapers/azure/azure.go (2)
- api/v1/azure.go (1): Entra (17-23)
- api/global.go (1): Scraper (20-23)

api/v1/interface.go (4)
- api/v1/types.go (1): ExternalID (165-175)
- api/v1/const.go (1): ChangeAction (3-3)
- db/models/config_change.go (1): ConfigChange (21-42)
- api/v1/json_types.go (1): JSON (16-16)
🪛 GitHub Actions: PR #1783
go.mod
[error] Dependency replacement points to '../duty' but the file /home/runner/work/config-db/duty/go.mod cannot be found.
[error] go mod tidy failed due to missing or incompatible module replacements; updates to go.mod are needed.
🔇 Additional comments (44)
scrapers/processors/script.go (1)
3-9: LGTM: Logger import added for debug logging. The logger import is correctly added to support the new debug logging functionality.
scrapers/aws/aws.go (2)
59-60: LGTM! The Session field exposes the AWS configuration to scraper methods, which is properly initialized and used throughout the codebase.

494-494: LGTM! Increasing the log verbosity level from V(2) to V(3) reduces log noise for ECS task descriptions.
scrapers/github/testdata/security/security-advisories.json (1)
1-67: LGTM! The test fixture provides comprehensive coverage of the GitHub Security Advisory API response format, including all relevant fields for security analysis (GHSA/CVE identifiers, severity, CVSS scores, CWE mappings, vulnerability details, and credits).
scrapers/azure/devops/client.go (2)
42-56: LGTM! The change from map[string]string to map[string]any for TemplateParameters provides more flexibility for template parameter types. The updated GetLabels() method correctly stringifies all parameter values using fmt.Sprintf("%v", v), ensuring proper conversion regardless of the underlying type.

88-109: LGTM! Consistent with the Pipeline changes, Run.TemplateParameters now uses map[string]any and GetTags() properly stringifies values. The implementation correctly handles type conversion and maintains backward compatibility with the existing Variables handling.

scrapers/openssf/client.go (1)
73-104: LGTM! The retry logic with exponential backoff is well-implemented, and the method properly handles context, logging, and error propagation. The approach of retrying only on retryable errors (server errors, timeouts, connection issues) is appropriate for API clients.
api/v1/common.go (3)
69-95: LGTM!The new
Pretty() and PrettyShort() methods provide rich formatting with syntax highlighting for different script types. The use of lo.Ellipsis() in PrettyShort() to truncate long scripts (200 chars) is appropriate for preview contexts.
395-415: LGTM!The new
GCPConnection and AzureConnection types provide structured configuration for cloud provider credentials. The GetModel() implementation correctly maps GCP credentials to the models.Connection schema.
445-452: LGTM!The
Connection.Pretty() method appropriately calls RedactSecretValues() before rendering connection strings, preventing credential leakage in logs or UI display.
563-563: No remaining references to old field name detected.The field rename from
TrsutedAdvisorCheckstoTrustedAdvisorChecksis complete. A search of the codebase found no lingering references to the old field name, confirming that all usages have been properly updated.api/v1/azure_ad.go (1)
1-34: LGTM!Clean type definitions for the Azure AD scraper configuration. The struct embedding and YAML/JSON tags follow established patterns in the codebase.
scrapers/sql/sql.go (1)
61-74: LGTM with dependency!The debug logging implementation using
clickyis well-structured and properly guarded. However, ensure the nil guard fix forctx.IsDebug()inapi/context.gois applied to prevent potential panics.scrapers/github/testdata/security/README.md (1)
1-105: LGTM!Comprehensive documentation for the GitHub security test fixtures. The structure is clear and includes helpful usage examples.
scrapers/common.go (1)
10-10: LGTM!Clean registration of the new GitHub Security and OpenSSF Scorecard scrapers in the global registry.
Also applies to: 36-36, 40-40
scrapers/openssf/testdata/README.md (1)
1-160: LGTM!Excellent documentation for the OpenSSF Scorecard test fixtures. The health status expectations table and compliance framework mappings are particularly helpful for understanding test scenarios.
scrapers/openssf/testdata/scorecard-low-score.json (1)
1-170: LGTM!Well-structured test fixture for the low-score OpenSSF Scorecard scenario. The data aligns with the expectations documented in the README.
cmd/run.go (1)
178-179: Verify the default behavior change is intentional.

The export flag now defaults to true, which means scraped data will always be printed or exported even without --output-dir. This changes the previous behavior where either an output directory or database connection was required. Ensure this aligns with the intended UX.

scrapers/openssf/testdata/scorecard-medium-score.json (1)

1-145: LGTM! Well-structured medium-score test fixture. The test data provides realistic coverage for a medium-scoring repository (5.4/10) with a mix of passing and failing checks, suitable for testing scorecard parsing and filtering logic.
api/v1/scrapeconfig_types.go (1)
104-109: LGTM! Type dispatch correctly recognizes new scrapers.The new type checks for GitHubSecurity and OpenSSFScorecard are properly integrated into the dispatch logic and return the expected type identifiers.
scrapers/openssf/testdata/scorecard-high-score.json (1)
1-247: LGTM! Comprehensive high-score test fixture.This fixture provides excellent coverage with 19 checks and an overall score of 8.2/10, suitable for testing high-scoring repository scenarios. The additional checks (CI-Tests, CII-Best-Practices, Contributors, Fuzzing, Packaging, Signed-Releases, Webhooks) compared to the medium-score fixture provide more comprehensive test coverage.
scrapers/github/testdata/security/secret-scanning-alerts.json (1)
1-42: LGTM! Well-structured secret scanning alert fixtures.The test data provides two realistic secret scanning scenarios with different secret types (GitHub PAT and AWS key) and validity states, suitable for testing alert parsing and categorization.
scrapers/github/testdata/security/code-scanning-alerts.json (1)
1-98: LGTM! Detailed code scanning alert fixtures.The test data includes comprehensive CodeQL alert structures with location details, security classifications, and CWE mappings for both high and medium severity issues, providing good coverage for testing alert processing.
api/v1/openssf.go (1)
1-18: LGTM! Clean OpenSSF scraper type definitions.

The OpenSSFScorecard and OpenSSFRepository types are well-structured with:
- Proper BaseScraper embedding for consistent scraper behavior
- Required Repositories field for specifying targets
- Optional MinScore pointer for filtering (nil = no filter)
- Consistent JSON/YAML serialization tags
scrapers/github/testdata/security/dependabot-alerts.json (1)
1-136: LGTM! Comprehensive Dependabot alert fixtures.The test data includes detailed security advisory information with CVSS scores, CWE mappings, vulnerability ranges, and patch versions for both critical and high severity vulnerabilities, providing thorough coverage for testing Dependabot alert parsing.
scrapers/openssf/compliance.go (3)
1-8: LGTM! Well-structured compliance mapping type. The ComplianceMapping struct is cleanly defined with appropriate JSON tags and omitempty for optional fields.

10-108: LGTM! Comprehensive compliance mappings. The mappings cover all standard OpenSSF Scorecard checks with appropriate SOC2, NIST SSDF, and CIS control references. The comment referencing the DataSheet source is helpful for traceability.

110-116: LGTM! Safe lookup with zero-value fallback. The function correctly returns an empty ComplianceMapping when the check name is not found, avoiding nil pointer issues.

scrapers/processors/json.go (5)
440-453: LGTM! Clean refactoring of extraction flow. The extraction logic is now properly decomposed into transform → extractItems → property parsing → attribute extraction. The debug output on error (lines 450-452) is helpful for troubleshooting.

475-511: LGTM! Property parsing extracted into helper. The parseProperties method cleanly handles property filtering, link cloning, and templating with appropriate error handling via result.Errorf.

513-530: LGTM! Item extraction properly handles recursion. The extractItems method correctly recurses with WithoutItems() to prevent infinite loops while processing nested items.

532-557: LGTM! Transform with enhanced debug output. The transform method provides helpful debug output including the script and input when errors occur, which will aid troubleshooting.

651-660: LGTM! Improved error handling for missing IDs. The enhanced logic now allows standalone changes with external_change_id while still requiring IDs for regular configs. The debug output using input.Debug() provides better context.

scrapers/openssf/scorecard.go (2)
27-72: LGTM! Well-structured scraper implementation. The Scrape method properly handles:
- Iterating over configured repositories
- TTL-based skip logic to avoid redundant API calls
- MinScore filtering
- Error handling with continuation for other repos
- Comprehensive logging at appropriate verbosity levels
173-204: LGTM! Health calculation with critical check awareness. The health status logic appropriately considers both the overall score and critical security checks (Code-Review, SAST, etc.), providing meaningful health states.
scrapers/github/security.go (2)
29-67: LGTM! Well-structured scraper with rate limit handling. The scraper properly:
- Handles rate limits proactively before making requests
- Continues processing other repos on individual failures
- Logs progress at appropriate verbosity levels
220-254: LGTM! Health status calculation with reasonable thresholds. The health determination logic is clear (a minimal sketch follows this list):
- Critical alerts → Unhealthy
- 5+ high alerts → Unhealthy
- Any high alerts → Warning
- 10+ medium alerts → Warning
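A minimal sketch of these thresholds; this is illustrative only, and the actual function in scrapers/github/security.go may differ in names and return types.

```go
package main

import "fmt"

// healthFor applies the thresholds described in the review comment above.
func healthFor(critical, high, medium int) string {
	switch {
	case critical > 0:
		return "unhealthy"
	case high >= 5:
		return "unhealthy"
	case high > 0:
		return "warning"
	case medium >= 10:
		return "warning"
	default:
		return "healthy"
	}
}

func main() {
	fmt.Println(healthFor(1, 0, 0)) // unhealthy: any critical alert
	fmt.Println(healthFor(0, 2, 0)) // warning: some high alerts
	fmt.Println(healthFor(0, 0, 3)) // healthy: few medium alerts
}
```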
scrapers/github/security_client.go (3)
23-48: LGTM! Proper token resolution with fallback. The client constructor correctly:
- Tries to hydrate connection first
- Falls back to environment variable
- Creates an authenticated OAuth2 client
144-164: LGTM! Proactive rate limit handling. The threshold-based approach (100 remaining) with logging provides good protection against hitting rate limits while being transparent about pauses.

59-133: LGTM! Consistent error handling across alert fetchers. All four alert fetching methods follow a consistent pattern with proper error wrapping and return values.
api/v1/zz_generated.deepcopy.go (1)
1-2160: Auto-generated deepcopy implementations look correct.

This file is generated by controller-gen (line 19) and the new DeepCopy/DeepCopyInto implementations for the added types (AzureAD, GitHubSecurity, OpenSSFScorecard, etc.) follow the standard patterns:
- Proper nil checks before slice/pointer allocations
- Correct use of make() for slice copying
- Appropriate DeepCopyInto calls for nested structs with reference types

The implementations align with the source struct definitions provided in the relevant code snippets.
api/v1/interface.go (3)
598-648: New aggregation type and merge function look reasonable.

The FullScrapeResults struct properly aggregates multiple result facets, and MergeScrapeResults correctly:
- Skips error-bearing results (lines 616-618)
- Converts AnalysisResult to ConfigAnalysis model (lines 620-622)
- Excludes metadata-only configs (lines 635-637)
- Aggregates external entities (lines 639-644)
725-792: Debug method provides comprehensive output. The Debug() method thoroughly covers all relevant fields of ScrapeResult and handles different config types (string, map[string]any, map[string]string) appropriately for display.

652-723: ScrapeResult display annotations are well-structured. The pretty:"hide" tags appropriately hide internal/sensitive fields (DeletedAt, DeleteReason, ConfigClass, Aliases, Source, Config, Ready) from pretty-print output while keeping user-relevant fields visible.
func (ctx ScrapeContext) IsDebug() bool {
	return ctx.scrapeConfig.Spec.IsDebug()
}
Add nil guard to prevent panic.
Both IsDebug() and IsTrace() (line 162) access ctx.scrapeConfig.Spec without checking if ctx.scrapeConfig is nil. Since ScrapeConfig() (line 140) can return nil, calling these methods will panic if the scrape config hasn't been set.
Apply this diff to add nil guards:
func (ctx ScrapeContext) IsTrace() bool {
+ if ctx.scrapeConfig == nil {
+ return false
+ }
return ctx.scrapeConfig.Spec.IsTrace()
}
func (ctx ScrapeContext) IsDebug() bool {
+ if ctx.scrapeConfig == nil {
+ return false
+ }
return ctx.scrapeConfig.Spec.IsDebug()
}

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In api/context.go around lines 162–167, both IsTrace() and IsDebug() dereference
ctx.scrapeConfig.Spec without guarding for a nil scrapeConfig (ScrapeConfig()
can return nil) which can cause a panic; update both methods to first check if
ctx.scrapeConfig == nil or ctx.scrapeConfig.Spec == nil and return false in that
case, otherwise call and return ctx.scrapeConfig.Spec.IsTrace() / IsDebug().
// ParseMaxAge converts the MaxAge string to a time.Duration
func (f GitHubSecurityFilters) ParseMaxAge() (time.Duration, error) {
	if f.MaxAge == "" {
		return 0, nil
	}
	return time.ParseDuration(f.MaxAge)
}
Fix duration parsing to support documented format.
The ParseMaxAge() method uses time.ParseDuration(), which doesn't support the "d" (days) suffix mentioned in the comment on line 43. time.ParseDuration() only accepts units like "ns", "us", "ms", "s", "m", "h". Specifying "90d" will cause a parse error.
Consider implementing custom parsing or updating the documentation to use hours (e.g., "2160h" for 90 days):
// ParseMaxAge converts the MaxAge string to a time.Duration
func (f GitHubSecurityFilters) ParseMaxAge() (time.Duration, error) {
if f.MaxAge == "" {
return 0, nil
}
- return time.ParseDuration(f.MaxAge)
+ // Support days suffix (e.g., "90d")
+ if strings.HasSuffix(f.MaxAge, "d") {
+ days, err := strconv.Atoi(strings.TrimSuffix(f.MaxAge, "d"))
+ if err != nil {
+ return 0, fmt.Errorf("invalid duration format: %w", err)
+ }
+ return time.Duration(days) * 24 * time.Hour, nil
+ }
+ return time.ParseDuration(f.MaxAge)
}

for _, change := range r.Changes {
	configChange := models.ConfigChange{
		ChangeType: change.ChangeType,
		Severity:   models.Severity(change.Severity),
		Source:     change.Source,
		Summary:    change.Summary,
		CreatedAt:  change.CreatedAt,
	}
	full.Changes = append(full.Changes, configChange)
}
🧩 Analysis chain: verified the ConfigChange and ChangeResult definitions in db/models/config_change.go and api/v1/interface.go (verification script output omitted).
Fix type mismatch for CreatedAt field assignment.
Line 630 assigns change.CreatedAt (type *time.Time) to configChange.CreatedAt (type time.Time). This is a type mismatch and will cause a compilation error. Use lo.FromPtr(change.CreatedAt) or dereference the pointer appropriately, similar to how it's done in NewConfigChangeFromV1 in db/models/config_change.go.
🤖 Prompt for AI Agents
In api/v1/interface.go around lines 624 to 633, the assignment
configChange.CreatedAt = change.CreatedAt is a type mismatch because
change.CreatedAt is *time.Time while configChange.CreatedAt is time.Time;
replace the direct assignment by converting the pointer to a value (e.g.,
configChange.CreatedAt = lo.FromPtr(change.CreatedAt) or dereference with a
nil-check like if change.CreatedAt != nil { configChange.CreatedAt =
*change.CreatedAt } ) following the pattern used in NewConfigChangeFromV1 in
db/models/config_change.go.
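For reference, a small self-contained example of the lo.FromPtr behaviour the comment suggests, assuming the samber/lo dependency that the review mentions is already used elsewhere in the repository.

```go
package main

import (
	"fmt"
	"time"

	"github.com/samber/lo"
)

func main() {
	var created *time.Time // may be nil on a ChangeResult

	// lo.FromPtr returns the zero value when the pointer is nil,
	// which is what a non-pointer CreatedAt field expects.
	createdAt := lo.FromPtr(created)
	fmt.Println(createdAt.IsZero()) // true

	now := time.Now()
	created = &now
	fmt.Println(lo.FromPtr(created).Equal(now)) // true
}
```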
func (s ScrapeResult) PrettyRow(options interface{}) map[string]api.Text {
	row := make(map[string]api.Text)
	row["ID"] = clicky.Text(s.ID)
	row["Name"] = clicky.Text(s.Name)
	row["Type"] = clicky.Text(s.Type)
	row["Health"] = clicky.Text(string(s.Health))
	row["Status"] = clicky.Text(s.Status)
	if s.Error != nil {
		row["Error"] = clicky.Text(s.Error.Error())
	} else {
		row["Error"] = clicky.Text("")
	}
	clicky.MustPrint([]map[string]api.Text{row}, clicky.FormatOptions{JSON: true})
	clicky.MustPrint([]map[string]api.Text{row}, clicky.FormatOptions{Pretty: true})

	return row
}
Remove debug print statements from PrettyRow method.
Lines 806-807 contain MustPrint calls that print directly to stdout. This appears to be leftover debug code that should not be in a method designed to return a row representation.
Apply this diff to remove the debug statements:
func (s ScrapeResult) PrettyRow(options interface{}) map[string]api.Text {
row := make(map[string]api.Text)
row["ID"] = clicky.Text(s.ID)
row["Name"] = clicky.Text(s.Name)
row["Type"] = clicky.Text(s.Type)
row["Health"] = clicky.Text(string(s.Health))
row["Status"] = clicky.Text(s.Status)
if s.Error != nil {
row["Error"] = clicky.Text(s.Error.Error())
} else {
row["Error"] = clicky.Text("")
}
- clicky.MustPrint([]map[string]api.Text{row}, clicky.FormatOptions{JSON: true})
- clicky.MustPrint([]map[string]api.Text{row}, clicky.FormatOptions{Pretty: true})
return row
}
🤖 Prompt for AI Agents
In api/v1/interface.go around lines 794 to 810 the PrettyRow method contains
leftover debug print statements (clicky.MustPrint calls on lines ~806-807) that
write to stdout; remove those two clicky.MustPrint(...) calls so the method only
builds and returns the row map without printing, leaving the rest of the
function (row construction and return) unchanged.
func (s ScrapeResult) Pretty() api.Text {
	t := clicky.Text("")

	return t
}
Empty Pretty() implementation returns no useful output.
The Pretty() method creates an empty text and returns it immediately without any content. This appears to be a placeholder that was never implemented.
Either implement the method similar to Debug() or remove it if not needed. Would you like me to help generate an implementation?
🤖 Prompt for AI Agents
In api/v1/interface.go around lines 816 to 820, the ScrapeResult.Pretty() method
currently creates and returns an empty clicky.Text, providing no readable
output; implement Pretty() to produce a human-friendly textual representation
(similar to Debug()) by populating the clicky.Text with the key/high-level
fields of ScrapeResult (e.g., status, URL, timestamp, summary of data or errors)
formatted for readability, or if a pretty form is unnecessary remove the method
and its usages; ensure the implementation returns the constructed clicky.Text
populated with the chosen fields.
func contains(s, substr string) bool {
	return len(s) >= len(substr) && (s == substr || len(s) > len(substr) &&
		(s[:len(substr)] == substr || s[len(s)-len(substr):] == substr ||
			findSubstring(s, substr)))
}

func findSubstring(s, substr string) bool {
	for i := 0; i <= len(s)-len(substr); i++ {
		if s[i:i+len(substr)] == substr {
			return true
		}
	}
	return false
}
🛠️ Refactor suggestion | 🟠 Major
Replace custom string utilities with standard library.
The contains() and findSubstring() functions reimplement functionality already available in the standard library. Use strings.Contains() for simpler, more maintainable code.
Apply this diff:
+import (
+ "strings"
+ // ... other imports
+)
func (c *ScorecardClient) isRetryable(err error) bool {
if err == nil {
return false
}
errStr := err.Error()
- return contains(errStr, "server error") ||
- contains(errStr, "timeout") ||
- contains(errStr, "connection")
+ return strings.Contains(errStr, "server error") ||
+ strings.Contains(errStr, "timeout") ||
+ strings.Contains(errStr, "connection")
}
-
-func contains(s, substr string) bool {
- return len(s) >= len(substr) && (s == substr || len(s) > len(substr) &&
- (s[:len(substr)] == substr || s[len(s)-len(substr):] == substr ||
- findSubstring(s, substr)))
-}
-
-func findSubstring(s, substr string) bool {
- for i := 0; i <= len(s)-len(substr); i++ {
- if s[i:i+len(substr)] == substr {
- return true
- }
- }
- return false
-}

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In scrapers/openssf/client.go around lines 155 to 168, the custom contains() and
findSubstring() functions reimplement standard library behavior; replace all
uses of these helpers with strings.Contains from the standard library and remove
the two functions; also add or ensure "strings" is imported in the file and
remove the now-unused function definitions to keep the code clean.
func shouldSkipScorecardFetch(ctx api.ScrapeContext, repoFullName string) (bool, string) {
	if lastScrape, ok := LastScorecardScrapeTime.Load(repoFullName); ok {
		lastTime := lastScrape.(time.Time)
		timeSinceLastScrape := time.Since(lastTime)
		if timeSinceLastScrape < ScorecardCacheTTL {
			return true, fmt.Sprintf("last scraped %v ago (cache TTL: %v)", timeSinceLastScrape, ScorecardCacheTTL)
		}
		ctx.Logger.V(3).Infof("cache expired for %s (last scraped %v ago)", repoFullName, timeSinceLastScrape)
	}
	return false, ""
}
Unsafe type assertion could panic.
The type assertion lastScrape.(time.Time) on line 76 will panic if the stored value is not a time.Time. While this is unlikely given the controlled usage, adding a check makes the code more robust.
Apply this diff for safer type assertion:
func shouldSkipScorecardFetch(ctx api.ScrapeContext, repoFullName string) (bool, string) {
if lastScrape, ok := LastScorecardScrapeTime.Load(repoFullName); ok {
- lastTime := lastScrape.(time.Time)
+ lastTime, ok := lastScrape.(time.Time)
+ if !ok {
+ return false, ""
+ }
timeSinceLastScrape := time.Since(lastTime)
if timeSinceLastScrape < ScorecardCacheTTL {
return true, fmt.Sprintf("last scraped %v ago (cache TTL: %v)", timeSinceLastScrape, ScorecardCacheTTL)
}
ctx.Logger.V(3).Infof("cache expired for %s (last scraped %v ago)", repoFullName, timeSinceLastScrape)
}
return false, ""
}
🤖 Prompt for AI Agents
In scrapers/openssf/scorecard.go around lines 74 to 84, the code uses an unsafe
type assertion lastScrape.(time.Time) which can panic if the stored value is not
a time.Time; replace it with a safe comma-ok type assertion (e.g., lastTime, ok
:= lastScrape.(time.Time)) and handle the false case by logging a warning
(ctx.Logger.V(2) or similar), removing or ignoring the invalid cache entry, and
proceeding as if there was no last scrape (return false, "") so the function
never panics on bad data.
func scorecardCheckNameToKebab(name string) string {
	return strings.ToLower(strings.ReplaceAll(name, "-", "-"))
}
No-op function: scorecardCheckNameToKebab does nothing meaningful.
This function replaces "-" with "-", which has no effect. If the intent is to convert to kebab-case, the logic needs to be corrected. If this function is not needed, consider removing it.
If the intent is to normalize check names (e.g., spaces or underscores to hyphens):
func scorecardCheckNameToKebab(name string) string {
- return strings.ToLower(strings.ReplaceAll(name, "-", "-"))
+ return strings.ToLower(strings.ReplaceAll(name, " ", "-"))
 }

Alternatively, if this function is unused, remove it entirely.
func (e Extract) postProcess(ctx api.ScrapeContext, results v1.ScrapeResults) (v1.ScrapeResults, error) {

	if !e.Config.Transform.Masks.IsEmpty() {
		results, err := e.applyMask(results)
		if err != nil {
			return results, fmt.Errorf("e.applyMask(); %w", err)
		}
	}
Bug: applyMask result is shadowed and discarded.
The results variable from applyMask shadows the function parameter, and any error from applyMask causes the function to return the shadowed (potentially empty) results instead of the original. Additionally, if there's no error, the masked results are discarded because the shadowed variable goes out of scope.
Apply this diff to fix the shadowing issue:
func (e Extract) postProcess(ctx api.ScrapeContext, results v1.ScrapeResults) (v1.ScrapeResults, error) {
if !e.Config.Transform.Masks.IsEmpty() {
- results, err := e.applyMask(results)
+ var err error
+ results, err = e.applyMask(results)
if err != nil {
return results, fmt.Errorf("e.applyMask(); %w", err)
}
}
🤖 Prompt for AI Agents
In scrapers/processors/json.go around lines 559-566, the results from
e.applyMask are being captured with a short declaration which shadows the
function parameter and causes masked results to be discarded or an incorrect
value to be returned on error; replace the short declaration with an assignment
to the existing results variable (e.g. results, err = e.applyMask(results)) so
the function parameter is updated with masked data and on error return the
current results and wrapped error as before.
if logger.V(5).Enabled() {
	logger.Infof("Failed to unmarshal script output: %v\n%s\n", err, s)
}
Critical: Potential secrets/PII leakage in debug logs.
Logging the full string s (script output) poses a critical security risk. This output may contain sensitive data such as API keys, tokens, passwords, or PII from the configuration or script results. Even debug-level logs can be exposed through log aggregation systems, CI/CD outputs, or monitoring platforms.
Given the PR context involves GitHub security scrapers, this could leak GitHub tokens or other credentials.
Apply this diff to avoid logging potentially sensitive data:
if err := json.Unmarshal([]byte(s), &results); err != nil {
- if logger.V(5).Enabled() {
- logger.Infof("Failed to unmarshal script output: %v\n%s\n", err, s)
- }
+ if logger.V(5).Enabled() {
+ logger.Infof("Failed to unmarshal script output: %v (output length: %d bytes)", err, len(s))
+ }
return nil, err
 }

Alternatively, if the output must be logged for debugging, implement sanitization to redact sensitive patterns (tokens, keys, passwords) before logging.
🤖 Prompt for AI Agents
In scrapers/processors/script.go around lines 34-36, avoid logging the full
script output string `s` (which may contain secrets/PII); instead, modify the
logger call to not include `s` directly — either log only the error and a safe
indicator (e.g., length or checksum) or log a sanitized/truncated/redacted
version of `s` after applying pattern-based redaction for tokens/keys/passwords;
keep the logger.V(5).Enabled() guard, and ensure any sanitization function is
applied before building the log message so no raw sensitive content is emitted.
Summary by CodeRabbit
New Features
Bug Fixes
Improvements