-
Notifications
You must be signed in to change notification settings - Fork 5
Add pipdeptree integration for transitive dependency discovery #163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
When processing requirements.txt files, use pipdeptree to discover transitive dependencies that are not explicitly listed. This creates more complete SBOMs by including the full dependency tree. Features: - New _dependency_expansion module with Protocol + Registry pattern - PipdeptreeExpander discovers transitives from installed packages - Step 1.4 in CLI pipeline between generation and hash enrichment - Audit trail records each discovered dependency with parent info - Graceful skip when packages not installed in environment Closes #161 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds pipdeptree integration to discover and include transitive dependencies that are not explicitly listed in requirements.txt files, creating more complete SBOMs.
Changes:
- Implements a Protocol + Registry pattern for dependency expansion with
PipdeptreeExpanderas the first implementation - Adds Step 1.4 to the CLI pipeline between SBOM generation and hash enrichment
- Records discovered dependencies in the audit trail with parent package information
Reviewed changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_dependency_expansion.py | Comprehensive test suite for dependency expansion subsystem |
| sbomify_action/cli/main.py | Integrates Step 1.4 transitive dependency discovery into pipeline |
| sbomify_action/_dependency_expansion/registry.py | Registry for managing dependency expander instances |
| sbomify_action/_dependency_expansion/protocol.py | Protocol definition for dependency expander plugins |
| sbomify_action/_dependency_expansion/models.py | Data models for discovered dependencies and expansion results |
| sbomify_action/_dependency_expansion/expanders/pipdeptree.py | pipdeptree-based expander for Python dependencies |
| sbomify_action/_dependency_expansion/expanders/init.py | Expander implementations module exports |
| sbomify_action/_dependency_expansion/enricher.py | Orchestrates dependency expansion and SBOM enrichment |
| sbomify_action/_dependency_expansion/init.py | Module initialization and public API exports |
| pyproject.toml | Adds pipdeptree>=2.0.0 dependency |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Make ExpansionResult.source a required field (no default value) - Add supports_dependency_expansion() function using registry pattern - Replace hardcoded requirements.txt check with registry-based lookup - Update test to explicitly pass source parameter Refs #163 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 10 out of 11 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Change step_num type hints from int to int | float in console.py and main.py to accurately reflect substep usage (1.4, 1.5) - Add comment documenting the substep pattern (1.x = substeps of Step 1) Refs #163 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 11 out of 12 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Add comment explaining pipdeptree package_name vs key fallback - Document SPDX ID format assumptions and collision handling - Rename 'seen' to 'seen_package_versions' in test for clarity Refs #163 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
We require pipdeptree>=2.0.0, so just use the current field name (package_name) without fallback to older versions. Refs #163 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 11 out of 12 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Simplify version extraction condition (split always produces 2 parts) - Use hyphens instead of stripping separators for SPDX IDs - Add assertion to verify SBOM is unmodified when expansion skipped Refs #163 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 11 out of 12 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| bom = Bom.from_json(sbom_data) | ||
| original_count = len(bom.components) if bom.components else 0 | ||
|
|
||
| # Build set of existing PURLs for deduplication | ||
| existing_purls: set[str] = set() | ||
| if bom.components: | ||
| for comp in bom.components: | ||
| if comp.purl: | ||
| existing_purls.add(str(comp.purl).lower()) | ||
|
|
||
| # Add new components | ||
| added_count = 0 | ||
| audit = get_audit_trail() | ||
|
|
||
| for dep in discovered: | ||
| # Skip if already exists | ||
| if dep.purl.lower() in existing_purls: | ||
| continue | ||
|
|
||
| # Create new component | ||
| try: | ||
| purl_obj = PackageURL.from_string(dep.purl) | ||
| except Exception as e: | ||
| logger.warning(f"Invalid PURL {dep.purl}: {e}") | ||
| continue | ||
|
|
||
| component = Component( | ||
| type=ComponentType.LIBRARY, | ||
| name=dep.name, | ||
| version=dep.version, | ||
| purl=purl_obj, | ||
| ) | ||
|
|
||
| bom.components.add(component) | ||
| added_count += 1 |
Copilot
AI
Feb 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bom.components is treated as optional (if bom.components checks), but later bom.components.add(component) assumes it’s always initialized. If the SBOM has no components and Bom.from_json(...) yields components=None, this will raise an AttributeError. Initialize bom.components to an empty collection before calling .add(...) (or use the library’s supported way to ensure the components collection exists).
| # Skip options like -r, -e, --index-url, etc. | ||
| if line.startswith("-"): | ||
| continue | ||
|
|
||
| # Handle environment markers (e.g., requests; python_version >= "3.6") | ||
| if ";" in line: | ||
| line = line.split(";")[0].strip() | ||
|
|
||
| # Parse the requirement line | ||
| name, version = self._parse_requirement_line(line) | ||
| if name: | ||
| deps[name] = version | ||
|
|
||
| return deps |
Copilot
AI
Feb 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This parser will mis-handle valid requirements.txt entries that don’t start with - but also aren’t simple name[extras]specifier forms, such as PEP 508 direct references (pkg @ https://...), URL/VCS requirements (git+https://...), or local paths. Those can end up being passed into pipdeptree --packages as “names”, causing pipdeptree to return nothing or fail. Consider using packaging.requirements.Requirement to parse lines and either (a) extract .name while ignoring direct-reference location, or (b) explicitly skip unsupported requirement formats with a debug log.
| from ...tool_checks import check_tool_available | ||
| from ..models import DiscoveredDependency, normalize_python_package_name | ||
|
|
||
| _PIPDEPTREE_AVAILABLE, _PIPDEPTREE_PATH = check_tool_available("pipdeptree") |
Copilot
AI
Feb 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_PIPDEPTREE_PATH is computed but never used; both subprocess calls hardcode "pipdeptree". If check_tool_available returns a resolved path (or if PATH differs between checks and runtime), this can cause false positives/negatives. Use _PIPDEPTREE_PATH as the executable in both can_expand() and _run_pipdeptree() (or drop _PIPDEPTREE_PATH entirely if it’s not needed).
| _PIPDEPTREE_AVAILABLE, _PIPDEPTREE_PATH = check_tool_available("pipdeptree") | |
| _PIPDEPTREE_AVAILABLE, _ = check_tool_available("pipdeptree") |
| result = subprocess.run( | ||
| ["pipdeptree", "--json-tree", "--warn", "silence"], | ||
| capture_output=True, | ||
| text=True, | ||
| timeout=30, | ||
| ) |
Copilot
AI
Feb 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_PIPDEPTREE_PATH is computed but never used; both subprocess calls hardcode "pipdeptree". If check_tool_available returns a resolved path (or if PATH differs between checks and runtime), this can cause false positives/negatives. Use _PIPDEPTREE_PATH as the executable in both can_expand() and _run_pipdeptree() (or drop _PIPDEPTREE_PATH entirely if it’s not needed).
| packages: Comma-separated list of package names to filter to. | ||
| If None, returns all packages in the environment. | ||
| """ | ||
| cmd = ["pipdeptree", "--json-tree", "--warn", "silence"] |
Copilot
AI
Feb 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_PIPDEPTREE_PATH is computed but never used; both subprocess calls hardcode "pipdeptree". If check_tool_available returns a resolved path (or if PATH differs between checks and runtime), this can cause false positives/negatives. Use _PIPDEPTREE_PATH as the executable in both can_expand() and _run_pipdeptree() (or drop _PIPDEPTREE_PATH entirely if it’s not needed).
| cmd = ["pipdeptree", "--json-tree", "--warn", "silence"] | |
| cmd = [_PIPDEPTREE_PATH, "--json-tree", "--warn", "silence"] |
| logger.debug(f"Discovered {len(dependencies)} transitive dependencies") | ||
| return dependencies | ||
| except Exception as e: | ||
| logger.warning(f"Failed to expand {lock_file.name}: {e}") |
Copilot
AI
Feb 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When catching a broad exception during expansion, logger.warning(...) drops the traceback, which makes diagnosing failures harder (especially since failures are intentionally non-fatal). Prefer logger.exception(...) here (or logger.warning(..., exc_info=True)) to preserve stack traces in logs.
| logger.warning(f"Failed to expand {lock_file.name}: {e}") | |
| logger.warning(f"Failed to expand {lock_file.name}: {e}", exc_info=True) |
| def _enrich_cyclonedx( | ||
| self, | ||
| sbom_path: Path, | ||
| sbom_data: dict[str, Any], | ||
| discovered: list[DiscoveredDependency], | ||
| source: str, | ||
| ) -> ExpansionResult: | ||
| """Add discovered dependencies to CycloneDX SBOM.""" | ||
| bom = Bom.from_json(sbom_data) |
Copilot
AI
Feb 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR adds substantial new enrichment behavior (CycloneDX/SPDX mutation, deduplication by PURL/name+version, component/package insertion, SPDXID generation/collision handling), but the added tests mainly cover parsing/registry/skip paths. Add tests that exercise _enrich_cyclonedx and _enrich_spdx end-to-end (e.g., empty/no-components input, deduplication when SBOM already contains the PURL, and verifying audit trail entries are created).
When processing requirements.txt files, use pipdeptree to discover transitive dependencies that are not explicitly listed. This creates more complete SBOMs by including the full dependency tree.
Features:
Closes #161