Closed as not planned
Background
Following the successful implementation of standardized dependency diffing in #126 (see PR #3 on fork), the same normalization pattern can be extended to diff_pkg, diff_url, and diff_pkg_url.
The current pattern established for dependencies:
PM-specific data → normalize_*() → NormalizedPackage → diff_dependencies() → results
This issue proposes extending NormalizedPackage to include all package data, enabling a single normalization step that feeds all diff operations.
Current State
Each PM's diff.py contains ~80-100 lines of nearly identical logic:
- diff_pkg: Check if package exists in cache → return (pkg_id, new_pkg | None, update_payload | None)
- diff_url: Resolve URLs against cache/new_urls → return dict[UUID, UUID] (url_type_id → url_id)
- diff_pkg_url: Link packages to URLs → return (new_links, updates)
The logic is 90%+ identical across crates, homebrew, debian, and pkgx.
Proposed Approach
Extend NormalizedPackage (single dataclass)
Instead of creating separate dataclasses for each operation, extend the existing NormalizedPackage to hold all normalized data:

    @dataclass(frozen=True)
    class ParsedURL:
        url: str
        url_type_id: UUID

    @dataclass
    class NormalizedPackage:
        # Package identification
        identifier: str  # import_id
        derived_id: str
        name: str
        readme: str | None
        # URLs for this package
        urls: list[ParsedURL]
        # Dependencies (already implemented in #126)
        dependencies: list[ParsedDependency]

Shared diff functions in core/diff.py

    def diff_package(
        normalized: NormalizedPackage,
        cache: Cache,
        pm_id: UUID,
        now: datetime,
    ) -> tuple[UUID, Package | None, dict | None]:
        """Shared package diffing logic."""
        ...

    def diff_urls(
        urls: list[ParsedURL],
        cache: Cache,
        new_urls: dict[URLKey, URL],
        now: datetime,
    ) -> dict[UUID, UUID]:
        """Shared URL resolution logic."""
        ...

    def diff_package_urls(
        pkg_id: UUID,
        resolved_urls: dict[UUID, UUID],
        cache: Cache,
        now: datetime,
    ) -> tuple[list[PackageURL], list[dict]]:
        """Shared package-URL linking logic."""
        ...

PM normalizers become complete
Each PM's normalizer.py provides a single function that produces a complete NormalizedPackage:

    def normalize_crates_package(crate: Crate, config: Config) -> NormalizedPackage:
        """Convert Crate to complete NormalizedPackage with all fields."""
        ...

PM diff.py after refactor

    def diff_pkg(self, pkg: Crate) -> tuple[UUID, Package | None, dict | None]:
        normalized = normalize_crates_package(pkg, self.config)
        return diff_package(normalized, self.caches, self.config.pm_config.pm_id, self.now)

    def diff_url(self, pkg: Crate, new_urls: dict[URLKey, URL]) -> dict[UUID, UUID]:
        normalized = normalize_crates_package(pkg, self.config)
        return diff_urls(normalized.urls, self.caches, new_urls, self.now)

    def diff_pkg_url(self, pkg_id: UUID, resolved_urls: dict[UUID, UUID]) -> tuple[...]:
        return diff_package_urls(pkg_id, resolved_urls, self.caches, self.now)

Work Items
Core Infrastructure
- Add ParsedURL dataclass to core/diff.py
- Extend NormalizedPackage with derived_id, name, readme, urls fields
- Implement diff_package() in core/diff.py
- Implement diff_urls() in core/diff.py
- Implement diff_package_urls() in core/diff.py
- Add unit tests for shared functions
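
As a starting point for the diff_package() item, here is a minimal sketch of the three-way return described under Current State (new package, update payload, or neither). The `Package` and `Cache` classes, the `package_map` attribute, and the rule that only a changed readme triggers an update are assumptions made for illustration; they are not taken from the existing PM code and would need to match whatever the four diff.py files actually do today.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import UUID, uuid4


@dataclass
class Package:
    """Hypothetical stand-in for the real Package model."""
    id: UUID
    import_id: str
    derived_id: str
    name: str
    readme: str | None
    package_manager_id: UUID
    created_at: datetime
    updated_at: datetime


@dataclass
class Cache:
    """Hypothetical cache: import_id -> already-known Package."""
    package_map: dict[str, Package] = field(default_factory=dict)


@dataclass
class NormalizedPackage:
    """Trimmed to the fields this sketch needs."""
    identifier: str  # import_id
    derived_id: str
    name: str
    readme: str | None


def diff_package(
    normalized: NormalizedPackage,
    cache: Cache,
    pm_id: UUID,
    now: datetime,
) -> tuple[UUID, Package | None, dict | None]:
    existing = cache.package_map.get(normalized.identifier)

    # Case 1: unknown package, so build a new row and report no update.
    if existing is None:
        new_pkg = Package(
            id=uuid4(),
            import_id=normalized.identifier,
            derived_id=normalized.derived_id,
            name=normalized.name,
            readme=normalized.readme,
            package_manager_id=pm_id,
            created_at=now,
            updated_at=now,
        )
        return new_pkg.id, new_pkg, None

    # Case 2: known package whose readme changed (assumed update rule).
    if existing.readme != normalized.readme:
        payload = {"id": existing.id, "readme": normalized.readme, "updated_at": now}
        return existing.id, None, payload

    # Case 3: known package, nothing to do.
    return existing.id, None, None
```

A unit test for the shared function can then exercise each of the three cases with a hand-built cache, without involving any PM-specific fetcher or parser.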
Package Manager Refactors
- Update crates normalizer.py to produce complete NormalizedPackage
- Update crates diff.py to use shared functions
- Update homebrew normalizer.py to produce complete NormalizedPackage
- Update homebrew diff.py to use shared functions
- Update pkgx normalizer.py to produce complete NormalizedPackage
- Update pkgx diff.py to use shared functions
- Update debian normalizer.py to produce complete NormalizedPackage
- Update debian diff.py to use shared functions
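
In the refactored diff.py, both diff_pkg and diff_url normalize the same package, so each PM refactor may want a small memo in front of its normalizer. The sketch below is one possible shape; `NormalizationMemo` and its `key` argument are hypothetical names, and the memo assumes the package object does not change between the two calls.

```python
from __future__ import annotations

from typing import Callable, Generic, Hashable, TypeVar

P = TypeVar("P")  # PM-specific package type, e.g. Crate
N = TypeVar("N")  # normalized result, e.g. NormalizedPackage


class NormalizationMemo(Generic[P, N]):
    """Remembers normalized results so each package is normalized once."""

    def __init__(self, normalize: Callable[[P], N], key: Callable[[P], Hashable]):
        self._normalize = normalize
        self._key = key
        self._memo: dict[Hashable, N] = {}

    def get(self, pkg: P) -> N:
        k = self._key(pkg)
        if k not in self._memo:
            # First request for this package: do the real work.
            self._memo[k] = self._normalize(pkg)
        return self._memo[k]
```

A PM's diff class could hold one memo built from its own normalizer and call `memo.get(pkg)` in both diff_pkg and diff_url instead of calling the normalizer directly.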
Acceptance Criteria
- All existing tests pass
- Each PM's diff_pkg, diff_url, diff_pkg_url reduced to ~3-5 lines each
- Shared logic has comprehensive unit tests
- No behavioral changes (same output for same input)
- NormalizedPackage is the single source of truth for all diff operations
Notes
- diff_pkg_url is already nearly identical across all PMs, so it is the easiest to consolidate first
- diff_url mutates the new_urls dict passed in; this side effect should be preserved
- Some PMs have slight variations in URL generation (e.g., debian's _generate_chai_urls); normalizers handle this
- Consider caching the normalized package if multiple diff operations are called sequentially
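
To make the new_urls side effect concrete, here is a rough sketch of how a shared diff_urls() could preserve it. The shape of `URLKey` (a url plus url_type_id pair), the `URL` fields, and `Cache.url_map` are assumptions for illustration; only the signature and the url_type_id → url_id return mapping come from the proposal above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import NamedTuple
from uuid import UUID, uuid4


class URLKey(NamedTuple):
    """Assumed key shape: one URL string per URL type."""
    url: str
    url_type_id: UUID


@dataclass(frozen=True)
class ParsedURL:
    url: str
    url_type_id: UUID


@dataclass
class URL:
    """Hypothetical stand-in for the real URL model."""
    id: UUID
    url: str
    url_type_id: UUID
    created_at: datetime
    updated_at: datetime


@dataclass
class Cache:
    """Hypothetical cache of URLs that already exist in the database."""
    url_map: dict[URLKey, URL] = field(default_factory=dict)


def diff_urls(
    urls: list[ParsedURL],
    cache: Cache,
    new_urls: dict[URLKey, URL],
    now: datetime,
) -> dict[UUID, UUID]:
    resolved: dict[UUID, UUID] = {}
    for parsed in urls:
        key = URLKey(parsed.url, parsed.url_type_id)

        # Prefer a URL already in the database, then one created earlier in this run.
        url = cache.url_map.get(key)
        if url is None:
            url = new_urls.get(key)
        if url is None:
            url = URL(uuid4(), parsed.url, parsed.url_type_id, now, now)
            # Deliberate side effect: later packages in the same run reuse this URL.
            new_urls[key] = url

        # Keyed by url_type_id, so a later URL of the same type overwrites an earlier one.
        resolved[parsed.url_type_id] = url.id
    return resolved
```

Because the caller passes the same new_urls dict for every package, two packages that share a homepage resolve to one URL row rather than two. A unit test for the shared function should assert on the contents of new_urls after the call, not only on the return value.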