emmyoop commented Dec 1, 2025

Problem

After fixing the core bug in #9104, the check_for_duplicate_packages method still had limitations in its duplicate detection logic:

  1. Naming inconsistencies: Hub packages often use underscores (e.g., dbt-labs/dbt_utils) while git repositories use hyphens (e.g., dbt-utils.git), causing the method to miss duplicates across different package sources.
  2. False positive matches: The substring matching was too loose, so any package name that merely appeared inside another name counted as a match. For example, adding dbt-core would incorrectly match and remove dbt-core-utils.

These issues made it difficult to reliably detect and remove duplicate packages when using dbt deps --add-package. Missed matches could leave the same package listed multiple times in packages.yml, and false positives could remove an unrelated package.
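To make the two failure modes concrete, here is a deliberately naive check. It is a hypothetical simplification, not the code that was in check_for_duplicate_packages: `naive_is_duplicate` is an illustrative name, and the sketch assumes the old logic boiled down to plain substring containment.

```python
def naive_is_duplicate(new_package: str, existing_entry: str) -> bool:
    """Hypothetical simplification: a match is plain substring containment."""
    return new_package in existing_entry


# False positive: "dbt-core" is contained in "dbt-core-utils".
print(naive_is_duplicate("dbt-core", "dbt-core-utils"))  # True
# Missed duplicate: the hub name uses "_", the git URL uses "-".
print(naive_is_duplicate("dbt-labs/dbt_utils", "https://github.com/dbt-labs/dbt-utils.git"))  # False
```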

Solution

Flexible Variant Matching

  • Generate multiple name variants to handle underscore/hyphen inconsistencies (e.g., dbt_utils ↔ dbt-utils)
  • Check both full package names (dbt-labs/dbt_utils) and short names (dbt_utils)
  • Enables cross-source duplicate detection: hub packages correctly identify and replace git packages with the same underlying package name
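A minimal sketch of the variant generation, assuming the variants are built by swapping underscores and hyphens in the short name (the part after the last /) and then recombining it with the original namespace. `name_variants` is a hypothetical helper; the PR may structure this differently.

```python
def name_variants(package_name: str) -> set[str]:
    """Return hypothetical match variants for a hub-style package name.

    Covers the full name (e.g. "dbt-labs/dbt_utils") and the short name after
    the last "/" (e.g. "dbt_utils"), each with underscores and hyphens swapped.
    """
    prefix, sep, short_name = package_name.rpartition("/")
    # Swap separators in the short name only; the namespace is left as-is.
    short_variants = {
        short_name,
        short_name.replace("_", "-"),
        short_name.replace("-", "_"),
    }
    variants = set(short_variants)
    if sep:  # a namespace was present, so also keep full-name variants
        variants |= {f"{prefix}/{v}" for v in short_variants}
    return variants


print(sorted(name_variants("dbt-labs/dbt_utils")))
# → ['dbt-labs/dbt-utils', 'dbt-labs/dbt_utils', 'dbt-utils', 'dbt_utils']
```

Keeping the short-name variants is what lets a hub name line up with a git URL, which only contains the repository name after the last /.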

Word Boundary Checking

  • Implemented custom word boundary validation using / and . as delimiters
  • Prevents false positives where package names are substrings of other packages
  • Example: dbt-core will not match dbt-core-utils since - is not a word boundary
  • Example: dbt-utils will correctly match dbt-utils.git since . is a word boundary
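The boundary rule can be sketched as follows. This assumes the start and end of the string also count as boundaries (otherwise a bare dbt-utils entry could never match itself); `matches_at_boundary` and `BOUNDARY_CHARS` are hypothetical names, not taken from the PR.

```python
BOUNDARY_CHARS = "/."  # delimiters treated as word boundaries


def matches_at_boundary(name: str, text: str) -> bool:
    """True if `name` occurs in `text` bounded by "/", "." or the string edges."""
    start = text.find(name)
    while start != -1:
        end = start + len(name)
        left_ok = start == 0 or text[start - 1] in BOUNDARY_CHARS
        right_ok = end == len(text) or text[end] in BOUNDARY_CHARS
        if left_ok and right_ok:
            return True
        # Keep scanning: a later occurrence may still be properly bounded.
        start = text.find(name, start + 1)
    return False


print(matches_at_boundary("dbt-core", "dbt-core-utils"))  # False
print(matches_at_boundary("dbt-utils", "https://github.com/dbt-labs/dbt-utils.git"))  # True
```

A custom check is needed because the regex `\b` anchor treats `-` as a boundary: `re.search(r"\bdbt-core\b", "dbt-core-utils")` matches, which is exactly the false positive this change avoids.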

Enhanced Docstring

  • Clarified that this method only runs during --add-package
  • Documented the cross-source matching behavior
  • Explained the underscore/hyphen variant handling
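Putting both pieces together, here is a hypothetical end-to-end sketch of the cross-source behavior: adding the hub package dbt-labs/dbt_utils removes an existing git entry for dbt-utils.git, while dbt-core does not remove dbt-core-utils. All helper names and the sample entries (versions, revisions, the dbt-core-utils package) are illustrative assumptions; only the `package` and `git` keys come from the packages.yml format.

```python
from typing import Any

BOUNDARY_CHARS = "/."  # delimiters treated as word boundaries


def name_variants(package_name: str) -> set[str]:
    """Full and short names, each with underscore/hyphen swaps (hypothetical helper)."""
    prefix, sep, short_name = package_name.rpartition("/")
    shorts = {short_name, short_name.replace("_", "-"), short_name.replace("-", "_")}
    return shorts | ({f"{prefix}/{s}" for s in shorts} if sep else set())


def matches_at_boundary(name: str, text: str) -> bool:
    """True if `name` occurs in `text` bounded by "/", "." or the string edges."""
    start = text.find(name)
    while start != -1:
        end = start + len(name)
        left_ok = start == 0 or text[start - 1] in BOUNDARY_CHARS
        right_ok = end == len(text) or text[end] in BOUNDARY_CHARS
        if left_ok and right_ok:
            return True
        start = text.find(name, start + 1)
    return False


def is_duplicate(new_package: str, entry: dict[str, Any]) -> bool:
    """Compare a new package name against a hub (`package`) or git (`git`) entry."""
    source = entry.get("package") or entry.get("git")
    if not isinstance(source, str):
        return False  # e.g. local entries are not compared in this sketch
    return any(matches_at_boundary(v, source) for v in name_variants(new_package))


def remove_duplicate_packages(
    new_package: str, packages: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Drop existing entries that refer to the same package as `new_package`.

    Hypothetical stand-in for the deduplication step that runs only during
    `dbt deps --add-package`; a hub name can replace a matching git entry.
    """
    return [entry for entry in packages if not is_duplicate(new_package, entry)]


packages = [
    {"git": "https://github.com/dbt-labs/dbt-utils.git", "revision": "1.1.1"},
    {"package": "dbt-labs/dbt-core-utils", "version": "0.1.0"},  # hypothetical package
]

# The hub name dbt-labs/dbt_utils matches the git entry via the "dbt-utils" variant.
print(remove_duplicate_packages("dbt-labs/dbt_utils", packages))
# → [{'package': 'dbt-labs/dbt-core-utils', 'version': '0.1.0'}]
# dbt-core does not match dbt-core-utils, because "-" is not a boundary.
print(remove_duplicate_packages("dbt-core", packages) == packages)  # True
```

Because short names are also compared, this sketch would treat packages with the same short name from different namespaces as duplicates; that follows from checking both full and short names as described above.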

Note: this branches off #12233 and depends on its changes, but I split it out because it is a separate fix.

Checklist

  • I have read the contributing guide and understand what's expected of me.
  • I have run this code in development, and it appears to resolve the stated issue.
  • This PR includes tests, or tests are not required or relevant for this PR.
  • This PR has no interface changes (e.g., macros, CLI, logs, JSON artifacts, config files, adapter interface, etc.) or this PR has already received feedback and approval from Product or DX.
  • This PR includes type annotations for new and modified functions.

emmyoop requested a review from a team as a code owner December 1, 2025 18:34
cla-bot added the cla:yes label Dec 1, 2025
codecov bot commented Dec 1, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.94%. Comparing base (c559848) to head (bb53e7b).
⚠️ Report is 7 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #12239   +/-   ##
=======================================
  Coverage   91.94%   91.94%           
=======================================
  Files         203      203           
  Lines       24965    24982   +17     
=======================================
+ Hits        22953    22969   +16     
- Misses       2012     2013    +1     
Flag Coverage Δ
integration 88.83% <100.00%> (+<0.01%) ⬆️
unit 65.21% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown.

Components Coverage Δ
Unit Tests 65.21% <100.00%> (+0.02%) ⬆️
Integration Tests 88.83% <100.00%> (+<0.01%) ⬆️

emmyoop changed the title from "Er/9104 part ii" to "Improve --add-package duplicate detection" Dec 1, 2025