Make the DBT dependency optional #412
Description
Alternative title: Sudden dependency bloat in data-diff 0.4.0 and up
Overview of the problem
Version 0.3.2 of data-diff required 11 dependencies total, most of which are already very popular libraries, like "rich", "click", and "dsnparse".
Version 0.4.0 of data-diff introduced 66 new dependencies (!).
This is due to the introduction of the mandatory "dbt" dependency, even for users who don't plan to use the --dbt switch.
Implications of not fixing this
- Minimum load time for the tool increased from 0.165 seconds to 2.410 seconds (!!)
- Extra requirements might collide with the requirements of other Python libraries, making it harder for our non-dbt users to have data-diff installed alongside them.
- Users who would consider using data-diff as a lightweight tool might be put off by the large number of dependencies.
- The same applies to being included by default in package managers. For example, in Ubuntu you can run apt install python3-lark to install Lark. It would have been much harder to include it if it had many dependencies.
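For anyone who wants to reproduce the load-time comparison, here is a rough sketch of one way to measure it. It is not necessarily how the figures above were obtained. The helper name time_cold_import is made up for illustration, and the timing includes interpreter startup, so only the difference between two runs is meaningful.

```python
import subprocess
import sys
import time


def time_cold_import(module_name: str) -> float:
    """Return wall-clock seconds for a fresh interpreter to import module_name.

    Hypothetical helper: the measurement includes interpreter startup,
    so compare results between versions rather than reading them as absolutes.
    """
    start = time.perf_counter()
    # A new process guarantees nothing is already cached in sys.modules.
    subprocess.run([sys.executable, "-c", f"import {module_name}"], check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # "json" is just a stand-in; substitute the package you want to measure.
    print(f"{time_cold_import('json'):.3f} seconds")
```

Running python -X importtime -c "import <module>" gives a per-module breakdown if you need to see which dependency is responsible.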
Implications of fixing this
Users that don't already have dbt installed, and run --dbt, will see a message telling them to use pip install data-diff[dbt], which will install dbt.
However, there is absolutely no point in using the --dbt switch if you don't already have dbt installed and configured.
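A minimal sketch of how the proposed behavior could look, assuming the dbt integration is imported lazily only when --dbt is passed. The helper name load_optional and the exact message are hypothetical and not taken from the data-diff codebase.

```python
import importlib


def load_optional(module_name: str, extra: str, package: str = "data-diff"):
    """Import an optional dependency, or fail with an install hint.

    Hypothetical helper: `extra` is the name of the pip extra that
    would pull in `module_name`.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Point the user at the extra instead of showing a bare traceback.
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with: pip install '{package}[{extra}]'"
        ) from exc


# Usage sketch (assumption: only the --dbt code path would call this):
#     dbt_module = load_optional("dbt", extra="dbt")
```

With this pattern the import cost and the extra packages are only paid by users who actually ask for the dbt integration.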
Conclusion
I think there is no reason to keep dbt as a dependency, but very good reasons to make it optional.