Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

diff everything on removal #367

Open
cboden opened this issue Oct 17, 2023 · 11 comments
Open

diff everything on removal #367

cboden opened this issue Oct 17, 2023 · 11 comments

Comments

@cboden
Copy link

cboden commented Oct 17, 2023

In our CI/CD we're running the command diff-quality --violations sqlfluff --compare-branch=master --fail-under=100 on version 7.7.0 on a branch that has only deleted/removed files. We have hundreds of files. Normally, if any file is modified or added, diff-quality only checks those files and takes several seconds. When the branch only has removals it's scanning everything and taking 40 minutes.

@Bachmann1234
Copy link
Owner

interesting, ill have to find some time to see if I can reproduce this and see whats up

@cboden
Copy link
Author

cboden commented Oct 18, 2023

An update: This issue is non-deterministic.

Scenario: We have a file that would fail a lint/violation check, but it shouldn't be scanned in our given scenario. I've confirmed via git diff --name-only master the file failing is not present, only the removed files are. My expectation is diff-quality would pass 0 files to the violations tool. (as I write this, I realize I'm making some assumptions on how the interaction between diff_cover and the sqlfluff plugin might work)

When attempting to replicate locally, my first attempt running diff-quality passed. I re-traced my steps to ensure I did the same thing as CI/CD and got it to fail on the file that wasn't modified. I then re-ran the exact same command listed in my first comment 5 times in succession. The result was 4 failures and 1 pass.

@cboden
Copy link
Author

cboden commented Oct 18, 2023

Upgraded to 8.0 with same result. Using Python 3.9.9.

@Bachmann1234
Copy link
Owner

Bachmann1234 commented Oct 19, 2023

So its not immediately obvious to me why diff-quality would be acting differently on removed files.

the basic algorithm is

run the quality tool command over the project

Run
'git -c diff.mnemonicprefix=no -c diff.noprefix=no diff --no-color --no-ext-diff -U0 origin/main...HEAD'

compare the lines reported the quality tool to the lines in the diff.

Now, I do think sql fluff is a bit different in that it runs over each and every file. https://github.com/sqlfluff/sqlfluff/blob/9579127595333037aa893adb4384d7612c32e0ed/src/sqlfluff/diff_quality_plugin.py#L63

@barrywhart do you have any insight on anything on this issue?

@cboden do you have an example repo we can use to debug this at all? Also the branch with the deleted files. Can you share if its a ton of deleted files? Or just s small number?

@barrywhart
Copy link
Contributor

No immediate insights, but I see that the SQLFluff plugin logs the SQLFluff command before running it. It may be helpful to look at that.

One random thought (no evidence, just brainstorming) -- is it possible it's running SQLFluff against one or more directory paths, hence processing all the files in that directory?

@barrywhart
Copy link
Contributor

Two more thoughts:

  • A sample repo would definitely be useful. I can't help but wonder if the user's repo or their git installation is corrupted somehow.
  • Maybe the core diff-quality command (not using SQLFluff plugin) has the same issue. As they say, "absence of evidence is not evidence of absence." From this code, it's clear that both branches (batch or non-batch handling) process src_paths_changed.

@cboden
Copy link
Author

cboden commented Nov 24, 2023

do you have an example repo we can use to debug this at all? Also the branch with the deleted files. Can you share if its a ton of deleted files? Or just s small number?

It's a small number and is happening on every PR we open like it. The latest PR had the following changes to a dbt repository:

  • 33 line .sql file removed
  • 15 lines removed from a medium sized .md file
  • 40 lines removed from a medium sized .yml file
  • 5 lines removed from a huge .yml file

@barrywhart
Copy link
Contributor

My guess is that this is not a bug; that the repo is corrupted somehow.

@cboden
Copy link
Author

cboden commented Nov 24, 2023

My guess is that this is not a bug; that the repo is corrupted somehow.

How can we test/confirm this? How would a corruption apply to diff_cover or sqlfluff?

@barrywhart
Copy link
Contributor

Run the same git command that diff-cover uses and see if it includes more files than it should.

@cboden
Copy link
Author

cboden commented Nov 26, 2023

I've confirmed via git diff --name-only master the file failing is not present, only the removed files are.

Is there another git command I should test?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants