Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a testing case for a future check integrity flag #8633

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

fbidu
Copy link

@fbidu fbidu commented Jul 26, 2020

Hello all!

I was working on the EuroPython Packaging Sprint today and found issue #8579.

While the discussion is still young, I tried to come up with a failing test case for what the issue proposes.

Given this is my first attempt at working with pip code itself, I'm not at all familiar with the codebase and there are probably cleaner ways to achieve my goal. Also, the semantics of this new checking - a flag? a subcommand? - must be discussed. I simply guessed a check --integrity. What I tried to do was:

  1. Install a package
  2. Delete one of its files
  3. Run check --integrity
  4. Assert it returned an error

Feedback is very much appreciated!

Thanks

This commit adds a testing case for issue pypa#8579
Co-authored-by: Sviatoslav Sydorenko <wk.cvs.github@sydorenko.org.ua>
fbidu and others added 3 commits July 26, 2020 19:46
While the feature is not fully developed, this test is expected to fail

Co-authored-by: Sviatoslav Sydorenko <wk.cvs.github@sydorenko.org.ua>
Co-authored-by: Sviatoslav Sydorenko <wk.cvs.github@sydorenko.org.ua>
Co-authored-by: Sviatoslav Sydorenko <wk.cvs.github@sydorenko.org.ua>
@webknjaz
Copy link
Member

@fbidu for CIs to pass, you also need to import pytest at the top of the test module you're editing.

@fbidu
Copy link
Author

fbidu commented Aug 10, 2020

@fbidu for CIs to pass, you also need to import pytest at the top of the test module you're editing.

Oh, my bad. Fixed it now @webknjaz

@fbidu fbidu changed the title WIP: Add a testing case for a future check integrity flag Add a testing case for a future check integrity flag Aug 10, 2020

@pytest.mark.xfail(
reason='Not yet implemented: https://github.com/pypa/pip/pull/8633',
strict=True,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add raises= arg here indicating what sort of exception is currently expected to happen?

I guess it's some kind of a subprocess error, right?

reason='Not yet implemented: https://github.com/pypa/pip/pull/8633',
strict=True,
)
def test_check_integrity_errors_on_missing_files(data, script, tmpdir):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, this is a negative test. It's probably reasonable to add a positive test too (for when the integrity is fine)

@webknjaz
Copy link
Member

@pradyunsg WDYT?

@webknjaz
Copy link
Member

@fbidu I think you'll need to try implementing the feature because just a test case for something that does not exist won't get approvals from the pip maintainers.

@fbidu
Copy link
Author

fbidu commented Aug 12, 2020

@fbidu I think you'll need to try implementing the feature because just a test case for something that does not exist won't get approvals from the pip maintainers.

well, it makes sense. I'm a total newbie on this code base but I could give it a try!

I think it would be better to go back to #8579 and discuss it a little bit further because right now we don't have much definition about what we want. What do you think, @webknjaz @pradyunsg

@fbidu
Copy link
Author

fbidu commented Aug 12, 2020

I could even expand my very informal research on how other package managers handle this feature - if they handle at all - so that we can have more basis to discuss this

@pradyunsg
Copy link
Member

pradyunsg commented Aug 12, 2020

This test case looks great @fbidu!

However, as @webknjaz noted, I'm not sure adding in a test for a feature that hasn't been implemented I'm not keen on doing that given that our tests already take... an hour in CI. :)

I think looking at how other package managers handle this sort of thing (as well as, the issue about pip being able to overwrite files - on mobile, can't look it up right now) would be great.

As for whether we make this a subcommand or a flag, I'm not 100% sure myself. The easy answer is probably make it a flag (i.e. opt-in) because there's less possibility of disruption. OTOH, I think this won't be very disruptive, do making it the default with an "opt-out" would also work.

@webknjaz
Copy link
Member

I'd say just implement it with a flag and change if decided during the review stage. Swapping between flag vs command shouldn't be too much burden.

@pradyunsg
Copy link
Member

pradyunsg commented Aug 13, 2020

To be clear, I'm saying we'd pick one of (a) and (b), for "enabled / not enabled":

(a) pip check --file-integrity / pip check -- opt-in
(b) pip check / pip check --no-file-integrity -- opt-out

@pfmoore
Copy link
Member

pfmoore commented Aug 13, 2020

I'd prefer (a). I'd assume such a check would both check that files existed and confirm they matched the hashes in RECORD, so it would be costly and I'd rather that be opt-in.

If we want to only check file existence, then I'm less worried about performance, but I would say that it shouldn't be called "check integrity" as that implies (to me at least) a stronger check than we'd be doing.

@webknjaz
Copy link
Member

@pfmoore: how about pip check just looks up the files on disks and pip check --file-integrity also checks their hashes?

@pfmoore
Copy link
Member

pfmoore commented Aug 13, 2020

Short version: We have no use case for hash checking, and #8579 said the feature may want to be optional as it's potentially costly. So my view is we should have a flag to enable this check, but as I said above, I don't like the term "integrity" for it because it implies a hash check which we're not doing.

The bit below the break is what I wrote before I realised we'd digressed into talking about hash checks that no-one had asked for. Feel free to ignore it. Probably a good illustration of why we should be careful to focus on actual use cases 🙂


I don't have much use for pip check myself, so I'm not the best judge, but for me it would depend on cost. Does checking all of the installed files (even just existence check) have a significant overhead beyond the current pip check? #8579 suggests so, and states that it may be reasonable to make the new check optional as a result. If so, then we should be careful about adding that extra check by default, and without users being able to switch it off.

Basically, I don't think we should be making up answers here, we need proper use cases. The original use case for pip check was purely about dependencies, because the legacy resolver doesn't always install a consistent set of versions. That's still important, as a transition mechanism to allow people to ensure that their systems are valid when moving to the new resolver¹. #8579 flagged a new use case, checking for missing files. Nobody has actually pointed out a need for hash checking - my original comment was that I assumed it would check hashes, and if it didn't, then I didn't like the name --file-integrity because it felt like it should. Reading the background in more detail as part of writing this reply, I discovered that it probably wouldn't be doing hash checks. So I'll go with what @ssbarnea said in #8579 combined with my comment above - it should probably be optional, but I don't like the flag name "integrity" because it implies a better check than is actually being done.

¹ Even longer-term it may remain valid, because of flags like --no-deps

@fbidu
Copy link
Author

fbidu commented Aug 14, 2020

Hello all!

Well, as @pfmoore pointed out, the original issue asked for optional checking of the files that should be present. I think the optional bit is important, checking for all the files in a big package is probably going to be costly. That said, throwing a hash for integrity checking in the mix makes sense to me. This could be implemented as an yet another optional flag or as a 'second-pass' mechanism - first we check if all the files are present, halt if they are not and only if all the files are there, we hash them. But this is just something I made up right now.

As for checking how our package-manager neighbours behave is something I find it interesting and my help us gauge how - and if - this issue is handled by them. I did a short field test with Yarn, Cargo, Mix and NPM when I started this, I can now take a closer look at them and bring the results. I'm considering:

  1. Create a new project and add a dependency
  2. Removing a file, corrupting another
  3. Is there a command that will detected a removed or corrupted file?
  4. Is there a command that fixes it?
  5. Does the packaging/compiling/building step breaks?

What do you all think?

@webknjaz
Copy link
Member

@fbidu this sounds good. I haven't heard of Mix, though. Did you mean Nix?

@fbidu
Copy link
Author

fbidu commented Aug 15, 2020

@fbidu this sounds good. I haven't heard of Mix, though. Did you mean Nix?

It is Elixir's Mix. I have played around with Elixir for the last few months, building micro-services at my current company. It is a very interesting language and Mix provides a nice "developer ergonomics", if that's a thing

@Yuvraj786
Copy link

Why you guys didn't merge this pull request.

@fbidu
Copy link
Author

fbidu commented Jun 30, 2021

Why you guys didn't merge this pull request.

Because it is not complete in itself. I created this as a means to test a future implementation of the linked issue, but we decided it does not make much sense to add a test for a feature we are not implementing right now.

That said, @Yuvraj786, thanks for reminding me of this. This PR flew out of my attention but I'll get back to it

@github-actions github-actions bot added the needs rebase or merge PR has conflicts with current master label Nov 19, 2021
@webknjaz
Copy link
Member

@fbidu I think it might be a good idea to rebase this PR to see what's changed over time.

@@ -10,6 +10,7 @@ __pycache__/
*.eggs
*.egg-info/
MANIFEST
pip-wheel-metadata/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can drop this now, FWIW.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pip-wheel-metadata/

@webknjaz
Copy link
Member

@fbidu do you think you could rebase this effort?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs rebase or merge PR has conflicts with current master
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants