-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support normalization of line endings #156
Conversation
Looks great, thanks, @ofek! Does this address newlines on boundaries, do you know? |
Yes |
Dumb question: what if the nth chunk ends with |
Yes and that case is tested (full coverage) |
Mechanically: Conceptually, I'm a little worried, since this means that these two things hash equivalently. If someone knows this, they can swap in one for the other.... (LMK if my thinking is off.) One of these cars doesn't work. This might be too low-level a place to put this code? (Fine for metadata, bad for binary data?) |
So, the motivation for this change follows the git usecase, in which windows uses \r\n line-endings to files, whereas *nix uses \n. This would happen, which is troublesome, but it's up to the applications to make sure the replacement only happens on text files, where |
I'll think about it a little more and look at the stacks this lives in this afternoon, but I think that normalizing data is something that should happen at a higher level, or, at the least, be non-default and well-documented. |
Our thinking is that in-toto (and, by extension, securesystemslib) should handle this, but we also do not think that this should happen automagically for everyone, which is why we are advocating for the flag approach. We do not think that in-toto integrators should be explicitly aware, by default, of "interesting" edge cases such as this. @ofek Please let me know if I misunderstood anything. |
It's a good point: yes, life should be easy for in-toto integrators and this is a special case we'd like to avoid them having to deal with. Ideally, there'd be somewhere in the in-toto stack itself, though, where we could do this normalization or at least where we could turn this on for metadata as a default-off option. I don't think we want to surprise people with subtle issues letting people easily validate bad targets, either. |
My bad: somehow managed to read that the default was to normalize. It's not, so I'm OK with this. :) I'll take one more look and I should be able to merge it shortly this afternoon.
|
as an additional option to hash.digest_fileobject() (default: do not normalize). Also adds test for the above. Also moves default chunk size variable when hashing (4096) to the top of the hash.py module as a global. Signed-off-by: Sebastien Awwad <sebastien.awwad@gmail.com>
per PEP 8 and lab guidelines: 4-space line continuation, 2-space indent Signed-off-by: Sebastien Awwad <sebastien.awwad@gmail.com>
in hash.digest_filename(). To avoid security issues. Signed-off-by: Sebastien Awwad <sebastien.awwad@gmail.com>
Please fill in the fields below to submit a pull request. The more information
that is provided, the better.
Fixes issue #:
No cross-platform support
Description of the changes being introduced by the pull request:
Please verify and check that the pull request fulfills the following
requirements: