Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support normalization of line endings #156

Merged
merged 3 commits into from
Sep 27, 2018
Merged

Support normalization of line endings #156

merged 3 commits into from
Sep 27, 2018

Conversation

ofek
Copy link
Contributor

@ofek ofek commented Sep 26, 2018

Please fill in the fields below to submit a pull request. The more information
that is provided, the better.

Fixes issue #:

No cross-platform support

Description of the changes being introduced by the pull request:

Please verify and check that the pull request fulfills the following
requirements
:

  • The code follows the Code Style Guidelines
  • Tests have been added for the bug fix or new feature
  • Docs have been added for the bug fix or new feature

@trishankatdatadog
Copy link
Contributor

Looks great, thanks, @ofek! Does this address newlines on boundaries, do you know?

@ofek
Copy link
Contributor Author

ofek commented Sep 26, 2018

Yes

@trishankatdatadog
Copy link
Contributor

Dumb question: what if the nth chunk ends with \r and the (n+1)th chunk begins with \n and we're on Windows?

@ofek
Copy link
Contributor Author

ofek commented Sep 26, 2018

Yes and that case is tested (full coverage)

@awwad
Copy link
Contributor

awwad commented Sep 27, 2018

Mechanically:
At a quick glance, it looks right mechanically. Also, there are no resulting TUF or in-toto test failures I've seen (locally, 2.7 and 3.6 TUF, 2.7 in-toto). I doubt the tests encounter these cases (not sure if we should add some), but it means we didn't break anything, so that's good.

Conceptually, I'm a little worried, since this means that these two things hash equivalently. If someone knows this, they can swap in one for the other.... (LMK if my thinking is off.)

turtle
turtle-equivalent

One of these cars doesn't work. This might be too low-level a place to put this code? (Fine for metadata, bad for binary data?)

@SantiagoTorres
Copy link
Collaborator

So, the motivation for this change follows the git usecase, in which windows uses \r\n line-endings to files, whereas *nix uses \n. This would happen, which is troublesome, but it's up to the applications to make sure the replacement only happens on text files, where \n and \r\n have a distinctive usecase.

@awwad
Copy link
Contributor

awwad commented Sep 27, 2018

I'll think about it a little more and look at the stacks this lives in this afternoon, but I think that normalizing data is something that should happen at a higher level, or, at the least, be non-default and well-documented.

@trishankatdatadog
Copy link
Contributor

Our thinking is that in-toto (and, by extension, securesystemslib) should handle this, but we also do not think that this should happen automagically for everyone, which is why we are advocating for the flag approach. We do not think that in-toto integrators should be explicitly aware, by default, of "interesting" edge cases such as this.

@ofek Please let me know if I misunderstood anything.

@awwad
Copy link
Contributor

awwad commented Sep 27, 2018

It's a good point: yes, life should be easy for in-toto integrators and this is a special case we'd like to avoid them having to deal with. Ideally, there'd be somewhere in the in-toto stack itself, though, where we could do this normalization or at least where we could turn this on for metadata as a default-off option. I don't think we want to surprise people with subtle issues letting people easily validate bad targets, either.

@awwad
Copy link
Contributor

awwad commented Sep 27, 2018

My bad: somehow managed to read that the default was to normalize. It's not, so I'm OK with this. :) I'll take one more look and I should be able to merge it shortly this afternoon.

  • Squash no-comment commits into one commit with an expressive commit message
  • Deeper look
  • Local code doc / docstring explains the option
  • (EDIT: Nah.) SSL docs mention the option and indicate when to use it

ofek and others added 3 commits September 27, 2018 14:24
as an additional option to hash.digest_fileobject() (default:
do not normalize).

Also adds test for the above.

Also moves default chunk size variable when hashing (4096) to the
top of the hash.py module as a global.

Signed-off-by: Sebastien Awwad <sebastien.awwad@gmail.com>
per PEP 8 and lab guidelines: 4-space line continuation, 2-space
indent

Signed-off-by: Sebastien Awwad <sebastien.awwad@gmail.com>
in hash.digest_filename(). To avoid security issues.

Signed-off-by: Sebastien Awwad <sebastien.awwad@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants