-
Notifications
You must be signed in to change notification settings - Fork 103
Second round of checksum refactoring #827
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about building a mandatory reset in here, what if the caller wanted both the byte-array and string checksums? Or, if they wanted intermediate checksums so that in a long stream they can still tell that the prefix isn't corrupted even if the final checksum is wrong? (Not an objection -- I don't think we anticipate either use case -- so much as just thinking out loud.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd agree that this could be a surprise. I can see how since this method ends chaining, one might want to do the reset in a single go, but then I'd suggest either passing a boolean or a second method which makes the close explicit. We may want to hold off on those, though, until we have used the new API a bit more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mtbc, @joshmoore I can see your point here. I was going along the line of what Guava did, where calling .hash() more than once on a Hasher object would yield "unspecified" results. I definitely can change the behaviour here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, you would need to cache the results, no? Then the question is what happens if some uses put* after that? If the underlying library doesn't support more than one call to hash then likely we can't (easily) either. In which case, we just need to make that clear that calling either checksumAsBytes or checksumAsString is the end of the line.
|
Good to merge. |
|
I'll review as well as soon as possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raw use of Hex.encodeHexString here was done before the addition of checksumAsString?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good catch. I'll check if I unintentionally left any other uses of Hex. Will correct in commit, thanks.
|
Few minor comments. In general looks good. The use of |
|
Have you seen |
|
@joshmoore Would it make sense to try and come up with an artificial test case using a big (~ 2 GB) file? I'll look again at the class field types related to storing the byte count... Answering the second question, yes, I've seen that class. I was considering using it in the hash-from-file scenario, but I'm not sure about the performance... The current way of reading the file seems fast compared to others (cf http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly) |
|
Another javadoc reference to look at: http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/hash/HashCode.html#toString() |
|
Re: performance -- I would suspect that Google IO wouldn't be particular un-tested. Before adding our own 2GB+ tests, I'd vote for code re-use. |
|
@joshmoore Understood. Will refactor the |
|
See gh-851 |
|
@joshmoore #851 can be closed now. I cherry-picked the missing commit from my branch. |
|
Looks good to me for merging, okay with you @joshmoore? |
|
@joshmoore Please do not merge yet. I have one or two commits outstanding that will forbid calling |
|
@joshmoore Added logic for stopping the |
|
@bpindelski: ok to remove the |
Adjust putBytes for truncated byte buffers.
|
@joshmoore I'm ok with removing the |
|
Done |
|
Looks good & no complaints today. Merging. |
Second round of checksum refactoring
This PR introduces further changes in the ome.util.checksum package. It refactors tests for easier extensibility, deprecates methods in Utils.java in favour of commons-codec and improves the usability of the ChecksumProvider interface (now a fluent interface).