Skip to content

Conversation

@bpindelski
Copy link

This PR introduces further changes in the ome.util.checksum package. It refactors tests for easier extensibility, deprecates methods in Utils.java in favour of commons-codec and improves the usability of the ChecksumProvider interface (now a fluent interface).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about building a mandatory reset in here, what if the caller wanted both the byte-array and string checksums? Or, if they wanted intermediate checksums so that in a long stream they can still tell that the prefix isn't corrupted even if the final checksum is wrong? (Not an objection -- I don't think we anticipate either use case -- so much as just thinking out loud.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd agree that this could be a surprise. I can see how since this method ends chaining, one might want to do the reset in a single go, but then I'd suggest either passing a boolean or a second method which makes the close explicit. We may want to hold off on those, though, until we have used the new API a bit more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtbc, @joshmoore I can see your point here. I was going along the line of what Guava did, where calling .hash() more than once on a Hasher object would yield "unspecified" results. I definitely can change the behaviour here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, you would need to cache the results, no? Then the question is what happens if some uses put* after that? If the underlying library doesn't support more than one call to hash then likely we can't (easily) either. In which case, we just need to make that clear that calling either checksumAsBytes or checksumAsString is the end of the line.

@mtbc
Copy link
Member

mtbc commented Mar 8, 2013

Good to merge.

@joshmoore
Copy link
Member

I'll review as well as soon as possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raw use of Hex.encodeHexString here was done before the addition of checksumAsString?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch. I'll check if I unintentionally left any other uses of Hex. Will correct in commit, thanks.

@joshmoore
Copy link
Member

Few minor comments. In general looks good. The use of hashFunction.newHasher makes things quite simple! As an aside, I can say that there have been issues before with file sizes roughly the size of the byte array and larger than 2GB (with int/long issues) so extra caution is needed.

@joshmoore
Copy link
Member

Have you seen File.hash -- http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/io/Files.html#hash(java.io.File, com.google.common.hash.HashFunction)

@bpindelski
Copy link
Author

@joshmoore Would it make sense to try and come up with an artificial test case using a big (~ 2 GB) file? I'll look again at the class field types related to storing the byte count... Answering the second question, yes, I've seen that class. I was considering using it in the hash-from-file scenario, but I'm not sure about the performance... The current way of reading the file seems fast compared to others (cf http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly)

@joshmoore
Copy link
Member

@joshmoore
Copy link
Member

Re: performance -- I would suspect that Google IO wouldn't be particular un-tested. Before adding our own 2GB+ tests, I'd vote for code re-use.

@bpindelski
Copy link
Author

@joshmoore Understood. Will refactor the putBytes(String) method to make use of Guava more. Will also update the Javadoc to warn the consumer about the reset() being called.

@joshmoore
Copy link
Member

See gh-851

@bpindelski
Copy link
Author

@joshmoore #851 can be closed now. I cherry-picked the missing commit from my branch.

@mtbc
Copy link
Member

mtbc commented Mar 12, 2013

Looks good to me for merging, okay with you @joshmoore?

@bpindelski
Copy link
Author

@joshmoore Please do not merge yet. I have one or two commits outstanding that will forbid calling putBytes if it has been already called for a file path. Since we have a good discussion going on here, I don't want to restart it in a new PR.

@bpindelski
Copy link
Author

@joshmoore Added logic for stopping the put*/checksumAs*/put* invocation chain. Still need to fix tests. Will do that AM tomorrow, so maybe please mark as excluded from build? Thanks.

@joshmoore
Copy link
Member

@bpindelski: ok to remove the exclude label?

Blazej Pindelski added 2 commits March 14, 2013 02:12
@bpindelski
Copy link
Author

@joshmoore I'm ok with removing the exclude label now. Thanks.

@joshmoore
Copy link
Member

Done

@joshmoore
Copy link
Member

Looks good & no complaints today. Merging.

joshmoore added a commit that referenced this pull request Mar 15, 2013
Second round of checksum refactoring
@joshmoore joshmoore merged commit 35517cb into ome:dev_4_4 Mar 15, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants