-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use MD5 hashes when available #58
Conversation
Signed-off-by: Francesc Campoy <campoy@golang.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really know what's going on here but I like how it looks like.
Nice improvement @campoy ! I have tried runnning it with the corrupted (partial) and here is an output
I find it a bit confusing, although #57 should help, but still may be something can be improved here? Here is FS stat
|
} | ||
logrus.Warnf("could not check md5 hashes for %s, comparing timestamps instead: %v", name, err) | ||
|
||
localTime, err := dest.ModTime(name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wandering, what is the reason to use ModTime
in local FS?
As posted in #58 (comment) - it seems that modification time has been updated on my system for some reason (did not do anything with that file), although the index itself is old (and corrupted).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spent time educating myself about golang/go#3952, POSIX-compatibility and platform-dependent os.FileInfo.Sys()
, which is even supported in HDFS but seems not to provide a creation time (for the same reasons HADOOP-1377).
This makes me wonder, if modification time should work fine in this case, but it just gets updated somewhere, where it should not..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use the mod time only for file system types (or files) that do not support MD5 hashes.
The main goal is to abstract the behavior so we can add other file system backends easily later on.
|
||
remoteTime, err := source.ModTime(name) | ||
if err != nil { | ||
logrus.Warnf("could not check mod time in %s: %v", dest.Abs(name), err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if first-time user experience with pga get
will be optimal or these 2 logs would be a bit too noisy, what do you think?
As there are no files yet, it will result in #of-siva-files x 2
warnings printed in CLI and that might not be something expected or obviously-safe-to-ignore for the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point.
I think it's fine to show that we're downloading stuff, specially for slow connections, since otherwise it might seem like the program is stuck without doing anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it will result in #of-siva-files x 2
I think it's fine to show that we're
A bit confused, am I right, reading it that you mean that first-time user experience is fine as it is now?
@@ -66,11 +87,8 @@ func getIndex() (io.ReadCloser, error) { | |||
dest := localFS(filepath.Join(usr.HomeDir, ".pga")) | |||
source := urlFS(indexURL) | |||
|
|||
ok, err := copy(dest, source, indexName) | |||
if !ok { | |||
if err := updateCache(dest, source, indexName); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the experience running a workshop with people new to src-d stack, one common pitfall I noticed (2 out of 10 new users) is that after running pga
for the first time, users do not realize it's actually downloading an index, as there is no CLI feedback and it takes some time - and just perceive it as hang and do Ctrl+C
.
On the next run, pga list
does not work for them, as local index is corrupted.
Ofcourse having index md5 will catch that later in #57 , but while we are here, it could be easy to mitigate this type of failure by just printing some CLI output like "Index downloading in progress, please wait."
For sure, this could be done in subesequent PRs, but with the design proposed here updateCache
already has debug logging built-in but it is used for both, index and .siva files download, while the CLI output is desired only for the index file.
Wondering what would be the best way to do that? Pass isIndex
to updateCache
and conditionally logrus.Info from there or is there a better way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, that's exactly what the comment above does, right?
In case where the index is non existent or not up to date we warn that we're downloading a new version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, that's exactly what the comment above does, right?
only after #57
But for now I think it's still prone to the failure described above.
func copy(dest, source FileSystem, name string) (bool, error) { | ||
// updateCache checks whether a new version of the file in url exists and downloads it | ||
// to dest. It returns an error when it was not possible to update it. | ||
func updateCache(dest, source FileSystem, name string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is naming like copyMaybe
a thing in Golang? Might come handy explaining this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copyMaybe
doesn't sound idiomatic, I've never seen it before.
I think updateCache
is pretty clear, but I'm happy to modify it if you think it's not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Up to you! It's just a common naming convention in many programming languages that origins from functional programming, to signify a function that have conditional side-effects.
See examples in:
maybe_download_iris_data
in Tensorflow Pythonmaybe_download_dbpedia
in Tensorflow PythonresizeMeMaybe
in Guava Java
Quick search over stdlib shows some hits in both public and private APIs:
@bzz: regarding the failure when the index is corrupt there's probably some work to improve the usability. Thanks for the thorough review! |
I think #57 would avoid that. Suggestion in #58 (comment) was about adding user feedback on index file downloading. |
Fixes #36
Fixes #37
Signed-off-by: Francesc Campoy campoy@golang.org