-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use resource manager to download meta files. Fixes #4234 #4314
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4314 +/- ##
=========================================
Coverage ? 74.59%
=========================================
Files ? 878
Lines ? 154279
Branches ? 16876
=========================================
Hits ? 115078
Misses ? 34448
Partials ? 4753
|
src/Microsoft.ML.Dnn/DnnUtils.cs
Outdated
@@ -89,6 +90,24 @@ internal static Session LoadTFSession(IExceptionContext ectx, byte[] modelBytes, | |||
} | |||
return new Session(graph); | |||
} | |||
internal static void MaybeDownloadFile(Uri address, string fileName) | |||
{ | |||
using (WebClient client = new WebClient()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the remarks on https://docs.microsoft.com/en-us/dotnet/api/system.net.webclient
Important
We don't recommend that you use the WebClient class for new development. Instead, use the System.Net.Http.HttpClient class. #Resolved
src/Microsoft.ML.Dnn/DnnUtils.cs
Outdated
File.Delete(fileName); | ||
var response = client.GetAsync(address).Result; | ||
using FileStream fileStream = new FileStream(fileName, FileMode.Create, FileAccess.Write, FileShare.None); | ||
using Stream contentStream = response.Content.ReadAsStreamAsync().Result; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using Stream contentStream = response.Content.ReadAsStreamAsync().Result; [](start = 28, length = 73)
align this line with 110, this is the pattern we follow, this isn't some loop :) #Resolved
src/Microsoft.ML.Dnn/DnnUtils.cs
Outdated
{ | ||
var response = client.GetAsync(address).Result; | ||
using FileStream fileStream = new FileStream(fileName, FileMode.Create, FileAccess.Write, FileShare.None); | ||
using Stream contentStream = response.Content.ReadAsStreamAsync().Result; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using Stream contentStream = response.Content.ReadAsStreamAsync().Result; [](start = 24, length = 73)
align this line with 118 #Resolved
// Just by changing/selecting InceptionV3 here instead of | ||
// ResnetV2101 you can try a different architecture/pre-trained | ||
// model. | ||
arch: ImageClassificationEstimator.Architecture.ResnetV2101, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ImageClassificationEstimator.Architecture.ResnetV2101, [](start = 25, length = 55)
please add another test for InceptionV3, also there is a way to run test twice using "theory" with different arguments. See https://stackoverflow.com/questions/22373258/difference-between-fact-and-theory-xunit-net #Resolved
@ashbhandare Change looks great, just some minor comments and thanks for adding this tests that covers the scenario user was hitting even though it is a negative test but important one! #Resolved |
@ashbhandare Seems like the error is happening because of the new test you added for InceptionV3. Please note inception V3 downloads meta file and also checkpoint files. Have you figured out why this test is failing? [xUnit.net 00:06:50.17] Microsoft.ML.Scenarios.TensorFlowScenariosTests.TensorflowRedownloadModelFile(arch: InceptionV3) [FAIL] |
This error cannot be consistently reproduced. I am working on finding the cause. In reply to: 543197416 [](ancestors = 543197416) |
d2e1dcb
to
78ebb74
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ashbhandare is your branch synced to master? It seems some builds are failing because it cannot find LoadImages with 4 arguments. |
@ashbhandare I have pushed commits into your branch to sync it with master, resolve build errors and fix some issues around disposing objects and waiting for task to finish. Just giving you heads up ... |
…ile is not of expected size.
@ashbhandare I realized we have a resource manager util function that is designed for this kind of scenario. It downloads the file to a temporary file path and then moves the file to the actual file path atomically (if the download was successful) hence when a file path is created it is guaranteed to have all the contents. I have updated the code to use that util function and also created short urls for all our meta files, this is useful incase we wanted to change the directed URL down the road without having to change the code. |
} | ||
|
||
DownloadIfNeeded(env, @"tfhub_modules.zip",currentDirectory,@"tfhub_modules.zip",timeout); | ||
if (!Directory.Exists(@"tfhub_modules")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why download the .zip file, but then decide afterwards not to use it?
Shouldn't you check if (!Directory.Exists(@"tfhub_modules"))
before downloaded the .zip file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea ok 👍🏽 i’ll fix that
…t#4314) * Add MaybeDownloadFile(), where check to redownload existing file if file is not of expected size. * Changed WebClient to HttpClient, renamed function to DownloadIfNeeded. * Added unit test. * Fixed newline after test * Removed asynchronous copy * added test for InceptionV3, fixed formatting. * Modify to call DownloadIfNeeded * fixed unit test, minor formatting * fix test and change after rebase * sync to master and use LoadRawImagesBytes instead of LoadImages. * Dispose HttpClient object and wait for task for finish. * clean up. * Use resource manager to download meta files. * remove test. * remove unused namespaces.
…t#4314) * Add MaybeDownloadFile(), where check to redownload existing file if file is not of expected size. * Changed WebClient to HttpClient, renamed function to DownloadIfNeeded. * Added unit test. * Fixed newline after test * Removed asynchronous copy * added test for InceptionV3, fixed formatting. * Modify to call DownloadIfNeeded * fixed unit test, minor formatting * fix test and change after rebase * sync to master and use LoadRawImagesBytes instead of LoadImages. * Dispose HttpClient object and wait for task for finish. * clean up. * Use resource manager to download meta files. * remove test. * remove unused namespaces.
Previously, when the ImageClassification pipeline is run for the first time, the meta graph of the model (ResnetV2101 or InceptionV3) is downloaded, and in the subsequent runs, it is reused. If the run is interrupted while the download is in progress(eg.: by stopping), the protobuff is partially downloaded. This throws an error when this incomplete graph is attempted to be read in the subsequent runs.
This changes uses resource manager to download the meta file. Resource Manager downloads the file to a temporary file path(GUID) and then atomically moves the temporary file path to the actual path. Hence once a file path is created it is guaranteed to have all the contents (unless the user or someone goes and manipulates the file)
Fixes #4234