Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use resource manager to download meta files. Fixes #4234 #4314

Merged
merged 15 commits into from
Oct 28, 2019

Conversation

ashbhandare
Copy link
Contributor

@ashbhandare ashbhandare commented Oct 8, 2019

Previously, when the ImageClassification pipeline is run for the first time, the meta graph of the model (ResnetV2101 or InceptionV3) is downloaded, and in the subsequent runs, it is reused. If the run is interrupted while the download is in progress(eg.: by stopping), the protobuff is partially downloaded. This throws an error when this incomplete graph is attempted to be read in the subsequent runs.

This changes uses resource manager to download the meta file. Resource Manager downloads the file to a temporary file path(GUID) and then atomically moves the temporary file path to the actual path. Hence once a file path is created it is guaranteed to have all the contents (unless the user or someone goes and manipulates the file)

Fixes #4234

@ashbhandare ashbhandare requested a review from a team as a code owner October 8, 2019 21:06
@ashbhandare ashbhandare changed the title WIP: Add functionality for re-downloading incomplete/corrupt pre-trained model files. Fixes #4234 WIP: Add functionality for re-downloading incomplete pre-trained model files. Fixes #4234 Oct 8, 2019
@codecov
Copy link

codecov bot commented Oct 9, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@46f9a1f). Click here to learn what that means.
The diff coverage is 57.69%.

@@            Coverage Diff            @@
##             master    #4314   +/-   ##
=========================================
  Coverage          ?   74.59%           
=========================================
  Files             ?      878           
  Lines             ?   154279           
  Branches          ?    16876           
=========================================
  Hits              ?   115078           
  Misses            ?    34448           
  Partials          ?     4753
Flag Coverage Δ
#Debug 74.59% <57.69%> (?)
#production 70.18% <57.69%> (?)
#test 89.53% <ø> (?)
Impacted Files Coverage Δ
src/Microsoft.ML.Dnn/DnnCatalog.cs 82.81% <25%> (ø)
src/Microsoft.ML.Dnn/DnnUtils.cs 67.21% <85.71%> (ø)
#Resolved

@@ -89,6 +90,24 @@ internal static Session LoadTFSession(IExceptionContext ectx, byte[] modelBytes,
}
return new Session(graph);
}
internal static void MaybeDownloadFile(Uri address, string fileName)
{
using (WebClient client = new WebClient())
Copy link
Member

@eerhardt eerhardt Oct 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the remarks on https://docs.microsoft.com/en-us/dotnet/api/system.net.webclient

Important
We don't recommend that you use the WebClient class for new development. Instead, use the System.Net.Http.HttpClient class. #Resolved

@ashbhandare ashbhandare changed the title WIP: Add functionality for re-downloading incomplete pre-trained model files. Fixes #4234 Add functionality for re-downloading incomplete pre-trained model files. Fixes #4234 Oct 10, 2019
File.Delete(fileName);
var response = client.GetAsync(address).Result;
using FileStream fileStream = new FileStream(fileName, FileMode.Create, FileAccess.Write, FileShare.None);
using Stream contentStream = response.Content.ReadAsStreamAsync().Result;
Copy link
Member

@codemzs codemzs Oct 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using Stream contentStream = response.Content.ReadAsStreamAsync().Result; [](start = 28, length = 73)

align this line with 110, this is the pattern we follow, this isn't some loop :) #Resolved

{
var response = client.GetAsync(address).Result;
using FileStream fileStream = new FileStream(fileName, FileMode.Create, FileAccess.Write, FileShare.None);
using Stream contentStream = response.Content.ReadAsStreamAsync().Result;
Copy link
Member

@codemzs codemzs Oct 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using Stream contentStream = response.Content.ReadAsStreamAsync().Result; [](start = 24, length = 73)

align this line with 118 #Resolved

// Just by changing/selecting InceptionV3 here instead of
// ResnetV2101 you can try a different architecture/pre-trained
// model.
arch: ImageClassificationEstimator.Architecture.ResnetV2101,
Copy link
Member

@codemzs codemzs Oct 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ImageClassificationEstimator.Architecture.ResnetV2101, [](start = 25, length = 55)

please add another test for InceptionV3, also there is a way to run test twice using "theory" with different arguments. See https://stackoverflow.com/questions/22373258/difference-between-fact-and-theory-xunit-net #Resolved

@codemzs
Copy link
Member

codemzs commented Oct 14, 2019

@ashbhandare Change looks great, just some minor comments and thanks for adding this tests that covers the scenario user was hitting even though it is a negative test but important one! #Resolved

@codemzs
Copy link
Member

codemzs commented Oct 17, 2019

@ashbhandare Seems like the error is happening because of the new test you added for InceptionV3. Please note inception V3 downloads meta file and also checkpoint files. Have you figured out why this test is failing?

[xUnit.net 00:06:50.17] Microsoft.ML.Scenarios.TensorFlowScenariosTests.TensorflowRedownloadModelFile(arch: InceptionV3) [FAIL]
X Microsoft.ML.Scenarios.TensorFlowScenariosTests.TensorflowRedownloadModelFile(arch: InceptionV3) [10s 5ms]
Error Message:
System.AccessViolationException : Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
Stack Trace:
at Tensorflow.c_api.TF_TensorData(IntPtr tensor)

@ashbhandare
Copy link
Contributor Author

This error cannot be consistently reproduced. I am working on finding the cause.


In reply to: 543197416 [](ancestors = 543197416)

Copy link
Member

@codemzs codemzs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@codemzs
Copy link
Member

codemzs commented Oct 27, 2019

@ashbhandare is your branch synced to master? It seems some builds are failing because it cannot find LoadImages with 4 arguments.

@codemzs
Copy link
Member

codemzs commented Oct 27, 2019

@ashbhandare I have pushed commits into your branch to sync it with master, resolve build errors and fix some issues around disposing objects and waiting for task to finish. Just giving you heads up ...

@codemzs codemzs changed the title Add functionality for re-downloading incomplete pre-trained model files. Fixes #4234 Use resource manager to download meta files. Fixes #4234 Oct 28, 2019
@codemzs
Copy link
Member

codemzs commented Oct 28, 2019

@ashbhandare I realized we have a resource manager util function that is designed for this kind of scenario. It downloads the file to a temporary file path and then moves the file to the actual file path atomically (if the download was successful) hence when a file path is created it is guaranteed to have all the contents. I have updated the code to use that util function and also created short urls for all our meta files, this is useful incase we wanted to change the directed URL down the road without having to change the code.

@codemzs codemzs merged commit dfa7a52 into dotnet:master Oct 28, 2019
}

DownloadIfNeeded(env, @"tfhub_modules.zip",currentDirectory,@"tfhub_modules.zip",timeout);
if (!Directory.Exists(@"tfhub_modules"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why download the .zip file, but then decide afterwards not to use it?

Shouldn't you check if (!Directory.Exists(@"tfhub_modules")) before downloaded the .zip file?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea ok 👍🏽 i’ll fix that

@ashbhandare ashbhandare deleted the Issue4234 branch October 28, 2019 18:26
frank-dong-ms-zz pushed a commit to frank-dong-ms-zz/machinelearning that referenced this pull request Nov 4, 2019
…t#4314)

* Add MaybeDownloadFile(), where check to redownload existing  file if file is not of expected size.

* Changed WebClient to HttpClient, renamed function to DownloadIfNeeded.

* Added unit test.

* Fixed newline after test

* Removed asynchronous copy

* added test for InceptionV3, fixed formatting.

* Modify to call DownloadIfNeeded

* fixed unit test, minor formatting

* fix test and change after rebase

* sync to master and use LoadRawImagesBytes instead of LoadImages.

* Dispose HttpClient object and wait for task for finish.

* clean up.

* Use resource manager to download meta files.

* remove test.

* remove unused namespaces.
frank-dong-ms-zz pushed a commit to frank-dong-ms-zz/machinelearning that referenced this pull request Nov 4, 2019
…t#4314)

* Add MaybeDownloadFile(), where check to redownload existing  file if file is not of expected size.

* Changed WebClient to HttpClient, renamed function to DownloadIfNeeded.

* Added unit test.

* Fixed newline after test

* Removed asynchronous copy

* added test for InceptionV3, fixed formatting.

* Modify to call DownloadIfNeeded

* fixed unit test, minor formatting

* fix test and change after rebase

* sync to master and use LoadRawImagesBytes instead of LoadImages.

* Dispose HttpClient object and wait for task for finish.

* clean up.

* Use resource manager to download meta files.

* remove test.

* remove unused namespaces.
@ghost ghost locked as resolved and limited conversation to collaborators Mar 20, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Image Classification API] TensorFlow exception triggered: input ended unexpectedly in the middle of a field
3 participants