Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed and Added unit tests for EnsureResourceAsync hanging issue #4943

Merged

Conversation

mstfbl
Copy link
Contributor

@mstfbl mstfbl commented Mar 13, 2020

Fixed EnsureResourceAsync by removing the helper function ResourceManagerUtils.CheckValidDownload and by using WebClientResponseUri (an extension of WebClient). Also added unit test for EnsureResourceAsync.

ResourceManagerUtils.EnsureResourceAsync(...) appends the relative URL of a resource to the $MlNetResourcesUrl environment variable (equal to "https://aka.ms/mlnet-resources/" by default).

private const string DefaultUrl = "https://aka.ms/mlnet-resources/";

For example, this occurs in the unit test WordEmbeddingsTest.TestWordEmbeddings(), where sentimend.emd is downloaded from https://aka.ms/mlnet-resources/Text/Sswe/sentiment.emd .

There is a possibility of this combined URL not correctly pointing to a resource, which is why this CheckValidDownload function exists:

private static string CheckValidDownload(IChannel ch, string filePath, string url, ref bool deleteNeeded)
{
// If the relative url does not exist, aka.ms redirects to www.microsoft.com. Make sure this did not happen.
// If the file is big then it is definitely not the redirect.
var info = new FileInfo(filePath);
if (info.Length > 4096)
return null;
string error = null;
using (var r = new StreamReader(filePath))
{
var text = r.ReadToEnd();
if (text.Contains("<head>") && text.Contains("<body>") && text.Contains("microsoft.com"))
error = $"The url '{url}' does not exist. Url was redirected to www.microsoft.com.";
}
deleteNeeded = error != null;
return error;
}

This function was not correctly warning that the HTML of www.microsoft.com was downloaded (instead of an intended data file), as the size of microsoft.com is bigger than 4 kilobytes, and the site should have the sub-strings <head and <body (not and ).

@mstfbl mstfbl requested a review from a team as a code owner March 13, 2020 23:33
@mstfbl mstfbl changed the title Fixed CheckValidDownload in ResourceManagerUtils.cs Fixed and Added unit tests for DownloadFromUrlWithRetryAsync hanging issue Mar 17, 2020
Copy link
Contributor

@harishsk harishsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🕐

@mstfbl mstfbl changed the title Fixed and Added unit tests for DownloadFromUrlWithRetryAsync hanging issue Fixed and Added unit tests for EnsureResourceAsync hanging issue Mar 24, 2020
@mstfbl
Copy link
Contributor Author

mstfbl commented Mar 24, 2020

I have added a check to ensure only resources downloaded from the "aka.ms" host can use EnsureResourceAsync(). I have also edited the unit test for EnsureResourceAsync() to reflect this.

Copy link
Contributor

@harishsk harishsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@mstfbl mstfbl merged commit 22c7ac8 into dotnet:master Mar 25, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Mar 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants