-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix tensorflow test hanging issue #4997
Changes from all commits
1301378
3f2d6e8
0f8668f
62f73bb
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -127,14 +127,22 @@ private async Task<string> DownloadFromUrlWithRetryAsync(IHostEnvironment env, I | |
|
||
for (int i = 0; i < retryTimes; ++i) | ||
{ | ||
var thisDownloadResult = await DownloadFromUrlAsync(env, ch, url, fileName, timeout, filePath); | ||
try | ||
{ | ||
var thisDownloadResult = await DownloadFromUrlAsync(env, ch, url, fileName, timeout, filePath); | ||
|
||
if (string.IsNullOrEmpty(thisDownloadResult)) | ||
return thisDownloadResult; | ||
else | ||
downloadResult += thisDownloadResult + @"\n"; | ||
if (string.IsNullOrEmpty(thisDownloadResult)) | ||
return thisDownloadResult; | ||
else | ||
downloadResult += thisDownloadResult + @"\n"; | ||
|
||
await Task.Delay(10 * 1000); | ||
await Task.Delay(10 * 1000); | ||
} | ||
catch (Exception ex) | ||
{ | ||
// ignore any Exception and retrying download | ||
ch.Warning($"{i+1} - th try: Dowload {fileName} from {url} fail with exception {ex.Message}"); | ||
} | ||
} | ||
|
||
return downloadResult; | ||
|
@@ -257,6 +265,8 @@ private Exception DownloadResource(IHostEnvironment env, IChannel ch, WebClient | |
string tempPath = Path.GetFullPath(Path.Combine(Path.GetDirectoryName(path), "temp-resource-" + guid.ToString())); | ||
try | ||
{ | ||
int blockSize = 4096; | ||
|
||
using (var s = webClient.OpenRead(uri)) | ||
using (var fh = env.CreateOutputFile(tempPath)) | ||
using (var ws = fh.CreateWriteStream()) | ||
|
@@ -268,15 +278,24 @@ private Exception DownloadResource(IHostEnvironment env, IChannel ch, WebClient | |
size = 10000000; | ||
|
||
long printFreq = (long)(size / 10.0); | ||
var buffer = new byte[4096]; | ||
var buffer = new byte[blockSize]; | ||
long total = 0; | ||
int count; | ||
|
||
// REVIEW: use a progress channel instead. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since you need to check the result after the first call, you can change the loop to a do{} loop instead of a while() loop. If you still want to have a while loop, you can change the loop to a while(true) loop and move this inside the while loop. In either case you won't need to have these three lines repeated twice. |
||
while ((count = s.Read(buffer, 0, 4096)) > 0) | ||
while (true) | ||
{ | ||
var task = s.ReadAsync(buffer, 0, blockSize, ct); | ||
task.Wait(); | ||
int count = task.Result; | ||
|
||
if(count <= 0) | ||
{ | ||
break; | ||
} | ||
|
||
ws.Write(buffer, 0, count); | ||
total += count; | ||
if ((total - (total / printFreq) * printFreq) <= 4096) | ||
if ((total - (total / printFreq) * printFreq) <= blockSize) | ||
ch.Info($"{fileName}: Downloaded {total} bytes out of {size}"); | ||
if (ct.IsCancellationRequested) | ||
{ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have lots of trouble with failed downloads. Can we make the exceptions more visible?
That is, we should log all the failures as you have done. But if in the end the download still fails after the retries, how about throwing an exception that download failed even after retries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we indeed throw exception if retry still fails at line: https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.TensorFlow/TensorflowUtils.cs#L190
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this download code is used for things other than Tensorflow as well...right? I meant should we throw here? If not make we have to make sure we are throwing whereever this is called from to make sure that download failures are propagated as errors.
In reply to: 403162052 [](ancestors = 403162052)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I can see one other code is also using this. We already have this "downloadResult" returned if download failed after retry, and ResourceDownloadResults will treat returned "downloadResult" as error message and take care the DownloadResults at caller method.
see https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.Transforms/Text/WordEmbeddingsExtractor.cs#L631-L638 for the other caller of this ensure resource.