-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed & Reactivated AutoFitImageClassificationTrainTest hanging by freeing old Tensor objects #4978
Fixed & Reactivated AutoFitImageClassificationTrainTest hanging by freeing old Tensor objects #4978
Conversation
…Container in used cases
…cationModelParameters
I am now disposing of Tensor models and their C-library objects in |
6e64d60
to
8660ecc
Compare
…ificationTrainTest-memoryFix
src/Microsoft.ML.AutoML/API/ExperimentResults/CrossValidationExperimentResult.cs
Outdated
Show resolved
Hide resolved
…://github.com/mstfbl/machinelearning into AutoFitImageClassificationTrainTest-memoryFix
src/Microsoft.ML.AutoML/API/ExperimentResults/CrossValidationExperimentResult.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.ML.AutoML/API/ExperimentResults/CrossValidationExperimentResult.cs
Outdated
Show resolved
Hide resolved
…t .Dispose() calls
With the current implementation, modelFileInfo is never null as in the case where the given root directory is null, a temp file in a temp path is returned. In reply to: 614421392 [](ancestors = 614421392) Refers to: src/Microsoft.ML.AutoML/Experiment/Runners/RunnerUtil.cs:22 in 087c0d5. [](commit_id = 087c0d5, deletion_comment = False) |
src/Microsoft.ML.AutoML/API/ExperimentResults/ExperimentResult.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.ML.AutoML/API/ExperimentResults/ExperimentResult.cs
Outdated
Show resolved
Hide resolved
Yes, but that knowledge does not exist here in this function. modelFileInfo gets passed without any validation to ModelContainer. Tomorrow, someone else may make a code change without realizing that. In reply to: 614783589 [](ancestors = 614783589,614421392) Refers to: src/Microsoft.ML.AutoML/Experiment/Runners/RunnerUtil.cs:22 in 087c0d5. [](commit_id = 087c0d5, deletion_comment = False) |
…ificationTrainTest-memoryFix
Codecov Report
@@ Coverage Diff @@
## master #4978 +/- ##
==========================================
- Coverage 75.81% 73.32% -2.49%
==========================================
Files 993 1007 +14
Lines 181224 188128 +6904
Branches 19510 20246 +736
==========================================
+ Hits 137387 137937 +550
- Misses 38538 44665 +6127
- Partials 5299 5526 +227
|
AutoFitImageClassificationTrainTest
is occasionally hanging, even after PR #4939 . The issue here is that Tensor objects saved inITransformer model
(now renamed as 'estimatorModel`) are not being automatically freed by C#'s Garbage Collector, as these Tensor objects are made in TensorFlow's C libraries.Edited on 4/9/2020: This PR makes ExperimentResult and CrossValidationExperimentResult implements IDisposable to free remaining C Tensor objects in memory in a deterministic manner. This method of freeing these objects ensures the user will not face null-reference/use-after-free errors when trying to access the model, as this clean up is done after GC runs.
Edited on 4/9/2020: I confirmed that this fixes the hanging "out of memory" and/or "long running test" issues by running
AutoFitImageClassificationTrainTest
for 100 iterations, in addition to running all the other unit tests in this build. In all of these builds, none of the issues described occur. These builds all time-out because running 100 iterations ofAutoFitImageClassificationTrainTest
takes more than 1 hour.I have also reactivated the
AutoFitImageClassificationTrainTest
unit test with this fix.**Edit: **