
Removed WeakReference<IHosts> already cleaned up by GC #4995

Merged — 2 commits merged into dotnet:master on Apr 4, 2020

Conversation

mstfbl
Contributor

@mstfbl mstfbl commented Apr 1, 2020

Fixes #4914

WeakReference objects are stored in _children. Occasionally the garbage collector collects the targets of these WeakReferences, setting their target values to null, but the WeakReference entries themselves are never removed from the _children list. This PR explicitly removes those WeakReference objects whose targets are null.

List.RemoveAll() has a time complexity of O(n), where n is the number of elements in the list.
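The pattern described above can be sketched in Python's `weakref` module (the actual change is in C#; class and method names here are illustrative, and the lock reflects the multithreading concern raised later in this thread):

```python
import threading
import weakref

class Host:
    """Stand-in for an ML.NET IHost; purely illustrative."""
    pass

class HostRegistry:
    """Keeps weak references to child hosts and prunes dead entries
    on each registration, mirroring the pattern described in this PR."""
    def __init__(self):
        self._lock = threading.Lock()  # guards _children against concurrent Register calls
        self._children = []            # list of weakref.ref objects

    def register(self, host):
        with self._lock:
            # Drop entries whose targets were already collected: an O(n) scan,
            # analogous to List<T>.RemoveAll in the C# change.
            self._children = [r for r in self._children if r() is not None]
            self._children.append(weakref.ref(host))

    def live_count(self):
        with self._lock:
            return sum(1 for r in self._children if r() is not None)
```

In CPython, dropping the last strong reference to a host clears its weak reference immediately, so the next `register` call prunes the dead entry.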

@mstfbl mstfbl requested a review from a team as a code owner April 1, 2020 21:58
@antoniovs1029
Member

In general, I guess it's OK and that it will work, but I am not 100% sure about having to do this every time a new IHost is registered. I wonder if there's any performance impact because of this — probably not, even though Register is called several times throughout the codebase. On the other hand, after quickly looking into the code, I don't know where else we could do this.

Since I am not familiar enough with IHosts, I don't know whether we could add a mechanism somewhere so that, when we're about to stop using an IHost, it can report to its "parent" that it's safe to remove it from the _children list.
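The alternative floated here — children deregistering themselves instead of the parent scanning for dead entries — could look roughly like this (a hypothetical Python sketch, not anything in the ML.NET codebase; all names are invented):

```python
import weakref

class ParentHost:
    """Hypothetical alternative to periodic pruning: children deregister
    themselves from the parent when they are done."""
    def __init__(self):
        self._children = []

    def register(self, child):
        ref = weakref.ref(child)
        self._children.append(ref)
        # Hand the child a callback it can invoke when it is finished.
        child._on_done = lambda: self.deregister(ref)
        return child

    def deregister(self, ref):
        try:
            self._children.remove(ref)
        except ValueError:
            pass  # already removed

class ChildHost:
    def close(self):
        # Explicit "I'm done" signal back to the parent, removing the need
        # to scan for dead WeakReferences later.
        if getattr(self, "_on_done", None):
            self._on_done()
            self._on_done = None
```

The trade-off is that every consumer must remember to call `close()`, whereas the prune-on-register approach in this PR needs no cooperation from callers.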

Wonder if @eerhardt has any thoughts on this, since he has recently encountered this issue.

@antoniovs1029
Member

Thanks for addressing my comment, and @eerhardt's comment on multithreading.

I wonder — did you find any performance difference when running this on the original issue's scenario?

@mstfbl
Contributor Author

mstfbl commented Apr 2, 2020

@antoniovs1029 No worries! I did not see any performance difference when running this on master vs. on my own branch with these changes. I am currently running performance benchmarks comparing the effect of my changes, and will share the results here when I'm done.

@mstfbl mstfbl mentioned this pull request Apr 2, 2020
@mstfbl
Contributor Author

mstfbl commented Apr 3, 2020

I ran the ImageClassificationBench benchmark in Microsoft.ML.Benchmarks, following the directions here, to compare the current master branch with the changes in this PR. I ran this benchmark 4-5 times with no other background processes running, and the figures below were roughly unchanged across runs.

Without changes:

| Method | Mean | Error | StdDev | Extra Metric | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| TrainResnetV250 | 51.22 s | 46.27 s | 2.536 s | - | 80000.0000 | 34000.0000 | 12000.0000 | 2.01 GB |

With changes:

| Method | Mean | Error | StdDev | Extra Metric | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| TrainResnetV250 | 52.44 s | 78.95 s | 4.327 s | - | 80000.0000 | 34000.0000 | 12000.0000 | 2.01 GB |

Explanation of Legends:

```
// * Legends *
  Mean         : Arithmetic mean of all measurements
  Error        : Half of 99.9% confidence interval
  StdDev       : Standard deviation of all measurements
  Extra Metric : Value of the provided extra metric
  Gen 0        : GC Generation 0 collects per 1000 operations
  Gen 1        : GC Generation 1 collects per 1000 operations
  Gen 2        : GC Generation 2 collects per 1000 operations
  Allocated    : Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)
  1 s          : 1 Second (1 sec)
```

The results show no significant change in performance, so the changes in this PR do not add a significant performance penalty.
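Since BenchmarkDotNet reports Error as half the width of the 99.9% confidence interval, a crude way to sanity-check a comparison like the one above is to test whether the two intervals overlap (this is only a rough heuristic, not a proper statistical test):

```python
def intervals_overlap(mean_a, err_a, mean_b, err_b):
    """Rough check: do the reported confidence intervals overlap?
    BenchmarkDotNet's Error column is half the 99.9% CI width, so the
    interval for each run is [mean - err, mean + err]. Overlapping
    intervals suggest the observed difference may be noise."""
    lo_a, hi_a = mean_a - err_a, mean_a + err_a
    lo_b, hi_b = mean_b - err_b, mean_b + err_b
    return lo_a <= hi_b and lo_b <= hi_a

# TrainResnetV250 figures from the tables above (seconds):
baseline = (51.22, 46.27)   # without changes
with_fix = (52.44, 78.95)   # with changes
```

Here the intervals overlap heavily, consistent with the conclusion that the difference is within measurement noise.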

@eerhardt
Member

eerhardt commented Apr 3, 2020

I wouldn't use ImageClassificationBench as a measurement here. Most of the time in that benchmark isn't spent in ML.NET code, but instead in TensorFlow.

Instead, I would run the following:

https://github.com/dotnet/machinelearning/blob/master/test/Microsoft.ML.Benchmarks/KMeansAndLogisticRegressionBench.cs
https://github.com/dotnet/machinelearning/blob/master/test/Microsoft.ML.Benchmarks/Text/MultiClassClassification.cs
https://github.com/dotnet/machinelearning/blob/master/test/Microsoft.ML.Benchmarks/PredictionEngineBench.cs

Those should give a pretty good indication of whether anything has regressed.

@mstfbl
Contributor Author

mstfbl commented Apr 3, 2020

@eerhardt I ran the KMeansAndLogisticRegressionBench and PredictionEngineBench benchmarks as well; the results are below. Again, I do not see any significant change in performance.

Without changes:
KMeansAndLogisticRegressionBench

| Method | Mean | Error | StdDev | Extra Metric |
|---|---|---|---|---|
| TrainKMeansAndLR | 43.01 ms | 4.002 ms | 4.609 ms | - |

PredictionEngineBench

| Method | Mean | Error | StdDev | Extra Metric |
|---|---|---|---|---|
| MakeIrisPredictions | 5.614 ms | 0.0925 ms | 0.0865 ms | - |
| MakeSentimentPredictions | 55.626 ms | 2.1345 ms | 2.4580 ms | - |
| MakeBreastCancerPredictions | 2.062 ms | 0.0916 ms | 0.1055 ms | - |

With changes:
KMeansAndLogisticRegressionBench

| Method | Mean | Error | StdDev | Extra Metric |
|---|---|---|---|---|
| TrainKMeansAndLR | 40.00 ms | 4.658 ms | 5.364 ms | - |

PredictionEngineBench

| Method | Mean | Error | StdDev | Extra Metric |
|---|---|---|---|---|
| MakeIrisPredictions | 5.688 ms | 0.1077 ms | 0.1197 ms | - |
| MakeSentimentPredictions | 60.780 ms | 0.3018 ms | 0.2356 ms | - |
| MakeBreastCancerPredictions | 2.218 ms | 0.0038 ms | 0.0033 ms | - |

@eerhardt
Member

eerhardt commented Apr 3, 2020

MakeSentimentPredictions seems to be outside of the error range. Basically it is 10% slower (55ms to 60ms). Same for MakeBreastCancerPredictions. If you run it multiple times, are these numbers consistent?

@mstfbl
Contributor Author

mstfbl commented Apr 3, 2020

@eerhardt I have attached an .xlsx file containing the results of 5 consecutive runs of PredictionEngineBench, both with and without the changes. (I attempted to share the full results directly in this markdown comment, but the tables don't turn out well.)

(https://github.com/dotnet/machinelearning/files/4430244/Benchmarks.Comparison.xlsx)

Here is a summary averaged over 5 runs:

- MakeIrisPredictions increased by 2%
- MakeSentimentPredictions increased by 1%
- MakeBreastCancerPredictions decreased by 2%

I first did 5 runs without the changes, then 5 runs with them. After each run I copied the results into a text editor. No other major applications were running in the background.

After multiple runs I still believe the run times with and without the changes are very similar, with slight increases on the changed branch in runs 2 and 4.

Edit: Replaced copied results with an .xlsx file that does a better job of succinctly displaying results. Thank you @harishsk for the heads up.
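The averaged percent changes quoted above can be computed with a small helper like the one below (the numbers here are illustrative placeholders, not the actual figures from the attached spreadsheet):

```python
def mean_percent_change(baseline_runs, changed_runs):
    """Average per-run percent change, ((changed - baseline) / baseline * 100),
    across paired benchmark runs; positive means the change made it slower."""
    deltas = [(c - b) / b * 100.0 for b, c in zip(baseline_runs, changed_runs)]
    return sum(deltas) / len(deltas)

# Illustrative numbers only -- the real per-run figures are in the .xlsx above.
baseline = [5.60, 5.62, 5.58, 5.61, 5.63]
changed  = [5.70, 5.73, 5.69, 5.72, 5.74]
```

Pairing run i of the baseline with run i of the changed branch, as done here, assumes the machine's load was comparable across the two series of runs.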

@mstfbl mstfbl merged commit 820f1e2 into dotnet:master Apr 4, 2020
@mstfbl mstfbl mentioned this pull request May 26, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Mar 18, 2022
Successfully merging this pull request may close these issues.

WeakReference<IHost> memory leak?
4 participants