Closed
Labels
- P2: Priority of the issue for triage purpose: Needs to be fixed at some point.
- enhancement: New feature or request
- perf: Performance and Benchmarking related
Description
Benchmarking with VTune has found several bottlenecks in the Factorization Machine training algorithm.
- Some phases of training are not parallelized. Consider adding parallel computation.
- Evaluate using AVX/AVX2 (C++ or C# intrinsics) in factorizationmachinenative.dll, which currently uses C++ SSE code.
- Consider optimizing the following hotspots:
| Function | Module | CPU Time |
|---|---|---|
| Microsoft::ML::Internal::Utilities::DoubleParser::TryParseCore | Microsoft.ML.Core.dll | 27.390s |
| CalculateGradientAndUpdateNative | factorizationmachinenative.dll | 22.609s |
| HelperImpl::FetchNextField | Microsoft.ML.Data.dll | 13.826s |
| CalculateIntermediateVariablesNative | factorizationmachinenative.dll | 12.201s |
| Microsoft::ML::Internal::Utilities::DoubleParser::TryParse | Microsoft.ML.Core.dll | 9.666s |
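To make the AVX suggestion above concrete, here is a minimal sketch of what widening an SSE loop to AVX could look like. It is not the code in factorizationmachinenative.dll: `DotScalar` and `DotSimd` are hypothetical names, and a plain dot product stands in for the real kernels purely for illustration. The sketch assumes a GCC/Clang-style toolchain that defines `__AVX__` or `__SSE__`, and it falls back to the scalar loop when neither macro is defined.

```cpp
#include <cstddef>
#if defined(__AVX__) || defined(__SSE__)
#include <immintrin.h>
#endif

// Hypothetical reference kernel: scalar dot product of two float arrays.
float DotScalar(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}

// Hypothetical SIMD variant: 8 floats per step with AVX, 4 with SSE,
// and a scalar loop for the tail (or for everything if neither is enabled).
float DotSimd(const float* a, const float* b, std::size_t n) {
    std::size_t i = 0;
    float sum = 0.0f;
#if defined(__AVX__)
    __m256 acc = _mm256_setzero_ps();
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);  // unaligned loads: no alignment assumed
        __m256 vb = _mm256_loadu_ps(b + i);
        acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb));
    }
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    for (float v : lanes) sum += v;          // horizontal sum of the 8 lanes
#elif defined(__SSE__)
    __m128 acc = _mm_setzero_ps();
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    for (float v : lanes) sum += v;          // horizontal sum of the 4 lanes
#endif
    for (; i < n; ++i) sum += a[i] * b[i];   // remaining elements
    return sum;
}
```

The SIMD path accumulates in a different order than the scalar loop, so results can differ in the last bits for general inputs. Whether the wider registers actually reduce the CPU time of `CalculateGradientAndUpdateNative` and `CalculateIntermediateVariablesNative` would have to be confirmed by re-running the VTune benchmark.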