Perf: Optimization for FactorizationMachine #3000

Benchmarking with VTune found several bottlenecks in the Factorization Machine training algorithm.

- Some phases of training are not parallelized. Consider adding parallel computation.
- Evaluate using AVX/AVX2 (C++ or C# intrinsics) in factorizationmachinenative.dll, which currently implements SSE code in C++.
- Consider optimizing the following hotspots:

Function | Module | CPU Time
-- | -- | --
Microsoft::ML::Internal::Utilities::DoubleParser::TryParseCore | Microsoft.ML.Core.dll | 27.390s
CalculateGradientAndUpdateNative | factorizationmachinenative.dll | 22.609s
HelperImpl::FetchNextField | Microsoft.ML.Data.dll | 13.826s
CalculateIntermediateVariablesNative | factorizationmachinenative.dll | 12.201s
Microsoft::ML::Internal::Utilities::DoubleParser::TryParse | Microsoft.ML.Core.dll | 9.666s
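The issue does not say which training phases run serially. As a generic sketch, assuming the serial phase is a per-example reduction (for example, summing loss over a validation pass), the C++ below splits the examples into contiguous chunks, one per `std::thread`, and combines per-thread partial sums at the end. `parallel_sum` is a hypothetical helper, not part of ML.NET or factorizationmachinenative.dll. Parallelizing the SGD updates themselves would additionally need a strategy for concurrent writes to the shared model weights, which this sketch does not cover.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper (not an ML.NET API): evaluates per_example(i) for
// i in [0, n) across num_threads workers and returns the total.
// Each worker writes only its own slot in `partial`, so no locking is needed.
double parallel_sum(std::size_t n, unsigned num_threads,
                    const std::function<double(std::size_t)>& per_example) {
    assert(num_threads > 0);
    std::vector<double> partial(num_threads, 0.0);
    std::vector<std::thread> workers;
    const std::size_t chunk = (n + num_threads - 1) / num_threads;  // ceil(n / threads)
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        workers.emplace_back([&, t, begin, end] {
            double local = 0.0;  // accumulate locally to avoid false sharing
            for (std::size_t i = begin; i < end; ++i) local += per_example(i);
            partial[t] = local;
        });
    }
    for (auto& w : workers) w.join();
    // Combine the per-thread partial sums serially.
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```

A caller would pass a thread count such as `std::max(1u, std::thread::hardware_concurrency())` (the standard allows `hardware_concurrency()` to return 0) together with a callback that computes one example's contribution. Some toolchains need `-pthread` to link `std::thread`.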
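To evaluate AVX for the native kernels, a minimal comparison is a dot product of two latent vectors. This assumes that inner products of this kind dominate kernels such as `CalculateIntermediateVariablesNative` and `CalculateGradientAndUpdateNative`; the issue does not show their code. The sketch below processes 4 floats per step with SSE and 8 with AVX and chooses a path at run time. `dot_scalar`, `dot_sse`, `dot_avx`, and `dot_dispatch` are hypothetical names. The `target` attribute and `__builtin_cpu_supports` are GCC/Clang extensions, so an MSVC build of the DLL would need `__cpuid`-based detection instead.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__GNUC__) && defined(__x86_64__)
#include <immintrin.h>
#define FM_HAVE_X86 1  // SSE/SSE2 are baseline on x86-64
#endif

// Portable scalar reference, used as the fallback path.
float dot_scalar(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}

#ifdef FM_HAVE_X86
// SSE: 4 floats per iteration; caller guarantees n % 4 == 0.
float dot_sse(const float* a, const float* b, std::size_t n) {
    assert(n % 4 == 0);
    __m128 acc = _mm_setzero_ps();
    for (std::size_t i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

// AVX: 8 floats per iteration; compiled for AVX even if the rest of the
// file is not, so it must only be called after a CPU feature check.
__attribute__((target("avx")))
float dot_avx(const float* a, const float* b, std::size_t n) {
    assert(n % 8 == 0);
    __m256 acc = _mm256_setzero_ps();
    for (std::size_t i = 0; i < n; i += 8)
        acc = _mm256_add_ps(acc, _mm256_mul_ps(_mm256_loadu_ps(a + i), _mm256_loadu_ps(b + i)));
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float sum = 0.0f;
    for (float v : lanes) sum += v;  // horizontal sum of the 8 lanes
    return sum;
}
#endif

// Runtime dispatch: widest supported path whose width divides n.
float dot_dispatch(const float* a, const float* b, std::size_t n) {
#ifdef FM_HAVE_X86
    if (n % 8 == 0 && __builtin_cpu_supports("avx")) return dot_avx(a, b, n);
    if (n % 4 == 0) return dot_sse(a, b, n);
#endif
    return dot_scalar(a, b, n);
}
```

256-bit float multiply and add only require AVX; AVX2 adds 256-bit integer operations, and FMA (`_mm256_fmadd_ps`) is a separate CPU feature that would fuse the multiply and add. SIMD results can differ from the scalar loop in the last bits because the summation order changes.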
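The top entry in the table is `DoubleParser::TryParseCore`, and `TryParse` and `HelperImpl::FetchNextField` also appear, so text loading and number parsing seem to cost more than any single native kernel. If the benchmark re-reads the text input on every pass over the data (an assumption; the issue does not say), one common mitigation is to parse once and let every epoch read from an in-memory buffer. The C++ below sketches that pattern. `ParsedCache`, `parse_once`, and `run_epochs` are hypothetical names, and the loop body of `run_epochs` is only a stand-in for an SGD pass.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical dense cache: every row has the same number of columns.
struct ParsedCache {
    std::size_t cols = 0;
    std::vector<float> values;  // row-major, rows() * cols entries
    std::size_t rows() const { return cols == 0 ? 0 : values.size() / cols; }
};

// Parses comma-separated numeric text exactly once.
ParsedCache parse_once(const std::string& text, std::size_t cols) {
    ParsedCache cache;
    cache.cols = cols;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.empty()) continue;
        const char* p = line.c_str();
        for (std::size_t c = 0; c < cols; ++c) {
            char* end = nullptr;
            cache.values.push_back(std::strtof(p, &end));
            assert(end != p && "malformed numeric field");
            p = (*end == ',') ? end + 1 : end;  // skip the separator
        }
    }
    return cache;
}

// Every epoch reads floats from memory instead of re-parsing text.
double run_epochs(const ParsedCache& cache, int epochs) {
    double checksum = 0.0;
    for (int e = 0; e < epochs; ++e)
        for (float v : cache.values) checksum += v;  // stand-in for an SGD pass
    return checksum;
}
```

The parsing cost is then paid once regardless of the number of epochs; the trade-off is holding the parsed features in memory.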