
Perf: Optimization for FactorizationMachine #3000

@glebuk


Profiling with Intel VTune has found several bottlenecks in the Factorization Machine training algorithm.

  • Some phases of training are not parallelized. Consider adding parallel computation.
  • Evaluate using AVX/AVX2 (via C++ or C# intrinsics) in factorizationmachinenative.dll, which currently uses C++ SSE code.
  • Consider optimizing the following hotspots (VTune CPU time):
| Function | Module | CPU Time |
|---|---|---|
| Microsoft::ML::Internal::Utilities::DoubleParser::TryParseCore | Microsoft.ML.Core.dll | 27.390 s |
| CalculateGradientAndUpdateNative | factorizationmachinenative.dll | 22.609 s |
| HelperImpl::FetchNextField | Microsoft.ML.Data.dll | 13.826 s |
| CalculateIntermediateVariablesNative | factorizationmachinenative.dll | 12.201 s |
| Microsoft::ML::Internal::Utilities::DoubleParser::TryParse | Microsoft.ML.Core.dll | 9.666 s |
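For the first bullet, a minimal C++ sketch of one way to parallelize a training phase whose per-example work is independent. `ParallelFor` and `ComputeLinearScoresParallel` are hypothetical names, not functions in factorizationmachinenative.dll, and the linear score is only a stand-in for whichever unparallelized phase the profile points at. Because each example writes only its own output slot, no locking is needed:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: splits [0, n) into contiguous chunks and runs
// body(begin, end) for each chunk on its own std::thread.
void ParallelFor(std::size_t n, unsigned numThreads,
                 const std::function<void(std::size_t, std::size_t)>& body) {
    if (n == 0) return;
    // Never start more threads than there are items (and at least one).
    std::size_t workers = std::max<std::size_t>(1, std::min<std::size_t>(numThreads, n));
    std::size_t chunk = (n + workers - 1) / workers;
    std::vector<std::thread> threads;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        std::size_t end = std::min(n, begin + chunk);
        threads.emplace_back(body, begin, end);
    }
    for (std::thread& t : threads) t.join();
}

// Hypothetical per-example phase: score = dot(weights, row) for each row of a
// dense, row-major feature matrix. Each row writes only out[i], so threads
// never touch the same element.
std::vector<float> ComputeLinearScoresParallel(const std::vector<float>& features,
                                               const std::vector<float>& weights,
                                               unsigned numThreads) {
    assert(!weights.empty() && features.size() % weights.size() == 0);
    const std::size_t dim = weights.size();
    const std::size_t rows = features.size() / dim;
    std::vector<float> out(rows, 0.0f);
    ParallelFor(rows, numThreads, [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) {
            float s = 0.0f;
            for (std::size_t j = 0; j < dim; ++j) s += weights[j] * features[i * dim + j];
            out[i] = s;
        }
    });
    return out;
}
```

This pattern fits read-only phases best; a phase that updates shared model weights (the SGD step itself) would need a different scheme, such as per-thread gradient buffers or lock-free Hogwild-style updates.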
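For the AVX/AVX2 bullet, a minimal sketch assuming the hot native kernels reduce to float dot products over latent vectors (an assumption; the actual kernels in factorizationmachinenative.dll are not shown in this issue). `Dot`, `DotAvx`, and `DotScalar` are hypothetical names. The AVX path processes 8 floats per iteration versus 4 for SSE, and is selected at runtime so older CPUs still work. It relies on GCC/Clang's `target("avx")` attribute and `__builtin_cpu_supports`; an MSVC build would need a different CPU-detection mechanism (e.g. `__cpuid`):

```cpp
#include <cassert>
#include <cstddef>
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define FM_SKETCH_X86 1
#endif

// Portable reference kernel, also used as the fallback path.
inline float DotScalar(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}

#ifdef FM_SKETCH_X86
// AVX kernel: 8 floats per iteration. The target attribute lets GCC/Clang
// emit AVX instructions here without compiling the whole file with -mavx.
__attribute__((target("avx")))
static float DotAvx(const float* a, const float* b, std::size_t n) {
    __m256 acc = _mm256_setzero_ps();
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        acc = _mm256_add_ps(acc, _mm256_mul_ps(_mm256_loadu_ps(a + i),
                                               _mm256_loadu_ps(b + i)));
    }
    // Horizontal reduction of the 8 lanes, then the scalar tail.
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float sum = 0.0f;
    for (float v : lanes) sum += v;
    for (; i < n; ++i) sum += a[i] * b[i];
    return sum;
}
#endif

// Runtime dispatch: take the AVX path only when the CPU reports AVX support.
float Dot(const float* a, const float* b, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr));
#ifdef FM_SKETCH_X86
    if (__builtin_cpu_supports("avx")) return DotAvx(a, b, n);
#endif
    return DotScalar(a, b, n);
}
```

Because the AVX path sums in a different order than the scalar loop, results can differ in the last bits for general float inputs, so any benchmark comparing the two should use a tolerance. An AVX2/FMA variant would follow the same dispatch pattern.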
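For the hotspot list: in this profile the three parsing-related entries (TryParseCore, FetchNextField, TryParse) add up to 27.390 + 13.826 + 9.666 = 50.882 s, more than the two native kernels combined (22.609 + 12.201 = 34.810 s). If the text input is being re-parsed on every training pass (the profile alone does not show this), one common remedy is to parse once and reuse an in-memory float buffer. A minimal C++ sketch of that idea; `ParsedCache`, `ParseOnce`, and `TrainEpochs` are hypothetical names, and the actual parsing code lives in the managed Microsoft.ML.Core.dll and Microsoft.ML.Data.dll:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory cache: features parsed from text exactly once and
// stored as a dense, row-major float buffer.
struct ParsedCache {
    std::vector<float> values;
    std::size_t dim = 0;
    std::size_t rows = 0;
};

// Parses whitespace-separated numbers once. Reading stops at the first token
// that is not a number, and an incomplete trailing row is dropped.
ParsedCache ParseOnce(const std::string& text, std::size_t dim) {
    assert(dim > 0);
    ParsedCache cache;
    cache.dim = dim;
    std::istringstream in(text);
    float v;
    while (in >> v) cache.values.push_back(v);
    cache.rows = cache.values.size() / dim;
    cache.values.resize(cache.rows * dim);
    return cache;
}

// Each epoch walks the cached floats; no text is parsed again.
void TrainEpochs(const ParsedCache& cache, int epochs,
                 const std::function<void(const float* row, std::size_t dim)>& step) {
    for (int e = 0; e < epochs; ++e)
        for (std::size_t r = 0; r < cache.rows; ++r)
            step(cache.values.data() + r * cache.dim, cache.dim);
}
```

The trade-off is memory: the whole dense dataset stays resident, which is what makes the parse cost a one-time expense instead of a per-pass one.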


Labels: P2, enhancement, perf
