
[DML] Optimizations to improve DML memory utilization #221

@bbernhar

Description


The following flame graphs are based on traces of MobileNetV2, ResNet50V2, and DirectMLSuperResolution.

Issue 1. Creating a large tensor resource takes 2x the wall-time due to a re-attempt (intel/GPGMM#180).

[flame graph: sub-allocation attempt followed by direct allocation]

The first allocation attempts sub-allocation (blue), followed by a direct allocation (red-ish), making the whole operation about 2x slower. We should check the tensor size ahead of time and directly allocate when the tensor is too large to sub-allocate, rather than attempting to sub-allocate and then falling back.

Issue 2. Memory backing resources is created on-demand (intel/GPGMM#110).

[flame graph: heap creation dominating tensor creation time]

The biggest cost of tensor creation is creating the heap. We should pre-fetch the next heap so a subsequent request is fulfilled immediately instead of waiting on heap creation.

Issue 3. Tensors under-utilizing memory.

Tensor allocations are 1MB whereas the memory they occupy is 4MB (4x waste). My goal is to move away from fixed/default memory sizes like we had in Dawn (now called PreferredResourceHeapSize) and to grow (or increase) them dynamically instead (intel/GPGMM#182).

[trace screenshot: 1MB tensor allocations inside 4MB heaps]

FYI, @fujunwei, @huningxin @RafaelCintron
