
Divide 1D tensor into more than 2 TPC instances #22

Open
@mcisowsk

Description

I noticed that for a 1D input tensor, the index space can be defined in a way that utilizes at most 2 TPC cores (as in the example at https://docs.habana.ai/en/latest/TPC/TPC_User_Guide/TPC_Programming_Model.html#index-space-mapping). To use 4 TPCs, the tensor must be 2D. What I want is to keep a 1D tensor and divide the load equally among all TPC cores, so for a 1D tensor of size 512, I want each of the 8 TPC cores to handle 64 elements. But all I can achieve is 2 TPCs, each handling 256 elements. Why is that?

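```cpp
int elementsInVec = 64;  // elements processed per index-space member
// Number of index-space members along dim 0: ceil(outputSizes[0] / elementsInVec)
unsigned depthIndex = (outputSizes[0] + (elementsInVec - 1)) / elementsInVec;
kernel->indexSpaceGeometry.dims = 1;
kernel->indexSpaceGeometry.sizes[0] = depthIndex;

// Access pattern for input 0, tensor dimension 0:
//   startF(x) = start_a * x + start_b = 64 * x
//   endF(x)   = end_a   * x + end_b   = 64 * x + 63
kernel->inputTensorAccessPattern[0].dim[0].dim      = 0;
kernel->inputTensorAccessPattern[0].dim[0].start_a  = elementsInVec;
kernel->inputTensorAccessPattern[0].dim[0].end_a    = elementsInVec;
kernel->inputTensorAccessPattern[0].dim[0].start_b  = 0;
kernel->inputTensorAccessPattern[0].dim[0].end_b    = elementsInVec - 1;
```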

I defined the mapping as:

  • startF(x) = 64*x + 0
  • endF(x) = 64*x + 63

but it seems to be ignored; instead, the kernel behaves as if the mapping were:

  • startF(x) = 256*x + 0
  • endF(x) = 256*x + 255

What values will x actually take? Only [0, 1]? What is wrong with my code? Is it even possible to use 8 TPCs for a data layout like this?
