Description
I noticed that for a 1D input tensor, we can define the index space in such a way that at most 2 TPC cores are utilized (as in the example at https://docs.habana.ai/en/latest/TPC/TPC_User_Guide/TPC_Programming_Model.html#index-space-mapping). To use 4 TPCs, the tensor must be 2D. What I want to achieve is to have a 1D tensor and divide the load equally across all TPC cores. So for a 1D tensor of size 512, I want each TPC core to handle 64 elements. But all I can accomplish is 2 TPCs, each handling 256 elements. Why is that?
int elementsInVec = 64;
// index-space size along dim 0: ceil(outputSizes[0] / 64), i.e. 8 for my 512-element tensor
unsigned depthIndex = (outputSizes[0] + (elementsInVec - 1)) / elementsInVec;
kernel->indexSpaceGeometry.dims = 1;
kernel->indexSpaceGeometry.sizes[0] = depthIndex;
// access pattern for input tensor 0, dim 0:
// start(x) = start_a * x + start_b, end(x) = end_a * x + end_b
kernel->inputTensorAccessPattern[0].dim[0].dim = 0;
kernel->inputTensorAccessPattern[0].dim[0].start_a = elementsInVec;
kernel->inputTensorAccessPattern[0].dim[0].end_a = elementsInVec;
kernel->inputTensorAccessPattern[0].dim[0].start_b = 0;
kernel->inputTensorAccessPattern[0].dim[0].end_b = elementsInVec - 1;
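Just to sanity-check the geometry arithmetic (a standalone illustration only, not part of the glue code; outputSizes[0] is 512 in my case):

#include <assert.h>

int main(void)
{
    int elementsInVec = 64;
    unsigned tensorSize = 512;  // outputSizes[0] in my case
    unsigned depthIndex = (tensorSize + (elementsInVec - 1)) / elementsInVec;
    assert(depthIndex == 8);    // 8 index-space elements, one per 64-element chunk
    return 0;
}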
I defined the mapping as:
- startF(x) = 64*x + 0
- endF(x) = 64*x + 63
but it seems to be ignored; instead it behaves as if the mapping were:
- startF(x) = 256*x + 0
- endF(x) = 256*x + 255
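To make the contrast concrete, here is a small host-side sketch of the arithmetic (illustration only; the 512-element size, the stride values, and the printRanges helper are mine, not part of the glue-code API): a 64-element stride gives 8 index-space elements, while what I observe corresponds to only 2.

#include <stdio.h>

// Illustration: each index-space element x covers [stride*x, stride*x + stride - 1].
static void printRanges(const char *label, int stride, int tensorSize)
{
    for (int x = 0; x * stride < tensorSize; ++x)
        printf("%s: index-space element %d -> elements [%d, %d]\n",
               label, x, stride * x, stride * x + stride - 1);
}

int main(void)
{
    printRanges("expected", 64, 512);   // 8 index-space elements -> should allow 8 TPC cores
    printRanges("observed", 256, 512);  // 2 index-space elements -> matches the 2 TPC cores I see
    return 0;
}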
What values will x actually take? Just [0, 1]? What is wrong with my code? Is it even possible to launch 8 TPC cores for a data layout like this?