Skip to content

Commit

Permalink
Fix typos in cute docs (NVIDIA#1486)
Browse files Browse the repository at this point in the history
* fix typos in 02_layout_algebra.md

* fix typos in 03_tensor.md
  • Loading branch information
irasin authored May 2, 2024
1 parent 5c447dd commit acc3ee1
Show file tree
Hide file tree
Showing 3 changed files with 15 additions and 11 deletions.
6 changes: 3 additions & 3 deletions media/docs/cute/02_layout_algebra.md
Original file line number Diff line number Diff line change
Expand Up @@ -390,12 +390,12 @@ The elements NOT pointed to by `B` sounds like a complement, `B*`, up to the siz
### Logical Divide 1-D Example
Consider tiling the 1-D layout `A = (2,4,3):(4,1,8)` with the tiler `B = 4:2`. Informally, this means that we have a 1-D vector of 24 elements in some storage order defined by `A` and we want to extract tiles of 4 elements strided by 2.
Consider tiling the 1-D layout `A = (4,2,3):(2,1,8)` with the tiler `B = 4:2`. Informally, this means that we have a 1-D vector of 24 elements in some storage order defined by `A` and we want to extract tiles of 4 elements strided by 2.
This is computed in the three steps described in the implementation above.
* Complement of `B = 4:2` under `size(A) = 24` is `B* = (2,3):(1,8)`.
* Concantenation of `(B,B*) = (4,(2,3)):(2,(1,8))`.
* Composition of `A = (2,4,3):(4,1,8)` with `(B,B*)` is then `((2,2),(2,3)):((4,1),(2,8))`.
* Composition of `A = (4,2,3):(2,1,8)` with `(B,B*)` is then `((2,2),(2,3)):((4,1),(2,8))`.
<p align="center">
<img src="../../images/cute/divide1.png" alt="divide1.png" height="150"/>
Expand All @@ -415,7 +415,7 @@ Similar to the 2-D composition example above, consider a 2-D layout `A = (9,(4,8
The above figure depicts `A` as a 2-D layout with the elements pointed to by `B` highlighted in gray. The layout `B` describes our "tile" of data, and there are twelve of those tiles in `A` shown by each of the colors. After the divide, the first mode of each mode of the result is the tile of data and the second mode of each mode iterates over each tile. In that sense, this operation can be viewed as a kind of `gather` operation or as simply a permutation on the rows and cols.
Note that the first mode of each mode of the result is the sublayout `(3,(2,4)):(236,(13,52))` and is precisely the result we would have received if we had applied `composition` instead of `logical_divide`.
Note that the first mode of each mode of the result is the sublayout `(3,(2,4)):(177,(13,2))` and is precisely the result we would have received if we had applied `composition` instead of `logical_divide`.
### Zipped, Tiled, Flat Divides
Expand Down
20 changes: 12 additions & 8 deletions media/docs/cute/03_tensor.md
Original file line number Diff line number Diff line change
Expand Up @@ -157,8 +157,8 @@ Tensor rmem_4x8_col = make_tensor<float>(Shape<_4,_8>{});
Tensor rmem_4x8_row = make_tensor<float>(Shape<_4,_8>{},
LayoutRight{});
Tensor rmem_4x8_pad = make_tensor<float>(Shape <_4, _8>{},
Stride<_2,_32>{});
Tensor rmem_4x8_like = make_tensor_like(smem_4x8_pad);
Stride<_32,_2>{});
Tensor rmem_4x8_like = make_tensor_like(rmem_4x8_pad);
```

The `make_tensor_like` function makes an owning Tensor of register memory with the same value type and shape as its input `Tensor` argument and attempts to use the same order of strides as well.
Expand All @@ -168,7 +168,7 @@ Calling `print` on each of the above tensors produces similar output
```
rmem_4x8_col : ptr[32b](0x7ff1c8fff820) o (_4,_8):(_1,_4)
rmem_4x8_row : ptr[32b](0x7ff1c8fff8a0) o (_4,_8):(_8,_1)
rmem_4x8_pad : ptr[32b](0x7ff1c8fff920) o (_4,_8):(_2,_32)
rmem_4x8_pad : ptr[32b](0x7ff1c8fff920) o (_4,_8):(_32,_2)
rmem_4x8_like : ptr[32b](0x7f4158fffc60) o (_4,_8):(_8,_1)
```

Expand All @@ -194,8 +194,9 @@ decltype(auto) operator[](Coord const& coord) {
For example, we can read and write to `Tensor`s using natural coordinates, using the variadic `operator()`, or the container-like `operator[]`.
```c++
Tensor A = make_tensor<float>(Shape <Shape < _4,_5>,_13>{},
Tensor A = make_tensor<float>(Shape <Shape < _4,_5>,Int<13>>{},
Stride<Stride<_12,_1>,_64>{});
float* b_ptr = ...;
Tensor B = make_tensor(b_ptr, make_shape(13, 20));
// Fill A via natural coordinates op[]
Expand Down Expand Up @@ -261,8 +262,11 @@ Tensor C = A(_,5);
// (_3,2):(4,1)
Tensor D = A(make_coord(_,_),5);
// (_3,_5):(4,13)
Tensor E = A(make_coord(_,1),make_coord(0,_,1));
// (2,2,_2):(1,_2,100)
Tensor E = A(make_coord(2,_),make_coord(_,3,_));
Tensor F = A(make_coord(2,_),make_coord(_,3,_));
```

<p align="center">
Expand All @@ -280,8 +284,8 @@ To implement generic partitioning of a `Tensor`, we apply composition or tiling
Let's take a tiled example and look at how we can slice it in useful ways.

```cpp
Tensor A = make_tensor(ptr, make_shape(24,8)); // (8,24)
auto tiler = Shape<_8,_4>{}; // (_4,_8)
Tensor A = make_tensor(ptr, make_shape(8,24)); // (8,24)
auto tiler = Shape<_4,_8>{}; // (_4,_8)

Tensor tiled_a = zipped_divide(A, tiler); // ((_4,_8),(2,3))
```
Expand Down Expand Up @@ -313,7 +317,7 @@ Another common partitioning strategy is called a thread-value partitioning. In t
// to 1D coordinates within a 4x8 tensor
// (T8,V4) -> (M4,N8)
auto tv_layout = Layout<Shape <Shape <_2,_4>,Shape <_2, _2>>,
Stride<Stride<_8,_1>,Stride<_4,_16>>{}; // (8,4)
Stride<Stride<_8,_1>,Stride<_4,_16>>>{}; // (8,4)

// Construct a 4x8 tensor with any layout
Tensor A = make_tensor<float>(Shape<_4,_8>{}, LayoutRight{}); // (4,8)
Expand Down
Binary file modified media/images/cute/divide2.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

0 comments on commit acc3ee1

Please sign in to comment.