[Concept Entry] PyTorch Tensors (#4942)

content/pytorch/concepts/tensors/tensors.md

---
Title: 'Tensors'
Description: 'A tensor is a multi-dimensional array of values of the same type.'
Subjects:
- 'Data Science'
- 'Machine Learning'
- 'AI'
Tags:
- 'AI'
- 'Data Structures'
- 'Arrays'
- 'Deep Learning'
CatalogContent:
- 'intro-to-py-torch-and-neural-networks'
- 'py-torch-for-classification'
- 'paths/machine-learning'
---

**Tensors** are multi-dimensional arrays containing values of the same data type. These types include booleans, floats, and integers. Tensors are a fundamental data structure in PyTorch. When building a machine learning model in PyTorch, all input data, output data, and model weights are stored as tensors.

1-dimensional tensors, which contain a single row of data, are known as **vectors**. 2-dimensional tensors, which contain arbitrarily many 1-dimensional tensors, are known as **matrices**. Higher (n ≥ 3) dimensional tensors, which contain arbitrarily many lower (n - 1) dimensional tensors, are simply known as tensors.
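
For instance, a tensor's number of dimensions can be checked with the `.dim()` method. The following is a minimal sketch (the variable names are illustrative):

```py
import torch

myVector = torch.tensor([1.0, 2.0, 3.0])           # 1-dimensional tensor (vector)
myMatrix = torch.tensor([[1.0, 2.0], [3.0, 4.0]])  # 2-dimensional tensor (matrix)
myTensor = torch.zeros(2, 2, 2)                    # 3-dimensional tensor

print(myVector.dim(), myMatrix.dim(), myTensor.dim())  # 1 2 3
```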

## Creating and Initializing Tensors

### Data-Agnostic Tensors

Tensors can be created with several methods. If the values within the tensor do not matter, `torch.empty()` should be used:

```pseudo
myTensor = torch.empty(dimension1Size, dimension2Size, dimensionNSize)
```

This creates a tensor, `myTensor`, with whatever data values happen to already be present at the location in memory where the tensor is created. Each dimension must be given a size (e.g. `dimension1Size`), which specifies how many data slots that dimension will hold.

The following demonstrates creating a matrix (2-dimensional tensor) with dimension sizes of 2 and 3:

```py
import torch

myTensor = torch.empty(2, 3)
print(myTensor)
```

This results in the following output:

```shell
tensor([[0., 0., 0.],
        [0., 0., 0.]])
```

> **Note:** The values contained in this tensor are not guaranteed and depend on the values already present at the relevant location in memory.

### Data-Specific Tensors

If the values within a tensor do matter, use the following methods:

- For all zeros: `torch.zeros()`
- For all ones: `torch.ones()`
- For specified values: `torch.tensor()`
- For all randomly-generated values: `torch.rand()`

The following examples demonstrate creating data-specific tensors:

```py
import torch

print(torch.zeros(3, 2), "\n")
print(torch.ones(1, 4), "\n")
print(torch.tensor([[1, 0, 1.56], [3.24, 0.5, 6]]), "\n")
print(torch.rand(2, 2, 2))
```

This results in the following output:

```shell
tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])

tensor([[1., 1., 1., 1.]])

tensor([[1.0000, 0.0000, 1.5600],
        [3.2400, 0.5000, 6.0000]])

tensor([[[0.2608, 0.5454],
         [0.2085, 0.7901]],

        [[0.8943, 0.7604],
         [0.7584, 0.1678]]])
```

## Tensor Datatypes

The datatype of a tensor determines the types of values it can contain. By default, tensors hold 32-bit floats, but they can be created with any of the following datatypes:

- Floats (floating point numbers)
- 16-bit: `dtype=torch.float16`
- 32-bit: `dtype=torch.float32` (default)
- 64-bit: `dtype=torch.float64`
- Complex numbers
- 32-bit: `dtype=torch.complex32`
- 64-bit: `dtype=torch.complex64`
- 128-bit: `dtype=torch.complex128`
- Signed and unsigned integers
- 8-bit: `dtype=torch.int8` and `dtype=torch.uint8`
- 16-bit: `dtype=torch.int16` and `dtype=torch.uint16`
- 32-bit: `dtype=torch.int32` and `dtype=torch.uint32`
- 64-bit: `dtype=torch.int64` and `dtype=torch.uint64`
- Quantized integers
- 8-bit: `dtype=torch.qint8`
- 32-bit: `dtype=torch.qint32`
- Booleans: `dtype=torch.bool`
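
A tensor's current datatype can be inspected through its `.dtype` attribute. A brief sketch (variable names are illustrative) showing the default and an inferred datatype:

```py
import torch

myDefaultTensor = torch.tensor([1.0, 2.0])
print(myDefaultTensor.dtype)  # torch.float32 (the default for floating-point values)

myBoolTensor = torch.tensor([True, False])
print(myBoolTensor.dtype)     # torch.bool (inferred from the values)
```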

### Declaring and Altering Tensor Datatypes

Tensor datatypes can be declared when a tensor is created, and can be changed later with the `.to()` method:

```py
import torch

# Create a boolean tensor by explicit declaration
myBoolTensor = torch.tensor([[1, 0], [0, 0]], dtype=torch.bool)

# Create a float16 tensor by explicit declaration
myExFloat16Tensor = torch.tensor([[3.4, 0.9], [2.3, 1.3]], dtype=torch.float16)

# Convert the float16 tensor to an int16 tensor using .to()
myExFloat16Tensor = myExFloat16Tensor.to(torch.int16)

print(myBoolTensor, "\n")
print(myExFloat16Tensor)
```

This results in the following output:

```shell
tensor([[ True, False],
        [False, False]])

tensor([[3, 0],
        [2, 1]], dtype=torch.int16)
```

Notice that when printing a tensor with a numerical datatype other than the default `torch.float32`, the datatype will be specified in the output.

## Tensor Shapes

The shape of a tensor is determined by its number of dimensions and the size of each dimension. To illustrate, the tensor `torch.rand(2, 6, 4)` has the same shape as the tensor `torch.ones(2, 6, 4)`, whereas these have different shapes from the tensors `torch.rand(2, 6)` and `torch.ones(3, 6, 5)`.

Tensor shapes can be accessed via the `.shape` attribute:

```py
import torch

myTensor = torch.zeros(2, 4, 6, 7)
print(myTensor.shape)
```

This results in the following output:

```shell
torch.Size([2, 4, 6, 7])
```

Operations with multiple tensors, like adding or multiplying tensors, generally require the tensors involved to have the same shapes.
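
For example, two tensors that share the shape [2, 2] can be added element-wise (a minimal sketch):

```py
import torch

myTensorA = torch.tensor([[1., 2.], [3., 4.]])
myTensorB = torch.ones(2, 2)

print(myTensorA + myTensorB)
# tensor([[2., 3.],
#         [4., 5.]])
```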

### Altering Tensor Shapes

PyTorch provides methods for altering the shape of a tensor, provided that the total number of elements in the tensor remains the same. The total number of elements in a tensor is equal to the product of all dimension sizes. So, for example, a tensor with shape [4, 1, 6] contains 4 x 1 x 6 = 24 elements.
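
This count can be checked with the `.numel()` method; the following sketch mirrors the example above:

```py
import torch

myTensor = torch.zeros(4, 1, 6)
print(myTensor.numel())  # 24
```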

Since dimensions with a size of 1 do not alter the total number of elements in a tensor, PyTorch provides methods for adding and removing dimensions of size 1. The method `.unsqueeze_(n)` adds a dimension of size 1 at the nth place, shifting displaced dimension sizes to the right. Conversely, the method `.squeeze_(n)` removes a dimension of size 1 at the nth place, shifting displaced dimension sizes to the left. Note that the numbering of dimensions begins at 0.

The following demonstrates the effects of `.unsqueeze_()` and `.squeeze_()` on tensor shapes:

```py
import torch

myTensor1 = torch.empty(8, 2, 8, 1)
print(myTensor1.shape)

myTensor1.unsqueeze_(1)
print(myTensor1.shape)

myTensor1.squeeze_(4)
print(myTensor1.shape)
```

This results in the following output:

```shell
torch.Size([8, 2, 8, 1])
torch.Size([8, 1, 2, 8, 1])
torch.Size([8, 1, 2, 8])
```

One can also create new tensors by squeezing or unsqueezing other tensors, leaving the originals unchanged. To do so, use the methods `.squeeze()` and `.unsqueeze()` (no trailing underscores), respectively.
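
For instance, `.unsqueeze()` returns a new tensor while the original keeps its shape (a brief sketch):

```py
import torch

myTensor = torch.empty(8, 2)
newTensor = myTensor.unsqueeze(0)

print(myTensor.shape)   # torch.Size([8, 2])
print(newTensor.shape)  # torch.Size([1, 8, 2])
```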

Tensor shapes can also be changed arbitrarily using `.reshape()`, provided that the total number of elements remains constant:

```py
import torch

myTensor = torch.empty(8, 2, 8, 1)
print(myTensor.shape)

myTensor = myTensor.reshape(128)
print(myTensor.shape)

myTensor = myTensor.reshape(64, 2)
print(myTensor.shape)
```

This results in the following output:

```shell
torch.Size([8, 2, 8, 1])
torch.Size([128])
torch.Size([64, 2])
```

## Operations with Tensors

Arithmetic operations can be performed between tensors and numbers (e.g. multiplying a tensor by 4) or between multiple tensors.

To perform operations between multiple tensors, all tensors must have the same shape or be **broadcastable**. Tensors with different shapes are broadcastable when, comparing their dimension sizes from right to left, every difference is either: (a) a missing dimension (one tensor has a dimension that the other lacks); or (b) a size mismatch in which one of the sizes is 1.

To illustrate, tensors with shapes [2, 3, 4] and [2, 3, 5] are not mutually broadcastable, because the difference in the last dimension is not covered by either (a) or (b). By contrast, all of the following shapes are mutually broadcastable: [2, 3, 4], [3, 4], [1, 1, 1], and [1, 4]. Relative to the original shape [2, 3, 4], the shape [3, 4] is covered by (a), the shape [1, 1, 1] is covered by (b), and the shape [1, 4] is covered by both (a), for the missing dimension 0, and (b), for the size mismatch at dimension 1.
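
The following sketch illustrates both scalar arithmetic and broadcasting between shapes [2, 3, 4] and [1, 4] (variable names are illustrative):

```py
import torch

myTensor = torch.ones(2, 3, 4)
rowTensor = torch.tensor([[10., 20., 30., 40.]])  # shape [1, 4]

print((myTensor * 4).shape)          # torch.Size([2, 3, 4])
print((myTensor + rowTensor).shape)  # broadcasting yields torch.Size([2, 3, 4])
```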

PyTorch contains hundreds of operations for tensors. Details for some important ones are covered on the page [PyTorch Tensor Operations](https://www.codecademy.com/resources/docs/pytorch/tensor-operations).

documentation/catalog-content.md

```
- 'intro-to-py-torch-and-neural-networks'
- 'learn-python-3'
- 'py-torch-for-classification'
- 'paths/computer-science'
- 'paths/data-science'
- 'paths/machine-learning'
```
