Update tensors and autograd #1185
Conversation
Deploy preview for pytorch-tutorials-preview ready! Built with commit 9568cd2
https://deploy-preview-1185--pytorch-tutorials-preview.netlify.app
I only had time to look at the tensor section, will do the autograd later.
#
# **Docs Issues** - https://pytorch.org/docs/stable/tensor_attributes.html
# is not comprehensive (missing data, grad, grad_fn, shape). Contains
# ``memory_format`` which is not an attribute
Does that mean that we have no way to know the memory format of a given Tensor?
Looks like it, unless I'm mistaken
In [52]: a = a.to(memory_format=torch.channels_last)
In [53]: a.memory_format
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-53-a1dabca59cd9> in <module>
----> 1 a.memory_format
AttributeError: 'Tensor' object has no attribute 'memory_format'
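One possible workaround, assuming a reasonably recent PyTorch build, is to query contiguity for a specific memory format instead of looking for an attribute:

import torch

a = torch.rand(2, 3, 8, 8)
a = a.to(memory_format=torch.channels_last)

# There is no `Tensor.memory_format` attribute, but you can ask whether the
# tensor is laid out contiguously in a given format:
print(a.is_contiguous(memory_format=torch.channels_last))      # True
print(a.is_contiguous(memory_format=torch.contiguous_format))  # False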
on Tensors. It is a define-by-run framework, which means that your backprop is
defined by how your code is run, and that every single iteration can be
different.
model = torchvision.models.resnet18(pretrained=True)
I think you need a bit more explanation here, either in prose or in code comments.
Very roughly something like:
"Let's start by looking at a single training step. Here we load an existing pretrained resnet18 model, create a random input data tensor of shape 3x64x64, and some random output labels."
...
Then we might want to pause again and say: "Here we push the data forward through the model, calculate the loss, and then call .backward() to collect the gradients for each parameter in the model." <more detail here on what backward does, i.e. where are the gradients stored?>
...
Then we say: "Finally we load an optimizer, in this case SGD with a learning rate and momentum of ..., and call .step() to apply the gradients to each parameter."
That's a basic training step; below we dive a bit deeper into what is going on when we call .backward()
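A minimal sketch of that single training step, assuming torchvision is installed (the input/label shapes and the SGD settings here are illustrative):

import torch, torchvision

# Load a pretrained resnet18 and create random input data and random labels
model = torchvision.models.resnet18(pretrained=True)
data = torch.rand(1, 3, 64, 64)    # a single random 3x64x64 "image"
labels = torch.rand(1, 1000)       # random targets for the 1000 output classes

# Forward pass, loss, then backward pass; autograd stores each parameter's
# gradient in its .grad attribute
prediction = model(data)
loss = (prediction - labels).sum()
loss.backward()

# The optimizer reads the stored gradients and updates the parameters
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
optimizer.step()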
I agree with Randall here
# The ``requires_grad`` flag lets autograd know
# if we need gradients w.r.t. these tensors. If it is ``True``, autograd
This doesn't appear to render correctly as text in the preview: https://deploy-preview-1185--pytorch-tutorials-preview.netlify.app/beginner/blitz/autograd_tutorial.html#differentiation-in-autograd
# tensors. By tracing this graph from roots to leaves, you can
# automatically compute the gradients using the chain rule.
#
# In a forward pass, autograd does two things simultaneously: \* run the
are the * here supposed to render as bullet points? they don't appear to in the preview.
# root. Autograd then \* computes the gradients from each ``.grad_fn``, \*
# accumulates them in the respective tensor’s ``.grad`` attribute, and \*
# using the chain rule, propagates all the way to the leaf tensors.
#
Are there any pictures or graphics that would work to show this?
Added an image of the DAG. Would you say the description is accurate and informative? cc: @soumith
print(a.requires_grad==True)
b = x + z
print("Does `b` require gradients?")
print(b.requires_grad==True)
would it be helpful to show the output here?
Also would it be sufficient to simply say print(a.requires_grad)?
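A sketch of that simplification with the output shown as comments (x, y, z here are assumed inputs, with only z tracking gradients):

import torch

x = torch.rand(5, 5)
y = torch.rand(5, 5)
z = torch.rand((5, 5), requires_grad=True)

a = x + y
print("Does `a` require gradients?")
print(a.requires_grad)   # False -- neither input tracks gradients

b = x + z
print("Does `b` require gradients?")
print(b.requires_grad)   # True -- `z` tracks gradients, so `b` does too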
Tensors
^^^^^^^
Tensors are similar to NumPy’s ndarrays, except that tensors can run on
Might be worth adding "on GPUs or other specialized hardware"
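For example, the usual guarded move-to-GPU idiom (a small illustrative snippet; it still runs on CPU-only machines):

import torch

tensor = torch.ones(4, 4)

# Move the tensor to the GPU if one is available
if torch.cuda.is_available():
    tensor = tensor.to("cuda")
print(f"Device tensor is stored on: {tensor.device}")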
###############################################################
# Construct a matrix filled zeros and of dtype long:
tnsr_from_data = torch.tensor(data)
tnsr_from_np = torch.from_numpy(np_array)
Is it worth calling out that you can also go back to NumPy? I'm not sure it is, but just throwing it out there for consideration. Maybe just link to the "bridge back to numpy" section below.
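For reference, the reverse bridge is just .numpy() on a CPU tensor; a quick sketch (the tensor and the array share memory, so in-place changes are visible on both sides):

import torch

t = torch.ones(5)
n = t.numpy()      # NumPy view of the same CPU memory
t.add_(1)          # in-place change on the tensor...
print(n)           # ...shows up in the array: [2. 2. 2. 2. 2.]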
###############################################################
# Addition: syntax 2
shape = (2,3,)
I think you should add a very brief explanation of shape here. It will be obvious to most, but for some readers a one-sentence explanation will provide a lot of value.
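Something along these lines might do (a rough sketch of the kind of explanation meant here): shape is a tuple of tensor dimensions and determines the dimensionality of the output tensor.

import torch

shape = (2, 3)              # 2 rows, 3 columns
rand_tensor = torch.rand(shape)
print(rand_tensor.shape)    # torch.Size([2, 3])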
#
# You can use standard NumPy-like indexing with all bells and whistles!
# Tensor attributes describe their shape data type and where they live.
need commas after shape and data type - "where they live" makes sense to me but I worry it might not translate well... maybe "on which device the tensor is stored" or something similar.
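For instance, the three attributes could be shown explicitly (an illustrative snippet, not part of the diff):

import torch

tensor = torch.rand(3, 4)
print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")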
see comments inline
# Tensor Initialization
# ~~~~~~~~~~~~~~~~~~~~~
#
# Tensors can be initialized in various ways. Take a look at the following examples
Each introduction of a concept should be followed by a code snippet, one at a time.
Cleanly separating the introduction of concepts is way more important than minimizing code.
Tensors can be initialized in various ways. Take a look at the following examples.
Directly from data:
data = [[1, 2],[3, 4]]
x_data = torch.tensor(data)
print(x_data)
From a NumPy array:
x_np = np.array(data)
x_data = torch.from_numpy(x_np)
print(x_data)
###############################################################
# Construct a matrix filled zeros and of dtype long:
tnsr_from_data = torch.tensor(data)
the use of the variable name tnsr all over the tutorial is just weird. Either call it tensor or call it x.
# or create a tensor based on an existing tensor. These methods
# will reuse properties of the input tensor, e.g. dtype, unless
# new values are provided by user
try:
If you are trying to make a point that typecasting will throw a runtime error, please do so in a Note.
Actually, showcasing that behavior is not needed, and the try/except with verbose RuntimeError logic is more intimidating than instructive.
Agreed, showcasing typecasting here is not needed; the point here is about overriding properties during tensor init. I am removing the highlight on typecasting
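A possible shape for the revised snippet, without the try/except (the variable names mirror the x_data example above and are illustrative):

import torch

data = [[1, 2], [3, 4]]
x_data = torch.tensor(data)

# Retains the properties (shape, dtype) of x_data
x_ones = torch.ones_like(x_data)
print(x_ones)

# Overrides the dtype of x_data with float
x_rand = torch.rand_like(x_data, dtype=torch.float)
print(x_rand)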
rand_tnsr = torch.rand(shape)
ones_tnsr = torch.ones(shape)
zeros_tnsr = torch.zeros(shape)
print(f"Random Tensor:\n{rand_tnsr} \n\n \
Three separate print statements, one on each line, are cleaner and better.
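i.e. something like this (a self-contained sketch reusing the variable names from the quoted hunk):

import torch

shape = (2, 3)
rand_tnsr = torch.rand(shape)
ones_tnsr = torch.ones(shape)
zeros_tnsr = torch.zeros(shape)

print(f"Random Tensor:\n{rand_tnsr}\n")
print(f"Ones Tensor:\n{ones_tnsr}\n")
print(f"Zeros Tensor:\n{zeros_tnsr}")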
print(tnsr)
######################################################################
# **Both of these are joining ops, but they are subtly different.**
Introduce one joining op in words and in code, and provide a reference to the other. The tutorial needs to introduce just enough to a new user.
You can join two tensors with torch.cat, which concatenates them along a given dimension.
t1 = torch.cat([tnsr, tnsr], dim=1)
print(t1)
Also see torch.stack, another subtly different tensor joining operation.
PyTorch is a Python-based scientific computing package for two broad purposes:

- A replacement for NumPy to use the power of GPUs.
- A deep learning research platform that provides maximum flexibility and speed
An automatic differentiation library that is often useful to implement neural networks
A Gentle Introduction to Autograd
---------------------------------

Autograd is PyTorch’s automatic differentiation engine that powers
Fix: "The torch.autograd package is PyTorch's automatic differentiation engine ..."
Context for Suraj: this is the real autograd package: https://github.com/HIPS/autograd - we used the name somewhat accidentally. So it's important to introduce it as torch.autograd or as the autograd package, not just Autograd. Please adjust references to Autograd in this tutorial as needed, given this context.
# .. math::
#
#
# \frac{\partial Q}{\partial a} = 9a^2
first list the original equation, for clarity, i.e. Q = 3a^3 - b^2
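For reference, listing the original function alongside both partials (consistent with the quoted line) would read:

Q = 3a^3 - b^2
\frac{\partial Q}{\partial a} = 9a^2
\frac{\partial Q}{\partial b} = -2b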
param.requires_grad = False
# Replace the last fully-connected layer
# Parameters of nn.Module instances have requires_grad=True by default
model.fc = nn.Linear(512, 100)
After this line, it's worth stating what happened:
Now all parameters in the model, except those of model.fc, do not compute gradients. Parameters that don't compute gradients are usually called frozen parameters.
The only parameters that compute gradients are the weights and bias of model.fc. Being the last layer of this network, model.fc is often called the classifier.
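Putting that together, the freeze-and-replace pattern being described might look like this (the output size and SGD settings are illustrative):

import torch
from torch import nn, optim
import torchvision

model = torchvision.models.resnet18(pretrained=True)

# Freeze every parameter in the network
for param in model.parameters():
    param.requires_grad = False

# Replace the last fully-connected layer; parameters of a new nn.Module
# have requires_grad=True by default, so only model.fc is trainable
model.fc = nn.Linear(512, 100)

# Only the weights and bias of model.fc compute gradients and get updated
optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)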
Adds a DAG figure for autograd, fixes formatting issues, improves readability
* Update tensor_tutorial.py
* Update autograd_tutorial.py
* Fix import
* Fix import
* Fix import
* Fixes issues in tensor, updates autograd
* Adds "what is pytorch" to homepage
* Fixes formatting
* Fix typo
* Fixes typo
* fix failing test
* Make suggested changes

Adds a DAG figure for autograd, fixes formatting issues, improves readability

Co-authored-by: Brian Johnson <brianjo@fb.com>
Refactor content of tensors and autograd in 60 Minute Blitz