Updating "Classifying Names with a Character-Level RNN" #2954
base: main
Conversation
…ain and test sets as well as simplifying content
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/2954

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures as of commit 5709eeb with merge base 11d9e5c.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hi @mgs28! Thank you for your pull request and welcome to our community.

**Action Required**

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

**Process**

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with `CLA Signed`. If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!
@svekars - it looks like you are active a lot in this repo, any chance you could help me with this? Thanks!
…to mgs28-char-rnn-update
Added functionality to process training data in mini batches to satisfy the original story. However, I had to use numpy + random to create batch indices from a given dataset. Also, simplified training so it is a closer match to https://pytorch.org/tutorials/beginner/basics/optimization_tutorial.html
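Something like the following is presumably what's meant by creating batch indices with numpy + random (names and sizes are illustrative, not the PR's exact code):

```python
import numpy as np

batch_size = 64
dataset_size = 1000  # e.g. len(training_data)

# shuffle all indices, then carve them into roughly equal minibatches
indices = np.random.permutation(dataset_size)
batches = np.array_split(indices, max(1, dataset_size // batch_size))
```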
@svekars - would you please help me with this tutorial update pull request or point me to someone who could?
cc: @spro
…rs, adding device config for CI steps, cleaning up documentation
@@ -3,6 +3,7 @@
 NLP From Scratch: Classifying Names with a Character-Level RNN
 **************************************************************
 **Author**: `Sean Robertson <https://github.com/spro>`_
+**Updated**: `Matthew Schultz <https://github.com/mgs28>`_
we typically don't add Update. Please remove this
done!
@svekars - anything else I can do to make this better? thanks!
@svekars - hello, are there other items I should address here? I appreciate your help with this!
Thanks for the PR! At a high level, I have the following comments / concerns:

- I think it's a good idea to update the tutorial to utilize PyTorch's `Dataset` and `DataLoader` abstractions.
  - I'm not sold on the need for the `NameData` class. It adds what I consider unnecessary complexity / code. It's perfectly simple to do any conversions between names <-> tensors as simple standalone functions utilized by the dataset (see the sketch after this list).
- I agree that proper train / test / validation splits should be done in the tutorial, so that's a nice addition.
- I'm okay with defining a custom `RNN` module for illustration purposes, although in practice we'd encourage the use of `nn.RNN`, and I'm not sure if this existed when the tutorial was originally written. One thing the officially provided module in `torch.nn` provides is better perf due to e.g. cuDNN-accelerated kernels. If we don't use this within the tutorial, I think it should at least be mentioned and recommended for the perf benefits. All that said, I have some comments on the `RNN` module defined in the tutorial:
  - It's a bit confusing to redefine it multiple times in the tutorial, adding stuff to it each time. I'd recommend a single definition.
  - The `learn()` API does not belong on `RNN` and I suggest leaving the training logic in a standalone `train()`. This way, it's more PyTorch-idiomatic and easier for users to switch to some third-party training API (e.g. ignite, PyTorch Lightning, etc.).
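As a minimal sketch of the standalone-function idea above (the character set and shapes here are illustrative, not the PR's exact code), a name-to-tensor conversion can live outside any class:

```python
import torch

allowed_characters = "abcdefghijklmnopqrstuvwxyz .,;'"  # illustrative character set

def lineToTensor(line):
    # one-hot encode each character of a name: shape (line_length, 1, n_letters)
    tensor = torch.zeros(len(line), 1, len(allowed_characters))
    for i, ch in enumerate(line):
        tensor[i][0][allowed_characters.find(ch)] = 1
    return tensor
```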
# ASCII).
#
# The first thing we need to define is our data items. In this case, we will create a class called NameData
# which will have an __init__ function to specify the input fields and some helper functions. Our first
Suggested change:
-# which will have an __init__ function to specify the input fields and some helper functions. Our first
+# which will have an ``__init__`` function to specify the input fields and some helper functions. Our first
#
# The first thing we need to define is our data items. In this case, we will create a class called NameData
# which will have an __init__ function to specify the input fields and some helper functions. Our first
# helper function will be __str__ to convert objects to strings for easy printing
Suggested change:
-# helper function will be __str__ to convert objects to strings for easy printing
+# helper function will be ``__str__`` to convert objects to strings for easy printing
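Based on the description above, a minimal `NameData` might look like the following (a hypothetical reconstruction for illustration, not the PR's exact class):

```python
class NameData:
    # hypothetical reconstruction of the data item described above
    def __init__(self, label, name):
        self.label = label  # language of origin
        self.name = name    # the name itself

    def __str__(self):
        # convert objects to strings for easy printing
        return f"label={self.label}, name={self.name}"

print(NameData("Polish", "Slusarski"))  # label=Polish, name=Slusarski
```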
# ``all_categories`` (just a list of languages) and ``n_categories`` for
# later reference.
#########################
#Now we can use that class to create a singe piece of data.
Suggested change:
-#Now we can use that class to create a singe piece of data.
+#Now we can use that class to create a single piece of data.
@@ -181,21 +255,22 @@ def lineToTensor(line):
 #
 # This RNN module implements a "vanilla RNN" an is just 3 linear layers
 # which operate on an input and hidden state, with a ``LogSoftmax`` layer
-# after the output.
+# after the output.s
Suggested change:
-# after the output.s
+# after the output.
…ng all RNN definition into one, moving RNN.learn() to separate train()
…ather than building it up
@jbschlosser - thank you for the lovely suggestions to improve. If possible, I'd like to split this into two things: first, the edits to my existing content.

Secondly, I would really like to use nn.RNN if possible. There are very few tutorials that mention it, and everyone seems to base their RNN builds on this tutorial. However, to solve this task I think I need a network with layers like [57, 128, 18], and it looks like the default Elman networks are limited to [57, 18]. Is it best practice to inherit from nn.RNN and add my own fully connected output layer, or am I misunderstanding something? Thanks!
To make it simpler, I assume extending the `nn.RNN` class might look like the following (which runs about 40% faster): `class MyRNN(nn.RNN): ...`
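The body of that snippet was not preserved in this page; a hypothetical sketch of the inheritance approach, using the [57, 128, 18] sizes from the discussion, might look like:

```python
import torch
import torch.nn as nn

class MyRNN(nn.RNN):
    # hypothetical sketch: subclass nn.RNN and bolt on a fully connected output layer
    def __init__(self, input_size=57, hidden_size=128, output_size=18):
        super().__init__(input_size, hidden_size)
        self.h2o = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, line_tensor):
        # line_tensor: (line_length, batch, input_size)
        # hidden: (num_layers, batch, hidden_size)
        rnn_out, hidden = super().forward(line_tensor)
        return self.softmax(self.h2o(hidden[0]))
```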
Rather than inherit, we generally encourage composition. In this case, something like:
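The code block from this reply was likewise not preserved; a sketch of the composition pattern being suggested (names are illustrative) would hold `nn.RNN` as a submodule rather than subclassing it:

```python
import torch
import torch.nn as nn

class CharRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size)       # composed, not inherited
        self.h2o = nn.Linear(hidden_size, output_size)   # map hidden state to classes
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, line_tensor):
        rnn_out, hidden = self.rnn(line_tensor)
        return self.softmax(self.h2o(hidden[0]))
```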
Thanks @jbschlosser! I used `nn.RNN` in composition and changed some of the surrounding text. That forced the addition of a few terms to the repo dictionary. I appreciate you teaching me something again, and hopefully the tutorial is better for it.
Looks pretty good, thanks for the updates!
The confusion matrix does look quite a bit worse than pre-changes though; can this be entirely attributed to the use of a test set? Maybe we need a bit more time for training to converge?
Co-authored-by: Joel Schlosser <75754324+jbschlosser@users.noreply.github.com>
@jbschlosser - thanks! If I train with `all_losses = train(rnn, alldata, n_epoch=100, learning_rate=0.1, report_every=5)` and evaluate on `alldata`, then I get a bright diagonal line that looks pretty similar to the original. I imagine with some parameter tuning I could get closer.
It's a good point that we want to balance this some. That said, I think it'd be nice to be a little closer to the original (at least somewhat of a diagonal line in the confusion matrix). Hopefully we can strike a reasonable balance where we're beginning to see a diagonal trend. No need to spend a ton of time parameter tuning though :) that's fine to leave as an exercise for the reader.
…hanged epochs, training rate, more of split to training data
No problem @jbschlosser - I tuned some of the parameters to get pretty close to a diagonal confusion matrix. It still gets a little confused between English and Scottish as well as Korean and Chinese. However, there's a strong diagonal in the confusion matrix. Thanks!
Thanks for the updates! I'm good with it; will let @svekars make the final approval :)
@svekars - do you have any further comments on this update? I'm happy to address them. Thanks!
An editorial pass, otherwise looks good.
@@ -22,19 +22,6 @@
 of origin, and predict which language a name is from based on the
 spelling:
Suggested change:
-spelling:
+spelling.
# line, mostly romanized (but we still need to convert from Unicode to
# ASCII).
#
# The first thing we need to define and clean our data. First off, we need to convert Unicode to plain ASCII to
Suggested change:
-# The first thing we need to define and clean our data. First off, we need to convert Unicode to plain ASCII to
+# The first step is to define and clean our data. Initially, we need to convert Unicode to plain ASCII to
# ASCII).
#
# The first thing we need to define and clean our data. First off, we need to convert Unicode to plain ASCII to
# limit the RNN input layers. This is accomplished by converting Unicode strings to ASCII and allowing a small set of allowed characters (allowed_characters)
Suggested change:
-# limit the RNN input layers. This is accomplished by converting Unicode strings to ASCII and allowing a small set of allowed characters (allowed_characters)
+# limit the RNN input layers. This is accomplished by converting Unicode strings to ASCII and allowing only a small set of allowed characters:
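A sketch of this Unicode-to-ASCII cleanup, assuming the standard-library approach (NFD normalization via `unicodedata` plus a character whitelist) that this tutorial family uses:

```python
import string
import unicodedata

allowed_characters = string.ascii_letters + " .,;'"

def unicodeToAscii(s):
    # decompose accented characters, then drop combining marks
    # and anything outside the allowed set
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn' and c in allowed_characters
    )

print(unicodeToAscii('Ślusàrski'))  # Slusarski
```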
# Next, we need to combine all our examples into a dataset so we can train, text and validate our models. For this,
# we will use the `Dataset and DataLoader <https://pytorch.org/tutorials/beginner/basics/data_tutorial.html>` classes
# to hold our dataset. Each Dataset needs to implement three functions: __init__, __len__, and __getitem__.
Suggested change:
-# Next, we need to combine all our examples into a dataset so we can train, text and validate our models. For this,
-# we will use the `Dataset and DataLoader <https://pytorch.org/tutorials/beginner/basics/data_tutorial.html>` classes
-# to hold our dataset. Each Dataset needs to implement three functions: __init__, __len__, and __getitem__.
+# Next, we need to combine all our examples into a dataset so we can train, test, and validate our models. For this,
+# we will use the `Dataset and DataLoader <https://pytorch.org/tutorials/beginner/basics/data_tutorial.html>`__ classes
+# to hold our dataset. Each Dataset needs to implement three functions: ``__init__``, ``__len__``, and ``__getitem__``.
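A minimal sketch of a `Dataset` implementing those three functions (illustrative only; the PR's actual class presumably also loads the name files and builds tensors):

```python
from torch.utils.data import Dataset

class NamesDataset(Dataset):
    def __init__(self, samples):
        # samples: a list of preprocessed (label_tensor, line_tensor) pairs
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]
```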
#########################
#Here we can load our example data into the NamesDataset
Suggested change:
-#Here we can load our example data into the NamesDataset
+#Here we can load our example data into the ``NamesDataset``:
all_losses = []
# We do this by defining a train() function which trains on a given dataset with minibatches. RNNs
# train similar to other networks so for completeness we include a batched training method here.
# The loop (for i in batch) computes the losses for each of the items in the batch before adjusting the
Suggested change:
-# The loop (for i in batch) computes the losses for each of the items in the batch before adjusting the
+# The loop (``for i in batch``) computes the losses for each of the items in the batch before adjusting the
# We do this by defining a train() function which trains on a given dataset with minibatches. RNNs
# train similar to other networks so for completeness we include a batched training method here.
# The loop (for i in batch) computes the losses for each of the items in the batch before adjusting the
# weights. This is repeated until the number of epochs is reached.
Suggested change:
-# weights. This is repeated until the number of epochs is reached.
+# weights. This operation is repeated until the number of epochs is reached.
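A rough sketch of the minibatched `train()` loop described here (hypothetical signature and names; it assumes dataset items index as `(label_tensor, line_tensor)` pairs):

```python
import random
import torch
import torch.nn as nn

def train(rnn, training_data, n_epoch=10, n_batch_size=64, learning_rate=0.2):
    criterion = nn.NLLLoss()
    optimizer = torch.optim.SGD(rnn.parameters(), lr=learning_rate)
    rnn.train()
    for epoch in range(n_epoch):
        # shuffle indices and carve them into minibatches
        indices = list(range(len(training_data)))
        random.shuffle(indices)
        for start in range(0, len(indices), n_batch_size):
            batch = indices[start:start + n_batch_size]
            batch_loss = 0
            for i in batch:  # accumulate the loss for each item in the batch
                label_tensor, line_tensor = training_data[i]
                output = rnn(line_tensor)
                batch_loss = batch_loss + criterion(output, label_tensor)
            # adjust the weights once per batch
            optimizer.zero_grad()
            batch_loss.backward()
            optimizer.step()
```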
    s -= m * 60
    return '%dm %ds' % (m, s)
##########################################################################
# We can now train a dataset with mini batches for a specified number of epochs
mini batches or minibatches? I see all three versions online: mini batch, mini-batch, minibatch. It seems like it should be one word and I'd rather not use hyphens, so "minibatch" may be a better way.

Suggested change:
-# We can now train a dataset with mini batches for a specified number of epochs
+# We can now train a dataset with minibatches for a specified number of epochs
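For reference, the `timeSince` helper whose tail (`s -= m * 60; return '%dm %ds' % (m, s)`) appears in the diff above looks, in full, something like this (reconstructed from the fragment and the usual tutorial helper):

```python
import math
import time

def timeSince(since):
    # format elapsed wall-clock time as "Xm Ys"
    now = time.time()
    s = now - since
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)
```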
# - Get better results with a bigger and/or better shaped network
#
# - Add more linear layers
# - Vary the hyperparameters to improve performance (e.g. change epochs, batch size, learning rate )
Suggested change:
-# - Vary the hyperparameters to improve performance (e.g. change epochs, batch size, learning rate )
+# - Adjust the hyperparameters to enhance performance, such as changing the number of epochs, batch size, and learning rate
# - Try the ``nn.LSTM`` and ``nn.GRU`` layers
# - Change the size of the layers (e.g. fewer or more hidden nodes, additional linear layers)
Suggested change:
-# - Change the size of the layers (e.g. fewer or more hidden nodes, additional linear layers)
+# - Modify the size of the layers, such as increasing or decreasing the number of hidden nodes or adding additional linear layers
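As a sketch of the `nn.LSTM` / `nn.GRU` exercise suggested above (hypothetical; it swaps the recurrent submodule in a composition-style model like the one earlier in this thread):

```python
import torch
import torch.nn as nn

class CharGRU(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.rnn = nn.GRU(input_size, hidden_size)  # drop-in swap for nn.RNN
        self.h2o = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, line_tensor):
        rnn_out, hidden = self.rnn(line_tensor)
        # note: nn.LSTM instead returns (h_n, c_n) as its hidden state
        return self.softmax(self.h2o(hidden[0]))
```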
Fixes #1166

Description
Updating Sean's excellent RNN classification tutorial, which is now 8 years old and missing some newer PyTorch functionality.

Checklist

cc @albanD