Updating "Classifying Names with a Character-Level RNN" #2954
base: main
Conversation
…ain and test sets as well as simplifying content
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/2954

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures as of commit 5709eeb with merge base 11d9e5c.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hi @mgs28! Thank you for your pull request and welcome to our community.

**Action Required**

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

**Process**

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with `CLA Signed`. If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!
@svekars - it looks like you are active a lot in this repo, any chance you could help me with this? Thanks!
…to mgs28-char-rnn-update
Added functionality to process training data in mini batches to satisfy the original story. However, I had to use numpy + random to create batch indices from a given dataset. Also, simplified training so it is a closer match to https://pytorch.org/tutorials/beginner/basics/optimization_tutorial.html
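Something like the following is presumably what's meant by creating batch indices with numpy + random (names and sizes are illustrative, not the PR's exact code):

```python
import numpy as np

batch_size = 64
dataset_size = 1000  # e.g. len(training_data)

# shuffle all indices, then carve them into roughly equal minibatches
indices = np.random.permutation(dataset_size)
batches = np.array_split(indices, max(1, dataset_size // batch_size))
```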
@svekars - would you please help me with this tutorial update pull request or point me to someone who could?
cc: @spro
…rs, adding device config for CI steps, cleaning up documentation
@@ -3,6 +3,7 @@
 NLP From Scratch: Classifying Names with a Character-Level RNN
 **************************************************************
 **Author**: `Sean Robertson <https://github.com/spro>`_
+**Updated**: `Matthew Schultz <https://github.com/mgs28>`_
we typically don't add Update. Please remove this
done!
@svekars - anything else I can do to make this better? thanks!
@svekars - hello, are there other items I should address here? I appreciate your help with this!
Thanks for the PR! At a high level, I have the following comments / concerns:

- I think it's a good idea to update the tutorial to utilize PyTorch's `Dataset` and `DataLoader` abstractions.
  - I'm not sold on the need for the `NameData` class. It adds what I consider unnecessary complexity / code. It's perfectly simple to do any conversions between names <-> tensors as simple standalone functions utilized by the dataset (see the sketch after this list).
- I agree that proper train / test / validation splits should be done in the tutorial, so that's a nice addition.
- I'm okay with defining a custom `RNN` module for illustration purposes, although in practice we'd encourage the use of `nn.RNN`, and I'm not sure if this existed when the tutorial was originally written. One thing the officially provided module in `torch.nn` provides is better perf due to e.g. cuDNN-accelerated kernels. If we don't use this within the tutorial, I think it should at least be mentioned and recommended for the perf benefits. All that said, I have some comments on the `RNN` module defined in the tutorial:
  - It's a bit confusing to redefine it multiple times in the tutorial, adding stuff to it each time. I'd recommend a single definition.
  - The `learn()` API does not belong on `RNN` and I suggest leaving the training logic in a standalone `train()`. This way, it's more PyTorch-idiomatic and easier for users to switch to some third-party training API (e.g. ignite, PyTorch Lightning, etc.).
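As a minimal sketch of the standalone-function idea above (the character set and shapes here are illustrative, not the PR's exact code), a name-to-tensor conversion can live outside any class:

```python
import torch

allowed_characters = "abcdefghijklmnopqrstuvwxyz .,;'"  # illustrative character set

def lineToTensor(line):
    # one-hot encode each character of a name: shape (line_length, 1, n_letters)
    tensor = torch.zeros(len(line), 1, len(allowed_characters))
    for i, ch in enumerate(line):
        tensor[i][0][allowed_characters.find(ch)] = 1
    return tensor
```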
# ASCII).
#
# The first thing we need to define is our data items. In this case, we will create a class called NameData
# which will have an __init__ function to specify the input fields and some helper functions. Our first
Suggested change:
-# which will have an __init__ function to specify the input fields and some helper functions. Our first
+# which will have an ``__init__`` function to specify the input fields and some helper functions. Our first
#
# The first thing we need to define is our data items. In this case, we will create a class called NameData
# which will have an __init__ function to specify the input fields and some helper functions. Our first
# helper function will be __str__ to convert objects to strings for easy printing
Suggested change:
-# helper function will be __str__ to convert objects to strings for easy printing
+# helper function will be ``__str__`` to convert objects to strings for easy printing
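Based on the description above, a minimal `NameData` might look like the following (a hypothetical reconstruction for illustration, not the PR's exact class):

```python
class NameData:
    # hypothetical reconstruction of the data item described above
    def __init__(self, label, name):
        self.label = label  # language of origin
        self.name = name    # the name itself

    def __str__(self):
        # convert objects to strings for easy printing
        return f"label={self.label}, name={self.name}"

print(NameData("Polish", "Slusarski"))  # label=Polish, name=Slusarski
```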
# ``all_categories`` (just a list of languages) and ``n_categories`` for
# later reference.
#########################
#Now we can use that class to create a singe piece of data.
Suggested change:
-#Now we can use that class to create a singe piece of data.
+#Now we can use that class to create a single piece of data.
@@ -181,21 +255,22 @@ def lineToTensor(line):
 #
 # This RNN module implements a "vanilla RNN" an is just 3 linear layers
 # which operate on an input and hidden state, with a ``LogSoftmax`` layer
-# after the output.
+# after the output.s
Suggested change:
-# after the output.s
+# after the output.
…ng all RNN definition into one, moving RNN.learn() to separate train()
…ather than building it up
@jbschlosser - thank you for the lovely suggestions to improve. If possible, I'd like to split this into two things: first, the edits to my existing content.

Secondly, I would really like to use nn.RNN if possible. There are very few tutorials that mention it, and everyone seems to base their RNN builds on this tutorial. However, to solve this task I think I need a network with layers like [57, 128, 18], and it looks like the default Elman networks are limited to [57, 18]. Is it best practice to inherit from nn.RNN and add my own fully connected output layer, or am I misunderstanding something? Thanks!
To make it simpler, I assume extending the `nn.RNN` class might look like the following (which runs about 40% faster): `class MyRNN(nn.RNN): ...`
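The body of that snippet was not preserved in this page; a hypothetical sketch of the inheritance approach, using the [57, 128, 18] sizes from the discussion, might look like:

```python
import torch
import torch.nn as nn

class MyRNN(nn.RNN):
    # hypothetical sketch: subclass nn.RNN and bolt on a fully connected output layer
    def __init__(self, input_size=57, hidden_size=128, output_size=18):
        super().__init__(input_size, hidden_size)
        self.h2o = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, line_tensor):
        # line_tensor: (line_length, batch, input_size)
        # hidden: (num_layers, batch, hidden_size)
        rnn_out, hidden = super().forward(line_tensor)
        return self.softmax(self.h2o(hidden[0]))
```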
Rather than inherit, we generally encourage composition. In this case, something like:
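The code block from this reply was likewise not preserved; a sketch of the composition pattern being suggested (names are illustrative) would hold `nn.RNN` as a submodule rather than subclassing it:

```python
import torch
import torch.nn as nn

class CharRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size)       # composed, not inherited
        self.h2o = nn.Linear(hidden_size, output_size)   # map hidden state to classes
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, line_tensor):
        rnn_out, hidden = self.rnn(line_tensor)
        return self.softmax(self.h2o(hidden[0]))
```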
Thanks @jbschlosser! I used `nn.RNN` in composition and changed some of the surrounding text. That forced the addition of a few terms to the repo dictionary. I appreciate you teaching me something again, and hopefully the tutorial is better for it.
Looks pretty good, thanks for the updates!
The confusion matrix does look quite a bit worse than pre-changes though; can this be entirely attributed to the use of a test set? Maybe we need a bit more time for training to converge?
Co-authored-by: Joel Schlosser <75754324+jbschlosser@users.noreply.github.com>
@jbschlosser - thanks! If I train with `all_losses = train(rnn, alldata, n_epoch=100, learning_rate=0.1, report_every=5)` and evaluate on `alldata`, then I get a bright diagonal line that looks pretty similar to the original. I imagine with some parameter tuning I could get closer.
It's a good point that we want to balance this some. That said, I think it'd be nice to be a little closer to the original (at least somewhat of a diagonal line in the confusion matrix). Hopefully we can strike a reasonable balance where we're beginning to see a diagonal trend. No need to spend a ton of time parameter tuning though :) that's fine to leave as an exercise for the reader.
…hanged epochs, training rate, more of split to training data
No problem @jbschlosser - I tuned some of the parameters to get pretty close to a diagonal confusion matrix. It still gets a little confused between English and Scottish as well as Korean and Chinese. However, there's a strong diagonal in the confusion matrix. Thanks!
Thanks for the updates! I'm good with it; will let @svekars make the final approval :)
@svekars - do you have any further comments on this update? I'm happy to address them. Thanks!
An editorial pass, otherwise looks good.
@@ -22,19 +22,6 @@
 of origin, and predict which language a name is from based on the
 spelling:
Suggested change:
-spelling:
+spelling.
# line, mostly romanized (but we still need to convert from Unicode to
# ASCII).
#
# The first thing we need to define and clean our data. First off, we need to convert Unicode to plain ASCII to
Suggested change:
-# The first thing we need to define and clean our data. First off, we need to convert Unicode to plain ASCII to
+# The first step is to define and clean our data. Initially, we need to convert Unicode to plain ASCII to
# ASCII).
#
# The first thing we need to define and clean our data. First off, we need to convert Unicode to plain ASCII to
# limit the RNN input layers. This is accomplished by converting Unicode strings to ASCII and allowing a small set of allowed characters (allowed_characters)
Suggested change:
-# limit the RNN input layers. This is accomplished by converting Unicode strings to ASCII and allowing a small set of allowed characters (allowed_characters)
+# limit the RNN input layers. This is accomplished by converting Unicode strings to ASCII and allowing only a small set of allowed characters:
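A sketch of this Unicode-to-ASCII cleanup, assuming the standard-library approach (NFD normalization via `unicodedata` plus a character whitelist) that this tutorial family uses:

```python
import string
import unicodedata

allowed_characters = string.ascii_letters + " .,;'"

def unicodeToAscii(s):
    # decompose accented characters, then drop combining marks
    # and anything outside the allowed set
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn' and c in allowed_characters
    )

print(unicodeToAscii('Ślusàrski'))  # Slusarski
```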
# Next, we need to combine all our examples into a dataset so we can train, text and validate our models. For this,
# we will use the `Dataset and DataLoader <https://pytorch.org/tutorials/beginner/basics/data_tutorial.html>` classes
# to hold our dataset. Each Dataset needs to implement three functions: __init__, __len__, and __getitem__.
Suggested change:
-# Next, we need to combine all our examples into a dataset so we can train, text and validate our models. For this,
-# we will use the `Dataset and DataLoader <https://pytorch.org/tutorials/beginner/basics/data_tutorial.html>` classes
-# to hold our dataset. Each Dataset needs to implement three functions: __init__, __len__, and __getitem__.
+# Next, we need to combine all our examples into a dataset so we can train, test, and validate our models. For this,
+# we will use the `Dataset and DataLoader <https://pytorch.org/tutorials/beginner/basics/data_tutorial.html>`__ classes
+# to hold our dataset. Each Dataset needs to implement three functions: ``__init__``, ``__len__``, and ``__getitem__``.
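A minimal sketch of a `Dataset` implementing those three functions (illustrative only; the PR's actual class presumably also loads the name files and builds tensors):

```python
from torch.utils.data import Dataset

class NamesDataset(Dataset):
    def __init__(self, samples):
        # samples: a list of preprocessed (label_tensor, line_tensor) pairs
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]
```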
#########################
#Here we can load our example data into the NamesDataset
Suggested change:
-#Here we can load our example data into the NamesDataset
+#Here we can load our example data into the ``NamesDataset``:
all_losses = []
# We do this by defining a train() function which trains on a given dataset with minibatches. RNNs
# train similar to other networks so for completeness we include a batched training method here.
# The loop (for i in batch) computes the losses for each of the items in the batch before adjusting the
Suggested change:
-# The loop (for i in batch) computes the losses for each of the items in the batch before adjusting the
+# The loop (``for i in batch``) computes the losses for each of the items in the batch before adjusting the
# We do this by defining a train() function which trains on a given dataset with minibatches. RNNs
# train similar to other networks so for completeness we include a batched training method here.
# The loop (for i in batch) computes the losses for each of the items in the batch before adjusting the
# weights. This is repeated until the number of epochs is reached.
Suggested change:
-# weights. This is repeated until the number of epochs is reached.
+# weights. This operation is repeated until the number of epochs is reached.
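A rough sketch of the minibatched `train()` loop described here (hypothetical signature and names; it assumes dataset items index as `(label_tensor, line_tensor)` pairs):

```python
import random
import torch
import torch.nn as nn

def train(rnn, training_data, n_epoch=10, n_batch_size=64, learning_rate=0.2):
    criterion = nn.NLLLoss()
    optimizer = torch.optim.SGD(rnn.parameters(), lr=learning_rate)
    rnn.train()
    for epoch in range(n_epoch):
        # shuffle indices and carve them into minibatches
        indices = list(range(len(training_data)))
        random.shuffle(indices)
        for start in range(0, len(indices), n_batch_size):
            batch = indices[start:start + n_batch_size]
            batch_loss = 0
            for i in batch:  # accumulate the loss for each item in the batch
                label_tensor, line_tensor = training_data[i]
                output = rnn(line_tensor)
                batch_loss = batch_loss + criterion(output, label_tensor)
            # adjust the weights once per batch
            optimizer.zero_grad()
            batch_loss.backward()
            optimizer.step()
```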
    s -= m * 60
    return '%dm %ds' % (m, s)
##########################################################################
# We can now train a dataset with mini batches for a specified number of epochs
mini batches or minibatches? I see all three versions online: mini batch, mini-batch, minibatch. It seems like it should be one word and I'd rather not use hyphens, so "minibatch" may be a better way.

Suggested change:
-# We can now train a dataset with mini batches for a specified number of epochs
+# We can now train a dataset with minibatches for a specified number of epochs
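For reference, the `timeSince` helper whose tail (`s -= m * 60; return '%dm %ds' % (m, s)`) appears in the diff above looks, in full, something like this (reconstructed from the fragment and the usual tutorial helper):

```python
import math
import time

def timeSince(since):
    # format elapsed wall-clock time as "Xm Ys"
    now = time.time()
    s = now - since
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)
```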
# - Get better results with a bigger and/or better shaped network
#
# - Add more linear layers
# - Vary the hyperparameters to improve performance (e.g. change epochs, batch size, learning rate )
Suggested change:
-# - Vary the hyperparameters to improve performance (e.g. change epochs, batch size, learning rate )
+# - Adjust the hyperparameters to enhance performance, such as changing the number of epochs, batch size, and learning rate
# - Try the ``nn.LSTM`` and ``nn.GRU`` layers
# - Change the size of the layers (e.g. fewer or more hidden nodes, additional linear layers)
Suggested change:
-# - Change the size of the layers (e.g. fewer or more hidden nodes, additional linear layers)
+# - Modify the size of the layers, such as increasing or decreasing the number of hidden nodes or adding additional linear layers
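As a sketch of the `nn.LSTM` / `nn.GRU` exercise suggested above (hypothetical; it swaps the recurrent submodule in a composition-style model like the one earlier in this thread):

```python
import torch
import torch.nn as nn

class CharGRU(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.rnn = nn.GRU(input_size, hidden_size)  # drop-in swap for nn.RNN
        self.h2o = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, line_tensor):
        rnn_out, hidden = self.rnn(line_tensor)
        # note: nn.LSTM instead returns (h_n, c_n) as its hidden state
        return self.softmax(self.h2o(hidden[0]))
```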
Fixes #1166

Description
Updating Sean's excellent RNN classification tutorial, which is now 8 years old and missing some newer PyTorch functionality.

Checklist

cc @albanD