Adding an example on handwriting recognition #594

Merged
merged 11 commits into from
Aug 25, 2021

Conversation

sayakpaul
Contributor

A Colab Notebook is available here.

Cc: @AakashKumarNain

Collaborator

@haifeng-jin haifeng-jin left a comment

Thank you for your contribution!

[IAM Dataset](https://fki.tic.heia-fr.ch/databases/iam-handwriting-database)
that has variable length ground-truths. IAM Dataset is widely used across many OCR
benchmarks so we hope this example serves as a good starting point.
"""
Collaborator

Can we add a little more description of the dataset? For example, that each sample in the dataset is an image of hand-written sentences and that the prediction target of a sample is a string.

Contributor Author

Done.

"""
## Introduction

This example shows how the [Captcha OCR](https://keras.io/examples/vision/captcha_ocr/)
Collaborator

Maybe add a one-sentence introduction to what OCR is.

Contributor Author

I don't think this is required since it's a sequel example.

"""


class CTCLayer(keras.layers.Layer):
Collaborator

Do we have to make the loss a Layer instead of a loss function or Loss subclass?

Contributor Author

Our idea with this example is to keep it as close to the Captcha OCR example as possible while showing the bits that need to be changed. IMO, that helps to ensure a good reading experience. Ccing @AakashKumarNain if he has other points of view.

Contributor

@AakashKumarNain AakashKumarNain Aug 19, 2021

When defining a normal loss function, we generally assume that y_true and y_pred have the same shape, which isn't the case here. Defining it as an endpoint layer makes it easy to calculate the lengths of the inputs and the labels on the fly during training, and then pass them to ctc_batch_cost(...).
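
To make the shape mismatch concrete, here is a framework-free sketch of the length bookkeeping an endpoint layer could do before calling ctc_batch_cost(...). The helper name `ctc_lengths`, the nested-list inputs, and the choice to use the padded label length for every sample are assumptions made for illustration; the layer in the PR operates on tensors and is not reproduced in this thread.

```python
def ctc_lengths(y_true, y_pred):
    """Derive the two length columns a CTC loss needs from one batch.

    y_true: padded label ids, shape (batch, max_label_len)
    y_pred: per-timestep class scores, shape (batch, time_steps, num_classes)
    """
    batch_size = len(y_true)
    time_steps = len(y_pred[0])      # length of every prediction sequence
    max_label_len = len(y_true[0])   # padded length of every label sequence (assumption)

    # One row per sample, i.e. shape (batch, 1) for both results.
    input_length = [[time_steps] for _ in range(batch_size)]
    label_length = [[max_label_len] for _ in range(batch_size)]
    return input_length, label_length


# Hypothetical batch: 2 samples, labels padded to 3 ids,
# predictions over 5 timesteps and 4 classes.
y_true = [[1, 2, 0], [3, 0, 0]]
y_pred = [[[0.25] * 4 for _ in range(5)] for _ in range(2)]

input_length, label_length = ctc_lengths(y_true, y_pred)
print(input_length, label_length)
```

Here y_true is (batch, max_label_len) while y_pred is (batch, time_steps, num_classes), so a plain loss(y_true, y_pred) signature has no obvious slot for the two length arguments. Computing them inside a layer sidesteps that.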

Collaborator

@haifeng-jin haifeng-jin left a comment

Thank you for the updates! LGTM.

@sayakpaul
Contributor Author

@haifeng-jin,
@AakashKumarNain and I may have another component to add to this tutorial. We are working on that. So, please expect a bit of delay as we finalize things.

@haifeng-jin
Collaborator

@sayakpaul Sure, just ping me when it's ready. Thank you!

@sayakpaul
Contributor Author

sayakpaul commented Aug 21, 2021

@haifeng-jin now, it's good to go for another round of review. We incorporated Edit Distance as an evaluation metric.
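
For readers unfamiliar with the metric, the following is a minimal plain-Python sketch of edit (Levenshtein) distance between a predicted string and its ground truth, averaged over a batch. The thread does not show how the example computes the metric, so the function names `edit_distance` and `mean_edit_distance` are hypothetical and this is not the code from the PR.

```python
def edit_distance(predicted, target):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn `predicted` into `target`."""
    # previous[j] holds the distance between the processed prefix of
    # `predicted` and the first j characters of `target`.
    previous = list(range(len(target) + 1))
    for i, p_char in enumerate(predicted, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (p_char != t_char)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def mean_edit_distance(predictions, targets):
    """Average edit distance over paired predictions and targets."""
    distances = [edit_distance(p, t) for p, t in zip(predictions, targets)]
    return sum(distances) / len(distances)


# Hypothetical model outputs versus ground-truth words.
print(mean_edit_distance(["hello", "wrld"], ["hello", "world"]))
```

A lower value means the predicted strings are closer to the labels. Unlike exact-match accuracy, the metric gives partial credit to predictions that are off by only a character or two.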

Collaborator

@haifeng-jin haifeng-jin left a comment

LGTM, Thanks for the update!

Contributor

@fchollet fchollet left a comment

Awesome example! Very strong follow-up to the original OCR example.

I added a few minor copyedits. Please add the generated files. Thanks @haifeng-jin for the review.

@sayakpaul
Contributor Author

@fchollet added the generated files. The only major change is the reduced number of epochs, because my system kept stalling after 10 epochs. I also tried a commodity GCP Notebook instance, but that didn't help much.

Cc: @AakashKumarNain

@AakashKumarNain
Contributor

Thanks for the review, @haifeng-jin @fchollet.

@sayakpaul should we run this on a bigger VM (only if it is required)?

@sayakpaul
Contributor Author

sayakpaul commented Aug 25, 2021

I don't think that's required, given that we have explicitly noted that the number of epochs should be at least 50. However, if you want to do it, go right ahead.

@fchollet
Contributor

Yes, I think the restriction should be fine. Thank you!

@fchollet fchollet merged commit 4891e16 into keras-team:master Aug 25, 2021
@fchollet
Contributor

@sayakpaul @AakashKumarNain there were only 2 image files included in the PR, but the example is intended to have 3 figures (it's missing the last one). Please add the missing figure (in a new PR)

@sayakpaul
Contributor Author

> but the example is intended to have 3 figures (it's missing the last one). Please add the missing figure (in a new PR)

@fchollet it actually plots two figures and uses markdown for the other one. See here.

4 participants