-
Notifications
You must be signed in to change notification settings - Fork 703
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
"Competitive Example" (2.5): Sequence-to-sequence #1391
base: master
Are you sure you want to change the base?
Conversation
I think I find a bug in my code. I will fix it and submit the commit very soon. |
Thanks! Is this fixed and ready for review? |
@neubig Yes. The bug is fixed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, thanks for contributing! I added a few comments.
Also, there are ".DS_store" files that need to be removed.
|
||
## Usage (Dynet) | ||
|
||
The architecture of the dynet model `seq2seq_dynet.py` is the same as that in [PyTorch Example](https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html). We here implement the attention mechanism in the model. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dynet -> Dynet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified.
|
||
The architecture of the dynet model is shown as follows. | ||
|
||
```python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this should be included in README.md
, because if we make changes to the actual code they might get out of sync. Let's keep the code in .py
files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted the architecture part in README.md.
|
||
Then, run the training: | ||
|
||
<pre> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using <pre>
tags, can you just use 4 spaces of indent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified.
|
||
| Time (D) | Iteration (D) | Loss (D) | Time (P) | Iteration (P) | Loss (P)| | ||
| --- | --- | --- | --- | --- | --- | | ||
| 0m 28s | 5000 5% | 3.2687 | 1m 30s | 5000 5% | 2.8794 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The loss of DyNet and PyTorch are quite different, even at the start. Maybe this is due to different initialization of the parameters? Could you try to match the parameter initialization strategies between the two toolkits and see if they come more in line? Because PyTorch is better, you could try to match the PyTorch initialization strategy with DyNet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3.2687 and 2.8794 are actually the loss after 5000 iterations for both Dynet and Pytorch. I listed the loss at the start of the training procedure for both Dynet example and PyTorch example. The initial loss of DyNet is 7.9808, and the initial loss of PyTorch is 7.9615.
Hi @gpengzhi , thanks for the PR! Do you think you'll have time to fix the lint errors? |
Hi, @pmichel31415 I have already fixed the lint errors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great contribution @gpengzhi ! I have a few comments here and there can you take a look? The main points of contention are:
- Big discrepancy in loss between dynet/pytorch, can you check that the hyperparameters are as close as possible in both cases (eg the learning rate is different)
- Add docstrings so that people can navigate the code more easily if they want to reuse it
- Change function names to snake_case
| 8m 12s | 85000 85% | 0.7621 | 21m 44s | 85000 85% | 0.5210 | | ||
| 8m 41s | 90000 90% | 0.7453 | 22m 55s | 90000 90% | 0.5054 | | ||
| 9m 10s | 95000 95% | 0.6795 | 24m 9s | 95000 95% | 0.4417 | | ||
| 9m 39s | 100000 100% | 0.6442 | 25m 24s | 100000 100% | 0.4297 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This big difference is still weird to me. Can we make these numbers similar so that dynet is competitive with pytorch?
For instance I've noticed that there are some differences in the optimization procedure (dynet uses learning rate 0.2 vs pytorch has learning rate 0.01 if I'm not mistaken).
Also do you have any intuition on why there is such a big speed difference? Do you think it is due to your setup or the differences in implementation?
|
||
Here is the comparison between Dynet and PyTorch on the [seq2seq translator example](https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html). | ||
|
||
The data we used is a set of many thousands of English to French translation pairs. Download the data from [here](https://download.pytorch.org/tutorial/data.zip) and extract it to the current directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you credit the original source of the data like in the pytorch tutorial: https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html#loading-data-files
|
||
Format: | ||
|
||
<pre> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<pre>
-> ```
|
||
> vous etes tellement belle dans cette robe ! | ||
= you re so beautiful in that dress . | ||
< you re so beautiful in that dress . <EOS> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not have the same examples for pytorch and dynet? For comparison.
EOS_token = 1 | ||
|
||
|
||
class Lang(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a small docstring for each class/method/function? So that it is easier for people who want to reuse the code?
self.index2word = {0: "SOS", 1: "EOS"} | ||
self.n_words = 2 | ||
|
||
def addSentence(self, sentence): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use snake_case for function/method/variable names (but not class names) to be consistent with the dynet API. You should probably do it for the pytorch example as well.
EOS_token = 1 | ||
|
||
|
||
class Lang: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you put this class and all the other utilities that are not dependent on the framework in a separate file like utils.py
? To avoid code duplication
import dynet as dy | ||
import time | ||
import math | ||
r = random.SystemRandom() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you fix the random seed (for reproducibility). Please also fix the dynet/pytorch random seeds as well
|
||
for _ in range(n): | ||
pair = r.choice(pairs) | ||
print('>', pair[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very nitpicky comment but can you use "
or '
consistently everywhere for strings? A simple search and replace should fix it. I'm partial to "
.
Here is a comparison between Dynet and PyTorch on the seq2seq translator example used in PyTorch tutorial (https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html).