flake fixes
Merge branch 'Ubiqus-master' into refactoring2
vince62s committed Jun 24, 2018
2 parents f512ea5 + 7ebf112 commit dfb44b0
Showing 57 changed files with 4,127 additions and 259 deletions.
1 change: 1 addition & 0 deletions .travis.yml
@@ -81,3 +81,4 @@ matrix:
- cd docs/; make html; cd ..
- set -e
- doctr deploy --built-docs docs/build/html/ .

2 changes: 1 addition & 1 deletion LICENSE.md
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2017 OpenNMT
Copyright (c) 2017-Present OpenNMT

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
3 changes: 2 additions & 1 deletion README.md
@@ -47,9 +47,9 @@ The following OpenNMT features are implemented:
- [Speech-to-text processing](http://opennmt.net/OpenNMT-py/speech2text.html)
- ["Attention is all you need"](http://opennmt.net/OpenNMT-py/FAQ.html#how-do-i-use-the-transformer-model)
- Inference time loss functions.
- multi-GPU (Torch Distributed)

Beta Features (committed):
- multi-GPU
- Structured attention
- [Conv2Conv convolution model]
- SRU "RNNs faster than CNN" paper
@@ -138,3 +138,4 @@ http://opennmt.net/Models-py/
doi = {10.18653/v1/P17-4012}
}
```

5 changes: 2 additions & 3 deletions tools/average_models.py → average_models.py
@@ -6,7 +6,6 @@
def average_models(model_files):
vocab = None
opt = None
epoch = None
avg_model = None
avg_generator = None

@@ -16,7 +15,7 @@ def average_models(model_files):
generator_weights = m['generator']

if i == 0:
vocab, opt, epoch = m['vocab'], m['opt'], m['epoch']
vocab, opt = m['vocab'], m['opt']
avg_model = model_weights
avg_generator = generator_weights
else:
@@ -26,7 +25,7 @@ def average_models(model_files):
for (k, v) in avg_generator.items():
avg_generator[k].mul_(i).add_(generator_weights[k]).div_(i + 1)

final = {"vocab": vocab, "opt": opt, "epoch": epoch, "optim": None,
final = {"vocab": vocab, "opt": opt, "optim": None,
"generator": avg_generator, "model": avg_model}
return final

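The averaging loop above maintains a running mean: `avg.mul_(i).add_(new).div_(i + 1)` turns the mean of the first i checkpoints into the mean of the first i + 1. A minimal standalone sketch of that in-place update on toy tensors (an illustration, not part of the commit):

```python
import torch

# Toy stand-ins for the weights of three checkpoints; the real script loads
# state dicts with torch.load and applies the same update to every parameter.
checkpoints = [torch.full((2, 2), n) for n in (1.0, 2.0, 3.0)]

avg = checkpoints[0].clone()
for i, weights in enumerate(checkpoints[1:], start=1):
    # In-place running mean: avg <- (avg * i + weights) / (i + 1)
    avg.mul_(i).add_(weights).div_(i + 1)

print(avg)  # every entry is 2.0, the mean of 1, 2 and 3
```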
3 changes: 2 additions & 1 deletion data/README.md
@@ -4,4 +4,5 @@

> python preprocess.py -train_src data/src-train.txt -train_tgt data/tgt-train.txt -valid_src data/src-val.txt -valid_tgt data/tgt-val.txt -save_data data/data -src_vocab_size 1000 -tgt_vocab_size 1000
> python train.py -data data/data -save_model /n/rush_lab/data/tmp_ -gpuid 0 -rnn_size 100 -word_vec_size 50 -layers 1 -epochs 10 -optim adam -learning_rate 0.001
> python train.py -data data/data -save_model /n/rush_lab/data/tmp_ -gpuid 0 -rnn_size 100 -word_vec_size 50 -layers 1 \
-train_steps 10000 -optim adam -learning_rate 0.001
3 changes: 2 additions & 1 deletion docs/source/FAQ.md
@@ -72,7 +72,8 @@ setup. We have confirmed the following command can replicate their WMT results.
python train.py -data /tmp/de2/data -save_model /tmp/extra -gpuid 1 \
-layers 6 -rnn_size 512 -word_vec_size 512 \
-encoder_type transformer -decoder_type transformer -position_encoding \
-epochs 50 -max_generator_batches 32 -dropout 0.1 \
-train_steps 100000 -valid_steps 10000 -save_checkpoint_steps 5000 \
-max_generator_batches 32 -dropout 0.1 \
-batch_size 4096 -batch_type tokens -normalization tokens -accum_count 4 \
-optim adam -adam_beta2 0.998 -decay_method noam -warmup_steps 8000 -learning_rate 2 \
-max_grad_norm 0 -param_init 0 -param_init_glorot \
26 changes: 13 additions & 13 deletions docs/source/options/train.md
@@ -113,8 +113,8 @@ Lambda value for coverage.
Path prefix to the ".train.pt" and ".valid.pt" files produced by preprocess.py

* **-save_model [model]**
Model filename (the model will be saved as <save_model>_epochN_PPL.pt where PPL
is the validation perplexity
Model filename (the model will be saved as <save_model>_step_N.pt, where N
is the current step).

* **-gpuid []**
Use CUDA on the listed devices.
@@ -123,9 +123,6 @@
Random seed used for reproducibility of the experiments.

### **Initialization**:
* **-start_epoch [1]**
The epoch from which to start

* **-param_init [0.1]**
Parameters are initialized over uniform distribution with support (-param_init,
param_init). Use 0 to not use initialization
@@ -173,8 +170,8 @@ Maximum batch size for validation
Maximum batches of words in a sequence to run the generator on in parallel.
Higher is faster, but uses more memory.

* **-epochs [13]**
Number of training epochs
* **-train_steps [1]**
The number of steps to train

* **-optim [sgd]**
Optimization method.
@@ -214,19 +211,22 @@ smoothed by epsilon / (vocab_size - 1). Set to zero to turn off label smoothing.
For more detailed information, see: https://arxiv.org/abs/1512.00567

### **Optimization- Rate**:
* **-valid_steps [8]**
Run validation every X steps.

* **-learning_rate [1.0]**
Starting learning rate. Recommended settings: sgd = 1, adagrad = 0.1, adadelta =
1, adam = 0.001

* **-learning_rate_decay [0.5]**
If update_learning_rate, decay learning rate by this much if (i) perplexity does
not decrease on the validation set or (ii) epoch has gone past start_decay_at
If update_learning_rate is set, decay the learning rate by this much once the step
count has gone past start_decay_steps.

* **-start_decay_at [8]**
Start decaying every epoch after and including this epoch
* **-start_decay_steps [8]**
Start decaying the learning rate after this many steps.

* **-start_checkpoint_at []**
Start checkpointing every epoch after and including this epoch
* **-decay_steps []**
Decay the learning rate every this many steps thereafter.

* **-decay_method []**
Use a custom decay rate.
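The step-based decay options above replace the old epoch-based ones. As a rough sketch of how they combine, assuming the rate is multiplied by `learning_rate_decay` once `start_decay_steps` is reached and then again every `decay_steps` (the exact rule lives in the optimizer code, so treat this as an approximation), the resulting schedule looks like:

```python
def decayed_lr(step, learning_rate=1.0, learning_rate_decay=0.5,
               start_decay_steps=50000, decay_steps=10000):
    """Hypothetical helper: learning rate after step-based decay."""
    if step < start_decay_steps:
        return learning_rate
    n_decays = 1 + (step - start_decay_steps) // decay_steps
    return learning_rate * learning_rate_decay ** n_decays


for step in (10000, 50000, 60000, 85000):
    print(step, decayed_lr(step))  # 1.0, 0.5, 0.25, 0.0625
```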
14 changes: 13 additions & 1 deletion onmt/__init__.py
@@ -1,8 +1,20 @@
""" Main entry point of the ONMT library """
from __future__ import division, print_function

import onmt.inputters
import onmt.encoders
import onmt.decoders
import onmt.models
import onmt.utils
import onmt.modules
from onmt.trainer import Trainer
import sys
import onmt.utils.optimizers
onmt.utils.optimizers.Optim = onmt.utils.optimizers.Optimizer
sys.modules["onmt.Optim"] = onmt.utils.optimizers

__all__ = ["Trainer"]
# For Flake
__all__ = [onmt.inputters, onmt.encoders, onmt.decoders, onmt.models,
onmt.utils, onmt.modules, "Trainer"]

__version__ = "0.4.0"
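The `sys.modules` assignment above is a backward-compatibility shim: the retired `onmt.Optim` import path is pointed at the module that now holds the optimizer, and the old class name is kept as an attribute alias. A toy sketch of the same trick, with made-up names rather than onmt's real layout:

```python
import importlib
import sys
import types


class Optimizer:
    """Stand-in for a class that was renamed and moved to a new module."""


# Build the "new home" module and register the old names as aliases.
optimizers = types.ModuleType("toypkg.utils.optimizers")
optimizers.Optimizer = Optimizer
optimizers.Optim = Optimizer                  # old class name still resolves
sys.modules["toypkg.Optim"] = optimizers      # old module path still resolves

legacy = importlib.import_module("toypkg.Optim")
assert legacy.Optim is Optimizer              # legacy-style code keeps working
```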
9 changes: 6 additions & 3 deletions onmt/decoders/cnn_decoder.py
@@ -1,6 +1,6 @@
"""
Implementation of the CNN Decoder part of
"Convolutional Sequence to Sequence Learning"
"Convolutional Sequence to Sequence Learning"
"""
import torch
import torch.nn as nn
@@ -92,7 +92,9 @@ def forward(self, tgt, memory_bank, state, memory_lengths=None):
x = linear_out.view(tgt_emb.size(0), tgt_emb.size(1), -1)
x = shape_transform(x)

pad = torch.zeros(x.size(0), x.size(1), self.cnn_kernel_width - 1, 1)
pad = torch.zeros(x.size(0), x.size(1),
self.cnn_kernel_width - 1, 1)

pad = pad.type_as(x)
base_target_emb = x

@@ -151,4 +153,5 @@ def update_state(self, new_input):

def repeat_beam_size_times(self, beam_size):
""" Repeat beam_size times along batch dimension. """
self.init_src = self.init_src.data.repeat(1, beam_size, 1)
self.init_src = torch.tensor(
self.init_src.data.repeat(1, beam_size, 1), requires_grad=False)
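About the reformatted `pad` tensor: prepending `cnn_kernel_width - 1` zero positions is the standard way to keep a decoder-side convolution causal, so position t never reads positions after t. A rough standalone sketch with a plain 1-D convolution (the decoder itself works on a 4-D conv2d layout, so this is only an analogy):

```python
import torch
import torch.nn.functional as F

batch, channels, kernel_width, seq_len = 2, 8, 3, 5
x = torch.randn(batch, channels, seq_len)
weight = torch.randn(channels, channels, kernel_width)

# kernel_width - 1 zeros on the left: the filter at position t only covers
# inputs t - kernel_width + 1 .. t, and the output keeps length seq_len.
pad = torch.zeros(batch, channels, kernel_width - 1).type_as(x)
out = F.conv1d(torch.cat([pad, x], dim=2), weight)
assert out.shape == (batch, channels, seq_len)
```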
20 changes: 7 additions & 13 deletions onmt/decoders/decoder.py
@@ -161,9 +161,8 @@ def _fix_enc_hidden(hidden):
# The encoder hidden is (layers*directions) x batch x dim.
# We need to convert it to layers x batch x (directions*dim).
if self.bidirectional_encoder:
hidden = torch.cat(
[hidden[0:hidden.size(0):2],
hidden[1:hidden.size(0):2]], 2)
hidden = torch.cat([hidden[0:hidden.size(0):2],
hidden[1:hidden.size(0):2]], 2)
return hidden

if isinstance(encoder_final, tuple): # LSTM
@@ -384,12 +383,6 @@ class DecoderState(object):
Modules need to implement this to utilize beam search decoding.
"""
# def detach(self):
# """ Need to document this VN """
# for h in self._all:
# if h is not None:
# h.detach_()

def detach(self):
""" Need to document this """
self.hidden = tuple([_.detach() for _ in self.hidden])
@@ -450,7 +443,8 @@ def update_state(self, rnnstate, input_feed, coverage):

def repeat_beam_size_times(self, beam_size):
""" Repeat beam_size times along batch dimension. """
vars = [e.data.repeat(1, beam_size, 1)
for e in self._all]
self.hidden = tuple(vars[:-1])
self.input_feed = vars[-1]
copy_vars = [torch.tensor(e.data.repeat(1, beam_size, 1),
requires_grad=False)
for e in self._all]
self.hidden = tuple(copy_vars[:-1])
self.input_feed = copy_vars[-1]
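The rewritten `repeat_beam_size_times` tiles each state tensor along its batch dimension so that every source sentence carries `beam_size` copies of its decoder state during beam search, wrapped in a tensor that does not require gradients. A toy sketch of the tiling itself:

```python
import torch

layers, batch, dim, beam_size = 2, 3, 4, 5
hidden = torch.randn(layers, batch, dim)

tiled = hidden.repeat(1, beam_size, 1)  # (layers, batch * beam_size, dim)
assert tiled.shape == (layers, batch * beam_size, dim)

# The copies are strided across the enlarged batch: entry batch + b is the
# second copy of sentence b.
assert torch.equal(tiled[:, batch + 0], hidden[:, 0])
```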