translate.py ignores sentences in src when tgt not provided #1317
Comments
Good catch but:
fdalvi added a commit to fdalvi/OpenNMT-py that referenced this issue on Feb 21, 2019:
`translate.py` would ignore all sentences after processing `shard_size*shard_size` sentences in `src`, if no `tgt` file was provided. This commit fixes this.
vince62s pushed a commit that referenced this issue on Feb 24, 2019
ItaySofer pushed a commit to ItaySofer/OpenNMT-py that referenced this issue on Mar 17, 2019:
`translate.py` would ignore all sentences after processing `shard_size*shard_size` sentences in `src`, if no `tgt` file was provided. This commit fixes this.
goncalomcorreia pushed a commit to goncalomcorreia/open-nmt that referenced this issue on Apr 17, 2019:
`translate.py` would ignore all sentences after processing `shard_size*shard_size` sentences in `src`, if no `tgt` file was provided. This commit fixes this.
pryo pushed a commit to pryo/openNMT that referenced this issue on Aug 15, 2019:
`translate.py` would ignore all sentences after processing `shard_size*shard_size` sentences in `src`, if no `tgt` file was provided. This commit fixes this.
`translate.py` ignores all sentences after processing `shard_size*shard_size` sentences in `src` (if no `tgt` file is provided). The reason for this lies in
OpenNMT-py/translate.py
Lines 19 to 20 in 9865443

Essentially, if no `tgt` is provided, `tgt_shards` becomes a list of `None`s of size `shard_size`, while `src_shards` is a generator that will generate `num_lines/shard_size` elements. When `num_lines/shard_size` becomes greater than `shard_size`, the rest of the elements in `src_shards` are ignored, since `tgt_shards` ends prematurely. An example might make this clearer:

`shard_size`: 2
`src`: six lines, `a` through `f` (as implied by the shards below)
`tgt`: `None`

In this case, the following shards are computed
OpenNMT-py/translate.py
Lines 18 to 21 in 9865443

`src_shards`: `generator([[a,b],[c,d],[e,f]])`
`tgt_shards`: `[None, None]`
`shard_pairs`: `zip(src_shards, tgt_shards)` ==> `[ ([a,b], None), ([c,d], None) ]`

`[e,f]` is completely ignored, since there is no corresponding element to `zip` in `tgt_shards`.

The bug is that `tgt_shards` should be computed using `num_shards` and not `shard_size`, but since we don't read the entire file, we don't know what `num_shards` is at this point.

A potential solution is that when `tgt` is `None`, `tgt_shards` becomes an infinite `None` generator, in which case the `zip` will be limited by the number of source shards, which is what we want.

If this all makes sense, I can send in a PR. Happy to clarify and/or discuss something I might have missed!
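The behaviour and the proposed fix can be reproduced in isolation with a minimal sketch. It does not use the actual `translate.py` code: `make_shards` is a hypothetical stand-in for whatever helper produces the source shards, and `itertools.repeat(None)` is one possible way to get the "infinite `None` generator" suggested above.

```python
from itertools import islice, repeat


def make_shards(lines, shard_size):
    """Hypothetical stand-in for the sharding helper.

    Lazily yields lists of at most `shard_size` items, so the total
    number of shards is not known up front.
    """
    it = iter(lines)
    while True:
        shard = list(islice(it, shard_size))
        if not shard:
            return
        yield shard


shard_size = 2
src = ["a", "b", "c", "d", "e", "f"]

# Behaviour described in the issue: with no tgt, tgt_shards is a list of
# shard_size Nones, so zip stops after shard_size source shards.
buggy_tgt_shards = [None] * shard_size
buggy_pairs = list(zip(make_shards(src, shard_size), buggy_tgt_shards))

# Proposed fix: an endless stream of None, so zip is bounded only by
# the number of source shards.
fixed_pairs = list(zip(make_shards(src, shard_size), repeat(None)))

print(len(buggy_pairs), len(fixed_pairs))
```

With the finite list, the shard `["e", "f"]` never appears in the pairs. With `repeat(None)`, every source shard is paired with `None` and `zip` stops when the source generator is exhausted.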