Closed
Description
Hi all,
In a traditional sentence-level batching setup, at each decoder time step the decoder attends to the encoder states of the words in the sentence, following the probability distribution learned by the copy mechanism. I'm confused about what happens when we batch by tokens: if a sentence happens to be split across two batches, wouldn't that prevent the decoder from attending to the encoder states from the first batch?
Example sentence: The light show at Jewel Changi has just ended.
Batch 1:
The light show at Jewel
Batch 2:
Changi has just ended.
How could the decoder attend to the encoder states from two separate batches?
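To make my confusion concrete, here is a rough sketch (the tensor shapes and names are made up for illustration, not taken from any particular framework) of what I imagine would happen if the sentence really were split like that, with attention only seeing the encoder states of the current batch:

```python
import torch
import torch.nn.functional as F

hidden = 8

# Hypothetical encoder states if the sentence were split across two batches.
encoder_states_batch1 = torch.randn(5, hidden)  # "The light show at Jewel"
encoder_states_batch2 = torch.randn(4, hidden)  # "Changi has just ended."

decoder_state = torch.randn(hidden)

# Attention scores are computed only over the states in the same batch tensor
# as the current decoder step, so batch 1's states seem unreachable here.
scores = encoder_states_batch2 @ decoder_state   # shape: (4,)
attn = F.softmax(scores, dim=-1)                 # copy distribution over 4 tokens only
context = attn @ encoder_states_batch2           # shape: (hidden,)

print(attn.shape)  # torch.Size([4]) -- no way to point back at "Jewel" from batch 1
```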
Thanks all!