Closed
Description
Hi all,
In a traditional sentence-level batching setup, at each decoder time step the decoder attends to the encoder states of the words in the sentence, following the probability distribution learned by the copy mechanism. I'm confused about what happens when we batch by tokens: if a sentence happens to be split across two batches, wouldn't that prevent the decoder from attending to the encoder states from the first batch?
Example sentence: The light show at Jewel Changi has just ended.
Batch 1:
The light show at Jewel
Batch 2:
Changi has just ended.
How could the decoder attend to the encoder states from two separate batches?
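To make my confusion concrete, here is a rough sketch (the tensor shapes and names are made up for illustration, not taken from any particular framework) of what I imagine would happen if the sentence really were split like that, with attention only seeing the encoder states of the current batch:

```python
import torch
import torch.nn.functional as F

hidden = 8

# Hypothetical encoder states if the sentence were split across two batches.
encoder_states_batch1 = torch.randn(5, hidden)  # "The light show at Jewel"
encoder_states_batch2 = torch.randn(4, hidden)  # "Changi has just ended."

decoder_state = torch.randn(hidden)

# Attention scores are computed only over the states in the same batch tensor
# as the current decoder step, so batch 1's states seem unreachable here.
scores = encoder_states_batch2 @ decoder_state   # shape: (4,)
attn = F.softmax(scores, dim=-1)                 # copy distribution over 4 tokens only
context = attn @ encoder_states_batch2           # shape: (hidden,)

print(attn.shape)  # torch.Size([4]) -- no way to point back at "Jewel" from batch 1
```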
Thanks all!