Support decoding with ranges to avoid a copy #54

Open
@jerinphilip

Description

For browsermt/bergamot-translator#202, it was suggested that a view over a vector/range backed by a binary representation be used instead, to eliminate conversions between the vector and binary representations.

However, decodeWithByteRanges assumes a vector to be provided:

void decodeWithByteRanges(const Words& sentence,
                          std::string &decoded,
                          std::vector<string_view> &byteRanges,
                          bool ignoreEOS) const override {
  sentencepiece::SentencePieceText spt;
  std::vector<int> spmSentence;
  spmSentence.reserve(sentence.size());
  for(auto&& word : sentence)
    spmSentence.push_back(word.toWordIndex());
  spm_->Decode(spmSentence, &spt);
  decoded = spt.text(); // Creates copy of string.
  string_view decoded_view(decoded);
  for(auto piece : spt.pieces()) {
    string_view byteRange = decoded_view.substr(piece.begin(), piece.end() - piece.begin());
    byteRanges.push_back(byteRange);
  }
  if(ignoreEOS){
    byteRanges.pop_back();
  }
}

We need to provide something that works with ranges/iterators instead, to avoid the additional copy. Since this function is used only by bergamot, we need not worry about breaking backwards compatibility. Consistency with the rest of the interface may still be a concern, in which case we can provide an overload that internally calls the range-based method.
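
A rough sketch of what the range-based method plus a forwarding overload could look like. This is only an illustration of the shape of the interface, not a proposed patch: `Word` here is a minimal stand-in that exposes `toWordIndex()`, and `toyPieces()` is a hypothetical toy vocabulary that replaces the SentencePiece model so the example is self-contained. How the real `spm_->Decode` call would consume a range without an intermediate `std::vector<int>` is not addressed here.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Minimal stand-in for the real Word type (assumption: only toWordIndex() is needed).
struct Word {
  int index;
  int toWordIndex() const { return index; }
};
using Words = std::vector<Word>;

// Hypothetical toy vocabulary standing in for the SentencePiece model.
// Index 0 plays the role of EOS and decodes to an empty piece.
inline const std::vector<std::string>& toyPieces() {
  static const std::vector<std::string> pieces = {"", "Hello", " world", "!"};
  return pieces;
}

// Range-based variant: accepts any iterator pair whose elements expose
// toWordIndex(), e.g. pointers into a binary buffer or vector iterators.
template <class Iterator>
void decodeWithByteRanges(Iterator begin, Iterator end,
                          std::string& decoded,
                          std::vector<std::string_view>& byteRanges,
                          bool ignoreEOS) {
  decoded.clear();
  std::vector<std::pair<std::size_t, std::size_t>> spans;  // (offset, length)
  for (Iterator it = begin; it != end; ++it) {
    const std::string& piece =
        toyPieces().at(static_cast<std::size_t>(it->toWordIndex()));
    spans.emplace_back(decoded.size(), piece.size());
    decoded += piece;
  }
  // Create the views only once decoded has stopped growing, so that a
  // reallocation cannot invalidate them.
  std::string_view decodedView(decoded);
  for (const auto& [offset, length] : spans)
    byteRanges.push_back(decodedView.substr(offset, length));
  if (ignoreEOS && !byteRanges.empty())
    byteRanges.pop_back();
}

// Vector overload kept for consistency; it forwards to the range-based method.
inline void decodeWithByteRanges(const Words& sentence,
                                 std::string& decoded,
                                 std::vector<std::string_view>& byteRanges,
                                 bool ignoreEOS) {
  decodeWithByteRanges(sentence.begin(), sentence.end(), decoded, byteRanges,
                       ignoreEOS);
}
```

With this shape, a caller holding words in a contiguous buffer could pass `ptr, ptr + n` directly, while existing callers that hold a `Words` vector keep using the four-argument overload.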

Labels: enhancement (New feature or request)
