Releases: OpenNMT/CTranslate2

CTranslate2 3.1.0

29 Nov 11:24

Changes

  • The input prompt is no longer included in the result of Whisper.generate as it is usually not useful in a transcription loop
  • The default beam size in Whisper.generate is updated from 1 to 5 to match the default value in openai/whisper
  • Generation options min_length and no_repeat_ngram_size now penalize the logits instead of the log probs which may change some scores
  • Raise a deprecation warning when reading the TranslationResult object as a list of dictionaries
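
Why penalizing the logits instead of the log probs can change scores: masking a token before the softmax renormalizes the probability mass over the remaining tokens, while masking after the softmax does not. A minimal sketch (the helper names are hypothetical, not the CTranslate2 API):

```python
import math

def apply_min_length(logits, eos_id, cur_length, min_length):
    """Disable EOS on the raw logits (pre-softmax) until min_length is reached."""
    logits = list(logits)
    if cur_length < min_length:
        logits[eos_id] = -math.inf
    return logits

def log_softmax(logits):
    m = max(x for x in logits if x != -math.inf)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [x - m - math.log(total) for x in logits]

logits = [2.0, 1.0, 0.5]  # toy vocabulary of 3 tokens, EOS id = 2
penalized = apply_min_length(logits, eos_id=2, cur_length=1, min_length=4)
probs = log_softmax(penalized)
# EOS gets -inf log probability, and the probability mass is renormalized
# over the other tokens, which slightly changes their scores.
```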

New features

  • Allow configuring the C++ logs from Python with the function ctranslate2.set_log_level
  • Implement the timestamp decoding rules when the Whisper prompt does not include the token <|notimestamps|>
  • Add option return_no_speech_prob to the method Whisper.generate for the result to include the probability of the no speech token

Fixes and improvements

  • Improve performance of the Whisper model when generating with a context
  • Fix timestamp tokens in the Whisper vocabulary to use the correct format (<|X.XX|>)
  • Fix AVX and NEON log functions to return -inf on log(0) instead of NaN
  • When info logs are enabled, log the system configuration only when the first model is loaded and not immediately when the library is loaded
  • Define a LogitsProcessor abstract class to apply arbitrary updates to the logits during decoding
  • Update oneDNN to 2.7.2

CTranslate2 3.0.2

14 Nov 16:01

Fixes and improvements

  • Whisper: fix generate arguments that were not correctly passed to the model

CTranslate2 3.0.1

10 Nov 15:30

Fixes and improvements

  • Whisper: do not implicitly add <|startoftranscript|> in generate since it is not always the first token

CTranslate2 3.0.0

07 Nov 14:44

This major version integrates the Whisper speech recognition model published by OpenAI. It also introduces some breaking changes to remove deprecated usages and simplify some modules.

Breaking changes

General

  • Remove option normalize_scores: the scores are now always divided by pow(length, length_penalty) with length_penalty defaulting to 1
  • Remove option allow_early_exit: the beam search now exits early only when no penalties are used
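
The normalization rule described above can be written as a one-line function (a sketch of the formula, not the library's code):

```python
def normalized_score(cum_log_prob, length, length_penalty=1.0):
    """Divide the cumulative log probability by length ** length_penalty."""
    return cum_log_prob / (length ** length_penalty)

# A 3-token hypothesis with cumulative log probability -6.0:
print(normalized_score(-6.0, 3))        # -2.0 (average log prob per token)
print(normalized_score(-6.0, 3, 0.0))   # -6.0 (length_penalty=0 disables normalization)
```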

Python

  • Rename some classes:
    • OpenNMTTFConverterV2 -> OpenNMTTFConverter
    • TranslationStats -> ExecutionStats
  • Remove compatibility for reading ScoringResult as a list of scores: the scores can be accessed with the attribute log_probs
  • Remove compatibility for reading ExecutionStats as a tuple
  • Remove support for deprecated Python version 3.6

CLI

  • Rename the client executable translate to the more specific name ct2-translator

C++

  • Rename or remove some classes and methods:
    • TranslationStats -> ExecutionStats
    • GeneratorPool -> Generator
    • TranslatorPool -> Translator
    • TranslatorPool::consume_* -> Translator::translate_*
    • TranslatorPool::consume_stream -> removed
    • TranslatorPool::score_stream -> removed
  • Remove support for building with CUDA 10

New features

  • Integrate the Whisper speech recognition model published by OpenAI
  • Support conversion of models trained with OpenNMT-py V3
  • Add method Generator.forward_batch to get the full model output for a batch of sequences
  • Add Python class StorageView to expose C++ methods taking or returning N-dimensional arrays: the class implements the array interface for interoperability with Numpy and PyTorch
  • Add a new configuration file config.json in the model directory that contains non-structural model parameters (e.g. related to the input, the vocabulary, etc.)
  • Implement the Conv1D layer and operator on CPU and GPU (using oneDNN and cuDNN respectively)
  • [C++] Allow registration of external models with models::ModelFactory
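
For reference, a minimal pure-Python sketch of what a Conv1D operator computes (an illustration only; the actual implementation uses oneDNN on CPU and cuDNN on GPU):

```python
def conv1d(x, weight, bias, stride=1):
    """Minimal 1D convolution: x is [in_channels][time],
    weight is [out_channels][in_channels][kernel_size]."""
    out_channels = len(weight)
    in_channels = len(x)
    kernel_size = len(weight[0][0])
    out_time = (len(x[0]) - kernel_size) // stride + 1
    y = []
    for oc in range(out_channels):
        row = []
        for t in range(out_time):
            acc = bias[oc]
            for ic in range(in_channels):
                for k in range(kernel_size):
                    acc += weight[oc][ic][k] * x[ic][t * stride + k]
            row.append(acc)
        y.append(row)
    return y

x = [[1.0, 2.0, 3.0, 4.0]]       # 1 input channel, 4 time steps
w = [[[1.0, 0.0, -1.0]]]         # 1 output channel, kernel size 3
print(conv1d(x, w, bias=[0.0]))  # [[-2.0, -2.0]]
```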

Fixes and improvements

  • Fix conversion of models that use biases only for some QKV projections but not for all
  • Fuse masking of the output log probs by aggregating disabled tokens from all related options: disable_unk, min_length, no_repeat_ngram_size, etc.
  • Reduce the layer norm epsilon value on GPU to 1e-5 to match the default value in PyTorch
  • Move some Transformer model attributes under the encoder/decoder scopes to simplify loading
  • Redesign the ReplicaPool base class to simplify adding new classes with multiple model workers
  • Compile the library with C++17
  • Update oneDNN to 2.7.1
  • Update oneMKL to 2022.2
  • Update pybind11 to 2.10.1
  • Update cibuildwheel to 2.11.2

CTranslate2 2.24.0

03 Oct 16:36

Changes

  • The Linux binaries now use the GNU OpenMP runtime instead of Intel OpenMP to work around an initialization error on systems without /dev/shm

Fixes and improvements

  • Fix a memory error when running random sampling on GPU
  • Optimize the model loading on multiple GPUs by copying the finalized model weights instead of reading the model from disk multiple times
  • In the methods Translator.translate_iterable and Translator.score_iterable, raise an error if the input iterables don't have the same length
  • Fix some compilation warnings

CTranslate2 2.23.0

16 Sep 10:41

New features

  • Build wheels for Python 3.11

Fixes and improvements

  • In beam search, get more candidates from the model output and replace finished hypotheses with these additional candidates
  • Fix possibly incorrect attention vectors returned from the beam search
  • Fix coverage penalty that was actually not applied
  • Fix crash when the beam size is larger than the vocabulary size
  • Add missing compilation flag -fvisibility=hidden when building the Python module
  • Update oneDNN to 2.6.2
  • Update OpenBLAS to 0.3.21

CTranslate2 2.22.0

02 Sep 13:18

Changes

  • score_batch methods now return a list of ScoringResult instances instead of plain lists of probabilities. In most cases you should not need to update your code: the result object implements the methods __len__, __iter__, and __getitem__ so that it can still be used as a list.
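
The backward compatibility mentioned above relies on standard Python protocols. A sketch of the idea (ScoringResultLike is a hypothetical stand-in, not the real class):

```python
class ScoringResultLike:
    """A result object that still behaves like a plain list of probabilities:
    implementing __len__, __iter__, and __getitem__ covers most list-style
    usage (len, iteration, indexing)."""

    def __init__(self, tokens, log_probs):
        self.tokens = tokens
        self.log_probs = log_probs

    def __len__(self):
        return len(self.log_probs)

    def __iter__(self):
        return iter(self.log_probs)

    def __getitem__(self, index):
        return self.log_probs[index]

result = ScoringResultLike(["Hello", "world"], [-0.5, -1.2])
print(len(result))    # 2
print(result[0])      # -0.5
print(list(result))   # [-0.5, -1.2]
```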

New features

  • Add methods to efficiently process long iterables:
    • Translator.translate_iterable
    • Translator.score_iterable
    • Generator.generate_iterable
    • Generator.score_iterable
  • Add decoding option min_alternative_expansion_prob to filter out unlikely alternatives in return_alternatives mode
  • Return ScoringResult instances from score_batch to include additional outputs. The current attributes are:
    • tokens: the list of tokens that were actually scored (including special tokens)
    • log_probs: the log probability of each scored token
  • Support running score_batch asynchronously by setting the asynchronous flag
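
These iterable methods target inputs too large to hold in memory. The core pattern is consuming a lazy iterable batch by batch, sketched here in plain Python (the general idea, not the library's internals):

```python
from itertools import islice

def iter_batches(iterable, batch_size):
    """Group a possibly huge iterable into fixed-size batches:
    only one batch is materialized in memory at a time."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

lines = (f"sentence {i}" for i in range(7))  # stands in for a large input file
for batch in iter_batches(lines, 3):
    print(len(batch))  # 3, 3, 1
```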

Fixes and improvements

  • Fix possibly incorrect results when using disable_unk or use_vmap with one of the following options:
    • min_decoding_length
    • no_repeat_ngram_size
    • prefix_bias_beta
    • repetition_penalty
  • Also pad the output layer during scoring to enable Tensor Cores
  • Improve the correctness of the model output probabilities when the output layer is padded
  • Skip translation when the NLLB input is empty (i.e. when the input only contains EOS and the language token)

CTranslate2 2.21.1

29 Jul 17:49

Fixes and improvements

  • Fix conversion of NLLB models when tokenizer_class is missing from the configuration

CTranslate2 2.21.0

27 Jul 15:11

New features

  • Support NLLB multilingual models via the Transformers converter
  • Support Pegasus summarization models via the Transformers converter

Fixes and improvements

  • Do not stop decoding when the EOS token is coming from the user input: this is required by some text generation models like microsoft/DialoGPT where EOS is used as a separator
  • Fix conversion error for language models trained with OpenNMT-py
  • Fix conversion of models that are not using bias terms in the multi-head attention
  • Fix data type error when enabling the translation options return_alternatives and return_attention with a float16 model
  • Improve CPU performance of language models quantized to int8
  • Implement a new vectorized GELU operator on CPU
  • Raise a more explicit error when trying to convert an unsupported Fairseq model
  • Update pybind11 to 2.10.0

CTranslate2 2.20.0

06 Jul 16:58

New features

  • Generation option no_repeat_ngram_size to prevent the repetitions of N-grams with a minimum size
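
The standard no-repeat n-gram rule can be sketched as follows (banned_ngram_tokens is a hypothetical helper for illustration, not the CTranslate2 API):

```python
def banned_ngram_tokens(generated, ngram_size):
    """Return the tokens that would complete an n-gram already present in
    `generated`, i.e. the tokens to disable at the next decoding step."""
    if len(generated) < ngram_size - 1:
        return set()
    prefix = tuple(generated[-(ngram_size - 1):]) if ngram_size > 1 else ()
    banned = set()
    for i in range(len(generated) - ngram_size + 1):
        if tuple(generated[i:i + ngram_size - 1]) == prefix:
            banned.add(generated[i + ngram_size - 1])
    return banned

# With no_repeat_ngram_size=2, after [5, 8, 5] the bigram (5, 8) already
# exists, so generating 8 again right after 5 is disabled:
print(banned_ngram_tokens([5, 8, 5], 2))  # {8}
```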

Fixes and improvements

  • Fix conversion of OpenNMT-tf models that use static position embeddings
  • Fix a segmentation fault in return_alternatives mode when the target prefix is longer than max_decoding_length
  • Fix inconsistent state of asynchronous results in Python when a runtime exception is raised
  • Remove <pad> token when converting MarianMT models from Transformers: this token is only used to start the decoder from a zero embedding, but it is not included in the original Marian model
  • Optimize CPU kernels with vectorized reduction of accumulated values
  • Do not modify the configuration passed to OpenNMTTFConverterV2.from_config
  • Improve Python classes documentation by listing members at the top