
Releases: OpenNMT/CTranslate2

CTranslate2 3.23.0

05 Dec 11:33
83caf67

New features

  • Support the Phi model (conversion sketch below)
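
As a rough illustration, converting a Hugging Face Phi checkpoint with the Transformers converter could look like the sketch below; the model name and output directory are placeholders, not values taken from this release.

    from ctranslate2.converters import TransformersConverter

    # Convert a Hugging Face Phi checkpoint to the CTranslate2 format.
    # Both the model name and the output directory are placeholders.
    converter = TransformersConverter("microsoft/phi-1_5")
    converter.convert("phi-1_5-ct2")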

Fixes and improvements

  • Fix the conversion for Whisper models that do not define "alignment_heads" in "generation_config.json"
  • Fix the forward_batch method

CTranslate2 3.22.0

22 Nov 21:31
d963499

New features

  • Support "sliding window" and "chunking input" for Mistral

Fixes and improvements

  • Take "generation_config.json" into account and fix the "lang_ids" getter in the Whisper converter
  • Accept a callback in the generate_tokens method (see the streaming sketch after this list)
  • Fix iomp5 linking with the latest Intel oneAPI on Ubuntu
  • Fix decoder_start_token_id for T5
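
For reference, a minimal token streaming sketch with generate_tokens, which the callback fix above touches; the model path and prompt tokens are placeholders.

    from ctranslate2 import Generator

    generator = Generator("model-ct2")  # placeholder path to a converted model

    # generate_tokens yields one GenerationStepResult per generated token.
    for step in generator.generate_tokens(["<s>", "▁Hello"], max_length=64):
        print(step.token, end="", flush=True)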

CTranslate2 3.21.0

09 Nov 16:45
1e37b52

New features

  • Minimal support for Mistral: loader and rotary embedding extension for long sequences (sliding window attention is not yet implemented)
  • Support Distil-Whisper
  • Support Whisper-large-v3

CTranslate2 3.20.0

18 Sep 16:13

New features

  • Update the Transformers converter to support more model architectures:
    • MixFormerSequential (used by microsoft/phi-1_5)
  • Accept batch inputs in the generate_tokens methods
  • Add the method Generator.async_generate_tokens, which returns an asynchronous generator compatible with asyncio (see the sketch below)
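
A minimal sketch of consuming this asynchronous generator; the model path and prompt tokens are placeholders.

    import asyncio

    from ctranslate2 import Generator

    async def stream_tokens(generator, prompt):
        # async_generate_tokens cooperates with the asyncio event loop and
        # yields one GenerationStepResult per generated token.
        async for step in generator.async_generate_tokens(prompt, max_length=64):
            print(step.token, end="", flush=True)

    generator = Generator("model-ct2")  # placeholder path to a converted model
    asyncio.run(stream_tokens(generator, ["<s>", "▁Hello"]))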

Fixes and improvements

  • Remove the epsilon value in the softmax CPU kernel for consistency with other implementations
  • Optimize the implementation of the Dynamic Time Warping (DTW) function (used for Whisper alignment)
  • Avoid an unnecessary copy of the input arguments in method Whisper::align

CTranslate2 3.19.0

31 Aug 14:36

Changes

  • Binary wheels for Python 3.7 are no longer built

New features

  • Build wheels for Python 3.12
  • Update the Transformers converter to support more model architectures:
    • Falcon-RW
    • DistilBERT
    • Llama with linear RoPE scaling (e.g. Vicuna v1.5)
    • Llama with a non default RoPE base period (e.g. CodeLlama)
  • Accept the token type IDs as inputs for encoder models
  • Add property GenerationStepResult.hypothesis_id to identify the different hypotheses when running random sampling with num_hypotheses > 1 (see the sketch below)
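
A sketch of reading hypothesis_id from the step callback of generate_batch; this assumes random sampling with beam_size=1, and the model path, prompt tokens, and parameter values are illustrative.

    from ctranslate2 import Generator

    def on_step(step):
        # step is a GenerationStepResult; hypothesis_id identifies which of
        # the sampled hypotheses the current token belongs to.
        print(step.hypothesis_id, step.token)
        return False  # return True to stop the decoding early

    generator = Generator("model-ct2")  # placeholder path to a converted model
    generator.generate_batch(
        [["<s>", "▁Hello"]],  # placeholder prompt tokens
        beam_size=1,
        sampling_topk=10,
        num_hypotheses=2,
        callback=on_step,
    )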

Fixes and improvements

  • Improve performance of 8-bit models on CPU:
    • Vectorize the GEMM output dequantization
    • Fuse the GEMM output dequantization with bias and activation
  • Allow inputs shorter than 30 seconds in Whisper methods
  • Fix incorrect batch_id values passed to the callback function
  • Fix a shape error in models using both MQA and relative positions
  • Fix compilation error related to AVX512 when using GCC 7
  • Call .detach() on PyTorch tensors before getting the Numpy array in converters

CTranslate2 3.18.0

03 Aug 12:25

Changes

Converted models now use the same floating point precision as the original models. For example, a model saved in float16 will be converted to a float16 model. Before this change, the weights were cast to float32 by default.

Similarly, selecting int8 keeps non-quantized weights in their original precision unless a more specific quantization type is selected:

  • int8_float32
  • int8_float16
  • int8_bfloat16
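
A conversion sketch selecting one of these quantization types; the model name and output directory are placeholders.

    from ctranslate2.converters import TransformersConverter

    # With int8_float16, the linear weights are quantized to int8 while the
    # remaining weights are kept in (or converted to) float16.
    converter = TransformersConverter("model-name")  # placeholder model name
    converter.convert("model-ct2", quantization="int8_float16")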

New features

  • Add the property compute_type to model instances (see the sketch after this list)
  • Extend the Python class StorageView with additional methods and properties:
    • to(dtype)
    • device_index
    • device
    • dtype
    • shape
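
A minimal sketch of these introspection additions; the model path is a placeholder, and the to(dtype) cast is only noted in a comment.

    import numpy as np

    import ctranslate2

    generator = ctranslate2.Generator("model-ct2")  # placeholder path
    print(generator.compute_type)  # the compute type the model runs with

    # Wrap a Numpy array in a StorageView and inspect the new properties.
    view = ctranslate2.StorageView.from_array(np.ones((2, 4), dtype=np.float32))
    print(view.shape)   # [2, 4]
    print(view.dtype)   # the storage data type
    print(view.device, view.device_index)  # e.g. "cpu" and 0
    # view.to(dtype) casts the storage to another data type.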

Fixes and improvements

  • Update the function get_supported_compute_types to correctly return bfloat16 when supported
  • Update the HF Llama converter to accept extra tokens in the vocabulary
  • Fix a shape error when enabling return_alternatives with a model using relative positions
  • Fix a conversion error when using torch<1.13
  • Fix a type error when running Whisper models with the bfloat16 type
  • Update pybind11 to 2.11.1

CTranslate2 3.17.1

20 Jul 18:18

Fixes and improvements

  • Fix an error when running models with the new int8_bfloat16 computation type
  • Fix a vocabulary error when converting Llama 2 models with the Transformers converter
  • Update the Transformers converter to correctly convert Llama models using GQA
  • Stop the decoding when the generator returned by the method generate_tokens is closed
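
A short sketch of the generator-closing behavior mentioned in the last item; the model path and prompt tokens are placeholders.

    from ctranslate2 import Generator

    generator = Generator("model-ct2")  # placeholder path to a converted model
    stream = generator.generate_tokens(["<s>", "▁Hello"], max_length=512)

    first_step = next(stream)  # consume one token...
    stream.close()             # ...then close the generator to stop the decoding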

CTranslate2 3.17.0

18 Jul 10:26

New features

  • Add new computation types: bfloat16 and int8_bfloat16 (require a GPU with Compute Capability 8.0 or above; see the sketch after this list)
  • Support multi-query attention for encoder-decoder models
  • Allow converters to register weights as PyTorch tensors instead of Numpy arrays
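
A minimal sketch of selecting the new compute type when loading a model; the model path is a placeholder.

    from ctranslate2 import Translator

    # bfloat16 execution requires a GPU with Compute Capability 8.0 or above.
    translator = Translator("model-ct2", device="cuda", compute_type="bfloat16")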

Fixes and improvements

  • Pass the flag trust_remote_code when loading the tokenizer in the Transformers converter
  • Improve the performance of T5 models by reusing the same relative position bias across all layers
  • Whisper: disable the first timestamp decoding rule when a prefix is used
  • Install the CMake configuration in the correct library directory (e.g. some platforms use lib64 instead of lib)

CTranslate2 3.16.1

03 Jul 19:02

Fixes and improvements

  • Fix repeated outputs in version 3.16.0 when using include_prompt_in_result=False and a batch input with variable lengths: a typo in the code led to min_length being incorrectly applied
  • Update the Transformers converter to accept extra tokens for Falcon models
  • Release the Python GIL when loading the model
  • Initialize the rotary embeddings on the GPU instead of the CPU
  • Avoid a copy for the input features passed to the Whisper methods
  • Vectorize copy in the Tile CUDA operator

CTranslate2 3.16.0

15 Jun 15:01

New features

  • Update the Transformers converter to support more architectures:
    • Falcon-40B
    • XLM-RoBERTa
  • Add the generation option sampling_topp to enable top-p (nucleus) sampling (see the sketch after this list)
  • Save vocabulary files in the JSON format to better support tokens containing newlines or carriage returns
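
A sketch of enabling nucleus sampling through generate_batch; the model path, prompt tokens, and parameter values are illustrative.

    from ctranslate2 import Generator

    generator = Generator("model-ct2")  # placeholder path to a converted model
    results = generator.generate_batch(
        [["<s>", "▁Hello"]],      # placeholder prompt tokens
        beam_size=1,
        sampling_temperature=0.8,
        sampling_topk=50,
        sampling_topp=0.9,  # sample from the smallest token set whose
                            # cumulative probability exceeds 0.9
    )
    print(results[0].sequences[0])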

Fixes and improvements

  • Fix the application of min_length and max_length when using include_prompt_in_result=False and a batch input with variable lengths: the length constraint should only apply to the sequence after the prompt
  • Update oneDNN to 3.1.1