Releases: OpenNMT/CTranslate2

CTranslate2 4.5.0

22 Oct 11:23
383d063

Note: The CTranslate2 Python package now supports cuDNN 9 and is no longer compatible with cuDNN 8.

New features

  • Support Phi3 (#1800)
  • Support Mistral Nemo (#1785)
  • Support Wav2Vec2Bert ASR (#1778)

Fixes and improvements

CTranslate2 4.4.0

09 Sep 09:21
8f4d134

Removed: Flash Attention support in the Python package due to significant package size increase with minimal performance gain.
Note: Flash Attention remains supported in the C++ package with the WITH_FLASH_ATTN option.
Flash Attention may be re-added in the future if substantial improvements are made.

New features

Fixes and improvements

  • Fix pipeline (#1723 + #1747)
  • Some improvements in flash attention (#1732)
  • Fix crash when using return_alternative on CUDA (#1733)
  • Quantization AWQ GEMM + GEMV (#1727)

CTranslate2 4.3.1

11 Jun 09:16
59c7dda

Note: The v4.3.0 release could not be pushed to PyPI because the project exceeded its size limit on PyPI (> 20 GB).

Fixes and improvements

  • Improve the compilation (#1706 and #1705)
  • Fix position bias in tensor parallel mode (#1714)

CTranslate2 4.3.0

17 May 08:20
173a0d1

New features

Fixes and improvements

  • Fix a Flash Attention regression (#1695)

CTranslate2 4.2.1

24 Apr 10:04
0527ef7

Note: The v4.2.0 release could not be pushed because the package size had grown too large (> 100 MB).

New features

  • Support load/unload for generator/Whisper Attention (#1670)

Fixes and improvements

CTranslate2 4.2.0

10 Apr 11:41
e491a51

New features

  • Support Flash Attention (#1651)
  • Implementation of gemm for FLOAT32 compute type with RUY backend (#1598)
  • Conv1D quantization on CPU only (the DNNL and CUDA backends are not supported) (#1601)

Fixes and improvements

  • Fix a bug in tensor parallel mode (#1643)
  • Use BestSampler when temperature is 0 (#1659)
  • Fix a bug with Gemma (#1660)
  • Optimize loading/unloading time for Translator with cache (#1645)
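
The sampler change above can be illustrated with a minimal sketch. It is not the library's code: pick_token is a hypothetical helper, and it assumes that "BestSampler" means picking the highest-scoring token. The point is that temperature scaling divides the scores by the temperature, so a temperature of 0 has to be treated as a separate greedy case.

```python
import math
import random


def pick_token(logits, temperature, rng=random):
    """Hypothetical helper: return the index of the chosen token."""
    if temperature == 0:
        # Greedy choice (assumed to be what a "best" sampler does):
        # dividing by a zero temperature is undefined, so take the argmax.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Temperature-scaled softmax sampling.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```

With temperature 0 the result is deterministic; with any positive temperature the index is drawn at random in proportion to the softmax weights.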

CTranslate2 4.1.1

12 Mar 08:59
bfa0cb3

Fixes and improvements

  • Fix classifiers in setup.py to push the PyPI package

CTranslate2 4.1.0

11 Mar 16:15
27092e4

New features

  • Support Gemma Model (#1631)
  • Support Tensor Parallelism (#1599)

Fixes and improvements

  • Avoid initializing unused GPU (#1633)
  • Read very large tensor by chunk if the size > max value of int (#1636)
  • Update Readme
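
The chunked read mentioned above can be sketched as follows. This is an illustration of the general idea, not the project's implementation (which is C++): read_exact and max_chunk are hypothetical names, and the 32-bit signed int limit is an assumption about what "max value of int" refers to. A single read request never asks for more than the limit, and the loop continues until the full size has been read.

```python
import io

INT_MAX = 2**31 - 1  # assumed limit: largest 32-bit signed int


def read_exact(stream, size, max_chunk=INT_MAX):
    """Read exactly `size` bytes, requesting at most `max_chunk` per call."""
    buf = bytearray()
    remaining = size
    while remaining > 0:
        chunk = stream.read(min(remaining, max_chunk))
        if not chunk:
            # The stream ended before the full tensor was read.
            raise EOFError(f"expected {size} bytes, got {len(buf)}")
        buf += chunk
        remaining -= len(chunk)
    return bytes(buf)


# Usage with a tiny chunk size to exercise the loop.
data = bytes(range(10))
restored = read_exact(io.BytesIO(data), len(data), max_chunk=3)
```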

CTranslate2 4.0.0

15 Feb 12:51
61492e0

This major version introduces a breaking change: the update to CUDA 12.

Breaking changes

Python

  • Support CUDA 12

New features

  • Add the to_device() method to the Python StorageView class to move data between host and device

Fixes and improvements

  • Implement Conv1D with im2col and GEMM to improve performance
  • Get tokens in the range of the vocab size for Llama models
  • Fix loss of performance
  • Update cibuildwheel to 2.16.5
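
The im2col and GEMM approach to Conv1D mentioned above can be sketched in plain Python. This is a simplified illustration under stated assumptions (no padding, no bias, no dilation, no groups), not the library's C++ implementation; im2col_1d and conv1d_gemm are hypothetical names. Each output position's receptive field is unrolled into one row, so the whole convolution becomes a single matrix product with the flattened filters.

```python
def im2col_1d(x, kernel_size, stride=1):
    """x is a list of channels, each a list of equal length.

    Returns one row per output position, with in_channels * kernel_size
    values laid out channel-major, then by kernel offset.
    """
    length = len(x[0])
    out_len = (length - kernel_size) // stride + 1
    return [
        [x[c][t * stride + k] for c in range(len(x)) for k in range(kernel_size)]
        for t in range(out_len)
    ]


def conv1d_gemm(x, weight, stride=1):
    """weight has shape [out_channels][in_channels][kernel_size]."""
    kernel_size = len(weight[0][0])
    cols = im2col_1d(x, kernel_size, stride)
    # Flatten each filter in the same channel-major order as the rows of cols.
    flat_w = [[w for channel in f for w in channel] for f in weight]
    # GEMM: (out_channels x C*K) times (C*K x out_len).
    return [[sum(a * b for a, b in zip(fw, col)) for col in cols] for fw in flat_w]


# One input channel of length 4, one filter of size 3.
result = conv1d_gemm([[1, 2, 3, 4]], [[[1, 0, -1]]])
```

As in most deep learning frameworks, the kernel is not flipped, so this computes a cross-correlation. In a real implementation the final step would be handed to an optimized GEMM routine, which is where the performance gain comes from.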

CTranslate2 3.24.0

09 Jan 09:17
c95fd4e

New features

  • Support a new option, offset, to ignore the token scores of special tokens