Bump version 4.1.0 (#1638)
minhthuc2502 authored Mar 11, 2024
1 parent b4b3ac0 commit 27092e4
Showing 5 changed files with 20 additions and 7 deletions.
11 changes: 11 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,17 @@

### Fixes and improvements

## [v4.1.0](https://github.com/OpenNMT/CTranslate2/releases/tag/v4.1.0) (2024-03-11)

### New features
* Support Gemma Model (#1631)
* Support Tensor Parallelism (#1599)

### Fixes and improvements
* Avoid initializing unused GPU (#1633)
* Read very large tensors in chunks when the size exceeds the maximum value of int (#1636)
* Update README
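The chunked-read fix above can be sketched as follows; `read_in_chunks` and the 32-bit chunk limit are illustrative assumptions, not the actual CTranslate2 implementation:

```python
import io

INT_MAX = 2**31 - 1  # largest value of a signed 32-bit int

def read_in_chunks(f, total_size, chunk_size=INT_MAX):
    """Read total_size bytes from f in pieces no larger than chunk_size,
    avoiding a single read call whose size would overflow a 32-bit int."""
    data = bytearray()
    remaining = total_size
    while remaining > 0:
        piece = f.read(min(chunk_size, remaining))
        if not piece:
            raise EOFError("unexpected end of stream")
        data.extend(piece)
        remaining -= len(piece)
    return bytes(data)

# Tiny chunk size just to exercise the loop on a toy buffer.
payload = read_in_chunks(io.BytesIO(b"x" * 10), 10, chunk_size=3)
```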

## [v4.0.0](https://github.com/OpenNMT/CTranslate2/releases/tag/v4.0.0) (2024-02-15)

This major version introduces a breaking change by updating to CUDA 12.
6 changes: 3 additions & 3 deletions CONTRIBUTING.md
@@ -137,14 +137,14 @@ Python wheels for Linux and Windows are compiled against NVIDIA libraries to sup
To limit the size of the packages pushed to PyPI, some libraries are not included in the package and are dynamically loaded at runtime with `dlopen` (or `LoadLibraryA` on Windows).

* `libcudart_static.a` (statically linked)
* `libcublas.so.11` (dlopened at runtime in [`cublas_stub.cc`](https://github.com/OpenNMT/CTranslate2/blob/master/src/cuda/cublas_stub.cc))
* `libcublas.so.12` (dlopened at runtime in [`cublas_stub.cc`](https://github.com/OpenNMT/CTranslate2/blob/master/src/cuda/cublas_stub.cc))
* `libcudnn.so.8` (dynamically linked)
* `libcudnn_ops_infer.so.8` (dlopened at runtime by `libcudnn.so.8`)
* `libcudnn_cnn_infer.so.8` (dlopened at runtime by `libcudnn.so.8`)

One of the benefits of this dynamic loading is that multiple versions of cuBLAS and cuDNN are supported by the same binary. In particular, users can install any CUDA 11.x version as long as it provides `libcublas.so.11`.
One of the benefits of this dynamic loading is that multiple versions of cuBLAS and cuDNN are supported by the same binary. In particular, users can install any CUDA 12.x version as long as it provides `libcublas.so.12`.
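The dlopen-based fallback described above can be illustrated with a small `ctypes` sketch; `try_dlopen` is a hypothetical helper, not the actual `cublas_stub.cc` code:

```python
import ctypes

def try_dlopen(candidates):
    """Attempt to load the first available shared library from a list of
    SONAMEs, mirroring a dlopen-based stub loader's fallback behavior."""
    for name in candidates:
        try:
            return name, ctypes.CDLL(name)
        except OSError:
            continue  # library not installed; try the next candidate
    return None, None

# On a machine with CUDA 12 installed this would pick up libcublas.so.12.
name, handle = try_dlopen(["libcublas.so.12", "libcublas.so"])
```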

However, supporting a new major CUDA version (e.g. CUDA 11 to 12) requires updating the CUDA libraries used during compilation. This will be a breaking change for existing users since they would need to update their cuBLAS/cuDNN libraries and possibly [update their GPU driver](https://docs.nvidia.com/deploy/cuda-compatibility/).
The Python library only supports CUDA 12.x. The C++ source code remains compatible with CUDA 11, so it is still possible to build wheels with CUDA 11.x support by compiling against the CUDA 11 libraries.

### Updating other dependencies

2 changes: 1 addition & 1 deletion README.md
@@ -34,7 +34,7 @@ The project is production-oriented and comes with [backward compatibility guaran
* **Lightweight on disk**<br/>Quantization can make the models 4 times smaller on disk with minimal accuracy loss.
* **Simple integration**<br/>The project has few dependencies and exposes simple APIs in [Python](https://opennmt.net/CTranslate2/python/overview.html) and C++ to cover most integration needs.
* **Configurable and interactive decoding**<br/>[Advanced decoding features](https://opennmt.net/CTranslate2/decoding.html) allow autocompleting a partial sequence and returning alternatives at a specific location in the sequence.
* **Support tensor parallelism for distributed inference.
* **Support tensor parallelism for distributed inference**<br/>Very large models can be split across multiple GPUs. Follow this [documentation](docs/parallel.md#model-and-tensor-parallelism) to set up the required environment.
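A minimal usage sketch for tensor parallelism, assuming a `tensor_parallel` flag as described in docs/parallel.md and a converted model in `model_dir` (both assumptions here, not verified against the actual API):

```python
# Sketch only: requires ctranslate2 built with CUDA and a converted model.
try:
    import ctranslate2

    # tensor_parallel=True is assumed to split the model weights across
    # the visible GPUs, following docs/parallel.md.
    translator = ctranslate2.Translator(
        "model_dir", device="cuda", tensor_parallel=True
    )
except Exception:  # ctranslate2 or the model may be unavailable
    translator = None
```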

Some of these features are difficult to achieve with standard deep learning frameworks and are the motivation for this project.

6 changes: 4 additions & 2 deletions docs/installation.md
@@ -15,9 +15,9 @@ The Python wheels have the following requirements:
* pip version: >= 19.3 to support `manylinux2014` wheels

```{admonition} GPU support
The Linux and Windows Python wheels support GPU execution. Install [CUDA](https://developer.nvidia.com/cuda-toolkit) 11.x to use the GPU.
The Linux and Windows Python wheels support GPU execution. Install [CUDA](https://developer.nvidia.com/cuda-toolkit) 12.x to use the GPU.
If you plan to run models with convolutional layers (e.g. for speech recognition), you should also install [cuDNN 8](https://developer.nvidia.com/cudnn) for CUDA 11.x.
If you plan to run models with convolutional layers (e.g. for speech recognition), you should also install [cuDNN 8](https://developer.nvidia.com/cudnn) for CUDA 12.x.
```

```{note}
@@ -43,6 +43,8 @@ The images include:
docker run --rm ghcr.io/opennmt/ctranslate2:latest-ubuntu20.04-cuda11.2 --help
```

Update to the newest image version to get CUDA 12 support.

```{admonition} GPU support
The Docker image supports GPU execution. Install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/overview.html) to use GPUs from Docker.
```
2 changes: 1 addition & 1 deletion python/ctranslate2/version.py
@@ -1,3 +1,3 @@
"""Version information."""

__version__ = "4.0.0"
__version__ = "4.1.0"
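The version bump above can be sanity-checked with a small comparison; `parse_version` is an illustrative helper, not part of the package:

```python
def parse_version(version):
    """Split a dotted version string like "4.1.0" into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

old, new = parse_version("4.0.0"), parse_version("4.1.0")
assert new > old  # 4.1.0 sorts after 4.0.0
```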
