Releases · allegroai/clearml-serving
v1.3.1
New Features and Bug Fixes
- Add missing await (#55, thanks @amirhmk!)
- Add traceback for failing to load preprocess class (#57)
- Fix Triton `config.pbtxt` not being checked for missing values or colliding specifications (#62)
- Add safer code for pulling from Kafka
- Add `str` type to the Triton type conversion
- Fix auto-detected `platform` not being ignored when passing a `config.pbtxt` with a `platform` entry
- Fix Triton engine models with multiple versions not being properly supported
- Fix serving session keep alive is also sent on idle
- Fix examples readme files
- Log preprocess exceptions with full stack trace to serving session console output
v1.3.0
Stable Release
v1.2.0
Stable Release
Features
- GPU performance improvements: 50%-300% over vanilla Triton
- CPU performance improvements: optimized uvloop + multi-processing
- Huggingface Transformer example
- Binary input support (#37), thanks @Aleksandar1932!
Bug fixes
- stdout/stderr in the inference service was not logged to the dedicated Task
v1.1.0
Stable Release
Notice: This release is not backwards compatible - see notes below on upgrading
Breaking Changes
- Triton engine now supports variable request size (-1)
Features & Bug fixes
- Add version number to the serving session task
- Triton engine support for variable request (matrix) sizes
- Triton support: fix `--aux-config` to support more configuration elements
- Huggingface Transformer support
- `Preprocess` class as a module (see note below)
Note: To add a `Preprocess` class from a module (the entire module folder will be packaged):
```
preprocess_folder
├── __init__.py  # from .sub.some_file import Preprocess
└── sub
    └── some_file.py
```
Pass the top-level folder as the path for `--preprocess`, for example:
```
clearml-serving --id <serving_session_id> model add --preprocess /path/to/preprocess_folder ...
```
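For orientation, here is a minimal sketch of what `sub/some_file.py` might contain, following the three-argument pre/post-processing interface (request body/data, per-request `state` dict, statistics callback); field names and logic are illustrative only, not part of any shipped example:

```python
# sketch of sub/some_file.py -- illustrative only, adapt to your model
from typing import Any


class Preprocess(object):
    def __init__(self):
        # called once when the endpoint is loaded, no arguments
        pass

    def preprocess(self, body: dict, state: dict, collect_custom_statistics_fn=None) -> Any:
        # turn the request JSON body into the model input;
        # `state` is a per-request dict shared with postprocess()
        state["request_fields"] = list(body.keys())
        return [[body.get("x0", 0), body.get("x1", 0)]]

    def postprocess(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> dict:
        # turn the raw model output into a JSON-serializable response
        if collect_custom_statistics_fn:
            collect_custom_statistics_fn({"num_fields": len(state.get("request_fields", []))})
        return {"y": list(data) if hasattr(data, "__iter__") else data}
```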
Upgrading from v1.0
- Take down the serving containers (docker-compose or k8s)
- Update the clearml-serving CLI: `pip3 install -U clearml-serving`
- Re-add a single existing endpoint with `clearml-serving model add ...` (press yes when asked); this upgrades the clearml-serving session definitions
- Pull the latest serving containers (`docker-compose pull ...` or k8s)
- Re-spin the serving containers (docker-compose or k8s)
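For a docker-compose deployment, the steps above translate roughly into the following commands; the compose file path is an assumption, adjust it to your setup:

```bash
# take down the running serving containers
docker-compose -f docker-compose.yml down

# update the CLI
pip3 install -U clearml-serving

# re-add one existing endpoint to upgrade the session definitions (answer "yes" when prompted)
clearml-serving --id <serving_session_id> model add ...

# pull the latest containers and re-spin them
docker-compose -f docker-compose.yml pull
docker-compose -f docker-compose.yml up -d
```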
v1.0.0
Stable Release
Notice: This release is not backwards compatible
Breaking Changes
- pre/post-processing class functions get 3 arguments, see example
- Add support for per-request state storage, passing information between the pre/post processing functions
Features & Bug fixes
- Optimize serving latency while collecting statistics
- Fix metric statistics collecting auto-refresh issue
- Fix live update of model preprocessing code
- Add `pandas` to the default serving container
- Add per-endpoint/variable statistics collection control
- Add `CLEARML_EXTRA_PYTHON_PACKAGES` for easier installation of additional Python packages in the serving inference container (see the snippet after this list)
- Upgrade the Nvidia Triton base container image to 22.04 (requires Nvidia drivers 510+)
- Add Kubernetes Helm chart
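To illustrate the `CLEARML_EXTRA_PYTHON_PACKAGES` addition, the variable is set on the serving inference container; a sketch assuming a docker-compose spin-up that passes the host environment through (the compose file path and package list are illustrative):

```bash
# extra python packages installed inside the inference container at start-up (space-separated list)
export CLEARML_EXTRA_PYTHON_PACKAGES="pandas scikit-learn"
docker-compose -f docker-compose.yml up -d
```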
PyPI v0.9.0
Redesign Release
Notice: This release is not backwards compatible
- Easy to deploy & configure
- Support Machine Learning Models (Scikit-Learn, XGBoost, LightGBM)
- Support Deep Learning Models (Tensorflow, PyTorch, ONNX)
- Customizable RestAPI for serving (i.e. per-model pre/post-processing for easy integration)
- Flexible
- On-line model deployment
- On-line endpoint model/version deployment (i.e. no need to take the service down)
- Per model standalone preprocessing and postprocessing python code
- Scalable
- Multi model per container
- Multi models per serving service
- Multi-service support (fully separated multiple serving service running independently)
- Multi cluster support
- Out-of-the-box node auto-scaling based on load/usage
- Efficient
- Multi-container resource utilization
- Support for CPU & GPU nodes
- Auto-batching for DL models
- Automatic deployment
- Automatic model upgrades w/ canary support
- Programmable API for model deployment
- Canary A/B deployment
- Online Canary updates
- Model Monitoring
- Usage Metric reporting
- Metric Dashboard
- Model performance metric
- Model performance Dashboard
Features:
- FastAPI integration for inference service (see the request sketch after this list)
- Multi-process Gunicorn for inference service
- Dynamic preprocess python code loading (no need for container/process restart)
- Model files download/caching (http/s3/gs/azure)
- Scikit-Learn, XGBoost, LightGBM integration
- Custom inference, including dynamic code loading
- Manual model upload/registration to model repository (http/s3/gs/azure)
- Canary load balancing
- Auto model endpoint deployment based on model repository state
- Machine/Node health metrics
- Dynamic online configuration
- CLI configuration tool
- Nvidia Triton integration
- GZip request compression
- TorchServe engine integration
- Prebuilt Docker containers (dockerhub)
- Docker-compose deployment (CPU/GPU)
- Scikit-Learn example
- XGBoost example
- LightGBM example
- PyTorch example
- TensorFlow/Keras example
- Model ensemble example
- Model pipeline example
- Statistics Service
- Kafka install instructions
- Prometheus install instructions
- Grafana install instructions
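To give a sense of the FastAPI-based inference service referenced in the list above, a typical request against a deployed endpoint looks roughly like this; the endpoint name, port, and payload follow the repository's Scikit-Learn example and are illustrative only:

```bash
curl -X POST "http://127.0.0.1:8080/serve/test_model_sklearn" \
     -H "Content-Type: application/json" \
     -d '{"x0": 1, "x1": 2}'
```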
PyPI v0.3.3
Features & Bug Fixes
- Fix argparse.FileType error (issue #1)
- Fix passing both --id and --project / --name
PyPI v0.3.2
Features & Bug Fixes
- Add --debug for increased verbosity
- Fix --config always required (issue #1)