Releases: RyaxTech/ryax-engine

24.10.0

23 Oct 14:21

We are proud to announce the release of:

​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​

Ryax 24.10.0

​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​

Multi-site full power!

New features

  • A new service called Ryax Worker can now be used to attach any Slurm or Kubernetes cluster resources
  • Ryax can now run any action on Slurm and Kubernetes seamlessly
  • Actions are now scheduled according to user-defined constraints and objectives
  • Add the possibility to pin Ryax services to dedicated resources (nodeSelector)
  • Enhance Ryax documentation with updated content (doc)
  • New Jupyter Notebook action with GPU support in default actions
  • Action builds can now be canceled
  • Kubernetes addon now supports injection of service

Bug fixes and Improvements

  • Fix volume permission for NFS based storage volumes (defaults to 1200 now)
  • Fail properly when a pip install fails during builds

Upgrade to this version

This is a major release of Ryax, which requires some extra steps for the upgrade.

Update configuration

This release introduces a new service, the Worker. To define the nodes that your actions will use, the Worker requires a site configuration. Please add one to your Ryax installation configuration file, following the example below. In this example, the local cluster has a node pool named default, with the label my.provider.com/pool-name: default on each node and 4 CPUs and 8G of memory per node.

worker:
  values:
     config:
       site:
         name: local
         spec:
           nodePools:
           - cpu: 4
             memory: 8G
             name: default
             selector:
               my.provider.com/pool-name: default

See the Worker configuration documentation for more details.
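
To fill in the node pool values, it can help to list the nodes that carry the pool label together with their allocatable resources. The sketch below is one possible way to do this with kubectl. The function name show_pool_nodes is hypothetical (it is not part of Ryax), and the label key and value are the ones from the example above. Allocatable values may be slightly lower than the raw node capacity, so treat the output as a starting point.

```shell
#!/bin/sh
# List the nodes matching a pool label, with their allocatable CPU and memory.
# show_pool_nodes is a hypothetical helper, not part of Ryax.
show_pool_nodes() {
    # $1: label key, $2: label value
    kubectl get nodes -l "$1=$2" \
        -o custom-columns='NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory'
}

# Example (requires access to the cluster):
# show_pool_nodes my.provider.com/pool-name default
```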

Update DNS

If you use a public IP with TLS enabled, you will need to create a new DNS entry to support all subdomains of your cluster. This is used, for example, by an external Worker to access the internal container repository.
Please add an entry in your DNS using star notation:
*.<clusterName>.<domainName>

See installation doc for more details.
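
As an illustration of the star notation, the following shell sketch builds the wildcard entry and a sample hostname that it should cover. The cluster name mycluster, the domain example.com, and the registry subdomain are placeholder assumptions, not values from a real installation.

```shell
#!/bin/sh
# Build the wildcard DNS entry for a cluster (all names here are placeholders).
wildcard_entry() {
    # $1: cluster name, $2: domain name
    printf '*.%s.%s\n' "$1" "$2"
}

# Build a sample hostname that the wildcard entry should cover.
sample_host() {
    # $1: subdomain, $2: cluster name, $3: domain name
    printf '%s.%s.%s\n' "$1" "$2" "$3"
}

wildcard_entry mycluster example.com        # prints *.mycluster.example.com
sample_host registry mycluster example.com  # prints registry.mycluster.example.com
```

Once the record exists, a lookup such as dig +short registry.mycluster.example.com should return the public IP of the cluster.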

Add HPC site

The users of HPC actions have to install a Worker dedicated to each cluster following this documentation.

Apply and clean

Once configured, you can apply the configuration with ryax-adm as usual.

The log capture service, Loki, was moved into the ryaxns namespace, so the old Loki deployment can be removed.
After applying the configuration, remove it with:

helm uninstall -n ryaxns-monitoring loki
kubectl delete pvc -n ryaxns-monitoring storage-loki-0

The Worker now handles deployments. To avoid dangling actions and failing deployments, you have to clean the Runner state.
Be aware that this will reset the execution history and stop all running workflows.

ryax-adm clean runner worker

24.06.0

18 Jun 12:41

We are proud to announce the release of:

✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​

Ryax 24.06.0

✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​

Control and stability.

New features

  • Add a Kubernetes Addon to customize action deployment (label, nodeSelector, annotations, serviceAccount)

Bug fixes and Improvements

  • Fix dynamic output enum values being impossible to add
  • Fix addon default values from ryax_metadata.yaml not available in the UI
  • Better error handling for action deployments
  • Fix HPC addon support of files in custom scripts
  • Fix python-cuda build failing in some cases
  • Fix UID overlap when using NFS CSI Driver
  • Fix OutOfMemory during git scan leading to an inconsistent state

Upgrade to this version

Usual process: update the version in the config file and apply!

24.01.0

12 Jan 11:37

We are proud to announce the release of

​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​

Ryax 24.01.0

​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​

This release focuses on Reliability and Security 💪

The changelog:

Bugfixes and Improvements

  • Fix connection issues on broker restart
  • Migrate Helm chart repository to an OCI standard repository
  • Fix SSH Slurm execution issue with files
  • Do not use root user inside the action builder container

Upgrade to this version

If you have set the chartRegistry in your configuration values file (you probably didn't), please change the chart repository URL to url: registry.ryax.org/release-charts.

24.02.0

09 Feb 10:12

We are proud to announce the release of

​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​

Ryax 24.02.0

​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​

This release brings better HPC offloading support!

The changelog:

Bug Fixes and Improvements

  • Add HPC offloading capability to run custom scripts directly on nodes for
    parallel jobs
  • Better error handling in HPC offloading deployment and execution
  • Fix HPC Offloading log capture
  • Runs can now be canceled and deleted from the UI
  • Fix editing of dynamic outputs and improve their display
  • Fix action not undeployed in some corner cases

Upgrade to this Version

HPC actions have to be deleted and recreated to make the custom script
parameters available.

23.12.0

28 Dec 13:29

We are proud to announce the release of

​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​

Ryax 23.12.0

​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​

This release focuses on Scaling and Performance 🚀

The changelog:

New features

  • Improve HPC offloading with optimized IO and image build
  • 1 to N scaling of actions with better Kubernetes autoscale support
  • Show a clear error message on Action failure due to resource limits
  • Optional IO for Actions

Bug fixes and Improvements

  • Improve database query performance and Runner responsiveness
  • Fix actions undeploying during Runner restarts
  • Fix monitoring configuration for KubeProxy
  • Fix workflow deletion failing in some conditions
  • Fix RabbitMQ failure to respond to liveness probe

Upgrade to this version

The RabbitMQ deployment needs to be replaced. To do so, uninstall it before the
update (communication between services will stop during update):

helm uninstall -n ryaxns rabbitmq

Then, proceed with the normal upgrade process.

To avoid connection errors between services, restart them after the upgrade with:

kubectl delete pod -n ryaxns -l ryax.tech/resource-type=internal
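
After the restart, it may be useful to block until the recreated pods are ready before using the platform again. The following is a minimal sketch using kubectl wait with the same namespace and label selector as the command above. The helper name wait_internal_pods and the 300s default timeout are assumptions, and kubectl wait reports an error if no pod matches the selector yet.

```shell
#!/bin/sh
# Wait until the restarted internal services report Ready.
# wait_internal_pods is a hypothetical helper; the 300s default is an assumption.
wait_internal_pods() {
    # $1 (optional): timeout, for example 120s
    kubectl wait --for=condition=Ready pod \
        -n ryaxns -l ryax.tech/resource-type=internal \
        --timeout="${1:-300s}"
}

# Example (requires access to the cluster):
# wait_internal_pods 120s
```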

23.10.0

12 Oct 13:04

We are proud to announce the release of

✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​

Ryax 23.10.0

✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​

The changelog:

New features and improvements

  • Trigger with python with CUDA support
  • Graceful stop for running workflows
  • Auto reload on UI update (PWA support)

Bug fixes

  • Better error message when scanning badly formatted action metadata
  • Fix add repository modal layout
  • Fix refresh error on OpenAPI UI in some cases

Upgrade to this version

No action needed!

23.09.0

19 Sep 08:46

We are proud to announce the release of

​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​

Ryax 23.09.0

​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​

This release focuses on Observability and Performance, enjoy!

The changelog:

New features

  • Instant logs on Triggers
  • Better logs display for the Runs
  • Update of Prometheus to the latest version
  • Performance metrics are now exported and available in a dashboard in Grafana
  • Add internal tracing in the Runner with Tempo to query traces in Grafana
  • Run details panel rework

Bug fixes

  • Improve database query performance and Runner responsiveness
  • Fix errors on version change in some cases
  • Fix error when stored file size is too big

Upgrade to this version

Admins should take care of the following elements when upgrading to this version.

Instant log

To get instant logs, you have to rebuild the Actions. To do so, run
"Build All" in the Library on your repository; the next deployment will use
the updated version.

Prometheus update

The update of Prometheus requires the following manual operation, before running the update. This will update the CRD and remove the old version of Prometheus.

kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.68.0/example/prometheus-operator-crd/monitoring.coreos.com_alertmanagerconfigs.yaml --force-conflicts
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.68.0/example/prometheus-operator-crd/monitoring.coreos.com_alertmanagers.yaml --force-conflicts
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.68.0/example/prometheus-operator-crd/monitoring.coreos.com_podmonitors.yaml --force-conflicts
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.68.0/example/prometheus-operator-crd/monitoring.coreos.com_probes.yaml --force-conflicts
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.68.0/example/prometheus-operator-crd/monitoring.coreos.com_prometheusagents.yaml --force-conflicts
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.68.0/example/prometheus-operator-crd/monitoring.coreos.com_prometheuses.yaml --force-conflicts
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.68.0/example/prometheus-operator-crd/monitoring.coreos.com_prometheusrules.yaml --force-conflicts
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.68.0/example/prometheus-operator-crd/monitoring.coreos.com_scrapeconfigs.yaml --force-conflicts
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.68.0/example/prometheus-operator-crd/monitoring.coreos.com_servicemonitors.yaml --force-conflicts
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.68.0/example/prometheus-operator-crd/monitoring.coreos.com_thanosrulers.yaml --force-conflicts

helm uninstall -n ryaxns-monitoring prometheus
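
The CRD updates above all follow one URL pattern, so they can also be expressed as a loop. The sketch below is an equivalent formulation under that assumption; the helper names crd_urls and apply_crds are hypothetical, and the CRD list is copied from the commands above.

```shell
#!/bin/sh
# Base URL and CRD names taken from the individual commands above.
CRD_BASE="https://raw.githubusercontent.com/prometheus-operator/prometheus-operator"
CRD_NAMES="alertmanagerconfigs alertmanagers podmonitors probes prometheusagents prometheuses prometheusrules scrapeconfigs servicemonitors thanosrulers"

# Print the URL of each CRD manifest for a given operator version.
crd_urls() {
    # $1: operator version tag, for example v0.68.0
    for name in $CRD_NAMES; do
        printf '%s/%s/example/prometheus-operator-crd/monitoring.coreos.com_%s.yaml\n' \
            "$CRD_BASE" "$1" "$name"
    done
}

# Apply every CRD with the same flags as the individual commands,
# stopping at the first failure.
apply_crds() {
    for url in $(crd_urls "$1"); do
        kubectl apply --server-side -f "$url" --force-conflicts || return 1
    done
}

# Example (requires access to the cluster):
# apply_crds v0.68.0
```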

Now you can run the update to reinstall the new Prometheus version with the
usual ryax-adm apply command.

Grafana's credentials are reset by this update. The user is ryax, and the password can be obtained with:

kubectl get secret --namespace ryaxns-monitoring grafana-cedentials -o jsonpath="{.data.admin-password}" | base64 -d

23.07.0

04 Jul 12:20

We are proud to announce the release of

​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​

Ryax 23.07.0

​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​

The changelog:

New features and improvements

  • Allows users to set a specific HTTP status code with its API result
  • Actions can send a user-defined error with custom HTTP status code
  • Update Loki (logs capture) and Cert Manager (SSL certificate manager) to the latest version

Bug fixes

  • Fix OpenAPI page not always in sync with deployed workflows

Upgrade to this version

This update requires uninstalling the old Loki version before installing the
new one.

Before the update, just remove the old Loki version with:

helm uninstall -n ryaxns-monitoring loki

Be aware that some logs might not be captured before the new version is up and
running.

More details on Loki upgrade: https://grafana.com/docs/loki/next/installation/helm/upgrade-from-2.x/

23.06.0

16 Jun 12:09

We are proud to announce the release of

​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​

Ryax 23.06.0

​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​

The changelog:

New features and improvements

  • HPC offloading using Singularity with multi user support
  • CUDA GPU support with the python3-cuda language
  • Resources request support (CPU, Memory, Time, GPU)
  • Keep the home .cache directory between runs
  • Allow users to rebuild already built actions
  • Better internal runs state management
  • Use the latest Minio version

Bug fixes

  • Show deployment error details when they happen
  • Always show a notification when an error happens
  • Fix certificate injection for our internal registry with docker daemon
  • UI now shows the deployment errors if any

Upgrade to this version

WARNING: This update requires a data migration, which implies a maintenance period to
copy the data from one store to another.

Minio migration (for production)

Upgrading the internal filestore, Minio, requires migrating the data from the old instance to the
new one. For more details, see https://min.io/docs/minio/linux/operations/install-deploy-manage/migrate-fs-gateway.html

Get old filestore credentials

echo OLD_FILESTORE_SRV
kubectl get secret --namespace "ryaxns" ryax-filestore-secret -o jsonpath="{.data.filestore}" | base64 -d
echo OLD_FILESTORE_ACCESS
kubectl get secret --namespace "ryaxns" ryax-filestore-secret -o jsonpath="{.data.filestore-access}" | base64 -d
echo OLD_FILESTORE_SECRET
kubectl get secret --namespace "ryaxns" ryax-filestore-secret -o jsonpath="{.data.filestore-secret}" | base64 -d
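
The three lookups above differ only in the key they read, so they can be wrapped in a small helper that captures each value in a shell variable. This is a sketch only: filestore_secret is a hypothetical helper name, while the secret name and keys are the ones used above. Because the mc commands of the next step run inside the Minio pod, the captured values still need to be copied there.

```shell
#!/bin/sh
# Read one key of the old filestore secret and decode it.
# filestore_secret is a hypothetical helper name.
filestore_secret() {
    # $1: key in the secret (filestore, filestore-access or filestore-secret)
    kubectl get secret --namespace "ryaxns" ryax-filestore-secret \
        -o jsonpath="{.data.$1}" | base64 -d
}

# Example (requires access to the cluster):
# OLD_FILESTORE_SRV="$(filestore_secret filestore)"
# OLD_FILESTORE_ACCESS="$(filestore_secret filestore-access)"
# OLD_FILESTORE_SECRET="$(filestore_secret filestore-secret)"
```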

Connect to the new Minio pod

MINIO_POD="$(kubectl -n ryaxns get pods --selector app.kubernetes.io/name=minio -o jsonpath='{.items[0].metadata.name}')"
kubectl -n ryaxns exec -ti $MINIO_POD -- bash

Now, inside the Minio pod (replace the variables with the values from the previous
steps):

mc alias set old http://OLD_FILESTORE_SRV OLD_FILESTORE_ACCESS OLD_FILESTORE_SECRET
mc alias set new http://localhost:9000 ryax $MINIO_ROOT_PASSWORD
mc mb new/ryax-filestore
mc mirror --preserve old/ryax-filestore new/ryax-filestore
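
Before removing anything, it may be worth checking that the mirror is complete. The sketch below relies on mc diff and assumes that it prints one line per missing or differing object and nothing when the two buckets match. The helper name buckets_match is hypothetical, and the check does not compare object contents.

```shell
#!/bin/sh
# Succeed only when mc diff runs without error and reports no difference.
# buckets_match is a hypothetical helper; it assumes mc diff prints nothing
# when no object is missing or differs between the two buckets.
buckets_match() {
    # $1: source bucket, $2: destination bucket
    out="$(mc diff "$1" "$2")" || return 1
    [ -z "$out" ]
}

# Example (inside the Minio pod, after the aliases are set):
# buckets_match old/ryax-filestore new/ryax-filestore && echo "mirror complete"
```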

You can now safely remove the old filestore deployment with:

helm uninstall -n ryaxns ryax-filestore

Clean (for dev)

Clean the internal state of the services to avoid missing-file errors when
upgrading Minio:

ryax-adm clean studio runner

23.05.0

05 May 13:06

We are proud to announce the release of

​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​

Ryax 23.05.0

​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​ ​✨​

The changelog:

New features and improvements

  • arm64 architecture support

Bug fixes

  • Updates of all services and dependencies

Upgrade to this version

The RabbitMQ password is needed for the update. You can get it with:

kubectl get secret --namespace "ryaxns" ryax-broker-secret -o jsonpath="{.data.rabbitmq-password}" | base64 -d

Then, add the password you just retrieved to the cluster configuration values:

rabbitmq:
  values:
    auth:
      password: <PASSWORD>

If you have an old version of the Traefik Helm chart, you might get an upgrade
error. In that case, run the following commands and retry (WARNING: This will cause a
short downtime):

kubectl delete ingressroute -n kube-system traefik-dashboard
kubectl delete deployments.apps -n kube-system traefik