README.md
Lines changed: 131 additions & 6 deletions
Original file line number
Diff line number
Diff line change
@@ -20,6 +20,7 @@ Before you start, you need the following:
20
20
- Uses Kubernetes version 1.21.1 or later.
21
21
- Meets the system requirements for running MATLAB Job Scheduler. For details, see the MathWorks documentation for [MATLAB Parallel Server Product Requirements](https://www.mathworks.com/support/requirements/matlab-parallel-server.html).
22
22
- Configured to create external load balancers that allow traffic into the cluster.
23
+
- Has adequate storage on cluster nodes. When using a MATLAB Parallel Server Docker image for your workers (default behavior), ensure that each cluster node has at least 50GB of storage. If mounting MATLAB Parallel Server from a persistent volume, each cluster node must have at least 20GB of storage.
23
24
- Kubectl installed on your computer and configured to access your Kubernetes cluster. For help with installing Kubectl, see [Install Tools](https://kubernetes.io/docs/tasks/tools/) on the Kubernetes website.
24
25
- Helm® version 3.8.0 or later installed on your computer. For help with installing Helm, see [Quickstart Guide](https://helm.sh/docs/intro/quickstart/).
25
26
- Network access to the MathWorks Container Registry, `containers.mathworks.com`, and the GitHub® Container registry, `ghcr.io`.
@@ -86,6 +87,7 @@ For details about security levels, see [MATLAB Job Scheduler Security](https://w
86
87
87
88
When you run MATLAB Job Scheduler with security level 2, you must provide an administrator password.
88
89
Create a Kubernetes Secret for your administrator password named `mjs-admin-password` and replace `<password>` with a password of your choice.
90
+
If you are using LDAP to authenticate user credentials, this must be the password of the cluster administrator in the LDAP server.
- `adminUser` — Specify the username of a valid user in the LDAP server. The secret you created in the [Create Administrator Password Secret](#create-administrator-password-secret) must contain this user's password.
140
+
- `ldapURL`— Specify the URL of the LDAP server as `ldaps://HOST:PORT`. If you have not configured your LDAP server over SSL, specify the URL as `ldap://HOST:PORT`.
141
+
- `ldapSecurityPrincipalFormat`— Specify the format of a security principal (user) for your LDAP server.
142
+
143
+
**Security Considerations:** Use LDAP over SSL (LDAPS) to encrypt communication between the LDAP server and clients. For additional LDAPS configuration steps, see [Configure LDAP over SSL](#configure-ldap-over-ssl).
144
+
130
145
### Install Helm Chart
131
146
132
147
Install the MATLAB Job Scheduler Helm chart with your custom values file:
@@ -158,10 +173,10 @@ NAME TYPE CLUSTER-IP EXTERNAL-IP PORT
Configure your firewall so that MATLAB clients can route to the IP address or hostname under the `EXTERNAL-IP` column through the ports this service exposes.
176
+
Configure your firewall so that MATLAB clients can connect to the IP address or hostname under the `EXTERNAL-IP` column through the ports this service exposes.
162
177
For a description of the ports the load balancer service exposes, see the [Customize Load Balancer](#customize-load-balancer) section.
163
178
164
-
If you want the MATLAB client to route to this load balancer through a different hostname, for example, an intermediate server or a DNS entry, set the value of the `clusterHost` parameter in your Helm values file before you install MATLAB Job Scheduler on your Kubernetes cluster.
179
+
If you want the MATLAB client to connect to this load balancer through a different hostname, for example, an intermediate server or a DNS entry, set the value of the `clusterHost` parameter in your Helm values file before you install MATLAB Job Scheduler on your Kubernetes cluster.
165
180
166
181
## Download Cluster Profile
167
182
@@ -187,16 +202,97 @@ Import the cluster profile.
187
202
2. Click **Import** in the toolbar.
188
203
3. Navigate to the location where you saved the profile you created in the previous step and select it.
189
204
190
-
###Validate Cluster
205
+
## Validate Cluster
191
206
192
207
Cluster validation submits a job of each type to test whether the cluster profile is configured correctly.
193
208
In the Cluster Profile Manager, click **Validate**.
194
209
If you make a change to the cluster configuration, run cluster validation again to ensure your changes cause no errors.
195
210
You do not need to validate the profile each time you use it or each time you start MATLAB.
196
211
212
+
### Troubleshoot Cluster Validation Failures
213
+
214
+
The following sections explain how to resolve some common cluster validation failures.
215
+
216
+
#### Cluster Connection Test Failure
217
+
218
+
Incorrect cluster profiles or networking issues can cause failures during the "Cluster connection test (parcluster)" cluster validation stage.
219
+
220
+
If you have uninstalled and reinstalled the MATLAB Job Scheduler Helm chart, make sure you download and import the new cluster profile following the instructions in [Download Cluster Profile](#download-cluster-profile).
221
+
Using a cluster profile from a previous deployment in the same Kubernetes cluster results in cluster validation errors.
222
+
223
+
You must ensure that your MATLAB client can connect to the IP address of the load balancer and that your firewall allows traffic to the MATLAB Job Scheduler ports.
224
+
To check the load balancer's IP address, see [Install Helm Chart](#install-helm-chart).
225
+
For a description of the MATLAB Job Scheduler ports, see [Customize Load Balancer](#customize-load-balancer).
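As a quick check from the machine that runs the MATLAB client, the following sketch tests whether a TCP connection to the load balancer can be opened. It assumes Bash (for the `/dev/tcp` redirection) and the coreutils `timeout` command are available. `can_connect` is a hypothetical helper name, and `<external-ip>` and `<port>` are placeholders for the values described in the sections linked above.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port opens within 5 seconds.
# Assumes Bash (/dev/tcp) and the coreutils `timeout` command are available.
can_connect() {
  host="$1"
  port="$2"
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example (placeholders, not real values):
#   can_connect <external-ip> <port> && echo "reachable" || echo "blocked or unreachable"
```

A failure here points to a firewall or routing problem rather than a problem with the cluster profile itself.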
226
+
227
+
#### License Checkout Failure
228
+
229
+
If you have not correctly configured the MATLAB Parallel Server license for your cluster, the "Job test (createJob)" cluster validation stage fails with this message:
230
+
```
231
+
License checkout failed
232
+
```
233
+
Make sure you have set either the `useOnlineLicensing` parameter or the `networkLicenseManager` parameter in your `values.yaml` file.
234
+
To learn more about the `useOnlineLicensing` and `networkLicenseManager` parameters, see [Create Helm Values File](#create-helm-values-file).
235
+
236
+
If you continue to experience licensing errors, contact [MathWorks Technical Support](https://www.mathworks.com/support/contact_us.html).
237
+
238
+
#### Job Test Unresponsive
239
+
240
+
If the "Job test (createJob)", "SPMD job test (createCommunicatingJob)", or "Pool job test (createCommunicatingJob)" stage takes longer than 5 minutes to run, your Kubernetes cluster might not have sufficient resources to start the worker pods.
241
+
242
+
Check the status of the worker pods while cluster validation is in progress by running:
243
+
```
244
+
kubectl get pods --selector app=mjs-worker --namespace mjs
245
+
```
246
+
247
+
If a worker pod has the `Pending`, `ContainerCreating`, or `ContainerStatusUnknown` status, check the pod's details by running:
248
+
```
249
+
kubectl describe pods --namespace mjs <pod-name>
250
+
```
251
+
Replace `<pod-name>` with the name of the worker pod.
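To pick out the pods worth describing, a small filter over the `kubectl get pods` output can help. The sketch below assumes the default column layout (`NAME READY STATUS RESTARTS AGE`); `list_stuck_pods` is a hypothetical helper name, not part of the Helm chart.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print the
# NAME and STATUS of every pod that is not Running or Completed.
# Assumes the default columns, where STATUS is the third field.
list_stuck_pods() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Example:
#   kubectl get pods --selector app=mjs-worker --namespace mjs | list_stuck_pods
```

Each name it prints can then be passed to `kubectl describe pods` as `<pod-name>`.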
252
+
253
+
If your Kubernetes cluster does not have enough CPU resources to run the pod, the output might include messages like:
If you see either output, your Kubernetes cluster does not have enough resources to run the number of workers you specified in the `maxWorkers` parameter in your `values.yaml` file.
270
+
For details on the resource requirements for MATLAB Parallel Server workers, see the MathWorks documentation for [MATLAB Parallel Server Product Requirements](https://www.mathworks.com/support/requirements/matlab-parallel-server.html).
271
+
By default, each worker pod requests 2 vCPU and 8GB of memory. If your cluster does not have enough resources, do one of the following:
272
+
- Add more nodes to your Kubernetes cluster or replace your existing nodes with nodes that have more CPU and memory resources.
273
+
- Modify your `values.yaml` file to decrease the value of the `maxWorkers` parameter.
274
+
- Modify your `values.yaml` file to decrease the values of the `workerMemoryRequest` and `workerMemoryLimit` parameters. A minimum of 4GB per MATLAB worker is recommended. If you are using Simulink, a minimum of 8GB per worker is recommended.
275
+
276
+
If you modified your `values.yaml` file, uninstall the MATLAB Job Scheduler Helm chart following the instructions in [Uninstall MATLAB Job Scheduler](#uninstall-matlab-job-scheduler), then reinstall the Helm chart following the instructions in [Install Helm Chart](#install-helm-chart).
277
+
278
+
If your Kubernetes cluster nodes do not have enough ephemeral storage to pull the MATLAB Parallel Server Docker image, the output of `kubectl describe pods` might include messages like:
279
+
```
280
+
Events:
281
+
Type Reason Age From Message
282
+
---- ------ ---- ---- -------
283
+
Normal Scheduled 4m49s default-scheduler Successfully assigned default/mjs-worker-1-fd697549aeca4c2ab0f1bcb4fe819b0f-5d78457d5c-lcpv5 to my-node
284
+
Normal Pulling 4m49s kubelet Pulling image "ghcr.io/mathworks-ref-arch/matlab-parallel-server-k8s/mjs-worker-image:r2024a"
285
+
Warning Evicted 88s kubelet The node was low on resource: ephemeral-storage. Threshold quantity: 3219965180, available: 1313816Ki.
286
+
```
287
+
288
+
For details on the node storage requirements, see [Requirements](#requirements).
289
+
If your nodes do not have enough ephemeral storage, either
290
+
- Replace your Kubernetes nodes with nodes that have more storage.
291
+
- Instead of pulling the MATLAB Parallel Server Docker image, mount MATLAB Parallel Server from a PersistentVolume. To learn more, see [Mount MATLAB from a PersistentVolume](#mount-matlab-from-a-persistentvolume).
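To compare the figures in an eviction message with the storage requirements, it can help to convert them to GiB. The sketch below uses the two quantities from the sample message above and assumes the unsuffixed threshold is in bytes, while the `Ki` value is in kibibytes. The helper names are hypothetical.

```shell
# Hypothetical helpers: convert the quantities in an eviction message to GiB.
ki_to_gib()    { awk -v n="$1" 'BEGIN { printf "%.2f\n", n / 1048576 }'; }
bytes_to_gib() { awk -v n="$1" 'BEGIN { printf "%.2f\n", n / 1073741824 }'; }

ki_to_gib 1313816        # available storage from the sample message
bytes_to_gib 3219965180  # eviction threshold from the sample message
```

For the sample message, this prints `1.25` and `3.00`: the node had about 1.25 GiB available against an eviction threshold of about 3 GiB.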
292
+
197
293
## Uninstall MATLAB Job Scheduler
198
294
199
-
To uninstall MATLAB Job Scheduler from your Kubernetes cluster, run this command:
295
+
To uninstall the MATLAB Job Scheduler Helm chart from your Kubernetes cluster, run this command:
200
296
```
201
297
helm uninstall mjs --namespace mjs
202
298
```
@@ -211,12 +307,12 @@ If you created a custom load balancer service, delete the service:
211
307
kubectl delete service mjs-ingress-proxy --namespace mjs
212
308
```
213
309
214
-
If you want to reinstall MATLAB Job Scheduler, you must ensure that the load balancer service is deleted first.
310
+
If you want to reinstall the MATLAB Job Scheduler Helm chart, you must ensure that the load balancer service is deleted first.
215
311
To check the status of the load balancer service, run:
216
312
```
217
313
kubectl get service mjs-ingress-proxy --namespace mjs
218
314
```
219
-
If the load balancer service appears, wait for some time, then run the command again to confirm that the load balancer service is not found before proceeding with the MATLAB Job Scheduler reinstallation.
315
+
If the load balancer service appears, wait for some time, then run the command again to confirm that the load balancer service is not found before proceeding with the MATLAB Job Scheduler Helm chart reinstallation.
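Instead of rerunning the command by hand, you can poll for the deletion. The following is a minimal sketch: `wait_for_service_deletion` is a hypothetical helper, and the default of 30 checks at 10-second intervals is an arbitrary assumption rather than a recommended value.

```shell
# Hypothetical helper: poll until `kubectl get service` no longer finds the
# load balancer service, or give up after a fixed number of attempts.
# Note: any kubectl failure (for example, a lost connection) also ends the
# loop, so confirm the result manually before reinstalling.
wait_for_service_deletion() {
  attempts="${1:-30}"   # maximum number of checks (assumed default)
  delay="${2:-10}"      # seconds between checks (assumed default)
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if ! kubectl get service mjs-ingress-proxy --namespace mjs >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

Calling `wait_for_service_deletion` returns success once the service is no longer found and returns a nonzero status if it is still present after the last check.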
220
316
221
317
## Examples
222
318
@@ -324,6 +420,35 @@ For details on creating the PersistentVolumeClaim, see the [Create Persistent Vo
324
420
Modify your `values.yaml` file to set the `matlabPVC` parameter to the name of your PersistentVolumeClaim before installing the Helm chart.
325
421
The worker pods will now use the image URI specified in the `matlabDepsImage` parameter instead of the `workerImage` parameter.
326
422
423
+
### Run Multiple MATLAB Parallel Server Versions
424
+
425
+
You can use multiple versions of MATLAB Parallel Server in a single MATLAB Job Scheduler cluster.
426
+
When you upgrade to a newer release of MATLAB Parallel Server on your cluster, users can continue to submit jobs from both newer and older releases of the MATLAB client.
427
+
The additional MATLAB Parallel Server versions you use must be version R2024a or newer and must be older than the version of MATLAB Job Scheduler you are using.
428
+
429
+
Create a PersistentVolume and PersistentVolumeClaim for each additional MATLAB Parallel Server installation you want to use.
430
+
The root directory of each PersistentVolume must be the MATLAB root folder.
431
+
Modify your `values.yaml` file to set the `additionalMatlabPVCs` parameter to the names of the PersistentVolumeClaims.
432
+
433
+
For example, to use an additional PersistentVolumeClaim `matlab-r2024a-pvc`, add the following line to your `values.yaml` file:
434
+
```
435
+
additionalMatlabPVCs:
436
+
- matlab-r2024a-pvc
437
+
```
438
+
439
+
### Configure LDAP over SSL
440
+
441
+
When you use an LDAP server configured over SSL, you must add the LDAPS SSL certificate to your Kubernetes cluster.
442
+
To obtain the SSL server certificate, follow the instructions in [Connect to LDAP Server to Get Server SSL Certificate](https://www.mathworks.com/help/matlab-parallel-server/configure-ldap-server-authentication-for-matlab-job-scheduler.html#mw_fe8d0f90-2854-42b9-9e04-a2f25a295e61) on the MathWorks website.
443
+
444
+
If you use a prebuilt job manager image (default behavior), create a Kubernetes secret containing the server SSL certificate.
445
+
Replace `<path>` with the path to your server SSL certificate.
If you use a persistent volume for the job manager pod (`matlabPVC` is set to a non-empty string and `jobManagerUsesPVC` is set to `true` in your `values.yaml` file), you must add your certificate to the Java trust store of the MATLAB Parallel Server installation in your persistent volume. For detailed instructions, see [Add Certificate to Java Trust Store](https://mathworks.com/help/matlab-parallel-server/configure-ldap-server-authentication-for-matlab-job-scheduler.html#mw_fe8d0f90-2854-42b9-9e04-a2f25a295e61) on the MathWorks website.
451
+
327
452
### Customize Load Balancer
328
453
329
454
MATLAB Job Scheduler in Kubernetes uses a Kubernetes load balancer service to expose MATLAB Job Scheduler to MATLAB clients running outside of the Kubernetes cluster.
chart/mjs/templates/_derived.tpl
Lines changed: 5 additions & 0 deletions
Original file line number
Diff line number
Diff line change
@@ -20,3 +20,8 @@
20
20
{{- define "derived.enableServiceLinks" -}}
21
21
{{ .Values.enableServiceLinks | default false }}
22
22
{{- end -}}
23
+
24
+
# If we are using a secure LDAP server and not using a persistent volume claim for the job manager pod, we need to add the LDAP certificate to the job manager's secret store