Commit 92e21d3

EH: Updated release notes for v9.0.5
1 parent 8bc7bd6 commit 92e21d3

2 files changed: +86 -13 lines changed

doc/markdown/manual/release-notes/03_major_enhancements.md

+1-12
@@ -157,18 +157,7 @@ A list of all special variables is given in the sge_conf.5 man page in the `prol
 
 ### Enhanced NVIDIA GPU Support with qgpu
 
-* With the release of patch 9.0.2, the `qgpu` command has been added to simplify
-workload management for GPU resources. The `qgpu` command allows administrators
-to manage GPU resources more efficiently. It is available for Linux _amd64_ and
-Linux _arm64_. `qgpu` is a multi-purpose command which can act as a `load sensor`
-reporting the characteristics and metrics of of NVIDIA GPU devices. For that it
-depends on NVIDIA DCGM to be installed on the GPU nodes. It also works as a
-`prolog` and `epilog` for jobs to setup NVIDIA runtime and environment variables.
-Further it sets up per job GPU accounting so that the GPU usage and power
-consumption is automatically reported in the accounting being visible in the
-standard `qacct -j` output. It supports all NVIDIA GPUs which are supported by
-Nvidias DCGM including NVIDIA's latest Grace Hopper superchips. For more
-information about `qgpu` please refer to the `Admin Guide`.
+* With the release of patch 9.0.2, the `qgpu` command has been added to simplify workload management for GPU resources. The `qgpu` command allows administrators to manage GPU resources more efficiently. It is available for Linux _amd64_ and Linux _arm64_. `qgpu` is a multi-purpose command which can act as a `load sensor` reporting the characteristics and metrics of of NVIDIA GPU devices. For that it depends on NVIDIA DCGM to be installed on the GPU nodes. It also works as a `prolog` and `epilog` for jobs to setup NVIDIA runtime and environment variables. Further it sets up per job GPU accounting so that the GPU usage and power consumption is automatically reported in the accounting being visible in the standard `qacct -j` output. It supports all NVIDIA GPUs which are supported by Nvidias DCGM including NVIDIA's latest Grace Hopper superchips. For more information about `qgpu` please refer to the `Admin Guide`.
 
 (Available in Gridware Cluster Scheduler only)
 

doc/markdown/manual/release-notes/04_full_list_of_fixes.md

+85-1
@@ -1,6 +1,90 @@
 # Full List of Fixes
 
-# Release notes - Cluster Scheduler
+# Release notes - Cluster Scheduler
+
+## v9.0.5
+
+### Improvement
+
+[CS-342](https://hpc-gridware.atlassian.net/browse/CS-342) provide an openmpi integration
+
+[CS-343](https://hpc-gridware.atlassian.net/browse/CS-343) provide an example and test program using MPI
+
+[CS-791](https://hpc-gridware.atlassian.net/browse/CS-791) sge\_root should be available as special variable in the configuration of prolog, epilog, queue, pe, ckpt
+
+[CS-914](https://hpc-gridware.atlassian.net/browse/CS-914) Make ARCH script more robust
+
+[CS-1090](https://hpc-gridware.atlassian.net/browse/CS-1090) qstat -r shall report resource requests by scope
+
+[CS-1094](https://hpc-gridware.atlassian.net/browse/CS-1094) Update sge\_pe.md to better explain PE\_HOSTFILE
+
+[CS-1114](https://hpc-gridware.atlassian.net/browse/CS-1114) Add GPU monitoring examples to qtelemetry Grafana dashboard
+
+[CS-1115](https://hpc-gridware.atlassian.net/browse/CS-1115) Build qtelemetry in containers for lx-amd64 and lx-arm64
+
+[CS-1126](https://hpc-gridware.atlassian.net/browse/CS-1126) in the environment of tasks of tightly integrated parallel jobs set the pe\_task\_id
+
+[CS-1128](https://hpc-gridware.atlassian.net/browse/CS-1128) Add enroot to worker GPU VM image for GCP
+
+[CS-1143](https://hpc-gridware.atlassian.net/browse/CS-1143) provide a MPICH integration
+
+[CS-1144](https://hpc-gridware.atlassian.net/browse/CS-1144) provide a MVAPICH integration
+
+[CS-1145](https://hpc-gridware.atlassian.net/browse/CS-1145) provide an Intel MPI integration
+
+[CS-1146](https://hpc-gridware.atlassian.net/browse/CS-1146) cleanup and document the ssh wrapper MPI template and scripts
+
+[CS-1152](https://hpc-gridware.atlassian.net/browse/CS-1152) add a checktree\_mpi to testsuite with configuration and tests making use of the various MPI integrations
+
+[CS-1158](https://hpc-gridware.atlassian.net/browse/CS-1158) Add qtelemetry Grafana dashboard to public Grafana Cloud Dashboards
+
+### New Feature
+
+[CS-1091](https://hpc-gridware.atlassian.net/browse/CS-1091) Clearly document the slots syntax in man5 sge\_queue\_conf.md
+
+### Sub-task
+
+[CS-697](https://hpc-gridware.atlassian.net/browse/CS-697) Jenkins: enable issue\_3013
+
+[CS-698](https://hpc-gridware.atlassian.net/browse/CS-698) Jenkins: enable issue\_3179
+
+### Task
+
+[CS-662](https://hpc-gridware.atlassian.net/browse/CS-662) verify delayed job reporting of sge\_execd after reconnecting to sge\_qmaster
+
+[CS-1117](https://hpc-gridware.atlassian.net/browse/CS-1117) Add qtelemetry as developer preview to GCS distribution
+
+[CS-1118](https://hpc-gridware.atlassian.net/browse/CS-1118) Create a packer file which builds a GPU enabled VM with and without GCS for fast deployment on GCP
+
+[CS-1125](https://hpc-gridware.atlassian.net/browse/CS-1125) Provide a basic examples of how enroot can be used with the GPU integration
+
+[CS-1134](https://hpc-gridware.atlassian.net/browse/CS-1134) message cutoff after 8 characters
+
+[CS-1136](https://hpc-gridware.atlassian.net/browse/CS-1136) add checktree\_qtelemetry to all build environments \+ Jenkins setup
+
+### Bug
+
+[CS-430](https://hpc-gridware.atlassian.net/browse/CS-430) booking of resources into advance reservations needs to distinguish between host and queue resources
+
+[CS-722](https://hpc-gridware.atlassian.net/browse/CS-722) env\_list in qstat should show NONE if not set
+
+[CS-1028](https://hpc-gridware.atlassian.net/browse/CS-1028) qtelemetry should support NVIDIA loadsensor values for hosts
+
+[CS-1085](https://hpc-gridware.atlassian.net/browse/CS-1085) BDB build error on lx-riscv64 after OS update.
+
+[CS-1096](https://hpc-gridware.atlassian.net/browse/CS-1096) USE\_QSUB\_GID functionality fails on FreeBSD 14
+
+[CS-1111](https://hpc-gridware.atlassian.net/browse/CS-1111) minimum and maximum thread counts in the bootstrap.5 man page are incorrect
+
+[CS-1131](https://hpc-gridware.atlassian.net/browse/CS-1131) wallclock time reported for tasks of a tightly integrated parallel job is incorrect
+
+[CS-1139](https://hpc-gridware.atlassian.net/browse/CS-1139) job deletion via JAPI/DRMAA fails if job ID exceeds INT\_MAX
+
+[CS-1140](https://hpc-gridware.atlassian.net/browse/CS-1140) termination of event client via JAPI fails if event client ID exceeds INT\_MAX
+
+[CS-1141](https://hpc-gridware.atlassian.net/browse/CS-1141) MacOS build broken due to unavailability of getgrouplist\(\)
+
+[CS-1163](https://hpc-gridware.atlassian.net/browse/CS-1163) when a queue is signalled then additional invalid entries are created in the berkeleydb spooling database
 
 ## v9.0.4
 
