EH: CS-747 Show waiting reader queue length in monitoring
ernst-bablick committed Oct 29, 2024
1 parent 20dfbea commit a914085
Showing 6 changed files with 44 additions and 15 deletions.
35 changes: 29 additions & 6 deletions doc/markdown/manual/release-notes/03_major_enhancements.md
@@ -4,17 +4,40 @@

### Automatic Session Management

Patch 9.0.2 introduces the new concept of automatic sessions. This concept allows the xxQS_NAMExx system to synchronise internal data stores more efficiently, so that client commands can be enforced to get the most recent data. Session management is enabled, but can be disabled by setting the `DISABLE_AUTOMATIC_SESSIONS` parameter in the `qmaster_params` of the cluster configuration.
* Patch 9.0.2 introduces the concept of automatic sessions. Sessions allow the xxQS_NAMExx system to synchronize its internal data stores so that client commands are guaranteed to get the most recent data. Session management is disabled by default but can be enabled by setting the `DISABLE_AUTOMATIC_SESSIONS` parameter to *false* in the `qmaster_params` of the cluster configuration.

The default for the `qmaster_param` `DISABLE_SECONDARY_DS_READER` is now *false*. This means that the reader thread pool is enabled by default and does not need to be enabled manually as in patch 9.0.1.

What does all this mean? Sessions ensure that commands that trigger changes within the cluster, such as submitting a job, modifying a queue or changing a complex value, are executed in a consistent way. Sessions ensure that the result of changing commands in the cluster is immediately visible to the user who initiated the change. Commands that only read data, such as `qstat`, `qhost` or `qconf -s...`, always return the most recent data although all read-requests in the system are executed completely in parallel to the xxQS_NAMExx core components.
The reader thread pool in combination with sessions ensures that commands that trigger changes within the cluster (write-requests), such as submitting a job, modifying a queue or changing a complex value, are executed and that their outcome is guaranteed to be visible to the user who initiated the change. Commands that only read data (read-requests), such as `qstat`, `qhost` or `qconf -s...`, and that are triggered by the same user always return the most recent data, although all read-requests in the system are executed completely in parallel to the other xxQS_NAMExx core components. This additional synchronization ensures that the data is consistent for the user with each read-request, but on the other hand it might slow down individual read-requests.

Unlike other workload management systems, session management in xxQS_NAMExx is automatic. There is no need to manually create or destroy sessions. Session management runs silently in the background to offload the most critical internal components.
Assume the following script:

All this further enhances cluster performance in large environments and improves cluster responsiveness, especially with tens of thousands of execution nodes, thousands of active users and millions of jobs/day.
```
#!/bin/sh
job_id=`qsub -terse ...`
qstat -j $job_id
```

(Available in Open Cluster Scheduler and Gridware Cluster Scheduler)
Without sessions, it is *not* guaranteed that the `qstat -j` command sees the job that was submitted just before. With sessions enabled, `qstat -j` always sees the job, but the command will be slightly slower than in the same scenario without sessions.

Sessions eliminate the need to poll for information about an action until it is visible in the system. Unlike other workload management systems, session management in xxQS_NAMExx is automatic. Once it has been enabled globally, there is no need to manually create or destroy sessions.
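To make the session guarantee concrete, the following C++ sketch models read-your-writes consistency with a main store, a lagging reader store and a waiting queue for read-requests. It is a hypothetical, single-threaded illustration under simplifying assumptions (per-user version numbers, and a single `apply_updates()` step standing in for the reader data store catching up); `SessionModel` and all of its members are invented for this example and do not describe how `sge_qmaster` implements sessions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <deque>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of automatic sessions (read-your-writes), NOT qmaster code.
struct SessionModel {
    bool sessions_enabled = true;
    std::uint64_t primary_version = 0;    // version of the main data store
    std::uint64_t secondary_version = 0;  // version already applied to the reader data store
    std::map<std::string, std::uint64_t> last_write;  // latest write version per user
    std::deque<std::string> waiting_reads;            // plays the role of the waiting reader queue
    std::vector<std::string> served;                  // users whose read-request was answered

    // Write-request (e.g. qsub): changes the main store and remembers the version for the user.
    void write(const std::string &user) {
        ++primary_version;
        last_write[user] = primary_version;
    }

    // Read-request (e.g. qstat -j): answered at once if sessions are off (possibly
    // from stale data) or if the reader store already contains the user's changes;
    // otherwise it is parked in the waiting queue.
    void read(const std::string &user) {
        if (!sessions_enabled || is_current_for(user)) {
            served.push_back(user);
        } else {
            waiting_reads.push_back(user);
        }
    }

    // The reader store catches up to `version` (never beyond the main store, never
    // backwards); parked read-requests that are now satisfiable are answered in FIFO order.
    void apply_updates(std::uint64_t version) {
        secondary_version = std::max(secondary_version, std::min(version, primary_version));
        std::deque<std::string> still_waiting;
        for (const auto &user : waiting_reads) {
            if (is_current_for(user)) {
                served.push_back(user);
            } else {
                still_waiting.push_back(user);
            }
        }
        waiting_reads.swap(still_waiting);
    }

private:
    // True if the reader store already reflects the user's most recent write.
    bool is_current_for(const std::string &user) const {
        auto it = last_write.find(user);
        return it == last_write.end() || it->second <= secondary_version;
    }
};
```

In this model, a user without pending changes is answered immediately, while a user whose `qstat -j` follows their own `qsub` waits until the reader store has caught up. With `sessions_enabled = false` nothing is ever parked, which mirrors the statement that *wrql* stays 0 when sessions are disabled.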


* The `sge_qmaster` monitoring has been improved. Beginning with this patch, the monitoring output of worker and reader threads shows the following numbers in its `OTHER` section:

```
... OTHER (ql:0,rql:0,wrql:0) ...
```

All three values show internal request queue lengths. Usually they are all 0, but they can increase under high load or when sessions are enabled:
* *ql* shows the queue length of the worker threads. This request queue contains requests that require a write lock on the main data store.
* *rql* shows the queue length of the reader threads. The queue contains requests that require a read lock on the secondary reader data store.
* *wrql* shows the length of the waiting reader queue. All requests that cannot be handled by the reader threads immediately are stored in this queue until the secondary reader data store is ready to handle them. If sessions are disabled, this number is always 0.
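A monitoring consumer may want to extract the three values from a monitoring line. The C++ sketch below does this with `std::sscanf`; `QueueLengths` and `parse_queue_lengths()` are hypothetical names, and the sketch assumes the `OTHER (ql:N,rql:N,wrql:N)` layout shown above with plain unsigned decimal numbers.

```cpp
#include <cassert>
#include <cstdio>
#include <optional>
#include <string>

// Queue lengths reported in the OTHER section of a monitoring line.
struct QueueLengths {
    unsigned long ql = 0;    // worker request queue
    unsigned long rql = 0;   // reader request queue
    unsigned long wrql = 0;  // waiting reader request queue
};

// Extracts "OTHER (ql:N,rql:N,wrql:N)" from a monitoring line.
// Returns std::nullopt if the segment is missing or incomplete, e.g. for
// output in the older format that only contained "ql:N".
std::optional<QueueLengths> parse_queue_lengths(const std::string &line) {
    const std::string marker = "OTHER (";
    const auto pos = line.find(marker);
    if (pos == std::string::npos) {
        return std::nullopt;
    }
    QueueLengths q;
    const char *p = line.c_str() + pos + marker.size();
    // sscanf returns the number of successfully converted fields.
    if (std::sscanf(p, "ql:%lu,rql:%lu,wrql:%lu)", &q.ql, &q.rql, &q.wrql) != 3) {
        return std::nullopt;
    }
    return q;
}
```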

Increasing values are not critical as long as they decrease again. If the numbers increase continuously, the system is under high load and performance might be impacted.

(Available in Open Cluster Scheduler and Gridware Cluster Scheduler)
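The rule of thumb that rising queue lengths only matter if they keep rising can be turned into a simple check over successive samples. This is a hypothetical heuristic, not something xxQS_NAMExx provides: `grows_continuously()` and the choice of a strictly increasing window are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical heuristic: report a concern only if the queue length grew
// between every pair of consecutive samples in the last `window` samples.
// Short bursts that drain again are ignored.
bool grows_continuously(const std::vector<unsigned long> &samples, std::size_t window) {
    if (window < 2 || samples.size() < window) {
        return false;  // not enough data to see a trend
    }
    for (std::size_t i = samples.size() - window + 1; i < samples.size(); ++i) {
        if (samples[i] <= samples[i - 1]) {
            return false;  // the queue shrank or stayed flat at least once
        }
    }
    return true;
}
```

A burst such as `{0, 5, 12, 3, 0}` is not flagged because the queue drains again, whereas `{0, 2, 4, 8, 16}` is flagged for a window of 4.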

## v9.0.1

5 changes: 3 additions & 2 deletions source/daemons/qmaster/sge_thread_reader.cc
@@ -164,8 +164,9 @@ sge_reader_main(void *arg) {
MONITOR_IDLE_TIME(sge_tq_wait_for_task(ReaderRequestQueue, 1, SGE_TQ_GDI_PACKET, (void **) &packet),
p_monitor, mconf_get_monitor_time(), mconf_is_monitor_message());

MONITOR_SET_QLEN(p_monitor, sge_tq_get_task_count(ReaderRequestQueue));
MONITOR_SET_WQLEN(p_monitor, sge_tq_get_task_count(ReaderWaitingRequestQueue));
MONITOR_SET_QLEN(p_monitor, sge_tq_get_task_count(GlobalRequestQueue));
MONITOR_SET_RQLEN(p_monitor, sge_tq_get_task_count(ReaderRequestQueue));
MONITOR_SET_WRQLEN(p_monitor, sge_tq_get_task_count(ReaderWaitingRequestQueue));

// handle the packet only if it is not nullptr and the shutdown has not started
if (packet != nullptr && !sge_thread_has_shutdown_started()) {
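In the reader thread, the three lengths are recorded right after `sge_tq_wait_for_task()` returns. Assuming that call removes the packet from the reader queue, *rql* does not include the request that is about to be handled. The C++ sketch below mimics that order with a hypothetical `TaskQueue` class built on `std::mutex` and `std::condition_variable`; it stands in for the real task queue API and is not its implementation.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the qmaster task queues (NOT the sge_tq_* code).
class TaskQueue {
public:
    void push(std::string task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(std::move(task));
        }
        cv_.notify_one();
    }

    // Blocks until a task is available, then removes and returns it.
    std::string wait_for_task() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !tasks_.empty(); });
        std::string task = std::move(tasks_.front());
        tasks_.pop_front();
        return task;
    }

    // Snapshot of the current length; it may be stale as soon as it returns.
    std::size_t count() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return tasks_.size();
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::string> tasks_;
};

struct GdiQueueStats {
    std::size_t ql = 0, rql = 0, wrql = 0;
};

// One reader-loop step: take a request, then record all three queue lengths,
// in the same order as MONITOR_SET_QLEN/RQLEN/WRQLEN in sge_reader_main().
std::string reader_step(TaskQueue &global, TaskQueue &reader, TaskQueue &waiting,
                        GdiQueueStats &stats) {
    std::string packet = reader.wait_for_task();
    stats.ql = global.count();
    stats.rql = reader.count();
    stats.wrql = waiting.count();
    return packet;
}
```

Because each count is taken under its own lock, the three numbers are individual snapshots rather than one consistent view, which is sufficient for trend monitoring.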
2 changes: 1 addition & 1 deletion source/libs/sgeobj/sge_conf.cc
@@ -157,7 +157,7 @@ static bool disable_secondary_ds_reader = DEFAULT_DISABLE_SECONDARY_DS_READER;
#define DEFAULT_DISABLE_SECONDARY_DS_EXECD (false)
static bool disable_secondary_ds_execd = DEFAULT_DISABLE_SECONDARY_DS_EXECD;

#define DEFAULT_DISABLE_AUTOMATIC_SESSIONS (false)
#define DEFAULT_DISABLE_AUTOMATIC_SESSIONS (true)
static bool disable_automatic_sessions = DEFAULT_DISABLE_AUTOMATIC_SESSIONS;

static bool prof_listener_thrd = false;
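This change flips the compiled-in default so that automatic sessions stay off unless `DISABLE_AUTOMATIC_SESSIONS=false` is configured. The C++ sketch below shows how a boolean entry with a fallback default could be looked up. It assumes a comma-separated `KEY=VALUE` list and the accepted spellings of true and false; `param_as_bool()` and `kDefaultDisableAutomaticSessions` are hypothetical names, not the parser in `sge_conf.cc`.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Mirrors DEFAULT_DISABLE_AUTOMATIC_SESSIONS after this change: sessions are off by default.
constexpr bool kDefaultDisableAutomaticSessions = true;

// Hypothetical lookup of a boolean qmaster_params entry. Assumes a comma-separated
// "KEY=VALUE" list; the accepted value spellings are an assumption as well.
bool param_as_bool(const std::string &params, const std::string &key, bool fallback) {
    std::istringstream in(params);
    std::string entry;
    while (std::getline(in, entry, ',')) {
        const auto eq = entry.find('=');
        if (eq == std::string::npos || entry.substr(0, eq) != key) {
            continue;  // not the entry we are looking for
        }
        const std::string value = entry.substr(eq + 1);
        if (value == "true" || value == "TRUE" || value == "1") return true;
        if (value == "false" || value == "FALSE" || value == "0") return false;
    }
    return fallback;  // key absent or value not recognized
}
```

With an empty `qmaster_params`, the lookup returns the new default (`true`, i.e. sessions disabled); `MONITOR_TIME=0:0:10,DISABLE_AUTOMATIC_SESSIONS=false` enables them.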
2 changes: 1 addition & 1 deletion source/libs/uti/msg_utilib.h
@@ -140,7 +140,7 @@

#define MSG_UTI_MONITOR_DEFLINE_SF _MESSAGE(59120, _(SFN ": runs: %.2fr/s"))
#define MSG_UTI_MONITOR_DEFLINE_FFFFF _MESSAGE(59121, _(" out: %.2fm/s APT: %.4fs/m idle: %.2f%% wait: %.2f%% time: %.2fs"))
#define MSG_UTI_MONITOR_GDIEXT_FFFFFFFFFFFFI _MESSAGE(59122, _("EXECD (l:%.2f,j:%.2f,c:%.2f,p:%.2f,a:%.2f)/s GDI (a:%.2f,g:%.2f,m:%.2f,d:%.2f,c:%.2f,t:%.2f,p:%.2f)/s OTHER (ql:" sge_U32CFormat ")"))
#define MSG_UTI_MONITOR_GDIEXT_FFFFFFFFFFFFIII _MESSAGE(59122, _("EXECD (l:%.2f,j:%.2f,c:%.2f,p:%.2f,a:%.2f)/s GDI (a:%.2f,g:%.2f,m:%.2f,d:%.2f,c:%.2f,t:%.2f,p:%.2f)/s OTHER (ql:" sge_U32CFormat ",rql:" sge_U32CFormat ",wrql:" sge_U32CFormat ")"))
#define MSG_UTI_MONITOR_DISABLED _MESSAGE(59123, _("Monitor: disabled"))
#define MSG_UTI_MONITOR_COLON _MESSAGE(59124, _("Monitor:"))
#define MSG_UTI_MONITOR_OK _MESSAGE(59125, _("OK"))
7 changes: 5 additions & 2 deletions source/libs/uti/sge_monitor.cc
@@ -685,15 +685,18 @@ static void ext_sch_output(dstring *message, void *monitoring_extension, double
static void ext_gdi_output(dstring *message, void *monitoring_extension, double time) {
auto *gdi_ext = (m_gdi_t *) monitoring_extension;

sge_dstring_sprintf_append(message, MSG_UTI_MONITOR_GDIEXT_FFFFFFFFFFFFI,
sge_dstring_sprintf_append(message, MSG_UTI_MONITOR_GDIEXT_FFFFFFFFFFFFIII,
gdi_ext->eload_count / time, gdi_ext->ejob_count / time,
gdi_ext->econf_count / time, gdi_ext->eproc_count / time,
gdi_ext->eack_count / time,
gdi_ext->gdi_add_count / time, gdi_ext->gdi_get_count / time,
gdi_ext->gdi_mod_count / time, gdi_ext->gdi_del_count / time,
gdi_ext->gdi_cp_count / time, gdi_ext->gdi_trig_count / time,
gdi_ext->gdi_perm_count / time,
sge_u32c(gdi_ext->queue_length));
sge_u32c(gdi_ext->queue_length),
sge_u32c(gdi_ext->rqueue_length),
sge_u32c(gdi_ext->wrqueue_length)
);
}

/****** uti/monitor/ext_lis_output() *******************************************
8 changes: 5 additions & 3 deletions source/libs/uti/sge_monitor.h
@@ -288,8 +288,9 @@ typedef struct {
u_long32 eproc_count; /* counts the execd processor reports */
u_long32 eack_count; /* counts the execd acks */

u_long32 queue_length; //< main queue length (e.g. worker or reader queue depending on thread type)
u_long32 wqueue_length; //< waiting queue length (e.g. reader waiting queue)
u_long32 queue_length; //< main queue length (e.g. worker queue)
u_long32 rqueue_length; //< reader queue length (e.g. reader queue)
u_long32 wrqueue_length; //< waiting reader queue length (e.g. waiting reader queue)
} m_gdi_t;

#define MONITOR_GDI_ADD(monitor) if ((monitor->monitor_time > 0) && (monitor->ext_type == GDI_EXT)) ((m_gdi_t*)(monitor->ext_data))->gdi_add_count++
@@ -310,7 +311,8 @@ typedef struct {
#define MONITOR_EACK(monitor) if ((monitor->monitor_time > 0) && (monitor->ext_type == GDI_EXT)) ((m_gdi_t*)(monitor->ext_data))->eack_count++

#define MONITOR_SET_QLEN(monitor, qlen) if ((monitor) != nullptr && (monitor->monitor_time > 0) && (monitor->ext_type == GDI_EXT)) ((m_gdi_t*)(monitor->ext_data))->queue_length = (qlen)
#define MONITOR_SET_WQLEN(monitor, qlen) if ((monitor) != nullptr && (monitor->monitor_time > 0) && (monitor->ext_type == GDI_EXT)) ((m_gdi_t*)(monitor->ext_data))->wqueue_length = (qlen)
#define MONITOR_SET_RQLEN(monitor, qlen) if ((monitor) != nullptr && (monitor->monitor_time > 0) && (monitor->ext_type == GDI_EXT)) ((m_gdi_t*)(monitor->ext_data))->rqueue_length = (qlen)
#define MONITOR_SET_WRQLEN(monitor, qlen) if ((monitor) != nullptr && (monitor->monitor_time > 0) && (monitor->ext_type == GDI_EXT)) ((m_gdi_t*)(monitor->ext_data))->wrqueue_length = (qlen)

/* listener extension */
typedef struct {
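The `MONITOR_SET_*LEN` macros only store a value when monitoring is active and the thread carries a GDI extension. The sketch below expresses the same guard as an inline function with a pointer-to-member argument. `ExtKind`, `gdi_ext_sketch_t`, `monitor_sketch_t` and `monitor_set_len()` are simplified, hypothetical stand-ins, not the definitions from `sge_monitor.h`.

```cpp
#include <cassert>
#include <cstdint>

// Simplified, hypothetical stand-ins; NOT the definitions from sge_monitor.h.
enum class ExtKind { None, Gdi };

struct gdi_ext_sketch_t {
    std::uint32_t queue_length = 0;    // worker queue
    std::uint32_t rqueue_length = 0;   // reader queue
    std::uint32_t wrqueue_length = 0;  // waiting reader queue
};

struct monitor_sketch_t {
    std::uint32_t monitor_time = 0;  // monitoring interval in this sketch; 0 means monitoring is off
    ExtKind ext_type = ExtKind::None;
    void *ext_data = nullptr;        // points to a gdi_ext_sketch_t when ext_type is Gdi
};

// Same guard as the MONITOR_SET_*LEN macros, written as a function: the value is
// stored only for a non-null monitor that is active and has a GDI extension.
inline void monitor_set_len(monitor_sketch_t *monitor,
                            std::uint32_t gdi_ext_sketch_t::*field,
                            std::uint32_t len) {
    if (monitor != nullptr && monitor->monitor_time > 0 && monitor->ext_type == ExtKind::Gdi) {
        static_cast<gdi_ext_sketch_t *>(monitor->ext_data)->*field = len;
    }
}
```

One practical difference from the macro form: a macro that expands to an unbraced `if` can capture an `else` written after it by the caller (the dangling-else problem), whereas a function call cannot.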
