From df0a28071b31af8ef1ef9702d34cea005e4aaed5 Mon Sep 17 00:00:00 2001
From: Ernst Bablick
Date: Mon, 23 Sep 2024 10:02:05 +0200
Subject: [PATCH] BF: CS-262 reformat sge_bootstrap, sge_calendar_conf, sge_checkpoint man pages

---
 doc/markdown/man/man5/sge_bootstrap.md     | 158 ++++++---------
 doc/markdown/man/man5/sge_calendar_conf.md | 220 +++++++++------------
 doc/markdown/man/man5/sge_checkpoint.md    | 188 ++++++++----------
 3 files changed, 234 insertions(+), 332 deletions(-)

diff --git a/doc/markdown/man/man5/sge_bootstrap.md b/doc/markdown/man/man5/sge_bootstrap.md
index f4c5962a0..92c36704f 100644
--- a/doc/markdown/man/man5/sge_bootstrap.md
+++ b/doc/markdown/man/man5/sge_bootstrap.md
@@ -12,151 +12,117 @@ xxqs_name_sxx_bootstrap - xxQS_NAMExx bootstrap file
 
 # DESCRIPTION
 
-*bootstrap* contains parameters that are needed for the startup of
-xxQS_NAMExx components. It is created during the xxqs_name_sxx_qmaster
-installation. Modifying *bootstrap* in a running system is not
-supported.
+*bootstrap* contains parameters that are needed for the startup of xxQS_NAMExx components. It is created during the
+xxqs_name_sxx_qmaster installation. Modifying *bootstrap* in a running system is not supported.
 
 # FORMAT
 
-The paragraphs that follow provide brief descriptions of the individual
-parameters that compose the bootstrap configuration for a xxQS_NAMExx
-cluster:
+The paragraphs that follow provide brief descriptions of the individual parameters that compose the bootstrap
+configuration for a xxQS_NAMExx cluster:
 
-## **admin_user**
+## *admin_user*
 
-Administrative user account used by xxQS_NAMExx for all internal file
-handling (status spooling, message logging, etc.). Can be used in cases
-where the root account does not have the corresponding file access
-permissions (e.g. on a shared file system without global root read/write
-access).
+Administrative user account used by xxQS_NAMExx for all internal file handling (status spooling, message logging,
+etc.). Can be used in cases where the root account does not have the corresponding file access permissions
+(e.g. on a shared file system without global root read/write access).
 
-Being a parameter set at installation time changing **admin_user** in a
-running system is not supported. Changing it manually on a shut-down
-cluster is possible, but if access to the xxQS_NAMExx spooling area is
+Because it is set at installation time, changing *admin_user* in a running system is not supported. Changing it
+manually on a shut-down cluster is possible, but if access to the xxQS_NAMExx spooling area is
 interrupted, this will result in unpredictable behavior.
 
-The **admin_user** parameter has no default value, but instead it is
-defined during the master installation procedure.
+The *admin_user* parameter has no default value; instead, it is defined during the master installation procedure.
 
-## **default_domain**
+## *default_domain*
 
-Only needed if your xxQS_NAMExx cluster covers hosts belonging to more
-than a single DNS domain. In this case it can be used if your hostname
-resolving yields both qualified and unqualified hostnames for the hosts
-in one of the DNS domains. The value of **default_domain** is appended
-to the unqualified hostname to define a fully qualified hostname. The
-**default_domain** parameter will have no effect if **ignore_fqdn** is
-set to "true".
+Only needed if your xxQS_NAMExx cluster covers hosts belonging to more than a single DNS domain. In this case it can
+be used if your hostname resolution yields both qualified and unqualified hostnames for the hosts in one of the DNS
+domains. The value of *default_domain* is appended to the unqualified hostname to define a fully qualified hostname.
+The *default_domain* parameter will have no effect if *ignore_fqdn* is set to *true*.
 
-Being a parameter set at installation time changing **default_domain**
-in a running system is not supported. The default for **default_domain**
-is "none", in which case it will not be used.
+Because it is set at installation time, changing *default_domain* in a running system is not supported. The default
+for *default_domain* is *none*, in which case it will not be used.
 
-## **ignore_fqdn**
+## *ignore_fqdn*
 
-Ignore fully qualified domain name component of hostnames. Should be set
-if all hosts belonging to a xxQS_NAMExx cluster are part of a single DNS
-domain. It is switched on if set to either "true" or "1". Switching it
-on may solve problems with load reports due to different hostname
-resolutions across the cluster.
+Ignore the fully qualified domain name component of hostnames. Should be set if all hosts belonging to a xxQS_NAMExx
+cluster are part of a single DNS domain. It is switched on if set to either *true* or *1*. Switching it on may solve
+problems with load reports due to different hostname resolutions across the cluster.
 
-Being a parameter set at installation time changing **ignore_fqdn** in a
-running system is not supported. The default for **ignore_fqdn** is
-"true".
+Because it is set at installation time, changing *ignore_fqdn* in a running system is not supported. The default
+for *ignore_fqdn* is *true*.
 
-## **spooling_method**
+## *spooling_method*
 
-Defines how *xxqs_name_sxx_qmaster*(8) writes its configuration and the
-status information of a running cluster.
+Defines how xxqs_name_sxx_qmaster(8) writes its configuration and the status information of a running cluster.
 
 The available spooling methods are *berkeleydb* and *classic*.
 
-## **spooling_lib**
+## *spooling_lib*
 
-The name of a shared library containing the **spooling_method** to be
-loaded at *xxqs_name_sxx_qmaster*(8) initialization time. The extension
-characterizing a shared library (.so, .sl, .dylib etc.) is not contained
-in **spooling_lib**.
+The name of a shared library containing the *spooling_method* to be loaded at xxqs_name_sxx_qmaster(8) initialization
+time. The extension characterizing a shared library (.so, .sl, .dylib etc.) is not contained in *spooling_lib*.
 
-If **spooling_method** was set to *berkeleydb* during installation,
-**spooling_lib** is set to *libspoolb*, if *classic* was chosen as
-**spooling_method**, **spooling_lib** is set to *libspoolc*.
+If *spooling_method* was set to *berkeleydb* during installation, *spooling_lib* is set to *libspoolb*; if *classic*
+was chosen as *spooling_method*, *spooling_lib* is set to *libspoolc*.
 
-Not all operating systems allow the dynamic loading of libraries. On
-these platforms a certain spooling method (default: berkeleydb) is
-compiled into the binaries and the parameter **spooling_lib** will be
-ignored.
+Not all operating systems allow the dynamic loading of libraries. On these platforms a certain spooling method
+(default: berkeleydb) is compiled into the binaries and the parameter *spooling_lib* will be ignored.
 
-## **spooling_params**
+## *spooling_params*
 
 Defines parameters to the chosen spooling method.
 
-Parameters that are needed to initialize the spooling framework, e.g. to
-open database files or to connect to a certain database server.
+These are the parameters needed to initialize the spooling framework, e.g. to open database files or to connect to a
+certain database server.
 
-The spooling parameters value for spooling method *berkeleydb* is
-\[rpc_server:\]database directory, e.g.
-/sge_local/default/spool/qmaster/spooldb for spooling to a local
-filesystem, or myhost:sge for spooling over a Berkeley DB RPC server.
+The spooling parameters value for spooling method *berkeleydb* is \[rpc_server:\]database directory, e.g.
+/sge_local/default/spool/qmaster/spooldb for spooling to a local filesystem, or myhost:sge for spooling over a
+Berkeley DB RPC server.
 
-For spooling method *classic* the spooling parameters take the form
-\;\, e.g.
+For spooling method *classic* the spooling parameters take the form \;\, e.g.
 /sge/default/common;/sge/default/spool/qmaster
 
-## **binary_path**
+## *binary_path*
 
-The directory path where the xxQS_NAMExx binaries reside. It is used
-within xxQS_NAMExx components to locate and startup other xxQS_NAMExx
-programs.
+The directory path where the xxQS_NAMExx binaries reside. It is used within xxQS_NAMExx components to locate and
+start up other xxQS_NAMExx programs.
 
-The path name given here is searched for binaries as well as any
-directory below with a directory name equal to the current operating
-system architecture. Therefore, /usr/xxQS_NAME_Sxx/bin will work for all
-architectures, if the corresponding binaries are located in
-subdirectories named lx-amd64, lx-x86, sol-amd64,
-sol-sparc etc.
+The path name given here is searched for binaries as well as any directory below with a directory name equal to
+the current operating system architecture. Therefore, /usr/xxQS_NAME_Sxx/bin will work for all architectures, if the
+corresponding binaries are located in subdirectories named lx-amd64, lx-arm64, sol-amd64 etc.
 
 The default location for the binary path is \/bin
 
-## **qmaster_spool_dir**
+## *qmaster_spool_dir*
 
-The location where the master spool directory resides. Only the
-*xxqs_name_sxx_qmaster*(8) and *xxqs_name_sxx_shadowd*(8) need to have
-access to this directory. The master spool directory - in particular the
-job_scripts directory and the messages log file - may become quite large
-depending on the size of the cluster and the number of jobs. Be sure to
-allocate enough disk space and regularly clean off the log files, e.g.
-via a *cron*(8) job.
+The location where the master spool directory resides. Only the xxqs_name_sxx_qmaster(8) and xxqs_name_sxx_shadowd(8)
+need to have access to this directory. The master spool directory - in particular the job_scripts directory and the
+messages log file - may become quite large depending on the size of the cluster and the number of jobs. Be sure to
+allocate enough disk space and regularly clean off the log files, e.g. via a cron(8) job.
 
-Being a parameter set at installation time changing
-**qmaster_spool_dir** in a running system is not supported.
+Because it is set at installation time, changing *qmaster_spool_dir* in a running system is not supported.
 
-The default location for the master spool directory is
-\/\/spool/qmaster.
+The default location for the master spool directory is \/\/spool/qmaster.
 
-## **security_mode**
+## *security_mode*
 
-The security mode defines the set of security features the installed
-cluster is using.
+The security mode defines the set of security features the installed cluster is using.
 
-Possible security mode settings are none, afs, dce, kerberos, csp. (no
-additional security, AFS, DCE, KERBEROS, CSP security model).
+Possible security mode settings are *none*, *afs*, *dce*, *kerberos* and *csp* (no additional security, AFS, DCE,
+Kerberos and CSP security model, respectively).
 
-## **listener_threads**
+## *listener_threads*
 
 The number of listener threads (defaults set by installation).
 
-## **worker_threads**
+## *worker_threads*
 
 The number of worker threads (defaults set by installation).
 
-## **scheduler_threads**
+## *scheduler_threads*
 
-The number of scheduler threads (allowed: 0-1, default set by
-installation: 1, off: 0). (see *qconf*(1) -kt/-at option)
+The number of scheduler threads (allowed: 0-1, default set by installation: 1, off: 0). See the `-kt` and `-at` options of qconf(1).
 
 # COPYRIGHT
 
-See *xxqs_name_sxx_intro*(1) for a full statement of rights and
-permissions.
+See xxqs_name_sxx_intro(1) for a full statement of rights and permissions.
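
The bootstrap parameters described in the patch above lend themselves to a small illustration. The following Python sketch is not part of xxQS_NAMExx: it assumes a bootstrap file with one `name value` pair per line and `#` comment lines, and the helper names and dictionary keys (`parse_bootstrap`, `normalize_hostname`, `split_spooling_params`, `common_dir`, `spool_dir`) are hypothetical. It only applies the documented rules: *ignore_fqdn* is switched on by `true` or `1` (default `true`), *default_domain* is appended to unqualified names unless it is `none` or *ignore_fqdn* is on, and *spooling_params* is split according to *spooling_method*.

```python
# Minimal sketch (not part of xxQS_NAMExx): read a bootstrap file and apply
# the hostname and spooling rules described in the bootstrap man page.
# Assumption: one "name value" pair per line, '#' starts a comment line.
from typing import Dict, Optional


def parse_bootstrap(text: str) -> Dict[str, str]:
    """Return the bootstrap parameters as a name -> value dictionary."""
    params: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)  # parameter name, then the rest of the line
        params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params


def ignore_fqdn_enabled(params: Dict[str, str]) -> bool:
    # Switched on by "true" or "1"; the documented default is "true".
    return params.get("ignore_fqdn", "true") in ("true", "1")


def normalize_hostname(host: str, params: Dict[str, str]) -> str:
    """Apply ignore_fqdn and default_domain to a resolved hostname."""
    if ignore_fqdn_enabled(params):
        return host.split(".", 1)[0]  # drop the domain component
    domain = params.get("default_domain", "none")
    if domain != "none" and "." not in host:
        return f"{host}.{domain}"  # qualify an unqualified hostname
    return host


def split_spooling_params(params: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Split spooling_params according to spooling_method (hypothetical keys)."""
    method = params.get("spooling_method")
    value = params.get("spooling_params", "")
    if method == "classic":
        # two directories separated by ';'
        first, _, second = value.partition(";")
        return {"common_dir": first, "spool_dir": second}
    if method == "berkeleydb":
        # [rpc_server:]database directory
        head, sep, tail = value.partition(":")
        if sep and "/" not in head:
            return {"rpc_server": head, "database_directory": tail}
        return {"rpc_server": None, "database_directory": value}
    raise ValueError(f"unknown spooling_method: {method!r}")


if __name__ == "__main__":
    sample = (
        "# hypothetical example values\n"
        "admin_user        sgeadmin\n"
        "default_domain    example.com\n"
        "ignore_fqdn       false\n"
        "spooling_method   classic\n"
        "spooling_params   /sge/default/common;/sge/default/spool/qmaster\n"
    )
    params = parse_bootstrap(sample)
    print(normalize_hostname("node1", params))  # node1.example.com
    print(split_spooling_params(params))
```

Keeping the rules in separate helpers mirrors the man page, where *default_domain* only matters while *ignore_fqdn* is off.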
diff --git a/doc/markdown/man/man5/sge_calendar_conf.md b/doc/markdown/man/man5/sge_calendar_conf.md
index a050d7758..12d9550bf 100644
--- a/doc/markdown/man/man5/sge_calendar_conf.md
+++ b/doc/markdown/man/man5/sge_calendar_conf.md
@@ -12,208 +12,176 @@ xxqs_name_sxx_calendar_conf - xxQS_NAMExx calendar configuration file format
 
 # DESCRIPTION
 
-*calendar_conf* reflects the format of the xxQS_NAMExx calendar
-configuration. The definition of calendars is used to specify "on duty"
-and "off duty" time periods for xxQS_NAMExx queues on a time of day, day
-of week or day of year basis. Various calendars can be implemented and
-the appropriate calendar definition for a certain class of jobs can be
+*calendar_conf* reflects the format of the xxQS_NAMExx calendar configuration. The definition of calendars is used to
+specify "on duty" and "off duty" time periods for xxQS_NAMExx queues on a time of day, day of week or day of year
+basis. Various calendars can be implemented and the appropriate calendar definition for a certain class of jobs can be
 attached to a queue.
 
-*calendar_conf* entries can be added, modified and displayed with the
-*-Acal*, *-acal*, *-Mcal*, *-mcal*, *-scal* and *-scall* options to
-*qconf*(1) or with the calendar configuration dialog of the graphical
-user interface *qmon*(1).
+*calendar_conf* entries can be added, modified and displayed with the `-Acal`, `-acal`, `-Mcal`, `-mcal`, `-scal` and
+`-scall` options to qconf(1).
 
-Note, xxQS_NAMExx allows backslashes (\\) be used to escape newline
-(\\newline) characters. The backslash and the newline are replaced with
-a space (" ") character before any interpretation.
+Note, xxQS_NAMExx allows backslashes (\\) to be used to escape newline (\\newline) characters. The backslash and the
+newline are replaced with a space (" ") character before any interpretation.
 
 # FORMAT
 
-## **calendar_name**
+## *calendar_name*
 
-The name of the calendar to be used when attaching it to queues or when
-administering the calendar definition. See *calendar_name* in
-*sge_types*(1) for a precise definition of valid calendar names.
+The name of the calendar to be used when attaching it to queues or when administering the calendar definition. See
+*calendar_name* in xxqs_name_sxx_types(1) for a precise definition of valid calendar names.
 
-## **year**
+## *year*
 
-The queue status definition on a day of the year basis. This field
-generally will specify on which days of a year (and optionally at which
-times on those days) a queue, to which the calendar is attached, will
-change to a certain state. The syntax of the **year** field is defined
-as follows:
+The queue status definition on a day of the year basis. This field generally will specify on which days of a year (and
+optionally at which times on those days) a queue, to which the calendar is attached, will change to a certain state.
+The syntax of the *year* field is defined as follows:
 
-    year:=
+    year :=
         { NONE
         | year_day_range_list=daytime_range_list[=state]
         | year_day_range_list=[daytime_range_list=]state
-        | state}
+        | state
+        }
 
 Where
 
 - NONE means, no definition is made on the year basis
 
-- if a definition is made on the year basis, at least one of
-  **year_day_range_list**, **daytime_range_list** and **state** always
-  have to be present,
+- if a definition is made on the year basis, at least one of *year_day_range_list*, *daytime_range_list* and *state*
+  always has to be present,
 
-- all day long is assumed if **daytime_range_list** is omitted,
+- all day long is assumed if *daytime_range_list* is omitted,
 
-- switching the queue to "off" (i.e. disabling it) is assumed if
-  **state** is omitted,
+- switching the queue to "off" (i.e. disabling it) is assumed if *state* is omitted,
 
-- the queue is assumed to be enabled for days neither referenced
-  implicitly (by omitting the **year_day_range_list**) nor explicitly
+- the queue is assumed to be enabled for days neither referenced implicitly (by omitting the *year_day_range_list*)
+  nor explicitly
 
 and the syntactical components are defined as follows:
 
-    year_day_range_list := {yearday-yearday|yearday},...
-    daytime_range_list := hour[:minute][:second]-
-    hour[:minute][:second],...
-    state := {on|off|suspended}
-    year_day := month_day.month.year
-    month_day := {1|2|...|31}
-    month := {jan|feb|...|dec|1|2|...|12}
-    year := {1970|1971|...|2037}
+    year_day_range_list := {yearday-yearday|yearday},...
+    daytime_range_list  := hour[:minute][:second]-hour[:minute][:second],...
+    state               := {on|off|suspended}
+    year_day            := month_day.month.year
+    month_day           := {1|2|...|31}
+    month               := {jan|feb|...|dec|1|2|...|12}
+    year                := {1970|1971|...|2037}
 
-## **week**
+## *week*
 
-The queue status definition on a day of the week basis. This field
-generally will specify on which days of a week (and optionally at which
-times on those days) a queue, to which the calendar is attached, will
-change to a certain state. The syntax of the **week** field is defined
-as follows:
+The queue status definition on a day of the week basis. This field generally will specify on which days of a week
+(and optionally at which times on those days) a queue, to which the calendar is attached, will change to a certain
+state. The syntax of the *week* field is defined as follows:
 
     week:=
-        { NONE
+        { NONE
         | week_day_range_list[=daytime_range_list][=state]
         | [week_day_range_list=]daytime_range_list[=state]
-        | [week_day_range_list=][daytime_range_list=]state} ...
+        | [week_day_range_list=][daytime_range_list=]state
+        } ...
 
 Where
 
 - NONE means, no definition is made on the week basis
 
-- if a definition is made on the week basis, at least one of
-  **week_day_range_list**, **daytime_range_list** and **state** always
-  have to be present,
+- if a definition is made on the week basis, at least one of *week_day_range_list*, *daytime_range_list* and
+  *state* always has to be present,
 
-- every day in the week is assumed if **week_day_range_list** is
-  omitted,
+- every day in the week is assumed if *week_day_range_list* is omitted,
 
-- syntax and semantics of **daytime_range_list** and **state** are
-  identical to the definition given for the year field above,
+- syntax and semantics of *daytime_range_list* and *state* are identical to the definition given for the year
+  field above,
 
-- the queue is assumed to be enabled for days neither referenced
-  implicitly (by omitting the **week_day_range_list**) nor explicitly
+- the queue is assumed to be enabled for days neither referenced implicitly (by omitting the *week_day_range_list*)
+  nor explicitly
 
-and where **week_day_range_list** is defined as
+and where *week_day_range_list* is defined as
 
     week_day_range_list := {weekday-weekday|weekday},...
-    week_day := {mon|tue|wed|thu|fri|sat|sun}
+    week_day            := {mon|tue|wed|thu|fri|sat|sun}
 
-with week_day ranges the week_day identifiers must be different.
+where, in *week_day* ranges, the two *week_day* identifiers must be different.
 
 # SEMANTICS
 
-Successive entries to the **year** and **week** fields (separated by
-blanks) are combined in compliance with the following rule:
+Successive entries to the *year* and *week* fields (separated by blanks) are combined in compliance with the
+following rules:
 
-- "off"-areas are overridden by overlapping "on"- and
-  "suspended"-areas and "suspended"-areas are overridden by
-  "on"-areas.
+* *off*-areas are overridden by overlapping *on*- and *suspended*-areas,
+* *suspended*-areas are overridden by overlapping *on*-areas.
 
-Hence an entry of the form
+Hence, an entry of the form
 
     week 12-18 tue=13-17=on
 
-means that queues referencing the corresponding calendar are disabled
-the entire week from 12.00-18.00 with the exception of Tuesday between
-13.00-17.00 where the queues are available.
+means that queues referencing the corresponding calendar are disabled the entire week from 12.00-18.00 with the
+exception of Tuesday between 13.00-17.00 where the queues are available.
 
-- Area overriding occurs only within a year/week basis. If a year
-  entry exists for a day then only the year calendar is taken into
-  account and no area overriding is done with a possibly conflicting
-  week area.
-
-- the second time specification in a daytime_range_list may be before
-  the first one and treated as expected. Thus an entry of the form
-
-  
+* Area overriding occurs only within the same basis (year or week). If a year entry exists for a day, only the year
+  calendar is taken into account and no area overriding is done with a possibly conflicting week area.
+
+* The second time specification in a *daytime_range_list* may precede the first one; the range then wraps around
+  midnight. Thus, an entry of the form
 
     year 12.03.2004=12-11=off
 
-causes the queue(s) be disabled 12.03.2004 from 00:00:00 - 10:59:59 and
-12:00:00 - 23:59:59.
+causes the queue(s) to be disabled on 12.03.2004 from 00:00:00 - 10:59:59 and 12:00:00 - 23:59:59.
 
 # EXAMPLES
 
-(The following examples are contained in the directory
-$xxQS_NAME_Sxx_ROOT/util/resources/calendars).
+(The following examples are contained in the directory $xxQS_NAME_Sxx_ROOT/util/resources/calendars).
 
-- Night, weekend and public holiday calendar:
+1) Night, weekend and public holiday calendar:
 
-On public holidays "night" queues are explicitly enabled. On working
-days queues are disabled between 6.00 and 20.00. Saturday and Sunday are
-implicitly handled as enabled times:
+   On public holidays "night" queues are explicitly enabled. On working days queues are disabled between 6.00 and 20.00.
+   Saturday and Sunday are implicitly handled as enabled times:
 
-    calendar_name night
-    year 1.1.1999,6.1.1999,28.3.1999,30.3.1999-
-    31.3.1999,18.5.1999-19.5.1999,3.10.1999,25.12.1999,26
-    .12.1999=on
-    week mon-fri=6-20
+       calendar_name night
+       year          1.1.1999,6.1.1999,28.3.1999,30.3.1999-31.3.1999,18.5.1999-19.5.1999, \
+                     3.10.1999,25.12.1999,26.12.1999=on
+       week          mon-fri=6-20
 
-- Day calendar:
+2) Day calendar:
 
-On public holidays "day"-queues are disabled. On working days such
-queues are closed during the night between 20.00 and 6.00, i.e. the
-queues are also closed on Monday from 0.00 to 6.00 and on Friday from
-20.00 to 24.00. On Saturday and Sunday the queues are disabled.
+   On public holidays "day"-queues are disabled. On working days such queues are closed during the night between
+   20.00 and 6.00, i.e. the queues are also closed on Monday from 0.00 to 6.00 and on Friday from 20.00 to 24.00.
+   On Saturday and Sunday the queues are disabled.
 
-    calendar_name day
-    year 1.1.1999,6.1.1999,28.3.1999,30.3.1999-
-    31.3.1999,18.5.1999-19.5.1999,3.10.1999,25.12.1999,26
-    .12.1999
-    week mon-fri=20-6 sat-sun
+       calendar_name day
+       year          1.1.1999,6.1.1999,28.3.1999,30.3.1999-31.3.1999,18.5.1999-19.5.1999, \
+                     3.10.1999,25.12.1999,26.12.1999
+       week          mon-fri=20-6 sat-sun
 
-- Night, weekend and public holiday calendar with suspension:
+3) Night, weekend and public holiday calendar with suspension:
 
-Essentially the same scenario as the first example but queues are
-suspended instead of switching them "off".
+   Essentially the same scenario as the first example but queues are suspended instead of switching them *off*.
 
-    calendar_name night_s
-    year 1.1.1999,6.1.1999,28.3.1999,30.3.1999-
-    31.3.1999,18.5.1999-19.5.1999,3.10.1999,25.12.1999,26
-    .12.1999=on
-    week mon-fri=6-20=suspended
+       calendar_name night_s
+       year          1.1.1999,6.1.1999,28.3.1999,30.3.1999-31.3.1999,18.5.1999-19.5.1999, \
+                     3.10.1999,25.12.1999,26.12.1999=on
+       week          mon-fri=6-20=suspended
 
-- Day calendar with suspension:
+4) Day calendar with suspension:
 
-Essentially the same scenario as the second example but queues are
-suspended instead of switching them "off".
+   Essentially the same scenario as the second example but queues are suspended instead of switching them *off*.
 
-    calendar_name day_s
-    year 1.1.1999,6.1.1999,28.3.1999,30.3.1999-
-    31.3.1999,18.5.1999-19.5.1999,3.10.1999,25.12.1999,26
-    .12.1999=suspended
-    week mon-fri=20-6=suspended sat-sun=suspended
+       calendar_name day_s
+       year          1.1.1999,6.1.1999,28.3.1999,30.3.1999-31.3.1999,18.5.1999-19.5.1999, \
+                     3.10.1999,25.12.1999,26.12.1999=suspended
+       week          mon-fri=20-6=suspended sat-sun=suspended
 
-- Weekend calendar with suspension, ignoring public holidays:
+5) Weekend calendar with suspension, ignoring public holidays:
 
-Settings are only done on the week basis, no settings on the year basis
-(keyword "NONE").
+   Settings are made only on the week basis; no settings are made on the year basis (keyword *NONE*).
 
-    calendar_name weekend_s
-    year NONE
-    week sat-sun=suspended
+       calendar_name weekend_s
+       year          NONE
+       week          sat-sun=suspended
 
 # SEE ALSO
 
-*xxqs_name_sxx_intro*(1), *xxqs_name_sxx\_\_types*(1), *qconf*(1),
-*xxqs_name_sxx_queue_conf*(5).
+xxqs_name_sxx_intro(1), xxqs_name_sxx_types(1), qconf(1), xxqs_name_sxx_queue_conf(5).
 
 # COPYRIGHT
 
-See *xxqs_name_sxx_intro*(1) for a full statement of rights and
-permissions.
+See xxqs_name_sxx_intro(1) for a full statement of rights and permissions.
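
The combination rules in the calendar SEMANTICS section can be checked with a short Python sketch of the *week* field only; year entries, and their precedence over week entries, are not modelled. It is an illustration, not the xxQS_NAMExx implementation: the function names are hypothetical, range ends are treated as exclusive (which matches the 00:00:00 - 10:59:59 example), and wrapping weekday ranges such as `fri-mon` are an assumption.

```python
# Minimal sketch (not the xxQS_NAMExx implementation) of the week-field
# semantics from calendar_conf: entries are blank separated, an omitted state
# means "off", and overlapping areas resolve as off < suspended < on.
from typing import List, Optional, Tuple

WEEKDAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]
STATES = ("on", "off", "suspended")
PRECEDENCE = {"off": 0, "suspended": 1, "on": 2}


def parse_time(spec: str) -> int:
    """Convert hour[:minute][:second] into seconds since midnight."""
    fields = [int(f) for f in spec.split(":")]
    fields += [0] * (3 - len(fields))
    hour, minute, second = fields
    return hour * 3600 + minute * 60 + second


def in_daytime_range(rng: str, t: int) -> bool:
    start_spec, end_spec = rng.split("-")
    start, end = parse_time(start_spec), parse_time(end_spec)
    if start <= end:
        return start <= t < end
    # second time before the first one: the range wraps around midnight
    return t >= start or t < end


def in_weekday_range(rng: str, day: int) -> bool:
    names = rng.split("-")  # "tue" or "mon-fri"
    first, last = WEEKDAYS.index(names[0]), WEEKDAYS.index(names[-1])
    # assumption: a reversed range such as fri-mon wraps over the week end
    return (day - first) % 7 <= (last - first) % 7


def parse_entry(entry: str) -> Tuple[Optional[str], Optional[str], str]:
    """Split [week_day_range_list=][daytime_range_list=][state]."""
    days: Optional[str] = None
    times: Optional[str] = None
    state = "off"  # switching the queue off is assumed if state is omitted
    for part in entry.split("="):
        if part in STATES:
            state = part
        elif part[:3] in WEEKDAYS:
            days = part
        else:
            times = part
    return days, times, state


def entry_state(entry: str, day: int, t: int) -> Optional[str]:
    """State imposed by one entry at (day, t), or None if it does not apply."""
    days, times, state = parse_entry(entry)
    if days and not any(in_weekday_range(r, day) for r in days.split(",")):
        return None
    if times and not any(in_daytime_range(r, t) for r in times.split(",")):
        return None
    return state


def week_state(week_field: str, day: int, t: int) -> str:
    """Queue state for weekday index `day` (0 = mon) at `t` seconds."""
    if week_field.strip() == "NONE":
        return "on"
    hits: List[str] = []
    for entry in week_field.split():
        state = entry_state(entry, day, t)
        if state is not None:
            hits.append(state)
    # times not referenced by any entry stay enabled
    return max(hits, key=PRECEDENCE.__getitem__) if hits else "on"


if __name__ == "__main__":
    # the SEMANTICS example: off all week 12-18, except Tuesday 13-17
    print(week_state("12-18 tue=13-17=on", 0, parse_time("13")))  # off
    print(week_state("12-18 tue=13-17=on", 1, parse_time("14")))  # on
```

Resolving overlaps with `max` over a precedence table keeps the override rule in one place, so adding the year basis later would only need an extra lookup that bypasses the week entries for days that have a year entry.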
diff --git a/doc/markdown/man/man5/sge_checkpoint.md b/doc/markdown/man/man5/sge_checkpoint.md index 9c23ada6e..21022c006 100644 --- a/doc/markdown/man/man5/sge_checkpoint.md +++ b/doc/markdown/man/man5/sge_checkpoint.md @@ -8,162 +8,130 @@ date: __DATE__ # NAME -xxqs_name_sxx_checkpoint - xxQS_NAMExx checkpointing environment configuration file -format +xxqs_name_sxx_checkpoint - xxQS_NAMExx checkpointing environment configuration file format # DESCRIPTION -Checkpointing is a facility to save the complete status of an executing -program or job and to restore and restart from this so called checkpoint -at a later point of time if the original program or job was halted, e.g. -through a system crash. - -xxQS_NAMExx provides various levels of checkpointing support (see -*xxqs_name_sxx_ckpt*(1)). The checkpointing environment described here -is a means to configure the different types of checkpointing in use for -your xxQS_NAMExx cluster or parts thereof. For that purpose you can -define the operations which have to be executed in initiating a -checkpoint generation, a migration of a checkpoint to another host or a -restart of a checkpointed application as well as the list of queues -which are eligible for a checkpointing method. - -Supporting different operating systems may easily force xxQS_NAMExx to -introduce operating system dependencies for the configuration of the -checkpointing configuration file and updates of the supported operating -system versions may lead to frequently changing implementation details. -Please refer to the \/ckpt directory for more +Checkpointing is a facility to save the complete status of an executing program or job and to restore and restart from +this so called checkpoint at a later point of time if the original program or job was halted, e.g. through a system +crash. + +xxQS_NAMExx provides various levels of checkpointing support (see xxqs_name_sxx_ckpt(1)). 
The checkpointing environment +described here is a means to configure the different types of checkpointing in use for your xxQS_NAMExx cluster or +parts thereof. For that purpose you can define the operations which have to be executed in initiating a checkpoint +generation, a migration of a checkpoint to another host or a restart of a checkpointed application as well as the +list of queues which are eligible for a checkpointing method. + +Supporting different operating systems may easily force xxQS_NAMExx to introduce operating system dependencies for +the configuration of the checkpointing configuration file and updates of the supported operating system versions may +lead to frequently changing implementation details. Please refer to the \/ckpt directory for more information. -Please use the *-ackpt*, *-dckpt*, *-mckpt* or *-sckpt* options to the -*qconf*(1) command to manipulate checkpointing environments from the -command-line. +Please use the `-ackpt`, `-dckpt`, `-mckpt` or `-sckpt` options to the qconf(1) command to manipulate checkpointing +environments from the command-line. -Note, xxQS_NAMExx allows backslashes (\\) be used to escape newline -(\\newline) characters. The backslash and the newline are replaced with -a space (" ") character before any interpretation. +Note, xxQS_NAMExx allows backslashes (\\) be used to escape newline (\\newline) characters. The backslash and the +newline are replaced with a space (" ") character before any interpretation. # FORMAT The format of a *checkpoint* file is defined as follows: -## **ckpt_name** +## *ckpt_name* -The name of the checkpointing environment as defined for *ckpt_name* in -*sge_types*(1). +The name of the checkpointing environment as defined for *ckpt_name* in xxqs_name_sxx_types(1). -*qsub*(1) **-ckpt** switch or for the *qconf*(1) options mentioned -above. +qsub(1) `-ckpt` switch or for the qconf(1) options mentioned above. -## **interface** +## *interface* -The type of checkpointing to be used. 
Currently, the following types are -valid: +The type of checkpointing to be used. Currently, the following types are valid: -hibernator -The Hibernator kernel level checkpointing is interfaced. +* hibernator + The Hibernator kernel level checkpointing is interfaced. -cpr -The SGI kernel level checkpointing is used. +* cpr + The SGI kernel level checkpointing is used. -cray-ckpt -The Cray kernel level checkpointing is assumed. +* cray-ckpt + The Cray kernel level checkpointing is assumed. -transparent -xxQS_NAMExx assumes that the jobs submitted with reference to this -checkpointing interface use a checkpointing library such as provided by -the public domain package *Condor*. +* transparent + xxQS_NAMExx assumes that the jobs submitted with reference to this checkpointing interface use a checkpointing + library such as provided by the public domain package *Condor*. -userdefined -xxQS_NAMExx assumes that the jobs submitted with reference to this -checkpointing interface perform their private checkpointing method. +* userdefined + xxQS_NAMExx assumes that the jobs submitted with reference to this checkpointing interface perform their + private checkpointing method. -application-level -Uses all of the interface commands configured in the checkpointing -object like in the case of one of the kernel level checkpointing -interfaces (*cpr*, *cray-ckpt*, etc.) except for the **restart_command** -(see below), which is not used (even if it is configured) but the job -script is invoked in case of a restart instead. +* application-level + Uses all the interface commands configured in the checkpointing object like in the case of one of the kernel + level checkpointing interfaces (*cpr*, *cray-ckpt*, etc.) except for the *restart_command* (see below), which is + not used (even if it is configured) but the job script is invoked in case of a restart instead. 
-## **ckpt_command** +## *ckpt_command* -A command-line type command string to be executed by xxQS_NAMExx in -order to initiate a checkpoint. +A command-line type command string to be executed by xxQS_NAMExx in order to initiate a checkpoint. -## **migr_command** +## *migr_command* -A command-line type command string to be executed by xxQS_NAMExx during -a migration of a checkpointing job from one host to another. +A command-line type command string to be executed by xxQS_NAMExx during a migration of a checkpointing job from +one host to another. -## **restart_command** +## *restart_command* -A command-line type command string to be executed by xxQS_NAMExx when -restarting a previously checkpointed application. +A command-line type command string to be executed by xxQS_NAMExx when restarting a previously checkpointed application. -## **clean_command** +## *clean_command* -A command-line type command string to be executed by xxQS_NAMExx in -order to cleanup after a checkpointed application has finished. +A command-line type command string to be executed by xxQS_NAMExx in order to cleanup after a checkpointed +application has finished. -## **ckpt_dir** +## *ckpt_dir* -A file system location to which checkpoints of potentially considerable -size should be stored. +A file system location to which checkpoints of potentially considerable size should be stored. -## **ckpt_signal** +## *ckpt_signal* -A Unix signal to be sent to a job by xxQS_NAMExx to initiate a -checkpoint generation. The value for this field can either be a symbolic -name from the list produced by the *-l* option of the *kill*(1) command -or an integer number which must be a valid signal on the systems used -for checkpointing. +A Unix signal to be sent to a job by xxQS_NAMExx to initiate a checkpoint generation. 
The value for this field can +either be a symbolic name from the list produced by the `-l` option of the kill(1) command or an integer number +which must be a valid signal on the systems used for checkpointing. -## **when** +## *when* -The points of time when checkpoints are expected to be generated. Valid -values for this parameter are composed by the letters *s*, *m*, *x* and -*r* and any combinations thereof without any separating character in -between. The same letters are allowed for the *-c* option of the -*qsub*(1) command which will overwrite the definitions in the used -checkpointing environment. The meaning of the letters is defined as +The points of time when checkpoints are expected to be generated. Valid values for this parameter are composed +by the letters *s*, *m*, *x* and *r* and any combinations thereof without any separating character in +between. The same letters are allowed for the *-c* option of the qsub(1) command which will overwrite the definitions +in the used checkpointing environment. The meaning of the letters is defined as follows: -19. A job is checkpointed, aborted and if possible migrated if the - corresponding *xxqs_name_sxx_execd*(8) is shut down on the job's - machine. +* *s* - A job is checkpointed, aborted and if possible migrated if the corresponding xxqs_name_sxx_execd(8) is + shut down on the job's machine. -20. Checkpoints are generated periodically at the *min_cpu_interval* - interval defined by the queue (see *xxqs_name_sxx_queue_conf*(5)) in which a job - executes. +* *m* - Checkpoints are generated periodically at the *min_cpu_interval* interval defined by the queue (see + xxqs_name_sxx_queue_conf(5)) in which a job executes. -21. A job is checkpointed, aborted and if possible migrated as soon as - the job gets suspended (manually as well as automatically). +* *x* - A job is checkpointed, aborted and if possible migrated as soon as the job gets suspended (manually as + well as automatically). -22. 
A job will be rescheduled (not checkpointed) when the host on which - the job currently runs went into unknown state and the time interval - *reschedule_unknown* (see *xxqs_name_sxx_conf*(5)) defined in the - global/local cluster configuration will be exceeded. +* *r* - A job will be rescheduled (not checkpointed) when the host on which the job currently runs goes into + an unknown state and the time interval *reschedule_unknown* (see xxqs_name_sxx_conf(5)) defined in the global/local + cluster configuration is exceeded. # RESTRICTIONS -**Note**, that the functionality of any checkpointing, migration or -restart procedures provided by default with the xxQS_NAMExx distribution -as well as the way how they are invoked in the *ckpt_command*, -*migr_command* or *restart_command* parameters of any default -checkpointing environments should not be changed or otherwise the -functionality remains the full responsibility of the administrator -configuring the checkpointing environment. xxQS_NAMExx will just invoke -these procedures and evaluate their exit status. If the procedures do -not perform their tasks properly or are not invoked in a proper fashion, -the checkpointing mechanism may behave unexpectedly, xxQS_NAMExx has no -means to detect this. +Note that the checkpointing, migration and restart procedures provided by default with the +xxQS_NAMExx distribution, as well as the way they are invoked in the *ckpt_command*, *migr_command* and +*restart_command* parameters of the default checkpointing environments, should not be changed; otherwise the +functionality is the full responsibility of the administrator configuring the checkpointing environment. +xxQS_NAMExx only invokes these procedures and evaluates their exit status. If the procedures do not perform their +tasks properly or are not invoked properly, the checkpointing mechanism may behave unexpectedly, and +xxQS_NAMExx has no means to detect this.
# SEE ALSO -*xxqs_name_sxx_intro*(1), *xxqs_name_sxx_ckpt*(1), -*xxqs_name_sxx\_\_types*(1), *qconf*(1), *qmod*(1), *qsub*(1), -*xxqs_name_sxx_execd*(8). +xxqs_name_sxx_intro(1), xxqs_name_sxx_ckpt(1), xxqs_name_sxx_types(1), qconf(1), qmod(1), qsub(1), xxqs_name_sxx_execd(8). # COPYRIGHT -See *xxqs_name_sxx_intro*(1) for a full statement of rights and -permissions. +See xxqs_name_sxx_intro(1) for a full statement of rights and permissions.