- client and API: improve the way an app checks for the death of the client

    Old: heartbeat mechanism
    Problem: if the client is blocked for > 30 secs
        (e.g. because it takes a long time to write the state file,
        or because it's stopped in a debugger)
        then apps exit.
        This is bad if the app doesn't checkpoint and has been
        running for a long time.
    New: the client passes its PID to the app.
        The app periodically (10 sec) checks that the process still exists.
    Notes:
    - For backward compatibility (e.g. new API w/ old client,
        or vice versa) the client still sends heartbeats,
        and the API checks heartbeats if the client doesn't pass a PID.
    - The new mechanism works only if the client's PID isn't assigned
        to a new process within 10 secs of the client exiting.
        Windows 2000 reuses PIDs immediately, so check for Win2K
        and don't use this mechanism if so.
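
The periodic existence check described above could look roughly like this on Unix. This is a minimal sketch, not the code from this commit: `process_exists` is a hypothetical helper name, and it relies on the POSIX behavior that `kill(pid, 0)` performs the existence and permission checks without delivering a signal. Windows would need a different call.

```cpp
#include <cerrno>
#include <csignal>
#include <sys/types.h>

// Hypothetical helper (not part of the BOINC API).
// Returns true if a process with the given PID currently exists.
bool process_exists(pid_t pid) {
    // A PID of 0 means "no PID was passed" in this sketch,
    // so treat it (and negative values) as "not a process".
    if (pid <= 0) return false;

    // Signal 0 sends nothing; kill() only checks existence and permissions.
    if (kill(pid, 0) == 0) return true;

    // EPERM: the process exists but we may not signal it.
    // ESRCH: no such process.
    return errno == EPERM;
}
```

An app could call such a helper every 10 seconds with the PID it received from the client. As the note above says, the answer is only meaningful if the PID has not been reassigned to another process in the meantime.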

TODO: For Unix multithread apps,
    critical sections aren't currently being enforced.
    Need to fix this by masking signals.
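
One possible shape for the signal-masking fix mentioned in this TODO is sketched below. It is an assumption about the approach, not the eventual BOINC implementation: `SignalBlockGuard` and `signal_is_blocked` are hypothetical names, and the choice of which signal to block is left to the caller. The guard blocks one signal in the calling thread with `pthread_sigmask` for the lifetime of the object and restores the previous mask afterwards.

```cpp
#include <csignal>
#include <pthread.h>

// Hypothetical RAII guard: blocks `signo` in the calling thread while
// the object is alive, then restores the previous signal mask.
class SignalBlockGuard {
public:
    explicit SignalBlockGuard(int signo) {
        sigset_t block;
        sigemptyset(&block);
        sigaddset(&block, signo);
        // Save the old mask so the destructor can restore it exactly.
        pthread_sigmask(SIG_BLOCK, &block, &old_mask_);
    }
    ~SignalBlockGuard() {
        pthread_sigmask(SIG_SETMASK, &old_mask_, nullptr);
    }
    SignalBlockGuard(const SignalBlockGuard&) = delete;
    SignalBlockGuard& operator=(const SignalBlockGuard&) = delete;

private:
    sigset_t old_mask_;
};

// Hypothetical helper: reports whether `signo` is blocked in this thread.
bool signal_is_blocked(int signo) {
    sigset_t current;
    // With a null `set`, pthread_sigmask only reads the current mask.
    pthread_sigmask(SIG_BLOCK, nullptr, &current);
    return sigismember(&current, signo) == 1;
}
```

A critical section would then be wrapped in a scope that declares a `SignalBlockGuard`, so that a handler for that signal cannot run in the thread until the scope ends.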

svn path=/trunk/boinc/; revision=26147
romw authored and Oliver Bock committed Mar 6, 2013
1 parent c4cc892 commit 4e8b4dd
Showing 6 changed files with 51 additions and 3 deletions.
31 changes: 31 additions & 0 deletions checkin_notes
@@ -4482,6 +4482,37 @@ David 21 Sept 2012
clientgui/
MainDocument.cpp

David 11 Oct 2012
- client and API: improve the way an app checks for the death of the client
Old: heartbeat mechanism
Problem: if the client is blocked for > 30 secs
(e.g. because it takes a long time to write the state file,
or because it's stopped in a debugger)
then apps exit.
This is bad if the app doesn't checkpoint and has been
running for a long time.
New: the client passes its PID to the app.
The app periodically (10 sec) checks that the process still exists.
Notes:
- For backward compatibility (e.g. new API w/ old client,
or vice versa) the client still sends heartbeats,
and the API checks heartbeats if the client doesn't pass a PID.
- The new mechanism works only if the client's PID isn't assigned
to a new process within 10 secs of the client exiting.
Windows 2000 reuses PIDs immediately, so check for Win2K
and don't use this mechanism if so.

TODO: For Unix multithread apps,
critical sections aren't currently being enforced.
Need to fix this by masking signals.

client/
hostinfo_win.cpp
app_start.cpp
lib/
app_ipc.cpp,h
proc_control.cpp

Charlie 15 Oct 2012
- MGR: We don't save Simple View's width & height since it's
window is not resizable, so don't try to read them back.
11 changes: 11 additions & 0 deletions client/app_start.cpp
@@ -218,6 +218,17 @@ void ACTIVE_TASK::init_app_init_data(APP_INIT_DATA& aid) {
relative_to_absolute("", aid.boinc_dir);
strcpy(aid.authenticator, wup->project->authenticator);
aid.slot = slot;
#ifdef _WIN32
if (strstr(gstate.hostinfo.os_name, "Windows 2000")) {
// Win2K immediately reuses PIDs, so can't use this mechanism
//
aid.client_pid = 0;
} else {
aid.client_pid = GetCurrentProcessId();
}
#else
aid.client_pid = getpid();
#endif
strcpy(aid.wu_name, wup->name);
strcpy(aid.result_name, result->name);
aid.user_total_credit = wup->project->user_total_credit;
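
On the receiving side, the commit message implies that a `client_pid` of 0 means "fall back to heartbeats". The following is a minimal sketch of that decision, under stated assumptions: `LivenessState` and `client_presumed_dead` are hypothetical names, the 30-second heartbeat timeout is taken from the commit message, and the result of the PID existence check is passed in as a plain boolean.

```cpp
// Hypothetical state kept by the app-side API (names are illustrative).
struct LivenessState {
    int client_pid;         // 0 if the client did not pass a PID
    double last_heartbeat;  // time of the last heartbeat, in seconds
};

// Assumed heartbeat timeout, taken from the "> 30 secs" in the commit message.
const double HEARTBEAT_TIMEOUT_SECS = 30.0;

// Returns true if the app should conclude that the client has died.
// `pid_alive` is the result of a separate process-existence check.
bool client_presumed_dead(const LivenessState& s, double now, bool pid_alive) {
    if (s.client_pid != 0) {
        // New mechanism: trust the PID check and ignore heartbeat delays.
        return !pid_alive;
    }
    // Old client, or Win2K: only heartbeats are available.
    return now - s.last_heartbeat > HEARTBEAT_TIMEOUT_SECS;
}
```

With a nonzero PID, a client that is blocked for more than 30 seconds but still exists is not treated as dead, which is the behavior this change is after.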
3 changes: 1 addition & 2 deletions client/hostinfo_win.cpp
@@ -478,8 +478,7 @@ int get_os_information(
strcat(os_name, "Windows 2000");
}

- if ( osvi.dwMajorVersion <= 4 )
- {
+ if ( osvi.dwMajorVersion <= 4 ) {
strcat(os_name, "Windows NT");
}

5 changes: 5 additions & 0 deletions lib/app_ipc.cpp
@@ -100,6 +100,7 @@ void APP_INIT_DATA::copy(const APP_INIT_DATA& a) {
teamid = a.teamid;
hostid = a.hostid;
slot = a.slot;
client_pid = a.client_pid;
user_total_credit = a.user_total_credit;
user_expavg_credit = a.user_expavg_credit;
host_total_credit = a.host_total_credit;
@@ -192,6 +193,7 @@ int write_init_data_file(FILE* f, APP_INIT_DATA& ai) {
#endif
fprintf(f,
"<slot>%d</slot>\n"
"<client_pid>%d</client_pid>\n"
"<wu_cpu_time>%f</wu_cpu_time>\n"
"<starting_elapsed_time>%f</starting_elapsed_time>\n"
"<using_sandbox>%d</using_sandbox>\n"
@@ -213,6 +215,7 @@ int write_init_data_file(FILE* f, APP_INIT_DATA& ai) {
"<rsc_disk_bound>%f</rsc_disk_bound>\n"
"<computation_deadline>%f</computation_deadline>\n",
ai.slot,
ai.client_pid,
ai.wu_cpu_time,
ai.starting_elapsed_time,
ai.using_sandbox?1:0,
@@ -263,6 +266,7 @@ void APP_INIT_DATA::clear() {
strcpy(result_name, "");
strcpy(authenticator, "");
slot = 0;
client_pid = 0;
user_total_credit = 0;
user_expavg_credit = 0;
host_total_credit = 0;
@@ -367,6 +371,7 @@ int parse_init_data_file(FILE* f, APP_INIT_DATA& ai) {
if (xp.parse_int("shm_key", ai.shmem_seg_name)) continue;
#endif
if (xp.parse_int("slot", ai.slot)) continue;
if (xp.parse_int("client_pid", ai.client_pid)) continue;
if (xp.parse_double("user_total_credit", ai.user_total_credit)) continue;
if (xp.parse_double("user_expavg_credit", ai.user_expavg_credit)) continue;
if (xp.parse_double("host_total_credit", ai.host_total_credit)) continue;
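
The hunks above add a `<client_pid>` element to the init data file and default the field to 0 in `clear()`. The round trip can be illustrated with the standalone sketch below. It does not use BOINC's XML parser: `format_client_pid` and `parse_client_pid` are hypothetical helpers built on `snprintf` and `sscanf`, shown only to make the "missing tag means 0" behavior concrete.

```cpp
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: formats the element the same way the fprintf
// format string in write_init_data_file does.
std::string format_client_pid(int pid) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "<client_pid>%d</client_pid>\n", pid);
    return std::string(buf);
}

// Hypothetical helper: returns the PID found in `xml`, or 0 if the
// element is absent (as it would be in a file written by an old client).
int parse_client_pid(const std::string& xml) {
    const char* p = std::strstr(xml.c_str(), "<client_pid>");
    int pid = 0;
    if (p && std::sscanf(p, "<client_pid>%d", &pid) == 1) {
        return pid;
    }
    return 0;
}
```

Because an absent element yields 0, a new API reading a file written by an old client ends up with `client_pid == 0`, which matches the backward-compatibility note in the commit message.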
1 change: 1 addition & 0 deletions lib/app_ipc.h
@@ -167,6 +167,7 @@ struct APP_INIT_DATA {
char result_name[256];
char authenticator[256];
int slot;
int client_pid;
double user_total_credit;
double user_expavg_credit;
double host_total_credit;
3 changes: 2 additions & 1 deletion lib/proc_control.cpp
@@ -166,7 +166,8 @@ void kill_descendants() {
kill_all(descendants);
}
#else
- // Same, but if child_pid is nonzero, give it a chance to exit gracefully on Unix
+ // Same, but if child_pid is nonzero,
+ // give it a chance to exit gracefully on Unix
//
void kill_descendants(int child_pid) {
vector<int> descendants;