Add timed wait during UNLOAD while the model becomes UNAVAILABLE in SageMaker #5423

Merged

Conversation

nikhil-sk
Contributor

This PR updates the behavior of the SageMaker UNLOAD function to use the repository index and verify, to the best of Triton's ability, that the model has been completely UNLOADED, i.e. the model state is UNAVAILABLE and the reason is "unloaded".

This (at least in the case of the python backend) ensures that the function returns only after the associated python workers for the model have been killed.

while (!is_model_unavailable && unload_time_in_secs < UNLOAD_TIMEOUT_SECS_) {
  is_model_unavailable = SageMakerMMEUnloadModelCheckStatus(model_name);
  sleep(1);
  unload_time_in_secs += 1;
}


I believe this does not account for the time taken by the SageMakerMMEUnloadModelCheckStatus(model_name) call itself. Is it small enough to be neglected?

Contributor Author

That's correct. In SM-Triton there's one model repo per model, so the model repository index is expected to contain only one model (except in the case of ensembles, where it may be roughly 5-8), so the call is expected to return within a few milliseconds.


IMHO, we can measure the time for that as well and report a more accurate number. It may not matter much in normal cases, but if something weird happens in that function, we will be logging an incorrect unload time.


Related issue: sleep is not exact, either.

Rather than doing unload_time_in_secs += 1, we should measure the elapsed time since the start of the loop after each sleep. That would solve both problems.
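
A minimal sketch of that suggestion, reusing the names from the snippet above (the clock choice and exact structure are illustrative, not necessarily the PR's final code):

#include <chrono>
#include <thread>

// Re-derive the elapsed time from the clock after every sleep instead of
// accumulating a counter, so neither the status-check cost nor sleep jitter
// skews the reported unload time.
const auto start_time = std::chrono::steady_clock::now();
uint32_t unload_time_in_secs = 0;
while (!is_model_unavailable && unload_time_in_secs < UNLOAD_TIMEOUT_SECS_) {
  is_model_unavailable = SageMakerMMEUnloadModelCheckStatus(model_name);
  std::this_thread::sleep_for(std::chrono::seconds(1));
  unload_time_in_secs = std::chrono::duration_cast<std::chrono::seconds>(
                            std::chrono::steady_clock::now() - start_time)
                            .count();
}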

Contributor Author

Thank you for the comments, I've addressed this by using an elapsed-time calculation.

Contributor

so the model repository index is expected to contain only one model.

Note that the index API polls all registered model repos, so the cost is actually proportional to the number of "visible" models, although that is probably still negligible with respect to the unload time.

@@ -628,6 +628,68 @@ SagemakerAPIServer::SageMakerMMEHandleInfer(
}
}

bool
SagemakerAPIServer::SageMakerMMEUnloadModelCheckStatus(const char* model_name)


I don't love the name if it just returns a bool. Hard to tell without looking at the code what the bool represents. Does true mean it is unloaded? Does true mean the model is available, so not yet unloaded? I have no idea since I have not read the implementation code yet.

Contributor Author

Makes sense, I have changed the function name and am no longer using a bool return type.
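
For reference, a sketch of what the renamed helper's signature could look like, inferred from the call site quoted later in this thread (the exact declaration may differ):

// Reports failures through the returned error object and writes the unload
// status to an out-parameter instead of returning a bare bool.
TRITONSERVER_Error* SageMakerMMECheckUnloadedModelIsUnavailable(
    const char* model_name, bool* is_model_unavailable);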

server_model_index_json.IndexAsObject(id, &index_json);

index_json.MemberAsString("name", &name, &name_len);
index_json.MemberAsString("version", &version, &version_len);


Do you need to match on a particular version? Or with sagemaker API, is there only ever one version for a given model name?

Contributor Author

That's correct, we expect only one version for a given model name.

index_json.MemberAsString("name", &name, &name_len);
index_json.MemberAsString("version", &version, &version_len);
index_json.MemberAsString("state", &state, &state_len);
index_json.MemberAsString("reason", &reason, &reason_len);


Are we guaranteed that these calls will succeed? We're ignoring a return value that could be success or error, according to https://github.com/triton-inference-server/common/blob/main/include/triton/common/triton_json.h#L799-L840

Contributor Author

Good point - I've updated the return type to an error type instead, which will be logged in case of a failure to parse.
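
A sketch of what checking those return values could look like, assuming the JSON accessors return a TRITONSERVER_Error* (as triton_json.h can be configured to do) and that the first failure is propagated to the caller for logging:

// Stop at the first parse failure instead of silently ignoring it.
TRITONSERVER_Error* err = index_json.MemberAsString("name", &name, &name_len);
if (err == nullptr) {
  err = index_json.MemberAsString("version", &version, &version_len);
}
if (err == nullptr) {
  err = index_json.MemberAsString("state", &state, &state_len);
}
if (err == nullptr) {
  err = index_json.MemberAsString("reason", &reason, &reason_len);
}
if (err != nullptr) {
  return err;  // the caller logs the failure
}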

for Triton unload.*/
while (!is_model_unavailable && unload_time_in_secs < UNLOAD_TIMEOUT_SECS_) {
is_model_unavailable = SageMakerMMEUnloadModelCheckStatus(model_name);
sleep(1);


The sleep amount could also be a constant, like UNLOAD_TIMEOUT_SECS_ is. You could also use std::this_thread::sleep_for(std::chrono::milliseconds(UNLOAD_SLEEP_MILLISECONDS_)); instead of sleep to get more granularity, if desired (especially if you frequently need only one sleep but the unload usually completes much more quickly than one second).

Contributor Author

Thanks, used std::this_thread::sleep_for(std::chrono::milliseconds(UNLOAD_SLEEP_MILLISECONDS_)) to sleep for a shorter duration.

for Triton unload.*/
while (!is_model_unavailable && unload_time_in_secs < UNLOAD_TIMEOUT_SECS_) {
is_model_unavailable = SageMakerMMEUnloadModelCheckStatus(model_name);
sleep(1);


Is it ok to block the thread with sleep like this? Or do we need to do some kind of shenanigans with event loop to check again after a timeout without blocking the thread?

Contributor Author

Currently, this seems to be the straightforward way to block the evhtp request thread. Since SM unload requests can wait a maximum of 350 seconds, I believe this is alright. In testing, we don't see other model load/unload requests being affected by this sleep.

}

std::lock_guard<std::mutex> lock(mutex_);


Just a comment, but it can be nice to name a mutex based on what it is intended to protect. models_list_mutex_ would make it clearer what it is needed for.

Contributor Author

Thanks, done

@@ -725,8 +734,8 @@ SagemakerAPIServer::SageMakerMMEUnloadModel(

/*Note: Model status check is repo-specific and therefore must be run before
* unregistering the repo, else the model information is lost*/
bool is_model_unavailable = false;
uint32_t unload_time_in_secs = 0;
bool* is_model_unavailable = new bool(false);


There's no need to use new here. Instead, this line could just be

bool is_model_unavailable = false;

and below, you would call

unload_err = SageMakerMMECheckUnloadedModelIsUnavailable(
          model_name, &is_model_unavailable);

Note that we passed the address of is_model_unavailable. So, we're passing a pointer to the variable on the stack. This is fine, as long as the pointer does not live past the end of the function that declared the variable.

Contributor Author

Thanks, done

Comment on lines 765 to 769
unload_time_in_secs =
std::chrono::duration_cast<std::chrono::milliseconds>(
end_time - start_time)
.count() /
1000.0;


Here, you should use duration_cast to seconds, and not explicitly divide by 1000.0:

unload_time_in_secs =
          std::chrono::duration_cast<std::chrono::seconds>(
              end_time - start_time).count();

Note that in both cases (existing code and my suggested version), it is truncating rather than rounding. For this use case, that is what is desired, since we ultimately want to know once we've passed the timeout threshold.

Yet another way to write this would be to store the timeout and elapsed-time variables as std::chrono::duration variables rather than int32_t variables, which would then automatically do the duration_cast for you when it's not lossy. See https://en.cppreference.com/w/cpp/chrono/duration/duration_cast. Up to personal taste, though stronger typing can be helpful for avoiding bugs with unit conversions.
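
A sketch of that strongly-typed variant; the 350-second figure mirrors the SageMaker limit mentioned earlier in this thread, and the constant names are illustrative rather than the PR's:

#include <chrono>

// Keeping both values as std::chrono::seconds makes the subtraction and the
// comparison unit-safe; duration_cast still truncates, which is what we want
// when checking against the timeout.
constexpr std::chrono::seconds kUnloadTimeout{350};
const auto start_time = std::chrono::steady_clock::now();
// ... issue the unload and poll the repository index ...
const auto unload_time = std::chrono::duration_cast<std::chrono::seconds>(
    std::chrono::steady_clock::now() - start_time);
if (unload_time >= kUnloadTimeout) {
  // timed out waiting for the model to become UNAVAILABLE
}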

Contributor Author

Thanks, done - my original intention with /1000 was to report the approximate wait down to milliseconds, but that is unnecessary, so I've used a cast to seconds. The duration.count() method returns the count as type int64_t. I might consider changing this in a future update; thanks for the suggestion to use the chrono::duration type when declaring the timeout values...

} else {
evhtp_send_reply(req, EVHTP_RES_OK);
}

TRITONSERVER_ErrorDelete(unload_err);


I think this is a segfault waiting to happen, since you're returning nullptr at the end of the TRITONSERVER_ServerUnregisterModelRepository.

TRITONSERVER_ErrorDelete does unconditional delete on the argument: https://github.com/triton-inference-server/core/blob/main/src/tritonserver.cc#L712-L717

Contributor

I think delete on nullptr will not do any harm, but I see that unload_err is reused for multiple API calls and there is a chance of a memory leak, i.e. unload_err may be set on line 743 and the pointer overwritten on line 779.


You're right, sorry. delete on nullptr is actually fine. From https://en.cppreference.com/w/cpp/language/delete:

If expression evaluates to a null pointer value, no destructors are called, and the deallocation function may or may not be called (it's unspecified), but the default deallocation functions are guaranteed to do nothing when passed a null pointer.

Contributor Author

@nikhil-sk nikhil-sk Mar 6, 2023


Thank you, updated the code to not re-use the unload_err pointer.
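
A sketch of the shape of that fix, with each API call owning its own error pointer that is deleted exactly once (repo_path and the surrounding handling are placeholders):

TRITONSERVER_Error* unload_err =
    TRITONSERVER_ServerUnloadModelAndDependents(server_.get(), model_name);
// ... wait for the model to become UNAVAILABLE, log unload_err if set ...
TRITONSERVER_ErrorDelete(unload_err);

TRITONSERVER_Error* unregister_err =
    TRITONSERVER_ServerUnregisterModelRepository(server_.get(), repo_path);
// ... log unregister_err if set ...
TRITONSERVER_ErrorDelete(unregister_err);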

Contributor

@GuanLuo GuanLuo left a comment


minor comment

const char* buffer;
size_t byte_size;
LOG_ERROR
<< "Error when unloading SagMaker Model with dependents for model: "


typo: SagMaker

Contributor Author

Thanks, fixed

TRITONSERVER_ServerModelIndex(
server_.get(), ready_flag, &server_model_index_message_);

std::shared_ptr<TRITONSERVER_Message> managed_msg(


Generally I'd recommend using unique_ptr rather than shared_ptr unless there's a specific reason shared_ptr is needed (and there's not, here).

Contributor Author

My concern was whether this is required because multiple evhtp threads call the unload method. I can check more on this and modify it at a later point in another PR; noted...

@GuanLuo do you have any suggestion?

Contributor

The returned message is going to be unique to the caller, so there is no sharing of ownership here. unique_ptr would be better, except that syntax-wise it is a bit messy to attach a custom deleter (in C++11).
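
For illustration only (the thread leaves the shared_ptr in place for a follow-up PR), attaching a custom deleter to a unique_ptr could look like the sketch below:

// Deleter that releases the index message; the error returned by
// TRITONSERVER_MessageDelete is ignored in this sketch.
auto message_deleter = [](TRITONSERVER_Message* msg) {
  TRITONSERVER_ErrorDelete(TRITONSERVER_MessageDelete(msg));
};
std::unique_ptr<TRITONSERVER_Message, decltype(message_deleter)> managed_msg(
    server_model_index_message_, message_deleter);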

json_buffer_.Clear();
request_parameters.Write(&json_buffer_);
TRITONSERVER_ErrorDelete(unload_err);
}


Isn't it a problem that this block falls through to the rest of the function?

Contributor Author

Thanks, this was addressed as part of the previous update, so now the function should return early.

Contributor

@GuanLuo GuanLuo left a comment

Will kick off a CI for sanity check

@GuanLuo GuanLuo merged commit f1aedd4 into triton-inference-server:main Mar 7, 2023
nikhil-sk added a commit to nikhil-sk/server that referenced this pull request Mar 15, 2023
Add timed wait during UNLOAD while the model becomes UNAVAILABLE in SageMaker (triton-inference-server#5423)

* Add timed wait during UNLOAD while the model becomes UNAVAILABLE in SageMaker

* Directly use C API to UNLOAD model in SM

* Address comments and bug fixes

* Add logging for model server index

* Change MME model repo

* Address comments and use chrono seconds, don't repeat error assignment

* Address minor comments

* Fix typo in log

* Update minor comment