Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

serving stuck because of deleted model #65

Open
stephanbertl opened this issue Nov 20, 2023 · 0 comments
Open

serving stuck because of deleted model #65

stephanbertl opened this issue Nov 20, 2023 · 0 comments

Comments

@stephanbertl
Copy link
Contributor

stephanbertl commented Nov 20, 2023

My clearml serving deployment is stuck.

No models are registered,

clearml-serving --id 7303713271b941f7a0b45760d45208dd model list
clearml-serving - CLI for launching ClearML serving engine
List model serving and endpoints, control task id=7303713271b941f7a0b45760d45208dd
Info: syncing model endpoint configuration, state hash=d3290336c62c7fb0bc8eb4046b60bc7f
Endpoints:
{}
Model Monitoring:
{}
Canary:
{}

However, old models are still somehow there:

serving-task:
image

There is a leftover model that I am unable to remove:
image

Triton-Task:

2023-11-20 16:18:40
ClearML Task: created new task id=9b3460b62f9d4015890c7dd2c0064bcf
2023-11-20 15:18:40,452 - clearml.Task - INFO - No repository found, storing script code instead
ClearML results page: http://clearml-webserver:8080/projects/9b4bbac7f1c248e894793f5771005826/experiments/9b3460b62f9d4015890c7dd2c0064bcf/output/log
2023-11-20 16:18:40
configuration args: Namespace(inference_task_id=None, metric_frequency=1.0, name='triton engine', project=None, serving_id='7303713271b941f7a0b45760d45208dd', t_allow_grpc=None, t_buffer_manager_thread_count=None, t_cuda_memory_pool_byte_size=None, t_grpc_infer_allocation_pool_size=None, t_grpc_port=None, t_http_port=None, t_http_thread_count=None, t_log_verbose=None, t_min_supported_compute_capability=None, t_pinned_memory_pool_byte_size=None, update_frequency=1.0)
String Triton Helper service
{'serving_id': '7303713271b941f7a0b45760d45208dd', 'project': None, 'name': 'triton engine', 'update_frequency': 1.0, 'metric_frequency': 1.0, 'inference_task_id': None, 't_http_port': None, 't_http_thread_count': None, 't_allow_grpc': None, 't_grpc_port': None, 't_grpc_infer_allocation_pool_size': None, 't_pinned_memory_pool_byte_size': None, 't_cuda_memory_pool_byte_size': None, 't_min_supported_compute_capability': None, 't_buffer_manager_thread_count': None, 't_log_verbose': None}
Updating local model folder: /models
2023-11-20 15:18:41,106 - clearml.Model - ERROR - Action failed <400/201: models.get_by_id/v1.0 (Invalid model id (no such public or company model): id=0bbba86c98c54610a14350ba69e2e330, company=d1bd92a3b039400cbafc60a7a5b1e52b)> (model=0bbba86c98c54610a14350ba69e2e330)
2023-11-20 15:18:41,107 - clearml.Model - ERROR - Failed reloading task 0bbba86c98c54610a14350ba69e2e330
2023-11-20 15:18:41,115 - clearml.Model - ERROR - Action failed <400/201: models.get_by_id/v1.0 (Invalid model id (no such public or company model): id=0bbba86c98c54610a14350ba69e2e330, company=d1bd92a3b039400cbafc60a7a5b1e52b)> (model=0bbba86c98c54610a14350ba69e2e330)
2023-11-20 15:18:41,115 - clearml.Model - ERROR - Failed reloading task 0bbba86c98c54610a14350ba69e2e330
2023-11-20 16:18:41
Traceback (most recent call last):
  File "clearml_serving/engines/triton/triton_helper.py", line 540, in <module>
    main()
  File "clearml_serving/engines/triton/triton_helper.py", line 532, in main
    helper.maintenance_daemon(
  File "clearml_serving/engines/triton/triton_helper.py", line 237, in maintenance_daemon
    self.model_service_update_step(model_repository_folder=local_model_repo, verbose=True)
  File "clearml_serving/engines/triton/triton_helper.py", line 146, in model_service_update_step
    print("Error retrieving model ID {} []".format(model_id, model.url if model else ''))
  File "/usr/local/lib/python3.8/dist-packages/clearml/model.py", line 341, in url
    return self._get_base_model().uri
  File "/usr/local/lib/python3.8/dist-packages/clearml/backend_interface/model.py", line 496, in uri
    return self.data.uri
AttributeError: 'NoneType' object has no attribute 'uri'

How can a broken task be fixed without deploying a new serving instance?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant