[BUG] Orphaned ML Models  #1179

@dtaivpp

Description

What is the bug?
When a model upload fails, the model still exists in the system but cannot be removed or unloaded.

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Upload a model on an M1 MacBook (which I suspect is an unsupported platform).
POST /_plugins/_ml/models/_upload 
{
  "name": "TEST_MODEL",
  "version": "1.0.0",
  "description": "MiniLM",
  "model_format": "TORCH_SCRIPT",
  "model_config": {
    "model_type": "bert",
    "embedding_dimension": 384,
    "framework_type": "sentence_transformers"
  },
  "url": "https://github.com/opensearch-project/ml-commons/raw/2.x/ml-algorithms/src/test/resources/org/opensearch/ml/engine/algorithms/text_embedding/all-MiniLM-L6-v2_torchscript_sentence-transformer.zip?raw=true"
}

This returns a task ID that can be tracked. Viewing the task shows that it has clearly failed.
Screenshot 2023-08-02 at 4 31 27 PM
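For anyone reproducing this without the screenshot, the task check can be sketched in Python roughly as follows. This is a minimal sketch, not the plugin's own client: the `/_plugins/_ml/tasks/<task_id>` path and the `state`, `model_id`, and `error` field names are assumptions based on the ML Commons task API, and the sample response values are hypothetical placeholders rather than output from this cluster.

```python
from typing import Any, Dict

# Path prefix taken from the upload request in this report.
ML_BASE = "/_plugins/_ml"


def task_path(task_id: str) -> str:
    """Build the path used to look up an upload task (assumed endpoint)."""
    return f"{ML_BASE}/tasks/{task_id}"


def summarize_task(task: Dict[str, Any]) -> Dict[str, Any]:
    """Pull out the fields that matter for this bug from a task response.

    The field names ("state", "model_id", "error") are assumptions.
    """
    return {
        "failed": task.get("state") == "FAILED",
        "model_id": task.get("model_id"),
        "error": task.get("error"),
    }


# Hypothetical task response, for illustration only.
sample_task = {
    "model_id": "HYPOTHETICAL_MODEL_ID",
    "state": "FAILED",
    "error": "hypothetical error text",
}

summary = summarize_task(sample_task)
```

Keeping the model ID from the task response next to the task ID makes it easier to see which identifier later requests are actually using.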

The output above indicates that the task ID we were given is actually the model ID. Attempting to _unload the model using what I believe is really the task ID yields the following:
Screenshot 2023-08-02 at 4 36 02 PM
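Because it is unclear which identifier is the real model ID, one way to work through the cleanup attempts is to list the requests for every candidate ID. The sketch below only builds the method and path pairs and sends nothing. The `_unload` path follows the endpoint named above, the `DELETE /_plugins/_ml/models/<model_id>` path is an assumption based on the ML Commons model API, and `cleanup_requests` is a hypothetical helper name.

```python
from typing import Iterable, List, Tuple

# Path prefix taken from the upload request in this report.
ML_MODELS = "/_plugins/_ml/models"


def cleanup_requests(candidate_ids: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (HTTP method, path) pairs to try for each candidate ID.

    For every ID, an unload is attempted first, then a delete
    (the delete endpoint is an assumption).
    """
    requests: List[Tuple[str, str]] = []
    for candidate in candidate_ids:
        requests.append(("POST", f"{ML_MODELS}/{candidate}/_unload"))
        requests.append(("DELETE", f"{ML_MODELS}/{candidate}"))
    return requests


# Hypothetical placeholder IDs: the value returned by the upload call
# and the value reported inside the task response.
for method, path in cleanup_requests(["ID_FROM_UPLOAD", "ID_FROM_TASK"]):
    print(method, path)
```

In the failure described here, none of these requests is expected to succeed; the list just makes explicit which combinations have been tried.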

However, we cannot upload a new model with that name, because the failed one is still in the system:
Screenshot 2023-08-02 at 4 33 01 PM

What is the expected behavior?
If model creation fails, I expect to be able either to delete the failed upload or to upload a new model with the same name without issues.

What is your host/environment?

  • OS: macOS (M1)
  • Version 2.9

Metadata

Assignees

No one assigned

    Labels

    documentation: Improvements or additions to documentation
