feat: preload textual model #12729
base: main
Conversation
I was thinking of handling this with the […]. I think it'd be simpler and more generic, although the advantage of this PR is that the model can still be unloaded when not in use. What do you think?
Yes, that's exactly why I worked on it. I don't want to have a CLIP model loaded all the time, I want it loaded only when I (probably) need it. From my tests, with the model I use (XLM-Roberta-Large-Vit-B-16Plus), it adds almost 2 GB of data to my RAM, it's nice to have it unloaded when I don't use Immich 😅
Thanks for working on this! This is a great way to lower latency for the first search without committing to 24/7 memory usage. That being said, there are some refinements I'd like to see.
machine-learning/app/schemas.py (outdated)
```python
class LoadModelEntry(InferenceEntry):
    ttl: int

    def __init__(self, name: str, task: ModelTask, type: ModelType, options: dict[str, Any], ttl: int):
        super().__init__(name=name, task=task, type=type, options=options)

        if ttl <= 0:
            raise ValueError("ttl must be a positive integer")
        self.ttl = ttl
```
`InferenceEntry` is just a type for a dict, so this `__init__` shouldn't exist.
Also, I have some reservations about the `ttl` field here. TTL is currently set by an env var and assumed to be the same for all models. At minimum, you would need to change the idle shutdown conditions so they no longer rely on the env var and instead (synchronously) check whether the model cache is empty.
Moreover, I'm not sure if this is something that the caller should be able to decide. The relationship between server and ML isn't necessarily one-to-one, so the settings should be designed with that in mind. If you have multiple servers that share a machine learning service, the effective TTL now depends on the order of requests. TTL is also relevant to whoever is deploying ML and may not be something they want a caller to be able to configure.
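The cache-driven idle shutdown described above can be sketched as follows. This is an illustrative sketch only: `ModelCache`, `evictExpired`, and `isEmpty` are hypothetical names, not Immich's actual API, and the point is simply that per-entry TTLs live in the cache so the shutdown logic can ask the cache directly instead of trusting a single env-configured TTL.

```typescript
// Hypothetical model cache with per-entry TTLs. Idle shutdown polls
// isEmpty() instead of assuming one global env-configured TTL.
class ModelCache<T> {
  private entries = new Map<string, { model: T; expiresAt: number }>();

  set(key: string, model: T, ttlMs: number): void {
    this.entries.set(key, { model, expiresAt: Date.now() + ttlMs });
  }

  // Drop models whose individual TTL has elapsed.
  evictExpired(now: number = Date.now()): void {
    for (const [key, entry] of this.entries) {
      if (entry.expiresAt <= now) {
        this.entries.delete(key);
      }
    }
  }

  // The idle-shutdown condition: safe to shut down only when no model
  // is cached, regardless of what any env var says.
  isEmpty(): boolean {
    return this.entries.size === 0;
  }
}
```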
```typescript
@Type(() => LoadTextualModelOnConnection)
@ValidateNested()
@IsObject()
loadTextualModelOnConnection!: LoadTextualModelOnConnection;
```
Can this be less wordy? Maybe `preloadTextualModel`?
Also, I imagine there are a lot of sessions where nothing gets searched and the model is loaded unnecessarily. If this were instead done when the user clicks the search bar, it'd almost always be a positive and could be enabled by default.
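The search-bar-click strategy suggested above could be wired up roughly like this on the frontend. This is a sketch under assumptions: `createPreloadTrigger` is a hypothetical helper and `/api/search/preload` is an assumed endpoint name, not part of Immich's actual API.

```typescript
// Wrap a preload call so it fires at most once per session, on first
// interaction with the search bar rather than on every connection.
function createPreloadTrigger(preload: () => Promise<void>): () => Promise<void> {
  let requested = false;
  return async () => {
    if (requested) {
      return; // already requested this session
    }
    requested = true;
    try {
      await preload();
    } catch {
      requested = false; // allow a retry if the preload request failed
    }
  };
}
```

Usage would be something like `searchInput.addEventListener('focus', createPreloadTrigger(() => fetch('/api/search/preload').then(() => undefined)))`, so only sessions that actually open search pay the memory cost.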
```diff
@@ -68,6 +75,16 @@ export class EventRepository implements OnGatewayConnection, OnGatewayDisconnect
       queryParams: {},
       metadata: { adminRoute: false, sharedLinkRoute: false, uri: '/api/socket.io' },
     });
+    if ('background' in client.handshake.query && client.handshake.query.background === 'false') {
+      const { machineLearning } = await this.configCore.getConfig({ withCache: true });
+      if (machineLearning.clip.loadTextualModelOnConnection.enabled) {
```
This should check if smart search is enabled.
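Combining the review feedback, the preload condition could be factored into a small pure helper. This is a hedged sketch with assumed config shapes (`machineLearning.clip.enabled` standing in for "smart search is enabled"), not Immich's actual types:

```typescript
// Assumed, simplified config shapes for illustration only.
interface ClipConfig {
  enabled: boolean; // smart search on/off
  loadTextualModelOnConnection: { enabled: boolean };
}
interface MachineLearningConfig {
  enabled: boolean;
  clip: ClipConfig;
}

// Preload only for interactive sessions (background=false in the
// handshake query) and only when smart search is actually enabled.
function shouldPreloadTextualModel(
  query: Record<string, string | string[] | undefined>,
  machineLearning: MachineLearningConfig,
): boolean {
  const interactive = query.background === 'false';
  return (
    interactive &&
    machineLearning.enabled &&
    machineLearning.clip.enabled &&
    machineLearning.clip.loadTextualModelOnConnection.enabled
  );
}
```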
Do we want to offer multiple strategies for when to load the textual model (on client connection, on search bar click...)? I guess model loading time may vary a lot from one instance to another.
The connection approach is better if the model takes more than a few seconds to load and the user searches right away, but it's worse outside of that. The TTL aspect makes it hard to optimize: the model can be unloaded before they search, or it can stay loaded longer than needed because you don't know whether it'll get used. I think the search bar approach is a safer bet for now, but I'm open to having different options if it turns out that it isn't enough.
We can use the […]
Just to add a frame of reference, it takes me 25 seconds to load-and-search (using […]). In this specific scenario, preloading when the search bar is clicked would have a negligible impact on search time.
I recently changed my clip model and noticed that the first search is terribly slow because the textual model is loaded only during the first search.
This PR adds a new setting that allows loading the text model on client connection via the websocket, which improves the time it takes for the first search a bit (the improvement depends on many factors, like CPU power and model size). To differentiate a client connection used for background upload from a "normal" session, it only loads the model if the websocket URL has the query parameter `background=false`.
Todo:
- machine-learning.repository