Description
Just writing to share my experience; perhaps it could help someone, and maybe the docs/requirements can be updated.
Requirements:
- Had to change `deepspeed==0.12.2` to `deepspeed==0.3.16`, because the previously pinned version wouldn't compile on Windows.
- Ended up running torch version `2.5.1+cu124`.
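For reference, the change amounts to a one-line edit (a hypothetical diff against the repo's requirements file; the exact pin in your copy may differ):

```diff
-deepspeed==0.12.2
+deepspeed==0.3.16
```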
Then had to make a few tweaks to the sample code to get it to run.
I have two GPUs, and for some reason it always tried to run on the one with less memory, so I had to fix the GPU usage by adding an explicit device in the sample code. I also loaded the model directly from Hugging Face instead of having to download it first, by specifying the correct model name:
```python
tokenizer, model, image_processor, context_len = load_pretrained_model(
    "jadechoghari/LongVU_Qwen2_7B",
    model_base=None,
    model_name="cambrian_qwen",
    device="cuda:0",
)
```
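An alternative (or complement) to passing `device="cuda:0"` is to hide the smaller card from CUDA entirely, so nothing can land on it by accident. A minimal sketch, assuming the larger card is device 0 as reported by `nvidia-smi`:

```python
import os

# Must be set before torch (or anything else that initializes CUDA) is
# imported; after this, the process only ever sees GPU 0.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
print(torch.cuda.device_count())  # -> 1
```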
On 24 GB of VRAM, I found that I had to limit videos to about 1000 frames (around 30 seconds) for it to work, otherwise it ran out of memory. There might be a way to offload or quantize the model to handle more, but this was my dirty workaround when loading `frame_indices`:
```python
num_frames = min(len(vr), 1000)  # cap at 1000 frames to stay inside 24 GB of VRAM
frame_indices = np.arange(0, num_frames, round(fps))  # sample roughly one frame per second
```
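For context, this lives in the sample's decord-based video loading; here is a self-contained sketch of how it fits together (`video_path` is a placeholder, and the model-specific preprocessing that follows is omitted):

```python
import numpy as np
from decord import VideoReader, cpu

vr = VideoReader(video_path, ctx=cpu(0), num_threads=1)
fps = float(vr.get_avg_fps())

num_frames = min(len(vr), 1000)                       # the memory cap from above
frame_indices = np.arange(0, num_frames, round(fps))  # ~1 frame per second
frames = vr.get_batch(frame_indices).asnumpy()        # (N, H, W, 3) uint8 array
```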
Next, I had to pass an `attention_mask` to `generate()` (and also increased `max_new_tokens` so the description wasn't always cut off):
```python
attention_mask = torch.ones_like(input_ids)
with torch.inference_mode():
    output_ids = model.generate(
        input_ids,
        attention_mask=attention_mask,
        images=video,
        image_sizes=image_sizes,
        do_sample=True,
        temperature=0.2,
        pad_token_id=tokenizer.eos_token_id,
        max_new_tokens=512,
        use_cache=True,
        stopping_criteria=[stopping_criteria],
    )
```
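The generated ids can then be decoded to text with the standard transformers pattern:

```python
# Strip special tokens and surrounding whitespace to get the final description.
pred = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
print(pred)
```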
After that, it pretty much just ran, and it was quite quick too: less than a minute, with the majority of that spent loading the model (from a not-so-fast disk). It can definitely be used in near real time if needed.