Double first iteration time for `Llama-2-7b-hf` after nvFuser direct bindings

## 🐛 Bug

As the title says, the first iteration of the following command takes about twice as long at commit 8b542cf16e12cf04f6375691621bb3adb0c4acea:

```
python thunder/benchmarks/benchmark_litgpt.py --model_name Llama-2-7b-hf --compile thunder_inductor_cat --checkpoint_activations False --low_precision_mode none --micro_batch_size 1 --global_batch_size 64 --use_sdpa False --block_size 4096 --max_iters 10 --warmup_iters 5
```

compared with before #2502:

```python
# Before
iter 0: loss 0.1650, iter time: 105627.25ms, t: 4096
# After 
iter 0: loss 0.1650, iter time: 215099.65ms, t: 4096
```
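To put a number on the regression, here is a minimal sketch that parses the two log lines above and computes the slowdown. The helper `iter_time_ms` is a hypothetical name introduced for illustration and is not part of `benchmark_litgpt.py`; the regex assumes the `iter time: <value>ms` format shown in the logs.

```python
import re

# Log lines copied verbatim from the output above.
before_log = "iter 0: loss 0.1650, iter time: 105627.25ms, t: 4096"
after_log = "iter 0: loss 0.1650, iter time: 215099.65ms, t: 4096"


def iter_time_ms(line: str) -> float:
    """Extract the 'iter time' value in milliseconds (hypothetical helper)."""
    match = re.search(r"iter time: ([0-9.]+)ms", line)
    if match is None:
        raise ValueError(f"no iter time found in: {line!r}")
    return float(match.group(1))


before_ms = iter_time_ms(before_log)
after_ms = iter_time_ms(after_log)

# Ratio of first-iteration times, and the absolute extra time in seconds.
slowdown = after_ms / before_ms
extra_seconds = (after_ms - before_ms) / 1000.0

print(f"slowdown: {slowdown:.2f}x, extra first-iteration time: {extra_seconds:.1f}s")
```

For the numbers above this works out to roughly a 2.04x slowdown, or about 109 extra seconds on iteration 0.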

Tested with the latest container on a B200.

cc @rdspring1 @kshitij12345 @crcrpar 

Double first iteration time for `Llama-2-7b-hf` after nvFuser direct bindings #2700