Why do I get different results every time I run demo_video_text_retrieval.ipynb, even though my video and text inputs haven't changed?