View orrzohar's full-sized avatar
Video

Codebase for Aria - an Open Multimodal Native MoE

Jupyter Notebook 751 66 Updated Oct 21, 2024
Python 223 12 Updated Nov 2, 2024

[ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"

Python 351 18 Updated Oct 16, 2024

A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.

Python 98 6 Updated Sep 25, 2024

Accelerating the development of large multimodal models (LMMs) with lmms-eval

Python 11 1 Updated Oct 14, 2024

A lightweight flexible Video-MLLM developed by TencentQQ Multimedia Research Team.

Python 62 3 Updated Oct 14, 2024
20 Updated Jul 29, 2024

Code implementation of synthetic continued pretraining

Python 53 4 Updated Oct 6, 2024

LaVIT: Empower the Large Language Model to Understand and Generate Visual Content

Jupyter Notebook 517 29 Updated Oct 6, 2024

EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

Python 533 44 Updated Sep 19, 2024

[CVPR 2023] Official PyTorch code for PROB: Probabilistic Objectness for Open World Object Detection

Python 111 16 Updated Oct 29, 2024

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python 2,879 170 Updated Oct 4, 2024

Official implementation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input

Python 53 Updated Aug 30, 2024

GPT4V-level open-source multi-modal model based on Llama3-8B

Python 2,093 141 Updated Sep 3, 2024

Towards Large Multimodal Models as Visual Foundation Agents

Python 106 1 Updated Oct 31, 2024

LVBench: An Extreme Long Video Understanding Benchmark

Python 58 1 Updated Aug 30, 2024

PyTorch implementation of Twelve Labs' Video Foundation Model evaluation framework & open embeddings

Python 18 Updated Aug 23, 2024

[ICML 2024 Oral] Official code repository for MLLM-as-a-Judge.

Python 52 4 Updated Jul 28, 2024

[NeurIPS'24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.

Python 64 2 Updated Jul 27, 2024
Python 101 11 Updated Dec 23, 2022

VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks

Python 364 10 Updated Jul 9, 2024
Python 2,774 223 Updated Oct 16, 2024
Python 51 1 Updated Jun 27, 2024

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 12,074 1,092 Updated Oct 14, 2024

ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback

Python 52 3 Updated Sep 12, 2024
Python 22 1 Updated May 13, 2024

Utilities intended for use with Llama models.

Python 4,640 808 Updated Oct 28, 2024

[NeurIPS 2024] Dense Connector for MLLMs

Python 129 5 Updated Oct 14, 2024

[ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM

Python 54 2 Updated Oct 25, 2024

Fast and memory-efficient exact attention

Python 14,008 1,306 Updated Oct 31, 2024