Kwai Keye-VL is a multimodal large language model by Kuaishou, excelling in video understanding and visual reasoning. Explore its capabilities on GitHub! πŸš€πŸŒŸ


Kwai Keye-VL: A Multimodal Large Language Model for Video Understanding

Kwai Keye-VL Logo

πŸ”₯ News

  • 2025.06.26 🌟 We proudly announce the launch of Kwai Keye-VL, a state-of-the-art multimodal large language model developed by the Kwai Keye Team at Kuaishou. Keye stands out in video understanding, visual perception, and reasoning tasks, establishing new performance benchmarks. Our team continues to innovate, so stay tuned for further updates!
Kwai Keye-VL Performance

Table of Contents

  • Introduction
  • Features
  • Installation
  • Usage
  • Models
  • Contributing
  • License
  • Releases

Introduction

Kwai Keye-VL is designed to enhance the interaction between users and multimedia content. It analyzes and interprets videos, images, and text jointly, making it a versatile tool for a range of applications. The model understands context across modalities and generates relevant responses, enriching user experiences.

Features

  • Multimodal Understanding: Processes and integrates information from text, images, and videos.
  • High Performance: Achieves top-tier results in video analysis and reasoning tasks.
  • User-Friendly: Easy to implement and integrate into existing systems.
  • Scalable: Suitable for applications ranging from small projects to large-scale deployments.
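As a concrete illustration of what "multimodal" means here, one way to package a mixed text/image/video request might look like the following. This is a minimal sketch: the class and field names are illustrative assumptions, not the repository's actual input schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MultimodalQuery:
    """Illustrative container for one mixed-modality request."""
    text: str
    image_paths: List[str] = field(default_factory=list)
    video_path: Optional[str] = None

    def modalities(self) -> List[str]:
        """List which modalities this query actually carries."""
        mods = ["text"]
        if self.image_paths:
            mods.append("image")
        if self.video_path:
            mods.append("video")
        return mods


query = MultimodalQuery("Describe the clip.", video_path="clip.mp4")
print(query.modalities())   # ['text', 'video']
```

A structure like this makes it explicit which inputs accompany the text prompt, regardless of how the model consumes them internally.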

Installation

To install Kwai Keye-VL, follow these steps:

  1. Clone the Repository:

    git clone https://github.com/kwaikeyevl/Keye.git
    cd Keye
  2. Install Dependencies: Use the following command to install required packages:

    pip install -r requirements.txt
  3. Download the Model: Download the latest model checkpoint from the Releases section, then follow the release notes to place the files where the code expects them.

Usage

To use Kwai Keye-VL, follow these instructions:

  1. Load the Model:

    from keyevl import KeyeVL
    
    model = KeyeVL.load_model('path/to/model')
  2. Analyze a Video:

    results = model.analyze_video('path/to/video.mp4')
    print(results)
  3. Generate a Response:

    response = model.generate_response('What is happening in this video?')
    print(response)
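Taken together, the three steps above can be wrapped in one helper. A minimal sketch: the `KeyeVL` method names (`load_model`, `analyze_video`, `generate_response`) are taken from the snippets above, and `StubKeyeVL` is a stand-in object so the flow can run without the real weights.

```python
class StubKeyeVL:
    """Stand-in exposing the KeyeVL interface assumed by the snippets above."""

    @classmethod
    def load_model(cls, path):
        # The real call would load checkpoint files from `path`.
        return cls()

    def analyze_video(self, video_path):
        # The real call would return structured analysis of the video.
        return {"video": video_path, "events": []}

    def generate_response(self, question):
        # The real call would answer based on the analyzed content.
        return f"(stub answer to: {question})"


def describe_video(model_cls, model_path, video_path, question):
    """Run the three Usage steps in order: load, analyze, respond."""
    model = model_cls.load_model(model_path)      # step 1: load
    analysis = model.analyze_video(video_path)    # step 2: analyze
    answer = model.generate_response(question)    # step 3: respond
    return analysis, answer


analysis, answer = describe_video(
    StubKeyeVL,
    "path/to/model",
    "path/to/video.mp4",
    "What is happening in this video?",
)
print(analysis["video"])   # path/to/video.mp4
```

Swapping `StubKeyeVL` for the real `KeyeVL` class, once the model is downloaded, leaves `describe_video` unchanged.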

Models

Kwai Keye-VL provides various models optimized for different tasks. You can explore the available models on our Hugging Face page. Each model comes with documentation on how to use it effectively.

Contributing

We welcome contributions from the community. If you would like to contribute, please follow these steps:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature/YourFeature).
  3. Make your changes.
  4. Commit your changes (git commit -m 'Add new feature').
  5. Push to the branch (git push origin feature/YourFeature).
  6. Open a pull request.
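The local side of the steps above can be sketched end to end in a throwaway repository (the fork and pull-request steps happen on GitHub, so they are omitted; `feature/YourFeature` is the placeholder branch name from the list):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"   # local identity for the demo commit
git config user.name "You"
git checkout -q -b feature/YourFeature    # step 2: create a new branch
echo "demo change" > demo.txt             # step 3: make a change
git add demo.txt
git commit -q -m 'Add new feature'        # step 4: commit the change
git branch --show-current                 # prints: feature/YourFeature
```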

License

Kwai Keye-VL is licensed under the MIT License. See the LICENSE file for more details.

Releases

For the latest updates and model downloads, visit the Releases section. Each release includes the model files along with setup instructions.

Feel free to explore the repository, test the model, and share your feedback. Your input helps us improve and expand the capabilities of Kwai Keye-VL.
