- Taiwan
Stars
Open Source Alternative to NotebookLM / Perplexity, connected to external sources such as Search Engines, Slack, Linear, Jira, ClickUp, Confluence, Notion, YouTube, GitHub, Discord and more. Join o…
Collection of extracted System Prompts from popular chatbots like ChatGPT, Claude & Gemini
LightlyTrain is the first PyTorch framework to pretrain computer vision models on unlabeled data for industrial applications
Reference PyTorch implementation and models for DINOv3
Toolkit for linearizing PDFs for LLM datasets/training
Collection of awesome LLM apps with AI Agents and RAG using OpenAI, Anthropic, Gemini and open-source models.
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
Get started with building Fullstack Agents using Gemini 2.5 and LangGraph
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
[ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning
Fully Local Manus AI. No APIs, No $200 monthly bills. Enjoy an autonomous agent that thinks, browses the web, and codes for the sole cost of electricity. 🔔 Official updates only via Twitter @Martin9…
An implementation of iterative deep research using the OpenAI Agents SDK
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and cont…
Recipes for shrinking, optimizing, and customizing cutting-edge vision models. 💜
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
In-depth tutorials on LLMs, RAGs and real-world AI agent applications.
AI Agents & MCPs & AI Workflow Automation • (~400 MCP servers for AI agents) • AI Automation / AI Agent with MCPs • AI Workflows & AI Agents • MCPs for AI Agents
Python tool for converting files and office documents to Markdown.
Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface"
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
🪄 Create rich visualizations with AI
This repository contains the official implementation of the research papers "MobileCLIP" (CVPR 2024) and "MobileCLIP2" (TMLR, August 2025)
Multiview matching with deep-learning and hand-crafted local features for COLMAP and other SfM software. Supports high-resolution formats and images with rotations. Both CLI and GUI are supported.
[CVPR 2025 Best Paper Nomination] FoundationStereo: Zero-Shot Stereo Matching
[CVPR 2025 Highlight] DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.
Use your locally running AI models to assist you in your web browsing