An AI agent development platform with all-in-one visual tools, simplifying agent creation, debugging, and deployment like never before. Coze your way to AI Agent creation.
Updated Mar 16, 2026 - TypeScript
🌌 Orion AI Workspace – A free intelligent workspace platform that combines advanced AI models with real-time collaboration tools, designed with privacy-first principles and user-controlled API keys.
Production-grade semantic video search engine - search across video content using natural language. Powered by Whisper, GPT-4o Vision, vector embeddings, and Pinecone.
AI StoryTeller is a multimodal AI application that converts images into creative short stories by combining computer vision and natural language generation. The system uses a pretrained image captioning model to understand visual content and Google Gemini to generate context-aware narratives grounded in the image.
Build a Machine Learning model that predicts whether a mushroom is poisonous or edible based on its physical and environmental attributes. The goal is to help identify potentially harmful mushrooms early so safer decisions can be made while handling or consuming them.
MindTrack is an AI-powered multimodal emotion detection system using both text and images to monitor emotional well-being in real time.
A generative AI pipeline that turns waste (peels, grounds) into drug candidates in under 60 seconds. Upload an image or text → fragments → structures → ADMET/EcoScore → RAG validation → PDF report. Built with GPT-4o, Llama-3, LangChain, and RDKit. Guided by Dr. Hammad Majeed (UMT Lahore). Hackathon 2025.
A real-time image captioning and visual question answering (VQA) system. This project uses computer vision and NLP to generate descriptive captions for images and answer user questions about them.
RAG MCP Frontend: a lightweight React/TypeScript frontend for interacting with Retrieval-Augmented Generation (RAG) services and the MCP (Multi-Channel Processing) backend. This project offers a clean UI for document ingestion, query/response flows, and conversation history.
Visual Question Answering (VQA) system. A multimodal AI model that combines computer vision (CNN) and natural language processing (LSTM) to answer questions based on image content.
A Streamlit-based Multimodal AI Generator using Google's Gemini API for text and image generation.
A modular academic project exploring multimodal intrusion detection using RGB video, thermal input, tracking, and future audio/RF signals. Work-in-progress learning project with a clean architecture and 70-task roadmap.