Stars
Fully open reproduction of DeepSeek-R1
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
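A visual form designer/generator based on Vue.js that makes form development simple and efficient. (A Vue 3-based visual form designer; drag-and-drop operation lets you quickly build a form and makes form development simple and efficient.)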
Generation of diagrams like flowcharts or sequence diagrams from text in a similar manner as markdown
PaddlePaddle High Performance Deep Learning Inference Engine for Mobile and Edge (PaddlePaddle high-performance on-device deep learning inference engine)
micronet, a model compression and deployment library. Compression: 1. quantization: quantization-aware training (QAT), High-Bit (>2b) (DoReFa/Quantization and Training of Neural Networks for Efficient Integer-…
TidyBot++: An Open-Source Holonomic Mobile Manipulator for Robot Learning
UGround: Universal GUI Visual Grounding for GUI Agents
Building a comprehensive and handy list of papers for GUI agents
Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
A state-of-the-art-level open visual language model | Multimodal pretrained model
AUITestAgent is the first automatic, natural language-driven GUI testing tool for mobile apps, capable of fully automating the entire process of GUI interaction and function verification.
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
A modular graph-based Retrieval-Augmented Generation (RAG) system
Database diagrams editor that allows you to visualize and design your DB with a single query.
Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
Penpot: The open-source design tool for design and code collaboration
The official implementation of the paper titled "StableV2V: Stablizing Shape Consistency in Video-to-Video Editing".
Text2Diagram is an AI-based diagramming tool that uses natural-language text to create diagrams.
Virtual whiteboard for sketching hand-drawn like diagrams
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
Official code, datasets and checkpoints for "Timer: Generative Pre-trained Transformers Are Large Time Series Models" (ICML 2024)
[NeurIPS 24] PromptFix: You Prompt and We Fix the Photo
Building a modern alternative to Salesforce, powered by the community.
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
[CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Multilingual Voice Understanding Model