
Starred repositories
Code for paper "New Benchmarks for Barcode Detection using both Synthetic and Real Data" https://link.springer.com/chapter/10.1007%2F978-3-030-57058-3_34
Easy-to-Use RAG Framework; CCF AIOps International Challenge 2024 Top3 Solution; CCF AIOps 国际挑战赛 2024 季军方案
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and cont…
Vision infrastructure to turn complex documents into RAG/LLM-ready data
TrustRAG:The RAG Framework within Reliable input,Trusted output
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
💬 Ready-to-use & flexible RAG Chatbot, supporting mainstream large language models (LLMs) such as DeepSeek-R1, Llama 3.3, Qwen2, OpenAI and more.
OCR, layout analysis, reading order, table recognition in 90+ languages
Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from Rust and Python.
E2M converts various file types (doc, docx, epub, html, htm, url, pdf, ppt, pptx, mp3, m4a) into Markdown. It’s easy to install, with dedicated parsers and converters, supporting custom configs. E2…
A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models.
如需体验TextIn文档解析,请访问 https://cc.co/16YSIy
Knowledge Agents and Management in the Cloud
An enterprise-grade AI retriever designed to streamline AI integration into your applications, ensuring cutting-edge accuracy.
LlamaIndex is the leading framework for building LLM-powered agents over your data.
File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.
FreeAskInternet is a completely free, PRIVATE and LOCALLY running search aggregator & answer generate using MULTI LLMs, without GPU needed. The user can ask a question and the system will make a mu…
SearXNG is a free internet metasearch engine which aggregates results from various search services and databases. Users are neither tracked nor profiled.
Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs (ACL 2024)
A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.
agentUniverse is a LLM multi-agent framework that allows developers to easily build multi-agent applications.
Implementation of paper: HLATR: Enhance Multi-stage Text Retrieval with Hybrid List Aware Transformer Reranking