skccks

Starred repositories

intsig-textin / chatdoc_stack

TypeScript 6 1 Updated Mar 14, 2025

abbyy / barcode_detection_benchmark

Code for paper "New Benchmarks for Barcode Detection using both Synthetic and Real Data" https://link.springer.com/chapter/10.1007%2F978-3-030-57058-3_34

Python 81 20 Updated Aug 20, 2022

intsig-textin / acge_text_embedding

16 1 Updated Aug 30, 2024

BUAADreamer / EasyRAG

Easy-to-Use RAG Framework; CCF AIOps International Challenge 2024 Top3 (third-place) Solution

Python 374 44 Updated Nov 17, 2024

NirDiamant / RAG_Techniques

This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and cont…

Jupyter Notebook 13,571 1,399 Updated Mar 5, 2025

lumina-ai-inc / chunkr

Vision infrastructure to turn complex documents into RAG/LLM-ready data

Rust 2,021 111 Updated Mar 23, 2025

gomate-community / TrustRAG

TrustRAG: The RAG Framework with Reliable Input, Trusted Output

Python 785 85 Updated Mar 22, 2025

opendatalab / DocLayout-YOLO

DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception

Python 967 74 Updated Jan 16, 2025

1Panel-dev / MaxKB

💬 Ready-to-use & flexible RAG Chatbot, supporting mainstream large language models (LLMs) such as DeepSeek-R1, Llama 3.3, Qwen2, OpenAI and more.

Python 14,913 1,975 Updated Mar 23, 2025

TebooNok / HiQA

Code implementation repository for the HiQA paper

Python 98 16 Updated Mar 2, 2025

VikParuchuri / surya

OCR, layout analysis, reading order, table recognition in 90+ languages

Python 16,926 1,104 Updated Mar 20, 2025

flyme2023 / bge

Scripts for bge inference optimization

Python 28 3 Updated Jan 23, 2024

benbrandt / text-splitter

Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from Rust and Python.

Rust 384 24 Updated Mar 22, 2025

wisupai / e2m

E2M converts various file types (doc, docx, epub, html, htm, url, pdf, ppt, pptx, mp3, m4a) into Markdown. It’s easy to install, with dedicated parsers and converters, supporting custom configs. E2…

Jupyter Notebook 1,040 52 Updated Sep 8, 2024

AnswerDotAI / rerankers

A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models.

Python 1,345 77 Updated Mar 20, 2025

intsig-textin / parsex-frontend

To try TextIn document parsing, visit https://cc.co/16YSIy

JavaScript 129 19 Updated Mar 13, 2025

run-llama / llama_cloud_services

Knowledge Agents and Management in the Cloud

Python 3,807 373 Updated Mar 22, 2025

intsig-textin / parsex-sdk

To try TextIn document parsing, visit https://cc.co/16YSIy

Java 12 5 Updated Mar 4, 2025

denser-org / denser-retriever

An enterprise-grade AI retriever designed to streamline AI integration into your applications, ensuring cutting-edge accuracy.

Python 282 37 Updated Mar 3, 2025

CosmosShadow / gptpdf

Using GPT to parse PDF

Python 3,319 245 Updated Aug 7, 2024

run-llama / llama_index

LlamaIndex is the leading framework for building LLM-powered agents over your data.

Python 40,273 5,739 Updated Mar 22, 2025

QuivrHQ / MegaParse

File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.

Python 5,894 294 Updated Feb 21, 2025

nashsu / FreeAskInternet

FreeAskInternet is a completely free, PRIVATE and LOCALLY running search aggregator & answer generator using multiple LLMs, with no GPU needed. The user can ask a question and the system will make a mu…

Python 8,605 911 Updated Apr 18, 2024

searxng / searxng

SearXNG is a free internet metasearch engine which aggregates results from various search services and databases. Users are neither tracked nor profiled.

Python 17,679 1,788 Updated Mar 22, 2025

plageon / SlimPlm

Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs (ACL 2024)

Python 59 5 Updated Oct 16, 2024

wangshusen / SearchEngine

Principles of search engines

1,616 135 Updated Apr 19, 2024

lutongyv / Textin_Tester

To try TextIn document parsing, visit https://cc.co/16YSIy

Python 22 Updated Jul 9, 2024

isaacus-dev / semchunk

A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.

Python 267 16 Updated Mar 20, 2025

agentuniverse-ai / agentUniverse

agentUniverse is an LLM multi-agent framework that allows developers to easily build multi-agent applications.

Python 1,286 174 Updated Mar 21, 2025

AlibabaResearch / HLATR

Implementation of paper: HLATR: Enhance Multi-stage Text Retrieval with Hybrid List Aware Transformer Reranking

Python 68 9 Updated Jan 4, 2023
