Starred repositories
Hands-on tutorials on fine-tuning various LLMs using different fine-tuning techniques
📚 Process PDFs, Word documents and more with spaCy
A system that tries to resolve all issues on a GitHub repo with OpenHands.
Prompt Engineering | Prompt Versioning | Use GPT or other prompt-based models to get structured output. Join our Discord for prompt engineering, LLMs, and other latest research
AutoChain: Build lightweight, extensible, and testable LLM Agents
🤖 Everything you need to create an LLM Agent—tools, prompts, frameworks, and models—all in one place.
Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
Build resilient language agents as graphs.
A programming framework for agentic AI 🤖 PyPI: autogen-agentchat Discord: https://aka.ms/autogen-discord Office Hour: https://aka.ms/autogen-officehour
Integrate cutting-edge LLM technology quickly and easily into your apps
Chat with PDF files with source highlights
Python library to extract tabular data from images and scanned PDFs
Document layout analysis resources repository for development with PdfPig.
Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS ev…
PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
img2table is a table identification and extraction Python library for PDFs and images, based on OpenCV image processing
Replace 'hub' with 'ingest' in any GitHub URL to get a prompt-friendly extract of a codebase
Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
PyTorch deep learning models for document classification
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra-lightweight OCR system, supports recognition of 80+ languages, provides data annotation and synthesis tools, supports training and…
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
Tesseract Open Source OCR Engine (main repository)
Provides a simple and efficient way to interact with the LLMWhisperer API
Python tool for converting files and office documents to Markdown.
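
One entry in the list above describes a URL substitution: replace 'hub' with 'ingest' in a GitHub URL. A minimal sketch of that rewrite in Python follows. The function name `to_ingest_url` is hypothetical, and the sketch assumes the substitution applies only to the host name (`github.com`); it rewrites the string and does not fetch anything.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ingest_url(github_url: str) -> str:
    """Rewrite a GitHub URL by replacing 'hub' with 'ingest' in the host.

    Hypothetical helper for illustration; assumes only the host name
    should change and the path, query, and fragment stay as they are.
    """
    parts = urlsplit(github_url)
    host = parts.netloc.lower()
    if host not in ("github.com", "www.github.com"):
        raise ValueError(f"not a GitHub URL: {github_url!r}")
    # Replace only the first 'hub' so the rest of the host is untouched.
    new_host = host.replace("hub", "ingest", 1)
    return urlunsplit(parts._replace(netloc=new_host))


if __name__ == "__main__":
    print(to_ingest_url("https://github.com/octocat/Hello-World"))
```

Restricting the change to the host avoids accidentally rewriting a repository or user name that happens to contain "hub".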