Extralit (EXTRAct LITerature) is a data extraction workflow with a user-friendly UI, designed for LLM-assisted scientific data extraction and other unstructured document intelligence tasks. It prioritizes data accuracy above all else and integrates human feedback loops for continuous LLM refinement and collaborative data extraction.
Why Use Extralit?
- Precision First – Built for high data accuracy, ensuring reliable results.
- Human-in-the-Loop – Seamlessly integrate human annotations to refine LLM outputs and collaborate on data validation.
- Flexible & Scalable – Available as a Python SDK, CLI, and Web UI with multiple deployment options to fit your workflow.
Key Features:
- Schema-Driven Extraction – Define structured schemas for context-aware, high-accuracy data extraction across scientific domains.
- Advanced PDF Processing – AI-powered OCR detects complex table structures in both digital and scanned PDFs.
- Built-in Validation – Automatically verify extracted data for accuracy in both the annotation UI and the data pipeline outputs.
- User-Friendly Interface – Easily review, edit, and validate data with team-based consensus workflows.
- Data Flywheel – Collect human annotations to monitor performance and build fine-tuning datasets for continuous improvement.
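To make the schema-driven idea concrete, here is a minimal, hypothetical sketch (not Extralit's actual API) of validating LLM-extracted rows against a declared schema, using only the Python standard library. The `TrialArm` schema and field names are invented for illustration:

```python
from dataclasses import dataclass, fields

# Hypothetical schema: each field declares the expected type of one
# extracted column. Extralit's real schemas are richer; this only
# illustrates validating LLM output against a declared structure.
@dataclass
class TrialArm:
    arm_name: str
    sample_size: int
    efficacy_pct: float

def validate_row(row: dict) -> list[str]:
    """Return a list of validation errors for one extracted row."""
    errors = []
    for f in fields(TrialArm):
        value = row.get(f.name)
        if value is None:
            errors.append(f"missing field: {f.name}")
        elif not isinstance(value, f.type):
            errors.append(
                f"{f.name}: expected {f.type.__name__}, got {type(value).__name__}"
            )
    return errors

# A well-formed row passes; a malformed one is flagged for human review.
good = {"arm_name": "control", "sample_size": 120, "efficacy_pct": 47.5}
bad = {"arm_name": "treated", "sample_size": "120", "efficacy_pct": 63.1}
print(validate_row(good))  # []
print(validate_row(bad))   # ['sample_size: expected int, got str']
```

Rows that fail validation are exactly the ones worth routing to human annotators, which is how schema checks and the human-in-the-loop workflow reinforce each other.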
Start extracting smarter with Extralit! 🚀
- May 2025: Extralit selected for Google Summer of Code 2025! We're working on Scientific PDF Data Extraction and Interactive Schema Editor UI projects.
- Looking to contribute? Check out our GSoC projects or open issues to get started!
Install the client package
pip install extralit
If you already have a server deployed and login credentials, obtain your API key in the User Settings. You can manage your extraction workspace through the CLI with:
extralit login --api-url http://<extralit_server_instance>
# You will be prompted for an API key to log in to your account
See https://docs.extralit.ai/latest/getting_started/quickstart/
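As a rough sketch of what an authenticated client does under the hood, the API key is attached to each HTTP request to the server. The header name, endpoint path, and default port below are assumptions for illustration, not Extralit's documented API:

```python
import os
import urllib.request

# Hypothetical illustration of API-key authentication. The header name
# "X-API-KEY", the "/api/me" path, and port 6900 are assumptions, not
# Extralit's documented interface.
API_URL = os.environ.get("EXTRALIT_API_URL", "http://localhost:6900")
API_KEY = os.environ.get("EXTRALIT_API_KEY", "my-secret-key")

def build_request(path: str) -> urllib.request.Request:
    """Build an authenticated GET request without sending it."""
    req = urllib.request.Request(f"{API_URL}{path}")
    req.add_header("X-API-KEY", API_KEY)
    return req

req = build_request("/api/me")
print(req.full_url)
```

Reading the URL and key from environment variables keeps credentials out of source code, which matters once workspaces are shared across a team.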
Extralit is built on top of Argilla, extending its capabilities with enhanced data extraction, validation, and human-in-the-loop workflows. It is built around these core components:
- Python SDK: A Python SDK, installable with pip install extralit, that interacts with the web server and provides an API to manage data extraction workflows.
- FastAPI Server: The backbone of Extralit, handling users, storage, and API interactions. It manages application data using a relational database (PostgreSQL by default).
- Web UI: A web application to visualize and annotate your data and manage users and teams. It is built with Vue.js and Nuxt.js and is deployed alongside the FastAPI Server within our Docker image.
- Vector Database: A vector database to store record data and perform scalable vector similarity searches as well as basic document searches. We currently support ElasticSearch and AWS OpenSearch, which can be deployed as separate Docker images.
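The split between a relational store and a search engine can be sketched conceptually: the relational database holds the source of truth for application data, while records are additionally indexed so that searches never scan the primary store. This is a toy illustration of that design, not Extralit's actual storage layer:

```python
# Toy illustration of the two-store architecture: a "relational" store
# holds the source of truth, while a separate index serves searches.
# Both are plain dicts here; this is a conceptual sketch only.

relational_db: dict[int, dict] = {}   # stands in for PostgreSQL tables
search_index: dict[int, str] = {}     # stands in for a search engine index

def insert_record(record_id: int, record: dict) -> None:
    relational_db[record_id] = record                  # write source of truth
    search_index[record_id] = record["text"].lower()   # index text for search

def search(query: str) -> list[int]:
    """Basic text search served from the index, not the relational store."""
    q = query.lower()
    return [rid for rid, text in search_index.items() if q in text]

insert_record(1, {"text": "Malaria vector control trial", "status": "validated"})
insert_record(2, {"text": "Dengue incidence survey", "status": "pending"})
print(search("malaria"))  # [1]
```

In the real deployment the same division of labor applies: PostgreSQL answers transactional reads and writes, while ElasticSearch or OpenSearch answers similarity and text queries at scale.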
