Unstract

The Data Layer for your Agentic Workflows—Automate Document-based workflows with close to 100% accuracy!

🤖 Prompt Studio

Prompt Studio is a purpose-built environment that supercharges your schema definition efforts. Compare outputs from different LLMs side by side and keep tabs on costs while you develop generic prompts that work across wide-ranging document variations. When you're ready, launch extraction APIs with a single click.

🔌 Integrations that suit your environment

Once you've used Prompt Studio to define your schema, Unstract makes it easy to integrate into your existing workflows. Simply choose the integration type that best fits your environment:

Integration Type	Description	Best For	Documentation
🖥️ MCP Servers	Run Unstract as an MCP Server to provide structured data extraction to Agents or LLMs in your ecosystem.	Developers building Agentic/LLM apps/tools that speak MCP.	Unstract MCP Server Docs
🌐 API Deployments	Turn any document into JSON with an API call. Deploy any Prompt Studio project as a REST API endpoint with a single click.	Teams needing programmatic access in apps, services, or custom tooling.	API Deployment Docs
⚙️ ETL Pipelines	Embed Unstract directly into your ETL jobs to transform unstructured data before loading it into your warehouse / database.	Engineering and Data engineering teams that need to batch process documents into clean JSON.	ETL Pipelines Docs
🧩 n8n Nodes	Use Unstract as ready-made nodes in n8n workflows for drag-and-drop automation.	Low-code users and ops teams automating workflows.	Unstract n8n Nodes Docs
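
As a rough illustration of the API Deployments row above, the sketch below builds (but does not send) an HTTP request that uploads one document to a deployed endpoint. The URL, the bearer-token header, and the `files` form field name are assumptions made for this example; check the API Deployment Docs for the actual request format.

```python
import urllib.request
import uuid


def build_extraction_request(api_url: str, api_key: str,
                             filename: str, content: bytes) -> urllib.request.Request:
    """Build (but do not send) a multipart POST carrying one document."""
    boundary = uuid.uuid4().hex
    # Assumption: the deployment reads the document from a form field named "files".
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {
        # Assumption: the API key is passed as a bearer token.
        "Authorization": f"Bearer {api_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(api_url, data=body, headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would send it; the response is expected to be JSON, which `json.load` can parse.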

☁️ Getting Started (Cloud / Enterprise)

The easy-peasy way to try Unstract is to sign up for a 14-day free trial. Give Unstract a spin now!

Unstract Cloud also comes with some really awesome features that give serious accuracy boosts to agentic/LLM-powered document-centric workflows in the enterprise.

Feature	Description	Documentation
🧪 LLMChallenge	Uses two Large Language Models to ensure trustworthy output. You either get the right response or no response at all.	Docs
⚡ SinglePass Extraction	Reduces LLM token usage by up to 8x, dramatically cutting costs.	Docs
📉 SummarizedExtraction	Reduces LLM token usage by up to 6x, saving costs while keeping accuracy.	Docs
👀 Human-In-The-Loop	Side-by-side comparison of extracted value and source document, with highlighting for human review and tweaking.	Docs
🔐 SSO Support	Enterprise-ready authentication options for seamless onboarding and off-boarding.	Docs
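
To make the "right response or no response at all" idea behind LLMChallenge concrete, here is a minimal sketch of an agree-or-abstain check between two models. It is not Unstract's implementation: the function name `agreed_answer` and the simple text comparison are assumptions for illustration, and the two models are stood in for by plain callables.

```python
from typing import Callable, Optional


def agreed_answer(primary: Callable[[str], str],
                  challenger: Callable[[str], str],
                  prompt: str) -> Optional[str]:
    """Return the primary model's answer only if the challenger agrees with it."""
    first = primary(prompt).strip()
    second = challenger(prompt).strip()
    # Compare case-insensitively; on any disagreement, abstain by returning None.
    return first if first.casefold() == second.casefold() else None
```

A caller would treat `None` as "no trustworthy value" and route the document to human review rather than guess.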

⏩ Quick Start Guide

Unstract comes well documented. You can get introduced to the basics of Unstract and learn how to connect systems such as LLMs, vector databases, embedding models, and text extractors to it. The easiest way to get your feet wet is to go through our Quick Start Guide, where you do some prompt engineering in Prompt Studio and launch an API to structure varied credit card statements!

🚀 Getting started (self-hosted)

System Requirements

8GB RAM (minimum)

Prerequisites

Linux or macOS (Intel or M-series)
Docker
Docker Compose (if you need to install it separately)
Git
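
The prerequisites above can be checked before running the platform script. The following is a small preflight sketch, not part of Unstract; the function name `missing_prerequisites` is hypothetical. It only checks the operating system and whether `docker` and `git` are on the PATH, and does not verify Docker Compose or available RAM.

```python
import platform
import shutil


def missing_prerequisites(which=shutil.which, system=platform.system) -> list[str]:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    if system() not in ("Linux", "Darwin"):  # "Darwin" is what macOS reports
        problems.append("unsupported OS (need Linux or macOS)")
    for tool in ("docker", "git"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems
```

Calling `missing_prerequisites()` with no arguments inspects the current machine; the `which` and `system` parameters exist only so the checks can be exercised without a real environment.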

Next, either download a release or clone this repo and do the following:

✅ ./run-platform.sh
✅ Now visit http://frontend.unstract.localhost in your browser
✅ Use unstract as both the username and the password to log in

That's all there is to it!

Follow these steps to change the default username and password. See user guide for more details on managing the platform.

Another really quick way to experience Unstract is by signing up for our hosted version. It comes with a 14-day free trial!

📄 Supported File Types

Unstract supports a wide range of file formats for document processing:

Category	Format	Description
Word Processing	DOCX	Microsoft Word Open XML
	DOC	Microsoft Word
	ODT	OpenDocument Text
Presentation	PPTX	Microsoft PowerPoint Open XML
	PPT	Microsoft PowerPoint
	ODP	OpenDocument Presentation
Spreadsheet	XLSX	Microsoft Excel Open XML
	XLS	Microsoft Excel
	ODS	OpenDocument Spreadsheet
Document & Text	PDF	Portable Document Format
	TXT	Plain Text
	CSV	Comma-Separated Values
	JSON	JavaScript Object Notation
Image	BMP	Bitmap Image
	GIF	Graphics Interchange Format
	JPEG	Joint Photographic Experts Group
	JPG	Joint Photographic Experts Group
	PNG	Portable Network Graphics
	TIF	Tagged Image File Format
	TIFF	Tagged Image File Format
	WEBP	Web Picture Format
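
A client that batches files for processing might pre-filter them against the table above. This is a minimal sketch assuming a check by file extension alone is good enough for your use; `SUPPORTED_EXTENSIONS` and `is_supported` are names invented for this example, and the check does not inspect file contents.

```python
from pathlib import Path

# Extensions taken from the supported file types table.
SUPPORTED_EXTENSIONS = frozenset({
    "docx", "doc", "odt",
    "pptx", "ppt", "odp",
    "xlsx", "xls", "ods",
    "pdf", "txt", "csv", "json",
    "bmp", "gif", "jpeg", "jpg", "png", "tif", "tiff", "webp",
})


def is_supported(filename: str) -> bool:
    """Return True if the file's extension (case-insensitive) is in the table."""
    return Path(filename).suffix.lower().lstrip(".") in SUPPORTED_EXTENSIONS
```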

🤝 Ecosystem support

LLM Providers

Provider	Status
OpenAI	✅ Working
Google VertexAI, Gemini Pro	✅ Working
Azure OpenAI	✅ Working
Anthropic	✅ Working
Ollama	✅ Working
Bedrock	✅ Working
Google PaLM	✅ Working
Anyscale	✅ Working
Mistral AI	✅ Working

Vector Databases

Provider	Status
Qdrant	✅ Working
Weaviate	✅ Working
Pinecone	✅ Working
PostgreSQL	✅ Working
Milvus	✅ Working

Embeddings

Provider	Status
OpenAI	✅ Working
Azure OpenAI	✅ Working
Google PaLM	✅ Working
Ollama	✅ Working
VertexAI	✅ Working
Bedrock	✅ Working
	Bedrock	✅ Working

Text Extractors

Provider	Status
Unstract LLMWhisperer V2	✅ Working
Unstructured.io Community	✅ Working
Unstructured.io Enterprise	✅ Working
LlamaIndex Parse	✅ Working

ETL Sources

Provider	Status
AWS S3	✅ Working
MinIO	✅ Working
Google Cloud Storage	✅ Working
Azure Cloud Storage	✅ Working
Google Drive	✅ Working
Dropbox	✅ Working
SFTP	✅ Working

ETL Destinations

Provider	Status
Snowflake	✅ Working
Amazon Redshift	✅ Working
Google BigQuery	✅ Working
PostgreSQL	✅ Working
MySQL	✅ Working
MariaDB	✅ Working
Microsoft SQL Server	✅ Working
Oracle	✅ Working

🙌 Contributing

Contributions are welcome! Please see CONTRIBUTING.md for further details to get started easily.

👋 Join the LLM-powered automation community

On Slack, join great conversations around LLMs, their ecosystem and leveraging them to automate the previously unautomatable!
Follow us on X/Twitter
Follow us on LinkedIn

🚨 Backup encryption key

Copy the value of ENCRYPTION_KEY from either the backend/.env or the platform-service/.env file to a secure location.

Adapter credentials are encrypted by the platform using this key. Its loss or change will make all existing adapters inaccessible!
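
One way to pull the key out for backup is to read it from the .env file programmatically. The helper below is a sketch that assumes the file uses plain `KEY=VALUE` lines, optionally quoted; `read_env_value` is a hypothetical name, not an Unstract utility.

```python
from pathlib import Path
from typing import Optional


def read_env_value(env_path: str, key: str) -> Optional[str]:
    """Return the value assigned to `key` in a KEY=VALUE file, or None if absent."""
    for raw in Path(env_path).read_text().splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so "=" characters inside the value survive.
        name, _, value = line.partition("=")
        if name.strip() == key:
            return value.strip().strip("'\"")
    return None
```

For example, `read_env_value("backend/.env", "ENCRYPTION_KEY")` returns the key as a string, which can then be stored in a password manager or secrets vault rather than printed to a shared terminal.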

📊 A note on analytics

In full disclosure, Unstract integrates PostHog to track usage analytics. As you can see by inspecting the relevant code, we collect the minimum possible metrics. PostHog can be disabled by setting REACT_APP_ENABLE_POSTHOG to false in the frontend's .env file.
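
The setting described above can be flipped by hand or with a short script. This sketch assumes the .env file holds plain `KEY=VALUE` lines; `set_env_value` is a hypothetical helper that works on the file's text, leaving reading and writing the file to the caller.

```python
def set_env_value(env_text: str, key: str, value: str) -> str:
    """Return env_text with `key` set to `value`, appending the line if it is missing."""
    lines = env_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        # Match only active assignments; commented-out lines do not match the key.
        if line.split("=", 1)[0].strip() == key:
            lines[i] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"
```

Applying `set_env_value(text, "REACT_APP_ENABLE_POSTHOG", "false")` to the contents of the frontend's .env file and writing the result back gives the edited file; the frontend may need to be restarted or rebuilt before the change takes effect.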
