LLM Extractinator

⚠️ This tool is a prototype in active development and may change significantly. Always verify results!

LLM Extractinator enables efficient extraction of structured data from unstructured text using large language models (LLMs). It supports configurable task definitions, CLI or Python usage, and flexible data input/output formats.

📘 Full documentation: https://DIAGNijmegen.github.io/llm_extractinator/

🔧 Installation

1. Install Ollama

On Linux:

curl -fsSL https://ollama.com/install.sh | sh

On Windows or macOS:

Download the installer from:
https://ollama.com/download

2. Install the Package

You have two options:

🔹 Option A – Install from PyPI:

pip install llm_extractinator

🔹 Option B – Install from a Local Clone:

git clone https://github.com/DIAGNijmegen/llm_extractinator.git
cd llm_extractinator
pip install -e .

🚀 Quick Usage

CLI

extractinate --task_id 001 --model_name "phi4"

Python

from llm_extractinator import extractinate

extractinate(task_id=1, model_name="phi4")

📁 Task Files

Each task is defined using a JSON file stored in the tasks/ directory.

Filename format:

TaskXXX_name.json

Example contents:

{
  "Description": "Extract product data from text.",
  "Data_Path": "products.csv",
  "Input_Field": "text",
  "Parser_Format": "product_parser.py"
}
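A task file like the one above can also be generated from Python. The following is a minimal sketch using only the standard library; the helper name `write_task_file` is hypothetical and not part of the package, and it assumes that `XXX` in the filename is the task ID zero-padded to three digits (as the CLI example `--task_id 001` suggests).

```python
import json
import tempfile
from pathlib import Path


def write_task_file(tasks_dir: Path, task_id: int, name: str, task: dict) -> Path:
    """Write a task definition to tasks_dir as TaskXXX_name.json (hypothetical helper)."""
    # Assumption: XXX is the task ID zero-padded to three digits.
    path = tasks_dir / f"Task{task_id:03d}_{name}.json"
    tasks_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(task, indent=2), encoding="utf-8")
    return path


# Same keys and values as the example task file.
task = {
    "Description": "Extract product data from text.",
    "Data_Path": "products.csv",
    "Input_Field": "text",
    "Parser_Format": "product_parser.py",
}

# Written to a temporary directory here; a real project would use tasks/.
with tempfile.TemporaryDirectory() as tmp:
    task_path = write_task_file(Path(tmp) / "tasks", 1, "products", task)
    written_name = task_path.name
    reloaded = json.loads(task_path.read_text(encoding="utf-8"))
```

Reading the file back with `json.loads` is a cheap check that the task definition is valid JSON before running an extraction.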

Parser_Format refers to a .py file in tasks/parsers/ that defines a Pydantic OutputParser class used to structure the LLM output.
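As an illustration, a parser file such as tasks/parsers/product_parser.py might look like the sketch below. Only the class name OutputParser comes from the description above; the field names and types are hypothetical and chosen to match the product example, so adapt them to your own data.

```python
from typing import Optional

from pydantic import BaseModel, Field


class OutputParser(BaseModel):
    """Structure that the LLM output is expected to follow."""

    # Hypothetical fields for the product example; not prescribed by the tool.
    product_name: str = Field(description="Name of the product mentioned in the text")
    price: Optional[float] = Field(default=None, description="Price, if one is stated")
    in_stock: bool = Field(default=False, description="Whether the text says the product is available")


# Pydantic validates the values when the model is constructed.
example = OutputParser(product_name="Desk lamp", price=19.99)
```

Giving optional fields a default lets validation succeed when the source text does not mention that piece of information.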

🛠️ Visual Schema Builder (Optional)

You can visually design the output schema using:

build-parser

This launches a web UI to create a Pydantic OutputParser model, which defines the structure of the extracted data. Additional models can be added and nested for complex formats.

Save the resulting .py file in:

tasks/parsers/

Then reference it in your task JSON under the Parser_Format key.
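To give a sense of what a nested schema can look like, here is a hand-written sketch in plain Pydantic. It is not the actual output of build-parser, whose generated code may differ; the `Dimension` model and all field names are hypothetical.

```python
from typing import List

from pydantic import BaseModel, Field


class Dimension(BaseModel):
    """Hypothetical nested model: one measured dimension of a product."""

    label: str = Field(description="Which dimension this is, e.g. height")
    value_cm: float = Field(description="Measured value in centimetres")


class OutputParser(BaseModel):
    """Top-level model that nests a list of Dimension entries."""

    product_name: str
    dimensions: List[Dimension] = Field(default_factory=list)


# Plain dicts are validated and converted into Dimension instances.
parsed = OutputParser(
    product_name="Bookshelf",
    dimensions=[
        {"label": "height", "value_cm": 180},
        {"label": "width", "value_cm": 80},
    ],
)
```

Nesting keeps repeated sub-structures in one place, so a change to `Dimension` applies everywhere it is used.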

👉 See parser docs for full usage.

📄 Citation

If you use this tool, please cite it using the DOI: 10.5281/zenodo.15089764

🤝 Contributing

We welcome contributions! See the full contributing guide in the docs.
