DataFog Instructor

Demo

datafog-instructor lets you instruct open-source LLMs (Hugging Face models and models in .gguf format) to return only specific, user-defined outputs, such as entity tags, JSON-only structured output, and synthetic data templates.
It does this by constraining the set of tokens the model is allowed to generate next while it produces a response. This yields higher accuracy and precision than baseline (unconstrained) LLM output; for details, see the evals.ipynb file in examples.
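To make the idea of constraining the next token concrete, here is a minimal toy sketch in plain Python. It is not datafog-instructor's implementation: the vocabulary, the scores, and the helper name `constrained_argmax` are all made up for illustration. It only shows the general technique of masking out every token that is not in an allowed set before picking the next one.

```python
import math


def constrained_argmax(logits, allowed_ids):
    """Return the index of the best-scoring token among allowed_ids only."""
    if not allowed_ids:
        raise ValueError("allowed_ids must contain at least one token id")
    # Tokens outside the allowed set get a score of -inf so they can never win.
    masked = [
        score if index in allowed_ids else -math.inf
        for index, score in enumerate(logits)
    ]
    return max(range(len(masked)), key=masked.__getitem__)


# Hypothetical five-token vocabulary and next-token scores.
vocab = ["PERSON", "ORG", "hello", "LOCATION", "the"]
logits = [1.2, 0.3, 2.5, 0.9, 2.0]

# Only entity tags are allowed as output.
allowed = {vocab.index(tag) for tag in ("PERSON", "ORG", "LOCATION")}

unconstrained = vocab[max(range(len(logits)), key=logits.__getitem__)]
constrained = vocab[constrained_argmax(logits, allowed)]

print("unconstrained:", unconstrained)
print("constrained:  ", constrained)
```

Without the mask the highest-scoring token is free text; with the mask the choice is forced onto one of the permitted tags, which is why constrained output cannot drift outside the user-defined format.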

Installation

pip install --upgrade datafog-instructor

Quick Start

To see a list of all available options, run:

datafog-instructor --help

This prints a help screen listing the available commands and options.

Initialize

After installing, run the following in your terminal:

datafog-instructor init

This places a default config file in your working directory. The file contains the model, tokenizer, and regex_pattern settings.

You can see the contents of that file by typing:

datafog-instructor show-fogprint

What is a fogprint? A fogprint is a reusable template that holds the configuration settings (models, filenames, model_ids, and other details) used to instruct an LLM to detect entities. It is currently saved as fogprint.json.
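If you would rather inspect the fogprint from Python than through the CLI, a small sketch like the following may help. It assumes only that fogprint.json is a JSON object at the top level; it makes no assumption about which keys the file contains. The helper names `load_fogprint` and `summarize` are hypothetical and are not part of datafog-instructor.

```python
import json
from pathlib import Path


def load_fogprint(path="fogprint.json"):
    """Read a fogprint file and return its contents as a dict."""
    fogprint_path = Path(path)
    if not fogprint_path.exists():
        raise FileNotFoundError(
            f"{fogprint_path} not found; run `datafog-instructor init` first"
        )
    with fogprint_path.open(encoding="utf-8") as handle:
        data = json.load(handle)
    # Assumption: the top level of the file is a JSON object.
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data


def summarize(fogprint):
    """Return one 'key: value' line per top-level setting, sorted by key."""
    return [f"{key}: {fogprint[key]!r}" for key in sorted(fogprint)]
```

Calling `summarize(load_fogprint())` from the directory where you ran init returns one line per top-level setting, whatever those settings happen to be.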

Verify the installation:


datafog-instructor list-entities

You should see a list of default entity types: PERSON, COMPANY, LOCATION, and ORG.

Sample Operations

Detect Entities in Text


datafog-instructor detect-entities --prompt "Apple Inc. was founded by Steve Jobs in Cupertino, California."

This will output a table of detected entities, their positions, and types.
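To illustrate what "entities, positions, and types" means, the sketch below computes character offsets for a hand-labeled list of entities in the example prompt. The labels are supplied by hand, not produced by a model, and the `EntitySpan` class and `locate` function are hypothetical. The exact columns and layout of the CLI's table may differ.

```python
from dataclasses import dataclass


@dataclass
class EntitySpan:
    text: str
    label: str
    start: int  # index of the first character
    end: int    # index one past the last character


def locate(prompt, labeled):
    """Find each (text, label) pair in prompt, in order of appearance."""
    spans = []
    cursor = 0
    for text, label in labeled:
        start = prompt.find(text, cursor)
        if start == -1:
            continue  # skip entities that do not occur after the cursor
        end = start + len(text)
        spans.append(EntitySpan(text, label, start, end))
        cursor = end
    return spans


prompt = "Apple Inc. was founded by Steve Jobs in Cupertino, California."
# Hand-written labels for illustration only.
labeled = [
    ("Apple Inc.", "COMPANY"),
    ("Steve Jobs", "PERSON"),
    ("Cupertino", "LOCATION"),
]

spans = locate(prompt, labeled)
for span in spans:
    print(f"{span.text:<12} {span.label:<9} {span.start:>3} {span.end:>3}")
```

Storing the end offset as exclusive means `prompt[span.start:span.end]` always reproduces the entity text.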

Display Current Configuration


datafog-instructor show-fogprint

This command will show you the current configuration stored in fogprint.json.

Reinitialize with Custom Settings

To change the default model or pattern, either:

- Edit the fogprint.json file directly, or
- Use the init command with the --force flag:


datafog-instructor init --force

Follow the prompts to update your configuration.

Advanced Usage

Adjust the maximum number of tokens generated:


datafog-instructor detect-entities --prompt "Your text here" --max-new-tokens 100

For batch processing or integration into your Python projects, import the EntityDetector class from models.py.
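The interface of EntityDetector is not documented in this README, so the sketch below takes a different route for batch processing: it shells out to the detect-entities command shown above, once per prompt. The function names `build_command` and `detect_batch` are hypothetical. Only the `--prompt` and `--max-new-tokens` flags from this README are used.

```python
import subprocess


def build_command(prompt, max_new_tokens=None):
    """Build the argument list for one detect-entities call."""
    command = ["datafog-instructor", "detect-entities", "--prompt", prompt]
    if max_new_tokens is not None:
        command += ["--max-new-tokens", str(max_new_tokens)]
    return command


def detect_batch(prompts, max_new_tokens=None, runner=subprocess.run):
    """Run detect-entities for each prompt and collect the raw stdout text."""
    outputs = []
    for prompt in prompts:
        result = runner(
            build_command(prompt, max_new_tokens),
            capture_output=True,  # capture stdout/stderr instead of printing
            text=True,            # decode output as str
            check=True,           # raise CalledProcessError on a non-zero exit
        )
        outputs.append(result.stdout)
    return outputs
```

Passing the command as a list avoids shell quoting problems with prompts that contain quotes or spaces. Each call starts a new process, so for large batches the in-process EntityDetector class is likely the better fit.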

Development and Testing

For development purposes, you can install additional dependencies:


python -m venv venv && source venv/bin/activate && pip install -r requirements-dev.txt

## Documentation

To build the documentation locally:

pip install "datafog-instructor[docs]"
cd docs
sphinx-build -b html . _build/html


The documentation will be available in the `docs/_build/html` directory.

## Contributing

Contributions to the DataFog Instructor SDK are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the MIT License.

## Support

If you encounter any problems or have any questions, please open an issue on the GitHub repository or join our Discord community at https://discord.gg/bzDth394R4.

## Links

- Homepage: https://datafog.ai
- Documentation: https://docs.datafog.ai
- Twitter: https://twitter.com/datafoginc
- GitHub: https://github.com/datafog/datafog-instructor
