A hub of third-party NLP providers and tutorials that help you quickly process your data iterators with no-strings-attached apps.
The purpose of this project is to share third-party providers that can be combined into a single pipeline.
- LLM / Mistral.AI [🤖 models]
- LLM / OpenRouter.AI [🤖 models]
- LLM / Replicate.IO [🤖 models]
- LLM / OpenAI:
- LLM / Transformers:
  - DeepSeek-R1-distill-7b [📙 qwen-notebook] [📙 llama3-notebook]
  - LLaMA-3
  - Qwen-2
  - Phi-4
  - Gemma-3 [📙 notebook]
  - Flan-T5
  - Mistral
- NER / DeepPavlov [📙 notebook]
- NER / Flair [bash-script] [🤖 models]
- NER / Spacy [bash-script] [🤖 models]
- Translation / GoogleTranslator [📙 notebook]
In this project, each provider is a wrapper over a third-party app that handles an iterator of data.
Each record of the data is represented as a Python `dict`.
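The convention above can be illustrated with a minimal sketch in plain Python. The names `make_provider`, `handler`, `src`, and `dst` are hypothetical and are not taken from this repository; `str.upper` stands in for a call to a real third-party model.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

# Each record of the data is a plain dict.
Record = Dict[str, Any]


def make_provider(
    handler: Callable[[str], Any], src: str, dst: str
) -> Callable[[Iterable[Record]], Iterator[Record]]:
    """Wrap `handler` (a stand-in for a third-party app call, hypothetical)
    into a function that consumes and yields dict records lazily."""

    def provider(records: Iterable[Record]) -> Iterator[Record]:
        for record in records:
            # Build a new dict so the input record is not mutated.
            yield {**record, dst: handler(record[src])}

    return provider


# Toy handler standing in for a real model (assumption for illustration).
uppercase = make_provider(str.upper, src="text", dst="result")

data = [{"text": "hello"}, {"text": "world"}]
results = list(uppercase(data))
```

Because the provider is a generator, records are processed one at a time, so the input can be any iterable, including a stream that does not fit in memory.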
If you wish to use several third-party providers together on the same data iterator, it is recommended to adopt the AREkit
framework as a no-strings-attached solution for deploying a pipeline that supports batching mode.
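To show what combining providers in batching mode can look like, here is a rough sketch in plain Python. It does not use the AREkit API; `batched`, `run_pipeline`, `fake_translate`, and `fake_ner` are hypothetical names, and the two steps are toy stand-ins for real providers.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator, List

Record = Dict[str, Any]
BatchStep = Callable[[List[Record]], List[Record]]


def batched(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Group an iterator of records into lists of at most `size` items."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def run_pipeline(
    records: Iterable[Record], steps: List[BatchStep], batch_size: int = 2
) -> Iterator[Record]:
    """Pass every batch through all steps in order, then yield its records."""
    for batch in batched(records, batch_size):
        for step in steps:
            batch = step(batch)
        yield from batch


# Toy stand-ins for third-party providers (assumptions for illustration).
def fake_translate(batch: List[Record]) -> List[Record]:
    return [{**r, "translated": r["text"].upper()} for r in batch]


def fake_ner(batch: List[Record]) -> List[Record]:
    # Treat every word longer than three characters as an "entity".
    return [
        {**r, "entities": [w for w in r["translated"].split() if len(w) > 3]}
        for r in batch
    ]


rows = [{"text": "alice met bob"}, {"text": "in paris"}, {"text": "ok"}]
output = list(run_pipeline(rows, [fake_translate, fake_ner], batch_size=2))
```

Each step receives a whole batch, which is where a real provider could send several records to a model in one call.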
- bulk-chain -- framework for reasoning over your tabular data rows with any provided LLM
- bulk-ner -- framework for quickly binding third-party models to extract entities from cells of long tabular data
- bulk-translate -- framework for translating a massive stream of texts, with native support for pre-annotated fixed spans that the translator leaves unchanged
- AREkit pipelines -- toolkit for handling your textual data iterators with various NLP providers