Taranis AI natural-language-processing Bot

Bot for extracting named entities (e.g. Location, Person, etc.) from texts. Available models:

gliner (+ cybersec gliner) (https://huggingface.co/llinauer/gliner_de_en_news) - Default
roberta (https://huggingface.co/FacebookAI/xlm-roberta-large-finetuned-conll03-english)
roberta_german (https://huggingface.co/FacebookAI/xlm-roberta-large-finetuned-conll03-german)

The following entities can be extracted:

Person, Location, Organization (for all three models)
(+) MISC (for roberta & roberta_german)
(+) Product, Address (for gliner)
(+) CLICommand/CodeSnippet, Con, Group, Malware, Sector, Tactic, Technique, Tool (for gliner with cybersecurity extraction enabled)

Pre-requisites

uv - https://docs.astral.sh/uv/getting-started/installation/
docker (for building container) - https://docs.docker.com/engine/

Create a Python virtual environment and install the packages the bot needs to run:

uv venv
source .venv/bin/activate
uv sync --all-extras --dev

Usage

You can run the bot locally with:

flask run --port 5500
# or
granian app --port 5500

You can set configuration values either via a .env file or by setting environment variables directly. The available options are listed in config.py. You can select the model via the MODEL env var, e.g.:

MODEL=roberta flask run

You can specify a confidence threshold for extracted entities via the CONFIDENCE_THRESHOLD env var. It must be a float between 0.0 and 1.0. All entities with a confidence score below CONFIDENCE_THRESHOLD are discarded.
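As a rough illustration of the thresholding described above, the following Python sketch validates a threshold value and drops entities that score below it. It is not the bot's actual implementation: parse_threshold and filter_entities are hypothetical helper names, the fallback of 0.0 is an assumption made for this sketch, and the entity dictionaries are assumed to look like the items of the extended output format (with probability given as a string).

```python
import os


def parse_threshold(raw: str) -> float:
    """Parse a threshold string and check that it lies in [0.0, 1.0]."""
    value = float(raw)  # raises ValueError for non-numeric input
    if not 0.0 <= value <= 1.0:
        raise ValueError(
            f"CONFIDENCE_THRESHOLD must be between 0.0 and 1.0, got {value}"
        )
    return value


def filter_entities(entities: list[dict], threshold: float) -> list[dict]:
    """Keep only entities whose score is at least the threshold."""
    return [e for e in entities if float(e["probability"]) >= threshold]


if __name__ == "__main__":
    # The fallback "0.0" is an assumption of this sketch, not the bot's default.
    threshold = parse_threshold(os.environ.get("CONFIDENCE_THRESHOLD", "0.0"))
    sample = [
        {"value": "ACME Corporation", "type": "Organization", "probability": "1.00"},
        {"value": "Dynamite", "type": "Product", "probability": "0.90"},
    ]
    print(filter_entities(sample, threshold))
```

Validating the value once at startup keeps a malformed setting from silently changing which entities are returned.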

You can also configure entity disambiguation via entity linking against a DBpedia instance. To enable this feature, set DBPEDIA_LOOKUP=True and set DBPEDIA_URL to the URL of a running DBpedia search endpoint (e.g. https://lookup.dbpedia.org/api/search). The bot will query the DBpedia instance for valid common names of entities (such as U.S., USA, U.S.A.) and try to match them.
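The idea behind this matching can be sketched in a few lines of Python. This is only an illustration of mapping several surface forms onto one common name, not the bot's code: link_entities is a hypothetical helper, and the ALIASES table is a hand-written stand-in for what a DBpedia lookup might return (no request is sent to DBPEDIA_URL here).

```python
from typing import Callable, Optional


def link_entities(
    names: list[str], lookup: Callable[[str], Optional[str]]
) -> dict[str, str]:
    """Map each surface form to a common name, falling back to the form itself."""
    return {name: lookup(name) or name for name in names}


# Hand-written stand-in for a DBpedia lookup (an assumption of this sketch).
ALIASES = {
    "U.S.": "United States",
    "USA": "United States",
    "U.S.A.": "United States",
}

linked = link_entities(["U.S.", "USA", "Acme City"], ALIASES.get)
print(linked)
```

Entities without a known common name are kept unchanged, so enabling the lookup in this sketch never removes an entity.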

Docker

You can also build a Docker image of this bot using the build_container.sh script.

Use the MODEL environment variable to specify which model the image should be built with. If you omit it, the image is built with the default model.

MODEL=<model_name> ./build_container.sh

Then you can run it with:

docker run -p 5500:8000 <image-name>:<tag>

If you encounter errors, make sure that port 5500 is not in use by another application.
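One way to check whether the host port is already taken is to try binding to it. The following Python sketch does that with the standard library; port_is_free is a hypothetical helper name, and checking on 127.0.0.1 is an assumption that fits a local test setup.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permission error.
            return False
    return True


if __name__ == "__main__":
    print("port 5500 free:", port_is_free(5500))
```

The socket is closed again right after the check, so the port stays available for the container.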

Test the bot

Once the bot is running, you can send it test data to run inference on:

> curl -X POST http://127.0.0.1:5500 -H "Content-Type: application/json" -d '{"text": "This is an example for NER, about the ACME Corporation which is producing Dynamite in Acme City, which is in Australia and run by Mr. Wile E. Coyote"}'
> {"ACME Corporation":"Organization","Acme City":"Location","Australia":"Location","Dynamite":"Product","NER":"Organization","Wile E. Coyote":"Person"}

The bot accepts the key extended_output in the payload, which causes it to return the position, probability, type and value of each entity.

> curl -X POST http://127.0.0.1:5500 -H "Content-Type: application/json" -d '{"text": "This is an example for NER, about the ACME Corporation which is producing Dynamite in Acme City, which is in Australia and run by Mr. Wile E. Coyote", "extended_output": true}'
> [{"position":"23-26","probability":"0.93","type":"Organization","value":"NER"},{"position":"38-54","probability":"1.00","type":"Organization","value":"ACME Corporation"},{"position":"74-82","probability":"0.90","type":"Product","value":"Dynamite"},{"position":"86-95","probability":"0.92","type":"Location","value":"Acme City"},{"position":"109-118","probability":"1.00","type":"Location","value":"Australia"},{"position":"134-148","probability":"0.99","type":"Person","value":"Wile E. Coyote"}]
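The extended response can be turned into typed records on the client side. The sketch below parses the JSON shown above; the Entity dataclass and parse_extended_output are hypothetical names for this example. In the example response, the position values behave like half-open character offsets into the input text (start inclusive, end exclusive), which the sketch assumes.

```python
import json
from dataclasses import dataclass


@dataclass
class Entity:
    value: str
    type: str
    probability: float
    start: int  # offset of the first character of the entity
    end: int    # offset just past the last character (assumed exclusive)


def parse_extended_output(body: str) -> list[Entity]:
    """Parse the JSON list returned when extended_output is true."""
    entities = []
    for item in json.loads(body):
        # "position" is a string such as "23-26"
        start, end = (int(part) for part in item["position"].split("-"))
        entities.append(
            Entity(
                value=item["value"],
                type=item["type"],
                probability=float(item["probability"]),
                start=start,
                end=end,
            )
        )
    return entities
```

Under that assumption, text[entity.start:entity.end] gives back entity.value for the example sentence, which is a handy sanity check when highlighting entities in the source text.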

You can also set up authorization via the API_KEY env var. In this case, you need to send the API key as a Bearer token in the Authorization header:

> curl -X POST http://127.0.0.1:5500/ -H "Authorization: Bearer api_key" -H "Content-Type: application/json" -d '{"text": "This is an example for NER, about the ACME Corporation which is producing Dynamite in Acme City, which is in Australia and run by Mr. Wile E. Coyote."}'
> {"ACME Corporation":"Organization","Acme City":"Location","Australia":"Location","Dynamite":"Product","NER":"Organization","Wile E. Coyote":"Person"}

Finally, if you are using the gliner model, you can additionally extract cybersecurity-related entities:

> curl -X POST http://127.0.0.1:5500 -H "Content-Type: application/json" -d '{"text": "Upon opening Emotet maldocs, victims are greeted with fake Microsoft 365 prompt that states THIS DOCUMENT IS PROTECTED, and instructs victims on how to enable macros.", "cybersecurity": true}'
> {"Emotet":"Malware","Microsoft 365":"Product"}
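The curl calls in this section can also be issued from Python. The following sketch uses only the standard library and mirrors the payload keys and headers shown above; build_request and extract_entities are hypothetical helper names, and the default URL assumes the bot is running locally on port 5500.

```python
import json
import urllib.request


def build_request(
    text: str,
    url: str = "http://127.0.0.1:5500",
    api_key=None,
    extended_output: bool = False,
    cybersecurity: bool = False,
) -> urllib.request.Request:
    """Build a POST request equivalent to the curl examples."""
    payload = {"text": text}
    if extended_output:
        payload["extended_output"] = True
    if cybersecurity:
        payload["cybersecurity"] = True
    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        # Only needed when the bot was started with API_KEY set.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def extract_entities(request: urllib.request.Request):
    """Send the request to a running bot and decode its JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a bot running, extract_entities(build_request("Upon opening Emotet maldocs ...", cybersecurity=True)) would send the same kind of request as the last curl example.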

Development

If you want to contribute to the development of this bot, make sure you set up your pre-commit hooks correctly:

Install pre-commit (https://pre-commit.com/)
Set up hooks: > pre-commit install

License

EUROPEAN UNION PUBLIC LICENCE v. 1.2
