The fundamental data structure for representing extracted entities.
```python
from simple_NER import Entity

entity = Entity(
    value="john@example.com",
    entity_type="email",
    source_text="Contact john@example.com",
    confidence=1.0,
    data={"domain": "example.com"}
)
```

Attributes:
- `value` (str): The extracted text
- `entity_type` (str): Category label (e.g., "email", "person")
- `source_text` (str): Original input text
- `confidence` (float): Confidence score [0.0-1.0]
- `data` (dict): Additional metadata
- `spans` (list[tuple[int, int]]): Character positions in source_text
- `indexes` (list[int]): Start positions of matches
- `occurrence_number` (int): Number of occurrences

Methods:
- `as_json() -> dict`: Serialize to a JSON-safe dictionary
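What `spans`, `indexes`, and `occurrence_number` describe can be illustrated with plain Python (a standalone sketch of the bookkeeping, not the library's own code):

```python
def find_spans(source_text, value):
    """Return (start, end) character spans for each occurrence of value."""
    spans, start = [], 0
    while (idx := source_text.find(value, start)) != -1:
        spans.append((idx, idx + len(value)))
        start = idx + len(value)
    return spans

text = "Contact john@example.com or john@example.com"
spans = find_spans(text, "john@example.com")       # spans attribute
indexes = [start for start, _ in spans]            # indexes attribute
occurrence_number = len(spans)                     # occurrence_number attribute
print(spans, indexes, occurrence_number)
# [(8, 24), (28, 44)] [8, 28] 2
```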
Base class for keyword-based entity recognition.
```python
from simple_NER import SimpleNER

ner = SimpleNER()
ner.add_entity_examples("fruit", ["apple", "banana", "orange"])

for entity in ner.extract_entities("I ate an apple"):
    print(entity.value, entity.entity_type)
```

Methods:
- `add_entity_examples(name: str, examples: str | list[str])`: Register examples
- `extract_entities(text: str, as_json: bool = False) -> Generator[Entity, None, None]`: Extract entities
- `entity_lookup(text: str, as_json: bool = False) -> Generator[Entity, None, None]`: Look up entities by example
- `is_match(text: str, entity: str | Entity) -> bool`: Check whether an entity occurs in the text
- `in_place_annotation(text: str) -> str`: Annotate text with entity labels
Pattern-based extraction using simplematch syntax.
```python
from simple_NER.rules import RuleNER

ner = RuleNER()
ner.add_rule("name", "my name is {person}")

for entity in ner.extract_entities("my name is Alice"):
    print(entity.value)        # "Alice"
    print(entity.entity_type)  # "person"
```

Methods:
- `add_rule(name: str, rules: str | list[str])`: Add a simplematch pattern
- `extract_entities(text: str, as_json: bool = False)`: Extract entities
Regex-based extraction extending RuleNER.
```python
from simple_NER.rules.rx import RegexNER

ner = RegexNER()
ner.add_rule("date", r"\d{2}/\d{2}/\d{4}")

for entity in ner.extract_entities("Date: 12/25/2023"):
    print(entity.value)  # "12/25/2023"
```

Methods:
- `add_rule(name: str, rules: str | list[str])`: Add a regex pattern
- `add_entity_examples(name: str, examples: str | list[str])`: Add examples matched with word boundaries
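The difference between plain substring matching and word-boundary matching for examples can be sketched with the standard `re` module (an illustration of the concept, not the library's internals):

```python
import re

def word_boundary_match(text, example):
    """Match example only as a whole word, never inside a longer token."""
    return re.search(rf"\b{re.escape(example)}\b", text) is not None

# A plain substring search would find "apple" inside "pineapple";
# the \b anchors reject that match.
print(word_boundary_match("I ate a pineapple", "apple"))  # False
print(word_boundary_match("I ate an apple", "apple"))     # True
```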
Neural network-based extraction using padatious.
```python
from simple_NER.rules.neural import NeuralNER

ner = NeuralNER()
ner.add_rule("name", "my name is {person}")
ner.add_rule("name", "i am {person}")

for entity in ner.extract_entities("the name is Bob"):
    print(entity.value, entity.confidence)
```

Note: Requires the padatious and fann2 packages.
Abstract base class for all annotators.
```python
from typing import Generator

from simple_NER import Entity
from simple_NER.annotators.base import Annotator

class MyAnnotator(Annotator):
    @property
    def name(self) -> str:
        return "my_annotator"

    def extract_entities(self, text: str) -> Generator[Entity, None, None]:
        # Implementation
        pass
```

Abstract Methods:
- `name -> str`: Unique identifier
- `extract_entities(text: str) -> Generator[Entity, None, None]`: Extract entities
Concrete base class with common functionality.
```python
from typing import Generator

from simple_NER import Entity
from simple_NER.annotators.base import BaseAnnotator

class EmailAnnotator(BaseAnnotator):
    @property
    def name(self) -> str:
        return "email"

    def annotate(self, text: str) -> Generator[Entity, None, None]:
        # Your extraction logic, e.g. via a hypothetical find_emails() helper
        for email in find_emails(text):
            yield Entity(email, "email", source_text=text)
```

Methods to Implement:
- `annotate(text: str) -> Generator[Entity, None, None]`: Your extraction logic
Inherited Methods:
- `extract_entities(text: str)`: Calls `annotate()`
- `name -> str`: Returns the lowercase class name
Extract email addresses using regex.
```python
from simple_NER.annotators.email_ner import EmailNER

ner = EmailNER()
for ent in ner.extract_entities("Contact test@example.com"):
    print(ent.value)  # "test@example.com"
```

Extract proper nouns (names) using regex.
```python
from simple_NER.annotators.names_ner import NamesNER

ner = NamesNER(confidence_threshold=0.8)
for ent in ner.extract_entities("John Doe met Alice"):
    print(ent.value, ent.confidence)
```

Parameters:
- `confidence_threshold` (float): Minimum confidence [0.65-0.8]
- `min_word_length` (int): Minimum word length
Extract countries, capitals, and cities from JSON wordlists.
```python
from simple_NER.annotators.locations_ner import LocationNER

ner = LocationNER(
    include_countries=True,
    include_capitals=True,
    include_cities=True,
    lowercase=False
)

for ent in ner.extract_entities("Lisbon is capital of Portugal"):
    print(ent.value, ent.entity_type)
```

Parameters:
- `include_countries` (bool): Extract country names
- `include_capitals` (bool): Extract capital cities
- `include_cities` (bool): Extract all cities
- `lowercase` (bool): Case-insensitive matching
Extract datetime and duration expressions.
```python
from simple_NER.annotators.temporal_ner import TemporalNER

ner = TemporalNER()

# Datetime
for ent in ner.extract_entities("meeting tomorrow at 3pm"):
    if ent.entity_type == "relative_date":
        print(ent.data)  # {timestamp, isoformat, year, month, day, ...}

# Duration
for ent in ner.extract_entities("wait 5 minutes"):
    if ent.entity_type == "duration":
        print(ent.data)  # {days, seconds, total_seconds, ...}
```

Parameters:
- `anchor_date` (datetime): Reference date for relative expressions
- `extract_datetime` (bool): Enable datetime extraction
- `extract_duration` (bool): Enable duration extraction
Note: Requires ovos-date-parser or lingua_nostra.
Extract written numbers.
```python
from simple_NER.annotators.numbers_ner import NumberNER

ner = NumberNER(ordinals=True, short_scale=True)
for ent in ner.extract_entities("three hundred apples"):
    print(ent.value, ent.data["number"])  # "three hundred", 300.0
```

Parameters:
- `ordinals` (bool): Extract ordinal numbers (1st, 2nd, third)
- `short_scale` (bool): US short scale vs UK long scale
- `case_sensitive` (bool): Case-sensitive matching
Note: Requires ovos-number-parser or lingua_nostra.
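The short/long scale distinction decides which value a word like "billion" resolves to. A quick standalone illustration of the two naming systems (not library code):

```python
# On the short scale (US usage) each new term is 1,000x the previous one;
# on the traditional long scale (historical UK/continental usage) each
# new term is 1,000,000x the previous one.
SHORT_SCALE = {"million": 10**6, "billion": 10**9, "trillion": 10**12}
LONG_SCALE = {"million": 10**6, "billion": 10**12, "trillion": 10**18}

print(SHORT_SCALE["billion"])  # 1000000000
print(LONG_SCALE["billion"])   # 1000000000000
```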
Extract keywords using RAKE algorithm.
```python
from simple_NER.annotators.keyword_ner import KeywordNER

ner = KeywordNER(lang="en", min_word_length=3)
for ent in ner.extract_entities("Machine learning is amazing"):
    print(ent.value, ent.data["score"])
```

Parameters:
- `lang` (str): Language code
- `min_word_length` (int): Minimum keyword length
- `confidence` (float): Minimum confidence threshold
Note: Requires RAKEkeywords.
Extract physical quantities and measurements.
```python
from simple_NER.annotators.units_ner import UnitsNER

ner = UnitsNER(lang="en")
for ent in ner.extract_entities("The LHC operates at 13.0 TeV"):
    print(ent.value, ent.entity_type)  # "13.0 TeV", "Energy:Electronvolt"
```

Parameters:
- `lang` (str): Language code
- `confidence` (float): Default confidence
Note: Requires quantulum3.
Extract entities from wordlist files.
```python
from simple_NER.annotators.lookup_ner import LookUpNER

ner = LookUpNER(lang="en-us")
for ent in ner.extract_entities("The sky is blue"):
    print(ent.value, ent.entity_type)  # "blue", "color"
```

Parameters:
- `lang` (str): Language code for resource files
- `case_sensitive` (bool): Case-sensitive matching

Methods:
- `add_wordlist(label: str, words: list[str])`: Add a custom wordlist
- `remove_wordlist(label: str) -> bool`: Remove a wordlist
- `loaded_types -> list[str]`: List loaded entity types
Execute multiple annotators with deduplication.
```python
from simple_NER.pipeline import NERPipeline
from simple_NER.annotators.email_ner import EmailNER
from simple_NER.annotators.names_ner import NamesNER

pipeline = NERPipeline(
    annotators=[EmailNER(), NamesNER()],
    dedup_strategy="keep_higher_confidence"
)

entities = pipeline.process("John contacted john@example.com")
for ent in entities:
    print(ent.value, ent.entity_type)
```

Deduplication Strategies:
- `"keep_all"`: No deduplication
- `"keep_longest"`: Keep the longest entity on overlap
- `"keep_higher_confidence"`: Keep the higher-confidence entity
- `"keep_first"`: Keep the first detected entity

Methods:
- `add_annotator(annotator: Annotator)`: Add an annotator
- `remove_annotator(name: str) -> bool`: Remove an annotator by name
- `process(text: str) -> list[Entity]`: Process and deduplicate
- `process_generator(text: str) -> Generator[Entity, None, None]`: Stream results
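As a rough illustration of how an overlap-based strategy such as `"keep_higher_confidence"` can behave, here is a standalone sketch (not the library's implementation; entities are modeled as bare `(start, end, confidence)` tuples):

```python
def overlaps(a, b):
    """True if two (start, end) character spans intersect."""
    return a[0] < b[1] and b[0] < a[1]

def dedup_keep_higher_confidence(entities):
    """Drop any entity whose span overlaps a higher-confidence one.

    Entities are (start, end, confidence) tuples; on ties, the entity
    encountered first wins. A sketch only - the library's actual
    implementation may differ in tie-breaking and ordering.
    """
    kept = []
    for ent in sorted(entities, key=lambda e: -e[2]):
        if not any(overlaps(ent[:2], k[:2]) for k in kept):
            kept.append(ent)
    return sorted(kept, key=lambda e: e[0])

# Two overlapping candidates over the same email span, plus a separate name:
candidates = [(0, 4, 0.7), (14, 30, 0.6), (14, 22, 0.9)]
print(dedup_keep_higher_confidence(candidates))
# [(0, 4, 0.7), (14, 22, 0.9)]
```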
Create annotators and pipelines by name.
```python
from simple_NER.annotators.factory import (
    get_annotator,
    create_pipeline,
    list_available_annotators,
    register_annotator
)

# List available annotators
print(list_available_annotators())
# ['email', 'names', 'locations', ...]

# Create a single annotator
email_ner = get_annotator("email")

# Create a pipeline
pipeline = create_pipeline(
    ["email", "names", "locations"],
    dedup_strategy="keep_higher_confidence"
)

# Register a custom annotator
register_annotator("my_annotator", MyCustomAnnotator)
```

Functions:
- `get_annotator(name: str, **kwargs) -> Annotator`: Create an annotator by name
- `create_pipeline(names: list[str], dedup_strategy: str, **kwargs) -> NERPipeline`: Create a pipeline
- `list_available_annotators() -> list[str]`: List registered names
- `register_annotator(name: str, annotator_class: type[Annotator])`: Register a custom annotator
| Name | Class | Description |
|---|---|---|
| `email` | EmailAnnotator | Email addresses |
| `email_regex` | EmailNER | Email (regex version) |
| `names` | NamesNER | Proper nouns |
| `locations` | LocationNER | Countries, capitals, cities |
| `countries` | LocationNER | Countries only |
| `cities` | LocationNER | Cities only |
| `temporal` | TemporalNER | Datetime and duration |
| `datetime` | TemporalNER | Datetime only |
| `duration` | TemporalNER | Duration only |
| `numbers` | NumberNER | Written numbers |
| `written_numbers` | NumberNER | Written numbers (alias) |
| `keywords` | KeywordNER | RAKE keywords |
| `units` | UnitsNER | Measurements |
| `measurements` | UnitsNER | Measurements (alias) |
| `lookup` | LookUpNER | Wordlist lookup |
| `wordlist` | LookUpNER | Wordlist (alias) |
Old:

```python
from simple_NER.annotators.datetime_ner import DateTimeNER

ner = DateTimeNER()
```

New:

```python
from simple_NER.annotators.temporal_ner import TemporalNER

ner = TemporalNER()  # DateTimeNER still works (alias)
```

Old:
```python
from simple_NER.annotators import NERWrapper

wrapper = NERWrapper()
wrapper.add_detector(custom_function)
```

New (still supported):

```python
# NERWrapper still works
```

New (recommended):
```python
from simple_NER.annotators.base import BaseAnnotator

class CustomAnnotator(BaseAnnotator):
    def annotate(self, text):
        # Your logic
        yield Entity(...)
```

All annotators handle errors gracefully:
```python
from simple_NER.annotators.factory import get_annotator

try:
    ner = get_annotator("email")
    entities = list(ner.extract_entities(text))
except Exception as e:
    print(f"Extraction error: {e}")
```

Missing optional dependencies are handled with warnings:

```
WARNING - quantulum3 not installed. UnitsNER will not function.
Install with: pip install quantulum3
```