BengaliNLP is a natural language processing toolkit for the Bengali language. It provides tokenization, word and document embeddings, part-of-speech tagging, named entity recognition, and text cleaning for Bengali text.
- Tokenization
- Embeddings
- Part of speech tagging
- Named Entity Recognition
- Text Cleaning
- Corpus
  - Letters, vowels, punctuation, stopwords
pip install bengalinlp
Or upgrade an existing installation:
pip install -U bengalinlp
- Python: 3.8, 3.9, 3.10, 3.11
- OS: Linux, Windows, macOS
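To confirm that an environment matches the requirements above, a short check along the following lines may help. This is a minimal sketch using only the standard library; the helper names `python_supported` and `installed_version` are hypothetical and not part of BengaliNLP, and the distribution name `bengalinlp` is assumed from the `pip install` command.

```python
import sys
from importlib import metadata

# Python versions listed as supported in this README.
SUPPORTED_MINORS = {(3, 8), (3, 9), (3, 10), (3, 11)}


def python_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version is in the supported set."""
    return (version_info[0], version_info[1]) in SUPPORTED_MINORS


def installed_version(dist_name="bengalinlp"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print("Python supported:", python_supported())
print("bengalinlp version:", installed_version() or "not installed")
```

If the second line prints "not installed", the package is not visible to the interpreter that ran the script, which often points to a different virtual environment.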
git clone https://github.com/banglawiki/bengalinlp.git
cd bengalinlp
pip install .
from bengalinlp import BasicTokenizer

# Create a tokenizer and split a raw Bengali sentence into tokens
tokenizer = BasicTokenizer()
raw_text = "আমি বাংলায় গান গাই।"
tokens = tokenizer(raw_text)
print(tokens)
# output: ['আমি', 'বাংলায়', 'গান', 'গাই', '।']
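For readers curious how a basic tokenizer can separate the danda (।) from the preceding word, the following is a rough standalone sketch. It is not the implementation of `BasicTokenizer`; the function `simple_tokenize` and the punctuation set are assumptions made for illustration. It deliberately avoids `\w`, which in Python's `re` module does not match Bengali dependent vowel signs and would split words apart.

```python
import re

# Assumption: an illustrative punctuation set, not the library's own list.
PUNCTUATION = "।,;:!?\"'()-"

# Match either a run of non-space, non-punctuation characters (a word)
# or a single punctuation character.
_TOKEN_RE = re.compile(
    "[^\\s{p}]+|[{p}]".format(p=re.escape(PUNCTUATION))
)


def simple_tokenize(text):
    """Split text into word and punctuation tokens (hypothetical helper)."""
    return _TOKEN_RE.findall(text)


print(simple_tokenize("আমি বাংলায় গান গাই।"))
```

On the sample sentence this yields four word tokens followed by the danda as its own token. A real tokenizer would likely handle more cases, such as numbers, URLs, and abbreviations.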