Persian Swear Dataset: a dataset of inappropriate and offensive Persian words that you can use in production to filter unwanted content from text.
Updated Mar 26, 2026 - C#
A list of Romanian NLP datasets
AfriSenti-SemEval Shared Task 12: Sentiment Analysis for African Languages (https://afrisenti-semeval.github.io/)
A metadata-enriched dataset of German parliamentary debates covering 74 years of plenary protocols.
Measure how understandable a German text is.
Dataset for web-scale information extraction.
Dataset of annotated Russian-language poems
This repository contains the dataset for the paper "A New Dataset and Methodology for Malicious URL Classification"
Persian Slang Words (dataset)
CSV extraction of Kamus Besar Bahasa Indonesia (KBBI) v6.1.0. Over 194,000 research-ready entries with full metadata (Meanings, Examples, Etymology, and Classes).
Multi-Perspective Sarcasm Explanation Dataset with Human
Persian SMS dataset
Free news datasets from Newsdata.io for ML, NLP, and sentiment analysis - Business, Sports, Entertainment, Health, COVID, Politics, Tech & more.
A meticulously curated, AI-enriched dataset of 1000 essential TOEFL words with academic context
Repository for the LREC-COLING 2024 Paper: Persona-Based Corpus in the Diabetes Mellitus Domain – Applying a Human-Centered Approach to a Low-Resource Context
ELNER-DZ: A Dataset for Named Entity Recognition and Linking in Algerian Arabic Dialect (Darija)
Chinese-English bilingual structured dataset of international treaties · Bilingual international treaty corpus in structured JSON
A conservative release candidate of cleaned Chinese legal texts for legal NLP, RAG prototypes, corpus cleaning, and training-data preparation.
All the resources needed to establish an Islamic AI: a curated PDF library and a custom-developed persona
Persian News Dataset