JohnSnowLabs · maziyarpanahi · Apr 25, 2023 · Nov 21, 2022 · Nov 25, 2022 · Dec 15, 2022
diff --git a/docs/_posts/Naveen-004/2023-04-13-CyberbullyingDetection_ClassifierDL_tfhub_en.md b/docs/_posts/Naveen-004/2023-04-13-CyberbullyingDetection_ClassifierDL_tfhub_en.md
@@ -0,0 +1,97 @@
+---
+layout: model
+title: Cyberbullying Detection
+author: Naveen-004
+name: CyberbullyingDetection_ClassifierDL_tfhub
+date: 2023-04-13
+tags: [en, open_source]
+task: Text Classification
+language: en
+edition: Spark NLP 4.4.0
+spark_version: 3.0
+supported: false
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Identify cyberbullying with a multi-class classifier that distinguishes six classes: age, ethnicity, gender, religion, other cyberbullying, and not cyberbullying. The model was trained on a Twitter dataset from Kaggle using text cleaning, data augmentation, document assembly, Universal Sentence Encoder embeddings, and a TensorFlow classification model. Tweets retrieved with snscrape were used to validate the model's performance. The model reached an accuracy of 85% on the test data and 89% on the training data.
+
+{:.btn-box}
+<button class="button button-orange" disabled>Live Demo</button>
+[Open in Colab](https://colab.research.google.com/drive/1xaIlDtpiGzf14EA1umhJoOXI0FZaYtRc?authuser=4#scrollTo=os2C1v2WW1Hi){:.button.button-orange.button-orange-trans.co.button-icon}
+[Download](https://s3.amazonaws.com/community.johnsnowlabs.com/Naveen-004/CyberbullyingDetection_ClassifierDL_tfhub_en_4.4.0_3.0_1681363209630.zip){:.button.button-orange}
+[Copy S3 URI](s3://community.johnsnowlabs.com/Naveen-004/CyberbullyingDetection_ClassifierDL_tfhub_en_4.4.0_3.0_1681363209630.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import ClassifierDLApproach, UniversalSentenceEncoder
+
+documentAssembler = DocumentAssembler().setInputCol("cleaned_text").setOutputCol("document")
+
+use = UniversalSentenceEncoder.pretrained(name="tfhub_use_lg", lang="en")\
+    .setInputCols(["document"])\
+    .setOutputCol("sentence_embeddings")\
+    .setDimension(512)  # tfhub_use_lg outputs 512-dimensional sentence embeddings
+
+classifierdl = ClassifierDLApproach()\
+    .setInputCols(["sentence_embeddings"])\
+    .setOutputCol("class")\
+    .setLabelColumn("cyberbullying_type")\
+    .setBatchSize(16)\
+    .setMaxEpochs(42)\
+    .setDropout(0.4)\
+    .setEnableOutputLogs(True)\
+    .setLr(4e-3)
+
+use_clf_pipeline = Pipeline(stages=[documentAssembler, use, classifierdl])
+```
+
+</div>
+
+## Results
+
+```bash
+                     precision    recall  f1-score   support
+
+                age       0.94      0.96      0.95       796
+          ethnicity       0.94      0.94      0.94       810
+             gender       0.87      0.86      0.86       816
+  not_cyberbullying       0.74      0.67      0.70       766
+other_cyberbullying       0.67      0.71      0.69       775
+           religion       0.94      0.96      0.95       731
+
+           accuracy                           0.85      4694
+          macro avg       0.85      0.85      0.85      4694
+       weighted avg       0.85      0.85      0.85      4694
+
+```
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|CyberbullyingDetection_ClassifierDL_tfhub|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 4.4.0+|
+|License:|Open Source|
+|Edition:|Community|
+|Language:|en|
+|Size:|811.9 MB|
+
+## Included Models
+
+- DocumentAssembler
+- UniversalSentenceEncoder
+- ClassifierDLModel
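
As a quick consistency check on the Results table in the model card above, the summary rows can be recomputed from the per-class rows. The sketch below is plain Python and not part of the model card; the figures are copied from the table and the variable names are illustrative. Because the inputs are already rounded to two decimals, the recomputed values can only be expected to agree with the report after rounding.

```python
# Per-class rows copied from the Results table: (precision, recall, f1, support)
report = {
    "age": (0.94, 0.96, 0.95, 796),
    "ethnicity": (0.94, 0.94, 0.94, 810),
    "gender": (0.87, 0.86, 0.86, 816),
    "not_cyberbullying": (0.74, 0.67, 0.70, 766),
    "other_cyberbullying": (0.67, 0.71, 0.69, 775),
    "religion": (0.94, 0.96, 0.95, 731),
}

total_support = sum(support for _, _, _, support in report.values())

# Macro average: unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1 for _, _, f1, _ in report.values()) / len(report)

# Weighted average: per-class F1 weighted by class support.
weighted_f1 = sum(f1 * support for _, _, f1, support in report.values()) / total_support

# In single-label multi-class classification, accuracy equals the
# support-weighted mean of per-class recall (correct predictions / all examples).
accuracy = sum(recall * support for _, recall, _, support in report.values()) / total_support

print(total_support, round(macro_f1, 2), round(weighted_f1, 2), round(accuracy, 2))
```

The support values sum to 4694, and all three recomputed scores round to 0.85, in line with the `accuracy`, `macro avg`, and `weighted avg` rows of the report.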
diff --git a/...e127/2023-04-20-distilbert_base_zero_shot_classifier_turkish_cased_allnli_tr.md b/...e127/2023-04-20-distilbert_base_zero_shot_classifier_turkish_cased_allnli_tr.md
@@ -0,0 +1,104 @@
+---
+layout: model
+title: DistilBERT Zero-Shot Classification Base - distilbert_base_zero_shot_classifier_turkish_cased_allnli
+author: John Snow Labs
+name: distilbert_base_zero_shot_classifier_turkish_cased_allnli
+date: 2023-04-20
+tags: [distilbert, zero_shot, turkish, tr, base, open_source, tensorflow]
+task: Zero-Shot Classification
+language: tr
+edition: Spark NLP 4.4.1
+spark_version: [3.2, 3.0]
+supported: true
+engine: tensorflow
+annotator: DistilBertForZeroShotClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+This model is intended for zero-shot text classification, especially in Turkish. It was fine-tuned on MNLI using the DistilBERT Base Uncased model.
+
+DistilBertForZeroShotClassification uses a ModelForSequenceClassification trained on NLI (natural language inference) tasks. It is the equivalent of the DistilBertForSequenceClassification models, but it does not require a hardcoded number of potential classes; the classes can be chosen at runtime. This usually makes it slower, but much more flexible.
+
+We used TFDistilBertForSequenceClassification to train this model and the DistilBertForZeroShotClassification annotator in Spark NLP 🚀 for prediction at scale!
+
+## Predicted Entities
+
+
+
+{:.btn-box}
+<button class="button button-orange" disabled>Live Demo</button>
+<button class="button button-orange" disabled>Open in Colab</button>
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_zero_shot_classifier_turkish_cased_allnli_tr_4.4.1_3.2_1682016415236.zip){:.button.button-orange}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_zero_shot_classifier_turkish_cased_allnli_tr_4.4.1_3.2_1682016415236.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import DistilBertForZeroShotClassification, Tokenizer
+
+document_assembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+zeroShotClassifier = DistilBertForZeroShotClassification \
+    .pretrained('distilbert_base_zero_shot_classifier_turkish_cased_allnli', 'tr') \
+    .setInputCols(['token', 'document']) \
+    .setOutputCol('class') \
+    .setCaseSensitive(True) \
+    .setMaxSentenceLength(512) \
+    .setCandidateLabels(["olumsuz", "olumlu"])
+
+pipeline = Pipeline(stages=[document_assembler, tokenizer, zeroShotClassifier])
+example = spark.createDataFrame([['Senaryo çok saçmaydı, beğendim diyemem.']]).toDF("text")  # assumes an active `spark` session
+result = pipeline.fit(example).transform(example)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // needed for toDS; assumes an active SparkSession named `spark`
+
+val document_assembler = new DocumentAssembler().setInputCol("text").setOutputCol("document")
+val tokenizer = new Tokenizer().setInputCols("document").setOutputCol("token")
+
+val zeroShotClassifier = DistilBertForZeroShotClassification.pretrained("distilbert_base_zero_shot_classifier_turkish_cased_allnli", "tr")
+  .setInputCols("document", "token")
+  .setOutputCol("class")
+  .setCaseSensitive(true)
+  .setMaxSentenceLength(512)
+  .setCandidateLabels(Array("olumsuz", "olumlu"))
+
+val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, zeroShotClassifier))
+val example = Seq("Senaryo çok saçmaydı, beğendim diyemem.").toDS.toDF("text")
+val result = pipeline.fit(example).transform(example)
+```
+</div>
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|distilbert_base_zero_shot_classifier_turkish_cased_allnli|
+|Compatibility:|Spark NLP 4.4.1+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[token, document]|
+|Output Labels:|[multi_class]|
+|Language:|tr|
+|Size:|254.3 MB|
+|Case sensitive:|true|
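
The description in the model card above notes that candidate labels can be chosen at runtime. A minimal sketch of the general NLI-based zero-shot recipe follows: each label is turned into a hypothesis, an NLI model scores how strongly the input text entails that hypothesis, and the scores are normalized with a softmax. This is not Spark NLP's internal code. `toy_entailment_logit` is a hypothetical keyword-counting stand-in for the real DistilBERT NLI model, and the hypothesis template `"Bu örnek {}."` is an assumption made for illustration.

```python
import math
from typing import Callable, Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(
    premise: str,
    candidate_labels: List[str],
    entailment_logit: Callable[[str, str], float],
    template: str = "Bu örnek {}.",  # hypothetical hypothesis template
) -> Dict[str, float]:
    """Score each label by how strongly the premise entails its hypothesis."""
    hypotheses = [template.format(label) for label in candidate_labels]
    logits = [entailment_logit(premise, h) for h in hypotheses]
    return dict(zip(candidate_labels, softmax(logits)))


def toy_entailment_logit(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an NLI model: counts label-specific cue words."""
    cue_words = {
        "olumsuz": ["saçma", "diyemem"],
        "olumlu": ["harika", "bayıldım"],
    }
    text = premise.lower()
    score = 0.0
    for label, cues in cue_words.items():
        if label in hypothesis:
            score += sum(text.count(cue) for cue in cues)
    return score


scores = zero_shot_classify(
    "Senaryo çok saçmaydı, beğendim diyemem.",
    ["olumsuz", "olumlu"],
    toy_entailment_logit,
)
print(max(scores, key=scores.get), scores)
```

Because the labels only enter through the hypotheses, the label list can change on every call without retraining. The cost is one NLI forward pass per candidate label, which is why this approach tends to be slower than a classifier with a fixed output layer.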
diff --git a/...27/2023-04-20-distilbert_base_zero_shot_classifier_turkish_cased_multinli_tr.md b/...27/2023-04-20-distilbert_base_zero_shot_classifier_turkish_cased_multinli_tr.md
@@ -0,0 +1,103 @@
+---
+layout: model
+title: DistilBERT Zero-Shot Classification Base - distilbert_base_zero_shot_classifier_turkish_cased_multinli
+author: John Snow Labs
+name: distilbert_base_zero_shot_classifier_turkish_cased_multinli
+date: 2023-04-20
+tags: [zero_shot, tr, turkish, distilbert, base, cased, open_source, tensorflow]
+task: Zero-Shot Classification
+language: tr
+edition: Spark NLP 4.4.1
+spark_version: [3.2, 3.0]
+supported: true
+engine: tensorflow
+annotator: DistilBertForZeroShotClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+This model is intended for zero-shot text classification, especially in Turkish. It was fine-tuned on MNLI using the DistilBERT Base Uncased model.
+
+DistilBertForZeroShotClassification uses a ModelForSequenceClassification trained on NLI (natural language inference) tasks. It is the equivalent of the DistilBertForSequenceClassification models, but it does not require a hardcoded number of potential classes; the classes can be chosen at runtime. This usually makes it slower, but much more flexible.
+
+We used TFDistilBertForSequenceClassification to train this model and the DistilBertForZeroShotClassification annotator in Spark NLP 🚀 for prediction at scale!
+
+## Predicted Entities
+
+
+
+{:.btn-box}
+<button class="button button-orange" disabled>Live Demo</button>
+<button class="button button-orange" disabled>Open in Colab</button>
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_zero_shot_classifier_turkish_cased_multinli_tr_4.4.1_3.2_1682014879417.zip){:.button.button-orange}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_zero_shot_classifier_turkish_cased_multinli_tr_4.4.1_3.2_1682014879417.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import DistilBertForZeroShotClassification, Tokenizer
+
+document_assembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+zeroShotClassifier = DistilBertForZeroShotClassification \
+    .pretrained('distilbert_base_zero_shot_classifier_turkish_cased_multinli', 'tr') \
+    .setInputCols(['token', 'document']) \
+    .setOutputCol('class') \
+    .setCaseSensitive(True) \
+    .setMaxSentenceLength(512) \
+    .setCandidateLabels(["ekonomi", "siyaset", "spor"])
+
+pipeline = Pipeline(stages=[document_assembler, tokenizer, zeroShotClassifier])
+example = spark.createDataFrame([['Dolar yükselmeye devam ediyor.']]).toDF("text")  # assumes an active `spark` session
+result = pipeline.fit(example).transform(example)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._ // needed for toDS; assumes an active SparkSession named `spark`
+
+val document_assembler = new DocumentAssembler().setInputCol("text").setOutputCol("document")
+val tokenizer = new Tokenizer().setInputCols("document").setOutputCol("token")
+
+val zeroShotClassifier = DistilBertForZeroShotClassification.pretrained("distilbert_base_zero_shot_classifier_turkish_cased_multinli", "tr")
+  .setInputCols("document", "token")
+  .setOutputCol("class")
+  .setCaseSensitive(true)
+  .setMaxSentenceLength(512)
+  .setCandidateLabels(Array("ekonomi", "siyaset", "spor"))
+
+val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, zeroShotClassifier))
+val example = Seq("Dolar yükselmeye devam ediyor.").toDS.toDF("text")
+val result = pipeline.fit(example).transform(example)
+```
+</div>
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|distilbert_base_zero_shot_classifier_turkish_cased_multinli|
+|Compatibility:|Spark NLP 4.4.1+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[token, document]|
+|Output Labels:|[multi_class]|
+|Language:|tr|
+|Size:|254.3 MB|
+|Case sensitive:|true|