Commit 29bcc6b

maziyarpanahi, jsl-models, ahmedlone127, prabod, and DevinTDHa authored
Models hub (#13940)
---------

Co-authored-by: ahmedlone127 <ahmedlone127@gmail.com>
Co-authored-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

* 2023-08-15-gte_base_en (#13922)

* Add model 2023-08-15-gte_base_en
* Add model 2023-08-15-gte_large_en
* Add model 2023-08-15-gte_small_en

---------

Co-authored-by: maziyarpanahi <maziyar.panahi@iscpif.fr>

* 2023-08-15-bge_small_en (#13923)

* Add model 2023-08-15-bge_small_en
* Add model 2023-08-15-bge_base_en
* Add model 2023-08-15-bge_large_en

---------

Co-authored-by: maziyarpanahi <maziyar.panahi@iscpif.fr>

* 2023-08-18-mpnet_embedding_mpnet_snli_en (#13929)

* Add model 2023-08-18-mpnet_embedding_mpnet_snli_en
* Add model 2023-08-18-mpnet_embedding_Setfit_few_shot_classifier_en
* Add model 2023-08-18-mpnet_embedding_multi_qa_mpnet_base_dot_v1_eclass_en
* Add model 2023-08-18-mpnet_embedding_all_mpnet_base_v2_embedding_all_en
* Add model 2023-08-18-mpnet_embedding_multi_qa_mpnet_base_dot_v1_eclass_en
* Add model 2023-08-18-mpnet_embedding_PatentSBERTa_en
* Add model 2023-08-18-mpnet_embedding_ecolo_pas_ecolo_v0.1_en
* Add model 2023-08-18-mpnet_embedding_Setfit_few_shot_classifier_en
* Add model 2023-08-18-mpnet_embedding_all_mpnet_base_v2_finetuned_v2_en
* Add model 2023-08-18-mpnet_embedding_FewShotIssueClassifier_NLBSE23_en
* Add model 2023-08-18-mpnet_embedding_nooks_amd_detection_v2_full_en
* Add model 2023-08-18-mpnet_embedding_action_policy_plans_classifier_en
* Add model 2023-08-18-mpnet_embedding_cross_all_mpnet_base_v2_finetuned_WebNLG2020_metric_average_en
* Add model 2023-08-18-mpnet_embedding_setfit_zero_shot_classification_pbsp_p3_func_en
* Add model 2023-08-18-mpnet_embedding_ouvrage_classif_en
* Add model 2023-08-18-mpnet_embedding_tiny_random_MPNetForTokenClassification_en
* Add model 2023-08-18-mpnet_embedding_review_intent_20230116_en
* Add model 2023-08-18-mpnet_embedding_spiced_en
* Add model 2023-08-18-mpnet_embedding_mpnet_snli_negatives_en
* Add model 2023-08-18-mpnet_embedding_review_multiclass_20230116_en
* Add model 2023-08-18-mpnet_embedding_contradiction_psb_lds_en
* Add model 2023-08-18-mpnet_embedding_setfit_alpaca_es_unprocessable_sample_detection_es
* Add model 2023-08-18-mpnet_embedding_multi_qa_mpnet_base_dot_v1_legal_finetune_en
* Add model 2023-08-18-mpnet_embedding_nps_psb_lds_en
* Add model 2023-08-18-mpnet_embedding_ATTACK_BERT_en
* Add model 2023-08-18-mpnet_embedding_setfit_ethos_multilabel_example_en
* Add model 2023-08-18-mpnet_embedding_mpnet_snli_en
* Add model 2023-08-18-mpnet_embedding_multi_QA_v1_mpnet_asymmetric_A_en
* Add model 2023-08-18-mpnet_embedding_contradiction_psb_en
* Add model 2023-08-18-mpnet_embedding_sb_temfac_en
* Add model 2023-08-18-mpnet_embedding_setfit_finetuned_financial_text_en
* Add model 2023-08-18-mpnet_embedding_setfit_zero_shot_classification_pbsp_p1_comm_en
* Add model 2023-08-18-mpnet_embedding_setfit_zero_shot_classification_pbsp_p1_life_en
* Add model 2023-08-18-mpnet_embedding_setfit_model_en
* Add model 2023-08-18-mpnet_embedding_multi_qa_mpnet_base_cos_v1_en
* Add model 2023-08-18-mpnet_embedding_java_deprecation_classifier_en
* Add model 2023-08-18-mpnet_embedding_my_awesome_setfit_model_98_en
* Add model 2023-08-18-mpnet_embedding_DomainAdaptM2_en
* Add model 2023-08-18-mpnet_embedding_mpnet_retriever_squad2_en
* Add model 2023-08-18-mpnet_embedding_tiny_random_MPNetForQuestionAnswering_en
* Add model 2023-08-18-mpnet_embedding_sml_ukr_word_classifier_medium_en
* Add model 2023-08-18-mpnet_embedding_all_mpnet_base_v1_en
* Add model 2023-08-18-mpnet_embedding_nooks_amd_detection_realtime_en
* Add model 2023-08-18-mpnet_embedding_setfit_model_test_sensitve_v1_en
* Add model 2023-08-18-mpnet_embedding_multi_qa_mpnet_base_dot_v1_en
* Add model 2023-08-18-mpnet_embedding_due_eshop_21_en
* Add model 2023-08-18-mpnet_embedding_setfit_ag_news_endpoint_en
* Add model 2023-08-18-mpnet_embedding_setfit_ds_version_0_0_4_en
* Add model 2023-08-18-mpnet_embedding_nli_mpnet_base_v2_en
* Add model 2023-08-18-mpnet_embedding_sbert_paper_en
* Add model 2023-08-18-mpnet_embedding_test_food_en
* Add model 2023-08-18-mpnet_embedding_labels_per_job_title_fine_tune_en
* Add model 2023-08-18-mpnet_embedding_keyphrase_mpnet_v1_en
* Add model 2023-08-18-mpnet_embedding_setfit_occupation_en
* Add model 2023-08-18-mpnet_embedding_python_developmentnotes_classifier_en
* Add model 2023-08-18-mpnet_embedding_tiny_random_MPNetForSequenceClassification_en
* Add model 2023-08-18-mpnet_embedding_due_eshop_21_multilabel_en
* Add model 2023-08-18-mpnet_embedding_retriever_coding_guru_adapted_en
* Add model 2023-08-18-mpnet_embedding_negation_categories_classifier_es
* Add model 2023-08-18-mpnet_embedding_paraphrase_mpnet_base_v2_fuzzy_matcher_en
* Add model 2023-08-18-mpnet_embedding_Sentiment140_fewshot_en
* Add model 2023-08-18-mpnet_embedding_all_mpnet_base_v2_ftlegal_v3_en
* Add model 2023-08-18-mpnet_embedding_setfit_zero_shot_classification_pbsp_p3_sev_en
* Add model 2023-08-18-mpnet_embedding_java_usage_classifier_en
* Add model 2023-08-18-mpnet_embedding_mpnet_base_snli_mnli_en
* Add model 2023-08-18-mpnet_embedding_mpnet_base_articles_ner_en
* Add model 2023-08-18-mpnet_embedding_sn_mpnet_base_snli_mnli_en
* Add model 2023-08-18-mpnet_embedding_biencoder_all_mpnet_base_v2_mmarcoFR_fr
* Add model 2023-08-18-mpnet_embedding_python_expand_classifier_en
* Add model 2023-08-18-mpnet_embedding_all_datasets_v4_mpnet_base_en
* Add model 2023-08-18-mpnet_embedding_pharo_collaborators_classifier_en
* Add model 2023-08-18-mpnet_embedding_setfit_ds_version_0_0_5_en
* Add model 2023-08-18-mpnet_embedding_stackoverflow_mpnet_base_en
* Add model 2023-08-18-mpnet_embedding_paraphrase_mpnet_base_v2_SetFit_sst2_en
* Add model 2023-08-18-mpnet_embedding_all_mpnet_base_v2_table_en
* Add model 2023-08-18-mpnet_embedding_InvoiceOrNot_en
* Add model 2023-08-18-mpnet_embedding_python_usage_classifier_en
* Add model 2023-08-18-mpnet_embedding_all_mpnet_base_v2_feature_extraction_pipeline_en
* Add model 2023-08-18-mpnet_embedding_ikitracs_mitigation_en
* Add model 2023-08-18-mpnet_embedding_pharo_example_classifier_en
* Add model 2023-08-18-mpnet_embedding_tiny_random_MPNetModel_en
* Add model 2023-08-18-mpnet_embedding_mpnet_base_en
* Add model 2023-08-18-mpnet_embedding_all_mpnet_base_v2_tasky_classification_en
* Add model 2023-08-18-mpnet_embedding_pharo_responsibilities_classifier_en
* Add model 2023-08-18-mpnet_embedding_all_mpnet_base_v2_for_sb_clustering_en
* Add model 2023-08-18-mpnet_embedding_setfit_zero_shot_classification_pbsp_p3_trig_en
* Add model 2023-08-18-mpnet_embedding_setfit_zero_shot_classification_pbsp_p1_likes_en
* Add model 2023-08-18-mpnet_embedding_python_summary_classifier_en
* Add model 2023-08-18-mpnet_embedding_few_shot_model_en
* Add model 2023-08-18-mpnet_embedding_CPU_Mitigation_Classifier_en
* Add model 2023-08-18-mpnet_embedding_all_mpnet_base_v2_feature_extraction_en
* Add model 2023-08-18-mpnet_embedding_mpnet_adaptation_mitigation_classifier_en
* Add model 2023-08-18-mpnet_embedding_java_summary_classifier_en
* Add model 2023-08-18-mpnet_embedding_579_STmodel_product_rem_v3a_en
* Add model 2023-08-18-mpnet_embedding_SetFit_all_data_en
* Add model 2023-08-18-mpnet_embedding_setfit_zero_shot_classification_pbsp_p4_meas_en
* Add model 2023-08-18-mpnet_embedding_setfit_ostrom_en
* Add model 2023-08-18-mpnet_embedding_sentence_transformers_bible_reference_final_en
* Add model 2023-08-18-mpnet_embedding_all_datasets_v3_mpnet_base_en
* Add model 2023-08-18-mpnet_embedding_ikitracs_conditional_en
* Add model 2023-08-18-mpnet_embedding_java_ownership_classifier_en
* Add model 2023-08-18-mpnet_embedding_setfit_zero_shot_classification_pbsp_p4_rel_en
* Add model 2023-08-18-mpnet_embedding_setfit_zero_shot_classification_pbsp_q8a_azure_gpt35_en
* Add model 2023-08-18-mpnet_embedding_setfit_zero_shot_classification_pbsp_p4_time_en
* Add model 2023-08-18-mpnet_embedding_java_pointer_classifier_en
* Add model 2023-08-18-mpnet_embedding_finetunned_sbert_en
* Add model 2023-08-18-mpnet_embedding_setfit_model_Feb11_Misinformation_on_Law_en
* Add model 2023-08-18-mpnet_embedding_python_parameters_classifier_en
* Add model 2023-08-18-mpnet_embedding_pharo_keyimplementationpoints_classifier_en
* Add model 2023-08-18-mpnet_embedding_setfit_ft_sentinent_eval_en
* Add model 2023-08-18-mpnet_embedding_reddit_single_context_mpnet_base_en
* Add model 2023-08-18-mpnet_embedding_setfit_zero_shot_classification_pbsp_p1_en
* Add model 2023-08-18-mpnet_embedding_java_expand_classifier_en
* Add model 2023-08-18-mpnet_embedding_setfit_zero_shot_classification_pbsp_p4_achiev_en
* Add model 2023-08-18-mpnet_embedding_BioLORD_STAMB2_v1_en
* Add model 2023-08-18-mpnet_embedding_setfit_zero_shot_classification_pbsp_p3_bhvr_en
* Add model 2023-08-18-mpnet_embedding_fail_detect_en
* Add model 2023-08-18-mpnet_embedding_CPU_Target_Classifier_en
* Add model 2023-08-18-mpnet_embedding_kw_classification_setfit_model_en
* Add model 2023-08-18-mpnet_embedding_paraphrase_mpnet_base_v2_en
* Add model 2023-08-18-mpnet_embedding_abstract_sim_sentence_en
* Add model 2023-08-18-mpnet_embedding_setfit_ds_version_0_0_2_en
* Add model 2023-08-18-mpnet_embedding_mpnet_mnr_v2_fine_tuned_en
* Add model 2023-08-18-mpnet_embedding_kw_classification_setfithead_model_en
* Add model 2023-08-18-mpnet_embedding_due_retail_25_en
* Add model 2023-08-18-mpnet_embedding_mpnet_multilabel_sector_classifier_en
* Add model 2023-08-18-mpnet_embedding_vulnerable_groups_en
* Add model 2023-08-18-mpnet_embedding_abstract_sim_query_en
* Add model 2023-08-18-mpnet_embedding_biencoder_multi_qa_mpnet_base_cos_v1_mmarcoFR_fr
* Add model 2023-08-18-mpnet_embedding_stsb_mpnet_base_v2_en
* Add model 2023-08-18-mpnet_embedding_covid_qa_mpnet_en
* Add model 2023-08-18-mpnet_embedding_CPU_Economywide_Classifier_en
* Add model 2023-08-18-mpnet_embedding_github_issues_mpnet_st_e10_en
* Add model 2023-08-18-mpnet_embedding_esci_jp_mpnet_crossencoder_en
* Add model 2023-08-18-mpnet_embedding_CPU_Netzero_Classifier_en
* Add model 2023-08-18-mpnet_embedding_eth_setfit_payment_model_en
* Add model 2023-08-18-mpnet_embedding_PDFSegs_en
* Add model 2023-08-18-mpnet_embedding_setfit_zero_shot_classification_pbsp_p3_dur_en
* Add model 2023-08-18-mpnet_embedding_multi_qa_v1_mpnet_cls_dot_en
* Add model 2023-08-18-mpnet_embedding_setfit_zero_shot_classification_pbsp_p4_specific_en
* Add model 2023-08-18-mpnet_embedding_setfit_zero_shot_classification_pbsp_p3_cons_en
* Add model 2023-08-18-mpnet_embedding_CPU_Conditional_Classifier_en
* Add model 2023-08-18-mpnet_embedding_tiny_random_MPNetForMaskedLM_en
* Add model 2023-08-18-mpnet_embedding_PatentSBERTa_V2_en
* Add model 2023-08-18-mpnet_embedding_sml_ukr_message_classifier_en
* Add model 2023-08-18-mpnet_embedding_CPU_Transport_GHG_Classifier_en
* Add model 2023-08-18-mpnet_embedding_mpnet_nli_sts_en
* Add model 2023-08-18-mpnet_embedding_initial_model_v3_en
* Add model 2023-08-18-mpnet_embedding_multi_QA_v1_mpnet_asymmetric_Q_en
* Add model 2023-08-18-mpnet_embedding_paraphrase_mpnet_base_v2_finetuned_polifact_en
* Add model 2023-08-18-mpnet_embedding_test_model_001_en
* Add model 2023-08-18-mpnet_embedding_java_rational_classifier_en
* Add model 2023-08-18-mpnet_embedding_setfit_ds_version_0_0_1_en
* Add model 2023-08-18-mpnet_embedding_initial_model_en
* Add model 2023-08-18-mpnet_embedding_github_issues_preprocessed_mpnet_st_e10_en

---------

Co-authored-by: ahmedlone127 <ahmedlone127@gmail.com>

* 2023-08-22-asr_whisper_tiny_opt_xx (#13931)

* Add model 2023-08-22-asr_whisper_tiny_opt_xx
* Add model 2023-08-22-asr_whisper_tiny_xx
* Update 2023-08-22-asr_whisper_tiny_opt_xx.md
* Update 2023-08-22-asr_whisper_tiny_xx.md

---------

Co-authored-by: DevinTDHa <duc.hatrung95@gmail.com>
Co-authored-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>

* 2023-08-25-e5_small_en (#13939)

* Add model 2023-08-25-e5_small_en
* Add model 2023-08-25-e5_small_opt_en
* Add model 2023-08-25-e5_small_quantized_en
* Add model 2023-08-25-e5_base_en
* Add model 2023-08-25-e5_small_v2_opt_en
* Add model 2023-08-25-e5_base_opt_en
* Add model 2023-08-25-e5_base_quantized_en
* Add model 2023-08-25-e5_small_v2_en
* Add model 2023-08-25-e5_small_v2_quantized_en
* Add model 2023-08-25-e5_base_v2_en
* Add model 2023-08-25-e5_base_v2_opt_en
* Add model 2023-08-25-e5_base_v2_quantized_en
* Add model 2023-08-25-e5_large_v2_en

---------

Co-authored-by: ahmedlone127 <ahmedlone127@gmail.com>

---------

Co-authored-by: jsl-models <74001263+jsl-models@users.noreply.github.com>
Co-authored-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
Co-authored-by: ahmedlone127 <ahmedlone127@gmail.com>
Co-authored-by: prabod <prabod@rathnayaka.me>
Co-authored-by: DevinTDHa <duc.hatrung95@gmail.com>
1 parent b46148f commit 29bcc6b

167 files changed: +14468 -0 lines changed

@@ -0,0 +1,109 @@
---
layout: model
title: Official whisper-tiny Optimized
author: John Snow Labs
name: asr_whisper_tiny_opt
date: 2023-08-22
tags: [whisper, en, audio, open_source, asr, onnx, xx]
task: Automatic Speech Recognition
language: xx
edition: Spark NLP 5.1.0
spark_version: 3.0
supported: true
engine: onnx
annotator: WhisperForCTC
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Official pretrained Whisper model, adapted from Hugging Face Transformers and curated to provide scalability and production-readiness using Spark NLP.

This is a multilingual model and supports the following languages:

Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.

## Predicted Entities

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_tiny_opt_xx_5.1.0_3.0_1692721787993.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_tiny_opt_xx_5.1.0_3.0_1692721787993.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
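
The archive names in the download links above appear to follow a fixed layout: model name, language code, Spark NLP version, Spark version, and a 13-digit number that looks like a Unix timestamp in milliseconds. The sketch below splits a link into those parts. The layout is inferred from the links on this page rather than from a documented convention, and `parse_archive_name` and `ARCHIVE_PATTERN` are hypothetical names used only for illustration.

```python
import re
from datetime import datetime, timezone

# Assumed layout, inferred from the download links on this page:
#   <model name>_<language>_<Spark NLP version>_<Spark version>_<epoch millis>.zip
ARCHIVE_PATTERN = re.compile(
    r"^(?P<name>.+)_(?P<lang>[a-z]{2,3})_(?P<sparknlp>\d+\.\d+\.\d+)"
    r"_(?P<spark>\d+\.\d+)_(?P<millis>\d{13})\.zip$"
)


def parse_archive_name(url):
    """Split a model archive URL into its (assumed) components."""
    filename = url.rsplit("/", 1)[-1]
    match = ARCHIVE_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected archive name: {filename}")
    parts = match.groupdict()
    # Interpret the trailing number as milliseconds since the Unix epoch (assumption).
    parts["uploaded"] = datetime.fromtimestamp(
        int(parts["millis"]) / 1000, tz=timezone.utc
    )
    return parts


info = parse_archive_name(
    "s3://auxdata.johnsnowlabs.com/public/models/"
    "asr_whisper_tiny_opt_xx_5.1.0_3.0_1692721787993.zip"
)
print(info["name"], info["lang"], info["sparknlp"], info["spark"])
print(info["uploaded"].date())
```

Under that reading, the decoded date for this archive matches the `date` field in the front matter (2023-08-22).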

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Start (or reuse) a Spark session with Spark NLP loaded.
spark = sparknlp.start()

# Wrap the raw audio samples in an annotation the model can consume.
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

# Download the pretrained multilingual Whisper model.
speechToText = WhisperForCTC.pretrained("asr_whisper_tiny_opt", "xx") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])

# rawFloats must be defined beforehand: a list of float audio samples.
processedAudioFloats = spark.createDataFrame([[rawFloats]]).toDF("audio_content")
result = pipeline.fit(processedAudioFloats).transform(processedAudioFloats)
result.select("text.result").show(truncate=False)
```
```scala
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotators._
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler: AudioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText: WhisperForCTC = WhisperForCTC
  .pretrained("asr_whisper_tiny_opt", "xx")
  .setInputCols("audio_assembler")
  .setOutputCol("text")

val pipeline: Pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

// Read the audio samples: one float per line (first comma-separated field).
val bufferedSource =
  scala.io.Source.fromFile("src/test/resources/audio/txt/librispeech_asr_0.txt")

val rawFloats = bufferedSource
  .getLines()
  .map(_.split(",").head.trim.toFloat)
  .toArray
bufferedSource.close()

val processedAudioFloats = Seq(rawFloats).toDF("audio_content")

val result = pipeline.fit(processedAudioFloats).transform(processedAudioFloats)
result.select("text.result").show(truncate = false)
```
</div>
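
The Python example uses `rawFloats` without showing where it comes from, while the Scala example reads it from a text file with one sample per line. A minimal Python sketch of the same parsing step might look like this. `load_raw_floats` is a hypothetical helper, the demo values are made up, and unlike the Scala snippet this version skips blank lines instead of failing on them.

```python
import os
import tempfile


def load_raw_floats(path):
    """Read one audio sample per line, keeping the first comma-separated field."""
    samples = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            samples.append(float(line.split(",")[0].strip()))
    return samples


# Demonstration with a throwaway file; the sample values are invented.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("0.25\n-0.5, ignored\n0.125\n")
    demo_path = tmp.name

rawFloats = load_raw_floats(demo_path)
os.remove(demo_path)
print(rawFloats)  # [0.25, -0.5, 0.125]
```

The resulting list can then be wrapped in a one-row DataFrame with an `audio_content` column, as the Python example does.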

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|asr_whisper_tiny_opt|
|Compatibility:|Spark NLP 5.1.0+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[audio_assembler]|
|Output Labels:|[text]|
|Language:|xx|
|Size:|242.7 MB|
@@ -0,0 +1,109 @@
---
layout: model
title: Official whisper-tiny
author: John Snow Labs
name: asr_whisper_tiny
date: 2023-08-22
tags: [whisper, en, audio, open_source, asr, xx, tensorflow]
task: Automatic Speech Recognition
language: xx
edition: Spark NLP 5.1.0
spark_version: 3.0
supported: true
engine: tensorflow
annotator: WhisperForCTC
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Official pretrained Whisper model, adapted from Hugging Face Transformers and curated to provide scalability and production-readiness using Spark NLP.

This is a multilingual model and supports the following languages:

Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.

## Predicted Entities

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_whisper_tiny_xx_5.1.0_3.0_1692723111563.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_whisper_tiny_xx_5.1.0_3.0_1692723111563.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Start (or reuse) a Spark session with Spark NLP loaded.
spark = sparknlp.start()

# Wrap the raw audio samples in an annotation the model can consume.
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

# Download the pretrained multilingual Whisper model.
speechToText = WhisperForCTC.pretrained("asr_whisper_tiny", "xx") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])

# rawFloats must be defined beforehand: a list of float audio samples.
processedAudioFloats = spark.createDataFrame([[rawFloats]]).toDF("audio_content")
result = pipeline.fit(processedAudioFloats).transform(processedAudioFloats)
result.select("text.result").show(truncate=False)
```
```scala
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotators._
import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
import org.apache.spark.ml.Pipeline

val audioAssembler: AudioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText: WhisperForCTC = WhisperForCTC
  .pretrained("asr_whisper_tiny", "xx")
  .setInputCols("audio_assembler")
  .setOutputCol("text")

val pipeline: Pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))

// Read the audio samples: one float per line (first comma-separated field).
val bufferedSource =
  scala.io.Source.fromFile("src/test/resources/audio/txt/librispeech_asr_0.txt")

val rawFloats = bufferedSource
  .getLines()
  .map(_.split(",").head.trim.toFloat)
  .toArray
bufferedSource.close()

val processedAudioFloats = Seq(rawFloats).toDF("audio_content")

val result = pipeline.fit(processedAudioFloats).transform(processedAudioFloats)
result.select("text.result").show(truncate = false)
```
</div>
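
Both examples take the audio as an array of floats. If the recording is a WAV file rather than a text dump, one way to obtain such an array with only the Python standard library is sketched below. It assumes mono 16-bit PCM input and scales samples to the range [-1, 1). Whisper models are generally trained on 16 kHz audio, so other sample rates may need resampling first, which this sketch does not do. `load_wav_as_floats` is a hypothetical helper, not part of Spark NLP.

```python
import array
import os
import sys
import tempfile
import wave


def load_wav_as_floats(path):
    """Return (samples, sample_rate) for a mono 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        sample_rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    pcm = array.array("h")
    pcm.frombytes(frames)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    # Scale signed 16-bit integers to floats in [-1, 1).
    return [value / 32768.0 for value in pcm], sample_rate


# Demonstration: write a tiny synthetic WAV file, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.wav")
    demo_pcm = array.array("h", [0, 16384, -16384])
    if sys.byteorder == "big":
        demo_pcm.byteswap()
    with wave.open(demo_path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(16000)
        wav.writeframes(demo_pcm.tobytes())
    rawFloats, sample_rate = load_wav_as_floats(demo_path)

print(rawFloats, sample_rate)  # [0.0, 0.5, -0.5] 16000
```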

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|asr_whisper_tiny|
|Compatibility:|Spark NLP 5.1.0+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[audio_assembler]|
|Output Labels:|[text]|
|Language:|xx|
|Size:|156.6 MB|
@@ -0,0 +1,88 @@
---
layout: model
title: English mpnet_embedding_579_STmodel_product_rem_v3a TFMPNetModel from jamiehudson
author: John Snow Labs
name: mpnet_embedding_579_STmodel_product_rem_v3a
date: 2023-08-18
tags: [mpnet, en, open_source, tensorflow]
task: Embeddings
language: en
edition: Spark NLP 5.1.0
spark_version: 3.0
supported: true
engine: tensorflow
annotator: MPNetEmbeddings
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Pretrained MPNet model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `mpnet_embedding_579_STmodel_product_rem_v3a` is an English model originally trained by jamiehudson.

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mpnet_embedding_579_STmodel_product_rem_v3a_en_5.1.0_3.0_1692379340262.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mpnet_embedding_579_STmodel_product_rem_v3a_en_5.1.0_3.0_1692379340262.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import MPNetEmbeddings
from pyspark.ml import Pipeline

spark = sparknlp.start()

# Example input; replace with your own DataFrame that has a "text" column.
data = spark.createDataFrame([["I love Spark NLP"]]).toDF("text")

document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("documents")

instruction = MPNetEmbeddings \
    .pretrained("mpnet_embedding_579_STmodel_product_rem_v3a", "en") \
    .setInputCols(["documents"]) \
    .setOutputCol("mpnet_embeddings")

pipeline = Pipeline(stages=[
    document_assembler,
    instruction,
])

pipelineModel = pipeline.fit(data)

pipelineDF = pipelineModel.transform(data)
```
```scala
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.embeddings.MPNetEmbeddings
import org.apache.spark.ml.Pipeline

// Example input; replace with your own DataFrame that has a "text" column.
val data = Seq("I love Spark NLP").toDF("text")

val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("documents")

val instruction = MPNetEmbeddings
  .pretrained("mpnet_embedding_579_STmodel_product_rem_v3a", "en")
  .setInputCols(Array("documents"))
  .setOutputCol("mpnet_embeddings")

val pipeline = new Pipeline().setStages(Array(document_assembler, instruction))

val pipelineModel = pipeline.fit(data)

val pipelineDF = pipelineModel.transform(data)
```
</div>
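
The pipeline above produces one embedding vector per document. A common way to use such vectors is to compare them with cosine similarity, for example to rank candidate texts against a query. The sketch below illustrates that with short made-up vectors. Real MPNet embeddings are much longer, and neither `cosine_similarity` nor the toy data comes from Spark NLP.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real embeddings (hypothetical data).
query = [1.0, 0.0, 1.0]
candidates = {
    "doc_a": [2.0, 0.0, 2.0],  # same direction as the query
    "doc_b": [0.0, 3.0, 0.0],  # orthogonal to the query
    "doc_c": [1.0, 1.0, 0.0],  # partially aligned
}

# Rank candidates from most to least similar to the query.
ranked = sorted(
    candidates,
    key=lambda name: cosine_similarity(query, candidates[name]),
    reverse=True,
)
print(ranked)  # ['doc_a', 'doc_c', 'doc_b']
```

Cosine similarity ignores vector length, so `doc_a` scores highest even though its magnitude differs from the query's.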

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|mpnet_embedding_579_STmodel_product_rem_v3a|
|Compatibility:|Spark NLP 5.1.0+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[documents]|
|Output Labels:|[mpnet_embeddings]|
|Language:|en|
|Size:|410.2 MB|
@@ -0,0 +1,88 @@
---
layout: model
title: English mpnet_embedding_ATTACK_BERT TFMPNetModel from basel
author: John Snow Labs
name: mpnet_embedding_ATTACK_BERT
date: 2023-08-18
tags: [mpnet, en, open_source, tensorflow]
task: Embeddings
language: en
edition: Spark NLP 5.1.0
spark_version: 3.0
supported: true
engine: tensorflow
annotator: MPNetEmbeddings
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Pretrained MPNet model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `mpnet_embedding_ATTACK_BERT` is an English model originally trained by basel.

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mpnet_embedding_ATTACK_BERT_en_5.1.0_3.0_1692376584683.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mpnet_embedding_ATTACK_BERT_en_5.1.0_3.0_1692376584683.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import MPNetEmbeddings
from pyspark.ml import Pipeline

spark = sparknlp.start()

# Example input; replace with your own DataFrame that has a "text" column.
data = spark.createDataFrame([["I love Spark NLP"]]).toDF("text")

document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("documents")

instruction = MPNetEmbeddings \
    .pretrained("mpnet_embedding_ATTACK_BERT", "en") \
    .setInputCols(["documents"]) \
    .setOutputCol("mpnet_embeddings")

pipeline = Pipeline(stages=[
    document_assembler,
    instruction,
])

pipelineModel = pipeline.fit(data)

pipelineDF = pipelineModel.transform(data)
```
```scala
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.embeddings.MPNetEmbeddings
import org.apache.spark.ml.Pipeline

// Example input; replace with your own DataFrame that has a "text" column.
val data = Seq("I love Spark NLP").toDF("text")

val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("documents")

val instruction = MPNetEmbeddings
  .pretrained("mpnet_embedding_ATTACK_BERT", "en")
  .setInputCols(Array("documents"))
  .setOutputCol("mpnet_embeddings")

val pipeline = new Pipeline().setStages(Array(document_assembler, instruction))

val pipelineModel = pipeline.fit(data)

val pipelineDF = pipelineModel.transform(data)
```
</div>

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|mpnet_embedding_ATTACK_BERT|
|Compatibility:|Spark NLP 5.1.0+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[documents]|
|Output Labels:|[mpnet_embeddings]|
|Language:|en|
|Size:|409.8 MB|
