Skip to content

2024-11-26-mini_cpm_2b_8bit_xx #14466

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
86 changes: 86 additions & 0 deletions docs/_posts/ahmedlone127/2024-11-26-mini_cpm_2b_8bit_xx.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,86 @@
---
layout: model
title: mini_cpm_2b_8bit model from
author: John Snow Labs
name: mini_cpm_2b_8bit
date: 2024-11-26
tags: [en, open_source, pipeline, openvino, xx]
task: Text Generation
language: xx
edition: Spark NLP 5.5.1
spark_version: 3.0
supported: true
engine: openvino
annotator: CPMTransformer
article_header:
type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Pretrained CPMTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mini_cpm_2b_8bit` is a multilingual model originally trained by openbmb.

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mini_cpm_2b_8bit_xx_5.5.1_3.0_1732658809236.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mini_cpm_2b_8bit_xx_5.5.1_3.0_1732658809236.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
.setInputCol("text") \
.setOutputCol("document")

seq2seq = CPMTransformer.pretrained("mini_cpm_2b_8bit","xx") \
.setInputCols(["documents"]) \
.setOutputCol("generation")

pipeline = Pipeline().setStages([documentAssembler, seq2seq])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")

val seq2seq = CPMTransformer.pretrained("mini_cpm_2b_8bit","xx")
.setInputCols(Array("documents"))
.setOutputCol("generation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, seq2seq))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|mini_cpm_2b_8bit|
|Compatibility:|Spark NLP 5.5.1+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[documents]|
|Output Labels:|[generation]|
|Language:|xx|
|Size:|3.0 GB|

## References

https://huggingface.co/openbmb/MiniCPM-2B-dpo-bf16
86 changes: 86 additions & 0 deletions docs/_posts/ahmedlone127/2024-11-27-nllb_distilled_600M_8int_xx.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,86 @@
---
layout: model
title: nllb_distilled_600M_8int model from Facebook
author: John Snow Labs
name: nllb_distilled_600M_8int
date: 2024-11-27
tags: [en, open_source, pipeline, openvino, xx]
task: Text Generation
language: xx
edition: Spark NLP 5.5.1
spark_version: 3.0
supported: true
engine: openvino
annotator: NLLBTransformer
article_header:
type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Pretrained NLLBTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nllb_distilled_600M_8int` is a Multilingual model originally trained by facebook.

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nllb_distilled_600M_8int_xx_5.5.1_3.0_1732741416718.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nllb_distilled_600M_8int_xx_5.5.1_3.0_1732741416718.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
.setInputCol("text") \
.setOutputCol("document")

seq2seq = NLLBTransformer.pretrained("mini_cpm_2b_8bit","xx") \
.setInputCols(["documents"]) \
.setOutputCol("generation")

pipeline = Pipeline().setStages([documentAssembler, seq2seq])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")

val seq2seq = NLLBTransformer.pretrained("mini_cpm_2b_8bit","xx")
.setInputCols(Array("documents"))
.setOutputCol("generation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, seq2seq))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|nllb_distilled_600M_8int|
|Compatibility:|Spark NLP 5.5.1+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[documents]|
|Output Labels:|[generation]|
|Language:|xx|
|Size:|842.9 MB|

## References

https://huggingface.co/facebook/nllb-200-distilled-600M
86 changes: 86 additions & 0 deletions docs/_posts/ahmedlone127/2024-11-27-nomic_embed_v1_en.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,86 @@
---
layout: model
title: nomic_embed_v1 model from nomic-ai
author: John Snow Labs
name: nomic_embed_v1
date: 2024-11-27
tags: [en, open_source, openvino]
task: Embeddings
language: en
edition: Spark NLP 5.5.1
spark_version: 3.0
supported: true
engine: openvino
annotator: NomicEmbeddings
article_header:
type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Pretrained NomicEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mini_cpm_2b_8bit` is a multilingual model originally trained by openbmb.

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nomic_embed_v1_en_5.5.1_3.0_1732743647389.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nomic_embed_v1_en_5.5.1_3.0_1732743647389.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
.setInputCol("text") \
.setOutputCol("document")

embeddings = NomicEmbeddings.pretrained("nomic_embed_v1","en") \
.setInputCols(["document"]) \
.setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")

val embeddings = NomicEmbeddings.pretrained("nomic_embed_v1","en")
.setInputCols(Array("document"))
.setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|nomic_embed_v1|
|Compatibility:|Spark NLP 5.5.1+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[documents]|
|Output Labels:|[generation]|
|Language:|en|
|Size:|255.0 MB|

## References

https://huggingface.co/nomic-ai/nomic-embed-text-v1
86 changes: 86 additions & 0 deletions docs/_posts/ahmedlone127/2024-11-29-phi_3_mini_128k_instruct_en.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,86 @@
---
layout: model
title: phi_3_mini_128k_instruct model from microsoft
author: John Snow Labs
name: phi_3_mini_128k_instruct
date: 2024-11-29
tags: [en, open_source, openvino]
task: Text Generation
language: en
edition: Spark NLP 5.5.1
spark_version: 3.0
supported: true
engine: openvino
annotator: Phi3Transformer
article_header:
type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Pretrained Phi3Transformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phi_3_mini_128k_instruct` is a english model originally trained by openbmb.

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phi_3_mini_128k_instruct_en_5.5.1_3.0_1732897700551.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phi_3_mini_128k_instruct_en_5.5.1_3.0_1732897700551.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
.setInputCol("text") \
.setOutputCol("document")

seq2seq = Phi3Transformer.pretrained("phi_3_mini_128k_instruct","en") \
.setInputCols(["document"]) \
.setOutputCol("generation")

pipeline = Pipeline().setStages([documentAssembler, seq2seq])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")

val seq2seq = Phi3Transformer.pretrained("phi_3_mini_128k_instruct","en")
.setInputCols(Array("document"))
.setOutputCol("generation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, seq2seq))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|phi_3_mini_128k_instruct|
|Compatibility:|Spark NLP 5.5.1+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[documents]|
|Output Labels:|[generation]|
|Language:|en|
|Size:|3.5 GB|

## References

https://huggingface.co/microsoft/Phi-3-mini-128k-instruct
Loading