[SPARKNLP-1091] AutoGGUFModel embeddings support #14433

Open
wants to merge 9 commits into base: master


DevinTDHa
Member

@DevinTDHa DevinTDHa commented Oct 12, 2024

Description

This PR enables proper embedding support for AutoGGUFModels with a new annotator called AutoGGUFEmbeddings. The returned annotations will then contain an embedding vector, similar to the other sentence embedding annotators.
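To illustrate what an embedding-bearing annotation makes possible downstream, here is a minimal sketch in plain Python. It does not call Spark NLP: the `Annotation` dataclass is a hypothetical stand-in that only mirrors the usual annotation fields, the `sentence_embeddings` type string is assumed from the comparison to other sentence embedding annotators, and the 3-dimensional vectors are made up for illustration.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import Dict, List


@dataclass
class Annotation:
    """Hypothetical stand-in mirroring the fields of a Spark NLP annotation."""
    annotator_type: str
    begin: int
    end: int
    result: str
    metadata: Dict[str, str] = field(default_factory=dict)
    embeddings: List[float] = field(default_factory=list)


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


# Toy vectors, invented for this sketch; a real model emits far larger ones.
docs = [
    Annotation("sentence_embeddings", 0, 13, "Spark NLP text", embeddings=[1.0, 0.0, 1.0]),
    Annotation("sentence_embeddings", 0, 11, "Another text", embeddings=[1.0, 0.0, 0.0]),
]

# Compare the two documents through the vectors carried by their annotations.
similarity = cosine_similarity(docs[0].embeddings, docs[1].embeddings)
print(f"{similarity:.4f}")
```

In an actual pipeline the vectors would come from the annotator's output column rather than being written by hand; the linked notebook shows the real end-to-end usage.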

This PR also contains an end-to-end example notebook:
https://github.com/JohnSnowLabs/spark-nlp/blob/b59a339164d2a2c37633e2c9ec12762134c5c2c6/examples/python/llama.cpp/llama.cpp_in_Spark_NLP_AutoGGUFEmbeddings.ipynb

The pretrained model is available at
#14448

How Has This Been Tested?

Old and new tests pass on both the Scala and Python sides.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • Code improvements with no or little impact
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

@DevinTDHa DevinTDHa added the new-feature Introducing a new feature label Oct 12, 2024
@DevinTDHa DevinTDHa changed the title from "Feauture/sparknlp 1080 autogguf embeddings" to "[SPARKNLP-1080] AutoGGUFModel embeddings support" Oct 12, 2024
@maziyarpanahi maziyarpanahi changed the base branch from master to release/551-release-candidate October 18, 2024 16:33
@maziyarpanahi
Member

@DevinTDHa let's make the changes and have this feature as AutoGGUFEmbeddings annotator instead of merging this and then reverting it back.

@DevinTDHa
Member Author

> @DevinTDHa let's make the changes and have this feature as AutoGGUFEmbeddings annotator instead of merging this and then reverting it back.

Hi @maziyarpanahi, sounds good to me! I will update this PR to include the new annotator.

@DevinTDHa DevinTDHa marked this pull request as draft October 25, 2024 08:15
@DevinTDHa DevinTDHa force-pushed the feauture/SPARKNLP-1080-autogguf-embeddings branch from bed0f06 to 13c06a8 November 2, 2024 13:17
@DevinTDHa DevinTDHa force-pushed the feauture/SPARKNLP-1080-autogguf-embeddings branch from 7fd370f to b59a339 November 2, 2024 14:33
@DevinTDHa DevinTDHa changed the base branch from release/551-release-candidate to master November 2, 2024 14:34
@DevinTDHa DevinTDHa added the DON'T MERGE Do not merge this PR label Nov 2, 2024
@DevinTDHa
Member Author

@maziyarpanahi I have updated this PR to include the functionality as a new annotator, AutoGGUFEmbeddings.

@DevinTDHa DevinTDHa marked this pull request as ready for review November 2, 2024 14:36
@DevinTDHa DevinTDHa changed the title from "[SPARKNLP-1080] AutoGGUFModel embeddings support" to "[SPARKNLP-1091] AutoGGUFModel embeddings support" Nov 2, 2024
Labels
DON'T MERGE Do not merge this PR new-feature Introducing a new feature