
Add AI connector blueprints for Aleph Alpha luminous-base embedding model #1925

Closed
ulan-yisaev opened this issue Jan 25, 2024 · 7 comments
Labels
documentation Improvements or additions to documentation enhancement New feature or request

Comments

@ulan-yisaev (Contributor) commented Jan 25, 2024

Is your feature request related to a problem?
I'm proposing to add an AI connector blueprint for the Aleph Alpha Luminous-Base Embedding Model to the current collection of remote inference blueprints in OpenSearch ML Commons.

This model is particularly effective for German language applications, providing nuanced and contextually relevant embeddings. Given the increasing demand for robust language model solutions in different languages, integrating this model could significantly enhance your offerings for German-language processing tasks.

What solution would you like?
A markdown file containing an AI connector blueprint for the Aleph Alpha luminous-base embedding model.

What alternatives have you considered?
I was able to write one myself, using the existing blueprints as a reference:

{
  "name": "Aleph Alpha Connector: luminous-base, representation: document",
  "description": "The connector to the Aleph Alpha luminous-base embedding model with representation: document",
  "version": "0.1",
  "protocol": "http",
  "parameters": {
    "endpoint": "api.aleph-alpha.com",
    "representation": "document",
    "normalize": true
  },
  "credential": {
    "AlephAlpha_API_Token": "XXXXXXXXXXXXXXXXXX"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://${parameters.endpoint}/semantic_embed",
      "headers": {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Authorization": "Bearer ${credential.AlephAlpha_API_Token}"
      },
      "request_body": "{ \"model\": \"luminous-base\", \"prompt\": \"${parameters.input}\", \"representation\": \"${parameters.representation}\", \"normalize\": ${parameters.normalize}}",
      "pre_process_function": "\n    StringBuilder builder = new StringBuilder();\n    builder.append(\"\\\"\");\n    String first = params.text_docs[0];\n    builder.append(first);\n    builder.append(\"\\\"\");\n    def parameters = \"{\" +\"\\\"input\\\":\" + builder + \"}\";\n    return  \"{\" +\"\\\"parameters\\\":\" + parameters + \"}\";",
      "post_process_function": "\n      def name = \"embedding\";\n      def dataType = \"FLOAT32\";\n      if (params.embedding == null || params.embedding.length == 0) {\n        return params.message;\n      }\n      def shape = [params.embedding.length];\n      def json = \"{\" +\n                 \"\\\"name\\\":\\\"\" + name + \"\\\",\" +\n                 \"\\\"data_type\\\":\\\"\" + dataType + \"\\\",\" +\n                 \"\\\"shape\\\":\" + shape + \",\" +\n                 \"\\\"data\\\":\" + params.embedding +\n                 \"}\";\n      return json;\n    "
    }
  ]
}
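For readers unfamiliar with Painless, the two script functions in the blueprint above simply build JSON strings by concatenation. The following Python sketch (an illustration of what the scripts produce, not part of the blueprint itself) mirrors that logic:

```python
import json

def pre_process(text_docs):
    # Mirrors the Painless pre_process_function: wrap the first
    # document in {"parameters": {"input": "<doc>"}}.
    first = text_docs[0]
    parameters = '{' + '"input":' + '"' + first + '"' + '}'
    return '{' + '"parameters":' + parameters + '}'

def post_process(embedding):
    # Mirrors the Painless post_process_function: wrap the returned
    # embedding array in the tensor-style JSON shape ML Commons expects
    # (name, data_type, shape, data).
    shape = [len(embedding)]
    return (
        '{'
        '"name":"embedding",'
        '"data_type":"FLOAT32",'
        f'"shape":{shape},'
        f'"data":{embedding}'
        '}'
    )
```

Note that, like the Painless originals, `pre_process` does no escaping, so a document containing a double quote would break the generated JSON; the blueprint inherits the same limitation.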
@ulan-yisaev ulan-yisaev added enhancement New feature or request untriaged labels Jan 25, 2024
@saratvemulapalli (Member) commented:

Thanks @ulan-yisaev. Do you want to contribute the change in a PR?

@saratvemulapalli saratvemulapalli added the documentation Improvements or additions to documentation label Jan 25, 2024
@ulan-yisaev (Contributor, Author) commented:

Hi @saratvemulapalli ,
Sure thing, I'll be happy to contribute.

@ramda1234786 commented:

Hi @ulan-yisaev, I see you have used embedding models outside of Cohere, Bedrock, and OpenAI. I have been trying something similar with a Hugging Face text generation model's post_process_function, but have not been able to get it working. Any idea how to achieve this post_process_function?

I have this

[
    {
        "generated_text": "Your Generated text"
    }
]

and I want to convert it to the following using a post_process_function:

    {
        "completion": "Your Generated text"
    }

This is what I have tried so far:

"post_process_function": "\n def json = \"{\" +\n \"\\\"completion\\\":\\\"\" + params['response'][0].generated_text + \"\\\" }\";\n return json;\n "

@saratvemulapalli, do you have any ideas on this as well?

@ulan-yisaev (Contributor, Author) commented:

Hi @ramda1234786 ,
Please note that I haven't tested generation models, as my work primarily focuses on embedding models. But I suppose you could try the following function:

"post_process_function": "\n def generatedText = params.response[0].generated_text;\n def json = \"{\\\"completion\\\":\\\"\" + generatedText + \"\\\"}\";\n return json;\n"
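One general caveat with this concatenation approach (a side note, not tested against ml-commons): if the generated text contains double quotes or newlines, the concatenated result is invalid JSON. A Python sketch of the same transform, using a JSON library so escaping is handled for you:

```python
import json

def to_completion(response):
    # Convert [{"generated_text": "..."}] into {"completion": "..."},
    # letting json.dumps escape quotes and newlines that naive string
    # concatenation would pass through unescaped.
    return json.dumps({"completion": response[0]["generated_text"]})
```

In the Painless version you would need to escape `generated_text` yourself before concatenating it into the output string.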

@ramda1234786 commented:

Thanks for your response, @ulan-yisaev. I tried the function below, but no luck.

I get this from the Predict API without a post_process_function:

{
    "inference_results": [
        {
            "output": [
                {
                    "name": "response",
                    "dataAsMap": {
                        "response": [
                            {
                                "generated_text": "The title is Rush, year is 2013, budget is 500000, earning is 300000, genere is action.What is the budget of Rush?\n\nThe budget of Rush is 500000...................."
                            }
                        ]
                    }
                }
            ],
            "status_code": 200
        }
    ]
}

but when I add the post_process_function you suggested, I get the error below:

{
    "error": {
        "root_cause": [
            {
                "type": "script_exception",
                "reason": "runtime error",
                "script_stack": [
                    "generatedText = params.response[0].generated_text;\n def ",
                    "                      ^---- HERE"
                ],
                "script": " ...",
                "lang": "painless",
                "position": {
                    "offset": 28,
                    "start": 6,
                    "end": 62
                }
            }
        ],
        "type": "script_exception",
        "reason": "runtime error",
        "script_stack": [
            "generatedText = params.response[0].generated_text;\n def ",
            "                      ^---- HERE"
        ],
        "script": " ...",
        "lang": "painless",
        "position": {
            "offset": 28,
            "start": 6,
            "end": 62
        },
        "caused_by": {
            "type": "null_pointer_exception",
            "reason": "Cannot invoke \"Object.getClass()\" because \"callArgs[0]\" is null"
        }
    },
    "status": 400
}

Not sure how to fix this.
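The `null_pointer_exception` at `params.response[0]` suggests `params.response` is null inside the script, even though the raw Predict output nests the list under `dataAsMap.response`. How ml-commons exposes those fields to the script is not verified here, but the hard failure itself can be avoided with a null check before indexing. A Python sketch of that defensive lookup (key names taken from the response above; the fallback order is an assumption):

```python
def extract_generated_text(params):
    # Defensive lookup mirroring what a Painless null check would do:
    # try params["response"] first, fall back to the nested
    # dataAsMap.response shape seen in the raw Predict output, and
    # return None instead of raising when neither is present.
    response = params.get("response")
    if response is None:
        response = params.get("dataAsMap", {}).get("response")
    if not response:
        return None
    return response[0].get("generated_text")
```

In Painless the equivalent guard would be an explicit `if (params.response == null)` branch (or the `?.` null-safe operator) before dereferencing `[0].generated_text`.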

@mashah commented Jan 29, 2024

Version 2.12 is still under development. If you need RAG with OpenSearch my recommendation is to try Sycamore. We know that path works, though it's using 2.11. Once 2.12 is ready, we will have that working with Sycamore as well.

@b4sjoo b4sjoo moved this to Untriaged in ml-commons projects Jan 30, 2024
@ylwu-amzn ylwu-amzn moved this from Untriaged to In Progress in ml-commons projects Feb 2, 2024
@HenryL27 (Collaborator) commented Apr 9, 2024

Closing as the connector blueprint was added

@HenryL27 HenryL27 closed this as completed Apr 9, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in ml-commons projects Apr 9, 2024
@github-project-automation github-project-automation bot moved this to Planned work items in Test roadmap format Apr 19, 2024