
Embedding model and Engine?? #62

Open
muhtalhakhan opened this issue Oct 31, 2023 · 6 comments

Comments

@muhtalhakhan

Hey guys,

I am moving from GPT to Mistral, and I'm running into one problem: I haven't been able to find an embedding model and engine for Mistral yet.

I am using the service from DeepInfra.

Here's the code snippet I wrote for GPT:

import openai

def get_embedding(text, model="text-embedding-ada-002"):
  # Collapse newlines; the embeddings endpoint expects a non-empty string.
  text = text.replace("\n", " ")
  if not text:
    text = "this is blank"
  return openai.Embedding.create(
          input=[text], model=model)['data'][0]['embedding']


if __name__ == '__main__':
#   gpt_parameter = {"engine": "text-davinci-003", "max_tokens": 50, 
#                    "temperature": 0, "top_p": 1, "stream": False,
#                    "frequency_penalty": 0, "presence_penalty": 0, 
#                    "stop": ['"']}
  gpt_parameter = {"max_tokens": 50, 
                   "temperature": 0, "top_p": 1, "stream": False,
                   "frequency_penalty": 0, "presence_penalty": 0, 
                   "stop": ['"']}

All I want to know is: which embedding model and engine should I use?
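For concreteness, a minimal sketch of the direction I'm considering: pointing the same helper at DeepInfra's OpenAI-compatible endpoint. The base URL and the embedding model name below are assumptions I haven't verified against DeepInfra's docs:

import openai

# Assumption: DeepInfra exposes an OpenAI-compatible API at this base URL
# and hosts BAAI/bge-base-en-v1.5 as an embedding model. Check DeepInfra's
# model list before relying on either.
openai.api_base = "https://api.deepinfra.com/v1/openai"
openai.api_key = "YOUR_DEEPINFRA_API_KEY"

def get_embedding_deepinfra(text, model="BAAI/bge-base-en-v1.5"):
  text = text.replace("\n", " ")
  if not text:
    text = "this is blank"
  return openai.Embedding.create(
          input=[text], model=model)['data'][0]['embedding']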

Thank you 🙂

@praveen555

There is no dedicated embedding model as such.

For each input sentence, you have to tokenize it using the tokenizer provided by Mistral and then pass those tokens to the model.

Check out the example below, taken from the Mistral code:

import numpy as np
import torch
import tqdm

# `data`, `tokenizer`, and `model` are assumed to be set up as in the
# Mistral tutorial this snippet comes from.
with torch.no_grad():
    featurized_x = []
    # compute an embedding for each sentence
    for i, (x, y) in tqdm.tqdm(enumerate(data)):
        tokens = tokenizer.encode(x, bos=True)
        tensor = torch.tensor(tokens).to(model.device)
        features = model.forward_partial(tensor, [len(tokens)])  # (n_tokens, model_dim)
        featurized_x.append(features.float().mean(0).cpu().detach().numpy())

# concatenate sentence embeddings
X = np.concatenate([x[None] for x in featurized_x], axis=0)  # (n_points, model_dim)
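If you'd rather not wire up the reference implementation, a roughly equivalent self-contained sketch using Hugging Face transformers (my substitution, not the Mistral tutorial's code) does the same mean-pooling over the last hidden states:

import torch
from transformers import AutoModel, AutoTokenizer

# Assumption: the Hugging Face checkpoint "mistralai/Mistral-7B-v0.1".
# AutoModel returns the bare transformer without the LM head, so
# last_hidden_state plays the role of forward_partial's output above.
model_name = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")
model.eval()

def embed(text):
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, n_tokens, model_dim)
    # mean-pool over tokens, as in the snippet above
    return hidden.float().mean(dim=1).squeeze(0).cpu().numpy()  # (model_dim,)

Stacking embed(s) over your sentences with np.stack reproduces the (n_points, model_dim) matrix X from above.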

@muhtalhakhan
Author

Is there any working example that could help me better understand the code?

I am getting some lines back from Mistral as prompt output, and I want them to be embedded.

@praveen555

Check the tutorial example provided in the Mistral repository folder; the code I gave earlier is from there.

@muhtalhakhan
Author

> Check the tutorial example provided in the Mistral repository folder; the code I gave earlier is from there.

Thanks, but I didn't find anything useful there. I was just experimenting with prompts and then passing the outputs to another function to be embedded.

@zhzfight

Hi dude, have you solved the problem?

@muhtalhakhan
Author

> Hi dude, have you solved the problem?

Hey, I tried, but I did not get a good enough response from the model.
