Use embeddings instead of keywords for World Info search #223
-
May be an interesting option for World Info, but Memory is always applied, so I don't see the application there. Not sure if sentence embeddings would be the best choice for representing World Info keys, given that those are named entities (usually singular nouns) and sentences are, well, sentences. Maybe sentence embedding models account for this, but if not, it'd probably be best to see if there's a model that can output confidence values for the presence of certain named entities. It miiiiiiiggghht also be possible to utilize model token encodings (if we can strip positional encodings) for zero overhead, but that's pretty out there. Whatever the case, it'd be a cool alternative to keyword matching, but it shouldn't replace it, as it'd probably cause substantial slowdowns with Dynamic WI (since we rescan the whole context after each token is generated) and on lower-end devices that use Kobold as a client for online/distributed services.
-
Hi, I want to suggest using embeddings for semantic search over World Info and possibly Memory. It should be pretty straightforward, and I would do it myself on a fork, but when I took a look at the code I realized how large it is and how long it would take me to wrap my head around it.
Here's some code showing how the embeddings can be created:
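A minimal sketch, assuming the sentence-transformers library and its all-MiniLM-L6-v2 model (any encoder with a similar interface would do; the example entries are made up):

```python
from sentence_transformers import SentenceTransformer

# Small, fast general-purpose encoder producing 384-dim vectors (assumed model choice)
model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical World Info entries
texts = [
    "Aldric is the exiled king of the northern realm.",
    "The Ember Blade is a cursed sword that burns whoever wields it.",
]

# encode() returns one fixed-size vector per input string; normalizing them now
# lets a plain dot product serve as cosine similarity later.
embeddings = model.encode(texts, normalize_embeddings=True)
print(embeddings.shape)  # (2, 384)
```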
These can be stored as .json files and loaded into a pandas DataFrame to perform cosine similarity search. `df` in this case is a pandas DataFrame with 'text' and 'embedding' columns.
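A sketch of that storage and lookup step, again assuming sentence-transformers; the file name and the `search` helper are just illustrative:

```python
import json

import numpy as np
import pandas as pd
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical World Info entries, embedded as in the previous snippet
texts = [
    "Aldric is the exiled king of the northern realm.",
    "The Ember Blade is a cursed sword that burns whoever wields it.",
]
embeddings = model.encode(texts, normalize_embeddings=True)

# Store the entries together with their embeddings as .json ...
with open("world_info_embeddings.json", "w") as f:
    json.dump([{"text": t, "embedding": e.tolist()} for t, e in zip(texts, embeddings)], f)

# ... and load them back into a DataFrame with 'text' and 'embedding' columns
with open("world_info_embeddings.json") as f:
    df = pd.DataFrame(json.load(f))

def search(query: str, top_k: int = 3) -> pd.DataFrame:
    """Rank World Info entries by cosine similarity to the query text."""
    q = model.encode(query, normalize_embeddings=True)
    matrix = np.vstack(df["embedding"].to_numpy())
    sims = matrix @ q  # vectors are L2-normalized, so the dot product is the cosine similarity
    return df.assign(similarity=sims).nlargest(top_k, "similarity")[["text", "similarity"]]

print(search("Who rules the north?"))
```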