Vector embeddings on logs #18801

jonathanpv · 2023-10-08T18:27:48Z

Vector embeddings support

Vector should support an embedding transform through VRL.

The only configuration needed would be the embedding endpoint to use and where to store the output.

Why?
Semantic search through data can be powered cheaply using vector embeddings. As a step towards AI-powered monitoring, Vector should support translating logs to embedding vectors and adding them to the event as a field.

Sample feature:

# In vector.toml, we configure the embedding endpoint, which accepts text and returns an embedding vector
embedding_endpoint = "https://api.openai.com/v1/embeddings"
embedding_endpoints_api_key = "sk-..."


# Sinks: some data stores, such as Pinecone and Weaviate, support vector search natively.
# Perhaps we would need to support those sinks separately.

# In VRL, we would just call it like so, and it would pull the API key from vector.toml
.embedding = log_to_embedding(.log)
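
To make the proposed flow concrete, here is a rough Python sketch of what such a transform would do to an event. It is not Vector's API: `log_to_embedding` does not exist in VRL today, and `add_embedding`, `toy_embed`, and the field names are hypothetical. The `embed` argument stands in for a client of the configured endpoint; a toy embedder is used so the example runs offline.

```python
from typing import Any, Callable, Dict, List

# Any function that turns text into a list of floats (hypothetical interface).
Embedder = Callable[[str], List[float]]


def add_embedding(
    event: Dict[str, Any],
    embed: Embedder,
    source_field: str = "log",
    target_field: str = "embedding",
) -> Dict[str, Any]:
    """Return a copy of `event` with `target_field` set to the embedding of `source_field`.

    Events without a non-empty string in `source_field` are returned unchanged.
    """
    enriched = dict(event)
    text = event.get(source_field)
    if isinstance(text, str) and text:
        enriched[target_field] = embed(text)
    return enriched


def toy_embed(text: str) -> List[float]:
    """Stand-in for a real embedding endpoint: counts letters, digits, and other characters."""
    letters = sum(c.isalpha() for c in text)
    digits = sum(c.isdigit() for c in text)
    other = len(text) - letters - digits
    return [float(letters), float(digits), float(other)]


event = {"log": "disk 90% full", "host": "web-1"}
enriched = add_embedding(event, toy_embed)
```

Passing the embedder in as a function keeps the transform independent of any one provider, so a hosted endpoint or a local model could be swapped in behind the same call.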

Use Cases

As a user, I can use natural language to search through my logs.

The end result will give users intelligent, natural-language search over their logs.

As a developer, I can implement semantic search quickly with vector.dev.

The result is a better developer experience.

For example:

A search query like "give me the failures from amazon in the last three hours"
can return the most relevant logs.
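
A minimal sketch of how such a query could be answered once logs carry embeddings: embed the query, then rank stored logs by cosine similarity. The three-dimensional vectors below are made up for illustration (real embeddings have hundreds or thousands of dimensions), and `top_k` is a hypothetical helper. The "last three hours" part of the query would presumably be an ordinary timestamp filter applied before or after ranking; it is not shown here.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    logs: Sequence[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[float, str]]:
    """Return the k (score, message) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), msg) for msg, vec in logs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up embeddings: the first axis loosely means "amazon failure".
logs = [
    ("upstream amazon s3 request failed: 503", [0.9, 0.1, 0.0]),
    ("user login succeeded", [0.0, 0.2, 0.9]),
    ("amazon sqs timeout, retrying", [0.8, 0.3, 0.1]),
]
query_vec = [1.0, 0.2, 0.0]  # pretend embedding of the search query
results = top_k(query_vec, logs, k=2)
```

With these vectors, the two Amazon failure messages rank above the login message.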

Attempted Solutions

No response

Proposal

I propose investigating the referenced services to see whether this is a quick implementation, not worth the investment, or already supported through a different configuration.

References

This is an example endpoint that generates embedding vectors from text input. There are others, but OpenAI is the most prevalent solution at the moment.

https://platform.openai.com/docs/guides/embeddings/what-are-embeddings

This is a service specialized in vector search

https://weaviate.io/

Pinecone is also a popular vector store

https://www.pinecone.io/learn/vector-database/

Users can implement vector search using just a SQL database as well.
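
As a sketch of the SQL-only approach, the example below stores embeddings as JSON text in SQLite and ranks rows with a user-defined `cosine_sim` SQL function registered from Python. The table layout, function name, and two-dimensional vectors are assumptions made for illustration. This does a full table scan per query, so it only suits small datasets; databases with a dedicated vector index would handle larger ones.

```python
import json
import math
import sqlite3
from typing import List, Sequence


def _cosine_json(a_json: str, b_json: str) -> float:
    """Cosine similarity of two vectors passed as JSON arrays (0.0 if either is all zeros)."""
    a, b = json.loads(a_json), json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL under the (hypothetical) name cosine_sim.
conn.create_function("cosine_sim", 2, _cosine_json)
conn.execute(
    "CREATE TABLE logs (id INTEGER PRIMARY KEY, message TEXT NOT NULL, embedding TEXT NOT NULL)"
)

# Made-up two-dimensional embeddings.
rows = [
    ("payment service returned 500", [0.9, 0.1]),
    ("cache warmed in 120ms", [0.1, 0.9]),
]
conn.executemany(
    "INSERT INTO logs (message, embedding) VALUES (?, ?)",
    [(message, json.dumps(vec)) for message, vec in rows],
)


def search(conn: sqlite3.Connection, query_vec: Sequence[float], k: int = 1) -> List[str]:
    """Return the k messages whose embeddings are most similar to query_vec."""
    cursor = conn.execute(
        "SELECT message FROM logs ORDER BY cosine_sim(embedding, ?) DESC LIMIT ?",
        (json.dumps(list(query_vec)), k),
    )
    return [row[0] for row in cursor.fetchall()]


matches = search(conn, [1.0, 0.0], k=1)
```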

Version

No response


jonathanpv · 2024-03-24T16:22:38Z

Related project:
https://github.com/Anush008/fastembed-rs

jonathanpv · 2024-04-13T05:54:37Z

Value prop:

Users will want to process logs closer to the edge to catch errors quickly, e.g. to decide whether something is a security flaw or not.
Vector embeddings can be supported through VRL and on the edge using ONNX, so it's possible to demo this within the VRL playground as well.
I imagine someone's VRL function handling an event: if the event is a certain error, it adds the label "error": "security flaw from hot path library", with an accompanying "error_embedding": [vector[78]]

Downstream, someone's UI or search could then answer a query like:

"do we have any errors recently in the hot path?"

jonathanpv added the "type: feature" label on Oct 8, 2023

jszwedko added the "domain: enriching" label on Oct 12, 2023
