Vector embeddings on logs #18801

Open
jonathanpv opened this issue Oct 8, 2023 · 2 comments
Labels: domain: enriching (anything related to enriching Vector's events with context data); type: feature (a value-adding code addition that introduces new functionality)

Comments

jonathanpv (Contributor) commented Oct 8, 2023

Vector embeddings support

Vector should support an embedding transform through VRL.

All this would need is configuration for the embedding endpoint to call and a field in which to store the output.

Why?
Vector embeddings make semantic search over data cheap. As a step toward AI-powered monitoring, Vector should support translating logs into vectors and adding them to events as a field.

Sample feature:

# vector.toml: configure the embedding endpoint, which accepts text and returns an embedding vector
embedding_endpoint = "https://api.openai.com/v1/embeddings"
embedding_endpoint_api_key = "sk-..."

# Sinks: some data stores (Pinecone, Weaviate, etc.) support vector search natively,
# so those sinks might need to be supported separately.

# In VRL, call the proposed function like so; it would pull the API key from vector.toml
.embedding = log_to_embedding(.log)
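To make the proposed `log_to_embedding` more concrete, here is a minimal Python sketch of what it might do for a single event. It assumes an OpenAI-style embeddings API: a JSON body with `model` and `input`, bearer-token auth, and the vector returned under `data[0].embedding`. The model name `text-embedding-3-small`, the `transport` parameter, and the helper names are assumptions for illustration only. This is not an existing Vector or VRL API.

```python
import json
import urllib.request
from typing import Callable

# Hypothetical settings mirroring the proposed vector.toml keys.
EMBEDDING_ENDPOINT = "https://api.openai.com/v1/embeddings"
EMBEDDING_MODEL = "text-embedding-3-small"  # assumption: any embeddings model would do


def build_request(text: str, api_key: str) -> urllib.request.Request:
    """Build a POST request asking the endpoint to embed one log line."""
    body = json.dumps({"model": EMBEDDING_MODEL, "input": text}).encode("utf-8")
    return urllib.request.Request(
        EMBEDDING_ENDPOINT,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_embedding(raw: bytes) -> list[float]:
    """Extract the first embedding vector from an OpenAI-style response body."""
    payload = json.loads(raw)
    return [float(x) for x in payload["data"][0]["embedding"]]


def send(req: urllib.request.Request) -> bytes:
    """Default transport: perform the real HTTP call."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()


def log_to_embedding(
    event: dict,
    api_key: str,
    transport: Callable[[urllib.request.Request], bytes] = send,
) -> dict:
    """Hypothetical equivalent of `.embedding = log_to_embedding(.log)`."""
    raw = transport(build_request(event["log"], api_key))
    enriched = dict(event)  # leave the input event untouched
    enriched["embedding"] = parse_embedding(raw)
    return enriched
```

Putting the HTTP call behind `transport` lets you exercise the request and response handling without network access. A real transform would likely also need batching and retries, since calling a remote endpoint once per log line is slow and costly.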

Use Cases

As a user, I can use natural language to search through my logs.

The end result: users get intelligent, natural-language search over their logs.

As a developer, I can implement semantic search quickly with vector.dev.

The result: a better developer experience.

For example:

A query such as "give me the failures from amazon in the last three hours" would return the most relevant logs.
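A minimal sketch of how such a query might be answered once logs carry embeddings: keep only events inside the time window, then rank them by cosine similarity to the query's embedding. It assumes the query text has already been embedded with the same model as the logs (not shown). The event field names (`embedding`, `timestamp`) and the toy 2-D vectors are illustrative assumptions.

```python
import math
from datetime import datetime, timedelta, timezone


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, events, now, window=timedelta(hours=3), k=5):
    """Return up to k events from the last `window`, most similar first."""
    # "In the last three hours" is applied as a structured timestamp filter,
    # because embedding similarity alone does not reliably encode time.
    recent = [e for e in events if timedelta(0) <= now - e["timestamp"] <= window]
    recent.sort(key=lambda e: cosine(query_vec, e["embedding"]), reverse=True)
    return recent[:k]
```

Splitting the query into a hard filter (time) and a soft ranking (meaning) is one common design for this kind of hybrid search. Vector stores such as Pinecone and Weaviate expose metadata filtering alongside similarity search for the same reason.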

Attempted Solutions

No response

Proposal

I propose investigating the referenced services to determine whether this is a quick implementation, whether it is not worth the investment, or whether it is already supported through a different customization.

References

This is an example endpoint that generates embedding vectors from text input. There are others, but OpenAI's is the most prevalent solution at the moment.

This is a service specialized in vector search

Pinecone is also a popular vector store

Users can also implement vector search with just a SQL database.
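As an illustration of the plain-SQL route, here is a minimal sketch using Python's built-in `sqlite3`. Embeddings are stored as JSON text, and a Python cosine-similarity function is registered with `create_function` so that the ranking happens inside the `SELECT`. The table layout and helper names are assumptions for illustration. This approach is a brute-force scan over every row. Dedicated options, such as the pgvector extension for PostgreSQL, add indexed similarity search for larger volumes.

```python
import json
import math
import sqlite3


def cosine_json(a_json: str, b_json: str) -> float:
    """Cosine similarity between two JSON-encoded float arrays."""
    a, b = json.loads(a_json), json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def open_store() -> sqlite3.Connection:
    """Create an in-memory log table with a SQL-callable cosine() function."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("cosine", 2, cosine_json)
    conn.execute(
        "CREATE TABLE logs (id INTEGER PRIMARY KEY, message TEXT, embedding TEXT)"
    )
    return conn


def insert_log(conn: sqlite3.Connection, message: str, embedding: list[float]) -> None:
    conn.execute(
        "INSERT INTO logs (message, embedding) VALUES (?, ?)",
        (message, json.dumps(embedding)),
    )


def nearest(conn: sqlite3.Connection, query_vec: list[float], k: int = 3):
    """Return (message, score) rows ordered by similarity to query_vec."""
    rows = conn.execute(
        "SELECT message, cosine(embedding, ?) AS score FROM logs "
        "ORDER BY score DESC LIMIT ?",
        (json.dumps(query_vec), k),
    )
    return rows.fetchall()
```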

Version

No response

jonathanpv added the type: feature label on Oct 8, 2023
jszwedko added the domain: enriching label on Oct 12, 2023
jonathanpv (Contributor, Author) commented:

Related project:
https://github.com/Anush008/fastembed-rs

jonathanpv (Contributor, Author) commented Apr 13, 2024

Value prop:

  • Users will want to handle logs closer to the edge so they can catch errors quickly, e.g., to decide whether something is a security flaw or not.

  • Vector embeddings could be supported through VRL, and on the edge via ONNX, so this could also be demoed in the VRL playground.

  • I imagine someone's VRL function handling an event: if the event is a certain error, it adds the label "error": "security flaw from hot path library" with an accompanying "error_embedding": [vector[78]].

Then, downstream, someone's UI or search could ask something like:

"do we have any errors recently in the hotpath?"
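A minimal sketch of the enrichment step described in the third bullet, written in Python rather than VRL. When an event looks like a hot-path security error, the function adds the `error` label and an `error_embedding` computed from the label plus the original message. The matching rule (`HOT_PATH_MARKERS`, `SECURITY_MARKERS`), the `message` field name, and the injected `embed` function are hypothetical stand-ins. A real pipeline might call an embedding endpoint or an on-edge ONNX model at that point.

```python
from typing import Callable, Sequence

# Hypothetical rule for what counts as a hot-path security error.
HOT_PATH_MARKERS = ("hotpath", "hot_path", "hot path")
SECURITY_MARKERS = ("cve-", "unauthorized", "injection")


def label_security_error(
    event: dict, embed: Callable[[str], Sequence[float]]
) -> dict:
    """Add `error` and `error_embedding` to matching events; pass others through."""
    lower = event.get("message", "").lower()
    is_hot = any(m in lower for m in HOT_PATH_MARKERS)
    is_sec = any(m in lower for m in SECURITY_MARKERS)
    if not (is_hot and is_sec):
        return event  # untouched events flow downstream unchanged
    enriched = dict(event)
    enriched["error"] = "security flaw from hot path library"
    # Embed the label together with the raw message so downstream
    # natural-language queries can match on either.
    enriched["error_embedding"] = list(embed(f"{enriched['error']}: {event['message']}"))
    return enriched
```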
