[FEATURE] Allow more content type for neural query with multimodal #474

Open
martin-gaievski opened this issue Oct 25, 2023 · 2 comments
Labels: enhancement, Features (Introduces a new unit of functionality that satisfies a requirement), neural-search

Comments

@martin-gaievski
Member

Is your feature request related to a problem?

Currently, neural-search supports only text and image fields for generating embeddings, in both ingestion and search. Content of other types, such as audio or video, is not supported today. For example, a search query accepts only the query_text and query_image fields.
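To make the limitation concrete, here is a minimal sketch of a neural query body as I understand its current shape. The helper name `build_neural_query`, the vector field name, and the model id are placeholders for illustration, not part of the plugin. The point is that the clause has slots for text and image only, and nothing for audio or video.

```python
from typing import Optional


def build_neural_query(vector_field: str, model_id: str, k: int = 10,
                       query_text: Optional[str] = None,
                       query_image: Optional[str] = None) -> dict:
    """Build a neural query body from the two content fields that exist today.

    Hypothetical helper: vector_field and model_id are caller-supplied
    placeholders, not values defined by the plugin.
    """
    if query_text is None and query_image is None:
        raise ValueError("provide query_text, query_image, or both")
    clause = {"model_id": model_id, "k": k}
    if query_text is not None:
        clause["query_text"] = query_text
    if query_image is not None:
        # Assumed here to be a base64-encoded image string.
        clause["query_image"] = query_image
    # There is no audio or video counterpart to add at this point.
    return {"query": {"neural": {vector_field: clause}}}
```

For example, `build_neural_query("vector_embedding", "my-model", query_text="wild animal")` yields a body that could be sent to a search request, assuming an index with a matching vector field.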

What solution would you like?

The ability to pass content such as audio or video for both data ingestion and search.

What alternatives have you considered?

We can use other solutions to generate embeddings for audio or video content, and then post-process the results from OpenSearch and the other systems.
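As a rough illustration of what that post-processing could look like, the sketch below merges two score lists on the client side. Everything here is an assumption for illustration: the function names, the min-max normalization, and the weighted sum are one possible choice, not something neural-search provides or prescribes.

```python
def min_max_normalize(scores: dict) -> dict:
    """Rescale scores to [0, 1] so two systems become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every hit as equally strong.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def merge_results(opensearch_scores: dict, external_scores: dict,
                  weight: float = 0.5) -> list:
    """Combine doc_id -> score maps from OpenSearch and an external system.

    weight is the share given to the OpenSearch scores; a document
    missing from one system contributes 0.0 for that system.
    """
    a = min_max_normalize(opensearch_scores)
    b = min_max_normalize(external_scores)
    combined = {
        doc_id: weight * a.get(doc_id, 0.0) + (1 - weight) * b.get(doc_id, 0.0)
        for doc_id in set(a) | set(b)
    }
    # Highest combined score first; ties broken by document id.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
```

This is the kind of glue code the feature request would make unnecessary, since it has to be maintained outside OpenSearch.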

Do you have any additional context?

This would be a good extension of #318.

@martin-gaievski added the Features, untriaged, and enhancement labels on Oct 25, 2023
@vamshin removed the untriaged label on Oct 30, 2023
@Sanjana679

For videos, does it make sense to extract all the frames in a video and then generate an embedding for each frame? Likewise, for audio, would it make sense to transcribe the audio and then generate embeddings from the transcript?

I imagine there are issues with these approaches, but these were my first thoughts and I was wondering if anyone had suggestions for something better.
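The per-frame idea could be sketched roughly as follows. Note that this samples one frame per interval rather than embedding every frame, which is my own variation to keep the number of embeddings manageable. `embed_frame` is a hypothetical callable standing in for whatever image embedding model is used, and `frames` is assumed to be an already decoded sequence of frames.

```python
from typing import Callable, List, Sequence, Tuple


def sample_frame_indices(total_frames: int, fps: float,
                         every_seconds: float = 1.0) -> List[int]:
    """Pick the indices of frames spaced roughly every_seconds apart."""
    if total_frames <= 0 or fps <= 0 or every_seconds <= 0:
        return []
    step = max(1, round(fps * every_seconds))
    return list(range(0, total_frames, step))


def embed_video(frames: Sequence, fps: float,
                embed_frame: Callable[[object], List[float]],
                every_seconds: float = 1.0) -> List[Tuple[int, List[float]]]:
    """Return (frame_index, embedding) pairs for the sampled frames.

    embed_frame is a hypothetical stand-in for an image embedding model.
    """
    indices = sample_frame_indices(len(frames), fps, every_seconds)
    return [(i, embed_frame(frames[i])) for i in indices]
```

Each pair could then be indexed as its own document, with the frame index retained so a hit can be mapped back to a position in the video.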

@heemin32
Collaborator

For videos, generating an embedding for each frame makes sense. For audio, transcription would lose some information, such as the intonation or volume of the audio.

Projects
Status: Backlog