Extending Neural Search pipeline to Named entity recognition and other metadata extracting models #134

navneet1v · 2023-03-13T18:07:07Z

Copying the customer request from Forum post: https://forum.opensearch.org/t/extending-neural-search-pipeline-to-named-entity-recognition-and-other-metadata-extracting-models/13078

I have a use case that involves running a named entity recognition model on documents and queries at index time and at query time. Documents will be filtered by comparing their extracted entities against the entities extracted from the query. The pipeline will work like the existing neural search pipeline, with one difference: in this use case, queries and documents are passed through a NER (named entity recognition) model and enriched with extra metadata such as entities, instead of vectors provided by an embedding model.

So it would help if we could extend the neural-search pipeline to include model(s) that extract named entities, embeddings, image segments (finding image components for image search), etc. That way the query/document picks up enough metadata from the various models listed in my neural search pipeline before matching.
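To make the filtering idea above concrete, here is a minimal sketch in plain Python. It is not an existing plugin API: `extract_entities` is a hypothetical stand-in for a NER model (a tiny word-list lookup), and `KNOWN_ENTITIES`, `enrich`, and `filter_by_entities` are names made up for illustration.

```python
from typing import Dict, List, Set

# Assumption: a toy gazetteer stands in for a real NER model.
KNOWN_ENTITIES = {"opensearch", "amazon", "seattle"}


def extract_entities(text: str) -> Set[str]:
    """Hypothetical NER call: return the known entities found in the text."""
    tokens = {tok.strip(".,!?").lower() for tok in text.split()}
    return tokens & KNOWN_ENTITIES


def enrich(doc: Dict) -> Dict:
    """Index-time step: store entities as metadata instead of a vector."""
    return {**doc, "entities": sorted(extract_entities(doc["text"]))}


def filter_by_entities(query: str, docs: List[Dict]) -> List[Dict]:
    """Query-time step: keep documents sharing at least one entity with the query."""
    query_entities = extract_entities(query)
    return [d for d in docs if query_entities & set(d["entities"])]


docs = [
    enrich({"id": 1, "text": "OpenSearch is developed at Amazon."}),
    enrich({"id": 2, "text": "Cats sleep a lot."}),
]
hits = filter_by_entities("Who maintains OpenSearch?", docs)
```

In this sketch a query with no recognized entities matches nothing; a real implementation would probably want to fall back to an unfiltered search in that case.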

Please add a +1 if you are looking for this feature. If possible, add a comment explaining your use case.

navneet1v · 2023-03-13T18:12:34Z

@ylwu-amzn does the ML plugin API support named entity recognition models?

@MShyani how do you think this would impact indexing and queries?

MilindShyani · 2023-03-13T18:18:22Z

I am not sure what's the best way to implement this. Perhaps one method would be to use a cross encoder model.

In this architecture, you first retrieve the top k documents d_i for a query and then pass each pair (q, d_i), where i ranges from 1 to k, to the model. This model, which can be an NER model, can be used to rerank the passages. I don't think this is straightforward to implement with the current plugins. It is also computationally expensive, since the transformer makes k passes.
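A rough sketch of that retrieve-then-rerank flow, in plain Python rather than any plugin API. Everything here is an assumption for illustration: `retrieve_top_k` uses simple term overlap as the first stage, and `toy_entity_score` is a hypothetical placeholder for the cross encoder or NER model.

```python
from typing import Callable, List, Tuple


def retrieve_top_k(query: str, corpus: List[str], k: int) -> List[str]:
    """First stage: cheap retrieval by lowercase term overlap with the query."""
    q_terms = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda d: len(q_terms & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def rerank(
    query: str,
    candidates: List[str],
    score_pair: Callable[[str, str], float],
) -> List[Tuple[float, str]]:
    """Second stage: one model call per (q, d_i) pair, so k calls in total."""
    scored = [(score_pair(query, d), d) for d in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def toy_entity_score(query: str, doc: str) -> float:
    """Hypothetical scorer: count capitalized tokens shared by query and doc."""
    def caps(text: str) -> set:
        return {w.strip(".,?") for w in text.split() if w[:1].isupper()}
    return float(len(caps(query) & caps(doc)))


corpus = [
    "paris is a city in france",
    "Paris Hilton visited Berlin",
    "The city of Paris hosts the Louvre",
    "bananas are yellow",
]
candidates = retrieve_top_k("hotels in Paris city", corpus, k=3)
ranked = rerank("hotels in Paris city", candidates, toy_entity_score)
```

The cost concern shows up in `rerank`: the scorer runs once per candidate, so a transformer in that position does k forward passes per query.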

Note that there is another way, where a model reads the queries, finds the named entities, and looks for those entities in the document corpus. But this is (almost) exactly what a neural retriever does when it creates a vector for the query and looks for nearest neighbors!

There could be other ways, but I can't think of any off the top of my head.

navneet1v · 2023-03-13T18:24:27Z

@MilindShyani thanks for the update.

Let me do some research on how NER models work and see if I can come up with a proposed solution that can be added as a feature in the Neural Search plugin.

ylwu-amzn · 2023-03-13T18:49:27Z

ml-commons doesn't support named entity recognition models right now.

prasadnu · 2023-03-14T14:34:28Z

To be a bit more clear, I was thinking the neural search pipeline could be extended so that it can be used not only for retrieving vectors from an embedding model, but also for retrieving any other metadata, such as entities (for both docs and queries), from a NER model.

Right now, before creating a neural search pipeline, we have to upload and load an ML model that provides embeddings (refer to screenshot). This is limited to models that provide embeddings. If it could be extended so that we can upload any metadata model, like NER, and use that model to create a neural search pipeline, it would be generic.

CodeAKrome · 2023-03-23T05:32:15Z

I'm doing NER by putting my opensearch data stream through a container which injects the entities during forwarding. So [data src] -> [injector] -> [opensearch/_bulk]. Would this be of any use to anyone, do you think? I looked at the PRs and poked around a bit and didn't see anything but this thread. I'm pulling RSS feeds. My goal is to get this working in kubernetes so I can scale it.
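For anyone curious what the injector step might look like, here is a minimal sketch under stated assumptions, not the actual container code. It only builds the newline-delimited JSON body that `_bulk` expects (an action line followed by the document source, with a trailing newline). The `toy_extract` function and the `rss-items` index name are hypothetical placeholders.

```python
import json
from typing import Callable, Dict, Iterable, List


def toy_extract(text: str) -> List[str]:
    """Hypothetical NER stand-in: treat capitalized words as entities."""
    return [w.strip(".,") for w in text.split() if w[:1].isupper()]


def build_bulk_body(
    docs: Iterable[Dict],
    index: str,
    extract: Callable[[str], List[str]],
) -> str:
    """Inject an 'entities' field into each doc and emit a _bulk NDJSON body."""
    lines = []
    for doc in docs:
        enriched = {**doc, "entities": extract(doc["text"])}
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(enriched))                      # source line
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body(
    [{"text": "OpenSearch runs on Kubernetes"}],
    index="rss-items",  # assumption: illustrative index name
    extract=toy_extract,
)
```

The resulting string would be sent as the request body to `/_bulk`; sending it is left out here so the sketch stays independent of any particular HTTP client.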

rs-amundaware · 2023-08-23T05:35:22Z

https://www.elastic.co/blog/how-to-deploy-nlp-named-entity-recognition-ner-example
ES provides this solution. Do we have, or can we have, this feature in OpenSearch as well? Please let me know if it already exists.

navneet1v · 2023-08-23T05:42:54Z

@rs-amundaware I think there is an issue in ML-Commons that tracks adding new types of models via the ML Commons plugin: opensearch-project/ml-commons#1164

rs-amundaware · 2023-08-24T04:25:21Z

@navneet1v Thanks. Yes, eagerly waiting for that feature.

q-andy · 2025-01-10T00:26:54Z

Hi, could you assign this to me?

navneet1v added the Enhancements (Increases software capabilities beyond original client specifications) and untriaged labels Mar 13, 2023

navneet1v removed the untriaged label Mar 13, 2023

navneet1v added the backlog (All the backlog features should be marked with this label) label Mar 22, 2023

navneet1v added the Features (Introduces a new unit of functionality that satisfies a requirement) label Mar 28, 2023

navneet1v added this to Vector Search RoadMap Oct 6, 2023

github-project-automation bot moved this to Backlog in Vector Search RoadMap Oct 6, 2023

heemin32 added this to Neural Search RoadMap Dec 26, 2024

heemin32 removed this from Vector Search RoadMap Dec 26, 2024

heemin32 moved this to Backlog in Neural Search RoadMap Dec 26, 2024

minalsha assigned vibrantvarun Jan 9, 2025

heemin32 assigned q-andy and unassigned vibrantvarun Jan 10, 2025

heemin32 added the neural-search label Jan 10, 2025

heemin32 moved this from Backlog to Backlog(Hot) in Neural Search RoadMap Jan 10, 2025

heemin32 moved this from Backlog(Hot) to 3.0 in Neural Search RoadMap Jan 11, 2025
