Extending Neural Search pipeline to Named entity recognition and other metadata extracting models #134
Comments
@ylwu-amzn does the ML plugin API support named entity recognition models? @MShyani how do you think this would impact indexing and queries?
I am not sure what the best way to implement this is. One method might be to use a cross-encoder model. In this architecture, you first retrieve the top k documents d_i for a query q and then pass each pair (q, d_i), where i ranges from 1 to k, to the model. This model, which can be an NER model, can be used to rerank the passages. I don't think this is straightforward to implement with the current plugins, and it is also computationally expensive (since the transformer makes k passes). Note that there is another way: a model can read the query, find the named entities, and look for those entities in the document corpus. But this is (almost) exactly what a neural retriever does when it creates a vector for the query and looks for nearest neighbors! There could be other ways, but I can't think of any off the top of my head.
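The retrieve-then-rerank flow described in this comment can be sketched roughly as follows. This is only an illustration, not an existing plugin feature: `rerank` and `toy_entity_overlap` are hypothetical names, and the toy scorer (shared capitalized tokens) merely stands in for a real cross-encoder or NER model that would score each (q, d_i) pair.

```python
from typing import Callable, List, Tuple


def toy_entity_overlap(query: str, doc: str) -> float:
    """Hypothetical stand-in for a model: count capitalized tokens shared by q and d."""
    q_entities = {t for t in query.split() if t[:1].isupper()}
    d_entities = {t for t in doc.split() if t[:1].isupper()}
    return float(len(q_entities & d_entities))


def rerank(
    query: str,
    candidates: List[str],
    score_pair: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Score every (q, d_i) pair and return the candidates sorted best-first.

    The scorer is called once per candidate, which is the "k passes"
    cost mentioned above.
    """
    scored = [(doc, score_pair(query, doc)) for doc in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    top_k = ["Apple releases phone", "Amazon opens office"]  # assumed first-stage results
    for doc, score in rerank("news about Amazon", top_k, toy_entity_overlap):
        print(score, doc)
```

In a real setup the first-stage `top_k` list would come from a k-NN or BM25 query, and `score_pair` would wrap a model call.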
@MilindShyani thanks for the update. Let me do some research on how NER models work and see if I can come up with a proposed solution that can be added as a feature in the Neural Search plugin.
ml-commons doesn't currently support named entity recognition models.
To be a bit more clear, I was thinking that the neural search pipeline could be extended so that it can be used not only for retrieving vectors from an embedding model, but also for retrieving any other metadata, such as entities (for both docs and queries), from an NER model. Currently, before creating a neural search pipeline, we have to upload and load an ML model that provides embeddings (refer to screenshot). This is limited to models that provide embeddings; if it could be extended to upload any metadata model, such as NER, and use that model to create a neural search pipeline, it would be generic.
I'm doing NER by putting my opensearch data stream through a container which injects the entities during forwarding. So [data src] -> [injector] -> [opensearch/_bulk]. Would this be of any use to anyone, do you think? I looked at the PRs and poked around a bit and didn't see anything but this thread. I'm pulling RSS feeds. My goal is to get this working in kubernetes so I can scale it.
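A rough sketch of what such an injector step might look like, assuming each document carries its content in a `text` field and that entities are written to an `entities` field. Both field names, the function names, and the toy extractor are assumptions for illustration; the commenter's actual container is not shown in this thread. The output follows the newline-delimited action/source format that the `_bulk` endpoint expects.

```python
import json
from typing import Callable, Dict, Iterable, List


def toy_extract(text: str) -> List[str]:
    """Hypothetical stand-in for an NER model: capitalized tokens, deduplicated."""
    return sorted({t.strip(".,") for t in text.split() if t[:1].isupper()})


def inject_entities(
    docs: Iterable[Dict],
    extract: Callable[[str], List[str]],
    index: str,
    text_field: str = "text",      # assumed field name
    entity_field: str = "entities",  # assumed field name
) -> str:
    """Build a _bulk request body with an entities field added to each document."""
    lines = []
    for doc in docs:
        enriched = dict(doc)  # copy so the caller's document is not mutated
        enriched[entity_field] = extract(doc.get(text_field, ""))
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(enriched))                      # source line
    # Every line of a bulk body, including the last, ends with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    feed_items = [{"text": "Amazon opens office in Seattle."}]
    print(inject_entities(feed_items, toy_extract, "rss-feed"), end="")
```

The resulting string would be sent as the body of a POST to `/_bulk` with the `application/x-ndjson` content type; swapping `toy_extract` for a real model call is the only change needed in the injector itself.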
https://www.elastic.co/blog/how-to-deploy-nlp-named-entity-recognition-ner-example
@rs-amundaware I think there was an issue in ML-Commons tracking support for adding new types of models via the ML-Commons plugin: opensearch-project/ml-commons#1164
@navneet1v Thanks. Yes, I am eagerly waiting for that feature.
Hi, could you assign this to me? |
Copying the customer request from Forum post: https://forum.opensearch.org/t/extending-neural-search-pipeline-to-named-entity-recognition-and-other-metadata-extracting-models/13078
I have a use case that involves a named entity recognition model for documents and queries while indexing and querying. The documents will be filtered based on the presence of their extracted entities against the query's extracted entities. The pipeline will work similarly to the existing neural search pipeline, with one difference: in this use case, the queries and documents will be passed through an NER (named entity recognition) model and enriched with extra metadata such as entities, instead of vectors provided by an embedding model.
So it would help if we could extend the neural-search pipeline to include model(s) that enable named entity extraction, embeddings, image segments (finding image components for image search), etc., so that the query/document gets enough metadata extracted through the various models listed in my neural search pipeline before matching.
Please add a +1 if you are looking for this feature. If possible, add a comment explaining your use case.
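To make the query side of this request concrete, here is a minimal sketch of how entities extracted from a query could be turned into a filter. It assumes documents were indexed with an `entities` keyword field and a `text` field; both names and the function name are assumptions, not part of the neural-search plugin. The sketch only builds the request body using standard `bool`, `match`, and `terms` query clauses.

```python
from typing import Dict, List


def build_entity_filtered_query(
    query_text: str,
    query_entities: List[str],
    text_field: str = "text",        # assumed field name
    entity_field: str = "entities",  # assumed keyword field holding extracted entities
) -> Dict:
    """Build a search body that matches on text and filters on shared entities."""
    bool_query: Dict = {"must": [{"match": {text_field: query_text}}]}
    if query_entities:
        # A terms filter keeps documents containing at least one of the entities.
        bool_query["filter"] = [
            {"terms": {entity_field: sorted(set(query_entities))}}
        ]
    return {"query": {"bool": bool_query}}


if __name__ == "__main__":
    import json

    body = build_entity_filtered_query("Amazon office", ["Amazon", "Seattle"])
    print(json.dumps(body, indent=2))
```

Note that `terms` matches documents sharing any one of the query's entities; requiring all of them would need one `term` clause per entity instead.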