Checked other resources
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.
I am sure that this is a bug in LangChain rather than my code.
The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
Example Code
To reproduce the issue, create an Azure AI Search index and upload more than 50 documents that share the same value in a searchable field, for example the same file name in the "source" metadata field of every chunk. Instantiate the retriever with top_k set to None and invoke a query like:
retriever.invoke(doc.metadata["source"])
According to the documentation, setting top_k to None should return all the results:
top_k: Optional[int] = None
"""Number of results to retrieve. Set to None to retrieve all results."""
However, because Azure AI Search returns at most 50 results by default, the current implementation never returns more than 50 results.
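To make the effect concrete without an Azure subscription, the sketch below simulates it. `fake_azure_search` is a hypothetical stand-in, not the real service or client: it only encodes the documented behavior of returning 50 results when `$top` is absent. `build_url_like_current` mirrors the `$top` handling of the retriever's URL builder; the host, index name, and api-version are example values.

```python
from typing import List, Optional
from urllib.parse import parse_qs, urlparse

AZURE_DEFAULT_TOP = 50  # documented default number of matches returned

def fake_azure_search(url: str, corpus: List[str]) -> List[str]:
    """Hypothetical stand-in for Azure AI Search (NOT the real API).

    Honors $top and $skip and falls back to 50 results when $top is absent.
    """
    params = parse_qs(urlparse(url).query)
    query = params["search"][0]
    top = int(params["$top"][0]) if "$top" in params else AZURE_DEFAULT_TOP
    skip = int(params["$skip"][0]) if "$skip" in params else 0
    matches = [doc for doc in corpus if query in doc]
    return matches[skip : skip + top]

def build_url_like_current(query: str, top_k: Optional[int]) -> str:
    # Same rule as the retriever: $top is only sent when top_k is truthy.
    top_param = f"&$top={top_k}" if top_k else ""
    return (
        "https://example.search.windows.net/indexes/idx/docs"
        f"?api-version=2023-11-01&search={query}{top_param}"
    )

# 120 chunks that all come from the same source file.
corpus = [f"chunk {i} of report.pdf" for i in range(120)]
print(len(fake_azure_search(build_url_like_current("report.pdf", None), corpus)))  # 50
```

Even though all 120 chunks match, the simulated service returns 50 because no `$top` is sent when `top_k` is None.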
Error Message and Stack Trace (if applicable)
No response
Description
The Azure AI Search service does not return all matches when a query is submitted through the search parameter, as documented on their website:
"By default, the search engine returns up to the first 50 matches. The top 50 are determined by search score, assuming the query is full text search or semantic."
The same documentation indicates that pagination is required to retrieve all matching documents from the service:
"To control the paging of all documents returned in a result set, add $top and $skip parameters to the GET query request, or top and skip to the POST query request. The following list explains the logic.
Return the first set of 15 matching documents plus a count of total matches: GET /indexes//docs?search=&$top=15&$skip=0&$count=true
Return the second set, skipping the first 15 to get the next 15: $top=15&$skip=15. Repeat for the third set of 15: $top=15&$skip=30"
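The paging rule quoted above can be sketched as follows. The helper names are hypothetical and only build the query-string fragments; the total passed to `page_params` is assumed to come from the count the service returns when `$count=true` is sent on the first request.

```python
from typing import Iterator, Tuple

def page_params(page_size: int, total: int) -> Iterator[Tuple[int, int]]:
    """Yield ($top, $skip) pairs that cover `total` matches in pages of `page_size`."""
    for skip in range(0, total, page_size):
        yield page_size, skip

def page_query_string(top: int, skip: int, include_count: bool = False) -> str:
    """Build the $top/$skip (and optional $count) part of a GET query."""
    parts = [f"$top={top}", f"$skip={skip}"]
    if include_count:
        parts.append("$count=true")
    return "&".join(parts)

# The three pages of 15 from the quoted example, assuming 45 total matches.
for top, skip in page_params(15, 45):
    print(page_query_string(top, skip, include_count=(skip == 0)))
```

The first printed line is `$top=15&$skip=0&$count=true`, followed by `$top=15&$skip=15` and `$top=15&$skip=30`, matching the documentation's example.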
The existing code implements no pagination, so the retriever returns at most 50 results no matter how many documents in the index match. This limit is not fully documented and can cause unexpected behavior when the user intends to retrieve all matching documents. The method that builds the API query shows this: when top_k is None no $top parameter is sent, and $skip is never sent at all:
# Method of LangChain's AzureAISearchRetriever (excerpt; relies on the module's imports)
def _build_search_url(self, query: str) -> str:
    url_suffix = get_from_env("", "AZURE_AI_SEARCH_URL_SUFFIX", DEFAULT_URL_SUFFIX)
    if url_suffix in self.service_name and "https://" in self.service_name:
        base_url = f"{self.service_name}/"
    elif url_suffix in self.service_name and "https://" not in self.service_name:
        base_url = f"https://{self.service_name}/"
    elif url_suffix not in self.service_name and "https://" in self.service_name:
        base_url = f"{self.service_name}.{url_suffix}/"
    elif (
        url_suffix not in self.service_name and "https://" not in self.service_name
    ):
        base_url = f"https://{self.service_name}.{url_suffix}/"
    else:
        # pass to Azure to throw a specific error
        base_url = self.service_name
    endpoint_path = f"indexes/{self.index_name}/docs?api-version={self.api_version}"
    # When top_k is None, $top is omitted and Azure falls back to its default of 50.
    top_param = f"&$top={self.top_k}" if self.top_k else ""
    filter_param = f"&$filter={self.filter}" if self.filter else ""
    # A single request is built; there is no $skip, so no further pages are fetched.
    return base_url + endpoint_path + f"&search={query}" + top_param + filter_param
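One possible fix, sketched under assumptions and not LangChain's implementation: request results page by page with $top and $skip until the service returns a short page. `fetch_page(top, skip)` is a hypothetical callable; in the retriever it would wrap one HTTP request to a URL built like the one above plus `&$top=...&$skip=...`. Azure also documents limits on `$skip`, so this approach may not scale to arbitrarily large result sets.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")

def fetch_all(
    fetch_page: Callable[[int, int], List[T]],
    page_size: int = 50,
    top_k: Optional[int] = None,
) -> List[T]:
    """Collect results across pages using $top/$skip-style offsets.

    fetch_page(top, skip) is a hypothetical callable returning one page.
    With top_k=None, paging continues until a short or empty page is returned.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    results: List[T] = []
    skip = 0
    while top_k is None or len(results) < top_k:
        # Never request more than the caller still needs.
        top = page_size if top_k is None else min(page_size, top_k - len(results))
        page = fetch_page(top, skip)
        results.extend(page)
        if len(page) < top:
            break  # fewer results than requested: no more matches
        skip += len(page)
    return results

# Usage with an in-memory list standing in for the index.
docs = list(range(120))
all_docs = fetch_all(lambda top, skip: docs[skip : skip + top])
print(len(all_docs))  # 120
```

Stopping on a short page avoids an extra round trip to learn the total; sending $count=true on the first request would be an alternative way to know when to stop.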
System Info
System Information
Package Information
Optional packages not installed
Other Dependencies