Skip to content

Return distinct items from GetMany and SourceMany #4353

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Feb 19, 2020
Merged

Return distinct items from GetMany and SourceMany #4353

merged 5 commits into from
Feb 19, 2020

Conversation

russcam
Copy link
Contributor

@russcam russcam commented Feb 4, 2020

This commit fixes a bug where a GetMany or SourceMany API call
with a repeated id would return a cartesian product of id and documents.

Take for example the same id repeated 3 times; Elasticsearch will return
3 documents in the response, where each JSON object is a representation
of the same underlying document. Since all 3 documents have the same id,
GetMany and SourceMany would return all 3 documents as a match for the first id,
all 3 as a match for the second id, and so on.

With this fix, For each id, only the first matching document is returned.
One could argue that only a single document should be returned for the same
repeated id. This fix however tries to reflect what is in the
Elasticsearch response.

Fixes #4342

This commit fixes a bug where a GetMany or SourceMany API call
with a repeated id would return a cartesian product of id and documents.

Take for example the same id repeated 3 times; Elasticsearch will return
3 documents in the response, where each JSON object is a representation
of the same underlying document. Since all 3 documents have the same id,
GetMany and SourceMany would return all 3 documents as a match for the first id,
all 3 as a match for the second id, and so on.

With this fix, For each id, only the first matching document is returned.
One _could_ argue that only a single document should be returned for the same
repeated id. This fix however tries to reflect what is in the
Elasticsearch response.

Fixes #4342
@russcam
Copy link
Contributor Author

russcam commented Feb 5, 2020

This PR needs further work; an mget API call can return documents from one or more indices, such that documents with the same id from different indices can be returned.

@russcam
Copy link
Contributor Author

russcam commented Feb 10, 2020

One approach that may be acceptable would be to return only a document per index matching each id in the IEnumerable<string> id input. This does require keeping track of the indices seen for a given id, which means allocating a HashSet<string>, but would address the initial bug in #4342. It would still however return multiple documents for the same id but from different indices, but in the absence of some Type -> index name mapping, I can't see how this could be avoided.

@Mpdreamz
Copy link
Member

I think the returning multiple per id can not be avoided. Returning once per index seems fine to me.

This commit updates the GetMany and SourceMany implementations
to keep track of the seen indices for each id, to return a single
document of id and index combination _per_ id i.e. if the same id
is specified multiple times in the GetMany or SourceMany call, and
only a single index is targeted, each id will return just a single
document.
@russcam
Copy link
Contributor Author

russcam commented Feb 12, 2020

I've added a commit to keep track of the seen indices for each id. Thinking about it further, we could .Distinct() the passed IEnumerable<T> to GetMany<T>() and SourceMany<T>(), so that specifying the same id multiple times for a single targeted index would return a single document.

Thoughts?

This commit enumerates only distinct ids when retrieving hits
or source from GetMany and SourceMany, so that the same id supplied
to either will return only a single document per targeted index.
@russcam
Copy link
Contributor Author

russcam commented Feb 12, 2020

Ready for review again, @Mpdreamz, when you get a chance

This commit fixes a bug in the MultiGetRequestFormatter whereby
the document index is removed when a request index is specified,
without checking whether the document index matches the request index.

Add integration test for fix
@russcam russcam merged commit 8cbc1fe into master Feb 19, 2020
@russcam russcam deleted the fix/4342 branch February 19, 2020 23:19
russcam added a commit that referenced this pull request Feb 19, 2020
This commit fixes a bug where a GetMany or SourceMany API call
with a repeated id would return a cartesian product of id and documents.

- enumerate only distinct ids when retrieving hits
or source from GetMany and SourceMany, so that the same id input
to either will return only a single document per target index.

- fix a bug in the MultiGetRequestFormatter whereby
the document index is removed when a request index is specified,
without checking whether the document index matches the request index.

Fixes #4342

(cherry picked from commit 8cbc1fe)
russcam added a commit that referenced this pull request Feb 20, 2020
This commit fixes a bug where a GetMany or SourceMany API call
with a repeated id would return a cartesian product of id and documents.

- enumerate only distinct ids when retrieving hits
or source from GetMany and SourceMany, so that the same id input
to either will return only a single document per target index.

- fix a bug in the MultiGetRequestFormatter whereby
the document index is removed when a request index is specified,
without checking whether the document index matches the request index.

Fixes #4342

(cherry-picked from commit 8cbc1fe)
russcam added a commit that referenced this pull request Feb 20, 2020
This commit fixes a bug where a GetMany or SourceMany API call
with a repeated id would return a cartesian product of id and documents.

- enumerate only distinct ids when retrieving hits
or source from GetMany and SourceMany, so that the same id input
to either will return only a single document per target index.

- fix a bug in the MultiGetRequestFormatter whereby
the document index is removed when a request index is specified,
without checking whether the document index matches the request index.

Fixes #4342

(cherry-picked from commit 8cbc1fe)
russcam added a commit that referenced this pull request Feb 23, 2020
This commit fixes a bug where a GetMany or SourceMany API call
with a repeated id would return a cartesian product of id and documents.

- enumerate only distinct ids when retrieving hits
or source from GetMany and SourceMany, so that the same id input
to either will return only a single document per target index.

- fix a bug in the MultiGetRequestFormatter whereby
the document index is removed when a request index is specified,
without checking whether the document index matches the request index.

Fixes #4342

(cherry picked from commit 8cbc1fe)
Mpdreamz added a commit that referenced this pull request Feb 26, 2020
#4353

Fixed an issue with the GetMany helpers that returned the cartesian product of all ids specified rather
then creating a distinct list if more then one index was targeted.

This PR also updated the routine in the serializer to omit the index name from each item if the index is
already specified on the url in case of multiple indices

This updated routine in the `7.6.0` could throw if you are calling:

    client.GetMany<T>(ids, "indexName");

Without configuring `ConnectionSettings()` with either a default index for T or a global default index.
Mpdreamz added a commit that referenced this pull request Feb 26, 2020
#4353

Fixed an issue with the GetMany helpers that returned the cartesian product of all ids specified rather
then creating a distinct list if more then one index was targeted.

This PR also updated the routine in the serializer to omit the index name from each item if the index is
already specified on the url in case of multiple indices

This updated routine in the `7.6.0` could throw if you are calling:

    client.GetMany<T>(ids, "indexName");

Without configuring `ConnectionSettings()` with either a default index for T or a global default index.
github-actions bot pushed a commit that referenced this pull request Feb 26, 2020
#4353

Fixed an issue with the GetMany helpers that returned the cartesian product of all ids specified rather
then creating a distinct list if more then one index was targeted.

This PR also updated the routine in the serializer to omit the index name from each item if the index is
already specified on the url in case of multiple indices

This updated routine in the `7.6.0` could throw if you are calling:

    client.GetMany<T>(ids, "indexName");

Without configuring `ConnectionSettings()` with either a default index for T or a global default index.
github-actions bot pushed a commit that referenced this pull request Feb 26, 2020
#4353

Fixed an issue with the GetMany helpers that returned the cartesian product of all ids specified rather
then creating a distinct list if more then one index was targeted.

This PR also updated the routine in the serializer to omit the index name from each item if the index is
already specified on the url in case of multiple indices

This updated routine in the `7.6.0` could throw if you are calling:

    client.GetMany<T>(ids, "indexName");

Without configuring `ConnectionSettings()` with either a default index for T or a global default index.
Mpdreamz added a commit that referenced this pull request Feb 26, 2020
#4353

Fixed an issue with the GetMany helpers that returned the cartesian product of all ids specified rather
then creating a distinct list if more then one index was targeted.

This PR also updated the routine in the serializer to omit the index name from each item if the index is
already specified on the url in case of multiple indices

This updated routine in the `7.6.0` could throw if you are calling:

    client.GetMany<T>(ids, "indexName");

Without configuring `ConnectionSettings()` with either a default index for T or a global default index.

(cherry picked from commit 87c8cdd)
Mpdreamz added a commit that referenced this pull request Feb 26, 2020
#4353

Fixed an issue with the GetMany helpers that returned the cartesian product of all ids specified rather
then creating a distinct list if more then one index was targeted.

This PR also updated the routine in the serializer to omit the index name from each item if the index is
already specified on the url in case of multiple indices

This updated routine in the `7.6.0` could throw if you are calling:

    client.GetMany<T>(ids, "indexName");

Without configuring `ConnectionSettings()` with either a default index for T or a global default index.

Co-authored-by: Martijn Laarman <Mpdreamz@gmail.com>
Mpdreamz added a commit that referenced this pull request Feb 26, 2020
#4353

Fixed an issue with the GetMany helpers that returned the cartesian product of all ids specified rather
then creating a distinct list if more then one index was targeted.

This PR also updated the routine in the serializer to omit the index name from each item if the index is
already specified on the url in case of multiple indices

This updated routine in the `7.6.0` could throw if you are calling:

    client.GetMany<T>(ids, "indexName");

Without configuring `ConnectionSettings()` with either a default index for T or a global default index.

Co-authored-by: Martijn Laarman <Mpdreamz@gmail.com>
Mpdreamz added a commit that referenced this pull request Feb 26, 2020
#4353

Fixed an issue with the GetMany helpers that returned the cartesian product of all ids specified rather
then creating a distinct list if more then one index was targeted.

This PR also updated the routine in the serializer to omit the index name from each item if the index is
already specified on the url in case of multiple indices

This updated routine in the `7.6.0` could throw if you are calling:

    client.GetMany<T>(ids, "indexName");

Without configuring `ConnectionSettings()` with either a default index for T or a global default index.

(cherry picked from commit 87c8cdd)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SourceMany unexpected behavior
3 participants