NEST removes null value from after_key Dictionary when aggregating #3694

Closed
@LordMike

We have code that runs a composite aggregation over our data, using two fields with missing_bucket set to true.

Our issue is that when one of the fields is null in the data, the after_key is serialized incorrectly on the next request: the entry with the null value is dropped.

Note: there is a minimal reproduction at the bottom.

Our code (boiled down):

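// Requires: using System; using System.Collections.Generic; using System.Linq;
//           using Elasticsearch.Net; using Nest; using Newtonsoft.Json.Linq;
static void Main(string[] args)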
{
    IConnectionPool pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
    IConnection connection = new HttpConnection();

    ConnectionSettings connSettings = new ConnectionSettings(pool, connection);
    connSettings.ThrowExceptions();
    connSettings.DisableDirectStreaming();

    ElasticClient client = new ElasticClient(connSettings);

    // Grouping
    SearchRequest<JObject> search = new SearchRequest<JObject>("some_index", "_doc");
    search.Size = 0;

    List<ICompositeAggregationSource> aggregateList = new List<ICompositeAggregationSource>();
    aggregateList.Add(new TermsCompositeAggregationSource("1")
    {
        Field = "PropertyA.keyword",
        MissingBucket = true
    });
    aggregateList.Add(new TermsCompositeAggregationSource("2")
    {
        Field = "PropertyB.keyword",
        MissingBucket = true
    });

    CompositeAggregation compositeAggregation = new CompositeAggregation("composite")
    {
        Sources = aggregateList
    };

    search.Aggregations = compositeAggregation;

    while (true)
    {
        int pageSize = 10; // We use 1000, 10 is for testing
        compositeAggregation.Size = pageSize;

        ISearchResponse<JObject> result = client.Search<JObject>(search);

        BucketAggregate aggA = (BucketAggregate)result.Aggregations["composite"];

        if (!aggA.Items.Any())
            break;

        // Prepare next request
        // This is what fails the next round
        compositeAggregation.After = aggA.AfterKey;

        // .. work with data ..
    }
}

With the above, ES rejects our second request (or some subsequent one) with:

DebugInformation

# FailureReason: BadResponse while attempting POST on http://localhost:9200/some_index/_doc/_search?typed_keys=true
# Audit trail of this API call:
 - [1] BadResponse: Node: http://localhost:9200/ Took: 00:00:00.0494512
# OriginalException: Elasticsearch.Net.ElasticsearchClientException: Request failed to execute. Call: Status code 400 from: POST /some_index/_doc/_search?typed_keys=true. ServerError: Type: search_phase_execution_exception Reason: "all shards failed" CausedBy: "Type: illegal_argument_exception Reason: "[after] has 1 value(s) but [sources] has 2" CausedBy: "Type: illegal_argument_exception Reason: "[after] has 1 value(s) but [sources] has 2"""
   at Elasticsearch.Net.Transport`1.HandleElasticsearchClientException(RequestData data, Exception clientException, IElasticsearchResponse response)
   at Elasticsearch.Net.Transport`1.FinalizeResponse[TResponse](RequestData requestData, IRequestPipeline pipeline, List`1 seenExceptions, TResponse response)
   at Elasticsearch.Net.Transport`1.Request[TResponse](HttpMethod method, String path, PostData data, IRequestParameters requestParameters)
   at Nest.LowLevelDispatch.SearchDispatch[TResponse](IRequest`1 p, SerializableData`1 body)
   at Nest.ElasticClient.Nest.IHighLevelToLowLevelDispatcher.Dispatch[TRequest,TQueryString,TResponse](TRequest request, Func`3 responseGenerator, Func`3 dispatch)
   at ConsoleApp10.Program.Main(String[] args) in C:\Users\MichaelBisbjerg\source\repos\ConsoleApp10\ConsoleApp10\Program.cs:line 89
# Request:
{
	"aggs": {
		"composite": {
			"composite": {
				"after": {
					"1": "value1"
				},
				"size": 10,
				"sources": [{
						"1": {
							"terms": {
								"field": "PropertyA.keyword",
								"missing_bucket": true
							}
						}
					}, {
						"2": {
							"terms": {
								"field": "PropertyB.keyword",
								"missing_bucket": true
							}
						}
					}
				]
			}
		}
	},
	"size": 0
}
# Response:
{
	"error": {
		"root_cause": [{
				"type": "illegal_argument_exception",
				"reason": "[after] has 1 value(s) but [sources] has 2"
			}
		],
		"type": "search_phase_execution_exception",
		"reason": "all shards failed",
		"phase": "query",
		"grouped": true,
		"failed_shards": [{
				"shard": 0,
				"index": "some_index",
				"node": "Z7iIXKGMQZSRN6MDZ1h3Jg",
				"reason": {
					"type": "illegal_argument_exception",
					"reason": "[after] has 1 value(s) but [sources] has 2"
				}
			}
		],
		"caused_by": {
			"type": "illegal_argument_exception",
			"reason": "[after] has 1 value(s) but [sources] has 2",
			"caused_by": {
				"type": "illegal_argument_exception",
				"reason": "[after] has 1 value(s) but [sources] has 2"
			}
		}
	},
	"status": 400
}
# Exception:
Elasticsearch.Net.ElasticsearchClientException: Request failed to execute. Call: Status code 400 from: POST /some_index/_doc/_search?typed_keys=true. ServerError: Type: search_phase_execution_exception Reason: "all shards failed" CausedBy: "Type: illegal_argument_exception Reason: "[after] has 1 value(s) but [sources] has 2" CausedBy: "Type: illegal_argument_exception Reason: "[after] has 1 value(s) but [sources] has 2"""
   at Elasticsearch.Net.Transport`1.HandleElasticsearchClientException(RequestData data, Exception clientException, IElasticsearchResponse response)
   at Elasticsearch.Net.Transport`1.FinalizeResponse[TResponse](RequestData requestData, IRequestPipeline pipeline, List`1 seenExceptions, TResponse response)
   at Elasticsearch.Net.Transport`1.Request[TResponse](HttpMethod method, String path, PostData data, IRequestParameters requestParameters)
   at Nest.LowLevelDispatch.SearchDispatch[TResponse](IRequest`1 p, SerializableData`1 body)
   at Nest.ElasticClient.Nest.IHighLevelToLowLevelDispatcher.Dispatch[TRequest,TQueryString,TResponse](TRequest request, Func`3 responseGenerator, Func`3 dispatch)
   at ConsoleApp10.Program.Main(String[] args) in C:\Users\MichaelBisbjerg\source\repos\ConsoleApp10\ConsoleApp10\Program.cs:line 89

When debugging, I can clearly see that aggA.AfterKey is a dictionary with two entries, but when it is sent back to ES, only one of them is included.
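
For comparison, this is roughly the "after" clause we expected in the failing request. It is a sketch, not captured output: "value1" is taken from the request above, and the null for source "2" is an assumption based on the two-entry AfterKey seen in the debugger.

```json
"after": {
	"1": "value1",
	"2": null
}
```

With both entries present, [after] would carry two values, matching the two [sources] that the error message complains about.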

I've narrowed the issue down further and reproduced it with just the serializer, using this code:

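// Requires: using System; using System.Collections.Generic; using System.IO;
//           using System.Text; using Elasticsearch.Net; using Nest;
void ReproduceSerializer()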
{
    IConnectionPool pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
    ConnectionSettings connSettings = new ConnectionSettings(pool);

    ElasticClient client = new ElasticClient(connSettings);

    using (MemoryStream ms = new MemoryStream())
    {
        Dictionary<string, object> dictionary = new Dictionary<string, object>
        {
            {"1", "C:\\" },
            {"2", null }
        };
        client.RequestResponseSerializer.Serialize(dictionary, ms);

        byte[] d = ms.ToArray();
        string p = Encoding.UTF8.GetString(d);

        /*
         Issue: "p" is just
         {
          "1": "C:\\"
         }

        Rather than:
         {
          "1": "C:\\",
          "2": null
         }
         */
    }
}
