Skip to content

Deeply-nested filters don't work with Elasticsearch #5642

@colinodell

Description

@colinodell

API Platform version(s) affected: 2.6.8 (and likely earlier) through 3.x

Description

MatchFilter and TermFilter do not properly support multi-level nested fields in Elasticsearch if the top level is a collection.

What Works

It seems to work fine if the top level is an object. For example, given this sample document from the docs linked above:

{
  "driver" : {
        "last_name" : "McQueen",
        "vehicle" : [
            {
                "make" : "Powell Motors",
                "model" : "Canyonero"
            },
            {
                "make" : "Miller-Meteor",
                "model" : "Ecto-1"
            }
        ]
    }
}

Asking API Platform to filter on driver.vehicle.model would generate a valid, working query like this:

{
    "nested": {
        "path": "driver.vehicle",
        "query": {
            "term": {"driver.vehicle.model": "Canyonero"}
        }
    }
}

Interestingly, the Elasticsearch docs suggest the query should instead be doubly-nested like this:

{
    "nested": {
        "path": "driver", // <-- instead of driver.vehicle
        "query": {
            "nested": { // <-- additional nested layer
                "path": "driver.vehicle",
                "query": {
                    "term": {"driver.vehicle.model": "Canyonero"}
                }
            }
        }
    }
}

But I'm assuming ES will happily accept either version. Regardless, everything described above seems to work just fine.

What's Broken

However, if we modify the structure to have an array of drivers as the top-level property like this:

{
    "race": "Piston Cup",
    "drivers": [
        {
            "last_name" : "McQueen",
            "vehicle" : [
                {
                    "make" : "Powell Motors",
                    "model" : "Canyonero"
                },
                {
                    "make" : "Miller-Meteor",
                    "model" : "Ecto-1"
                }
            ]
        }
    ]
}

Attempting to filter on drivers.vehicle.model generates this query:

{
    "nested": {
        "path": "drivers",
        "query": {
            "term": {"drivers.vehicle.model": "Canyonero"}
        }
    }
}

Which fails to match any documents because the path is missing the .vehicle part. Manually adding the .vehicle part causes the expected documents to appear in the results:

{
    "nested": {
        "path": "drivers.vehicle", // <-- this change seems to work
        "query": {
            "term": {"drivers.vehicle.model": "Canyonero"}
        }
    }
}

Using a proper deeply-nested query also works (but API Platform won't generate this):

{
    "nested": {
        "path": "drivers", // <-- instead of drivers.vehicle
        "query": {
            "nested": { // <-- additional nested layer
                "path": "drivers.vehicle",
                "query": {
                    "term": {"drivers.vehicle.model": "Canyonero"}
                }
            }
        }
    }
}

Likely Cause

I think the current behavior is caused by FieldDatatypeTrait::getNestedFieldPath() never recursively calling itself when the top-level type is a collection. It will never check the rest of the path to see if it might be a nested object.

How to reproduce

I don't have a reproducer for how the query fails to work in Elasticsearch, but I did tweak TermFilterTest::testApplyWithNestedProperty() to demonstrate the unexpected query:

    public function testApplyWithDeeplyNestedProperty(): void
    {
        $libraryType = new Type(Type::BUILTIN_TYPE_ARRAY, false, Library::class, true, new Type(Type::BUILTIN_TYPE_INT), new Type(Type::BUILTIN_TYPE_OBJECT, false, Library::class));
        $bookType = new Type(Type::BUILTIN_TYPE_ARRAY, false, Book::class, true, new Type(Type::BUILTIN_TYPE_INT), new Type(Type::BUILTIN_TYPE_OBJECT, false, Book::class));
        $messageType = new Type(Type::BUILTIN_TYPE_STRING);

        $propertyMetadataFactoryProphecy = $this->prophesize(PropertyMetadataFactoryInterface::class);
        $propertyMetadataFactoryProphecy->create(Library::class, 'libraries')->willReturn((new ApiProperty())->withBuiltinTypes([$libraryType]))->shouldBeCalled();
        $propertyMetadataFactoryProphecy->create(Library::class, 'book')->willReturn((new ApiProperty())->withBuiltinTypes([$bookType]))->shouldBeCalled();
        $propertyMetadataFactoryProphecy->create(Book::class, 'message')->willReturn((new ApiProperty())->withBuiltinTypes([$messageType]))->shouldBeCalled();

        $resourceClassResolverProphecy = $this->prophesize(ResourceClassResolverInterface::class);
        $resourceClassResolverProphecy->isResourceClass(Library::class)->willReturn(true)->shouldBeCalled();
        $resourceClassResolverProphecy->isResourceClass(Book::class)->willReturn(true)->shouldBeCalled();

        $nameConverterProphecy = $this->prophesize(NameConverterInterface::class);
        $nameConverterProphecy->normalize('libraries.book.message', Library::class, null, Argument::type('array'))->willReturn('libraries.book.message')->shouldBeCalled();
        $nameConverterProphecy->normalize('libraries.book', Library::class, null, Argument::type('array'))->willReturn('libraries.book')->shouldBeCalled();
        $nameConverterProphecy->normalize('libraries', Library::class, null, Argument::type('array'))->willReturn('libraries')->shouldBeCalled();

        $termFilter = new TermFilter(
            $this->prophesize(PropertyNameCollectionFactoryInterface::class)->reveal(),
            $propertyMetadataFactoryProphecy->reveal(),
            $resourceClassResolverProphecy->reveal(),
            $this->prophesize(IriConverterInterface::class)->reveal(),
            $this->prophesize(PropertyAccessorInterface::class)->reveal(),
            $nameConverterProphecy->reveal(),
            ['libraries.book.message' => null]
        );

        self::assertEquals(
            ['bool' => ['must' => [['nested' => ['path' => 'libraries', 'query' => ['nested' => ['path' => 'libraries.book', 'query' => ['term' => ['libraries.book.message' => 'Krupicka']]]]]]]]],
            $termFilter->apply([], Library::class, null, ['filters' => ['libraries.book.message' => 'Krupicka']])
        );
    }

Possible Solution

First, fix FieldDatatypeTrait::getNestedFieldPath() so it returns the complete parent path.

Once that's done, modify MatchFilter::getQuery() and TermFilter::getQuery() so that they return deeply nested queries as shown in the Elasticsearch documentation. This seems "more correct" than the current approach of only using one nested key regardless of how deeply nested the parameter actually is.

Additional Context

api-platform/api-platform#1375 describes an issue with filtering nested properties where getNestedFieldPath() is also a suspected culprit. My case seems different because the top-level item in my path is a collection and not an object - the execution path is very different.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions