Parent-child joins can be a useful technique for managing relationships when index-time performance is more important than search-time performance, but they come at a significant cost. Parent-child queries can be 5 to 10 times slower than the equivalent nested query!
Parent-child uses global ordinals to speed up joins. Regardless of whether the parent-child map uses an in-memory cache or on-disk doc values, global ordinals still need to be rebuilt after any change to the index.
The more parents in a shard, the longer global ordinals will take to build. Parent-child is best suited to situations where there are many children for each parent, rather than many parents and few children.
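One rough way to check whether an index fits this profile is to compare the document counts of the two types. The requests below are a minimal sketch, assuming the company index with its branch and employee types from the mapping in this section. The counts are for the whole index rather than per shard, so treat the ratio as an approximation:

    GET /company/branch/_count

    GET /company/employee/_count

If the employee count is not substantially larger than the branch count, the cost of building global ordinals is unlikely to be repaid.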
Global ordinals, by default, are built lazily: the first parent-child query or
aggregation after a refresh will trigger building of global ordinals. This
can introduce a significant latency spike for your users. You can use
eager_global_ordinals to shift the cost of building global ordinals from
query time to refresh time, by mapping the _parent field as follows:
PUT /company
{
  "mappings": {
    "branch": {},
    "employee": {
      "_parent": {
        "type": "branch",
        "fielddata": {
          "loading": "eager_global_ordinals" (1)
        }
      }
    }
  }
}
(1) Global ordinals for the _parent field will be built before a new segment becomes visible to search.
With many parents, global ordinals can take several seconds to build. In this
case, it makes sense to increase the refresh_interval so that refreshes
happen less often and global ordinals remain valid for longer. This will
greatly reduce the CPU cost of rebuilding global ordinals every second.
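As an illustration, the refresh interval can be changed on a live index through the update-settings API. This is a sketch only: the 30s value is an assumed example, not a recommendation, and the right interval depends on how stale your search results are allowed to be:

    PUT /company/_settings
    {
      "index": {
        "refresh_interval": "30s"
      }
    }

With this setting, new documents become visible to search at most every 30 seconds, and global ordinals are rebuilt at most that often.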
The ability to join multiple generations (see [grandparents]) sounds attractive until you think of the costs involved:
- The more joins you have, the worse performance will be.
- Each generation of parents needs to have their string _id fields stored in memory, which can consume a lot of RAM.
As you consider your relationship schemes and whether parent-child is right for you, consider this advice about parent-child relationships:
- Use parent-child relationships sparingly, and only when there are many more children than parents.
- Avoid using multiple parent-child joins in a single query.
- Avoid scoring by using the has_child filter, or the has_child query with score_mode set to none.
- Keep the parent IDs short, so that they compress better in doc values, and use less memory when transiently loaded.
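To make the advice about scoring concrete, here is a sketch of a has_child query that finds branches by their employees without computing a score from the matching children. The name field and the search text are assumptions made for illustration; they do not appear in the mapping shown earlier:

    GET /company/branch/_search
    {
      "query": {
        "has_child": {
          "type": "employee",
          "score_mode": "none",
          "query": {
            "match": {
              "name": "Alice Smith"
            }
          }
        }
      }
    }

With score_mode set to none, the scores of the matching children are ignored, so Elasticsearch only needs to establish that at least one matching child exists for each parent.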
Above all: think about the other relationship techniques that we have discussed before reaching for parent-child.