
find_structure API unable to process nested fields from ndjson #127777

Open
@rseldner

Description

Elasticsearch Version

8.16.5

Installed Plugins

No response

Java Version

bundled

OS Version

Elastic Cloud

Problem Description

The find_structure API in Elasticsearch (doc) does not detect fields nested under an object. For NDJSON input it reports only the parent-level object, with no mappings for the fields inside it.

This limits the Kibana Data Visualizer file upload functionality.

For now, it would be ideal to document this as a known issue or limitation in both products.

{
    "overrides": {
        "explain": "true"
    },
    "results": {
        "num_lines_analyzed": 3,
        "num_messages_analyzed": 3,
-        "sample_start": "{\"host\": {\"id\": \"1\", \"category\": \"NETWORKING DEVICE\"}}\n{\"host\": {\"id\": \"2\", \"category\": \"NETWORKING DEVICE\"}}\n",
        "charset": "UTF-8",
        "has_byte_order_marker": false,
        "format": "ndjson",
        "ecs_compatibility": "disabled",
        "need_client_timezone": false,
-        "mappings": {
-            "properties": {
-                "host": {
-                    "type": "object"
-                }
-            }
-        },
        "explanation": [
            "Using character encoding [UTF-8], which matched the input with [15%] confidence - first [8kB] of input was pure ASCII",
            "Deciding sample is newline delimited NDJSON"
        ]
    }
}

Steps to Reproduce

Submit an NDJSON sample with nested fields:

POST _text_structure/find_structure?filter_path=mappings
{"host": {"id": "1", "category": "NETWORKING DEVICE"}}
{"host": {"id": "2", "category": "NETWORKING DEVICE"}}

Only the parent-level object is detected:

{
  "properties": {
    "host": {
      "type": "object"
    }
  }
}

We have to flatten the structure to work around this:

POST _text_structure/find_structure?filter_path=mappings
{"host.id": 1, "host.category": "NETWORKING DEVICE"}
{"host.id": 2, "host.category": "NETWORKING DEVICE"}

{
  "mappings": {
    "properties": {
      "host.category": {
        "type": "keyword"
      },
      "host.id": {
        "type": "long"
      }
    }
  }
}
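
For anyone who needs to apply this workaround to a larger file before uploading it, the flattening step could be scripted. The following is only a minimal sketch in Python, not part of Elasticsearch or Kibana: the helper names `flatten` and `flatten_ndjson` are hypothetical, it assumes every non-empty line is a JSON object, and it leaves arrays and empty objects untouched. Values are passed through unchanged, so a string `"1"` stays a string.

```python
import json


def flatten(obj, prefix=""):
    """Collapse nested dicts into one level with dotted keys (hypothetical helper)."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Recurse into non-empty objects, carrying the dotted path along.
            flat.update(flatten(value, path))
        else:
            # Scalars, arrays, and empty objects are kept as-is.
            flat[path] = value
    return flat


def flatten_ndjson(text):
    """Flatten every non-empty line of an NDJSON string (hypothetical helper)."""
    lines = []
    for line in text.splitlines():
        if line.strip():
            lines.append(json.dumps(flatten(json.loads(line))))
    return "\n".join(lines) + "\n"


sample = (
    '{"host": {"id": "1", "category": "NETWORKING DEVICE"}}\n'
    '{"host": {"id": "2", "category": "NETWORKING DEVICE"}}\n'
)
flattened = flatten_ndjson(sample)
print(flattened, end="")
```

The flattened text can then be sent as the request body to `_text_structure/find_structure` in place of the original sample.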

Logs (if relevant)

No response

Metadata

Assignees

No one assigned

    Labels

    :ml (Machine learning), >bug, Team:ML (Meta label for the ML team)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
