Skip to content

Opensearch embedded row data loses column titles in superset #23758

Closed

Description

So I have a very specific situation and I don't know if the issue is with trino, the trino python connector, or superset.

Because Superset does not yet have a connector to Opensearch (in our case, Wazuh), I am using Trino as an intermediary.

In the superset configuration, I have 'Enable row expansion in schemas' enabled, which allows me to address nested fields (eg: data.rule.id). As per the Trino OpenSearch docs, I have also applied _meta data to the various indexes so that trino can interpret them correctly. I have tried using both isArray as well as asRawJSON, and the results appear the same in both cases.

Using the following query:
SELECT * FROM "wazuh-alerts-4.x-2024.10.10" WHERE data.win.system.eventid='4634' AND "@timestamp" > date '2024-10-10' limit 1

If I run the query using the Trino CLI (with vertical output), I get the following result. It gets cut off by the screen width but there's enough to get the point.

trino> use wazuh.default;SELECT * FROM "wazuh-alerts-4.x-2024.10.10" WHERE data.win.system.eventid='4634' AND "@timestamp" > date '2024-10-10' limit 1;
USE
-[ RECORD 1 ]---+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------->
@timestamp      | 2024-10-10 23:59:57.582
@version        | NULL
geolocation     | NULL
agent           | {id=190, ip=<myip>, name=<my_agent>}
cluster         | {name=wazuh_cluster, node=<my_cluster_node>}
command         | NULL
data            | {AADOperationType=NULL, AADTenantId=NULL, ActivityDateTime=NULL, ActivityDisplayName=NULL, AdditionalDetails=NULL, Category=NULL, CorrelationId=NULL, DurationMs=NULL, Id=NULL, Identity=NULL, InitiatedBy=NULL, Level=NULL, LoggedByService=NULL, OperationName=NULL, OperationVersion=NULL, Resource=NULL, Resour>
                |
                | Subject:
                |       Security ID:            S-1-5-21-1318347784-130420028-829490212-45260
                |       Account Name:           wmi.monitor
                |       Account Domain:         <MY_DOMAIN>
                |       Logon ID:               0x38EC8DE
                |
                | Logon Type:                   3
                |
                | This event is generated when a logon session is destroyed. It may be positively correlated with a logon event using the Logon ID value. Logon IDs are only unique between reboots on the same computer.", opcode=0, processID=924, providerGuid={54849625-5478-4994-a5ba-3e3b0328c30d}, providerName=Microsoft-Wind>
decoder         | {accumulate=NULL, fts=NULL, ftscomment=NULL, name=windows_eventchannel, parent=NULL}
full_log        | NULL
host            | NULL
id              | 1728604797.3290998072
input           | {type=log}
location        | EventChannel
manager         | {name=<my_worker>}
message         | NULL
offset          | NULL
predecoder      | NULL
previous_log    | NULL
previous_output | NULL
program_name    | NULL
rule            | {cis=NULL, cis_csc=NULL, cis_csc_v7=NULL, cve=NULL, description=Windows User Logoff., firedtimes=11604, frequency=NULL, gdpr=["IV_32.2"], gdpr_IV=NULL, gpg13=NULL, gpg_13=NULL, groups=["windows","windows_security"], hipaa=["164.312.b"], id=60137, info=NULL, iso_27001-2013=NULL, level=3, mail=false, mitre=N>
syscheck        | NULL
title           | NULL
type            | NULL

We see the top level fields, and nested fields are returned as nested row data.

If we run the same query in superset sqllab, this is the result:
image

If we expand the data column, we get:
image

The column name information is gone, making the data unidentifiable. Not being able to see the column names makes it much harder to develop queries.

Activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions