Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

using ecs@mappings with entityanalytics_entra_id fails tests with 'field "user.group" is undefined' #1921

Open
efd6 opened this issue Jun 20, 2024 · 4 comments
Assignees
Labels
Team:Ecosystem Label for the Packages Ecosystem team

Comments

@efd6
Copy link
Contributor

efd6 commented Jun 20, 2024

As part of elastic/integrations#10135 the entityanalytics_entra_id package has its ECS field definitions removed, to be replaced at runtime by ecs@mappings. One of the field groups that is removed is user.group. Apart from other changes that the automation makes that work as expected for other packages that do not include this group, the change is

diff --git a/packages/entityanalytics_entra_id/data_stream/entity/fields/ecs.yml b/packages/entityanalytics_entra_id/data_stream/entity/fields/ecs.yml
index 98c1adf9b1..367fa9f275 100644
--- a/packages/entityanalytics_entra_id/data_stream/entity/fields/ecs.yml
+++ b/packages/entityanalytics_entra_id/data_stream/entity/fields/ecs.yml
@@ -100,13 +100,6 @@
       type: boolean
     - name: first_name
       type: keyword
-    - name: group
-      type: group
-      fields:
-        - name: id
-          type: keyword
-        - name: name
-          type: keyword
     - name: job_title
       type: keyword
     - name: last_name

With this change the package tests now fail.

--- Test results for package: entityanalytics_entra_id - START ---
FAILURE DETAILS:
entityanalytics_entra_id/entity test-users.json:
[0] field "user.group" is undefined


╭──────────────────────────┬─────────────┬───────────┬────────────────────────────┬─────────────────────────────────────────────────────────────────────────────┬──────────────╮
│ PACKAGE                  │ DATA STREAM │ TEST TYPE │ TEST NAME                  │ RESULT                                                                      │ TIME ELAPSED │
├──────────────────────────┼─────────────┼───────────┼────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┼──────────────┤
│ entityanalytics_entra_id │ entity      │ pipeline  │ test-device.json           │ PASS                                                                        │   7.023865ms │
│ entityanalytics_entra_id │ entity      │ pipeline  │ test-users.json            │ FAIL: test case failed: one or more problems with fields found in documents │   4.555814ms │
│ entityanalytics_entra_id │ entity      │ pipeline  │ (ingest pipeline warnings) │ PASS                                                                        │ 391.471107ms │
╰──────────────────────────┴─────────────┴───────────┴────────────────────────────┴─────────────────────────────────────────────────────────────────────────────┴──────────────╯
--- Test results for package: entityanalytics_entra_id - END   ---

It is not clear why this is from the error message in the test output.

The state of entityanalytics_entra_id used to demonstrate this can be reconstructed by running

Migration performed using ecs-update.

  go run github.com/andrewkroh/go-examples/ecs-update@014b35dfe4c9832b51e7c909a39a48257d6a005d \
    -ecs-version=8.11.0 \
    -ecs-git-ref=v8.11.0 \
    -fields-yml-drop-ecs \
    -kibana-version=^8.13.0 \
    -drop-import-mappings \
    -pr=1 \
    -owner=elastic/security-service-integrations \
    packages/entityanalytics_entra_id
@jsoriano
Copy link
Member

From what I can see, there are a couple of reasons that make this case special:

  • user.group is a reusable ECS group (group) inside another reusable ECS group (user). We might have some glitch handing this, but then importing the definition of user.group.id would also fail, and it seems to work.
  • user.group in this data stream is an array of objects. As commented in Allow to override ECS objects to group or nested #1493, in these cases we want developers to explicitly define if they want these cases to be handled as an array of objects (type: group), or with the nested type (type: nested). Arrays of objects can lead to unexpected results, and we want package developers to be explicit about what they expect.

I would recommend to set a currently supported definition for user.group. It could be to keep the current definition, or to keep the group element but without subfields, only to specify how this should be handled:

    - name: group
      type: group

For our side I think we should add a more explicit error for this case (adding it in #1923).

We could assume that in the cases of fields imported from ECS the user doesn't want to use nested, but I would prefer to be on the safe side on this, and keep this requirement.

@jsoriano jsoriano self-assigned this Jun 20, 2024
@kpollich kpollich added the Team:Ecosystem Label for the Packages Ecosystem team label Jun 21, 2024
@jsoriano
Copy link
Member

Closing this by now after improving the error message in #1923. Please reopen if you consider something else should be done.

@efd6
Copy link
Contributor Author

efd6 commented Jun 26, 2024

This behaviour is very surprising. As developer I would expect that this would just work.

@efd6 efd6 reopened this Jun 26, 2024
@jsoriano
Copy link
Member

This behaviour is very surprising.

I agree, but this comes from the lack of support for arrays of objects in our data layer, what can be also surprising. After several iterations on this, I think we are in a situation where it is more difficult to create situations in packages that the developer didn't plan about. Not ideal, but in my opinion better than defaulting to something that would be surprising for users.

As developer I would expect that this would just work.

The problem is with the ambiguity when arrays of objects are used. Would you expect this to work as an array of objects or as nested objects? Arrays of objects are not properly supported by ES, and are stored in unexpected ways (as arrays of their member fields, disconnected between them). Nested objects are stored in a more natural way, probably what a newcomer would expect, but require more storage and specific support.

So I think the developer needs to make a conscious decision here: using plain arrays of objects, using nested objects, denormalizing documents, or looking for some other different data structure that doesn't require arrays of objects.

I am open to proposals about how to improve this, but I still think that we should not rely on a default behavior for arrays of objects, given the trade-offs of every option. I also prefer to be in a situation that discourages the use of arrays of objects, given their problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Ecosystem Label for the Packages Ecosystem team
Projects
None yet
Development

No branches or pull requests

3 participants